October 2019
ggplot2
data.frame/tibble
ggplot2
when you run into problems.
ggplot2
source: thinkR
x | y | shape |
---|---|---|
25 | 11 | circle |
0 | 0 | circle |
75 | 53 | square |
200 | 300 | square |
x = x, y = y, shape = shape
dot / point
What if we want to split circles and squares?
Now, dot shapes and facets provide the same information.
We could use the shape for another meaningful variable…
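The toy table and the mapping `x = x, y = y, shape = shape` can be tried out directly. The following is a minimal sketch: the object name `shapes` and the choice of point symbols (16 for a filled circle, 15 for a filled square) are assumptions made for illustration, not part of the original slides.

```r
library(ggplot2)

# Toy data copied from the table above ("shapes" is a hypothetical name)
shapes <- data.frame(
  x = c(25, 0, 75, 200),
  y = c(11, 0, 53, 300),
  shape = c("circle", "circle", "square", "square")
)

# Map columns to aesthetics, then split circles and squares into facets
p <- ggplot(shapes) +
  geom_point(aes(x = x, y = y, shape = shape), size = 3) +
  # without a manual scale, the second level would be drawn as a triangle
  scale_shape_manual(values = c(circle = 16, square = 15)) +
  facet_wrap(~ shape)
p
```

With both the `shape` aesthetic and the facets tied to the same column, the two encodings are redundant, which is the point made on the slide.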
"Data visualisation is not meant just to be seen but to be read, like written text" (Alberto Cairo)
geom_point()
geom_line()
geom_bar()
geom_boxplot()
geom_histogram()
geom_density()
Have a look at the cheatsheet or the ggplot2 online documentation to list more possibilities.
tibble
library(tibble)
iris <- as_tibble(iris)
iris

# A tibble: 150 x 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>
 1          5.1         3.5          1.4         0.2 setosa
 2          4.9         3            1.4         0.2 setosa
 3          4.7         3.2          1.3         0.2 setosa
 4          4.6         3.1          1.5         0.2 setosa
 5          5           3.6          1.4         0.2 setosa
 6          5.4         3.9          1.7         0.4 setosa
 7          4.6         3.4          1.4         0.3 setosa
 8          5           3.4          1.5         0.2 setosa
 9          4.4         2.9          1.4         0.2 setosa
10          4.9         3.1          1.5         0.1 setosa
# … with 140 more rows

Saving the data frame as a tibble enables the compact tibble printing and avoids listing all 150 rows.
ggplot(data = iris) +
  geom_point(mapping = aes(x = Petal.Width, y = Petal.Length))
ggplot2 introduces a break in the workflow, from %>% to +.
ggplot1
ggplot1 was released by Hadley Wickham and developed from 2005 until 2008.
"If the pipe (%>%, in 2014) had been invented before, ggplot2 would have never existed" (Hadley Wickham)
ggplot1: original syntax
# devtools::install_github("hadley/ggplot1")
library(ggplot1)
p <- ggplot(mtcars, list(x = mpg, y = wt))
# need temp p object to avoid too many ()'s
scbrewer(ggpoint(p, list(colour = gear)))

ggplot1 with the pipe
library(ggplot1)
mtcars %>%
  ggplot(list(x = mpg, y = wt)) %>%
  ggpoint(list(colour = gear)) %>%
  scbrewer()

ggplot2
library(ggplot2)
mtcars %>%
  ggplot(aes(x = mpg, y = wt)) +
  geom_point(aes(colour = as.factor(gear))) +
  scale_colour_brewer("gear", type = "qual")
aes() maps the columns of a tibble to the variables each ggplot2 geom is expecting.
geom_point(), for example, requires at least the x and y coordinates to draw each point.
In our example we need to tell geom_point() which columns should be used as x and y.
ggplot(iris) +
  geom_point(aes(x = Petal.Width, y = Petal.Length))
Each geom has specific requirements depending on its input.
geom_boxplot() expects one discrete and one continuous variable.
geom_point() accepts additional arguments such as the colour, the transparency (alpha) or the size.
ggplot(iris) +
  geom_point(aes(x = Petal.Width, y = Petal.Length), # end of aes()
             colour = "blue", alpha = 0.6, size = 3)
Note that parameters defined outside the aesthetics aes() are applied to all data.
colour, alpha or size can also be mapped to a column in the data frame.
ggplot(iris) +
  geom_point(aes(x = Petal.Width, y = Petal.Length, colour = Species),
             alpha = 0.6, size = 3)
Note that the colour argument is now inside aes() and must refer to a column in the data frame.
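The difference between setting a parameter and mapping an aesthetic is easy to get wrong. The sketch below (the names `p_set` and `p_map` are made up for illustration) uses `layer_data()` to look at the colours ggplot2 actually ends up drawing in each case.

```r
library(ggplot2)

# Setting: colour is a fixed parameter outside aes()
p_set <- ggplot(iris) +
  geom_point(aes(x = Petal.Width, y = Petal.Length), colour = "blue")

# Mapping a constant by mistake: "blue" inside aes() is treated as data,
# so ggplot2 builds a one-level discrete scale with a legend entry "blue"
p_map <- ggplot(iris) +
  geom_point(aes(x = Petal.Width, y = Petal.Length, colour = "blue"))

unique(layer_data(p_set)$colour)  # the literal colour that was set
unique(layer_data(p_map)$colour)  # a colour picked by the default discrete scale
```

A constant placed inside `aes()` does not raise an error, which is why this mistake tends to go unnoticed until the legend shows up.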
Map both shape and colour to Species:
ggplot(iris) +
  geom_point(aes(x = Petal.Width, y = Petal.Length,
                 shape = Species, colour = Species),
             alpha = 0.6, size = 3)
Aesthetics set in ggplot() are passed to all geoms:
mtcars %>%
  ggplot(aes(x = wt, y = mpg, colour = factor(am))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

Aesthetics set in geom_point() apply to that geom only:
mtcars %>%
  ggplot(aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(am))) +
  geom_smooth(method = "lm", se = FALSE)
labs() function
ggplot(iris) +
  geom_point(aes(x = Petal.Width, y = Petal.Length, colour = Species),
             alpha = 0.6, size = 3) +
  labs(x = "Width",
       y = "Length",
       colour = "flower",
       title = "Iris dataset",
       subtitle = "petal measures",
       tag = "A",
       caption = "Fisher, R. A. (1936)")
ggplot(iris) +
  geom_histogram(aes(x = Petal.Length, fill = Species),
                 alpha = 0.8, bins = 30)
The default number of bins is 30; when it is left unset, ggplot2 prints a message about it.
Specify your own bins or binwidth to avoid the message.
The density rescales the counts so that the total area sums to 1 (for a histogram: the count divided by the total number of occurrences and by the bin width).
ggplot(iris) +
  geom_density(aes(x = Petal.Length, fill = Species),
               alpha = 0.6)

ggplot(iris) +
  geom_histogram(aes(x = Petal.Length, y = stat(density)),
                 fill = "darkgrey", binwidth = 0.1) +
  geom_density(aes(x = Petal.Length, fill = Species, colour = Species),
               alpha = 0.4) +
  theme_classic()
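To see what the density scale of a histogram means in practice, the values computed by the stat can be inspected with `layer_data()`. This is a small sketch using the same bin width as above; the object names `bins` and `bin_width` are only illustrative.

```r
library(ggplot2)

p <- ggplot(iris) +
  geom_histogram(aes(x = Petal.Length), binwidth = 0.1)

# layer_data() returns the data frame after the stat has been applied
bins <- layer_data(p)
head(bins[, c("x", "count", "density")])

# density = count / (total count * bin width), so the bar areas sum to 1
bin_width <- bins$xmax - bins$xmin
sum(bins$density * bin_width)
```

Because the histogram is then on the same scale as a density curve, both layers can share one y axis.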
stat(var) values are intermediate values calculated by ggplot2 using stat functions.
Each geom uses a stat function to transform the data:
geom_histogram() uses stat_bin(), whose computed variables can be used in aes(), e.g. (y = stat(count / max(count))) or stat(density).
stat_identity() is used in geom_col() (no transformation).
geom_bar()
geom_bar() counts the number of values in each category.
geom_bar() uses stat_count() (creates a count column).
ggplot(iris) +
  geom_bar(aes(x = Species))
# or: geom_bar(aes(x = Species, y = stat(count)))
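The count column created by stat_count() is not visible in the input data, but it can be retrieved from the built plot. A minimal sketch (the name `bar_data` is hypothetical):

```r
library(ggplot2)

p <- ggplot(iris) +
  geom_bar(aes(x = Species))

# One row per bar: stat_count() has added the computed "count" column,
# which is used as the bar height (y)
bar_data <- layer_data(p)
bar_data[, c("x", "y", "count")]

# Same numbers as a plain frequency table
table(iris$Species)
```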
geom_col()
geom_col() uses stat_identity(), leaving the data as is.
The y aesthetic is mandatory for geom_col().
geom_bar() with stat = "identity" makes geom_bar() behave like geom_col().
ggplot(iris) +
  geom_col(aes(x = Species, y = Petal.Length))
# ggplot(iris) +
#   geom_bar(aes(x = Species, y = Petal.Length),
#            stat = "identity")
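Since geom_col() leaves the data as is, feeding it the raw iris rows stacks the 50 values of each species, so the bar height is their sum. If one bar per species with a summary value is wanted, the data can be summarised first. The sketch below uses base R `aggregate()` and the mean as one possible choice; `petal_means` is a made-up name.

```r
library(ggplot2)

# One row per species with the mean petal length
petal_means <- aggregate(Petal.Length ~ Species, data = iris, FUN = mean)

# geom_col() now draws exactly the values found in the y column
p <- ggplot(petal_means) +
  geom_col(aes(x = Species, y = Petal.Length)) +
  labs(y = "Mean petal length")
p
```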
Let’s use the mtcars dataset now.
mtcars %>%
  ggplot() +
  geom_bar(aes(x = factor(cyl), fill = factor(gear)))
Do not stack the barcharts but adjust the horizontal position.
mtcars %>%
  mutate(cyl = factor(cyl),
         gear = factor(gear)) %>%
  ggplot() +
  # position_dodge2 from v3.0 preserves single or total
  geom_bar(aes(x = cyl, fill = gear),
           position = position_dodge2(preserve = "single"))
Let’s stack the barcharts but show proportions.
mtcars %>%
  mutate(cyl = factor(cyl),
         gear = factor(gear)) %>%
  ggplot() +
  geom_bar(aes(x = cyl, fill = gear),
           position = "fill")
We can easily switch to polar coordinates:
mtcars %>%
  mutate(cyl = factor(cyl),
         gear = factor(gear)) %>%
  ggplot() +
  geom_bar(aes(x = cyl, fill = gear),
           position = "fill") +
  coord_polar()
ggplot(mtcars) +
  geom_boxplot(aes(x = factor(cyl), y = mpg))

ggplot(mtcars) +
  geom_boxplot(aes(x = factor(cyl), y = mpg, fill = factor(am)))
scale_fill_manual() and scale_color_manual()
ggplot(mtcars) +
  geom_boxplot(aes(x = factor(cyl), y = mpg,
                   fill = factor(am), color = factor(am))) +
  scale_fill_manual(values = c("red", "lightblue")) +
  scale_color_manual(values = c("purple", "blue"))
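Unnamed colour vectors are matched to the factor levels by position. A possible variant, sketched below, names the values so that each level is tied to its colour explicitly; the legend title and labels are additions for illustration (in mtcars, am = 0 is automatic and am = 1 is manual).

```r
library(ggplot2)

p <- ggplot(mtcars) +
  geom_boxplot(aes(x = factor(cyl), y = mpg, fill = factor(am))) +
  scale_fill_manual(
    name = "transmission",
    # names refer to the levels of factor(am)
    values = c("0" = "red", "1" = "lightblue"),
    labels = c("0" = "automatic", "1" = "manual")
  )
p
```

Naming the values keeps the colour assignment stable even if the order of the levels changes.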
library(RColorBrewer)
par(mar = c(0, 4, 0, 0))
display.brewer.all()

ggplot(mtcars) +
  geom_boxplot(aes(x = factor(cyl), y = mpg,
                   fill = factor(am), colour = factor(am))) +
  scale_fill_brewer(palette = "Pastel2") +
  scale_colour_brewer(palette = "Set1")
The default gradient generated by ggplot2 is not very good…
ggplot(mtcars, aes(x = wt, y = mpg, colour = hp)) +
  geom_point(size = 3)
Use the viridis palette instead.
ggplot(mtcars, aes(x = wt, y = mpg, colour = hp)) +
  geom_point(size = 3) +
  scale_colour_viridis_c()
The viridis scales are included in ggplot2 since v3.0.
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 3) +
  scale_colour_viridis_d()
facet_wrap()
Facets can be specified with a formula (~: lhs ~ rhs) or with the vars() function.
ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg)) +
  facet_wrap(~ cyl)
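The vars() form mentioned above gives the same panels as the formula. A minimal sketch of the equivalent call:

```r
library(ggplot2)

# Same facets as facet_wrap(~ cyl), written with vars()
p <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg)) +
  facet_wrap(vars(cyl))
p
```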
You can specify the number of columns
ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg)) +
  facet_wrap(~ cyl, ncol = 2)

ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg)) +
  facet_wrap(~ cyl, scales = "free_x")

ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg)) +
  facet_wrap(~ cyl, scales = "free")
facet_grid() to lay out panels in a grid
Specify the rows on the left and the columns on the right, separated by a tilde ~ (i.e. "by").
ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg)) +
  facet_grid(am ~ cyl)
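The same grid can also be written with the rows and cols arguments and vars(), which some readers find more explicit than the formula. A sketch of that alternative, assuming ggplot2 v3.0 or later:

```r
library(ggplot2)

# Equivalent to facet_grid(am ~ cyl): am defines the rows, cyl the columns
p <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg)) +
  facet_grid(rows = vars(am), cols = vars(cyl))
p
```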
facet_grid() cont.
A dot (.) means no faceting for this axis. This mimics facet_wrap().
The labeller argument allows many customisations of strip titles.
ggplot(mtcars) +
  # apply to all geoms!
  aes(x = wt, y = mpg) +
  geom_point() +
  facet_grid(. ~ cyl, labeller = label_both) +
  theme(strip.text = element_text(face = "bold"))
knitr chunk options: fig.height, fig.width, fig.asp …
ggsave(): the ggplot object is the 2nd argument (here p, a previously stored plot)
ggsave("my_name.png", p, width = 60, height = 30, units = "mm")
ggsave("my_name.pdf", p, width = 50, height = 50, units = "mm")
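The ggsave() calls assume a plot stored in an object. A self-contained sketch follows; the plot itself and the use of a temporary directory are illustrative choices, not part of the original slides.

```r
library(ggplot2)

# Store the plot in an object so it can be passed to ggsave()
p <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg))

# Write to a temporary folder to avoid cluttering the working directory
out_file <- file.path(tempdir(), "my_name.pdf")
ggsave(out_file, plot = p, width = 50, height = 50, units = "mm")

file.exists(out_file)
```

The output format is inferred from the file extension, so changing ".pdf" to ".png" switches the device.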
ggplot2 introduced the possibility for the community to contribute and create extensions.
They are referenced on a dedicated site
"Never trust summary statistics alone; always visualize your data" (Alberto Cairo)
source: Justin Matejka, George Fitzmaurice Same Stats, Different Graphs…
geom_tile(): heatmap
geom_bin2d(): 2D binning
geom_abline(): slope
stat_ellipse()
stat_summary(): easy mean, 95% CI etc.
geom_smooth(): linear / splines / non-linear
ggforce::facet_grid_paginate(): facets
gridExtra::marrangeGrob(): plots
position_jitter(): random shift
quasirandom() is better
coord_cartesian(): for zooming in
coord_flip(): exchanges x & y
scale_x_log10() and y
scale_x_sqrt() and y

Sepal.Length is not exposed
iris_plot <- function(flower) {
  ggplot(iris, aes(x = Species, y = flower)) +
    geom_violin() +
    ggbeeswarm::geom_quasirandom()
}
iris_plot(flower = Sepal.Length)
Error in FUN(X[[i]], ...): object 'Sepal.Length' not found
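iris_plot <- function(flower) {
  # .data[[flower]] looks up the column named by the string;
  # a bare string inside aes() would be treated as a constant
  ggplot(iris, aes(x = Species, y = .data[[flower]])) +
    geom_violin() +
    ggbeeswarm::geom_quasirandom(groupOnX = TRUE)
}
iris_plot(flower = "Sepal.Length")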
curly-curly
iris_plot <- function(flower) {
  ggplot(iris, aes(x = Species, y = {{flower}})) +
    geom_violin() +
    ggbeeswarm::geom_quasirandom(groupOnX = TRUE)
}
iris_plot(flower = Sepal.Length)
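For comparison, the curly-curly version can be spelled out in two explicit steps, capturing the argument and then unquoting it. The sketch below is a longhand illustration only; the function name `iris_plot_long` is made up, and the quasirandom layer is left out so that only ggplot2 and rlang are needed.

```r
library(ggplot2)

iris_plot_long <- function(flower) {
  # step 1: capture the expression together with its environment
  flower <- rlang::enquo(flower)
  # step 2: unquote it inside aes() with !!
  ggplot(iris, aes(x = Species, y = !!flower)) +
    geom_violin()
}

p <- iris_plot_long(flower = Sepal.Length)
p
```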
{{ }} is a shortcut for:
enquo(): creates a quosure, i.e. the name with its environment of origin
!! (bang-bang): evaluates the name in the appropriate context
esquisse
A compilation of some of my gifs created with #rstats #ggplot2 #gganimate #tweenr https://t.co/nCppSOZv4W
Marcus Volz (@mgvolz), 4 April 2017