October 2019
ggplot2
data.frame/tibble
ggplot2 when you run into problems.
ggplot2
source: thinkR
| x | y | shape |
|---|---|---|
| 25 | 11 | circle |
| 0 | 0 | circle |
| 75 | 53 | square |
| 200 | 300 | square |
x = x, y = y, shape = shape
dot / point
What if we want to split circles and squares?
Now, dot shapes and facets provide the same information.
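Here is a minimal sketch of the toy example, built from the four rows of the table above. The object names `toy`, `p_toy` and `p_facet` are ours. The labels "circle" and "square" are only category names, and the default shape palette would not draw a square for "square". The sketch therefore pins the shapes with `scale_shape_manual()`, where 16 is a filled circle and 15 a filled square.

```r
library(ggplot2)

# The four observations of the toy table
toy <- data.frame(
  x = c(25, 0, 75, 200),
  y = c(11, 0, 53, 300),
  shape = c("circle", "circle", "square", "square")
)

# x -> x, y -> y, shape -> shape; geom_point() draws one dot per row
p_toy <- ggplot(toy, aes(x = x, y = y, shape = shape)) +
  geom_point(size = 3) +
  scale_shape_manual(values = c(circle = 16, square = 15))

# Splitting circles and squares into panels: shape now repeats the facet
p_facet <- p_toy + facet_wrap(~ shape)

p_facet
```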
We could use the shape for another meaningful variable…
"Data visualisation is not meant just to be seen but to be read, like written text." (Alberto Cairo)
geom_point()
geom_line()
geom_bar()
geom_boxplot()
geom_histogram()
geom_density()
Have a look at the cheatsheet or the ggplot2 online documentation to list more possibilities.
tibble
library(tibble)
iris <- as_tibble(iris)
iris
# A tibble: 150 x 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
# … with 140 more rows
Saving the data frame as a tibble enables the compact tibble printing and avoids listing all 150 rows.
ggplot(data = iris) + geom_point(mapping = aes(x = Petal.Width, y = Petal.Length))

ggplot2 introduces a break in the workflow from %>% to +
ggplot1
ggplot1, by Hadley Wickham, was released in 2005 and developed until 2008.
"If the pipe (%>%, in 2014) had been invented before, ggplot2 would have never existed." (Hadley Wickham)
ggplot1: original syntax
# devtools::install_github("hadley/ggplot1")
library(ggplot1)
p <- ggplot(mtcars, list(x = mpg, y = wt))
# need temp p object to avoid too many ()'s
scbrewer(ggpoint(p, list(colour = gear)))
ggplot1 with the pipe
library(ggplot1)
mtcars %>%
  ggplot(list(x = mpg, y = wt)) %>%
  ggpoint(list(colour = gear)) %>%
  scbrewer()
ggplot2
library(ggplot2)
mtcars %>%
ggplot(aes(x = mpg, y = wt)) +
geom_point(aes(colour = as.factor(gear))) +
scale_colour_brewer("gear", type = "qual")
Aesthetics (aes()) map the columns of the tibble to the variables each ggplot2 geom is expecting.
geom_point(), for example, requires at least the x and y coordinates to draw each point.
In our example we need to tell geom_point() which columns should be used as x and y:
ggplot(iris) + geom_point(aes(x = Petal.Width, y = Petal.Length))
Each geom has specific requirements depending on its input:
geom_boxplot() expects one discrete and one continuous variable.
geom_point() accepts additional arguments such as the colour, the transparency (alpha) or the size:
ggplot(iris) +
geom_point(aes(x = Petal.Width, y = Petal.Length), # end of aes()
colour = "blue", alpha = 0.6, size = 3)
Note that parameters set outside aes() apply to all data points.
colour, alpha or size can also be mapped to a column in the data frame:
ggplot(iris) +
geom_point(aes(x = Petal.Width, y = Petal.Length,
colour = Species), alpha = 0.6, size = 3)
Note that the colour argument is now inside aes() and must refer to a column in the dataframe.
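To see why the placement matters, this small sketch contrasts a constant colour set outside aes() with the same constant placed inside aes(), which is a frequent slip. The plot names are ours. `layer_data()` returns the data ggplot2 actually draws, so it can be used to inspect the resulting colours.

```r
library(ggplot2)

# Constant set outside aes(): every point is drawn blue
p_outside <- ggplot(iris) +
  geom_point(aes(x = Petal.Width, y = Petal.Length), colour = "blue")

# Constant inside aes(): "blue" is treated as data, i.e. a one-level
# category that the default colour scale paints with its own hue
p_inside <- ggplot(iris) +
  geom_point(aes(x = Petal.Width, y = Petal.Length, colour = "blue"))

unique(layer_data(p_outside)$colour)  # "blue"
unique(layer_data(p_inside)$colour)   # one default hue, not "blue"
```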
Mapping both shape and colour to Species
ggplot(iris) +
geom_point(aes(x = Petal.Width, y = Petal.Length, shape = Species, colour = Species),
alpha = 0.6, size = 3)
ggplot() passes its aesthetics to all geoms:
mtcars %>%
ggplot(aes(x = wt, y = mpg,
colour = factor(am))) +
geom_point() +
geom_smooth(method = "lm", se = FALSE)

Aesthetics set in geom_point() apply to that geom only:
mtcars %>%
  ggplot(aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(am))) +
  geom_smooth(method = "lm", se = FALSE)
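This sketch shows the difference between the two previous plots by counting how many regression lines geom_smooth() fits, again via `layer_data()`. Layer 2 is the smoother, and the object names are ours. When the colour is mapped globally, the smoother inherits the grouping and fits one line per `am` value. When it is mapped locally, the smoother fits a single line.

```r
library(ggplot2)

# colour mapped in ggplot(): inherited by geom_point() AND geom_smooth()
p_global <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(am))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# colour mapped in geom_point() only: geom_smooth() sees no grouping
p_local <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(am))) +
  geom_smooth(method = "lm", se = FALSE)

# Number of distinct groups in the smoother layer = number of fitted lines
length(unique(layer_data(p_global, 2)$group))  # 2
length(unique(layer_data(p_local, 2)$group))   # 1
```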

labs() function
ggplot(iris) +
geom_point(aes(x = Petal.Width, y = Petal.Length, colour = Species),
alpha = 0.6, size = 3) +
labs(x = "Width", y = "Length",
colour = "flower",
title = "Iris dataset", subtitle = "petal measures",
tag = "A", caption = "Fisher, R. A. (1936)")
ggplot(iris) +
geom_histogram(aes(x = Petal.Length, fill = Species),
alpha = 0.8, bins = 30)
The default number of bins is 30; if you do not set it, ggplot2 prints a message saying so.
Specify your own bins or binwidth to avoid the message.
Roughly, the density is the count divided by the total number of occurrences (and by the bin width), so that the total area equals 1.
ggplot(iris) +
geom_density(aes(x = Petal.Length, fill = Species),
alpha = 0.6)
ggplot(iris) +
  geom_histogram(aes(x = Petal.Length, y = stat(density)),
                 fill = "darkgrey", binwidth = 0.1) +
  geom_density(aes(x = Petal.Length, fill = Species, colour = Species),
               alpha = 0.4) +
  theme_classic()
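Here is a quick sketch that checks the density definition. `layer_data()` exposes the `count` and `density` columns computed by stat_bin() for a histogram, and the object names are ours. The counts should add up to the 150 flowers. Multiplying each density by its bin width and summing should give 1.

```r
library(ggplot2)

# Histogram of petal length; pull out what stat_bin() computed
p <- ggplot(iris) + geom_histogram(aes(x = Petal.Length), binwidth = 0.1)
bins <- layer_data(p)

# count sums to the number of observations
sum(bins$count)  # 150

# density = count / total / bin width, so the bar areas add up to 1
sum(bins$density * (bins$xmax - bins$xmin))  # 1 (up to rounding)
```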

stat(var) are intermediate values calculated by ggplot2 using stat functions.
Each geom uses a stat function to transform the data:
geom_histogram() uses stat_bin(); its computed variables can be mapped, e.g. (y = stat(count / max(count))) or stat(density).
stat_identity() is used in geom_col() (no transformation).
geom_bar()
geom_bar() counts the number of values in each category.
geom_bar() uses stat_count(), which creates a count column.
ggplot(iris) +
  geom_bar(aes(x = Species)) # or: geom_bar(aes(x = Species, y = stat(count)))

geom_col()
geom_col() uses stat_identity(), leaving the data as is.
The y aesthetic is mandatory for geom_col().
geom_bar() with stat = "identity" makes geom_bar() behave like geom_col().
ggplot(iris) +
geom_col(aes(x = Species,
y = Petal.Length))
#ggplot(iris) +
# geom_bar(aes(x = Species, y = Petal.Length),
# stat = "identity")
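geom_bar() on the raw rows and geom_col() on pre-counted data should draw bars of the same height. This sketch uses mtcars and does the pre-counting with base R `table()`. The object names are ours.

```r
library(ggplot2)

# geom_bar(): stat_count() counts the rows for each cylinder value
p_bar <- ggplot(mtcars) + geom_bar(aes(x = factor(cyl)))

# geom_col(): stat_identity(), so the counts must be supplied as y
cyl_counts <- as.data.frame(table(cyl = mtcars$cyl))  # columns: cyl, Freq
p_col <- ggplot(cyl_counts) + geom_col(aes(x = cyl, y = Freq))

layer_data(p_bar)$count  # 11  7 14
layer_data(p_col)$y      # 11  7 14
```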
Let’s use the mtcars dataset now.
mtcars %>%
ggplot() +
geom_bar(aes(x = factor(cyl),
fill = factor(gear)))
Instead of stacking the bars, dodge them side by side by adjusting their horizontal position.
mtcars %>%
  mutate(cyl = factor(cyl), gear = factor(gear)) %>%
  ggplot() +
  # position_dodge2() (since v3.0) can preserve "single" or "total" widths
  geom_bar(aes(x = cyl, fill = gear),
           position = position_dodge2(preserve = "single"))

Let’s stack the bars but show proportions.
mtcars %>%
  mutate(cyl = factor(cyl), gear = factor(gear)) %>%
  ggplot() +
  geom_bar(aes(x = cyl, fill = gear), position = "fill")
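With position = "fill", each stack is rescaled to reach 1. The segment heights are then the proportions of each gear within each cyl, which base R `prop.table()` computes directly. The sketch below applies factor() inline instead of mutate() so it needs only ggplot2, and the object names are ours.

```r
library(ggplot2)

p_fill <- ggplot(mtcars) +
  geom_bar(aes(x = factor(cyl), fill = factor(gear)), position = "fill")

d <- layer_data(p_fill)
# Top of each stack, one value per cyl: 1 for every bar
tapply(d$ymax, as.numeric(d$x), max)

# The same proportions computed directly: gear shares within each cyl (rows)
prop.table(table(cyl = mtcars$cyl, gear = mtcars$gear), margin = 1)
```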

We can easily switch to polar coordinates:
mtcars %>%
  mutate(cyl = factor(cyl), gear = factor(gear)) %>%
  ggplot() +
  geom_bar(aes(x = cyl, fill = gear), position = "fill") +
  coord_polar()

ggplot(mtcars) + geom_boxplot(aes(x = factor(cyl), y = mpg))

ggplot(mtcars) +
geom_boxplot(aes(x = factor(cyl),
y = mpg,
fill = factor(am)))
scale_fill_manual() and scale_color_manual()
ggplot(mtcars) +
geom_boxplot(aes(x = factor(cyl), y = mpg, fill = factor(am), color = factor(am))) +
scale_fill_manual(values = c("red", "lightblue")) +
scale_color_manual(values = c("purple", "blue"))
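Unnamed `values` are matched to the factor levels in order (here 0, then 1). A more explicit alternative is to name the vector by level. The sketch shows this on a scatter plot so that the result is easy to check; the plot names are ours.

```r
library(ggplot2)

# Naming the values pins each colour to a level instead of relying on order
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(am))) +
  geom_point(size = 3) +
  scale_colour_manual(values = c("0" = "purple", "1" = "blue"))

# For geom_point(), layer_data() keeps the row order of mtcars
d <- layer_data(p)
unique(d$colour[mtcars$am == 0])  # "purple"
unique(d$colour[mtcars$am == 1])  # "blue"
```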
library(RColorBrewer)
par(mar = c(0, 4, 0, 0))
display.brewer.all()

ggplot(mtcars) +
geom_boxplot(aes(x = factor(cyl), y = mpg,
fill = factor(am), colour = factor(am))) +
scale_fill_brewer(palette = "Pastel2") +
scale_colour_brewer(palette = "Set1")
The default gradient generated by ggplot2 is not very good…
ggplot(mtcars, aes(x = wt, y = mpg, colour = hp)) + geom_point(size = 3)

Use the viridis palette instead:
ggplot(mtcars, aes(x = wt, y = mpg, colour = hp)) +
  geom_point(size = 3) +
  scale_colour_viridis_c()

viridis scales are built into ggplot2 since v3.0; scale_colour_viridis_d() is the discrete version:
ggplot(mtcars,
aes(x = wt, y = mpg,
colour = factor(cyl))) +
geom_point(size = 3) +
scale_colour_viridis_d()

facet_wrap()
Facets are defined with a formula (tilde ~: lhs ~ rhs) or with the vars() function.
ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg)) +
  facet_wrap(~ cyl)
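The tilde form and vars() should produce the same panels. This sketch inspects the panel layout that ggplot2 builds, via `ggplot_build()`. Note that `$layout$layout` is an internal field that may change between versions, so treat it as a debugging aid. The object names are ours.

```r
library(ggplot2)

p_formula <- ggplot(mtcars) + geom_point(aes(x = wt, y = mpg)) +
  facet_wrap(~ cyl)
p_vars <- ggplot(mtcars) + geom_point(aes(x = wt, y = mpg)) +
  facet_wrap(vars(cyl))

# Both give one panel per cylinder value
ggplot_build(p_formula)$layout$layout$cyl  # 4 6 8
ggplot_build(p_vars)$layout$layout$cyl     # 4 6 8
```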

You can specify the number of columns
ggplot(mtcars) + geom_point(aes(x = wt, y = mpg)) + facet_wrap(~ cyl, ncol = 2)

ggplot(mtcars) + geom_point(aes(x = wt, y = mpg)) + facet_wrap(~ cyl, scales = "free_x")

ggplot(mtcars) + geom_point(aes(x = wt, y = mpg)) + facet_wrap(~ cyl, scales = "free")

facet_grid() lays out panels in a grid: rows on the left and columns on the right of the tilde ~ (read it as "by").
ggplot(mtcars) + geom_point(aes(x = wt, y = mpg)) + facet_grid(am ~ cyl)

facet_grid() cont.
A dot (.) means no faceting for this axis, which mimics facet_wrap().
The labeller argument allows many customisations of strip titles.
ggplot(mtcars) +
# apply to all geoms!
aes(x = wt, y = mpg) +
geom_point() +
facet_grid(. ~ cyl,
labeller = label_both) +
theme(strip.text = element_text(face = "bold"))
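facet_grid() also accepts `rows` and `cols` arguments built with vars() (ggplot2 3.0 and later). This sketch checks that `am ~ cyl` and the vars() form produce the same 2 x 3 grid by reading the panel layout from `ggplot_build()`. That layout field is internal and may change between versions, and the object names are ours.

```r
library(ggplot2)

p_formula <- ggplot(mtcars) + geom_point(aes(x = wt, y = mpg)) +
  facet_grid(am ~ cyl)
p_vars <- ggplot(mtcars) + geom_point(aes(x = wt, y = mpg)) +
  facet_grid(rows = vars(am), cols = vars(cyl))

lay_formula <- ggplot_build(p_formula)$layout$layout
lay_vars <- ggplot_build(p_vars)$layout$layout

nrow(lay_vars)  # 6 panels
# 2 rows (am = 0, 1) and 3 columns (cyl = 4, 6, 8)
c(max(as.integer(lay_vars$ROW)), max(as.integer(lay_vars$COL)))
```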
Figure size in R Markdown chunk options: fig.height, fig.width, fig.asp…
Export with ggsave(); the ggplot object is the 2nd argument:
ggsave("my_name.png", p, width = 60, height = 30, units = "mm")
ggsave("my_name.pdf", p, width = 50, height = 50, units = "mm")
ggplot2 introduced the possibility for the community to contribute and create extensions.
They are referenced on a dedicated site
"Never trust summary statistics alone; always visualize your data." (Alberto Cairo)
source: Justin Matejka, George Fitzmaurice, Same Stats, Different Graphs…
geom_tile() heatmap
geom_bin2d() 2D binning
geom_abline() slope
stat_ellipse()
stat_summary() easy mean, 95% CI, etc.
geom_smooth() linear/splines/non-linear
ggforce::facet_grid_paginate() facets
gridExtra::marrangeGrob() plots
position_jitter() random shift
ggbeeswarm::geom_quasirandom() is better
coord_cartesian() for zooming in
coord_flip() exchanges x & y
scale_x_log10() and y
scale_x_sqrt() and y
Sepal.Length is not exposed
iris_plot <- function(flower) {
ggplot(iris, aes(x = Species, y = flower)) +
geom_violin() +
ggbeeswarm::geom_quasirandom()
}
iris_plot(flower = Sepal.Length)
Error in FUN(X[[i]], ...): object 'Sepal.Length' not found

iris_plot <- function(flower) {
ggplot(iris, aes(x = Species, y = flower)) +
geom_violin() +
ggbeeswarm::geom_quasirandom(groupOnX = TRUE)
}
iris_plot(flower = "Sepal.Length")
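When the column name arrives as a string, one option is the `.data` pronoun, which looks the column up in the plotted data. The sketch uses a hypothetical `iris_plot_chr()` and swaps the violin/beeswarm layers for a jittered geom_point(), so it needs nothing beyond ggplot2. For contrast, it also shows that a bare string inside aes() is treated as a single constant value rather than as a column.

```r
library(ggplot2)

# Hypothetical variant of iris_plot() taking the column name as a string
iris_plot_chr <- function(flower) {
  ggplot(iris, aes(x = Species, y = .data[[flower]])) +
    # height = 0 keeps the y values untouched; only x is jittered
    geom_point(position = position_jitter(width = 0.2, height = 0))
}

p <- iris_plot_chr("Sepal.Length")

# A bare string in aes() maps one constant category, not the column
p_const <- ggplot(iris, aes(x = Species, y = "Sepal.Length")) + geom_point()
```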

curly-curly
iris_plot <- function(flower) {
ggplot(iris, aes(x = Species, y = {{flower}})) +
geom_violin() +
ggbeeswarm::geom_quasirandom(groupOnX = TRUE)
}
iris_plot(flower = Sepal.Length)
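Before relying on the shortcut, it can help to see it spelled out. This sketch writes the same plot with rlang's enquo() and !!, next to the {{ }} form. The function names are hypothetical, and geom_point() replaces the violin/beeswarm layers to keep the dependencies light. Both versions should map y to the actual Sepal.Length values.

```r
library(ggplot2)
library(rlang)

# Explicit form: enquo() captures the argument as a quosure
# (expression + environment), !! injects it into aes()
iris_plot_enquo <- function(flower) {
  flower <- enquo(flower)
  ggplot(iris, aes(x = Species, y = !!flower)) +
    geom_point()
}

# Shortcut form: {{ flower }} does the capture and injection in one step
iris_plot_curly <- function(flower) {
  ggplot(iris, aes(x = Species, y = {{ flower }})) +
    geom_point()
}

p1 <- iris_plot_enquo(Sepal.Length)
p2 <- iris_plot_curly(Sepal.Length)
```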

{{ }} is a shortcut for !!enquo(): enquo() creates a quosure, i.e. a name with its environment of origin, and !! (bang-bang) evaluates that name in the appropriate context.
esquisse
"A compilation of some of my gifs created with #rstats #ggplot2 #gganimate #tweenr https://t.co/nCppSOZv4W" (Marcus Volz, @mgvolz, 4 April 2017)