September 2019
rmarkdown?
Markdown is used to format the text
Markup languages use tags (such as <h1></h1> in HTML):
<!DOCTYPE html>
<html>
<body>
<h1>This is a heading</h1>
<p>This is some text in a paragraph.</p>
</body>
</html>
Markdown uses lighter marks (such as #):
# This is a heading
This is some text in a paragraph
Headers: #, ##, ### …
Bold: **This will be bold**
Italic: *This will be italic*
Links: http://example.com is auto-linked, or use [description](http://example.com)
Inline code: `code` (inline coding stuff)
Verbatim code:
```
This is *verbatim* code
# Even headers are not interpreted
```
Help > Cheatsheets menu.
Before writing your own Rmarkdown document, use the excellent resource on commonmark.org to learn the basics of markdown formatting.
An alternative online resource can be found on www.markdowntutorial.com
from the Rmarkdown cheatsheet
The only two things that make @JennyBryan 😤😠🤯. Instead use projects + here::here() #rstats pic.twitter.com/GwxnHePL4n
— Hadley Wickham (@hadleywickham) December 11, 2017
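Use the here package to build paths
here finds the project root by looking for, among others:
an RStudio project file (.Rproj)
a git repository (.git)
a .here file
source: Jennifer Bryan’s article and test repo
git
data
here::here()
here::dr_here()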
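As a minimal sketch of the two functions named above, the snippet below builds a path relative to the project root and asks here to explain its choice. The variable names `root` and `csv_path` are illustrative, and the root that is reported depends on where the code is run.

```r
# install.packages("here")  # if the package is not installed yet
library(here)

# Project root as detected by here (varies with the working directory)
root <- here::here()

# Build a path to data/mtcars.csv relative to that root;
# the file does not need to exist for the path to be built
csv_path <- here::here("data", "mtcars.csv")
csv_path

# Print a short report explaining why this root was chosen
here::dr_here()
```

Because the path is assembled from its components, the same call should work on any operating system and from any sub-folder of the project.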
Use the knit button in RStudio
Code chunks are delimited by three backticks; ```{r} is the minimum to define a starting R chunk
For inline code, use a backtick (`) followed by the keyword r, then the expression and a closing backtick
Type in 1 + 1 = `r 1+1` to render 1 + 1 = 2.
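The inline example can be checked without RStudio. The sketch below calls knitr, the package rmarkdown relies on to run R code, directly on a one-line text instead of a full document; `src` and `out` are illustrative names, and a complete document would normally go through the knit button or rmarkdown::render().

```r
library(knitr)

# A one-line markdown source containing inline R code
src <- "1 + 1 = `r 1+1`"

# knit() evaluates the inline code and returns the resulting markdown text
out <- knit(text = src, quiet = TRUE)
out
```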
rmarkdown::render()
Citation styles are defined in Citation Style Language (csl) files
Cite a reference with [@citation-key]
---
title: "Sample Document"
output: html_document
bibliography: bibliography.bib
csl: nature.csl
---
Insert your reference [@my-reference] like I did.
readr
Use readr to import your data into R
flat files (.csv, .tsv, …)
Excel files (.xls, .xlsx)
files from other statistical software (.sas from SAS, .sav from SPSS, .dta from Stata)
Base R import functions (read.csv(), read.delim())
Use as_tibble() to convert a data.frame to a tibble
tibble vs data.frame: data.frame
iris
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
11 5.4 3.7 1.5 0.2 setosa
12 4.8 3.4 1.6 0.2 setosa
13 4.8 3.0 1.4 0.1 setosa
14 4.3 3.0 1.1 0.1 setosa
15 5.8 4.0 1.2 0.2 setosa
16 5.7 4.4 1.5 0.4 setosa
17 5.4 3.9 1.3 0.4 setosa
18 5.1 3.5 1.4 0.3 setosa
19 5.7 3.8 1.7 0.3 setosa
20 5.1 3.8 1.5 0.3 setosa
21 5.4 3.4 1.7 0.2 setosa
22 5.1 3.7 1.5 0.4 setosa
23 4.6 3.6 1.0 0.2 setosa
24 5.1 3.3 1.7 0.5 setosa
25 4.8 3.4 1.9 0.2 setosa
26 5.0 3.0 1.6 0.2 setosa
27 5.0 3.4 1.6 0.4 setosa
28 5.2 3.5 1.5 0.2 setosa
29 5.2 3.4 1.4 0.2 setosa
30 4.7 3.2 1.6 0.2 setosa
31 4.8 3.1 1.6 0.2 setosa
32 5.4 3.4 1.5 0.4 setosa
33 5.2 4.1 1.5 0.1 setosa
34 5.5 4.2 1.4 0.2 setosa
35 4.9 3.1 1.5 0.2 setosa
36 5.0 3.2 1.2 0.2 setosa
37 5.5 3.5 1.3 0.2 setosa
38 4.9 3.6 1.4 0.1 setosa
39 4.4 3.0 1.3 0.2 setosa
40 5.1 3.4 1.5 0.2 setosa
41 5.0 3.5 1.3 0.3 setosa
42 4.5 2.3 1.3 0.3 setosa
43 4.4 3.2 1.3 0.2 setosa
44 5.0 3.5 1.6 0.6 setosa
45 5.1 3.8 1.9 0.4 setosa
46 4.8 3.0 1.4 0.3 setosa
47 5.1 3.8 1.6 0.2 setosa
48 4.6 3.2 1.4 0.2 setosa
49 5.3 3.7 1.5 0.2 setosa
50 5.0 3.3 1.4 0.2 setosa
51 7.0 3.2 4.7 1.4 versicolor
52 6.4 3.2 4.5 1.5 versicolor
53 6.9 3.1 4.9 1.5 versicolor
54 5.5 2.3 4.0 1.3 versicolor
55 6.5 2.8 4.6 1.5 versicolor
56 5.7 2.8 4.5 1.3 versicolor
57 6.3 3.3 4.7 1.6 versicolor
58 4.9 2.4 3.3 1.0 versicolor
59 6.6 2.9 4.6 1.3 versicolor
60 5.2 2.7 3.9 1.4 versicolor
61 5.0 2.0 3.5 1.0 versicolor
62 5.9 3.0 4.2 1.5 versicolor
63 6.0 2.2 4.0 1.0 versicolor
64 6.1 2.9 4.7 1.4 versicolor
65 5.6 2.9 3.6 1.3 versicolor
66 6.7 3.1 4.4 1.4 versicolor
67 5.6 3.0 4.5 1.5 versicolor
68 5.8 2.7 4.1 1.0 versicolor
69 6.2 2.2 4.5 1.5 versicolor
70 5.6 2.5 3.9 1.1 versicolor
71 5.9 3.2 4.8 1.8 versicolor
72 6.1 2.8 4.0 1.3 versicolor
73 6.3 2.5 4.9 1.5 versicolor
74 6.1 2.8 4.7 1.2 versicolor
75 6.4 2.9 4.3 1.3 versicolor
76 6.6 3.0 4.4 1.4 versicolor
77 6.8 2.8 4.8 1.4 versicolor
78 6.7 3.0 5.0 1.7 versicolor
79 6.0 2.9 4.5 1.5 versicolor
80 5.7 2.6 3.5 1.0 versicolor
81 5.5 2.4 3.8 1.1 versicolor
82 5.5 2.4 3.7 1.0 versicolor
83 5.8 2.7 3.9 1.2 versicolor
84 6.0 2.7 5.1 1.6 versicolor
85 5.4 3.0 4.5 1.5 versicolor
86 6.0 3.4 4.5 1.6 versicolor
87 6.7 3.1 4.7 1.5 versicolor
88 6.3 2.3 4.4 1.3 versicolor
89 5.6 3.0 4.1 1.3 versicolor
90 5.5 2.5 4.0 1.3 versicolor
91 5.5 2.6 4.4 1.2 versicolor
92 6.1 3.0 4.6 1.4 versicolor
93 5.8 2.6 4.0 1.2 versicolor
94 5.0 2.3 3.3 1.0 versicolor
95 5.6 2.7 4.2 1.3 versicolor
96 5.7 3.0 4.2 1.2 versicolor
97 5.7 2.9 4.2 1.3 versicolor
98 6.2 2.9 4.3 1.3 versicolor
99 5.1 2.5 3.0 1.1 versicolor
100 5.7 2.8 4.1 1.3 versicolor
101 6.3 3.3 6.0 2.5 virginica
102 5.8 2.7 5.1 1.9 virginica
103 7.1 3.0 5.9 2.1 virginica
104 6.3 2.9 5.6 1.8 virginica
105 6.5 3.0 5.8 2.2 virginica
106 7.6 3.0 6.6 2.1 virginica
107 4.9 2.5 4.5 1.7 virginica
108 7.3 2.9 6.3 1.8 virginica
109 6.7 2.5 5.8 1.8 virginica
110 7.2 3.6 6.1 2.5 virginica
111 6.5 3.2 5.1 2.0 virginica
112 6.4 2.7 5.3 1.9 virginica
113 6.8 3.0 5.5 2.1 virginica
114 5.7 2.5 5.0 2.0 virginica
115 5.8 2.8 5.1 2.4 virginica
116 6.4 3.2 5.3 2.3 virginica
117 6.5 3.0 5.5 1.8 virginica
118 7.7 3.8 6.7 2.2 virginica
119 7.7 2.6 6.9 2.3 virginica
120 6.0 2.2 5.0 1.5 virginica
121 6.9 3.2 5.7 2.3 virginica
122 5.6 2.8 4.9 2.0 virginica
123 7.7 2.8 6.7 2.0 virginica
124 6.3 2.7 4.9 1.8 virginica
125 6.7 3.3 5.7 2.1 virginica
126 7.2 3.2 6.0 1.8 virginica
127 6.2 2.8 4.8 1.8 virginica
128 6.1 3.0 4.9 1.8 virginica
129 6.4 2.8 5.6 2.1 virginica
130 7.2 3.0 5.8 1.6 virginica
131 7.4 2.8 6.1 1.9 virginica
132 7.9 3.8 6.4 2.0 virginica
133 6.4 2.8 5.6 2.2 virginica
134 6.3 2.8 5.1 1.5 virginica
135 6.1 2.6 5.6 1.4 virginica
136 7.7 3.0 6.1 2.3 virginica
137 6.3 3.4 5.6 2.4 virginica
138 6.4 3.1 5.5 1.8 virginica
139 6.0 3.0 4.8 1.8 virginica
140 6.9 3.1 5.4 2.1 virginica
141 6.7 3.1 5.6 2.4 virginica
142 6.9 3.1 5.1 2.3 virginica
143 5.8 2.7 5.1 1.9 virginica
144 6.8 3.2 5.9 2.3 virginica
145 6.7 3.3 5.7 2.5 virginica
146 6.7 3.0 5.2 2.3 virginica
147 6.3 2.5 5.0 1.9 virginica
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
tibble vs data.frame: tibble
# library(tibble)
as_tibble(iris)
# A tibble: 150 x 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
# … with 140 more rows
tibble adjusts to width
# A tibble: 150 x 5
Sepal.Length Sepal.Width
<dbl> <dbl>
1 5.1 3.5
2 4.9 3
3 4.7 3.2
4 4.6 3.1
5 5 3.6
6 5.4 3.9
7 4.6 3.4
8 5 3.4
9 4.4 2.9
10 4.9 3.1
# … with 140 more rows, and 3
# more variables:
# Petal.Length <dbl>,
# Petal.Width <dbl>,
# Species <fct>
tibble() works like base::data.frame() but leaves column names and strings unchanged
data.frame(`bad name` = 1:4,
           x = rep(letters[1:2], 2)) %>%
  str()
'data.frame': 4 obs. of 2 variables:
$ bad.name: int 1 2 3 4
$ x : Factor w/ 2 levels "a","b": 1 2 1 2
tibble(`bad name` = 1:4,
       x = rep(letters[1:2], 2)) %>%
  str()
Classes 'tbl_df', 'tbl' and 'data.frame': 4 obs. of 2 variables:
$ bad name: int 1 2 3 4
$ x : chr "a" "b" "a" "b"
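The same comparison can be made programmatically rather than by reading str() output. This is a small sketch using the two objects from the slides; `df` and `tb` are illustrative names.

```r
library(tibble)

# Same inputs, two constructors
df <- data.frame(`bad name` = 1:4, x = rep(letters[1:2], 2))
tb <- tibble(`bad name` = 1:4, x = rep(letters[1:2], 2))

names(df)    # data.frame() repairs the non-syntactic name
names(tb)    # tibble() keeps the name as typed
class(tb$x)  # tibble() keeps strings as character
```

Since R 4.0.0, data.frame() also keeps strings as character by default (stringsAsFactors = FALSE), so the Factor column in the output above reflects older R versions. The name repair still differs.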
Seven file formats are supported by the readr package:
read_csv(): comma separated (CSV) files
read_csv2(): semicolon separated files
read_tsv(): tab separated files
read_delim(): general delimited files
read_fwf(): fixed width files
read_table(): tabular files where columns are separated by white-space
read_log(): web log files
readxl
To import Excel files (.xls and .xlsx):
read_excel()
read_xls()
read_xlsx()
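A short sketch of read_excel() in use, relying on the example workbook that ships with readxl, so no file of your own is needed. The sheet name "mtcars" is assumed to be one of the sheets in that bundled workbook; `xlsx_path`, `sheets` and `mtcars_xl` are illustrative names.

```r
library(readxl)

# Path to an example workbook bundled with the readxl package
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheets available in the workbook
sheets <- excel_sheets(xlsx_path)
sheets

# Read one sheet by name; the result is a tibble
mtcars_xl <- read_excel(xlsx_path, sheet = "mtcars")
mtcars_xl
```

read_excel() guesses the format from the file extension, while read_xls() and read_xlsx() can be called directly when the format is known.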
haven
read_sas() for SAS
read_sav() for SPSS
read_dta() for Stata
Download the mtcars.csv file to your project folder (using your favourite browser)
"mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb"
21,6,160,110,3.9,2.62,16.46,0,1,4,4
21,6,160,110,3.9,2.875,17.02,0,1,4,4
22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
...
readr
mtcars.csv dataset
Use the Import Dataset button to import the mtcars.csv file.
read_csv()
Compressed files (.zip, .gz, …) can be read directly
read_csv()
read_csv(here::here("data", "mtcars.csv"))
Parsed with column specification:
cols(
  mpg = col_double(),
  cyl = col_double(),
  disp = col_double(),
  hp = col_double(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_double(),
  am = col_double(),
  gear = col_double(),
  carb = col_double()
)
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
# … with 22 more rows
Column types are guessed from the first 1000 rows by default (see the guess_max option)
To silence the column specification message:
set message = FALSE in your rmarkdown chunk option
use col_types = cols()
or, better, specify col_types to avoid any problem
Parsed with column specification:
cols(
  mpg = col_double(),
  cyl = col_double(),
  disp = col_double(),
  hp = col_double(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_double(),
  am = col_double(),
  gear = col_double(),
  carb = col_double()
)
col_types argument
exa <- here::here("data", "example.csv")
read_csv(exa, col_types = cols())
# A tibble: 3 x 3
animal colour value
<chr> <chr> <dbl>
1 dog red 1
2 cat blue 2
3 chicken green 6
The columns are animal, colour and value
Types are set with the cols() function
Available types: double, integer, character, logical, factor, date, datetime or time
Using a function defining each type:
col_double()
col_integer()
col_character()
col_logical()
col_factor()
col_date()
col_datetime()
col_time()
Or tell readr to guess or skip a column:
col_guess()
col_skip()
read_csv(exa,
col_types = cols(
animal = col_character(),
colour = col_character(),
value = col_integer()
))
# A tibble: 3 x 3
animal colour value
<chr> <chr> <int>
1 dog red 1
2 cat blue 2
3 chicken green 6
Using a single character to define each type:
c = character
i = integer
n = number
d = double
l = logical
D = date
T = date time
t = time
Or tell readr to guess or skip a column:
? = guess
_ or - = skip
read_csv(exa,
col_types = cols(
animal = "c",
colour = "c",
value = "i"
))
# A tibble: 3 x 3
animal colour value
<chr> <chr> <int>
1 dog red 1
2 cat blue 2
3 chicken green 6
read_csv(exa, col_types = "cci")
Import the example.csv file but
skip the colour column
read the value column as double
read_csv(exa, col_types = cols(animal = col_character(),
colour = col_skip(),
value = col_double()))
# A tibble: 3 x 2
animal value
<chr> <dbl>
1 dog 1
2 cat 2
3 chicken 6
read_csv(exa, col_types = cols(animal = "c",
colour = "_",
value = "d"))
# A tibble: 3 x 2
animal value
<chr> <dbl>
1 dog 1
2 cat 2
3 chicken 6
read_csv(exa, col_types = "c_d")
# A tibble: 3 x 2
animal value
<chr> <dbl>
1 dog 1
2 cat 2
3 chicken 6
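To check that the long and compact specifications are interchangeable, the sketch below recreates a stand-in for example.csv in a temporary file and compares the two results. The file content is an assumption reconstructed from the printed output above; `csv_path`, `full` and `compact` are illustrative names.

```r
library(readr)

# Stand-in for example.csv, rebuilt from the output shown above (assumption)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("animal,colour,value",
             "dog,red,1",
             "cat,blue,2",
             "chicken,green,6"),
           csv_path)

# Long form: one col_*() function per column
full <- read_csv(csv_path,
                 col_types = cols(animal = col_character(),
                                  colour = col_skip(),
                                  value = col_double()))

# Compact form: one character per column, in file order
compact <- read_csv(csv_path, col_types = "c_d")

# Both calls should give the same columns and values
all.equal(as.data.frame(full), as.data.frame(compact))
```

The compact string is positional, so it must have exactly one character per column in the file, whereas the cols() form matches columns by name.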
col_names argument
Can be TRUE, FALSE or a character vector.
The default is TRUE
If TRUE, the first row will be used as column names
If FALSE, names are generated (X1, X2, X3, …)
read_csv(exa,
col_names = TRUE)
# A tibble: 3 x 3
animal colour value
<chr> <chr> <dbl>
1 dog red 1
2 cat blue 2
3 chicken green 6
read_csv(exa,
col_names = FALSE)
# A tibble: 4 x 3
X1 X2 X3
<chr> <chr> <chr>
1 animal colour value
2 dog red 1
3 cat blue 2
4 chicken green 6
read_csv(exa,
col_names = c("name", "colname", "number"))
# A tibble: 4 x 3
name colname number
<chr> <chr> <chr>
1 animal colour value
2 dog red 1
3 cat blue 2
4 chicken green 6
col_names is handy if there are no column names in the file
Otherwise, use dplyr::rename() (see upcoming dplyr lecture).
read_csv(exa, col_names = c("name", "colname", "number"))
# A tibble: 4 x 3
name colname number
<chr> <chr> <chr>
1 animal colour value
2 dog red 1
3 cat blue 2
4 chicken green 6
read_csv(exa, col_names = TRUE) %>%
rename(name = animal,
colname = colour,
number = value)
# A tibble: 3 x 3
name colname number
<chr> <chr> <dbl>
1 dog red 1
2 cat blue 2
3 chicken green 6
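When the file already has a header, passing a character vector to col_names turns that header into a data row, as in the 4-row outputs above. One possible alternative to rename(), sketched here, is to combine col_names with skip = 1 so the original header line is dropped. The temporary file is a stand-in for example.csv rebuilt from the printed output (assumption), and `renamed` is an illustrative name.

```r
library(readr)

# Stand-in for example.csv, rebuilt from the output shown above (assumption)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("animal,colour,value",
             "dog,red,1",
             "cat,blue,2",
             "chicken,green,6"),
           csv_path)

# Skip the original header line and supply our own column names
renamed <- read_csv(csv_path,
                    skip = 1,
                    col_names = c("name", "colname", "number"),
                    col_types = "ccd")
renamed
```

With the header skipped, the number column can be parsed as double instead of falling back to character.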
skip argument
To skip the first n rows
n_max argument
To stop reading after n rows
You might want to adjust col_names to get what you want
readr_example("mtcars.csv") %>%
read_csv(skip = 3,
n_max = 3,
col_names = FALSE)
# A tibble: 3 x 11
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
2 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
3 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
readr_example("mtcars.csv") %>%
read_csv(skip = 3, n_max = 3,
col_names = c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"))
# A tibble: 3 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
2 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
3 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
readr functions
read_csv()
read_csv2()
read_tsv()
read_delim()
read_delim(file, delim = "|", ...)
read_fwf()
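As a brief illustration of the read_delim() call listed above, the sketch below writes a small pipe-delimited file to a temporary location and reads it back. The file content and the names `psv_path` and `piped` are made up for the example.

```r
library(readr)

# A tiny pipe-delimited file written to a temporary location (illustrative data)
psv_path <- tempfile(fileext = ".txt")
writeLines(c("animal|colour|value",
             "dog|red|1",
             "cat|blue|2"),
           psv_path)

# read_delim() takes the delimiter explicitly through the delim argument
piped <- read_delim(psv_path, delim = "|", col_types = "ccd")
piped
```

read_csv(), read_csv2() and read_tsv() behave like read_delim() with the delimiter preset to a comma, a semicolon and a tab, respectively.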
fread from data.table
install.packages("data.table")
readr
vroom
install.packages("vroom")
readr)
ALTREP framework
readr in some conditions
Use readr to import your flat file data into R
vignette("readr")