RStudio is a surprisingly flexible interface that lets you create documents to help communicate your findings. I have adopted the approach of creating a GitHub repository and an active report file in almost all of the projects I have recently been involved in. This allows all collaborators on the project to stay up to date on the analysis, workflow, and data work. It also provides a consolidated area where all the files associated with a project can be maintained. The report document plays a supporting role to the data, similar to a metadata file that explains the information present in the repository. Ideally, as you get better at generating these files, you can keep them dynamic and constantly updating.
For example, let's say you have an experiment that is surveyed many times throughout its duration and has a particular analysis you expect to run. As the project continues, certain individuals may expect updates that require you to conduct the analysis and create figures. With a dynamic report, this can be as easy as dropping the newly surveyed dataset into the repository and clicking knit to revise the report.
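One way to make a report pick up new data automatically is to have it read every file in a data folder rather than a single named file. The sketch below is only an illustration: the function name read_all_surveys, the folder layout, and the column names are all made up, and it assumes every survey file has the same columns.

```r
# Read every CSV in a folder and stack the rows into one data frame.
# Assumes all files share the same columns (an assumption of this sketch).
read_all_surveys <- function(data_dir) {
  files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
  do.call(rbind, lapply(files, read.csv))
}

# Demonstration with two made-up survey files in a fresh temporary folder.
data_dir <- tempfile("surveys")
dir.create(data_dir)
write.csv(data.frame(plot = 1:3, count = c(5, 2, 8)),
          file.path(data_dir, "survey_01.csv"), row.names = FALSE)
write.csv(data.frame(plot = 1:3, count = c(6, 4, 7)),
          file.path(data_dir, "survey_02.csv"), row.names = FALSE)

surveys <- read_all_surveys(data_dir)
nrow(surveys)
```

With a chunk like this at the top of the report, adding a third file to the folder and knitting again would include the new rows without editing any code.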
Markdown is a lightweight markup language that is meant to be a simplified way of writing HTML. It was created by John Gruber in collaboration with Aaron Swartz, who is quite a controversial but remarkable individual. The relevance here is that those who have used HTML may have noticed the level of redundancy in applying text modifiers.
Here are some examples of the differences:
HTML | Markdown | Output |
---|---|---|
<b>text</b> | **text** | text (in bold) |
<h4>heading</h4> | #### heading | heading (as a level-4 heading) |
<a href="http://www.google.com">Google</a> | [Google](http://www.google.com) | Google (as a link) |
This simplified syntax makes it more accessible to users. A cheatsheet of all the codes can be found within RStudio under Help or here.
R Markdown is an amplified version of Markdown. It contains all the same syntax used to simplify HTML, but also allows for a few other things, such as providing themes and color palettes. The most important feature, though, is the inclusion of R code within the documents. This is what allows you to generate a report with analyses, figures, and datasets all embedded within your document.
First, let’s experiment with writing an R Markdown document. Go to File -> New File -> R Markdown Document. Title it something simple without spaces. It will open with a default file that shows some of the basics of R Markdown. At this point, the file exists only as plain R Markdown source and has no formatting applied. To see it the way we would like, we need to knit the file, using the button marked with a ball of yarn on the top panel. When you click knit, it will likely ask you to save the file, and you can use the name you gave it earlier. The output has your name, the date it was created, headers where specified, normal text, and also outputs from R, such as figures. This is a very simple document, but it contains all the elements you would normally use when generating a report.
The chunks in R Markdown are how you add R code and outputs into your report. You can choose what to display in the report as well. Maybe you want to show all the code and output, but sometimes you want a simpler version that just shows the outputs. This can be toggled by what is specified in the chunk header.
A chunk can either be typed out manually or started by clicking the insert button. R Markdown also supports other languages such as Python or SQL. The syntax is three grave accents (backticks) followed by {r}. The R code then goes inside the chunk and runs just as it would in your normal R console. Three grave accents close the chunk.
x <- 2 + 2
x
## [1] 4
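As a rough illustration of the chunk syntax, the sketch below builds a tiny R Markdown source as a character vector and knits it in memory with knitr::knit(text = ...). The fence variable and the sample sentence exist only for this example; in a real .Rmd file you would type the three grave accents directly.

```r
library(knitr)

# Three grave accents, built here so they do not clash with this code block.
fence <- strrep("`", 3)

# A minimal R Markdown source: one line of text and one R chunk.
rmd <- c(
  "Some text before the chunk.",
  "",
  paste0(fence, "{r}"),
  "x <- 2 + 2",
  "x",
  fence
)

# With text = ..., knit() returns the knitted Markdown as a string
# instead of writing a file.
out <- knit(text = rmd, quiet = TRUE)
cat(out)
```

The knitted text contains both the echoed code and its printed result, which is the default behaviour described here.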
Here the default is to display both the R code and the R output. We can show the output only by adding echo=FALSE to the chunk header:
## [1] 4
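To see what echo=FALSE does to the knitted result, a small in-memory sketch can help. It assumes the knitr package is installed and uses knitr::knit(text = ...) on a made-up one-chunk document; the fence variable only stands in for the three grave accents you would type in a real file.

```r
library(knitr)

# Three grave accents, built here so they do not clash with this code block.
fence <- strrep("`", 3)

# The same chunk as before, but with echo=FALSE in the chunk header.
rmd_hidden <- c(
  paste0(fence, "{r echo=FALSE}"),
  "x <- 2 + 2",
  "x",
  fence
)

out_hidden <- knit(text = rmd_hidden, quiet = TRUE)
cat(out_hidden)
```

The code still runs, so the result is printed, but the source lines are left out of the report.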
Let’s try to plot with the R chunk using the default cars dataset.
plot(cars$speed,cars$dist)
You could build up your report like so:
We tested the performance of 50 new cars travelling at different speeds to see the distance they could travel (Figure 1). The vehicles were driven on the same track over the course of 10 days. Speed was measured in miles per hour and cars were randomly chosen to drive at speeds between 0 and 25 miles per hour. Distance was measured in miles and was based on the distance travelled in 4 hours.
## Plot the cars speed by distance
plot(cars$speed,cars$dist, xlab="Speed (mph)", ylab="Distance (miles)", pch=19, cex=1.3, cex.axis=1.3, cex.lab=1.5)
## Run a linear model comparing speed and distance
m1 <- lm(dist~speed, data=cars)
summary(m1) ## output of linear model
##
## Call:
## lm(formula = dist ~ speed, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.069 -9.525 -2.272 9.215 43.201
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791 6.7584 -2.601 0.0123 *
## speed 3.9324 0.4155 9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
## add linear model to plot
abline(m1, col="Red", lwd=2)
To compare the distance traveled based on speed, we fitted a linear model with distance as the response variable and speed as the predictor variable. We found that speed significantly predicted distance traveled (F = 89.6, p < 0.001, adjusted R2 = 0.64).
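Rather than typing the statistics by hand, one option is to pull them from the fitted model so the sentence updates whenever the data change. The sketch below refits the same model and assembles the reported values; the variable names and the wording of the sentence are only illustrative. In an R Markdown document, a value like this can be placed in running text with inline code, for example `r stats_text`.

```r
# Refit the model from the report and extract its summary statistics.
m1 <- lm(dist ~ speed, data = cars)
s <- summary(m1)

f_stat  <- s$fstatistic                 # named vector: value, numdf, dendf
f_value <- unname(f_stat["value"])
p_value <- unname(pf(f_stat["value"], f_stat["numdf"], f_stat["dendf"],
                     lower.tail = FALSE))
adj_r2  <- s$adj.r.squared

# Report very small p-values as "< 0.001" instead of printing them in full.
p_text <- if (p_value < 0.001) "< 0.001" else sprintf("= %.3f", p_value)

stats_text <- sprintf("F = %.1f, p %s, adjusted R2 = %.2f",
                      f_value, p_text, adj_r2)
stats_text
```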
The full dataset is available below:
knitr::kable(cars)
speed | dist |
---|---|
4 | 2 |
4 | 10 |
7 | 4 |
7 | 22 |
8 | 16 |
9 | 10 |
10 | 18 |
10 | 26 |
10 | 34 |
11 | 17 |
11 | 28 |
12 | 14 |
12 | 20 |
12 | 24 |
12 | 28 |
13 | 26 |
13 | 34 |
13 | 34 |
13 | 46 |
14 | 26 |
14 | 36 |
14 | 60 |
14 | 80 |
15 | 20 |
15 | 26 |
15 | 54 |
16 | 32 |
16 | 40 |
17 | 32 |
17 | 40 |
17 | 50 |
18 | 42 |
18 | 56 |
18 | 76 |
18 | 84 |
19 | 36 |
19 | 46 |
19 | 68 |
20 | 32 |
20 | 48 |
20 | 52 |
20 | 56 |
20 | 64 |
22 | 66 |
23 | 54 |
24 | 70 |
24 | 92 |
24 | 93 |
24 | 120 |
25 | 85 |
Perhaps, though, you have a large dataset and you want to be able to go through it more easily. For example, the mtcars dataset, which is 32 rows by 11 columns, can look unwieldy in a report.
knitr::kable(mtcars)
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|---|---|---|---|---|
Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 |
Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 |
Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 |
Merc 280C | 17.8 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.90 | 1 | 0 | 4 | 4 |
Merc 450SE | 16.4 | 8 | 275.8 | 180 | 3.07 | 4.070 | 17.40 | 0 | 0 | 3 | 3 |
Merc 450SL | 17.3 | 8 | 275.8 | 180 | 3.07 | 3.730 | 17.60 | 0 | 0 | 3 | 3 |
Merc 450SLC | 15.2 | 8 | 275.8 | 180 | 3.07 | 3.780 | 18.00 | 0 | 0 | 3 | 3 |
Cadillac Fleetwood | 10.4 | 8 | 472.0 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 |
Lincoln Continental | 10.4 | 8 | 460.0 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |
Chrysler Imperial | 14.7 | 8 | 440.0 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 |
Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.200 | 19.47 | 1 | 1 | 4 | 1 |
Honda Civic | 30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 |
Toyota Corolla | 33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.90 | 1 | 1 | 4 | 1 |
Toyota Corona | 21.5 | 4 | 120.1 | 97 | 3.70 | 2.465 | 20.01 | 1 | 0 | 3 | 1 |
Dodge Challenger | 15.5 | 8 | 318.0 | 150 | 2.76 | 3.520 | 16.87 | 0 | 0 | 3 | 2 |
AMC Javelin | 15.2 | 8 | 304.0 | 150 | 3.15 | 3.435 | 17.30 | 0 | 0 | 3 | 2 |
Camaro Z28 | 13.3 | 8 | 350.0 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 |
Pontiac Firebird | 19.2 | 8 | 400.0 | 175 | 3.08 | 3.845 | 17.05 | 0 | 0 | 3 | 2 |
Fiat X1-9 | 27.3 | 4 | 79.0 | 66 | 4.08 | 1.935 | 18.90 | 1 | 1 | 4 | 1 |
Porsche 914-2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.140 | 16.70 | 0 | 1 | 5 | 2 |
Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.90 | 1 | 1 | 5 | 2 |
Ford Pantera L | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.170 | 14.50 | 0 | 1 | 5 | 4 |
Ferrari Dino | 19.7 | 6 | 145.0 | 175 | 3.62 | 2.770 | 15.50 | 0 | 1 | 5 | 6 |
Maserati Bora | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.570 | 14.60 | 0 | 1 | 5 | 8 |
Volvo 142E | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.780 | 18.60 | 1 | 1 | 4 | 2 |
This is where datatable comes in. It provides a better user interface for data frames presented in reports, allowing for easier movement within the dataset and the option to search.
library(DT)
datatable(mtcars)
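The datatable function also accepts arguments that adjust the interface. The sketch below assumes the DT package is installed and uses its filter, options, and caption arguments; the page length of 10 and the caption text are arbitrary choices for illustration.

```r
library(DT)

# Same data as above, with per-column filters, 10 rows per page, and a caption.
tbl <- datatable(
  mtcars,
  filter = "top",                   # add a filter box above each column
  options = list(pageLength = 10),  # show 10 rows at a time
  caption = "Motor Trend car road tests (mtcars)"
)
tbl
```

In a knitted HTML report the table is interactive; other output formats may not support it.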
Alternatively, there are many ways to customize the kable function, which can be found here.
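As a small example of customizing kable, the sketch below shows only a few rows and columns, rounds the values, and adds a caption. The digits and caption arguments are part of knitr::kable; the subset of columns and the caption text are arbitrary choices for illustration.

```r
library(knitr)

# Keep the table short: first five cars and four columns.
small <- head(mtcars[, c("mpg", "cyl", "hp", "wt")], 5)

# Round numeric columns to one decimal place and add a caption.
tbl <- kable(small, digits = 1, caption = "First five cars in mtcars")
tbl
```

Trimming the data before calling kable is often the simplest way to keep a static table readable.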
There are a few other options you can add to your R chunk that could come in handy when presenting your findings. Two that I use often are warning and message. Warnings often arise during your data analysis, and sometimes they can be disregarded, for example, when NAs are treated as zeros and that is what you want. Having the warning appear in your report can make it look untidy or suggest that you are ignoring something potentially relevant. Of course, you may not always want to ignore warnings, but assuming you do, adding the option to the chunk header helps: {r warning=FALSE}. The same is true of messages. These most commonly arise when you load packages and certain functions are overwritten (i.e., masked) by new functions. I tend to mute both by setting {r message=FALSE, warning=FALSE}.
For example, here is what happens when I load the package dplyr without these settings:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Now we try it again with warning and message set to FALSE.
## first remove the previously loaded package
detach("package:dplyr", unload=TRUE)
## re-load the package
library(dplyr)
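The effect of these two options can also be checked without loading a package. The sketch below knits the same made-up chunk twice in memory, once with the default header and once with message=FALSE and warning=FALSE, using knitr::knit(text = ...). The helper knit_chunk and the chunk contents are hypothetical and exist only for this comparison.

```r
library(knitr)

# Three grave accents, built here so they do not clash with this code block.
fence <- strrep("`", 3)

# Build a one-chunk document with the given chunk header and knit it to a string.
knit_chunk <- function(header) {
  rmd <- c(
    paste0(fence, header),
    "message('a note from a package')",
    "as.numeric('abc')",  # produces a coercion warning
    fence
  )
  knit(text = rmd, quiet = TRUE)
}

noisy <- knit_chunk("{r}")
quiet <- knit_chunk("{r message=FALSE, warning=FALSE}")

cat(noisy)
cat(quiet)
```

Both versions still show the code and the NA result; only the message and warning lines drop out of the second one.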
Another issue we sometimes run into is that a figure does not fit within the page or looks awkward. Taking a look at our figure from before, we may decide that it looks too bunched up.
## Plot the cars speed by distance
plot(cars$speed,cars$dist, xlab="Speed (mph)", ylab="Distance (miles)", pch=19, cex=1.3, cex.axis=1.3, cex.lab=1.5)
Using fig.width or fig.height in the chunk header, you can fit the plot better into the page and stretch it out.
## Plot the cars speed by distance
plot(cars$speed,cars$dist, xlab="Speed (mph)", ylab="Distance (miles)", pch=19, cex=1.3, cex.axis=1.3, cex.lab=1.5)
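If most figures in a report should share the same size, the dimensions can be set once instead of in every chunk header. The sketch below uses knitr::opts_chunk$set, typically placed in a setup chunk near the top of the document. The 8 by 4 inch values are only an illustration, not the sizes used for the figure above; a single chunk can still override them with its own header, for example {r fig.width=8, fig.height=4}.

```r
library(knitr)

# Set document-wide figure defaults (in inches); the values are illustrative.
opts_chunk$set(fig.width = 8, fig.height = 4)

# Check what later chunks will inherit.
opts_chunk$get("fig.width")
opts_chunk$get("fig.height")
```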