3.0 Writing active reports in R Studio

R Studio is a surprisingly flexible interface that allows you to create documents that help communicate your findings. I have adopted the approach of creating a GitHub repository and an active report file in almost all of the projects I have recently been involved in. This allows all collaborators on the project to stay up to date on the analysis, workflow, and data work. It also provides a consolidated area where all the files associated with a project can be maintained. The report document plays a supporting role, similar to a metadata file that explains the information present in the repository. Ideally, as you get better at generating these files, you can keep them dynamic and constantly updating.

For example, let's say you have an experiment that is surveyed many times throughout its duration and has a particular analysis you expect to run. As the project continues, certain individuals may expect updates that require you to conduct the analysis and create figures. With a dynamic report, this can be as easy as dropping the newly surveyed dataset into the repository and clicking knit to revise the report.

3.1 R Markdown Syntax

Markdown is a lightweight markup language that is meant to be a simplified way of writing HTML. It was created by John Gruber together with Aaron Swartz, who is quite a controversial but remarkable individual. The relevance here is that those who have used HTML may have noticed the level of redundancy in applying text modifiers.

Here are some examples of the differences:

| HTML | Markdown | Output |
|---|---|---|
| `<b>text</b>` | `**text**` | text (bold) |
| `<h4>heading</h4>` | `#### heading` | heading (level-4 heading) |
| `<a href="http://www.google.com">Google</a>` | `[Google](http://www.google.com)` | Google (link) |

This simplified syntax makes Markdown more accessible to users. A cheatsheet of all the codes can be found within R Studio under Help or here.

R Markdown is an extended version of Markdown. It contains all the same syntax used to simplify HTML, but also allows for a few other things, such as providing themes and color palettes. The most important feature, though, is the inclusion of R code within the documents. This is what allows you to generate a report with analyses, figures, and datasets all embedded within your document.

First, let’s experiment with writing an R Markdown document. Go to File –> New File –> R Markdown Document. Give it a simple title without spaces. It will open with a default file that shows some of the basics of R Markdown. At this point, the file is only plain R Markdown source and has no formatting applied. To see it the way we would like, we need to knit the file, using the button marked with a ball of yarn on the top panel. When you click knit, it will likely ask you to save the file, and you can use the name you gave it earlier. The output has your name, the date it was created, headers where specified, normal text, and also outputs from R, such as figures. This is a very simple document, but it contains all the elements you would normally use when generating a report.
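Knitting can also be done from the R console, which is handy when a report needs to be regenerated often. The sketch below is only an illustration and is not the default template that R Studio creates: it writes a small R Markdown file to a temporary folder and renders it with rmarkdown::render(). The file name report.Rmd, the title, and the chunk contents are hypothetical, and the rendering step assumes that the rmarkdown package and pandoc are installed.

```r
# Build the chunk delimiter (three backticks) so it can be written to a file
fence <- strrep("`", 3)

# A minimal R Markdown document: YAML header, a heading, text, and one chunk
rmd_lines <- c(
  "---",
  "title: \"My first report\"",   # hypothetical title
  "output: html_document",
  "---",
  "",
  "## A heading",
  "",
  "Some **bold** text followed by an R chunk.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

# Write the document to a temporary folder (hypothetical file name)
rmd_file <- file.path(tempdir(), "report.Rmd")
writeLines(rmd_lines, rmd_file)

# Knit from the console; skipped if rmarkdown or pandoc is unavailable
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_file, quiet = TRUE)
  print(out_file)  # path to the rendered HTML file
}
```

Opening the rendered file in a browser should show the same kind of output you get from clicking knit.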

3.2 Using Chunks in R Markdown

Chunks are how you add R code and outputs to your report. You can also choose what to display in the report. Maybe you want to show all the code and output, but sometimes you want a simpler version that shows just the outputs. This is controlled by the options specified in the chunk.

A chunk can either be typed out manually or added by clicking the insert button. R Markdown also supports chunks in other languages, such as Python or SQL. The syntax is three backticks (grave accents) followed by {r}. The R code goes on the following lines and runs the same way it would in your normal R console. Three backticks close the chunk.

x <- 2 + 2
x
## [1] 4

Here the default is to display both the R code and the R output. We can display the output only by adding echo=FALSE to the chunk header, i.e. {r echo=FALSE}.

## [1] 4
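If most chunks in a report should hide their code, repeating echo=FALSE in every chunk header gets tedious. One possible approach, sketched here under the assumption that the knitr package is installed, is to set a document-wide default with knitr::opts_chunk$set(), usually in the first chunk of the document. Individual chunks can still override the default in their own header.

```r
library(knitr)

# Set a document-wide default: hide the code, keep the output
opts_chunk$set(echo = FALSE)

# Check the current default for the echo option
opts_chunk$get("echo")
```

A chunk that should still show its code can then be written with {r echo=TRUE}.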

Let’s try to plot with the R chunk using the default cars dataset.

plot(cars$speed,cars$dist)

You could build up your report like so:

3.3 Cars Experiment

We tested the performance of 50 new cars travelling at different speeds to see the distance they could travel (Figure 1). The vehicles were driven on the same track over the course of 10 days. Speed was measured in miles per hour and cars were randomly chosen to drive at speeds between 0 and 25 miles per hour. Distance was measured in miles and was based on the distance travelled in 4 hours.

## Plot the cars speed by distance
plot(cars$speed,cars$dist, xlab="Speed (mph)", ylab="Distance (miles)", pch=19, cex=1.3, cex.axis=1.3, cex.lab=1.5)

## Run a linear model comparing speed and distance
m1 <- lm(dist~speed, data=cars)
summary(m1) ## output of linear model
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
## add linear model to plot
abline(m1, col="Red", lwd=2)

To compare the distance traveled based on speed, we fit a linear model with distance as the response variable and speed as the predictor variable. We found that speed significantly predicted distance traveled (F = 89.6, p < 0.001, adjusted R2 = 0.64).
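To keep a report dynamic, the statistics quoted in the text can be pulled from the model object instead of being typed by hand, so that they update when the data change. The following is a minimal sketch of one way to do this. The variable names (f_value, p_text, result_text) are my own, and the resulting string could then be placed in the text with inline R code.

```r
# Refit the model from the report and store its summary
m1 <- lm(dist ~ speed, data = cars)
s  <- summary(m1)

# F statistic with its numerator and denominator degrees of freedom
f_value <- s$fstatistic[["value"]]
df_num  <- s$fstatistic[["numdf"]]
df_den  <- s$fstatistic[["dendf"]]

# p-value of the overall F test
p_value <- pf(f_value, df_num, df_den, lower.tail = FALSE)

# Adjusted R-squared
adj_r2 <- s$adj.r.squared

# Assemble the fragment that is quoted in the text
p_text <- if (p_value < 0.001) "p < 0.001" else paste("p =", signif(p_value, 2))
result_text <- sprintf("F = %.1f, %s, adjusted R2 = %.2f", f_value, p_text, adj_r2)
result_text
## [1] "F = 89.6, p < 0.001, adjusted R2 = 0.64"
```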

The full dataset is available below:

knitr::kable(cars)
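| speed | dist |
|---:|---:|
| 4 | 2 |
| 4 | 10 |
| 7 | 4 |
| 7 | 22 |
| 8 | 16 |
| 9 | 10 |
| 10 | 18 |
| 10 | 26 |
| 10 | 34 |
| 11 | 17 |
| 11 | 28 |
| 12 | 14 |
| 12 | 20 |
| 12 | 24 |
| 12 | 28 |
| 13 | 26 |
| 13 | 34 |
| 13 | 34 |
| 13 | 46 |
| 14 | 26 |
| 14 | 36 |
| 14 | 60 |
| 14 | 80 |
| 15 | 20 |
| 15 | 26 |
| 15 | 54 |
| 16 | 32 |
| 16 | 40 |
| 17 | 32 |
| 17 | 40 |
| 17 | 50 |
| 18 | 42 |
| 18 | 56 |
| 18 | 76 |
| 18 | 84 |
| 19 | 36 |
| 19 | 46 |
| 19 | 68 |
| 20 | 32 |
| 20 | 48 |
| 20 | 52 |
| 20 | 56 |
| 20 | 64 |
| 22 | 66 |
| 23 | 54 |
| 24 | 70 |
| 24 | 92 |
| 24 | 93 |
| 24 | 120 |
| 25 | 85 |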

Perhaps, though, you have a large dataset and want to be able to go through it more easily. For example, the mtcars dataset, which is 32 rows by 11 columns, can look unwieldy in a report.

knitr::kable(mtcars)
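| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
| Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
| Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 |
| Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 |
| Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 |
| Merc 280C | 17.8 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.90 | 1 | 0 | 4 | 4 |
| Merc 450SE | 16.4 | 8 | 275.8 | 180 | 3.07 | 4.070 | 17.40 | 0 | 0 | 3 | 3 |
| Merc 450SL | 17.3 | 8 | 275.8 | 180 | 3.07 | 3.730 | 17.60 | 0 | 0 | 3 | 3 |
| Merc 450SLC | 15.2 | 8 | 275.8 | 180 | 3.07 | 3.780 | 18.00 | 0 | 0 | 3 | 3 |
| Cadillac Fleetwood | 10.4 | 8 | 472.0 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 |
| Lincoln Continental | 10.4 | 8 | 460.0 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |
| Chrysler Imperial | 14.7 | 8 | 440.0 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 |
| Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.200 | 19.47 | 1 | 1 | 4 | 1 |
| Honda Civic | 30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 |
| Toyota Corolla | 33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.90 | 1 | 1 | 4 | 1 |
| Toyota Corona | 21.5 | 4 | 120.1 | 97 | 3.70 | 2.465 | 20.01 | 1 | 0 | 3 | 1 |
| Dodge Challenger | 15.5 | 8 | 318.0 | 150 | 2.76 | 3.520 | 16.87 | 0 | 0 | 3 | 2 |
| AMC Javelin | 15.2 | 8 | 304.0 | 150 | 3.15 | 3.435 | 17.30 | 0 | 0 | 3 | 2 |
| Camaro Z28 | 13.3 | 8 | 350.0 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 |
| Pontiac Firebird | 19.2 | 8 | 400.0 | 175 | 3.08 | 3.845 | 17.05 | 0 | 0 | 3 | 2 |
| Fiat X1-9 | 27.3 | 4 | 79.0 | 66 | 4.08 | 1.935 | 18.90 | 1 | 1 | 4 | 1 |
| Porsche 914-2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.140 | 16.70 | 0 | 1 | 5 | 2 |
| Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.90 | 1 | 1 | 5 | 2 |
| Ford Pantera L | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.170 | 14.50 | 0 | 1 | 5 | 4 |
| Ferrari Dino | 19.7 | 6 | 145.0 | 175 | 3.62 | 2.770 | 15.50 | 0 | 1 | 5 | 6 |
| Maserati Bora | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.570 | 14.60 | 0 | 1 | 5 | 8 |
| Volvo 142E | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.780 | 18.60 | 1 | 1 | 4 | 2 |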

This is where datatable from the DT package comes in. It provides a better user interface for data frames presented in reports, allowing easier movement within the dataset and the option to search.

library(DT)
datatable(mtcars) 
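The interactive table can be adjusted further. As a hedged sketch, assuming the DT package is installed, the call below limits each page to five rows and adds a filter box above every column. The specific values are only illustrative choices.

```r
library(DT)

# Show 5 rows per page and add a filter box above each column
widget <- datatable(
  mtcars,
  filter = "top",
  options = list(pageLength = 5)
)

# Printing the object inside a chunk displays the table in an HTML report
widget
```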

Alternatively, the kable function has many options for customizing a table, which can be found here.
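As a small illustration of those options, the sketch below shows only the first six rows of mtcars, rounds the values to one decimal place, centres the columns, and adds a caption. The caption text and the choice of options are my own, and many other arguments are available.

```r
library(knitr)

# First six rows only, one decimal place, centred columns, and a caption
tbl <- kable(
  head(mtcars),
  digits = 1,
  align = "c",
  caption = "First six rows of the mtcars dataset"
)
tbl
```

Trimming the rows with head() before calling kable is often the simplest way to keep a large table from taking over a static report.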

3.4 Other chunk features

There are a few other options you can add to your R chunk that come in handy when presenting your findings. Two that I use often are warning and message. Warnings often arise during data analysis, and sometimes they can be disregarded, for example when a function warns that NAs are being treated as zeros and that is exactly what you want. Having such a warning appear in your report can make it look untidy or suggest that you are ignoring something potentially relevant. Of course, you may not always want to ignore warnings, but assuming you do, adding warning=FALSE to the R chunk header hides them: {r warning=FALSE}. The same is true of messages. These most commonly arise when you load packages and certain functions are overwritten (i.e. masked) by new functions. I tend to mute both by setting {r message=FALSE, warning=FALSE}.

For example, here is what happens when I load the package dplyr without these settings:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Now we try it again with warning and message set to FALSE.

## first remove the previously loaded package
detach("package:dplyr", unload=TRUE)
## re-load the package
library(dplyr)
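Chunk options are not the only way to quiet a package. A possible alternative, shown here as a sketch that assumes dplyr is installed, is to wrap the call in the base R function suppressPackageStartupMessages(). This hides the start-up messages for that one call while leaving other messages in the chunk visible.

```r
# Load dplyr without printing the start-up messages about masked functions
suppressPackageStartupMessages(library(dplyr))

# The package is attached as usual
"package:dplyr" %in% search()
```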

Another issue we sometimes run into is that a figure does not fit within the page or looks awkward. Taking a look at our figure from before, we may decide that it looks too bunched up.

## Plot the cars speed by distance
plot(cars$speed,cars$dist, xlab="Speed (mph)", ylab="Distance (miles)", pch=19, cex=1.3, cex.axis=1.3, cex.lab=1.5)

Using fig.width or fig.height in the chunk header (both measured in inches), you can fit the plot better into the page and stretch it out.

## Plot the cars speed by distance
plot(cars$speed,cars$dist, xlab="Speed (mph)", ylab="Distance (miles)", pch=19, cex=1.3, cex.axis=1.3, cex.lab=1.5)
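To make the sizing concrete, here is a sketch of what the contents of such a chunk could look like. The width of 8 and height of 4 inches are hypothetical values chosen for illustration. The options are written as special #| comments at the top of the chunk, which assumes knitr 1.35 or later. With older versions, the same options would go in the chunk header instead, as in {r fig.width=8, fig.height=4}.

```r
#| fig.width: 8
#| fig.height: 4

## Plot the cars speed by distance in a wide, short figure
plot(cars$speed, cars$dist, xlab = "Speed (mph)", ylab = "Distance (miles)",
     pch = 19, cex = 1.3, cex.axis = 1.3, cex.lab = 1.5)
```

Run outside of a knitted document, the #| lines are ordinary comments and the plot is drawn at the default device size.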