Reports

3.0 Writing active reports in R Studio

R Studio is a surprisingly flexible interface that allows you to create documents that help communicate your findings. I have adopted the approach of creating a GitHub repository and an active report file in almost all of the projects I have recently been involved in. This allows all collaborators on the project to stay up to date on the analysis, workflow, and data work. It also provides a consolidated area where all the files associated with a project can be maintained. The report document plays a supporting role to the data, similar to a metadata file that explains the information present in the repository. Ideally, as you get better at generating these files, you can keep them dynamic and constantly updating.

For example, let's say you have an experiment that is surveyed many times throughout its duration and has a particular analysis you expect to run. As the project continues, certain individuals may expect updates that require you to conduct the analysis and create figures. With a dynamic report, this can be as easy as dropping the newly surveyed dataset into the repository and clicking knit to revise the report.
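One way to let a report pick up the newest survey on its own is to have it look for the most recently modified data file each time it is knit. The following is a minimal sketch, assuming the surveys are saved as CSV files in a single folder. The function name `latest_survey`, the folder, and the file names are hypothetical and only used for illustration.

```r
# Return the path of the most recently modified CSV file in a folder.
# (Hypothetical helper; the folder layout is an assumption.)
latest_survey <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) stop("No CSV files found in ", dir)
  files[which.max(as.numeric(file.mtime(files)))]
}

# Demonstration in a temporary folder with two made-up survey files
demo_dir <- file.path(tempdir(), "survey_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(cars[1:10, ], file.path(demo_dir, "survey_1.csv"), row.names = FALSE)
write.csv(cars, file.path(demo_dir, "survey_2.csv"), row.names = FALSE)

# Pretend the first survey was saved an hour earlier
Sys.setFileTime(file.path(demo_dir, "survey_1.csv"), Sys.time() - 3600)

newest <- latest_survey(demo_dir)
survey <- read.csv(newest)
nrow(survey)
```

In a real report, the call to `latest_survey()` would sit in the first chunk, so every later chunk works from whichever file was added last.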

3.1 R-Markdown Syntax

Markdown is a lightweight markup language that is meant to be a simplified way of writing HTML. It was created by two individuals, one of them being Aaron Swartz, who is quite a controversial but remarkable individual. The relevance here is that those who have used HTML may have noticed the level of redundancy in applying text modifiers.

Here are some examples of the differences:

HTML	Markdown	Output
`<b>text</b>`	`**text**`	text (in bold)
`<h4>heading</h4>`	`#### heading`	heading (as a level-4 heading)
`<a href="http://www.google.com">Google</a>`	`[Google](http://www.google.com)`	Google (as a link)

This simplified syntax makes Markdown more accessible to users. A cheatsheet of all the codes can be found within R Studio under Help or here.

R Markdown is an extended version of Markdown. It contains all the same syntax used to simplify HTML, but also allows for a few other things, such as themes and color palettes. The most important feature, though, is the inclusion of R code within the document. This is what allows you to generate a report with analyses, figures, and datasets all embedded within your document.

First, let’s experiment with writing an R Markdown document. Go to File -> New File -> R Markdown Document. Title it something simple without spaces. It will open with a default file that shows some of the basics of R Markdown. At this point, the file is only plain Markdown text and has no formatting. To see it the way we would like, we need to knit the file using the Knit button, which is shown with a ball of yarn on the top panel. When you click knit, it will likely ask you to save the file, and you can use the name you gave it earlier. The output has your name, the date it was created, headers where specified, normal text, and also outputs from R, such as figures. This is a very simple document, but it contains all the elements you would normally use when generating a report.

3.2 Using Chunks in R Markdown

Chunks are how you add R code and its output to your report. You can also choose what to display in the report. Sometimes you want to show all the code and output, and sometimes you want a simpler version that shows only the output. This is controlled by the options specified in the chunk header.

A chunk can either be typed out manually or inserted by clicking the Insert button in the editor toolbar. R Markdown chunks also support other languages, such as Python or SQL. The syntax is three backticks (grave accents) followed by {r}. The R code goes on the following lines and behaves the same way it would in your normal R console. Three more backticks close the chunk.

x <- 2 + 2
x

## [1] 4

Here the default is to display both the R code and the R output. We can show the output only by setting echo=FALSE in the chunk header.

## [1] 4
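Because the rendered examples above show only the code and its output, the chunk delimiters themselves are not visible. The sketch below writes a small R Markdown file from R so the raw syntax can be seen: one default chunk and one with echo=FALSE. The file name and title are made up for illustration, and the final knitting step assumes the rmarkdown package and pandoc are installed.

```r
# Lines of a minimal R Markdown file (title and file name are hypothetical)
rmd_lines <- c(
  "---",
  'title: "Chunk demo"',
  "output: html_document",
  "---",
  "",
  "This chunk shows both the code and its output:",
  "",
  "```{r}",
  "x <- 2 + 2",
  "x",
  "```",
  "",
  "This chunk shows only the output:",
  "",
  "```{r echo=FALSE}",
  "x",
  "```"
)

# Save the file in a temporary folder
rmd_file <- file.path(tempdir(), "chunk_demo.Rmd")
writeLines(rmd_lines, rmd_file)

# Knit from the console; this is the scripted equivalent of clicking knit.
# Skipped when rmarkdown or pandoc is not available.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_file, quiet = TRUE)
}
```

Opening the saved file in R Studio shows the same chunk headers you would otherwise type by hand.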

Let’s try to plot with the R chunk using the default cars dataset.

plot(cars$speed,cars$dist)

You could build up your report like so:

3.3 Cars Experiment

We tested the performance of 50 new cars travelling at different speeds to see the distance they could travel (Figure 1). The vehicles were driven on the same track over the course of 10 days. Speed was measured in miles per hour, and cars were randomly assigned to drive at speeds between 0 and 25 miles per hour. Distance was measured in miles and was based on the distance travelled in 4 hours.

## Plot the cars speed by distance
plot(cars$speed,cars$dist, xlab="Speed (mph)", ylab="Distance (miles)", pch=19, cex=1.3, cex.axis=1.3, cex.lab=1.5)

## Run a linear model comparing speed and distance
m1 <- lm(dist~speed, data=cars)
summary(m1) ## output of linear model

## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

## add linear model to plot
abline(m1, col="Red", lwd=2)

To compare the distance traveled based on speed, we fit a linear model with distance as the response variable and speed as the predictor variable. We found that speed significantly predicted distance traveled (F = 89.6, p < 0.001, adjusted R2 = 0.64).
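To keep a sentence like the one above in step with the data, the reported statistics can be pulled from the model object instead of being typed by hand. This is a minimal sketch of one way to do that; the object names `model_summary` and `result_text` are illustrative and not part of the original report.

```r
# Refit the model from the report and extract the reported statistics
m1 <- lm(dist ~ speed, data = cars)
model_summary <- summary(m1)

f_stat <- unname(model_summary$fstatistic["value"])
p_val <- unname(pf(f_stat,
                   model_summary$fstatistic["numdf"],
                   model_summary$fstatistic["dendf"],
                   lower.tail = FALSE))
adj_r2 <- model_summary$adj.r.squared

# Format the p-value the way it is usually written in a report
p_text <- if (p_val < 0.001) "< 0.001" else sprintf("= %.3f", p_val)

result_text <- sprintf("F = %.1f, p %s, adjusted R2 = %.2f",
                       f_stat, p_text, adj_r2)
result_text

# In the text of the .Rmd file, inline code such as `r result_text`
# inserts this string into the sentence when the report is knit.
```

If a new dataset changes the model, the sentence updates the next time the report is knit.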

The full dataset is available below:

knitr::kable(cars)

speed	dist
4	2
4	10
7	4
7	22
8	16
9	10
10	18
10	26
10	34
11	17
11	28
12	14
12	20
12	24
12	28
13	26
13	34
13	34
13	46
14	26
14	36
14	60
14	80
15	20
15	26
15	54
16	32
16	40
17	32
17	40
17	50
18	42
18	56
18	76
18	84
19	36
19	46
19	68
20	32
20	48
20	52
20	56
20	64
22	66
23	54
24	70
24	92
24	93
24	120
25	85

Perhaps, though, you have a large dataset and want to be able to go through it more easily. For example, the mtcars dataset, which is 32 rows by 11 columns, can look unwieldy in a report.

knitr::kable(mtcars)

	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
Mazda RX4	21.0	6	160.0	110	3.90	2.620	16.46	0	1	4	4
Mazda RX4 Wag	21.0	6	160.0	110	3.90	2.875	17.02	0	1	4	4
Datsun 710	22.8	4	108.0	93	3.85	2.320	18.61	1	1	4	1
Hornet 4 Drive	21.4	6	258.0	110	3.08	3.215	19.44	1	0	3	1
Hornet Sportabout	18.7	8	360.0	175	3.15	3.440	17.02	0	0	3	2
Valiant	18.1	6	225.0	105	2.76	3.460	20.22	1	0	3	1
Duster 360	14.3	8	360.0	245	3.21	3.570	15.84	0	0	3	4
Merc 240D	24.4	4	146.7	62	3.69	3.190	20.00	1	0	4	2
Merc 230	22.8	4	140.8	95	3.92	3.150	22.90	1	0	4	2
Merc 280	19.2	6	167.6	123	3.92	3.440	18.30	1	0	4	4
Merc 280C	17.8	6	167.6	123	3.92	3.440	18.90	1	0	4	4
Merc 450SE	16.4	8	275.8	180	3.07	4.070	17.40	0	0	3	3
Merc 450SL	17.3	8	275.8	180	3.07	3.730	17.60	0	0	3	3
Merc 450SLC	15.2	8	275.8	180	3.07	3.780	18.00	0	0	3	3
Cadillac Fleetwood	10.4	8	472.0	205	2.93	5.250	17.98	0	0	3	4
Lincoln Continental	10.4	8	460.0	215	3.00	5.424	17.82	0	0	3	4
Chrysler Imperial	14.7	8	440.0	230	3.23	5.345	17.42	0	0	3	4
Fiat 128	32.4	4	78.7	66	4.08	2.200	19.47	1	1	4	1
Honda Civic	30.4	4	75.7	52	4.93	1.615	18.52	1	1	4	2
Toyota Corolla	33.9	4	71.1	65	4.22	1.835	19.90	1	1	4	1
Toyota Corona	21.5	4	120.1	97	3.70	2.465	20.01	1	0	3	1
Dodge Challenger	15.5	8	318.0	150	2.76	3.520	16.87	0	0	3	2
AMC Javelin	15.2	8	304.0	150	3.15	3.435	17.30	0	0	3	2
Camaro Z28	13.3	8	350.0	245	3.73	3.840	15.41	0	0	3	4
Pontiac Firebird	19.2	8	400.0	175	3.08	3.845	17.05	0	0	3	2
Fiat X1-9	27.3	4	79.0	66	4.08	1.935	18.90	1	1	4	1
Porsche 914-2	26.0	4	120.3	91	4.43	2.140	16.70	0	1	5	2
Lotus Europa	30.4	4	95.1	113	3.77	1.513	16.90	1	1	5	2
Ford Pantera L	15.8	8	351.0	264	4.22	3.170	14.50	0	1	5	4
Ferrari Dino	19.7	6	145.0	175	3.62	2.770	15.50	0	1	5	6
Maserati Bora	15.0	8	301.0	335	3.54	3.570	14.60	0	1	5	8
Volvo 142E	21.4	4	121.0	109	4.11	2.780	18.60	1	1	4	2

This is where datatable comes in. It provides a better user interface for data frames presented in reports, allowing for easier movement within the dataset and the option to search.

library(DT)
datatable(mtcars)

Alternatively, there are many ways to customize the output of the kable function, which can be found here.
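As a small illustration of customizing kable, the sketch below trims mtcars before printing it and uses the `digits` and `caption` arguments. The choice of columns, the number of rows, and the caption text are arbitrary and only meant as an example.

```r
library(knitr)

# Keep a few columns and the first 10 rows so the table stays readable
preview <- head(mtcars[, c("mpg", "cyl", "hp", "wt")], 10)

# Round numeric columns to one decimal place and add a caption
kable(preview, digits = 1,
      caption = "First 10 rows of mtcars (selected columns)")
```

This keeps a static table short, while datatable remains the better choice when readers need to browse or search the full dataset.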

3.4 Other chunk features

There are a few other options you can add to your R chunk that come in handy when presenting your findings. Two I use often are warning and message. Warnings often arise during data analysis, and sometimes they can be safely disregarded, for example when NAs are treated as zeros and that is what you intend. Having such a warning appear in your report can make it look untidy, or suggest that you are ignoring something potentially relevant. Of course, you may not always want to ignore warnings, but assuming you do, adding warning=FALSE to the chunk header suppresses them: {r warning=FALSE}. The same is true for messages. These most commonly arise when you load packages and certain functions are overwritten (i.e., masked) by new functions. I tend to mute both by setting {r message=FALSE, warning=FALSE}.

For example, here is what happens when I load the package dplyr without these settings:

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Now we try it again with warning and message set to FALSE. This time the chunk produces no output.

## first remove the previously loaded package
detach("package:dplyr", unload=TRUE)
## re-load the package
library(dplyr)
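Instead of repeating these options in every chunk header, they can be set once for the whole document. The sketch below assumes the first line is placed in a setup chunk near the top of the report. The second part shows a base R alternative for silencing a single call, which is a different technique from the chunk options described above; the example vector is made up.

```r
# Set defaults for all later chunks (typically in a setup chunk)
knitr::opts_chunk$set(message = FALSE, warning = FALSE)

# Base R alternative: silence one known, harmless warning for a single call.
# Converting "n/a" to a number gives NA and would normally raise a warning.
values <- suppressWarnings(as.numeric(c("1", "2", "n/a")))
values
```

Suppressing a single call keeps every other warning visible, which is safer when only one warning is known to be harmless.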

Another issue we sometimes run into is the size of a figure, which may not fit within the page or may look awkward. Taking a look at our figure from before, we may decide that it looks too bunched up.

## Plot the cars speed by distance
plot(cars$speed,cars$dist, xlab="Speed (mph)", ylab="Distance (miles)", pch=19, cex=1.3, cex.axis=1.3, cex.lab=1.5)

Using fig.width or fig.height in the chunk header, you can fit the plot better into the page and stretch it out.

## Plot the cars speed by distance
plot(cars$speed,cars$dist, xlab="Speed (mph)", ylab="Distance (miles)", pch=19, cex=1.3, cex.axis=1.3, cex.lab=1.5)
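The figure size can also be set once as a document-wide default, so every plot is stretched the same way. This is a minimal sketch; the values 10 and 4 are arbitrary and are interpreted by knitr as inches.

```r
# Default figure size for all later chunks, in inches (values are illustrative)
knitr::opts_chunk$set(fig.width = 10, fig.height = 4)

# A single chunk can still override the default in its own header,
# for example: {r fig.width=6, fig.height=6}
plot(cars$speed, cars$dist, xlab = "Speed (mph)", ylab = "Distance (miles)",
     pch = 19, cex = 1.3, cex.axis = 1.3, cex.lab = 1.5)
```

A wide, short figure like this one tends to suit scatterplots with a long x-axis, while square figures are easier to read for comparisons between two similar scales.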