Introduction to R functions

library(tidyverse)
library(ggthemes)

Introduction

Writing functions is a fundamental component of programming and therefore scares many people. However! Functions can be easier than you’d think. For example, think of what your perfect bar plot looks like. How much time do you spend making your bar plots look like that? Wouldn’t it be great if the very first time you plotted a barplot it looked exactly the way you wanted it to? With functions you can do that, and all the knowledge you’d need is how to make the barplot look the way you want. This works shop gives a quick overview of how to use functions to make your life easier. Demonstrations will be done in R, but are applicable to most languages.

One of the more powerful aspects of programming languages relative GUI statistical softwares is the ability to create functions. Functions are extremely powerful because they allow for the automation of repetitive tasks and reduce bloated code from the more traditional “copy-paste” format. This becomes extremely powerful when combined with the apply family or loops. However, even for simple tasks, functions can make your life easier.

Why use functions?

  • save complicated operations or mathematical calculations
  • reduce repetitive tasks
  • standardize formats (e.g. plots, maps, etc)
  • combine with apply family for automation of tasks

Function Syntax

The format of function is fairly simple:

function.name <- function(arguments){ arguments + operation }

Here arguments can be any lists of text or characters that will later be used within the formula. Within the function, the operate will have two general components. Firstly, the portion of the function that will remain unchanged each time it is used. The second are the arguments that will be changed each time.

## Write the first function
plus1 <- function(x){
x+1 ## the arugment and operation
}
plus1(x=5)
## [1] 6

For these functions there are two items that will remain unchanged. The first is the calculation. In the above example, the addition operation will not change and remains the same. These types of functions are useful when you have a particular index that you are trying to calculate, but the inputs will keep changing.

For instance say you want to convert degrees to radians. The function you would write would be this.

deg2rad <- function(deg) {
  (deg * pi) / (180)
}

deg2rad(90)
## [1] 1.570796
## For simpler functions you can write them as one line
deg2rad <- function(deg) {(deg * pi) / (180)} 

Log response ratio is a calculation when comparing two groups based on a treatment vs control format.

LRR <- function(treatment,control){
  log(treatment/control)
}
LRR(treatment=30, control=20)
## [1] 0.4054651

Here we see that the log response ratio of our two individuals. Specifying the treatments is not entirely necessary. Depending on how you set up your arguments, R will assume your inputs will be in the order that they are presented. The function can also handle any type of data format such as number, vector, shapefile, dataframe, etc, just as long as what is specified within the function can handle it.

treat1 = c(30,40,50,35,45,55)
control1= c(20,25,30,23,25,20)
LRR(treat1, control1)
## [1] 0.4054651 0.4700036 0.5108256 0.4198538 0.5877867 1.0116009

You can also utilize functions alreay present within R.

## Create function to calculate average
average <- function(x){
  sum(x)/length(x) 
}

## assign LRR to object
results <- LRR(treat1, control1)
## calculate the average of results from LRR
average(results)
## [1] 0.5675893

There is no standard error function present in base R, so this simple formula below is probably one that I use the most.

se <- function(x){ sd(x)/sqrt(length(x)) } ## sd/sqrt(n)

Functions for plots

Creating plots requires a lot of ajustments. The aesthetic component is getting better and packages such as ggplot2 help format plots to be more appealing. However, defaults are not ideal and often not publication quality. It can be helpful to create a function that matches your ideal plot archetype and then edit the inputs only.

Lets use the cars dataset to conduct and plot a correlation

plot(cars$speed, cars$dist)

On default this plot is pretty ugly. Let’s try and make it better

## Improve the aesthetic of my plot
par(mar = c(4.5, 4.5, 0.5, 0.5))
plot(cars$speed, cars$dist, pch=19, cex.axis=1.3, cex.lab=1.5, frame.plot = F)

## Revise it as a function
myscatter <- function(x, y){
par(mar = c(4.5, 4.5, 0.5, 0.5))
plot(x, y, pch=19, cex.axis=1.3, cex.lab=1.5, frame.plot = F)
}
## Test function
myscatter(cars$speed, cars$dist)

The scatter plot is looking good but realistically there are thing you are going to want to change. For instance, the x and y labels.

## Ellipses allow for optional arugments to be passed
myscatter <- function(x, y, ...){ 
par(mar = c(4.5, 4.5, 0.5, 0.5))
plot(x, y, pch=19, cex.axis=1.3, cex.lab=1.5, frame.plot = F, ...)
}
## Continue plotting all the same
myscatter(cars$speed, cars$dist, xlab="Speed (kph)", ylab="Distance (km)")

## Ovewrite new defaults
myscatter(cars$speed, cars$dist, xlab="Speed (kph)", ylab="Distance (km)", cex.lab=1.8, cex.axis=1.1)

But I use ggplot…

source("functions.r") ## Using source to port over complex functions

## Default
ggplot(data=cars, aes(x=speed, y=dist)) + geom_point()

## With functions
ggplot(data=cars, aes(x=speed, y=dist)) + geom_point() + theme_Publication()

Functions for summaries

The reason I find functions so helpful is that they can be embeded into other vector analyses or automation tools. Calculating summary statistics or metrics can be improved vastly using functions.

Let’s explore a grazed and ungrazed system for the amount of residual dry matter (thatch) at the end of the season (50 quadrats each).

## Make up dataframe
data <- data.frame(treatment = rep(c("grazed","ungrazed"), each=50),
  biomass = c( rnorm(50, 10, 2), rnorm(50, 15, 2)))

## calculate 95% conf interval for each treatment
aggregate(biomass~treatment, data=data, FUN=se)
##   treatment   biomass
## 1    grazed 0.2335315
## 2  ungrazed 0.3343033
## Dplyr version
data %>% group_by(treatment) %>% summarize(avg=mean(biomass), error=se(biomass))
## # A tibble: 2 x 3
##   treatment   avg error
##   <fct>     <dbl> <dbl>
## 1 grazed     10.4 0.234
## 2 ungrazed   14.6 0.334
## Getting fancy
data %>% group_by(treatment) %>% summarize(avg=mean(biomass), error=se(biomass), lower95=avg+error*1.96, upper95=avg+error*1.96)
## # A tibble: 2 x 5
##   treatment   avg error lower95 upper95
##   <fct>     <dbl> <dbl>   <dbl>   <dbl>
## 1 grazed     10.4 0.234    10.8    10.8
## 2 ungrazed   14.6 0.334    15.2    15.2

What if we wanted to make a function that calculates all this for us rather than doing it by hand.

summarystat <- function(x){
  avg <- mean(x)
  error <- se(x)
  lower95 <- avg+error*1.96
  upper95 <- avg+error*1.96
  statOut <- c(avg, error, lower95, upper95)
  return(statOut) ## need to specify what you out
}
summarystat(rnorm(20, 5, 2)) ## not very informative
## [1] 3.7718051 0.4324708 4.6194478 4.6194478
summarystat <- function(x){
  avg <- mean(x)
  error <- se(x)
  lower95 <- avg+error*1.96
  upper95 <- avg+error*1.96
  statOut <- data.frame(avg, error, lower95, upper95)
  return(statOut)
}
summarystat(rnorm(20, 5, 2)) ## much better
##        avg     error  lower95  upper95
## 1 5.084034 0.4298999 5.926638 5.926638

Integrating functions that produce unequal outputs can be difficult. Vector analyses sometimes don’t appreciate that.

data %>% group_by(treatment) %>% summarize(summarystat(biomass)) ## error
data %>% group_by(treatment) %>% do(summarystat(.$biomass)) ## error
## # A tibble: 2 x 5
## # Groups:   treatment [2]
##   treatment   avg error lower95 upper95
##   <fct>     <dbl> <dbl>   <dbl>   <dbl>
## 1 grazed     10.4 0.234    10.8    10.8
## 2 ungrazed   14.6 0.334    15.2    15.2

Final exploration using the apply family. Apply allows you to iterate across data. This could be up-down, left-right in dataframes or it can across lists, raster stacks, etc etc. We are going to use a simple example of top-down with our function.

datacomp <- data.frame(species = rep(c("Plantago","Poa"), each=50),
                       withNeighbour=rnorm(50,5,1),
                       alone = rnorm(50, 7,2))

head(datacomp)
##    species withNeighbour     alone
## 1 Plantago      4.968000 10.005864
## 2 Plantago      5.005805  9.961258
## 3 Plantago      5.191624  8.247742
## 4 Plantago      7.283433  9.476809
## 5 Plantago      5.790685  5.249817
## 6 Plantago      4.004638  9.431161
zscore <- function(x){
  x - mean(x)/sd(x)
}

apply(datacomp[,2:3], 2, zscore)
##        withNeighbour       alone
##   [1,]    0.04707497  7.02389437
##   [2,]    0.08488023  6.97928752
##   [3,]    0.27069910  5.26577148
##   [4,]    2.36250820  6.49483859
##   [5,]    0.86976057  2.26784677
##   [6,]   -0.91628669  6.44919046
##   [7,]   -1.44116872  3.63461731
##   [8,]    0.45121551 -1.04287977
##   [9,]   -0.49093861  6.19114459
##  [10,]   -0.96760406  3.12465406
##  [11,]   -1.08487837  7.82438426
##  [12,]    0.55659194  3.99253962
##  [13,]    0.60448782  2.87293090
##  [14,]    1.00849969  1.73820772
##  [15,]   -0.59380640  6.15283919
##  [16,]    0.33091721  7.29867742
##  [17,]   -0.04199374  6.98039289
##  [18,]    0.53114494  0.44569424
##  [19,]   -1.41533272  5.64769648
##  [20,]   -0.73005586  3.46479187
##  [21,]   -0.06836939  4.83052099
##  [22,]    0.62026609  3.05612124
##  [23,]   -0.83726592 -0.94412524
##  [24,]   -0.64301053  4.45055774
##  [25,]    0.16369918  3.88469610
##  [26,]    0.05177801  4.96165364
##  [27,]   -0.06344578  1.09683140
##  [28,]   -0.18958630  4.64836499
##  [29,]    1.89595478  0.51660520
##  [30,]    0.54159359  9.04762448
##  [31,]   -0.07332334  5.72463860
##  [32,]   -1.66600124  5.78978381
##  [33,]    0.09117839  3.09440518
##  [34,]    1.14447103  5.41278968
##  [35,]   -1.22955475  5.08865881
##  [36,]    0.19182237  2.38487064
##  [37,]    0.66129242  6.09478618
##  [38,]    0.25934441  1.65601414
##  [39,]   -0.69398360  4.90886299
##  [40,]   -0.36334554  3.21527973
##  [41,]    1.78927842  3.71864365
##  [42,]   -0.43858697  5.85924017
##  [43,]   -2.23124891  2.91175072
##  [44,]    0.07654003  1.91966318
##  [45,]    1.25540415  1.70291133
##  [46,]    2.86097520  4.64520712
##  [47,]   -0.22621446  1.72593690
##  [48,]   -1.01490111  3.66384585
##  [49,]   -0.32451912  0.05303566
##  [50,]   -1.02792475  0.95288311
##  [51,]    0.04707497  7.02389437
##  [52,]    0.08488023  6.97928752
##  [53,]    0.27069910  5.26577148
##  [54,]    2.36250820  6.49483859
##  [55,]    0.86976057  2.26784677
##  [56,]   -0.91628669  6.44919046
##  [57,]   -1.44116872  3.63461731
##  [58,]    0.45121551 -1.04287977
##  [59,]   -0.49093861  6.19114459
##  [60,]   -0.96760406  3.12465406
##  [61,]   -1.08487837  7.82438426
##  [62,]    0.55659194  3.99253962
##  [63,]    0.60448782  2.87293090
##  [64,]    1.00849969  1.73820772
##  [65,]   -0.59380640  6.15283919
##  [66,]    0.33091721  7.29867742
##  [67,]   -0.04199374  6.98039289
##  [68,]    0.53114494  0.44569424
##  [69,]   -1.41533272  5.64769648
##  [70,]   -0.73005586  3.46479187
##  [71,]   -0.06836939  4.83052099
##  [72,]    0.62026609  3.05612124
##  [73,]   -0.83726592 -0.94412524
##  [74,]   -0.64301053  4.45055774
##  [75,]    0.16369918  3.88469610
##  [76,]    0.05177801  4.96165364
##  [77,]   -0.06344578  1.09683140
##  [78,]   -0.18958630  4.64836499
##  [79,]    1.89595478  0.51660520
##  [80,]    0.54159359  9.04762448
##  [81,]   -0.07332334  5.72463860
##  [82,]   -1.66600124  5.78978381
##  [83,]    0.09117839  3.09440518
##  [84,]    1.14447103  5.41278968
##  [85,]   -1.22955475  5.08865881
##  [86,]    0.19182237  2.38487064
##  [87,]    0.66129242  6.09478618
##  [88,]    0.25934441  1.65601414
##  [89,]   -0.69398360  4.90886299
##  [90,]   -0.36334554  3.21527973
##  [91,]    1.78927842  3.71864365
##  [92,]   -0.43858697  5.85924017
##  [93,]   -2.23124891  2.91175072
##  [94,]    0.07654003  1.91966318
##  [95,]    1.25540415  1.70291133
##  [96,]    2.86097520  4.64520712
##  [97,]   -0.22621446  1.72593690
##  [98,]   -1.01490111  3.66384585
##  [99,]   -0.32451912  0.05303566
## [100,]   -1.02792475  0.95288311