Reducing dimensionality

3.0 What is multivariate data

Multivariate data is what occurs when you have multiple response or predictor variables. Multivariate data is typically in the “wide” data format and can resemble items such as a list of species, soil chemistry variables, or plant traits. As ecologists trying to understand these variables, we need to minimize the dimensionality of these variables. The reason being is that we struggle to understand trends beyond 3-dimensions.

Imagine plotting the abundance of one species against another. Then imagine adding a third species to that graph in the form of another axis. This might still make sense, now imagine adding a four axis, what does that look like? Imagine adding 35 axes. Because it is impossible to comprehend a multi-dimensional structure that comprises our multivariate data, we use data reduction techniques to put this information into two dimensions.

3.1 Relevance of ecological indices

Ecological indices are an approach to simplify mutivariate data into univariate data to allow for easier analysis or description. Indices as a result have become incredibly popular for ecologists as a way to reduce the multi-dimensionality of ecosystems into something that is more manageable. One commonly described index is Shannon-Weaver’s Diversity Index that produces a single value based on the relative abundances of many species. These indices have arguable benefits and limitations. With multivariate analyses becoming easier to conduct, there has been a departure from the majority of these indices. However, ecological indices are not going away just yet. Particularly, because science has a component of communication and making it easier to describe trends has obvious benefits. In other cases, indices can convey more information than the species identity on its own.

Let’s begin with a familiar one, Shannon-weaver’s. This requires us to load the vegan library that is responsible for the majority of multivariate analyses in ecology.

library(LearnCommAnalysis)
library(vegan)


## check to see where the species list are
head(multivar)

##   year Region Site Elevation Rep Microsite S.barbatus B.rubens
## 1 2016 Mojave  2.5      1100   1     Shrub          1       12
## 2 2016 Mojave  2.5      1100   1      Open          3        9
## 3 2016 Mojave  2.5      1100   2     Shrub          0       15
## 4 2016 Mojave  2.5      1100   2      Open          0        3
## 5 2016 Mojave  2.5      1100   3     Shrub          2        3
## 6 2016 Mojave  2.5      1100   3      Open          0        2
##   E.cicutarium B.hordeaceus P.secunda D.kelloggii P.ovata C.barbigera
## 1            0            0         0           0       0           0
## 2            0            0         0           0       0           0
## 3            0            0         0           0       0           0
## 4            0            0         0           0       0           0
## 5            0            0         0           0       0           1
## 6            0            0         0           0       0           0
##   L.gracilis C.exserta A.wrangelianus D.capitatum Pectocarya.spp.
## 1          0         0              0           0               3
## 2          0         0              0           0               2
## 3          0         0              0           0               1
## 4          0         0              0           0               0
## 5          0         0              0           0               5
## 6          0         0              0           0               0
##   E.penduliflora A.tessellata A.adscensionis C.fremontii E.minutiflora
## 1              0            0              0           0             0
## 2              0            0              0           0             0
## 3              0            0              0           0             0
## 4              0            0              0           0             0
## 5              0            0              0           0             0
## 6              0            0              0           0             0
##   P.membranaceum L.lasiocarpa M.glabrata R.neomexicana E.eremicum C.rigida
## 1              0            0          0             0          0        0
## 2              0            0          0             0          0        0
## 3              0            0          0             0          0        0
## 4              0            1          0             0          0        0
## 5              0            0          0             0          0        0
## 6              0            0          0             0          0        0
##   P.crenulata E.wallacei C.brevipes C.claviformis Erinoginum.spp.
## 1           0          0          0             0               0
## 2           0          0          0             0               0
## 3           0          0          0             0               0
## 4           0          0          0             0               2
## 5           0          0          0             0               0
## 6           0          0          0             0               0
##   M.bellioides E.exilis C.lasiophyllus L.setosissima A.tuberosa B.nigria
## 1            0        0              0             0          0        0
## 2            1        0              0             0          0        0
## 3            2        0              0             0          0        0
## 4            0        0              0             0          0        0
## 5            0        0              0             0          0        0
## 6            0        0              0             0          0        0
##   N.demissum A.lentiginosus A.grandiflora H.vulgare C.intermedia
## 1          0              0             0         0            0
## 2          0              0             0         0            0
## 3          0              0             0         0            0
## 4          0              0             0         0            0
## 5          0              0             0         0            0
## 6          0              0             0         0            0

## separate species and predictor columns
spp.data <- multivar[,7:46] ## select only species
pred.data <- multivar[,1:6] ## select only predictors and identifiers

## calculate shannon index
div.data <- diversity(spp.data, index = "shannon")

## combine with predictors
div.data <- cbind(pred.data, div.data)

## see what it looks like
head(div.data)

##   year Region Site Elevation Rep Microsite  div.data
## 1 2016 Mojave  2.5      1100   1     Shrub 0.7029189
## 2 2016 Mojave  2.5      1100   1      Open 1.0775734
## 3 2016 Mojave  2.5      1100   2     Shrub 0.5566469
## 4 2016 Mojave  2.5      1100   2      Open 1.0114043
## 5 2016 Mojave  2.5      1100   3     Shrub 1.2406843
## 6 2016 Mojave  2.5      1100   3      Open 0.0000000

## example of plot
mean.div <- div.data %>% group_by(Microsite, Elevation) %>% summarize(Average.Diversity=mean(div.data))

ggplot(mean.div) + geom_bar(aes(x=Elevation, y=Average.Diversity, fill=Microsite), color="black", stat="identity", position = "dodge") + scale_fill_brewer()

Here, we have reduced the entire community composition into a single diversity index and used the mean to compare among sites. We can see that there appears to be a general trend with elevation. The sites at higher elevations tend to have larger differences between treatments.

3.2 Diversity indices

The above diversity indices are usually to estimate local diversity. What if we were interested in the diversity between sites (beta) or the overall regional diversity (gamma). We are able to calculate these through a variety of different metrics that have been explored previously. First, let’s explore beta diversity on a sample dataset using betadiver. If we do not specify the index we can the a, b, and c values for all sites. If an index is specified, then distance-based matrixes are generated using the relevant equation.

data(dune)

beta.n <- betadiver(dune)
plot(beta.n)

beta.whit <- betadiver(dune, "w")
beta.whit

##             1          2          3          4          5          6
## 2  0.33333333                                                       
## 3  0.46666667 0.20000000                                            
## 4  0.55555556 0.21739130 0.13043478                                 
## 5  0.47368421 0.25000000 0.33333333 0.33333333                      
## 6  0.50000000 0.42857143 0.42857143 0.50000000 0.12000000           
## 7  0.55555556 0.39130435 0.47826087 0.46153846 0.11111111 0.08333333
## 8  0.64705882 0.45454545 0.27272727 0.28000000 0.53846154 0.47826087
## 9  0.55555556 0.39130435 0.21739130 0.23076923 0.40740741 0.41666667
## 10 0.52941176 0.27272727 0.36363636 0.36000000 0.15384615 0.21739130
## 11 0.71428571 0.57894737 0.47368421 0.45454545 0.47826087 0.40000000
## 12 0.85714286 0.57894737 0.36842105 0.36363636 0.56521739 0.50000000
## 13 0.73333333 0.50000000 0.40000000 0.39130435 0.66666667 0.61904762
## 14 1.00000000 0.76470588 0.64705882 0.70000000 0.80952381 0.77777778
## 15 1.00000000 0.77777778 0.55555556 0.61904762 0.72727273 0.68421053
## 16 0.84615385 0.77777778 0.55555556 0.61904762 0.81818182 0.78947368
## 17 0.66666667 0.64705882 0.76470588 0.80000000 0.52380952 0.44444444
## 18 0.71428571 0.47368421 0.36842105 0.45454545 0.39130435 0.40000000
## 19 1.00000000 0.78947368 0.68421053 0.63636364 0.65217391 0.60000000
## 20 1.00000000 0.88888889 0.66666667 0.71428571 0.81818182 0.78947368
##             7          8          9         10         11         12
## 2                                                                   
## 3                                                                   
## 4                                                                   
## 5                                                                   
## 6                                                                   
## 7                                                                   
## 8  0.52000000                                                       
## 9  0.38461538 0.20000000                                            
## 10 0.20000000 0.50000000 0.52000000                                 
## 11 0.45454545 0.42857143 0.45454545 0.33333333                      
## 12 0.45454545 0.33333333 0.18181818 0.61904762 0.55555556           
## 13 0.56521739 0.27272727 0.30434783 0.63636364 0.57894737 0.26315789
## 14 0.80000000 0.47368421 0.70000000 0.78947368 0.75000000 0.62500000
## 15 0.71428571 0.30000000 0.52380952 0.70000000 0.64705882 0.52941176
## 16 0.80952381 0.30000000 0.52380952 0.80000000 0.88235294 0.52941176
## 17 0.50000000 0.78947368 0.80000000 0.47368421 0.50000000 0.87500000
## 18 0.45454545 0.52380952 0.54545455 0.23809524 0.22222222 0.66666667
## 19 0.63636364 0.61904762 0.63636364 0.61904762 0.44444444 0.55555556
## 20 0.80952381 0.40000000 0.61904762 0.80000000 0.76470588 0.64705882
##            13         14         15         16         17         18
## 2                                                                   
## 3                                                                   
## 4                                                                   
## 5                                                                   
## 6                                                                   
## 7                                                                   
## 8                                                                   
## 9                                                                   
## 10                                                                  
## 11                                                                  
## 12                                                                  
## 13                                                                  
## 14 0.52941176                                                       
## 15 0.55555556 0.20000000                                            
## 16 0.55555556 0.46666667 0.37500000                                 
## 17 0.76470588 0.85714286 0.86666667 1.00000000                      
## 18 0.68421053 0.75000000 0.64705882 0.88235294 0.62500000           
## 19 0.68421053 0.75000000 0.64705882 0.88235294 0.50000000 0.55555556
## 20 0.66666667 0.33333333 0.25000000 0.25000000 0.86666667 0.64705882
##            19
## 2            
## 3            
## 4            
## 5            
## 6            
## 7            
## 8            
## 9            
## 10           
## 11           
## 12           
## 13           
## 14           
## 15           
## 16           
## 17           
## 18           
## 19           
## 20 0.64705882

Whittaker’s index has limitations in that it increases sampling effort. Alternatives for estimating beta-diversity include dissimiliarity indices, such as Jaccard or Sorensen for presence/absence or Bray-Curtis for abundance data. These can be calculated using vegdist. More information and potential limitations of these indices can be found here

Using the raw distance based matrix though can be uninformative. Instead, we may choose to find out the average beta-diversity values among treatments. Exploring the dune dataset again but now adding different manure treatments and then follow that up with a cluster analysis to determine the treatments that were similar in composition

data(dune.env)

## join column to summarize by
dune2 <- cbind(dune.env["Manure"],dune)

dune.man <- dune2 %>% group_by(Manure) %>% summarise_all(sum) %>% data.frame(.)

BC.div <- vegdist(dune.man[,-1], method="bray") ## drop first column with treatment

clus <- hclust(BC.div, "ward.D2") ## run cluster analysis
plot(clus, label=dune.man$Manure) ## add labels for manure levels

3.3 Species accumulation curves

Rarefaction curves and species accumulation curves are two different methods for determining changes in species richness with area or survey effort. However, these two items are not the same thing! This is probably the best definition of the difference:

Rarefaction curves are useful for comparing species richness values for different sampling efforts. Rarefaction cannot be used for extrapolation as it does not provide an estimate of asymptotic richness. In addition, under-sampling which is common …, often result in an over estimate of the number of ‘rare’ species (e.g. singletons and doubletons); and the greater number of ‘rare’ species reported in a dataset the more likely it is that other species that are present have not been detected.

Using species accumulation curves it is possible to determine the heterogeneity of diversity between treatments or sites. You can also compare rarefaction curves to species accumulation curves. The difference between the two curves indicates aggregation or overdispersion of conspecific individuals. Put differently, the more the species accumulation curve deviates from the rarefaction curve, the less often individuals of the same species are found clustered together.

To examine this further, we can plot both rarefaction and species accumulation curves in vegan

data(dune)

accum1 <- specaccum(dune, method="rarefaction")
accum2 <-specaccum(dune, method="random")

plot(accum1, col="blue", ylim=c(0,30), ci.type="polygon", ci.col="#00FFFF40")
plot(accum2, add=T, col="orange", ci.type="polygon", ci.col="#FFA50040")

Based on these curves we can see our data approach an asymptote. There is the chance though that some species were missed from our sampling. We can estimate approximately the expected number of species that were missed. These are based on exrapolated richness values calculated by a few individuals, includes Bob O’Hara who invented the estimateR software. See also the references by Colwell & Coddington 1994 and Chiu et al. 2014

We use the function specpool to determine the number of unsampled species. There are multiple calculations that are provided based on different authors. We try it here to see the output

specpool(dune)

##     Species    chao  chao.se jack1 jack1.se    jack2    boot  boot.se  n
## All      30 32.1375 3.242503 32.85 1.645448 33.84474 31.5404 1.153121 20

Here we can see that We are missing btween 1-5 species based on the different approaches and standard error. This relatively small missing number of species is expected given we saw asymptotes in our datasets. We can try this again using different treatments. Here n represents the different number of sites.

data(dune.env)

specpool(dune, dune.env$Management)

##    Species     chao   chao.se    jack1 jack1.se    jack2     boot  boot.se
## BF      16 17.19048 1.5895675 19.33333 2.211083 19.83333 17.74074 1.646379
## HF      21 21.51429 0.9511693 23.40000 1.876166 22.05000 22.56864 1.821518
## NM      21 22.87500 2.1582871 26.00000 3.291403 25.73333 23.77696 2.300982
## SF      21 29.88889 8.6447967 27.66667 3.496029 31.40000 23.99496 1.850288
##    n
## BF 3
## HF 5
## NM 6
## SF 6

3.4 Effect-size estimates

What if we were less interested in the raw values of our diversity index and instead were more interested in the relative difference between our treatments. Imagine our earlier example of shrub effects at different elevations. What if we want to know only if the shrub has relatively higher diversity than the open microsites, rather than what that value was.

These are generally called effect-size estimates and are popular in community ecology. One of the more commonly refered effect-size estimates are Hedge’s g or Cohen’s d. Both of these are often used in effect size estimates and are ratios of means and standard deviations between two populations. In this case, the two populations would be our two treatments. However, community ecology tends to use a few others. Two that are used commonly are log-response Ratio and relative interaction index. We will introduce the latter.

The relative interaction index (RII) was created by Cristina Armas in opposition to LRR. It’s benefits are in its simplicity. It is symetrical, between -1 and 1, where -1 is a control effect and +1 is a treatment effect. RII is also bound between -1 and 1 making it comparable to other studies or responses. The formula for it is below: \[\frac{treatment - control}{treatment + control}\]

Where this shines for community ecology, is that treatment can represent the presence of competing of facilitating species, and the control is the absence of that species. Let’s go back to our previous example. We want to know if the shrub is having a facilitative (+1) or competitive (-1) effect on the entire plant community. We calculate RII using the RII function

?se
?rii.fun

## convert identifier values as factors
div.data[,1:5] <- apply(div.data[,1:5], 2, as.factor)

## Use RII function to calculate RII
rii.data <- rii.fun(div.data, "Microsite", Shrub, Open)


## see what it looks like
head(rii.data)

##    year Region Site Elevation Rep    div.data
## 1  2016 Mojave  2.5      1100   1 -0.21042183
## 3  2016 Mojave  2.5      1100   2 -0.29001436
## 5  2016 Mojave  2.5      1100   3  1.00000000
## 7  2016 Mojave  2.5      1100   4  0.28313590
## 9  2016 Mojave  2.5      1100   5  0.03178844
## 11 2016 Mojave  2.5      1100   6 -0.18866369

## example of plot
mean.rii <- rii.data %>% group_by(Elevation) %>% summarize(RII.diversity=mean(div.data),error=se(div.data))

## plot RII with confidence band
ggplot(mean.rii, aes(x=Elevation, y=RII.diversity)) + geom_point() +  geom_errorbar(aes(ymin=RII.diversity-error, ymax=RII.diversity+error), width=.2)

## Add line of significant  
ggplot(mean.rii, aes(x=Elevation, y=RII.diversity)) + geom_point() +  geom_errorbar(aes(ymin=RII.diversity-error, ymax=RII.diversity+error), width=.2)+ geom_hline(yintercept=0, linetype="dashed", color = "red", lwd=1)

Once we plot the calculated RII effect sizes and their associated error we can see that there does not appear to be a significant facilitation effect at low elevations. However, at the highest elevations, the shrub is having a positive effect.

This can appear as an over simplification of your dataset, but effect size measures such as RII are useful for relative comparisons over larger gradients. For example, imagine you measured the relative different of every plant species along an elevation gradient. Trying to detect an observable trend in the relative differences of say 20 species using a bar graph is challenging. Additionally, effect size measures can be used to compare between predictors or responses variables to determine which is more strongly effected by the treatment.

For those looking for a more robust calculation of effect sizes, you can explore bootstrapped effect sizes using the package bootES