Ordinations

3.0 What are ordinations

An ordination is a data reduction technique that uses significantly more complicated math than the above examples. The basic premise is that an ordination fits a linearly uncorrelated components through mulitvariate data to maximize the most variation explained. Similar to fitting a line-of-best-fit through a cloud of data points. There are different approaches both among ordinations and within ordinations, but conceptually this is what is occurring. Ordinations can generally be separated into three categories: unconstrained (indirect gradient analysis), constrained (direct gradient analysis), and NMDS. This website does an excellent job breaking down all the different examples of each with a short description of what is occuring.

Technically, NMDS is a type of unconstrained ordination. However, I have intentionally left it in a third category because it does something very different. NMDS calculates rank order and is a non-parametric version of other ordinations. Similar to how Kruskal-Wallis is the rank order equivalent of ANOVA. NMDS is hugely flexible and versatile, but is often criticised for not matching a distribution (again, like how Kruskal-Wallis does not follow a dsitribution) and the output may be suggested as correct, when it is not the optimal representation of data. The ecological community continues to debate this and for this exercise I will focus on principal component analysis (PCA) and redundancy analysis (RDA). However, for those interested in a great tutorial on NMDS check here.

3.1 PCA and CCA

PCA is a indirect gradient analysis. Indirect, in that we are trying to determine a principal component (line of best fit) through the data that explains the most variation but not attribute that component to anything. We use the dune dataset within vegan as an example.

library(vegan)
library(LearnCommAnalysis)

## Load data
data(dune)

## conduct pca
pca <- rda(dune)

## see output variables of PCA
summary(pca)

## 
## Call:
## rda(X = dune) 
## 
## Partitioning of variance:
##               Inertia Proportion
## Total           84.12          1
## Unconstrained   84.12          1
## 
## Eigenvalues, and their contribution to the variance 
## 
## Importance of components:
##                           PC1     PC2     PC3     PC4    PC5     PC6
## Eigenvalue            24.7953 18.1466 7.62913 7.15277 5.6950 4.33331
## Proportion Explained   0.2947  0.2157 0.09069 0.08503 0.0677 0.05151
## Cumulative Proportion  0.2947  0.5105 0.60115 0.68618 0.7539 0.80539
##                           PC7     PC8    PC9    PC10    PC11    PC12
## Eigenvalue            3.19936 2.78186 2.4820 1.85377 1.74712 1.31358
## Proportion Explained  0.03803 0.03307 0.0295 0.02204 0.02077 0.01561
## Cumulative Proportion 0.84342 0.87649 0.9060 0.92803 0.94880 0.96441
##                          PC13     PC14     PC15     PC16     PC17     PC18
## Eigenvalue            0.99051 0.637794 0.550827 0.350584 0.199556 0.148798
## Proportion Explained  0.01177 0.007582 0.006548 0.004167 0.002372 0.001769
## Cumulative Proportion 0.97619 0.983768 0.990316 0.994483 0.996855 0.998624
##                           PC19
## Eigenvalue            0.115753
## Proportion Explained  0.001376
## Cumulative Proportion 1.000000
## 
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores:  6.322924 
## 
## 
## Species scores
## 
##                PC1      PC2       PC3       PC4      PC5       PC6
## Achimill -0.603786  0.12392  0.008464  0.159574  0.40871  0.127857
## Agrostol  1.373953 -0.96401  0.166905  0.266466 -0.08765  0.047368
## Airaprae  0.023415  0.25078 -0.194768 -0.326043  0.05574 -0.079619
## Alopgeni  0.531234 -1.42784 -0.505241 -0.042885 -0.44293  0.278566
## Anthodor -0.559138  0.56761 -0.476205  0.015781  0.34408 -0.135783
## Bellpere -0.333560 -0.18881  0.140638 -0.084177  0.12541  0.134771
## Bromhord -0.523468 -0.19656  0.164222  0.005671  0.38612  0.257634
## Chenalbu  0.017494 -0.05462 -0.055349 -0.010582  0.02664  0.016405
## Cirsarve  0.002398 -0.10237  0.063716 -0.048735 -0.03212 -0.036055
## Comapalu  0.168933  0.10522  0.063625  0.052352  0.13056  0.108129
## Eleopalu  1.278257  0.21782  0.469213  0.667986  0.20877  0.189927
## Elymrepe -0.450692 -0.80310  0.340783 -0.243514  0.25145 -0.691715
## Empenigr  0.014054  0.10956 -0.099378 -0.161788 -0.02289 -0.001195
## Hyporadi -0.014612  0.42079 -0.223096 -0.535685 -0.10309 -0.025185
## Juncarti  0.679423 -0.07604  0.243642  0.310903 -0.08877 -0.248736
## Juncbufo  0.065583 -0.45959 -0.548944 -0.018900 -0.10571 -0.087168
## Lolipere -1.455985 -0.39306  1.013109  0.170493 -0.52430  0.114397
## Planlanc -0.913938  0.55455 -0.244341  0.617631 -0.12494 -0.098047
## Poaprat  -0.899147 -0.55712  0.542805 -0.042467 -0.27815 -0.026353
## Poatriv  -0.756003 -1.56056 -0.480385  0.351099  0.36641  0.044066
## Ranuflam  0.625121  0.06099  0.124760  0.233953  0.13645  0.087328
## Rumeacet -0.582581  0.06663 -0.574256  0.775879 -0.08772 -0.361433
## Sagiproc  0.156823 -0.42388 -0.331722 -0.454322 -0.43262  0.037181
## Salirepe  0.293607  0.45555 -0.023780 -0.196209 -0.20176 -0.097569
## Scorautu -0.453771  0.39268 -0.212281 -0.382424 -0.27635  0.395164
## Trifprat -0.417853  0.16572 -0.234524  0.570030 -0.09646 -0.128045
## Trifrepe -0.581801 -0.02115 -0.167299  0.196535  0.18714  0.928758
## Vicilath -0.106710  0.11571  0.092827 -0.055592 -0.15433  0.129733
## Bracruta  0.148626  0.47690 -0.168758  0.509177 -0.96307  0.029481
## Callcusp  0.538513  0.17963  0.175086  0.238876  0.25531  0.169209
## 
## 
## Site scores (weighted sums of species scores)
## 
##         PC1     PC2     PC3     PC4      PC5      PC6
## 1  -0.85678 -0.1724  2.6079 -1.1296  0.45074 -2.49113
## 2  -1.64477 -1.2299  0.8867 -0.9859  2.03463  1.81057
## 3  -0.44010 -2.3827  0.9297 -0.4601 -1.02783 -0.05183
## 4   0.04795 -2.0463  1.2737 -0.9742 -0.64210 -0.72074
## 5  -1.62445  0.2900 -1.5927  1.5398  1.86008 -2.21191
## 6  -1.97427  1.0802 -1.1501  3.3534 -1.52026  0.03127
## 7  -1.79263  0.3220 -0.2200  1.4714  0.01245 -0.42583
## 8   0.88980 -1.0905  0.9250  0.5165 -1.08897  0.94777
## 9   0.00904 -1.6570 -0.4661 -0.2826 -0.10821 -2.16570
## 10 -1.91463  0.4940  0.7058  0.2676  1.36985  2.62386
## 11 -1.04110  1.2081  1.4203 -0.9566 -2.71745  1.11200
## 12  1.01822 -1.4598 -3.2509 -0.3247 -1.75331  1.01550
## 13  0.69939 -2.1837 -2.2128 -0.4231  1.06502  0.65585
## 14  1.49047  0.9772  0.5447  0.2733  2.38875  2.47896
## 15  1.88644  1.1261  0.7271  0.7732  0.22113 -0.31750
## 16  2.84848 -0.2081  0.7041  2.1012  0.29311 -0.08124
## 17  0.04666  1.7279 -0.9135 -1.6663  1.80070 -1.55572
## 18 -0.26936  1.7157  0.1648 -0.5770 -2.10498  0.33880
## 19  0.28094  2.1901 -1.9865 -3.2341 -0.45760 -0.02389
## 20  2.34069  1.2991  0.9029  0.7178 -0.07573 -0.96909

## plot PCA
biplot(pca)

We used the function rda from vegan because it calculates a PCA when no constraining factor is specified. We use this instead of the base R version of PCA princomp because it is less versatile and because the output from the rda function has more options that can be used in other analyses.

Here PCA1 and PCA2 represents our indirect gradients that determine these species compositions. We are unsure what the mechanism behind these components are and that can be better understood in the constrained ordinations later. First, we check our ordination to make sure PCA1 and PCA2 are the best representations of our data. From the summary we can see that 29.5% and 21.6% of the variation is explained in those two axes, whereas the third only explains 9%. This suggests the first two axes are the best representation of our data. Finally, we look at the plot and use the clustering of species together to determine trends. Species that are parallel are correlated, species that are perpendicular are uncorrelated, and species that are opposite are negatively correlated.

If we are working with species abundance data, it is commonly recommended to use Hellinger transformation. Hellinger has a few advantages including not giving rare species high weight and low sensitivity to double zeros. The double zero problem is when two sites mutually are without a certain species and thus are sorted as more similar because of this share dissimilarity. The work of Legendre does a great job of explaining when transformations of ordinations are required.

Let’s transform our data and present it again.

## Load data
dune.hell <- decostand(dune, method="hellinger")

## conduct pca
pca <- rda(dune.hell)

## see output variables of PCA
summary(pca)

## 
## Call:
## rda(X = dune.hell) 
## 
## Partitioning of variance:
##               Inertia Proportion
## Total          0.5559          1
## Unconstrained  0.5559          1
## 
## Eigenvalues, and their contribution to the variance 
## 
## Importance of components:
##                          PC1    PC2     PC3     PC4     PC5     PC6
## Eigenvalue            0.1699 0.1111 0.05634 0.04397 0.04097 0.03019
## Proportion Explained  0.3057 0.1998 0.10135 0.07910 0.07371 0.05431
## Cumulative Proportion 0.3057 0.5054 0.60679 0.68590 0.75960 0.81391
##                           PC7     PC8     PC9    PC10     PC11    PC12
## Eigenvalue            0.02104 0.01794 0.01379 0.01090 0.009991 0.00912
## Proportion Explained  0.03785 0.03226 0.02481 0.01961 0.017972 0.01641
## Cumulative Proportion 0.85177 0.88403 0.90885 0.92846 0.946428 0.96283
##                           PC13     PC14     PC15     PC16     PC17
## Eigenvalue            0.006376 0.005166 0.002942 0.002745 0.001651
## Proportion Explained  0.011469 0.009293 0.005292 0.004939 0.002969
## Cumulative Proportion 0.974303 0.983597 0.988888 0.993827 0.996796
##                           PC18      PC19
## Eigenvalue            0.001138 0.0006432
## Proportion Explained  0.002047 0.0011570
## Cumulative Proportion 0.998843 1.0000000
## 
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores:  1.80276 
## 
## 
## Species scores
## 
##                 PC1       PC2       PC3       PC4        PC5        PC6
## Achimill -0.2241355  0.057048 -0.068674 -0.173281 -0.0498964 -0.0006022
## Agrostol  0.4142446 -0.180068  0.020078 -0.020756  0.0029054  0.0193453
## Airaprae -0.0350694  0.154644  0.116317 -0.039999 -0.1384374 -0.0376930
## Alopgeni  0.1386550 -0.318444  0.186419  0.001329  0.0030171 -0.0215368
## Anthodor -0.1962910  0.238579  0.095768 -0.179434 -0.0380470 -0.0471885
## Bellpere -0.1148332 -0.054380 -0.057640  0.064957  0.0010453  0.0896219
## Bromhord -0.1410948 -0.053134 -0.049586 -0.057365  0.0210053  0.0938974
## Chenalbu  0.0121539 -0.025919  0.036102 -0.015353 -0.0038179  0.0173353
## Cirsarve -0.0026599 -0.031975  0.004872  0.021813 -0.0182940  0.0064877
## Comapalu  0.1100812  0.048635 -0.061981 -0.028257  0.0102828  0.1064664
## Eleopalu  0.3796078  0.069099 -0.185862 -0.064110  0.0144632  0.0035959
## Elymrepe -0.1456860 -0.248314 -0.094249  0.010195 -0.1482516 -0.0563573
## Empenigr  0.0004895  0.057213  0.063829  0.026314 -0.0393467 -0.0066958
## Hyporadi -0.0559119  0.197829  0.139068  0.036814 -0.1427093 -0.0326104
## Juncarti  0.2478729 -0.012497 -0.092252  0.005525  0.0265396 -0.1569375
## Juncbufo  0.0190348 -0.121457  0.179346 -0.051491  0.0931930 -0.0211675
## Lolipere -0.3366259 -0.173707 -0.206664  0.154192  0.0092155 -0.0325721
## Planlanc -0.2547353  0.197554 -0.034482 -0.039980  0.1469090 -0.0256698
## Poaprat  -0.2808972 -0.142814 -0.082463  0.084918 -0.0502270 -0.0278725
## Poatriv  -0.1317430 -0.363370  0.059442 -0.137436  0.0581366 -0.0136469
## Ranuflam  0.2783344  0.024399 -0.081013 -0.052662  0.0009058  0.0274683
## Rumeacet -0.1079283 -0.024094  0.054027 -0.122507  0.2108960 -0.0801798
## Sagiproc  0.0409392 -0.082141  0.246528  0.127747 -0.0148906 -0.0086761
## Salirepe  0.0651846  0.156446  0.007210  0.136628 -0.0166452 -0.0581183
## Scorautu -0.0562239  0.158686  0.094988  0.095244  0.0265794  0.1029567
## Trifprat -0.1001443  0.028383 -0.024352 -0.095019  0.1389927 -0.0417313
## Trifrepe -0.0471194 -0.006999  0.052812  0.045812  0.1395303  0.2537096
## Vicilath -0.0541878  0.052774 -0.023104  0.114929  0.0381791  0.0343033
## Bracruta  0.0852752  0.107955 -0.003783  0.181534  0.2198174 -0.1398609
## Callcusp  0.2045115  0.057120 -0.104072 -0.065216 -0.0146101  0.0579472
## 
## 
## Site scores (weighted sums of species scores)
## 
##          PC1      PC2        PC3      PC4       PC5       PC6
## 1  -0.423423 -0.42754 -6.533e-01  0.01583 -0.761705 -0.463469
## 2  -0.356333 -0.36816 -1.991e-01 -0.08370 -0.355682  0.630146
## 3  -0.080900 -0.55215 -5.753e-02  0.27275 -0.217514  0.016724
## 4  -0.041005 -0.49292  7.510e-02  0.33626 -0.282018  0.100013
## 5  -0.458003  0.05220 -1.432e-01 -0.54110  0.351891 -0.117334
## 6  -0.389342  0.20337 -1.022e-01 -0.30549  0.758789 -0.245696
## 7  -0.451813  0.06864 -6.835e-02 -0.41821  0.585536 -0.138732
## 8   0.295113 -0.29375 -6.362e-02  0.19828  0.010903 -0.185034
## 9   0.035973 -0.49264  2.714e-01  0.07945  0.086166 -0.419257
## 10 -0.453094  0.13639 -2.391e-01 -0.12201  0.144355  0.462946
## 11 -0.273214  0.29633  3.931e-05  0.87663  0.126762  0.097096
## 12  0.246622 -0.33173  9.205e-01 -0.03527  0.493526 -0.017653
## 13  0.226908 -0.48390  6.740e-01 -0.28663 -0.071279  0.323642
## 14  0.570818  0.25288 -3.165e-01 -0.32869 -0.085439  1.196310
## 15  0.654415  0.28845 -3.733e-01  0.01035  0.196968  0.002255
## 16  0.720030 -0.10689 -2.727e-01 -0.35992  0.007984 -0.504518
## 17 -0.317465  0.75272  3.395e-01 -0.64284 -0.803236 -0.262492
## 18 -0.201131  0.39818 -2.007e-01  0.89875  0.365682  0.086314
## 19  0.006263  0.73205  8.167e-01  0.33669 -0.503443 -0.085674
## 20  0.689579  0.36848 -4.077e-01  0.09888 -0.048247 -0.475587

## plot PCA
biplot(pca)

We have improved the amount of variation explained and centralized the biplot. Overall, this is an improvement and we know the transformation has address underlying issues with the analysis.

The next thing we want to do is determine whether we have captured the shape of the gradient. A PCA assumes a linear gradient, whereas a correspondance analysis (CA) assumes a unimodal relationship. Over larger gradients it is assumed that the species data will resemble a unimodal relationship. We can check this using a detrended correspondance analysis. A detrended correspondance analysis (DCA) can be used as its own form of analysis but it is another topic of debate like NMDS. However, DCA tends to be argued against more often than in favour. For the purposes of determining gradient shape, it is perfect.

## conduct DCA
dca <- decorana(dune.hell)
dca

## 
## Call:
## decorana(veg = dune.hell) 
## 
## Detrended correspondence analysis with 26 segments.
## Rescaling of axes with 4 iterations.
## 
##                   DCA1   DCA2    DCA3    DCA4
## Eigenvalues     0.5249 0.3042 0.14047 0.08709
## Decorana values 0.5444 0.2830 0.08498 0.04289
## Axis lengths    3.5605 2.9874 1.50175 1.13223

The output of our dca suggests that our axis lengths for the first axes is 3.56. This correspond to the standard deviation of that components and suggest there could be unimodal distribution in the data. The approximate evaluation is as follows: PCA & RDA <3 and 4< CA & CCA

Our number is in between suggesting that both linear and unimodal approximations of the data would be acceptable.

Illustration of the rule how to select whether to use linear ordination methods (like PCA or RDA) or unimodal (CA, DCA or CCA) on the data. Upper diagram shows simulated community structured by a single environmental gradient, with number of species response curves. The diagram below shows the relationship between the length of the gradient sampled in the simulated community on the //x/-axis (in arbitrary units), and length of the first DCA ordination axies (in units of S.D.). The dataset which according to DCA is rather homogeneous (< 3 S.D.) has environmental gradient up to 2000 units long; the longer gradient results into heterogeneous dataset for which linear methods are not suitable. OP of figure & caption

3.2 Testing the gradient

There are two common uses of an ordination. The first is to observe trends in the similarities among sites and species. The second is to utilize the data reduction technique to correlate the gradient among species and sites to other gradients.

We are going to load the varechem and varespec datasets from vegan. We then extract the scores from the ordination and compare them to environmental variables.

data(varechem)
data(varespec)

## transform data
species <- decostand(varespec, method="hellinger")

## create ordination
pca <- rda(species)

## extract the scores from the ordination
score.vals <- scores(pca)$sites

## compare 
cor(score.vals, varechem)

##             N          P          K         Ca         Mg           S
## PC1 0.1880740  0.2204900  0.2784250  0.3521916  0.3175540  0.03184488
## PC2 0.3959471 -0.4827799 -0.3678565 -0.4714881 -0.3989006 -0.34226350
##             Al         Fe         Mn         Zn         Mo   Baresoil
## PC1 -0.6791178 -0.6232759  0.5944998  0.2836699 -0.1433314 0.50544429
## PC2  0.1877119  0.2188943 -0.3322152 -0.3267015  0.1709478 0.05213326
##      Humdepth          pH
## PC1  0.570643 -0.50455541
## PC2 -0.376158  0.04775287

3.3 Redundancy analysis

Finally let’s do a contrained ordination, RDA. A constrained ordination is a direct graident analysis because you suggesting what the composition of the components are that describe the data. Instead of the PC1 representing the x-axis, we will have RDA1 that is a combination of the predictor variables that we include. We can take a look at what this means.

## load soil chem variables
data(varechem)

## conduct RDA
rda <- rda(species, varechem)

## output
rda

## Call: rda(X = species, Y = varechem)
## 
##               Inertia Proportion Rank
## Total          0.3647     1.0000     
## Constrained    0.2551     0.6995   14
## Unconstrained  0.1096     0.3005    9
## Inertia is variance 
## 
## Eigenvalues for constrained axes:
##    RDA1    RDA2    RDA3    RDA4    RDA5    RDA6    RDA7    RDA8    RDA9 
## 0.11145 0.06171 0.02137 0.01843 0.01370 0.00787 0.00677 0.00460 0.00246 
##   RDA10   RDA11   RDA12   RDA13   RDA14 
## 0.00206 0.00154 0.00128 0.00124 0.00064 
## 
## Eigenvalues for unconstrained axes:
##     PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8     PC9 
## 0.04830 0.02129 0.01089 0.01025 0.00738 0.00559 0.00304 0.00194 0.00090

anova(rda)

## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
## 
## Model: rda(X = species, Y = varechem)
##          Df Variance      F Pr(>F)  
## Model    14  0.25511 1.4968   0.06 .
## Residual  9  0.10957                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## plot output
plot(rda)

There are two quick ways to determine if the RDA accurately describes the data. The key word in RDA is redundancy, implying that your gradient is essentially described twice. Once by the species data and once by the environmental variables. The more similar those describe the same trend, the greater redundancy and the better the ordination. Here, the eigenvalues for the constrained axes are about 40% of the unconstrained values that suggests there is some degree of redundancy, which is good. When we conduct a permutation test on the RDA we find no significance.

The p value being close to 0.05 and the relatively high level of redundancy suggest that these nutrient variables are not accurately explaining all the variation in our dataset. When we look at the plot, we can see that RDA1 is comprised mostly of the metals. RDA2 instead appears to be described by other nutrients such as nitrogen and phosphorus. As it stands now though, we cannot present this dataset because of the poor explanation of variation and high p.value.

3.4 Cluster Analysis

A cluster analysis groups together different sites or plots, typically by dissimiliarity. One of the more widely accepted methods is using the squared version of Ward’s method. Ward’s method uses a minimum variance criterion that minimizes the total within-cluster variance (see for more Murtagh and Legendre 2014).

We again calculate dissimiliarity indices using the Bray-Curtis method. On our initial plotting, it can be seen that are two general catgorizations of the sites. We can cut this dendrogram into separate groups either arbitrary based on the number of groups or by the height of the dendrogram.

## calculate distances
dis <- vegdist(species, method="bray")

## cluster analysis
clus <- hclust(dis, "ward.D2")

## plot cluster
plot(clus)

## cut the tree based on a specific height
grp <- cutree(clus, h=0.7)

## cut the tree into four clusters
grp <- cutree(clus, 4)

We can plot the dendrogram to be more aesthetically pleasing using the cluster and factoextra packages. Although this is not necessary. Here we can clearly highlight the clustered groups that we generated.

library(cluster)
library(factoextra)

fviz_dend(clus, k = 4, # Cut in four groups
          cex = 0.7, cex.axis=2,  # label size
          color_labels_by_k = TRUE, # color labels by groups
          rect = TRUE # Add rectangle around groups
          )

The next question is do these cluster map on to a treatment or measured variate. We can compare the clustered groups against different environmental variables. Let’s compare the measured nitrogen against the groups that were generated. We can also common parametric analyses, such as ANOVAs, to determine if there are statistical differences among the groups.

## Compare environmental variables to cluster groups
boxplot(varechem$N ~ grp)

## Statistically compare groups
m1 <- aov(varechem$N ~ grp)
summary(m1)

##             Df Sum Sq Mean Sq F value Pr(>F)  
## grp          1  118.2  118.24   4.449 0.0465 *
## Residuals   22  584.6   26.57                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Finally, if these clusters make sense ecologically, are statistically different, and answering a research question, it makes sense to plot these clusters on the ordination. These ordinations can be added using the ordiellipse function and can be used to determine the species that belong in the clusters.

## conduct ordination
ord <- rda(species)

## calculate priority
spp.priority <- colSums(species)

## plot ordination with clusters
plot(ord, type="n")
ordiellipse(ord, grp, lty = 2, col = "grey80", draw="polygon", alpha=150)
orditorp(ord, display = "species", cex = 0.7, col = "darkorange3", priority=spp.priority, air=0.8)
orditorp(ord, display = "sites", cex = 0.7, col = "darkslateblue", air=0.1)