5 Patterns of microbial community composition
5.1 Oceans
What factors explain the microbial community compositions in different environmental samples? We will now return to our six Tara Ocean stations (coral and non-coral), and quantify the differences in microbial communities between the stations, and cluster them based on their differences. The measure of dissimilarity we will use is Bray-Curtis, which is commonly used in ecology. You do not need to know how it works, however if you are interested in a short, easy to read summary, see here
Importantly, the measure is between 0 and 1. Identical communities would have a Bray-Curtis dissimilarity of 0, while communities which share no species would have a Bray-Curtis dissimilarity of 1.
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.5-6
# read data
matrix <- read.csv("data/chosen_stations_filtered_otus.tsv", header=T, sep = "\t", row.names = 1)
# view the data we are working with. The table contains read counts for each OTU in every station. Each row = OTU, each column = station.
head(matrix)
## station46 station49 station121 station45 station50 station126
## otu_1 160143 191068 63806 165918 48066 29592
## otu_2 505 651 322 1079 446 690
## otu_3 14895 10798 84622 37216 63754 51660
## otu_4 172454 20426 38846 188886 31332 467641
## otu_5 36236 12180 126136 70310 37317 213247
## otu_6 114638 64892 12028 187475 116723 7397
# transpose data so that rows and columns are switched
matrix_t <- t(matrix)
# normalise data
matrix_t_norm <- decostand(matrix_t, method = "total")
# calculate Bray-Curtis distance and cluster the stations
matrix.bc.dist <- vegdist(matrix_t_norm, method = "bray")
matrix.bc.clust <- hclust(matrix.bc.dist, method = "average")
plot(matrix.bc.clust, ylab = "Bray-Curtis dissimilarity")
Questions:
31. What explains the clustering that we see?
5.2 Comparing oceans, lakes, and soils
Finally, we are going to zoom out and compare the broad microbial communities from ocean, lake, and soil samples. To get an overview of the communities, we will phylogenetically place the metabarcoding sequences onto a reference phylogeny as we did for Symbiodinium previously. This time however, we will place OTUs on a global eukaryotic phylogeny with 155 annotated, reference 18S sequences (these sequences are derived from the PR2 18S database) (Figure 8). Note that because this tree is based on 18S sequences only, it is not entirely consistent with what we know from phylogenomics. However, this does not impact phylogenetic placement.
The three datasets we will be comparing today are the Malaspina dataset (a global marine expedition like the Tara Oceans), a freshwater dataset from Lake Baikal, and a dataset from Neotropical forest soils. We’re opting for the Malapspina dataset instead of the Tara because like the soil and freshwater data, it covers the longer, V4 region instead of the V9 region.
Let’s have a cursory look at how many OTUs are in each dataset (NB: normally, we would normalize data first before comparisons, but for the purpose of this lab, it is okay to proceed with un-normalised data.)
Marine - Malaspina
## 8217
Neotropical forest soils
## 28474
Freshwater - lake Baikal
## 1125
The phylogenetic placement has already been carried out. Download the placement files from GitHub. To do so, first download the GitHub repository as a zip file: i.e. Code (in top-right corner in green) –> Download ZIP. After saving, unzip the folder and navigate to Metabarcoding_lab-master –> placement_files. You will see three files with the extension .jplace.
Now for the fun part! We will use the interactive Tree of Life website to visualize the difference between these three environments. Simply load each .jplace file (using the button “Choose file”, selecting the file, and click “Upload”). I recommend using three tabs so you can view all files at the same time.
You will now see a phylogenetic tree and a control panel. Select the following options in the control panel:
- Mode options –> Branch lengths –> Ignore (this will make it easier to view the placements)
- Label options –> Font style –> 26 px (or higher so you can view the tip labels)
- Datasets –> Phylogenetic placements –> On
You will now see bubbles on the tree branches. Each bubble represents OTU placements on a particular branch. The bigger the bubble, the more OTUs belong to that particular lineage!
Questions:
32. What are the most dominant lineages in soils and freshwater samples?
33. In which environment are fungi the most dominant?
34. Which lineages are dominant in the oceans, but absent in freshwater and soils?
35. Are there lineages present in freshwater/soils but absent in oceans? Which ones?
36. List any other observations you find interesting!
5.3 Fin
Discuss your answers with your TA :)