1 Before you start the lab

This lab will introduce you to metabarcoding, a DNA-based approach for studying the “hidden diversity” of microbes. Investigating this diversity often relies heavily on bioinformatics. Here, most of the analyses have already been carried out for you, and the focus is on interpreting their results. All graphs in this practical have been generated in R. The code is visible for those interested (don’t worry if it does not make sense; the goal here is to interpret results), but you can hide it by clicking the Hide button above each code chunk. There are also a few dropdown boxes in the document that contain extra information; an example is shown below:

Dropdown box. Click me to get more information
More information


To complete this lab, answer all questions (in bold) and discuss them with your teaching assistant at the end of the lab.

Good luck! :)

1.1 Learning outcomes

After completing this lab, you will be able to:

  • Explain how DNA can be used to study microbial diversity in a high-throughput way and contrast it with using morphology
  • Use given sequences to BLAST against a reference database and identify the corresponding organism
  • Explain the concept of metabarcoding, and assess its power and limitations
  • Classify sequences based on phylogenetic placement
  • Analyse the ecology of taxa of interest using sequence information
  • Analyse and assess how different microbial communities are structured

1.2 Motivation

Microbes run the world! Protists and fungi play crucial ecological roles as primary producers, consumers and decomposers. But how do we study them, given that they are so small and that most of them are difficult to culture? One method is metabarcoding!


Fig. 1: Metabarcoding overview (source: http://www.naturemetrics.co.uk)


Figure 1 shows an overview of the metabarcoding approach. The most commonly used DNA barcode for microbial eukaryotes is the 18S rDNA gene (though it should be noted that the internal transcribed spacer, ITS, is more commonly used for fungi). In this lab, we will focus on the 18S gene using subsets of published sequencing data from global marine sampling expeditions such as Tara Oceans and the Malaspina Expedition, from freshwater in Lake Baikal, and from soils in Neotropical forests.