Key terms in microbiome projects


When evaluating the microbiome in a sample based on sequencing data, some key terms are often used, such as alpha and beta diversity, dispersion and taxonomy. BIOMCARE uses these terms in its result reports, accompanied by an introduction to the terminology. This page presents some further key concepts in microbiome projects.

Alpha and Beta Diversity

Several higher-level measures are often used to describe the microbiome in a sample. These do not provide information on changes in the abundance of specific taxa, but allow us to assess broader changes or differences in the composition of microorganisms. Alpha and beta diversity are such measures.

Different measures exist to estimate diversity within ONE sample, jointly called alpha diversity. The different measures reflect the richness (number of taxa) or the evenness (distribution of abundances among taxa) of a microbial sample, or aim to reflect a combination of both properties.
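As a minimal sketch of how richness, evenness and a combined index differ, the following Python example computes observed richness, the Shannon index and Pielou's evenness for one sample. The function names and the example counts are illustrative assumptions and are not taken from any BIOMCARE report.

```python
import math


def richness(counts):
    """Observed richness: the number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)), combining richness and evenness."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's evenness J' = H' / ln(S); returned as 0.0 when fewer than 2 taxa."""
    s = richness(counts)
    if s < 2:
        return 0.0
    return shannon(counts) / math.log(s)


# Two hypothetical samples with the same richness but different evenness.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]

for name, sample in [("even", even_sample), ("skewed", skewed_sample)]:
    print(name, richness(sample), round(shannon(sample), 3),
          round(pielou_evenness(sample), 3))
```

Both samples contain four taxa, so richness alone cannot separate them, while the Shannon index and the evenness are lower for the sample dominated by a single taxon.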

Rarefaction curves are often used when calculating alpha diversity indices, because an increasing number of sequenced reads allows increasingly accurate estimates of total population diversity. A rarefaction curve plots the number of observed taxa against the number of reads sampled, and can therefore be used to estimate the full sample richness, as compared to the observed sample richness.
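A rarefaction curve can be sketched by repeatedly subsampling a sample at increasing depths and counting the taxa observed each time. The example below is a simplified illustration that assumes the sample is given as a list of read counts per taxon; the function name, the number of iterations and the counts are hypothetical.

```python
import random


def rarefaction_curve(counts, depths, n_iter=20, seed=0):
    """Mean number of observed taxa at each subsampling depth."""
    rng = random.Random(seed)
    # Expand the count vector into one entry per read, labelled by taxon index.
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    curve = []
    for depth in depths:
        if depth > len(pool):
            raise ValueError("depth exceeds the number of reads in the sample")
        # random.sample draws without replacement.
        observed = [len(set(rng.sample(pool, depth))) for _ in range(n_iter)]
        curve.append(sum(observed) / n_iter)
    return curve


# Hypothetical sample: 7 taxa, 100 reads in total.
counts = [50, 30, 10, 5, 3, 1, 1]
depths = [10, 50, 100]
curve = rarefaction_curve(counts, depths)
for depth, taxa in zip(depths, curve):
    print(depth, taxa)
```

A curve that flattens before the full read count is reached suggests that additional sequencing would reveal few new taxa, while a curve that is still rising suggests that the observed richness underestimates the full sample richness.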

While alpha diversity is a measure of microbiome diversity applicable to a single sample, beta diversity is a measure of the similarity or dissimilarity of two communities. As for alpha diversity, many indices exist, each reflecting different aspects of community heterogeneity. Key differences relate to how the indices weight variation in rare species, whether they consider presence/absence only or incorporate abundance, and how they interpret shared absence. Bray-Curtis dissimilarity is a popular measure which considers both the size (overall abundance per sample) and the shape (abundance of each taxon) of the communities (Bray, 1957). Beta diversity is an essential measure for many popular statistical methods in ecology, such as ordination-based methods, and is widely used for studying the association between environmental variables and microbial composition.
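To illustrate the difference between an abundance-based and a presence/absence-based index, the sketch below computes the Bray-Curtis dissimilarity and the Jaccard distance between two samples. It assumes both samples are count vectors over the same ordered list of taxa; the function names and counts are illustrative.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum(|x_i - y_i|) / sum(x_i + y_i)."""
    denominator = sum(a + b for a, b in zip(x, y))
    if denominator == 0:
        return 0.0  # Convention chosen here for two empty samples.
    return sum(abs(a - b) for a, b in zip(x, y)) / denominator


def jaccard_distance(x, y):
    """Jaccard distance on presence/absence: 1 - |shared taxa| / |taxa in either|."""
    present_x = {i for i, a in enumerate(x) if a > 0}
    present_y = {i for i, b in enumerate(y) if b > 0}
    union = present_x | present_y
    if not union:
        return 0.0  # Convention chosen here for two empty samples.
    return 1 - len(present_x & present_y) / len(union)


# Hypothetical counts for four taxa; the last taxon is absent from both samples.
sample_a = [10, 20, 0, 0]
sample_b = [5, 20, 5, 0]

print(bray_curtis(sample_a, sample_b))
print(jaccard_distance(sample_a, sample_b))
```

Neither index is affected by the taxon that is absent from both samples, and only Bray-Curtis responds to the difference in abundance of the first taxon.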

In summary, alpha diversity measures can be seen as summary statistics of a single population (within-sample diversity), while beta diversity measures are estimates of similarity or dissimilarity between populations (between samples).


Sizeable changes in the composition of microorganisms in a microbiome that reduce the microbiome's ability to function optimally are referred to as dysbiosis. Common forms of dysbiosis in human gut samples are increased levels of Proteobacteria or a reduction in diversity. However, studies are often only able to describe a changed microbiome and not to label it dysbiotic, because that label implies a malfunction that is detrimental to the host.


Normalization across samples of sequencing data is performed to account for differences in sequencing depths.

Rarefaction to even read count

This is often performed by subsampling the QC’ed set of reads, without replacement, to a smaller, predetermined and fixed total. “Without replacement” means that each read that is selected and assigned to the normalized sample is not returned to the original pool, and thus cannot be selected again. An advantage of this approach is that the data remain count data, which allows further analyses with statistical tools that require counts.
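The subsampling step can be sketched as follows, assuming each sample is a list of read counts per taxon. The function name, the target depth and the counts are hypothetical. In this sketch a sample with fewer reads than the target depth raises an error; in practice such samples are typically excluded or the depth is lowered.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample a count vector to `depth` reads without replacement."""
    if depth > sum(counts):
        raise ValueError("sample has fewer reads than the requested depth")
    rng = random.Random(seed)
    # One entry per read, labelled by taxon index.
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    # random.sample never picks the same read twice.
    picked = Counter(rng.sample(pool, depth))
    return [picked.get(taxon, 0) for taxon in range(len(counts))]


# Hypothetical sample with 1000 reads, rarefied to 100 reads.
original = [500, 300, 150, 50]
rarefied = rarefy(original, depth=100)
print(rarefied, sum(rarefied))
```

The result is still a vector of integer counts, and no taxon can receive more reads than it had in the original sample.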

Normalization by sample sum

As an alternative to normalization by rarefaction, where an even-sized subset of reads is selected from each sample, read counts can be converted to relative frequencies by dividing by the sample sum. Here, we use the full sample data and normalize to relative abundances. The resulting values are fractions and therefore no longer counts.
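A minimal sketch of this normalization, again assuming a list of read counts per taxon; the function name and counts are illustrative.

```python
def relative_abundance(counts):
    """Divide each taxon count by the sample sum so the values sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalize a sample with zero reads")
    return [c / total for c in counts]


# Two hypothetical samples with the same composition but a tenfold
# difference in sequencing depth.
shallow = [50, 30, 20]
deep = [500, 300, 200]

print(relative_abundance(shallow))
print(relative_abundance(deep))
```

Both samples yield the same fractions, which shows how dividing by the sample sum removes the effect of sequencing depth while using all reads.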

The core microbiome

The precise definition of the core microbiota varies between studies, but all aim to identify the more reliably detected taxa for further analyses. Measures of mean abundance across samples and the fraction of samples with zero abundance are often used to filter the taxa for further analysis. Often, low-abundance taxa are removed from further single-taxon analyses, or are analyzed using different statistical approaches that better handle their distribution properties. Thresholds and statistical models must be selected based on the individual study design, depending on the type of microbiome and the goal of the analyses.
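Such a filter can be sketched as below, combining prevalence (the fraction of samples in which a taxon is detected) with mean relative abundance. The thresholds, taxon names and abundance values are hypothetical placeholders and are not recommendations; as noted above, they must be chosen for each study.

```python
def filter_core_taxa(table, min_prevalence=0.5, min_mean_abundance=0.01):
    """Return the taxa that pass both a prevalence and a mean-abundance threshold.

    `table` maps each taxon name to its relative abundances across samples.
    """
    kept = []
    for taxon, values in table.items():
        prevalence = sum(1 for v in values if v > 0) / len(values)
        mean_abundance = sum(values) / len(values)
        if prevalence >= min_prevalence and mean_abundance >= min_mean_abundance:
            kept.append(taxon)
    return kept


# Hypothetical relative abundances for three taxa in four samples.
table = {
    "taxon_A": [0.40, 0.35, 0.50, 0.45],      # abundant and always detected
    "taxon_B": [0.0, 0.0, 0.30, 0.0],         # abundant but detected in one sample
    "taxon_C": [0.002, 0.001, 0.003, 0.002],  # always detected but rare
}

print(filter_core_taxa(table))
```

With these placeholder thresholds only taxon_A is retained: taxon_B fails the prevalence threshold and taxon_C fails the mean-abundance threshold.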

While defining a set of core taxa on a study-by-study basis is practical for statistical and interpretational reasons, many studies have aimed to identify a population-scale core, often referred to as the core measurable microbiome (CMM), defined as the taxa found across all or a defined set of human communities. While this is an interesting biological question, it is calculated with a different aim than the filtering discussed above, which is performed for robustness and statistical purposes.

These microbiome targeted therapies are still in early testing, but I believe we’ll find a way to make them work. That is as big a breakthrough as anything else we will do in health over the next two decades.

Bill Gates

Quartz article by Katie Palmer, October 2019