Key Terms in Microbiome Projects

When evaluating the microbiome of a sample based on sequencing data, specific key terms are often used, such as alpha and beta diversity, dispersion, and taxonomy. At Biomcare, we use these terms in our result reports, accompanied by an introduction to the terminology.
This page presents some key concepts in microbiome projects.


Alpha and Beta Diversity

Several higher-level measures are often used to describe the microbiome in a sample. These do not provide information on changes in the abundance of specific taxa but allow us to assess broader changes or differences in the composition of microorganisms. Alpha and beta diversity are examples of such measures.

Several measures exist to estimate diversity within a single sample, jointly called alpha diversity. These measures reflect the richness (number of taxa) or the evenness (distribution of abundances across taxa) of a microbial sample, or aim to reflect a combination of both properties.
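As an illustration of how richness and evenness differ, the following minimal sketch computes three common alpha diversity measures from a list of read counts per taxon. The choice of indices (observed richness, the Shannon index, and Pielou's evenness) and the two example samples are assumptions made for illustration and are not taken from a specific Biomcare report.

```python
import math

def observed_richness(counts):
    """Number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)

def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def pielou_evenness(counts):
    """Pielou's J' = H' / ln(S); set to 0 when fewer than two taxa are present."""
    s = observed_richness(counts)
    if s < 2:
        return 0.0
    return shannon_index(counts) / math.log(s)

# Hypothetical samples: same richness, very different evenness.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]

for name, sample in [("even", even_sample), ("skewed", skewed_sample)]:
    print(name,
          observed_richness(sample),
          round(shannon_index(sample), 3),
          round(pielou_evenness(sample), 3))
```

Both samples contain four taxa, so richness alone cannot tell them apart, whereas the Shannon index and Pielou's evenness are clearly lower for the skewed sample.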

Rarefaction curves are often used when calculating alpha diversity indices because increasing numbers of sequenced reads allow increasingly accurate estimates of total population diversity. A curve that levels off suggests that most taxa in the sample have been detected. Rarefaction curves can therefore be used to compare the observed sample richness with the estimated full sample richness.
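A rarefaction curve can be sketched by repeatedly subsampling a sample's reads at increasing depths and recording the mean number of taxa observed. The function name, the number of iterations, and the example counts below are hypothetical; this is a simple illustration, not the procedure of any particular pipeline.

```python
import random

def rarefaction_curve(counts, depths, n_iter=20, seed=0):
    """Mean observed richness at each subsampling depth.

    counts: reads per taxon for one sample.
    depths: subsampling depths, each at most the total read count.
    """
    rng = random.Random(seed)
    # Expand the count vector into one taxon label per read.
    reads = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    curve = []
    for depth in depths:
        if depth > len(reads):
            raise ValueError("depth exceeds the number of reads in the sample")
        # rng.sample draws without replacement.
        richness = [len(set(rng.sample(reads, depth))) for _ in range(n_iter)]
        curve.append((depth, sum(richness) / n_iter))
    return curve

# Hypothetical sample with 100 reads across 5 taxa.
example_counts = [50, 30, 15, 4, 1]
curve = rarefaction_curve(example_counts, depths=[1, 10, 50, 100])
for depth, mean_richness in curve:
    print(depth, mean_richness)
```

Plotting mean richness against depth gives the curve; if it is still rising at the full sequencing depth, the observed richness probably underestimates the true richness.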

While alpha diversity is a measure of microbiome diversity applicable to a single sample, beta diversity is a measure of the similarity or dissimilarity of two communities. As for alpha diversity, many indices exist, each reflecting different aspects of community heterogeneity. Key differences relate to how the indices weight variation in rare species, whether they consider presence/absence only or incorporate abundance, and how they interpret shared absence. Bray-Curtis dissimilarity is a popular measure that considers both the size (overall abundance per sample) and the shape (abundance of each taxon) of the communities (Bray and Curtis, 1957). Beta diversity is an essential measure for many popular statistical methods in ecology, such as ordination-based methods, and is widely used for studying the association between environmental variables and microbial composition.
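To make the difference between abundance-based and presence/absence-based indices concrete, the sketch below computes the Bray-Curtis dissimilarity and, for contrast, the Jaccard distance for two hypothetical samples. Jaccard is chosen here only as an example of a presence/absence index, and the sample counts are made up.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum(|x_i - y_i|) / sum(x_i + y_i)."""
    if len(x) != len(y):
        raise ValueError("samples must cover the same taxa")
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(x, y)) / total

def jaccard_distance(x, y):
    """Jaccard distance on presence/absence: 1 - |shared taxa| / |all taxa present|."""
    present_x = {i for i, a in enumerate(x) if a > 0}
    present_y = {i for i, b in enumerate(y) if b > 0}
    union = present_x | present_y
    if not union:
        raise ValueError("both samples are empty")
    return 1 - len(present_x & present_y) / len(union)

# Hypothetical samples: the same two taxa are present, but their abundances are swapped.
# The third taxon is absent from both and contributes to neither index.
sample_1 = [90, 10, 0]
sample_2 = [10, 90, 0]

bc = bray_curtis(sample_1, sample_2)        # (80 + 80 + 0) / 200
jd = jaccard_distance(sample_1, sample_2)   # identical taxon sets
print(bc, jd)
```

The two samples look identical to the presence/absence index but strongly dissimilar to Bray-Curtis, which is why the choice of index should follow the biological question.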

In summary, alpha diversity measures can be seen as summary statistics of a single population (within-sample diversity), while beta diversity measures are estimates of similarity or dissimilarity between populations (between-sample diversity).


Dysbiosis

Sizeable changes in the composition of microorganisms in a microbiome that reduce the microbiome's ability to function optimally are referred to as dysbiosis. Common forms of dysbiosis in human gut samples are increased levels of Proteobacteria or reduced diversity. However, studies are often only able to describe a changed microbiome and cannot label it dysbiotic, because that label implies a malfunction that is detrimental to the host.


Normalization

Sequencing data are normalized across samples to account for differences in sequencing depth.

Rarefaction to even read count

This is often performed by subsampling the quality-controlled (QC'ed) set of reads, without replacement, to a smaller, predetermined, and fixed total. "Without replacement" means that each read that is selected and assigned to the normalized sample is not returned to the original pool and thus cannot be selected again. An advantage of this approach is that the data are retained as count data, which allows further analysis with statistical tools that require counts.
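The subsampling step can be sketched as follows, assuming each sample is represented as a list of read counts per taxon. The function name `rarefy`, the fixed seed, and the example counts are hypothetical and used for illustration only.

```python
import random
from collections import Counter

def rarefy(counts, depth, seed=0):
    """Subsample a count vector to a fixed depth without replacement."""
    total = sum(counts)
    if depth > total:
        raise ValueError("sample has fewer reads than the requested depth")
    rng = random.Random(seed)
    # One taxon label per read, so each read can be picked at most once.
    reads = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    picked = Counter(rng.sample(reads, depth))
    return [picked.get(taxon, 0) for taxon in range(len(counts))]

# Hypothetical sample with 1,000 reads, rarefied to 100 reads.
original = [600, 300, 90, 10]
rarefied = rarefy(original, depth=100)
print(rarefied, sum(rarefied))
```

The output is still a vector of integer counts that sums to the chosen depth. Samples with fewer reads than the depth cannot be rarefied and would have to be excluded or handled separately.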

Normalization by sample sum

As an alternative to normalization by rarefaction, where an equal-sized subset of reads is selected from each sample, read counts can be converted to relative frequencies by dividing each count by the sample sum. Here, we use the full sample data and normalize it to relative abundances. The resulting values are fractions and are therefore no longer count data.
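A minimal sketch of this conversion, assuming a sample is given as a list of read counts per taxon (the function name and example values are hypothetical):

```python
def relative_abundance(counts):
    """Convert read counts to fractions of the sample sum."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]

# Two hypothetical samples with the same composition but a tenfold difference in depth.
shallow = [5, 3, 2]
deep = [50, 30, 20]

rel_shallow = relative_abundance(shallow)
rel_deep = relative_abundance(deep)
print(rel_shallow, rel_deep)
```

Both samples map to the same fractions, which removes the depth difference, but the integer counts are lost in the process.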

The core microbiome

The precise definition of the core microbiome varies between studies, but all definitions aim to identify the more reliably detected taxa for further analysis. Measures such as mean abundance across samples and the fraction of samples with zero abundance are often used to filter the taxa. Low-abundance taxa are often removed from further single-taxon analysis or are analyzed using different statistical approaches that better handle their distribution properties. Thresholds and statistical models must be selected based on the individual study design, depending on the type of microbiome and the goal of the analysis.
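The filtering described above can be sketched as a prevalence and mean-abundance filter on a table of relative abundances. The thresholds, taxon names, and values below are hypothetical; as noted, suitable cut-offs depend on the study design.

```python
def filter_core_taxa(table, min_prevalence=0.5, min_mean_abundance=0.001):
    """Keep taxa that are detected often enough and are abundant enough.

    table: dict mapping taxon name to relative abundances, one value per sample.
    min_prevalence: minimum fraction of samples with nonzero abundance (assumed cut-off).
    min_mean_abundance: minimum mean relative abundance across samples (assumed cut-off).
    """
    core = {}
    for taxon, abundances in table.items():
        prevalence = sum(1 for a in abundances if a > 0) / len(abundances)
        mean_abundance = sum(abundances) / len(abundances)
        if prevalence >= min_prevalence and mean_abundance >= min_mean_abundance:
            core[taxon] = abundances
    return core

# Hypothetical relative abundances across four samples.
table = {
    "taxon_A": [0.40, 0.50, 0.30, 0.45],          # prevalent and abundant
    "taxon_B": [0.0, 0.0, 0.02, 0.0],             # detected in only one sample
    "taxon_C": [0.0005, 0.0004, 0.0006, 0.0005],  # prevalent but very rare
}

core = filter_core_taxa(table)
print(sorted(core))
```

Taxa that fail the filter are not necessarily uninteresting; they may instead be analyzed with methods suited to sparse data.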

While defining a set of core taxa on a study-by-study basis is practical for statistical and interpretational reasons, many studies have aimed to identify a population-scale core, often referred to as the core measurable microbiome (CMM), defined as the taxa found across all or a defined set of human communities. While this is an interesting biological question, it serves a different aim than the filtering discussed above, which is performed for robustness and statistical purposes.