Key Terms Used in Microbiome Analysis Projects

Microbiome reports use specific terms that can be unfamiliar at first. This page explains the most important ones so you can read and use your results with confidence.

Alpha and Beta Diversity

Several higher-level measures are often used to describe the microbiome in a sample. These do not provide information on changes in the abundance of specific taxa but allow us to assess broader changes or differences in the composition of microorganisms. Alpha and beta diversity are examples of such measures.

Several measures exist to estimate diversity within a single sample; jointly, they are called alpha diversity. They reflect the richness (number of taxa) or the evenness (how uniformly abundance is distributed across taxa) of a microbial sample, or aim to combine both properties.
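As a minimal sketch of how richness and evenness can be computed, the example below derives three common alpha diversity measures from one sample: observed richness, the Shannon index, and Pielou's evenness. The choice of these particular indices, the function name alpha_diversity, and the read counts are assumptions made for illustration; they are not necessarily the measures used in a given report.

```python
import math

def alpha_diversity(counts):
    """Return (richness, Shannon index, Pielou's evenness) for one sample."""
    present = [c for c in counts if c > 0]
    total = sum(present)
    # Richness: number of taxa observed at least once
    richness = len(present)
    # Shannon index: H = -sum(p_i * ln(p_i)) over the observed taxa
    shannon = -sum((c / total) * math.log(c / total) for c in present)
    # Pielou's evenness: H / ln(richness); set to 0 when fewer than two taxa
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return richness, shannon, evenness

# Hypothetical read counts: same richness, very different evenness
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]

print(alpha_diversity(even_sample))
print(alpha_diversity(skewed_sample))
```

Both hypothetical samples contain four taxa, but the second is dominated by a single taxon, so its Shannon index and evenness are much lower.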

Rarefaction curves are often used when calculating alpha diversity indices because observed richness grows with sequencing depth: the more reads are sequenced, the more taxa are detected, and the more accurate the estimate of total population diversity becomes. Rarefaction curves can therefore be used to estimate the full sample richness, as compared to the observed sample richness.
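One way to draw a rarefaction curve is to compute the expected number of taxa seen at each subsampling depth. The sketch below uses the classical analytical rarefaction formula (based on the hypergeometric distribution) rather than repeated random subsampling. The function name expected_richness and the read counts are hypothetical. Note that this only traces the curve up to the sequenced depth; estimating the full richness beyond it requires an extrapolating estimator, which is not shown here.

```python
from math import comb

def expected_richness(counts, depth):
    """Expected number of taxa observed in a random subsample of `depth` reads."""
    total = sum(counts)
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    all_draws = comb(total, depth)
    # A taxon with c reads is missed entirely with probability
    # C(total - c, depth) / C(total, depth)
    return sum(1 - comb(total - c, depth) / all_draws for c in counts if c > 0)

# Hypothetical reads per taxon in one sample (100 reads in total)
sample_counts = [50, 30, 15, 4, 1]

for depth in (1, 10, 25, 50, 100):
    print(depth, round(expected_richness(sample_counts, depth), 2))
```

A curve that flattens before the full depth is reached suggests that most taxa in the sample have already been detected.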

While alpha diversity describes diversity within a single sample, beta diversity measures the similarity or dissimilarity of two communities. As for alpha diversity, many indices exist, each reflecting different aspects of community heterogeneity. Key differences relate to how the indices weight variation in rare species, whether they consider presence/absence only or also incorporate abundance, and how they interpret shared absence. Bray-Curtis dissimilarity is a popular measure that considers both size (overall abundance per sample) and shape (abundance of each taxon) of the communities (Bray and Curtis, 1957). Beta diversity underlies many popular statistical methods in ecology, such as ordination-based methods, and is widely used for studying the association between environmental variables and microbial composition.
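To illustrate the difference between abundance-based and presence/absence-based indices, the sketch below computes Bray-Curtis dissimilarity alongside the Jaccard distance for two samples. The Jaccard distance is included here only as a contrasting example; the function names and the counts are hypothetical.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity; both lists must hold the same taxa in the same order."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must cover the same taxa")
    total = sum(sample_a) + sum(sample_b)
    if total == 0:
        raise ValueError("both samples are empty")
    # Abundance shared by the two samples, taxon by taxon
    shared = sum(min(a, b) for a, b in zip(sample_a, sample_b))
    return 1 - 2 * shared / total

def jaccard_distance(sample_a, sample_b):
    """Jaccard distance on presence/absence; abundances are ignored."""
    present_a = {i for i, a in enumerate(sample_a) if a > 0}
    present_b = {i for i, b in enumerate(sample_b) if b > 0}
    union = present_a | present_b
    if not union:
        raise ValueError("both samples are empty")
    return 1 - len(present_a & present_b) / len(union)

# Hypothetical counts: the same two taxa are present, but with reversed dominance.
# The third taxon is absent from both samples and affects neither index.
sample_1 = [90, 10, 0]
sample_2 = [10, 90, 0]

print(bray_curtis(sample_1, sample_2))
print(jaccard_distance(sample_1, sample_2))
```

The two hypothetical samples look identical to the presence/absence index but quite different to Bray-Curtis, which is why the choice of index matters for interpretation.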

In summary, alpha diversity measures can be seen as summary statistics of a single population (within-sample diversity), while beta diversity measures are estimates of similarity or dissimilarity between populations (between-sample diversity).

Dysbiosis

Sizeable changes in the composition of microorganisms in a microbiome that reduce its ability to function optimally are referred to as dysbiosis. Common forms of dysbiosis in human gut samples are increased levels of Proteobacteria or reduced diversity. However, studies are often only able to describe a changed microbiome without labeling it dysbiotic, because the label implies a malfunction that is suboptimal for the host.

If you encounter the term in a Biomcare report, it will always be accompanied by a description of the specific changes observed in your sample.

Data transformations

Data transformations in microbiome analyses involve applying mathematical operations to sequencing data to enhance interpretability and meet statistical assumptions. Raw microbiome data, often represented as counts or proportions, can be skewed, sparse, or compositional (summing to a fixed total), making direct analysis challenging. Transformations help address these issues by stabilizing variances, normalizing distributions, and accounting for the compositional nature of the data.

Unlike standardization, which rescales data to a standard range or mean (e.g., z-scores), transformations change the structure of the data to improve its utility in specific analyses. Common transformations include:

Log Transformation: Reduces the impact of large values and handles exponential relationships.
Centered Log-Ratio (CLR) Transformation: Accounts for compositionality by normalizing each feature relative to the geometric mean of the sample.
Square Root Transformation: Stabilizes variances for count data, especially when low counts are frequent.
Hellinger Transformation: Converts data to proportions and applies a square root, which reduces the influence of highly abundant taxa in community composition analyses.

The choice of transformation depends on the downstream analysis, as each method addresses specific data challenges in microbiome studies.
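The four transformations listed above can be sketched as follows for a single sample. The pseudocounts added before taking logarithms (needed because the logarithm of zero is undefined) are assumptions chosen for illustration, as are the function names and the read counts; real analyses may handle zeros differently.

```python
import math

def log_transform(counts, pseudocount=1.0):
    """Natural log of each count after adding a pseudocount (assumed value)."""
    return [math.log(c + pseudocount) for c in counts]

def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio: log of each value relative to the sample's geometric mean."""
    logs = [math.log(c + pseudocount) for c in counts]
    # The mean of the logs equals the log of the geometric mean
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def sqrt_transform(counts):
    """Square root of each count."""
    return [math.sqrt(c) for c in counts]

def hellinger_transform(counts):
    """Square root of each taxon's proportion of the sample total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [math.sqrt(c / total) for c in counts]

# Hypothetical read counts for four taxa in one sample
sample = [10, 0, 5, 85]

print(log_transform(sample))
print(clr_transform(sample))
print(sqrt_transform(sample))
print(hellinger_transform(sample))
```

Two properties are useful as quick checks: CLR values of a sample sum to zero, and the squared Hellinger values sum to one.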

Normalization

Normalization is a crucial step in microbiome sequencing data analysis, ensuring that comparisons across samples are meaningful and not biased by technical artifacts. Since sequencing often produces varying read depths across samples due to differences in DNA yield or sequencing efficiency, normalization adjusts the data to make these samples comparable.

This process corrects for disparities in sequencing depth, allowing true biological variations to emerge. Common methods include rarefaction, scaling, or the use of relative abundances, each with trade-offs depending on the research goal. Proper normalization is essential to avoid skewed results and ensure robust conclusions in microbiome studies.

Normalization by rarefaction to even read count

Normalization can be performed by subsampling the QC'ed set of reads, without replacement, to a smaller, predetermined, fixed total. "Without replacement" means that each read that is selected and assigned to the normalized sample is not returned to the original pool, and thus cannot be selected again. An advantage of this approach is that the data are retained as counts, which allows further analysis with statistical tools that require count data.
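A minimal sketch of rarefaction for one sample is shown below. It expands the counts into individual reads and draws a fixed number of them without replacement. The function name rarefy, the target depth of 40 reads, and the counts are hypothetical, and the seed is fixed only so that the example is repeatable.

```python
import random

def rarefy(counts, depth, seed=None):
    """Subsample `depth` reads without replacement and return the new counts."""
    total = sum(counts)
    if depth > total:
        raise ValueError("sample has fewer reads than the requested depth")
    # One entry per read, labeled with the index of the taxon it belongs to
    reads = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    # random.sample draws without replacement: no read is picked twice
    drawn = random.Random(seed).sample(reads, depth)
    rarefied = [0] * len(counts)
    for taxon in drawn:
        rarefied[taxon] += 1
    return rarefied

# Hypothetical sample with 100 reads, rarefied to 40 reads
original = [50, 30, 15, 4, 1]
rarefied = rarefy(original, 40, seed=1)
print(rarefied, sum(rarefied))
```

The output is still a list of whole-number counts that sums to the chosen depth. Rare taxa may drop to zero in the subsample, which is one of the trade-offs of this method.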

Normalization by sample sum

As an alternative to normalization by rarefaction, where an equal-sized subset of reads is selected from each sample, read counts can be converted to relative frequencies by dividing by the sample sum. Here, we use the full sample data and normalize it to relative abundances. The resulting values are fractions and are therefore no longer count data.
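As a short illustration, the sketch below converts counts to relative abundances for two hypothetical samples that have the same composition but were sequenced to different depths. The function name and the counts are assumptions for the example.

```python
def relative_abundance(counts):
    """Divide each count by the sample sum so that the values add up to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]

# Hypothetical samples: same composition, tenfold difference in read depth
shallow_sample = [30, 50, 20]
deep_sample = [300, 500, 200]

print(relative_abundance(shallow_sample))
print(relative_abundance(deep_sample))
```

After normalization the two samples are directly comparable, but the values are fractions, so methods that require counts can no longer be applied to them.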

The core microbiome

The precise definition of the core microbiome varies between studies, but all aim to identify the more reliably detected taxa for further analysis. Measures of mean abundance across samples and the fraction of samples with zero abundance are often used to filter the taxa for further analysis. Often, low-abundance taxa are removed from further single-taxon analysis or are analyzed using different statistical approaches that better handle their distribution properties. Thresholds and statistical models must be selected based on the individual study design, depending on the type of microbiome and the goal of the analysis.
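A filter of this kind can be sketched as below, keeping taxa that are detected in enough samples and that reach a minimum mean relative abundance. The thresholds (detected in at least 50% of samples, mean relative abundance of at least 0.001), the function name core_taxa, and the taxon names and values are all hypothetical; as noted above, suitable thresholds depend on the study.

```python
def core_taxa(table, min_prevalence=0.5, min_mean_abundance=0.001):
    """Return taxa passing both filters.

    `table` maps each taxon name to its relative abundances, one value per sample.
    """
    kept = []
    for taxon, abundances in table.items():
        n_samples = len(abundances)
        # Prevalence: fraction of samples in which the taxon is detected
        prevalence = sum(1 for a in abundances if a > 0) / n_samples
        mean_abundance = sum(abundances) / n_samples
        if prevalence >= min_prevalence and mean_abundance >= min_mean_abundance:
            kept.append(taxon)
    return kept

# Hypothetical relative abundances across four samples
table = {
    "taxon_A": [0.40, 0.35, 0.50, 0.45],        # abundant and always detected
    "taxon_B": [0.0, 0.0, 0.0, 0.20],           # detected in one sample only
    "taxon_C": [0.0005, 0.0002, 0.0001, 0.0],   # often detected, very low abundance
}

print(core_taxa(table))
```

With these assumed thresholds, only taxon_A is retained: taxon_B fails the prevalence filter and taxon_C fails the abundance filter.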

While the definition of a set of core taxa on a study-by-study basis is practical for statistical and interpretational reasons, many studies have aimed to identify a population-scale core, often referred to as the core measurable microbiome (CMM), defined as the taxa found across all or a defined set of human communities. While this is an interesting biological question, it is calculated with a different aim than the above-discussed filtering performed for robustness and statistical purposes.

Where to find more on the key concepts and ideas of microbiome analyses

Questions about your results?

If you come across a term not listed here, or need help interpreting your report, we are happy to help.