Metastats is a statistical methodology designed to identify differentially abundant features in metagenomic and 16S rRNA sequence datasets. It uses a nonparametric t-test, Fisher's exact test, and the false discovery rate (FDR) to give users a prioritized list of the features that best explain the observed differences between two populations (e.g. healthy vs. sick, obese vs. lean, human vs. mouse).
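To make the tests named above concrete, here is a minimal standard-library sketch. It assumes that the nonparametric t-test means a two-sample (Welch) t statistic whose p-value is estimated by randomly permuting group labels. It also assumes that Fisher's exact test is applied to a pooled 2x2 table of (feature count, all other counts) per group. The source does not say how Metastats chooses between the two tests, how many permutations it runs, or how it lays out the Fisher table. The function names (`welch_t`, `permutation_t_test`, `fisher_exact_two_sided`) are hypothetical and are not Metastats' own code.

```python
import math
import random
from statistics import mean, variance


def welch_t(x, y):
    """Two-sample t statistic with unequal variances (Welch form)."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    if se == 0:
        return 0.0  # both groups constant: no evidence of a difference
    return (mean(x) - mean(y)) / se


def permutation_t_test(x, y, n_perm=1000, seed=0):
    """Two-sided p-value from shuffling group labels (assumed scheme).

    Each group needs at least two values so the sample variance is defined.
    """
    rng = random.Random(seed)
    observed = abs(welch_t(x, y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of samples into two groups
        if abs(welch_t(pooled[:n_x], pooled[n_x:])) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k
        # when all row and column totals are held fixed.
        return math.comb(row1, k) * math.comb(row2, col1 - k) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(p for p in (prob(k) for k in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

For sparse features, a hypothetical call such as `fisher_exact_two_sided(feature_g1, other_g1, feature_g2, other_g2)` would compare the feature's pooled count in each population against all remaining counts in that population.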
Metastats accepts as input a tab-delimited matrix of counts in which rows are features and columns are samples from two populations: the first N columns are samples from the first population, and the remaining columns are samples from the second.
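Here is a minimal sketch of reading such a matrix and splitting it at column N. The source does not describe header rows or feature labels. This example therefore assumes a first row of sample names and a first column of feature names, and both of those are assumptions. The function names `read_count_matrix` and `split_groups` are hypothetical.

```python
import csv
import io


def read_count_matrix(lines):
    """Parse a tab-delimited count matrix.

    Assumed layout (hypothetical): the first row holds sample names and the
    first column holds feature names; all other cells are counts.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    features, counts = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        features.append(row[0])
        counts.append([float(v) for v in row[1:]])
    return samples, features, counts


def split_groups(counts, n1):
    """The first n1 columns are population 1; the rest are population 2."""
    group1 = [row[:n1] for row in counts]
    group2 = [row[n1:] for row in counts]
    return group1, group2


if __name__ == "__main__":
    text = "feature\ts1\ts2\ts3\nA\t5\t3\t0\nB\t1\t2\t7\n"
    samples, features, counts = read_count_matrix(io.StringIO(text))
    g1, g2 = split_groups(counts, 2)
    print(samples, features, g1, g2)
```

Because the split depends only on N, sample columns must be ordered so that every sample from the first population comes before any sample from the second.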
This method is used in the CloVR-16S and CloVR-Metagenomics tracks to detect differentially abundant functional and taxonomic groups found in the data. For each comparison, Metastats outputs a tab-delimited table with the filename format:
where description denotes the type of data analyzed, Group1 and Group2 are the names of the populations compared, and N1 and N2 are the numbers of subjects in populations 1 and 2, respectively. For example:
are the results of comparing the phylum-level sequence assignments of three overweight and six lean subjects. The results in each tab-delimited file can be imported into a spreadsheet program such as Excel or OpenOffice:
Continuing the mock overweight vs. lean example, the table lists the mean relative abundance of each feature in overweight (group1) and lean (group2) subjects, together with the corresponding variance and standard error. The two rightmost columns give the p-value and q-value for each comparison. Bacteroidetes, Euryarchaeota, and Crenarchaeota all have P < 0.05, which suggests that their abundances differ significantly between the overweight and lean populations. The mean abundances show the direction of each difference: Bacteroidetes and Euryarchaeota are both significantly enriched in group2, whereas Crenarchaeota is more abundant in group1. From these results, a user could create tables or histograms for publication.
*Note: this example is not based on real data.
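The columns described above can be approximated with the following sketch. It converts counts to per-sample relative abundances, then computes the mean, sample variance, and standard error of the mean for one feature within one group. The source does not say how Metastats computes its q-values. This example therefore swaps in the Benjamini-Hochberg adjustment as a stand-in, and its output may differ from the q-values Metastats reports. The function names `relative_abundance`, `describe`, and `bh_adjust` are hypothetical.

```python
import math
from statistics import mean, variance


def relative_abundance(counts):
    """Divide each count by its sample (column) total; empty samples give 0."""
    totals = [sum(col) for col in zip(*counts)]
    return [[c / t if t else 0.0 for c, t in zip(row, totals)] for row in counts]


def describe(values):
    """Mean, sample variance, and standard error of the mean for one group."""
    m = mean(values)
    v = variance(values) if len(values) > 1 else 0.0
    return m, v, math.sqrt(v / len(values))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (stand-in for q-values)."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

Filtering on the adjusted values (e.g. keeping features whose adjusted value is below 0.05) controls the expected fraction of false discoveries across all features tested. A raw P < 0.05 threshold does not account for the number of comparisons.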
Please cite as:
White JR, Nagarajan N, Pop M. Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput Biol. 2009 Apr;5(4):e1000352. Epub 2009 Apr 10.