Mapping files in CloVR

The following mapping file formats are supported in CloVR-tracks:
 

CloVR-formatted mapping files

For datasets consisting of one or more fasta files, the associated CloVR mapping file is tab-delimited and describes each file, one file per row, together with its sample attributes.

This format requires that:
1. All entries are tab-delimited.
2. All entries in every column are defined.
3. Any number of columns may be defined, but three columns are mandatory: File, SampleName and Description. Additional columns can be used for pairwise comparisons (see below).
4. The header line begins with “#File<tab>SampleName” and ends with “Description”.
5. There are no duplicate header fields or file names.
6. No header fields or corresponding entries contain invalid characters (only alphanumeric and underscores are allowed).
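The six rules above are simple enough to check before submitting a dataset. The sketch below is a hypothetical helper, `validate_clovr_mapping`, not part of CloVR. It assumes that file names in the File column may also contain “.” and “-” (as in “A.fasta”) and that the leading “#” of the header line is not counted as an invalid character.

```python
import re

# Rule 6: header fields and entries may contain only letters, digits and "_".
TOKEN = re.compile(r"^[A-Za-z0-9_]+$")
# Assumption: file names may also contain "." and "-" (e.g. "A.fasta").
FILE_NAME = re.compile(r"^[A-Za-z0-9_.\-]+$")


def validate_clovr_mapping(text):
    """Return a list of rule violations; an empty list means no problems found.

    Hypothetical helper sketching the six rules, not CloVR's own checker.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return ["file is empty"]
    errors = []
    header = lines[0].split("\t")
    # Rules 3 and 4: #File<tab>SampleName first, Description last.
    if header[:2] != ["#File", "SampleName"] or header[-1] != "Description":
        errors.append("header must start with #File<tab>SampleName and end with Description")
    # Rule 5: no duplicate header fields.
    if len(set(header)) != len(header):
        errors.append("duplicate header fields")
    # Rule 6 for the header (the leading "#" is not part of the field name).
    for field in [header[0].lstrip("#")] + header[1:]:
        if not TOKEN.match(field):
            errors.append(f"invalid header field {field!r}")
    seen_files = set()
    for number, line in enumerate(lines[1:], start=2):
        cells = line.split("\t")
        # Rules 1 and 2: one tab-separated, non-empty entry per header field.
        if len(cells) != len(header) or not all(cells):
            errors.append(f"line {number}: expected {len(header)} non-empty tab-separated entries")
            continue
        file_name = cells[0]
        # Rule 5: no duplicate file names.
        if file_name in seen_files:
            errors.append(f"line {number}: duplicate file name {file_name!r}")
        seen_files.add(file_name)
        # Rule 6 for the entries.
        if not FILE_NAME.match(file_name):
            errors.append(f"line {number}: invalid file name {file_name!r}")
        for cell in cells[1:]:
            if not TOKEN.match(cell):
                errors.append(f"line {number}: invalid entry {cell!r}")
    return errors
```

A typical call would be `validate_clovr_mapping(open("mapping.txt").read())`, where “mapping.txt” is a placeholder path. A file whose columns are separated by spaces instead of tabs fails rules 1 and 4 at once, because each line is read as a single field.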

Below are two simple examples:

#File     SampleName   PH_p    Gender_p  Status   Description
A.fasta    sampleA     low      male     control   none
B.fasta    sampleB     low     female    control   none
C.fasta    sampleC     high     male     control   none
D.fasta    sampleD     high    female    treated   none

#File     SampleName   BodySite_p    Description
A.fasta    sampleA     oral          oral_visit1_subject0001
B.fasta    sampleB     airways       airways_visit1_subject0001
C.fasta    sampleC     oral          oral_visit2_subject0001
D.fasta    sampleD     airways       airways_visit2_subject0001
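The columns in the examples above are aligned with spaces for readability, but rule 1 requires a single tab between entries in the actual file. One way to avoid hand-editing mistakes is to generate the file. The sketch below uses Python's standard `csv` module with a tab delimiter; `write_mapping` and `EXAMPLE_ROWS` are illustrative names, and the rows are the first example above.

```python
import csv
import io

# The first example above, one list per line; the header's "#" is part of
# its first field.
EXAMPLE_ROWS = [
    ["#File", "SampleName", "PH_p", "Gender_p", "Status", "Description"],
    ["A.fasta", "sampleA", "low", "male", "control", "none"],
    ["B.fasta", "sampleB", "low", "female", "control", "none"],
    ["C.fasta", "sampleC", "high", "male", "control", "none"],
    ["D.fasta", "sampleD", "high", "female", "treated", "none"],
]


def write_mapping(rows, handle):
    """Write rows to an open text handle with exactly one tab between entries."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


if __name__ == "__main__":
    # Write to an in-memory buffer and show the result.
    buffer = io.StringIO()
    write_mapping(EXAMPLE_ROWS, buffer)
    print(buffer.getvalue(), end="")
```

To write a real file, open it with `open(path, "w", newline="")` so that Python does not translate the line endings chosen by the writer.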

Pairwise comparisons: To use the Metastats statistical method for differential abundance detection, the associated header field must end with “_p” (e.g. “Treatment_p” or “PH_p”); otherwise, Metastats skips pairwise analysis of that entire header field. Note that only groups containing more than one sample can be compared.
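The selection rules just described can be sketched as follows: only fields ending in “_p” are considered, samples are grouped by their value in that field, and groups with a single sample are dropped. This is a hypothetical helper, `planned_comparisons`, not Metastats itself, and it assumes that every pair of remaining groups within a field is compared.

```python
from collections import defaultdict
from itertools import combinations


def planned_comparisons(header, rows, sample_field="SampleName"):
    """Map each "_p" field to the pairs of group values that could be compared.

    Hypothetical helper: it only mirrors the selection rules in the text.
    """
    sample_col = header.index(sample_field)
    plans = {}
    for col, field in enumerate(header):
        if not field.endswith("_p"):
            continue  # fields without the "_p" suffix are skipped entirely
        groups = defaultdict(list)
        for row in rows:
            groups[row[col]].append(row[sample_col])
        # Only groups containing more than one sample can be compared.
        eligible = sorted(value for value, samples in groups.items() if len(samples) > 1)
        plans[field] = list(combinations(eligible, 2))
    return plans


if __name__ == "__main__":
    header = ["#File", "SampleName", "PH_p", "Gender_p", "Status", "Description"]
    rows = [
        ["A.fasta", "sampleA", "low", "male", "control", "none"],
        ["B.fasta", "sampleB", "low", "female", "control", "none"],
        ["C.fasta", "sampleC", "high", "male", "control", "none"],
        ["D.fasta", "sampleD", "high", "female", "treated", "none"],
    ]
    print(planned_comparisons(header, rows))
    # prints {'PH_p': [('high', 'low')], 'Gender_p': [('female', 'male')]}
```

In the first example, Status is skipped because it lacks the suffix; even if it were renamed Status_p, the single treated sample would form a one-sample group and nothing would be left to compare. For a Qiime-formatted file, pass `sample_field="#SampleID"`.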
 

Qiime-formatted mapping files

In some cases (typically 16S rRNA amplicon studies), the sequence data consist of a single fasta file containing sequences from multiple samples, each tagged with a sample-specific barcode, as is common in the 454 amplicon sequencing protocol. The mapping file supplies the sample-associated information and follows the Qiime mapping-file format, for example:

#SampleID BarcodeSequence  LinkerPrimerSequence  Treatment_p  Description
Sample1    AGCACGAGCCTA    TATGCTGCCTCCCGTAGGAGT    Control      male
Sample2    ACCAGCGACTAG    TATGCTGCCTCCCGTAGGAGT   Diabetic     female
Sample3    AACTCGTCGATG    TATGCTGCCTCCCGTAGGAGT    Control     female
Sample4    ACAGACCACTCA    TATGCTGCCTCCCGTAGGAGT   Diabetic      male

where:

1. All entries are tab-delimited.
2. All entries in every column are defined.
3. The header line begins with the following fields: “#SampleID<tab>BarcodeSequence<tab>LinkerPrimerSequence”.
4. The header line must end with the field “Description”.
5. The BarcodeSequence and LinkerPrimerSequence fields contain only valid IUPAC DNA characters.
6. There are no duplicate header fields.
7. No header fields or corresponding entries contain invalid characters (only alphanumeric characters and underscores are allowed).
8. No two samples share the same barcode and linker primer combination, i.e. each barcode sequence with the primer appended is unique.
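The Qiime-specific rules (the fixed leading header fields, IUPAC-only barcode and primer sequences, and unique barcode + primer combinations) can be checked with a sketch like the one below. `check_qiime_mapping` is a hypothetical helper, not Qiime's own validation script. It assumes the IUPAC DNA alphabet A, C, G, T, R, Y, S, W, K, M, B, D, H, V, N without gap symbols, accepts lower-case sequences, and reads rule 8 as requiring each concatenated barcode + primer sequence to be unique.

```python
import re

# IUPAC nucleotide codes (assumption: gap symbols are not accepted).
IUPAC_DNA = set("ACGTRYSWKMBDHVN")
# Rule 7: letters, digits and "_" only.
TOKEN = re.compile(r"^[A-Za-z0-9_]+$")
REQUIRED_START = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence"]


def check_qiime_mapping(text):
    """Return a list of rule violations; an empty list means no problems found."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return ["file is empty"]
    header = lines[0].split("\t")
    # Rule 3: without the fixed leading fields the rows cannot be interpreted.
    if header[:3] != REQUIRED_START:
        return ["header must start with #SampleID, BarcodeSequence, LinkerPrimerSequence"]
    errors = []
    # Rule 4: Description is the last field.
    if header[-1] != "Description":
        errors.append("header must end with Description")
    # Rule 6: no duplicate header fields.
    if len(set(header)) != len(header):
        errors.append("duplicate header fields")
    # Rule 7 for the header (the leading "#" is not part of the field name).
    for field in [header[0].lstrip("#")] + header[1:]:
        if not TOKEN.match(field):
            errors.append(f"invalid header field {field!r}")
    seen = {}  # concatenated barcode + primer -> first sample using it
    for number, line in enumerate(lines[1:], start=2):
        cells = line.split("\t")
        # Rules 1 and 2: one tab-separated, non-empty entry per header field.
        if len(cells) != len(header) or not all(cells):
            errors.append(f"line {number}: expected {len(header)} non-empty tab-separated entries")
            continue
        for cell in cells:
            if not TOKEN.match(cell):
                errors.append(f"line {number}: invalid entry {cell!r}")
        sample, barcode, primer = cells[:3]
        # Rule 5: only IUPAC DNA characters in barcode and primer.
        for label, seq in (("barcode", barcode), ("primer", primer)):
            if not set(seq.upper()) <= IUPAC_DNA:
                errors.append(f"line {number}: {label} {seq!r} contains non-IUPAC characters")
        # Rule 8: barcode + primer must be unique across samples.
        combined = (barcode + primer).upper()
        if combined in seen:
            errors.append(f"line {number}: barcode + primer of {sample!r} duplicates {seen[combined]!r}")
        else:
            seen[combined] = sample
    return errors
```

The sequence checks are deliberately case-insensitive; dropping the `.upper()` calls would enforce upper-case sequences instead.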

Pairwise comparisons: To use the Metastats statistical method for differential abundance detection, the associated header field must end with “_p” (e.g. “Treatment_p” or “PH_p”); otherwise, Metastats skips pairwise analysis of that entire header field. Note that only groups containing more than one sample can be compared.