The COG protein database was generated by comparing predicted and known proteins in all completely sequenced microbial genomes to infer sets of orthologs. Each COG consists of a group of proteins found to be orthologous across at least three lineages and likely corresponds to an ancient conserved domain. For more information, see the NCBI COG website. Because the COG database is significantly smaller than the NCBI non-redundant (NR) database, it provides a fast alternative for describing the functional characteristics of a single microbe or a community of microbes. More recent successors to the COG database include euKaryotic Orthologous Groups (KOGs) and eggNOG, which extend the analysis to more genomes, including eukaryotes.
The current COG database used in CloVR is composed of roughly 144,000 proteins and more than 4,800 COGs. Each COG has a specific functional description and may also be assigned one or more of the following general category letters:
CELLULAR PROCESSES AND SIGNALING
[D] Cell cycle control, cell division, chromosome partitioning
[M] Cell wall/membrane/envelope biogenesis
[N] Cell motility
[O] Post-translational modification, protein turnover, and chaperones
[T] Signal transduction mechanisms
[U] Intracellular trafficking, secretion, and vesicular transport
[V] Defense mechanisms
[W] Extracellular structures
[Y] Nuclear structure
[Z] Cytoskeleton
INFORMATION STORAGE AND PROCESSING
[A] RNA processing and modification
[B] Chromatin structure and dynamics
[J] Translation, ribosomal structure and biogenesis
[K] Transcription
[L] Replication, recombination and repair
METABOLISM
[C] Energy production and conversion
[E] Amino acid transport and metabolism
[F] Nucleotide transport and metabolism
[G] Carbohydrate transport and metabolism
[H] Coenzyme transport and metabolism
[I] Lipid transport and metabolism
[P] Inorganic ion transport and metabolism
[Q] Secondary metabolites biosynthesis, transport, and catabolism
POORLY CHARACTERIZED
[R] General function prediction only
[S] Function unknown
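
Because a COG can carry more than one category letter, summarizing a set of hits means counting each letter separately. The following is a minimal Python sketch of that bookkeeping, assuming the category letters for each hit are available as short strings such as "KT". The names COG_CATEGORIES, tally_categories, and tally_groups, and the sample input, are hypothetical and are not part of CloVR or any NCBI tool; only the letter-to-description table is taken from the list above.

```python
from collections import Counter

# Category letter -> (group heading, description), transcribed from the list above.
COG_CATEGORIES = {
    "D": ("CELLULAR PROCESSES AND SIGNALING", "Cell cycle control, cell division, chromosome partitioning"),
    "M": ("CELLULAR PROCESSES AND SIGNALING", "Cell wall/membrane/envelope biogenesis"),
    "N": ("CELLULAR PROCESSES AND SIGNALING", "Cell motility"),
    "O": ("CELLULAR PROCESSES AND SIGNALING", "Post-translational modification, protein turnover, and chaperones"),
    "T": ("CELLULAR PROCESSES AND SIGNALING", "Signal transduction mechanisms"),
    "U": ("CELLULAR PROCESSES AND SIGNALING", "Intracellular trafficking, secretion, and vesicular transport"),
    "V": ("CELLULAR PROCESSES AND SIGNALING", "Defense mechanisms"),
    "W": ("CELLULAR PROCESSES AND SIGNALING", "Extracellular structures"),
    "Y": ("CELLULAR PROCESSES AND SIGNALING", "Nuclear structure"),
    "Z": ("CELLULAR PROCESSES AND SIGNALING", "Cytoskeleton"),
    "A": ("INFORMATION STORAGE AND PROCESSING", "RNA processing and modification"),
    "B": ("INFORMATION STORAGE AND PROCESSING", "Chromatin structure and dynamics"),
    "J": ("INFORMATION STORAGE AND PROCESSING", "Translation, ribosomal structure and biogenesis"),
    "K": ("INFORMATION STORAGE AND PROCESSING", "Transcription"),
    "L": ("INFORMATION STORAGE AND PROCESSING", "Replication, recombination and repair"),
    "C": ("METABOLISM", "Energy production and conversion"),
    "E": ("METABOLISM", "Amino acid transport and metabolism"),
    "F": ("METABOLISM", "Nucleotide transport and metabolism"),
    "G": ("METABOLISM", "Carbohydrate transport and metabolism"),
    "H": ("METABOLISM", "Coenzyme transport and metabolism"),
    "I": ("METABOLISM", "Lipid transport and metabolism"),
    "P": ("METABOLISM", "Inorganic ion transport and metabolism"),
    "Q": ("METABOLISM", "Secondary metabolites biosynthesis, transport, and catabolism"),
    "R": ("POORLY CHARACTERIZED", "General function prediction only"),
    "S": ("POORLY CHARACTERIZED", "Function unknown"),
}


def tally_categories(assignments):
    """Count category letters over an iterable of letter strings such as 'KT'.

    A multi-letter string contributes one count to each of its letters.
    """
    counts = Counter()
    for letters in assignments:
        for letter in letters:
            if letter not in COG_CATEGORIES:
                raise ValueError(f"unknown COG category letter: {letter!r}")
            counts[letter] += 1
    return counts


def tally_groups(assignments):
    """Roll the per-letter counts up to the four top-level group headings."""
    groups = Counter()
    for letter, n in tally_categories(assignments).items():
        group, _description = COG_CATEGORIES[letter]
        groups[group] += n
    return groups


if __name__ == "__main__":
    # Hypothetical input: one string of category letters per protein hit.
    hits = ["J", "KT", "S", "K"]
    for letter, n in sorted(tally_categories(hits).items()):
        print(f"[{letter}] {COG_CATEGORIES[letter][1]}: {n}")
```

Counting per letter means a hit assigned to two categories is counted twice, so the per-category totals can exceed the number of hits. If fractional weighting is preferred, each letter could instead contribute 1/len(letters).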