Parameter: cluster_num

Master parameters information
Short Name:
 no_bcodmo_term
Short Description:
 No BCO-DMO term
Official Name:
 no_bcodmo_term
No Data Value:
Description:
 

An association with a community-wide standard parameter has not yet been made.

Dataset-parameters information
Supplied Name: 
cluster_num
Supplied Units: 
unitless
Conversion Necessary: 
no
Conversion Utility: 
No Data Value: 
nd
Description: 

Cluster numbers were assigned to each batch of proteins that clustered together at 30% sequence similarity or higher, using the CD-HIT suite web server. Clustering was done progressively, from 100% sequence similarity down to 30%, with cluster numbers carried through each step. However, if at the final 30% cutoff a protein showed at least 30% similarity to a protein in another batch, the proteins from the previously independent batch were grouped in with that batch.

For instance, Cluster005744 has a cluster_num of 12. This means it shares at least 30% sequence similarity with proteins 5955, 00152 1(a), and 19987. In a previous clustering round (with a higher similarity cutoff), it also showed sequence similarity with proteins Cluster020987, Cluster020227, and Cluster015890, so these were previously in cluster #12. However, Cluster020987, Cluster020227, and Cluster015890 now show at least 30% sequence similarity with one of the proteins Cluster000783, Cluster050768, Cluster050003, or lotgi1|230171, which were previously assigned to cluster #0. Because at least one protein in cluster #12 is similar to a protein in cluster #0, all cluster #12 proteins were reassigned to cluster #0.
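The merging rule described above behaves like a union-find (disjoint-set) structure over cluster numbers: once any member of one cluster is linked to any member of another, the two whole clusters collapse into one. The Python sketch below is a hypothetical illustration only, not the dataset authors' procedure and not CD-HIT's own clustering algorithm. The function name `merge_clusters`, the input format (a protein-to-cluster dictionary plus a list of similar protein pairs), the generic protein IDs, and the rule that a merged cluster keeps the smaller cluster number (chosen to match the #12 to #0 example) are all assumptions.

```python
from typing import Dict, Iterable, Tuple


def merge_clusters(
    assignments: Dict[str, int],
    similar_pairs: Iterable[Tuple[str, str]],
) -> Dict[str, int]:
    """Merge whole clusters whenever any two of their members are linked.

    assignments: protein ID -> cluster number carried over from the
        previous (higher-cutoff) round.
    similar_pairs: protein pairs found to share >= 30% similarity at the
        final cutoff (hypothetical input format).
    Assumption: a merged cluster keeps the smaller cluster number.
    """
    # Each cluster number starts as its own representative.
    parent: Dict[int, int] = {c: c for c in set(assignments.values())}

    def find(c: int) -> int:
        # Follow parent links to the representative, halving the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in similar_pairs:
        ra, rb = find(assignments[a]), find(assignments[b])
        if ra != rb:
            # Attach the larger representative under the smaller one (assumption).
            parent[max(ra, rb)] = min(ra, rb)

    # Relabel every protein with its cluster's final representative.
    return {protein: find(c) for protein, c in assignments.items()}


# Hypothetical example: cluster 12 has one member linked to cluster 0.
previous = {"A1": 0, "A2": 0, "B1": 12, "B2": 12, "B3": 12, "C1": 7}
links = [("B3", "A2")]
final = merge_clusters(previous, links)
print(final)  # {'A1': 0, 'A2': 0, 'B1': 0, 'B2': 0, 'B3': 0, 'C1': 7}
```

A single link between B3 and A2 is enough to move every cluster #12 member into cluster #0, while the unlinked cluster #7 is untouched. Because merges propagate through the representatives, chains of links (cluster 9 linked to 5, and 5 linked to 3) also end up in one cluster.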