Cluster numbers were assigned to each batch of proteins that clustered together at 30% sequence similarity or higher, using the CD-HIT Suite web server. Clustering was run progressively, from 100% sequence similarity down to 30%, and cluster numbers were carried through from one cut-off to the next. However, if at the final 30% cut-off a protein showed at least 30% similarity to a protein from another batch, the proteins of its previously independent batch were merged into that other batch and took its cluster number.

For instance, Cluster005744 has a cluster_num of 12. This means that it shares at least 30% sequence similarity with proteins 5955, 00152 1(a), and 19987. In a previous clustering run at a higher similarity cut-off, it also showed sequence similarity with Cluster020987, Cluster020227, and Cluster015890, so those proteins were previously in cluster #12 as well. At the 30% cut-off, however, Cluster020987, Cluster020227, and Cluster015890 show at least 30% sequence similarity with at least one of the proteins Cluster000783, Cluster050768, Cluster050003, or lotgi1|230171, which had previously been assigned to cluster #0. Because at least one protein in cluster #12 is now similar to a protein in cluster #0, all cluster #12 proteins were reassigned to cluster #0.
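The merging rule described above can be sketched as a relabelling pass over the cluster assignments. This is a minimal Python sketch, not the pipeline actually used. The function name `propagate_cluster_numbers` is hypothetical. The input layout is an assumption: a mapping from protein ID to cluster number, plus the groups reported at each lower cut-off. The choice to keep the lowest cluster number when two batches merge is also an assumption. It matches the #12 to #0 example but is not stated as a general rule. The demo reuses the IDs from the example with a hypothetical pairing of Cluster020987 and Cluster000783, because the text does not say which cluster #0 protein was matched.

```python
from typing import Dict, Iterable, List


def propagate_cluster_numbers(
    initial: Dict[str, int],
    rounds: Iterable[List[List[str]]],
) -> Dict[str, int]:
    """Carry cluster numbers through progressively lower cut-offs.

    initial: protein ID -> cluster number from an earlier (stricter) cut-off.
             Every protein that appears in a group must be present here.
    rounds:  for each lower cut-off, the groups of protein IDs that
             clustered together in that run (hypothetical input layout).
    """
    cluster_num = dict(initial)  # copy so the caller's mapping is untouched
    for groups in rounds:
        for group in groups:
            # Every cluster number currently held by a member of this group.
            touched = {cluster_num[p] for p in group}
            if len(touched) < 2:
                continue  # group stays within one batch: nothing to merge
            # ASSUMPTION: the lowest existing number survives the merge,
            # consistent with the #12 -> #0 example but not a documented rule.
            target = min(touched)
            # Relabel the whole batch, including proteins outside this group.
            for protein, num in list(cluster_num.items()):
                if num in touched:
                    cluster_num[protein] = target
    return cluster_num


if __name__ == "__main__":
    # Assignments before the 30% run, using protein IDs from the text.
    before = {p: 12 for p in ["Cluster005744", "5955", "00152 1(a)", "19987",
                              "Cluster020987", "Cluster020227", "Cluster015890"]}
    before.update({p: 0 for p in ["Cluster000783", "Cluster050768",
                                  "Cluster050003", "lotgi1|230171"]})
    # HYPOTHETICAL 30% result: one former #12 protein groups with a #0 protein.
    thirty_percent = [["Cluster020987", "Cluster000783"]]
    after = propagate_cluster_numbers(before, [thirty_percent])
    print(sorted(set(after.values())))  # [0]
```

Relabelling the whole batch, rather than only the proteins in the matched group, is what makes the merge transitive: once any single member links two batches, every member of both batches ends up with the same cluster number.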