Abstract | Genome sequencing efforts for Triticum aestivum have produced large numbers of contigs, preliminary assemblies and putative genes/proteins; nevertheless, their annotation is still in its infancy. Given the much larger percentage of annotated genes in previously sequenced plant genomes such as Arabidopsis thaliana and Oryza sativa, and the known phylogenetic and orthology relationships among these plant species and their corresponding genes, we propose an enrichment model that expands the set of wheat gene annotations. Our sequence and annotation base includes data from Ensembl Plants for 9 plant species: Aegilops tauschii, Arabidopsis thaliana, Brachypodium distachyon, Brassica rapa, Hordeum vulgare, Oryza sativa subsp. japonica, Sorghum bicolor, Triticum urartu and Zea mays. Orthology relationships between wheat genes and the genes of each of the 9 plant species are predicted using an in-house software package. Next, ortholog cliques are identified, such that all genes within a clique are pairwise orthologs. The phylogenetic distance between wheat and each plant species is used to quantify the level of confidence of gene ontology (GO) assignments within each ortholog clique, and new annotations are assigned to wheat genes such that either novel or more specific GO terms are associated with those genes. Overall, based on cliques of size 3 or larger, our model enriched the existing gene-GO term associations for 7,838 (8%) wheat genes, of which 2,139 had no previous annotation. For the particular case of ortholog cliques of size 10 (13 in total), where all 10 genes within a clique are tightly connected via pairwise orthology, 85 new and more specific GO terms were identified, which represents a 65% increase over the 130 previously known GO terms. These observations are further supported, for 4 out of the 10 plant species considered in this work, by experimental evidence using expressologs (Patel et al., Plant J. 2012). |
---|
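
The clique-based enrichment described in the abstract can be illustrated with a minimal Python sketch. This is not the in-house package mentioned above; it only shows the general idea under stated assumptions. The gene identifiers, GO assignments and distance values are hypothetical toy data, the weighting `1 / (1 + distance)` is an assumed stand-in for the confidence measure (the abstract does not give the formula), and the check for "more specific" GO terms, which would require the GO hierarchy, is omitted. Maximal cliques are enumerated with the standard Bron-Kerbosch algorithm with pivoting.

```python
from collections import defaultdict


def build_graph(ortholog_pairs):
    """Build an undirected adjacency map from predicted ortholog pairs."""
    adj = defaultdict(set)
    for a, b in ortholog_pairs:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def maximal_cliques(adj):
    """Yield every maximal clique (Bron-Kerbosch with pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def propose_annotations(cliques, species_of, go_terms, distance_to_wheat,
                        min_size=3, wheat="wheat"):
    """Score GO terms carried by non-wheat clique members for each wheat gene.

    Assumption: a closer species contributes a larger weight, 1 / (1 + distance).
    Terms the wheat gene already carries are skipped.
    """
    proposals = defaultdict(dict)
    for clique in cliques:
        if len(clique) < min_size:
            continue
        wheat_genes = [g for g in clique if species_of[g] == wheat]
        donors = [g for g in clique if species_of[g] != wheat]
        for donor in donors:
            weight = 1.0 / (1.0 + distance_to_wheat[species_of[donor]])
            for term in go_terms.get(donor, ()):
                for w in wheat_genes:
                    if term in go_terms.get(w, ()):
                        continue
                    proposals[w][term] = proposals[w].get(term, 0.0) + weight
    return dict(proposals)


# Hypothetical toy data: one wheat gene with three predicted orthologs.
pairs = [("Ta1", "Os1"), ("Ta1", "At1"), ("Os1", "At1"), ("Ta1", "Zm1")]
species_of = {"Ta1": "wheat", "Os1": "rice", "At1": "arabidopsis", "Zm1": "maize"}
go_terms = {
    "Os1": {"GO:0003677"},
    "At1": {"GO:0003677", "GO:0006355"},
    "Zm1": {"GO:0005634"},
}
# Illustrative distances only, not measured phylogenetic distances.
distance_to_wheat = {"rice": 1.0, "arabidopsis": 3.0, "maize": 1.5}

cliques = list(maximal_cliques(build_graph(pairs)))
proposals = propose_annotations(cliques, species_of, go_terms, distance_to_wheat)
print(proposals)
```

In this toy case the wheat, rice and Arabidopsis genes form a clique of size 3, so their GO terms are scored for the wheat gene, while the maize gene is orthologous only to the wheat gene and its clique of size 2 falls below the threshold. Requiring full pairwise orthology is a stricter criterion than simple connectivity, which is presumably why larger cliques are treated as higher-confidence sources of annotation.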