pubmed:abstractText |
A computational procedure was developed for systematic detection of lineage-specific expansions (LSEs) of protein families in sequenced genomes and applied to obtain a census of LSEs in five eukaryotic species: the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe, the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster, and the green plant Arabidopsis thaliana. A significant fraction of the proteins encoded in each of these genomes, up to 80% in A. thaliana, belong to LSEs. Many paralogous gene families in each of the analyzed species consist almost entirely of LSEs, indicating that their diversification occurred after the divergence of the major lineages of the eukaryotic crown group. The LSEs show readily discernible patterns of protein function. The functional categories most prone to LSE are structural proteins, enzymes involved in an organism's response to pathogens and environmental stress, and various components of signaling pathways responsible for specificity, including ubiquitin ligase E3 subunits and transcription factors. The functions of several previously uncharacterized, vastly expanded protein families were predicted through in-depth protein sequence analysis; examples include small-molecule kinases and methylases that expanded independently in the fly and in the nematode. The functions of several other major LSEs remain mysterious; these protein families are attractive targets for experimental discovery of novel, lineage-specific functions in eukaryotes. LSEs seem to be one of the principal means of adaptation and one of the most important sources of organizational and regulatory diversity in crown-group eukaryotes.
|
pubmed:affiliation |
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA.
|