Statements in which the resource exists as a subject.
Predicate Object
rdf:type
lifeskim:mentions
pubmed:issue
11
pubmed:dateCreated
2002-5-29
pubmed:abstractText
Genomics projects have resulted in a flood of sequence data. Functional annotation currently relies almost exclusively on inter-species sequence comparison and is restricted in cases of limited data from related species and widely divergent sequences with no known homologs. Here, we demonstrate that codon composition, a fusion of codon usage bias and amino acid composition signals, can accurately discriminate, in the absence of sequence homology information, cytoplasmic ribosomal protein genes from all other genes of known function in Saccharomyces cerevisiae, Escherichia coli and Mycobacterium tuberculosis using an implementation of support vector machines, SVM(light). Analysis of these codon composition signals is instructive in determining features that confer individuality to ribosomal protein genes. Each of the sets of positively charged, negatively charged and small hydrophobic residues, as well as codon bias, contribute to their distinctive codon composition profile. The representation of all these signals is sensitively detected, combined and augmented by the SVMs to perform an accurate classification. Of special mention is an obvious outlier, yeast gene RPL22B, highly homologous to RPL22A but employing very different codon usage, perhaps indicating a non-ribosomal function. Finally, we propose that codon composition be used in combination with other attributes in gene/protein classification by supervised machine learning algorithms.
pubmed:commentsCorrections
http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-10390524, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-10618406, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-10647941, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-10675895, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-10684945, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-10802651, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-10886031, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-10937990, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-10967127, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-11258942, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-11290319, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-11297922, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-11406385, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-1752426, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-1776357, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-1825809, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-2207166, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-3526280, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-3916708, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-6760125, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-7984417, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-8871397, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-9223264, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-9278503, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-9396790, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-9559554, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-9644974, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-9702192, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-9914211, http://linkedlifedata.com/resource/pubmed/commentcorrection/12034849-9928479
pubmed:language
eng
pubmed:journal
pubmed:citationSubset
IM
pubmed:chemical
pubmed:status
MEDLINE
pubmed:month
Jun
pubmed:issn
1362-4962
pubmed:author
pubmed:issnType
Electronic
pubmed:day
1
pubmed:volume
30
pubmed:owner
NLM
pubmed:authorsComplete
Y
pubmed:pagination
2599-607
pubmed:dateRevised
2010-11-18
pubmed:meshHeading
pubmed-meshheading:12034849-Algorithms, pubmed-meshheading:12034849-Amino Acid Sequence, pubmed-meshheading:12034849-Base Composition, pubmed-meshheading:12034849-Base Sequence, pubmed-meshheading:12034849-Bias (Epidemiology), pubmed-meshheading:12034849-Codon, pubmed-meshheading:12034849-Computational Biology, pubmed-meshheading:12034849-Conserved Sequence, pubmed-meshheading:12034849-Escherichia coli, pubmed-meshheading:12034849-Genes, Bacterial, pubmed-meshheading:12034849-Genes, Fungal, pubmed-meshheading:12034849-Genomics, pubmed-meshheading:12034849-Hydrophobic and Hydrophilic Interactions, pubmed-meshheading:12034849-Molecular Sequence Data, pubmed-meshheading:12034849-Mycobacterium tuberculosis, pubmed-meshheading:12034849-Ribosomal Proteins, pubmed-meshheading:12034849-Saccharomyces cerevisiae, pubmed-meshheading:12034849-Static Electricity
pubmed:year
2002
pubmed:articleTitle
Conserved codon composition of ribosomal protein coding genes in Escherichia coli, Mycobacterium tuberculosis and Saccharomyces cerevisiae: lessons from supervised machine learning in functional genomics.
pubmed:affiliation
IMCB-BIC, Institute of Molecular and Cell Biology, 30 Medical Drive, 117609 Singapore.
pubmed:publicationType
Journal Article, Research Support, Non-U.S. Gov't
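
The abstract in this record describes "codon composition" as a signal that combines codon usage bias with amino acid composition. As a rough illustration only, and not the authors' implementation, one plausible encoding is the relative frequency of each of the 64 codons in a gene's coding sequence. The function name `codon_composition`, the fixed codon ordering, and the toy sequence below are assumptions made for this sketch. The record does not specify the exact feature encoding or the SVM(light) settings used, so the classification step is left out.

```python
from itertools import product

# All 64 codons in a fixed order, so every gene maps to a vector with the same layout.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_composition(cds):
    """Return the relative frequency of each of the 64 codons in a coding sequence.

    Hypothetical helper: the paper's exact feature definition is not given here.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    counts = dict.fromkeys(CODONS, 0)
    # Step through the sequence in frame, three bases at a time,
    # ignoring any trailing partial codon.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon in counts:  # skip codons containing ambiguous bases such as N
            counts[codon] += 1
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CODONS]


# Toy sequence (hypothetical, not taken from the paper): ATG AAA AAA TAA
vector = codon_composition("ATGAAAAAATAA")
```

Each gene then becomes a 64-dimensional vector that could be passed, together with a ribosomal/non-ribosomal label, to any supervised classifier such as a support vector machine.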