Source:http://linkedlifedata.com/resource/pubmed/id/19840808
Predicate | Object |
---|---|
rdf:type | |
lifeskim:mentions | |
pubmed:issue | 4 |
pubmed:dateCreated | 2010-2-15 |
pubmed:abstractText | MOTIVATION: Gene identification in genomes has been a fundamental and long-standing task in bioinformatics and computational biology. Many computational methods have been developed to predict genes in prokaryote genomes by identifying translation initiation sites (TISs) in transcript data. However, pseudo-TISs at the genome level cause these methods to produce a high number of false-positive predictions. In addition, most existing tools use an unsupervised learning framework, whose predictive accuracy may depend on the specific organism chosen. RESULTS: In this paper, we present a supervised learning method based on a support vector machine (SVM) to identify translation initiation sites at the genome level. The features are extracted from the sequence data by modeling the sequence segment around predicted TISs as a position-specific weight matrix (PSWM). We train the parameters of our SVM on well-constructed positive and negative TIS datasets. We then apply the method to recognize translation initiation sites in E. coli and B. subtilis, and validate it on two GC-rich bacterial genomes: Pseudomonas aeruginosa and Burkholderia pseudomallei K96243. We show that translation initiation sites can be recognized accurately at the genome level by our method, irrespective of GC content. Furthermore, we compare our method with four existing methods and demonstrate that it outperforms them in all four organisms. |
pubmed:language | eng |
pubmed:journal | |
pubmed:citationSubset | IM |
pubmed:status | MEDLINE |
pubmed:month | Feb |
pubmed:issn | 1095-8541 |
pubmed:author | |
pubmed:copyrightInfo | (c) 2009. Published by Elsevier Ltd. |
pubmed:issnType | Electronic |
pubmed:day | 21 |
pubmed:volume | 262 |
pubmed:owner | NLM |
pubmed:authorsComplete | Y |
pubmed:pagination | 644-9 |
pubmed:meshHeading | pubmed-meshheading:19840808-Area Under Curve, pubmed-meshheading:19840808-Bacillus subtilis, pubmed-meshheading:19840808-Burkholderia pseudomallei, pubmed-meshheading:19840808-Computational Biology, pubmed-meshheading:19840808-Databases, Genetic, pubmed-meshheading:19840808-Escherichia coli, pubmed-meshheading:19840808-False Positive Reactions, pubmed-meshheading:19840808-Genetic Vectors, pubmed-meshheading:19840808-Genome, Bacterial, pubmed-meshheading:19840808-Pattern Recognition, Automated, pubmed-meshheading:19840808-Protein Biosynthesis, pubmed-meshheading:19840808-Protein Structure, Tertiary, pubmed-meshheading:19840808-Pseudomonas aeruginosa, pubmed-meshheading:19840808-ROC Curve |
pubmed:year | 2010 |
pubmed:articleTitle | Identifying translation initiation sites in prokaryotes using support vector machine. |
pubmed:affiliation | College of Science, China Agricultural University, 100083 Beijing, China. |
pubmed:publicationType | Journal Article, Research Support, Non-U.S. Gov't |
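
The abstract says that features are extracted by modeling the sequence segment around a candidate TIS as a position-specific weight matrix (PSWM). The record gives no implementation details, so the following is only a minimal sketch of one common way to build such a matrix (log-odds against a uniform background, with pseudocounts) and to turn a window into a per-position feature vector. The function names, the pseudocount, the background frequencies, and the toy windows are all assumptions made for illustration; they are not taken from the paper, and the SVM training step is not shown.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pswm(windows, pseudocount=1.0, background=None):
    """Build a log-odds PSWM from equal-length sequence windows.

    Hypothetical helper: the pseudocount and the uniform background
    are illustrative choices, not values from the paper.
    """
    if not windows:
        raise ValueError("need at least one window")
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    background = background or {base: 0.25 for base in ALPHABET}

    total = len(windows) + pseudocount * len(ALPHABET)
    pswm = []
    for pos in range(length):
        counts = Counter(w[pos] for w in windows)
        column = {}
        for base in ALPHABET:
            freq = (counts[base] + pseudocount) / total
            column[base] = math.log2(freq / background[base])
        pswm.append(column)
    return pswm


def pswm_features(window, pswm):
    """Return one log-odds score per position of the window.

    Bases outside ACGT (for example N) score 0.0, i.e. uninformative.
    """
    if len(window) != len(pswm):
        raise ValueError("window length must match the PSWM")
    return [pswm[i].get(base, 0.0) for i, base in enumerate(window)]


# Toy windows ending in a start codon; invented for illustration only.
positives = ["AGGAGGATG", "AGGAGCATG", "AGGTGGATG"]
pswm = build_pswm(positives)
pos_vec = pswm_features("AGGAGGATG", pswm)
neg_vec = pswm_features("TTTTTTTTT", pswm)
```

Each feature vector has one entry per window position, so vectors computed for labeled positive and negative windows could serve as input to a supervised classifier such as an SVM.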