Statements in which the resource exists as a subject.
Predicate Object
rdf:type
lifeskim:mentions
pubmed:issue
2
pubmed:dateCreated
2004-07-20
pubmed:abstractText
In this paper, we review developments in probabilistic methods of gene recognition in prokaryotic genomes with an emphasis on connections to the general theory of hidden Markov models (HMMs). We show that the Bayesian method implemented in GeneMark, a frequently used gene-finding tool, can be augmented and reintroduced as a rigorous forward-backward (FB) algorithm for local posterior decoding described in the HMM theory. Another, earlier-developed method, prokaryotic GeneMark.hmm, uses a modification of the Viterbi algorithm for HMM with duration to identify the most likely global path through hidden functional states given the DNA sequence. GeneMark and GeneMark.hmm programs are worth using in concert for analysing prokaryotic DNA sequences that arguably do not follow any exact mathematical model. The new extension of GeneMark using the FB algorithm was implemented in the software program GeneMark.fba. Given the DNA sequence, this program determines an a posteriori probability for each nucleotide to belong to a coding or non-coding region. Also, for any open reading frame (ORF), it assigns a score defined as a probabilistic measure of all paths through hidden states that traverse the ORF as a coding region. The prediction accuracy of GeneMark.fba determined in our tests compared favourably with the accuracy of the initial (standard) GeneMark program. Comparison to the prokaryotic GeneMark.hmm has also demonstrated a certain, yet species-specific, degree of improvement in raw gene detection, i.e. detection of the correct reading frame (and stop codon). Exact gene prediction, which concerns precise prediction of the gene start (which in a prokaryotic genome unambiguously defines the reading frame and stop codon and thus the whole protein product), still remains more accurate in GeneMarkS, which uses a more elaborate HMM to address this task specifically.
pubmed:language
eng
pubmed:journal
pubmed:citationSubset
IM
pubmed:chemical
pubmed:status
MEDLINE
pubmed:month
Jun
pubmed:issn
1467-5463
pubmed:author
pubmed:issnType
Print
pubmed:volume
5
pubmed:owner
NLM
pubmed:authorsComplete
Y
pubmed:pagination
118-30
pubmed:dateRevised
2006-11-15
pubmed:meshHeading
pubmed:year
2004
pubmed:articleTitle
Probabilistic methods of identifying genes in prokaryotic genomes: connections to the HMM theory.
pubmed:affiliation
School of Biology and School of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0230, USA. mark@amber.biology.gatech.edu
pubmed:publicationType
Journal Article; Research Support, U.S. Gov't, P.H.S.; Research Support, U.S. Gov't, Non-P.H.S.
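The abstract's idea of local posterior decoding with the forward-backward algorithm can be illustrated with a minimal sketch. This is not the GeneMark.fba implementation: it assumes a plain two-state, first-order HMM (state 0 = non-coding, state 1 = coding) with no duration model and no codon-phase structure. The names `encode` and `posterior_decode` and every number in `START`, `TRANS` and `EMIT` are hypothetical, chosen only for illustration.

```python
def encode(seq):
    """Map a DNA string to integer symbols (A=0, C=1, G=2, T=3)."""
    return ["ACGT".index(c) for c in seq.upper()]


def posterior_decode(obs, start, trans, emit):
    """Scaled forward-backward pass.

    Returns a list of rows; row t holds P(state at position t = k | obs)
    for every hidden state k.
    """
    states = range(len(start))
    n = len(obs)

    # Forward pass, rescaling each row so it sums to 1 (avoids underflow).
    fwd, scale = [], []
    for t, x in enumerate(obs):
        if t == 0:
            row = [start[k] * emit[k][x] for k in states]
        else:
            row = [
                sum(fwd[t - 1][j] * trans[j][k] for j in states) * emit[k][x]
                for k in states
            ]
        c = sum(row)
        scale.append(c)
        fwd.append([v / c for v in row])

    # Backward pass, divided by the same scaling factors.
    bwd = [[1.0] * len(start) for _ in range(n)]
    for t in range(n - 2, -1, -1):
        x = obs[t + 1]
        bwd[t] = [
            sum(trans[k][j] * emit[j][x] * bwd[t + 1][j] for j in states)
            / scale[t + 1]
            for k in states
        ]

    # With this scaling, the product is already the normalised posterior.
    return [[fwd[t][k] * bwd[t][k] for k in states] for t in range(n)]


# Hypothetical toy parameters (assumptions, not taken from GeneMark).
START = [0.5, 0.5]                      # initial state probabilities
TRANS = [[0.9, 0.1],                    # non-coding -> non-coding / coding
         [0.1, 0.9]]                    # coding     -> non-coding / coding
EMIT = [[0.30, 0.20, 0.20, 0.30],       # non-coding: P(A), P(C), P(G), P(T)
        [0.20, 0.30, 0.30, 0.20]]       # coding:     P(A), P(C), P(G), P(T)

sequence = "ATGCGCGCTAAT"
post = posterior_decode(encode(sequence), START, TRANS, EMIT)
for base, row in zip(sequence, post):
    print(base, f"P(coding) = {row[1]:.3f}")
```

Each printed value is the a posteriori probability that the nucleotide belongs to the coding state under the toy model, summed over all hidden-state paths rather than taken from the single best (Viterbi) path.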