Predicate | Object |
---|---|
rdf:type | |
lifeskim:mentions | |
pubmed:issue | 4 |
pubmed:dateCreated | 1995-7-17 |
pubmed:abstractText | A simple statistical approach for the analysis of biological sequences, such as splice-sites, promoter regions, helices and extended structure forming regions or any other sequence dependent functional entities in proteins, is presented. The approach has been proved useful to develop a method for prediction of such entities in newly available sequences. We first search for invariant sequence features of each functional entity from the experimentally available sequences and identify a set of 'like' sequences with similar sequence features. In the next step, concrete features of sequence entities in terms of occurrences of smaller subsequences are identified at various positions which are used as a knowledge base to select potential functional entities from the identified 'like' sequences. The third step consists of refinement of this pattern learning, statistical improvements of the knowledge base weight matrices, and finally its application to predict functional entities in newly available sequences. Such an analysis is operationally described for murine splice-site predictions. Regions comprising -30 to +30 nucleotides from the splice-junction at the murine splice-sites (donors and acceptors), reported earlier, were analyzed. Invariant sequence-specific features in terms of monomer frequency average were used to identify splice-site-like sequences in the EMBL murine DNA sequence data base. The frequencies of occurrence of mono-, di-, tri- and tetranucleotides in the known splice-sites were studied in comparison with the splice-site-like sequences; the significant differences in their occurrences were extracted as statistical knowledge coded in weight matrices for computer to identify potential splice-sites. The algorithm was refined and a method was developed to predict potential splice-sites in a given murine DNA; the analysis was also extended to human DNA. The success rate of the method to predict correct splice-sites in these species is found to be 80% and 85%, respectively. The major strength of this method lies in reducing significantly the number of false positives which are normally picked up in such analysis. |
pubmed:language | eng |
pubmed:journal | |
pubmed:citationSubset | IM |
pubmed:chemical | |
pubmed:status | MEDLINE |
pubmed:month | Feb |
pubmed:issn | 0739-1102 |
pubmed:author | |
pubmed:issnType | Print |
pubmed:volume | 12 |
pubmed:owner | NLM |
pubmed:authorsComplete | Y |
pubmed:pagination | 785-801 |
pubmed:dateRevised | 2004-11-17 |
pubmed:meshHeading | pubmed-meshheading:7779300-Algorithms, pubmed-meshheading:7779300-Animals, pubmed-meshheading:7779300-Base Sequence, pubmed-meshheading:7779300-DNA, pubmed-meshheading:7779300-Databases, Factual, pubmed-meshheading:7779300-Humans, pubmed-meshheading:7779300-Mice, pubmed-meshheading:7779300-Models, Statistical, pubmed-meshheading:7779300-Research Design |
pubmed:year | 1995 |
pubmed:articleTitle | A statistical analytical approach to decipher information from biological sequences: application to murine splice-site analysis and prediction. |
pubmed:affiliation | Centre for Cellular and Molecular Biology, Hyderabad, India. |
pubmed:publicationType | Journal Article |
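
The abstract in this record describes comparing positional frequencies of mono- to tetranucleotides in known splice-sites against splice-site-like sequences and storing the differences in weight matrices. The sketch below illustrates that general idea only. It is not the authors' algorithm: the record does not say how the weights were computed, so a pseudocounted log-odds ratio is swapped in as an assumption, and the function names (`positional_kmer_counts`, `log_odds_matrix`, `score`) and the `pseudocount` parameter are hypothetical.

```python
from collections import Counter
from math import log


def positional_kmer_counts(seqs, k):
    """Count each k-mer at each start position across equal-length sequences."""
    n_positions = len(seqs[0]) - k + 1
    counts = [Counter() for _ in range(n_positions)]
    for seq in seqs:
        for i in range(n_positions):
            counts[i][seq[i:i + k]] += 1
    return counts


def log_odds_matrix(true_sites, like_sites, k, pseudocount=1.0):
    """Build one weight per (position, k-mer).

    Assumption: the weight is a log-odds ratio of the k-mer's frequency in
    known sites versus look-alike sites. The source record does not specify
    the actual weighting scheme.
    """
    true_counts = positional_kmer_counts(true_sites, k)
    like_counts = positional_kmer_counts(like_sites, k)
    n_kmers = 4 ** k  # DNA alphabet: A, C, G, T
    matrix = []
    for tc, lc in zip(true_counts, like_counts):
        row = {}
        for kmer in set(tc) | set(lc):
            p = (tc[kmer] + pseudocount) / (len(true_sites) + pseudocount * n_kmers)
            q = (lc[kmer] + pseudocount) / (len(like_sites) + pseudocount * n_kmers)
            row[kmer] = log(p / q)
        matrix.append(row)
    return matrix


def score(window, matrix, k):
    """Sum the weights of the k-mers in a window; unseen k-mers contribute 0."""
    return sum(row.get(window[i:i + k], 0.0) for i, row in enumerate(matrix))


# Toy data, invented for illustration (not taken from the paper).
true_sites = ["AAGGTA", "CAGGTA", "AAGGTG"]
like_sites = ["AAGCTA", "CAGATA", "TTGGCA"]
dimer_matrix = log_odds_matrix(true_sites, like_sites, k=2)
print(score("AAGGTA", dimer_matrix, k=2) > score("AAGCTA", dimer_matrix, k=2))
```

In a setting like the one the abstract describes, one such matrix per k (1 to 4) would be built over the -30 to +30 window around each junction, and a candidate would be accepted when its combined score exceeds a threshold; both the combination rule and the threshold are left open here because the record does not state them.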