Statements in which the resource exists as a subject.
Predicate Object
rdf:type
lifeskim:mentions
pubmed:dateCreated
1997-01-30
pubmed:abstractText
We consider the problem of automatic discovery of patterns and the corresponding subfamilies in a set of biosequences. The sequences are unaligned and may contain noise of unknown level. The patterns are of the type used in PROSITE database. In our approach we discover patterns and the respective subfamilies simultaneously. We develop a theoretically substantiated significance measure for a set of such patterns and an algorithm approximating the best pattern set and the subfamilies. The approach is based on the minimum description length (MDL) principle. We report a computing experiment correctly finding subfamilies in the family of chromo domains and revealing new strong patterns.
pubmed:language
eng
pubmed:journal
pubmed:citationSubset
IM
pubmed:status
MEDLINE
pubmed:issn
1553-0833
pubmed:author
pubmed:issnType
Print
pubmed:volume
4
pubmed:owner
NLM
pubmed:authorsComplete
Y
pubmed:pagination
34-43
pubmed:dateRevised
2005-11-16
pubmed:meshHeading
pubmed:year
1996
pubmed:articleTitle
Discovering patterns and subfamilies in biosequences.
pubmed:affiliation
Institute of Mathematics and Computer Science, University of Latvia, Riga, Latvia. abrozma@cclu.lv
pubmed:publicationType
Journal Article, Review