Statements in which the resource exists as a subject.
Predicate / Object
rdf:type
lifeskim:mentions
pubmed:issue
3
pubmed:dateCreated
2005-5-9
pubmed:abstractText
We introduce a new method for identifying optimal incomplete data sets from large sequence databases based on the graph theoretic concept of alpha-quasi-bicliques. The quasi-biclique method searches large sequence databases to identify useful phylogenetic data sets with a specified amount of missing data while maintaining the necessary amount of overlap among genes and taxa. The utility of the quasi-biclique method is demonstrated on large simulated sequence databases and on a data set of green plant sequences from GenBank. The quasi-biclique method greatly increases the taxon and gene sampling in the data sets while adding only a limited amount of missing data. Furthermore, under the conditions of the simulation, data sets with a limited amount of missing data often produce topologies nearly as accurate as those built from complete data sets. The quasi-biclique method will be an effective tool for exploiting sequence databases for phylogenetic information and also may help identify critical sequences needed to build large phylogenetic data sets.
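The abstract does not spell out the alpha-quasi-biclique condition, so the following is a minimal sketch that assumes the common definition. Under that definition, every selected taxon lacks sequence data for at most a fraction alpha of the selected genes, and every selected gene is missing from at most a fraction alpha of the selected taxa. The function names and the taxon-to-genes dictionary are hypothetical illustrations. The sketch only checks whether a candidate data set qualifies and reports its missing-data fraction. It does not reproduce the authors' search for an optimal data set.

```python
from typing import Dict, Set

# Hypothetical input: maps each taxon to the set of genes it has sequences for.
Presence = Dict[str, Set[str]]


def is_alpha_quasi_biclique(presence: Presence, taxa: Set[str],
                            genes: Set[str], alpha: float) -> bool:
    """Check the (assumed) alpha-quasi-biclique condition for a taxa x genes submatrix."""
    if not taxa or not genes:
        return False
    # Each taxon may lack at most alpha * |genes| of the selected genes.
    for t in taxa:
        missing = len(genes - presence.get(t, set()))
        if missing > alpha * len(genes):
            return False
    # Each gene may be absent from at most alpha * |taxa| of the selected taxa.
    for g in genes:
        missing = sum(1 for t in taxa if g not in presence.get(t, set()))
        if missing > alpha * len(taxa):
            return False
    return True


def missing_fraction(presence: Presence, taxa: Set[str], genes: Set[str]) -> float:
    """Fraction of empty cells in the taxa x genes data matrix."""
    total = len(taxa) * len(genes)
    filled = sum(len(presence.get(t, set()) & genes) for t in taxa)
    return (total - filled) / total


# Toy example (invented data): four taxa, four genes, three missing cells.
EXAMPLE: Presence = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g2", "g3"},
    "C": {"g1", "g2", "g4"},
    "D": {"g2", "g3", "g4"},
}
EXAMPLE_GENES = {"g1", "g2", "g3", "g4"}
```

With alpha = 0.25 the toy matrix qualifies, because no taxon or gene has more than one missing cell. With alpha = 0 it does not, since that setting requires a complete data set.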
pubmed:language
eng
pubmed:journal
pubmed:citationSubset
IM
pubmed:status
MEDLINE
pubmed:month
Jun
pubmed:issn
1055-7903
pubmed:author
pubmed:issnType
Print
pubmed:volume
35
pubmed:owner
NLM
pubmed:authorsComplete
Y
pubmed:pagination
528-35
pubmed:dateRevised
2006-11-15
pubmed:meshHeading
pubmed:year
2005
pubmed:articleTitle
Identifying optimal incomplete phylogenetic data sets from sequence databases.
pubmed:affiliation
Department of Computer Science, Iowa State University, Ames, IA 50011, USA.
pubmed:publicationType
Journal Article, Comparative Study, Research Support, U.S. Gov't, Non-P.H.S.