Statements in which the resource exists as a subject.
Predicate Object
rdf:type
lifeskim:mentions
pubmed:dateCreated
2007-3-19
pubmed:abstractText
Many biological databases contain a large number of variables, among which events of interest may be very infrequent. Using a single data mining method to analyze such databases may not find adequate predictors. The HIV Drug Resistance Database at Stanford University stores sequential HIV-1 genotype-test results on patients taking antiretroviral drugs. We have analyzed the infrequent event of gene mutation changes by combining three data mining methods. We first use association rule analysis to scan through the database and identify potentially interesting mutation patterns with relatively high frequency. Next, we use logistic regression and classification trees to further investigate these patterns by analyzing the relationship between treatment history and mutation changes. Although the AUC measure of the overall prediction is not very high, our approach can effectively identify strong predictors of mutation change and thus focus the analytic efforts of researchers in verifying these results.
pubmed:grant
pubmed:language
eng
pubmed:journal
pubmed:citationSubset
IM
pubmed:chemical
pubmed:status
MEDLINE
pubmed:issn
1752-7791
pubmed:author
pubmed:issnType
Print
pubmed:owner
NLM
pubmed:authorsComplete
Y
pubmed:pagination
385-8
pubmed:dateRevised
2007-12-3
pubmed:meshHeading
pubmed:year
2006
pubmed:articleTitle
A combined data mining approach for infrequent events: analyzing HIV mutation changes based on treatment history.
pubmed:affiliation
Stanford Medical Informatics, Department of Medicine, Stanford University, Stanford, CA 94305, USA. raylin@stanford.edu
pubmed:publicationType
Journal Article; Research Support, N.I.H., Extramural