Statements in which the resource exists as a subject.
Predicate Object
rdf:type
lifeskim:mentions
pubmed:issue
Pt 3
pubmed:dateCreated
2009-5-21
pubmed:abstractText
Most common human diseases are likely to have complex etiologies. Methods of analysis that allow for the phenomenon of epistasis are of growing interest in the genetic dissection of complex diseases. By allowing for epistatic interactions between potential disease loci, we may succeed in identifying genetic variants that might otherwise have remained undetected. Here we aimed to analyze the ability of logistic regression (LR) and two tree-based supervised learning methods, classification and regression trees (CART) and random forest (RF), to detect epistasis. Multifactor-dimensionality reduction (MDR) was also used for comparison. Our approach involves first the simulation of datasets of autosomal biallelic unphased and unlinked single nucleotide polymorphisms (SNPs), each containing a two-loci interaction (causal SNPs) and 98 'noise' SNPs. We modelled interactions under different scenarios of sample size, missing data, minor allele frequencies (MAF) and several penetrance models: three involving both (indistinguishable) marginal effects and interaction, and two simulating pure interaction effects. In total, we have simulated 99 different scenarios. Although CART, RF, and LR yield similar results in terms of detection of true association, CART and RF perform better than LR with respect to classification error. MAF, penetrance model, and sample size are greater determining factors than percentage of missing data in the ability of the different techniques to detect true association. In pure interaction models, only RF detects association. In conclusion, tree-based methods and LR are important statistical tools for the detection of unknown interactions among true risk-associated SNPs with marginal effects and in the presence of a significant number of noise SNPs. In pure interaction models, RF performs reasonably well in the presence of large sample sizes and low percentages of missing data. However, when the study design is suboptimal (unfavourable to detect interaction in terms of e.g. sample size and MAF) there is a high chance of detecting false, spurious associations.
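The simulation design summarized in the abstract (unphased, unlinked biallelic SNPs, one causal two-locus interaction plus 98 noise SNPs, varying MAF, penetrance model, sample size and missing data) can be sketched roughly as follows. This is a minimal illustration, not the authors' code, and it rests on assumptions the record does not state: genotypes are drawn independently under Hardy-Weinberg equilibrium, cases and controls are obtained by rejection sampling, noise-SNP MAFs are drawn uniformly between 0.05 and 0.5, and missing genotypes are masked completely at random. The table PURE_EPISTASIS is a hypothetical textbook-style pure interaction model, not one of the study's five penetrance models, and all function names are illustrative.

```python
import random

N_NOISE = 98  # per the abstract: 98 'noise' SNPs alongside the two causal SNPs

# Hypothetical pure-interaction penetrance table: PURE_EPISTASIS[g1][g2] is
# P(case | genotypes g1, g2), genotypes coded as minor-allele counts 0/1/2.
PURE_EPISTASIS = [[0.0, 0.0, 0.2],
                  [0.0, 0.1, 0.0],
                  [0.2, 0.0, 0.0]]


def hwe_freqs(maf):
    """Genotype frequencies (0, 1, 2 minor alleles) under Hardy-Weinberg."""
    p, q = 1.0 - maf, maf
    return [p * p, 2 * p * q, q * q]


def sample_genotype(maf, rng):
    """Unphased biallelic genotype: two independent allele draws (HWE)."""
    return int(rng.random() < maf) + int(rng.random() < maf)


def marginal_penetrance(table, maf_other, locus):
    """P(case | genotype at one causal locus), averaged over the other locus."""
    f = hwe_freqs(maf_other)
    if locus == 0:
        return [sum(table[g][h] * f[h] for h in range(3)) for g in range(3)]
    return [sum(table[h][g] * f[h] for h in range(3)) for g in range(3)]


def simulate_case_control(table, maf1, maf2, n_cases, n_controls,
                          missing_rate=0.0, seed=0):
    """Rejection-sample individuals until both groups are full.

    Columns 0 and 1 are the causal SNPs, columns 2..99 the noise SNPs.
    Missing genotypes are set to None completely at random (assumption).
    """
    rng = random.Random(seed)
    # Hypothetical choice: noise-SNP MAFs uniform between 0.05 and 0.5.
    noise_mafs = [rng.uniform(0.05, 0.5) for _ in range(N_NOISE)]
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        g1 = sample_genotype(maf1, rng)
        g2 = sample_genotype(maf2, rng)
        is_case = rng.random() < table[g1][g2]
        group = cases if is_case else controls
        limit = n_cases if is_case else n_controls
        if len(group) >= limit:
            continue  # this group is already full; discard the individual
        row = [g1, g2] + [sample_genotype(m, rng) for m in noise_mafs]
        row = [None if rng.random() < missing_rate else g for g in row]
        group.append(row)
    return cases, controls


if __name__ == "__main__":
    # At MAF 0.5 this table has flat marginal penetrance at both loci,
    # so neither causal SNP shows a marginal effect on its own.
    print(marginal_penetrance(PURE_EPISTASIS, 0.5, 0))
    cases, controls = simulate_case_control(
        PURE_EPISTASIS, 0.5, 0.5, 200, 200, missing_rate=0.05)
    print(len(cases), len(controls))
```

The first print should show three equal values (0.05 each, up to floating-point rounding): this is what makes the model "pure interaction", since a single-locus test sees no association. The flatness holds only at MAF 0.5 for this particular table. At other MAFs the same table produces marginal effects, which is one way MAF can change how detectable an interaction is.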
pubmed:language
eng
pubmed:journal
pubmed:citationSubset
IM
pubmed:status
MEDLINE
pubmed:month
May
pubmed:issn
1469-1809
pubmed:author
pubmed:issnType
Electronic
pubmed:volume
73
pubmed:owner
NLM
pubmed:authorsComplete
Y
pubmed:pagination
360-9
pubmed:meshHeading
pubmed:year
2009
pubmed:articleTitle
Evaluating the ability of tree-based methods and logistic regression for the detection of SNP-SNP interaction.
pubmed:affiliation
Unidade de Xenética, Instituto de Medicina Legal and Departamento de Anatomía Patológica y Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, Galicia, Spain.
pubmed:publicationType
Journal Article, Research Support, Non-U.S. Gov't, Evaluation Studies