Statements in which the resource exists as a subject.
Predicate Object
rdf:type
lifeskim:mentions
pubmed:issue
2
pubmed:dateCreated
2005-04-26
pubmed:abstractText
Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their differential expressions among phenotypes and pick the top-ranked genes. We observe that feature sets so obtained have certain redundancy and study methods to minimize it. We propose a minimum redundancy - maximum relevance (MRMR) feature selection framework. Genes selected via MRMR provide a more balanced coverage of the space and capture broader characteristics of phenotypes. They lead to significantly improved class predictions in extensive experiments on 6 gene expression data sets: NCI, Lymphoma, Lung, Child Leukemia, Leukemia, and Colon. Improvements are observed consistently among 4 classification methods: Naive Bayes, Linear discriminant analysis, Logistic regression, and Support vector machines. SUPPLEMENTARY: The top 60 MRMR genes for each of the datasets are listed in http://crd.lbl.gov/~cding/MRMR/. More information related to MRMR methods can be found at http://www.hpeng.net/.
pubmed:grant
pubmed:language
eng
pubmed:journal
pubmed:citationSubset
IM
pubmed:status
MEDLINE
pubmed:month
Apr
pubmed:issn
0219-7200
pubmed:author
pubmed:issnType
Print
pubmed:volume
3
pubmed:owner
NLM
pubmed:authorsComplete
Y
pubmed:pagination
185-205
pubmed:dateRevised
2007-11-14
pubmed:meshHeading
pubmed:year
2005
pubmed:articleTitle
Minimum redundancy feature selection from microarray gene expression data.
pubmed:affiliation
Computational Research Division, Lawrence Berkeley National Laboratory, University of California, Berkeley, CA 94720, USA. chqding@lbl.gov
pubmed:publicationType
Journal Article; Comparative Study; Research Support, U.S. Gov't, P.H.S.; Research Support, U.S. Gov't, Non-P.H.S.; Evaluation Studies; Research Support, N.I.H., Extramural
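
The abstract in pubmed:abstractText describes picking genes that are relevant to the phenotype while avoiding genes that are redundant with each other. The following is a minimal sketch of that idea, not the authors' implementation. It makes several assumptions the record does not state: expression values are already discretized, both relevance and redundancy are measured with mutual information, the two are combined by simple subtraction, and selection is greedy. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * log(count * n / (px[x] * py[y]))
    return mi


def mrmr_select(features, labels, k):
    """Greedily pick up to k feature names.

    features: dict mapping a feature name to a list of discrete values.
    Score (assumed here): relevance to the labels minus the mean
    redundancy with the features already selected.
    """
    relevance = {
        name: mutual_information(values, labels)
        for name, values in features.items()
    }
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s])
                for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: gene_b duplicates gene_a, gene_c is weaker but different.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "gene_a": [0, 0, 0, 0, 1, 1, 1, 0],
    "gene_b": [0, 0, 0, 0, 1, 1, 1, 0],
    "gene_c": [0, 1, 0, 0, 1, 1, 0, 1],
}
print(mrmr_select(features, labels, 2))  # ['gene_a', 'gene_c']
```

In this toy case a plain relevance ranking would take gene_a and gene_b, which carry identical information. The redundancy penalty pushes the duplicate down, so the weaker but complementary gene_c is chosen second.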