Statements in which the resource exists as a subject.
Predicate Object
rdf:type
lifeskim:mentions
pubmed:dateCreated
2008-5-29
pubmed:abstractText
Given the variety of available clustering methods for gene expression data analysis, it is important to develop an appropriate and rigorous validation scheme to assess the performance and limitations of the most widely used clustering algorithms. In this paper, we present a ground truth based comparative study on the functionality, accuracy, and stability of five data clustering methods, namely hierarchical clustering, K-means clustering, self-organizing maps, standard finite normal mixture fitting, and a caBIG toolkit (VIsual Statistical Data Analyzer--VISDA), tested on sample clustering of seven published microarray gene expression datasets and one synthetic dataset. We examined the performance of these algorithms in both data-sufficient and data-insufficient cases using quantitative performance measures, including cluster number detection accuracy and mean and standard deviation of partition accuracy. The experimental results showed that VISDA, an interactive coarse-to-fine maximum likelihood fitting algorithm, is a solid performer on most of the datasets, while K-means clustering and self-organizing maps optimized by the mean squared compactness criterion generally produce more stable solutions than the other methods.
pubmed:grant
pubmed:language
eng
pubmed:journal
pubmed:citationSubset
IM
pubmed:status
MEDLINE
pubmed:issn
1093-4715
pubmed:author
pubmed:issnType
Electronic
pubmed:volume
13
pubmed:owner
NLM
pubmed:authorsComplete
Y
pubmed:pagination
3839-49
pubmed:meshHeading
pubmed:year
2008
pubmed:articleTitle
A ground truth based comparative study on clustering of gene expression data.
pubmed:affiliation
Department of Electrical and Computer Engineering, Virginia Polytechnic and State University, Arlington, VA 22203, USA.
pubmed:publicationType
Journal Article, Review, Research Support, Non-U.S. Gov't, Research Support, N.I.H., Extramural