Statements in which the resource exists as a subject.
Predicate | Object
rdf:type |
lifeskim:mentions |
pubmed:dateCreated | 2002-10-18
pubmed:abstractText | MOTIVATION: Clustering co-expressed genes usually requires the definition of 'distance' or 'similarity' between measured datasets, the most common choices being Pearson correlation or Euclidean distance. With the size of available datasets steadily increasing, it has become feasible to consider other, more general, definitions as well. One alternative, based on information theory, is the mutual information, providing a general measure of dependencies between variables. While the use of mutual information in cluster analysis and visualization of large-scale gene expression data has been suggested previously, the earlier studies did not focus on comparing different algorithms to estimate the mutual information from finite data. RESULTS: Here we describe and review several approaches to estimate the mutual information from finite datasets. Our findings show that the algorithms used so far may be quite substantially improved upon. In particular when dealing with small datasets, finite sample effects and other sources of potentially misleading results have to be taken into account.
pubmed:language | eng
pubmed:journal |
pubmed:citationSubset | IM
pubmed:status | MEDLINE
pubmed:issn | 1367-4803
pubmed:author |
pubmed:issnType | Print
pubmed:volume | 18 Suppl 2
pubmed:owner | NLM
pubmed:authorsComplete | Y
pubmed:pagination | S231-40
pubmed:dateRevised | 2006-11-15
pubmed:meshHeading |
pubmed:year | 2002
pubmed:articleTitle | The mutual information: detecting and evaluating dependencies between variables.
pubmed:affiliation | University Potsdam, Nonlinear Dynamics Group, Germany. steuer@agnld.uni-potsdam.de
pubmed:publicationType | Journal Article, Comparative Study, Evaluation Studies
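The abstract above concerns estimating the mutual information I(X;Y) from finite datasets. As a point of reference, the sketch below implements the naive histogram ("plug-in") estimator, the simple baseline that finite-sample-aware algorithms improve upon; the bin count and the test data are illustrative assumptions, not the authors' specific algorithms.

```python
import math
import random

def mutual_information(xs, ys, bins=10):
    """Naive histogram ("plug-in") estimate of I(X;Y) in nats.

    Bins both variables into `bins` equal-width cells and plugs the
    empirical cell frequencies into I = sum p(x,y) log[p(x,y)/(p(x)p(y))].
    """
    n = len(xs)
    xlo, xhi = min(xs), max(xs)
    ylo, yhi = min(ys), max(ys)

    def bin_index(v, lo, hi):
        # Map v to one of `bins` equal-width bins on [lo, hi].
        if v >= hi:
            return bins - 1
        return int((v - lo) / (hi - lo) * bins)

    # Joint histogram over (x-bin, y-bin) cells.
    joint = {}
    for x, y in zip(xs, ys):
        key = (bin_index(x, xlo, xhi), bin_index(y, ylo, yhi))
        joint[key] = joint.get(key, 0) + 1

    # Marginal counts, obtained by summing the joint counts.
    px, py = {}, {}
    for (i, j), c in joint.items():
        px[i] = px.get(i, 0) + c
        py[j] = py.get(j, 0) + c

    # Plug-in estimate; only non-empty cells contribute.
    mi = 0.0
    for (i, j), c in joint.items():
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi

random.seed(1)
x = [random.gauss(0, 1) for _ in range(5000)]
y_dep = [v + 0.5 * random.gauss(0, 1) for v in x]  # dependent on x
y_ind = [random.gauss(0, 1) for _ in range(5000)]  # independent of x

print(mutual_information(x, y_dep))  # substantially positive
print(mutual_information(x, y_ind))  # small but nonzero: finite-sample bias
```

Note the second estimate: even for independent variables the plug-in estimator returns a strictly positive value on finite data, which is exactly the kind of finite-sample effect the abstract warns must be taken into account for small datasets.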