Source:http://linkedlifedata.com/resource/pubmed/id/19040364
Switch to
Predicate | Object |
---|---|
rdf:type | |
lifeskim:mentions | |
pubmed:issue |
10
|
pubmed:dateCreated |
2008-12-1
|
pubmed:abstractText |
The biclustering problem has been extensively studied in many areas, including e-commerce, data mining, machine learning, pattern recognition, statistics, and, more recently, computational biology. Given an n x m matrix A (n >or= m), the main goal of biclustering is to identify a subset of rows (called objects) and a subset of columns (called properties) such that some objective function that specifies the quality of the found bicluster (formed by the subsets of rows and of columns of A) is optimized. The problem has been proved or conjectured to be NP-hard for various objective functions. In this article, we study a probabilistic model for the implanted additive bicluster problem, where each element in the n x m background matrix is a random integer from [0, L - 1] for some integer L, and a k x k implanted additive bicluster is obtained from an error-free additive bicluster by randomly changing each element to a number in [0, L - 1] with probability theta. We propose an O(n(2)m) time algorithm based on voting to solve the problem. We show that when k >or= Omega(square root of (n log n)), the voting algorithm can correctly find the implanted bicluster with probability at least 1 - (9/n(2)). We also implement our algorithm as a C++ program named VOTE. The implementation incorporates several ideas for estimating the size of an implanted bicluster, adjusting the threshold in voting, dealing with small biclusters, and dealing with overlapping implanted biclusters. Our experimental results on both simulated and real datasets show that VOTE can find biclusters with a high accuracy and speed.
|
pubmed:grant | |
pubmed:commentsCorrections |
http://linkedlifedata.com/resource/pubmed/commentcorrection/19040364-10359783,
http://linkedlifedata.com/resource/pubmed/commentcorrection/19040364-10582567,
http://linkedlifedata.com/resource/pubmed/commentcorrection/19040364-11102521,
http://linkedlifedata.com/resource/pubmed/commentcorrection/19040364-12671006,
http://linkedlifedata.com/resource/pubmed/commentcorrection/19040364-14668247,
http://linkedlifedata.com/resource/pubmed/commentcorrection/19040364-15044247,
http://linkedlifedata.com/resource/pubmed/commentcorrection/19040364-16500941,
http://linkedlifedata.com/resource/pubmed/commentcorrection/19040364-16551664,
http://linkedlifedata.com/resource/pubmed/commentcorrection/19040364-17007074,
http://linkedlifedata.com/resource/pubmed/commentcorrection/19040364-17048406,
http://linkedlifedata.com/resource/pubmed/commentcorrection/19040364-17090578
|
pubmed:language |
eng
|
pubmed:journal | |
pubmed:citationSubset |
IM
|
pubmed:status |
MEDLINE
|
pubmed:month |
Dec
|
pubmed:issn |
1557-8666
|
pubmed:author | |
pubmed:issnType |
Electronic
|
pubmed:volume |
15
|
pubmed:owner |
NLM
|
pubmed:authorsComplete |
Y
|
pubmed:pagination |
1275-93
|
pubmed:dateRevised |
2011-8-1
|
pubmed:meshHeading |
pubmed-meshheading:19040364-Algorithms,
pubmed-meshheading:19040364-Cluster Analysis,
pubmed-meshheading:19040364-Colonic Neoplasms,
pubmed-meshheading:19040364-Computational Biology,
pubmed-meshheading:19040364-Models, Statistical,
pubmed-meshheading:19040364-Pattern Recognition, Automated,
pubmed-meshheading:19040364-Politics,
pubmed-meshheading:19040364-Software
|
pubmed:year |
2008
|
pubmed:articleTitle |
An efficient voting algorithm for finding additive biclusters with random background.
|
pubmed:affiliation |
Department of Computer Science and Technology, Tsinghua University, Beijing, China.
|
pubmed:publicationType |
Journal Article,
Research Support, U.S. Gov't, Non-P.H.S.,
Research Support, Non-U.S. Gov't,
Research Support, N.I.H., Extramural
|