Statements in which the resource exists as a subject.
Predicate Object
rdf:type
lifeskim:mentions
pubmed:dateCreated
2010-02-03
pubmed:abstractText
In supervised learning, traditional approaches to building a classifier use two sets of examples with pre-defined classes along with a learning algorithm. The main limitation of this approach is that examples from both classes are required, which might be infeasible in certain cases, especially those dealing with biological data. Such is the case for membrane-binding peripheral domains, which play important roles in many biological processes, including cell signaling and membrane trafficking, by reversibly binding to membranes. For these domains, a well-defined positive set of domains known to bind membranes is available, along with a large unlabeled set of domains whose membrane-binding affinities have not been measured. The aforementioned limitation can be addressed by a special class of semi-supervised machine learning called positive-unlabeled (PU) learning, which uses a positive set together with a large unlabeled set. METHODS: In this study, we implement the first application of PU-learning to a protein function prediction problem: identification of peripheral domains. PU-learning starts by iteratively identifying reliable negative (RN) examples from the unlabeled set until convergence, and then builds a classifier using the positive set and the final RN set. A data set of 232 positive cases and ~3750 unlabeled ones was used to construct and validate the protocol.
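The iterative RN-selection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' protocol: the nearest-centroid classifier is a stand-in chosen to keep the example dependency-free, the starting assumption that every unlabeled example is negative is one common way to seed the loop, and the names `centroid`, `is_negative`, `pu_learn`, and `max_iter` are hypothetical.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [fmean(column) for column in zip(*points)]


def is_negative(x, pos_center, neg_center):
    """Stand-in classifier (an assumption): assign x to the nearer centroid."""
    return dist(x, neg_center) < dist(x, pos_center)


def pu_learn(positives, unlabeled, max_iter=100):
    """Shrink the unlabeled set to reliable negatives (RN) until it stops changing.

    Returns (pos_center, neg_center, reliable_negatives); the two centroids
    together form the final classifier built from the positive and RN sets.
    """
    pos_center = centroid(positives)
    rn = list(unlabeled)  # initial guess: every unlabeled example is negative
    for _ in range(max_iter):
        neg_center = centroid(rn)
        # Keep only the examples the current classifier still calls negative.
        kept = [x for x in rn if is_negative(x, pos_center, neg_center)]
        if not kept:
            raise ValueError("no reliable negatives left")
        if len(kept) == len(rn):  # converged: RN set unchanged
            break
        rn = kept
    return pos_center, centroid(rn), rn
```

Calling `pu_learn(positives, unlabeled)` with two lists of numeric feature vectors returns the two centroids and the final RN set; a new example `x` can then be labeled with `is_negative(x, pos_center, neg_center)`. A real application would replace the centroid rule with a stronger learner and validate on held-out positives.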
pubmed:grant
pubmed:commentsCorrections
http://linkedlifedata.com/resource/pubmed/commentcorrection/20122235-10802651, http://linkedlifedata.com/resource/pubmed/commentcorrection/20122235-14675776, http://linkedlifedata.com/resource/pubmed/commentcorrection/20122235-15453915, http://linkedlifedata.com/resource/pubmed/commentcorrection/20122235-16632598, http://linkedlifedata.com/resource/pubmed/commentcorrection/20122235-16839875, http://linkedlifedata.com/resource/pubmed/commentcorrection/20122235-17351049, http://linkedlifedata.com/resource/pubmed/commentcorrection/20122235-17645808, http://linkedlifedata.com/resource/pubmed/commentcorrection/20122235-17986450, http://linkedlifedata.com/resource/pubmed/commentcorrection/20122235-18423832, http://linkedlifedata.com/resource/pubmed/commentcorrection/20122235-18516045, http://linkedlifedata.com/resource/pubmed/commentcorrection/20122235-18552854, http://linkedlifedata.com/resource/pubmed/commentcorrection/20122235-18978772, http://linkedlifedata.com/resource/pubmed/commentcorrection/20122235-19015660, http://linkedlifedata.com/resource/pubmed/commentcorrection/20122235-19174456, http://linkedlifedata.com/resource/pubmed/commentcorrection/20122235-19287394, http://linkedlifedata.com/resource/pubmed/commentcorrection/20122235-19564845, http://linkedlifedata.com/resource/pubmed/commentcorrection/20122235-19858364
pubmed:language
eng
pubmed:journal
pubmed:citationSubset
IM
pubmed:chemical
pubmed:status
MEDLINE
pubmed:issn
1471-2105
pubmed:author
pubmed:issnType
Electronic
pubmed:volume
11 Suppl 1
pubmed:owner
NLM
pubmed:authorsComplete
Y
pubmed:pagination
S6
pubmed:dateRevised
2010-12-28
pubmed:meshHeading
pubmed:year
2010
pubmed:articleTitle
Genome-wide sequence-based prediction of peripheral proteins using a novel semi-supervised learning technique.
pubmed:affiliation
Bioinformatics Program, Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA. nitin.bhardwaj@yale.edu
pubmed:publicationType
Journal Article; Research Support, Non-U.S. Gov't; Research Support, N.I.H., Extramural