14990442

Source: http://linkedlifedata.com/resource/pubmed/id/14990442

Statements in which the resource exists as a subject.
Predicate	Object
rdf:type	pubmed:Citation
lifeskim:mentions	umls-concept:C0008902, umls-concept:C0033684, umls-concept:C1547402, umls-concept:C1881303, umls-concept:C1881865
pubmed:issue	4
pubmed:dateCreated	2004-3-1
pubmed:abstractText	MOTIVATION: Classification of protein sequences into functional and structural families based on sequence homology is a central problem in computational biology. Discriminative supervised machine learning approaches provide good performance, but simplicity and computational efficiency of training and prediction are also important concerns. RESULTS: We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the problem of protein classification and remote homology detection. These kernels measure sequence similarity based on shared occurrences of fixed-length patterns in the data, allowing for mutations between patterns. Thus, the kernels provide a biologically well-motivated way to compare protein sequences without relying on family-based generative models such as hidden Markov models. We compute the kernels efficiently using a mismatch tree data structure, allowing us to calculate the contributions of all patterns occurring in the data in one pass while traversing the tree. When used with an SVM, the kernels enable fast prediction on test sequences. We report experiments on two benchmark SCOP datasets, where we show that the mismatch kernel used with an SVM classifier performs competitively with state-of-the-art methods for homology detection, particularly when very few training examples are available. Examination of the highest-weighted patterns learned by the SVM classifier recovers biologically important motifs in protein families and superfamilies.
pubmed:grant	http://linkedlifedata.com/resource/pubmed/grant/LM07276-02
pubmed:language	eng
pubmed:journal	http://linkedlifedata.com/resource/pubmed/journal/9808944
pubmed:citationSubset	IM
pubmed:chemical	http://linkedlifedata.com/resource/pubmed/chemical/Nuclear Proteins, http://linkedlifedata.com/resource/pubmed/chemical/PHLPP1 protein, human, http://linkedlifedata.com/resource/pubmed/chemical/Phosphoprotein Phosphatases, http://linkedlifedata.com/resource/pubmed/chemical/Proteins
pubmed:status	MEDLINE
pubmed:month	Mar
pubmed:issn	1367-4803
pubmed:author	pubmed-author:CohenAdielA, pubmed-author:EskinEleazarE, pubmed-author:LeslieChristina SCS, pubmed-author:NobleWilliam StaffordWS, pubmed-author:WestonJasonJ
pubmed:issnType	Print
pubmed:day	1
pubmed:volume	20
pubmed:owner	NLM
pubmed:authorsComplete	Y
pubmed:pagination	467-76
pubmed:dateRevised	2010-11-18
pubmed:meshHeading	pubmed-meshheading:14990442-Algorithms, pubmed-meshheading:14990442-Amino Acid Sequence, pubmed-meshheading:14990442-Artificial Intelligence, pubmed-meshheading:14990442-Molecular Sequence Data, pubmed-meshheading:14990442-Nuclear Proteins, pubmed-meshheading:14990442-Pattern Recognition, Automated, pubmed-meshheading:14990442-Phosphoprotein Phosphatases, pubmed-meshheading:14990442-Proteins, pubmed-meshheading:14990442-Sequence Alignment, pubmed-meshheading:14990442-Sequence Analysis, Protein, pubmed-meshheading:14990442-Sequence Homology, Amino Acid
pubmed:year	2004
pubmed:articleTitle	Mismatch string kernels for discriminative protein classification.
pubmed:affiliation	Department of Computer Science, Columbia University, 1214 Amsterdam Avenue, Mail Code 0401, New York, NY 10027, USA. cleslie@cs.columbia.edu
pubmed:publicationType	Journal Article, Comparative Study, Research Support, U.S. Gov't, P.H.S., Research Support, U.S. Gov't, Non-P.H.S., Research Support, Non-U.S. Gov't, Evaluation Studies, Validation Studies
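The abstract describes the mismatch kernel only informally: two sequences are compared through shared occurrences of fixed-length patterns, with some mutations allowed between patterns. Below is a minimal Python sketch of that idea, under stated assumptions. The pattern length k, the mismatch count m, the 20-letter amino-acid alphabet, Hamming distance as the notion of "mismatch", and all function names are illustrative choices introduced here, not details taken from this record. The sketch enumerates each k-mer's mismatch neighborhood explicitly, which is a different computational strategy from the mismatch tree traversal mentioned in the abstract. The cosine normalization is a common choice for string kernels and is likewise an assumption.

```python
import math
from collections import Counter
from itertools import combinations, product

# Assumption: the alphabet is the 20 standard amino-acid one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mismatch_neighborhood(kmer, m, alphabet=AMINO_ACIDS):
    """All strings of length len(kmer) within Hamming distance m of kmer (kmer included)."""
    neighbors = {kmer}
    for d in range(1, m + 1):
        for positions in combinations(range(len(kmer)), d):
            # At each chosen position, substitute every letter except the original one.
            choices = [[a for a in alphabet if a != kmer[p]] for p in positions]
            for letters in product(*choices):
                chars = list(kmer)
                for p, a in zip(positions, letters):
                    chars[p] = a
                neighbors.add("".join(chars))
    return neighbors


def mismatch_features(seq, k, m, alphabet=AMINO_ACIDS):
    """Sparse feature vector: entry beta counts the k-mers of seq within m mismatches of beta."""
    features = Counter()
    for i in range(len(seq) - k + 1):
        for beta in mismatch_neighborhood(seq[i:i + k], m, alphabet):
            features[beta] += 1
    return features


def mismatch_kernel(x, y, k=5, m=1, alphabet=AMINO_ACIDS):
    """Inner product of the two sparse mismatch feature vectors (hypothetical defaults k=5, m=1)."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    if len(fx) > len(fy):
        fx, fy = fy, fx  # iterate over the smaller dictionary
    # Counter returns 0 for absent keys without inserting them.
    return sum(count * fy[beta] for beta, count in fx.items())


def normalized_mismatch_kernel(x, y, k=5, m=1, alphabet=AMINO_ACIDS):
    """Cosine-normalized kernel, so every non-empty sequence has self-similarity 1."""
    kxx = mismatch_kernel(x, x, k, m, alphabet)
    kyy = mismatch_kernel(y, y, k, m, alphabet)
    if kxx == 0 or kyy == 0:
        return 0.0  # a sequence shorter than k has an empty feature vector
    return mismatch_kernel(x, y, k, m, alphabet) / math.sqrt(kxx * kyy)


if __name__ == "__main__":
    a = "MKTAYIAKQRQISFVKSHFSRQ"
    b = "MKTAYIAKQRQLSFVKSHFSRQ"  # one substitution relative to a
    print(mismatch_kernel(a, b, k=5, m=1))
    print(normalized_mismatch_kernel(a, b, k=5, m=1))
```

Setting m=0 reduces the sketch to a plain shared k-mer count. To train a classifier in the way the abstract describes, pairwise kernel values could be collected into a Gram matrix and passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Each explicit neighborhood holds about C(k,m)(|alphabet|-1)^m strings at the largest distance, so memory grows quickly with m. The mismatch tree described in the abstract instead calculates the contributions of all patterns in one pass while traversing the tree.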