18313074

Source: http://linkedlifedata.com/resource/pubmed/id/18313074

Statements in which the resource exists as a subject.
Predicate	Object
rdf:type	pubmed:Citation
lifeskim:mentions	umls-concept:C0012632, umls-concept:C0150098, umls-concept:C0243071, umls-concept:C0443203, umls-concept:C0678594, umls-concept:C1334043, umls-concept:C2827421
pubmed:issue	4
pubmed:dateCreated	2008-3-21
pubmed:abstractText	A natural way to study protein sequence, structure, and function is to put them in the context of evolution. Homologs inherit similarities from their common ancestor, while analogs converge to similar structures due to a limited number of energetically favorable ways to pack secondary structural elements. Using novel strategies, we previously assembled two reliable databases of homologs and analogs. In this study, we compare these two data sets and develop a support vector machine (SVM)-based classifier to discriminate between homologs and analogs. The classifier uses a number of well-known similarity scores. We observe that although both structure scores and sequence scores contribute to SVM performance, profile sequence scores computed based on structural alignments are the best discriminators between remote homologs and structural analogs. We apply our classifier to a representative set from the expert-constructed database, Structural Classification of Proteins (SCOP). The SVM classifier recovers 76% of the remote homologs defined as domains in the same SCOP superfamily but from different families. More importantly, we also detect and discuss interesting homologous relationships between SCOP domains from different superfamilies, folds, and even classes.
pubmed:grant	http://linkedlifedata.com/resource/pubmed/grant/GM67165
pubmed:language	eng
pubmed:journal	http://linkedlifedata.com/resource/pubmed/journal/2985088R
pubmed:citationSubset	IM
pubmed:status	MEDLINE
pubmed:month	Apr
pubmed:issn	1089-8638
pubmed:author	pubmed-author:ChengHuaH, pubmed-author:GrishinNick VNV, pubmed-author:KimBong-HyunBH
pubmed:issnType	Electronic
pubmed:day	4
pubmed:volume	377
pubmed:owner	NLM
pubmed:authorsComplete	Y
pubmed:pagination	1265-78
pubmed:meshHeading	pubmed-meshheading:18313074-Amino Acid Sequence, pubmed-meshheading:18313074-Computational Biology, pubmed-meshheading:18313074-Databases, Protein, pubmed-meshheading:18313074-Linear Models, pubmed-meshheading:18313074-Models, Molecular, pubmed-meshheading:18313074-Molecular Sequence Data, pubmed-meshheading:18313074-Probability Theory, pubmed-meshheading:18313074-Reproducibility of Results, pubmed-meshheading:18313074-Sequence Alignment, pubmed-meshheading:18313074-Sequence Analysis, Protein, pubmed-meshheading:18313074-Sequence Homology, Amino Acid
pubmed:year	2008
pubmed:articleTitle	Discrimination between distant homologs and structural analogs: lessons from manually constructed, reliable data sets.
pubmed:affiliation	Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390-9050, USA. hua.cheng@utsouthwestern.edu
pubmed:publicationType	Journal Article; Comparative Study; Research Support, Non-U.S. Gov't; Evaluation Studies; Research Support, N.I.H., Extramural