Source:http://linkedlifedata.com/resource/pubmed/id/11021970
Predicate | Object |
---|---|
rdf:type | |
lifeskim:mentions | |
pubmed:issue | 1 |
pubmed:dateCreated | 2000-10-25 |
pubmed:abstractText | The increasing number and diversity of protein sequence families requires new methods to define and predict details regarding function. Here, we present a method for analysis and prediction of functional sub-types from multiple protein sequence alignments. Given an alignment and set of proteins grouped into sub-types according to some definition of function, such as enzymatic specificity, the method identifies positions that are indicative of functional differences by comparison of sub-type specific sequence profiles, and analysis of positional entropy in the alignment. Alignment positions with significantly high positional relative entropy correlate with those known to be involved in defining sub-types for nucleotidyl cyclases, protein kinases, lactate/malate dehydrogenases and trypsin-like serine proteases. We highlight new positions for these proteins that suggest additional experiments to elucidate the basis of specificity. The method is also able to predict sub-type for unclassified sequences. We assess several variations on a prediction method, and compare them to simple sequence comparisons. For assessment, we remove close homologues to the sequence for which a prediction is to be made (by a sequence identity above a threshold). This simulates situations where a protein is known to belong to a protein family, but is not a close relative of another protein of known sub-type. Considering the four families above, and a sequence identity threshold of 30 %, our best method gives an accuracy of 96 % compared to 80 % obtained for sequence similarity and 74 % for BLAST. We describe the derivation of a set of sub-type groupings derived from an automated parsing of alignments from PFAM and the SWISSPROT database, and use this to perform a large-scale assessment. The best method gives an average accuracy of 94 % compared to 68 % for sequence similarity and 79 % for BLAST. We discuss implications for experimental design, genome annotation and the prediction of protein function and protein intra-residue distances. |
pubmed:language | eng |
pubmed:journal | |
pubmed:citationSubset | IM |
pubmed:chemical | http://linkedlifedata.com/resource/pubmed/chemical/Adenylate Cyclase, http://linkedlifedata.com/resource/pubmed/chemical/Guanylate Cyclase, http://linkedlifedata.com/resource/pubmed/chemical/L-Lactate Dehydrogenase, http://linkedlifedata.com/resource/pubmed/chemical/Malate Dehydrogenase, http://linkedlifedata.com/resource/pubmed/chemical/Protein Kinases, http://linkedlifedata.com/resource/pubmed/chemical/Proteins, http://linkedlifedata.com/resource/pubmed/chemical/Serine Endopeptidases |
pubmed:status | MEDLINE |
pubmed:month | Oct |
pubmed:issn | 0022-2836 |
pubmed:author | |
pubmed:copyrightInfo | Copyright 2000 Academic Press. |
pubmed:issnType | Print |
pubmed:day | 13 |
pubmed:volume | 303 |
pubmed:owner | NLM |
pubmed:authorsComplete | Y |
pubmed:pagination | 61-76 |
pubmed:dateRevised | 2009-11-19 |
pubmed:meshHeading | pubmed-meshheading:11021970-Adenylate Cyclase, pubmed-meshheading:11021970-Algorithms, pubmed-meshheading:11021970-Amino Acid Sequence, pubmed-meshheading:11021970-Animals, pubmed-meshheading:11021970-Computational Biology, pubmed-meshheading:11021970-Databases as Topic, pubmed-meshheading:11021970-Entropy, pubmed-meshheading:11021970-Guanylate Cyclase, pubmed-meshheading:11021970-Humans, pubmed-meshheading:11021970-L-Lactate Dehydrogenase, pubmed-meshheading:11021970-Malate Dehydrogenase, pubmed-meshheading:11021970-Models, Molecular, pubmed-meshheading:11021970-Molecular Sequence Data, pubmed-meshheading:11021970-Protein Conformation, pubmed-meshheading:11021970-Protein Kinases, pubmed-meshheading:11021970-Proteins, pubmed-meshheading:11021970-Sensitivity and Specificity, pubmed-meshheading:11021970-Sequence Alignment, pubmed-meshheading:11021970-Serine Endopeptidases, pubmed-meshheading:11021970-Software, pubmed-meshheading:11021970-Structure-Activity Relationship, pubmed-meshheading:11021970-Substrate Specificity |
pubmed:year | 2000 |
pubmed:articleTitle | Analysis and prediction of functional sub-types from protein sequence alignments. |
pubmed:affiliation | Bioinformatics Research Group, SmithKline Beecham Pharmaceuticals Research & Development, 709 Swedeland Road, King of Prussia, PA 19406, USA. |
pubmed:publicationType | Journal Article |
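
The central step described in the abstract, comparing sub-type specific sequence profiles through positional relative entropy, can be illustrated with a small sketch. This is not the authors' implementation: the record does not say how profiles are built or how significance is assessed, so the code below assumes plain residue frequencies with a uniform pseudocount, compares each sub-type against all remaining sequences, and sums the relative entropies per alignment column. The function names, the pseudocount value, and the toy alignment are all hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(residues, pseudocount=1.0):
    """Residue frequencies for one alignment column (assumed: uniform pseudocount)."""
    counts = Counter(r for r in residues if r in AMINO_ACIDS)  # gaps are ignored
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def relative_entropy(p, q):
    """Kullback-Leibler divergence sum_a p(a) * log(p(a) / q(a))."""
    return sum(p[a] * math.log(p[a] / q[a]) for a in AMINO_ACIDS)


def positional_relative_entropy(alignment, subtypes, pseudocount=1.0):
    """Score each column by summing, over sub-types, the relative entropy
    between the sub-type's profile and the profile of all other sequences.

    alignment: dict of sequence name -> aligned sequence (equal lengths)
    subtypes:  dict of sequence name -> sub-type label
    """
    length = len(next(iter(alignment.values())))
    labels = sorted(set(subtypes.values()))
    scores = []
    for i in range(length):
        score = 0.0
        for label in labels:
            inside = [s[i] for n, s in alignment.items() if subtypes[n] == label]
            outside = [s[i] for n, s in alignment.items() if subtypes[n] != label]
            score += relative_entropy(
                column_profile(inside, pseudocount),
                column_profile(outside, pseudocount),
            )
        scores.append(score)
    return scores


# Hypothetical toy alignment: column 0 is conserved everywhere, column 1
# differs between the two sub-types, column 2 varies independently of sub-type.
alignment = {"a1": "GKA", "a2": "GKS", "b1": "GEA", "b2": "GES"}
subtypes = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
scores = positional_relative_entropy(alignment, subtypes)
best_column = max(range(len(scores)), key=scores.__getitem__)
print(scores, best_column)
```

In this toy case only the middle column separates the sub-types, so it receives the highest score, while columns whose residue distribution is the same inside and outside each sub-type score zero. A real analysis would also need a significance test for "significantly high" scores, which this sketch omits.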