Statements in which the resource exists as a subject.
Predicate / Object
rdf:type
lifeskim:mentions
pubmed:issue
1
pubmed:dateCreated
2004-12-20
pubmed:abstractText
We identified key residues from structural alignments of families of protein domains from SCOP and represented them as sparse protein signatures. A signature-generating algorithm (SigGen) was developed and used to identify key residues automatically, based on several structural and sequence-based criteria. The ability of the signatures to detect related sequences in the SWISSPROT database was assessed by receiver operating characteristic (ROC) analysis and jack-knife testing. Test signatures for families from each of the main SCOP classes are described in relation to the quality of the structural alignments, the SigGen parameters used, and their diagnostic performance. We show that automatically generated signatures are highly diagnostic for their family (ROC50 scores typically >0.8), consistently outperform random signatures, and can identify sequence relationships in the "twilight zone" of protein sequence similarity (<40%). Signatures based on 15%-30% of alignment positions occurred most frequently among the best-performing signatures. When alignment quality is poor, sparser signatures perform better, whereas signatures generated from higher-quality alignments of fewer structures require more positions to be diagnostic. Our validation of signatures from the Globin family shows that when sequences from the structural alignment are removed and new signatures are generated, the omitted sequences are still detected. The positions highlighted by the signature often correspond (alignment specificity >0.7) to the key positions in the original (non-jack-knifed) alignment. We discuss potential applications of sparse signatures in sequence annotation and homology modeling.
pubmed:commentsCorrections
http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-10195279, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-10211816, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-10373007, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-10493887, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-10526163, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-10592240, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-10842345, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-11119639, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-11254392, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-11276080, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-11301302, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-11327757, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-11391014, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-11752351, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-11790845, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-1409577, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-1438297, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-16718863, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-186608, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-2315699, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-7664079, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-7932758, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-8046748, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-8538750, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-8794873, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-9254694, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-9325115, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-9521122, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-9560213, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-9600919, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-9837738, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-9917419, http://linkedlifedata.com/resource/pubmed/commentcorrection/15608116-9927713
pubmed:language
eng
pubmed:journal
pubmed:citationSubset
IM
pubmed:chemical
pubmed:status
MEDLINE
pubmed:month
Jan
pubmed:issn
0961-8368
pubmed:author
pubmed:issnType
Print
pubmed:volume
14
pubmed:owner
NLM
pubmed:authorsComplete
Y
pubmed:pagination
13-23
pubmed:dateRevised
2009-11-18
pubmed:meshHeading
pubmed:year
2005
pubmed:articleTitle
Automatic generation and evaluation of sparse protein signatures for families of protein structural domains.
pubmed:affiliation
AstraZeneca R&D Charnwood, Bakewell Road, Loughborough, Leicestershire LE11 5RH, England. matthew.blades@astrazeneca.com
pubmed:publicationType
Journal Article, Research Support, Non-U.S. Gov't