15854658

Source: http://linkedlifedata.com/resource/pubmed/id/15854658


Statements in which the resource exists as a subject.
Predicate	Object
rdf:type	pubmed:Citation
lifeskim:mentions	umls-concept:C0678594, umls-concept:C0936012, umls-concept:C1171366, umls-concept:C1280477, umls-concept:C1521840
pubmed:issue	5
pubmed:dateCreated	2005-4-27
pubmed:abstractText	The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and folds. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes. From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field.
pubmed:commentsCorrections	http://linkedlifedata.com/resource/pubmed/commentcorrection/15854658
pubmed:language	eng
pubmed:journal	http://linkedlifedata.com/resource/pubmed/journal/2985088R
pubmed:citationSubset	IM
pubmed:status	MEDLINE
pubmed:month	May
pubmed:issn	0022-2836
pubmed:author	pubmed-author:MarsdenRussell LRL, pubmed-author:OrengoChristine ACA, pubmed-author:ThorntonJanet MJM, pubmed-author:ToddAnnabel EAE
pubmed:issnType	Print
pubmed:day	20
pubmed:volume	348
pubmed:owner	NLM
pubmed:authorsComplete	Y
pubmed:pagination	1235-60
pubmed:dateRevised	2006-11-15
pubmed:meshHeading	pubmed-meshheading:15854658-Animals, pubmed-meshheading:15854658-Computational Biology, pubmed-meshheading:15854658-Databases, Protein, pubmed-meshheading:15854658-Genome, pubmed-meshheading:15854658-Genomics, pubmed-meshheading:15854658-Humans, pubmed-meshheading:15854658-Protein Conformation, pubmed-meshheading:15854658-Sequence Analysis, Protein, pubmed-meshheading:15854658-Structural Homology, Protein
pubmed:year	2005
pubmed:articleTitle	Progress of structural genomics initiatives: an analysis of solved target structures.
pubmed:affiliation	Department of Biochemistry and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK. annabel.todd@deshaw.com
pubmed:publicationType	Journal Article, Research Support, U.S. Gov't, P.H.S.