Statements in which the resource exists as a subject.
Predicate Object
rdf:type
lifeskim:mentions
pubmed:issue
8
pubmed:dateCreated
2003-7-23
pubmed:abstractText
The explosion of biological data resulting from genomic and proteomic research has created a pressing need for data analysis techniques that work effectively on a large scale. An area of particular interest is the organization and visualization of large families of protein sequences. An increasingly popular approach is to embed the sequences into a low-dimensional Euclidean space in a way that preserves some predefined measure of sequence similarity. This method has been shown to produce maps that exhibit global order and continuity and reveal important evolutionary, structural, and functional relationships between the embedded proteins. However, protein sequences are related by evolutionary pathways that exhibit highly nonlinear geometry, which is invisible to classical embedding procedures such as multidimensional scaling (MDS) and nonlinear mapping (NLM). Here, we describe the use of stochastic proximity embedding (SPE) for producing Euclidean maps that preserve the intrinsic dimensionality and metric structure of the data. SPE extends previous approaches in two important ways: (1) It preserves only local relationships between closely related sequences, thus allowing the map to unfold and reveal its intrinsic dimension, and (2) it scales linearly with the number of sequences and therefore can be applied to very large protein families. The merits of the algorithm are illustrated using examples from the protein kinase and nuclear hormone receptor superfamilies.
pubmed:commentsCorrections
http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-10068697, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-10463075, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-10592232, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-10964570, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-10977100, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-11074585, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-11125043, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-11125149, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-11125150, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-11294790, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-11590107, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-11752303, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-11752314, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-12424125, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-12444256, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-12471243, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-7984417, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-8019421, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-8662544, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-9021261, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-9041629, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-9397688, http://linkedlifedata.com/resource/pubmed/commentcorrection/12876310-9666334
pubmed:language
eng
pubmed:journal
pubmed:citationSubset
IM
pubmed:chemical
pubmed:status
MEDLINE
pubmed:month
Aug
pubmed:issn
0961-8368
pubmed:author
pubmed:issnType
Print
pubmed:volume
12
pubmed:owner
NLM
pubmed:authorsComplete
Y
pubmed:pagination
1604-12
pubmed:dateRevised
2009-11-19
pubmed:meshHeading
pubmed:year
2003
pubmed:articleTitle
Exploring the nonlinear geometry of protein homology.
pubmed:affiliation
3-Dimensional Pharmaceuticals Inc., 665 Stockton Drive, Exton, PA 19341, USA.
pubmed:publicationType
Journal Article