Source: http://linkedlifedata.com/resource/pubmed/id/10382671
Predicate | Object |
---|---|
rdf:type | |
lifeskim:mentions | |
pubmed:issue | 4 |
pubmed:dateCreated | 1999-8-12 |
pubmed:abstractText | We present a fast algorithm to search for repeating fragments within protein sequences. The technique is based on an extension of the Smith-Waterman algorithm that allows the calculation of sub-optimal alignments of a sequence against itself. We are able to estimate the statistical significance of all sub-optimal alignment scores. We also rapidly determine the length of the repeating fragment and the number of times it is found in a sequence. The technique is applied to sequences in the Swissprot database, and to 16 complete genomes. We find that eukaryotic proteins contain more internal repeats than those of prokaryotic and archaeal organisms. The finding that 18% of yeast sequences and 28% of the known human sequences contain detectable repeats emphasizes the importance of internal duplication in protein evolution. |
pubmed:grant | |
pubmed:language | eng |
pubmed:journal | |
pubmed:citationSubset | IM |
pubmed:chemical | |
pubmed:status | MEDLINE |
pubmed:month | Jun |
pubmed:issn | 0887-3585 |
pubmed:author | |
pubmed:issnType | Print |
pubmed:day | 1 |
pubmed:volume | 35 |
pubmed:owner | NLM |
pubmed:authorsComplete | Y |
pubmed:pagination | 440-6 |
pubmed:dateRevised | 2007-11-14 |
pubmed:meshHeading | pubmed-meshheading:10382671-Algorithms; pubmed-meshheading:10382671-Archaeoglobus fulgidus; pubmed-meshheading:10382671-Databases, Factual; pubmed-meshheading:10382671-Escherichia coli; pubmed-meshheading:10382671-Genes, Archaeal; pubmed-meshheading:10382671-Genome, Bacterial; pubmed-meshheading:10382671-Genome, Fungal; pubmed-meshheading:10382671-Humans; pubmed-meshheading:10382671-Poisson Distribution; pubmed-meshheading:10382671-Proteins; pubmed-meshheading:10382671-Saccharomyces cerevisiae; pubmed-meshheading:10382671-Sequence Homology, Amino Acid |
pubmed:year | 1999 |
pubmed:articleTitle | A fast algorithm for genome-wide analysis of proteins with repeated sequences. |
pubmed:affiliation | Molecular Biology Institute and UCLA-DOE Laboratory of Structural Biology and Molecular Medicine, University of California, Los Angeles, 90095-1570, USA. |
pubmed:publicationType | Journal Article; Research Support, U.S. Gov't, P.H.S.; Research Support, U.S. Gov't, Non-P.H.S.; Research Support, Non-U.S. Gov't |
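The abstract describes searching for internal repeats by aligning a protein sequence against itself with an extension of the Smith-Waterman algorithm. The sketch below is a deliberately simplified illustration of that idea, not the authors' method: it computes only the single best local self-alignment that lies strictly off the main diagonal (so the trivial match of the sequence with itself is excluded). It does not enumerate sub-optimal alignments, estimate statistical significance, or count repeat copies. The function name `best_internal_repeat_score`, the toy scores (match +2, mismatch -1, linear gap -2) and the `min_offset` parameter are all hypothetical choices for illustration; a real analysis would use a substitution matrix such as BLOSUM62 and affine gaps.

```python
def best_internal_repeat_score(seq, match=2, mismatch=-1, gap=-2, min_offset=1):
    """Hypothetical helper: best Smith-Waterman local alignment score of seq
    against itself, restricted to cells with j - i >= min_offset so the
    trivial identity diagonal is never used. Toy scoring, linear gaps.
    Returns (score, (i, j)) where (i, j) is the 1-based end cell."""
    n = len(seq)
    # H[i][j] = best local alignment score ending at seq[i-1] paired with seq[j-1].
    # Cells on or below the excluded diagonal band stay 0 and act as a boundary.
    H = [[0] * (n + 1) for _ in range(n + 1)]
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(i + min_offset, n + 1):
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            h = max(
                0,                      # start a new local alignment here
                H[i - 1][j - 1] + s,    # pair seq[i-1] with seq[j-1]
                H[i - 1][j] + gap,      # gap in the second copy
                H[i][j - 1] + gap,      # gap in the first copy
            )
            H[i][j] = h
            if h > best:
                best, best_cell = h, (i, j)
    return best, best_cell


if __name__ == "__main__":
    # "ACDEF" occurs twice, shifted by 5 residues.
    print(best_internal_repeat_score("ACDEFACDEF"))  # (10, (5, 10))
```

With these toy scores, `"ACDEFACDEF"` yields a score of 10 ending at cell (5, 10): five matched residues on the diagonal offset by 5, which points to the fragment ACDEF being repeated. A sequence with no repeated residues, such as `"ACDEFGHIKL"`, returns `(0, (0, 0))`. Reporting further, weaker repeats would require a sub-optimal alignment procedure of the kind the abstract mentions, which this sketch omits.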