Statements in which the resource exists as a subject.
Predicate Object
rdf:type
lifeskim:mentions
pubmed:issue
1
pubmed:dateCreated
1998-4-7
pubmed:abstractText
The FASTA package of sequence comparison programs has been modified to provide accurate statistical estimates for local sequence similarity scores with gaps. These estimates are derived using the extreme value distribution from the mean and variance of the local similarity scores of unrelated sequences after the scores have been corrected for the expected effect of library sequence length. This approach allows accurate estimates to be calculated for both FASTA and Smith-Waterman similarity scores for protein/protein, DNA/DNA, and protein/translated-DNA comparisons. The accuracy of the statistical estimates is summarized for 54 protein families using FASTA and Smith-Waterman scores. Probability estimates calculated from the distribution of similarity scores are generally conservative, as are probabilities calculated using the Altschul-Gish lambda, kappa, and eta parameters. The performance of several alternative methods for correcting similarity scores for library-sequence length was evaluated using 54 protein superfamilies from the PIR39 database and 110 protein families from the Prosite/SwissProt rel. 34 database. Both regression-scaled and Altschul-Gish scaled scores perform significantly better than unscaled Smith-Waterman or FASTA similarity scores. When the Prosite/SwissProt test set is used, regression-scaled scores perform slightly better; when the PIR database is used, Altschul-Gish scaled scores perform best. Thus, length-corrected similarity scores improve the sensitivity of database searches. Statistical parameters that are derived from the distribution of similarity scores from the thousands of unrelated sequences typically encountered in a database search provide accurate estimates of statistical significance that can be used to infer sequence homology.
pubmed:grant
pubmed:language
eng
pubmed:journal
pubmed:citationSubset
IM
pubmed:status
MEDLINE
pubmed:month
Feb
pubmed:issn
0022-2836
pubmed:author
pubmed:issnType
Print
pubmed:day
13
pubmed:volume
276
pubmed:owner
NLM
pubmed:authorsComplete
Y
pubmed:pagination
71-84
pubmed:dateRevised
2007-11-15
pubmed:meshHeading
pubmed:year
1998
pubmed:articleTitle
Empirical statistical estimates for sequence similarity searches.
pubmed:affiliation
Department of Biochemistry, University of Virginia, Charlottesville 22908, USA.
pubmed:publicationType
Journal Article; Comparative Study; Research Support, U.S. Gov't, P.H.S.; Research Support, Non-U.S. Gov't