9514730

Source:http://linkedlifedata.com/resource/pubmed/id/9514730

Statements in which the resource exists as a subject.
Predicate	Object
rdf:type	pubmed:Citation
lifeskim:mentions	umls-concept:C0038215, umls-concept:C0750572, umls-concept:C1552603, umls-concept:C1706202, umls-concept:C1710052, umls-concept:C1880496
pubmed:issue	1
pubmed:dateCreated	1998-4-7
pubmed:abstractText	The FASTA package of sequence comparison programs has been modified to provide accurate statistical estimates for local sequence similarity scores with gaps. These estimates are derived using the extreme value distribution from the mean and variance of the local similarity scores of unrelated sequences after the scores have been corrected for the expected effect of library sequence length. This approach allows accurate estimates to be calculated for both FASTA and Smith-Waterman similarity scores for protein/protein, DNA/DNA, and protein/translated-DNA comparisons. The accuracy of the statistical estimates is summarized for 54 protein families using FASTA and Smith-Waterman scores. Probability estimates calculated from the distribution of similarity scores are generally conservative, as are probabilities calculated using the Altschul-Gish lambda, kappa, and eta parameters. The performance of several alternative methods for correcting similarity scores for library-sequence length was evaluated using 54 protein superfamilies from the PIR39 database and 110 protein families from the Prosite/SwissProt rel. 34 database. Both regression-scaled and Altschul-Gish scaled scores perform significantly better than unscaled Smith-Waterman or FASTA similarity scores. When the Prosite/SwissProt test set is used, regression-scaled scores perform slightly better; when the PIR database is used, Altschul-Gish scaled scores perform best. Thus, length-corrected similarity scores improve the sensitivity of database searches. Statistical parameters that are derived from the distribution of similarity scores from the thousands of unrelated sequences typically encountered in a database search provide accurate estimates of statistical significance that can be used to infer sequence homology.
pubmed:grant	http://linkedlifedata.com/resource/pubmed/grant/LM04969
pubmed:language	eng
pubmed:journal	http://linkedlifedata.com/resource/pubmed/journal/2985088R
pubmed:citationSubset	IM
pubmed:status	MEDLINE
pubmed:month	Feb
pubmed:issn	0022-2836
pubmed:author	pubmed-author:PearsonW RWR
pubmed:issnType	Print
pubmed:day	13
pubmed:volume	276
pubmed:owner	NLM
pubmed:authorsComplete	Y
pubmed:pagination	71-84
pubmed:dateRevised	2007-11-15
pubmed:meshHeading	pubmed-meshheading:9514730-Animals, pubmed-meshheading:9514730-Databases, Factual, pubmed-meshheading:9514730-Evaluation Studies as Topic, pubmed-meshheading:9514730-Humans, pubmed-meshheading:9514730-Mice, pubmed-meshheading:9514730-Regression Analysis, pubmed-meshheading:9514730-Sequence Homology, pubmed-meshheading:9514730-Sequence Homology, Amino Acid, pubmed-meshheading:9514730-Sequence Homology, Nucleic Acid, pubmed-meshheading:9514730-Software
pubmed:year	1998
pubmed:articleTitle	Empirical statistical estimates for sequence similarity searches.
pubmed:affiliation	Department of Biochemistry, University of Virginia, Charlottesville 22908, USA.
pubmed:publicationType	Journal Article, Comparative Study, Research Support, U.S. Gov't, P.H.S., Research Support, Non-U.S. Gov't
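
Illustrative note on pubmed:abstractText. The abstract describes correcting similarity scores for library sequence length and then deriving significance from the extreme value distribution using the mean and variance of unrelated-sequence scores. The record does not give the formulas, so the following is only a minimal sketch under stated assumptions: the length correction is modeled as a least-squares regression of score on ln(length), the residual variance is treated as constant, and the resulting z-score is converted with a Gumbel (extreme value) distribution standardized to mean 0 and variance 1. All function names are hypothetical and are not taken from the FASTA package.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_length_regression(scores, lengths):
    """Least-squares fit of score = a + b * ln(length).

    Assumption: unrelated-sequence scores grow linearly with the log of
    library sequence length. Returns (a, b, residual_variance).
    """
    xs = [math.log(n) for n in lengths]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [y - (a + b * x) for x, y in zip(xs, scores)]
    variance = sum(r * r for r in residuals) / (n - 2)
    return a, b, variance


def z_score(score, length, a, b, variance):
    """Length-corrected score in standard-deviation units."""
    expected = a + b * math.log(length)
    return (score - expected) / math.sqrt(variance)


def evd_p_value(z):
    """P(Z >= z) for a Gumbel distribution with mean 0 and variance 1."""
    return 1.0 - math.exp(-math.exp(-(math.pi / math.sqrt(6.0)) * z - EULER_GAMMA))


def expectation(z, library_size):
    """Expected number of unrelated sequences scoring at least z."""
    return library_size * evd_p_value(z)
```

In a real search the regression would be fitted to the scores of the thousands of unrelated library sequences mentioned in the abstract, and the expectation value would be the p-value multiplied by the number of library sequences searched.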