20029662

Source: http://linkedlifedata.com/resource/pubmed/id/20029662

Statements in which the resource exists as a subject.
Predicate	Object
rdf:type	pubmed:Citation
lifeskim:mentions	umls-concept:C0021200, umls-concept:C0021400, umls-concept:C0026175, umls-concept:C0242356, umls-concept:C0337026, umls-concept:C0523113, umls-concept:C0995203, umls-concept:C1274040, umls-concept:C1413226, umls-concept:C1519249, umls-concept:C1522242, umls-concept:C1553702, umls-concept:C1705803, umls-concept:C2698333
pubmed:dateCreated	2009-12-23
pubmed:abstractText	The Influenza Virus Resource and other Virus Variation Resources at NCBI provide enhanced visualization web tools for exploratory analysis for influenza sequence data. Despite the improvements in data analysis, the initial data retrieval remains unsophisticated, frequently producing huge and imbalanced datasets due to the large number of identical and nearly-identical sequences in the database. We propose a data mining algorithm to organize reported sequences into groups based on their relatedness to the query sequence and to each other. The algorithm uses BLAST to find database sequences related to the query. Neighbor lists precalculated from pairwise BLAST alignments between database sequences are used to organize results in groups of nearly-identical and strongly related sequences. We propose to use a non-symmetric dissimilarity measure well crafted for dealing with sequences of different length (fragments). A balanced and representative data set produced by this tool can be used for further analysis, i.e. multiple sequence alignment and phylogenetic trees. The algorithm is implemented for protein coding sequences and is being integrated with the NCBI Influenza Virus Resource.
pubmed:grant	http://linkedlifedata.com/resource/pubmed/grant/Z99 LM999999
pubmed:commentsCorrections	http://linkedlifedata.com/resource/pubmed/commentcorrection/20029662-15072689, http://linkedlifedata.com/resource/pubmed/commentcorrection/20029662-15917781, http://linkedlifedata.com/resource/pubmed/commentcorrection/20029662-16208317, http://linkedlifedata.com/resource/pubmed/commentcorrection/20029662-17683263, http://linkedlifedata.com/resource/pubmed/commentcorrection/20029662-17942553, http://linkedlifedata.com/resource/pubmed/commentcorrection/20029662-18485197, http://linkedlifedata.com/resource/pubmed/commentcorrection/20029662-18940867, http://linkedlifedata.com/resource/pubmed/commentcorrection/20029662-19341451, http://linkedlifedata.com/resource/pubmed/commentcorrection/20029662-19516283, http://linkedlifedata.com/resource/pubmed/commentcorrection/20029662-2983426, http://linkedlifedata.com/resource/pubmed/commentcorrection/20029662-3162770, http://linkedlifedata.com/resource/pubmed/commentcorrection/20029662-9682055, http://linkedlifedata.com/resource/pubmed/commentcorrection/20029662-9707539
pubmed:language	eng
pubmed:journal	http://linkedlifedata.com/resource/pubmed/journal/101515638
pubmed:status	PubMed-not-MEDLINE
pubmed:issn	2157-3999
pubmed:author	pubmed-author:TatusovaTatianaT, pubmed-author:ZaslavskyLeonidL
pubmed:issnType	Electronic
pubmed:volume	1
pubmed:owner	NLM
pubmed:authorsComplete	Y
pubmed:pagination	RRN1124
pubmed:dateRevised	2011-09-28
pubmed:year	2009
pubmed:articleTitle	Mining the NCBI influenza sequence database: adaptive grouping of BLAST results using precalculated neighbor indexing.
pubmed:affiliation	National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, USA.
pubmed:publicationType	Journal Article
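
The abstract above describes grouping BLAST hits with precalculated neighbor lists and a non-symmetric dissimilarity measure suited to fragments. The record does not give the actual formula or grouping procedure, so the following is only a minimal sketch under stated assumptions: the dissimilarity is taken to be the fraction of the source sequence that is not identically aligned to the target (normalized by the source length, which is what makes it non-symmetric), and the grouping is a simple greedy pass over hits ranked by relatedness to the query. The names `directed_dissimilarity` and `group_hits` and all sequence identifiers are hypothetical.

```python
from typing import Dict, List, Set


def directed_dissimilarity(aligned_len: int, mismatches: int, source_len: int) -> float:
    """Assumed measure: share of the SOURCE sequence not identically aligned.

    Normalizing by the source length makes the measure non-symmetric:
    a short fragment fully contained in a long sequence is close to it,
    while the long sequence is far from the fragment.
    """
    if source_len <= 0:
        raise ValueError("source_len must be positive")
    identical = aligned_len - mismatches
    return 1.0 - identical / source_len


def group_hits(ranked_hits: List[str], neighbors: Dict[str, Set[str]]) -> List[List[str]]:
    """Greedy grouping sketch (an assumption, not the published algorithm).

    ranked_hits: BLAST hits ordered from most to least related to the query.
    neighbors:   precalculated neighbor list for each database sequence.
    The first unassigned hit becomes a group representative and collects
    every still-unassigned hit found in its neighbor list.
    """
    assigned: Set[str] = set()
    groups: List[List[str]] = []
    for hit in ranked_hits:
        if hit in assigned:
            continue
        assigned.add(hit)
        members = [hit]
        hit_neighbors = neighbors.get(hit, set())
        for other in ranked_hits:
            if other not in assigned and other in hit_neighbors:
                assigned.add(other)
                members.append(other)
        groups.append(members)
    return groups


# Hypothetical data: a 100-residue fragment identical to part of a 1000-residue sequence.
fragment_to_full = directed_dissimilarity(aligned_len=100, mismatches=0, source_len=100)
full_to_fragment = directed_dissimilarity(aligned_len=100, mismatches=0, source_len=1000)

example_groups = group_hits(
    ["A", "B", "C", "D"],
    {"A": {"C"}, "B": {"C"}},
)
```

In this sketch the first element of each group serves as its representative, so a balanced data set could be formed by keeping one sequence per group before running multiple sequence alignment or building a tree.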