Statements in which the resource exists as a subject.
Predicate Object
rdf:type
lifeskim:mentions
pubmed:issue
5
pubmed:dateCreated
2009-10-5
pubmed:abstractText
PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users' search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries, and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one-day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data, as demonstrated in our experiments.
pubmed:grant
pubmed:commentsCorrections
pubmed:language
eng
pubmed:journal
pubmed:citationSubset
IM
pubmed:status
MEDLINE
pubmed:month
Oct
pubmed:issn
1532-0480
pubmed:author
pubmed:issnType
Electronic
pubmed:volume
42
pubmed:owner
NLM
pubmed:authorsComplete
Y
pubmed:pagination
831-8
pubmed:dateRevised
2011-9-28
pubmed:meshHeading
pubmed:year
2009
pubmed:articleTitle
Improving accuracy for identifying related PubMed queries by an integrated approach.
pubmed:affiliation
National Center for Biotechnology Information, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA. luzh@ncbi.nlm.nih.gov
pubmed:publicationType
Journal Article, Research Support, N.I.H., Intramural