Statements in which the resource exists as a subject.
Predicate Object
rdf:type
lifeskim:mentions
pubmed:issue
2
pubmed:dateCreated
2008-2-25
pubmed:abstractText
We investigate an approach that combines Bayesian modeling of probability distributions of descriptor values of active and database molecules with Kullback-Leibler analysis of the divergence between these distributions. The methodology is used for Bayesian screening and also to predict compound recall rates. In our study, we analyze two fundamental approximations underlying the Bayesian screening approach: the assumption that descriptors are independent of each other and, furthermore, that their data set values follow normal distributions. In addition, we calculate Kullback-Leibler divergence for single descriptors, rather than multiple-feature distributions, in order to prioritize descriptors for screening calculations. The results show that descriptor correlation effects, violating the assumption of feature independence, can lead to notable reduction of compound recall in Bayesian screening. Controlling descriptor correlation effects plays a much more significant role in achieving high recall rates than approximating descriptor distributions by Gaussians. Furthermore, Kullback-Leibler divergence analysis is shown to systematically identify descriptors that are the most relevant for the outcome of Bayesian screening calculations.
pubmed:language
eng
pubmed:journal
pubmed:citationSubset
IM
pubmed:status
MEDLINE
pubmed:month
Feb
pubmed:issn
1549-9596
pubmed:author
pubmed:issnType
Print
pubmed:volume
48
pubmed:owner
NLM
pubmed:authorsComplete
Y
pubmed:pagination
247-55
pubmed:meshHeading
pubmed:year
2008
pubmed:articleTitle
Bayesian similarity searching in high-dimensional descriptor spaces combined with Kullback-Leibler descriptor divergence analysis.
pubmed:affiliation
Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstrasse 2, D-53113 Bonn, Germany.
pubmed:publicationType
Journal Article
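
The abstract in this record describes scoring single descriptors by the Kullback-Leibler divergence between their distributions in active and database molecules, with each distribution approximated by a Gaussian. The sketch below illustrates that idea using the standard closed-form divergence between two univariate normal distributions. It is not taken from the article: the function names `gaussian_kl` and `rank_descriptors`, the dictionary input layout, and the use of population standard deviations are all assumptions made for illustration, and the authors' actual procedure may differ.

```python
import math
from statistics import fmean, pstdev


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for univariate normals P = N(mu_p, sigma_p^2), Q = N(mu_q, sigma_q^2)."""
    if sigma_p <= 0 or sigma_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
        - 0.5
    )


def rank_descriptors(actives, database):
    """Rank descriptors by KL(actives || database), largest divergence first.

    Both arguments map a descriptor name to a list of numeric values
    (a hypothetical layout chosen for this sketch). Each descriptor is
    treated independently and fitted with a single Gaussian.
    """
    scores = []
    for name, active_values in actives.items():
        db_values = database[name]
        kl = gaussian_kl(
            fmean(active_values), pstdev(active_values),
            fmean(db_values), pstdev(db_values),
        )
        scores.append((name, kl))
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy values only, not data from the article.
    actives = {"a": [1.0, 2.0, 3.0], "b": [10.0, 11.0, 12.0]}
    database = {"a": [1.0, 2.0, 3.0], "b": [0.0, 1.0, 2.0]}
    for name, kl in rank_descriptors(actives, database):
        print(name, round(kl, 3))
```

Under these assumptions, a descriptor whose active-compound distribution matches the database distribution scores zero, while a descriptor whose distributions are well separated scores high and would be prioritized. Treating each descriptor on its own mirrors the single-descriptor analysis mentioned in the abstract and does not account for the descriptor correlation effects the abstract warns about.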