Statements in which the resource exists as a subject.
Predicate Object
rdf:type
lifeskim:mentions
pubmed:issue
1
pubmed:dateCreated
2008-12-23
pubmed:abstractText
In this work, we aim to develop a computational approach for predicting DNA-binding sites in proteins from amino acid sequences. To avoid overfitting with this method, all available DNA-binding proteins from the Protein Data Bank (PDB) are used to construct the models. The random forest (RF) algorithm is used because it is fast and has robust performance for different parameter values. A novel hybrid feature is presented which incorporates evolutionary information of the amino acid sequence, secondary structure (SS) information and orthogonal binary vector (OBV) information which reflects the characteristics of 20 kinds of amino acids for two physical-chemical properties (dipoles and volumes of the side chains). The numbers of binding and non-binding residues in proteins are highly unbalanced, so a novel scheme is proposed to deal with the problem of imbalanced datasets by downsizing the majority class.
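The abstract mentions downsizing the majority class but does not describe the scheme itself. As a rough illustration only, the sketch below assumes plain random undersampling, which may well differ from the authors' method. The function name `downsample_majority`, the `ratio` and `seed` parameters, and the toy residue labels are all hypothetical and do not come from the record.

```python
import random


def downsample_majority(samples, labels, ratio=1.0, seed=0):
    """Randomly drop majority-class items so that
    len(majority) <= ratio * len(minority).

    Assumption: binary labels, 1 = binding residue, 0 = non-binding.
    This is generic random undersampling, not the scheme from the paper.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Whichever class is smaller is kept whole.
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep_n = min(len(majority), int(round(ratio * len(minority))))
    kept = rng.sample(majority, keep_n)
    # Restore the original ordering of the retained items.
    idx = sorted(minority + kept)
    return [samples[i] for i in idx], [labels[i] for i in idx]


# Toy data (hypothetical): 12 residues, 3 labelled as binding.
residues = list("ARKDEGHILMNP")
labels = [0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

balanced_x, balanced_y = downsample_majority(residues, labels, ratio=1.0, seed=42)
```

With `ratio=1.0` the result keeps every binding residue and an equal number of randomly chosen non-binding ones; a larger `ratio` retains more of the majority class.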
pubmed:commentsCorrections
http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-10592235, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-11104519, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-1180967, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-11900253, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-12589754, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-14654694, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-14990443, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-15146487, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-15644130, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-16233974, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-16568445, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-16712732, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-16845003, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-16894602, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-17237068, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-17245807, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-17264128, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-17275170, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-17284455, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-17316627, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-17360525, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-17646316, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-17825469, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-2231712, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-9094735, http://linkedlifedata.com/resource/pubmed/commentcorrection/19008251-9254694
pubmed:language
eng
pubmed:journal
pubmed:citationSubset
IM
pubmed:chemical
pubmed:status
MEDLINE
pubmed:month
Jan
pubmed:issn
1367-4811
pubmed:author
pubmed:issnType
Electronic
pubmed:day
1
pubmed:volume
25
pubmed:owner
NLM
pubmed:authorsComplete
Y
pubmed:pagination
30-5
pubmed:dateRevised
2009-11-18
pubmed:meshHeading
pubmed:year
2009
pubmed:articleTitle
Prediction of DNA-binding residues in proteins from amino acid sequences using a random forest model with a hybrid feature.
pubmed:affiliation
State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, P. R. China.
pubmed:publicationType
Journal Article, Research Support, Non-U.S. Gov't