Machine learning and scoring functions (SFs) for molecular drug discovery: Prediction and characterisation of druggable drugs and targets
Date
2020
Authors
Hudson, I.L.
Leemaqz, S.Y.
Abell, A.D.
Editors
Cartwright, H.M.
Type
Book chapter
Citation
Machine Learning in Chemistry: The Impact of Artificial Intelligence, 2020 / Cartwright, H.M. (ed.), vol.17, Ch.11, pp.251-279
Statement of Responsibility
I. L. Hudson, S. Y. Leemaqz and A. D. Abell
Abstract
Predicting druggability and prioritising disease-modifying targets are critical in drug discovery. In this chapter, we describe the testing of a druggability rule based on nine molecular parameters, which uses cutpoints for each molecular parameter and targets based on mixture clustering discriminant analysis. We demonstrate that principal component constructs and score functions of violations can be used to identify the hidden pattern of druggable molecules and disease targets. Random Forest and Artificial Neural Network rules that classify the high-score targets from the low-score molecular violators, based on both the molecular parameters and the principal component constructs, confirmed the value of including logD in the scoring function. Our scoring functions of counts of violations and our novel principal component analytic molecular and target-based constructs partitioned chemospace well, identifying both highly and poorly druggable molecules and targets. Viable molecules and targets were located in both the beyond Rule of 5 and expanded Rule of 5 regions. Random Forests and Artificial Neural Networks showed different variable importance profiles, with Artificial Neural Network models performing better than Random Forests. According to the Random Forest methods, the molecular descriptors that most influenced classification were MW, NATOM, logD, and PSA. The optimal Artificial Neural Network target models indicated that PSA and logD were more important than the traditional parameter MW. Overall, our score 4 partitions using logD were optimal for classification, as shown in all Random Forest and Artificial Neural Network analyses.
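The idea of a scoring function built from counts of violations can be illustrated with a minimal sketch. The abstract does not list the chapter's nine parameters or their cutpoints, so the example below assumes the four upper bounds of Lipinski's Rule of 5 as a stand-in; the names `CUTPOINTS` and `violation_score` and the example molecule are hypothetical and are not taken from the chapter.

```python
from typing import Dict, Mapping

# Assumed cutpoints: upper bounds from Lipinski's Rule of 5, used here only
# as a stand-in. The chapter's own nine-parameter cutpoints are not given
# in the abstract.
CUTPOINTS: Dict[str, float] = {
    "MW": 500.0,   # molecular weight (Da)
    "logP": 5.0,   # octanol-water partition coefficient
    "HBD": 5.0,    # hydrogen-bond donors
    "HBA": 10.0,   # hydrogen-bond acceptors
}


def violation_score(molecule: Mapping[str, float],
                    cutpoints: Mapping[str, float] = CUTPOINTS) -> int:
    """Count how many parameters exceed their upper cutpoint."""
    return sum(1 for name, upper in cutpoints.items() if molecule[name] > upper)


# Hypothetical molecule, made up for illustration.
example = {"MW": 720.4, "logP": 3.1, "HBD": 6, "HBA": 9}
score = violation_score(example)  # MW and HBD exceed their cutpoints
```

A lower count means fewer violated cutpoints; in a setting like the one the abstract describes, such counts could serve as class labels for a downstream classifier, although the chapter's exact scoring scheme may differ.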
Rights
© The Royal Society of Chemistry 2020