Machine learning and scoring functions (SFs) for molecular drug discovery: Prediction and characterisation of druggable drugs and targets

Date

2020

Authors

Hudson, I.L.
Leemaqz, S.Y.
Abell, A.D.

Editors

Cartwright, H.M.

Type:

Book chapter

Citation

Machine Learning in Chemistry: The Impact of Artificial Intelligence, 2020 / Cartwright, H.M. (ed./s), vol.17, Ch.11, pp.251-279

Statement of Responsibility

I. L. Hudson, S. Y. Leemaqz and A. D. Abell

Abstract

Predicting druggability and prioritising disease-modifying targets are critical steps in drug discovery. In this chapter, we describe the testing of a druggability rule based on 9 molecular parameters. The rule uses cutpoints for each molecular parameter and for targets, derived by mixture clustering discriminant analysis. We show that principal component constructs and scoring functions based on counts of rule violations can reveal hidden patterns among druggable molecules and disease targets. We built Random Forest and Artificial Neural Network classifiers on both the molecular parameters and the principal component constructs to separate high-scoring targets from low-scoring molecular violators. These classifiers confirmed the value of including logD in the scoring function. Our violation-count scoring functions and novel principal-component-based molecular and target constructs partitioned chemical space well, identifying molecules and targets of both good and poor druggability. Viable molecules and targets were found in both the beyond Rule of 5 and the expanded Rule of 5 regions. Random Forest and Artificial Neural Network models showed different variable importance profiles, and the Artificial Neural Network models performed better than the Random Forests. In the Random Forest models, the molecular descriptors most influential for classification were molecular weight (MW), NATOM, logD, and polar surface area (PSA). The optimal Artificial Neural Network target models ranked PSA and logD as more important than the traditional parameter MW. Overall, our score 4 partitions, which incorporate logD, gave the best classification in all Random Forest and Artificial Neural Network analyses.
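A scoring function based on counts of violations can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's method. The descriptor set, the cutpoint values, and the helper names `EXAMPLE_CUTPOINTS`, `violation_score` and `classify_by_violations` are all hypothetical. The cutpoints use Lipinski's Rule of 5 limits, Veber's PSA limit of 140 Å², and an invented logD window. The chapter's own 9 parameters and their mixture-clustering-derived cutpoints are not reproduced here. The two molecules are made-up examples.

```python
from typing import Dict, Optional, Tuple

# (lower, upper) bounds for one descriptor; None means no bound on that side.
Bounds = Tuple[Optional[float], Optional[float]]

# ILLUSTRATIVE cutpoints only, NOT the chapter's: Lipinski Rule of 5 limits,
# Veber's PSA limit, and a hypothetical logD window.
EXAMPLE_CUTPOINTS: Dict[str, Bounds] = {
    "MW": (None, 500.0),    # molecular weight, Da
    "logP": (None, 5.0),    # octanol-water partition coefficient
    "HBD": (None, 5.0),     # hydrogen-bond donors
    "HBA": (None, 10.0),    # hydrogen-bond acceptors
    "PSA": (None, 140.0),   # polar surface area, square angstroms
    "logD": (-1.0, 4.0),    # hypothetical window, assumed for illustration
}


def violation_score(molecule: Dict[str, float], cutpoints: Dict[str, Bounds]) -> int:
    """Return the number of descriptors lying outside their cutpoints."""
    count = 0
    for name, (lower, upper) in cutpoints.items():
        value = molecule[name]
        below = lower is not None and value < lower
        above = upper is not None and value > upper
        if below or above:
            count += 1
    return count


def classify_by_violations(score: int, max_violations: int = 1) -> str:
    """Hypothetical binary rule: 'pass' if score <= max_violations, else 'fail'."""
    return "pass" if score <= max_violations else "fail"


# Two made-up molecules; descriptor values are invented for illustration.
mol_a = {"MW": 350.0, "logP": 2.5, "HBD": 2, "HBA": 5, "PSA": 80.0, "logD": 1.8}
mol_b = {"MW": 900.0, "logP": 6.2, "HBD": 4, "HBA": 14, "PSA": 190.0, "logD": -2.0}

for label, mol in [("mol_a", mol_a), ("mol_b", mol_b)]:
    score = violation_score(mol, EXAMPLE_CUTPOINTS)
    print(label, score, classify_by_violations(score))
# prints:
# mol_a 0 pass
# mol_b 5 fail
```

Because the score is a plain count, molecules can be binned by score into partitions of chemical space. The default of at most one allowed violation mirrors Lipinski's original rule and is an assumption here. It is not the threshold used in the chapter.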

Rights

© The Royal Society of Chemistry 2020