Rodríguez-Pérez, R., Bajorath, J. Interpretation of machine learning models using shapley values: application to compound potency and multi-target activity predictions. J Comput Aided Mol Des 34, 1013–1026 (2020). https://doi.org/10.1007/s10822-020-00314-0
Driven by the FURP (FoSE Undergraduate Research Placement) Programme.
Interpretation of machine learning models using shapley values: application to compound potency and multi-target activity predictions
Publisher:
Springer Nature, Journal of Computer-Aided Molecular Design
Authors:
Raquel Rodríguez-Pérez, Jürgen Bajorath
Index Terms - Machine learning - Black box character - Structure–activity relationships - Compound activity - Compound potency prediction - Multi-target modeling - Model interpretation - Feature importance - Shapley values
DOI: 10.1007/s10822-020-00314-0
Background(Key Point):
Difficulties in interpreting machine learning (ML) models and their predictions limit the practical applicability of, and confidence in, ML in pharmaceutical research.
A shortcoming of many ML approaches is the difficulty of rationalizing their predictions.
A lack of interpretability might result from the intrinsic black box character of ML methods such as neural network (NN) or support vector machine (SVM) algorithms. Furthermore, it might also result from using principally interpretable models, such as decision trees (DTs), in large ensemble classifiers such as random forests (RFs).
Methodology:
The Shapley Additive exPlanations (SHAP) method, a new methodology for interpreting ML models of any complexity in chemoinformatics and medicinal chemistry, is introduced. It is based upon the Shapley value concept from game theory and can be rationalized as an extension of the local interpretable model-agnostic explanations (LIME) approach.
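The Shapley value of a feature averages that feature's marginal contribution to the prediction over all possible feature coalitions. The following is a minimal illustrative sketch, not the paper's implementation: the model, input, and baseline are hypothetical, and "absent" features are simply filled in from a single reference point.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values for prediction f(x); features outside a
    coalition are replaced by baseline values (single-reference scheme)."""
    n = len(x)

    def eval_subset(S):
        # Evaluate the model with only the features in S "present".
        z = [x[i] if i in S else baseline[i] for i in range(n)]
        return f(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):
            for S in combinations(others, k):
                # Game-theoretic coalition weight |S|!(n-|S|-1)!/n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi += w * (eval_subset(set(S) | {i}) - eval_subset(set(S)))
        phis.append(phi)
    return phis

# Toy "model": a linear predictor whose exact attributions are known.
f = lambda z: 3 * z[0] + 2 * z[1] - z[2]
x, base = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
print(shapley_values(f, x, base))  # attributions sum to f(x) - f(base)
```

The exhaustive enumeration is exponential in the number of features; practical SHAP implementations approximate these values or exploit model structure (e.g., for tree ensembles).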
Key Findings: The SHAP methodology enables the interpretation of ML models and their predictions, yielding feature importance values for any ML model and thereby shedding light on the black box nature of many ML approaches.
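To illustrate how per-prediction Shapley values aggregate into a global feature importance, here is a hedged sketch using a hypothetical linear model, for which (under feature independence) the Shapley value of feature i has the closed form w_i(x_i − mean_i); global importance is then the mean absolute attribution over samples.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w = np.array([3.0, -1.0, 0.1])  # hypothetical linear model weights

# For a linear model f(x) = w.x with independent features, the Shapley
# value of feature i for sample x is w_i * (x_i - mean_i).
phi = w * (X - X.mean(axis=0))          # local attributions, shape (200, 3)
importance = np.abs(phi).mean(axis=0)   # global importance: mean |phi|
print(importance)  # feature 0 dominates, feature 2 is near-irrelevant
```

Local accuracy holds by construction: the attributions for each sample sum to the deviation of its prediction from the mean prediction.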
SHAP analysis yielded meaningful explanations of compound potency and multi-target activity predictions, revealing the different model characteristics responsible for individual predictions.