Rodríguez-Pérez, R., Bajorath, J. Interpretation of machine learning models using shapley values: application to compound potency and multi-target activity predictions. J Comput Aided Mol Des 34, 1013–1026 (2020). https://doi.org/10.1007/s10822-020-00314-0
Driven by the FURP (FoSE Undergraduate Research Placement) Programme.
Interpretation of machine learning models using shapley values: application to compound potency and multi-target activity predictions
Publisher:
Springer Nature, Journal of Computer-Aided Molecular Design
Authors:
Raquel Rodríguez-Pérez, Jürgen Bajorath
Index Terms - Machine learning - Black box character - Structure–activity relationships - Compound activity - Compound potency prediction - Multi-target modeling - Model interpretation - Feature importance - Shapley values
DOI: 10.1007/s10822-020-00314-0
Background(Key Point):
Difficulties in interpreting machine learning (ML) models and their predictions limit the practical applicability of, and confidence in, ML in pharmaceutical research.
A shortcoming of many ML approaches is the difficulty of rationalizing their predictions.
A lack of interpretability might result from the intrinsic black box character of ML methods such as neural network (NN) or support vector machine (SVM) algorithms. Furthermore, it might also result from using principally interpretable models, such as decision trees (DTs), in large ensemble classifiers such as random forests (RFs).
Methodology:
The Shapley Additive exPlanations (SHAP) method, a new methodology for interpreting ML models of any complexity in chemoinformatics and medicinal chemistry, is introduced. It is based upon the Shapley value concept from game theory and can be rationalized as an extension of the local interpretable model-agnostic explanations (LIME) approach.
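The Shapley value of a feature averages that feature's marginal contribution to the prediction over all possible feature coalitions. The following is a minimal illustrative sketch, not the paper's implementation: the model, input, and baseline are hypothetical, and "absent" features are simply filled in from a single reference point.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values for prediction f(x); features outside a
    coalition are replaced by baseline values (single-reference scheme)."""
    n = len(x)

    def eval_subset(S):
        # Evaluate the model with only the features in S "present".
        z = [x[i] if i in S else baseline[i] for i in range(n)]
        return f(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):
            for S in combinations(others, k):
                # Game-theoretic coalition weight |S|!(n-|S|-1)!/n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi += w * (eval_subset(set(S) | {i}) - eval_subset(set(S)))
        phis.append(phi)
    return phis

# Toy "model": a linear predictor whose exact attributions are known.
f = lambda z: 3 * z[0] + 2 * z[1] - z[2]
x, base = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
print(shapley_values(f, x, base))  # attributions sum to f(x) - f(base)
```

The exhaustive enumeration is exponential in the number of features; practical SHAP implementations approximate these values or exploit model structure (e.g., for tree ensembles).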
Key Findings: The SHAP methodology enables the interpretation of ML models and their predictions, yielding feature importance values for any ML model and thereby shedding light on the black box nature of many ML approaches.
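To illustrate how per-prediction Shapley values aggregate into a global feature importance, here is a hedged sketch using a hypothetical linear model, for which (under feature independence) the Shapley value of feature i has the closed form w_i(x_i − mean_i); global importance is then the mean absolute attribution over samples.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w = np.array([3.0, -1.0, 0.1])  # hypothetical linear model weights

# For a linear model f(x) = w.x with independent features, the Shapley
# value of feature i for sample x is w_i * (x_i - mean_i).
phi = w * (X - X.mean(axis=0))          # local attributions, shape (200, 3)
importance = np.abs(phi).mean(axis=0)   # global importance: mean |phi|
print(importance)  # feature 0 dominates, feature 2 is near-irrelevant
```

Local accuracy holds by construction: the attributions for each sample sum to the deviation of its prediction from the mean prediction.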
SHAP analysis yielded meaningful explanations of compound potency and multi-target activity predictions, revealing the different model characteristics responsible for individual predictions.