Comparing Shapley Value Approximation Methods for Unsupervised Feature Importance

Patrick Kolpaczki

doi:10.11576/dataninja-1158

Keywords:

Shapley values, feature importance scores, unsupervised learning

Abstract

Assigning importance scores to features is a common way to gain insight into a prediction model’s behavior or into the data itself. Beyond explainability, such scores are also useful for feature selection, which makes unlabeled high-dimensional data manageable. One way to derive scores is to adopt a game-theoretic view in which features are understood as agents that can form groups (coalitions) and cooperate to obtain a reward. Splitting the reward among the features appropriately yields the desired scores. The Shapley value is the most popular reward-sharing solution. However, computing it exactly requires a number of coalition evaluations that grows exponentially with the number of features, which renders it inapplicable to high-dimensional data unless an efficient approximation is available. We empirically compare selected approximation algorithms for quantifying feature importance on unlabeled data.
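
To make the cost argument in the abstract concrete, the following minimal sketch contrasts exact Shapley computation, which enumerates every coalition, with permutation sampling, one well-known Monte Carlo approximation. It is an illustration under stated assumptions, not the evaluation setup of this work: the function names, the four-feature game, and `toy_value` are hypothetical, and the abstract does not say whether permutation sampling is among the compared algorithms. In an unsupervised setting the value function would instead score a feature subset from the unlabeled data.

```python
import itertools
import math
import random


def exact_shapley(n, value):
    """Exact Shapley values by enumerating all coalitions (exponential in n)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Probability that feature i joins a coalition of exactly this size.
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def permutation_sampling_shapley(n, value, num_permutations, seed=0):
    """Monte Carlo estimate: average marginal contributions over random orders."""
    rng = random.Random(seed)
    totals = [0.0] * n
    order = list(range(n))
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition = set()
        previous = value(frozenset(coalition))
        for i in order:
            coalition.add(i)
            current = value(frozenset(coalition))
            totals[i] += current - previous  # marginal contribution of i
            previous = current
    return [t / num_permutations for t in totals]


# Hypothetical toy game (an assumption for illustration only): each feature
# has an individual worth, and features 0 and 1 earn a bonus when both appear.
WEIGHTS = [1.0, 2.0, 3.0, 4.0]


def toy_value(coalition):
    base = sum(WEIGHTS[i] for i in coalition)
    bonus = 2.0 if {0, 1} <= coalition else 0.0
    return base + bonus


exact = exact_shapley(4, toy_value)
estimate = permutation_sampling_shapley(4, toy_value, num_permutations=2000)
```

The exact routine evaluates the value function on all subsets for every feature, whereas the sampler needs only n + 1 evaluations per sampled permutation, so its budget can be chosen independently of the exponential number of coalitions. The price is that the result is an estimate whose error shrinks as more permutations are drawn.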
