Comparing Shapley Value Approximation Methods for Unsupervised Feature Importance

Authors

  • Patrick Kolpaczki

DOI:

https://doi.org/10.11576/dataninja-1158

Keywords:

Shapley values, feature importance scores, unsupervised learning

Abstract

Assigning importance scores to features is a common approach to gaining insight into a prediction model’s behavior or even the data itself. Beyond explainability, such scores can also be used for feature selection, making unlabeled high-dimensional data manageable. One way to derive scores is to adopt a game-theoretic view in which features are understood as agents that can form groups and cooperate to obtain a reward. Splitting the reward among the features appropriately yields the desired scores. The Shapley value is the most popular reward-sharing solution. However, computing it exactly requires evaluating a number of feature groups that grows exponentially with the number of features, which renders it inapplicable to high-dimensional data unless an efficient approximation is available. We empirically compare selected approximation algorithms for quantifying feature importance on unlabeled data.
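To make the game-theoretic view concrete, the following minimal sketch contrasts exact Shapley values with a simple permutation-sampling estimate. It is an illustration only and is not taken from the paper: the reward function `toy_value` and the helper names `exact_shapley` and `sampled_shapley` are hypothetical, and permutation sampling is just one generic approximation scheme, not necessarily one of the algorithms compared here.

```python
import itertools
import math
import random


def exact_shapley(n, value):
    """Exact Shapley values; enumerates all coalitions, so cost grows as 2^n."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for coalition in itertools.combinations(others, size):
                s = frozenset(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def sampled_shapley(n, value, num_permutations, seed=0):
    """Monte Carlo estimate: average marginal contributions over random orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n
    players = list(range(n))
    for _ in range(num_permutations):
        rng.shuffle(players)
        coalition = set()
        previous = value(frozenset(coalition))
        for i in players:
            coalition.add(i)
            current = value(frozenset(coalition))
            phi[i] += current - previous
            previous = current
    return [p / num_permutations for p in phi]


def toy_value(coalition):
    """Hypothetical reward (assumption): features 0 and 1 only help jointly,
    feature 2 helps on its own, feature 3 is irrelevant."""
    reward = 0.0
    if 0 in coalition and 1 in coalition:
        reward += 1.0
    if 2 in coalition:
        reward += 0.5
    return reward


exact = exact_shapley(4, toy_value)
estimate = sampled_shapley(4, toy_value, num_permutations=2000)
```

The exact routine evaluates every coalition, whereas the sampling routine needs only `num_permutations * (n + 1)` evaluations of the reward function, which is the kind of trade-off that motivates approximation on high-dimensional data.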

Published

2024-10-11