Beyond Feature Attribution: Quantifying Neural Unit Contributions using Multidimensional Shapley Analysis

URL
Document type: Master Thesis
Institute: Department of Computer Science
Language: English
Year of creation: 2024
Publication date:
Free keywords (English): Large Language Models, Explainable AI, Computer Vision
DDC subject group: Computer Science
BK classification: 54.72

Abstract (English):

Artificial Intelligence models, such as ChatGPT, have gained immense popularity and are extensively utilized across various domains. Despite their widespread use, these models largely remain black boxes, with their internal workings obscure to users and developers alike. Understanding these models is crucial not only for improving their performance and reliability but also for ensuring they operate within ethical boundaries. There is therefore a pressing need for a unified, model-agnostic approach to explainable AI (XAI) that is effective across all data types. To address this need, we introduce a novel framework for Multi-dimensional Shapley Value Analysis, encapsulated in an open-source Python package. This framework advances beyond traditional feature attribution methods like SHAP, enabling the calculation of unit contributions towards multidimensional outputs. We demonstrate this framework on three distinct types of neural networks: Multi-layer Perceptrons (MLP), Large Language Models (LLM), and Deep Convolutional Generative Adversarial Networks (DCGAN). Our investigation begins with the most fundamental neural unit, the neuron in an MLP. We explore the impact of different regularization techniques on neuron functionality and computation distribution. Contrary to popular belief, we find that in networks without regularization, the importance of a neuron shows no correlation with its weights. To demonstrate the scalability of our approach to state-of-the-art models, we then apply it to Mixtral-8x7B, a highly complex LLM with 56 billion parameters. This analysis uncovers task-specific neural units, revealing that removing certain units can hinder the LLM's ability to produce a specific language without affecting its comprehension of that language. Finally, we apply our framework to analyze the contributions of neural units in a DCGAN.
Our findings suggest that, unlike traditional classification networks, GANs process features in reverse order: higher-level features are generated first, followed by lower-level features in later layers. In conclusion, our framework offers a scalable, model-agnostic approach to explainable AI, demonstrated across multiple neural network architectures in this study.
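The core idea of the abstract, attributing a network's multidimensional output to individual neural units via Shapley values, can be illustrated with a small sketch. The code below is not the thesis's package; it is a minimal, hypothetical Monte Carlo estimator over the hidden neurons of a tiny randomly initialized MLP, where a unit's "absence" is modeled by zero-ablating its activation. Each hidden unit receives a Shapley vector, one contribution per output dimension, rather than a single scalar as in feature-level SHAP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny MLP: 2 inputs -> 4 hidden (ReLU) -> 2 outputs. Weights are random
# placeholders for illustration, not a trained model.
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)

def forward(x, active):
    """Forward pass with a 0/1 mask `active` ablating hidden neurons."""
    h = np.maximum(x @ W1 + b1, 0.0) * active
    return h @ W2 + b2

def unit_shapley(x, n_units, n_samples=2000):
    """Monte Carlo Shapley values of hidden units for a vector output.

    Returns an (n_units, output_dim) matrix: row u is unit u's estimated
    contribution to each output dimension, averaged over random
    permutations (orderings) of unit insertion.
    """
    out_dim = forward(x, np.ones(n_units)).shape[-1]
    phi = np.zeros((n_units, out_dim))
    for _ in range(n_samples):
        perm = rng.permutation(n_units)
        active = np.zeros(n_units)
        prev = forward(x, active)          # all units ablated
        for u in perm:
            active[u] = 1.0                # switch unit u on
            cur = forward(x, active)
            phi[u] += cur - prev           # marginal contribution of u
            prev = cur
    return phi / n_samples

x = np.array([0.5, -1.0])
phi = unit_shapley(x, n_units=4)
# Efficiency property: per output dimension, the unit contributions sum
# to f(all units on) - f(all units off).
total = forward(x, np.ones(4)) - forward(x, np.zeros(4))
print(np.allclose(phi.sum(axis=0), total))  # → True
```

Exact Shapley values require averaging over all 2^n unit coalitions, which is infeasible for the 56-billion-parameter models discussed above; permutation sampling of this kind is one standard way such analyses are made tractable at scale.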

Copyright Notice

The Copyright Act (UrhG) applies without restriction to documents made available in electronic form via data networks. In particular:

Individual reproductions, e.g. copies and printouts, may only be made for private and other personal use (Section 53 UrhG). The production and distribution of further reproductions is permitted only with the express consent of the copyright holder.

The user is personally responsible for compliance with these legal provisions and may be held liable in case of misuse.