Partial Model Merging

Document type: Master Thesis
Institute: Department of Computer Science
Language: German
Year of creation: 2023
Publication date:
Free keywords (English): Machine Learning, Neural Networks, Model Merging
DDC subject group: Computer Science
BK classification: 54.72

Abstract (English):

It is commonly known that ensembling two models trained on the same task often outperforms each model individually, albeit at twice the inference cost. A novel stream of research on model merging explores whether multiple models can be combined into one by interpolating their weights, which could serve as a more efficient alternative to ensembling. Interpolating models fine-tuned from a shared base model is already viable and yields the same performance increase as ensembling them. However, interpolating models that were trained independently or on different datasets still poses a significant challenge: the merged models suffer from higher loss and lower accuracy, a barrier that is only avoidable when the models are prohibitively wide. In this thesis we postulate that model merging and ensembling represent two extremes of a spectrum, either enforcing a complete overlap of parameters or keeping all parameters separate. We evaluate methods situated in between, which overlap and interpolate the parameters of both models only in part while keeping the remaining parameters unchanged. We demonstrate that such partial model merging gracefully eliminates existing barriers, with loss and accuracy approaching those of ensembling as the width increase approaches 100%. We observe this gradual improvement across all architectures and model dimensions, even when the endpoints were trained on disjoint or biased datasets. We also show that specific layers contribute significantly more to the occurring barriers, and that reducing their overlap first allows the barriers to shrink more quickly. Contrary to previous beliefs, these layers cannot be identified just by looking at the average correlations between units. Using a simple baseline method inspired by our findings, we achieve zero-barrier connectivity with respect to both test loss and test accuracy between two regular-width VGG11s independently trained on CIFAR10 and SVHN, with a parameter increase of just 22% and 24% respectively, and without any additional training after merging. Similar interpolation-based connectivity between independently trained VGGs has never been achieved experimentally using full merging, even for high width multipliers.
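
To make the idea concrete, here is a minimal NumPy sketch of partial merging for a single fully connected layer. It is an illustrative assumption, not the method from the thesis: the function name `partial_merge_layer`, the midpoint interpolation, and the fixed unit ordering are all hypothetical, and a real implementation would also have to align units between the two models beforehand (e.g. by permutation matching) and widen the input dimension of the following layer accordingly.

```python
import numpy as np

def partial_merge_layer(W_a, W_b, overlap=0.75):
    """Partially merge two layers' weight matrices (rows = output units).

    Hypothetical sketch: the first `overlap` fraction of units is averaged
    (interpolated at the midpoint), and the remaining units of BOTH models
    are kept as-is, so the merged layer is wider than either original.
    Assumes W_b's units were already permuted to correspond to W_a's
    (e.g. via activation or weight matching).
    """
    n_units = W_a.shape[0]
    n_merged = int(round(overlap * n_units))
    merged = 0.5 * (W_a[:n_merged] + W_b[:n_merged])  # shared, interpolated units
    kept_a = W_a[n_merged:]                           # units kept only from model A
    kept_b = W_b[n_merged:]                           # units kept only from model B
    # resulting width: overlap * n + 2 * (1 - overlap) * n
    return np.vstack([merged, kept_a, kept_b])

# toy example: two 64-unit layers over 32 inputs, 75% overlap
rng = np.random.default_rng(0)
W_a = rng.normal(size=(64, 32))
W_b = rng.normal(size=(64, 32))
print(partial_merge_layer(W_a, W_b, overlap=0.75).shape)  # (80, 32) -> 25% wider
```

With overlap = 1 this reduces to full merging, and with overlap = 0 it keeps both models' units side by side, mirroring the spectrum between full merging and ensembling described above.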
