Improving Large Language Models in Repository Level Programming through Self-Alignment and Retrieval-Augmented Generation : Strich, Jan : INFDok

Improving Large Language Models in Repository Level Programming through Self-Alignment and Retrieval-Augmented Generation

URL	http://edoc.sub.uni-hamburg.de/informatik/volltexte/2024/275/
Document type:	Master Thesis
Institute:	Department of Informatics
Language:	English
Year of creation:	2024
Publication date:	28.10.2024
Free keywords (English):	Repository Code Q&A, Large Language Models (LLM), Retrieval Augmented Generation, Self-Alignment, LLM-as-a-Judge
DDC subject group:	Computer science
BK classification:	54.00

Abstract in English:

Repository-level programming involves writing code specific to a particular domain or project. Large language models (LLMs) such as ChatGPT, GitHub Copilot, Llama, or Mistral can assist programmers as coding assistants and knowledge sources to make the coding process faster and more efficient. This thesis aims to improve the performance of coding assistants by implementing a Self-Alignment process and a retrieval-augmented generation (RAG) pipeline for a specific repository. Self-Alignment is the process of having an LLM create a training dataset, curating the samples to improve dataset quality, and performing supervised fine-tuning on the curated dataset. In comparison, RAG pipelines use a vector database to fetch relevant documents from the repository via similarity search and provide them as context to the model. This thesis introduces SpyderCodeQA, a dataset that tests the ability of models to understand the source code, the dependencies between files, and the overall meta-information about the repository. To evaluate the fine-tuned LLM and the RAG pipeline on SpyderCodeQA, the LLM-as-a-Judge evaluation is used, which compares the models pairwise with GPT-3.5 as judge. The results show that the fine-tuned LLM and the RAG pipeline both outperform the unadjusted LLM on SpyderCodeQA. In addition, the results show that combining both approaches leads to an interaction effect that further improves performance on SpyderCodeQA. Further ablation studies investigate hyperparameters such as Top-P and Temperature, as well as the choice of judge. A qualitative analysis of the evaluation results is carried out to better understand the effects.
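
The retrieval step of a RAG pipeline, as described in the abstract, can be illustrated with a minimal sketch. This is not the thesis's implementation: the bag-of-words `embed` function stands in for a learned embedding model, the in-memory list stands in for a vector database, and the three documents and all function names are made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector. Assumption: a real pipeline would
    use a learned embedding model instead."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Place the retrieved documents in front of the question as context."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical repository snippets, invented for this example.
docs = [
    "the editor plugin handles syntax highlighting",
    "the console plugin runs python code",
    "the installer scripts build release packages",
]

print(build_prompt("which plugin runs python code", docs, k=1))
```

The resulting prompt would then be passed to the LLM, so that the answer can draw on repository content the model did not see during training.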

Copyright notice

The German Copyright Act (Urheberrechtsgesetz, UrhG) applies without restriction to documents offered in electronic form over data networks. In particular:

Individual reproductions, e.g. copies and printouts, may be made only for private and other personal use (Section 53 of the Copyright Act). Producing and distributing further reproductions is permitted only with the express permission of the author.

Users are themselves responsible for complying with the legal provisions and may be held liable in the event of misuse.

INFDok - Full-text document server of the Department of Informatics