Fine-Tuning Small Large Language Models for Patient Trial Matching in Precision Medicine : Kraus, Kevin : INFDok

Fine-Tuning Small Large Language Models for Patient Trial Matching in Precision Medicine

URL	http://edoc.sub.uni-hamburg.de/informatik/volltexte/2025/294/
Document type:	Master Thesis
Institute:	Department of Informatics
Language:	English
Year of creation:	2024
Publication date:	4 March 2025
Free keywords (German):	KI, Sprachmodelle, Präzisionsmedizin, Onkologie
Free keywords (English):	AI, language technology, precision medicine, oncology, fine-tuning
DDC subject group:	Computer science
BK classification:	54.00

Abstract in English:

This work investigates automated patient-trial matching by fine-tuning Llama2 Chat (13B); the resulting model is called TrialLlama. TrialLlama is trained with a supervised classification approach on clinical trials from a snapshot of the clinicaltrials.gov website and on synthetic patient descriptions provided by the TREC Clinical Trials track. Two primary tasks are explored with the fine-tuned model: 1) patient-trial classification, where the model categorises patient-trial pairs into one of three labels (eligible, excluded, irrelevant), and 2) reasoning, where it extracts and discusses the eligibility criteria of a clinical trial to determine whether a patient is eligible for enrolment in that trial. When patient-trial matching is treated as binary classification by combining the two negative labels into one class, TrialLlama achieves an accuracy of 0.813 and an F1 score of 0.883. On the original three-label classification task, it achieves an accuracy of 0.634 and an F1 score of 0.530. Notably, TrialLlama excels in the reasoning task, exceeding Llama2 by 0.640 in precision and 0.666 in accuracy. Despite being fine-tuned only for classification, TrialLlama extracts eligibility criteria and assesses a patient's eligibility concisely and logically. However, several limitations are identified: fine-tuning difficulties caused by dataset limitations, a bias towards extracting inclusion criteria, hallucinations, and limited comparability with other systems. Nevertheless, TrialLlama and its open-source codebase hold promise for advancing research in automated patient-trial matching and AI-driven medical assistants.
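The binary evaluation described in the abstract (merging the two negative labels into one class) can be illustrated with a minimal sketch. The three label names come from the abstract; everything else is an assumption made for illustration: the function names, the choice of "eligible" as the positive class for F1, and the toy gold and predicted labels, which are not data or results from the thesis.

```python
from typing import List

# The three labels named in the abstract.
LABELS = ("eligible", "excluded", "irrelevant")


def to_binary(label: str) -> str:
    """Collapse the two negative labels into one class (hypothetical helper)."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return "eligible" if label == "eligible" else "not_eligible"


def accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of positions where prediction and gold label agree."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_score(gold: List[str], pred: List[str], positive: str) -> float:
    """F1 for one positive class: harmonic mean of precision and recall."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (assumption, not thesis data).
gold = ["eligible", "excluded", "irrelevant", "eligible", "irrelevant", "eligible"]
pred = ["eligible", "irrelevant", "irrelevant", "excluded", "eligible", "eligible"]

three_label_accuracy = accuracy(gold, pred)

gold_binary = [to_binary(g) for g in gold]
pred_binary = [to_binary(p) for p in pred]
binary_accuracy = accuracy(gold_binary, pred_binary)
binary_f1 = f1_score(gold_binary, pred_binary, positive="eligible")

print(three_label_accuracy, binary_accuracy, binary_f1)
```

In this toy example, confusing "excluded" with "irrelevant" counts as an error under three labels but not after merging, which is one reason binary scores tend to be higher than three-label scores on the same predictions.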

Copyright notice

The German Copyright Act (Urheberrechtsgesetz, UrhG) applies without restriction to documents offered in electronic form via data networks. In particular:

Individual reproductions, e.g. copies and printouts, may be made only for private and other personal use (Section 53 of the Copyright Act). The production and distribution of further reproductions is permitted only with the express permission of the author.

Users are themselves responsible for complying with the legal provisions and may be held liable in the event of misuse.

INFDok - Full-text document server of the Department of Informatics