Evaluation techniques for the effectiveness of LLM (Large Language Models)


Original title:	Evaluační techniky efektivnosti LLM (Large Language Models)

Thesis title:	Evaluation techniques for the effectiveness of LLM (Large Language Models)
Author:	Bruch, Stanislav
Thesis type:	Diploma thesis
Supervisor:	Umlauf, Miroslav
Opponents:	Novák, Martin
Thesis language:	Czech
Abstract:	This thesis presents a comprehensive methodology for the systematic evaluation of large language models (LLMs) in the context of enterprise deployment at Easy Software. Its main goal is to create a systematic evaluation framework that enables objective assessment of prompt quality before prompts are deployed to the production software environment. The thesis combines theoretical findings on LLM evaluation with the practical implementation of an evaluation system built on the LangSmith tool and specialized LLM evaluators. It proposes and implements an evaluation methodology that enables systematic comparison of prompts and models against defined evaluation criteria, and it also validates them. The results provide a methodological basis for evaluating AI outputs and models in the commercial sphere and can serve as a reference framework for implementing similar evaluation processes in other software organizations. The thesis contributes to a better understanding of the performance criteria of language models and of their practical usability when integrating AI into business applications.
Keywords:	large language models; artificial intelligence; evaluation; prompts; datasets; OpenAI; Llama; LLM judge; LangSmith

Information about study

Study programme:	Data and Analytics for Business (Data a analytika pro business)
Type of study programme:	Master's study programme
Assigned degree:	Ing.
Institutions assigning academic degree:	Vysoká škola ekonomická v Praze
Faculty:	Faculty of Informatics and Statistics
Department:	Department of Information Technologies

Information on submission and defense

Date of assignment:	3. 12. 2024
Date of submission:	3. 5. 2025
Date of defense:	6. 6. 2025
Identifier in the InSIS system:	https://insis.vse.cz/zp/90630/podrobnosti

Files for download

Main text
90630_brus05.pdf, 2.5 MB

Opponent's review
86921_Novák.pdf, 101.2 kB

Supervisor's review
90630_umlm00.pdf, 104.1 kB