Empirické porovnání velkých jazykových modelů (LLM)

Thesis title:	Empirické porovnání velkých jazykových modelů (LLM)
Author:	Ježek, Petr
Thesis type:	Diploma thesis
Supervisor:	Berka, Petr
Opponents:	Chudán, David
Thesis language:	English
Abstract:	This thesis identifies criteria suitable for evaluating and comparing large language models (LLMs) and applies those criteria to evaluate models. It describes large language models and the principles on which they are based, and introduces the recent breakthroughs responsible for their growing popularity. It also analyzes the state of research and literature on the evaluation and comparison of large language models in the domain of software engineering. The main contribution is a methodology for evaluating and comparing large language models, which is applied to the most popular models. The methodology takes the form of a benchmark and comes with a custom CLI application that automates the comparison of large language models in their ability to develop applications written in the Golang programming language using the Test Driven Development methodology. The models are graded on their ability to fulfill functional requirements and to iteratively develop applications that pass a set of automated tests. The generated applications are then reviewed in a qualitative analysis, which checks code quality and adherence to software engineering best practices. The thesis closes with an analysis of the strengths and weaknesses of each selected model and with recommendations on what each model is suited for and where caution is advised.
Keywords:	large language models; LLMs; empirical comparison; benchmark; software engineering; test driven development; golang

Abstract (Czech version, in translation):	Large language models (LLMs) are currently the best-known example of generative artificial intelligence. Systems such as ChatGPT, Claude, or Gemini can answer questions and generate extensive texts. The aim of this thesis is to reveal the strengths and weaknesses of the individual tools using suitably chosen queries, to propose suitable evaluation criteria, and to compare the tools using those criteria. The thesis describes language models and the principles on which they are based, and introduces the important breakthroughs that explain their growing popularity. It then analyzes the state of research and literature on the evaluation and comparison of large language models, with a focus on their use in software engineering. The result of the thesis is a comparison methodology, together with its application to a set of the most widely used models. The methodology, which takes the form of a benchmark, is supported by a custom console application that enables automated comparison of models in their ability to develop applications in the Golang language following the Test Driven Development methodology. The models are scored on their ability to process functional requirements and to iteratively develop applications that pass a set of automated tests. The generated applications are additionally subjected to a qualitative analysis based on code review, which assesses code quality and adherence to sound software development practices. Finally, the strengths and weaknesses of the individual models are listed, along with recommendations on what each model is suited for and where caution is needed.

Information about study

Study programme:	Znalostní a webové technologie (Knowledge and Web Technologies)
Type of study programme:	Master's degree programme
Assigned degree:	Ing.
Institutions assigning academic degree:	Vysoká škola ekonomická v Praze
Faculty:	Faculty of Informatics and Statistics
Department:	Department of Information and Knowledge Engineering

Information on submission and defense

Date of assignment:	20. 3. 2024
Date of submission:	1. 12. 2024
Date of defense:	20. 1. 2025
Identifier in the InSIS system:	https://insis.vse.cz/zp/88091/podrobnosti

Files for download

Main text
88091_jezp01.pdf, 3.4 MB

Public annex
30102_jezp01.zip, 40.2 kB

Opponent's review
84706_xchud01.pdf, 119.6 kB

Supervisor's review
88091_berka.pdf, 102.8 kB