t-SNE: a Machine-Learning Method for Data Dimensionality Reduction and Visualization

Title:	t-SNE: a Machine-Learning Method for Data Dimensionality Reduction and Visualization
Author:	Iqbal, Sobia
Thesis type:	Diploma thesis
Supervisor:	Plašil, Miroslav
Opponent:	Fojtík, Jan
Thesis language:	English
Abstract:	In this thesis, t-SNE is introduced, discussed, and applied to various synthetic and non-synthetic datasets to outline its inner workings. The objective of the thesis is to investigate all t-SNE parameters in general and perplexity in particular. The study is intended to find a way of choosing the optimal perplexity value, and we use two different approaches to do so. First, we plot t-SNE for a set of perplexity values and choose the best plot. Then we plot the Kullback-Leibler divergence against perplexity for each batch of data and choose the perplexity suggested by the Kullback-Leibler divergence, because, as Laurens van der Maaten, the author of t-SNE, said in his original paper, "The Kullback-Leibler divergence between the joint probability distribution of high-dimensional similarities and low-dimension similarities is minimised by using gradient descent". Choosing the best perplexity by minimizing the Kullback-Leibler divergence therefore appeared fair. Finally, we compared the performance of both methods to check which one provides the best results. The research questions investigated throughout the thesis are as follows: first, the difference between Laurens van der Maaten's t-SNE code and the t-SNE package implementation; second, a comparison of manually chosen and KL-chosen perplexity, to conclude which one is more reliable.
Keywords:	SNE; t-SNE; Kullback–Leibler divergence; perplexity

Study information

Study programme / field:	Economic Data Analysis/Data Analysis and Modeling
Type of study programme:	Master's degree programme
Degree awarded:	Ing.
Degree-granting institution:	Vysoká škola ekonomická v Praze
Faculty:	Fakulta informatiky a statistiky
Department:	Katedra statistiky a pravděpodobnosti

Submission and defence information

Date of assignment:	4 November 2021
Date of submission:	29 June 2023
Date of defence:	23 August 2023
Identifier in InSIS:	https://insis.vse.cz/zp/78655/podrobnosti

Files for download

Main thesis
78655_iqbs01.pdf, 5.6 MB Download

Opponent's review
79933_xfojj00.pdf, 55.1 kB Download

Supervisor's evaluation
78655_plasil.pdf, 56 kB Download