t-SNE: a Machine-Learning Method for Data Dimensionality Reduction and Visualization
Thesis title: | t-SNE: a Machine-Learning Method for Data Dimensionality Reduction and Visualization |
---|---|
Thesis author: | Iqbal, Sobia |
Thesis type: | Diploma thesis |
Thesis supervisor: | Plašil, Miroslav |
Thesis reviewers: | Fojtík, Jan |
Thesis language: | English |
Abstract: | This thesis introduces and discusses t-SNE and applies it to various synthetic and non-synthetic datasets to illustrate its inner workings. The objective is to investigate the t-SNE parameters in general and perplexity in particular, with the aim of finding a way to choose an optimal perplexity value. Two approaches are used. First, we produce t-SNE plots for a set of perplexity values and manually choose the best plot. Second, we plot the Kullback-Leibler divergence against perplexity for each batch of data and choose the perplexity that the Kullback-Leibler divergence suggests. The motivation is the statement by Laurens van der Maaten, the author of t-SNE, in the original paper: "The Kullback-Leibler divergence between the joint probability distribution of high-dimensional similarities and low-dimension similarities is minimised by using gradient descent". Choosing the perplexity that minimizes the Kullback-Leibler divergence therefore appeared reasonable. Finally, we compare the results of the two methods to determine which performs better. The thesis investigates two research questions: first, how Laurens van der Maaten's t-SNE implementation differs from the t-SNE package; second, how manually chosen and KL-chosen perplexity values compare, and which of the two is more reliable. |
Keywords: | SNE; t-SNE; Kullback–Leibler divergence; perplexity |
Study information
Study programme / field: | Economic Data Analysis/Data Analysis and Modeling |
---|---|
Type of study programme: | Master's study programme |
Degree awarded: | Ing. |
Degree-granting institution: | Prague University of Economics and Business (Vysoká škola ekonomická v Praze) |
Faculty: | Faculty of Informatics and Statistics |
Department: | Department of Statistics and Probability |
Submission and defense information
Date of assignment: | 4 November 2021 |
---|---|
Date of submission: | 29 June 2023 |
Date of defense: | 23 August 2023 |
Identifier in InSIS: | https://insis.vse.cz/zp/78655/podrobnosti |