t-SNE: a Machine-Learning Method for Data Dimensionality Reduction and Visualization
Thesis title: | t-SNE: a Machine-Learning Method for Data Dimensionality Reduction and Visualization |
---|---|
Author: | Iqbal, Sobia |
Thesis type: | Diploma thesis |
Supervisor: | Plašil, Miroslav |
Opponents: | Fojtík, Jan |
Thesis language: | English |
Abstract: | In this thesis, t-SNE is introduced, discussed, and applied to various synthetic and non-synthetic datasets to outline its inner workings. The objective of the thesis is to investigate all t-SNE parameters in general, and perplexity in particular. The study aims to find a way of choosing an optimal perplexity value, using two different approaches. First, we plot t-SNE embeddings for a set of perplexity values and choose the best plot visually. Then we plot the Kullback–Leibler divergence against perplexity for each batch of data and choose the perplexity suggested by the divergence; as the author of t-SNE, Laurens van der Maaten, wrote in his original paper, "the Kullback–Leibler divergence between the joint probability distribution of high-dimensional similarities and low-dimensional similarities is minimised by using gradient descent". Choosing the best perplexity by minimizing the Kullback–Leibler divergence therefore appeared reasonable. Finally, we compared the performance of both methods to determine which provides the best results. The research questions investigated throughout the thesis are as follows: first, the difference between Laurens van der Maaten's reference t-SNE implementation and the t-SNE package; second, a comparison of manually chosen and KL-chosen perplexity, concluding which one is more reliable. |
Keywords: | SNE; t-SNE; Kullback–Leibler divergence; perplexity |
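The perplexity parameter discussed in the abstract controls the effective number of neighbours each point considers: for every point, t-SNE binary-searches a Gaussian bandwidth sigma so that the entropy of the conditional neighbour distribution matches log2(perplexity). The following is a minimal NumPy sketch of that mechanism for a single point, not the thesis's own code; the function names and the toy distance vector are illustrative assumptions.

```python
import numpy as np

def conditional_probs(dists, sigma):
    """Conditional probabilities p_{j|i} for one point, given squared
    distances to its neighbours and a Gaussian bandwidth sigma."""
    p = np.exp(-dists / (2.0 * sigma ** 2))
    return p / p.sum()

def perplexity(p):
    """Perplexity = 2^H(p), where H is the Shannon entropy in bits."""
    h = -np.sum(p * np.log2(p + 1e-12))
    return 2.0 ** h

def sigma_for_perplexity(dists, target, tol=1e-5, max_iter=100):
    """Binary-search sigma so the conditional distribution reaches the
    target perplexity, as t-SNE does independently for each point."""
    lo, hi = 1e-10, 1e10
    sigma = 1.0
    for _ in range(max_iter):
        sigma = 0.5 * (lo + hi)
        perp = perplexity(conditional_probs(dists, sigma))
        if abs(perp - target) < tol:
            break
        if perp > target:
            hi = sigma  # distribution too flat: shrink sigma
        else:
            lo = sigma  # distribution too peaked: grow sigma
    return sigma

# Toy example: squared distances from one point to five neighbours.
d = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
s = sigma_for_perplexity(d, target=3.0)
p = conditional_probs(d, s)
print(round(perplexity(p), 2))  # ≈ 3.0
```

Since perplexity grows monotonically with sigma (from 1 for a peaked distribution up to the number of neighbours for a uniform one), the binary search always converges, which is why a single scalar perplexity can be applied uniformly across points of varying local density.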
Information about study
Study programme: | Economic Data Analysis/Data Analysis and Modeling |
---|---|
Type of study programme: | Master's degree programme |
Assigned degree: | Ing. |
Institutions assigning academic degree: | Vysoká škola ekonomická v Praze |
Faculty: | Faculty of Informatics and Statistics |
Department: | Department of Statistics and Probability |
Information on submission and defense
Date of assignment: | 4. 11. 2021 |
---|---|
Date of submission: | 29. 6. 2023 |
Date of defense: | 23. 8. 2023 |
Identifier in the InSIS system: | https://insis.vse.cz/zp/78655/podrobnosti |