t-SNE: a Machine-Learning Method for Data Dimensionality Reduction and Visualization

Thesis title: t-SNE: a Machine-Learning Method for Data Dimensionality Reduction and Visualization
Author: Iqbal, Sobia
Thesis type: Diploma thesis
Supervisor: Plašil, Miroslav
Opponents: Fojtík, Jan
Thesis language: English
Abstract:
In this thesis, t-SNE is introduced, discussed, and applied to various synthetic and non-synthetic datasets to outline its inner workings. The objective is to investigate all t-SNE parameters in general and perplexity in particular, and to find a way of choosing an optimal perplexity value. We use two different approaches. First, we run t-SNE with a set of perplexity values and choose the best plot. Second, we plot the Kullback–Leibler divergence against perplexity for each batch of data and choose the perplexity it suggests; as the author of t-SNE, Laurens van der Maaten, states in his original paper, "the Kullback–Leibler divergence between the joint probability distribution of high-dimensional similarities and low-dimensional similarities is minimised by using gradient descent". Choosing the best perplexity by minimizing the Kullback–Leibler divergence therefore appeared reasonable. Finally, we compared the performance of both methods to determine which provides the better results. The research questions investigated throughout the thesis are: first, how the implementation of Laurens van der Maaten's original t-SNE code differs from the t-SNE package; second, how manually chosen and KL-chosen perplexity compare, and which of the two is more reliable.
Keywords: SNE; t-SNE; Kullback–Leibler divergence; Perplexity
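
The abstract describes two ways of picking perplexity: visual inspection of plots over a grid of values, and selection by the final Kullback–Leibler divergence. The short Python sketch below (not taken from the thesis; the dataset, the perplexity grid, and the use of scikit-learn's TSNE are illustrative assumptions) shows how such a sweep could be set up, reading the final KL divergence from each fit:

    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE

    X = load_digits().data                    # stand-in high-dimensional data (assumption)
    perplexities = [5, 10, 30, 50, 100]       # assumed candidate grid

    results = {}
    for p in perplexities:
        tsne = TSNE(n_components=2, perplexity=p, random_state=0)
        embedding = tsne.fit_transform(X)
        # kl_divergence_ holds the final KL(P || Q) reached by gradient descent
        results[p] = (embedding, tsne.kl_divergence_)

    # Approach 1: inspect each embedding visually (scatter plots of results[p][0]).
    # Approach 2: take the perplexity with the lowest final KL divergence.
    kl_chosen = min(results, key=lambda p: results[p][1])
    print("KL-chosen perplexity:", kl_chosen)

Whether the visually chosen or the KL-chosen perplexity is more reliable is exactly the comparison the thesis carries out.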

Information about study

Study programme: Economic Data Analysis/Data Analysis and Modeling
Type of study programme: Master's degree programme
Assigned degree: Ing.
Institutions assigning academic degree: Vysoká škola ekonomická v Praze
Faculty: Faculty of Informatics and Statistics
Department: Department of Statistics and Probability

Information on submission and defense

Date of assignment: 4. 11. 2021
Date of submission: 29. 6. 2023
Date of defense: 23. 8. 2023
Identifier in the InSIS system: https://insis.vse.cz/zp/78655/podrobnosti
