# Regression analysis on tests results for diabetes diagnosis using R

Thesis title: Regression analysis on tests results for diabetes diagnosis using R

Dadamirzaev, Gayrat; Diploma thesis; Helman, Karel; Bašta, Milan; English

Abstract: In multivariate analysis, such as multiple linear regression, unusual points in a dataset may influence the fitting of a regression model, i.e. they may affect the overall estimation results of the model and the statistical significance of its coefficients. Especially when a dataset is full of outliers, the question is how to deal with such points in order to avoid violations of the regression assumptions and to keep the model statistically significant. This master's thesis aims to explore and answer this question using the classical linear predictive approach. The theoretical part starts with an introduction, which describes the main idea and tasks of the research as well as the steps for implementing the assigned tasks. The theory then introduces the basic linear regression terms, such as the coefficient of determination, tests of significance of coefficients, and regression assumptions, along with issues like multicollinearity and outliers. Methods of outlier detection and predictive analysis are among the main topics of this thesis. The practical part realizes the thesis goals based on the terms and methods presented in the theoretical part. Using RStudio, four different data scenarios are explored individually and then compared with each other. In conclusion, the quality of the models is determined based on the outcomes of four identical regression models.

Keywords: residual diagnostics; prediction; outlier detection; prediction accuracy; classical linear regression; outliers; influential outliers

## Study information

Study programme / field: Kvantitativní metody v ekonomice / Quantitative Economic Analysis; Master's study programme; Ing.; Vysoká škola ekonomická v Praze; Faculty of Informatics and Statistics; Department of Statistics and Probability

## Submission and defense information

Thesis assignment date: 8. 10. 2019
25. 6. 2020
26. 8. 2020
https://insis.vse.cz/zp/71159/podrobnosti
