Regression analysis on tests results for diabetes diagnosis using R

Thesis title: Regression analysis on tests results for diabetes diagnosis using R
Author: Dadamirzaev, Gayrat
Thesis type: Diploma thesis
Supervisor: Helman, Karel
Opponents: Bašta, Milan
Thesis language: English
Abstract:
In multivariate analysis, such as multiple linear regression, unusual points in a dataset may influence the fit of the regression model, i.e. they may affect the overall estimation results and the statistical significance of the coefficients. In particular, when a dataset contains many outliers, the question is how to deal with them in order to avoid violating the regression assumptions and to keep the model statistically significant. This master's thesis aims to explore and answer this question using the classical linear predictive approach. The theoretical part starts with an introduction describing the main idea and tasks of the research as well as the steps taken to carry out those tasks. It then introduces basic linear regression terms, such as the coefficient of determination, tests of significance of the coefficients, the regression assumptions, and issues such as multicollinearity and outliers. Methods of outlier detection and predictive analysis are among the main topics of the thesis. The practical part implements the goals of the thesis using the terms and methods covered in the theoretical part. Using RStudio, four different data scenarios are examined individually and then compared with each other. In conclusion, the quality of the four identical regression models is assessed on the basis of their outcomes.
Keywords: residual diagnostics; prediction; outlier detection; prediction accuracy; Classical linear regression; outliers; influential outliers

Information about study

Study programme: Kvantitativní metody v ekonomice/Quantitative Economic Analysis
Type of study programme: Master's study programme
Assigned degree: Ing.
Institutions assigning academic degree: Vysoká škola ekonomická v Praze
Faculty: Faculty of Informatics and Statistics
Department: Department of Statistics and Probability

Information on submission and defense

Date of assignment: 8. 10. 2019
Date of submission: 25. 6. 2020
Date of defense: 26. 8. 2020
Identifier in the InSIS system: https://insis.vse.cz/zp/71159/podrobnosti
