Regression analysis on tests results for diabetes diagnosis using R
Thesis title: | Regression analysis on tests results for diabetes diagnosis using R |
---|---|
Author: | Dadamirzaev, Gayrat |
Thesis type: | Diploma thesis |
Supervisor: | Helman, Karel |
Opponents: | Bašta, Milan |
Thesis language: | English |
Abstract: | In multivariate analysis, such as multiple linear regression, unusual points in a dataset may influence the fit of the regression model, i.e. they may affect the overall estimation results and the statistical significance of the coefficients. In particular, when a dataset contains many outliers, the question is how to deal with them in order to avoid violating the regression assumptions and to keep the model statistically significant. This master's thesis aims to explore and answer this question using the classical linear predictive approach. The theoretical part starts with an introduction that describes the main idea and tasks of the research, as well as the steps taken to carry out those tasks. It then introduces basic linear regression concepts, such as the coefficient of determination, tests of significance of the coefficients, the regression assumptions, and issues such as multicollinearity and outliers. Methods of outlier detection and predictive analysis are among the main topics of the thesis. The practical part carries out the aims of the thesis using the concepts and methods presented in the theoretical part. Using RStudio, four different data scenarios are explored individually and then compared with each other. In conclusion, the quality of the four identical regression models is assessed based on their outcomes. |
Keywords: | residual diagnostics; prediction; outlier detection; prediction accuracy; classical linear regression; outliers; influential outliers |
Information about study
Study programme: | Kvantitativní metody v ekonomice/Quantitative Economic Analysis |
---|---|
Type of study programme: | Master's study programme |
Assigned degree: | Ing. |
Institutions assigning academic degree: | Vysoká škola ekonomická v Praze |
Faculty: | Faculty of Informatics and Statistics |
Department: | Department of Statistics and Probability |
Information on submission and defense
Date of assignment: | 8. 10. 2019 |
---|---|
Date of submission: | 25. 6. 2020 |
Date of defense: | 26. 8. 2020 |
Identifier in the InSIS system: | https://insis.vse.cz/zp/71159/podrobnosti |