Regression analysis on tests results for diabetes diagnosis using R

Thesis title: Regression analysis on tests results for diabetes diagnosis using R
Author: Dadamirzaev, Gayrat
Thesis type: Diploma thesis
Supervisor: Helman, Karel
Opponent: Bašta, Milan
Language: English
Abstract:
In multivariate analysis, such as multiple linear regression, unusual points in a dataset may influence the fit of the regression model, i.e. they may affect the overall estimation results and the statistical significance of the coefficients. In particular, when a dataset contains many outliers, the question is how to handle these points so as to avoid violating the regression assumptions and to keep the model statistically significant. This master's thesis aims to explore and answer this question using the classical linear predictive approach. The theoretical part begins with an introduction that describes the main idea and tasks of the research and the steps taken to carry them out. It then introduces basic linear regression terms, such as the coefficient of determination, significance tests for coefficients, and regression assumptions, together with issues such as multicollinearity and outliers. Methods of outlier detection and predictive analysis are among the main topics of the thesis. The practical part carries out the aims of the thesis using the terms and methods presented in the theoretical part. Using RStudio, four different data scenarios are examined individually and then compared with one another. In conclusion, the quality of the models is assessed on the basis of the outcomes of the four identically specified regression models.
Keywords: residual diagnostics; prediction; outlier detection; prediction accuracy; classical linear regression; outliers; influential outliers

Study information

Study programme / field: Quantitative Methods in Economics / Quantitative Economic Analysis
Type of study programme: Master's study programme
Degree awarded: Ing.
Degree-granting institution: Vysoká škola ekonomická v Praze
Faculty: Faculty of Informatics and Statistics
Department: Department of Statistics and Probability

Submission and defence information

Date of assignment: 8 October 2019
Date of submission: 25 June 2020
Date of defence: 26 August 2020
Identifier in InSIS: https://insis.vse.cz/zp/71159/podrobnosti