Regression analysis on tests results for diabetes diagnosis using R
Thesis title: | Regression analysis on tests results for diabetes diagnosis using R |
---|---|
Author: | Dadamirzaev, Gayrat |
Thesis type: | Diploma thesis |
Supervisor: | Helman, Karel |
Opponent (reviewer): | Bašta, Milan |
Language: | English |
Abstract: | In multivariate analysis, such as multiple linear regression, unusual points in a dataset may influence the fit of the regression model, i.e. they may affect the overall estimation results and the statistical significance of the coefficients. When a dataset contains many outliers, the question arises of how to deal with them in order to avoid violating the regression assumptions and to keep the model statistically significant. This master's thesis aims to explore and answer this question using the classical linear predictive approach. The theoretical part begins with an introduction describing the main idea and tasks of the research, as well as the steps taken to carry out those tasks. The theory then introduces basic linear regression concepts, such as the coefficient of determination, tests of significance of the coefficients and the regression assumptions, together with issues such as multicollinearity and outliers. Outlier detection methods and predictive analysis are among the main topics of the thesis. The practical part carries out the thesis objectives using the concepts and methods introduced in the theoretical part. Using RStudio, four different data scenarios are explored individually and then compared with each other. Finally, the quality of the four identical regression models is assessed based on their outcomes. |
Keywords: | classical linear regression; outliers; influential outliers; prediction accuracy; outlier detection; prediction; residual diagnostics |
Study information
Study programme / field: | Kvantitativní metody v ekonomice (Quantitative Methods in Economics) / Quantitative Economic Analysis |
---|---|
Type of study programme: | Master's degree programme |
Degree awarded: | Ing. |
Degree-granting institution: | Vysoká škola ekonomická v Praze (Prague University of Economics and Business) |
Faculty: | Fakulta informatiky a statistiky (Faculty of Informatics and Statistics) |
Department: | Katedra statistiky a pravděpodobnosti (Department of Statistics and Probability) |
Submission and defence information
Assignment date: | 8 October 2019 |
---|---|
Submission date: | 25 June 2020 |
Defence date: | 26 August 2020 |
InSIS identifier: | https://insis.vse.cz/zp/71159/podrobnosti |