Keywords:
Machine Learning; Data Quality; Data Analysis
Thesis title:
Country-level datasets for machine learning
Author:
Shih, Chien-Yu
Thesis type:
Master's thesis
Supervisor:
Kliegr, Tomáš
Reviewers:
Chudán, David
Language:
English
Abstract:
This thesis explores the landscape of datasets and dataset repositories available for country-level machine learning applications, with a focus on economic indicators and demographic information. It defines and examines criteria for evaluating datasets, including update frequency, data licensing, community engagement, and documentation completeness. The research includes a comparative analysis of datasets and dataset repositories that considers factors such as the number of countries covered, organization type, and update frequency. It also emphasizes the importance of data quality assessment.