This bachelor thesis is a replication study of an article on gender stereotypes about occupations in natural language. The corpora examined are child-directed speech, child and adult speech, and books and audiovisual media for children and adults. The aim of the study is to detect associations between professions and gender that are implicitly reflected in these textual sources. The theoretical part describes concepts related to big data and machine learning, replication and the replication crisis, gender and stereotypes, and the analytical methods used. In the practical part, the corpora are analyzed using the fastText Skipgram model. The resulting word embeddings are interpreted using meta-analytic methods in the statistical software R. The work provides empirical evidence that gender stereotypes about occupations appear in natural language and correlate with real labor office statistics. There are consistent differences in gender stereotypes about professions across corpora, and their strength and representation do not change across age categories. Because this is a replication and inconsistencies arose during the research, an email listing the problems with replication was sent to the authors of the original publication.