Data Science Projects
Bias in Computer Science Abstracts and the Role of Diversity
Abstract: This research analyses abstracts of papers published at top computing science conferences in 2017 and the bias they may exhibit. With computing science continuing to have a very low proportion of female researchers, earlier research has hypothesised that low participation among female researchers is partly due to a self-reinforcing cycle. Our research investigates this hypothesis by correlating bias in published computing science abstracts with the diversity of a team of co-authors in terms of nationality and gender representation. We use the Dbias Python package to classify abstracts and identify biased words, and logistic regression to investigate the factors influencing the bias. Our results show that an overwhelmingly large share of the computing science abstracts published in 2017 exhibited some form of bias. The words recognised as biased included many typical computing science terms, indicating a need for future work to investigate the degree to which such terminology is biased and what effects this may have on the sociology of the field. We further found a statistically significant effect of the diversity of a team of authors, in terms of nationalities and gender distribution, in reducing bias in an abstract. This underlines the importance of diverse teams in paving the way for a more inclusive future.
Final project for the course Text and Multimedia Mining at Radboud University 2023/2024
Poster presentation at the ICTOpen’24 conference in Utrecht
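The logistic regression step mentioned in the abstract can be illustrated with a minimal sketch. This is not the project's code: the two features (`nat` and `gen`, hypothetical diversity scores between 0 and 1), the synthetic labels, and the helper names `fit_logistic` and `make_toy_data` are all assumptions made for illustration, and the Dbias classification step is left out entirely. The sketch only shows how a binary "flagged as biased" outcome could be regressed on team-level diversity features.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=1500):
    """Fit weights (intercept first) by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def make_toy_data(n=300, seed=0):
    """Synthetic data only: higher diversity scores lower the chance of a 'biased' label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        nat = rng.random()  # hypothetical nationality diversity score
        gen = rng.random()  # hypothetical gender diversity score
        p_biased = sigmoid(3.0 - 3.0 * nat - 3.0 * gen)
        X.append([nat, gen])
        y.append(1 if rng.random() < p_biased else 0)
    return X, y


X, y = make_toy_data()
weights = fit_logistic(X, y)
print("intercept, nationality, gender:", [round(w, 2) for w in weights])
```

In a fit like this, a negative coefficient on a diversity feature means that higher diversity goes with lower odds of an abstract being flagged. A real analysis would also need standard errors and significance tests, which this sketch does not compute.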
Causal Relations in the World Happiness Report
Abstract: In our developed world, we often measure the welfare of a country and the success of governmental policies through the gross domestic product, or GDP. If the GDP has increased, we assume that policies were successful and that the quality of life within the country has improved. Even though it is now well understood that GDP is not an ideal measure of social welfare, and it has been criticised accordingly, it continues to be used as a dominant indicator of welfare in the Western world. In this report, we use the dataset provided by the World Happiness Report to understand what life satisfaction means in terms of other variables and the causal relations among them. By identifying causal relations between some of these variables and overall life satisfaction, we aim to investigate whether GDP is an appropriate measure of life satisfaction within a country, according to this dataset. Our exploration suggests that the strongest causal effects are from GDP towards health and from GDP towards social support. Furthermore, our findings indicate a strong causal relationship in which social support decreases negative affect. Freedom of choice, on the other hand, strongly increases positive affect. Overall, according to our model, GDP is the strongest stand-alone predictor of life satisfaction; however, it is also clearly not a complete predictor. In the second report, we compare our manually created model to a model created through structure learning.
Final two projects for the course Bayesian Networks and Causal Inference at Radboud University 2023/2024
- Report 1: Causality in the World Happiness Report
- Report 2: Structure Learning and Causal Inference in the dataset of the World Happiness Report
- Code used for Analysis, written in R
Data Mining using the World Happiness Report
Abstract: In this paper, we examine the factors that most significantly correlate with happiness, using variables from the dataset provided by the World Happiness Report. Governments often put tremendous emphasis on GDP, which, as this report will show, might not be the best idea. We analysed the data using both linear regression and K-Means clustering. By doing so, we discovered that GDP per capita correlates less with happiness than might be expected. Positive affect, social support, and corruption, on the other hand, correlate more significantly with happiness and might be a better focus for governments. Hence, we urge governments to consider other aspects of policy and to measure welfare in their countries in a different way.
Final project for the course Data Mining at Radboud University 2022/2023
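The two techniques named in the abstract, linear regression and K-Means clustering, can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the project's code: the helper names `fit_line` and `kmeans` are hypothetical, and the rows below are made-up toy values rather than figures from the World Happiness Report.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / (sxx * syy) ** 0.5
    return a, b, r


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm with random initial centroids; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((pi - ci) ** 2 for pi, ci in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Made-up toy rows: (log GDP per capita, social support, life-satisfaction score).
rows = [
    (7.5, 0.60, 4.1), (8.0, 0.65, 4.6), (8.2, 0.70, 4.9),
    (10.5, 0.90, 7.0), (10.8, 0.92, 7.3), (11.0, 0.95, 7.4),
]
gdp = [r[0] for r in rows]
score = [r[2] for r in rows]
intercept, slope, corr = fit_line(gdp, score)
print("slope:", round(slope, 3), "correlation:", round(corr, 3))

centroids, labels = kmeans([(r[0], r[1]) for r in rows], k=2)
print("cluster labels:", labels)
```

On real data, the features would normally be standardised before clustering, since K-Means is sensitive to differences in scale between variables.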