Collect Ethically: Reduce Bias in Twitter Datasets

Abstract

The Twitter platform appeals to researchers because data are easy to obtain and can be analyzed rapidly to produce results. However, sampling Twitter data for research purposes needs to be regulated to produce unbiased results. This paper addresses factors that lead to sampling bias, presents case studies that were encountered, and proposes an approach to reduce sampling bias and flaws in datasets collected from Twitter. Experiments are then conducted on two case studies, in which following the proposed guideline yields a larger dataset. The results indicate that using multiple Twitter application programming interfaces (APIs) for data collection is the best way to obtain a randomly sampled dataset.

Publication
Information Management and Big Data: 6th International Conference, SIMBig 2019, Lima, Peru, August 21–23, 2019, Proceedings 6
Abdulaziz Alhamadani عبدالعزيز الهمداني
PhD Candidate

My research interests include ML applications, text classification, event detection, pandemic forecasting, and ethical AI.