Spotlight in Workshop: Algorithmic Fairness through the Lens of Time
Measuring fairness of synthetic oversampling on credit datasets
Decio Miranda Filho · Thalita Veronese · Marcos M. Raimundo
Machine learning models often suffer performance degradation from class imbalance, a common problem in which a dataset is skewed towards a so-called majority class. Oversampling the minority class with synthetic generators has become a popular way to balance data, giving rise to many rebalancing techniques, such as ADASYN and SMOTE. Practitioners usually rely on performance metrics to argue for or against adopting a resampling method. However, given the increasing ethical and legal demands for fair machine learning models, it is important to test whether these methods are neutral with respect to fairness. We investigated the effects of oversampling on gender bias by analyzing the statistical parity difference (SPD) and equal opportunity difference (EOD) obtained on four credit datasets. As with performance, the fairness impact of synthetic minority oversampling proved more significant for weak classifiers. Our results suggest that synthetic oversampling should be used with caution to avoid amplifying existing bias or even creating biased data.
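The two fairness metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not state which sign convention or tooling the authors used, so the code below assumes a common one: SPD is the positive-prediction rate of the unprivileged group minus that of the privileged group, and EOD is the same difference computed on true-positive rates. The function names, the 0/1 group encoding, and the toy arrays are hypothetical and are not taken from the paper.

```python
import numpy as np


def statistical_parity_difference(y_pred, group):
    """P(y_pred = 1 | unprivileged) - P(y_pred = 1 | privileged).

    Assumption: group == 0 marks the unprivileged group and
    group == 1 the privileged group.
    """
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    return y_pred[group == 0].mean() - y_pred[group == 1].mean()


def equal_opportunity_difference(y_true, y_pred, group):
    """Difference in true-positive rate, unprivileged minus privileged."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    positives = y_true == 1
    tpr_unprivileged = y_pred[positives & (group == 0)].mean()
    tpr_privileged = y_pred[positives & (group == 1)].mean()
    return tpr_unprivileged - tpr_privileged


# Hypothetical toy data: 1 = favorable outcome (e.g. credit granted).
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])

spd = statistical_parity_difference(y_pred, group)
eod = equal_opportunity_difference(y_true, y_pred, group)
print(f"SPD = {spd:+.2f}, EOD = {eod:+.2f}")
```

Under this convention a value of 0 indicates parity and a negative value indicates that the unprivileged group receives the favorable outcome less often. Computing both metrics for a classifier trained with and without oversampling, and comparing the results, is one way to examine the kind of effect the abstract describes.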