NeurIPS Poster [Re] On the Reproducibility of “FairCal: Fairness Calibration for Face Verification”

Poster

[Re] On the Reproducibility of “FairCal: Fairness Calibration for Face Verification”

Marga Don · Satchit Chatterji · Milena Kapralova · Ryan Amaudruz

Great Hall & Hall B1+B2 (level 1) #2016

[ Abstract ]

[ Poster] [ OpenReview]

Abstract:

Scope of Reproducibility — This paper aims to reproduce the study FairCal: Fairness Calibration for Face Verification by Salvador et al., focused on verifying three main claims: FairCal (introduced by the authors) achieves state‐of‐the‐art (i) global accuracy, (ii) fairness-calibrated probabilities and (iii) equality in false positive rates across sensitive attributes (i.e. predictive equality). The sensitive attribute taken into account is ethnicity.Methodology — Salvador et al. provide partial code via a GitHub repository. Additional code to generate image embeddings from three pretrained neural network models were based on existing repositories. All code was refactored to fit our needs, keeping extendability and readability in mind. Two datasets were used, namely, Balanced Faces in the Wild (BFW) and Racial Faces in the Wild (RFW). Additional experiments using Gaussian mixture models instead of K‐means clustering for FairCal validate the use of unsupervised clus‐ tering methods. The code was run on an AMD Ryzen 7 2700X CPU and NVIDIA GeForce GTX1080Ti GPU with a total runtime of around 3 hours for all experiments.Results — In most cases, we were able to reproduce results from the original paper to within 1 standard deviation, and observe similar trends. However, due to missing information about image pre‐processing, we were unable to reproduce the results exactly.What was easy — The original paper is clear and understandable. Furthermore, the authors provided a mostly working version of the code. Though the datasets are not freely available to the public, their authors supplied these to us swiftly after contacting them.What was difficult — While most of the code worked with slight changes, it was assumed there were files containing image embeddings available for both datasets, which the authors neither provided nor gave details about. We therefore pre‐processed and generated embeddings independent of the authors, which makes it more difficult to judge the overall reproducibility of their method. Additionally, we encountered difficulties while improving the efficiency and extendability of the code.Communication with original authors — We emailed the first author of the paper twice. First at the beginning of our undertaking, they were enthusiastic about our attempt, and clarified a few initial doubts about their implementation, the embeddings, and missing files. As per the writing of this paper, they have not responded to the second email.

Chat is not available.