Fair Machine Learning#

In brief#

Fair Machine Learning models take into account the issues of bias and fairness. Approaches can be categorized as pre-processing, which transforms the input data; in-processing, which modifies the learning algorithm; and post-processing, which alters the model’s internals or its decisions.

More in Detail#

Fairness can be promoted in ML in three different ways, as surveyed in [3], which provides a clear categorization of methods into pre-process, in-process, and post-process approaches.

Pre-process approaches are the most flexible: they transform the data so that the underlying bias is removed. One advantage of pre-process techniques is that they are the most inspectable, since they offer the earliest opportunity to mitigate biases and to measure how doing so affects the outcome, compared to the other two families of approaches in fair machine learning [4]. Suppression, or Fairness Through Unawareness, is a baseline method that removes the sensitive features and their proxies from the dataset [5]. However, a recent study [6] showed that removing sensitive information does not guarantee fair outcomes. Massaging the dataset (relabeling) can act in two ways:

  1. Identify unfair outcomes and correct them by changing the label to what ought to have happened.

  2. Identify sensitive classes and relabel them so that the outcome is fair.
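The second form of massaging can be sketched in a few lines. The following toy example, with a hypothetical binary sensitive attribute and hypothetical ranker scores, promotes the highest-scoring negative examples of the disadvantaged group and demotes the lowest-scoring positive examples of the advantaged group until the positive rates match:

```python
# Massaging (relabeling) sketch: equalize positive rates between groups
# by flipping the labels closest to the decision boundary.
# 'score' stands in for a ranker's estimated probability of a positive label.
data = [
    # (group, score, label) -- toy, hypothetical values
    ("A", 0.90, 1), ("A", 0.80, 1), ("A", 0.60, 1), ("A", 0.30, 0),
    ("B", 0.70, 1), ("B", 0.55, 0), ("B", 0.40, 0), ("B", 0.20, 0),
]

def positive_rate(rows, group):
    members = [r for r in rows if r[0] == group]
    return sum(r[2] for r in members) / len(members)

# Promote the highest-scoring negatives of the disadvantaged group ("B")
# and demote the lowest-scoring positives of the advantaged group ("A"),
# one pair at a time, until the positive rates are equal.
rows = list(data)
while positive_rate(rows, "B") < positive_rate(rows, "A"):
    promote = max((i for i, r in enumerate(rows) if r[0] == "B" and r[2] == 0),
                  key=lambda i: rows[i][1])
    demote = min((i for i, r in enumerate(rows) if r[0] == "A" and r[2] == 1),
                 key=lambda i: rows[i][1])
    rows[promote] = (rows[promote][0], rows[promote][1], 1)
    rows[demote] = (rows[demote][0], rows[demote][1], 0)

print(positive_rate(rows, "A"), positive_rate(rows, "B"))  # → 0.5 0.5
```

Flipping labels near the boundary keeps the distortion to the training data minimal while closing the gap in positive rates.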

The reweighting approach has several advantages over suppression and relabeling. It postulates that in a fair dataset the outcome would be conditionally independent of any of the sensitive attributes, and it corrects past unfair outcomes by giving more weight to correctly treated cases and less weight to incorrectly treated ones. Learning fair representations approaches fairness in a fundamentally different way, aiming for a middle ground between group fairness and individual fairness [7]. It turns the pre-process problem into a combined optimization problem that trades off group fairness, individual fairness, and accuracy.

In-process approaches modify the learning algorithm to remove biases during model training, either by incorporating fairness into the optimization objective or by imposing it as a regularization constraint [8, 9]. The main categories of in-processing approaches are adversarial debiasing and prejudice removal. The former introduces an adversary that tries to predict the sensitive attributes from a downstream task (classification or regression), so that the model learns a representation independent of the sensitive features. Fair representations can also be learned by adding noise to the predictive power through regularization. Adversarial reweighted learning [10] uses non-sensitive features and labels to measure unfairness, and co-trains an adversarial reweighting model to improve learning. The prejudice remover approach, on the other hand, comprises various techniques to mitigate biases during training. Some of the standard methods are:

  1. Heuristic-based: use Rooney Rules [11], which are effective in ranking problems.

  2. Algorithmic Changes: these can be made at every single step of calibration, such as input, output, and model structure [4, 12, 13, 14, 15, 16, 17, 18, 19].

  3. Using pre-trained models: this involves combining available pre-trained models and transferring them to reduce bias [20].

  4. Counterfactual and Causal Reasoning: this considers a model to be group or individually fair if its prediction in the real world is similar to that in a counterfactual world where individuals belong to a different protected group. Causal reasoning can be used to caution against such counterfactual explanations [21, 22]. Concerns on the use and misuse of counterfactual fairness have been studied in [23].
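The regularization flavor of in-processing can be illustrated with a minimal sketch, assuming a hypothetical one-feature dataset and a hypothetical penalty strength `lam`: a logistic regression is trained on cross-entropy plus a demographic-parity term, the absolute gap in mean predicted score between the two groups.

```python
import math

# In-processing sketch: logistic regression with a fairness regularizer.
# Loss = mean cross-entropy + lam * |mean score of group A - mean score of group B|.
data = [  # (feature, sensitive_group, label) -- toy, hypothetical values
    (2.0, "A", 1), (1.5, "A", 1), (0.5, "A", 0),
    (1.0, "B", 1), (0.2, "B", 0), (0.1, "B", 0),
]
lam = 1.0  # hypothetical trade-off between accuracy and group parity

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b):
    ce = 0.0
    scores = {"A": [], "B": []}
    for x, g, y in data:
        p = sigmoid(w * x + b)
        ce -= y * math.log(p) + (1 - y) * math.log(1 - p)
        scores[g].append(p)
    gap = abs(sum(scores["A"]) / len(scores["A"]) -
              sum(scores["B"]) / len(scores["B"]))
    return ce / len(data) + lam * gap

# Gradient descent with numerical gradients (finite differences),
# to keep the sketch dependency-free.
w, b, lr, eps = 0.0, 0.0, 0.5, 1e-6
for _ in range(500):
    gw = (loss(w + eps, b) - loss(w - eps, b)) / (2 * eps)
    gb = (loss(w, b + eps) - loss(w, b - eps)) / (2 * eps)
    w, b = w - lr * gw, b - lr * gb

print(round(loss(w, b), 3))
```

Raising `lam` shifts the optimum toward equal mean scores across groups at the cost of accuracy, which is exactly the constrained-optimization view taken by in-processing methods.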

Finally, post-process approaches are the most versatile when the model is already in production, since they do not require retraining it. Another advantage of post-processing is that fairness (individual and group) of any downstream task can easily be satisfied with respect to the domain and application of the model [24]. Post-processing is also agnostic to the input data, which makes it easier to implement. However, post-processing procedures may yield weaker results than pre-processing ones [22, 25].
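A common post-processing move can be sketched as follows, on hypothetical model scores: the trained model is left untouched, and a separate decision threshold is chosen per group so that the positive rates match (the thresholds below are picked by hand for this toy data).

```python
# Post-processing sketch: group-specific decision thresholds that equalize
# positive rates, without retraining the model that produced the scores.
scored = [  # (sensitive_group, model_score) -- toy, hypothetical values
    ("A", 0.95), ("A", 0.85), ("A", 0.70), ("A", 0.40),
    ("B", 0.60), ("B", 0.45), ("B", 0.30), ("B", 0.10),
]

def positives(rows, group, threshold):
    return sum(1 for g, s in rows if g == group and s >= threshold)

# A single global threshold of 0.5 accepts 3 of group A but only 1 of B.
global_rate = {g: positives(scored, g, 0.5) / 4 for g in ("A", "B")}

# Group-specific thresholds chosen so both groups have a 50% positive rate.
thresholds = {"A": 0.75, "B": 0.35}
fair_rate = {g: positives(scored, g, thresholds[g]) / 4 for g in ("A", "B")}

print(global_rate)  # → {'A': 0.75, 'B': 0.25}
print(fair_rate)    # → {'A': 0.5, 'B': 0.5}
```

Because only the decision rule changes, this kind of adjustment can be applied to a model already in production, which is precisely the setting where post-processing shines.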

Assessment tools: tools can assist practitioners and organizations in documenting the measures taken, providing guidance, formalizing processes, and empowering automated decisions. There are various types of tools to identify and mitigate biases; among them, technical/quantitative tools and qualitative tools are those primarily used in real-world applications by engineers and data scientists. Technical/quantitative tools address the data or AI pipeline through technical solutions. One major drawback is that they may miss essential fairness considerations: for example, they could not be employed to mitigate bias in the COMPAS algorithm, because its nuances could not be adequately captured. They lack methods to understand and mitigate biases, and perpetuate the misleading notion that “Fair ML” is not a complex task to achieve. Some of the standard solutions in this category are:

  1. IBM’s AI Fairness 360 Toolkit: a Python toolkit that approaches fairness through the lens of technical solutions and fairness metrics.

  2. Google’s What-If Tool: it explores a model’s performance on a dataset through hypothetical situations, allowing users to explore different definitions of fairness constraints under various feature intersections.

  3. Microsoft’s Fairlearn: a Python package consisting of mitigation algorithms and metrics for model assessment.
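The core group metrics these toolkits report take only a few lines to compute. The following stand-alone sketch, on hypothetical predictions, implements the demographic parity difference (the gap in positive-prediction rates between groups); the real toolkits expose this and many other metrics with richer APIs.

```python
# Sketch of a toolkit-style group-fairness metric:
# demographic parity difference = max over groups of P(pred=1 | group)
# minus the min over groups of the same rate.
def demographic_parity_difference(predictions, groups):
    rates = {}
    for g in set(groups):
        members = [p for p, gg in zip(predictions, groups) if gg == g]
        rates[g] = sum(members) / len(members)
    return max(rates.values()) - min(rates.values())

preds  = [1, 1, 1, 0, 1, 0, 0, 0]            # hypothetical model decisions
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(demographic_parity_difference(preds, groups))  # → 0.5
```

A value of 0 would indicate identical positive rates across groups; tools such as AI Fairness 360 and Fairlearn report this alongside many complementary metrics, since no single number captures fairness.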

On the other hand, qualitative techniques can delve into the nuances of fairness. They enable teams to explore societal implications, analyze fairness harms and trade-offs, and propose plans to find potential sources of bias and ways to mitigate them. Two of the most prominent qualitative techniques are:

  1. Co-designed AI fairness checklist (2020): this checklist was designed by a group of Microsoft researchers and academics, 49 individuals from 12 technology organizations. It covers items for the different stages of the AI pipeline, including envision, define, prototype, build, launch, and evolve, and is customizable according to the deployment.

  2. Fairness Analytic (2019): this analytic tool was developed by Mulligan et al. to promote fairness at the earlier stages of product development. It enables teams to understand biases from the perspective of a specific application, in order to analyze and document their effects.

While these tools exist to analyze potential harms, it is the responsibility of users to understand the after-effects of the tools they use and which types of biases they can mitigate. A detailed review of the landscape and gaps in fairness toolkits is given in [1].



Michelle Seng Ah Lee and Jatinder Singh. The landscape and gaps in open source fairness toolkits. In CHI, 699:1–699:13. ACM, 2021.


Nikita Kozodoi, Johannes Jacob, and Stefan Lessmann. Fairness in credit scoring: assessment, implementation and profit implications. European Journal of Operational Research, 297(3):1083–1094, 2022.


Simon Caton and Christian Haas. Fairness in machine learning: a survey. arXiv preprint arXiv:2010.04053, 2020. URL: https://arxiv.org/abs/2010.04053.


Brian d'Alessandro, Cathy O'Neil, and Tom LaGatta. Conscientious classification: a data scientist's guide to discrimination-aware classification. Big data, 5(2):120–134, 2017.


Pratik Gajane and Mykola Pechenizkiy. On formalizing fairness in prediction with machine learning. arXiv preprint arXiv:1710.03184, 2017. URL: https://arxiv.org/abs/1710.03184.


Boris Ruf and Marcin Detyniecki. Active fairness instead of unawareness. arXiv preprint arXiv:2009.06251, 2020. URL: https://arxiv.org/abs/2009.06251.


Richard S. Zemel, Yu Wu, Kevin Swersky, Toniann Pitassi, and Cynthia Dwork. Learning fair representations. In ICML (3), volume 28 of JMLR Workshop and Conference Proceedings, 325–333. JMLR.org, 2013.


Rachel K. E. Bellamy, Kuntal Dey, Michael Hind, Samuel C. Hoffman, Stephanie Houde, Kalapriya Kannan, Pranay Lohia, Jacquelyn Martino, Sameep Mehta, Aleksandra Mojsilovic, Seema Nagar, Karthikeyan Natesan Ramamurthy, John T. Richards, Diptikalyan Saha, Prasanna Sattigeri, Moninder Singh, Kush R. Varshney, and Yunfeng Zhang. AI fairness 360: an extensible toolkit for detecting and mitigating algorithmic bias. IBM J. Res. Dev., 63(4/5):4:1–4:15, 2019.


Richard Berk, Hoda Heidari, Shahin Jabbari, Matthew Joseph, Michael J. Kearns, Jamie Morgenstern, Seth Neel, and Aaron Roth. A convex framework for fair regression. CoRR, 2017.


Preethi Lahoti, Alex Beutel, Jilin Chen, Kang Lee, Flavien Prost, Nithum Thain, Xuezhi Wang, and Ed H. Chi. Fairness without demographics through adversarially reweighted learning. In NeurIPS. 2020.


Caitlin Kuhlman, MaryAnn Van Valkenburg, and Elke A. Rundensteiner. FARE: diagnostics for fair ranking using pairwise error metrics. In WWW, 2936–2942. ACM, 2019.


Benjamin Fish, Jeremy Kun, and Ádám Dániel Lelkes. A confidence-based approach for balancing fairness and accuracy. In SDM, 144–152. SIAM, 2016.


Alex Beutel, Jilin Chen, Tulsee Doshi, Hai Qian, Li Wei, Yi Wu, Lukasz Heldt, Zhe Zhao, Lichan Hong, Ed H. Chi, and Cristos Goodrow. Fairness in recommendation ranking through pairwise comparisons. In KDD, 2212–2220. ACM, 2019.


Dylan Slack, Sorelle A. Friedler, and Emile Givental. Fairness warnings and fair-MAML: learning fairly with minimal data. In FAT*, 200–209. ACM, 2020.


Alex Beutel, Jilin Chen, Tulsee Doshi, Hai Qian, Allison Woodruff, Christine Luu, Pierre Kreitmann, Jonathan Bischof, and Ed H. Chi. Putting fairness principles into practice: challenges, metrics, and improvements. In AIES, 453–459. ACM, 2019.


Jialu Wang, Yang Liu, and Caleb C. Levy. Fair classification with group-dependent label noise. In FAccT, 526–536. ACM, 2021.


Cynthia Dwork, Nicole Immorlica, Adam Tauman Kalai, and Mark D. M. Leiserson. Decoupled classifiers for group-fair and efficient machine learning. In FAT, volume 81 of Proceedings of Machine Learning Research, 119–133. PMLR, 2018.


Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard S. Zemel. The variational fair autoencoder. In ICLR. 2016.


Anay Mehrotra and L. Elisa Celis. Mitigating bias in set selection with noisy protected attributes. In FAccT, 237–248. ACM, 2021.


David Madras, Elliot Creager, Toniann Pitassi, and Richard S. Zemel. Learning adversarially fair and transferable representations. In ICML, volume 80 of Proceedings of Machine Learning Research, 3381–3390. PMLR, 2018.


Joshua R Loftus, Chris Russell, Matt J Kusner, and Ricardo Silva. Causal reasoning for algorithmic fairness. arXiv preprint arXiv:1805.05859, 2018. URL: https://arxiv.org/abs/1805.05859.


Razieh Nabi, Daniel Malinsky, and Ilya Shpitser. Learning optimal fair policies. In ICML, volume 97 of Proceedings of Machine Learning Research, 4674–4682. PMLR, 2019.


Atoosa Kasirzadeh and Andrew Smart. The use and misuse of counterfactuals in ethical machine learning. In FAccT, 228–236. ACM, 2021.


Pranay Kr. Lohia, Karthikeyan Natesan Ramamurthy, Manish Bhide, Diptikalyan Saha, Kush R. Varshney, and Ruchir Puri. Bias mitigation post-processing for individual and group fairness. In ICASSP, 2847–2851. IEEE, 2019.


Dana Pessach and Erez Shmueli. A review on fairness in machine learning. ACM Computing Surveys (CSUR), 55(3):1–44, 2022.

This entry was written by Resmi Ramachandran Pillai, Fredrik Heintz, Miguel Couceiro, and Guilherme Alves.