DOD Website

Machine Learning Fairness: Types of Bias

Nagesh Somayajula


Every Data Scientist must know — Part 1

The intuition behind Machine Learning Bias

We are now entering the “third wave” of AI, in which AI systems will become capable of explaining the reasoning behind every decision they make. To make that happen, understanding machine learning bias plays a critical role in future generations of AI applications.

A bias is a systematic error, or deviation from the truth, in results or inferences, which may lead to incorrect analysis, including incorrect predictions. Bias affects not only machine learning predictions but also other kinds of analytics, including descriptive analysis.

AI applications are like small kids: we must train them with the right data, otherwise they can be misguided, and correcting machines or AI applications afterwards is a big challenge, as it is for kids (pun intended).

The AI systems themselves will construct models that explain how they work and follow anti-bias rules.

In machine learning, bias is one of the most common problems, and every algorithm can fall into the trap of one kind of bias or another. Let’s discuss the various types of bias in detail and how to uncover them.

Machine learning models are not inherently objective. Engineers train models by feeding them a data set of training examples, and human involvement in the provision and curation of this data can make a model’s predictions susceptible to bias. Therefore, providing unbiased data to the machine is a must, and understanding the various types of machine learning bias is crucial for every data scientist and data practitioner.

Four waves of AI

Introduction to Machine Learning Bias

When building models, it’s important to be aware of common human biases that can manifest in your data, so you can take proactive steps to mitigate their effects.

The most common types of machine learning bias found in real-world data are:

Reporting bias

1. Publication bias

2. Time lag bias

3. Location bias

4. Citation bias

5. Language bias

Automation bias

Selection bias

1. Coverage bias

2. Non-response bias

3. Sampling bias

Group attribution bias

1. In-group bias

2. Out-group homogeneity bias

Implicit bias

1. Confirmation bias

2. Experimenter’s bias

3. Racial bias

4. Gender bias

Various other types of bias

· Algorithmic bias

· Academic bias

· Funding bias

· Information bias

· Meta-analysis bias

· Peer review bias

· Recall bias

· Unintended bias

Let’s understand each type of bias with an example

Reporting Bias

Reporting bias occurs when the frequency of events, properties, and/or outcomes captured in a data set does not accurately reflect their real-world frequency. This bias can arise because people tend to focus on documenting circumstances that are unusual or especially memorable, assuming that the ordinary can “go without saying.”

EXAMPLE — Reporting bias:

Consider a machine learning project in which a sentiment-analysis model is trained to predict whether movie reviews are positive or negative, based on a corpus of user submissions to a popular website such as Netflix. Most reviews in the training data set reflect extreme opinions (reviewers who either loved or hated a movie) because people were less likely to submit a review of a movie if they did not respond to it strongly. As a result, the model is less able to correctly predict the sentiment of reviews that use more subtle language to describe a movie. This is reporting bias: only a small subset of users posted their feedback, and the model cannot generalize from the strongly positive or negative feedback of those few users.
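One rough way to spot this kind of skew is to measure how many ratings in the training corpus sit at the two ends of the scale and compare that share with a reference sample that is closer to the real-world distribution. The sketch below is only an illustration under stated assumptions: the 1 to 5 star scale, the `extreme_share` helper, and both lists of ratings are hypothetical and are not taken from any real data set.

```python
from collections import Counter


def extreme_share(ratings, low=1, high=5):
    """Return the fraction of ratings that sit at either end of the scale."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    counts = Counter(ratings)
    return (counts[low] + counts[high]) / len(ratings)


# Hypothetical 1-5 star ratings attached to the reviews in a training corpus.
corpus_ratings = [5, 5, 1, 5, 1, 1, 5, 4, 1, 5, 2, 5, 1, 5, 3, 1]

# Hypothetical ratings from a survey in which every viewer was asked to respond,
# used here as a stand-in for the real-world distribution.
survey_ratings = [3, 4, 3, 2, 5, 3, 4, 3, 1, 4, 3, 2, 4, 3, 3, 4]

corpus_extreme = extreme_share(corpus_ratings)
survey_extreme = extreme_share(survey_ratings)

print(f"Extreme ratings in corpus: {corpus_extreme:.0%}")
print(f"Extreme ratings in survey: {survey_extreme:.0%}")
```

If the share of extreme ratings in the corpus is much larger than in the reference sample, that is a hint (not proof) that the corpus over-represents strong opinions and that the model may struggle with milder reviews.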

In the next section, I will show how we can identify these biases using Python and custom-built libraries, and discuss the other types of bias in detail.
