Research Guides: Diversity, Equity, and Inclusion in Research: Data Equity

What is Data Equity?

This page explores different ways to think about data through an equity lens. Research data is often regulated, and many research projects must be submitted to an Institutional Review Board (IRB) for approval. However, IRB approval is not always required, and even when it is, an IRB doesn't always examine research methods from an equity standpoint. It's up to the researcher to go above and beyond to ensure that their research, whatever the topic, is equitable to all people in their community.

This looks vastly different depending on whether you are working in engineering, medicine, ecology, English literature, dance, or history. This guide is designed to get you thinking about some of the equity concerns that might affect your research data. If you need additional help with your specific topic, don't be afraid to reach out to your Subject Librarian!

Meet Your Subject Librarian!
This short tutorial introduces you to your subject librarian and shows you how to contact them if you have questions!

Available Workshops and Modules

Note that the information below is also available as a workshop. Please contact me, Jodi Coalter, to arrange a time. You can also view the workshop slides in Google Drive linked below.

Pursuing Data Equity: Beyond the IRB
These modules are designed to help researchers whose projects require IRB approval think about their data through an equity lens.

What's in Data Equity?

Research Data Management (or RDM) can be defined as the process of documenting, organizing, and maintaining the processes used in the information/data lifecycle. This includes:

Storage & backup procedures
Script/code documentation
Provenance & citation information
Quality assurance & security protocols
Licensing agreements

But what counts as "research" data? Basically, we describe research data as the information you are collecting that is used to reach your conclusions and “prove that you are right.” Research data may be experimental data, observational data, operational data, third party data, public sector data, monitoring data, processed data, or repurposed data.

Note that the collection, use, and reuse of data often follow a common cycle, called the Research Data Management Life Cycle. At each stage, there is an opportunity to incorporate more EDI into your research.

Figure: The research data management lifecycle

“... we must explicitly acknowledge that a key way that power and privilege operate in the world today has to do with the word data itself” (D’Ignazio and Klein 2020)

The very first step to incorporating equity into your data is to acknowledge that data are not objective or neutral. From collection to use and reuse, every step is filtered through each researcher's preconceived ideas about what should be counted and how it should be counted. Viewing your data management plan through an equity lens will help mitigate unconscious bias and avoid unintended harm to historically marginalized communities.

In other words, how you collect data, who collects the data, how the data are stored, and who can access the data are all human-driven decisions! That means that the data themselves cannot be neutral.

There are a few key points where equity overlaps with RDM:

Using diversity/inclusion principles in designing a research study has little to do with RDM; however, once you begin collecting those data, you are obligated to protect those individuals and their data.
When you collect data from marginalized/under-represented groups, you may not have a second chance. Losing data is always bad, but breaking trust with these communities has long-term repercussions for both you and them.
Are there issues of translation/context in your data where a non-expert could come away with the wrong conclusion? If so, were these issues addressed in any supplementary documentation (e.g., code book, data dictionary, readme file) that provides adequate cross-walks, translations, or context?
Sharing and publishing data, especially when those data represent people who may take issue with how they are presented or how their data are reused, requires a clear understanding of the terms under which those data were collected. Did you make it clear that you are obligated to share the data with the larger research community? Did you present an option to flag certain variables that the studied populations may feel comfortable giving to you, but might not want made freely available? Are the data unnecessarily precise in terms of geographic location or indirect identifiers?
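As one illustration of the last point, the short Python sketch below shows how a dataset might be prepared for sharing by withholding variables that participants flagged and by reducing the precision of geographic coordinates. It is a minimal sketch, not a complete de-identification procedure: the field names (name, institution, latitude, longitude) and the function prepare_for_sharing are hypothetical, and rounding coordinates alone does not guarantee anonymity.

```python
# Hypothetical field names; replace them with the ones in your own data dictionary.
RESTRICTED_FIELDS = {"name", "institution"}  # variables participants asked not to share
COORDINATE_FIELDS = ("latitude", "longitude")


def prepare_for_sharing(records, restricted=RESTRICTED_FIELDS, decimals=1):
    """Return a shareable copy of the records.

    Restricted fields are dropped and coordinates are rounded to `decimals`
    decimal places (one decimal degree of latitude is roughly 11 km).
    The original records are left unchanged.
    """
    shared = []
    for record in records:
        # Copy every field except the ones flagged as restricted.
        public = {key: value for key, value in record.items() if key not in restricted}
        # Coarsen any coordinates that are present.
        for field in COORDINATE_FIELDS:
            if public.get(field) is not None:
                public[field] = round(public[field], decimals)
        shared.append(public)
    return shared


# Example with made-up data for a single participant.
records = [
    {
        "participant_id": "P01",
        "name": "Example Person",
        "institution": "Example University",
        "latitude": 40.123456,
        "longitude": -88.654321,
        "response": 4,
    }
]
shared = prepare_for_sharing(records)
print(shared)
```

The full-precision records stay in your restricted storage; only the reduced copy is shared. Which variables to withhold, and how coarse the locations should be, are decisions best made together with the community the data describe.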

Figure: An image of all-yellow LEGO heads, a "does not equal" sign, and an image of diverse women scientists

When Things Go Bad

You don't have to look very far to discover an example of when research data management went bad. Below are a few examples of why considering equity in data is important from the very beginning.

1. Facial Recognition

Facial recognition software started in computer science. Unfortunately, the field of computer science is almost 87% white and male, which means that when it came to training facial recognition software, developers didn't use a diverse set of faces. This led to facial recognition software having trouble distinguishing between Black people.

2. Medical Research

Medicine has a long and highly problematic history, and the research devised to study medicine is perhaps the most tarnished. Modern-day medicine is no less troubled, as COVID-19 has clearly demonstrated. Medical research recognized health disparities in COVID-19, but the data about who was dying failed to affect the vaccination program.

Figures: A graph showing the number of people dying of COVID in August 2020 by race; a graphic showing the number of people who have been vaccinated by race

3. The Library of Missing Datasets

Who and what doesn't get counted is just as troublesome as who and what does get counted. The Library of Missing Datasets is an art project by Mimi Ọnụọha that highlights how many datasets should exist but don't.

In summary, none of the examples of “when things go bad” are necessarily the result of poor data management. If anything, proper data management perpetuates a lot of the inequities presented here. However, it is important to note that data management, sharing, and access procedures can reinforce and defend other poor decisions. For example, if a case of misuse or inequity arises, but it isn’t against policy, procedure, or the documentation, then who is to say that it is an issue? Who is responsible for fixing it?

It’s easy to understand how inequality is perpetuated in large systems/big data, but what about small-scale datasets? What happens to your data after you graduate or move to a different institution? Who has it and who gets a say in how it’s used?

Going Beyond IRB Requirements

Many research projects require the approval of an IRB before any type of experimentation can begin. IRBs are intended to stop a project from causing harm, both physical and mental, to research subjects. But they often don't go far enough when it concerns marginalized communities.

Case Study #1

For our example, let's look at one research study that set out to investigate performance-related musculoskeletal disorders (PRMDs) in musicians. This particular study required IRB approval because it directly concerned human health. An IRB would probably examine the ethical implications of the actual research procedure and ensure that the procedure itself didn't impact the participants' mental well-being.

But what isn't covered in this IRB?

This study might not have taken into account the number of research participants who would be drawn from an academic community of musicians. For this group, the cultural/professional context of a disability (e.g., injury and musical performance) means that participation may have unintended consequences for an individual’s career. Professional musical networks are dense and pedagogical lineage is important; personal details about institution, expertise, etc. may give the individual’s identity away. There are a finite number of professors on the tenure track. In this scenario, the onus is placed on the person with the disability instead of the system.

There are other considerations, too:

Disability studies aren’t always directly beneficial to participants; misuse or mishandling of the data, even unintentionally, may spoil your relationship with that group of people.
Studies are sometimes arduous/stressful on the participant; did you talk with your study group to determine the best and most comfortable way to gather data before you started?
What is the “so what” for your research participants? You are getting a citation/degree, but what are they getting? It can be argued that you have an ethical responsibility to ensure that your data is both understood and usable by the community that you are studying.

Case Study #2

The Havasupai Tribe of Northern Arizona

There is a plethora of projects that simply do not require an IRB, or that qualify for expedited IRB review because of their only glancing interaction with humans. From engineering and ecology to computer science and data science, many research projects simply don't have to delve too deeply into ethics.

But going above and beyond ensures that there are no accidental casualties of your research.