The Ethics of Data Curation is the first in a two-part series of AY 2021-22 workshops organized through a Rutgers Global and NEH-supported collaboration between Critical AI@Rutgers and the Australian National University. Below is the second in a series of blogs about each workshop meeting. Click here for the workshop video and the discussion that followed.
by Nidhi Salian, Critical AI@Rutgers Undergraduate Assistant
Facebook is once again under fire. As whistleblower Frances Haugen, a former product manager at the scandal-plagued tech giant, exposed what happens behind the scenes, we got a renewed sense of the many problems that arise when powerful corporations ignore the public interest. In a less well-publicized story, Twitter admitted that its platform amplifies right-wing political messaging over left. Controversies of this kind urge all of us to think critically about how we hold these social media gatekeepers accountable.
Such questions were at the heart of Meredith Broussard’s presentation during the second workshop of the Ethics of Data Curation, an interdisciplinary collaboration between CriticalAI@Rutgers and colleagues at the Australian National University. As a data journalist and Research Director at the NYU Alliance for Public Interest Technology, Broussard works with these topics on a daily basis. Her talk and the discussion that followed make crucial viewing for anyone who missed the live event and wants to join the discussion asynchronously.
In her award-winning 2018 book, Artificial Unintelligence: How Computers Misunderstand the World, Broussard challenges the notion of technochauvinism—the idea that technology can solve all of humanity’s problems. There and in her talk, Broussard calls attention to the rampant bias and discrimination that plague the machine learning technologies that, these days, fall under the rubric of artificial intelligence (AI). Though often assumed to be objective and infallible, so-called AI today depends on the reliability of the datasets on which the software is trained. Because the data in question is, by definition, drawn from the past, AI systems often replicate existing patterns of discrimination including racism, sexism, ableism, and “structural inequality” (Broussard 2018). This tendency to reproduce bias and cause further harm is exacerbated by the homogenous make-up of the higher echelons of AI research and development. As Broussard writes in a memorable passage from her chapter on “People Problems” (one of several we read in preparation for the workshop), the history of the tech industry, beginning with its mid-century roots, is riddled with exclusionary groupthink:
[W]e have a small, elite group of men who tend to overestimate their mathematical abilities, who have systematically excluded women and people of color in favor of machines for centuries, who tend to want to make science fiction real, who have little regard for social convention, who don’t believe that social norms or rules apply to them, who have …piles of government money sitting around, and who have adopted the ideological rhetoric of far-right libertarian anarcho-capitalists… What could possibly go wrong?
Describing the countervailing potential of data journalism, Broussard called out the special importance of algorithmic accountability reporting—her own field of specialization. As she writes in Artificial Unintelligence, since machine learning is “used increasingly to make decisions on our behalf,” the “role of the free press” is to hold this new generation of decision-makers accountable. Algorithmic accountability reporting takes on this crucial democracy-building “responsibility and applies it to the computational world.” It is a task that Broussard—hard at work on a new book while busy teaching and testifying before a congressional task force on AI—is happy to share with others.
Perhaps the best-known example of algorithmic accountability reporting is ProPublica’s “Machine Bias,” the groundbreaking 2016 story that found that COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) was assigning higher risk scores to African Americans than to Whites, resulting in longer jail sentences. Though COMPAS was designed to help judges make more objective, data-driven decisions about sentencing, its algorithm had virtually the opposite effect. Using the same benchmark that the designers used—the risk of recidivism within two years—ProPublica’s journalists found that COMPAS’s assigned risk scores were “remarkably unreliable in forecasting violent crime: Only 20 percent of the people predicted to commit violent crimes actually went on to do so.” In a kind of perverse digital targeting, the sentencing algorithm was putting African Americans behind bars longer than their White peers for no good reason at all. As Broussard put it, the problem isn’t simply that “COMPAS is not fair enough”; rather, given that COMPAS is among the systems (another being facial recognition) that “are disproportionately weaponized against communities of color,” the problem “is that it exists at all.”
A more recent example of algorithmic accountability reporting, Broussard pointed out, is The Markup’s “Secret Bias Hidden in Mortgage-Approval Algorithms.” In the United States, the journalists found, applicants of color are 40 to 60 percent more likely to be denied mortgages than their White counterparts—with Black applicants 80 percent more likely to be denied than Whites with similar incomes, debt-to-income ratios, and combined loan-to-value ratios.
While these examples illustrate how data journalism challenges technochauvinism and exposes tech-driven issues that endanger the public good, they also point to a major challenge for data journalists themselves. When data is inaccessible, it becomes significantly more difficult for data journalists to identify inequality at work. As another article in The Markup reported, Facebook rolled out changes to its code this September that block watchdogs from gathering data and monitoring the platform. Without access to data, journalists and other researchers cannot effectively audit the platform and uncover the unjust algorithms at work. This is not the first time that Facebook has attempted to undermine public scrutiny: the company previously gave researchers from the Social Science One research group incomplete data on misinformation, effectively undermining the findings of years of research.
As the whistleblower’s uncovering of internal research on Instagram’s harm to teen mental health makes clear, when tech companies seek to thwart transparency, they close off an important avenue for the public good. Of course, the findings of data journalists may result in uncomfortable truths; but if tech companies enabled data journalists to audit their data and conduct independent research, social media could find better directions. As Broussard puts it in Artificial Unintelligence, we need to “move from blind technological optimism to a more reasonable, balanced perspective on how to live lives that are enhanced but not threatened or compromised by technology.”
In the lively discussion that followed the presentation, Broussard fielded a wide range of questions including the impact of Substack; the role of data journalism in a varied media landscape; the potential for computer scientists to work as citizen journalists; the ethics of scraping, data storage, and data transparency; the new ACM Code of Ethics; and the limitations of de-biasing data or algorithms. As Broussard made clear throughout, while technochauvinists claim that technology is the solution to the world’s problems, that has seldom been the case. As “computers have evolved,” she writes in Artificial Unintelligence, “human nature has not. People need to be kept honest.”
If we are ever to succeed in combating today’s injustices and inequalities, then technology must be held accountable for its role in exacerbating them. The work of data journalists is vital to ensuring this accountability—to making sure that technology truly works for the community, and not the other way around.