From Algorithmic Interpretability to Algorithmic Contestability: DATA ONTOLOGIES WORKSHOP #6: THE ONTOLOGICAL LIMITS OF CODE

[Data Ontologies is the second in a two-part series of AY 2021-22 workshops organized through a Rutgers Global and NEH-supported collaboration between Critical AI@Rutgers and the Australian National University. Below is the sixth in a series of blogs about each workshop meeting. For more on the workshop discussion, see Sean Silver’s Guest Forum on Models and Modeling. Click here for the workshop video and the discussion that followed.]

by Mark Aakhus (School of Communication and Information, Rutgers)

On May 5, the ANU-Critical AI@Rutgers workshop series on Data Ontologies turned to a session devoted to the ONTOLOGICAL LIMITS OF CODE—a topic that invites an interesting question about whether there are ontological limits to code. The key readings, Zachary Lipton’s “The Mythos of Model Interpretability” and selections from Louise Amoore’s Cloud Ethics: Algorithms and the Attributes of Ourselves and Others, are quite different but converge on an interest in strategies for addressing the workings of algorithmic systems “in the wild” – that is, their role in decision making. Machine Learning (ML) and its interpretability are key focal points for Lipton, a computer scientist, and Amoore, a geographer. Their differences actually highlight the limits of interpretability as a goal for addressing the workings of algorithmic systems in the wild while disclosing the demands for accountability and the need for pathways to contest algorithmic systems and their outputs. In this sense, the ontological limits of code are not merely given, but always at stake.

The intrigue with ML explored by Lipton and Amoore revolves around the basic fact that machines can “learn” to solve some kinds of problems from data without being given explicit instructions about how. To this extent, data-driven machine learning is a branch of “artificial intelligence” (AI) that works differently from conventional computer technology. Conventionally, programmers encode particular instructions for the processing of information. ML, by contrast, typically involves the generation of a statistical model as the machine “learns” to make accurate predictions about a given data set for some specific purpose. This trained model can then be used to make predictions about new data “in the wild.” As such, ML can perform many tasks that involve classification (e.g., labeling some mail as spam) or, more controversially, finding correlations between multiple variables and a given social outcome (such as predicted default on a loan). ML can also be implemented to discern underlying structures that might otherwise be hard to detect (e.g., identifying speech patterns, or anomalous credit card transactions).
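The contrast between encoded instructions and a learned model can be made concrete with a minimal sketch. Everything here is an assumption made for illustration: the six training messages are invented, and the word-counting classifier (a simple naive Bayes with add-one smoothing) stands in for far larger systems; it is not drawn from Lipton or Amoore.

```python
import math
from collections import Counter

# Hypothetical toy data, invented for illustration only.
TRAINING = [
    ("win cash prize now", "spam"),
    ("claim your free prize", "spam"),
    ("free cash offer now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
    ("notes from the meeting", "ham"),
]


def rule_based(message):
    """Conventional programming: a human writes the rule explicitly."""
    return "spam" if "prize" in message.split() else "ham"


def train(examples):
    """Machine learning: word counts per label become the 'model'."""
    counts = {"spam": Counter(), "ham": Counter()}
    docs = Counter()
    for text, label in examples:
        counts[label].update(text.split())
        docs[label] += 1
    vocab = set(counts["spam"]) | set(counts["ham"])
    return counts, docs, vocab


def predict(model, message):
    """Score each label by log-probability and return the likelier one."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    scores = {}
    for label in counts:
        total_words = sum(counts[label].values())
        score = math.log(docs[label] / total_docs)
        for word in message.split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing keeps unseen word/label pairs above zero.
                score += math.log(
                    (counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING)
print(rule_based("free cash now"))      # the hand-written rule
print(predict(model, "free cash now"))  # the learned model
```

On the message "free cash now" the hand-written rule answers "ham" because nobody told it about any word except "prize," while the learned model answers "spam" because of regularities it picked up from the labeled examples. No one wrote a rule about "free" or "cash"; the classification follows from the data.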

In these and other ways, ML’s predictive capacities are remarkable and consequential; but they can also be a disturbing source of intrigue for humans because, in systems trained on huge data sets, the signals that determine a model’s predictions are usually inscrutable even to those who designed the system. This “black box” problem is fairly well known. However, I suggest that there is more to the intrigue than just inscrutability. By generating a predictive model, a pre-trained ML system has, in effect, created an applied ontology. That is, machine learning models specify critical properties and relations of beings and things: for example, “this email is spam;” “this is the type of person that does not pay their debts.” ML thus participates in the way life is lived and, as such, creates problems as well as solves them. It is here that Lipton and Amoore enter the conversation, each in their own way.

Lipton’s essay, written in 2016, sets out to describe the “mythos” of model interpretability with a focus primarily on “supervised learning” (a kind of ML that involves humans in setting some learning parameters and labeling at least some of the data). The interpretability of a model commonly refers to recognizing what a system has “learned” and, thus, what it might usefully predict. While such “interpretability” is important to achieve, it does not fully explain why a model works the way it does or whether it is a good model for any given purpose. This presents a fundamental challenge to those who care about the social impacts of machine learning.

Lipton’s intervention is to reflect on model interpretability in the context of the public’s demand for more knowledge of high-stakes decision-making in, for example, medicine, criminal justice, and finance. But Lipton casts doubt on whether model interpretability should be regarded as the presumptive standard or remedy for the public’s desire to increase transparency and assure fairness. Although model interpretability is important, in the real world, determining that a machine learning model’s predictions are accurate is not enough to warrant good decision-making in a given social context. The reason is that “good” decisions typically depend on social criteria—for instance, questions of fairness—which are extrinsic to what the ML system has modeled. Lipton’s point is that “interpretation serves objectives that are important but difficult to model.” Indeed, despite ubiquitous usage in ML discourse, “interpretability is not a monolithic concept, but in fact reflects several distinct ideas” (1) and “has no formal technical meaning” (2).

*Melancholy Painting* by Belan Sambucety

Lipton’s way forward rests on five types of cases in which the critical demands for “interpretation” are difficult to formally model: trust, causality, transferability, informativeness, and fair and ethical decision making. These are all cases in which the relevant criteria—for example, for ethics—“cannot be optimized,” and/or in which the “dynamics of the deployment environment differ from the training environment” (2). As such, the demand for “interpretation” tends to exceed the limits of what “model interpretability” can deliver.

Lipton nonetheless outlines two main pathways for how model interpretability remains relevant to demands for trustworthy ML (5-6). The first is transparency, which refers to “understanding the mechanism by which the model works.” Transparency applies at differing levels: to the entire model, the individual components, and the algorithm itself. The second is post-hoc interpretability, which involves providing “useful information to practitioners and users.” This is implemented via text explanations that comment on the states of the predictive model when producing outputs, visualizations, local explanations of what the model is focusing on, and examples of what else the model considers similar.
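A rough sense of what a “local explanation” looks like can be given with a small sketch. It is an illustration under stated assumptions rather than a technique taken from Lipton’s essay: the messages are invented, and the model is a deliberately simple bag-of-words scorer whose per-word weights can be read off directly.

```python
import math
from collections import Counter

# Hypothetical toy data, invented for illustration only.
SPAM = ["win cash prize now", "claim your free prize", "free cash offer now"]
HAM = ["meeting agenda for monday", "lunch on monday", "free lunch after the meeting"]


def word_weights(spam_docs, ham_docs):
    """Smoothed log-odds per word: positive leans spam, negative leans ham."""
    spam = Counter(w for d in spam_docs for w in d.split())
    ham = Counter(w for d in ham_docs for w in d.split())
    vocab = set(spam) | set(ham)
    n_spam, n_ham = sum(spam.values()), sum(ham.values())
    return {
        w: math.log((spam[w] + 1) / (n_spam + len(vocab)))
        - math.log((ham[w] + 1) / (n_ham + len(vocab)))
        for w in vocab
    }


def explain(weights, message):
    """Local explanation: each known word's contribution to one message's score."""
    contributions = [(w, weights[w]) for w in message.split() if w in weights]
    # Largest contributions first, regardless of direction.
    return sorted(contributions, key=lambda pair: abs(pair[1]), reverse=True)


weights = word_weights(SPAM, HAM)
for word, value in explain(weights, "free prize after lunch"):
    print(f"{word:>6} {value:+.2f}")
```

The full table of weights corresponds loosely to transparency about the whole model, while `explain` accounts for a single output after the fact. Such a report shows which words pushed the score in which direction. It does not show whether treating those words as signs of spam is reasonable or fair.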

However, Lipton recognizes that even these pathways are riddled with complications (6). For example, many assume that linear models are more interpretable than deep neural networks. While that is sometimes true, it does not follow that projecting linear models onto “deep” models is a reliable way of interpreting the latter. Even making complex models more transparent can, paradoxically, make them less accurate. Moreover, post hoc interpretations can be misleading. Lipton’s analysis should deflate exuberant hopes about what model interpretability can deliver while motivating researchers to try to meet public demands for useful interpretation.

While this is a good start, I believe that something is missing in Lipton’s analysis and want to propose a different direction. The tensions between model interpretability and the social demands that have created the misleading “mythos” suggest that the entire interpretability enterprise can be naïve or deceptive. Lipton points to the dangers of believing that more information about a model’s inner operations will result in true clarity about a system’s results, or confidence in its fairness or reliability. In doing so, Lipton comes close to recognizing that this misleading focus on breaking open a black box system can easily beg important questions about that system’s purpose and impact. Those social questions may be more about disagreements between stakeholders, or impacts on particular individuals and groups, than about the internal dynamics of a complicated algorithmic system.

What lies behind the desire to unlock the abstruse secrets of black boxes, I contend, are demands for accountability and, perhaps above all, the ability to contest the decisions of machine learning systems. This reframing acknowledges that algorithmic systems are (or should be) fundamentally contestable; that questions about model interpretability concern real-world (ontological) contexts and consequences. It is worth noting that scholars have begun to recognize that facilitating model interpretability facilitates contestation of algorithmic systems and their outputs (e.g., Lyons, Velloso, and Miller, 2021). Indeed! That is why deliberation about algorithmic systems should not be limited to any principle of interpretability, even when buttressed by high standards of “transparency” or “explanation,” without also acknowledging the fundamental contestability of these systems.

Design for contestability would seek to make space for deliberation about the design (and re-design) of algorithmic systems, their implementations, and the social and institutional structures that support them. Such design would necessarily consider pertinent matters of inclusive participation such as what can be contested and when, who can contest, who is accountable, and the conditions for engagement.

Interestingly, Lipton’s identification of five critical issues can be transported to conversations around accountability and design for contestability as some ways of making the debates over “interpreting” predictive systems more robust. The interpretive problems Lipton identifies (trust, causality, transferability, informativeness, and fair and ethical decision-making) frame ontological issues about the nature, purpose, and rights (i.e., the powers granted) of algorithmic systems. Embracing contestability offers a way forward that recognizes the demand for interpretability and its limits while engaging the demand for accountability. To think more about the ontological limits of code in effect is to think harder about what machine learning models do when they make decisions “in the wild.”

Turning from Lipton’s essay to the chapters from Amoore’s 2020 book involves a switch from a computer scientist focused on the problem of interpretability to a geographer concerned with algorithmic systems as arrangements of values, assumptions, and propositions about the world. Amoore’s concern is that these arrangements can become incontrovertible hinges on which interaction, communication, and decision-making proceed. Where Lipton’s analysis points implicitly to the contestability of algorithmic outputs, Amoore explicitly seeks a framework of practices for giving doubt a presence that is otherwise lost in the process of ML’s generation of models: that is, a framework that is intentional about ontology and open to contestability.

Emerging conventions for making algorithmic decision-making accountable, such as techniques for transparency and post-hoc interpretability, focus on the gaps in interpreting the adequacy of the model’s outputs. Amoore contends that to do so is to become stuck in a paradigm of observation, representation, and classification: “a critical response cannot merely doubt the algorithm, pointing to its black-boxed errors and contingencies, for it is precisely through these variabilities that the algorithm learns what to do” (151). Thus, the output of an algorithmic system cannot be treated as simply true or false because it depends on a network of partial relations among entities that can be infinitely recombined.

Amoore’s chapter shifts to concerns about how algorithmic systems render what was previously imperceptible as perceptible and actionable. As is common in contemporary parlance, she understands “algorithms” to extend beyond coded instructions to the entire production of the system’s output. This entails attending to relations between humans and algorithmic systems such as the “selection and labeling of training data, the setting of target outputs, and the editing of code” and the relations between algorithms such as “the classifier supplying the training data on which the neural network learns” whether there is a human in the loop or not (9). Amoore thus situates algorithmic systems in a paradigm of perception, recognition, and attribution which calls for engaging how these systems “modify the threshold of perceptibility itself” (41).

*Pandora’s Box* by John William Waterhouse

It is worth remarking that Amoore’s framework elides the prospects of transparency that Lipton’s analysis sought to improve. After all, Amoore asks, what is the point of finding a model’s author or the source code instantiating a model when the model takes up a life of its own through its outputs that may become inputs for a next iteration (93)? What would one find if an algorithmic black-box were opened? A key problem with techniques of transparency for Amoore is the presumption that it is possible to step outside the system to see its original author(s) and coded instructions, as if going back in time to divine original intentions or the conventions in play at the moment of invention would address where things stand today and tomorrow. Other grounds for giving doubt a presence must be found.

Amoore’s alternative is a cloud ethics that explicates ways for attending to how algorithmic systems participate in organizing worlds and the partial accounts that they already give of themselves in their performance. This includes recognizing, in particular, how algorithms are complex performances of collective, iterative writing among many authors – human and machine – that engage in fabulation by inventing people and writing them into being. The writing generates thresholds of what is perceivable and actionable. The model’s performances include the very conditions of an algorithm’s emergence.

Amoore outlines the key strategies for cloud ethics: attending to apertures to trace rejected alternatives in the machine’s capacity to learn, opacity to disclose what an algorithm learns to see (and not see), and the unattributable to reveal breaches between existing scenes analyzed and the algorithmic output characterizing the scene. Amoore sees these focal points as venues for an ethicopolitics that can engage the singular (and seemingly incontestable) output of an algorithm beyond its code and the truth or falsity of its outputs. These are strategies for being reflective about the conditions of emergence that can introduce doubt rather than reinforce certainty about the way algorithmic systems frame and select what matters in the world, including what or who can be recognized, protested, and claimed.

Amoore offers an interpretive framework for recognizing algorithmic reason: that is, in generating a model, algorithmic systems select from alternative models. This basic point is key to understanding the role algorithmic systems play in coordinating with humans in the construction of social reality and complex action. It is a framework that considers the ontological limits of code as emergent and not given. However, it falls short of offering a normative framework about accountability and thus a basis on which contestability might be grounded. Even so, as an interpretive scheme for attending to how algorithmic systems modify thresholds of perceptibility, it offers crucial input for animating conversations about accountability and contestability. Importantly, it offers a pathway for seeing how algorithmic systems contribute to the (un)making of social realities that is not simply about redressing bad outcomes but about entertaining and debating normative possibilities as decision-making in the wild happens.

*Orange Car Crash Fourteen Times* by Andy Warhol

Is there a significant difference between the approaches of Lipton and Amoore? After all, both writings attempt to render the implicit workings of algorithmic systems explicit for the purpose of interpreting these systems in their deployments. I suggest that cloud ethics picks up where model interpretability leaves off as it seeks points of intervention into the role of algorithmic modeling in constructing social reality. By contrast, Lipton presumes a reality that either is or is not modeled in a way that can be understood and interpreted. While interpretability and cloud ethics are very different stances toward “AI,” when taken together they may be used to pose an interesting answer to the question about whether there are ontological limits to code. It may be that the ontological limits of code are defined by the capabilities for contesting code and the algorithmic systems it instantiates.* Lipton and Amoore point in two different directions for developing such capabilities. However, both remain in the realm of explaining the decisions and decision making of algorithmic systems. The crucial turn to be made would embrace designing pathways for algorithmic contestability that generate accountability for decision making in the wild.

*In our forthcoming book Argumentation in Complex Communication: Managing Disagreement in a Polylogue with Cambridge University Press, Marcin Lewinski and I examine how social and technical practices of argumentation render some ways of arguing reasonable and other ways not. We offer a framework for explaining, evaluating, and designing argumentative communication that is especially relevant to the contestability of AI and the ontological limits of code.
