[Data Ontologies is the second in a two-part series of AY 2021-22 workshops organized through a Rutgers Global and NEH-supported collaboration between Critical AI@Rutgers and the Australian National University. Below is the fifth in a series of blogs about each workshop meeting. Click here for the workshop video and the discussion that followed.]
by Serap Firat (English, University of California, Berkeley)
The fifth of the Data Ontologies workshops was DATAFIED ONTOLOGIES, a co-facilitated discussion of Deborah Lupton’s article “How Do Data Come To Matter? Living and Becoming with Personal Data” (2018) and chapters from Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass by anthropologist Mary L. Gray and computational social scientist Siddharth Suri. The guest facilitators were Gavin J.D. Smith, a sociologist at ANU, and Lori Moon, who holds a doctorate in Linguistics from the University of Illinois, Urbana and has been a researcher at Elemental Cognition since 2019.
The context for the discussion is the increasing datafication of human bodies and everyday lives due to the exponential advancement and use of digital technologies. Even as I type these words in a Google document or store them with some “cloud” provider, data is being extracted, stored, and mined through intensive computational processes. The result is a data-infused culture, the major social transformations of which entail new relations between us and machine modes of “intelligence,” forms of surveillance, and decision-making.
In her 2018 article, Lupton anchors data collection to a data-infused and mediated social world in which we are all living participants, arguing that data are distinctive agents and modes of capital in contemporary life. She puts forth the claim that personal digital data “are not separate entities from people’s bodies and selves, but rather are materialisations and extensions, alternative ways of knowing and enacting bodies and selves” (9). That is, digital technologies and the data they generate are part of human embodiment and selfhood–what Lupton calls “human-data assemblages” (1). She explains:
“The concept of the human–data assemblage works to highlight the distributed and dynamic nature of subjectivity and embodiment that sociomaterial perspectives emphasize. The onto-epistemological problem posed by human–data assemblages requires humans to interpret what aspects of themselves these assemblages differentiate. Data and humans can potentially learn from each other and co-evolve. But humans may find themselves asking to what extent their data speak for them, and to what extent their data are different from other elements of embodiment and selfhood. Making sense of personal data requires developing practices that can manage and interpret lively data to make them useful and knowable” (5).
However, the ontological and epistemological aspects of this expanding data regime have yet to be thoroughly scrutinized and theorized. There are many questions that need close attention from researchers. For instance, how reliable and/or scientific are the methods of data collection and analysis? What rationalities and motivations drive such processes? How diverse is the underlying perspective, or is it primarily white and Eurocentric in orientation, covering mostly the Western world? How does it affect different individuals and social groups in their everyday lives and in the future? What does, and will, data collection and data-centrism do to the notion of humanity?
It’s clear that “human-data assemblages” have material and embodied dimensions. Yet, as Gavin Smith pointed out, for all the ostensible materiality of Lupton’s approach, the abstractions of the human-data assemblage can make data unseeable and unknowable. This creates a methodological problem for any analyst who wants to define how research is done “with data, around data, for data and about data.”
The relation between data and humans is both reciprocal and ongoing: while we make and remake digital data, data in turn make and remake us. Articulating one’s personal data, as Smith indicated, is a matter of connecting the metrics with the lived sensory experiences of one’s body and with other elements that are important in data sense-making. This brings out the liminality between the embodied subject and the data subject, and the constant interplay between the two, together with all the other factors (institutions, algorithms, other agents) that are part of that kind of becoming-with.
Data can sometimes reveal truths or knowledge about a person that they don’t want to hear, see, or engage with. As the ones whose data are collected and as researchers, we are in the business of generating and working with the complexities, ambiguities, and ambivalences of these data relations, and we struggle for the contingent meaning of our intimate relationship with data proxies.
Humans’ understanding, as Smith pointed out, may be incongruent with what the data and their materializations communicate. As an example in Lupton’s article shows, data about human bodies may motivate the desire to lose weight or manage chronic conditions. If the biometric data then show that the goals have not been achieved, they can have demoralizing effects and generate disappointment, guilt, and anger. In such cases, notifications can be experienced as bothersome or irritating, and regarded as unreasonable demands.
Datafication is related to “data activism” (Milan and Velden), an approach that explores how companies use personal data without the knowledge or permission of the affected individuals. Such digital surveillance and privacy violations help large corporations to generate profits by turning data-mining into predictive analytics for advertising and other commercial purposes (Andrejevic et al., 2015; Brunton and Nissenbaum, 2011; Kennedy and Moss, 2015; Zuboff, 2015).
Digital data is yet another frontier for the analysis of “biopower” and “biopolitics,” according to which human flesh has disaggregated and calibrated dimensions (Coole). Rooted in the ideas of Michel Foucault, who defines “bio-politics” as the style of government which controls the population through bio-power (Security, Territory, Population), this approach tries to comprehend how power functions in, through, and with human bodies. Human-data assemblages invoke biopower and biopolitics, which neither Coole nor Jane Bennett, writing almost a decade ago, foresaw.
Our bodies interface with machinic data representations of bodily processes and practices, and as Smith highlighted, these interactions occur in terms of haptic bodily wisdoms, our sensory system, and our capacity to have intuitions about the social world. This is a very dynamic and transformative social interaction, but it is also highly ambiguous, and it raises several questions. What are the implications of embodied sense interfacing with data sense? Why and how do different individuals and different social groups turn to these technologies? How do they use them to make sense of something they’re interested in, curious about, or concerned with?
As for the ontological account of data collection, many things we might think are done automatically are actually human-generated, with humans always in the loop. Lori Moon began her discussion of Gray and Suri’s Ghost Work by reflecting on the proliferation of such human-in-the-loop labor. Moon asked the audience to recall every time they have been prompted by a website to prove they are “not a robot.” In these instances, we carefully click the boxes that include traffic lights. But rarely, if ever, do we think about the workers who spent time labeling such images so AI interfaces could successfully recognize their content and confirm our humanity. These people are the annotators, often called gig workers or “mechanical Turks.” In Ghost Work, Gray and Suri call this “often intentionally hidden” human work and “opaque world of employment” ghost work (ix).
Who are those annotators? As pointed out in Ghost Work, they are ordinary people like Joan and Kala, who work behind so-called automated systems and decide on “the initial set of labeled images, called training data.” Yet sometimes these people may not know what they are being asked to label (xiii). For instance, Kala, a 43-year-old housewife and mother of two who holds a bachelor’s degree in electrical engineering, gets help from her sons, saying “they are more qualified to recognize these words than me” (xi). As this example shows, these “invisible workers who power the apps on our phones and websites” provide the “human in the loop” for what is often portrayed as an autonomous process. However, this human work is kept hidden, the world of employment is opaque, and it is not known whether ghost workers receive a fair share of the wealth generated by the internet. While many of the tasks are done by ghost workers, their obfuscation can cause people to mistakenly think that it is AI alone that is shaping and reshaping the working world.
Foremost among the issues related to AI is language, which has become central to the debates around AI. Drawing on her own experience in the tech industry, Moon expounded upon the ways in which annotators sort, tag, and label words and images for internet companies. In this way, they determine the language of AI systems, and so shape our current and future communication with those systems and with one another. In this context, their role and impact seem very significant and decisive.
For this reason, invisible workers also play the role of representatives of all of us, of humanity, of choice. This raises questions that we must ask, such as: How reliable are they and their process of tagging and labeling? Don’t the choices of ghost workers and the implications of their work require more attention and scientific scrutiny? How do the choices / assumptions of these people represent us and make sense for us?
To conclude: while humans’ bodies and their everyday lives have become increasingly datafied, lending material and embodied dimensions to human-data assemblages, the ontological and epistemological aspects of this comprehensive data regime haven’t been thoroughly analyzed and need close attention. The difficulty lies in the nature of data, which are, per se, fluid, lively, abstract, and continuously changing. Yet more research can shed light on the topic. I hope that this blog will prove to be of value in enhancing discussions of similar concerns and will generate a deeper understanding of this issue.