SNEAK PREVIEW: HARRIET TUBMAN’S DEEP VOICE

Maurice Wallace and Matthew Peeler.

[Critical AI 2.1 is a special issue, co-edited by Lauren M.E. Goodlad and Matthew Stone, collecting interdisciplinary essays and think pieces on a wide range of topics involving Large Language Models. Below, a sneak preview from the issue: Maurice Wallace and Matthew Peeler’s compelling “Harriet Tubman’s Deep Voice”]

Early in 2021, MyHeritage, said to be “the world’s leading genealogy website,” partnered with D-ID, an Israeli platform for so-called generative AI, to launch a new feature called “Deep Nostalgia.” Deep Nostalgia™ focalizes and animates the faces of family members and other historical figures pictured in old photographs to create a live effect. Forebears long since departed smile, blink, look sidewise, and tilt or narrowly turn their heads. 

Deep Nostalgia™ uses deep learning (DL), computer vision, and image processing technologies to produce realistic simulations of living faciality and head gesture. Still, uncanny as the experience must be of seeing one’s ancestor seeming almost alive—for example, my father encountering his adored grandmother, Amy Jane Collins Richardson, in the photo of her he’s treasured since her death in 1959—the effect only approaches the real. As MyHeritage concedes, “the end result is not authentic”—my great grandmother could not be said to have looked or moved in precisely this or that way; and the peculiar tics or twitches my father might remember as idiosyncratic are surely long gone and irrecoverable to any DL model. Rather, what MyHeritage confects (n.d.) is “a technological simulation of how the person in your photo would have moved and looked if they were captured on video.” 

For a student of visual culture like me, it is enough to sit with the video ambitions of Deep Nostalgia™ technology as it enacts a rejoinder to visual theorist W. J. T. Mitchell’s curious question, What do pictures want? If, as Mitchell says (2005, 10), to “ask, what do pictures want? is not just to attribute to them life and power and desire, but also to raise the question of what it is they lack, what they do not possess, what cannot be attributed to them,” then the desire of still images must be movement. MyHeritage offers just that: simulated movement by way of controlled, algorithmic blurring of the photographic subject, which we might regard, irony of ironies, as the photo’s visual de-facing. But for all the interest MyHeritage may garner from students of visual culture for Deep Nostalgia™, others whose curiosities lie in sound technologies will find MyHeritage no less a prospect for close study. Indeed, the folks at D-ID have recently developed a “speaking” feature that enables Deep Nostalgia™’s animations to precisely mouth user text. This feature, LiveStory (Perez 2022), also relies on a synthetic voice generator with 31 different languages represented in its data pool, dozens of dialects, and “both male and female voice options.”

LiveStory, however, is not Khanmigo, the automated learning resource of Khan Academy. Khanmigo, profiled in The Washington Post by staff writer Gillian Brockell (2023), is scarcely up to LiveStory’s technical prowess in today’s race for “AI,” which makes the headline of Brockell’s July 16 post to the paper’s history blog puzzling. “We ‘interviewed’ Harriet Tubman using AI. It got a little weird” would hardly be any story at all but for the gap between Khan Academy’s newfound commitment to DL-assisted online education and the historical figure Khanmigo claims to “animate.” Brockell seems agnostic if broadly optimistic toward Khan Academy’s “weird” impersonation of Tubman. But Khanmigo is hopeless. And it is not AI Tubman who is weird but anyone (perhaps Sal Khan most of all) who continues to express fundamentalist faith in the perverse promise of “AI” to resurrect the dead for the sake of learning and, strange as it seems, the salvation of history. That Khanmigo’s counterpart in digital learning, Hello History, expresses its promise in distinctly evangelical terms only renders starker the soteriological subtext in the discourse around the technology’s reputed benefits for learners, particularly those studying history. Though the training data adequate to reanimating the silenced voices of the dead does not now exist and can at best only be approximated, Khanmigo hopes to deliver a revolution in “interactive” education, nonetheless.

At Hello History, we believe that history should be alive and accessible to everyone. To that end, we use cutting-edge artificial intelligence technology to create interactive experiences that allow people to engage and learn from historical figures. Our AI-powered experiences offer users the chance to have conversations with those who have shaped our world, and to gain insight into their lives, thoughts, and beliefs. Our AI-driven conversations provide an unparalleled learning experience, and it’s our goal to make this accessible to everyone. We want to open a window into the past and make it possible for everyone to learn from history in an engaging and meaningful way.

In other words, at Hello History, history’s ontology is scarcely different from that of a messianic faith.

Compare Hello History’s promotion of its new chatbot to the Protestant mission to make Christianity “alive and accessible to everyone” through a personalized Jesus. So close is the objective of Hello History to Protestant evangelism, in fact, that their boasts of “AI-powered experiences” could be readily adapted for religious outreach. In this scenario, “AI” is exchanged for the Holy Spirit as a little-understood technology offers prospective converts “the chance to…gain insight into their lives, thoughts, and beliefs.” For true believers like Khan (and perhaps also Brockell), the avowed potential of technology to revolutionize education and make history “alive” abandons science for a zealotry bordering on messianic fundamentalism. As Brockell ponders the possibility of Hello History’s chatbot as a salvific social and educational good, her post passes over the harms that Khanmigo, or any outfit that claims to be “bringing history back to life,” could inflict on a public that is easily enamored with digital products marketed as “AI.”

With so little to say about Khanmigo separate from what one might expect from a product review posted on PCMag or CNET, Brockell’s account of Khanmigo’s ventriloquism amounts to little more than an advertisement for Khan Academy (and its ostensibly liberal politics of race) with Harriet Tubman as celebrity pitchman. But the wooden, lifeless sentences AI Tubman is said to speak in her “interview” with Brockell are closer to the flat and colorless text of a Wikipedia entry than to the inspired speech of a nineteenth-century freedom fighter. Those familiar with large language models (LLMs) such as OpenAI’s GPT-4—the model on which the Khan Academy chatbot is built—know that these statistical models are (as Emily M. Bender, Timnit Gebru and colleagues [2021] have put it) only stochastic parrots, bots that generate plausible sequences of words without understanding the language they use in any human-like way. AI Tubman shows the signs of this high-tech mimicry. Devoid of emotion (even when explicitly asked about her emotional condition under threat of capture), this is a surface-level impersonation at best: it reveals practically nothing about Tubman that isn’t generally known. Far from bringing history back to life, Khanmigo steals Tubman’s (imagined) body, plugs in the details of her life, and conceals the strings mimicking life in science-speak and progressivist platitudes. In this way Khanmigo pretends to incarnate the past.


This “magic” of AI Tubman is a marketing sleight of hand: an attempt to package and sell “AI” to the public as an education-saving technology for the new media age. Whatever disappointments in “AI Tubman” Brockell records—Tubman “hedged” and was “vague” on contemporary issues like reparations and critical race theory—posts like Brockell’s prime the pump for the tech that is still to come when companies like Amazon, long a champion of data-driven analytics (and recently a major investor in Anthropic AI), launch the next generation of “intelligent” software products. Brockell’s puff piece must have pleased Amazon’s founder, Jeff Bezos, in particular, as owner of The Washington Post.

The Educational Briar Patch

With experiments like Khanmigo already up and running, it seems likely that the “AI” ship has set sail in American education. Its advance into K-12 and college pedagogy and curricula, where faith in its educational value is on the rise, is calculated and quite possibly unstoppable. As promising as it might appear in motivating new interests in historical learning and humanistic inquiry, we argue that the national investment in the educational utility of automated software comes at an enormous cost: a price paid by the very students that technology aims to convert to history as a lively and accessible field.

Obviously, one of the most important habits of mind developed by secondary and post-secondary education is skepticism. School teaches us to ask why in the interest of developing advanced skills in inductive and deductive reasoning. Why did Dickens or Dickinson or Whitman or Wright or Ishiguro say that, exactly? What point or proposition were they calling us to in saying it that way? Such questioning sharpens a mind skeptical that what is said (or what is known about what is said) is all there is to say. On some level, automated programs like Khanmigo threaten this constructive skepticism. By pretending historical authenticity, they endow their impersonations with an air of direct authority no skepticism can easily challenge.

While the MyHeritage platform clearly declares that “the end result is not authentic, but rather, a technological simulation” of historical reenactment, chat programs marketed specifically for students tend not to be so forthcoming. Not a few misrepresent their applications as offering authentic reconstructions of the words and thoughts of historical personalities as diverse as Tubman, Genghis Khan, and Winston Churchill. On the strength of passably human-like textual outputs, everyday skepticism is suspended and whatever the model answers back is as if from the mouths of the living dead. In reality, today’s state-of-the-art LLMs cannot even check the factual accuracy of their outputs, a point The Washington Post mostly glosses over. Brockell thus betrays a widespread faith in the ability of “AI” to learn on its own, further obscuring the technology’s actual reliance on armies of poorly compensated human workers, many from East Africa, to filter and moderate its data. Though she never says so explicitly, Brockell seems to believe that tomorrow, AI Tubman will catch and correct today’s misstatements. Nothing serious.

But suppose AI Tubman mimics a source claiming that “slaves developed skills” that could be used for “personal benefit,” an all too plausible proposition. In this scenario, AI Tubman risks painting a picture of the past so misleading that the “facts” of slavery might very well expand to elastically accommodate apologists for slavery and revisionism: crimes of miseducation against the intellectual, social, and material strivings of an educated society. The perception that technology is objective or unbiased is a dangerous article of faith that conceals the well-known vulnerabilities of language models to what the industry calls “hallucinations.” That tendency to confidently make up facts compounds the system’s reliance on flawed data scraped from the internet. Hence, the dependence on vulnerable human workers in Kenya, Uganda, and India (“data enrichment professionals,” euphemistically) who help scrub this much-hyped technology of its deeply embedded biases, stereotypes, and inaccuracies (Perrigo 2023). 

Deep Voice

Let us turn away now from the fantasy of AI Tubman and from history as a general field of knowledge toward the actual Harriet Tubman and the Black struggle she helped to activate. Despite Khanmigo’s aim to give students an interactive experience of history by way of “live” engagements with great men and women from the past, it does not offer the realistic exchange with Tubman it promises. Careful to avoid what Brockell (2023) calls “Tubman’s authentic speech” in order to avert the risk of being assailed for caricaturing Tubman with mocking, minstrel ridicule, AI Tubman speaks instead a “modern conversational language.” Though Brockell responds with “relief,” we are less convinced that AI Tubman’s speech is innocent. The standard American English she voices is not just ahistorical. Worse, the modernization of Tubman’s speech obfuscates her real life in nineteenth-century Maryland, Pennsylvania, and New York, silencing the inflections of time, gender, region, race, or other vocalic variables that, one presumes, would require historically sensitive training data for a more realistic simulation. 

The result of Khan Academy’s haste to unveil Khanmigo without thinking through the implications of that absent data archive is a banal impersonation of one of the most important African Americans in US history. This whitewashed “Tubman” also undoubtedly reinforces the troublesome misconception that history has a homogeneously white sound. Khanmigo risks fortifying the fallacy that the “right” and “classic” sound of historical authority is white (or is what speech in the mouths of whites sounds like). Only “hints” of the actual Tubman’s “courage and piety” mark Tubman’s “AI” voice, Brockell concludes, though even this is very likely Brockell’s own projection, derived from a couple of reminders in AI Tubman’s dialogue of religion’s significance in the historical Tubman’s life.

As an aural reenactment of biography, AI Tubman is a far cry from the historical Tubman it pretends to. So far indeed that, failing to promote so much as the spoken fidelity to country, district, and gender that purports to humanize speech (to say nothing of the peculiar calculus of social intonation, regional variations in pace of speech, vernacular vocabularies, and functional speech disorders), Tubman is likely as irrecoverable as the specific facial movements of my great grandmother. Unless new platforms take up these considerations, locate reliable transcriptions of speech, and uncover a wider audio record of early Black speech, what we will be left with is a voice closer in tone and dialect to a certain Black woman meteorologist I know and listen to, whose overcorrected speech makes every forecast seem an exercise in elocution; nothing in the voice of Khanmigo’s bot suggests any relation to the insurgent, gun-toting fugitive and Union spy named Tubman who came to speech among enslaved people on a Maryland plantation east of the Chesapeake Bay in the 1820s.

Discerning commentators on The Washington Post’s site offered plenty of reasons to pile on Brockell’s botched “interview.” But in the final analysis, the ease with which “AI” continues to pull off its trickeries worries us most. Illusions of authenticity enacted behind Black masks and charades of Black aliveness issuing from voices imputed onto Black subjects do not bode well for teaching African American history as a discrete counter-history to the mythic one that passes for our national story.

To be sure, Frederick Douglass left four large volumes’ worth of speeches, letters, and aural writings from which data might be drawn to give a language model some glimmer of his widely celebrated oratory and public storytelling. But those seeking to resurrect Tubman’s “authentic” voice (which is to say, euphemistically, her antebellum Black voice) will likely continue to fail. The actual Harriet Tubman, escaped from bondage to freedom, was never recaptured bodily by men’s hands. Would it perhaps be best for the evangelists of ed tech to heed this lesson and accept the impossibility of capturing this fugitive voice?


Footnotes

1 As of this writing, a contentious debate over this very interpretation of American slavery is raging in the public sphere. The Florida Department of Education lately unveiled new curricular standards for teaching African American history in the state. The new standards insist that Florida students understand that during slavery, the enslaved “developed skills which, in some instances, could be applied for their personal benefit.” The backlash to this astonishing revisionism has been widespread. For an excellent treatment of this controversy, see Bouie (2023a), which is followed by Bouie (2023b).

2 The hidden human labor AI depends on (much of it located in the Global South) is worth its own space, especially for what it reveals about the AI economy globally and the violent effects of “data enrichment” on the mental lives of its laborers (Perrigo 2023).


Bibliography

About Us. Hello History. (n.d.). https://www.hellohistory.ai/about-us

Animate your family photos. MyHeritage. (n.d.). https://www.myheritage.com/deep-nostalgia

Bouie, J. (2023a, July 28). Ron DeSantis and the state where history goes to die. The New York Times. https://www.nytimes.com/2023/07/28/opinion/desantis-slavery-florida-curriculum-history.html

Bouie, J. (2023b, July 29). In lessons on slavery, context matters. The New York Times. https://www.nytimes.com/2023/07/29/opinion/florida-schools-history-slavery.html

Mitchell, W. J. T. (2005). What Do Pictures Want?: The Lives and Loves of Images. The University of Chicago Press.

Perez, S. (2022, March 3). MyHeritage and D-ID partner to bring photos to life with both animations and voice. TechCrunch. https://techcrunch.com/2022/03/03/myheritage-and-d-id-partner-to-bring-photos-to-life-with-both-animations-and-voice/

Perrigo, B. (2023, January 18). OpenAI used Kenyan workers on less than $2 per hour: Exclusive. Time. https://time.com/6247678/openai-chatgpt-kenya-workers/
