Contagious Speech

2022–2023 Installation with video and live performance, questionnaire

The prevalence of virtual “screen talk”—speech through devices connected to the internet—influences the nature of all spoken language. The Covid pandemic, with the sudden transition from “contaminated” face-to-face speech to streamed online speech, accelerated this process.  Central to Contagious Speech is an essay-like video work that explains how and why language IRL (in real life) has been affected.

The script is based on Zoom interviews with a range of voice and  speech professionals, including a singing teacher, a medic, a voice-over artist, a natural language processing specialist, and a beat-box artist. In the work, actors and actresses play characters with similar expertise; they sometimes seem to be conversing with Nicoline van Harskamp directly. However, what can be heard is a synthesized, AI-generated version of her voice. Breath and breathlessness are recurring themes in the work, and while the video jumps from one speaker to another, there is one continuous flow of air that can be traced through a “breath score.” Visualized as a graph that slides in time with the speech, it allows viewers to breathe along with the speakers.

Contagious Speech can be viewed on a dedicated website. It has also been presented in exhibitions with live performers and as a stage performance.

Produced with Galerie UQO. Presented as an installation at TULCA Festival of Visual Arts in Galway, Galerie UQO in Gatineau, Clemente Center in New York, and Manifold in Amsterdam. Presented as a stage performance at BAK in Utrecht and De Brakke Grond in Amsterdam.

With Irina Hrabarska, Nazanin Fakoor, Slobodan Bajic, Moena de Jong, Jacques de Bock, Hurrakane Tha SoundZtorm, Rakesh Parangath, Nicoline van Harskamp, Juha Myllylä, Yvette Supraski, Ali Shafiee, Jaike Belfor, Yurie Umamoto, Faustina Kanin, and Claire King.

Voix Blanche

2023–in progress Stage performance, podcast series

Voix Blanche surveys the prosody – including factors like intonation, pitch, and speed – of AI-generated speech. “Neural voices” are trained using datasets that contain lecture-like speech in dominant languages, and thus exhibit a prosodic bias towards it.

This project looks at artistic prosody types and the way they can be reproduced in neural text-to-speech (NTTS) processes. Building on Nicoline van Harskamp’s other digital art projects about the future of spoken language, Voix Blanche surveys traditional and current prosody types used in theatre, art, and poetry, as well as social media, gaming and podcasts. Working with a developer, Van Harskamp subverts and enhances the prosodic scope of existing text-to-speech applications, with the eventual goal of writing stage dialogues for human and “neural” performers. An episodic podcast about the working process will be published on a dedicated website.

A work in progress, with financial support from the Creative Industries Fund NL and the Pauwhof Fund. First results were presented at Sorbonne Nouvelle in Paris in November 2023, hosted by Myriam Suchet.

Telepresence Toolbox

2022–23 Symposium, toolbox, streaming platform

Since the 1980s, artists have performed their work via computer platforms, and described their practices as “cyberformance,” “virtual theatre,” “telepresence art,” or “networked performance.” In the pandemic years (2020–22), the medium became better known as “Zoom performance,” a catch-all for the forms of online live performances that made use of video conferencing platforms.

In the post-pandemic moment, Telepresence Toolbox asks, together with students, staff, and guests at the University of Fine Arts (Kunstakademie) in Münster: What has been produced online? How can we expand on a discipline that is much more than a product of circumstance? What streamed performance practices and what hybrid practices (where liveness occurs in person and online simultaneously) exist today? What tradition is it indebted to?

In the online series Zoom Performance Meetings in 2022, and in the later hybrid Zoom Performance Symposium, these questions were discussed theoretically. In 2023 and 2024, a physical and digital toolbox was developed that allows students of the Kunstakademie Münster and the Kunstakademie Düsseldorf to stream their live performance work on a purpose-built platform, instructed by a live broadcasting engineer.

Produced within the professorship for performative art at the University of Fine Arts in Münster, with an NRW Digi-Fellowship. In collaboration with Lena Newton, professor for stage design at the Kunstakademie Düsseldorf.

Symposium with Mallika Taneja, JODI (Joan Heemskerk), Jana Kerima, Lex Rütten, Pablo Fontvila, and Ali Eslami. Realized in collaboration with Joe Bauer, Paula Göbb and Renée Morales Garcia.

Zoom Performance Meetings with Almut Pape, Clara Gomez, Lila Moore, Paul Sermon, Rana Hamadah, Sara Lana, Sina-Marie Schneller, Tamara Kuselman, Marie-Hélène Leblanc, Joan Heemskerk, Jana Kerima Stolzer, Mallika Taneja, Public Universal Friend and Annie Abrahams.

English Forecast

2013 Streamed live performance, video

Four people sit inside an audio booth. Each wears headphones and keeps eye contact with the camera and therefore the viewer at home. They take turns saying sentences that together form statements about the future of the English language. Each person’s pronunciation or “accent” changes each time they speak, and it becomes clear, after some time, that they are mimicking the speech of others that they are hearing through their headphones. Their words are subtitled in the International Phonetic Alphabet. After completing a statement, the speakers repeat words previously pronounced, and add a short pause in which the viewer is invited to repeat out loud what was said.

The performance was streamed from the Performance Room in Tate Modern to live audiences via YouTube.

Produced by Tate Modern as part of Tate Live Performance Room. Presented as a video at the Edith-Russ-Haus in Oldenburg, Fundació Antoni Tàpies in Barcelona, Frac Lorraine in Metz, Sogn og Fjordane Kunstmuseum in Førde, and Museo Arte Contemporáneo in Vigo.

With Walles Hamonde, Sakuntala Ramanee, Ariane Barnes and Chris Rochester.

Contagious Speech – video excerpt
Contagious Speech at Manifold Books in Amsterdam – performance documentation

Contagious Speech

Juho sits on a stool, bent over his phone. He wordlessly hums a Billie Eilish song.

YVETTE’S VOICE-OVER  When people use the telephone, and something in the connection hinders the communication, they either start raising their voice, or they start squeezing their voice. Regardless of the nature of the hindrance. Even though they’re completely audible to the other, they always start compensating for what they hear. This is also true in video conferencing. Users tend to raise the larynx, and squeeze the airflow, thereby limiting the range of the instrument that is the voice.

NICOLINE’S SYNTHETIC VOICE  Like a head with fingers attached.

Yvette appears.

YVETTE  My posture after a 5-hour Zoom call affects my breathing and speaking in a number of ways. First of all, when I sit for 5 hours, my hips get stuck, and due to that,  the muscles that move my diaphragm. With a stressed diaphragm, I can’t breathe properly. Secondly, the gaze. Eyes are built to regularly vary the visual field, to change focus from close up to far away. Not to get stuck at a distance of 20 inches from a screen. The body always organises itself around the senses, so when my eyes become so inactive, my body also quiets down, and then my breathing becomes shallower. In this way, my computer eyes define the way I speak. Thirdly, when I’m in a room with actual people, I adapt the dynamic range of my voice to the perceived distance from the others. Such acoustic considerations aren’t necessary in screen-talk. At a continuous distance of 20 inches, my speaking muscles become lazy. They will, eventually, weaken.

Juho is sitting in a very collapsed position.

YVETTE’S VOICE-OVER  So he’s sitting in a very collapsed position. He’s breathing very little and due to his posture, he doesn’t have the space to fill his lungs with air. He starts speaking anyway, but towards the end of a word or a sentence, he runs out of air, and he phases out the voice until he produces a creaky sound. The creaky voice.

JUHO  Also known as vocal fry.

YVETTE’S VOICE-OVER  Vocal fry appears only on vowels, so articulation isn’t needed. There’s no projection, and very little volume. He’s speaking like she doesn’t care to reach me. He’s speaking like she doesn’t care about speaking.

JUHO Kun puhut kuin olisit innostunut jostain, se tuntuu naiivilta. On paljon turvallisempaa piilottaa innostuksesi. (When you speak like you’re enthusiastic about something, it comes off as naïve. Much safer to hide your excitement.)

YVETTE’S VOICE-OVER  Sometimes she uses her voice, her fully retracted voice, to emphasise something.

JUHO  Hieman kuin uhkauksen kuiskaus. (A bit like whispering a threat.)

Yvette re-appears on screen.

YVETTE  Vocal fry was first identified as a cultural phenomenon and was associated with young North American women. But it actually occurs in many other languages, among all types of speakers, as a permanent feature. In Finland, researchers found that vocal fry occurs in males and females to an equal extent now, and in official settings like news broadcasts, as often as in informal speech. Because when people lose their willingness to project air during speech, they will eventually also lose their ability.

Yvette appears in a Zoom screen and shows footage from a 1987 Finnish news item.

YVETTE  This is what Finnish sounded like in 1987. No fry!

Yvette re-appears on screen.

YVETTE  In a Zoom call, we rely too much on the microphones that are built into our devices. When somebody says they can’t hear you, do you speak louder? No, you will move closer to your device. I’ve been a singing teacher since the 1980s, and I’ve always worked on the premise that an unamplified singing voice is active, and should produce overtones that can be heard even through a complete orchestra. It’s not only a matter of volume, it’s a matter of a piercing voice quality. Today, singers like Beyoncé and Ariana Grande still have this quality, and on TV talent shows they still appreciate a large female voice.

NICOLINE’S SYNTHETIC VOICE  This style of singing makes me physically aggressive.

YVETTE  But with a microphone, you can sing notes in a range that normally produces very little volume. A new generation of singers uses this to the extreme. Billie Eilish is a perfect example of a screen singer. She doesn’t work with the vocal cords that much, but rather with the articulation.

Juho appears, softly singing Billie Eilish now.

YVETTE’S VOICE-OVER  There’s no resonance in the high notes. Everything is small and breathy. This voice would never work in an acoustic set-up. And you may feel chill and lazy during the singing, but actually you work harder. You’re inefficient because you waste a lot of air.

NICOLINE’S SYNTHETIC VOICE  This breathiness is also a more contagious way of speaking.

Yvette re-appears on screen.

YVETTE  Last week I met my own singing teacher from college.  She still works with students and she told me: “It’s getting worse and worse. Before, the new ones couldn’t sing very well. But now, after the pandemic, they don’t even know how to make sound. They don’t let air out. Everything is going inward.”

Script fragment from Contagious Speech

Contagious Speech – full length video

Repetition Exercise / SSML Exercise

MEISNER  “All right. Use something new and begin again, slowly.”
<speak>All right. <break time="500ms"/> Use something new and begin again, slowly.</speak>

 After a moment, Anna elbows Vince in the back.

VINCE  “You poked me in the back!”
<speak>You poked me <emphasis level="strong">in the back!</emphasis><break time="500ms"/></speak>

ANNA  “I poked you in the back.”
<speak>I poked <prosody pitch="high">you</prosody> in the <prosody rate="medium">back.</prosody><break time="500ms"/></speak>

VINCE  “You poked me in the back.”
<speak>You poked me <emphasis level="strong">in the back.</emphasis><break time="500ms"/></speak>

ANNA  “Yes, I poked you in the back.”
<speak>Yes, I poked <prosody pitch="high">you</prosody> in the <prosody rate="medium">back.</prosody><break time="500ms"/></speak>

VINCE  “Yes, you poked me in the back.”
<speak>Yes, <emphasis level="strong">you</emphasis> poked me <emphasis level="strong">in the back.</emphasis><break time="500ms"/></speak>

ANNA  “Yes,” she says, amused at his displeasure, “I poked you in the back.”
<speak>Yes,<break time="100ms"/><break time="100ms"/> I poked <prosody pitch="low">you</prosody> <emphasis level="moderate">in the back.</emphasis><break time="500ms"/></speak>

VINCE  “What’s funny?” he snaps.
<speak>What’s <prosody rate="fast"><emphasis level="strong">funny?</emphasis></prosody><break time="200ms"/></speak>

ANNA  “What’s funny?”
<speak>What’s <prosody pitch="high">funny?</prosody><break time="500ms"/></speak>

VINCE  “What’s funny?” he repeats.
<speak>What’s <emphasis level="strong">funny?</emphasis><break time="200ms"/></speak>

ANNA  “What’s funny?”
<speak>What’s <prosody pitch="high">funny?</prosody><break time="500ms"/></speak>

VINCE “What’s funny?” Vincent says with unnatural stress on the first word.
<speak>What’s <prosody rate="slow"><emphasis level="strong">funny?</emphasis></prosody><break time="500ms"/></speak>

MEISNER  “No! That’s a reading! Until then it was very good, but ‘What’s funny?’ was a way of creating variety. I’ll show you something.”
<speak><prosody pitch="high">No! That’s a reading! Until then it was very good, but ‘What’s funny?’ was a way of creating variety. I’ll show you something.</prosody></speak>

Synthetic speech markup language for an excerpt from Sanford Meisner’s actors’ manual
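
Markup of this kind can also be assembled and checked programmatically. What follows is a minimal sketch in Python, assuming only the SSML elements used in the exercise (speak, emphasis, prosody, break). The helper names emphasis, prosody, pause and speak are hypothetical illustrations and are not the project's own tooling. The sketch rebuilds two lines from the exercise and rejects any line whose tags do not close properly.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def emphasis(text: str, level: str = "strong") -> str:
    """Wrap text in an SSML <emphasis> element (hypothetical helper)."""
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def prosody(text: str, **attrs: str) -> str:
    """Wrap text in an SSML <prosody> element, e.g. pitch="high" or rate="slow"."""
    rendered = "".join(f" {name}={quoteattr(value)}" for name, value in attrs.items())
    return f"<prosody{rendered}>{escape(text)}</prosody>"


def pause(ms: int) -> str:
    """Return an SSML <break> of the given length in milliseconds."""
    return f'<break time="{ms}ms"/>'


def speak(*parts: str) -> str:
    """Join parts into one <speak> document and check that it is well-formed XML.

    Assumption: plain-text parts are passed through as-is, so they must not
    contain raw '<' or '&' characters.
    """
    ssml = "<speak>" + "".join(parts) + "</speak>"
    ET.fromstring(ssml)  # raises xml.etree.ElementTree.ParseError if tags are unbalanced
    return ssml


# Two lines from the repetition exercise, rebuilt with the helpers.
vince = speak("You poked me ", emphasis("in the back!"), pause(500))
anna = speak(
    "I poked ",
    prosody("you", pitch="high"),
    " in the ",
    prosody("back.", rate="medium"),
    pause(500),
)
print(vince)
print(anna)
```

The check only confirms that the XML is well-formed. Whether a given text-to-speech engine accepts a particular attribute value is a separate question that depends on the engine.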

English Forecast – performance view (photo by Ana Escobar for Tate Photography)
English Forecast at Tate Live: Performance Room – full length version with concluding interview