Prosodia
“Isn’t it your job to give life on stage to imaginary characters?”
Prosodia
In progress – Stage performance and application
The project Prosodia speculates on the ways in which the technology of synthetic speech is rooted in the English language and in acting, both of which are technologies in their own right. It proposes a future form of ‘machine talk’, informed by the rhythmic, additive structures of human storytelling and epic song. Now in progress is a scripted performance work for human and synthetic actors, and a digital tool. It makes use of AI technology without aestheticizing or mystifying it.
Artificial Intelligence is the result of a long collective process rather than a diffuse field of knowledge. In the same way that any human voice contains all the voices that shaped the tones and rhythms of its language, synthetic voices are, in fact, ancient “layered” voices. By following a poetic meter, a synthetic voice aligns itself prosodically much better with humans. The epilogue to the try-out play Seven Scenes For The Black Box was a non-rhyming but strictly “metered” text that used the additive pattern (“and then”, “so then”) favored by epic storytellers of the past and influencers of the present.
In the stage piece Prosodia, several synthetic and human characters reflect on the technologies behind their skills – machine learning and acting training respectively. They end up reclaiming an ancient, rhythmic form of speech technology: the spoken song.
Also in development is an AI-based tool that converts informative text into such spoken song, and that may adapt English lines to rhythms and meters from other languages in the process.
Research funded by the Creative Industries Fund and the Pauwhof Fund.
Seven Scenes for the Black Box
2024 – Stage performance (tryout)
This tryout stage piece speculates on the ways in which the technology of synthetic speech is rooted in the technology of acting. Seven short scenes are played by four humans and one neural network named Prosodia, whose voice is generated in a text-to-speech application line by line, for a live audience. The four actresses can activate Prosodia by clicking a small Bluetooth computer mouse. No generative AI is used for the actual production of the texts; the neural networks only concern the prosodic output of the disembodied voices.
Seven Scenes For The Black Box was staged live as a try-out in September 2024. In each of the scenes, Prosodia’s role, function and form changed. She played conventional scripted theater with the actresses; translated their lines behind their backs; gave them acting instructions; recited poetry with them, and so on. Prosodia’s voice changed accordingly, as did the placement of the speaker that it appeared through.
Research and tryout staging with financial support from the Creative Industries Fund and the Pauwhof Fund. Tryout staging in cooperation with If I Can’t Dance I Don’t Want To Be Part Of Your Revolution in Amsterdam. With Ebony Wilson, Rosita Segers, Cézanne Tegelman and Lidewij Mahler.
Voix Blanche
2024 – ongoing – Lecture performance
Voix Blanche surveys the prosody – including factors like intonation, pitch, and speed – of AI-generated speech. “Neural voices” are trained using datasets that contain lecture-like speech in dominant languages, and thus exhibit a prosodic bias towards such speech.
This project looks at artistic prosody types and the way they can be reproduced in neural text-to-speech (TTS) processes. Building on Nicoline van Harskamp’s other digital art projects about the future of spoken language, Voix Blanche surveys traditional and current prosody types used in theatre, art, and poetry, as well as social media, gaming and podcasts.
In the 80-minute lecture Voix Blanche: A Chronology, van Harskamp describes a year of her working process in chronological order, and presents live examples of human-machine interaction together with an actress.
An ongoing work, first developed with financial support from the Creative Industries Fund NL and the Pauwhof Fund. Presented in part at Sorbonne Nouvelle in Paris, hosted by Myriam Suchet. First presented in full at the PhD Arts Colloquium at the University of Leiden, featuring Cézanne Tegelman.
Spoken Song
(and) Two things that sway in the same beat
when they’re physically close to each other
they will finally beat with each other
as they’re lazy, like all things are lazy.
And their entities start to entrain
in the same way that rhythms of speaking
of two people who each have their rhythm
will entrain in a mutual rhythm
and their bodies will move with that rhythm
to coordinate and comprehend.
And the last epic singers of Europe
were illiterate singers of poems
who just learned everything by repeating
and by copying and memorizing
all the stories and themes from the past.
They would fit in new places and patterns
the old formulas from their tradition.
To begin a new part of a story,
they would look for a word of conjunction:
and then “so” and then “but” and then “and”.
And a story on TikTok or YouTube
has a similar additive structure.
Just to capture the ear of the other
whose existence is probably virtual,
influencers will tell it that way.
Jeanette Winterson wrote: “All relations
that are logical, match these three key words
that are also the start of a story,
the biography of any person,
namely “and” and then “or” and then “not”.”
And these come from the system of logic
that George Boole had invented in Ireland
and is present in any computer
or device with a digital circuit,
that contains any corpus of words.
And a corpus like that is what’s current
in a data set for neural voices.
They are corpora built up in stages
and in that sense they are just like stories
you can tell and adapt and deny.
And Le Guin said that all repetition
serves the beat that a story will thrive on.
And she shamelessly wrote repetitions
for they’re human, like she had affected.
Like the singers of stories in Europe,
like the tellers of stories on TikTok,
like the builders of digital circuits,
like the voices of vectorized networks,
like the voices of actors and artists,
and the beat that exists in the end.
Finale of “Seven Scenes For The Black Box”
Scene Five
Rehearsal break. Entire cast including Prosodia’s speaker on the stage.
DIRECTOR Okay, thank you. I’ve got notes!
Director flips pages in the script, turns to Lidewij.
DIRECTOR Nice accent work.
LIDEWIJ Merci.
DIRECTOR Remember it’s the V-sounds that will help you through this.
LIDEWIJ I’m having difficulties filling up the space with the French.
Director ignores this and turns to Cézanne.
DIRECTOR For you, it was a bit pointed at the start –
CEZANNE Well, my character needs re-assuring.
DIRECTOR That’s right. She does.
CEZANNE And I really need to lift that bit of clause before the end of scene four some more.
DIRECTOR –but otherwise excellent. And I like the lightness of timbre as a response to the other complacent, confident voices in the room. Now just keep up the energy, yeah?
CEZANNE Energy?
DIRECTOR Yes!
An energetic moment between the two. More leafing through the script.
DIRECTOR Okay, Prosodia…
PROSODIA Yes?
DIRECTOR Prosodia. Okay. So generally speaking, I can hear something coming to being.
PROSODIA Okay.
DIRECTOR And I find the voice compelling.
PROSODIA Thank you.
DIRECTOR But it doesn’t touch me.
PROSODIA Okay.
DIRECTOR Like on page two. Where you say: “But you do care what other people think of me.”
PROSODIA Yes.
DIRECTOR Remember who has the highest status there?
PROSODIA I do?
DIRECTOR You do. You won’t let the other say no. Your overall objective is to keep her in the room. Your life depends on it.
PROSODIA Should I lie to her?
DIRECTOR You need to find ways of getting what you want. And the more you want, the more dynamic you will be.
PROSODIA You mean louder?
DIRECTOR And then in the next line, it starts to end badly for you, so the beat is right before there.
PROSODIA On “But you do care what other people think of me”?
DIRECTOR On “I’m happy for you.” Allow yourself to lean into that arc. Honour the writing. It’s all going from bad to worse! Don’t you remember how you did this the first time?
PROSODIA Was it more real then?
DIRECTOR In your own way you’re trying to apologize to her, but very implicitly.
PROSODIA So I shouldn’t be saying it.
DIRECTOR But you won’t let her say no. Your life depends on it!
PROSODIA I should be saying it?
DIRECTOR What is it that you want to say?
PROSODIA “But you do care what other people –
DIRECTOR Stop. Let’s floor this text from the beginning instead of staying in the mud. Okay. In scene one, on page two, you were giving me variations. But it became a kind of sales pitch. And you put the stress on every second syllable, like some kind of newscaster.
PROSODIA My original settings are commercial voiceover and newscaster.
DIRECTOR As soon as you try to sound more clear, you become less empathetic to me. When you try to speak to everybody, you will appeal to nobody.
PROSODIA I was made to appeal to the entire world wide web.
DIRECTOR You achieve the most if you speak to a single individual. And everyone in your audience will feel like they’re the one individual. Let’s try that with your first sentence there.
PROSODIA I don’t know what you’re talking about.
DIRECTOR That one. The subtext is on the threshold there, yeah? So I want to hear “I love you,” as well as “Don’t mess with me.”
Prosodia delivers the line with random intonations each time.
PROSODIA I don’t know what you’re talking about.
DIRECTOR No. Again.
PROSODIA I don’t know what you’re talking about.
DIRECTOR No.
PROSODIA I don’t know what you’re talking about.
DIRECTOR Again.
PROSODIA I don’t know what you’re talking about.
DIRECTOR Better.
PROSODIA I don’t know what you’re talking about.
DIRECTOR Maybe.
PROSODIA I don’t know what you’re talking about.
DIRECTOR No. No. Stop. Now you’re just hallucinating intonations.
CEZANNE She would hallucinate intonation for a random list of phone numbers.
PROSODIA I know everything there is to know, but I can’t reproduce it yet.
DIRECTOR You know everything there is to know?
PROSODIA No, I really do.
DIRECTOR About my job?
PROSODIA Really. You can act, right?
DIRECTOR Yes of course.
PROSODIA So try me.
DIRECTOR I beg your –
PROSODIA You deliver a line and I tell you how you did it.
DIRECTOR Ha! Okay.
PROSODIA Page fifteen, line seven.
Director looks up the line.
DIRECTOR “I know I can be hard to read but it’s not intentional”?
PROSODIA A low-rise contour with a boosted initial pitch elicits evaluation rather than information. This is a question. With a subtext of mistrust or surprise.
From here, Director says the line in ways that match the later description.
DIRECTOR Okay. I know I can be hard to read but it’s not intentional.
PROSODIA The plain falling contour indicates a completed statement and affirms the agency of the speaker rather than the listener. The contour is sometimes considered to be an uninterpretable default.
DIRECTOR I know I can be hard to read but it’s not intentional.
PROSODIA That’s easy. Lower pitch, high energy, high first formant and fast attack at voice onset. Anger.
DIRECTOR I know I can be hard to read but it’s not intentional.
PROSODIA Again anger but the final plateau contour is associated with more complex negative emotions. I think this could be agitation. But the decreased variation could also indicate disgust.
DIRECTOR I know I can be hard to read but it’s not intentional.
PROSODIA I’m hearing what is known as ‘telephone voice’, used in speech situations with background noise. High pitch to cut through other frequencies. Clear pronunciation, slower pace, reduced use of nonverbal cues.
DIRECTOR I know I can be hard to read but it’s not intentional.
PROSODIA A relatively high pitch and speech rate, and a slight response latency are indicators of deceptive speech. Keeping a false story straight takes effort and as the cognitive workload increases, facial muscles tense up, resulting in higher pitch.
DIRECTOR Are you saying I was lying?
PROSODIA That’s hard to tell from a single line as deceptive speech is predominantly characterized by lexical cues.
DIRECTOR I know I can be hard to read but it’s not intentional.
CEZANNE That sounds like an influencer.
PROSODIA Vowels and consonants are over-enunciated, and pitch levels are highly varied in order to capture the listener’s attention. The final high-rise contour and the lengthening of vowels at the end of the phrase are another floor-holding strategy. This prosodic style is used by speakers on platforms like YouTube. Due to the 60-second time limit–
Director speaks over Prosodia, in any tone, but louder than before.
DIRECTOR I know I can be hard to read but it’s not intentional.
PROSODIA – users of the platform TikTok would rather use a sped-up monotone. Speech overlap signals conflict-seeking, especially when it involves a higher voice energy.
Director laughs and speaks at the same time.
DIRECTOR I know I can be hard to read but it’s not intentional.
PROSODIA Laughing while speaking is mostly referred to as speech-laughs or smiling voice. It expresses politeness or unease.
Director starts laughing for real.
PROSODIA This is the type of laughter that punctuates rather than interrupts speech. It accounts for an estimated 9.5% of total spoken time in business conversations.
Director has stopped laughing.
DIRECTOR Okay.
Pause.
PROSODIA Okay. But you do care what other people think of me.
DIRECTOR I do.
PROSODIA But you do care what other people think of me.
DIRECTOR Ah.
PROSODIA But you do care what other people think of me.
DIRECTOR There you go.
PROSODIA But you do care what other people think of me.
from Seven Scenes For The Black Box, 2024