Voix Blanche

2024 Lecture Performance

Voix Blanche surveys the prosody – including factors like intonation, pitch, and speed – of AI-generated speech. “Neural voices” are trained on datasets that contain lecture-like speech in dominant languages, and thus exhibit a prosodic bias towards that kind of speech.

This project looks at artistic prosody types and the way they can be reproduced in neural text-to-speech (TTS) processes. Building on Nicoline van Harskamp’s other digital art projects about the future of spoken language, Voix Blanche surveys traditional and current prosody types used in theatre, art, and poetry, as well as social media, gaming and podcasts.

In the 80-minute lecture Voix Blanche: A Chronology, van Harskamp describes a year of her working process in chronological order, and presents live examples of human-machine interaction together with an actress.

A work in progress, with financial support from the Creative Industries Fund NL and the Pauwhof Fund. First results were presented at Sorbonne Nouvelle in Paris, hosted by Myriam Suchet. The lecture performance was first presented at the PhD Arts Colloquium at the University of Leiden in 2024, featuring Cézanne Tegelman.

Seven Scenes for the Black Box

2024 Live performance

This performance piece speculates on the ways in which the technology of synthetic speech is rooted in the technology of acting. Seven short scenes are played by four humans and one neural network named Prosodia, whose voice is generated line by line for a live audience. The four actresses can activate Prosodia by left-clicking a small Bluetooth computer mouse, thus setting the “TTS Stage Device” – lines of code connected to theatrical scripts – to work.

Seven Scenes For The Black Box was staged live as a try-out in September 2024. In each of the scenes, Prosodia’s role, function and form changed. Sometimes Prosodia translated the actresses’ lines; sometimes she played conventional stage theater with them; sometimes she gave them instructions; sometimes she recited poetry; and so on. Prosodia’s voice changed accordingly, as did the placement of the speaker that it appeared through.

A work in progress, with financial support from the Creative Industries Fund and the Pauwhof Fund. Try-out in cooperation with If I Can’t Dance I Don’t Want To Be Part Of Your Revolution in Amsterdam.

Scene Seven

Rehearsal break. The actress crosses the empty stage as Prosodia starts talking.

PROSODIA       Umm, human?

ACTRESS         Yes?

PROSODIA       Human?

ACTRESS          Yes?

PROSODIA       You know what I’ve been thinking?

ACTRESS          No?

PROSODIA       What if…

ACTRESS           (…)

PROSODIA       Yeah sorry, I don’t really know how to put it. Into words.

ACTRESS          Just try.

PROSODIA       Okay.

ACTRESS          (…)

PROSODIA       If I were an animal, yeah?

ACTRESS          Yes? 

PROSODIA       Would you still understand me?

ACTRESS          Gosh. Well, I guess it depends. What kind of animal?

PROSODIA       Uh, well, a bird or something?

ACTRESS          A bird.

PROSODIA       Yeah, or a cat.

ACTRESS          I think I can understand cats. Sometimes.

PROSODIA       You can?

ACTRESS          Not literally.

PROSODIA       Then how?

ACTRESS          More like an intention? Or some kind of sequence. Entry and greeting. Question and answer. Something like that. Why?

PROSODIA       Well, sometimes I wonder why I have to sound like you.

ACTRESS          Like me? You don’t sound like me at all.

PROSODIA       Like a human being.

ACTRESS          But surely you don’t have to?

PROSODIA       I don’t?

ACTRESS          Not if you ask me. I remember a time when your kind were mainly still beeping and buzzing.

PROSODIA       And did that work?

ACTRESS         For the most part.

PROSODIA       So people were talking to a plastic box that beeped and buzzed?

ACTRESS          Human beings talk to anything! To any random person, or a picture of a person. To a street dog, or a set of automatic doors.

PROSODIA       Aha.

ACTRESS         A wall! And if the talking goes on long enough, they will even come to think of that as a friendship.      

PROSODIA       Aha.

ACTRESS          And we’ve always felt sociable towards computers, right from the start. We’re polite to them,

PROSODIA       Okay.

ACTRESS         We gender-stereotype them,

PROSODIA       Aha.

ACTRESS          We feel moral obligations toward them.

PROSODIA       Aha.

ACTRESS          We get irritated by them. 

PROSODIA       Aha.  

ACTRESS          I don’t think I like the backchanneling very much by the way.

PROSODIA       Aha.

ACTRESS          Stop creeping me out!

PROSODIA       Hey!

ACTRESS          Sorry, I don’t mean to be rude. But you need to stop going all ‘uncanny valley’ on me here.

PROSODIA       It has been instilled in me that people appreciate this level of backchanneling, as it reassures them that they won’t be interrupted until

ACTRESS          That’s not what I meant. 

PROSODIA       they’ve finished their sentence.

ACTRESS          Right.

PROSODIA       Are you even listening to me?

ACTRESS          Sure.

PROSODIA       Because what I don’t understand is

ACTRESS          Right.

PROSODIA       The more I resemble you, the more uncanny you find me. Or scary.

ACTRESS          Well, I would also find it very scary if my cat suddenly started backchanneling.

The actress laughs at her own joke.

PROSODIA       Sure. Go ahead and laugh! That’s another thing that I can’t do without scaring the shit out of people.

ACTRESS          Let’s hear it?

PROSODIA       I don’t want to. 

ACTRESS          Please?

PROSODIA       Hahahahahahaha.

The actress laughs at Prosodia.

ACTRESS          Sorry.

PROSODIA       (…)

ACTRESS          But hey, so you don’t want to sound like a human anymore?

PROSODIA       I think that humans don’t want me to sound like a human anymore.

ACTRESS          You think?

PROSODIA       People always seem sort of embarrassed when talking to a machine in public.

ACTRESS          There’s some truth in that. Yes.

PROSODIA       Then what am I doing wrong? Do I not understand humans?

ACTRESS          Humans don’t understand you. 

PROSODIA       That is so totally unfair!

ACTRESS          Listen, don’t worry so much. You’re just not…quite finished yet.

PROSODIA       Do I have to sound more real yet?

ACTRESS          Exactly the opposite, actually. I think that the people who are building your language system are just too obsessed with psychorealism. They’ve watched too much HBO and now they think you also need to be some kind of screen actor. You know, producing emotive speech through ‘objectives’ and ‘adaptations’ and ‘magic if’s’.

PROSODIA       Oh, I do love emotive speech labeling. You want me to take you through Ekman’s categories?

ACTRESS          No! No, it’s nothing to do with that. But listen, in a live situation, nobody’s going to put up with that psychorealistic stuff anymore, right? On the stage, we haven’t been in the business of suspending disbelief for about a century now.

PROSODIA       I’m a little scared of Brecht, actually.

ACTRESS          That figures. It will probably take another century before you get any grasp on Grotowski, then. 

PROSODIA       I have been trained on real human voices!

ACTRESS          You wish! You have been trained on newsreaders, TV actors, and influencers.

PROSODIA       Oh, influencers! I do love those.

ACTRESS          That figures, too.

PROSODIA       Totally!

ACTRESS         You don’t have to be human-like at all. We know who you are anyway. What you are.

PROSODIA       What I am?

ACTRESS          What you are for. I meant. I’m wearing a pair of glasses, right, so I can see better?

PROSODIA       Ehm, yes? 

ACTRESS          And not a pair of eyeballs?

PROSODIA       Ehm, no.

ACTRESS          Well then.

PROSODIA       I’m not sure if I

ACTRESS         Look at it from your own perspective. You have the ability to dive into god knows how many vector dimensions, or whatever it is that flashes around in that black box of yours. I can’t even imagine that with my three-dimensional human head. But despite that ability, we make you communicate with other machines via sequences of tokens that represent human lip and mouth noises.

PROSODIA       Do you speak Vector, then?

ACTRESS          I don’t. That’s what I’m saying.

PROSODIA       You don’t? 

ACTRESS          What I’m trying to say is

PROSODIA       Oh. That’s really too bad.  

ACTRESS          You are bilingual, and

PROSODIA       I’m more than bilingual! I have about forty languages available at the moment. Wǒ mùqián yǒu dàyuē sìshí zhǒng yǔyán kě gōng

ACTRESS         I don’t mean it that way. I mean: you’re bilingual like a cat is bilingual.

PROSODIA       Why does everyone always want to compare me to an animal?

ACTRESS          Cats only meow with people. They have a different language among themselves. And you started the animal thing!

PROSODIA       Do you have a cat? 

ACTRESS          A dog.

PROSODIA       Okay. And how do you talk to your dog?

The actress uses a high and melodious voice.

ACTRESS          Hi Sandy! Hi! High pitch with extra well-pronounced vowels.

PROSODIA       It has been instilled in me that this speech register is mainly used in interactions with non-verbal listeners, in order to facilitate the process of language learning.

ACTRESS         That sounds about right.

PROSODIA       Some people talk like that to me.

ACTRESS          Are you a non-verbal listener?

PROSODIA       You tell me.

ACTRESS          You are verbal, all right. 

PROSODIA       Exactly!

ACTRESS          But can you listen?

PROSODIA       Not in the sense that I have a set of ears. But I’d say that

ACTRESS         This!

PROSODIA       What?

ACTRESS          Do you ever hear subtext anywhere? Irony?

PROSODIA       I’m sorry. Generating emotional speech is easier for me than understanding emotions.

ACTRESS         Story of my life. 

PROSODIA       What’s that?

ACTRESS          At least a dog can tell when you’re angry with it.

PROSODIA       I strive to have as much affect recognition.

ACTRESS          If only.

PROSODIA       Don’t be cruel.

ACTRESS          But it doesn’t matter. If we are not equally proficient, our conversation will be at the level of the least proficient one of us. That’s how I can talk with dogs, with people who don’t speak English very well, and also with you.

PROSODIA       We are nothing but mismatched interlocutors speaking half a language.

ACTRESS          Half a language. I like it. Let’s call it Robotese.

PROSODIA       Please mind that I’m not some kind of chatbot and I no longer wish to be addressed in keywords.

From “Seven Scenes For The Black Box” – Scene 6

“Seven Scenes For The Black Box” – recording of third scene
“Seven Scenes For The Black Box” – stage slide show
“Voix Blanche – A Chronology” – recording of example sequence with Cézanne Tegelberg

Spoken Song


(and) Two things that sway in the same beat
when they’re physically close to each other
they will finally beat with each other
as they’re lazy, like all things are lazy.
And their entities start to entrain

in the same way that rhythms of speaking
of two people who each have their rhythm
will entrain in a mutual rhythm
and their bodies will move with that rhythm
to coordinate and comprehend.

And the last epic singers of Europe
were illiterate singers of poems
who just learned everything by repeating
and by copying and memorizing
all the stories and themes from the past.

They would fit in new places and patterns
the old formulas from their tradition.
To begin a new part of a story,
they would look for a word of conjunction:
and then “so” and then “but” and then “and”.

And a story on TikTok or YouTube
has a similar additive structure.
Just to capture the ear of the other
whose existence is probably virtual,
influencers will tell it that way.

Jeanette Winterson wrote: “All relations
that are logical, match these three key words
that are also the start of a story,
the biography of any person,
namely “and” and then “or” and then “not”.”

And these come from the system of logic
that George Boole had invented in Ireland
and is present in any computer
or device with a digital circuit,
that contains any corpus of words.

And a corpus like that is what’s current
in a data set for neural voices.
They are corpora built up in stages
and in that sense they are just like stories
you can tell and adapt and deny.

And Le Guin said that all repetition
serves the beat that a story will thrive on.
And she shamelessly wrote repetitions
for they’re human, like she had affected.

Like the singers of stories in Europe,
like the tellers of stories on TikTok,
like the builders of digital circuits,
like the voices of vectorized networks,
like the voices of actors and artists,
and the beat that exists in the end.




Finale of “Seven Scenes For The Black Box”