Experimental musician Holly Herndon created an AI voice clone that anyone can use


This musician created an AI clone of her voice so anyone can sing like her

Experimental musician Holly Herndon says this technology isn’t here to replace artists – and that the future of creativity belongs to collective intelligence

Holly Herndon at the Serpentine North Gallery in London in October 2024.

Matthew Chattel/Future Publishing via Getty Images

Holly Herndon hears the future of music in data. Herndon came to electronic music after singing in churches and choirs in East Tennessee. She received a master’s degree from Mills College and a doctorate from Stanford University’s Center for Computer Research in Music and Acoustics.

When she started experimenting with machine learning in 2015, the outputs seemed “bad,” but she remembers seeing “diamonds in the rough.” Today those experiments have evolved into custom models that let anyone sing in a voice that sounds exactly like hers.

Scientific American spoke to Herndon about training her AI models and her belief that creativity has always been collective, and that AI simply makes this visible.


(An edited transcript of the interview follows.)

You describe your work as “protocol art.” What does that mean?

In the 20th century, the site of media creation – the paper and pen where music was written – was the artistic work. With protocol art, the creative work occurs upstream of media generation. It is about creating the terms and conditions under which art is made.

We are really interested in training our own models. I always say “we” because I work with my partner, Mat Dryhurst. We consider each step of the model-building process a moment for creative intervention. Creating datasets is part of the artwork. I often write music for training: music meant not necessarily for the human ear but for the computer to learn something from.

Can you give me an example of what this looks like in practice?

Right now we have an exhibition in Berlin. We were inspired by the medieval composer Hildegard von Bingen. We wanted to pretend that polyphony existed when she was alive. We started with a model of her compositions and added rule sets so that it could generate polyphony in her style. We took those outputs, rearranged them and gave them to human singers to interpret. Then we created a giant installation where the artists sing and invite the public to train with us.

It’s not about prompting “Write me a pop song with guitar.” It’s about using this technology to bring humans together to create art in real space.

Most commercial AI models are trained on data extracted from the Internet. Why do you insist on creating your own models?

As an electronic musician, I’ve never been one to sample – I always created my own sonic palette. When we started, before Suno and before all this, we had to create our own dataset. It felt absolutely natural, like creating your own samples or digital tools.

One criticism of products like Suno is that they sound very “middle”: trained on everything, they land on the average. My models sound unique because I am creating the training data myself. I also think Suno has a motivation under the hood that limits it to three-minute songs with a verse-chorus structure. There are guardrails that make it boring. I would love for them to get rid of some of those obstacles.

Has any model ever surprised you?

We did a project around 2021 called Holly+, which is a clone of my own voice. We worked with Voctro Labs to train a voice model that works in real time so people can sing using my voice. That was game-changing.

Because it works in real time, people can take on another person’s vocal identity as they sing. While we were testing it, my partner, who is British, was singing into it. I heard my voice with a British accent. It was so surreal, I had to leave the room – he was singing as me. It was one of the most mind-bending moments for me, showing how weird and wonderful this thing could be.

I think it will take five to 10 years to become seamless. But once we’re body morphing in real time – imagine you can make a model of a whale’s voice, then create a hybrid soprano whale. When you sing high, it becomes operatic; when you sing low, you’re more whale or Barry White. We are no longer tied to our larynx.

Where do you think we’ll be in 10 years?

A lot of the fears surrounding this technology are really fears about how the current internet works – the attention economy, how hard it is to be a creator. My partner always says, “Scrolling is for bots, and strolling is for humans.”

Our more optimistic vision is using agents to deal with all the nonsense and filter through the stuff, actually bringing us together in the real world. That’s why our projects involve meeting people IRL and working together. Some of my smartest developer friends are vibe coding with many agents while cooking or taking a walk with their kid. Things can be really beautiful if we imagine and create that way.

Does this technology change your definition of creativity?

This whole AI thing can force us to see that we are perhaps not the only creative actors in the universe. That doesn’t have to be scary – it can be beautiful and liberating.

Creativity happens in groups, in community. AI is just collective intelligence – aggregated human intelligence. The 20th-century art model is associated with an individual genius who touches an object and imbues it with value. That model is being turned on its head. I am the collective intelligence of all the teams.
