Rapid advances in generative audio AI have produced powerful new tools for creating audio content for a broad audience. Popular audio generation models such as Suno and Udio enable users to generate high-quality music from a text prompt. This makes them accessible and easy to use for the average person, but for audio professionals such as musicians and sound designers they lack expressivity and adjustability, because their interface is limited to text prompts and the audio is not generated in real time. Novel approaches to AI audio generation such as IRCAM's RAVE address these problems by generating audio in real time, removing the need for text prompting, and offering direct, precise control of the generation through the model's latent variables. This raises the question of how an interface for a real-time generative audio model without text prompting could be designed, and whether this approach holds potential for new sound design and music tools.
This thesis explores one of many possible ways of designing an interface for a real-time generative audio model by giving the user a variety of interaction methods for experimentally exploring RAVE models. These interaction methods range from simple sliders to 3D physical models, each exploring a distinct way to interact with the model and thereby generate audio content. The interface was evaluated in an expert review to gather feedback on its perception and usability, as well as on the interaction methods used. Based on this expert review, the interface was extended with new interaction methods. The thesis concludes with an outlook on future generative audio models and how audio professionals might incorporate them into their workflows.