Rapid advances in generative audio AI have produced powerful new tools for creating audio content for a broad audience. Popular audio generation models such as Suno and Udio enable users to generate high-quality music from a text prompt. This makes them accessible and easy to use for laypeople, but for audio professionals such as musicians and sound designers, the text-prompt interface and non-real-time generation limit expressivity and adjustability. Novel approaches such as IRCAM's RAVE address these problems: they generate audio in real time, remove the need for text prompting, and offer direct, precise control over the generation through the model's latent variables. This raises the question of how an interface for a real-time generative audio model without text prompting could be designed, and whether this approach offers potential for new sound design and music tools.
This thesis explores one possible way of designing such an interface by giving users methods to experimentally explore RAVE models through a variety of interaction methods. These range from simple sliders to 3D physical models, with each method offering a distinct way to interact with the model and thereby generate audio content. The interface was evaluated in an expert review to gather feedback on its perception and usability as well as on the interaction methods used; based on this feedback, the interface was extended with new interaction methods. The thesis concludes with an outlook on future generative audio models and on how audio professionals might incorporate them into their workflows.
In summary, this master's thesis explores a possible approach to the design of an interface for generative audio models by providing users with multiple methods of interaction, each with distinct qualities that contribute to the exploration of RAVE models.