P.A.G.U.R.I.: Prompt Audio Generator User Research Investigation

The widespread use of artificial intelligence tools has simplified and enhanced many human activities, including creative processes in music. In particular, text-to-music models, which transform textual descriptions into complex musical compositions, are becoming increasingly accessible and allow users to create highly specific sounds tailored to their personal needs. However, the adoption of such tools remains limited, owing to the lack of clear examples of what users need and of the requirements a tool must meet to satisfy those needs in music creation. PAGURI (Prompt Audio Generator User Research Investigation) analyzes user behavior in the context of music and artificial intelligence tools for audio generation, with a particular focus on text-to-music. This study outlines the motivations behind the research, presents the tools used for the investigation, and describes the experiment conducted, in which a sample of participants used a text-to-music tool to generate audio from textual inputs and to create personalized models from their own music. Finally, it presents the results on the interaction between users and the text-to-music model, along with the related comments and suggestions on how and where such music generation tools can be employed to their fullest potential.
