The thesis investigates Aalto University's current practices for captioning and subtitling video courses. The results show that the current workflow is riddled with problems that hinder the usability of the software, aggravated by automated tools that produce low-quality results. These problems cause a widespread scarcity of captioned media on the platform, crippling the accessibility of educational content at Aalto University. To resolve these deficiencies, the thesis proposes an alternative user interface with improved usability, implementing UI features that facilitate subtitle editing. Moreover, the solution implements an error-checking and suggestion system powered by an LLM to further improve users' efficiency during the manual correction of captions and subtitles. A more efficient and powerful transcription service based on the open-source WhisperX is also proposed. Subsequent evaluations show that the proposed solution is competitive with the current workflow adopted at Aalto. The automatic English captions have fewer errors, and those errors are readily highlighted by the suggestion system. The translated subtitles sound natural and contain only a few remaining mistakes, which can be easily caught by the LLM hints. In conclusion, this thesis sheds light on the shortcomings of the current workflow. It proposes a solution with the prospect that a full or partial implementation of the ideas suggested in this work will help make the educational experience at Aalto University more accessible.
The thesis investigates Aalto University's current practices for subtitling the video courses provided by the university. The results show that the current workflow is beset by problems that hinder the usability of the software, aggravated by automated tools that produce low-quality results. These problems cause a shortage of subtitled multimedia content on the university's educational platforms, reducing the accessibility of its educational content. To address these issues, the thesis proposes an alternative user interface with improved usability, implementing features that facilitate subtitle editing. In addition, the solution implements an LLM-based error review system to further improve users' efficiency during the manual correction of subtitles and their translations. A more efficient and powerful transcription service using WhisperX is also proposed. Subsequent tests show that the proposed solution is competitive with the current workflow adopted at Aalto. The results show that the automatically generated English subtitles contain fewer errors and that these errors are promptly highlighted by the review systems. The translated subtitles are fluent and natural and contain few errors, which are easily identified by the subsequent LLM-based review. In conclusion, the thesis sheds light on the problems of the current organizational workflow and proposes a possible solution, with the prospect that a full or partial implementation of the ideas suggested in this work will help make Aalto University more accessible and inclusive.
LLM-powered workflow for captioning and subtitling of video courses at Aalto University
BORDONARO, GUIDO
2023/2024
File | Size | Format
---|---|---
2024_04_Bordonaro.pdf (not accessible) | 2.02 MB | Adobe PDF
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/234672