The first step towards the automatic generation of software tests: automatic test quality labeling

Creating new things by analyzing and mixing the features of existing ones is the most promising future development of Deep Learning. In the literature, good results have been reached with images and some attempts have been made with text; it is now time to transfer this knowledge to more exciting areas. This work explores the software testing field with a data science approach, to prepare for a future age of self-writing software tests. The power of generative networks from deep learning lies in their ability to generate any kind of item. This thesis contributes to analyzing the possibility of generating software tests as content. Software testing is particularly interesting because it is one of the most important steps in the software development lifecycle. Thousands of tests are executed every day with the aim of verifying and assuring the functionality of software products. Software testing requires a lot of effort and resources to maintain. Automatic test generation will lead to an overall improvement in the quality of the software product and will support the Quality Assurance engineers who are involved in software testing. To reach the goal of automatic test writing, intermediate steps are required. As the objective is to improve software quality, it is crucial to write tests of good quality. Labeling tests with information about their quality is the first intermediate step. Evaluating test quality is not a trivial task: software testing experts are very far from a consensus on a clear definition. We present here a novel approach to objectively define the features that characterize the quality of tests. By analyzing the answers obtained by interviewing the experts, we formalized the properties related to test quality. Resilience and reliability are the two main analyses that must be carried out in order to label a test with information about its quality. 
Both analyses are to be done a posteriori, using the data coming from the execution outcomes of the tests and the defect information that may exist for each execution. A data mining approach is provided by structuring a seven-step pipeline for the extraction of resilience features. From the data sets of test execution outcomes and defect data, the pipeline transforms entries into vectors, which are easy-to-use structures for distance computation. Clusters of similar vectors are built to group tests according to their behavior. Deeper, specific analyses are conducted on the defect reasons to properly link the problem to the failure. Experts then review the content of the clusters to validate it and label each cluster uniquely. The reliability analysis has been formulated and explored with a simple case study. The idea of reliability is similar to prediction accuracy and gives a score to each test. By combining the two pieces of information, a global label for the quality of every single test can finally be set. This step is only formalized, and some directions are given for future work. The thesis contains a methodology chapter that helps the reader to repeat the analyses. Some results are given together with the solutions to the problems encountered; this adaptation to a real case was carried out inside Amadeus, which provided the data about the tests. The thesis represents the first step of a completely new line of research that has no references in the literature.
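
As a rough illustration of the vectorization and clustering idea, the following Python sketch encodes execution outcomes as numbers, groups them into one vector per test and day, and clusters the vectors by Euclidean distance. It is only a minimal sketch under stated assumptions: the status encoding, the padding rule, the distance measure, the threshold, and the simple leader clustering are all hypothetical choices made for this example, not the seven steps of the pipeline described in the thesis.

```python
from collections import defaultdict
from math import dist

# Hypothetical numeric encoding of execution outcomes (an assumption).
STATUS_CODE = {"passed": 0.0, "failed": 1.0}


def build_vectors(executions, length):
    """Group outcomes by (test, day) and encode them as fixed-length vectors."""
    grouped = defaultdict(list)
    for test_id, day, status in executions:
        grouped[(test_id, day)].append(STATUS_CODE[status])
    # Assumption: pad short runs with the last observed value and truncate
    # long ones, so that every vector has the same length.
    return {
        key: (values + [values[-1]] * length)[:length]
        for key, values in grouped.items()
    }


def leader_clustering(vectors, threshold):
    """Put each vector in the first cluster whose leader is within threshold.

    Leader clustering is a stand-in chosen for brevity, not the clustering
    algorithm used in the thesis.
    """
    leaders, clusters = [], []
    for key, vec in vectors.items():
        for i, leader in enumerate(leaders):
            if dist(vec, leader) <= threshold:
                clusters[i].append(key)
                break
        else:
            leaders.append(vec)
            clusters.append([key])
    return clusters


# Made-up sample data: (test id, day, final status).
sample = [
    ("t1", "d1", "passed"), ("t1", "d1", "passed"), ("t1", "d1", "passed"),
    ("t2", "d1", "failed"), ("t2", "d1", "failed"), ("t2", "d1", "failed"),
    ("t3", "d1", "passed"), ("t3", "d1", "passed"), ("t3", "d1", "failed"),
]
sample_vectors = build_vectors(sample, length=3)
sample_clusters = leader_clustering(sample_vectors, threshold=1.2)
```

With this made-up data, the always-passing test and the test that fails once fall into the same cluster, while the always-failing test forms its own. In a real setting the clusters would still need the expert validation and labeling described above.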

The most promising capability for the future development of Deep Neural Networks is the generation of new content. Today the scientific literature offers many examples of image generation, a field in which exciting results have been achieved. Some attempts have been made with other types of content, such as music and text. It is now time to apply this generation technique to more interesting content. The power of generative networks derived from Deep Learning lies in their ability to generate any type of content; however, a very large amount of data is needed for the network to achieve excellent results. The research presented in this thesis explores the field of software testing with a data science approach. The goal is to prepare for the automatic writing of software tests by treating them as content. Software testing is one of the most important steps in the software development lifecycle. Thousands of tests are executed every day with the aim of checking and assuring the functionality of software products. These tests require a lot of effort and resources to maintain. Automatic test generation will bring an overall improvement in the quality of the software product and support for Quality Assurance engineers. Moreover, human resources previously employed in writing tests may progressively shift to conceptually more advanced tasks. To reach the goal of automatic test writing, intermediate steps are needed. Since the objective is to improve the stability of the software, it is essential to write tests of good quality. However, information about test quality is not present in the data that companies store today. The first step was therefore to label the tests with information about their quality. 
Evaluating test quality is not a trivial task: software testing experts are very far from a consensus on a clear and shared definition. By analyzing the answers obtained from interviews with the experts, the properties that characterize test quality were objectively defined: resilience and reliability. We present here a new approach that follows the criteria needed to extract these characteristics. Two analyses are proposed, to be conducted a posteriori, that is, using the data coming from the outcomes of test executions and the information on defects (real problems that were found). These two data sets are analyzed in parallel and then joined in order to identify the correspondence that may exist between the execution result and the problem found. The resilience characteristics are extracted using a data mining approach, which made it possible to structure a pipeline of seven steps. The pipeline transforms the final status of every execution of each test into a numerical value; the individual values are aggregated into vectors that represent the executions of the same test launched on the same day. Clustering algorithms are used to group the vectors that show similar behavior in the final execution status. Deeper, more specific analyses are then conducted to identify the reason for the defects, so as to correctly link the problem to the failure. Experts are involved to validate the content of the clusters and label them uniquely. The reliability analysis has only been formulated and tested with a simple case study. It involves checking the consistency between failures and the real problems found. By combining the two pieces of information, a global quality label can finally be assigned to every single test. 
This step is only formalized, and some indications are given for future developments. The thesis contains a methodology chapter that helps the reader to repeat the analyses. Some results are provided together with the solutions to the problems encountered. The adaptation to the real case was carried out at Amadeus, a company that has the necessary availability of data and an interest in conducting the study.
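
The reliability idea, a per-test score similar to prediction accuracy that checks the consistency between failures and real problems, can be illustrated with a small Python sketch. The formula is an assumption made for this example: it treats an execution as consistent when the test failed and a defect was found, or when it passed and no defect was found. The abstract does not give the thesis's actual definition, and the function name and input format are hypothetical.

```python
def reliability(executions):
    """Fraction of executions whose outcome agrees with the defect data.

    Each execution is a (failed, defect_found) pair of booleans.
    Assumption: "consistent" means failed == defect_found, which mirrors
    the usual definition of accuracy for a binary predictor.
    """
    if not executions:
        raise ValueError("at least one execution is required")
    consistent = sum(
        1 for failed, defect_found in executions if failed == defect_found
    )
    return consistent / len(executions)


# Made-up history for one test: one failure without a matching defect.
history = [(True, True), (False, False), (True, False), (False, False)]
score = reliability(history)
```

Under this assumption, a test that often fails without a corresponding defect, or passes while a defect exists, receives a low score. How the score is combined with the resilience label is left open, as in the text.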