A flexible and context-independent software architecture for constructing, composing and validating complex networks

Complex networks have proved to be a powerful modelling tool for analysing systems whose relations are difficult to represent by means of more traditional techniques, such as mathematical equations. They have already been used in many fields, ranging from economics and sociology to computer science and bioinformatics. The research work carried out for this Master’s thesis pertains to the first step of a typical network analysis workflow, namely the network-building task. This is a fundamental part of the whole workflow because, in general, analysis results depend heavily on the structure of the model to which the analysis is applied. This thesis therefore aims to design and implement a generic and reusable software infrastructure that can perform and automate the task of network construction, a laborious and tedious step that often lengthens the setup time of experiments based on complex network analysis. The software architecture is conceived not only to work with data organized in different ways but also to orchestrate a composition workflow that can be used to integrate multiple networks into a single one. This thesis is structured as follows. Chapter 1 gives an introduction to the biomedical context, providing the reader with information about the analysed data types related to this field. Chapter 2, 'Theory of Complex Networks', presents the basic theoretical notions necessary to fully understand the network concepts used throughout the research. It also provides the reader with examples of analyses applied to real systems, mainly focused on the application of complex network theory results to the biomolecular domain, in order to give a hint of the potential of modelling systems by means of networks. The objectives of the work are defined in Chapter 3, while Chapter 4, 'Algorithms and Software Libraries', presents the materials and methods adopted to build the software architecture.
Chapter 5 constitutes the heart of the thesis: it deals with the whole software architecture, its motivations and the associated issues. Chapter 6 and Chapter 7 both discuss the validation of the project's results. The former reports statistics, mostly about the response time and memory consumption of use cases run on the platform. The latter, 'Testing and Validation of composition workflow', uses biomolecular use cases to validate the network composition workflow, which was implemented with the architecture. Chapter 8 outlines the conclusions of the project, while Chapter 9 deals with potential software developments and applications in which the framework architecture could be exploited. The document closes with Appendix A, which includes the technical documents developed for the software architecture.

Complex networks are a powerful modelling tool for analysing systems whose relations are difficult to represent with more traditional techniques, such as mathematical equations. They have already found application in a wide range of fields, from the social and economic sciences to the information sciences, including bioinformatics. The research work carried out for this Master’s thesis focuses on the first typical phase of a network analysis workflow, namely the network construction phase. This is a fundamental part of the whole workflow since, in general, the results of the analysis depend strongly on the structure of the model to which they are applied. For this reason, the research aims to design and implement a generic and reusable software infrastructure able to perform and automate the network construction phase, a particularly laborious phase that often slows down experiments based on complex network analysis. The software architecture is conceived not only to handle different data formats, but also to direct a flow of operations that can integrate multiple networks into a single model, itself a network. The thesis is organised as follows. Chapter 1 introduces concepts related to the biomedical context, providing the reader with information about the data types analysed in the work and belonging to this field. Chapter 2, titled 'Theory of Complex Networks', presents the basic theoretical notions needed to fully understand the network-related concepts that recur throughout the thesis.
It also provides a series of examples of analyses applied to real systems, mainly concerning the application of theoretical results on complex networks to the biomolecular domain, in order to offer a perspective on the potential of modelling systems by means of networks. While the objectives of the work are described in Chapter 3, Chapter 4, 'Algorithms and Software Libraries', illustrates the materials and methods adopted to build the software architecture. Chapter 5, then, is the heart of the thesis and covers the software architecture, the motivations behind its structure and other related issues. Chapter 6 and Chapter 7 deal with the validation of the project's results: the former concentrates on parameters related to the execution time and memory consumption of the operations launched on the platform, while the latter, titled 'Testing and Validation of composition workflow', concentrates on biomolecular use cases in order to validate the network composition process implemented within the architecture. Chapter 8 outlines the conclusions drawn from the project and Chapter 9, finally, illustrates the potential future developments of the software and the possible applications in which the architecture, structured in this way, could be exploited. At the end of the work, Appendix A contains all the technical documents developed for the software architecture.