Operator Precedence Grammars (OPGs), thanks to their local parsability property, allow for the definition of data-parallel algorithms like Operator Precedence Parsing (OPP), capable of dominating the performance of traditional LR parsers and achieving significant speed-ups on multicore machines. However, it has been observed that when the data being parsed contain a large portion of parallel elements, such as chains of arithmetic operations or arrays, OPP is forced to advance sequentially, suffering from long critical paths that negatively affect its performance and cause the production of highly skewed parse tree. This prompted researchers to propose alternative approaches such as Associative Operator Precedence Parsing (AOPP), and even to revisit the theoretical foundations of OPGs, leading to the formalization of Cyclic Operator Precedence Grammars (C-OPG), a family of grammars capable of overcoming such limitations while retaining the same expressive power as traditional OPGs. In this work, we present a novel parallel parsing algorithm that leverages the advantages of C-OPGs to improve upon OPP and AOPP by providing better performance and higher scalability, and by producing optimal unranked parse trees. Finally, we demonstrate its applicability through a practical implementation within the GoPAPAGENO project, a parallel parser generator developed in the Go programming language, comparing the results of different techniques on real-world data.
Le Grammatiche a Precedenza di Operatori (OPG), grazie alla loro proprietà di parsabilità locale, consentono di definire algoritmi paralleli come Operator Precedence Parsing (OPP) in grado di dominare le prestazioni dei parser LR tradizionali e di ottenere significativi speed-up su macchine multiprocessore. Tuttavia, è stato osservato che, quando i dati contengono un'ampia porzione di elementi paralleli, come catene di operazioni aritmetiche o liste, OPP è costretto ad avanzare in modo sequenziale, soffrendo di lunghi percorsi critici che influiscono negativamente sulle sue prestazioni e che causano la produzione di alberi sintattici molto sbilanciati. Ciò ha spinto i ricercatori a proporre approcci alternativi, come Associative Operator Precedence Parsing (AOPP), e persino a rivisitare le basi teoriche degli OPG, portando alla formalizzazione delle Grammatiche a Precedenza di Operatori Cicliche (C-OPG), una famiglia di grammatiche in grado di eliminare i requisiti che causavano tali limitazioni pur mantenendo la stessa potenza espressiva degli OPG tradizionali. In questo lavoro, presentiamo un nuovo algoritmo di parsing parallelo che sfrutta i vantaggi delle C-OPG per migliorare i risultati ottenuti da OPP e da AOPP, fornendo migliori prestazioni, maggiore scalabilità e producendo alberi di parsing ottimali. Infine, ne dimostriamo l'applicabilità attraverso un'implementazione pratica all'interno del progetto GoPAPAGENO, un generatore di parser paralleli sviluppato con Go, confrontando i risultati delle diverse tecniche su dati reali.
Parallel parsing of cyclic operator precedence grammars
GIORNETTA, MICHELE
2023/2024
Abstract
Operator Precedence Grammars (OPGs), thanks to their local parsability property, allow for the definition of data-parallel algorithms like Operator Precedence Parsing (OPP), capable of dominating the performance of traditional LR parsers and achieving significant speed-ups on multicore machines. However, it has been observed that when the data being parsed contain a large portion of parallel elements, such as chains of arithmetic operations or arrays, OPP is forced to advance sequentially, suffering from long critical paths that negatively affect its performance and cause the production of highly skewed parse tree. This prompted researchers to propose alternative approaches such as Associative Operator Precedence Parsing (AOPP), and even to revisit the theoretical foundations of OPGs, leading to the formalization of Cyclic Operator Precedence Grammars (C-OPG), a family of grammars capable of overcoming such limitations while retaining the same expressive power as traditional OPGs. In this work, we present a novel parallel parsing algorithm that leverages the advantages of C-OPGs to improve upon OPP and AOPP by providing better performance and higher scalability, and by producing optimal unranked parse trees. Finally, we demonstrate its applicability through a practical implementation within the GoPAPAGENO project, a parallel parser generator developed in the Go programming language, comparing the results of different techniques on real-world data.File | Dimensione | Formato | |
---|---|---|---|
2024_10_Giornetta_02.pdf
accessibile in internet per tutti
Descrizione: Thesis
Dimensione
803.9 kB
Formato
Adobe PDF
|
803.9 kB | Adobe PDF | Visualizza/Apri |
2024_10_Giornetta_Executive Summary_02.pdf
accessibile in internet per tutti
Descrizione: Executive Summary
Dimensione
646.87 kB
Formato
Adobe PDF
|
646.87 kB | Adobe PDF | Visualizza/Apri |
I documenti in POLITesi sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/10589/227581