Toward portable HPC applications with SYCL: a molecular docking case study

My thesis aims to assess the cross-vendor portability and the performance achievable in GPGPU computing with SYCL, a new C++ framework. To do so, I ported a complete high-throughput molecular docking application from CUDA to SYCL, converting every kernel of the application and verifying that the new code produces the same results as the original. The SYCL code also had to remain as similar as possible to the native code. During the conversion, I analyzed the possibilities and the limitations of a non-proprietary framework compared with native code. Performance was measured on NVIDIA and AMD hardware and compared with the original CUDA results. I also analyzed the code differences from a performance perspective. The code was compiled with both DPC++ and hipSYCL, because both compilers support the available GPUs. Using two compilers gives a broader view of the current state of SYCL as a standard. The results show that all the existing CUDA code can be converted to SYCL, with workarounds needed only in a few cases to overcome current limitations of the standard. They also show that, thanks to its rapid evolution, SYCL is quickly catching up with proprietary frameworks and enables researchers to bring their applications to new hardware. The performance of the SYCL code is very encouraging for the future because, as I demonstrate, it is already comparable with that of CUDA. I conclude that, given the performance level it has already reached, SYCL as a standard is ready to be considered for performance-oriented environments. Moreover, the great flexibility and cross-vendor portability of the code, without the need to learn a new language, make it a very attractive framework for the supercomputers of the near future.

The goal of my thesis is to verify and validate the portability and performance achievable in GPGPU computing with SYCL, a new C++ standard for programming heterogeneous processors. To reach this goal, I started from a molecular docking application written in CUDA and developed at Politecnico di Milano. I converted all the code to SYCL, making sure that the new code produced the same results as the original and stayed as similar to it as possible. During the porting, I analyzed the possibilities and limitations of a non-proprietary framework such as SYCL compared with code native to specific hardware, such as CUDA. Performance was measured on NVIDIA and AMD GPUs and compared with the CUDA results, since CUDA is the reference for this application. I also analyzed the performance differences, looking for what in the code might cause them. The SYCL code was compiled with both available compilers that can target the chosen hardware, namely DPC++ and hipSYCL. This way, I could better understand the current state and capabilities of SYCL without depending on the development of a single compiler. The results of the study show that all the existing code can be converted from CUDA to SYCL, and that other strategies are needed only in a few cases to work around the current limitations of the new standard. They also show that, thanks to its rapid evolution, SYCL is quickly closing the gap with proprietary standards and makes it possible to run code on new hardware where this was not possible before. The performance results obtained with SYCL are very encouraging for the future because, as I have shown, they are already comparable with those of CUDA.
I conclude that, thanks to the performance it currently achieves, SYCL as a standard is ready to be considered for environments where performance is critical. Moreover, the great flexibility and portability of the code across hardware from different vendors make it a very interesting framework for the supercomputers of the near future.