Mol2Raman: a deep learning model for Raman spectra prediction

Raman spectroscopy, a powerful non-destructive analytical technique, plays a vital role in material characterization and molecular analysis. Despite its potential, interpreting complex Raman spectra, especially for chemical compounds, presents significant challenges. In particular, the computational time required to accurately simulate Raman spectra of complex molecules can be prohibitively high, which limits the practicality of simulation for large-scale studies. Recent advances in deep learning have enabled the development of more accurate and efficient models for spectral prediction, offering new opportunities to extend the applications of Raman spectroscopy in many fields.
This thesis explores the intersection of Raman spectroscopy and deep learning by introducing Mol2Raman, a novel deep learning framework that uses Graph Neural Networks (GNNs) to predict Raman spectra from molecular structures encoded as Simplified Molecular Input Line Entry System (SMILES) strings. The framework is built on an architecture that tackles the prediction problem in two stages, focusing on the unique challenges of Raman spectral prediction. The first model specializes in identifying the number of Raman-active frequencies. These predictions capture the essential vibrational properties of the molecule, laying the foundation for more detailed spectral analysis. The second model extends this functionality by incorporating the predicted number of Raman-active frequencies as global features, which summarize the molecule’s vibrational behavior. These global features are integrated with local molecular descriptors, such as atomic environments and bonding patterns, allowing the model to predict the intensities of the Raman spectral peaks. This hierarchical approach ensures that both broad molecular characteristics and specific atomic-level interactions are accounted for, enhancing the accuracy and interpretability of the predictions.
By structuring the problem in this two-stage manner, Mol2Raman bridges the gap between theoretical molecular properties and spectral prediction. The framework capitalizes on the complementary strengths of the two models, with the first providing a high-level summary of vibrational activity and the second refining these insights to produce detailed spectral outputs. This design enables Mol2Raman to handle the complexity of Raman spectroscopy and to generate, in less than a second, spectral predictions that align closely with computed data.
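The two-stage data flow described above can be illustrated with a minimal, dependency-free sketch. It is not the Mol2Raman implementation: the hand-built molecular graph (in place of SMILES parsing), the one-hot atom features, the single sum-aggregation step, the fixed untrained weights, and all function names (`message_pass`, `stage_one`, `stage_two`, `predict_spectrum`) are assumptions made purely for illustration. The only point it shows is how a stage-one global prediction can be concatenated with a pooled embedding of local atom features before stage two predicts intensities.

```python
from typing import Dict, List, Tuple

# Hypothetical one-hot atom features for a toy vocabulary
# (an assumption; not the descriptors used in the thesis).
ATOM_FEATURES: Dict[str, List[float]] = {
    "C": [1.0, 0.0],
    "O": [0.0, 1.0],
}

# A molecule as (atom symbols, undirected bonds between atom indices).
Graph = Tuple[List[str], List[Tuple[int, int]]]


def message_pass(graph: Graph) -> List[List[float]]:
    """One sum-aggregation round: each atom adds its neighbours' features."""
    atoms, bonds = graph
    feats = [ATOM_FEATURES[a][:] for a in atoms]
    updated = [f[:] for f in feats]
    for i, j in bonds:
        for k in range(len(feats[i])):
            updated[i][k] += feats[j][k]
            updated[j][k] += feats[i][k]
    return updated


def mean_pool(node_feats: List[List[float]]) -> List[float]:
    """Average node features into one molecule-level embedding."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]


def linear(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Plain dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def stage_one(graph: Graph) -> float:
    """Stage 1 (toy): estimate the number of Raman-active frequencies."""
    pooled = mean_pool(message_pass(graph))
    # Fixed, untrained illustrative weights.
    return max(0.0, linear(pooled, [[1.5, 2.0]], [0.5])[0])


def stage_two(graph: Graph, n_active: float, n_bins: int = 4) -> List[float]:
    """Stage 2 (toy): predict per-bin intensities from local + global features."""
    pooled = mean_pool(message_pass(graph))
    x = pooled + [n_active]  # concatenate the stage-1 output as a global feature
    weights = [[0.1 * (b + 1), 0.2, 0.05] for b in range(n_bins)]
    raw = linear(x, weights, [0.0] * n_bins)
    return [max(0.0, r) for r in raw]  # intensities are non-negative


def predict_spectrum(graph: Graph, n_bins: int = 4) -> List[float]:
    """Run both stages in sequence."""
    return stage_two(graph, stage_one(graph), n_bins)


# Heavy-atom graph of ethanol (SMILES "CCO"), built by hand for this sketch.
ethanol: Graph = (["C", "C", "O"], [(0, 1), (1, 2)])
spectrum = predict_spectrum(ethanol)
print(spectrum)
```

In a real system the graph would be derived from the SMILES string and both stages would be trained networks; the sketch only fixes the order of operations and the point at which the global feature enters the second model.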

Raman spectroscopy, a powerful non-destructive analytical technique, plays a fundamental role in the characterization of materials and in molecular analysis. Despite its potential, the interpretation of complex Raman spectra, especially for chemical compounds, presents significant challenges, such as the computational time needed to accurately simulate Raman spectra of complex molecules, which can be prohibitive and limits the practicality of simulation in large-scale studies. Recent progress in deep learning has made it possible to develop more accurate and efficient models for spectral prediction, opening new opportunities to broaden the applications of Raman spectroscopy in many fields.
This thesis explores the intersection of Raman spectroscopy and deep learning by introducing Mol2Raman, a novel deep learning framework that uses Graph Neural Networks (GNNs) to predict Raman spectra from molecular structures encoded as SMILES (Simplified Molecular Input Line Entry System) strings. The framework is based on an architecture that addresses the prediction problem in two stages, concentrating on the unique challenges posed by Raman spectral prediction. The first model specializes in identifying the number of Raman-active frequencies. These predictions capture the essential vibrational properties of the molecule and provide the basis for a more detailed spectral analysis. The second model extends this functionality by incorporating the predictions of the Raman-active frequencies as global features, which summarize the vibrational behavior of the molecule. These global features are integrated with local molecular descriptors, such as atomic environments and bonding patterns, allowing the model to predict the intensities of the peaks in the Raman spectrum. This hierarchical approach ensures that both the general characteristics of the molecule and the specific atomic-level interactions are taken into account, improving the accuracy and interpretability of the predictions.
By structuring the problem in two stages, Mol2Raman bridges the gap between theoretical molecular properties and spectral prediction. The framework exploits the complementary strengths of the two models: the first provides a high-level summary of vibrational activity, while the second refines this information to produce detailed spectra. This design allows Mol2Raman to handle the complexity of Raman spectroscopy and to generate a spectrum prediction in less than a second that aligns closely with computed data.