Autonomous control of a programmable bevel-tip needle with an inverse reinforcement learning approach

Brain surgery has become more common in recent years owing to the rising incidence of neurodegenerative diseases. These procedures can diagnose and treat brain disorders, but they also carry significant risks for patients. To minimize these risks, minimally invasive neurosurgical techniques have been developed, including keyhole neurosurgery (KN), which accesses deep brain targets through a small hole in the skull. However, the standard catheters used in KN can only follow straight paths, which limits the regions that can be reached. To address this limitation, steerable needles have been developed that can reach the treatment site more precisely while minimizing the risk of damaging blood vessels or other anatomical structures. The system presented in this work comprises a surgical simulator of a dynamic environment for a programmable bevel-tip needle (PBN), connected to Neuroinspire, a neurosurgical planning tool whose integrated path-planning algorithm enforces kinematic constraints and generates preoperative trajectories. Integrating Unity3D made it possible to use the NVIDIA Flex package to generate a dynamic environment that closely reproduces the real one. In addition, the ML-Agents package allowed an approach based on Inverse Reinforcement Learning (IRL) to be integrated easily, in order to train the catheter (the agent) and test its behaviour during the intraoperative phase. The approach combines Proximal Policy Optimization (PPO), an on-policy deep reinforcement learning (DRL) algorithm that maximizes a surrogate objective function, with Generative Adversarial Imitation Learning (GAIL), an IRL-like framework that extracts a policy directly from manually recorded expert data, allowing IRL-style learning to exploit complex, high-dimensional DRL configurations. With this method, the agent learns to follow the path autonomously from its starting point to the target point (e.g. a tumour) by combining feedback from the surrounding environment (simulated brain tissue) with the movements of an expert surgeon saved in demonstration files (.demo).
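The two learning components named above can be illustrated with a minimal Python sketch. It shows the clipped form of the PPO surrogate objective and one common way of turning a GAIL discriminator output into a reward that is added to the environment (extrinsic) reward. This is not the ML-Agents implementation: the function names, the default `clip_eps=0.2`, the reward form `-log(1 - D)`, and the weights `extrinsic_strength` and `gail_strength` are illustrative assumptions.

```python
import math
from typing import Sequence


def ppo_clipped_objective(
    ratios: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # illustrative default, not taken from this work
) -> float:
    """Batch mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratios[i] = pi_new(a_i | s_i) / pi_old(a_i | s_i) and advantages[i] is the
    estimated advantage A_i. PPO updates the policy to maximize this value.
    """
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and of equal length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # The minimum is a pessimistic bound: pushing the ratio far from 1
        # cannot raise the objective beyond the clipped term.
        total += min(r * a, clipped * a)
    return total / len(ratios)


def gail_reward(d_expert: float, eps: float = 1e-8) -> float:
    """Imitation reward -log(1 - D(s, a)).

    Assumption: D(s, a) in [0, 1] is the discriminator's probability that the
    state-action pair comes from the expert demonstrations, so the reward
    grows as the agent becomes harder to distinguish from the expert.
    """
    return -math.log(max(1.0 - d_expert, eps))


def combined_reward(
    extrinsic: float,
    d_expert: float,
    extrinsic_strength: float = 1.0,  # hypothetical weight
    gail_strength: float = 0.5,       # hypothetical weight
) -> float:
    """Weighted sum of the environment (extrinsic) reward and the GAIL reward."""
    return extrinsic_strength * extrinsic + gail_strength * gail_reward(d_expert)


if __name__ == "__main__":
    # Positive advantage with a large ratio: the gain is capped at (1 + eps) * A.
    print(ppo_clipped_objective([1.5], [2.0]))
    # Negative advantage with a small ratio: the clipped (worse) term is kept.
    print(ppo_clipped_objective([0.5], [-1.0]))
    # A discriminator that cannot tell agent from expert (D = 0.5).
    print(combined_reward(extrinsic=0.1, d_expert=0.5))
```

Weighting the two reward terms lets the environment feedback and the expert demonstrations pull on the same policy, which mirrors the combination of tissue feedback and surgeon movements described above.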
The initial objective was to evaluate whether the agent could be trained with demonstrations acquired directly with the real system, in vitro. However, owing to the complexity of the system, only a few trials could be recorded, and these proved insufficient for training a GAIL-based network. The agent was then trained on the same dataset of preoperative paths but with a much larger amount of data acquired in simulation. This model outperformed the previous one, although its results were still not optimal. Consequently, the study examined how the behaviour of the network would change if the dataset of preoperative paths were doubled. The three models were tested on three different paths (the test dataset). Comparing the models showed that an approach such as PPO+GAIL, combined with a complex environment like the one presented in this work, requires not only a large amount of data but also a greater variety of data. With the final model, the system guided a bevel-tip needle to a predefined target pose with an average position error of 0.93 ± 0.4 mm and an average orientation error of 3.43 ± 0.5° in a simulated deformable environment, with a 100% success rate. The proposed framework may be suitable for supporting neurosurgeons during surgical procedures, but its effectiveness still needs to be assessed in a real-world setting.
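The reported metrics can be made concrete with a short sketch. The abstract does not state how the orientation error or the success criterion is defined, so the code assumes, hypothetically, that the orientation error is the angle between the final tip direction and the target direction, that "±" denotes the sample standard deviation, and that a trial succeeds when both errors fall within user-chosen tolerances (`pos_tol_mm`, `ang_tol_deg`). All function names are illustrative.

```python
import math
import statistics
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def position_error(tip: Vec3, target: Vec3) -> float:
    """Euclidean distance between the final tip position and the target (e.g. in mm)."""
    return math.dist(tip, target)


def orientation_error_deg(tip_dir: Vec3, target_dir: Vec3) -> float:
    """Angle in degrees between the final tip direction and the target direction.

    Assumption: orientation is compared via direction vectors; the source does
    not say how its orientation error is defined.
    """
    dot = sum(a * b for a, b in zip(tip_dir, target_dir))
    norm = math.hypot(*tip_dir) * math.hypot(*target_dir)
    if norm == 0.0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def mean_sd(errors: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (needs at least two values); "±" assumed to be SD."""
    return statistics.mean(errors), statistics.stdev(errors)


def success_rate(
    pos_errors: Sequence[float],
    ang_errors: Sequence[float],
    pos_tol_mm: float,   # hypothetical tolerance
    ang_tol_deg: float,  # hypothetical tolerance
) -> float:
    """Percentage of trials whose position and orientation errors are both within tolerance."""
    if not pos_errors or len(pos_errors) != len(ang_errors):
        raise ValueError("error lists must be non-empty and of equal length")
    hits = sum(
        1 for p, a in zip(pos_errors, ang_errors) if p <= pos_tol_mm and a <= ang_tol_deg
    )
    return 100.0 * hits / len(pos_errors)


if __name__ == "__main__":
    # Hypothetical trial: tip 0.6 mm off in x and 0.8 mm off in y (about 1 mm total).
    print(position_error((10.6, 20.8, 5.0), (10.0, 20.0, 5.0)))
    # Hypothetical tip direction tilted slightly away from the target direction.
    print(orientation_error_deg((0.0, 0.05, 1.0), (0.0, 0.0, 1.0)))
```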