Multimodal deep learning for prediction of postoperative complications in cardiac surgery

Chest radiographs are the most common imaging examination and are used both to diagnose and to monitor a wide range of diseases. After cardiac surgery, a chest X-ray is routinely acquired right after the operation. It can help physicians detect possible complications caused by the surgery or by anesthesia. Alongside the radiograph, physicians check the vital parameters collected at regular intervals in the intensive care unit to determine whether the patient's condition changes over time. Deep learning methods can support specialists in interpreting chest X-rays at the point of care. They can likewise help them assess the patient's state from the monitored parameters. These parameters are produced at a high rate and are hard for a human reader to process without missing part of the information. In previous work, a recurrent neural network (RNN) was used to predict complications in the 24 hours following cardiac surgery from vital parameters, such as blood pressure, sampled every 30 minutes [1]. The goal of this thesis is to investigate whether a multimodal approach, which combines the monitored data used in [1] with a chest radiograph taken after the operation, can improve the prediction of complications. In particular, this work focuses on postoperative bleeding, a complication that requires re-exploration surgery and should be detected as early as possible. Different strategies for integrating the two modalities are examined. The results of each experiment are compared with those of the model that uses only the monitored temporal parameters, to verify whether the images actually bring additional information. The results show that the multimodal models, which include the chest radiographs, perform better than this baseline.
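The abstract does not specify which integration strategies were tested. The following is a minimal sketch of one common option, intermediate (feature-level) fusion. An RNN summarizes the half-hourly vital signs into a vector, a precomputed radiograph embedding is concatenated to it, and a logistic layer outputs a bleeding probability. Several details are assumptions and do not come from the thesis: the window length, the number of vitals, all layer sizes, the use of a plain Elman RNN cell, the random stand-in for CNN image features, and the function names `rnn_encode` and `fused_bleeding_probability`. The weights are untrained, so the sketch only shows shapes and data flow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (assumptions, not from the thesis): 12 half-hourly steps (6 h),
# 4 vital parameters, 8 hidden units, 16-dim radiograph embedding.
T, N_VITALS, HIDDEN, IMG_DIM = 12, 4, 8, 16

def rnn_encode(x, W_xh, W_hh, b_h):
    """Plain Elman RNN: step over the time axis and return the last hidden state."""
    h = np.zeros(W_hh.shape[0])
    for x_t in x:                                   # x has shape (T, N_VITALS)
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    return h

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fused_bleeding_probability(vitals, img_embedding, params):
    """Intermediate fusion: concatenate both modality embeddings, then a logistic output."""
    h = rnn_encode(vitals, params["W_xh"], params["W_hh"], params["b_h"])
    z = np.concatenate([h, img_embedding])          # shape (HIDDEN + IMG_DIM,)
    return float(sigmoid(params["w_out"] @ z + params["b_out"]))

# Untrained random parameters, used only to demonstrate the data flow.
params = {
    "W_xh": rng.normal(scale=0.1, size=(HIDDEN, N_VITALS)),
    "W_hh": rng.normal(scale=0.1, size=(HIDDEN, HIDDEN)),
    "b_h": np.zeros(HIDDEN),
    "w_out": rng.normal(scale=0.1, size=HIDDEN + IMG_DIM),
    "b_out": 0.0,
}

vitals = rng.normal(size=(T, N_VITALS))             # stand-in for standardized vital signs
img_embedding = rng.normal(size=IMG_DIM)            # stand-in for CNN features of the X-ray
p = fused_bleeding_probability(vitals, img_embedding, params)
print(f"P(bleeding) = {p:.3f}")
```

Late fusion is an alternative. In that design, each modality gets its own classifier, and their output probabilities are averaged or combined by a small meta-model. It keeps the vitals-only RNN intact, but the two modalities cannot interact at the feature level.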
Overall, the multimodal model outperforms the RNN-only baseline, reaching an accuracy of 83.86% and a ROC AUC of 88.88%. These are absolute improvements of 5.91 and 3.19 percentage points, respectively.
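The improvements are reported as absolute differences between percentages, so the baseline scores follow by subtraction. This minimal check uses only the figures stated in the abstract. The dictionary names are illustrative, and the relative gain is computed here for contrast only.

```python
# Reported multimodal scores and absolute improvements (percentage points).
multimodal = {"accuracy": 83.86, "roc_auc": 88.88}
improvement = {"accuracy": 5.91, "roc_auc": 3.19}

# Absolute improvement = multimodal - baseline, so baseline = multimodal - improvement.
baseline = {k: round(multimodal[k] - improvement[k], 2) for k in multimodal}

# Relative improvement (gain as a percentage of the baseline), for comparison.
relative = {k: round(100 * improvement[k] / baseline[k], 2) for k in multimodal}

print(baseline)   # {'accuracy': 77.95, 'roc_auc': 85.69}
print(relative)   # {'accuracy': 7.58, 'roc_auc': 3.72}
```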
