Robust camera-independent weed and crop segmentation through unsupervised domain adaptation techniques

Crops are an important source of food and other products. In conventional agriculture, weed control has a significant impact on crop yield, and it is usually addressed by spraying agrochemicals uniformly over the whole field, an approach that is suboptimal both economically and environmentally. Before autonomous robots can perform targeted weed removal, we need a system that reliably distinguishes crops, weeds and soil under varying conditions, such as different plant growth stages, different fields or different years. To do this, robots must classify each pixel of the images they acquire into the crop, weed and soil classes, i.e., perform semantic segmentation. Such segmentation systems usually perform well on the fields and in the conditions in which they were trained, but their performance declines on fields and in conditions not seen during training. We therefore developed a segmentation system that exploits the knowledge acquired in one setting to achieve good segmentation performance in other settings. In this thesis, we first evaluated the performance of a segmentation system on different datasets; we then analyzed unsupervised domain adaptation techniques for segmentation, segmenting a dataset with no labeled images (the Target dataset) using a different dataset with labeled images (the Source dataset). The techniques we used are based on CycleGAN and the Fast Fourier Transform. With CycleGAN, we translated Source-domain images into the Target-domain style and vice versa, which allowed us to exploit the Source labels and create pseudo-labels for the Target. With the Fast Fourier Transform, we exploited the fact that the Phase preserves image semantics: exchanging the Amplitude of a Source image with that of a Target image yields images with Source semantics in the Target style, and vice versa.
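The Amplitude exchange described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: it assumes float images of identical shape (H, W, C), and the `beta` parameter, which limits the swap to a centred low-frequency window in the style of Fourier Domain Adaptation (FDA), is our addition; with `beta=1.0` the whole Amplitude spectrum is exchanged.

```python
import numpy as np


def swap_amplitude(source: np.ndarray, target: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Keep the phase of `source` and take (part of) the amplitude of `target`.

    Both inputs are float images of shape (H, W, C) with identical size.
    `beta` (hypothetical parameter) sets the size of the centred low-frequency
    window whose amplitude is swapped: beta=1.0 swaps the whole spectrum,
    smaller values swap only the lowest frequencies (always including DC).
    """
    if source.shape != target.shape:
        raise ValueError("source and target must have the same shape")

    # 2-D FFT over the two spatial axes, computed independently per channel.
    fft_src = np.fft.fft2(source, axes=(0, 1))
    fft_trg = np.fft.fft2(target, axes=(0, 1))
    pha_src = np.angle(fft_src)  # phase of the source: kept unchanged

    # Centre the zero frequency so that low frequencies form a central window.
    amp_src = np.fft.fftshift(np.abs(fft_src), axes=(0, 1))
    amp_trg = np.fft.fftshift(np.abs(fft_trg), axes=(0, 1))

    h, w = source.shape[:2]
    ch, cw = h // 2, w // 2  # index of the zero frequency after fftshift
    bh = int(np.floor(h * beta / 2))  # half-height of the swapped window
    bw = int(np.floor(w * beta / 2))  # half-width of the swapped window
    rows = slice(max(ch - bh, 0), min(ch + bh + 1, h))
    cols = slice(max(cw - bw, 0), min(cw + bw + 1, w))
    amp_src[rows, cols] = amp_trg[rows, cols]

    # Undo the shift and rebuild the spectrum: target amplitude, source phase.
    amp_mixed = np.fft.ifftshift(amp_src, axes=(0, 1))
    mixed = np.fft.ifft2(amp_mixed * np.exp(1j * pha_src), axes=(0, 1))
    # The spectrum stays Hermitian-symmetric, so the imaginary part is only rounding noise.
    return np.real(mixed)
```

A Target-styled Source image would then be `swap_amplitude(src_img, trg_img)`, and the Source label map can be reused unchanged because the Phase, and hence the spatial layout of crops and weeds, is preserved. The output may fall slightly outside the original intensity range and may need clipping before being fed to the network.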
In addition to the CycleGAN method, we propose an initial constraint based on the Phase extracted with the Fast Fourier Transform. To the best of our knowledge, at the time of writing this is the first use of the Fast Fourier Transform in agriculture for the weed detection problem. Lastly, we evaluated the results on four combinations of datasets acquired on the same field but with different RGB cameras and platforms. We obtained an improvement over the baseline, and our results highlight the potential of unsupervised domain adaptation techniques for semantic segmentation of crops and weeds. We also tested CycleGAN on datasets acquired on the same field but two years apart and at different growth stages. In this case, CycleGAN is unable to preserve the semantics of the images during translation, and it degrades performance compared to the baseline.
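The Phase-based constraint can be illustrated with a simple penalty on the phase difference between an image and its CycleGAN translation. This is a hypothetical formulation, not the one used in the thesis: the function name `phase_consistency_loss`, the use of one minus the cosine of the phase difference, and the unweighted mean over all frequencies are our assumptions. The sketch uses NumPy to stay self-contained; a training loss would be written in an autodiff framework.

```python
import numpy as np


def phase_consistency_loss(original: np.ndarray, translated: np.ndarray) -> float:
    """Hypothetical phase-consistency term between an image and its translation.

    Both arrays are float images of shape (H, W, C). Returns 0 when the two
    spectra have identical phase and approaches 2 when they are opposite.
    """
    # Phase of each per-channel 2-D spectrum.
    pha_orig = np.angle(np.fft.fft2(original, axes=(0, 1)))
    pha_trans = np.angle(np.fft.fft2(translated, axes=(0, 1)))
    # 1 - cos(delta) is insensitive to the 2*pi wrap-around of angles.
    return float(np.mean(1.0 - np.cos(pha_orig - pha_trans)))
```

During training, such a term would be added to CycleGAN's adversarial and cycle-consistency losses with a weight to be tuned (also an assumption). Rescaling an image, e.g. `2.0 * x`, leaves its phase and therefore this loss unchanged, so the generator remains free to alter amplitude-driven appearance while being penalized for changing the spatial structure.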