Peduncle cutting point identification in images and point clouds for autonomous grape harvesting

Grape harvesting is labor-intensive and requires a large workforce, yet grapes yield a high profit margin once turned into wine. Italy is the world's largest wine producer but struggles to find workers, so our goal is to implement an autonomous grape harvesting system. In particular, this work focuses on identifying the peduncle cutting point, i.e., computing the point at which to cut from the data provided by a camera. We developed two pieces of software: one works in 2D on the RGB image from a standard camera, and the other works in 3D on the point cloud produced by a time-of-flight camera. Both approaches follow the same workflow: first we detect the grape bunches and berries, then we search for the peduncle, and finally we compute the cutting point. The bunch and berry detection step is the same in both algorithms and is carried out by the YOLOv4 network. Compared with the literature, we are the first to also detect the berries; we add this step because we rely on them to better approximate the geometry of the bunch. Then, in 2D, starting from the bounding boxes, we identify a Region of Interest, extract the edge segments within it, and select, through geometrical and dynamical considerations, the segment that best approximates the peduncle. In 3D, instead, we apply successive cropping and segmentation steps to extract first the grape bunch and then the peduncle. Lastly, in both approaches the cutting point is placed at the middle of the peduncle, i.e., at the segment midpoint in 2D and at the centroid of the peduncle cloud in 3D. With respect to the state of the art, the 2D peduncle search combines methods found in the literature with new considerations for discarding wrong segments. The 3D approach, on the other hand, builds on similar works on harvesting tomatoes and red peppers, but its application to grape harvesting is unprecedented.
On our test sets, the 2D algorithm achieves 85% precision and 55% recall with a computation time of 11.6 milliseconds, while the 3D algorithm achieves 95% precision and 23% recall with an average computation time of 151 milliseconds.

Grape harvesting requires a large workforce and is physically demanding, but the bunches, once turned into wine, bring a large profit margin. Italy is the world's leading wine producer, but it is hard to find workers for the harvest, so our goal is to implement an autonomous system for grape harvesting. This thesis, in particular, focuses on identifying the peduncle cutting point from the information collected by a camera. We developed software that works in 2D using the RGB images returned by an ordinary camera, and software that works in 3D starting from the point cloud collected with a time-of-flight camera. Both approaches follow the same flow: first we detect the bunches and berries, then we search for the peduncle, and finally we compute the cutting point. Bunch and berry detection is performed with the YOLOv4 network and, comparing with the literature, we note that we are the first to also recognize the berries: our innovative idea is to use them to better approximate the shape of the bunch. Starting from the bounding boxes obtained in the first step, in 2D we identify a Region of Interest, extract the edge segments present in it, and select among them the one that best approximates the peduncle through geometrical and dynamical considerations. In 3D, instead, through successive cropping and segmentation, we extract first the bunch and then the peduncle. Finally, both approaches identify the cutting point as the center of the peduncle, which is approximated by the midpoint of the segment in 2D and by the centroid of the peduncle point cloud in 3D. Compared with the literature, the 2D approach is a mix of the methods found, with the addition of new considerations for filtering the candidate segments.
By contrast, the 3D approach was developed starting from similar works on cherry tomatoes and red peppers, but its application to grape bunches is unprecedented. In our tests, the 2D algorithm showed 85% precision and 55% recall, with a computation time of 11.6 milliseconds, while the 3D algorithm has 95% precision and 23% recall, with a computation time of 151 milliseconds.
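
The final step shared by both approaches, placing the cutting point at the middle of the peduncle, can be illustrated with a minimal sketch. It only assumes that the earlier stages have already produced either the two endpoints of the selected 2D segment or the list of 3D points belonging to the peduncle; the function names `cutting_point_2d` and `cutting_point_3d` and the tuple-based data layout are hypothetical and not taken from this work's software.

```python
from statistics import fmean
from typing import Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def cutting_point_2d(segment_start: Point2D, segment_end: Point2D) -> Point2D:
    """Midpoint of the segment chosen as the peduncle (image coordinates)."""
    (x0, y0), (x1, y1) = segment_start, segment_end
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def cutting_point_3d(peduncle_cloud: Sequence[Point3D]) -> Point3D:
    """Centroid (per-axis mean) of the points segmented as the peduncle."""
    if not peduncle_cloud:
        # No peduncle points means no cutting point can be proposed.
        raise ValueError("peduncle cloud is empty")
    xs, ys, zs = zip(*peduncle_cloud)
    return (fmean(xs), fmean(ys), fmean(zs))


if __name__ == "__main__":
    # Illustrative values only, not measurements from the test sets.
    print(cutting_point_2d((120.0, 40.0), (128.0, 90.0)))
    print(cutting_point_3d([(0.10, 0.02, 0.50), (0.12, 0.04, 0.52)]))
```

Raising an error on an empty cloud is one possible design choice here: it keeps a failed peduncle search distinct from a valid cutting point, so a caller can treat that bunch as "no detection" rather than cutting at a meaningless location.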