Computer Vision System for the Analysis and Classification of Reusable Cups

This thesis was carried out within a project entrusted to Digital Strategy Innovation. The project aims to build a system that detects clutter and litter that could interfere with the washing process of the dishwashers supplied to fast-food chain outlets. The main objective was to create a clutter detection system for an indoor environment using an RGB-D camera. The methodology detects any clutter inside the cup by exploring available image-based and 3D-based models. These approaches yielded robust recognition of the clutter in the cup and accurate estimates of its location, so that the system can report whether the cup can be washed or must first be inspected and cleared of any obstructing object. In this project a wide variety of objects must be perceived in small, confined scenes, where they may be partially occluded or affected by additional interference. To tackle these challenges, two different models were developed: the first based on RGB anomaly detection, the second on 3D reconstruction of the scene from the RGB and depth captures. The models employed were:
• Patch Distribution Modeling (PaDiM), which detects and localizes anomalies (objects that would prevent correct washing) in RGB images in a one-class learning setting. The model uses a pretrained convolutional neural network (CNN) for patch embedding and multivariate Gaussian distributions for the probabilistic representation of the normal class, and it exploits correlations between the different semantic levels of the CNN to better identify potential anomalies.
• A 3D module that creates and filters the point cloud in order to estimate the pose and size of objects.
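The PaDiM step described above can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: it assumes the multi-layer CNN patch embeddings have already been extracted from a pretrained backbone, resized to a common H x W grid and concatenated along the channel axis, and it omits that extraction stage. The helper names `fit_padim` and `padim_anomaly_map` and the regularisation value `eps = 0.01` are hypothetical choices made for illustration.

```python
import numpy as np

# Minimal PaDiM-style sketch (hypothetical helpers, not the thesis code).
# Assumes patch embeddings are precomputed: CNN features from several layers,
# resized to a common H x W grid and concatenated along the channel axis C.

def fit_padim(train_emb: np.ndarray, eps: float = 0.01):
    """Fit one multivariate Gaussian per patch position.

    train_emb: (N, C, H, W) embeddings of N anomaly-free (clean-cup) images.
    eps: assumed ridge term added to each covariance to keep it invertible.
    """
    n, c, h, w = train_emb.shape
    # Group samples by position: (H*W, N, C)
    x = train_emb.reshape(n, c, h * w).transpose(2, 0, 1)
    mean = x.mean(axis=1)                                   # (H*W, C)
    centered = x - mean[:, None, :]
    # Unbiased sample covariance at every position, plus eps * I
    cov = np.einsum("pnc,pnd->pcd", centered, centered) / (n - 1)
    cov += eps * np.eye(c)[None, :, :]
    cov_inv = np.linalg.inv(cov)                            # (H*W, C, C)
    return mean, cov_inv, (h, w)


def padim_anomaly_map(test_emb: np.ndarray, mean, cov_inv, grid):
    """Mahalanobis distance of each test patch to its position's Gaussian.

    test_emb: (C, H, W) embedding of one test image. Returns an (H, W) map.
    """
    h, w = grid
    c = test_emb.shape[0]
    x = test_emb.reshape(c, h * w).T                        # (H*W, C)
    d = x - mean
    m2 = np.einsum("pc,pcd,pd->p", d, cov_inv, d)
    return np.sqrt(np.maximum(m2, 0.0)).reshape(h, w)


# Usage on synthetic data: one patch is shifted far from the training set.
rng = np.random.default_rng(0)
train = rng.normal(size=(50, 8, 4, 4))
mean, cov_inv, grid = fit_padim(train)
test = rng.normal(size=(8, 4, 4))
test[:, 1, 2] += 10.0          # simulated foreign object at patch (1, 2)
amap = padim_anomaly_map(test, mean, cov_inv, grid)
image_score = amap.max()       # one possible image-level score
peak = np.unravel_index(np.argmax(amap), amap.shape)  # should be the injected patch
```

Taking the maximum of the anomaly map is one possible image-level score; the threshold separating washable from non-washable cups would still have to be calibrated on validation data.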
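For the 3D module, one plausible pipeline is sketched below under explicit assumptions: a pinhole camera with known intrinsics `fx, fy, cx, cy`, a depth image in millimetres, a pass-through depth crop, statistical outlier removal, and a PCA-based oriented bounding box for pose and size. These filtering and estimation choices, and all function names, are illustrative assumptions rather than the methods used in the thesis.

```python
import numpy as np

# Hypothetical point-cloud helpers illustrating one possible 3D pipeline.
# Assumptions: pinhole intrinsics (fx, fy, cx, cy) are known, depth is in mm.

def depth_to_points(depth, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project an (H, W) depth image to an (N, 3) cloud in metres.

    Pixels with depth 0 (no measurement) are discarded.
    """
    v, u = np.indices(depth.shape)            # row (v) and column (u) indices
    z = depth.astype(np.float64) * depth_scale
    valid = z > 0
    x = (u[valid] - cx) * z[valid] / fx
    y = (v[valid] - cy) * z[valid] / fy
    return np.stack([x, y, z[valid]], axis=1)


def passthrough(points, z_min, z_max):
    """Keep points whose depth lies in [z_min, z_max], e.g. the cup region."""
    z = points[:, 2]
    return points[(z >= z_min) & (z <= z_max)]


def remove_statistical_outliers(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbours is large.

    Brute-force O(N^2) distances: acceptable for a sketch or a downsampled
    cloud, not for a full-resolution frame.
    """
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    knn = np.sort(d, axis=1)[:, 1:k + 1]      # column 0 is the self-distance
    mean_d = knn.mean(axis=1)
    thresh = mean_d.mean() + std_ratio * mean_d.std()
    return points[mean_d <= thresh]


def oriented_box(points):
    """PCA-based pose (centroid, rotation) and size (extents) of a cluster."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    # Eigenvectors of the 3x3 covariance give the principal axes
    _, vecs = np.linalg.eigh(np.cov(centered.T))
    rotation = vecs[:, ::-1].copy()           # columns: major -> minor axis
    if np.linalg.det(rotation) < 0:           # enforce a right-handed frame
        rotation[:, 2] *= -1
    local = centered @ rotation               # coordinates in the box frame
    extents = local.max(axis=0) - local.min(axis=0)
    return centroid, rotation, extents
```

In a cup-inspection setting the pass-through limits would be derived from the camera-to-cup distance, which the source does not specify.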
