Learning Cluster Representatives for Approximate Nearest Neighbor Search

Developing increasingly efficient and accurate algorithms for approximate nearest neighbor search is a paramount goal in modern information retrieval. A primary approach to this problem is clustering, which partitions the dataset into distinct groups, each characterized by a representative data point. With this method, retrieving the top-k data points for a query requires identifying the most relevant clusters based on their representatives (the routing step) and then conducting a nearest neighbor search within those clusters only, drastically reducing the search space. The objective of this thesis is not only to provide a comprehensive explanation of clustering-based approximate nearest neighbor search but also to introduce and examine every aspect of our novel state-of-the-art method, which originated from a natural observation: the routing function solves a ranking problem, making it amenable to learning-to-rank. Developing this intuition and applying it to maximum inner product search led us to demonstrate that learning cluster representatives using a simple linear function significantly boosts the accuracy of clustering-based approximate nearest neighbor search.
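
The two-stage procedure described in the abstract (route the query to the most relevant clusters, then search only inside them) can be illustrated with a minimal sketch. This is not the method of the thesis: it assumes the cluster assignment is already given and uses the plain mean of each cluster as its representative, which is the standard baseline that learned representatives would replace. The function names and the parameters `k` and `n_probe` are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product, the similarity used in maximum inner product search."""
    return sum(x * y for x, y in zip(a, b))


def mean_representatives(
    points: List[Vector], assignment: List[int], n_clusters: int
) -> List[List[float]]:
    """Baseline representatives: the mean of each cluster (an assumption,
    not the learned representatives described in the abstract)."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(n_clusters)]
    counts = [0] * n_clusters
    for point, cluster in zip(points, assignment):
        counts[cluster] += 1
        for j, value in enumerate(point):
            sums[cluster][j] += value
    # max(n, 1) avoids dividing by zero for an empty cluster.
    return [[s / max(n, 1) for s in row] for row, n in zip(sums, counts)]


def search(
    query: Vector,
    points: List[Vector],
    assignment: List[int],
    representatives: List[Vector],
    k: int,
    n_probe: int,
) -> List[int]:
    """Return indices of the top-k points found in the n_probe best clusters."""
    # Routing step: rank clusters by the score of their representative.
    ranked = sorted(
        range(len(representatives)),
        key=lambda c: dot(query, representatives[c]),
        reverse=True,
    )
    probed = set(ranked[:n_probe])
    # Search step: exhaustive scoring restricted to the probed clusters.
    candidates = [i for i, c in enumerate(assignment) if c in probed]
    candidates.sort(key=lambda i: dot(query, points[i]), reverse=True)
    return candidates[:k]
```

Because routing only ranks clusters by a score, any function that produces a better ranking of clusters for a query can be substituted for the mean representative without changing the search step, which is the property the abstract's learning-to-rank observation relies on.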

Vecchiato, Thomas

2024/2025

Degree programme: Computer science and information technology
Academic year: 2024-10-25
Supervisor: Lucchese, Claudio
Appears in types: Master's degree thesis

Files in this item:

File	Size	Format
880038-1292564.pdf (open access; type: other attached material)	5.4 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14247/23340