Distance-Aware Positional Encoding in Transformers for Improved Link
Prediction

Link prediction task aims to detect hidden relations between nodes, leveraging the graph's underlying structure while often integrating node features. The presence or absence of a link is usually defined by a threshold based on a real value related to the similarity between the pair of target nodes. There are mainly three types of link prediction methods: heuristic-based methods, latent-based methods, and content-based methods. Heuristic-based methods use a specific heuristic to compute the similarity between the target nodes. Each heuristic (Common Neighbor, Jaccard score, Adamic-Adar, etc.) is based on a given assumption that does not fit all the possible graphs. Latent-based methods leverage the graph's latent structural properties by usually incorporating the adjacency or the Laplacian matrix. The structural information is processed to build meaningful embeddings, which are then used to score the relation between the target pair without using the nodes' features themselves. Content-based methods do not use the structural information of the received graph; instead, they leverage the single features of each node. More recent methods (GNN, MPNN, Graph-Transformers) integrate all these approaches or at least content-based and latent-based methods to achieve a better and more complete representation of the nodes, that are then used to feed a decoder, usually an MLP, returning the link likelihood. In this experiment, the link likelihood between pairs of nodes is computed via an encoder-decoder setup. The transformer-encoder is fed with the distance-based positional embeddings and the node features for each vertex in the h-hop subgraph around the target nodes. The decoder, an MLP, then receives the aggregation of the encoder output target nodes and yields the link likelihood.

Distance-Aware Positional Encoding in Transformers for Improved Link Prediction

BARCHIELLI, BIAGIO

2023/2024

Abstract

Link prediction task aims to detect hidden relations between nodes, leveraging the graph's underlying structure while often integrating node features. The presence or absence of a link is usually defined by a threshold based on a real value related to the similarity between the pair of target nodes. There are mainly three types of link prediction methods: heuristic-based methods, latent-based methods, and content-based methods. Heuristic-based methods use a specific heuristic to compute the similarity between the target nodes. Each heuristic (Common Neighbor, Jaccard score, Adamic-Adar, etc.) is based on a given assumption that does not fit all the possible graphs. Latent-based methods leverage the graph's latent structural properties by usually incorporating the adjacency or the Laplacian matrix. The structural information is processed to build meaningful embeddings, which are then used to score the relation between the target pair without using the nodes' features themselves. Content-based methods do not use the structural information of the received graph; instead, they leverage the single features of each node. More recent methods (GNN, MPNN, Graph-Transformers) integrate all these approaches or at least content-based and latent-based methods to achieve a better and more complete representation of the nodes, that are then used to feed a decoder, usually an MLP, returning the link likelihood. In this experiment, the link likelihood between pairs of nodes is computed via an encoder-decoder setup. The transformer-encoder is fed with the distance-based positional embeddings and the node features for each vertex in the h-hop subgraph around the target nodes. The decoder, an MLP, then receives the aggregation of the encoder output target nodes and yields the link likelihood.

Scheda breve

Scheda completa

Scheda completa (DC)

	Corso di studio
	
				COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
			
	Anno Accademico
	
				2023
			
	Relatore
	
				VASCON, SEBASTIANO
			
	Appare nelle tipologie:
	
				Laurea magistrale

File in questo prodotto:

File	Dimensione	Formato
881423.pdf accesso aperto Dimensione 1.37 MB Formato Adobe PDF Visualizza/Apri	1.37 MB	Adobe PDF	Visualizza/Apri

I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14247/24867