Link prediction task aims to detect hidden relations between nodes, leveraging the graph's underlying structure while often integrating node features. The presence or absence of a link is usually defined by a threshold based on a real value related to the similarity between the pair of target nodes. There are mainly three types of link prediction methods: heuristic-based methods, latent-based methods, and content-based methods. Heuristic-based methods use a specific heuristic to compute the similarity between the target nodes. Each heuristic (Common Neighbor, Jaccard score, Adamic-Adar, etc.) is based on a given assumption that does not fit all the possible graphs. Latent-based methods leverage the graph's latent structural properties by usually incorporating the adjacency or the Laplacian matrix. The structural information is processed to build meaningful embeddings, which are then used to score the relation between the target pair without using the nodes' features themselves. Content-based methods do not use the structural information of the received graph; instead, they leverage the single features of each node. More recent methods (GNN, MPNN, Graph-Transformers) integrate all these approaches or at least content-based and latent-based methods to achieve a better and more complete representation of the nodes, that are then used to feed a decoder, usually an MLP, returning the link likelihood. In this experiment, the link likelihood between pairs of nodes is computed via an encoder-decoder setup. The transformer-encoder is fed with the distance-based positional embeddings and the node features for each vertex in the h-hop subgraph around the target nodes. The decoder, an MLP, then receives the aggregation of the encoder output target nodes and yields the link likelihood.
Distance-Aware Positional Encoding in Transformers for Improved Link Prediction
BARCHIELLI, BIAGIO
2023/2024
Abstract
Link prediction task aims to detect hidden relations between nodes, leveraging the graph's underlying structure while often integrating node features. The presence or absence of a link is usually defined by a threshold based on a real value related to the similarity between the pair of target nodes. There are mainly three types of link prediction methods: heuristic-based methods, latent-based methods, and content-based methods. Heuristic-based methods use a specific heuristic to compute the similarity between the target nodes. Each heuristic (Common Neighbor, Jaccard score, Adamic-Adar, etc.) is based on a given assumption that does not fit all the possible graphs. Latent-based methods leverage the graph's latent structural properties by usually incorporating the adjacency or the Laplacian matrix. The structural information is processed to build meaningful embeddings, which are then used to score the relation between the target pair without using the nodes' features themselves. Content-based methods do not use the structural information of the received graph; instead, they leverage the single features of each node. More recent methods (GNN, MPNN, Graph-Transformers) integrate all these approaches or at least content-based and latent-based methods to achieve a better and more complete representation of the nodes, that are then used to feed a decoder, usually an MLP, returning the link likelihood. In this experiment, the link likelihood between pairs of nodes is computed via an encoder-decoder setup. The transformer-encoder is fed with the distance-based positional embeddings and the node features for each vertex in the h-hop subgraph around the target nodes. The decoder, an MLP, then receives the aggregation of the encoder output target nodes and yields the link likelihood.File | Dimensione | Formato | |
---|---|---|---|
881423.pdf
accesso aperto
Dimensione
1.37 MB
Formato
Adobe PDF
|
1.37 MB | Adobe PDF | Visualizza/Apri |
I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/20.500.14247/24867