Social Activity Forecasting: A Foundational Study of Multi-Level Forecasting in Robot-Centric Social Scenes

MARCHIORO, PIERLUIGI
2024/2025

Abstract

This thesis introduces Social Activity Forecasting (SAF), a multi-level forecasting task in which future activities are predicted at a fixed horizon for heterogeneous social entities (individuals, pairwise interactions, and groups) in robot-centric panoramic scenes. SAF is formulated as a multi-label, multi-entity forecasting problem with level-specific label spaces, and is instantiated on JRDB-Social under multiple settings that vary observation length and forecast horizon. To address the task, the thesis proposes MMSAFNet, a modular multimodal architecture that supports heterogeneous inputs (including RGB, current activity vectors, textual embeddings, and spatial cues), configurable fusion mechanisms, cross-level information sharing, and alternative latent dynamics modules, enabling controlled architectural and modality ablations. A dedicated evaluation protocol is also introduced, combining per-level samplewise precision/recall/F1 with subset-based reporting over overall, action change, and action unchanged entities, and mean average precision (mAP) for tail-sensitive analysis. Experiments across multiple dataset variants show that persistence-based baselines are highly competitive on the overall subset due to strong dataset priors, while learned multimodal models provide stronger evidence of non-trivial forecasting on the action change subset, especially in tail-sensitive metrics. Results further indicate that modality choice has a larger impact than the explored architectural toggles, with semantic modalities (activity vectors and textual embeddings) providing the strongest gains in the current benchmark regime. Overall, the thesis provides a foundational task formulation, modular baseline architecture, and evaluation methodology for multi-level social activity forecasting in robot-centric scenes.
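The persistence baseline and the samplewise precision/recall/F1 protocol mentioned above can be sketched as follows. This is a minimal illustration under assumed conventions (entities as rows of a multi-hot activity matrix, per-entity metrics averaged across entities), not the thesis implementation; all function names are hypothetical.

```python
import numpy as np

def persistence_forecast(current_activities: np.ndarray) -> np.ndarray:
    """Persistence baseline: predict that each entity's current
    multi-hot activity vector is unchanged at the forecast horizon."""
    return current_activities.copy()

def samplewise_prf1(y_true: np.ndarray, y_pred: np.ndarray):
    """Samplewise precision/recall/F1 for multi-label predictions:
    computed per entity (row), then averaged over entities.
    Rows with no predicted (resp. true) labels contribute 0."""
    tp = (y_true * y_pred).sum(axis=1).astype(float)
    pred_pos = y_pred.sum(axis=1)
    true_pos = y_true.sum(axis=1)
    prec = np.divide(tp, pred_pos, out=np.zeros_like(tp), where=pred_pos > 0)
    rec = np.divide(tp, true_pos, out=np.zeros_like(tp), where=true_pos > 0)
    denom = prec + rec
    f1 = np.divide(2 * prec * rec, denom, out=np.zeros_like(denom), where=denom > 0)
    return prec.mean(), rec.mean(), f1.mean()
```

By construction, the persistence baseline scores perfectly on entities whose activities do not change, which is why subset-based reporting over action change vs. action unchanged entities is needed to expose non-trivial forecasting ability.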
Files in this record:
MSc_Thesis_2025___Pierluigi_Marchioro (PDF-A).pdf — open access, 6.04 MB, Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14247/28247