Structure-from-Motion, a Literature Review and an Implementation with RealSense

MARTIGNON, LAURA

2024/2025

Abstract

Structure-from-Motion (SfM) is a technique that aims to reconstruct the 3D structure of a scene from a set of 2D images of that scene captured from different viewpoints. It is widely used in augmented reality (AR), robotics, cultural heritage, and geology. A standard SfM pipeline consists of two main stages. The first is correspondence search: feature extraction detects local keypoints in each frame; feature matching identifies pairs of overlapping images by matching those keypoints; and geometric verification confirms that a valid geometric relation (a relative rotation and translation) exists between overlapping images. The output of this stage is a graph of relative poses between images. The second stage is 3D scene reconstruction: a pair of overlapping frames is chosen for initialization, the 3D coordinates of the matched points are triangulated, and the remaining images are then registered iteratively, producing a point cloud that represents the reconstructed scene. The main limitation of standard SfM is scale ambiguity: from 2D images alone, the scene can be recovered only up to an unknown global scale factor, so absolute (metric) depth is not available. To overcome this, the presented implementation integrates an Intel RealSense depth camera, which captures RGB-D frames whose depth maps are expressed in metric units. With this data the scale ambiguity is resolved, and the 3D coordinates of keypoints are obtained directly from the depth values, without triangulation. The resulting pipeline is simpler: depth provides 3D points in each frame's local camera coordinate system, and the relative poses between frames are then estimated to merge all frames into a single global reconstruction.
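Obtaining 3D keypoint coordinates directly from depth amounts to inverting the pinhole projection model. The sketch below is a minimal illustration in Python with NumPy, not the thesis code. It assumes an undistorted image, a raw depth map in sensor units together with a depth scale in metres per unit (the value 0.001 used in the example is an assumption), and pinhole intrinsics `fx`, `fy`, `cx`, `cy`. The function name `backproject_keypoints` is hypothetical. On live RealSense data, pyrealsense2's `rs2_deproject_pixel_to_point` performs the equivalent per-pixel computation using the camera's own intrinsics.

```python
import numpy as np


def backproject_keypoints(keypoints_px, depth_raw, depth_scale, fx, fy, cx, cy):
    """Lift 2D keypoints to 3D points in the local camera coordinate system.

    keypoints_px: (N, 2) array of (u, v) pixel coordinates (column, row).
    depth_raw:    (H, W) depth image in raw sensor units (e.g. uint16).
    depth_scale:  metres per raw depth unit (assumed, e.g. 0.001).
    Returns (points, valid): an (M, 3) array of metric 3D points and the
    boolean mask of input keypoints that had a non-zero depth reading.
    """
    uv = np.asarray(keypoints_px, dtype=float)
    # Sample depth at the nearest pixel; a real pipeline might interpolate.
    cols = np.clip(np.round(uv[:, 0]).astype(int), 0, depth_raw.shape[1] - 1)
    rows = np.clip(np.round(uv[:, 1]).astype(int), 0, depth_raw.shape[0] - 1)
    z = depth_raw[rows, cols].astype(float) * depth_scale
    # Zero marks missing depth (as on RealSense cameras): drop those keypoints.
    valid = z > 0
    u, v, z = uv[valid, 0], uv[valid, 1], z[valid]
    # Inverse pinhole projection: X = (u - cx) Z / fx, Y = (v - cy) Z / fy.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1), valid
```

For example, with `fx = fy = 600`, `cx = 320`, `cy = 240` and a constant raw depth of 1000 units at a scale of 0.001, a keypoint at pixel (420, 240) lifts to about (0.167, 0.0, 1.0) metres, while a keypoint on a zero-depth pixel is discarded.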
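Merging frames then requires the rigid transform between pairs of frames, computed from their matched 3D keypoints. The abstract does not state which estimator the implementation uses; a common closed-form choice, sketched here as an assumption, is the SVD-based Kabsch solution (Umeyama without the scale term, since metric depth already fixes scale), followed by chaining the relative transforms into poses expressed in the first camera's frame. All function names are hypothetical, and the code is a NumPy sketch rather than the thesis implementation.

```python
import numpy as np


def estimate_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= R @ src + t.

    src, dst: (N, 3) arrays of matched 3D points (N >= 3, not collinear).
    Kabsch / Umeyama solution without scale.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    # 3x3 cross-covariance of the centred point sets.
    H = (src - mu_src).T @ (dst - mu_dst)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_dst - R @ mu_src
    return R, t


def to_homogeneous(R, t):
    """Pack (R, t) into a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def chain_poses(relative):
    """Turn relative transforms into global poses.

    relative[i] maps points from frame i+1 into frame i. The returned list
    holds, for every frame k, the transform from frame k into frame 0,
    which serves as the world coordinate system.
    """
    poses = [np.eye(4)]
    for T_rel in relative:
        poses.append(poses[-1] @ T_rel)
    return poses
```

To build `relative[i]`, call `estimate_rigid_transform` with the points of frame i+1 as `src` and the matching points of frame i as `dst`. Because depth noise and wrong matches corrupt a plain least-squares fit, such an estimator is typically wrapped in RANSAC, and chained poses accumulate drift that a global refinement step can reduce.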

Degree programme: COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
Academic year: 2024
Supervisor: BERGAMASCO, FILIPPO
Appears in: Master's degree

Files in this item:

File	Size	Format
Martignon_Laura_Master_thesis_SfM.pdf (open access)	10.68 MB	Adobe PDF

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14247/27027