Structure-from-Motion, a Literature Review and an Implementation with RealSense

MARTIGNON, LAURA
2024/2025

Abstract

Structure-from-Motion (SfM) is a technique that reconstructs the 3D structure of a scene from a set of 2D images of that scene captured from different viewpoints. It is widely used in augmented reality (AR), robotics, cultural heritage, and geological applications. A standard SfM pipeline consists of two main stages. The first is correspondence search: feature extraction detects local keypoints in each frame; feature matching identifies pairs of overlapping images by matching those keypoints; and geometric verification confirms that a geometric relation (rotation and translation) holds between each pair of overlapping images. The output of this stage is the relative pose graph. The second stage is 3D scene reconstruction: a pair of overlapping frames is chosen as initialization, 3D point coordinates are triangulated, and, during iterative registration, a dense point cloud representing the reconstructed scene is produced. The main limitation of standard SfM is scale ambiguity: metric depth cannot be recovered from 2D images alone, so the scene is reconstructed only up to an unknown scale factor. To overcome this, the presented implementation integrates the RealSense depth sensor, which captures RGB-D frames whose depth maps are in metric units. With this data, the scale ambiguity is resolved and the 3D coordinates of keypoints are obtained directly, without triangulation. The pipeline is thereby simplified: depth provides 3D points in each frame's local camera coordinate system, and relative poses are then estimated to merge all frames into a global reconstruction.
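The last step described in the abstract (depth gives 3D points in the local camera frame, and a pose moves them into a global frame) can be illustrated with a minimal sketch. It is not taken from the thesis: it assumes an ideal pinhole camera without lens distortion, the intrinsics (fx, fy, cx, cy) and the pose (R, t) are hypothetical values, and the pose is taken as already estimated rather than computed here.

```python
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]


def backproject(u: float, v: float, depth_m: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame.

    Assumes an ideal pinhole model (no distortion); depth_m is the distance
    along the optical axis in metres.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def to_global(p: Point3,
              R: Sequence[Sequence[float]],
              t: Sequence[float]) -> Point3:
    """Apply a camera-to-global rigid transform: p_global = R * p + t."""
    gx, gy, gz = (
        sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)
    )
    return (gx, gy, gz)


# Hypothetical intrinsics for a 640x480 image (illustrative values only).
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0

# A keypoint 300 px to the right of the principal point, 2 m away.
p_local = backproject(620.0, 240.0, 2.0, FX, FY, CX, CY)

# Hypothetical pose of this frame: 90 degree rotation about the z axis,
# followed by a 1 m shift along the global x axis.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [1.0, 0.0, 0.0]

p_global = to_global(p_local, R, t)
print(p_local, p_global)
```

Because the depth value is already metric, the resulting coordinates are in metres and no scale factor has to be estimated. In a full pipeline, R and t would come from the relative pose estimation step rather than being fixed by hand.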
Files in this item:
Martignon_Laura_Master_thesis_SfM.pdf (open access, Adobe PDF, 10.68 MB)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14247/27027