Real-time monocular 3D reconstruction of scenarios using artificial intelligence techniques
Metadata
Author
Herrera-Granda, Erick P.
Publisher
Universidad de Granada
Department
Universidad de Granada. Programa de Doctorado en Tecnologías de la Información y Comunicación
Date
2024
Bibliographic reference
Herrera-Granda, Erick P. Real-time monocular 3D reconstruction of scenarios using artificial intelligence techniques. Granada: Universidad de Granada, 2024. [https://hdl.handle.net/10481/90846]
Sponsorship
Tesis Univ. Granada.; SDAS Research Group
Abstract
This research presents a comprehensive study on monocular 3D reconstruction of environments using
only RGB images acquired with a monocular sensor as input. The objectives were to develop a
suitable taxonomy, review seminal algorithms, compare open-source methods, and develop a novel 3D
reconstruction system that combines the principal classical techniques with artificial intelligence to improve
overall system performance. An exhaustive literature review led to a proposed taxonomy
with three classifications: direct vs indirect, dense vs sparse, and classic vs machine learning. This resulted
in 10 categories used to classify 42 notable monocular SLAM, SFM, and VO systems based on 11
identified criteria. Subsequently, through rigorous benchmarking, ten prominent open-source algorithms
were implemented across the taxonomy to discern each method's advantages and limitations.
The TUM-Mono dataset, considered the most complete benchmark comprising 50 outdoor and indoor
sequences, was used for evaluation. Statistical analysis revealed that sparse-direct methods significantly
outperformed the others, with DSO performing best. In addition, the results showed that integrating machine
learning modules into the SLAM pipeline significantly improves system performance and the
final reconstruction quality. Consequently, DSO was selected for enhancement by integrating the state-of-the-art
NeW-CRFs single-image depth estimation CNN module. This module introduced depth prior
knowledge to refine DSO's depth initialization and tracking. Using the TUM-Mono dataset, the new
DeepDSO method was benchmarked against DSO and CNN-DSO. DeepDSO surpassed the others
across various metrics, including translation error, rotation error, scale error, alignment error, and
RMSE. Statistical tests confirmed DeepDSO's superiority: it achieved an RMSE of 0.0624,
an error reduction of approximately 13.35% relative to the original DSO system.
DeepDSO pushes monocular VO boundaries by strategically integrating machine learning-based depth
estimation. In addition, the taxonomy and comparative analysis provide guidelines for appropriate
algorithm selection and implementation. This study validates the benefits of implementing artificial
intelligence within SLAM, VO and SFM systems and lays the groundwork for continued depth initialization
and point-tracking optimisations.
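
The RMSE and relative error reduction quoted in the abstract can be illustrated with a minimal Python sketch. The function names below are hypothetical and are not taken from the thesis, and the sketch assumes that the 13.35% figure is a relative reduction in RMSE with respect to DSO. The baseline value computed at the end is only what that assumption would imply; it is not a figure reported in the abstract.

```python
import math


def rmse(errors):
    """Root-mean-square of a sequence of per-sample errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


def implied_baseline(improved, reduction):
    """Baseline error implied by an improved error and a fractional reduction.

    Assumption: `reduction` is defined as (baseline - improved) / baseline.
    """
    return improved / (1.0 - reduction)


# Values quoted in the abstract.
deepdso_rmse = 0.0624
reported_reduction = 0.1335

# Hypothetical back-of-the-envelope baseline, NOT a reported result.
baseline = implied_baseline(deepdso_rmse, reported_reduction)
print(f"implied baseline RMSE under the stated assumption: {baseline:.4f}")
print(f"round-trip reduction: {relative_reduction(baseline, deepdso_rmse):.4f}")
```

The round trip through `relative_reduction` only checks that the two definitions are mutually consistent; the per-sequence results in the thesis remain the authoritative source for the actual DSO error.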