Context-Adaptable Deployment of FastSLAM 2.0 on Graphic Processing Unit with Unknown Data Association
Metadata
Publisher
MDPI
Subject
FastSLAM 2.0; CUDA; GPGPU
Date
2024-12-09
Bibliographic reference
Giovagnola, J. & Pegalajar Cuéllar, M. & Morales Santos, D.P. Appl. Sci. 2024, 14, 11466. [https://doi.org/10.3390/app142311466]
Sponsor
German Federal Ministry of Education and Research (BMBF) under grant number 16ME0097 (ZuSE KI-mobil)
Abstract
Simultaneous Localization and Mapping (SLAM) algorithms are crucial for enabling
agents to estimate their position in unknown environments. In autonomous navigation systems,
these algorithms need to operate in real-time on devices with limited resources, emphasizing the
importance of reducing complexity and ensuring efficient performance. While SLAM solutions
aim at ensuring accurate and timely localization and mapping, one of their main limitations is
their computational complexity. In this scenario, particle filter-based approaches such as FastSLAM
2.0 can significantly benefit from parallel programming due to their modular construction. The
parallelization process involves identifying the parameters affecting the computational complexity in
order to distribute the computation among single multiprocessors as efficiently as possible. However,
the computational complexity of methodologies such as FastSLAM 2.0 can depend on multiple
parameters whose values may, in turn, depend on each specific use case scenario (i.e., the context),
leading to multiple possible parallelization designs. Furthermore, the features of the hardware
architecture in use can significantly influence the performance in terms of latency. Therefore, the
selection of the optimal parallelization modality still needs to be empirically determined. This may
involve redesigning the parallel algorithm depending on the context and the hardware architecture.
In this paper, we propose a CUDA-based adaptable design for FastSLAM 2.0 on GPU, in combination
with an evaluation methodology that enables the assessment of the optimal parallelization modality
based on the context and the hardware architecture without the need for the creation of separate
designs. The proposed implementation includes the parallelization of all the functional blocks of the
FastSLAM 2.0 pipeline. Additionally, we contribute a parallelized design of the data association step
through the Joint Compatibility Branch and Bound (JCBB) method. Multiple resampling algorithms
are also included to accommodate the needs of a wide variety of navigation scenarios.