Minimizing Cuckoo Hashing Insertion Time for Networking Applications in FPGAs
Metadata
Show full item recordEditorial
IEEE
Date
2025Referencia bibliográfica
Megías, C., Forencich, A., Ros, E., & Díaz, J. (2025). Minimizing Cuckoo Hashing Insertion Time for Networking Applications in FPGAs. IEEE Access, 13, 216832-216841.
Sponsorship
Grant PID2021-123930OB-C22 funded by MCIN/AEI/10.13039/501100011033/; ERDF, EU (FEDER) ‘‘A way of making Europe’’; INTARE project under Grant TED2021-131466B-I00; EU DAIS project within the ECSEL Calls under Grant 101007273-2 and Grant PCI2021-121967; Spanish Formación de Profesorado Universitario Ph.D. Program under Grant FPU20/01857; University of Granada through the Contratos Puente Postdoctoral Program under Grant CP-2025-37Abstract
Cuckoo Hashing is a well-known scheme for maintaining a hash-based data structure with
worst-case constant search time. The search can be easily pipelined in FPGAs to obtain a response every
clock cycle. However, inserting new elements may require multiple iterations to reallocate elements due to
collisions, a challenge that intensifies with increasing table occupancy (load). Here, we pursue an efficient
implementation of Cuckoo Hashing in FPGAs that exploits the inherent parallel processing capabilities
of the technology to minimize the insertion time and enhance the performance of real-time applications
like communications networks. Our analysis shows that the candidate implementation alternatives can be
grouped into four different categories, considering the number of iterations required to perform new element
insertions relative to the hash table’s load. Within each category, the performance is similar, providing
flexibility for implementation. Among these methods, one exhibits the best results under all load conditions,
without incurring a large complexity penalty, reducing the number of iterations for the insertion by more
than 60% for loads beyond 95% compared to the most recent works. A Cuckoo Hashing architecture for
networking applications is presented and used to evaluate and verify all insertion methods through hardware
evaluation. Our implementation improves on existing architectures by at least 0.2 operations per clock
cycle.





