dc.contributor.author: Nuñez Andrade, Emilio
dc.contributor.author: Vidal-Daza, Isaac
dc.contributor.author: Ryan, James W.
dc.contributor.author: Gómez-Bombarelli, Rafael
dc.contributor.author: Martin Martinez, Francisco J.
dc.date.accessioned: 2025-02-27T07:45:46Z
dc.date.available: 2025-02-27T07:45:46Z
dc.date.issued: 2025-02-03
dc.identifier.citation: Nuñez Andrade, Emilio; Vidal-Daza, Isaac; Ryan, James W.; Gómez-Bombarelli, Rafael; Martin Martinez, Francisco J. Embedded machine-readable molecular representation for resource-efficient deep learning applications. Digital Discovery, 2025, Advance Article [es_ES]
dc.identifier.uri: https://hdl.handle.net/10481/102754
dc.description.abstract: The practical implementation of deep learning methods for chemistry applications relies on encoding chemical structures into machine-readable formats that can be efficiently processed by computational tools. To this end, One Hot Encoding (OHE) is an established representation of alphanumeric categorical data in expanded numerical matrices. We have developed an embedded alternative to OHE that encodes discrete alphanumeric tokens of an N-sized alphabet into a few real numbers that constitute a simpler matrix representation of chemical structures. The implementation of this embedded One Hot Encoding (eOHE) in training machine learning models achieves comparable results to OHE in model accuracy and robustness while significantly reducing the use of computational resources. Our benchmarks across three molecular representations (SMILES, DeepSMILES, and SELFIES) and three different molecular databases (ZINC, QM9, and GDB-13) for Variational Autoencoders (VAEs) and Recurrent Neural Networks (RNNs) show that using eOHE reduces vRAM memory usage by up to 50% while increasing disk Memory Reduction Efficiency (MRE) to 80% on average. This encoding method opens up new avenues for data representation in embedded formats that promote energy efficiency and scalable computing in resource-constrained devices or in scenarios with limited computing resources. The application of eOHE impacts not only the chemistry field but also other disciplines that rely on the use of OHE. [es_ES]
dc.description.sponsorship: Engineering and Physical Sciences Research Council (EPSRC) Program Grant EP/T028513/1 “Application Targeted and Integrated Photovoltaics” [es_ES]
dc.description.sponsorship: Royal Society of Chemistry (RSC) Enablement Grant (E21-7051491439) [es_ES]
dc.description.sponsorship: RSC Enablement Grant (E21-8254227705) [es_ES]
dc.description.sponsorship: Google Cloud Research Credits program with the award GCP19980904 [es_ES]
dc.description.sponsorship: EPSRC PhD scholarship ref. 2602452 [es_ES]
dc.description.sponsorship: Consejo Nacional de Humanidades, Ciencias y Tecnologías (CONAHCYT) PhD scholarship ref. 809702 [es_ES]
dc.language.iso: eng [es_ES]
dc.publisher: Royal Society of Chemistry [es_ES]
dc.rights: Attribution 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [*]
dc.title: Embedded machine-readable molecular representation for resource-efficient deep learning applications [es_ES]
dc.type: journal article [es_ES]
dc.rights.accessRights: open access [es_ES]
dc.identifier.doi: 10.1039/d4dd00230j
dc.type.hasVersion: VoR [es_ES]
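
As a rough illustration of the encoding idea described in the abstract, the Python sketch below contrasts standard one-hot encoding with a compact real-valued alternative. It is not the authors' eOHE implementation: the grouping scheme, the function names (`one_hot`, `compact_encode`, `compact_decode`), the `group_size` parameter, and the character-level tokenization of the SMILES string are all assumptions made for illustration only.

```python
import math


def build_alphabet(strings):
    """Sorted list of distinct characters (character-level tokens are a simplification)."""
    return sorted(set("".join(strings)))


def one_hot(text, alphabet):
    """Standard OHE: one row per token, one column per alphabet symbol."""
    index = {tok: i for i, tok in enumerate(alphabet)}
    rows = []
    for ch in text:
        row = [0.0] * len(alphabet)
        row[index[ch]] = 1.0
        rows.append(row)
    return rows


def compact_encode(text, alphabet, group_size):
    """Hypothetical compact scheme (NOT the published eOHE).

    The alphabet is split into groups of `group_size` symbols. Each token
    activates the column of its group and stores its position inside the
    group as a real number in (0, 1].
    """
    index = {tok: i for i, tok in enumerate(alphabet)}
    width = math.ceil(len(alphabet) / group_size)
    rows = []
    for ch in text:
        i = index[ch]
        row = [0.0] * width
        row[i // group_size] = (i % group_size + 1) / group_size
        rows.append(row)
    return rows


def compact_decode(rows, alphabet, group_size):
    """Invert compact_encode: find the active column, then the offset in its group."""
    chars = []
    for row in rows:
        col = max(range(len(row)), key=lambda c: row[c])
        offset = round(row[col] * group_size) - 1
        chars.append(alphabet[col * group_size + offset])
    return "".join(chars)


smiles = "CC(=O)O"  # acetic acid
alphabet = build_alphabet([smiles])
ohe = one_hot(smiles, alphabet)
compact = compact_encode(smiles, alphabet, group_size=2)
restored = compact_decode(compact, alphabet, group_size=2)
```

With this five-symbol alphabet, each one-hot row has five columns while each compact row has three, and decoding recovers the original string. Larger alphabets and larger `group_size` values shrink the matrix further, at the cost of packing more symbols into each real-valued column.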


Except where otherwise noted, this item's license is described as Attribution 4.0 International