| dc.contributor.author | Suárez-Martín, Ignacio | |
| dc.contributor.author | Risso, Valeria Alejandra | |
| dc.contributor.author | Romero-Zaliz, Rocío | |
| dc.contributor.author | Sánchez Ruiz, José Manuel | |
| dc.date.accessioned | 2025-07-01T10:18:37Z | |
| dc.date.available | 2025-07-01T10:18:37Z | |
| dc.date.issued | 2025-05-15 | |
| dc.identifier.citation | Suárez-Martín, I.; Risso, V.A.; Romero-Zaliz, R.; Sanchez-Ruiz, J.M. Efficient Searches in Protein Sequence Space Through AI-Driven Iterative Learning. Int. J. Mol. Sci. 2025, 26, 4741. [DOI: 10.3390/ijms26104741] | es_ES |
| dc.identifier.uri | https://hdl.handle.net/10481/105012 | |
| dc.description | This research was funded by grant IHRC22/00004 (to J.M.S.-R.) funded by the “Instituto de Salud Carlos III (ISCIII)” and Next-Generation EU, grant PID2021-124534OB-100 (to J.M.S.-R.) funded by MICIU/AEI/10.13039/501100011033 and by “ERDF/EU”, and grant PID20210125017OBI00 (to R.R.-Z.) funded by MCIN/AEI/10.13039/501100011033. This publication is part of the Project “Ethical, Responsible and General Purpose Artificial Intelligence: Applications In Risk Scenarios” (IAFER) Exp.:TSI-100927-2023-1 funded through the Creation of university-industry research programs (Enia Programs), aimed at the research and development of artificial intelligence, for its dissemination and education within the framework of the Recovery, Transformation and Resilience Plan from the European Union Next Generation EU through the Ministry for Digital Transformation and the Civil Service. | es_ES |
| dc.description.abstract | The protein sequence space is vast. This fact, together with the prevalence of epistasis, hampers the engineering of novel enzymes through library screening and is a major obstacle to any attempt to predict natural protein evolution. Recently, specialized methodologies have been used to determine fitness data on ~260,000 sequences for the gene of the enzyme dihydrofolate reductase and antibody affinity data for all combinations of the mutations present in the receptor-binding domain (RBD) of the Omicron strain of SARS-CoV-2 (~30,000 variants). We show that upon iterative training on a total of just a few hundred variants, various state-of-the-art AI tools (multi-layer perceptron, random forest, and XGBoost algorithms) find very high fitness variants of the enzyme and predict the antibody evasion patterns of the RBD. This work provides a basis for efficient, widely applicable, low-throughput experimental approaches to assess viral protein evolution and to engineer enzymes for biotechnological applications. | es_ES |
| dc.description.sponsorship | Instituto de Salud Carlos III (IHRC22/00004) | es_ES |
| dc.description.sponsorship | Next-Generation EU | es_ES |
| dc.description.sponsorship | MICIU/AEI/10.13039/501100011033 (PID2021-124534OB-100, PID2021-0125017OB-I00) | es_ES |
| dc.description.sponsorship | Enia Programs | es_ES |
| dc.language.iso | eng | es_ES |
| dc.publisher | MDPI | es_ES |
| dc.rights | Attribution 4.0 International | * |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | * |
| dc.subject | Enzyme engineering | es_ES |
| dc.subject | Viral protein evolution | es_ES |
| dc.subject | Focused library screening | es_ES |
| dc.title | Efficient Searches in Protein Sequence Space Through AI-Driven Iterative Learning | es_ES |
| dc.type | journal article | es_ES |
| dc.rights.accessRights | open access | es_ES |
| dc.identifier.doi | 10.3390/ijms26104741 | |
| dc.type.hasVersion | VoR | es_ES |