Efficient Searches in Protein Sequence Space Through AI-Driven Iterative Learning

Suárez-Martín, Ignacio; Risso, Valeria Alejandra; Romero-Zaliz, Rocío; Sánchez Ruiz, José Manuel

doi:10.3390/ijms26104741

dc.contributor.author	Suárez-Martín, Ignacio
dc.contributor.author	Risso, Valeria Alejandra
dc.contributor.author	Romero-Zaliz, Rocío
dc.contributor.author	Sánchez Ruiz, José Manuel
dc.date.accessioned	2025-07-01T10:18:37Z
dc.date.available	2025-07-01T10:18:37Z
dc.date.issued	2025-05-15
dc.identifier.citation	Suárez-Martín, I.; Risso, V.A.; Romero-Zaliz, R.; Sanchez-Ruiz, J.M. Efficient Searches in Protein Sequence Space Through AI-Driven Iterative Learning. Int. J. Mol. Sci. 2025, 26, 4741. [DOI: 10.3390/ijms26104741]	es_ES
dc.identifier.uri	https://hdl.handle.net/10481/105012
dc.description	This research was funded by grant IHRC22/00004 (to J.M.S.-R.) funded by the “Instituto de Salud Carlos III (ISCIII)” and Next-Generation EU, grant PID2021-124534OB-100 (to J.M.S.-R.) funded by MICIU/AEI/10.13039/501100011033 and by “ERDF/EU”, and grant PID20210125017OBI00 (to R.R.-Z.) funded by MCIN/AEI/10.13039/501100011033. This publication is part of the Project “Ethical, Responsible and General Purpose Artificial Intelligence: Applications In Risk Scenarios” (IAFER), Exp.: TSI-100927-2023-1, funded through the Creation of university-industry research programs (Enia Programs), aimed at the research and development of artificial intelligence and its dissemination and education, within the framework of the Recovery, Transformation and Resilience Plan of the European Union (Next Generation EU), through the Ministry for Digital Transformation and the Civil Service.	es_ES
dc.description.abstract	The protein sequence space is vast. This fact, together with the prevalence of epistasis, hampers the engineering of novel enzymes through library screening and is a major obstacle to any attempt to predict natural protein evolution. Recently, specialized methodologies have been used to determine fitness data on ~260,000 sequences for the gene of the enzyme dihydrofolate reductase and antibody affinity data for all combinations of the mutations present in the receptor-binding domain (RBD) of the Omicron strain of SARS-CoV-2 (~30,000 variants). We show that upon iterative training on a total of just a few hundred variants, various state-of-the-art AI tools (multi-layer perceptron, random forest, and XGBoost algorithms) find very high fitness variants of the enzyme and predict the antibody evasion patterns of the RBD. This work provides a basis for efficient, widely applicable, low-throughput experimental approaches to assess viral protein evolution and to engineer enzymes for biotechnological applications.	es_ES
dc.description.sponsorship	Instituto de Salud Carlos III (IHRC22/00004)	es_ES
dc.description.sponsorship	Next-Generation EU	es_ES
dc.description.sponsorship	MICIU/AEI/10.13039/501100011033 (PID2021-124534OB-100, PID2021-0125017OB-I00)	es_ES
dc.description.sponsorship	Enia Programs	es_ES
dc.language.iso	eng	es_ES
dc.publisher	MDPI	es_ES
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	*
dc.subject	Enzyme engineering	es_ES
dc.subject	Viral protein evolution	es_ES
dc.subject	Focused library screening	es_ES
dc.title	Efficient Searches in Protein Sequence Space Through AI-Driven Iterative Learning	es_ES
dc.type	journal article	es_ES
dc.rights.accessRights	open access	es_ES
dc.identifier.doi	10.3390/ijms26104741
dc.type.hasVersion	VoR	es_ES

Files in this item

Name: ijms-26-04741.pdf
Size: 2.148 MB
Format: PDF

This item appears in the following Collection(s)

OpenAIRE (Open Access Infrastructure for Research in Europe)
Publications funded by Framework Programme 7, Horizon 2020, Horizon Europe... of the European Research Council of the European Union, within the framework of the OpenAIRE Project, which promotes open access in Europe.

Except where otherwise noted, this item's license is described as Attribution 4.0 International