Efficient Searches in Protein Sequence Space Through AI-Driven Iterative Learning
Metadata
Author
Suárez-Martín, Ignacio; Risso, Valeria Alejandra; Romero-Zaliz, Rocío; Sánchez Ruiz, José Manuel
Publisher
MDPI
Subject
Enzyme engineering; Viral protein evolution; Focused library screening
Date
2025-05-15
Bibliographic reference
Suárez-Martín, I.; Risso, V.A.; Romero-Zaliz, R.; Sanchez-Ruiz, J.M. Efficient Searches in Protein Sequence Space Through AI-Driven Iterative Learning. Int. J. Mol. Sci. 2025, 26, 4741. [DOI: 10.3390/ijms26104741]
Sponsor
Instituto de Salud Carlos III (IHRC22/00004); Next-Generation EU; MICIU/AEI/10.13039/501100011033 (PID2021-124534OB-100, PID2021-0125017OB-I00); Enia Programs
Abstract
The protein sequence space is vast. This fact, together with the prevalence of epistasis, hampers the engineering of novel enzymes through library screening and is a major obstacle to any attempt to predict natural protein evolution. Recently, specialized methodologies have been used to determine fitness data on ~260,000 sequences for the gene of the enzyme dihydrofolate reductase and antibody affinity data for all combinations of the mutations present in the receptor-binding domain (RBD) of the Omicron strain of SARS-CoV-2 (~30,000 variants). We show that upon iterative training on a total of just a few hundred variants, various state-of-the-art AI tools (multi-layer perceptron, random forest, and XGBoost algorithms) find very high fitness variants of the enzyme and predict the antibody evasion patterns of the RBD. This work provides a basis for efficient, widely applicable, low-throughput experimental approaches to assess viral protein evolution and to engineer enzymes for biotechnological applications.
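
The iterative strategy summarized in the abstract (measure a small batch, train a model, let the model choose the next batch, repeat) can be illustrated with a minimal sketch. It is not the authors' implementation. The fitness landscape is synthetic toy data rather than the dihydrofolate reductase or RBD datasets, and a simple additive per-site model stands in for the multi-layer perceptron, random forest, and XGBoost models named above. All function names and parameter values (`make_landscape`, `fit_additive`, `iterative_search`, batch sizes, seeds) are hypothetical.

```python
import itertools
import random


def make_landscape(n_sites, seed=0):
    """Build a synthetic fitness table over all binary variants (toy data)."""
    rng = random.Random(seed)
    site_effect = [rng.uniform(-1.0, 1.0) for _ in range(n_sites)]
    # Pairwise terms between neighbouring sites mimic epistasis.
    pair_effect = {(i, i + 1): rng.uniform(-1.0, 1.0) for i in range(n_sites - 1)}
    landscape = {}
    for variant in itertools.product((0, 1), repeat=n_sites):
        fitness = sum(e for e, bit in zip(site_effect, variant) if bit)
        fitness += sum(w for (i, j), w in pair_effect.items()
                       if variant[i] and variant[j])
        landscape[variant] = fitness
    return landscape


def fit_additive(measured, n_sites):
    """Stand-in surrogate: per-site difference in mean measured fitness."""
    effects = []
    for i in range(n_sites):
        with_mut = [f for v, f in measured.items() if v[i]]
        without = [f for v, f in measured.items() if not v[i]]
        if with_mut and without:
            effects.append(sum(with_mut) / len(with_mut)
                           - sum(without) / len(without))
        else:
            effects.append(0.0)  # no contrast observed yet for this site
    return effects


def predict(effects, variant):
    """Score a variant by summing the estimated effects of its mutations."""
    return sum(e for e, bit in zip(effects, variant) if bit)


def iterative_search(landscape, n_sites, n_initial=20, batch=20, rounds=4, seed=1):
    """Alternate between fitting the surrogate and measuring its top picks."""
    rng = random.Random(seed)
    all_variants = sorted(landscape)
    # Round 0: a random starting sample of "measured" variants.
    measured = {v: landscape[v] for v in rng.sample(all_variants, n_initial)}
    for _ in range(rounds):
        effects = fit_additive(measured, n_sites)
        candidates = [v for v in all_variants if v not in measured]
        candidates.sort(key=lambda v: predict(effects, v), reverse=True)
        for v in candidates[:batch]:
            # The table lookup stands in for an experimental measurement.
            measured[v] = landscape[v]
    return measured


landscape = make_landscape(10)
measured = iterative_search(landscape, 10)
print("variants measured:", len(measured), "of", len(landscape))
print("best fitness found:", max(measured.values()))
print("best fitness overall:", max(landscape.values()))
```

With these assumed settings the loop measures 100 of the 1,024 possible variants. Replacing `fit_additive` and `predict` with a nonlinear regressor would be the natural next step when epistasis is strong, since an additive model cannot represent interactions between sites.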





