Protein Fold Recognition from Sequences using Convolutional and Recurrent Neural Networks

Villegas Morcillo, Amelia Otilia; Gómez García, Ángel Manuel; Morales Cordovilla, Juan Andrés; Sánchez Calle, Victoria Eugenia

protein fold recognition; deep learning; convolutional neural networks; recurrent neural networks; embedding learning; random forests

Identifying the fold type of a protein from its amino acid sequence provides important insights into the protein's 3D structure. In this paper, we propose a deep learning architecture that processes residue-level protein features to address the protein fold recognition task. Our neural network model combines 1D-convolutional layers with gated recurrent unit (GRU) layers. The GRU cells, as recurrent layers, cope with the processing issues associated with highly variable protein sequence lengths, and so extract a fold-related embedding of fixed size for each protein domain. These embeddings are then used to perform the pairwise fold recognition task, which is based on transferring the fold type of the most similar template structure. We compare our model with several state-of-the-art template-based and deep learning-based methods. The evaluation results on the well-known LINDAHL and SCOP_TEST sets, along with a proposed LINDAHL test set updated to SCOP 1.75, show that our embeddings perform significantly better than these methods, especially at the fold level. Supplementary material, source code and trained models are available at http://sigmat.ugr.es/~amelia/CNN-GRU-RF+/.

2022-02-08T09:05:00Z
2022-02-08T09:05:00Z
2021-12-08
info:eu-repo/semantics/article
http://hdl.handle.net/10481/72712
10.1109/TCBB.2020.3012732
eng
http://creativecommons.org/licenses/by-nc-nd/3.0/es/
info:eu-repo/semantics/openAccess
Attribution-NonCommercial-NoDerivs 3.0 Spain
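
The pairwise fold recognition step described in the abstract, in which a query domain receives the fold type of its most similar template, can be illustrated with a minimal sketch. The sketch makes several assumptions that the record does not confirm: cosine similarity is used as the similarity score (the keywords mention random forests, so the authors' actual scoring may well differ), the function names `cosine_similarity` and `transfer_fold` are hypothetical, and the three-dimensional embeddings and fold labels are toy values invented for illustration, not outputs of the CNN-GRU model.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def transfer_fold(query_embedding, templates):
    """Return (fold_label, score) of the template most similar to the query.

    `templates` is a list of (fold_label, embedding) pairs; all embeddings
    are assumed to share the same fixed size as the query embedding.
    """
    if not templates:
        raise ValueError("at least one template is required")
    best_label, best_score = None, -math.inf
    for label, embedding in templates:
        score = cosine_similarity(query_embedding, embedding)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy fixed-size embeddings (hypothetical values, one per template domain).
templates = [
    ("fold_a", [1.0, 0.0, 0.0]),
    ("fold_b", [0.0, 1.0, 0.0]),
    ("fold_c", [0.0, 0.0, 1.0]),
]
query = [0.1, 0.9, 0.2]

predicted_fold, score = transfer_fold(query, templates)
print(predicted_fold, round(score, 3))
```

Because every domain is mapped to an embedding of the same size regardless of sequence length, the comparison reduces to a simple vector similarity search over the template set.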