An effective, practical and low computational cost framework for the integration of heterogeneous data to predict functional associations between proteins by means of Artificial Neural Networks

Pérez Florido, Javier; Pomares Cintas, Héctor Emilio; Rojas Ruiz, Ignacio; Guillén Perales, Alberto; Ortuño Guzmán, Francisco Manuel; Urquiza Ortiz, José Miguel

doi:10.1016/j.neucom.2012.11.040


Identifiers

URI: https://hdl.handle.net/10481/101411

Publisher

Elsevier

Subject

Data integration

Systems biology

Multilayer Perceptrons

Functional linkage network

Data distribution

Date

2013-12-09

Citation

Florido, J.P., Pomares, H., Rojas, I., Guillén, A., Ortuno, F.M. and Urquiza, J.M., 2013. An effective, practical and low computational cost framework for the integration of heterogeneous data to predict functional associations between proteins by means of Artificial Neural Networks. Neurocomputing, 121, pp.64-78.

Sponsorship

This work was supported in part by the Spanish project SAF2010-20558 and the Junta de Andalucía project P09-TIC-175476.

Abstract

Nowadays, uncovering new functional relationships between proteins is one of the major goals of biological studies. For this task, integrating evidence from heterogeneous data sources by means of machine learning methodologies has been shown to be an effective way of providing a complete genome-wide functional network and more accurate inferences of new functional associations. This work presents a new framework for using Artificial Neural Networks (ANNs) to predict functional relationships between proteins through the integration of evidence from heterogeneous data sources. The development of this methodology is motivated by the problems that arise when applying ANNs to this kind of problem, namely the computational cost of the ANN optimization process caused by the nature of the data (a large number of instances and high dimensionality). The method selects smaller, representative (non-random) subsets of the original data set for the ANN optimization process, which reduces the amount of data to be trained on and, consequently, the computational cost. Moreover, because the subsets are not only smaller but also representative of the original data set, the method (i) avoids repeating the optimization process several times with different random subsets of data, which is the usual way to obtain a reliable and fair evaluation of an ANN's prediction accuracy, and (ii) benefits the learning procedure by reducing overfitting, thereby improving prediction ability.
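The abstract does not say how the framework decides that a subset is "representative", so the sketch below is only an illustration of the general idea of replacing repeated random subsampling with a single deterministic, non-random subset. It swaps in a simple, assumed technique (class-stratified systematic sampling along a 1-D summary score) and is not the authors' method. The function name `representative_subset`, the `fraction` parameter and the "mean feature value" score are all hypothetical choices made for this example.

```python
from collections import defaultdict
from typing import Sequence


def representative_subset(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    fraction: float,
) -> list[int]:
    """Hypothetical sketch: pick a deterministic subset of instance indices.

    Assumption (not from the paper): "representative" is approximated by
    (a) keeping the class proportions of the full data set and
    (b) spreading the picks evenly across each class, ordered by a
        simple 1-D summary score (the mean of the instance's features).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Group instance indices by class label (e.g. linked vs. unlinked protein pairs).
    by_class: dict[int, list[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    selected: list[int] = []
    for _, idx in sorted(by_class.items()):
        # Order this class's instances by the hypothetical summary score.
        idx.sort(key=lambda i: sum(features[i]) / len(features[i]))
        # Keep the same fraction of every class, at least one instance.
        n_keep = max(1, round(len(idx) * fraction))
        # Systematic sampling: evenly spaced positions cover the whole range,
        # so the result is identical on every run (no random seed involved).
        step = len(idx) / n_keep
        selected.extend(idx[int(k * step)] for k in range(n_keep))
    return sorted(selected)


if __name__ == "__main__":
    X = [[float(i)] for i in range(10)]
    y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
    print(representative_subset(X, y, 0.5))  # [0, 2, 4, 6, 8]
```

Because the selection is deterministic, training and evaluating on this one subset replaces averaging over many random draws, which mirrors point (i) of the abstract. A real implementation would need a representativeness criterion suited to high-dimensional, heterogeneous protein-pair features rather than a single averaged score.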

Collections

DICAR - Artículos

Except where otherwise noted, this item's license is described as Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License