An effective, practical and low computational cost framework for the integration of heterogeneous data to predict functional associations between proteins by means of Artificial Neural Networks
Metadata
Author
Pérez Florido, Javier; Pomares Cintas, Héctor Emilio; Rojas Ruiz, Ignacio; Guillén Perales, Alberto; Ortuño Guzmán, Francisco Manuel; Urquiza Ortiz, José Miguel
Publisher
Elsevier
Subject
Data integration; Systems biology; Multilayer Perceptrons; Functional linkage network; Data distribution
Date
2013-12-09
Bibliographic reference
Florido, J.P., Pomares, H., Rojas, I., Guillén, A., Ortuno, F.M. and Urquiza, J.M., 2013. An effective, practical and low computational cost framework for the integration of heterogeneous data to predict functional associations between proteins by means of Artificial Neural Networks. Neurocomputing, 121, pp.64-78.
Sponsorship
This work was supported in part by the Spanish Project SAF2010-20558 and Junta de Andalucia Project P09-TIC-175476.
Abstract
Uncovering new functional relationships between proteins is one of the major goals of biological studies. For this task, integrating evidence from heterogeneous data sources by means of machine learning has been shown to be an effective way of providing a complete genome-wide functional network and more accurate inferences of new functional associations. This work presents a new framework for use with Artificial Neural Networks (ANNs) to predict functional relationships between proteins through the integration of evidence from heterogeneous data sources. The development of this methodology is motivated by the problems that arise when applying ANNs to this kind of task, namely the computational cost of the ANN optimization process due to the nature of the data (a large number of instances and high dimensionality). The method selects smaller, representative (non-random) subsets from the original data set for the ANN optimization process, which reduces the amount of training data and, consequently, the computational cost. Moreover, because the subsets are not only smaller but also representative of the original data set, the method (i) avoids repeating the optimization process several times with different random subsets of data, a procedure commonly used to obtain a reliable and fair evaluation of an ANN's prediction accuracy, and (ii) benefits the learning procedure by reducing overfitting, thereby improving prediction ability.