An effective, practical and low computational cost framework for the integration of heterogeneous data to predict functional associations between proteins by means of Artificial Neural Networks
Authors: Pérez Florido, Javier; Pomares Cintas, Héctor Emilio; Rojas Ruiz, Ignacio; Guillén Perales, Alberto; Ortuño Guzmán, Francisco Manuel; Urquiza Ortiz, José Miguel
Keywords: Data integration; Systems biology; Multilayer Perceptrons; Functional linkage network; Data distribution
Abstract: Uncovering new functional relationships between proteins is one of the major goals of biological studies. For this task, integrating evidence from heterogeneous data sources by means of machine learning has been shown to be an effective way of providing a complete genome-wide functional network and more accurate inferences of new functional associations. This work presents a new framework for use with Artificial Neural Networks (ANNs) in predicting functional relationships between proteins through the integration of evidence from heterogeneous data sources. The development of this methodology is motivated by the problems that arise when applying ANNs to this kind of task, namely the computational cost of the ANN optimization process, which stems from the nature of the data (a large number of instances and high dimensionality). The method selects smaller, representative (non-random) subsets of the original data set for the ANN optimization process, which reduces the amount of training data and, consequently, the computational cost.
Moreover, because the subsets are not only smaller but also representative of the original data set, the method (i) avoids having to repeat the optimization process several times with different random subsets of data, a practice commonly used to obtain a reliable and fair evaluation of an ANN's prediction accuracy, and (ii) benefits the learning procedure by reducing overfitting, thereby improving prediction ability.
Record timestamps: 2025-01-31T07:38:37Z, 2025-01-31T07:38:37Z
Date: 2013-12-09
Type: journal article
Citation: Florido, J.P., Pomares, H., Rojas, I., Guillén, A., Ortuno, F.M. and Urquiza, J.M., 2013. An effective, practical and low computational cost framework for the integration of heterogeneous data to predict functional associations between proteins by means of Artificial Neural Networks. Neurocomputing, 121, pp.64-78.
Handle: https://hdl.handle.net/10481/101411
DOI: 10.1016/j.neucom.2012.11.040
Language: eng
Volume and pages: 121, 64-78
Rights: Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License (http://creativecommons.org/licenses/by-nc-nd/3.0/)
Access: embargoed access
Publisher: Elsevier
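The abstract describes selecting smaller, representative, non-random subsets for ANN optimization but does not say how they are chosen. As an illustration only, the sketch below uses a swapped-in technique, greedy farthest-point (max-min) sampling applied separately to each class, to show what a deterministic, distribution-covering subset could look like. The function names `farthest_point_subset` and `representative_subset`, the `fraction` parameter, and the choice of Euclidean distance are all assumptions made here and are not taken from the paper.

```python
import numpy as np


def farthest_point_subset(X, k):
    """Return k distinct row indices of X chosen by greedy max-min sampling.

    Hypothetical helper: this is NOT the selection rule from the paper,
    only one simple deterministic way to cover the data distribution.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 0 < k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of rows")
    # Deterministic start: the point closest to the centroid.
    first = int(np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1)))
    chosen = [first]
    # Distance from each point to its nearest already-chosen point.
    min_dist = np.linalg.norm(X - X[first], axis=1)
    min_dist[first] = -1.0  # mark as used so it is never picked again
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))  # farthest from the current subset
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
        min_dist[nxt] = -1.0
    return np.array(chosen)


def representative_subset(X, y, fraction):
    """Pick the same fraction of each class, so class proportions are kept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    picked = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        k = max(1, int(round(fraction * idx.size)))
        picked.append(idx[farthest_point_subset(X[idx], k)])
    return np.sort(np.concatenate(picked))
```

Calling `representative_subset(X, y, 0.1)` on a feature matrix `X` and label vector `y` returns the indices of roughly 10% of each class. Because the selection is deterministic, repeated calls return the same subset, which is the property the abstract relies on to avoid repeating the optimization over several random subsets.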