| dc.contributor.author | Galpert, Deborah | |
| dc.contributor.author | Del Río, Sara | |
| dc.contributor.author | Herrera Triguero, Francisco | |
| dc.date.accessioned | 2020-12-17T07:42:16Z | |
| dc.date.available | 2020-12-17T07:42:16Z | |
| dc.date.issued | 2015 | |
| dc.identifier.citation | Deborah Galpert, Sara del Río, Francisco Herrera, Evys Ancede-Gallardo, Agostinho Antunes, Guillermin Agüero-Chapin, "An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species", BioMed Research International, vol. 2015, Article ID 748681, 12 pages, 2015. https://doi.org/10.1155/2015/748681 | es_ES |
| dc.identifier.uri | http://hdl.handle.net/10481/64962 | |
| dc.description.abstract | Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) are combined in a supervised pairwise ortholog detection approach to improve effectiveness considering low ortholog ratios in relation to the possible pairwise comparison between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with RBH, RSD, and OMA algorithms by using the following yeast genome pairs: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe as benchmark datasets. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification. | es_ES |
| dc.description.sponsorship | Portuguese Foundation for Science and Technology SFRH/BPD/92978/2013 | es_ES |
| dc.description.sponsorship | European Union (EU) | es_ES |
| dc.description.sponsorship | national funds through FCT PEst-C/MAR/LA0015/2013, PTDC/AAC-AMB/121301/2010, FCOMP-01-0124-FEDER-019490 | es_ES |
| dc.description.sponsorship | Spanish Government TIN2014-57251-P | es_ES |
| dc.description.sponsorship | Regional Andalusian Research P11-TIC-7765, P10-TIC-6858 | es_ES |
| dc.language.iso | eng | es_ES |
| dc.publisher | HINDAWI LTD | es_ES |
| dc.rights | Attribution 3.0 Spain | * |
| dc.rights.uri | http://creativecommons.org/licenses/by/3.0/es/ | * |
| dc.title | An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species | es_ES |
| dc.type | journal article | es_ES |
| dc.rights.accessRights | open access | es_ES |
| dc.identifier.doi | 10.1155/2015/748681 | |