<rdf:RDF xmlns:rdf="http://www.openarchives.org/OAI/2.0/rdf/" xmlns:ow="http://www.ontoweb.org/ontology/1#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ds="http://dspace.org/ds/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/rdf/ http://www.openarchives.org/OAI/2.0/rdf.xsd">
   <ow:Publication rdf:about="oai:digibug.ugr.es:10481/55857">
      <dc:title>Surveying alignment-free features for Ortholog detection in related yeast proteomes by using supervised big data classifiers</dc:title>
      <dc:creator>Galpert, Deborah</dc:creator>
      <dc:creator>Fernández, Alberto</dc:creator>
      <dc:creator>Herrera Triguero, Francisco</dc:creator>
      <dc:creator>Antunes, Agostinho</dc:creator>
      <dc:creator>Molina-Ruiz, Reinaldo</dc:creator>
      <dc:creator>Agüero-Chapin, Guillermin</dc:creator>
      <dc:subject>Ortholog detection</dc:subject>
      <dc:subject>Pairwise protein similarity measures</dc:subject>
      <dc:subject>Big data</dc:subject>
      <dc:subject>Supervised classification</dc:subject>
      <dc:subject>Imbalance data</dc:subject>
      <dc:description>Abstract
Background: The development of new ortholog detection algorithms and the improvement of existing ones are of
major importance in functional genomics. We have previously introduced a successful supervised pairwise ortholog
classification approach implemented in a big data platform that considered several pairwise protein features and the
low ortholog pair ratios found between two annotated proteomes (Galpert, D et al., BioMed Research International,
2015). The supervised models were built and tested using a Saccharomycete yeast benchmark dataset proposed by
Salichos and Rokas (2011). Although several pairwise protein features were combined in a supervised big data approach,
they were all, to some extent, alignment-based, and the proposed algorithms were evaluated on a single test
set. Here, we aim to evaluate the impact of alignment-free features on the performance of supervised models
implemented in the Spark big data platform for pairwise ortholog detection in several related yeast proteomes.
Results: The Spark Random Forest and Decision Trees classifiers with oversampling and undersampling techniques, built
either with only alignment-based similarity measures or with these measures combined with several alignment-free
pairwise protein features, showed the highest classification performance for ortholog detection in three yeast
proteome pairs. Although such supervised approaches outperformed traditional methods, there were no significant
differences between the exclusive use of alignment-based similarity measures and their combination with
alignment-free features, even within the twilight zone of the studied proteomes. Only when alignment-based and
alignment-free features were combined in Spark Decision Trees with imbalance management could a higher success rate
(98.71%) within the twilight zone be achieved for a yeast proteome pair that underwent a whole-genome duplication.
The feature selection study showed that alignment-based features were top-ranked for the best classifiers, while the
runners-up were alignment-free features related to amino acid composition.
Conclusions: The incorporation of alignment-free features in supervised big data models did not significantly improve
ortholog detection in yeast proteomes relative to the classification quality achieved with alignment-based
similarity measures alone. However, the similarity of their classification performance to that of traditional ortholog
detection methods encourages the evaluation of other alignment-free protein pair descriptors in future research.</dc:description>
      <dc:date>2019-05-24T10:28:49Z</dc:date>
      <dc:date>2019-05-24T10:28:49Z</dc:date>
      <dc:date>2018</dc:date>
      <dc:type>journal article</dc:type>
      <dc:identifier>Galpert, D. [et al.]. Surveying alignment-free features for Ortholog detection in related yeast proteomes by using supervised big data classifiers. BMC Bioinformatics (2018) 19:166 https://doi.org/10.1186/s12859-018-2148-8.</dc:identifier>
      <dc:identifier>1471-2105</dc:identifier>
      <dc:identifier>http://hdl.handle.net/10481/55857</dc:identifier>
      <dc:language>eng</dc:language>
      <dc:rights>http://creativecommons.org/licenses/by/3.0/es/</dc:rights>
      <dc:rights>open access</dc:rights>
      <dc:rights>Atribución 3.0 España</dc:rights>
      <dc:publisher>BMC (part of Springer Nature)</dc:publisher>
   </ow:Publication>
</rdf:RDF>