Universidad de Granada Digibug
 


Please use this identifier to cite or link to this item: http://hdl.handle.net/10481/33377

Title: WordCluster: detecting clusters of DNA words and genomic elements
Authors: Hackenberg, Michael
Carpena, Pedro
Bernaola-Galván, Pedro
Barturen, Guillermo
Alganza, Ángel M.
Oliver, José Luis
Issue Date: 2011
Abstract: Background: Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well-established examples include genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way; the detection of clustering often relies on densities and sliding-window approaches or on arbitrarily chosen distance thresholds. Results: We introduce an algorithm to detect clusters of DNA words (k-mers), or of any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method as a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between the inside and the outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. Conclusions: WordCluster seems to predict biologically meaningful clusters of DNA words (k-mers) and genomic entities. The web server implementation is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php and includes additional features such as the detection of co-localization with gene regions and an annotation enrichment tool for functional analysis of overlapped genes.
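The idea summarized in the abstract (group consecutive copies by their distance, then attach a significance to each group) can be illustrated with a minimal sketch. This is not the WordCluster implementation: the function names, the fixed `max_gap` threshold, and the binomial null model used for the p-value are all assumptions made here for illustration, and the published method may choose its threshold and its statistics differently.

```python
from math import comb


def word_positions(sequence, word):
    """Return the start index of every (possibly overlapping) copy of word."""
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        start = sequence.find(word, start + 1)
    return positions


def find_clusters(positions, max_gap):
    """Group sorted positions into runs whose consecutive gaps are <= max_gap.

    max_gap is a hypothetical, user-chosen threshold; runs with a single
    copy are not reported as clusters.
    """
    if not positions:
        return []
    clusters = []
    current = [positions[0]]
    for prev, pos in zip(positions, positions[1:]):
        if pos - prev <= max_gap:
            current.append(pos)
        else:
            if len(current) >= 2:
                clusters.append(current)
            current = [pos]
    if len(current) >= 2:
        clusters.append(current)
    return clusters


def cluster_p_value(n_words, span, p):
    """Probability of at least n_words copies in span positions under an
    assumed null model: given a copy at the first position, every other
    position starts a copy independently with probability p (binomial tail).
    """
    trials = span - 1    # positions after the first copy
    needed = n_words - 1  # further copies required
    below = sum(
        comb(trials, i) * p**i * (1 - p) ** (trials - i)
        for i in range(needed)
    )
    return 1.0 - below
```

In this sketch, `p` would typically be estimated as the number of copies divided by the sequence length, and the span of a cluster is `last - first + 1`. A smaller p-value means the cluster is denser than the assumed null model would predict.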
Sponsorship: The Spanish Government grants BIO2008-01353 to JLO, mobility PR2009-0285 to PC, Spanish Junta de Andalucía grants P07-FQM3163 to PC and P06-FQM1858 to PB are acknowledged. The Spanish 'Juan de la Cierva' grant to MH and Basque Country 'Programa de formación de investigadores del Departamento de Educación, Universidades e Investigación' grant to GB are also acknowledged.
Publisher: BioMed Central
Keywords: Genomic elements
DNA words
Bioinformatics
WordCluster
URI: http://hdl.handle.net/10481/33377
ISSN: 1748-7188
Rights: Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License
Citation: Hackenberg, M.; et al. WordCluster: detecting clusters of DNA words and genomic elements. Algorithms for Molecular Biology, 6: 2 (2011). [http://hdl.handle.net/10481/33377]
Appears in Collections: DG - Artículos

Files in This Item:

File: Hackenberg_WordCluster.pdf
Size: 317.57 kB
Format: Adobe PDF

This item is licensed under a Creative Commons License
