CAFTAN: a tool for fast mapping, and quality assessment of cDNAs
Metadata
Publisher
BMC
Subject
Artificial intelligence
Date
2006-10-25
Bibliographic reference
del Val, C... [et al.]. CAFTAN: a tool for fast mapping, and quality assessment of cDNAs. BMC Bioinformatics 7, 473 (2006). [https://doi.org/10.1186/1471-2105-7-473]
Sponsor
German Federal Ministry of Education and Research 01GR0101, 01GR0420 and 01GR0450
Abstract
Background: The German cDNA Consortium has been cloning full-length cDNAs and has continued with their
exploitation in protein localization experiments and cellular assays. However, the efficient use of large cDNA
resources requires strategies capable of rapidly separating truly useful cDNAs
from biological and experimental noise. To this end we have developed a new high-throughput analysis tool,
CAFTAN, which simplifies these efforts and thus fills the gap between large-scale cDNA collections and their
systematic annotation and application in functional genomics.
Results: CAFTAN is built around the mapping of cDNAs to the genome assembly, and the subsequent analysis
of their genomic context. It uses sequence features such as the presence and type of PolyA signals, inner and flanking
repeats, the GC-content, splice site types, etc. All these features are evaluated in individual tests, whose results are used to classify
cDNAs according to their sequence quality and likelihood of having been generated from fully processed mRNAs.
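As an illustration of what two such individual feature tests might look like, here is a minimal Python sketch (CAFTAN itself is written in Perl). The function names, the two polyadenylation hexamers, and the 50-base search window are assumptions made for this example; the abstract does not state which signals or thresholds CAFTAN actually uses.

```python
from typing import Optional, Tuple

# Two widely used polyadenylation hexamers; this set is an assumption for
# the example, not the list of signals CAFTAN tests for.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 if seq is empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_polya_signal(seq: str, window: int = 50) -> Optional[Tuple[str, int]]:
    """Look for a PolyA signal in the last `window` bases of seq.

    Returns (signal, 0-based offset in seq) for the occurrence closest to
    the 3' end, or None if no signal is found. The window size is a
    hypothetical default.
    """
    seq = seq.upper()
    start = max(0, len(seq) - window)
    tail = seq[start:]
    best = None
    for signal in POLYA_SIGNALS:
        pos = tail.rfind(signal)
        if pos != -1 and (best is None or pos > best[1]):
            best = (signal, pos)
    if best is None:
        return None
    return best[0], start + best[1]
```

Each function returns a single feature value, so a downstream rule could combine the results of many such tests into a quality class.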
Additionally, CAFTAN compares the coordinates of mapped cDNAs with the genomic coordinates of reference
sets from publicly available resources (e.g., VEGA, ENSEMBL). This provides detailed information about overlapping
exons and the structural classification of cDNAs with respect to the reference set of splice variants.
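The coordinate comparison can be pictured with a short Python sketch. It assumes exons are given as 1-based inclusive (start, end) pairs on the same chromosome and strand. The three labels returned by `classify` are hypothetical and far coarser than a real structural classification of splice variants; they are not CAFTAN's categories.

```python
from typing import List, Tuple

# (start, end), 1-based inclusive; same chromosome and strand assumed.
Exon = Tuple[int, int]


def overlap_length(a: Exon, b: Exon) -> int:
    """Number of genomic positions shared by exons a and b (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def overlapping_exons(cdna: List[Exon], reference: List[Exon]) -> List[Tuple[Exon, Exon, int]]:
    """List every (cDNA exon, reference exon, shared bases) pair that overlaps."""
    pairs = []
    for c in cdna:
        for r in reference:
            shared = overlap_length(c, r)
            if shared > 0:
                pairs.append((c, r, shared))
    return pairs


def classify(cdna: List[Exon], reference: List[Exon]) -> str:
    """Coarse, hypothetical classification of a cDNA against one reference transcript."""
    if sorted(cdna) == sorted(reference):
        return "identical"
    if overlapping_exons(cdna, reference):
        return "overlapping"
    return "no_overlap"
```

For example, `classify([(100, 200), (300, 400)], [(100, 200), (300, 450)])` yields `"overlapping"`, because the second exon ends at a different coordinate than its reference counterpart.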
The evaluation of CAFTAN showed that it is able to correctly classify more than 85% of 5950 selected "known
protein-coding" VEGA cDNAs as high-quality multi- or single-exon. It identified as good 80.6% of the single-exon
cDNAs and 85% of the multiple-exon cDNAs.
The program is written in Perl in a modular way, allowing the strategy to be adapted to other tasks such as
EST annotation, or extended by adding new classification rules and new organism databases as they become
available. We think that it is a very useful program for the annotation and study of unfinished genomes.
Conclusion: CAFTAN is a high-throughput sequence analysis tool that performs a fast and reliable quality
prediction of cDNAs. Several thousand cDNAs can be analyzed in a short time, giving the curator/scientist a
first quick overview of the quality and the existing annotation of a set of cDNAs. It supports the
rejection of low-quality cDNAs and helps in the selection of likely novel splice variants and/or completely novel
transcripts for new experiments.