Collaborative text-annotation resource for disease-centered relation extraction from biomedical text
Metadata
Publisher
Elsevier
Subject
Information extraction; Information retrieval; Collaborative annotation; Corpus annotation; Text mining; Relation extraction; Protein–protein interaction; Gene–disease association; Autism; Disease evidence network; Clinical informatics
Date
2009-02-14
Bibliographic reference
C. Cano... [et al.]. Collaborative text-annotation resource for disease-centered relation extraction from biomedical text, Journal of Biomedical Informatics, Volume 42, Issue 5, 2009, Pages 967-977, ISSN 1532-0464, [https://doi.org/10.1016/j.jbi.2009.02.001]
Sponsor
P08-TIC-4299 of J. A., Sevilla and TIN2006-13177 of DGICT, Madrid; Milton Foundation; National Science Foundation under Grant No. 0543480
Abstract
Agglomerating results from studies of individual biological components has shown the potential to produce
biomedical discovery and the promise of therapeutic development. Such knowledge integration
could be tremendously facilitated by automated text mining for relation extraction in the biomedical
literature. Relation extraction systems cannot be developed without substantial datasets annotated
with ground truth for benchmarking and training. The creation of such datasets is hampered by the
absence of a resource for launching a distributed annotation effort, as well as by the lack of a standardized
annotation schema. We have developed an annotation schema and an annotation tool which can
be widely adopted so that the resulting annotated corpora from a multitude of disease studies could be
assembled into a unified benchmark dataset. The contribution of this paper is threefold. First, we provide
an overview of available benchmark corpora and derive a simple annotation schema for specific
binary relation extraction problems such as protein–protein and gene–disease relation extraction.
Second, we present BioNotate: an open-source annotation resource for the distributed creation of a
large corpus. Third, we present and make available the results of a pilot annotation effort of the autism
disease network.