Using SMILES strings for the description of chemical connectivity in the Crystallography Open Database
Metadata
Publisher
Springer Open
Subject
Crystallography Open Database; Open access to scientific data; Crystal structure database; Molecular structure; SMILES; Substructure search
Date
2018
Citation
Quirós Olozábal, M. [et al.]. Using SMILES strings for the description of chemical connectivity in the Crystallography Open Database. J Cheminform (2018) 10:23. https://doi.org/10.1186/s13321-018-0279-6.
Sponsor
The authors are grateful to the Junta de Andalucía (Research Group FQM-195) for financial support of the publication costs of this article.
Abstract
Computer descriptions of chemical molecular connectivity are necessary for searching chemical databases and for
predicting chemical properties from molecular structure. This article reports ongoing work to describe the chemical
connectivity of entries in the Crystallography Open Database (COD) in SMILES format. This collection
of SMILES is publicly available for chemical (substructure) search or for any other purpose on an open-access
basis, as is the COD itself. The conventions that have been followed for the representation of compounds that do
not fit into valence bond theory are outlined for the most frequently encountered cases. The procedure for extracting the
SMILES from the CIF files starts by checking whether the atoms in the asymmetric unit are a chemically acceptable
image of the compound. When they are not (molecule lying on a symmetry element, disorder, polymeric species, etc.),
the previously published cif_molecule program is used in many cases to obtain such an image. The program package
Open Babel is then used to generate SMILES strings from the CIF files (either those taken directly from the COD or those
produced by cif_molecule when applicable). The results are then checked and/or fixed by a human editor, in a
computer-aided task that at present still consumes a great deal of human time. Although the procedure still needs to be
improved to make it more automatic (and hence faster), it has already yielded more than 160,000 curated chemical
structures. The purpose of this article is to announce this work to the chemical community as well
as to spread the use of its results.
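The abstract notes that the SMILES collection can be used for (substructure) search. A full substructure search needs subgraph matching from a cheminformatics toolkit such as RDKit or Open Babel; a common cheap first step, however, is a screen that discards any candidate lacking enough atoms of each element to contain the query. The following is a minimal sketch of only that screen, in plain Python. The regular expressions, the function names `count_elements` and `passes_prescreen`, and the example SMILES are illustrative assumptions, not part of the COD tooling, and implicit hydrogens are deliberately ignored.

```python
import re
from collections import Counter

# Atoms outside brackets: the SMILES "organic subset" (Br/Cl listed first so
# they are not split into B + r or C + l) plus lowercase aromatic atoms.
# Bracket atoms such as [Cu+2] or [nH] are captured whole and parsed below.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnops]")
# Inside a bracket: optional isotope digits, then an element symbol
# (including aromatic forms such as se, as, te and single lowercase letters).
BRACKET_ELEMENT = re.compile(r"\d*([A-Z][a-z]?|se|as|te|[bcnops])")


def count_elements(smiles: str) -> Counter:
    """Count atoms written explicitly in a SMILES string, by element.

    Implicit hydrogens are not counted; the wildcard atom '*' is ignored.
    """
    counts: Counter = Counter()
    for token in ATOM_PATTERN.findall(smiles):
        if token.startswith("["):
            match = BRACKET_ELEMENT.match(token[1:-1])
            if match is None:
                continue  # e.g. [*], which names no element
            symbol = match.group(1)
        else:
            symbol = token
        # Normalise aromatic lowercase symbols (c, n, se) to element symbols.
        counts[symbol[0].upper() + symbol[1:]] += 1
    return counts


def passes_prescreen(query: str, target: str) -> bool:
    """Return False when the target cannot contain the query substructure.

    True only means the target deserves a full subgraph match.
    """
    have = count_elements(target)
    return all(have[element] >= n for element, n in count_elements(query).items())
```

For example, screening a benzene query (`c1ccccc1`) against phenol (`c1ccccc1O`) and pyridine (`c1ccncc1`) keeps phenol and rejects pyridine, whose five aromatic carbons cannot host a six-carbon ring. Because a screen like this never rejects a true match, it can safely shrink the candidate list before the expensive matching step.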
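The conversion procedure in the abstract (check the asymmetric unit, rebuild the molecule with cif_molecule when needed, convert with Open Babel, then hand every result to a human editor) can be sketched as a small driver. This is a hedged illustration, not the authors' code: the `Entry` flags and the `needs_cif_molecule` rule are hypothetical stand-ins for the chemical-acceptability check, and cif_molecule is passed in as a callable because its command-line options are not given here. The only external call assumed is the Open Babel command-line form `obabel -icif <file> -osmi`, which writes SMILES to standard output.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Entry:
    """Hypothetical per-entry record; the flags stand in for the check of
    whether the asymmetric unit is a chemically acceptable image."""
    cif_path: str
    on_symmetry_element: bool = False
    disordered: bool = False
    polymeric: bool = False


@dataclass
class Result:
    cif_path: str
    smiles: Optional[str]
    used_cif_molecule: bool
    status: str  # always "needs_review": a human editor checks every SMILES


def needs_cif_molecule(entry: Entry) -> bool:
    # Hypothetical rule: any of these conditions means the asymmetric unit
    # alone does not describe the whole compound.
    return entry.on_symmetry_element or entry.disordered or entry.polymeric


def obabel_command(cif_path: str) -> list[str]:
    # Open Babel CLI: read CIF (-icif) and write SMILES (-osmi) to stdout.
    return ["obabel", "-icif", cif_path, "-osmi"]


def parse_smi_output(stdout: str) -> Optional[str]:
    # Each SMILES output line is "<smiles><whitespace><title>"; keep the
    # SMILES field of the first non-empty line.
    for line in stdout.splitlines():
        if line.strip():
            return line.split()[0]
    return None


def run_obabel(cif_path: str) -> Optional[str]:
    proc = subprocess.run(obabel_command(cif_path), capture_output=True, text=True, check=False)
    return parse_smi_output(proc.stdout)


def process_entry(
    entry: Entry,
    build_molecule_cif: Callable[[str], str],
    to_smiles: Callable[[str], Optional[str]] = run_obabel,
) -> Result:
    path = entry.cif_path
    rebuilt = needs_cif_molecule(entry)
    if rebuilt:
        # Stand-in for the cif_molecule step: returns the path of a CIF
        # whose atoms form a chemically acceptable image of the compound.
        path = build_molecule_cif(path)
    return Result(entry.cif_path, to_smiles(path), rebuilt, "needs_review")
```

Injecting `build_molecule_cif` and `to_smiles` keeps the control flow testable without either program installed; in practice `to_smiles` would default to the `obabel` call. Every result is marked for review because, as the abstract states, a human editor still checks and fixes the generated SMILES.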