Organic chemistry as a language and the implications of chemical linguistics for structural and retrosynthetic analyses.

Publication Type: Journal Article
Authors: Cadeddu, Andrea; Wylie, Elizabeth K; Jurczak, Janusz; Wampler-Doty, Matthew; Grzybowski, Bartosz A
Year of Publication: 2014
Journal: Angew Chem Int Ed Engl
Volume: 53
Issue: 31
Pagination: 8108-12
Date Published: 2014 Jul 28
Publication Language: eng
ISSN: 1521-3773

Methods of computational linguistics are used to demonstrate that a natural language such as English and organic chemistry share the same structure with respect to the frequency of text fragments and molecular fragments, respectively. This quantitative correspondence suggests that the methods of computational corpus linguistics can be extended to the analysis of organic molecules. It is shown that, within organic molecules, the bonds with the highest information content are the ones that 1) define repeat/symmetry subunits and 2) in asymmetric molecules, define the loci of potential retrosynthetic disconnections. Linguistics-based analysis appears well-suited to the analysis of complex structural and reactivity patterns within organic molecules.
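
The abstract does not spell out how fragment frequencies or information content were computed, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes that molecules can be stood in for by SMILES-like strings treated purely as character sequences, that a "fragment" is a contiguous substring of fixed length, and that information content is the Shannon self-information -log2(p) of a fragment's relative frequency. The names ngrams, rank_frequency, information_content, and toy_corpus are hypothetical and introduced here for illustration.

```python
import math
from collections import Counter


def ngrams(text, n):
    """Return all contiguous substrings of length n (the assumed 'fragments')."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def rank_frequency(corpus, n):
    """Count length-n fragments over the whole corpus, most frequent first."""
    counts = Counter()
    for item in corpus:
        counts.update(ngrams(item, n))
    return counts.most_common()


def information_content(ranked):
    """Map each fragment to its self-information -log2(p), in bits."""
    total = sum(count for _, count in ranked)
    return {frag: -math.log2(count / total) for frag, count in ranked}


# Toy corpus (assumption): SMILES-like strings used as plain character data.
toy_corpus = ["CCO", "CCN", "CCC", "CCOC"]

ranked = rank_frequency(toy_corpus, 2)
bits = information_content(ranked)

for frag, count in ranked:
    print(f"{frag}: count={count}, information={bits[frag]:.2f} bits")
```

In this toy setting, rare fragments receive more bits than common ones, which is the sense in which a rarely seen bond environment would count as "high information." Applying the same counting to an English text corpus instead of toy_corpus would allow the two rank-frequency lists to be compared side by side.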

DOI: 10.1002/anie.201403708
Alternate Journal: Angew. Chem. Int. Ed. Engl.