A procedure for virtual fragmentation of molecules into functional groups

This article presents a computer-assisted procedure for the virtual fragmentation of molecules. The proposed algorithm defines fragments that usually coincide with the classical functional groups. The fragmentation criteria are the bond order values of the chemical bonds and the type of the connected atoms (hydrogen, carbon, heteroatom).


Introduction
The study of the characteristic chemical properties of many molecules with very diverse chemical structures has led to the concept of the functional group: a group of atoms that gives the molecule certain "functions" (i.e. the capacity to participate in specific chemical reactions). During a specific reaction the structure of the functional group changes while the rest of the molecule remains unchanged. From the structural point of view, a functional group is a group of atoms connected by certain types of chemical bonds.
Over the last 150 years a long list of functional groups has been compiled; in effect, this list records the relationships discovered between the structure of molecules and their properties.
To identify the functional groups of a molecule, one has to traverse the molecular graph and compare the groups of atoms found in the molecule with the list of functional groups mentioned above [1]. Other lists of fragments, which do not coincide with the classical functional groups, are also often used. Sometimes the structure fragmentation is defined manually, either by dummy variables or by fingerprints [2]. The fragmentation procedures, as well as the lists of fragments, differ from author to author [5-13]. In this article we propose an algorithm for the virtual fragmentation of molecules that does not need a previously established list of functional groups. The fragments it finds often coincide with the classical functional groups.

Methods and formulae
The geometry of the analyzed molecule was first optimized by molecular mechanics using the GMMX procedure [14] included in the PCMODEL software [15]. The geometry was then optimized more precisely and the bond orders were computed with the PM3 method [16] included in the MOPAC package [17]. The following keyword string was used: "pm3 pulay gnorm=0.01 shift=50 geo-ok camp-king bonds mmok".
The MOPAC output data were then processed by a new version of the DESCRIPT software [18], in which the previous fragmentation procedure is replaced by the fragmentation algorithm proposed here.
The fragmentation procedure has the following steps:
1) acquiring the molecular graph, i.e. the graph that contains only the heavy atoms;
2) identifying the M and S bonds in the graph;
3) identifying the AM and AS atoms in the graph;
4) defining the "internal" chemical bonds, i.e. the bonds between atoms inside a single fragment:
   a) the bonds involving hydrogen atoms;
   b) all M bonds;
   c) S bonds between AS carbon atoms;
   d) S bonds between AS heteroatoms;
   e) S bonds between an AS heteroatom and any AM atom;
5) defining the "external" chemical bonds, i.e. the bonds between two fragments (any bond that is not "internal");
6) removing the "external" bonds from the graph.
Removing the "external" bonds from the molecular graph yields a set of sub-graphs: the set of virtual fragments.
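The six steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the DESCRIPT implementation: the function name `virtual_fragments`, the input layout (element symbols of the heavy atoms plus a dictionary of bond orders B keyed by atom-index pairs) and the bond-order values in the usage example are hypothetical; in practice B would come from the MOPAC output. Hydrogen atoms are left implicit, so rule a) is satisfied automatically, and the AS/AM types are computed literally from their definitions.

```python
from collections import defaultdict


def virtual_fragments(elements, bonds, k=1.017):
    """Hypothetical sketch: split a heavy-atom graph into virtual fragments.

    elements: element symbols of the heavy atoms (index = atom id)
    bonds: dict mapping (i, j) atom pairs to computed bond orders B
    k: border value of B (M bond if B >= k, otherwise S bond)
    Hydrogens are implicit and stay with their heavy atom (rule a).
    """
    n = len(elements)
    is_hetero = [e != "C" for e in elements]
    # Step 2: label every heavy-atom bond as M or S.
    kind = {pair: ("M" if b >= k else "S") for pair, b in bonds.items()}

    # Step 3: atom types, following the AS/AM definitions literally.
    has_m = [False] * n        # atom has at least one M bond
    m_to_hetero = [False] * n  # atom has an M bond to a heteroatom
    for (i, j), t in kind.items():
        if t == "M":
            has_m[i] = has_m[j] = True
            m_to_hetero[i] = m_to_hetero[i] or is_hetero[j]
            m_to_hetero[j] = m_to_hetero[j] or is_hetero[i]
    is_as = [not m for m in has_m]  # only S bonds to other heavy atoms
    is_am = m_to_hetero

    # Step 4: rules b)-e) for "internal" bonds.
    def internal(i, j):
        if kind[(i, j)] == "M":                                  # rule b
            return True
        as_c = [is_as[a] and not is_hetero[a] for a in (i, j)]
        as_x = [is_as[a] and is_hetero[a] for a in (i, j)]
        if all(as_c) or all(as_x):                               # rules c, d
            return True
        return (as_x[0] and is_am[j]) or (as_x[1] and is_am[i])  # rule e

    # Steps 5-6: keep only internal bonds; fragments = connected components.
    adj = defaultdict(list)
    for (i, j) in bonds:
        if internal(i, j):
            adj[i].append(j)
            adj[j].append(i)
    seen, fragments = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            a = stack.pop()
            comp.add(a)
            for nb in adj[a]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        fragments.append(comp)
    return fragments
```

For example, with made-up bond orders for acetyl chloride (methyl C = atom 0, carbonyl C = 1, O = 2, Cl = 3), `virtual_fragments(["C", "C", "O", "Cl"], {(0, 1): 0.99, (1, 2): 1.90, (1, 3): 0.95})` returns the methyl group {0} and the acid-halide group {1, 2, 3}: the C-Cl bond is kept by rule e) because the carbonyl carbon is an AM atom.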
All computations were performed on a Pentium 4 computer (2400 MHz, 512 MB RAM).

Results and Discussion
We take the bond order of the chemical bond as the fundamental criterion of the molecular fragmentation procedure. In our opinion this choice is imposed by the fact that, in unsaturated molecules, the conjugation of neighbouring functional groups leads to the formation of new functional groups: ester (carbonyl + ether), amide (carbonyl + amino), phenyl (ene + ene + ene), furan (ene + ether + ene), etc. The computed bond order of the chemical bond between two heavy atoms is denoted B. The standard virtual fragmentation procedure of DESCRIPT (SDFP) [18] uses the following axiom: two heavy atoms are part of the same fragment if they are connected by an "internal" bond, and a chemical bond is "internal" if its B value is greater than the limit value k between the "single" bond and the "aromatic" bond. In the standard version of DESCRIPT, the SDFP therefore skips steps 3, 4c, 4d and 4e above: it defines as "internal" only the bonds involving hydrogen atoms and the M bonds.
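For comparison, the SDFP axiom alone can be sketched as a union-find over the heavy atoms that merges only bonds with B > k, using k = 1.014, the SDFP value. The function name `sdfp_fragments` and the bond orders in the example are hypothetical; this is an illustration of the axiom, not the DESCRIPT code.

```python
def sdfp_fragments(n_atoms, bonds, k=1.014):
    """Hypothetical sketch of the SDFP axiom.

    Two heavy atoms share a fragment if they are joined by a bond with B > k.
    bonds: dict mapping (i, j) atom pairs to bond orders B.
    """
    parent = list(range(n_atoms))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for (i, j), b in bonds.items():
        if b > k:  # "internal" bond under SDFP
            parent[find(i)] = find(j)

    groups = {}
    for a in range(n_atoms):
        groups.setdefault(find(a), set()).add(a)
    return list(groups.values())


# Made-up bond orders for acetyl chloride: CH3 (0), C (1), O (2), Cl (3)
print(sdfp_fragments(4, {(0, 1): 0.99, (1, 2): 1.90, (1, 3): 0.95}))
```

Here SDFP returns {0}, {1, 2} and {3}: the C-Cl bond is an S bond, so the chlorine atom is split off from the carbonyl group and no acid-halide fragment is formed.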
The SDFP axiom alone is not sufficient to make the virtual fragments coincide with the classical functional groups.
Indeed, the molecular fragments identified by SDFP (Figure 1, zones C + D) coincide with the classical functional groups (Figure 1, zones A + B + C) only if all the bonds between the heavy atoms of a classical functional group have a bond order greater than k (Figure 1, zone C). SDFP also identifies some fragments that are not considered classical functional groups: C6H5O (phenyl + ether), C6H5N (phenyl + substituted amino), etc. We note that none of the SDFP fragments Y-Z (Y = heteroatom, Z = Ar / ene, Y and Z bonded by an M bond) coincides with a classical functional group (Figure 1, zone D). In these fragments, denoted YMZ, the conjugation modifies the chemical properties of both Y and Z. Taking these YMZ fragments as "functional groups" seems as justified as taking the amide or ester fragments as functional groups. However, these fragments are classically considered not single functional groups but ensembles of functional groups (Figure 1).
When the fragmentation procedure proposed here is used instead of the SDFP, the set of virtual fragments identified by DESCRIPT includes a greater number of the classical functional groups (Figure 1, right).
The bond order can be computed by various methods (AM1, PM3, DFT, ab initio). When the bond orders are computed by PM3, as in this work, the value k = 1.017 is used. This "border" value of B, which is also used in aromaticity calculations by the TPA (Topological Path Aromaticity) algorithm [19], was established empirically after analyzing the experimental aromaticity data for a large number of molecules with very diverse (aromatic and non-aromatic) structures. The same k value is used by the latest version of the PRECLAV program [20,21] in QSPR/QSAR computations. SDFP uses k = 1.014.
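Because M and S bonds are defined only by comparison with k, a bond whose order lies between the two border values changes type depending on which value is used. The short sketch below shows this; the names `bond_type`, `K_PROPOSED` and `K_SDFP` are ours, and B = 1.015 is an invented value, not the result of a real calculation.

```python
K_PROPOSED = 1.017  # border value used with PM3 bond orders in this work
K_SDFP = 1.014      # border value used by the standard DESCRIPT procedure


def bond_type(b, k):
    """Return 'M' if the bond order b reaches the border value k, else 'S'."""
    return "M" if b >= k else "S"


b = 1.015  # hypothetical bond order lying between the two border values
print(bond_type(b, K_SDFP), bond_type(b, K_PROPOSED))  # prints: M S
```

The same sensitivity is why the choice of conformer matters when a computed B lies very close to k.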
Rules a) to e) above were obtained empirically, by analyzing a large number of molecules with very diverse structures. Our aim was to achieve the best possible correspondence between the list of virtual fragments and the list of classical functional groups.
Figure 2 illustrates the proposed algorithm for strychnine: first the computation of the bond orders, then the virtual fragmentation.
The bond orders in the benzene ring were computed in the interval [1.284, 1.491] (average B = 1.387). For the Ar-N bond B = 1.030, for the N-CO bond B = 1.058 and for the C=O bond B = 1.802. All these values exceed k = 1.017, so these bonds are M bonds and form a single (Ar + N + CO) fragment. In a different area of the molecule there is another M bond, with B = 1.911. The bond orders of the S bonds turned out to be much lower.
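The classification of the strychnine bond orders quoted above can be checked with a few lines of Python. The dictionary labels are ours; the values are those reported in the text, and k = 1.017 is the PM3 border value.

```python
K = 1.017  # PM3 border value of B

# Bond orders reported for strychnine in the text
reported = {
    "benzene ring, lowest": 1.284,
    "benzene ring, highest": 1.491,
    "Ar-N": 1.030,
    "N-CO": 1.058,
    "C=O": 1.802,
    "other M bond": 1.911,
}

for name, b in reported.items():
    kind = "M" if b >= K else "S"
    print(f"{name:22s} B = {b:.3f}  {kind}  (B - k = {b - K:+.3f})")
```

Every listed bond is an M bond, so rule b) alone joins the ring, N and CO into one fragment; the Ar-N bond exceeds k by only 0.013, which shows how close some bond orders come to the border value.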
In Figure 2a the M bonds are shown in red and the S bonds in black. Figure 2b shows the AS and AM atoms. In Figure 2c the "internal" bonds are shown in magenta and the "external" bonds in black. The resulting virtual fragments are shown in Figure 2d.
Additional examples are presented in Table 1. The molecules were chosen for their diversity of classical functional groups, M/S bonds and AS/AM atoms; in our opinion, this set is therefore a broad illustration of the proposed algorithm. In molecules where all heavy atoms are AS carbon atoms, the algorithm identifies a single fragment according to rule c) (e.g. 1 and 2). The same happens, according to rule b), when all the heavy atoms are connected by M bonds (e.g. 3, 6, 7, 13, benzene, any PAH, pyridine, indole, urea, guanidine, thiourea, etc.).
The CX3 group (X = halogen) is identified as a collection of four fragments, because its C-X bonds are S bonds that satisfy none of rules c) to e). In contrast, the atoms N and Br in molecule 41, the atoms N and Cl in molecule 49 and the O atoms in molecule 44 are AS heteroatoms, so these atoms are in the same fragment according to rule d).
The COX group is identified as a single fragment (acid halide) according to rule e). The same rule applies to phosgene, which is identified as a single fragment. The thiophosgene molecule is also identified as a single fragment, but this time because of rule b). From the point of view of the proposed fragmentation procedure, phosgene therefore belongs to the same category as acetyl chloride, and thiophosgene to the same category as thiourea.
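The rule-by-rule reasoning used above for CX3, COX, phosgene, thiophosgene and the N-Br pair can be written as a small predicate. This is a hypothetical helper (`is_internal`), a sketch that assumes the bond type (M or S) and the atom types (AS, AM or neither) are already known; the atom labels in the examples are our reading of the cases discussed in the text, not computed values.

```python
def is_internal(bond_kind, a, b):
    """Apply rules b)-e) to one bond between heavy atoms a and b.

    bond_kind: "M" or "S"
    a, b: (is_heteroatom, atom_type) with atom_type "AS", "AM" or None
    """
    if bond_kind == "M":                   # rule b): all M bonds
        return True
    (het_a, type_a), (het_b, type_b) = a, b
    if type_a == type_b == "AS" and het_a == het_b:
        return True                        # rule c) (carbons) or d) (heteroatoms)
    # rule e): an AS heteroatom bonded to any AM atom
    return (het_a and type_a == "AS" and type_b == "AM") or (
        het_b and type_b == "AS" and type_a == "AM"
    )


AS_C, AS_X, AM_C = (False, "AS"), (True, "AS"), (False, "AM")
print(is_internal("S", AS_C, AS_X))          # False: C-X in CX3, so four fragments
print(is_internal("S", AM_C, AS_X))          # True: C-Cl in COX or phosgene (rule e)
print(is_internal("M", AM_C, (True, None)))  # True: C-Cl in thiophosgene (rule b)
print(is_internal("S", AS_X, AS_X))          # True: N-Br between AS heteroatoms (rule d)
```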
In molecules 9, 12, 20, 21, 27 and 52 the heteroatoms are connected to the aromatic ring by an M bond. In these cases conjugation increases the bond order of the chemical bond, and the proposed procedure includes the heteroatom in the same virtual fragment as the aromatic ring to which it is connected. This does not happen for thioether 10, urethane 36, sulfonamide 49 or saccharin 53.
In molecules 46 and 50 the heteroatoms are connected to unsaturated structures by an M bond. For the exotic ring 50, not yet synthesized, the functional groups are difficult to identify by intuition.
Molecules 45 and 46 are in keto-enol equilibrium.
Figure 3 presents the virtual fragmentation of an ammonium salt obtained with the SLASH algorithm [22], with SDFP and with the procedure proposed here. In contrast to the SLASH algorithm, the DESCRIPT fragmentation identifies the Ph-O-Ph fragment (a YMZ fragment).
There is no obstacle in principle to applying this new algorithm to ions, radicals or ion-radicals. Because the experimental data needed for the parameterization of the method are lacking, we used the same value of k in the analysis of these species. In Table 2 we present the virtual fragmentation of some ions, radicals and ion-radicals. For ions the charge is nonzero; for radicals the spin multiplicity is greater than 1.
The Table 2 species can be formed by losing or capturing electrons, hydrogen ions or hydrogen atoms, in chemical reactions, inside mass spectrometers, etc. Their electronic structure is very different from that of the molecules they originate from, which is why the structure of their virtual fragments is also very different. These fragments cannot be considered classical functional groups, and in Figure 1 they are placed in zone D.
The quality of the proposed fragmentation method can be verified with computation procedures that take this fragment identification as their starting point and compute values of logP, solubility, molar refraction, etc. One can then check whether these computed values agree well with the experimental data.

Conclusions
The procedure proposed here does not need a previously established list of functional groups or fragments and allows automatic virtual fragmentation: once MOPAC has characterized the molecules, no further user assistance is needed. It can fragment any molecules, ions, radicals and ion-radicals.
If the analyzed species is a molecule, the identified fragments usually coincide with the classical functional groups. Conjugated classical functional groups should always be considered a single fragment, i.e. a new functional group.
Definitions used in the fragmentation procedure:
- heavy atom = any atom other than hydrogen
- heteroatom = any heavy atom other than carbon
- B = the computed bond order value of a chemical bond
- k = the "border" value of B
- M = chemical bond type for which B ≥ k
- S = chemical bond type for which B < k
- AS = heavy atom type connected to other heavy atoms only by S type bonds
- AM = heavy atom type connected to at least one heteroatom by M type bonds
When a molecule is analyzed, identifying the minimum potential energy conformer is important, because the bond order values B are characteristic of each conformer and are sometimes very close to the border value k.
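The AS and AM definitions can be applied mechanically once every heavy-atom bond is labeled M or S. The sketch below follows the definitions literally; the function name `atom_types` and the bond labels in the example (a hypothetical aryl methyl thioether in which both C-S bonds are S bonds) are illustrative assumptions. Under this literal reading, an atom whose M bonds all go to carbon atoms, such as an aromatic ring carbon or a carbonyl oxygen, is neither AS nor AM, so rule e) cannot attach an AS heteroatom to it.

```python
def atom_types(elements, bond_kinds):
    """Label heavy atoms "AS", "AM" or None from the M/S type of their bonds.

    elements: element symbols of the heavy atoms (index = atom id)
    bond_kinds: dict mapping (i, j) atom pairs to "M" or "S"
    """
    n = len(elements)
    hetero = [e != "C" for e in elements]
    has_m = [False] * n     # at least one M bond to a heavy atom
    m_hetero = [False] * n  # at least one M bond to a heteroatom
    for (i, j), kind in bond_kinds.items():
        if kind == "M":
            has_m[i] = has_m[j] = True
            m_hetero[i] = m_hetero[i] or hetero[j]
            m_hetero[j] = m_hetero[j] or hetero[i]
    return ["AM" if m_hetero[a] else None if has_m[a] else "AS" for a in range(n)]


# Hypothetical labels: aromatic ring C0-C5, sulfur S6, methyl carbon C7
kinds = {(i, (i + 1) % 6): "M" for i in range(6)}
kinds.update({(0, 6): "S", (6, 7): "S"})
print(atom_types(["C"] * 6 + ["S", "C"], kinds))
# [None, None, None, None, None, None, 'AS', 'AS']
```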

Figure 1. Classical functional groups and DESCRIPT virtual fragments.

Table 1. Identified fragments of some analyzed molecules
© ARKAT USA, Inc

Table 1 (continued).
* The number of identified fragments in the analyzed molecule.

Table 2. Identified fragments in some ions, radicals and ion-radicals