Design of new DNA-interactive agents by molecular docking and QSPR approach

The design of new series of pyrrolo-pyrimidine derivatives, further annelated with a third heterocycle of different size, which also present several chain shape moieties of variable length and with different physico-chemical character, is reported. In this contribution we showed that the combination of docking-based and QSPR-based methods could lead to good models for ligand-DNA interaction prediction. By means of these computational approaches on 360 proposed inhibitors, we were able to select the most promising candidates as DNA-interactive drugs potentially endowed with antitumor activity.


Introduction
DNA represents a traditional target for chemotherapeutic intervention in human cancers, especially for those where high proliferation rates of some tumor cell types have resulted in sensitivity to drugs, which block replication and transcription of their DNA. 1 Molecular recognition of DNA by small molecules is a fundamental problem in drug design.Polycyclic heterocycles having a planar structure can be effective pharmacophore moieties for DNA-interactive drugs because they can insert between the stacked base paired oligonucleotides.Moreover, if they bear suitable side chains, further interactions of these ligands with the other important architectural feature of DNA, its minor groove, can be envisaged.
3][4][5] These classes of compounds can be considered bioisosters of the antitumor drugs of the actinomycins family, the most representative of which, Dactinomycin (Actinomycin D), has been extensively investigated. 6Its biological activity ISSN 1551-7012 Page 14  ARKAT USA, Inc.
appears to depend on the very slow rate of dissociation of the complex between DNA and the drug, which reflects the intermolecular hydrogen bonds, the planar interaction between the purine rings and the chromophore, the phenoxazone moiety, and the several van der Waals interactions between the side chain and the DNA.
In connection with these studies, we have recently reported the design, synthesis and biological activity of a series of 4-substituted indolo [3,2-e] [1,2,3]triazolo [1,5-a]pyrimidine derivatives. 7Compounds, selected on the basis of the docking studies and suitably synthesized, showed antiproliferative activity against each type of tumor cell lines investigated, generally in the low micromolar range.The molecular modeling results correlated well with the experimental activity data and the choice of suitable chains was theoretically supported: a more negative ∆Gbind value was mirrored by a higher inhibitory activity.In that context we have demonstrated how the introduction of the side chains was determinant for the increase of the biological activity.
Now we intend to extend this approach to different polycyclic scaffolds to explore how different structural features into the planar portion can affect the DNA binding capability.Therefore we report herein molecular modeling studies on pyrrolo-pyrimidine derivatives (both [2',3':4,5] and [3',4':4,5] condensed), further annelated with a third heterocycle of different size, which also present several chain shape moieties of variable length and with different physicochemical character.Moreover QSPR models were also developed to further support the proposed mechanism of action of these series of inhibitors.

Results and Discussion
The structures of the designed compounds, reported in Figure 1, show that either the size of the third ring (5, 6, 7 atoms) was varied as well as the pyrrole-pyrimidine junction (3,4-e or 2,3-e).Scheme 1. Synthetic approach to the tricycles.
The polycondensed tricycles were chosen having also in mind the easiness of synthetic methodology.In fact all these compounds could be accessible from known (and already studied by us [8][9][10] ) 2-and 3-aminopyrroles, ortho-cyano substituted, and the versatile BMMAs (N-(BisMethylthio)Methylenamino Acids) according to the procedure already successfully employed in previous work 11,12 (Scheme 1).

Figure 1. Structures of the selected scaffolds.
The twenty chains a-t, reported in Table 1, were selected and combined with the ring skeletons to generate the various ligands, presenting the substituents linked to the pyrrole NHs.Moreover a set of compounds of type C was also investigated to explore how simple substitutions on the base nuclei (such as the introduction of methyl groups) can influence the binding capability (Figure 2).The PDB was searched for DNA fragments bound with intercalators and the structure 1DSC (an octamer complexed with Actinomycin D, already successfully used by us 7 for this type of study) was selected.The original ligand was removed and the DNA sequence was utilized for the docking experiments.The 100 independent docking runs carried out for each ligand generally converged to a small number of different positions ("clusters" of results differing by less than 1.5 Å rmsd).Generally, in the standard Autodock output, the top clusters (i.e.those with the most favorable ΔGbind) are also associated with the highest frequency of occurrence which suggests a good convergence behavior of the search algorithm (see materials and methods section for details).An example of the resulting clusters is depicted in Figure 3.These experiments allowed to estimate the interaction energies between DNA fragment and the different ligands.Thus the in silico screening of the selected 360 structures allowed us to predict the binding free energy of the lowest energy docked structure for each ligand considered in this study (values reported in Table 2) and to evidence the type of interaction with DNA (intercalating mode or groove binding mode).
The calculated free energy of binding was found in the range -3.88 to -19.68 Kcal/mol.It results immediately evident that generally all the designed compounds have shown high capability to bind DNA, with few exceptions for scaffolds belonging only to the A3, A5, C3 series, especially when combined with chains of type d, e, or f.The data become more interesting if we consider that in the same experiment the value obtained for Actinomycin D was -10.37 Kcal/mol.All the classes of compounds are therefore able to bind DNA tighter than the reference drug, as also testified by the average values calculated either taking into account modulation of the chain on the selected skeletons (Table 2, last row) and the influence of the polycondensed system on each side chain (Table 3).In particular the compounds able to form more stable complex with DNA have shown ΔG values from 1.5 to 1.9 times more negative than Actinomycin D, see for example B1n (ΔG = -19.68Kcal/mol), C4i (ΔG = -19.62Kcal/mol).Moreover these last two, together with B6i and C2n, resulted the best DNA binders, with calculated ΔG always more negative than -19.0 Kcal/mol.However, by comparison of the behavior of the basic classes of compounds, it becomes evident that the introduction of methyl groups in the pyrrole moiety enhances the binding capability.In fact, with the sole exception of C3 type derivatives, less tighter DNA binders (cfr C3 vs B3 column) and some example of C6 series (less than a quarter of the set), generally the dimethyl scaffold furnishes ΔG values more negative than the corresponding B series.These data are also evident if the average value for all the chains is considered [see for example B1 vs C1 (ΔG = -13.37Kcal/mol vs ΔG = -13.45Kcal/mol) up to ΔG -14.84 Kcal/mol vs ΔG = -15.14Kcal/mol as in the case of B6 vs C6].As a result, generally for derivatives of class C the stability of the complexes with DNA is of the same order of magnitude of that calculated in the case of class A. If the DNA-binding capability is considered from the point of view of the chain types, it is possible to evidence that the introduction of i and t moiety always gives rise to favorable interactions, regardless the position and the size of the tricyclic system.The compounds bearing those two chains are always found in the first three rankings and the calculated averaged ΔG are the more negative (-17.49 and -17.28 Kcal/mol, respectively).Also the presence of chains b and s, followed by r and n types, resulted of primary importance in modulating the binding capability at least for more than half of the designed structures, and in any case gave rise to the most effective binders.
From a structural point of view it seems that the presence in the side chain of a ring with basic nitrogen character, as in the case of d, e, f, or m types, is detrimental for the binding capability, whereas an aliphatic amino group is much more tolerated (see for example a, c, h, j chains).On the contrary the presence of carboxylate functions and of amino acid moieties favors a more tighter binding, as shown in the case of chains b, r, i, n.All these results are in agreement with the literature data for DNA-interactive drugs, such as for example mitoxantrone, for which the specific molecular recognition also depends on the base/acid character of the chains laying along the grooves. 13Noteworthy as well is the behavior of derivatives presenting the chains t and s typical of the antiviral drugs acyclovir and ganciclovir.
By analyzing our data with respect to the ligand binding site, the docking studies revealed two different binding modes (Figure 4).The majority of structures were found to assume a position with the chromophore portion intercalated between GC base pairs whereas the side chain lies along the minor groove.A consistent number of derivatives (165) however were shown to bind essentially aligned along the minor groove.Again if we consider the nature of the heterocyclic moiety as driving force for the chosen binding mode, it appears evident that scaffolds of type A strongly direct to the intercalation to DNA, especially when the ring size is 5/6 atoms.In fact only in 2-4 cases out of 20 the groove binding mode was evidenced.But the introduction of a methyl group into the pyrimidinone moiety, as in classes of type 5, or ring enlargement to give the benzodiazepine, as in types 6, favors both binding modes that therefore become driven by the character of the side chain.Moreover these systems probably present steric hindrance to the insertion between the DNA base pairs.
With respect to the nature of side chains, in the case of n and d types, the complex is forced to align along the DNA minor groove, and only 4 or 5 derivatives out of 18 show intercalating ability.On the contrary, in the case of e, m and j chain types is the intercalating mode preferred, since the alternative binding to the groove is limited to 2-4 examples out of the 18 considered.
The docking results allow us to evaluate the affinity of each ligand towards the DNA fragment, but they do not give us any information on the type of possible electronic interactions involved in such complexes.For this reason the predicted binding affinity values (ΔGbind) of the 360 ligands, derived from the molecular docking, were taken as starting point for a QSPR analysis.Therefore, we submitted the results obtained for every single complex to structureproperties analysis by using CODESSA PRO software.The results of this approach consist in equations in which it is possible to appraise type and weight of the descriptors involved in the model built for each complex.In this case the obtained equations (A1 -C6), showing the dependency of ΔG as function of five descriptors, are listed in Table 4, whereas the molecular descriptors legend is reported in the Supplementary Materials, as Table S1.The quality of the models, obtained according to the procedure described in the section material and methods, are summarized in Table 5. Very satisfactory correlations were obtained in each case (R 2 ≥0.91;Q 2 ≥0.89), with the sole exception of B4 and C2 equations.Only seventy six descriptors in total were included into the 18 equations.Among them, some contribute to a larger extent to the models and are present in more than one equation.In particular, according to the classification reported in Table S1, for these type of ligands the more important descriptors resulted: • Among the constitutional descriptors: the total number of rings and the average information content; • Among the geometrical descriptors: the inertia moment and the molecular area; • Among the electrostatic descriptors: the partial charge (Zefirov) for all atom types; • Among the quantum-chemical descriptors: the heat of formation of the molecule and the molecular orbital energies.Moreover, from the analysis of the descriptors used to describe the binding on each single fragment and also in the whole, it is possible to get insights on the type of interactions involved in the formation of the complexes.The statistical distribution of all descriptors included in the models (Figure 5) showed a clean contribution of the quantum-chemical and electrostatic type.On the other hand constitutional and topological effects appear to be not determinant for the interactions DNA-ligand.

Conclusions
In this paper we described how docking experiments allowed to estimate the interaction energies between DNA fragment and the 360 different ligands, and to evaluate the binding mode.From our results it was possible to recognize two different kinds of binding: an "intercalating mode" and a "groove mode".The nature, size and physico-chemical properties of the designed ligands strongly influence the type of binding.
In this contribution we showed that the combination of docking-based and QSPR-based methods could lead to good models for ligand-macromolecule interaction prediction.In particular by docking procedure it was possible to calculate ΔGbind values for a set of DNAligand complexes.A QSPR analysis allowed the characterization of the type of interactions in such complexes.
The more interesting derivatives of the new series will be synthesized and further investigated in DNA-binding assays.The studies reported herein provided reliable information on the capability of new ligands to interact with DNA allowing us to quickly select the more interesting derivatives in a series of congeners.

Molecular docking
It was performed with AutoDock, version 3.0.5. 14It combines a rapid energy evaluation through precalculated grids of affinity potentials with a variety of search algorithms to find suitable binding positions for a ligand on a given macromolecule.While the protein is required to be rigid, the program allows torsional flexibility in the ligand. 15The Protein Data Bank 16 was searched for DNA fragments bound with intercalators and the structure 1DSC (an octamer complexed with Actinomycin D) was selected.Before performing docking experiment the ligand structure was removed, and Gasteiger-Marsili charges 17 were assigned.Docking to DNA was carried out using the empirical free energy function and the Lamarckian genetic algorithm, applying a standard protocol, with an initial population of 100 randomly placed individuals, a maximum number of 1.5 × 10 6 energy evaluations, a mutation rate of 0.02, a crossover rate of 0.80, and an elitism value of 1. Proportional selection was used, where the average of the worst energy was calculated over a window of the previous 10 generations.For the local search, the socalled pseudo-Solis and Wets algorithm was applied using a maximum of 300 iterations per local search.The probability of performing local search on an individual in the population was 0.06, and the maximum number of consecutive successes or failures before doubling or halving the local search step size was 4. 100 independent docking runs were carried out for each ligand.
Results differing by less than 1.0 Å in positional root-mean-square deviation (rmsd) were clustered together and represented by the result with the most favorable free energy of binding.The structures of the ligands (Figures 1, 2 and Table 1) were generated with Ligprep software 18 and OPLS_2005 force field was used for the 3D optimization.The structures were setup for docking as follows: polar hydrogens were added using the PROTONATE utility (written by D.A. Case, K. Cross, and G.P. Gippert and distributed with AutoDock).Solvation parameters were added to the final DNA fragment file using the ADDSOL utility of AutoDock.The grid maps, representing the protein in the actual docking process, were calculated with AutoGrid.The grids (one for each atom type in the ligand, plus one for electrostatic interactions) were chosen to be sufficiently large to include significant portions of the minor groove.The dimensions of the grids were thus 60 × 60 × 60, with a spacing of 0.375 Å between the grid points.Zefirov method.Quantum-chemical descriptors add important information to the conventional descriptors.They are divided into three subgroups: charge distribution-related, valence-related and quantum mechanical energy-related descriptors.BMLR (Best MultiLinear Regression) analysis was applied for developing QSPR models.For selection of descriptors the heuristic method as available in CODESSA PRO was used.First, descriptors with missing or constant values for the set of structures are discarded from the original set.Further selection of descriptors is accomplished on the basis of the statistical parameters: R 2 , F-test and t-test for the one-parameter equation with the descriptors.The default values which were kept constant throughout the calculations were set as follows: the descriptor is eliminated if it does not meet the criteria for one-parameter equation (F-test value below 1.0; R 2 min less than 0.001 and t-value below 0.01).Default value for the intercorrelation of descriptors was set to R 2 max=0.90.After the preselection of descriptors, BMLR models are developed in a stepwise procedure.Thus, descriptors and correlations are ranked according to the values of the F-test and the correlation coefficient.Starting with the top descriptor from the list, two-parameter correlations are calculated.In the following steps new descriptors are added one-by one until the pre-selected number of descriptors in the model is achieved.The final result is a list of best models according to the values of the F-test and correlation coefficient.

Figure 2 .
Figure 2. Structures of the selected dimethyl derivatives.

Table 1 .
Structures of the selected side chains

Table 3 .
Influence of the polycondensed system on each side chain (average rows ΔGbind values)

Table 5 .
Quality of the QSPR models 2: square of correlation coefficient; Q 2 : square of correlation coefficient obtained by performing leave-one-out (LOO) cross-validation; F: Fisher criteria; s 2 : square of the standard deviation.
Optimized atomic coordinates of ligands were used as an input data for the CODESSA PRO program,19where calculation and selection of descriptors for QSPR were carried out.CODESSA PRO (COmprehensive DEscriptors for Structural and Statistical Analysis) is a multipurpose program providing various methods for statistical analysis of experimental data such as Partial Least Squares, (Multiple) Linear and Non-Linear Regressions, and Principal Components Regression.A set of 615 molecular descriptors derived from geometrical and quantum-chemical structures was computed in our study, as available in this program.All descriptors were divided into five groups: constitutional, topological, geometric, electrostatic, and quantum-chemical.Constitutional descriptors reflect only the molecular composition of the compounds as the number of atoms, the number of bonds and the molecular weight, and so on.Topological descriptors describe the atomic connectivity in a molecule.Geometric descriptors are calculated from the three-dimension atomic coordinates of the molecule, and comprise moments of inertia, shadow indices, molecular volume, and molecular surface area-like descriptors.Electrostatic descriptors reflect characteristics of the charge distribution in the molecule, calculated by the ISSN 1551-7012Page 26  ARKAT USA, Inc.