A QSAR study on carbonic anhydrase inhibition: predicting log Ki (hCAI) by using (SO2NH2) NMR chemical shift as a molecular descriptor

This paper describes the use of the NMR chemical shift of the -SO2NH2 moiety, δ(-SO2NH2), as a molecular descriptor for estimating the inhibition of carbonic anhydrase isozyme I, log Ki (hCAI), for a set of sulfonamides incorporating picolinoyl moieties. The results show that log Ki (hCAI) can be estimated reasonably well by multi-parametric regression analysis involving δ(-SO2NH2) and indicator parameters.


Introduction
The sulfonamides represent an important class of biologically active compounds; at least five different classes of pharmacological agents have been obtained from the sulfonamide structure as a lead. The antibacterial sulfonamides continue to play an important role in chemotherapy, alone or in combination with other drugs. Furthermore, the sulfonamides that inhibit the zinc enzyme carbonic anhydrase (CA, EC 4.2.1.1) possess many applications as diuretic, antiglaucoma, or antiepileptic drugs, among others [1]. Some antithyroid drugs have also been developed starting from the sulfonamide structure as the lead molecule [2].

Supuran and coworkers [3] have synthesized a series of water-soluble sulfonamides incorporating picolinoyl moieties. The synthesis was carried out by reacting 20 aromatic/heterocyclic sulfonamides containing a free amino, imino, hydrazino or hydroxyl group with picolinic acid in the presence of carbodiimide derivatives. These new derivatives were assayed as inhibitors of carbonic anhydrase (CA) isozyme CA I. All the new compounds were characterized by elemental analysis and NMR spectra (1H-NMR), and the reactions were monitored by thin-layer chromatography. These physicochemical and spectral techniques confirmed the structures of the newly synthesized compounds. The CA inhibition data against isozyme I (CA I) were reported as Ki (nM), which we have converted into log units and used in the present study (Table 1). The inhibition data presented in Table 1 show that the picolinoylamido-sulfonamides behave as strong inhibitors of CA I.
The molecular structure and NMR chemical shift information of organic compounds acting as drugs can be combined to form powerful models of biological activity. Such a data-activity relation is nowadays called a Quantitative Structure-Data-Activity Relationship (QSDAR) instead of QSAR, because it involves the use of spectroscopic data in establishing the structure-activity relationship. As is well known, chemical shifts in NMR offer a powerful probe of the immediate atomic environment in a molecule. Although individual chemical shifts for different atoms have received wide attention, it is somewhat surprising that hardly any study has been devoted to the collection of chemical shifts in a molecule. It is worth mentioning that NMR spectra reflect quantum mechanical properties that depend on local electrostatics and geometry. The 13C NMR spectrum of a compound contains a pattern of frequencies that correspond directly to the quantum mechanical properties of the carbon nuclear magnetic dipole in a magnetic field. The spectral pattern reflects the local electrostatic environment and electron orbital configuration of each atom. The resonances from different carbon orbital configurations are generally well separated from each other, which makes it advantageous to use 13C NMR spectra directly to build QSDAR models [1-3].
Recently one of the authors (PVK) has initiated investigations on the 13C NMR chemical shift [4-9]. The approach was two-fold: first, to establish the 13C NMR chemical shift as a molecular descriptor, and second, to use it for modeling the properties, activity and toxicity of organic compounds acting as drugs. One such application studied by Khadikar was the modeling of CA inhibition using the 13C NMR chemical shift [6]. Prompted by these results, we have undertaken the present study, in which we model the inhibition of CA I using 13C NMR chemical shifts of benzene sulfonamides as a molecular descriptor. In doing so we have used the maximum-R² method and applied a variety of statistics [10].

Sequence (1) shows that orthanilamide is the strongest inhibitor of CA I, while the benzothiazole-2-sulfonamides (16-18) have the lowest inhibitory potential. Despite this information, the sequence does not reveal any structure-activity relationship. Consequently, we subjected the data (Table 2) to regression analysis in order to find the statistically most significant model for estimating log Ki (hCAI). The preliminary regression analysis indicated that δ(-SO2NH2) alone is incapable of modeling log Ki (hCAI). Thus, no one-variable model is possible for estimating log Ki (hCAI), and we therefore applied multiple regression analysis [13].

Before a multiple regression analysis is undertaken it is convenient to normalize the data in order to make the detection of significant correlations easier. Normally, it is sufficient to preprocess the data by auto-scaling and mean-centering the variables. Auto-scaling gives each variable unit variance and hence the same chance to contribute to a calculated model, while mean-centering facilitates interpretation. This can be achieved by obtaining the correlation matrix [13]. In the present case, in addition to log Ki (hCAI) we have four variables, δ(-SO2NH2), I1, I2, and I3, for obtaining the correlation matrix. Three of these four variables (I1, I2, and I3) are indicator parameters. These are dummy parameters sometimes used in QSAR studies; they account for structural features that are not adequately covered by the molecular descriptors. The meaning of these indicator parameters is given as a footnote to Table 2. The correlation matrix obtained is given in Table 3.
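The preprocessing step described above can be illustrated with a minimal sketch. It assumes NumPy is available; the function names `autoscale` and `correlation_matrix` and the array `X_demo` are hypothetical, and the numbers are placeholders rather than the values of Table 2. The sketch shows that the correlation matrix is simply the cross-product of the auto-scaled variables.

```python
import numpy as np

def autoscale(X):
    """Mean-center each column and scale it to unit (sample) variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def correlation_matrix(X):
    """Pearson correlation matrix computed from the auto-scaled columns."""
    Z = autoscale(X)
    return Z.T @ Z / (Z.shape[0] - 1)

# Hypothetical placeholder values, NOT the data of Table 2.
# Columns: delta(-SO2NH2) in ppm, I1, I2.
X_demo = np.array([
    [7.10, 0.0, 0.0],
    [7.25, 1.0, 0.0],
    [7.40, 0.0, 1.0],
    [7.05, 1.0, 0.0],
    [7.60, 0.0, 1.0],
    [7.30, 0.0, 0.0],
])
corr_demo = correlation_matrix(X_demo)
print(np.round(corr_demo, 3))
```

Because every auto-scaled column has unit variance, the diagonal of the resulting matrix is 1 and the off-diagonal entries are the pairwise correlations that the text inspects before choosing variable combinations.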
The data show that combining δ(-SO2NH2) with I1, I2, or both will result in statistically significant models. The information obtained from the correlation matrix alone is not enough to obtain a statistically significant model; for that we need to make use of the rule of thumb [14,15]. In this regard it is worth mentioning that the technique most used in QSAR is linear multiple regression, which employs the least-squares method [13] to find the equation of "best fit" of biological activity with a given combination of parameters. Tute [14] pointed out the limitations and some common pitfalls of multiple regression analysis. According to him, a sufficient number of compounds must be included in the analysis for statistical significance to be reached despite inevitable errors in measurement. A rule of thumb evolved that at least three to five (or, in another statement of the rule, three to six) data points (compounds) should be included for every parameter in the equation (model). Given the number of compounds (21), and in accordance with the rule of thumb [14,15], we can go at most to four-variable modeling to obtain a statistically sound model for estimating log Ki (hCAI). In the present case the four-variable correlation containing δ(-SO2NH2), I1, I2, and I3 as the correlating parameters may yield such a model. To this end we carried out step-wise regression analysis following the maximum-R² method. The results obtained are given in Table 4.

* Model-8 is statistically not allowed, as the coefficients of δ(-SO2NH2) and I3 are smaller than their respective standard deviations.
# In obtaining model-9 (eq 4), compound 21 was deleted as an outlier.
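The variable-selection idea discussed above can be sketched as follows. This is a simplified exhaustive best-subset search by R², not the maximum-R² procedure of the software actually used, and it assumes NumPy. The helper names `r_squared` and `best_subsets`, the default of five compounds per parameter, and the synthetic arrays are all hypothetical choices for illustration.

```python
from itertools import combinations
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def best_subsets(X, y, points_per_param=5):
    """Best-R^2 column subset for every model size the rule of thumb allows."""
    n, p = X.shape
    max_vars = min(p, n // points_per_param)  # e.g. 21 // 5 = 4 variables
    best = {}
    for k in range(1, max_vars + 1):
        scored = [(r_squared(X[:, list(c)], y), c)
                  for c in combinations(range(p), k)]
        best[k] = max(scored)  # (R^2, column indices) with the highest R^2
    return best

# Synthetic data (hypothetical): 21 "compounds", 4 candidate descriptors,
# with the response built from columns 0 and 2 only.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(21, 4))
y_demo = 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 2] + 0.1 * rng.normal(size=21)
result = best_subsets(X_demo, y_demo)
for k, (r2, cols) in result.items():
    print(k, cols, round(float(r2), 4))
```

With 21 compounds and five compounds per parameter the search stops at four variables, matching the limit stated in the text. R² alone keeps rising with model size, which is why the standard deviations of the coefficients and the adjusted R² are consulted before a model is accepted.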
A perusal of Table 4 shows that two two-variable models, consisting of (i) δ(-SO2NH2) and I1 and (ii) δ(-SO2NH2) and I2 as the correlating parameters, are statistically fair, and that the latter model is comparatively better. This model is:

$$\log K_i(\mathrm{hCAI}) = -5.415 + 1.164(\pm 0.490)\,\delta(\mathrm{-SO_2NH_2}) - 2.5054(\pm 0.663)\,I_2 \tag{2}$$

N = 21, CV = 320, R = 0.666, R²A = 0.382, F = 7.182

Here and hereafter N is the number of compounds used, CV is the coefficient of variation, R is the multiple correlation coefficient, R²A is the adjusted R², and F is the Fisher statistic. Equation 2 shows that log Ki (hCAI) increases with increasing δ(-SO2NH2), while the presence of a five-membered heterocyclic ring fused with benzene is not favorable for log Ki (hCAI).
Finally, we attempted four-variable modeling using δ(-SO2NH2), I1, I2, and I3 as the correlating parameters. However, this model was statistically insignificant, as the coefficients of the δ(-SO2NH2) and I3 terms were smaller than their respective standard deviations. Moreover, although the multiple correlation coefficient (R) increased slightly, CV also increased significantly. This model is therefore statistically insignificant and must be rejected. From the results and discussion above we conclude that the three-variable model (equation 3) is the most appropriate model for monitoring, modeling, and estimating log Ki (hCAI). We have, therefore, examined this model in more detail. We estimated log Ki (hCAI) using this model and compared the estimates with the observed values; the comparison is shown in Table 5. The residue, i.e. the difference between the observed and calculated log Ki (hCAI) values, indicated that compound 21 is an outlier and should be deleted from the regression analysis. Its deletion is justified by the fact that it is structurally quite different from the remaining compounds. Regression analysis of the reduced data set of 20 compounds yielded a model with excellent statistics:

$$\log K_i(\mathrm{hCAI}) = -1.359 + 0.641(\pm 0.321)\,\delta(\mathrm{-SO_2NH_2}) - 1.619(\pm 0.363)\,I_1 - 2.834(\pm 0.436)\,I_2 \tag{4}$$

N = 20, CV = 0.200, R = 0.899, R²A = 0.770, F = 22.279
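As an illustration of how equation 4 is applied, the following minimal sketch evaluates the model for a given chemical shift and pair of indicator values and forms the residue against an observed value. The coefficients are those of equation 4; the function names and the input of 7.30 ppm are hypothetical and do not correspond to any compound in the tables.

```python
def predict_log_ki(delta, i1, i2):
    """Evaluate eq 4; delta is the -SO2NH2 shift in ppm, i1 and i2 are 0 or 1."""
    return -1.359 + 0.641 * delta - 1.619 * i1 - 2.834 * i2

def residue(observed, delta, i1, i2):
    """Observed minus calculated log Ki (hCAI), as tabulated for each compound."""
    return observed - predict_log_ki(delta, i1, i2)

# Hypothetical input: a shift of 7.30 ppm with both indicators absent.
example = predict_log_ki(7.30, 0, 0)
print(round(example, 4))
```

Setting either indicator to 1 lowers the calculated value by the corresponding coefficient, which is how the negative I1 and I2 terms of equation 4 enter the estimate.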

Comments on R²A
Before proceeding further it is necessary to comment on R²A [13]. R²A is R² adjusted for the number of variables in the model. It measures the percentage of explained variation in the dependent variable while taking into account the relationship between the number of compounds and the number of independent variables in the regression model. Whereas R² never decreases when an independent variable is added, R²A decreases if the added variable does not reduce the unexplained variation enough to offset the loss of degrees of freedom; in other words, if a variable is added that does not contribute its fair share, R²A declines. The R²A values recorded in Table 4 show that R²A increases on passing from the one-variable to the three-variable models and is highest for the three-variable model containing δ(-SO2NH2), I1, and I2 as the correlating parameters. When I3 is added to this model, the resulting four-variable model has R²A = 0.604. That is, adding I3 lowers R²A from 0.616 to 0.604, suggesting that I3 makes no contribution to the resulting model, which therefore need not be considered.

Res. = residue (difference between observed and calculated log Ki (hCAI)).
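The behaviour of R²A described above follows from its usual definition, which the text does not write out. The sketch below assumes the standard formulas for the adjusted R² and the F statistic of a regression with an intercept; the function names are hypothetical. It checks them against the statistics reported with equation 2 (N = 21, two variables, R = 0.666).

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n compounds and k independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r2, n, k):
    """Overall F ratio of the regression for n compounds and k variables."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# Statistics reported with eq 2: N = 21, two variables, R = 0.666.
r2_eq2 = 0.666 ** 2
adj_eq2 = adjusted_r2(r2_eq2, n=21, k=2)  # about 0.382, as reported
f_eq2 = f_statistic(r2_eq2, n=21, k=2)    # about 7.17; reported 7.182 (R is rounded)
print(round(adj_eq2, 3), round(f_eq2, 2))
```

The factor (n - 1)/(n - k - 1) grows with each added variable, so R²A rises only when the gain in R² outweighs the lost degree of freedom.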

Variance inflation factor (VIF) and eigenvalues
We now discuss the variance inflation factor (VIF) and eigenvalues of the parameters involved in models 5 and 9 (Table 4). These values are presented in Table 8.
The VIF is defined as:

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$

where Ri is the multiple correlation coefficient of the ith independent variable on all the other independent variables. Thus, a VIF is defined for each variable in the equation, not for the equation as a whole, and all the VIF values should be less than 10. All VIF values for both models 5 and 9 are around 2, and thus well below 10, indicating that these models meet the statistical requirements and that there is no collinearity problem. The conclusion drawn from the VIF values is further confirmed by the respective correlation matrices, eigenvalues and Ridge statistics discussed below.
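The collinearity check described above can be sketched in a few lines. The example assumes NumPy and uses synthetic descriptor matrices, not the data behind Table 8; the function names `vif` and `correlation_eigenvalues` are hypothetical. Each column is regressed on the remaining columns, and the VIF is obtained from the resulting R².

```python
import numpy as np

def vif(X):
    """VIF of every column: regress it on the others, then 1 / (1 - R_i^2)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for i in range(p):
        y = X[:, i]
        A = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out[i] = 1.0 / (1.0 - r2)
    return out

def correlation_eigenvalues(X):
    """Eigenvalues of the descriptor correlation matrix, in ascending order."""
    return np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))

# Synthetic descriptors (hypothetical): one nearly independent set and one
# in which the third column is almost a copy of the first.
rng = np.random.default_rng(1)
base = rng.normal(size=(50, 3))
X_indep = base
X_collinear = np.column_stack(
    [base[:, 0], base[:, 1], base[:, 0] + 0.05 * base[:, 2]]
)
print(np.round(vif(X_indep), 2), np.round(vif(X_collinear), 1))
print(np.round(correlation_eigenvalues(X_collinear), 4))
```

Nearly independent descriptors give VIF values close to 1, whereas the almost duplicated column pushes the VIF far above the threshold of 10 and produces an eigenvalue close to zero, which is the pattern the eigenvalue analysis is meant to detect.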

Predictive Power
From the results and discussion above we conclude that, compared with model-5, model-9 (Table 4) has excellent statistics. However, excellent statistics do not necessarily mean that the model will also have excellent predictive power. For a model as a whole to be excellent it should have both. The Pogliani quality factor (Q) estimates the predictive power of the model [7,8]. This Q factor is defined as the ratio of the correlation coefficient to CV, i.e. Q = R / CV. The Q values presented in Table 4 indicate that model-9 has better predictive power than model-7. Another way of assessing predictive power is to calculate the activity and compare it with the observed (experimental) activity: the model whose calculated activities are closer to the observed activities has the better predictive power. To confirm this we examine the correlation between observed and calculated activity; the statistics given below further support that model-9 is better than model-5. That is, for model-9 both the statistics and the predictive power are excellent, and it is thus an excellent model for estimating log Ki (hCAI).

From model-5 (eq 3):

$$\log K_i(\mathrm{hCAI})_{\mathrm{Obs}} = -0.402 + 1.116(\pm 0.013)\,\log K_i(\mathrm{hCAI})_{\mathrm{Cal}} \tag{6}$$

N = 20, CV = 0.199, R² = 0.786, R²A = 0.773, F = 65.918

From model-9 (eq 4):

$$\log K_i(\mathrm{hCAI})_{\mathrm{Obs}} = -0.001 + 1.000(\pm 0.115)\,\log K_i(\mathrm{hCAI})_{\mathrm{Cal}} \tag{7}$$

N = 20, CV = 0.189, R² = 0.807, R²A = 0.797, F = 75.360

It is clear that model-9 (Table 4, eq 4) is the most appropriate for modeling log Ki (hCAI).

[Figure: plot with trend line y = 0.7035x + 0.8776, R² = 0.7855; both axes range from 0.9 to 3.9]
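The two checks of predictive power described above can be sketched as follows, assuming NumPy. The Q factor uses the definition given in the text (Q = R / CV) with the R and CV reported for equation 4; the observed and calculated arrays are hypothetical placeholders, not the values of Table 5, and the function names are illustrative.

```python
import numpy as np

def pogliani_q(r, cv):
    """Quality factor as defined in the text: Q = R / CV."""
    return r / cv

def observed_vs_calculated(obs, calc):
    """Slope, intercept and R^2 of the line obs = intercept + slope * calc."""
    obs = np.asarray(obs, dtype=float)
    calc = np.asarray(calc, dtype=float)
    slope, intercept = np.polyfit(calc, obs, 1)
    r2 = np.corrcoef(calc, obs)[0, 1] ** 2
    return slope, intercept, r2

# R and CV as reported with eq 4 (model-9).
q_eq4 = pogliani_q(0.899, 0.200)

# Hypothetical placeholder values, NOT the entries of Table 5.
calc_demo = [1.2, 1.8, 2.1, 2.6, 3.0, 3.4]
obs_demo = [1.1, 1.9, 2.3, 2.5, 3.1, 3.3]
slope, intercept, r2 = observed_vs_calculated(obs_demo, calc_demo)
print(round(q_eq4, 3), round(slope, 3), round(intercept, 3), round(r2, 3))
```

A slope near 1 and an intercept near 0, as reported in equation 7 for model-9, indicate that the calculated values track the observed ones without systematic bias.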

Conclusions
From the results and discussion above we conclude that the NMR chemical shift of -SO2NH2, i.e. δ(-SO2NH2), in combination with the indicator parameters can be used successfully for monitoring, modeling, and estimating carbonic anhydrase inhibition [log Ki (hCAI)]. Since NMR is directly related to molecular structure, and thus to the molecular graph, we advocate that parameters obtained from molecular spectra, such as NMR chemical shifts, can be used as molecular descriptors.
ISSN 1424-6376 © ARKAT

Experimental Section
(i) Carbonic anhydrase inhibition. All the inhibition constants Ki reported by Supuran and coworkers [3] are adopted and used in the present investigation after conversion into log units.
(ii) 1H-NMR chemical shift. The 1H-NMR chemical shifts for -SO2NH2, i.e. δ(-SO2NH2), in ppm are also taken from the paper of Supuran [3].
(iii) Indicator parameters. These are dummy parameters used in QSAR studies. In the present case we have used three indicator parameters (I1, I2, I3), the details of which are given as a footnote to Table 2. An indicator parameter assumes only two values: 1 (when the structural feature is present) or 0 (when it is absent).
(iv) Regression analysis. All the regression analyses were carried out using NCSS software.

Table 1. Structural details of benzene sulfonamides used in the present study

Table 4. Regression parameters and quality of correlations for the different models

Table 6. Correlation matrix for the parameters used in model-5 (eq 3)

Table 7. Correlation matrix for the parameters used in model-9 (eq 4)

Table 8. Variance inflation factors and eigenvalues for the parameters used in model-5 (eq 3)

The data presented in Tables 6-8 and Figs 1 and 2 further establish our findings and indicate that the models are free from defects due to collinearity; see the multiple regression report (Fig 3) and Ridge statistics (