New alternatives for estimating the octanol/water partition coefficient and water solubility for volatile organic compounds using GLC data (Kovàts retention indices)

New possibilities for estimating octanol/water partition coefficients (log P) and water solubility ($S_w$) were investigated using Kovàts retention indices (I) obtained from GLC retention data for 132 volatile organic compounds belonging to 7 different chemical classes (hydrocarbons, alcohols, aldehydes, ketones, carboxylic acids, esters and halogen compounds). Application of the multilinear regression method led to six equations, all involving index I, as follows: (i) direct correlation of log P vs. I (eq. 1); (ii) log P vs. I, molar refractivity, and surface tension (eq. 2); (iii) log P (eqs. 3 and 4) or log $S_w$ (eq. 6) vs. I and structural characteristics; (iv) log P vs. the I/$S_w$ ratio (eq. 5). With the exception of eq. 1 (which gave relatively weak correlations for some classes), eqs. 2–6 can provide reliable values for log P and for log $S_w$, as shown by their statistical parameters. The general models presented through eqs. 2–6 may also be applied to the estimation of other biologically and/or ecologically important properties that are linearly dependent on log P or log $S_w$. By generalization, a new calculation method is suggested (eq. 7), which allows the estimation of log P or log $S_w$ in terms of the number of bonds and the Kovàts retention index.


Introduction
The estimation of hydrophobic/hydrophilic properties of chemical compounds is relevant for many fields including medicine, pharmacology, foods, fragrances, chemical industry and environmental protection. [13] In order to obtain experimental logP values, liquid/liquid (l/l) extraction (octanol/water or, generally, organic solvent/water) may be applied. As an alternative to this time-consuming method, the Hansch model provides logP values from experimentally derived fragment and bond increments. [1,2][7][8][9][10][14][15][16] Another improvement on the classical l/l extraction is offered by reversed-phase liquid chromatography (RP-HPLC) [17,18] and reversed-phase thin-layer chromatography (RP-TLC), [19][20][21] which are at present the most frequently used techniques for providing experimental values of the octanol/water partition coefficients. Such methods are based on the linear correlation between the logP values of the compounds and the capacity factors (for RP-HPLC) or $R_{M0}$ values (in the case of RP-TLC); reliable correlations are obtained within homologous series. In the case of volatile compounds, gas-liquid chromatography (GLC) may provide more suitable alternatives.
The retention index on a certain stationary phase [22] is the result of a gas-liquid partition process, so it may contain information related to solvation. Such information can be extracted from GLC data in the form of the well-known solubility factors, [23][24][25] which are used for the calculation of various properties, including logP. Besides this indirect method, only a few attempts to evaluate physico-chemical properties directly from GLC data have been reported, [26] and such studies were restricted to polycyclic aromatic derivatives.
The aim of the present study is to investigate the possibility of estimating the hydrophobic/hydrophilic properties of organic compounds, namely logP and log $S_w$, from Kovàts indices (I) for a set of 132 organic compounds from 7 different classes. In order to obtain relationships suitable for structurally diverse sets, the simple correlations of logP vs. I were improved by the addition of other parameters, such as molar refractivity, surface tension, number of bonds, accessible polar surface, or water solubility.

Results and Discussion
The use of the index I for estimating logP and water solubility was carried out in the present study as a stepwise strategy, which led to several new linear relationships between these parameters and the index I. The predictive ability of the models for logP and log $S_w$ was tested by cross-validation, using the leave-one-out method. [27] The leave-one-out cross-validation coefficient ($R^2_{CV}$) was provided by the CODESSA program. [28] Additionally, the quality factor of the regressions (Q) was calculated as R/SE. [29,30]
The set of compounds and the corresponding literature values for logP, [1,31,32] $S_w$, [6,7,32] and the index I [24] are presented in Table 1 (stationary phase: polyphenyl ether).
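The leave-one-out procedure mentioned above can be illustrated with a minimal sketch. It assumes a single-descriptor linear model and the common definition $R^2_{CV} = 1 - \mathrm{PRESS}/SS_{tot}$; the exact definition used by CODESSA may differ. The function names (`fit_line`, `loo_r2`) and the data are hypothetical: the data are generated synthetically and are not values from Table 1.

```python
import random

def fit_line(x, y):
    """Least-squares slope and intercept for y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

def loo_r2(x, y):
    """Leave-one-out cross-validated R^2, taken here as 1 - PRESS / SS_total."""
    n = len(y)
    press = 0.0
    for i in range(n):
        # Refit the model without compound i, then predict compound i.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a * x[i] + b)) ** 2
    my = sum(y) / n
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_tot

# Synthetic illustration only (NOT data from the paper).
rng = random.Random(0)
x = [rng.uniform(5.0, 12.0) for _ in range(30)]      # plays the role of I/100
y = [0.5 * xi - 1.0 + rng.gauss(0.0, 0.1) for xi in x]
r2_cv = loo_r2(x, y)
print(f"leave-one-out R2_CV on synthetic data: {r2_cv:.3f}")
```

Because each compound is predicted by a model that never saw it, a value of $R^2_{CV}$ close to the fitted $R^2$ indicates that no single compound dominates the regression.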
In order to obtain linear relationships between logP and I, sets were formed according to functional groups, as shown in Table 1. Because the experimental I values presented in Table 1 are about two orders of magnitude greater than logP, index I was replaced in the calculations with the ratio I/100, denoted as $I^{*}$. Linear regressions were performed according to a general equation (eq. 1), leading to relatively weak correlations ($R^2$ = 0.431–0.990). Statistics of these correlations are presented in Table 2.
$$\log P = a \times I^{*} + b \qquad \text{(eq. 1)}$$
where $I^{*} = I/100$ (I = Kovàts index, Table 1).
There are significant differences between the correlation parameters corresponding to the various classes, but these differences are probably due to the heterogeneous composition of the sets with respect to olefinic, cyclic or aromatic structures. The best correlation coefficients were observed in the case of alcohols ($R^2$ = 0.948) and carboxylic acids ($R^2$ = 0.990). Such pronounced linearity may be explained by the smaller proportion of unsaturated, cyclic or aromatic structures within these sets.

Dependence of the log P parameter on the Kovàts index (I) and physical properties (molar refractivity and surface tension)
Eq. 1 was improved, according to the general equation 2, by the addition of a second parameter, constructed from two physical properties (molar refractivity and surface tension) which affect the partition process between water and the organic solvent. The molar refractivity and surface tension were computed using the ACD ChemSketch 8.0 Freeware software. [33] Coefficients and statistical parameters of eqs. 2a–2g, corresponding to each class of compounds, are presented in Table 3. These results show significant increases of the correlation coefficients ($R^2$ = 0.927–0.992). At the same time, a good correlation between the logP values calculated through eqs. 2a–2g and the experimental logP is shown in Figure 2.

Dependence of the log P parameter on the Kovàts index involving structural effects and accessible polar surface
Another improvement of eq. 1 was achieved by a QSPR study. According to the observed influence of the molecular structure on the dependence of logP on I (Figure 1), two parameters were added to eq. 1: the number of bonds (nBt), [34] in order to characterize the effects of unsaturation, rings, or aromaticity, and the accessible polar surface, which may describe the availability of the functional group to hydration. Assignment of net atomic charges (CNDO) and geometry optimization (MM+ force field) were performed using the HyperChem program, [35] followed by computation of the accessible surface of the heteroatoms. As seen in Table 4 and Figure 3, the use of the number of bonds nBt (a constitutional descriptor) together with index I (eq. 3) led to a significant increase of the statistical parameters, compared to eq. 1.
$$\log P = a \times I^{*} + b \times \mathrm{nBt} + c \qquad \text{(eq. 3)}$$
where nBt = number of bonds.
Attempts were made to find a single relationship able to predict logP for all 129 compounds (those for which experimental logP was available, Table 1), independent of the type of functional group. In this case, supplementary characteristics of the functional group are required. The accessible polar surface was found to be the best of several tested descriptors. It was included in the QSPR model (eq. 4) as a substituent factor (SF), only in the case of heteroatom-containing structures. Satisfactory statistics could be achieved only by adding an indicator variable (V) to the QSPR model, with V = 1 for structures that contain oxygen atoms and V = 0 for all others.
$$\log P = 0.031 \times I^{*} + 0.185 \times \mathrm{nBt} + 6.05 \times 10^{-3} \times \mathrm{SF} + 0.353 \times V \qquad \text{(eq. 4)}$$
N = 129; $R^2$ = 0.974; F = 1194.4; SE = 0.275; $R^2_{CV}$ = 0.969; Q = 3.840
where: $I^{*} = I/100$ (I = Kovàts index, Table 1); nBt = number of bonds; SF = Sp = solvent accessible surface of the heteroatoms (for hydrocarbons, SF = 0); V = 1 for compounds containing oxygen atoms, and V = 0 for the other ones.
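As a worked illustration, eq. 4 can be written as a small function. This is only a sketch: the function name, argument names and the input values in the usage lines are hypothetical, and the surface-area units of SF are assumed to be those used by the authors, which the text does not state.

```python
def log_p_eq4(kovats_index, n_bonds, polar_surface=0.0, has_oxygen=False):
    """Estimate logP from eq. 4.

    kovats_index  : Kovats retention index I (the equation uses I* = I/100)
    n_bonds       : number of bonds, nBt
    polar_surface : SF, solvent accessible surface of the heteroatoms
                    (0 for hydrocarbons; units assumed to match the paper)
    has_oxygen    : indicator variable V (True -> 1, False -> 0)
    """
    i_star = kovats_index / 100.0
    v = 1.0 if has_oxygen else 0.0
    return (0.031 * i_star
            + 0.185 * n_bonds
            + 6.05e-3 * polar_surface
            + 0.353 * v)

# Hypothetical inputs, chosen only to show the arithmetic:
# 0.031 * 8 + 0.185 * 20 = 0.248 + 3.700 = 3.948
hydrocarbon_like = log_p_eq4(800, 20)
# Adding SF = 50 and V = 1 contributes 6.05e-3 * 50 + 0.353 = 0.6555
oxygenated_like = log_p_eq4(800, 20, polar_surface=50.0, has_oxygen=True)
print(round(hydrocarbon_like, 3), round(oxygenated_like, 4))
```

Note that eq. 4 as reported has no separate intercept term, so the sketch contains none.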
In Figure 4, the logP values calculated via eq. 4 are plotted against the experimental logP values. Although R is slightly lower than in the case of eqs. 3a–3g, the statistical parameters of eq. 4 ($R^2$ = 0.974, F = 1194.4, SE = 0.275) may be considered relatively good, taking into account the number and the chemical diversity of the structures covered by this relationship.
When the value "1" was assigned to variable V for both oxygen- and fluorine-substituted compounds, no improvement was observed ($R^2$ = 0.971, F = 1065, SE = 0.291). Given the hydrogen-bond acceptor character of fluorine atoms, an improvement of the statistics for eq. 4 would be expected if V were related to hydrogen bonding. The absence of such an improvement suggests that V is related rather to the Brønsted acid-base behavior of the oxygen-containing compounds.

Dependence of the logP parameter on index I and water solubility
As is known, [1,2] logP describes the partition of a chemical compound between an organic solvent (usually 1-octanol) and water. Starting from this basic notion, we attempted to correlate logP values with the ratios I/$S_w$, where $S_w$ represents the water solubility (eq. 5a, obtained using those compounds for which both experimental logP and $S_w$ were available, Table 1). Such a strategy may offer an easily accessible method for estimating logP, being related to the real partition process (organic phase/water). A plot of the calculated logP (eq. 5a) against the experimental logP is presented in Figure 5.
$$\log P = 0.809 \times \log(I^{*}/S_w) + 0.066 \qquad \text{(eq. 5a)}$$
N = 105; $R^2$ = 0.969; F = 3183.72; SE = 0.281; $R^2_{CV}$ = 0.965; Q = 3.501
where $S_w$ = experimental water solubility.
The resulting correlation equation (eq. 5a) was compared to Yalkowsky's equation applied to the same set of compounds (eq. 5b), [4] showing an improvement of the statistical parameters when the Kovàts index is used together with water solubility. The improvement added by the retention index (which is related to the solubility in the organic phase) may depend on the stationary phase.
$$\log P = -0.823 \times \log S_w + 0.812 \qquad \text{(eq. 5b)}$$
N = 105; $R^2$ = 0.950; F = 1981.84; SE = 0.353; $R^2_{CV}$ = 0.949; Q = 2.762
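Eqs. 5a and 5b, and the quality factor Q = R/SE defined earlier, can be sketched as follows. The sketch assumes that "log" denotes the base-10 logarithm and that $S_w$ is supplied in the same units as in Table 1 (the units are not given in this text). The function names and the input values are hypothetical; only the coefficients and the statistics ($R^2$, SE, Q) are taken from the equations above.

```python
import math

def log_p_eq5a(kovats_index, s_w):
    """logP from eq. 5a, using I* = I/100 and water solubility S_w."""
    return 0.809 * math.log10((kovats_index / 100.0) / s_w) + 0.066

def log_p_eq5b(s_w):
    """logP from eq. 5b (solubility-only model applied to the same set)."""
    return -0.823 * math.log10(s_w) + 0.812

def quality_factor(r_squared, se):
    """Q = R / SE, with R taken as the positive square root of R^2."""
    return math.sqrt(r_squared) / se

# Hypothetical compound: I = 800, S_w = 0.01 (units as in Table 1, assumed).
print(round(log_p_eq5a(800, 0.01), 3))   # 0.809 * log10(800) + 0.066
print(round(log_p_eq5b(0.01), 3))        # -0.823 * (-2) + 0.812

# Recomputing Q from the reported R^2 and SE reproduces the reported
# values to within rounding (3.501 for eq. 5a, 2.762 for eq. 5b).
q_5a = quality_factor(0.969, 0.281)
q_5b = quality_factor(0.950, 0.353)
print(round(q_5a, 3), round(q_5b, 3))
```

A higher Q for eq. 5a than for eq. 5b reflects both the larger R and the smaller standard error obtained when the retention index is included.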

Estimation of water solubility on the basis of Kovàts retention indices
The linear dependence between logP and $S_w$ discussed above is due to the physico-chemical definition of the logP parameter, i.e. the organic phase/water partition. Based on such a relationship, log $S_w$ may be expressed as a linear function of the same parameters that describe logP. Thus, the general model represented by eq. 3 (which gave the best results of all the tested models) was applied to the estimation of log $S_w$, resulting in eq. 6:
$$\log S_w = a \times I^{*} + b \times \mathrm{nBt} + c \qquad \text{(eq. 6)}$$
where: $I^{*} = I/100$ (I = Kovàts index, Table 1), nBt = number of bonds.
The coefficients of eq. 6, corresponding to each class of compounds from Table 1, and the statistical parameters are presented in Table 5. This result, together with the plot of calculated vs. experimental log $S_w$ (Figure 6), supports the validity of such a model for estimating the water solubility of various compounds.
Because both parameters logP and log $S_w$ can be expressed by the same QSPR model involving index I, a new general relationship, suitable for the estimation of logP or log $S_w$, is proposed as eq. 7:
$$\log P \ \text{or} \ \log S_w = a \times I^{*} + b \times \mathrm{nBt} + c \qquad \text{(eq. 7)}$$
where the definitions of $I^{*}$ and nBt are those specified for eq. 3 and eq. 6.
Values of the coefficients a, b, c have already been presented for each calculated property and for each class of compounds from Table 1.
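The general model of eq. 7 is an ordinary multilinear regression with two descriptors and an intercept. The sketch below shows how the coefficients a, b, c could be obtained by least squares with NumPy. It is not the authors' procedure (they report using CODESSA); the function name `fit_eq7` is hypothetical, and the data are generated synthetically from known coefficients purely to show that the fit recovers them.

```python
import numpy as np

def fit_eq7(i_star, n_bonds, y):
    """Fit y = a*I* + b*nBt + c by least squares; return (a, b, c) and R^2."""
    design = np.column_stack([i_star, n_bonds, np.ones_like(i_star)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot

# Synthetic illustration only (NOT data from Table 1).
rng = np.random.default_rng(1)
i_star = rng.uniform(5.0, 12.0, 40)                  # I/100
n_bonds = rng.integers(5, 30, 40).astype(float)      # nBt
true_coef = np.array([0.2, 0.1, -0.5])               # assumed a, b, c
y = (true_coef[0] * i_star + true_coef[1] * n_bonds + true_coef[2]
     + rng.normal(0.0, 0.05, 40))

coef, r2 = fit_eq7(i_star, n_bonds, y)
print("a, b, c =", np.round(coef, 3), " R2 =", round(float(r2), 4))
```

The same function serves for either property: passing experimental logP values as `y` corresponds to eq. 3, and passing log $S_w$ values corresponds to eq. 6.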
The present study, based on the correlation between logP and the Kovàts index, underlines the possibility of using GLC instead of liquid chromatography for the estimation of octanol/water partition coefficients in the case of volatile compounds. The linear relationship between these two parameters within homologous series could be extended to more general relationships, applicable to sets of increased structural diversity (eqs. 2–5). In the present work, several descriptors related to the physico-chemical properties of the solutes were tested, but many other descriptors may give significant results. [34,36]
The intercorrelation of parameters is presented in Table 6 for the most general models, covering large datasets, namely for eq. 4 (Table 6a) and eq. 5a (Table 6b). Except for the significant negative intercorrelation between logP and log $S_w$, no other intercorrelation can be seen in Tables 6a and 6b. More pronounced intercorrelations of the descriptors may occur within particular sets from Table 1. Randić's orthogonalization procedure can be applied to the collinear parameters for each equation within each class of compounds. According to Randić, [37,38] orthogonalization of descriptors does not affect the statistical parameters (R, F, SE), which are the same for both orthogonal and nonorthogonal models. The main reasons for using orthogonal descriptors are the stability and significance of the equation's coefficients. Analysis of the parameters and of the coefficients of the presented equations, employing orthogonal models, on several stationary phases, and using larger data sets, will be the subject of a more detailed future study.
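The statement that orthogonalization leaves R unchanged can be checked numerically. The sketch below uses a simple sequential scheme in which each descriptor after the first is replaced by its residual from a regression on the preceding descriptors; this is offered as an assumption about the general idea of orthogonalized descriptors, not as a faithful implementation of Randić's published procedure. The function names and the data are hypothetical and synthetic.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X plus intercept."""
    design = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    res = y - design @ coef
    return 1.0 - (res @ res) / np.sum((y - y.mean()) ** 2)

def orthogonalize(X):
    """Keep column 0; replace each later column by its residual after
    regression (with intercept) on the preceding, already processed columns."""
    n, p = X.shape
    out = np.array(X, dtype=float)
    for j in range(1, p):
        basis = np.column_stack([np.ones(n), out[:, :j]])
        coef, *_ = np.linalg.lstsq(basis, out[:, j], rcond=None)
        out[:, j] = out[:, j] - basis @ coef
    return out

# Synthetic, deliberately collinear descriptors (NOT data from the paper).
rng = np.random.default_rng(2)
x1 = rng.uniform(5.0, 12.0, 50)
x2 = 2.0 * x1 + rng.normal(0.0, 1.0, 50)
X = np.column_stack([x1, x2])
y = 0.3 * x1 + 0.1 * x2 + rng.normal(0.0, 0.1, 50)

X_orth = orthogonalize(X)
corr_before = np.corrcoef(X.T)[0, 1]
corr_after = np.corrcoef(X_orth.T)[0, 1]
print("descriptor correlation before/after:", round(corr_before, 3), round(abs(corr_after), 6))
print("R2 before/after:", round(r_squared(X, y), 6), round(r_squared(X_orth, y), 6))
```

The two models span the same space, so the fitted values (and hence R, F and SE) coincide; only the individual coefficients and their interpretation change.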
A comparison of the estimating ability of the proposed models with that of ClogP [1,31] and several known QSPR models (e.g. AlogP, [14] MlogP [15] and ACDlabs [33]) can be made through the residuals between the experimental and the calculated logP and log $S_w$ values, as presented in Table 7. Values for AlogP and MlogP were obtained using the Dragon software. [39] In order to compare the presented results with the data calculated by the mentioned software packages, an average absolute deviation over the considered set was calculated for each model.
According to Table 7, the average absolute deviations are smaller for eqs. 2–5 than for AlogP and MlogP, and in the case of eq. 3 they are almost equal to those for ClogP. For the calculation of log $S_w$, eq. 6 shows better results than the ACDlabs software.
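The average absolute deviation used for this comparison is taken here to be the mean of the absolute residuals between experimental and calculated values; this reading is an assumption, since the text does not give the formula. The function name and the numbers in the usage line are hypothetical and are not values from Table 7.

```python
def average_absolute_deviation(experimental, calculated):
    """Mean of |experimental - calculated| over a set of compounds."""
    if len(experimental) != len(calculated):
        raise ValueError("both sequences must have the same length")
    if not experimental:
        raise ValueError("at least one compound is required")
    total = sum(abs(e - c) for e, c in zip(experimental, calculated))
    return total / len(experimental)

# Hypothetical values, for illustration only: (0.5 + 0.0 + 1.0) / 3 = 0.5
aad = average_absolute_deviation([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
print(aad)
```

Computing this quantity for each model over the same set of compounds gives a single number per model, which is what allows the models to be ranked.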

Conclusions
The possibility of estimating octanol/water partition coefficients and water solubility from GLC data (Kovàts retention indices) was studied for 132 volatile compounds belonging to seven different chemical classes. Six equations were formulated involving the Kovàts index and additional parameters, namely molar refractivity and surface tension (eq. 2), number of bonds (eqs. 3 and 6), accessible polar surface (eq. 4) and the I/$S_w$ ratio (eq. 5a). As a generalization of the study, a new model (eq. 7) was proposed in order to estimate either logP or log $S_w$ in terms of the Kovàts index and the number of bonds. The linear relationships between logP and the Kovàts index formulated through eqs. 2–6, with $R^2$ values between 0.927 and 0.993, show that GLC may offer a facile alternative for estimating the octanol/water partition coefficient of volatile compounds. Gas-liquid chromatography (GLC) is a simple and widely used technique that requires very small samples compared to other physicochemical methods. Moreover, eq. 7 indicates the possibility of extending each of the presented models in order to calculate other properties that are linearly dependent on logP, such as bioconcentration factors [11,13] or soil adsorption coefficients. [12] 8][19][20][21] As in the case of HPLC and TLC, improvements in the accuracy of property determination using GLC retention indices may be expected from refinements in the selection of the appropriate stationary phase. Such aspects, together with refinement of the models already obtained, need further investigation.

Table 1. Experimental values for Kovàts index (I), logP and water solubility ($S_w$) for the studied set of compounds

General Papers ARKIVOC 2009 (x) 174-194 ISSN 1551-7012 © ARKAT USA, Inc.
1. Estimation of log P using the Kovàts retention index (I)
1.1. Simple correlation of the log P parameter with the Kovàts index (I)
As may be observed from Figure 1, logP is linearly dependent on index I within the various homologous series. Plotting logP vs. I also shows trends that depend on the functional group, unsaturation, and the presence of rings or aromatic structures.

Figure 1. Plot of logP (experimental, Table 1) vs. I for hydrocarbons, alcohols, aldehydes, ketones, carboxylic acids, esters and halogen compounds.

Table 2. The coefficients a, b and statistical parameters (R, F, SE, $R^2_{CV}$, Q) in the case of eq. 1, for each class of compounds (according to Table 1)

Table 3. Coefficients a, b, c and statistical parameters ($R^2$, F, SE, $R^2_{CV}$, Q) in the case of eq. 2, for each class of compounds (according to Table 1)
Figure 4. Calculated logP values (via eq. 4) plotted against the experimental logP values.

Table 5. Coefficients a, b, c and statistical parameters (R, F, SE, $R^2_{CV}$, Q) in the case of eq. 6, for each class of compounds (according to Table 1)

Table 7. Residuals between the experimental logP and log $S_w$ values and the corresponding values calculated through eqs. 2–6 and through some already known QSPR models