Binding pose analysis of hydroxyethylamine-based β-secretase inhibitors and application thereof to the design and synthesis of novel indeno[1,2-b]indole-based inhibitors

β-Secretase (BACE1) is recognised as a target for the treatment of Alzheimer’s disease, and transition-state isosteres such as hydroxyethylamines have shown promise when incorporated into BACE1 inhibitors. A computational investigation of previously reported carbazole-based hydroxyethylamines with contradictory binding poses was undertaken using molecular dynamics simulations to rationalise the ligands' preferred binding pose. Visual inspection of the confirmed binding pocket showed unoccupied space surrounding the carbazole moiety, which was probed through the synthesis of seventeen ligands wherein the carbazole ring system was replaced with an indeno[1,2-b]indole ring system. The most active compound, rac-1-[benzyl(methyl)amino]-3-(indeno[1,2-b]indol-5(10H)-yl)propan-2-ol, showed 91% inhibition of β-secretase at 10 µM, with a cytotoxicity IC50 value of 10.51 ± 1.11 µM against the SH-SY5Y cell line.


Introduction
The total cost of care in the United States for people with AD was $305 billion in 2020, not including an estimated $244 billion in unpaid caregiving by family and friends, making it one of the costliest diseases to manage. 2 Alzheimer's disease is characterised by a gradual onset and progression of deficits in more than one area of cognition, including disruption in behavioural, language and memory skills. 2 To date, consensus on the cause of AD has not been reached; however, several hypotheses have been put forward based upon the various pathophysiological factors observed. One of these hypotheses, the amyloid hypothesis, also known as the amyloid cascade hypothesis, revolves around the cascade of events arising during the formation and accumulation of amyloid-beta (Aβ) fragments in the extracellular matrix of neuronal cells. 8 The Aβ fragment is cleaved from the amyloid precursor protein (APP) by the secretase enzymes (α-, β- and γ-secretase) 9 at different sites, leading to the formation of different fragments. 10 The amyloidogenic metabolic pathway is initiated by the aspartyl protease β-secretase (also known as BACE1), followed by γ-secretase, which results in a full Aβ1-40/42 fragment that then aggregates to form toxic amyloid plaques in the brain. 11,12 Because BACE1 is the initiator of the amyloidogenic metabolic pathway, it is considered a desirable target for lowering brain levels of Aβ and, consequently, for the development of treatments for, or the prevention of, AD.
In particular, the group of Macchia have developed a series of such BACE1 inhibitors, comprising a carbazole ring connected via an HEA linker to a range of aromatised ring systems. The most active compounds prepared, 1 (ref. 19), 2 (ref. 21) and 3 (ref. 20), had reported IC50 values of 0.5 µM, 1.6 µM and 3.8 µM, respectively (Figure 1). An initial review of the literature, however, revealed that two contradictory binding poses have been reported for these compounds, and, to the best of our knowledge, no work has been carried out to confirm the correct pose. Furthermore, no crystal structures of BACE1 with either these or similar compounds have been reported. In this study, binding-pose validation using molecular dynamics (MD) simulations was conducted on Macchia compounds 1, 2 and 3 to confirm the correct binding pose. Following this, a series of seventeen compounds was designed and synthesised to probe unexplored chemical space identified during the binding pose analysis. Synthesised compounds were then subjected to biological screening and further in silico investigations.

Results and Discussion
Binding pose analysis

An initial in silico analysis of previously reported compounds 1, 2 and 3 was undertaken to rationalise the binding pose. Prior to commencing with docking and MD studies, the pKa of the amines was computationally predicted to determine whether the protonated forms of the ligands should be used. The calculated pKa values of the free-base amines were 2.6, 6.2 and 1.9 for 1, 2 and 3, respectively (see ESI for pKa results). As there are no reported crystal structures of BACE1 complexed with the same or structurally similar compounds, several BACE1 crystal structures with different binding-pocket conformations were selected for docking and induced-fit docking (IFD). In both approaches, several binding poses for ligands 1, 2 and 3 were obtained, possibly explaining the discrepancies noted in the literature. With these discrepancies having been replicated, a more thorough investigation was performed to identify the correct binding pose. The poses generated from the IFD protocol can largely be separated into pose A or pose B, in which the ligand is rotated by ~180° in the binding pocket. The binding poses A (in blue) and B (in green) of 1, 2 and 3 in the binding pocket are highlighted in Figure 2. In the case of pose A, the carbazole ring was found to be surrounded by the residues Pro131, Ile187, Arg189, Tyr259 and Thr390, and, in pose B, by Gly74, Ile171, Trp176 and Thr292. Subsequent MD simulations of 100 ns showed a clear preference for pose A over pose B, shown, again, in blue and green, respectively, in Figure 2.
The root-mean-square deviation (RMSD) of the ligands during the simulation shows pose A to be stable in the binding pocket with an RMSD of 2 Å, while pose B is unstable, with significant movement of the ligand in relation to the backbone. This strongly suggests that pose A is the correct binding pose, in contrast with the literature report for compound 1 (ref. 19), which suggested pose B based on molecular docking studies. Pose A of 1 is noted to form π-π stacking with Tyr132 and Trp176; hydrogen bonding with Thr292 from the hydroxy group of the HEA; and halogen bonds with Lys285 and Thr390 from the chlorine substituent on the carbazole (Figure 3). Visual inspection of pose A showed reasonable space for elongation of the carbazole moiety, suggesting that there was additional chemical space available to explore.
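The stability criterion used above can be illustrated with a minimal sketch of a ligand RMSD calculation. It assumes that every trajectory frame has already been superimposed on the protein backbone, so that ligand coordinates can be compared directly with the reference pose; the function names and the 2 Å default threshold are illustrative choices, and this is not the implementation used by the MD software in this study.

```python
import math

def ligand_rmsd(reference, frame):
    """RMSD (in Å) between two equally ordered lists of (x, y, z) atom coordinates.

    Assumes the frame was already aligned on the protein backbone (hypothetical
    preprocessing step, not shown here).
    """
    if len(reference) != len(frame) or not reference:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for ref_atom, atom in zip(reference, frame)
        for a, b in zip(ref_atom, atom)
    )
    return math.sqrt(squared / len(reference))

def fraction_below(rmsd_series, threshold=2.0):
    """Fraction of frames whose ligand RMSD stays at or below the threshold (Å)."""
    return sum(r <= threshold for r in rmsd_series) / len(rmsd_series)
```

A pose whose RMSD series stays at or below roughly 2 Å for most frames would be read as stable in the sense used here, whereas a series that drifts well above that value indicates displacement from the starting pose.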

Synthesis
The HEA moiety and the 1-naphthylamine moiety have been extensively studied, wherein the HEA moiety has been replaced with different amine and hydroxyl derivatives, 24 and the 1-naphthylamine moiety with sulphonamide or arylcarboxamide derivatives. 20,21 In contrast, the chemical space surrounding the carbazole moiety has remained largely unexplored, with only phenyl-2-substituted indoles having been previously reported. 19 In light of the binding-pose analysis, we elected to probe the chemical space surrounding the carbazole group of compound 1 by replacing it with an indeno[1,2-b]indole ring system. The 1-naphthylamine moiety was then substituted with various amines while the HEA moiety was kept constant (Figure 4).

Syntheses. The preparation of the desired targets was envisaged through the treatment of epoxide 21 with various amines. Initially, epoxide 21 was prepared in two steps. Commercially available 1-indanone 22 was reacted with phenylhydrazine hydrochloride, in the presence of Amberlyst-15 as a catalyst, in a Fischer indole type reaction to afford indole 23 (ref. 25), which was subsequently reacted with excess (±)-epichlorohydrin (ECH) in the presence of base to afford epoxide 21 in an overall yield of 67% (Scheme 1). 26

Scheme 1. (i) 1.2 eq. PhNHNH2·HCl, cat. Amberlyst-15, EtOH, reflux, 12 h, 81%; (ii) 7 eq. (±)-ECH, 2.5 eq. KOH, THF, 85 °C, 10 h, 83%.
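The overall yield quoted for the two-step route is the product of the step yields given in Scheme 1. A short sketch of that arithmetic follows; the function name is an illustrative choice.

```python
from math import prod

def overall_yield(step_yields_percent):
    """Overall yield (%) of a linear sequence, given each step's yield in percent."""
    return 100.0 * prod(y / 100.0 for y in step_yields_percent)

# Step yields from Scheme 1: 81% (Fischer indole step) and 83% (alkylation with ECH).
print(round(overall_yield([81, 83])))  # 67
```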

Structure activity relationship study
The BACE1 inhibitory activity of all newly synthesized compounds was assessed using a fluorescent BACE1 activity detection kit (Sigma-Aldrich, St. Louis, USA). The BACE1 inhibitory activities for the synthesized compounds, expressed as percentage inhibition at 10 µM, are presented in Table 1. In addition, selected compounds were assessed for cytotoxicity in the SH-SY5Y neuroblastoma cell line using the sulforhodamine B (SRB) staining assay as described by Vichai and Kirtikara. 28 Although the SH-SY5Y cell line is cancerous in nature, it is a good model of a neurological cellular environment. 28 Compounds were screened in both assays as racemic mixtures. Surprisingly, subtle changes in the nature of the R1 group resulted in wide fluctuations in activity. Arguably, benzyl derivatives were better inhibitors of BACE1 than compounds bearing heterocyclic aliphatic substituents, such as 14 and 15. Addition of a methyl group to the benzyl nitrogen (compound 19) afforded the compound with the highest inhibition (91%). In comparison, leaving the position unsubstituted (compound 4) resulted in a dramatic decrease in activity (42%). Addition of a second benzyl group (compound 13) resulted in complete loss of inhibition. In the latter case, one might surmise that the group is too bulky; however, previously reported compound 3 showed that bulky substituents can readily be accommodated while maintaining good activity. The substitution of the hydrogen on the benzyl nitrogen (compound 4, 42%) with a methyl group (compound 19, 91%) appears more favourable in terms of inhibition. Replacing the planar aromatic benzyl group (compound 4, 42%) with the more flexible cyclohexyl and methyl groups (compound 20, 86%) or a 1,2,3,4-tetrahydronaphthalene group (compound 17, 71%) increased inhibition substantially, suggesting that cyclic aliphatic systems may be good substituents in the development of BACE1 inhibitors. Finally, compound 19, which displayed the highest percentage inhibition, exhibited relatively low cytotoxicity against the SH-SY5Y neuroblastoma cells, with an IC50 value of 10.51 µM.

In silico studies
Prediction of pKa values. Previous computational studies only considered the neutral form of the ligands for molecular docking. However, due to the secondary and tertiary nitrogen groups present in ligands 4-20, it was decided to use a quantum mechanical (QM) based approach to predict the pKa of the ligands. Using Schrödinger's pKa predictor within the Jaguar suite, it was determined that the pKa of the tertiary nitrogen of the most active compound, 19, was 6.8, regardless of the stereochemistry (see the ESI). The pH of the buffer used for the in vitro screening was 4.5. This strongly suggests that compound 19 was protonated when the assay was performed. Therefore, the protonated form of 19 was used in the subsequent in silico studies.

Docking and pose validation. Compound 19 was modelled computationally to confirm whether the binding preference noted for 1, 2 and 3 was retained. As suspected, 19 yielded the same two main poses for each enantiomer from the induced fit docking (IFD) protocol. Figure 5 shows the two binding poses of 19 (S,R) in the binding pocket, with pose A in blue and pose B in green. The tetracyclic ring of pose A was found to be surrounded by the residues Pro131, Ile187, Arg189, Tyr259 and Thr390; and, in pose B, by Gly74, Ile171, Trp176 and Thr292.
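The protonation-state argument for compound 19 can be checked with the Henderson-Hasselbalch relation. The sketch below assumes a single ionisable amine treated as a simple monoprotic base; the function name is illustrative.

```python
def fraction_protonated(pka, ph):
    """Fraction of a monoprotic base present as its conjugate acid (BH+).

    Henderson-Hasselbalch: [B]/[BH+] = 10 ** (pH - pKa).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

# Predicted pKa of compound 19 (6.8) at the assay buffer pH (4.5).
print(f"{fraction_protonated(6.8, 4.5):.3f}")
```

With a pKa more than two units above the buffer pH, well over 99% of the amine is expected to be protonated under this simple model, which is consistent with the choice to dock the protonated form.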
Instead of utilising computationally expensive MD simulations, an alternative approach was investigated using the binding-pose-metadynamics algorithm present in the Schrödinger suite. Several simulations of each binding pose were run to predict which pose is correct. Metadynamics simulations add Gaussian bias potentials that discourage the ligand from remaining in its current pose. Using this technique, incorrect binding poses are displaced, while the correct binding pose remains relatively stable. The approach is quantified in terms of two scoring functions: the persistence score (PersScore) and the pose score (PoseScore). The PersScore is the average persistence of important contacts (for example, hydrogen bonds and π interactions) between the ligand and protein. A higher PersScore equates to a more stable complex. The PoseScore measures the root-mean-square deviation (RMSD) of the ligand at the end of the simulation. As such, lower PoseScores equate to more stable complexes.
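The idea behind a persistence-type score can be sketched as the average fraction of frames in which each tracked contact is present. This is only an illustration of the concept described above: the contact names and data layout are hypothetical, and the Schrödinger implementation may select frames or weight contacts differently.

```python
def contact_persistence(contact_series):
    """Fraction of frames in which one contact is present (list of booleans)."""
    return sum(contact_series) / len(contact_series)

def mean_persistence(contacts):
    """Average persistence over all tracked contacts.

    `contacts` maps a contact label to its per-frame presence, for example
    {"hbond_Asp289": [True, True, False, ...]} (labels are hypothetical).
    """
    return sum(contact_persistence(s) for s in contacts.values()) / len(contacts)

example = {
    "hbond": [True, True, False, True],     # present in 3 of 4 frames
    "pi_stack": [True, False, False, False],  # present in 1 of 4 frames
}
print(mean_persistence(example))  # 0.5
```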
The simulations showed that pose A is relatively stable, while pose B is significantly displaced during the simulation. The PersScore of pose A shows significantly more stable interactions than for pose B (Figure 5), suggesting again that pose A is the correct binding pose. For the second enantiomer (S,S), it was not possible to differentiate between the two binding poses using binding pose metadynamics, even when the Gaussian height for the simulation was increased to ten times the default (see the ESI for metadynamics results). Subsequently, a long 100 ns molecular dynamics (MD) simulation was performed, as was originally done for 1, 2 and 3 (Figure 6). A clear difference in stability was noted between the two binding poses, where pose A (blue) was significantly more stable than pose B (green), with the RMSD remaining below approximately 2 Å. Thus, for both enantiomers of 19 (S,R and S,S), a clear preference was noted for pose A, adding further support for the predicted binding poses of 1, 2 and 3.
Visual inspection of the phenyl ring of 19 while bound in pose A shows that there is enough space in the surrounding area to accommodate the larger moieties of compounds such as 7, 8, 10, 11, 17 and 20, which are noted to be active at 10 μM. In the case of the other two enantiomers (R,R and R,S), in which the protonated nitrogen is inverted, both poses showed significant displacement within the binding pocket during a 100 ns MD simulation (see the ESI for binding poses and MD results). This suggests that only the enantiomers shown in Figure 5 and Figure 6 cause inhibition of BACE1.

Figure 6. Overlaid docking poses of the two suggested binding poses of an enantiomer of 20 (S,S). Pose A (blue) has hydrogen bonding to the backbone nitrogen of Thr133 (shown in yellow), along with ionic interactions with Asp289 and Asp93. A π-π edge-to-face interaction is also present with Tyr259. Pose B (green) has a π-π interaction with Phe169, hydrogen bonding to Asp93, and ionic interactions with Asp93 and Asp289. For clarity, selected residues of the binding pocket have been hidden. A plot of the ligand RMSD during the course of the 100 ns MD simulation is shown for pose A and pose B in blue and green, respectively. The ligand RMSD is indicative of how stable the ligand is with respect to the protein's backbone.

Interaction frequency analysis.
With the correct binding pose further confirmed, the molecular dynamics (MD) simulations for the enantiomers of 19 (S,R and S,S) were analysed to better understand the differences in the interactions that occur throughout the simulation. Figure 7 shows the contacts that were noted between the ligands (in pose A) and the residues of the binding pocket. The enantiomers shared notable similarities, such as the hydrophobic interactions with Tyr132, Phe169 and Tyr259. The first enantiomer (S,R), in which the hydroxyl is in the R configuration, has ionic interactions with Asp93 and Asp289, along with an additional hydrogen bond to Asp289 from the hydroxyl hydrogen. Less frequent interactions include water bridges and hydrogen bonds with Thr133 and Gln134. For the second enantiomer (S,S), where the hydroxyl is in the S configuration, ionic interactions with Asp93 and Asp289 were also noted, albeit less frequently, especially in the case of the latter interaction. While the water bridges and hydrogen bonds with Thr133 and Gln134 were again noted, the hydrogen bonding frequency for Thr133 was higher than for the first enantiomer.
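The normalised contact frequencies discussed above can be sketched as the number of contacts a residue makes, summed over all frames and divided by the number of frames. The input layout below (one list of residue labels per frame, one entry per contact) is a hypothetical format chosen for illustration, not the output of the analysis tool used in this study.

```python
from collections import Counter

def interaction_fractions(frames):
    """Per-residue contact count divided by the number of frames.

    A residue listed twice in one frame makes two simultaneous contacts with
    the ligand, so its fraction can exceed 1.0.
    """
    counts = Counter(residue for frame in frames for residue in frame)
    n_frames = len(frames)
    return {residue: count / n_frames for residue, count in counts.items()}

frames = [
    ["Asp289", "Asp289"],
    ["Asp289", "Asp93"],
    ["Asp289"],
    ["Asp289", "Asp289"],
]
print(interaction_fractions(frames))  # {'Asp289': 1.5, 'Asp93': 0.25}
```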

Supervised Machine Learning BACE1 Prediction
Effort was directed towards utilising machine learning (ML) as a predictive tool for estimating the potency of BACE1 inhibitors. The docking tool QVina2 performed well in redocking the BACE1 inhibitors from the comparative assessment of scoring functions (CASF) database, with 86% of the poses having an RMSD < 2 Å. The square of the Pearson product-moment correlation coefficient (R2) between the QVina2 score and experimentally determined inhibition constants was 0.531; the root-mean-squared error (RMSE) was 1.938 (Figure 8A). The random-tree forest (RTF) model trained on the enhanced directory of useful decoys (DUD-E) yielded an area under the accumulation curve of 0.99, with an enrichment factor at 1% of 55 [see ESI for receiver operating characteristic (ROC) curve]. Applying the DUD-E-trained RTF model to the docked poses of compounds with known Ki values, an R2 and RMSE of 0.876 and 0.737, respectively, were achieved (Figure 8B).
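The two agreement metrics quoted above can be computed as in the following sketch, which uses only the standard definitions of the squared Pearson correlation coefficient and the RMSE. The function names are illustrative, and no data from this study are reproduced.

```python
import math

def pearson_r_squared(x, y):
    """Square of the Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def rmse(predicted, observed):
    """Root-mean-squared error between predicted and observed values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
```

Note that R2 measures how well the scores track the experimental values linearly, while the RMSE measures the absolute size of the errors, so a model can improve on one metric without improving on the other.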
Poses A and B for compounds 1, 19, 2 and 3 were analysed to compare the predicted Ki values. The Ki values predicted by the DUD-E-trained RTF model (Min = Ki + RMSE and Max = Ki - RMSE) for the best pose of compounds 1, 19, 2 and 3 were 312.9 nM (Min-Max: 90.1-1086.8 nM), 603.20 nM (Min-Max: 173.67-2095.05 nM), 95.6 nM (Min-Max: 27.5-331.9 nM), and 405.4 nM (Min-Max: 116.7-1408.1 nM), respectively. The predicted Ki for compound 1 corresponded to the reported IC50; however, for compounds 2 and 3 the predicted Ki values were roughly an order of magnitude lower than the reported IC50 values, suggesting that the DUD-E-trained RTF model overestimates the binding affinity of these compounds. The newly synthesized compound 19 had a predicted Ki value of < 2. The Ki was predicted to be more than the IC50 values of compounds 1, 2 and 3, suggesting that it is a less potent compound in comparison. Due to the large range in the confidence interval, preference for either pose could not be identified using the developed RTF model.
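The reported Min-Max ranges are symmetric on a logarithmic scale, which is what would be expected if the RMSE were added to and subtracted from the predicted free energy before converting back to Ki. The sketch below works under that reading, assuming the RMSE of 0.737 is in kcal/mol and a temperature of 298.15 K; neither assumption is stated explicitly in the text, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def ki_interval(ki_nm, rmse_kcal, temperature_k=298.15):
    """Lower and upper Ki bounds (nM) for a free-energy error of +/- rmse_kcal.

    Assumes dG = RT ln(Ki), so shifting dG by the RMSE scales Ki by
    exp(RMSE / RT). Units and temperature are assumptions.
    """
    factor = math.exp(rmse_kcal / (R_KCAL * temperature_k))
    return ki_nm / factor, ki_nm * factor

low, high = ki_interval(312.9, 0.737)
print(f"{low:.1f} - {high:.1f} nM")
```

Under these assumptions the bounds for compound 1 come out within about 1% of the reported 90.1-1086.8 nM range, which supports the free-energy interpretation of the interval.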

Pharmacokinetics, ADME parameters and drug-like nature
The library of compounds was further assessed in terms of physicochemical descriptors, predictive absorption, distribution, metabolism and excretion (ADME) parameters, and pharmacokinetic properties using the Swiss Institute of Bioinformatics SwissADME web tool. 29 All compounds showed good drug-likeness in terms of the Lipinski guidelines, and no pan-assay interference structures (PAINS) were noted. 30 Analysis of the boiled-egg plot shows that all ligands synthesized, with the exception of 7, 10 and 13, are predicted to be blood-brain barrier (BBB) permeants (yellow area). Notably, previously reported 1, 2 and 3 all fall outside of the upper WlogP limit of ~6 required for efficient BBB permeation (Figure 9). All the ligands prepared were predicted to have high gastrointestinal absorption (yellow and white areas), and all were predicted to be P-glycoprotein substrates. The physicochemical descriptors for the three most active compounds, 10, 19 and 20, and previously reported compounds, 1, 2 and 3, are summarised in Table 2.
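A drug-likeness screen of the kind mentioned above can be sketched with the classical Lipinski thresholds (molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors and at most 10 acceptors, with one violation tolerated). The descriptor values in the example are hypothetical and are not taken from Table 2, and the web tool used in this study may apply a different logP estimate and cut-off.

```python
def lipinski_violations(mw, logp, hbd, hba):
    """Number of violated classical rule-of-five criteria."""
    return sum([mw > 500, logp > 5, hbd > 5, hba > 10])

def passes_lipinski(mw, logp, hbd, hba, max_violations=1):
    """True when no more than `max_violations` criteria are violated."""
    return lipinski_violations(mw, logp, hbd, hba) <= max_violations

# Hypothetical descriptor values, for illustration only.
print(passes_lipinski(mw=342.4, logp=4.1, hbd=1, hba=2))  # True
print(passes_lipinski(mw=620.0, logp=6.2, hbd=1, hba=3))  # False
```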

Conclusions
Computational studies were performed to identify the correct binding pose of a series of carbazole-based BACE1 inhibitors as there was no clear consensus on the correct pose based on previously reported studies.
The validated binding pose, whereby the carbazole is surrounded by Pro131, Ile187, Arg189, Tyr259 and Thr390, showed room for elongation of the carbazole moiety. This led to the design and synthesis of seventeen novel 1-amino-3-(indeno[1,2-b]indol-5(10H)-yl)propan-2-ol derivatives as possible BACE1 inhibitors. The compounds exhibited moderate to low cytotoxicity against the SH-SY5Y neuroblastoma cell line. In future research, the tetracyclic ring system will be replaced with different substituted ring systems in an attempt to further reduce the cytotoxicity while maintaining the current inhibition of BACE1.
MD simulations suggested that only the S,R and S,S enantiomers act as inhibitors, and in silico ADME predictions suggest that compounds 19 and 20 are potentially more attractive lead compounds than compounds 1, 10, 2 and 3, as their lower WlogP values (< 6) place them in the optimal range for BBB permeants.

Experimental Section
General. All solvents, chemicals, and reagents were obtained commercially and used without further purification. 1H NMR (300 MHz) and 13C NMR (75 MHz) spectra were recorded on a Bruker AVANCE-III-300 instrument using CDCl3. The CDCl3 contained tetramethylsilane as an internal standard. Chemical shifts, δ, are reported in parts per million (ppm), and splitting patterns are given as singlet (s), doublet (d), triplet (t), quartet (q), doublet of doublets (dd), triplet of doublets (td) or multiplet (m). Coupling constants, J, are expressed in hertz (Hz). Mass spectra were recorded in ESI mode on a Waters Synapt G2 Mass Spectrometer at 70 eV and 200 mA. Samples were dissolved in acetonitrile (containing 0.1% formic acid) to an approximate concentration of 10 μg/mL. Infrared spectra were obtained using a Bruker ALPHA Platinum ATR spectrometer. The absorptions are reported on the wavenumber (cm⁻¹) scale, in the range 400-4000 cm⁻¹. Melting points were measured on a Stuart Melting Point SMP10 microscope and are uncorrected. The retention factor (Rf) values quoted are for thin layer chromatography (TLC) on aluminium-backed Macherey-Nagel ALUGRAM Sil G/UV254 plates pre-coated with 0.25 mm silica gel 60. Spots were visualised using UV light and basic KMnO4 spray reagent. Chromatographic separations were performed on Macherey-Nagel Silica gel 60 (particle size 0.063-0.200 mm). Yields refer to isolated pure products unless stated otherwise.

5,10-Dihydroindeno[1,2-b]indole (23).
A mixture of 1-indanone 22 (10.00 g, 75.68 mmol, 1 eq.), phenylhydrazine hydrochloride (13.14 g, 90.87 mmol, 1.2 eq.) and Amberlyst-15 (37.88 g, 0.5 g/mmol SM) was refluxed in absolute ethanol (250 mL) for 12 h. The reaction was monitored by TLC, and, upon completion, the mixture was cooled to room temperature, the catalyst filtered off, and the product washed thoroughly with ethyl acetate (100 mL). The organic filtrate was collected, dried (Na2SO4), filtered and the solvent was removed in vacuo to afford  5 mmol, 1 eq.) was dissolved in dry tetrahydrofuran (100 mL), followed by the addition of (±)-epichlorohydrin (21.89 mL, 279.9 mmol, 7 eq.). Potassium hydroxide (5.47 g, 97.5 mmol, 2.5 eq.) was then added slowly and the reaction mixture was heated to 85 °C for 10 h. The turbid reaction mixture was filtered to remove the salts, and the salt mass was rinsed with acetone (20 mL). The solvent was then removed in vacuo, and the solid obtained was dissolved in dichloromethane (100 mL) and washed with distilled water (100 mL). The organic layer was then dried (Na2SO4), filtered and the solvent was removed in vacuo to afford the crude product. The product was triturated from methanol, filtered off and washed with cold methanol (15 mL) to afford the product rac-5-(oxiran-2-ylmethyl)-5,10-dihydroindeno[1,2-b]

Cytotoxicity screening. Cytotoxicity was assessed as cell density using the sulforhodamine B (SRB) staining assay on SH-SY5Y neuroblastoma cells as described by Vichai and Kirtikara with minor modifications. 28 The SH-SY5Y cell line was cultured in DMEM/Ham's F12 nutrient medium (1:1) supplemented with 10% foetal calf serum (FCS) in 75 mL flasks at 37 °C and 5% CO2 in a humidified incubator. Confluent cells were washed with phosphate buffered saline and harvested using TrypLE™ Express to detach the cells. Detached cells were centrifuged (200 × g, 5 min), counted using the trypan blue exclusion assay (0.1%), and diluted to 1 × 10^5 cells/mL in 10% FCS-fortified medium. Cell suspension (100 µL) was seeded into sterile, clear 96-well plates, and incubated overnight to allow for attachment. Blank wells contained 200 µL FCS (5%)-fortified media without cells to account for background interference and sterility. Attached cells were exposed to 100 µL medium (negative control), compounds 4-20 (0.01-100 µM) or saponin (1%; positive control) prepared in FCS-negative medium for 72 h. Cells were fixed using 50 µL trichloroacetic acid (50%) overnight at 4 °C. Fixed cells were washed three times with tap water and stained using 100 µL SRB solution (0.057% in 1% acetic acid) for 30 min. Stained cells were washed four times with 100 µL acetic acid (1%) and air-dried. The bound dye was eluted using 200 µL Tris-buffer (10 mM, pH 10.5) and the absorbance measured at 510 nm (reference 630 nm) using a Synergy 2 plate reader (Bio-Tek Instruments, Inc.). All values were adjusted by subtracting the blank. Cell density is expressed relative to the negative control as a percentage.
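The blank correction and normalisation described for the SRB assay can be sketched as follows. The absorbance values in the example are hypothetical, and the function name is an illustrative choice.

```python
def percent_cell_density(sample_abs, control_abs, blank_abs):
    """Cell density of a treated well as a percentage of the negative control.

    All absorbances are first corrected by subtracting the blank (medium
    without cells), as described in the assay procedure.
    """
    corrected_control = control_abs - blank_abs
    if corrected_control <= 0:
        raise ValueError("negative control must exceed the blank")
    return 100.0 * (sample_abs - blank_abs) / corrected_control

# Hypothetical absorbances at 510 nm: treated well, negative control, blank.
print(round(percent_cell_density(0.55, 1.05, 0.05), 1))  # 50.0
```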
Statistics. Assays were performed with three intra-assay as well as three inter-assay replicates. Statistical analyses were performed using GraphPad Prism 5.0. BACE1 percentage inhibition was determined using linear regression analysis.
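The text states only that percentage inhibition was obtained by linear regression. One common way to do this for a fluorescence-based enzyme assay, assumed here purely for illustration, is to fit the fluorescence against time for the uninhibited and inhibited reactions and compare the slopes. The data in the example are hypothetical.

```python
def slope(x, y):
    """Least-squares slope of y against x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def percent_inhibition(time, fluorescence_inhibitor, fluorescence_control):
    """Inhibition (%) from the ratio of reaction rates (fitted slopes).

    Assumption: rates are proportional to the slope of fluorescence vs time.
    """
    rate_inhibitor = slope(time, fluorescence_inhibitor)
    rate_control = slope(time, fluorescence_control)
    return 100.0 * (1.0 - rate_inhibitor / rate_control)

# Hypothetical kinetic reads: control rate 10 units/min, inhibited rate 1 unit/min.
t = [0, 1, 2, 3]
print(round(percent_inhibition(t, [5, 6, 7, 8], [0, 10, 20, 30]), 1))  # 90.0
```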
Molecular modelling

pKa predictions. Ligand structures were prepared using LigPrep. 32 The neutral enantiomers of 1, 19, 2 and 3 were utilised for the QM-based pKa predictions using Jaguar, which is present in the Schrödinger suite. 33 A conformational search was performed using default settings, which entails the use of an automatic search function to identify pKa atoms and run a pKa calculation on each compound.

Molecular docking. All protein structures were prepared using the Protein Preparation Wizard from Schrödinger, where protonation states were assigned and energy minimisation was performed to relieve unfavourable constraints. Molecular docking was done using Glide extra precision (XP), 34 and all ligand structures were treated as flexible to obtain 10 poses for each ligand.

Induced fit docking. Using the Induced Fit Docking (IFD) protocol present in the Schrödinger suite, 35 the protonated enantiomers of 19 were docked into the prepared crystal structures that produced the best docking scores from the previous docking (PDB: 3CIC and 1W51). The default settings were used, with the exception that Glide XP was used for the redocking stage.

Binding pose metadynamics. Complexes from the IFD protocol were selected for binding pose metadynamics, an algorithm present in the Schrödinger suite. 36 The complexes were run using the default settings as well as at an increased Gaussian height of 0.5 kcal/mol, which is ten times larger than the default height of the Gaussian.

Molecular dynamics simulations and analysis. Complexes for poses A and B of all four enantiomers from the IFD procedure were submitted for MD simulations using the Desmond package from the Schrödinger suite and the OPLS3e force field. 37 Solvent molecules within the binding pocket from the IFD were kept, and the complex was solvated using the TIP3P water model in an orthorhombic box with boundaries 10 Å away from the complex, giving a system of approximately 45000 atoms. The MD simulations were run for 100 ns under NPT conditions using the Berendsen thermostat (310 K, 1.013 bar) and particle mesh Ewald (PME) electrostatics with a cut-off of 9 Å. Frames were extracted every 250 ps. Analysis of the MD simulations was done using the simulation interaction diagram tool within the Schrödinger suite. 32 The ligand RMSD relative to the protein backbone was analysed using the simulation event analysis tool.

Machine learning. The PDBbind v2018 general set contains 16 126 protein-ligand complexes with experimentally measured binding affinities (Ki, Kd and IC50). 38 Complexes with covalently bound ligands and peptide ligands were removed to give a final set of 11781 protein-ligand complexes. This set contained 315 BACE1 protein-ligand complexes with measured binding affinities. Haupt et al. (2015) suggested that, for the cytochrome P450 enzymes, values of Ki can be reliably estimated from values of IC50/2. 39 In addition, research led by Kalliokoski found that a Ki-IC50 conversion factor of 2 for the ChEMBL database is a reasonable assumption. 40 For this study, Kd, IC50/2 and Ki values were assumed equal in order to train a random tree forest regression model to predict Ki values of docked poses. Ki values were transformed to Gibbs free energy estimates. The Smina package was used to minimize the ligands with the Vina scoring function, following which 43 parameters of the bound ligands, including the Vina docking score, 9 steric, 3 hydrophobic, 2 nonhydrophobic, 10 atom-type Gaussian, 4 non-direct hydrogen bond terms, 2 acceptor-acceptor, 2 donor-donor, 2 repulsion, 2 and 4 solvation, 1 electrostatic and 5 ligand descriptors, were extracted. 40 Additionally, Babel was used to extract 9 ligand descriptors, namely topological polar surface area (TPSA), octanol/water partition coefficient (logP), molar refractivity (MR), molecular weight (MW), number of aromatic bonds (abonds), number of double bonds (dbonds), number of hydrogen bond acceptors 1 and 2 (HBA1 and HBA2) and number of hydrogen bond donors (HBD), for a total of 52 parameters used to build a random tree forest (RTF) predictive model trained against the Gibbs free energy estimates of the ligands. The Ranger package in R was used to build RTF models and the Caret package was used for cross-validation of the models. 41,42 An initial RTF model was built from the 11781 CASF protein-ligand complexes by using M = 2500 regression trees with mtry = 33 and a 10-fold cross validation. The active and decoy BACE1 ligands from the DUD-E database were docked with QuickVina2 (QVina2) into 5 selected BACE1 receptors (PDB: 2VKM, 3CIC, 3I25, 3LPK, 4H3F) and all the docked poses were scored using the initial RTF model. 43 The best-scored active ligands from the initial RTF model were calibrated with their measured binding affinities transformed to their Gibbs free energy estimates, while the scores from the decoys were calibrated so that the best-scoring decoy had a minimum binding affinity of 10 µM. The parameters from these ligands were included in the final BACE1 RTF model, and this model was used to predict the binding affinities of the synthesized ligands in this study. The predicted interval range was calculated as Min = Ki + RMSE and Max = Ki - RMSE.
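The harmonisation of affinity types and the conversion to a free-energy training target described in the machine learning procedure can be sketched as below. The relation ΔG = RT ln(Ki), the temperature of 298.15 K and the kcal/mol unit are standard choices assumed here for illustration, since the equation itself is not reproduced in this text; the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def to_ki_nm(value_nm, kind):
    """Map a measured affinity (nM) onto a Ki-equivalent value.

    Follows the stated approximation that Kd, IC50/2 and Ki are treated as equal.
    """
    kind = kind.lower()
    if kind in ("ki", "kd"):
        return value_nm
    if kind == "ic50":
        return value_nm / 2.0
    raise ValueError(f"unknown affinity type: {kind}")

def gibbs_free_energy(ki_nm, temperature_k=298.15):
    """Binding free energy estimate in kcal/mol, assuming dG = RT ln(Ki in mol/L)."""
    return R_KCAL * temperature_k * math.log(ki_nm * 1e-9)

# An IC50 of 1000 nM is treated as a Ki of 500 nM before conversion.
print(round(gibbs_free_energy(to_ki_nm(1000.0, "IC50")), 2))
```

Training on free energies rather than raw Ki values puts the regression target on a logarithmic scale, so that errors for nanomolar and micromolar binders are weighted comparably.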

Figure 2. Overlaid images of the two selected binding poses for a) 1, b) 2 and c) 3. Pose A and pose B are depicted on the left in blue and green, respectively. For clarity, selected residues of the binding pocket have been hidden. A plot of the ligand RMSD with respect to the protein's backbone during the course of the 100 ns MD simulation is shown on the right for pose A (blue) and pose B (green).

Figure 3. Validated binding pose of 1 with BACE1. The carbazole moiety is noted to have sufficient space for elongation while avoiding steric hindrance. The dashed yellow, cyan and purple lines show the hydrogen bonds, π-π stacking and halogen bonds between the ligand and surrounding residues. For clarity, selected residues of the binding pocket have been hidden.

Figure 5. A) Overlaid docking poses of the two suggested binding poses of the (S,R) enantiomer of 20 from the induced fit docking. Pose A (blue) contains a hydrogen bond to Asp289 (shown in yellow), along with ionic interactions with Asp289 and Asp93 from the protonated tertiary nitrogen (shown in pink). Pose B (green) has the same ionic interactions with Asp289 and Asp93 as pose A, while π-π edge-to-face stacking is noted between Tyr132 and the tetracyclic ring. For clarity, selected residues of the binding pocket have been hidden. B) A plot of the ligand RMSD, averaged over the course of seven metadynamics simulations, with each simulation being 10 ns.

Figure 7. Protein-ligand contacts noted during a 100 ns MD simulation at 310 K for two different enantiomers of 19 (S,R and S,S) bound to BACE1 in the confirmed binding pose. The contacts are categorised into four types: hydrogen bonds (green), hydrophobic (purple), ionic (red) and water bridges (blue). The frequency of the contacts is shown where the ligands make contact with residues in the binding pocket. The bar chart is normalised, with values over 1.0 representing protein residues that make multiple contacts with the ligand.

Figure 8. Plot of (A) QVina2 score and (B) DUD-E-trained RTF model against experimentally determined inhibition constants (Ki) of known BACE1 inhibitors.

Table 1. In vitro BACE1 inhibitory activity and cytotoxicity results of compounds

Table 1. Continued. a Data are the mean ± SEM of three independent experiments; b No observable inhibition detected; c Not determined due to precipitation occurring.