3. Force Fields

We can often obtain the 3D coordinates of molecules from online databases for use in our protein-ligand docking calculations. However, unless we are simply wanting to perform a virtual screen of known compounds we will at the very least want to modify an existing structure to suit our requirements.

Thus, we enter the realm of molecular modelling where we use computational methods to create the molecules that we wish to study. Of course, we need to know that the geometry of the molecule is correct and we can do this using energy minimisation techniques.

To obtain the best results for the broadest range of molecules we would ideally use quantum chemical methods and solve the electronic Schrödinger equation for a given configuration of atomic nuclei. However, in practice this is far too costly and generally speaking yields a level of accuracy that is much higher than we actually need for the purposes of docking.

In order to side-step this problem we use molecular mechanics which replaces the quantum mechanical problem with a simple classical potential that acts on the atoms in our molecule. The sum of all the potential terms describing the interactions in a molecule (bonding, electrostatic, vdW, …) is referred to as a force field.

The main idea behind the construction of such a force field is that molecules tend to be made up of units that are structurally similar in different molecules. Whilst there are obviously limitations to this idea it is generally accurate for molecules of similar types, e.g. C-H or C=O bond lengths or the angles surrounding them are rarely that different. This is reasonable since it is one of the main concepts underpinning organic chemistry (i.e. the ‘functional group’).

Chemists have long used physical space-filling or ball and stick models of molecules in order to try to understand their geometry and properties. Molecular mechanics can be thought of as a computational generalisation of these methods. It has the added advantage that, as mentioned above, it allows us to use automated energy minimisation to find the lowest energy geometry of a molecule (and therefore the one that will be thermodynamically favoured).

3.1. Atom types

Molecular mechanics force-fields depend on pairwise interactions between the atoms in a molecule. Treating each atom uniquely would be prohibitively expensive, defeating the aim of creating a computational method that is fast enough to be useful.

Generalised atom ‘types’ are therefore created corresponding to e.g. an \(\text{sp}^3\) carbon. A balance must be sought between over-generalisation and accuracy so whilst multiple atom types for a single element usually exist in any given force-field this number is kept to a minimum where possible. A selection of atom types from one of the early general-purpose force fields (MM2) are shown below

Type

Symbol

Description

1

C

\(\text{sp}^3\)-carbon

2

C

\(\text{sp}^2\)-carbon, alkene

3

C

\(\text{sp}^2\)-carbon, carbonyl, imine

4

C

\(\text{sp}\)-carbon

30

C+

carbocation

8

N

\(\text{sp}^3\)-nitrogen

39

N+

\(\text{sp}^3\)-nitrogen, ammonium

5

H

hydrogen, except on N or O

21

H

alcohol hydrogen (OH)

24

H

carboxyl hydrogen (COOH)

As you can see, the chemical symbols do not carry the information needed by the force field. Instead, unique numbers are given to the different atom types allowing the software to generate the correct interactions between the atoms in a molecule.

This is important to understand because only pre-specified atom types can be handled by any given force field. Fortunately, the force fields you will encounter in this course are quite general and can treat most kinds of molecule that you will be interested in.

Note

If you desperately want to work with some sort of unusual compound containing e.g. atoms in uncommon oxidation states or exotic metals you may be out of luck. In this case you can either develop force field parameters for your system or resort to the more expensive electronic structure methods mentioned earlier.

3.2. The force field energy

In general terms (details of different force fields may differ slightly) the total force field energy \(E_{\text{FF}}\) is obtained form a sum over various terms:

\[E_{\text{FF}} = E_{\text{str}} + E_{\text{bend}} + E_{\text{tors}} + E_{\text{vdW}} + E_{\text{el}} + E_{\text{cross}}\]
  • \(E_{\text{str}}\) – the energy arising from stretching of a bond between two atoms

  • \(E_{\text{bend}}\) – the energy of distortion of bond angles

  • \(E_{\text{tors}}\) – the energy associated with rotation about bond dihedral angles (torsions)

  • \(E_{\text{vdW}}\) – the van der Waals energy of interaction between two atoms

  • \(E_{\text{el}}\) – the energy of interaction of atomic partial charges

  • \(E_{\text{cross}}\) – a combined term taking into account coupling between \(E_{\text{str}}\), \(E_{\text{bend}}\) and \(E_{\text{tors}}\)

With this function in hand we can calculate energies and geometries for a given nuclear configuration. This means that we can compare geometries and calculate relative energies for each one.

In addition, we can use the gradient of the energy with respect to changes in the nuclear coordinates to obtain the minimum energy structure which will correspond most closely to the actual structure of the molecule.

3.3. Parameterisation

The explicit forms of the different terms in the energy equation will likely not be the same if you look at different force fields but they all depend on parameters which are usually fit to experimental data but may also be fit to high level ab-initio electronic structure calculations or, more recently, even developed from machine-learning models which can take into account huge amounts of reference data.

It is important to understand how a given force field was parameterised so that you can evaluate whether it is likely to be reliable for the problem at hand. The large majority of force fields were parameterised with biomolecules in mind and so are appropriate for e.g. proteins. Even if such a force field contains parameters for the bonds, angles, etc., in a small molecule it is unlikely that accurate results could be obtained with it (usually missing parameters are the main issue in trying to use these force fields for small molecules).

If we take the bond stretching energy \(E_{\text{str}}\) as an example we can look at the form we would expect this to take for two atoms \(A\) and \(B\):

\[E_{\text{str}}(R^{AB} - R_0^{AB}) = k^{AB}(R^{AB} - R_0^{AB})^2\]

Here, \(k^{AB}\) if the bond force constant, which penalises changes in the bond length away from the equilibrium length \(R_0^{AB}\). Both of these are fitted parameters and are not transferable to other situations where either the elements change or the nature of the bond is different. For example, \(k^{CO}\) and \(R_0^{CO}\) would both be completely different if the bond between carbon and oxygen was in a carbonyl or an alcohol group.

Fortunately, several ‘general’ force fields exist that are parameterised for the kind of bonding situations found in small molecules such as drugs and as such can be useful in the context of ligand preparation for subsequent protein-ligand docking calculations. Some of these force fields are shown below:

  • Universal Force Field (UFF)

This is perhaps the most general force field and has a very simple form that allows it to treat most elements in the periodic table. It will not give particularly accurate geometries but can be useful for an initial ‘rough’ optimisation which can then be refined with a more accurate method.
  • General AMBER Force Field (GAFF)

This force field gives good results for small molecules and is intended for producing a force field description that is compatible with the AMBER biomolecular force fields for performing molecular dynamics simulations.
  • CHARMM General Force Field (CGenFF)

In a similar spirit to GAFF, this small molecule force field was generated for compatibility with the CHARMM biomolecular force fields and is aimed at molecular dynamics simulations.
  • Merck Molecular Force Field (MMFF94)

This force field was developed by the Merck company (originally in 1994, hence ‘MMFF94’) to model small drug-like molecules. A variant, MMFF94s, has slightly altered parameters that reproduce the planarity of e.g. amines that is seen in crystal structures (‘s’ stands for ‘solid’).

Whilst other force fields do exist, perhaps the most commonly used of these for structure building and refinement of drug-like molecules is the MMFF94(s) force field as it is well parameterised for these systems and gives good geometries.

This is an important point to bear in mind when preparing ligands for docking since (generally speaking) docking software usually does not perform refinement of bond lengths and angles and instead focusses on rotating torsions in order to fit the ligand to the putative binding site.

Warning

If you provide a ligand to your docking software that has a significantly distorted geometry (e.g. a non-planar aromatic ring or unusually long or short bonds) it will be docked ‘as is’ and will lead to unreliable results. For this reason it is good practice to inspect the ligand structure and/or ensure that it has been subjected to a geometry optimisation before use.