Protein titration state calculations

Nathan Baker

Washington University in St. Louis
Department of Biochemistry and Molecular Biophysics
Center for Computational Biology

David Sept

Washington University in St. Louis
Department of Biomedical Engineering
Center for Computational Biology

Table of Contents

1. Overview
2. Introduction
2.1. Amino acid model pKa values
2.2. pKa values in proteins
2.3. Continuum electrostatics methods for pKa calculations in proteins
3. Application to lysozyme
3.1. Background
3.2. Overview and disclaimer
3.3. Preparing the PDB file
3.4. Setting up the total electrostatic energy calculations
3.5. Setting up the transfer free energy calculations
3.6. Putting it all together
3.7. What's next?

List of Figures

2.1. pKa perturbation free energy cycle schematic
3.1. Active site of HEWL

List of Tables

2.1. Model amino acid pKa values for common titratable groups; data from Nielsen et al (see footnote)

List of Equations

2.1. Acid dissociation free energy
2.2. Transfer free energy

Chapter 1. Overview

This tutorial assumes you've completed the APBS tutorial and are comfortable with PDB2PQR and APBS calculations for those applications. In order to perform the provided examples, you will need APBS 0.5.1, access to PDB2PQR (web or command line), and a molecular graphics program such as VMD or PyMOL.

Why pKa calculations?  Although pKa calculations may not seem like the most routine applications for demonstrating continuum electrostatics concepts, they have important scientific and pedagogic value. From a scientific standpoint, pKa values are important determinants of biomolecular (particularly enzymatic) function and can be used to assess functional activity and identify active sites. From a pedagogical perspective, pKa calculations require all of the important continuum electrostatics concepts and therefore serve as a "self-contained" introduction to solvation and binding energies.

[Note]Note

This tutorial covers Poisson-Boltzmann methods for determining biomolecule pKa values. Other methods for pKa and titration state determination are given in the PDB2PQR examples. Comparison with PDB2PQR results can provide hours of additional entertainment.

Chapter 2. Introduction

This is a very brief introduction to the concepts behind biomolecular pKas and titration states. More information can be obtained from most biochemistry or biophysics textbooks as well as some of the original articles on pKa evaluation [1].

Recall that the acid dissociation constant Ka describes the dissocation of an acid into its components:

by way of the activities

Under “conditions of ideality,[2] activities can be replaced with concentrations to give

You should also recall that chemical equilibrium constants can be related to free energies by

However, chemists found it easier to use base-10 logarithms than natural logarithms for measuring pH, therefore the pKa is defined as

2.1. Amino acid model pKa values

In many calculations, pKa values are assigned based on model values for amino acid side chains to mimic

for an isolated amino acid in solution. Some model pKa values are given in Table 2.1, “Model amino acid pKa values for common titratable groups; data from Nielsen et al (see footnote) ”.

Table 2.1. Model amino acid pKa values for common titratable groups; data from Nielsen et al (see footnote)

Amino acidModel pKa
Arginine13.0
Aspartic acid4.0
Cysteine8.7
C-terminus3.8
Glutamic acid4.4
Histidine6.3
Lysine10.4
N-terminus8.0
Tyrosine9.6


As we'll see in the next section, these model values provide the basis for calculating pKa values in proteins.

2.2.  pKa values in proteins

The role of the model pKa values introduced in the previous sections is to move all the chemical (bond making and breaking) complexity of protonation into the model values. In particular, pKa values in proteins are calculated as pertubations of the model compounds according to the free energy cycles shown below.

The pKa for the amino acid in the context of the protein is given by the free energy cycle:

We are interested in determining the unknown ΔaGHA from the known model ΔaGHA,model and the unknown ΔxferGHA and ΔxferGA- according to:

Equation 2.1. Acid dissociation free energy


In general, the quantities ΔxferGHA and ΔxferGA- are obainted from a computational approach. Nearly any free energy calculation method could be used to obtain these energies according to a scheme where the (de)solvation energies of the charged and uncharged amino acids are calculated according to:

Figure 2.1. pKa perturbation free energy cycle schematic

pKa perturbation free energy cycle schematic


Note that these energies have all assumed the same background state of the protein for the calculations; in other words, the same states are used for other titratable groups in the protein for charged and uncharged amino acid states. We'll discuss the implications of this assumption a bit later in this tutorial.

2.3. Continuum electrostatics methods for pKa calculations in proteins

Although nearly any free energy method could be used to evaluate the energies of transferring the protonated and unprotonated amino acids from solution into the protein environment, continuum electrostatics offer a (usually) satisfying compromise between accuracy and computational efficiency.

The transfer free energies to be calculated, ΔxferGHA and ΔxferGA-, can be determined from Poisson-Boltzmann (PB) energies. In particular, these energies can be calculated as effective “binding energy calculations” similar to those covered in the APBS examples and tutorial:

Equation 2.2. Transfer free energy


where

  • Gprotein with charged X is the electrostatic energy of the protein with group X bound and all charges on X set to their normal values.

  • Gprotein with uncharged X is the electrostatic energy of the protein with group X bound and all charges on X set to zero.

  • Gcharged X in solution is the electrostatic energy of group X in solution with all charges set to their normal values.

Note that, as with binding energies, Equation 2.2, “Transfer free energy” can be evaluated two ways:

Ways to evaluate transfer free energies

  • Directly in a PB equation by computing the difference of total electrostatic energies (including self energies) from PB calculations for each of the states. In order for this to work, all conformations/grid positions/charge states must be the same in each PB calculation.

  • Indirectly through PB calculations of solvation free energies and Coulomb's law calculations (in a dielectric εp) of electrostatic interaction energies. For a sufficiently fine grid, this method is much more stable.

Both of these methods can be combined, via free energy cycles, to give the desired ΔxferGX. However, when care is taken to use the same grids and conformations for all calculations, the direct method using total electrostatic energies is usually the most efficient.

Note that none of the methods discussed above have explicitly allowed for changes in titration state of other groups in the protein during protonation/deprotonation of the acid group of interest. Additionally, none of these methods explicitly provide for conformational changes in the protein coupled to protonation/deprotonoation. As such, we cannot claim to be computing true pKa values with this method. Instead, we are calculating so-called intrinsic pKa values.



[1] The following references provide a good introduction to biomolecular pKa calculations:

[2] Such conditions might happen at extremely low ion and coion concentrations in the absence of other interacting species, etc. In other words, for real biological systems, these conditions are rarely encountered.

Chapter 3. Application to lysozyme

3.1. Background

Hen egg white lysozyme (HEWL) is a very popular system for pKa calculations as it has a number of interesting values for its titratable residues. Early pKa work on this enzyme is presented in Tanford C, Roxby R. Interpretation of protein titration curves. Application to lysozyme. Biochemistry. 11 (11), 2192-8, 1972 which also contains the pKa values used in this lab. More recent pKa calculations and a review of some of the methodology can be found in Nielsen JE, Vriend G. Optimizing the hydrogen-bond network in Poisson-Boltzmann equation-based pk(a) calculations. Proteins-Structure Function and Genetics. 43 (4), 403-12, 2001. Finally, the biological relevance of lysozyme is briefly reviewed at Wikipedia.

HEWL has two active site residues GLU 35 and ASP 52 whose titration states determine the catalytic competency of the enzyme:

Figure 3.1. Active site of HEWL

Active site of HEWL


In particular, the enzyme is only active when ASP 52 is ionized (pKa ≈ 1.2) and GLU 35 is neutral (pKa = 6.3).

Additionally, lysozyme has ASP 66 with a pKa of 6.6 and HIS 15 with a pKa of 5.7.

3.2. Overview and disclaimer

In what follows, we will demonstrate the steps for determining the intrinsic pKa of GLU35. However, this actually doesn't work very well (you'll need to decide why!). Therefore, it is important that you also follow the same procedure for HIS15 and ASP66 to get better examples of working continuum solvent pKa calculations.

3.3. Preparing the PDB file

Download the PDB entry 2LZT from the PDB; save it as 2LZT-GLU35.pdb. If you have time, you should also visit the PDBSum analysis page as well for additional information about the structure.

[Warning]Problems with explicit water!

It is very important that you remove all explicit water from the PDB file before proceeding. (Why?)

We're going to need to generate a protonated form of GLU 35 to perform our pKa calculations. We will do this with the PDB2PQR web service. PDB2PQR “cleans up” PDB files for electrostatics calculations by performing a number of operations described on the PDB2PQR homepage.

Unless otherwise directed, PDB2PQR adds hydrogens to residues based on model pKa values. Therefore, we will need to specify the titration state of GLU 35 for our pKa calculation by changing the residue name from GLU to GLH using your favorite text editor. Save the result as 2LZT-GLH35.pdb

We're now ready to run PDB2PQR to generate protonated versions of our PDB files. Use the command line version of PDB2PQR or one of the web servers listed on the PDB2PQR homepage to generate protonoated PQR files for for 2LZT-GLU35.pdb and 2LZT-GLH35.pdb. Name your results 2LZT-GLU35.pqr and 2LZT-GLH35.pqr, respectively. Although it is always important to test sensitivity to various force fields, I'd recommend starting with PARSE.

[Note]Note

You can use PDB2PQR to assign titration states with PROPKA but don't do it for the above steps since we need to set the titration states explicit for our calculations.

Recall that we're also going to need the isolated residue for our electrostatics calculations of intrinsic pKa. Use your favorite text editor to extract the entire GLU 35 or GLH 35 residue from 2LZT-GLU35.pqr and 2LZT-GLH35.pqr. Save the results as GLU35.pqr and GLH35.pqr, respectively.

Finally, we'll also need to perform electrostatics calculations on HEWL with the uncharged sidechains. Use your favorite text editor to zero out the charges in 2LZT-GLU35.pqr and 2LZT-GLH35.pqr to create 2LZT-noGLU35.pqr and 2LZT-noGLH35.pqr. This can be done by setting the second-to-last column in the PQR file to zero; e.g.

ATOM    534  N   GLU    35       6.456  16.408  22.487 -0.5163 1.8240
ATOM    535  CA  GLU    35       6.354  15.205  23.349  0.0397 1.9080
ATOM    536  C   GLU    35       6.889  13.941  22.695  0.5366 1.9080
ATOM    537  O   GLU    35       7.705  13.192  23.277 -0.5819 1.6612
ATOM    538  CB  GLU    35       4.906  14.973  23.796  0.0560 1.9080
ATOM    539  CG  GLU    35       4.476  15.932  24.948  0.0136 1.9080
ATOM    540  CD  GLU    35       5.171  15.736  26.255  0.8054 1.9080
ATOM    541  OE1 GLU    35       5.844  14.786  26.570 -0.8188 1.6612
ATOM    542  OE2 GLU    35       5.038  16.682  27.056 -0.8188 1.6612
ATOM    543  H   GLU    35       5.671  16.799  22.056  0.2936 0.6000
ATOM    544  HA  GLU    35       6.856  15.382  24.215  0.1105 1.3870
ATOM    545  HB2 GLU    35       4.318  15.158  23.042 -0.0173 1.4870
ATOM    546  HB3 GLU    35       4.832  14.066  24.142 -0.0173 1.4870
ATOM    547  HG2 GLU    35       4.654  16.857  24.629 -0.0425 1.4870
ATOM    548  HG3 GLU    35       3.500  15.796  25.084 -0.0425 1.4870
				

might become

ATOM    534  N   GLU    35       6.456  16.408  22.487  0.0000 1.8240
ATOM    535  CA  GLU    35       6.354  15.205  23.349  0.0000 1.9080
ATOM    536  C   GLU    35       6.889  13.941  22.695  0.0000 1.9080
ATOM    537  O   GLU    35       7.705  13.192  23.277  0.0000 1.6612
ATOM    538  CB  GLU    35       4.906  14.973  23.796  0.0000 1.9080
ATOM    539  CG  GLU    35       4.476  15.932  24.948  0.0000 1.9080
ATOM    540  CD  GLU    35       5.171  15.736  26.255  0.0000 1.9080
ATOM    541  OE1 GLU    35       5.844  14.786  26.570  0.0000 1.6612
ATOM    542  OE2 GLU    35       5.038  16.682  27.056  0.0000 1.6612
ATOM    543  H   GLU    35       5.671  16.799  22.056  0.0000 0.6000
ATOM    544  HA  GLU    35       6.856  15.382  24.215  0.0000 1.3870
ATOM    545  HB2 GLU    35       4.318  15.158  23.042  0.0000 1.4870
ATOM    546  HB3 GLU    35       4.832  14.066  24.142  0.0000 1.4870
ATOM    547  HG2 GLU    35       4.654  16.857  24.629  0.0000 1.4870
ATOM    548  HG3 GLU    35       3.500  15.796  25.084  0.0000 1.4870
				

3.4. Setting up the total electrostatic energy calculations

We will be using focusing calculations to calculate the electrostatic potential and free energies for the systems of interest.

[Warning]Warning

In what follows, we are evaluating total electrostatic free energies -- e.g., energies which contain charge self-interaction terms. We will cancel these self-interaction terms in subsequent steps when we calculate solvation or transfer free energies. Therefore, it is very important that you use the same grid parameters (grid centers, dimensions, spacings, etc.) for every calculation.

Here is a template input that we will use for each of the solvation energy calculations (download it here):

read
	mol pqr compound.pqr # This is the compound for which we will calculate solvation energies
	mol pqr ref.pqr      # This is a compound used as a reference for grid centering
end

elec name inhom          
	mg-auto              # Focusing calculations
	dime 129 129 129     # This is a good grid spacing for this system
	cglen 52.0 66.0 79.0 # These are reasonable coarse grid settings for this system (PDB2PQR-recommended)
	fglen 51.0 59.0 67.0 # These are reasonable fine grid settings for this system (PDB2PQR-recommended)
	cgcent mol 2         # Center the grid on the reference molecule
	fgcent mol 2         # Center the grid on the reference molecule
	mol 1
	lpbe
	bcfl sdh
	pdie 20.00
	sdie 78.54
	srfm smol
	sdens 40.0
	chgm spl2
	srad 1.40
	swin 0.30
	temp 298.15
	calcenergy total
	calcforce no
end

# Print the final energy 
print energy inhom end

quit
				

There are a number of aspects to this input file which are worth noting:

  • In general, compound.pqr will change for each calculation but ref.pqr will not. Choose one molecule to be ref.pqr (2LZT-GLU35.pqr is a good choice) and use it in every calculation.

  • We are using a solute dielectric constant (εp) of 20 (see pdie). This is a common choice for pKa since [3] it is thought to implicitly represent internal relaxation and rearrangement of the solute [4].

  • We are using a molecular surface (srfm smol) with a reasonably high density of surface discretization points (sdens 40.0). pKa and other electrostatics results can be very sensitive to surface choice.

You now have enough information to calculate total electrostatic energies for all of the relevant molecules so far: 2LZT-GLU35.pqr, 2LZT-GLH35.pqr, 2LZT-noGLU35.pqr, 2LZT-noGLH35.pqr, GLU35.pqr, and GLH35.pqr. You should be able to construct APBS input files for each of these systems by modifying the template above. Once these input files are constructed, you can run the PB calculation by

$ apbs foo.in | tee foo.out
				

where foo.in is the input file of interest and the output is saved in foo.out.

3.5. Setting up the transfer free energy calculations

Recall from our earlier discussion that the transfer free energies can be evaluated in two ways. It is up to you to decide which method to use; both should give comparable results for a sufficiently fine grid. As mentioned above, the direct subtraction of total electrostatic energies is usually the most stable route, assuming identical grids and solute conformations are used for all calculations.

Any differences in results between these two transfer free energy methods are typically due to a lack of convergence in the calculations and can be resolved by decreasing the grid spacing (e.g., increasing the number of grid points).

If you choose to decompose the transfer free energies into solvation and Coulombic energy changes, then you will need to perform two additional electrostatics calculations beyond the APBS calculations outlined in the previous section:

  • First, you will need to re-run all of your APBS calculations with a homogeneous dielectric εs = εp to calculate solvation energies. This can be accomplished by appropriate modification of the input files from the previous section.

  • Second, you will need to calculate Coulombic interaction energies. The APBS program coulomb will evaluate the Coulomb's law electrostatic energy of a set of charges in a vacuum. The program's usage is

    $ coulomb foo.pqr
    							

    where foo.pqr is the particular PQR file you are interested in. Note that this program can also be run with no arguments to obtain an expanded list of options. Don't forget to scale by εp!

3.6. Putting it all together

At this point, you should have everything you need to calculate the intrinsic pKa of interest. However, if you get stuck, I've included some example files here that might be helpful:

GLU35.pqr, GLH35.pqr, 2LZT-GLH35.pqr, 2LZT-GLU35.pqr, 2LZT-noGLH35.pqr, 2LZT-noGLU35.pqr

PQR files described above

2LZT-noGLH35.in, 2LZT-GLH35.in, GLH35.in

APBS input files needed to calculate the transfer free energy for GLH directly using total electrostatic energies.

2LZT-noGLH35-vac.in, 2LZT-GLH35-vac.in, GLH35-vac.in

Additional APBS input files needed to calculate the transfer free energy for GLH using polar solvation energies.

2LZT-noGLU35.in, 2LZT-GLU35.in, GLU35.in

APBS input files needed to calculate the transfer free energy for GLU directly using total electrostatic energies.

2LZT-noGLU35-vac.in, 2LZT-GLU35-vac.in, GLU35-vac.in

Additional APBS input files needed to calculate the transfer free energy for GLU using polar solvation energies.

run-apbs.sh

Bash script to run all the APBS input files listed above. Assumes the apbs binary is in your PATH.

run-coul.sh

Bash script to run supporting Coulombic energy calculations for solvation energy-based evaluation of the transfer free energies. Assumes the APBS coulomb binary (usually found in apbs/tools/manip) is in your PATH.

process.sh

Bash script to process APBS output and calculate pKa shifts directly using total electrostatic energies. Assumes results were obtained by running run-apbs.sh, linked above.

3.7. What's next?

Based on this brief introduction, you should be in a good position to go back and try to evaluate intrinsic pKas for HIS 15 and ASP 66. How well do your results for those residues agree with experiment? What's different with those residues?

So far, we've only examined intrinsic pKas in this lab and have ignored coupling between titratable groups. Jens Nielsen has developed a very nice software package called pKaTool which allows you to explore coupling between titratable sites and its impact on titration events in a protein system. He has provided a tutorial (PDF) which you can use to explore coupled titration states and use to familiarize yourself with pKaTool.



[3] Start vigorous waving of hands...

[4] Stop vigorous waving of hands...