Computational Study of Protein-Ligand Interactions in Cancer: Literature Review


This literature review is based on the chemistry research project “Computational study of protein-ligand interactions in cancer”.


This literature review focuses on computational chemistry, also known as theoretical chemistry or molecular modelling.1 Computational chemistry has been defined as the branch of chemistry that uses computer simulation to assist in solving chemical problems.2 Its foundation was laid with the development of quantum mechanics in the early part of the twentieth century.1 The field known today as “computational chemistry” is a product of the last thirty-five years1 of technological advancement in digital computing.

An important early achievement was Schrödinger’s wave equation, together with the statement by Dirac[A] that quantum chemistry calculations make it possible to calculate the energy of molecules.4 This approach was very time consuming, as it was developed before modern computers. Computers are now fast enough to perform these calculations for small molecules, but not for entire proteins.

With the computer as the ‘instrument’ of the computational chemist, researchers in the field have taken advantage of technological advances to develop and apply new theoretical methods at an astounding pace.1 The field uses methods from theoretical chemistry, incorporated into efficient computer programs, to calculate the structures and properties of molecules and solids. Computational chemistry is, however, used more frequently to make chemical predictions, such as the prediction of new drug targets or new reactions, which are later investigated experimentally.2 This literature review explores the computational prediction method of docking.


Docking is defined as a computational method that attempts to predict, efficiently, the noncovalent binding of a receptor (such as a protein, carbohydrate, nucleic acid, or lipid[B]) and a small molecule (the ligand). It starts from their unbound structures, from structures obtained from molecular dynamics simulations[C], or from homology modelling.6 Put simply, molecular docking attempts to predict the structure of the intermolecular complex formed between two or more molecules7: a receptor and a ligand8 or, in this review, a protein and a ligand.

Protein–ligand docking aims to predict and rank the structures arising from the association between a given ligand and a target protein of known 3D structure.7,8 Protein–ligand docking occupies a distinct place in the field of docking because of its applications in medicine.7

Since its development in the 1980s7, docking has remained a field of vital research because of its ability to screen virtual libraries9 of drug-like molecules to obtain leads for further drug development6 and drug design8. It is also a primary component of drug discovery programs7,8 and of protein-function prediction.9

The first stage of docking is pose generation: the prediction of the position, orientation, and conformation of a molecule as docked to the target’s binding site[D]. The second stage, scoring, usually consists of estimating how strongly the docked pose of a ligand binds to the target (the strength is expressed as a binding affinity or a free energy of binding).9 The binding energy is predicted by evaluating the most important physical-chemical phenomena involved in ligand-receptor binding, including intermolecular interactions, desolvation[E] and entropic effects. Ferreira et al. summarised that the greater the number of physical-chemical parameters evaluated, the higher the accuracy of the scoring function.11 However, this statement is not necessarily true: Gabel et al. state that machine-learning-based scoring functions are insensitive to docking poses and merely describe atomic element counts.12
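The two stages described above can be sketched in a few lines of Python. This is a deliberately simplified illustration and not how any real docking program works: the ligand is reduced to a single point, a "pose" is only a position (orientation and conformation are ignored), and `generate_poses`, `toy_score` and the binding-site coordinates are all hypothetical names and values chosen for the example.

```python
import random

def generate_poses(n, box=10.0, seed=0):
    """Stage 1 (pose generation): sample n random positions inside a cubic search box."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [tuple(rng.uniform(-box, box) for _ in range(3)) for _ in range(n)]

def toy_score(pose, site=(1.0, -2.0, 0.5)):
    """Stage 2 (scoring): a stand-in score, the squared distance to a
    hypothetical binding-site centre. Lower is better."""
    return sum((p - s) ** 2 for p, s in zip(pose, site))

def dock(n=1000):
    """Generate poses, score each one, and return them ranked best first."""
    poses = generate_poses(n)
    return sorted(poses, key=toy_score)

ranked = dock()
print("best pose:", ranked[0], "score:", toy_score(ranked[0]))
```

The separation into a sampling function and a scoring function mirrors the two stages in the text; in practice the quality of the final ranking depends on both.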

While many comparatively strong and accurate algorithms for pose generation are currently available, inaccuracies in the prediction of binding affinity by scoring functions continue to be the main factor restricting the reliability of docking. Despite thorough research over more than two decades9, the exact prediction of binding affinities for larger sets of protein-ligand complexes is still one of the most significant problems in computational chemistry.


The earliest reported docking methods were based on the ‘lock-and-key’ assumption proposed by Fischer13, which states that both the ligand and the receptor can be treated as rigid bodies and that their tendency to react with another chemical species to form a chemical compound[F] is directly proportional to the geometric fit between their shapes.14 This early method is very limited, reflecting how limited computers were at the time. Zsoldos et al. state that an average-sized ligand with 6 rotatable bonds has a total number of poses[G] of 10^20.15 This number is so large that a “brute force” evaluation of all of these poses with a fast scoring function, processing 2000 poses, would take three billion years on a single CPU. Even on the largest supercomputer available in 2007, when the article was written, it would still take twenty thousand years to dock a single ligand.
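The scale of this estimate can be checked with simple arithmetic. The sketch below assumes the quoted scoring rate is per second, which the text does not state, and the function name is hypothetical. Under that assumption the result is on the order of billions of years, the same order of magnitude as the figure quoted from Zsoldos et al.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # roughly 3.16e7 seconds

def brute_force_years(total_poses, poses_per_second):
    """Years needed to score every pose once on a single processor."""
    return total_poses / poses_per_second / SECONDS_PER_YEAR

# 10^20 poses comes from the text; treating 2000 as a per-second rate is an assumption.
years = brute_force_years(total_poses=1e20, poses_per_second=2000)
print(f"{years:.1e} years")
```

Because the time scales linearly with the number of poses, each additional sampled degree of freedom multiplies the cost, which is why exhaustive search is ruled out.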

Realistically, treating the ligand and the protein as rigid is an impractical assumption, as it is incorrect most of the time. However, searching every possible combination of both the ligand and the protein would take, as stated above, thousands or millions of years to compute.15


Over the last two decades, more than sixty14 different docking tools and programs have been developed for academic[H] and commercial[I] use, such as AutoDock, AutoDock Vina, Glide, GOLD, LeDock, LigandFit, MOE Dock, rDock, Surflex-Dock, UCSF DOCK, and many more[J].14,16

Wang et al. conducted an evaluation study that tested the ten aforementioned docking programs on the two most important components of a docking program: the sampling algorithm[K] and the scoring function[L].16 A comprehensive understanding of the advantages and limitations of each docking program is important for conducting sound research. As most commercial docking programs are quite expensive, it was expected that they might perform better than the academic programs, owing to stronger funding. Wang et al. evaluated the capability of each docking program to predict ligand binding poses (sampling power) and to rank binding affinities (scoring power).16 The results showed that GOLD (commercial) and LeDock (academic) had the best sampling power[M], and that the academic docking program AutoDock Vina had the best scoring power[N]. This suggests that the commercial programs did not show the expected better performance than the academic ones.16 Commercial and academic programs alike performed well in both sampling power and scoring power, regardless of the funding behind them, so the choice of docking program is left to the user’s preference. As this research is being performed in an academic setting, and the researchers preferred an academic docking program, AutoDock Vina was selected.
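Scoring power in studies of this kind is reported as correlation coefficients between predicted scores and experimental binding affinities (the rp/rs values in footnote [N] are Pearson and Spearman coefficients). The sketch below shows how the two coefficients can be computed using only the Python standard library. The function names are hypothetical and the five data points are made up purely for illustration; they are not taken from Wang et al.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (rp) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation (rs): Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up docking scores and experimental free energies (kcal/mol), illustration only.
predicted = [-9.1, -7.4, -8.2, -6.0, -10.3]
experimental = [-10.0, -7.9, -7.5, -6.2, -11.1]

print("rp =", round(pearson(predicted, experimental), 3))
print("rs =", round(spearman(predicted, experimental), 3))
```

Pearson measures how linear the relationship is, while Spearman only asks whether the ranking is preserved, which is often what matters when prioritising ligands.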


AutoDock Vina is a sophisticated, widely used academic docking program that employs a quasi-Newton optimisation method[O].16 AutoDock Vina (hereafter Vina) was created in 2010 to improve on the accuracy and performance of its predecessor, AutoDock 4. In comparison to AutoDock, Vina gives better results in its rough estimations17 and prediction accuracy6.

Vina’s scoring function combines knowledge-based and empirical approaches.18 Put simply, the scoring function is based on knowledge, observation, or experience rather than on theory or logic alone.

In Vina, the strengths of the potentials are fitted to known binding constants (Ki or Kd). The scoring function includes a dispersion term, a repulsion term, a hydrophobic term, a hydrogen-bond interaction term, and a tors term.19 This scoring function is displayed in Equation 1.

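$$\Delta G_{\text{binding}} = \Delta G_{\text{gauss}} + \Delta G_{\text{repulsion}} + \Delta G_{\text{hbond}} + \Delta G_{\text{hydrophobic}} + \Delta G_{\text{tors}}$$

Equation 1 – Scoring function of AutoDock Vina18

where:

$\Delta G_{\text{binding}}$ is the binding term, which can be represented by Ki or Kd

$\Delta G_{\text{gauss}}$ is the dispersion term (i.e. a Gaussian steric interaction term)

$\Delta G_{\text{repulsion}}$ is the repulsion term, representing the square of the distance if closer than a threshold value

$\Delta G_{\text{hbond}}$ is the hydrogen-bond interaction term (i.e. a ramp function)

$\Delta G_{\text{hydrophobic}}$ is the hydrophobic interaction term (i.e. a ramp function)

$\Delta G_{\text{tors}}$ is the tors term, which is proportional to the number of rotatable bonds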
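To make the shape of Equation 1 concrete, the sketch below sums the five terms as described above for a list of atom-pair distances. It is a minimal illustration under stated assumptions, not the released Vina implementation: the weights and thresholds are placeholders rather than the fitted Vina parameters, the distance `d` is assumed to be a surface distance (interatomic distance minus the sum of atomic radii, so it can be negative when atoms overlap), and every function name here is hypothetical.

```python
import math

# Placeholder weights for illustration only; NOT the fitted Vina parameters.
WEIGHTS = {"gauss": -0.04, "repulsion": 0.8, "hbond": -0.6,
           "hydrophobic": -0.04, "tors": 0.06}

def gauss(d, width=0.5):
    """Gaussian steric (dispersion-like) term, largest at d = 0."""
    return math.exp(-(d / width) ** 2)

def repulsion(d, threshold=0.0):
    """Square of the distance when closer than the threshold, otherwise zero."""
    return d ** 2 if d < threshold else 0.0

def ramp(d, full, zero):
    """Ramp function: 1 at or below `full`, 0 at or above `zero`, linear in between."""
    if d <= full:
        return 1.0
    if d >= zero:
        return 0.0
    return (zero - d) / (zero - full)

def score(pairs, n_rotatable_bonds):
    """Weighted sum of the Equation 1 terms.

    pairs: iterable of (d, is_hbond_pair, is_hydrophobic_pair) tuples,
           one per ligand-receptor atom pair (assumed input format).
    """
    totals = dict.fromkeys(WEIGHTS, 0.0)
    for d, is_hbond, is_hydrophobic in pairs:
        totals["gauss"] += gauss(d)
        totals["repulsion"] += repulsion(d)
        if is_hbond:
            totals["hbond"] += ramp(d, full=-0.7, zero=0.0)
        if is_hydrophobic:
            totals["hydrophobic"] += ramp(d, full=0.5, zero=1.5)
    totals["tors"] = float(n_rotatable_bonds)  # proportional to rotatable bonds
    return sum(WEIGHTS[name] * totals[name] for name in WEIGHTS)

# Two made-up atom pairs and a ligand with 6 rotatable bonds.
example = [(0.2, False, True), (-0.4, True, False)]
print(score(example, n_rotatable_bonds=6))
```

Keeping each term as its own function makes it easy to see which physical effect contributes what; the additive tors term follows Equation 1 as written in this review.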


AutoDock Vina was chosen as the preferred program because this research is being performed in an academic setting and because it is a free, open-source academic docking program.


Cluster of Differentiation 38 (CD38)20 is also known as cyclic ADP ribose hydrolase.21 Its structure is shown in Figure 1.

Figure 1 – Structure of the CD38 protein (the ribbons), with the 3ROK ligand, rendered in Chimera22

CD38 is a signaling enzyme responsible for catalyzing the synthesis of cyclic ADP ribose (cADPR) and nicotinic acid adenine dinucleotide phosphate, both of which are universal Ca2+ messenger molecules.23,24 The reaction catalyzed by CD38 is shown in Figure 2.

Figure 2 – Reaction of CD38: H2O + NAD+ = ADP-D-ribose + H+ + nicotinamide25

The enzyme is also a type II transmembrane glycoprotein that was initially identified as a surface antigen in lymphocytes.21,26,27 The enzyme is found on the surface of many immune cells including CD4+, CD8+, B lymphocytes and natural killer cells, and functions in cell adhesion, signal transduction and calcium signaling.21 In humans, the CD38 protein is encoded by the CD38 gene which is located on band 15 of chromosome 4.28

CD38 was identified by Reinherz[P] and Schlossman[Q] while they were conducting a detailed analysis of the cell surface by means of monoclonal antibodies, as part of their pioneering search for molecules acting as T-cell receptors and as transducers of signals elicited by the encounter with specific antigens.20,29

CD38 is used as a prognostic marker for patients with leukemias and myelomas20,29, and is directly involved in the pathogenesis and outcome of human immunodeficiency virus (HIV) infection29 and chronic lymphocytic leukemia.21,29 The gene controls insulin release29 and secretion23, susceptibility to bacterial infection23, and the development of multiple conditions, including aging, asthma, obesity, heart disease, inflammation20, and diabetes20,29. Additionally, loss of CD38 function is associated with impaired immune responses, metabolic disturbances, and altered social behavior in mice29 through modulation of neuronal oxytocin secretion23.

CD38 was chosen for the research project because of its immense importance in various cancer studies and medicine-related research, and its potential for drug discovery and delivery.



3ROK is the crystal structure of human CD38 in complex with compound CZ-27. It was selected due to its low resolution value, low mutation count, and high sequence identity.




  • (1) Cramer, C. J.: Preface to the First Edition. In Essentials of Computational Chemistry: Theories and Models; 2 ed.; Wiley, 2004; pp xv-vii.
  • (2) Khilari, S.; Kadam, S. Knowledge management in computational chemistry: a literature review. International Journal of Latest Trends in Engineering and Technology 2017, 8, 156-162.
  • (3) Mukunda, N.: Images of Twentieth Century Physics; Jawaharlal Nehru Centre for Advanced Scientific Research: Bangalore, 2000; p. 9.
  • (4) Dirac, P. A. M.; Fowler, R. H. Quantum mechanics of many-electron systems. Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character 1929, 123, 714-733.
  • (5) Epa, V.; Winkler, D.; Tran, L.: Chapter 5 – Computational Approaches. In Adverse Effects of Engineered Nanomaterials; Fadeel, B., Pietroiusti, A., Shvedova, A. A., Eds.; Academic Press: Boston, 2012; pp 85-96.
  • (6) Trott, O.; Olson, A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem 2010, 31, 455-461.
  • (7) Sousa, S. F.; Fernandes, P. A.; Ramos, M. J. Protein–ligand docking: Current status and future challenges. Proteins: Structure, Function, and Bioinformatics 2006, 65, 15-26.
  • (8) Halperin, I.; Ma, B.; Wolfson, H.; Nussinov, R. Principles of docking: An overview of search algorithms and a guide to scoring functions. 2002, 47, 409-443.
  • (9) Ain, Q. U.; Aleksandrova, A.; Roessler, F. D.; Ballester, P. J. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. Wiley Interdisciplinary Reviews: Computational Molecular Science 2015, 5, 405-424.
  • (10) Binding Sites MeSH Descriptor Data 2019.
  • (11) Ferreira, L. G.; Santos, R. N. d.; Oliva, G.; Andricopulo, A. D. Molecular Docking and Structure-Based Drug Design Strategies. Molecules 2015, 20, 13384-13421.
  • (12) Gabel, J.; Desaphy, J.; Rognan, D. Beware of Machine Learning-Based Scoring Functions—On the Danger of Developing Black Boxes. Journal of Chemical Information and Modeling 2014, 54, 2807-2815.
  • (13) Fischer, D.; Norel, R.; Wolfson, H.; Nussinov, R. Surface motifs by a computer vision technique: Searches, detection, and implications for protein–ligand recognition. Proteins: Structure, Function, and Bioinformatics 1993, 16, 278-292.
  • (14) Pagadala, N. S.; Syed, K.; Tuszynski, J. Software for molecular docking: a review. Biophys Rev 2017, 9, 91-102.
  • (15) Zsoldos, Z.; Reid, D.; Simon, A.; Sadjad, S. B.; Johnson, A. P. eHiTS: A new fast, exhaustive flexible ligand docking system. Journal of Molecular Graphics and Modelling 2007, 26, 198-212.
  • (16) Wang, Z.; Sun, H.; Yao, X.; Li, D.; Xu, L.; Li, Y.; Tian, S.; Hou, T. Comprehensive evaluation of ten docking programs on a diverse set of protein–ligand complexes: the prediction accuracy of sampling power and scoring power. Physical Chemistry Chemical Physics 2016, 18, 12964-12975.
  • (17) Tanchuk, V. Y.; Tanin, V. O.; Vovk, A. I.; Poda, G. A New, Improved Hybrid Scoring Function for Molecular Docking and Scoring Based on AutoDock and AutoDock Vina. Chemical Biology & Drug Design 2016, 87, 618-625.
  • (18) Department of Chemistry at Washington University. Lecture 21 – What is Docking? Washington University, 2014.
  • (19) Gaillard, T. Evaluation of AutoDock and AutoDock Vina on the CASF-2013 Benchmark. Journal of Chemical Information and Modeling 2018, 58, 1697-1706.
  • (20) Chini, E. N.; Chini, C. C. S.; Espindola Netto, J. M.; de Oliveira, G. C.; van Schooten, W. The Pharmacology of CD38/NADase: An Emerging Target in Cancer and Diseases of Aging. Trends Pharmacol Sci 2018, 39, 424-436.
  • (21) National Center for Biotechnology Information. CD38 CD38 molecule [ Homo sapiens (human) ]. United States of America Patent.
  • (22) Resource for Biocomputing Visualization and Informatics (RBVI): UCSF Chimera (Computer Program). chimera-1.13.1 ed.; University of California, San Francisco: California, San Francisco, 2018.
  • (23) Kwong, A. K. Y.; Chen, Z.; Zhang, H.; Leung, F. P.; Lam, C. M. C.; Ting, K. Y.; Zhang, L.; Hao, Q.; Zhang, L.-H.; Lee, H. C. Catalysis-Based Inhibitors of the Calcium Signaling Function of CD38. Biochemistry 2012, 51, 555-564.
  • (24) Moreau, C.; Liu, Q.; Graeff, R.; Wagner, G. K.; Thomas, M. P.; Swarbrick, J. M.; Shuto, S.; Lee, H. C.; Hao, Q.; Potter, B. V. L. CD38 Structure-Based Inhibitor Design Using the N1-Cyclic Inosine 5′-Diphosphate Ribose Template. PLOS ONE 2013, 8, e66247.
  • (25) UniProtKB – P28907 (CD38_HUMAN).
  • (26) Sinclair, D. A.; Price, N. L.; Chini, E.; Clardy, J. C.; Cao, S. Small molecule cd38 inhibitors and methods of using same [PATENT].
  • (27) Orciani, M.; Trubiani, O.; Guarnieri, S.; Ferrero, E.; Di Primio, R. CD38 is constitutively expressed in the nucleus of human hematopoietic cells. Journal of Cellular Biochemistry 2008, 105, 905-912.
  • (28) Nata, K.; Takamura, T.; Karasawa, T.; Kumagai, T.; Hashioka, W.; Tohgo, A.; Yonekura, H.; Takasawa, S.; Nakamura, S.; Okamoto, H. Human gene encoding CD38 (ADP-ribosyl cyclase/cyclic ADP-ribose hydrolase): organization, nucleotide sequence and alternative splicing. Gene 1997, 186, 285-292.
  • (29) Malavasi, F.; Deaglio, S.; Funaro, A.; Ferrero, E.; Horenstein, A. L.; Ortolan, E.; Vaisitti, T.; Aydin, S. Evolution and Function of the ADP Ribosyl Cyclase/CD38 Gene Family in Physiology and Pathology. Physiological Reviews 2008, 88, 841-886.

[A] An English theoretical physicist who is regarded as one of the most significant physicists of the 20th century.3

[B] Also referred to as macromolecules

[C] Molecular dynamics simulations are a physics-based modelling method that provides detailed information on the fluctuations and conformational changes of atoms and molecules in materials.5

[D] The binding site is a region on a macromolecule (e.g. a protein) that binds to another molecule with specificity (U.S. National Library of Medicine, 2019).10

[E] The removal of solvent from a material in solution

[F] Known as their affinity

[G] Zsoldos et al. (2007) give the full calculation of the total number of poses in Section 2.1 of their paper. In summary: total number of poses = translations along three axes × orientations about three axes × dihedral angle sampling

[H] AutoDock, AutoDock Vina, LeDock, rDock, and UCSF DOCK are for academic use (Wang et al., 2016)

[I] Glide, GOLD, LigandFit, MOE Dock, and Surflex-Dock are for commercial use (Wang et al., 2016)

[J] Other programs include Cdocker, DOCK, FlexX, FRED, GOLD, ICM, and MCDock (Pagadala et al., 2017)

[K] Also referred to as sampling power, or the accuracy of binding pose prediction (Wang et al., 2016)

[L] Also referred to as scoring power, or the accuracy of binding affinity estimation (Wang et al., 2016)

[M] The results were: GOLD, 59.8% accuracy for top scored poses; LeDock, 80.8% accuracy for best poses – “the success rate for the top scored poses is about from 40% to 60%…best poses is about from 60% to 80%” (Wang et al., 2016)

[N] The results were: rp/rs of 0.564/0.580 for top scored poses and 0.569/0.584 for best poses (Wang et al., 2016)


[P] Professor of Medicine at Harvard Medical School (Boston, Massachusetts, U.S.A)

[Q] Baruj Benacerraf Professor of Medicine at Beth Israel Deaconess Medical Center (Boston, Massachusetts, U.S.A)
