Supercomputers used to go from genome to lead in under a month
Summary
Genomics and proteomics comprise a fundamental area of bioinformatics; however, exploiting the wealth of information gained using such disciplines remains a rate-limiting step in the drug discovery process. Today’s edition of DailyUpdates (Jan 25th 2005) describes groundbreaking work that has allowed researchers to identify candidate therapeutics for the treatment of SARS virus infection in under a month, using the virus genome as a starting point. The role of bioinformatics/informatics in the drug development sector was evaluated in our 2003 feature Bioinformatics/Informatics - Reducing Drug Discovery and Development Costs. The forecasted value for the worldwide informatics market in the life science sector is between $1.7 and $7 billion by 2007.
Genomics and proteomics comprise a fundamental area of bioinformatics (Addressing Pharma’s R&D Productivity Crisis); however, exploiting the wealth of information gained using such disciplines remains a rate-limiting step in the drug discovery process. A good example of this problem, and a novel approach to its resolution, has recently been reported by Yuan-Ping Pang and colleagues Andrea J. Dooley and Jewn-Giew Park at the Mayo Clinic and Nice Shindo and Barbara Taggart at the Southern Research Institute in Alabama.
In the February edition of the journal Bioorganic & Medicinal Chemistry Letters (abstract), Yuan-Ping Pang and colleagues describe how they developed a genome-to-drug-lead approach to identify a candidate treatment for infection with the SARS-associated coronavirus (SARS-CoV).
SARS-CoV infection results in severe acute respiratory syndrome (SARS), an emerging infectious disease with high mortality. The sequencing of the SARS-CoV genome just 31 days after the outbreak demonstrated how far the scientific community has progressed in the use of genomics. The failure to harness this information to develop therapeutic candidates has equally highlighted that the exploitation of genomic information remains suboptimal. One reason for this bottleneck is the reliance on expressing the encoded proteins and screening candidate therapeutics that interact with them. Screening can involve in vitro approaches or in silico docking approaches, although the latter traditionally requires determination of the crystal structure of the target protein.
The SARS-CoV genome encodes a chymotrypsin-like cysteine proteinase (CCP) that proteolytically processes polypeptides required for viral replication and transcription, which makes it an ideal drug target for treating SARS. Although small-molecule inhibitors of CCP have been identified, none has yet been developed into a clinical drug for treating SARS. New inhibitor leads of CCP are thus required.
Yuan-Ping Pang and colleagues were able to model the flexible loop of CCP in its bound state, thus predicting its structure and obviating the need to resolve its crystal structure. In essence, this modeling involved predicting 100,000 possible structures for the flexible loop of CCP from its genomic sequence by performing multiple molecular dynamics simulations of the interaction of CCP with its substrate. An average of the 100,000 predicted structures was then used as a target to screen for drug candidates. The challenge behind such modeling is the required computing power; the present study was conducted using a dedicated 1.1 terascale system (terascale refers to computational power beyond a "teraflop", a trillion calculations per second). With such power, the simulations took 20 days.
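As a rough illustration of the averaging step described above, the sketch below collapses a set of simulated conformations into a single mean coordinate set. The function name average_structure, the data layout (a list of frames, each a list of x, y, z tuples) and the toy coordinates are assumptions made for illustration only; the study's actual simulation and averaging procedure, including any structural alignment, is not described in this article.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def average_structure(frames: Sequence[Sequence[Coord]]) -> List[Coord]:
    """Average each atom's x, y, z coordinates over all frames.

    Hypothetical helper: assumes every frame lists the same atoms in the
    same order and that the frames are already superimposed.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n_atoms = len(frames[0])
    sums = [[0.0, 0.0, 0.0] for _ in range(n_atoms)]
    for frame in frames:
        if len(frame) != n_atoms:
            raise ValueError("all frames must contain the same number of atoms")
        for i, (x, y, z) in enumerate(frame):
            sums[i][0] += x
            sums[i][1] += y
            sums[i][2] += z
    n_frames = len(frames)
    return [(sx / n_frames, sy / n_frames, sz / n_frames) for sx, sy, sz in sums]


# Toy data (not from the study): two frames of a two-atom fragment.
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(2.0, 2.0, 2.0), (3.0, 3.0, 3.0)],
]
mean_structure = average_structure(toy_frames)
print(mean_structure)
```

In a real workflow the frames would come from the molecular dynamics trajectories and would number in the thousands; the arithmetic is the same, only the scale changes.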
Virtual resolution of the structure of CCP was validated as an approach by using the predicted structure as a drug target in virtual screening for small-molecule inhibitors with a computer docking program, EUDOC. Virtual screening of 361,413 small molecules identified 3,958 candidate hits; 12 of these were selected for cell-based inhibition assays. Five of the 12 tested were active. One compound, CS11, inhibited the human SARS-CoV Toronto-2 strain with an EC50 of 23 µM; CS11 was not toxic to normal cells. The use of EUDOC also allowed the prediction of strategies for lead optimization.
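The screening funnel reported above can be put into proportion with a few lines of arithmetic. The sketch below uses only the counts quoted in this article; the helper name percent and the variable names are illustrative assumptions, not part of the study or of EUDOC.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts as quoted in the article.
library_size = 361_413   # small molecules screened virtually
virtual_hits = 3_958     # candidate hits from docking
assayed = 12             # compounds selected for cell-based assays
active = 5               # compounds active in those assays

virtual_hit_rate = percent(virtual_hits, library_size)
assay_hit_rate = percent(active, assayed)

print(f"Docking hit rate: {virtual_hit_rate:.1f}% of the library")
print(f"Assay hit rate:   {assay_hit_rate:.1f}% of compounds tested")
```

Roughly 1.1% of the library passed the docking step, and about 42% of the compounds taken into cell-based assays were active.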
These results demonstrate that, given only the SARS-CoV genome, one can identify a small molecule that is able to penetrate cells and rescue them from viral infection, leapfrogging the experimentally determined structures of CCP. In this study, the 3D model of CCP was predicted from the genome in just 20 days, and an excellent inhibitor lead for CCP was identified by virtual screening in 9 days. The potential of such an approach is obviously staggering; however, before companies are able to exploit such power, they will either have to purchase terascale computing systems (currently costing in excess of $50 million) or form collaborations with institutions that already have access to such systems (Yuan-Ping Pang built his own terascale computer systems for less than $1 million).