There has been a lot of buzz around the GSK Protein Kinase Inhibitor Set (PKIS).
[Image used with permission from http://kinase.com/human/kinome/]
As noted in Derek Lowe’s In the Pipeline blog, “The company has made 367 compounds available to any academic investigator working in the kinase field, as long as they make their results publicly available… So if you're in academia, and interested in kinase pathways, you absolutely need to take a look at this compound set.” Lowe also pointed out that the dataset has been made available via the ChEMBL web site. Some commenters on the blog noted that getting the data they wanted from ChEMBL was tricky, given that the Kinase SARfari database and the new ChEMBL database are out of sync.
In the spirit of improving access to important data, we have gathered the PKIS data that ChEMBL has kindly made available and processed it so that it can be accessed here at the Collaborative Drug Discovery (CDD) site (much like we have done previously for the Kinase SARfari database). Similarly, in the spirit of the GSK and NCATS efforts (doi:10.1371/journal.pone.0057888) and CDD’s public data policy, CDD freely makes available to the public any SAR data that researchers wish to share, and we even help format the data to be as useful as we can make it.
The transfer to CDD makes the data available in a more “med-chemist” friendly manner. We also did some tidying up of the data set. For example, there are actually only 364 distinct compounds (some entries were duplicates due to salt forms or alternate names of the same molecule). We also tried to normalize target names where possible (for example, the kinases IKKA, IKKB and IKKE were called IKK-alpha, IKK-beta and IKK-epsilon in the dataset from UNC). Not that we’re perfect… we appreciate any corrections to our dataset as well!
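As a rough illustration of the target-name tidying described above, here is a minimal Python sketch. It is not the script we actually used: the alias table, the function name `normalize_target`, and the choice of the short names (IKKA, IKKB, IKKE) as the canonical form are all assumptions made for this example. Only the three IKK name pairs come from the dataset itself.

```python
# Hypothetical alias table. Only these three IKK pairs are taken from the post;
# treating the short form as canonical is an assumption for illustration.
TARGET_ALIASES = {
    "ikk-alpha": "IKKA",
    "ikk-beta": "IKKB",
    "ikk-epsilon": "IKKE",
}


def normalize_target(name: str) -> str:
    """Map a known alias to its canonical name; leave other names as given."""
    cleaned = name.strip()
    # Look up case-insensitively so "IKK-Alpha" and "ikk-alpha" both match.
    return TARGET_ALIASES.get(cleaned.lower(), cleaned)


if __name__ == "__main__":
    for raw in ["IKK-alpha", " IKK-beta ", "IKKE", "EGFR"]:
        print(repr(raw), "->", normalize_target(raw))
```

Names that are not in the table pass through unchanged (apart from trimming whitespace), so a lookup like this cannot silently rename a target it does not know about.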
To summarize, there are 364 compounds that have been tested against 225 targets at 0.1 and 1 µM. For those of you familiar with CDD, the data has been made available via two Projects. In the first (MultiProtocol), each target is posted as a separate protocol, so you can search by compound structure (or substructure) as well as by target name(s). In the orthogonal project (OneProtocol), there is only one protocol, and all of the targets are listed in one readout (for our clients who prefer that format).
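For readers who think in code, the difference between the two Projects can be pictured as two ways of arranging the same result rows. The sketch below is only an analogy, not a description of how CDD stores data: the compound IDs, the numeric values, and the field and function names are invented placeholders.

```python
from collections import defaultdict

# Invented placeholder rows: (compound, target, concentration, value).
ROWS = [
    ("CMPD-1", "IKKA", 0.1, 12.0),
    ("CMPD-1", "IKKA", 1.0, 55.0),
    ("CMPD-1", "IKKB", 1.0, 30.0),
    ("CMPD-2", "IKKA", 1.0, 8.0),
]


def multi_protocol(rows):
    """MultiProtocol-style view: one group of results per target."""
    protocols = defaultdict(list)
    for compound, target, conc, value in rows:
        protocols[target].append(
            {"compound": compound, "conc": conc, "value": value}
        )
    return dict(protocols)


def one_protocol(rows):
    """OneProtocol-style view: a single list, with the target as a field."""
    return [
        {"compound": compound, "target": target, "conc": conc, "value": value}
        for compound, target, conc, value in rows
    ]


if __name__ == "__main__":
    for target, results in multi_protocol(ROWS).items():
        print(target, len(results), "results")
    print(len(one_protocol(ROWS)), "rows in the single protocol")
```

The first arrangement suits questions about a single target, while the second keeps everything in one table that can be filtered by target.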
Either way, this great dataset is now available to the public in an alternate form. We hope you find it helpful.
This blog is authored by members of the CDD Vault community. CDD Vault is a hosted drug discovery informatics platform that securely manages both private and external biological and chemical data. It provides core functionality including chemical registration, structure-activity relationship, chemical inventory, and electronic lab notebook capabilities.