November 3, 2023
CDD Vault Update (November 2023): Importing Text in Data Files, and APIs for Structure Images, QC Report Details and Ambiguous Structures
Handling Non-Numeric Values in a Data Import File
Data files imported into CDD Vault often contain text values in columns that are mapped to numeric fields or readouts. Examples include “ND” for “Not Determined” or “N/A” for “Not Applicable”.
The CDD Vault Import Data Wizard now reports these rows as Suspicious Events in the QC Report, where the user can choose one of two options:
- ACCEPT - the text values being mapped to a numeric destination will be “blanked out” and the rest of the data on these rows will be successfully imported
- REJECT - none of the data on these rows will be imported
As a quick example, importing this data file and mapping the Inhibition column to a Protocol numeric readout definition …
… will result in a Suspicious Event and any row containing textual data will be REJECTED by default.
The default REJECT selection matches the old behavior, and no data from the affected rows will be imported.
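To see in advance which rows are likely to be flagged, a data file can be scanned before import. The following is a minimal sketch, not part of CDD Vault: the function name and the sample compound IDs and values are made up for illustration, and the check relies on Python's own float parsing rather than on CDD Vault's import rules, which may differ.

```python
import csv
import io


def find_non_numeric_rows(csv_text, column):
    """Return (row_number, value) pairs whose `column` value is not a number."""
    flagged = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows are numbered from 1, not counting the header row.
    for row_number, row in enumerate(reader, start=1):
        value = (row.get(column) or "").strip()
        if value == "":
            continue  # an empty cell is not a text value
        try:
            float(value)
        except ValueError:
            flagged.append((row_number, value))
    return flagged


# Hypothetical sample file with an Inhibition column, as in the example above.
sample = "Compound,Inhibition\nCMPD-1,45.2\nCMPD-2,ND\nCMPD-3,N/A\nCMPD-4,88\n"
flagged = find_non_numeric_rows(sample, "Inhibition")
print(flagged)
```

Rows reported by a scan like this are the ones to expect as Suspicious Events, where the ACCEPT or REJECT choice then applies.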
API Endpoint for Structure Images
The GET Molecules API call has a new /image parameter that retrieves the image of the registered Molecule.
GET …vaults/<vault_id>/molecules/<molecule_id>/image
- runs as an async API call
- returns an Export ID
GET …vaults/<vault_id>/exports/<export_id>
- retrieves the image of the structure
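The two calls above could be chained roughly as follows. This is a sketch under several assumptions that the post does not confirm: `base_url` stands in for the elided part of the endpoint and must be supplied by the caller, the API token is assumed to travel in an `X-CDD-Token` header, and the Export ID is assumed to come back under an `id` key in the JSON response. Because the first call is asynchronous, a real client may need to wait or retry before the export is ready; that step is not shown.

```python
import json
import urllib.request


def image_request_url(base_url, vault_id, molecule_id):
    """Build the URL for the async image request."""
    return f"{base_url.rstrip('/')}/vaults/{vault_id}/molecules/{molecule_id}/image"


def export_url(base_url, vault_id, export_id):
    """Build the URL that retrieves a finished export."""
    return f"{base_url.rstrip('/')}/vaults/{vault_id}/exports/{export_id}"


def http_get(url, token):
    # Assumption: the token is sent in an "X-CDD-Token" header.
    request = urllib.request.Request(url, headers={"X-CDD-Token": token})
    with urllib.request.urlopen(request) as response:
        return response.read()


def fetch_structure_image(base_url, vault_id, molecule_id, token, get=http_get):
    """Request the image export, then download it. Returns raw bytes."""
    # Step 1: the /image call returns an Export ID.
    body = get(image_request_url(base_url, vault_id, molecule_id), token)
    export_id = json.loads(body)["id"]  # assumption: the ID is under "id"
    # Step 2: the exports call returns the image itself.
    return get(export_url(base_url, vault_id, export_id), token)
```

The `get` argument exists so the HTTP layer can be swapped out, for example to add retries or to test the flow without network access.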
New Parameter for GET Slurps API Endpoint
There is a new show_events parameter on the GET Slurps API endpoint that shows any row that generated a Suspicious Event or an Error. The details of the event are also included.
Once you've done the POST Slurps call, the next step is to use GET Slurps to check the status of the import. Including the new show_events parameter adds the details of any Suspicious Events and Errors to the JSON that is returned.
GET …vaults/<vault_id>/slurps/<slurp_id>
With JSON like this:
{"show_events":true}
The call now returns JSON that includes suspicious/ambiguous events and errors, something like this:
{
  "id": 1736694,
  "class": "slurp",
  "created_at": "2023-10-31T19:24:07.000Z",
  "modified_at": "2023-10-31T19:24:09.000Z",
  "state": "rejected",
  "api_url": "...vaults/<vault_id>/slurps/<slurp_id>",
  "total_records": 1.0,
  "records_processed": 1.0,
  "records_committed": 0.0,
  "ambiguous_events_count": 0,
  "suspicious_events_count": 0,
  "import_errors": [
    {
      "class": "batch identifier not found",
      "message": "Record rejected because no batch with External Identifier 'DoesNotExist' exists in your database."
    }
  ],
  "import_errors_count": 1
}
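A response like the one above can be condensed into a short report. The sketch below reads only keys that appear in the sample response; the function name is made up for illustration, and the keys under which Suspicious Event details are returned are not shown in the sample, so they are deliberately left out.

```python
import json


def summarize_slurp(slurp):
    """Turn a GET Slurps response (as a dict) into human-readable lines."""
    committed = int(slurp.get("records_committed", 0))
    total = int(slurp.get("total_records", 0))
    lines = [
        f"slurp {slurp.get('id')}: state={slurp.get('state')}",
        f"records committed: {committed} of {total}",
    ]
    # Each import error carries a class and a message.
    for error in slurp.get("import_errors", []):
        lines.append(f"error [{error.get('class')}]: {error.get('message')}")
    return lines


# A trimmed copy of the sample response shown above.
response_text = """
{
  "id": 1736694,
  "state": "rejected",
  "total_records": 1.0,
  "records_committed": 0.0,
  "import_errors": [
    {
      "class": "batch identifier not found",
      "message": "Record rejected because no batch with External Identifier 'DoesNotExist' exists in your database."
    }
  ],
  "import_errors_count": 1
}
"""
report = summarize_slurp(json.loads(response_text))
print("\n".join(report))
```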
Easily Register Ambiguous Structures
Use the new duplicate_resolution parameter to register ambiguous OR structures (structures drawn with the OR enhanced stereo label). This provides a way to register a new Molecule (versus a new Batch of an existing Molecule) via the API.
For the majority of CDD Vaults, which use the chemical registration system, use the duplicate_resolution parameter with the POST Batch API call to register a new Molecule.
By default (when the parameter is omitted), a new Batch of the existing record is created. If more than one matching Molecule exists and the parameter is omitted, an error is returned and neither a new Batch nor a new Molecule is created.
Specify one of the following options when using this parameter:
- first
  - "duplicate_resolution":"first"
  - results in a new Batch being registered for the first Molecule detected as a potential tautomer or duplicate
- new
  - "duplicate_resolution":"new"
  - results in a new Molecule being registered
- prompt
  - "duplicate_resolution":"prompt"
  - results in nothing being registered
  - matching molecule IDs are returned
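When building a request body in code, it can help to validate the option before sending. The following is a minimal sketch: the helper name is made up, the placeholder field in the usage lines is not a real CDD Vault field, and placing duplicate_resolution at the top level of the JSON body is an assumption based on the snippets above.

```python
import json

# The three options listed above.
ALLOWED_OPTIONS = ("first", "new", "prompt")


def with_duplicate_resolution(payload, option):
    """Return a copy of `payload` with duplicate_resolution set to `option`."""
    if option not in ALLOWED_OPTIONS:
        raise ValueError(
            f"duplicate_resolution must be one of {ALLOWED_OPTIONS}, got {option!r}"
        )
    body = dict(payload)  # shallow copy so the caller's dict is left untouched
    body["duplicate_resolution"] = option
    return body


# "example_field" is a hypothetical placeholder for the caller's own fields.
registration_fields = {"example_field": "example value"}
body_json = json.dumps(with_duplicate_resolution(registration_fields, "new"))
print(body_json)
```

Rejecting unknown values locally gives an immediate error instead of relying on whatever the API does with an unrecognized option.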
Helpful hint:
- For Vaults which do not utilize the chemical registration system, use the duplicate_resolution parameter with the POST Molecule API call.
- If this looks familiar, you are correct. These options were previously available for the tautomer_resolution parameter, which is no longer needed because the new parameter handles all forms of duplicates: tautomers, ambiguous stereocenters, and intentional duplicates.
This blog is authored by members of the CDD Vault community. CDD Vault is a hosted drug discovery informatics platform that securely manages both private and external biological and chemical data. It provides core functionality including chemical registration, data visualization, inventory, and electronic lab notebook capabilities.