CDD has effectively lowered the “activation barrier” for data archival of low-, medium-, and even high-throughput experiments.
Before you can perform analyses or look for trends, data must be imported into your analysis tool of choice. Here are the top 10 reasons CDD makes it easy to get data in, raising adoption rates among users:
- First and foremost, CDD’s simple design makes capturing data from Excel™ (.csv) and SDF files feel like navigating an intuitive web application (think Facebook or LinkedIn), and a lot less like the database administration typically required by complex systems.
- The import interface has a look and feel similar to the familiar Excel environment. This is especially helpful when importing results from a complex set of experiments, where an intuitive interface lets a novice database user "map" their file into database fields based on their own predefined protocols and parameters. This is part of what makes the CDD Vault useful for many collaborative researchers: each user can (and will) define the readouts and parameters of their own protocols for their own experiments, and import disparate data with ease.
- Predictive algorithms auto-suggest mappings based on file headers, correctly the majority of the time, taking the guesswork out of field mapping; an easy manual override takes care of any missed fields (see the mapping sketch after this list).
- For incomplete or poorly annotated data, new runs of assays/protocols or new data fields can be added on the fly, directly within the context of the data upload, effectively lowering traditional barriers to data capture.
- The algorithms learn from past user mappings within the CDD Vault, so suggestion accuracy keeps improving and becomes specifically attuned to your collaboration’s own (often emerging) workflows. This ensures continuously improving efficiency, even with brand-new, distributed processes and projects.
- Saved Mapping Templates – Once all the details are mapped (controls, x and y parameters for IC50s, molecule names, structures, batches, plates, wells, and even user-defined fields), they can be saved and reused at a later date. Saved mapping templates let any researcher with appropriate permissions add more data to the same run of a protocol, or to a future one. Researcher turnover would typically be a major hit to project momentum, but not when previously optimized processes are captured digitally for new users: CDD anticipates collaboration and user turnover without the project missing a beat.
- Data integrity is automatically assessed against the uniqueness checks the database requires. Before committing the data to the database for posterity, the CDD software evaluates every row of the file to confirm that it either creates a new molecule/batch or adds data to an existing batch. If there is an inconsistency in the structures or identifiers, the database flags the specific row with the error or missing data. These pesky yet critical anomalies are categorized as noteworthy, suspicious events, or errors, so they can be appropriately addressed. Flagged rows can be downloaded and fixed on the fly, a process that would otherwise be tedious and might even prevent accurate, timely data capture from occurring at all. (A simplified sketch of this row-level check follows the list.)
- Experimental Data Quality Metrics – The critical metrics needed to understand the scope, limitations, and accuracy of complex experiments are built in. Specifically, the CDD Vault provides Z/Z’ statistics to evaluate runs, plates, and even individual results; dose-response curve fitting per NIH-published guidance, with outlier analysis (and options to remove outliers and refit curves while still preserving the original raw data); overlays of related curves to reveal underlying trends from, say, experiments run in triplicate; and averaging and normalization across individual or all runs. Data mining tools such as saved searches (reusable search logic) and collections (lassos around sets of data, with the ability to remove individual rows) support the secure, collaborative identification and dissemination of SAR trends and emerging series. Visualization tools like heat maps and scatter plots are easily accessible within a single, cost-effective application for the whole team. All of these capabilities draw on the complementary expertise within a project team, yet remain easy enough for novice as well as expert users to understand and use; a further benefit of a collaborative system is the ability to tap into collective thinking. (A worked Z’ and curve-fit example appears after this list.)
- Over half a dozen user privileges, at both the Vault and project-specific levels, allow you to designate exactly who can define protocol parameters and workflows, who can only add data to predefined parameters, and who can only mine/export (but not change) the underlying data.
- To track progress, recent changes such as newly registered molecules or newly captured assay data appear automatically on the collaborative dashboard. Visible immediately upon login, the dashboard can be augmented with the collaborative message board, which allows secure, peer-to-peer discussion of remarkable results and new developments.
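To make the field-mapping idea concrete: CDD has not published its matching algorithm, but a minimal sketch of header-to-field suggestion, using simple string similarity and entirely hypothetical Vault field names, might look like this in Python:

```python
import difflib

# Illustrative sketch only -- CDD has not published its mapping algorithm.
# The Vault field names below are hypothetical protocol readouts.
VAULT_FIELDS = ["Molecule Name", "Batch", "Plate", "Well", "IC50 (uM)", "% Inhibition"]

def suggest_mapping(file_headers, vault_fields=VAULT_FIELDS, cutoff=0.6):
    """Suggest the closest Vault field for each file header (None = map manually)."""
    lowered = {f.lower(): f for f in vault_fields}
    mapping = {}
    for header in file_headers:
        hits = difflib.get_close_matches(header.lower(), lowered, n=1, cutoff=cutoff)
        mapping[header] = lowered[hits[0]] if hits else None
    return mapping

print(suggest_mapping(["Molecule_Name", "batch", "ic50 (um)", "notes"]))
# e.g. {'Molecule_Name': 'Molecule Name', 'batch': 'Batch',
#       'ic50 (um)': 'IC50 (uM)', 'notes': None}
```

The returned dictionary is exactly the kind of artifact a saved mapping template would persist, so a later import of the same file format can skip the mapping step entirely.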
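Likewise, the row-level integrity check can be pictured as classifying each incoming row against what is already registered. This sketch assumes a toy in-memory registry keyed by (structure, batch); CDD’s actual rules are considerably richer, and the SMILES, batch IDs, and molecule IDs here are made up:

```python
# Toy registry of existing batches: (structure, batch) -> molecule ID.
registered = {("CCO", "B-001"): "MOL-1"}

def classify_row(structure, batch, molecule_id=None):
    """Classify an incoming row as new, additive, or inconsistent."""
    existing = registered.get((structure, batch))
    if existing is None:
        return "new molecule/batch"
    if molecule_id is not None and molecule_id != existing:
        return "error: identifier conflicts with the registered batch"
    return "add data to existing batch"

for row in [("CCO", "B-001", "MOL-1"), ("CCO", "B-001", "MOL-9"), ("CCN", "B-002", None)]:
    print(row, "->", classify_row(*row))
```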
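Finally, the quality metrics themselves are standard and well documented: the Z’-factor (Zhang et al., 1999) is Z’ = 1 - 3(sd_pos + sd_neg)/|mean_pos - mean_neg|, and dose-response fitting typically uses a four-parameter logistic (Hill) model. A self-contained example with made-up data (not output from the Vault):

```python
import numpy as np
from scipy.optimize import curve_fit

def z_prime(pos, neg):
    """Z' = 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg| (Zhang et al., 1999)."""
    pos, neg = np.asarray(pos, float), np.asarray(neg, float)
    return 1 - 3 * (pos.std(ddof=1) + neg.std(ddof=1)) / abs(pos.mean() - neg.mean())

def four_pl(x, bottom, top, log_ic50, hill):
    """Four-parameter logistic (Hill) dose-response model."""
    return bottom + (top - bottom) / (1 + 10 ** ((log_ic50 - np.log10(x)) * hill))

# Made-up example data: concentrations in uM, responses in % inhibition.
conc = np.array([0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0])
resp = np.array([2.0, 5.0, 12.0, 30.0, 55.0, 80.0, 93.0, 98.0])

params, _ = curve_fit(four_pl, conc, resp, p0=[0.0, 100.0, 0.0, 1.0])
print("Z':", round(z_prime([98, 97, 99], [2, 3, 1]), 3))   # plate quality from controls
print("Fitted IC50 (uM):", round(10 ** params[2], 3))      # back-transform log IC50
```

Refitting after outlier removal, as the Vault offers, amounts to rerunning the fit on the remaining points while the original raw data stays untouched.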
The ten capabilities above ensure you capture the value from inherently expensive, complex experimentation. The real advantage of the CDD Vault for your distributed teams arises from its unique collaborative features, creating efficiencies between organizations that were previously possible only within a tight-knit group in a single lab or building.
This blog is authored by members of the CDD Vault community. CDD Vault is a hosted drug discovery informatics platform that securely manages both private and external biological and chemical data. It provides core functionality including chemical registration, structure-activity relationships, chemical inventory, and electronic lab notebook capabilities.
CDD Vault: Drug Discovery Informatics your whole project team will embrace!