“We had Excel sheets all over the place and data from different projects that were just separated in different folders.” A scientist shared this with me recently, and it’s not the first time I have heard this sentiment. The scientist went on to say: “After a while, it got to the point that it was just hard to manage.”
Many scientists make the common mistake of storing and managing their data in insecure, difficult-to-find spreadsheets. While this method might be okay for a lone scientist working in a vacuum, it is not a smart protocol for collaborative scientists doing deep work in drug discovery or in other chemical or biological fields that rely on storing, recalling, and sharing large amounts of data.
Spreadsheet management can be a major issue for collaboration in the drug discovery process. Sharing documents with your labmates and/or co-workers via Excel or Google Docs is convenient but can be very insecure.
In science, especially in the drug discovery field, flawed data management can be catastrophic. If you make one small typo in a recipient’s address when emailing an Excel file or sharing it on Google Drive, your data can end up in the wrong hands. If you share data in a way that violates government regulations, fail to back up critical data, or make hazardous data entry mistakes, your career could be over.
3 Reasons Why Spreadsheets Fail Scientists
At best, scientists who rely on Excel files to manage scientific data and communicate results run the risk of operating inefficiently and wasting resources.
At worst, critical data becomes compromised, scientific innovation stalls, and identifying new development candidates suffers.
Other negative outcomes of using spreadsheets to store and manage your data include:
- Restricted access to your data
- Reduced control over your data & reduced security
- Less productive collaborations & longer design cycles
The New York Times reported that sharing data on general cloud-based platforms like Google Drive is exceptionally risky.
Wired magazine also confirmed that storing data on a secure local server is not always ideal, because even the world’s most secure local servers can become insecure without anyone knowing about it.
But if you can’t trust general cloud-based platforms or “secure” local servers to store your data, what can you trust?
Before answering this question, you need to understand just how dramatically using Excel or other spreadsheets limits you scientifically.
1. Restricted access to your data.
What is a spreadsheet, and why do you or any scientist use it? You might not have thought about it before, but spreadsheets are files that allow you to store and manage data. This sounds like a positive, right? Now read it this way: spreadsheets are files that require you to store and manage data. In other words, are spreadsheets requiring you to do more work than necessary?
For example, if you use spreadsheets to store your data, you must always have a current copy of your spreadsheet file with you, keep the file updated, and store it in a place where not only you but also your colleagues can quickly and easily access it.
What does this mean? It means that spreadsheets are not easily accessible because YOU are the one who has to manually manage their accessibility.
Ask yourself: Are spreadsheets searchable? No, not really. You cannot search for value ranges, chemical structures, or similarity, and you certainly cannot search for multiple complex criteria.
While spreadsheet files can hold simple tabular data for individual experiments, they cannot reveal relationships in data that crosses multiple experiments, e.g., cross-reactivity in multiple assays, remaining batch inventory, duplicate compounds, etc.
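To make the contrast concrete, here is a minimal sketch of the kind of cross-experiment question a relational store can answer in one query. It uses Python's built-in sqlite3 module with an in-memory database. The table names, column names, compound IDs, and values are all hypothetical and chosen only for illustration; this is not how any particular platform is implemented. Comparing raw SMILES strings is also a deliberately naive stand-in for real structure-based duplicate detection.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE compounds (id INTEGER PRIMARY KEY, name TEXT, smiles TEXT);
        CREATE TABLE results (compound_id INTEGER, assay TEXT, ic50_um REAL);
    """)
    # Made-up compounds; CPD-003 repeats the structure string of CPD-001.
    conn.executemany("INSERT INTO compounds VALUES (?, ?, ?)", [
        (1, "CPD-001", "CCO"),
        (2, "CPD-002", "c1ccccc1"),
        (3, "CPD-003", "CCO"),
    ])
    # Made-up IC50 values (micromolar) from two hypothetical assays.
    conn.executemany("INSERT INTO results VALUES (?, ?, ?)", [
        (1, "kinase_A", 0.5), (1, "kinase_B", 0.8),
        (2, "kinase_A", 0.2), (2, "kinase_B", 25.0),
        (3, "kinase_A", 12.0),
    ])
    return conn


def cross_reactive(conn, max_ic50_um):
    """Names of compounds with IC50 <= threshold in more than one assay."""
    rows = conn.execute("""
        SELECT c.name
        FROM compounds c
        JOIN results r ON r.compound_id = c.id
        WHERE r.ic50_um <= ?
        GROUP BY c.id, c.name
        HAVING COUNT(DISTINCT r.assay) > 1
        ORDER BY c.name
    """, (max_ic50_um,)).fetchall()
    return [name for (name,) in rows]


def duplicate_structures(conn):
    """Structure strings registered more than once, with their counts.

    Exact string matching is a simplification: real duplicate detection
    would compare canonicalized structures, not raw text.
    """
    return conn.execute("""
        SELECT smiles, COUNT(*)
        FROM compounds
        GROUP BY smiles
        HAVING COUNT(*) > 1
        ORDER BY smiles
    """).fetchall()
```

Calling cross_reactive(build_demo_db(), 1.0) combines a value range with a multi-assay condition, which would take manual filtering across several sheets in a spreadsheet workflow.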
Can your spreadsheet files provide dose-response curves or Z statistics? The answer is most likely “no.”
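As an illustration of what such a statistic involves, the sketch below computes the Z'-factor, a widely used assay quality metric defined as 1 - 3(SD_pos + SD_neg) / |mean_pos - mean_neg|. Reading "Z statistics" as the Z'-factor is an assumption on our part, and the function name and control values are hypothetical.

```python
from statistics import mean, stdev


def z_prime(positive_controls, negative_controls):
    """Z'-factor from positive and negative control readings.

    Uses sample standard deviations, so each list needs at least two values.
    """
    mu_p, mu_n = mean(positive_controls), mean(negative_controls)
    sd_p, sd_n = stdev(positive_controls), stdev(negative_controls)
    if mu_p == mu_n:
        raise ValueError("control means are identical; Z' is undefined")
    return 1.0 - 3.0 * (sd_p + sd_n) / abs(mu_p - mu_n)
```

With made-up controls of [98, 100, 102] and [8, 10, 12], z_prime returns roughly 0.87. The calculation itself is simple; the difficulty with spreadsheets is keeping it applied consistently to every plate across every file.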
2. Reduced control over your data & reduced security
The more you can control your data, the more secure your data is. When it comes to security, spreadsheets fail. This is because spreadsheet files can easily be forwarded to unauthorized people (intentionally or accidentally). In addition, data updates to your spreadsheet file are not propagated to all your labmates or all “users” of the data. And, as mentioned above, it is not always easy to manually keep track of which spreadsheet is your most current version.
You might not realize this, but passing data files back and forth over email is insecure, as is cloud file sharing. This is true even if your university or institution is using a local server with only its standard safeguards. In fact, according to Computerworld, one group alone claimed to have hacked over 100 university servers, including Harvard, Stanford, and Penn.
Worst of all, spreadsheet files can be lost or accidentally deleted. If you’re a scientist and this has happened to you in the past, then you know how devastating such a loss can be.
3. Less productive collaborations & longer design cycles
Spreadsheets, even cloud-based ones, provide very little benefit when it comes to collaborating with other labs, especially those outside of your institution.
In science, time matters. This is especially true in scientific collaboration. The problem is that if you’re a scientist who shares data with collaborators via spreadsheets, you will constantly be waiting for collaborators to send updated data, and vice versa. All of this waiting delays the progress of your collaboration.
Scientists face a variety of problems when they use spreadsheets for collaborations. For example, collaborators can accidentally use outdated data, wasting resources on old hypotheses. Most importantly, scientists cannot collaborate with a spreadsheet file in real time. Even cloud-based spreadsheets (assuming that they are secure, which most are not) make the process of sharing in real time cumbersome at best.
A spreadsheet holds only the experimental data, so it doesn’t foster real-time collaboration on analysis and does not help scientists share and explore their conclusions in real time.
Using spreadsheet files creates communication bottlenecks that slow progress. This is because it’s nearly impossible to keep everyone in sync with the latest data when you’re sharing multiple spreadsheets with multiple scientists.
Finally, as noted earlier, spreadsheets are not searchable. You cannot use a file search window on your computer to find value ranges, chemical structures, similarity, or other criteria, and you certainly cannot search for multiple complex criteria. As a result, smart scientists must think beyond spreadsheets to secure their data, ensure it’s accessible, and share it productively and safely.
FAQs About Spreadsheet Management
Find out more about spreadsheet management in our answers to some of the most common questions we hear.
What are the limitations of using spreadsheets for managing complex drug discovery data?
Spreadsheets can become unwieldy with large datasets, leading to errors, version control issues, and difficulties in collaboration. They lack the advanced features necessary for managing complex relationships between chemical and biological data.
How can cloud-based data management systems improve collaboration in drug discovery?
Cloud-based systems offer real-time collaboration, centralized data storage, and enhanced security. They allow multiple users to access and edit data simultaneously, reducing errors and ensuring that all team members work with the most current information.
What are the security concerns associated with using spreadsheets for sensitive drug discovery data?
Spreadsheets often lack robust security features, making them vulnerable to unauthorized access and data breaches. They may not comply with industry standards for data protection, posing risks to intellectual property and patient confidentiality.
Can specialized data management platforms integrate with existing laboratory information systems?
Yes, many specialized platforms are designed to integrate seamlessly with existing laboratory information management systems (LIMS), facilitating data flow and reducing manual data entry. This integration enhances efficiency and data accuracy.
Trust CDD Vault for Data Management
Are you still using spreadsheets to manage your scientific data? If so, you may be facing challenges similar to those outlined above. CDD Vault by Collaborative Drug Discovery is a simple and 100% secure data management platform that is hosted through an intuitive web interface.
CDD Vault helps your project team manage, analyze, and present chemical structures, biological assays, and other scientific data. Click here to demo CDD Vault for free right now.
This blog is authored by members of the CDD Vault community. CDD Vault is a hosted drug discovery informatics platform that securely manages both private and external biological and chemical data. It provides core functionality including chemical registration, structure activity relationship, chemical inventory, and electronic lab notebook capabilities.