On Wednesday the 25th of September, we started with a workshop by Jacques Flores on high-quality data management. He opened his presentation with a video that looked childish at first, but its relevance became clear once it started. The video showed two animated animals struggling with data issues. I thought that some of the issues were unlikely to occur, but Jacques told us that these were really common problems. We quickly came to the conclusion that data is vulnerable and that it goes hand in hand with a lot of regulations.
Furthermore, we were told about the importance of making research data accessible. When data is openly available, other researchers can reuse it. Unfortunately, most data falls short in at least a small way. The requirements for data can be summarized in the FAIR principles:
- Findable: the data must remain findable in the future.
- Accessible: the data should be accessible without a license or anything of the sort.
- Interoperable: the data needs to be in the right format so that multiple types of data can be combined and exchanged.
- Reusable: everything about your data needs to be clear, so good documentation is a must.
Data often raises protection issues as well, because some of it contains private information. Privacy and security are really important for this kind of data, and good research data management is the key to both. To make a good research data management plan, you need to ask yourself the following questions:
- What type of data do you have?
- What is the right format for your data?
- How many files do you expect to have?
- What is going to be the size of those files?
- What is the origin of your data?
You can enter the answers in a data management plan (DMP) table:
Type | Description | Origin / collection | Formats | Software | Total file size | Number of files / samples |
Lab and stable journals | Dates, protocols, lab worker, etc. | Lab worker / researcher | .csv and .txt | eLabJournal | 100–500 MB | 2 lab journals (consisting of multiple files) |
Biological data | Blood samples | Veterinarian | 1 mL/animal | NA | NA | 250 animals |
Lab results | Gene expression and antibody titers | Microarray data and ELISA data | .csv, .Rdata, .chp, .txt | Affymetrix, locally developed tool | 200 GB | 20 data output files |
Behavioural | Visually scored animal behaviour | Researcher and research assistants | .csv | Noldus Observer and EthoVision | KB | 2 output data files |
Bodyweight | Biweekly bodyweight | Stable workers | .csv | NA | KB | 1 output data file |
Statistical analyses | Scripts/code, output tables and figures | Researcher | .R, .SAS, .csv, .tiff | R, RStudio, SAS, Excel | 1–50 MB | 5 scripts, 5 table files, 5 figure files |
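A DMP table like this one can also be kept as a small machine-readable file next to the data. The following is only a minimal sketch in Python, not something shown in the workshop: the snake_case column keys and the function name `dmp_to_csv` are assumptions made for illustration, and only two rows adapted from the table are included.

```python
import csv
import io

# Column keys mirror the DMP table headers; the snake_case names are an assumption.
FIELDS = ["type", "description", "origin", "formats", "software", "total_size", "count"]

# Two rows adapted from the table; the other rows would follow the same pattern.
rows = [
    {
        "type": "Biological data",
        "description": "Blood samples",
        "origin": "Veterinarian",
        "formats": "1 mL/animal",
        "software": "NA",
        "total_size": "NA",
        "count": "250 animals",
    },
    {
        "type": "Bodyweight",
        "description": "Biweekly bodyweight",
        "origin": "Stable workers",
        "formats": ".csv",
        "software": "NA",
        "total_size": "KB",
        "count": "1 output data file",
    },
]


def dmp_to_csv(rows):
    """Return the DMP rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(dmp_to_csv(rows))
```

Writing the plan as CSV keeps it in a plain-text format that most tools can read.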
After we filled in a DMP table for our own projects, we talked about the importance of using the right format for your data. I usually don’t think about the right format; I just use whatever seems handy. After this session, I will think about it more carefully. The right format is:
- Non-proprietary
- Unencrypted
- Uncompressed
- Commonly used
- Interoperable
- Open source
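The criteria above could be turned into a simple check that flags files for a closer look. This is a rough sketch only: the set `PREFERRED_EXTENSIONS` is an assumption limited to two plain-text formats from the DMP table, the file names are hypothetical, and a real project would need its own list.

```python
from pathlib import PurePath

# Assumption: only plain-text formats are treated as preferred in this sketch.
PREFERRED_EXTENSIONS = {".csv", ".txt"}


def needs_review(filenames):
    """Return the file names whose extension is not in the preferred set."""
    return [
        name
        for name in filenames
        if PurePath(name).suffix.lower() not in PREFERRED_EXTENSIONS
    ]


# Hypothetical file names, for illustration only.
print(needs_review(["weights.csv", "notes.TXT", "results.xlsx"]))
```

A file that is flagged is not necessarily wrong; the check only points out where the format deserves a second thought.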
Then we got an explanation about folder structure. This was very relatable, because I have made every possible mistake at least once. The name of a document is especially important, because with a good name you can easily find the document again if you lose track of it. This is something I need to be more critical about, because it happens to me often.
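One way to keep document names consistent is to generate them from the same few parts every time. The sketch below assumes a convention of date, project, description and version; this pattern and the function name `build_filename` are illustrative assumptions, not a rule given in the workshop.

```python
import re
from datetime import date


def slug(text):
    """Lowercase the text and replace runs of other characters with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(day, project, description, version, extension):
    """Build a name like 2024-01-15_project_description_v01.csv (assumed pattern)."""
    ext = extension.lstrip(".")
    return f"{day.isoformat()}_{slug(project)}_{slug(description)}_v{version:02d}.{ext}"


# Hypothetical values, for illustration only.
print(build_filename(date(2024, 1, 15), "Pig Study", "body weight", 1, ".csv"))
```

Starting the name with an ISO date means that sorting by name also sorts the files by date.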
We finished the presentation with some information about data storage and the security that comes with it. There are multiple ways to store your data. Every option has advantages and disadvantages, so you need to choose the one that best fits your type of data. I haven’t dealt with a lot of research data so far, but for this course I definitely will. This presentation was very useful for me and I am glad that I saw it. Unfortunately, I wasn’t present during the presentation about Mallet, but my group told me about it afterwards.
By Stan Nuijten