[W4S1] Data Management - Living Pasts Exploring Futures

On Wednesday the 25th of September, we started with a workshop by Jacques Flores on high-quality data management. He opened his presentation with a video that looked childish at first, but its relevance soon became clear. The video showed two animated animals struggling with data issues. I thought some of those issues were unlikely to occur, but Jacques told us that they are really common problems. We quickly came to the conclusion that data is vulnerable and that it goes hand in hand with a lot of regulations.

Furthermore, we were told about the importance of making research data accessible. When data is openly available, other researchers can reuse it. Unfortunately, most data has at least a minor problem in this respect. All these requirements for data are summarized in the FAIR principles:

Findable: the data must remain findable in the future.
Accessible: the data should be accessible without a license or any similar barrier.
Interoperable: the data needs to be in the right format, so that multiple types of data can be combined and exchanged.
Reusable: everything about your data needs to be clear, so proper documentation is a must.

Data also often comes with protection issues. Some data contains private information, so privacy and security are really important for it. Good research data management is the key to this. To make a good research data management plan (DMP), you need to ask yourself the following questions:

What type of data do you have?
What is the right format for your data?
How many files do you expect to have?
What is going to be the size of those files?
What is the origin of your data?

You can fill in the answers in a DMP table:

Type	Description	Origin / collection	Formats	Software	Total file size	Number of files / samples
Lab and stable journals	Dates, protocols, lab worker, etc.	Lab worker / researcher	.csv and .txt	eLabjournal	100 – 500 MB	2 lab journals (consisting of multiple files)
Biological data	Blood samples	Veterinarian	1 mL/animal	NA	NA	250 animals
Lab results	Gene expression and antibody titers	Microarray data and ELISA data	.csv, .Rdata, .chp, .txt	Affymetrix, locally developed tool	200 GB	20 data output files
Behavioural	Animal behaviour, visually scored	Researcher and research assistants	.csv	Noldus Observer and EthoVision	KBs	2 output data files
Bodyweight	Biweekly bodyweight	Stable workers	.csv	NA	KBs	1 output data file
Statistical analyses	Scripts/code and output tables and figures	Researcher	.R, .SAS, .csv, .tiff	R, RStudio, SAS, Excel	1 – 50 MB	5 scripts, 5 table files, 5 figure files
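
For an existing project, the "Formats", "Total file size" and "Number of files" columns can be estimated from the project folder itself. The Python sketch below is only an illustration and was not part of the workshop: the function names `summarize_files` and `format_size` are made up for this example, and it assumes that all project files sit under one root folder and that sizes are reported in decimal units (1 KB = 1000 bytes).

```python
from collections import defaultdict
from pathlib import Path


def summarize_files(root):
    """Return {extension: (file_count, total_bytes)} for all files under root."""
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Group ".CSV" and ".csv" together; files without an extension get "(none)"
            ext = path.suffix.lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return {ext: (counts[ext], sizes[ext]) for ext in sorted(counts)}


def format_size(num_bytes):
    """Format a byte count in decimal units (assumption: 1 KB = 1000 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1000:
            return f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} TB"
```

Calling `summarize_files("my_project")` on a hypothetical folder returns one entry per file extension, and passing each byte total through `format_size` gives values that can be copied into the table.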

After we had filled in a DMP table for our own projects, we talked about the importance of using the right format for your data. I usually don’t think about the right format; I just use whatever seems handy. After this session, I will think about it more carefully. The right format is:

Non-proprietary
Unencrypted
Uncompressed
Commonly used
Interoperable
Open source

Then we got an explanation about folder structure. This was very relatable, because I have made every possible mistake at least once. The name of a document is especially important: with a good name, you can easily find a file again if you lose track of it. This is something I need to be more critical about, because it happens to me often.
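
One way to keep file names consistent is to generate them instead of typing them by hand. The Python sketch below is an assumption of mine, not a convention given in the workshop: the pattern `date_project_description_version.extension` and the function name `make_filename` are hypothetical choices for illustration.

```python
import re
from datetime import date


def make_filename(project, description, version, extension, day=None):
    """Build a name like 2024-01-15_project_description_v01.csv (hypothetical pattern)."""
    day = day or date.today()

    def slug(text):
        # Lowercase the text and replace every run of characters other than
        # a-z and 0-9 with a single hyphen, then trim hyphens at the ends
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    ext = extension.lstrip(".").lower()
    return f"{day.isoformat()}_{slug(project)}_{slug(description)}_v{version:02d}.{ext}"
```

Because the date comes first in ISO format (year-month-day), sorting such names alphabetically also sorts them chronologically. Note that the simple `slug` helper replaces accented letters with hyphens, which may or may not be what you want.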

We finished the presentation with some information about data storage and the security that comes with it. There are multiple ways to store your data. Every option has advantages and disadvantages, so you need to choose the one that best fits your type of data. I haven’t dealt with a lot of research data so far, but for this course I definitely will. This presentation was very useful for me and I am glad that I saw it. Unfortunately, I wasn’t present during the presentation about Mallet, but my group told me about it afterwards.

By Stan Nuijten