“How to make your messy data usable?” and “Metadata and README” courses REGISTRATION CLOSED

In April, ELIXIR Estonia will hold two online data management courses: "How to make your messy data usable?" on the 4th of April and "Metadata and README" on the 11th of April. Both courses will be held in English on Zoom.

"How to make your messy data usable?" course will be in two parts: an 1.5 hour online lecture on how to make a spreadsheet usable for other people, held on the 4th of April at 10:00 in Zoom. The practical workshop on cleaning your messy data with OpenRefine software will be a video lecture that you can follow on your own time. Additionally, we will hold 3 Q&A sessions in Zoom, where you can talk about any problems you encountered with the OpenRefine software. In the "Metadata and README" lecture, we will be going over what exactly is metadata, what is the minimum information that should be included with each of the scientific results you are sharing and how exactly can you write a README file. 


In recent years, more attention has been paid to what researchers do with the data (and other resources) they produce, especially in Europe but also elsewhere. The main idea is that when research is funded with taxpayers' money, taxpayers should have access to the results free of charge. This means that research should be published in open access journals and data should be made publicly available.

Good data management can make that process easier. If you think about what to do with your data at the beginning of and during the project, and know what you plan to do with it at the end, the final steps will be easier. However, what counts as “good data management” is up for debate. The FAIR Principles concentrate on making your data findable, accessible, interoperable and reusable, so they are a good place to start. And let’s be honest, you are probably already doing some of these things.


"How to make your messy data usable?" course information

In this course, we will go over how to name your files and variables, use version control, compile a data dictionary, and handle empty cells. The second part introduces the OpenRefine software, which lets you easily clean up messy data. For the more practical side of using OpenRefine, I will share a video that teaches the basics. You can watch it at any time and do the lessons yourself. On three days (6.04, 7.04 and 8.04) there will be a 1h slot (10:00-11:00) on Zoom, where you can come and ask any questions you have about tables and OpenRefine.
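To give a rough idea of what a data dictionary and an empty-cell check can look like, here is a minimal Python sketch. It is not course material: the table, the column names and the `build_data_dictionary` helper are all invented for illustration, and a real data dictionary would also need hand-written descriptions and units.

```python
import csv
import io

# Hypothetical example table; column names and values are invented for illustration.
RAW = """sample_id,collection_date,weight_g
S001,2022-03-01,12.5
S002,2022-03-02,
S003,,11.8
"""


def build_data_dictionary(csv_text):
    """Return one entry per column with counts of filled and empty cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    dictionary = []
    for column in reader.fieldnames:
        values = [row[column] for row in rows]
        # A cell counts as empty if it is missing or contains only whitespace.
        empty = sum(1 for v in values if v is None or v.strip() == "")
        dictionary.append({
            "variable": column,
            "filled": len(values) - empty,
            "empty": empty,
            "description": "",  # to be written by hand
        })
    return dictionary


data_dictionary = build_data_dictionary(RAW)
for entry in data_dictionary:
    print(entry)
```

Counting empty cells per column is only a starting point; deciding what an empty cell means (not measured, not applicable, zero) still has to be documented by whoever collected the data.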


Information about the lecture:

Lecture: 4th of April, 2022 at 10:00 (lecture, 1.5h; in English)

Q&A session: 6.04, 7.04 and 8.04 at 10:00 (Q&A, feedback, 1h)

Place: ZOOM (link will be sent to your email)


Registration closes at 23:59 on 31.03.2022 or when the course gets full.

Learning outcomes: 

  • Compile a data table that abides by the FAIR Principles
  • Recognize what a clean table for others to use looks like
  • Explain how to use OpenRefine to clean the messy data


"Metadata and README" lecture information

In general, metadata is the descriptive information about your data. However, what exactly is metadata and how much of it should be included with your data? Good metadata can make up for human fallibilities. People forget and misplace things, and they leave research projects, taking their knowledge of the research methodology and the data with them. Metadata ensures that we will be able to find the data, use it, preserve it and reuse it in the future.

  • Finding Data. Metadata makes it much easier to find relevant data. Most searches are done using text (like a Google search), so formats like audio, images, and video are limited unless text metadata is available. Metadata also makes text documents easier to find because it explains exactly what the document is about.
  • Using Data. To use a dataset, researchers need to understand how the data is structured, definitions of terms used, how it was collected, and how it should be read.
  • Reusing Data. Researchers often want to reuse data collected for another project for their own project. The data still needs to be found and used, but often at a higher level of trust and understanding. Reusing data often requires careful preservation and documentation of the metadata.

This means that metadata provides additional information that helps data consumers better understand the meaning and structure of the dataset and clarifies other issues, such as rights and license terms, the organization that generated the data, data quality, data access methods and the update schedule of datasets. What an actual metadata file includes varies between disciplines and the types of data you are working with. However, the documentation for your data should contain the minimum information required to be able to reuse (or understand) the data described.
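As an illustration only, the items mentioned above could be collected in a small machine-readable record and checked for gaps. The field names and the `REQUIRED_FIELDS` list in this Python sketch are assumptions, not a standard; the actual minimum depends on your discipline and on the repository you use.

```python
import json

# Hypothetical minimum field list, loosely based on the items named above.
REQUIRED_FIELDS = [
    "title",
    "creator_organization",
    "license",
    "access_method",
    "update_schedule",
]


def missing_fields(record):
    """Return the required fields that are absent or blank in the record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Invented example record with one blank and one absent field.
record = {
    "title": "Example dataset",
    "creator_organization": "Example Institute",
    "license": "CC BY 4.0",
    "access_method": "",
}

print(json.dumps(record, indent=2))
print("Missing:", missing_fields(record))
```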

In this lecture, we will go over what metadata about your dataset should be included when you share it. We will also go over some examples of how to write a good README file.
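For readers who want a feel for it beforehand, the following Python sketch assembles a very simple plain-text README from a few descriptive fields. The sections chosen here (description, contact, license, file list), the `render_readme` helper and all example values are assumptions made for illustration, not the template used in the lecture.

```python
def render_readme(info):
    """Build a plain-text README from a dictionary of descriptive fields."""
    lines = [
        info["title"],
        "=" * len(info["title"]),
        "",
        "Description: " + info["description"],
        "Contact: " + info["contact"],
        "License: " + info["license"],
        "",
        "Files:",
    ]
    # One line per file, explaining what it contains.
    for name, purpose in info["files"].items():
        lines.append("  " + name + " - " + purpose)
    return "\n".join(lines) + "\n"


# Invented example values.
readme_text = render_readme({
    "title": "Example dataset",
    "description": "Weights of samples collected in March 2022.",
    "contact": "data.owner@example.org",
    "license": "CC BY 4.0",
    "files": {
        "samples.csv": "one row per sample",
        "data_dictionary.csv": "description of each column in samples.csv",
    },
})
print(readme_text)
```

Generating the file from structured fields keeps the README and the metadata in step, but a hand-written README is just as valid as long as it answers the same questions.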


Information about the lecture:

Time: 11th of April, 2022 at 10:00 (lecture, 2h; in English)

Place: ZOOM (link will be sent to your email)


Registration closes at 23:59 on 31.03.2022 or when the course gets full.

Learning outcomes: 

  • Understands the importance of good data management
  • Knows what metadata means in data files
  • Knows how to add metadata to the data
  • Knows what should be included in the README file
  • Can write a simple README file to accompany the data