OpenRefine
OpenRefine is a data processing application that allows you to clean up and transform structured data. It has several functionalities suited for creating Linked Open Data (LOD), such as reconciliation, format translation, Resource Description Framework (RDF) mapping, and export options.
OpenRefine and LINCS
Within the LINCS project, OpenRefine is used for data cleaning and reconciliation, primarily by researchers bringing their own datasets to the project. It gives these domain experts full control over the changes made to their data.
OpenRefine is best suited to structured data because it displays data in a format similar to a spreadsheet or table. Tabular file types such as comma-separated values (CSV) work best, though OpenRefine is also compatible with other file types such as XML, JSON, and RDF. If a researcher’s data is unstructured or falls within a specific domain, a different tool may be more appropriate:
- Use LINCS-API or NERVE for an unstructured dataset.
- Use VERSD to reconcile an entirely bibliographic dataset.
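Before choosing a tool, it can help to check whether a file is actually tabular. The following is a minimal sketch, not part of OpenRefine or LINCS: the function names `column_count_profile` and `looks_tabular` and the sample strings are hypothetical, and the rule it applies (every non-empty row has the same number of columns, and at least two) is only a rough heuristic.

```python
import csv
import io
from collections import Counter


def column_count_profile(text, delimiter=","):
    """Count how many non-empty rows have each number of columns."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return Counter(len(row) for row in reader if row)


def looks_tabular(text, delimiter=","):
    """Heuristic: True when all non-empty rows share one column count of two or more."""
    profile = column_count_profile(text, delimiter)
    return len(profile) == 1 and next(iter(profile)) > 1


# Hypothetical sample data: a small table and a passage of prose.
sample = "name,birth_year\nAda Lovelace,1815\nMary Shelley,1797\n"
prose = "This is a sentence, with commas, in it.\nAnother line.\n"

print(looks_tabular(sample))
print(looks_tabular(prose))
```

A file that passes this check is likely to load cleanly as a table; one that fails may be unstructured text, or it may use a different delimiter, which can be tested by passing another value for `delimiter` (for example `"\t"` for TSV).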
The software can be downloaded from OpenRefine’s website. When launched, the application opens in a browser tab but runs locally on your computer.
Although OpenRefine can be useful for researchers and data specialists outside of LINCS, those who are getting their data into the LINCS system should begin cleaning and reconciling it in OpenRefine early in the data preparation process.
Check out the Authority Service to reconcile your data against the LINCS Knowledge Graph from within OpenRefine.
Prerequisites
- You don't need to create a user account.
- You do need to have your own dataset.
- You do need a basic understanding of reconciliation and data cleaning.
OpenRefine supports the following inputs and outputs:
- Input: CSV, TSV, XLS, XLSX, JSON, XML, RDF, plain text, and more
- Output: CSV, TSV, XLS, XLSX, HTML-formatted tables, and more
Resources
To learn more about OpenRefine, see the following resources:
Clean Data:
- OpenRefine User Manual
- Rue & Hernandez (2019) “Using OpenRefine to Clean Your Data”
- Hervieux (2020) “OpenRefine Activity” [PowerPoint]
- van Hooland, Verborgh, & De Wilde (2021) “Cleaning Data with OpenRefine”
Reconcile Entities:
- OpenRefine User Manual—Reconciling
- Getty Digital (2020) “Getty Vocabularies OpenRefine Tutorial and Tips for Advanced Users”
Information about the team that developed OpenRefine is available on the Tool Credits page.