Once you have found a dataset, you have to assess whether you have permission to use it, and whether you want to use it at all.
Permission: Many data repositories label their datasets with a Creative Commons (CC) license or equivalent. They may also have set terms of use, which specify whether you can re-use the data and how. Often it is a matter of citing the dataset as if it were an article. Sometimes access to the data requires that you accept the usage conditions. Make sure to read and follow these.
Quality: How do you know whether the data are of sufficient quality to rely on for your research? Here it is important to know the source - the primary source. If the data have been used in a research project, you might assume that the quality is good, but you should still take a closer look at the methodology described in the article, and you may have to contact the author of the article to get a real sense of the data. Is the documentation in order? Do you know everything you need to in order to understand the content of the data? If yes, the assessment is easier. If the source of information is a company, an organisation or a government agency, it can be more difficult to assess the quality, scope and completeness of the data, and more scrutiny may be called for.
Access is necessary if you want to work with the data. But how do you get access? Sometimes the data can be freely downloaded from a webpage or repository, but in other instances you will have to ask for access first. Some repositories offer this functionality and prompt the author with a message that someone is asking for access. Check which options the repository offers.
If access to a database requires a subscription, and the database is necessary for your research, you can ask your university library to buy an access license. There is no guarantee that the library can provide a license, but it is worth a try. See contact information on the Introduction tab.
In other instances the data are simply too big to download, and it is more convenient to access them through an API. On this point, read the documentation for the data. In some cases, when the data have been collected and curated by a private organisation, access requires payment. Before you look for funding, check that you have the software needed to work with the data and that the terms of use do not prohibit your intended use.
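When a dataset is too large to download in one piece, an API will often hand it out one page at a time. The following is a minimal Python sketch of that pattern, not a recipe for any particular repository. The field names `records` and `next_page` and the function `fake_fetch_page` are made up for illustration. A real API will use its own names and its own way of signalling the next page, so check its documentation.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical shape of one page of results (an assumption, not a standard):
# {"records": [...], "next_page": <number of the next page, or None>}
Page = Dict[str, Any]


def iter_records(fetch_page: Callable[[int], Page],
                 max_pages: int = 100) -> Iterator[dict]:
    """Yield records page by page until the API reports no further page.

    max_pages is a safety limit so a misbehaving API cannot loop forever.
    """
    page_number = 1
    for _ in range(max_pages):
        page = fetch_page(page_number)
        yield from page["records"]
        next_page = page.get("next_page")
        if next_page is None:
            break
        page_number = next_page


def fake_fetch_page(page_number: int) -> Page:
    """Stand-in for a real HTTP request, so the sketch runs offline."""
    data = {1: ["a", "b"], 2: ["c", "d"], 3: ["e"]}
    next_page = page_number + 1 if page_number < 3 else None
    return {
        "records": [{"id": value} for value in data[page_number]],
        "next_page": next_page,
    }


records = list(iter_records(fake_fetch_page))
print(len(records), "records retrieved")
```

In practice you would replace `fake_fetch_page` with a function that sends a request to the API described in the dataset's documentation, and you would respect any rate limits and terms of use stated there.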
Documentation provides information on where the data came from, who collected them, when, where and how. It can be a project description, a protocol, or a data management plan. It can also consist of the scripts that collect, process or analyze the data. Documentation also describes how the data were treated - the analytics tools used, the statistical tests chosen, or how outliers were handled. Code books and lab notebooks are also important pieces of documentation and might be essential for understanding a dataset and its potential for reuse.
Photo by: janilson furtado on Unsplash
The source of a dataset might say something about its quality and trustworthiness. Was it created by other researchers who strive for standards similar to your own? Were the data collected by citizens who might not have followed comparable procedures for quality control? Are the data provided by a public or private entity that has collected them for a completely different purpose? Might there be biases influencing the integrity or completeness of the data? If the data were used for a peer-reviewed article, there should be reason to rely on the quality - at least you know who to contact to get more information. However, peer review of an article does not necessarily cover the underlying data, so your own assessment is still called for.