Data Control & Management

Key elements of research data management (RDM) include:
  • how to store your data and back them up effectively so that they are protected against corruption and loss;
  • how to organise your data, using meaningful file names and logical folder structures, and applying version control to modified files;
  • how to apply quality controls to your data workflow, so that the integrity of the data is maintained and the incidence and impact of error are minimised;
  • how to document your data, so that you (and others) can understand what the data are, how they were collected/generated, and how they have been processed and analysed;
  • how to process personal and confidential data, to ensure you are meeting the requirements of the Data Protection Act and your ethical obligations;
  • how to preserve and share your data so that they can be consulted and re-used by other researchers, usually by using suitable data repositories.
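
As one illustration of the first point, protecting data against corruption and loss, a backup can be checked against the original by comparing file checksums. The following is a minimal sketch, not a prescribed method: the function names (sha256_of, build_manifest, changed_files) are hypothetical, and it assumes the original and the backup are two ordinary folders on disk.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(original: dict[str, str], copy: dict[str, str]) -> list[str]:
    """List files that are missing from the copy or whose checksum differs."""
    return sorted(
        name for name, digest in original.items() if copy.get(name) != digest
    )
```

Calling changed_files(build_manifest(original_folder), build_manifest(backup_folder)) returns an empty list when every original file is present and unchanged in the backup. Storing the manifest alongside the data also gives you a record against which later copies can be checked.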
01 - Store your data
02 - Organise your data
03 - Apply quality controls
04 - Document your data
05 - Process personal and confidential data
06 - Preserve and share your data

The research data lifecycle
01 - Plan 
02 - Collect
03 - Process
04 - Analyse
05 - Preserve
06 - Share
07 - Re-use

Plan
Here you will identify the data that will be collected or used to answer your research question, and will plan for data management throughout the lifecycle. This is the stage at which a data management plan would be created.

Collect
This is the stage at which experiments are carried out, observations made, surveys undertaken, secondary materials acquired, etc. It will involve documenting the data collection instruments and methods, together with any information necessary to interpret and use the data.

Process
Once collected, data will need to be processed in order to be usable. This might involve cleaning data to eliminate noise, combining data from multiple sources, transforming data from one state to another (e.g. by format conversion), and using procedures to validate or quality-control data. Any data processing will need to be documented, so that the end result could be replicated from the raw data.
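
The cleaning, validation and documentation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the field names (station, temperature_c), the valid range and the function name clean_readings are all hypothetical, and a real project would choose its own rules. The point is that every dropped row is recorded, so the processed data can be traced back to the raw data.

```python
def clean_readings(raw_rows, valid_range=(-50.0, 60.0)):
    """Return (clean_rows, log): validated rows plus a note for every dropped row."""
    low, high = valid_range
    clean, log = [], []
    for index, row in enumerate(raw_rows):
        text = row.get("temperature_c", "").strip()
        try:
            value = float(text)
        except ValueError:
            # Non-numeric entries (e.g. "n/a") are dropped, and the reason recorded.
            log.append(f"row {index}: dropped, non-numeric value {text!r}")
            continue
        if not low <= value <= high:
            # Values outside the plausible range are treated as noise.
            log.append(f"row {index}: dropped, {value} outside {low}..{high}")
            continue
        # Normalise the station identifier so rows from different sources match.
        clean.append({"station": row["station"].strip().upper(), "temperature_c": value})
    return clean, log


raw = [
    {"station": " rdg01 ", "temperature_c": "12.5"},
    {"station": "rdg02", "temperature_c": "n/a"},
    {"station": "rdg03", "temperature_c": "999"},
]
clean, log = clean_readings(raw)
```

Here only the first row survives, and log holds one line for each of the two rejected rows. Keeping the raw file untouched and saving the log next to the cleaned output is one simple way to document the processing step.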

Analyse
Data analysis is the stage at which the raw materials of research are interrogated to produce the insights that constitute the research findings, which will be written up and published in research outputs. Instruments and methods used for analysis should be documented; code written for purposes of data analysis and visualisation may need to be preserved and made available in support of research results.

Preserve
Towards the completion of your research you will preserve, for the long term, the data that substantiate your research findings and have long-term value. Data will need to be prepared for preservation and archived in a suitable location. In many cases this will involve depositing the digital data in a suitable data repository or data centre. Preservation activities may involve quality assurance of data, file format conversion, creation of metadata records with assignment of Digital Object Identifiers (DOIs) to datasets, licensing datasets for re-use, and putting in place any required access controls. Confidential and non-digital data may be held locally or in a non-public location, in which case they should be managed by an accountable person or group, who can ensure they are stored and preserved properly.
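
A metadata record of the kind mentioned above might look something like the sketch below. The field names, the list of required fields and the example values are all hypothetical; each repository defines its own schema, and the DOI is left empty here because it would be assigned by the repository on deposit.

```python
import json

# Hypothetical minimum set of fields; a real repository defines its own schema.
REQUIRED_FIELDS = ("title", "creators", "publication_year", "licence", "access")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


record = {
    "title": "Example survey responses",  # illustrative values only
    "creators": ["Researcher, A."],
    "publication_year": 2024,
    "licence": "CC BY 4.0",
    "access": "open",
    "doi": None,  # placeholder: assigned by the repository on deposit
}

print(json.dumps(record, indent=2))
```

Running missing_fields(record) before deposit gives a quick check that nothing required has been left blank.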

Share
Publications based on data should include a data citation or a statement indicating where and on what terms the data can be accessed. A data repository will enable discovery of the data in its care by exposing the metadata online, and will provide access to the data when this is permitted. Data may be made publicly available, or restrictions on access may be imposed where data are of a sensitive or confidential nature. Data held locally or in non-public locations should be managed in such a way that others can discover and apply for access to the data.

Re-use
Data that are available for discovery and access may be re-used by other researchers, either to substantiate the findings of the original research, or to generate new insights through further interrogation and analysis. At this stage the data may become raw materials collected within a new cycle of research. Research data may also have other valuable uses, e.g. in policy-making, development of commercial products and services, and teaching.



Classification of Raw Data
The raw data of research may exist in digital and non-digital formats, and may be broadly divided into five classes:
01 Observational
02 Experimental
03 Simulation
04 Derived or compiled
05 Reference

Observational
Facts recorded directly in real time from the physical and social environment, e.g. measurements collected by weather sensors, species abundance surveys, archaeological samples, brain scan images, experience and opinion surveys in the social sciences. These data are often unique to time and place and by definition cannot be reproduced.

Experimental
Data collected as the outputs of field or laboratory experiments and complex analytical processes, e.g. clinical trial data, chemical analyses of physical samples, DNA sequencing of organic material, field trial results. These data are, in principle, reproducible, provided the experimental conditions can be replicated.

Simulation
Data generated by means of computational 'virtual experiments', often used to model complex systems and processes, e.g. climate and weather simulations, models of market processes. These data are usually reproducible, given information about the model, the code and computing environment used to execute the model, and any input conditions. This information may in fact be more important than the output data.
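
To illustrate why the model, code and input conditions can matter more than the output, here is a toy sketch: a seeded random walk standing in for a real simulation, with a small provenance record saved alongside it. The function names and the choice of fields are assumptions made for the example; a real project would also record package versions and model parameters.

```python
import platform
import random


def run_random_walk(steps: int, seed: int) -> list[int]:
    """Toy 'virtual experiment': a one-dimensional random walk driven by a fixed seed."""
    rng = random.Random(seed)  # a private generator, so the seed fully determines the run
    position, path = 0, []
    for _ in range(steps):
        position += rng.choice((-1, 1))
        path.append(position)
    return path


def provenance(steps: int, seed: int) -> dict:
    """Record the inputs and environment needed to regenerate the output."""
    return {
        "model": "run_random_walk",
        "steps": steps,
        "seed": seed,
        "python_version": platform.python_version(),
    }
```

Two calls to run_random_walk with the same steps and seed produce the same path in the same environment, so preserving the code and the provenance record is enough to regenerate the output data on demand.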

Derived or compiled
Datasets produced by processing or combining source data, e.g. databases compiled by extraction of information from multiple secondary sources, collections of digitised materials, corpora collected by means of text mining.

Reference
Published and curated data, usually existing as part of managed collections, e.g. national statistics archives, crystallographic databases, gene banks.
