Data integrity and storage: tips of icebergs in rough seas ahead

‘With the coming research-data deluge already arriving in torrents, how do we negotiate tricky waters in data storage and resilience?’ asks Niall English (University College Dublin)
15th March 2021

There is increasing pressure on Research Performing Organisations (RPOs), including universities, to make research data conform to the Findable, Accessible, Interoperable and Reusable (FAIR) principles. This is largely driven by best practices in research data management (as described in a recent CESAER white paper), which aim to boost research quality, integrity and innovation and to safeguard the transparency and accountability of research towards funders and, ultimately, taxpayers. For instance, Europe’s Data Strategy has at its heart the technical development of FAIR-compliant Persistent-Identifier (PID) policy. It is supported by the development of the European Open Science Cloud (EOSC) in recent years, now by the EOSC Association, and by EOSC’s projected growth and expansion under Horizon Europe from 2021 to 2027.

This is ambitious, imaginative – and empowering. However, in the demanding times ahead under the auspices of Horizon Europe, the technical and infrastructural challenges of mass data storage, with appropriate levels of security, sensitivity and user access, are clearly daunting. That said, the ‘eu.dat’ network is advancing its role as a best-practice developer of common data-sharing standards and protocols that map onto resilient physical data-storage infrastructures with appropriate levels of system redundancy. For instance, the ‘B2SAFE’ approach (and others), underpinned by the Integrated Rule-Oriented Data System (iRODS) running via CEPH, displays excellent cross-system functionality and redundancy. This points to CEPH as a promising systems-software approach for supporting safe, resilient and effective higher-level data-storage and data-transfer protocols.

Beyond data-storage standards per se, system software and its mapping onto mass-storage hardware, there is also the thorny political question of sustainable funding and the role of advocacy. In addition, over the next decade or so, it is imperative that all researchers embed a culture of systematic data back-up. This is a desideratum in and of itself, giving researchers pragmatic ease of access to their own data for future re-use. It is also a form of self-protection should concerns ever arise about data or research integrity: mores and ‘cultural landscapes’ with respect to data integrity will probably change, and at worst present-day sloppiness in data storage could lead to (retrospective) litigation.

This ‘food for thought’ on present and future drivers will help us to ‘steer the ship’ through troubled waters.

Niall English (Professor at the School of Chemical and Bioprocess Engineering, University College Dublin)
