Revisiting the Data Lifecycle with Big Data Curation
As science becomes more data-intensive and collaborative, researchers increasingly use larger and more complex data to answer research questions. The capacity of storage infrastructure, the increased sophistication and deployment of sensors, the ubiquitous availability of computer clusters, the development of new analysis techniques, and larger ...
Line Pouchard
doaj +11 more sources
The craft and coordination of data curation: complicating "workflow" views of data science [PDF]
Data curation is the process of making a dataset fit-for-use and archivable. It is critical to data-intensive science because it makes complex data pipelines possible, makes studies reproducible, and makes data (re)usable. Yet the complexities of the hands-on, technical and intellectual work of data curation are frequently overlooked or downplayed ...
A. Thomer +7 more
arxiv +3 more sources
Leveraging Machine Learning to Detect Data Curation Activities [PDF]
This paper describes a machine learning approach for annotating and analyzing data curation work logs at ICPSR, a large social sciences data archive. The systems we studied track curation work and coordinate team decision-making at ICPSR. Repository staff use these systems to organize, prioritize, and document curation work done on datasets, making ...
Sara Lafia +4 more
arxiv +3 more sources
Wikibench: Community-Driven Data Curation for AI Evaluation on Wikipedia [PDF]
AI tools are increasingly deployed in community contexts. However, datasets used to evaluate AI are typically created by developers and annotators outside a given community, which can yield misleading conclusions about AI performance. How might we empower communities to drive the intentional design and curation of evaluation datasets for AI that ...
Tzu-Sheng Kuo +7 more
arxiv +3 more sources
Machine Learning Data Practices through a Data Curation Lens: An Evaluation Framework [PDF]
Studies of dataset development in machine learning call for greater attention to the data practices that make model development possible and shape its outcomes. Many argue that the adoption of theory and practices from archives and data curation fields can support greater fairness, accountability, transparency, and more ethical machine learning.
Eshta Bhardwaj +5 more
arxiv +3 more sources
Exploring the impact of data curation criteria on the observed geographical distribution of mosses. [PDF]
Biodiversity data records contain inaccuracies and biases. To overcome this limitation and establish robust geographic patterns, ecologists often curate records, keeping those that are most suitable for their analyses.
Ronquillo C +3 more
europepmc +2 more sources
How Important is Data Curation? Gaps and Opportunities for Academic Libraries
INTRODUCTION: Data curation may be an emerging service for academic libraries, but researchers actively “curate” their data in a number of ways, even if terminology may not always align. Building on past user-needs assessments performed via survey and focus
Jake Carlson +2 more
exaly +4 more sources
Updating the Data Curation Continuum
The Data Curation Continuum was developed as a way of thinking about data repository infrastructure. Since its original development over a decade ago, a number of things have changed in the data infrastructure domain. This paper revisits the thinking behind the original data curation continuum and updates it to respond to changes in research objects ...
Andrew Treloar, Jens Klump
openaire +3 more sources
Development of a Dual-Index Sequencing Strategy and Curation Pipeline for Analyzing Amplicon Sequence Data on the MiSeq Illumina Sequencing Platform [PDF]
James J. Kozich +4 more
openalex +2 more sources
Data Curation in the World Data System: Proposed Framework [PDF]
The value of data in society is increasing rapidly. Organisations that work with data should have standard practices in place to ensure successful curation of data. The World Data System (WDS) consists of a number of data centres responsible for curating
P Laughton, T du Plessis
doaj +2 more sources