Computational workflows in the earth system sciences are becoming increasingly sophisticated, integrating data of different types and sources into large-scale, modelled data products. This is partly a consequence of a competition-driven diversification of tools and approaches, with the desirable side effect that we learn more about the earth's spheres from more distinct perspectives. Ideally, sophisticated and complex workflows map the equally complex interaction networks on our planet with less ambiguity. In reality, however, practical considerations or a lack of resources or time in our projects demand non-ideal decisions, and how these decisions affect the results often needs to be clarified. In the earth sciences, we quantify the errors of our output; software engineering uses so-called unit tests, in which the output of the "smallest units" of code is compared against expected results. While error reporting (of the output) is part of best practice in the earth sciences, analysis of intermediate data typically happens only within a project and is rarely reported, even though the intermediate data of one project are often the starting point of another.
With the help of the bitfield R package, one can produce (simple) tests that document data and metadata snapshots along a computational workflow and store them in a very compact form (an integer stored as a column in a table or as a raster layer). The resulting computational footprint could be called meta-analytic or meta-algorithmic data because it allows spatially explicit documentation and re-use of an analysis or algorithm. The bitfield is a promising data structure, already employed in the MODIS quality flags, that allows a large amount of information to be stored in a single integer. In this workshop, you will learn how to use the tools in bitfield and get an introduction to the software logic, and we may discuss possible use cases and the future of this technology. https://github.com/EhrmannS/bitfield
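
To make the idea of storing several test results in a single integer concrete, here is a minimal, language-agnostic sketch in Python. It does not use the bitfield package or reproduce its encoding; the flag names, their bit positions, and the helper functions encode and decode are hypothetical and chosen only for illustration.

```python
# Hypothetical flag layout (NOT the bitfield package's encoding):
# each test outcome is assigned one bit position in the integer.
FLAGS = {"is_missing": 0, "out_of_range": 1, "was_interpolated": 2}


def encode(tests: dict) -> int:
    """Pack the boolean test outcomes into a single integer."""
    value = 0
    for name, outcome in tests.items():
        if outcome:
            value |= 1 << FLAGS[name]  # switch on the bit for this flag
    return value


def decode(value: int) -> dict:
    """Recover every flag by reading its bit from the integer."""
    return {name: bool((value >> pos) & 1) for name, pos in FLAGS.items()}


packed = encode({"is_missing": False, "out_of_range": True, "was_interpolated": True})
print(packed)          # 6, i.e. binary 110
print(decode(packed))
```

One such integer per table row or raster cell is enough to carry all three outcomes, which is what keeps this kind of documentation compact.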