Lessons learned: sustainability of workflow

The two use cases present different scenarios for the sustainability of processes for publishing Linked Data. As noted in an earlier blog post, for the Penguin Archive, the process has been unexpectedly time-consuming. The production of Linked Data requires enhancement of the collection metadata well beyond the requirements of currently accepted archival standards and, for the majority of archivists, presents a considerable technical learning curve. The key lesson for the archives community, and for those promoting the creation of Linked Data, concerns the limitations imposed by archivists’ core values and practice, and by their very constrained time.

The Penguin Archive had benefitted from a funded cataloguing post for two and a half years to transfer paper catalogue records to Calm; one of the archivists took up to a week to create new records for publication as Linked Data as part of the project.

For both collections, free-text catalogue entries were particularly problematic: where structure is needed, time has to be spent either on manually extracting specific text into new fields or on re-categorisation. The project coincided with the development of a new Drupal-based online catalogue for the Geology Museum, so publication of Linked Data is largely automatic through the additional Drupal module; even so, the extent of data cleaning required as part of the process came as a surprise. The project allowed for effort to be expended on data cleaning; without this effort, any Linked Data published would have been of very limited use. Even where collection data has internal structure, there are still issues in linking it with third-party datasets.

One recommendation may be for JISC to support a project to identify the amount of effort required for the creation and publication of Linked Data for different types of collections.

Overall, the message from both use cases is that the creation of Linked Data, and of links to authority files and third-party datasets, should not be considered a quick and easy solution. Future upgrades of products such as Calm may well integrate the creation and publication of Linked Data, removing the need for a separate process; without this, it is unlikely that data from collections like the Penguin Archive, where much human intervention is needed, will be published as Linked Data.
