You are currently browsing the Bricolage weblog archives for October, 2012.






Lessons learned: summary

Monday, October 15th, 2012

At the final Advisory Group meeting towards the end of July 2012, the following points were made in relation to the evaluation of the quality of Linked Data produced and techniques used:

  • Quality depends not only on time-consuming human-crafted links to third party datasets, but also on the quality of those datasets (the project had identified some potential mistakes in VAIF)
  • Export processes such as those developed for the Penguin Archive use case were not really sustainable with the limited resources that archives usually have
  • Limitations of some parts of Drupal mean that, in the Geology use case, we may not be able to make as much of the Linked Data as we would like
  • Although we have made considerable efforts to make the user interface to the export and publication processes as smooth as possible, they are still not integrated enough to be adopted in normal working practice
  • There is a considerable learning curve in understanding Linked Data and what is needed to create and publish it, which requires intensive support and/or time to read around the subject
  • From a technical perspective, the project has highlighted how much the production of Linked Data affects the underlying data; it is not a simple conversion process
  • The extent to which Linked Data has the potential to ‘draw in’ new audiences for collections is more limited than envisaged, as collection level descriptions are already available in the Archives Hub, ranked highly in Google searches and accessible via Calm
  • It needs more resource, more space and more time!

In terms of methodology, the bringing together of different use cases and technical expertise had worked well, despite learning curves on all sides. The project had been beneficial in raising awareness of Linked Data issues in the Special Collections and Geology teams, and of archival and cataloguing practice in the technical team. Geology and Special Collections were also more aware of each other’s collections and of the potential for working together in the future.

 

Lessons learned: sustainability of workflow

Monday, October 15th, 2012

The two use cases present different scenarios for the sustainability of processes for publishing Linked Data. As noted in an earlier blogpost, for the Penguin Archive, the process has been unexpectedly time-consuming. The production of Linked Data requires enhancement of the collection metadata well beyond the requirements of currently accepted archival standards and, for the majority of archivists, presents a considerable technical learning curve. The key lesson for the archives community and for those promoting the creation of Linked Data concerns the limitations that archivists’ core values and practice, and their very constrained time, impose.

The Penguin Archive had benefitted from a funded cataloguing post for two and a half years to transfer paper catalogue records to Calm; one of the archivists took up to a week to create new records for publication as Linked Data as part of the project.

For both collections, free-text catalogue entries were particularly problematic; if any structure is needed, time has to be spent on manually extracting specific text into new fields, or on re-categorisation. The project coincided with the development of a new Drupal-based online catalogue for the Geology Museum, so publication of Linked Data there is largely automatic through the additional Drupal module; even so, the extent of data cleaning required as part of the process came as a surprise. The project allowed for effort to be expended on data cleaning; without this effort, any Linked Data published would have been of very limited use. Even where collection data is already structured internally, there are still issues in linking it with third party datasets.
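To give a flavour of why free-text entries resist automatic restructuring, here is a minimal sketch in Python. It is not the process used in the project: the function name, the pattern and the sample entries are all made up for illustration. It tries to pull a single year out of a free-text description and gives up whenever the text is ambiguous, which is exactly where a person has to step in.

```python
import re
from typing import Optional

# Illustrative pattern (an assumption, not project code): a four-digit
# year between 1500 and 2099 appearing as a separate token.
YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def extract_year(entry: str) -> Optional[int]:
    """Return the year if the entry mentions exactly one, else None."""
    years = set(YEAR.findall(entry))
    if len(years) == 1:
        return int(years.pop())
    # No year, or several different years: needs manual review.
    return None


# Made-up sample entries, not taken from either collection.
samples = [
    "Letter to the editor, dated 1936",
    "Correspondence, 1936-1938",
    "Undated memo",
]

for text in samples:
    print(repr(text), "->", extract_year(text))
```

Even this single field leaves two of the three made-up entries for manual review; names, places and subjects are harder still.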

One recommendation may be for JISC to support a project to identify the amount of effort required for the creation and publication of Linked Data for different types of collections.

Overall, the message from both use cases is that the creation of Linked Data, links to authority files and to third party datasets should not be considered a quick and easy solution. Future upgrades of products such as Calm may well integrate creation and publication of Linked Data without there needing to be a separate process; without this, it is unlikely that data from collections like the Penguin Archive, where much human intervention is needed, will be published as Linked Data.

Lessons learned: linked data hosting

Monday, October 15th, 2012

The hosting review workpackage had originally identified Talis as the main option for external hosting. Talis wound up development of its external hosting platform during the Bricolage project’s lifetime, so this was no longer an option. The sustainability or otherwise of hosting and other platforms in relatively experimental areas of work is something to consider for future projects; having several options, in this case including internal hosting solutions, helps reduce the risk of not being able to deliver on project plans. The Geology data will in any case be hosted as an integrated part of the museum’s new enhanced online presence. The Penguin Archive data will remain as a snapshot, also hosted on internal servers.