Penguin Archive workflow progress

Progress to date: we’ve combined the components described in our previous post into a simple UI, shown below. The interface allows an administrator to:

  • Upload EAD XML files exported from CALM
  • Transform to RDF (using XSLT)
  • Publish to the triple store (Fuseki)
  • ‘Unpublish’ and delete EAD/RDF files
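The RDF conversion step uses XSLT, and the stylesheet itself isn’t reproduced here. To give a rough idea of the kind of mapping involved, here is a Python sketch that uses the standard library’s ElementTree in place of XSLT. It turns a collection-level EAD record into a couple of Turtle triples. The choice of Dublin Core terms, the `base_uri` scheme and all function names are assumptions made for illustration. None of them are taken from the project’s actual mapping.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# EAD 2002 namespace. Whether CALM exports declare it is an assumption, so the
# lookup below also falls back to un-namespaced element names.
EAD_NS = "urn:isbn:1-931666-22-9"

def _find(elem, path):
    """Find a descendant by path, trying the EAD namespace first, then none."""
    found = elem.find(path.replace("ead:", "{%s}" % EAD_NS))
    return found if found is not None else elem.find(path.replace("ead:", ""))

def _text(elem) -> str:
    """All text inside an element (including nested markup), trimmed."""
    return "".join(elem.itertext()).strip() if elem is not None else ""

def _literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def ead_to_turtle(ead_xml: str, base_uri: str) -> str:
    """Map collection-level <unitid>/<unittitle> to Dublin Core terms (illustrative only)."""
    root = ET.fromstring(ead_xml)
    did = _find(root, "ead:archdesc/ead:did")
    if did is None:
        raise ValueError("no <archdesc>/<did> element found")
    unitid = _text(_find(did, "ead:unitid"))
    if not unitid:
        raise ValueError("collection has no <unitid>")
    # Percent-encode the identifier so characters like "/" are safe in the URI.
    subject = f"<{base_uri}{quote(unitid, safe='')}>"
    triples = [f"dcterms:identifier {_literal(unitid)}"]
    title = _text(_find(did, "ead:unittitle"))
    if title:
        triples.append(f"dcterms:title {_literal(title)}")
    body = " ;\n    ".join(triples)
    return f"@prefix dcterms: <http://purl.org/dc/terms/> .\n\n{subject} {body} .\n"
```

A real EAD record carries far more than an identifier and title, and that is where the XSLT does most of its work. The sketch shows only the general shape of the mapping: find an element, mint a URI and emit a triple.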

The triple store is fronted by a Linked Data API (elda), so this process publishes the collections as Linked Data (a web-friendly view is shown in the screenshot below).
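Publishing and unpublishing map naturally onto the SPARQL 1.1 Graph Store HTTP Protocol, which Fuseki supports. An HTTP PUT replaces the contents of a named graph, and an HTTP DELETE removes it. The sketch below only builds the requests, using Python’s standard library. The Fuseki base URL, the dataset name `archive` and the idea of one named graph per collection are all assumptions. They are not details of the actual setup.

```python
import urllib.parse
import urllib.request

# Hypothetical Fuseki location and dataset name; adjust for the real deployment.
FUSEKI_BASE = "http://localhost:3030/archive"

def graph_store_url(graph_uri: str, base: str = FUSEKI_BASE) -> str:
    """Graph Store Protocol URL that identifies a named graph via ?graph=."""
    return f"{base}/data?" + urllib.parse.urlencode({"graph": graph_uri})

def publish_request(turtle: str, graph_uri: str, base: str = FUSEKI_BASE) -> urllib.request.Request:
    """PUT replaces whatever the named graph held with the given Turtle."""
    return urllib.request.Request(
        graph_store_url(graph_uri, base),
        data=turtle.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/turtle; charset=utf-8"},
    )

def unpublish_request(graph_uri: str, base: str = FUSEKI_BASE) -> urllib.request.Request:
    """DELETE drops the named graph from the dataset."""
    return urllib.request.Request(graph_store_url(graph_uri, base), method="DELETE")
```

To send a request, pass it to `urllib.request.urlopen(...)`. Keeping each collection in its own named graph would reduce ‘unpublish’ to a single DELETE that leaves every other collection untouched. Because PUT replaces the graph, republishing a corrected file also overwrites the old triples rather than accumulating them.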

This seems a reasonable start, though there are some obvious next steps:

  • Bulk actions. There are over 100 collections within the archive, so we’ll need at least some of: bulk upload, bulk RDF conversion and bulk publishing.
  • Automation. Again, with the eventual administrator in mind, we should offer a non-interactive route all the way from upload through to publishing.
  • Security. There is none as yet.
  • Link suggestions & validation. We need a process for producing lists of suggested links to third-party data, and a way for users to validate those links.
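Bulk processing and unattended runs could share one loop. The Python sketch below reads every exported file in a directory, then transforms and publishes each collection in turn. When a transform or publish step fails, it records the failure and carries on, so the administrator gets a report at the end instead of a run that stopped at the first bad export. The `transform` and `publish` callables are placeholders for whatever XSLT step and Fuseki upload are actually in use. The names `load_ead_dir`, `run_batch` and `BatchReport` are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Iterable, Iterator, List, Tuple

@dataclass
class BatchReport:
    """Outcome of a bulk run, for the administrator to review afterwards."""
    published: List[str] = field(default_factory=list)
    failed: List[Tuple[str, str]] = field(default_factory=list)  # (collection name, error)

def load_ead_dir(directory: Path) -> Iterator[Tuple[str, str]]:
    """Yield (collection name, EAD XML text) for every *.xml file in a directory."""
    for path in sorted(directory.glob("*.xml")):
        yield path.stem, path.read_text(encoding="utf-8")

def run_batch(
    sources: Iterable[Tuple[str, str]],
    transform: Callable[[str], str],
    publish: Callable[[str, str], None],
) -> BatchReport:
    """Transform and publish each collection, recording transform/publish failures."""
    report = BatchReport()
    for name, ead_xml in sources:
        try:
            publish(name, transform(ead_xml))
            report.published.append(name)
        except Exception as exc:  # keep going so one bad export doesn't block the rest
            report.failed.append((name, str(exc)))
    return report
```

A non-interactive run might then be `run_batch(load_ead_dir(Path("exports")), transform=..., publish=...)`, started on a schedule, with the returned report emailed or logged for review.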
