Penguin Archive workflow progress
Progress to date: we’ve combined the components described in our previous post and produced a simple UI, illustrated below. The interface allows an administrator to:
- Upload EAD XML files exported from CALM
- Transform to RDF (using XSLT)
- Publish to the triple store (Fuseki)
- ‘Unpublish’ and delete EAD/RDF files
The triple store is fronted by a Linked Data API (Elda), so this process publishes the archive as Linked Data (a web-friendly version is shown in the screenshot below).
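As a rough illustration of the publish and unpublish steps, here is a minimal sketch. It assumes Fuseki is reached through its SPARQL 1.1 Graph Store HTTP Protocol endpoint and that each collection lives in its own named graph. The dataset name `penguin`, the port, and the function names are hypothetical, not taken from our actual setup.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint: the dataset name "penguin" and port 3030 are assumptions.
FUSEKI_DATA_URL = "http://localhost:3030/penguin/data"


def graph_url(graph_uri: str) -> str:
    """Address one named graph via the Graph Store Protocol 'graph' parameter."""
    return FUSEKI_DATA_URL + "?" + urllib.parse.urlencode({"graph": graph_uri})


def publish_request(graph_uri: str, rdf_xml: bytes) -> urllib.request.Request:
    """Build a PUT that replaces the named graph with the given RDF/XML."""
    return urllib.request.Request(
        graph_url(graph_uri),
        data=rdf_xml,
        method="PUT",
        headers={"Content-Type": "application/rdf+xml"},
    )


def unpublish_request(graph_uri: str) -> urllib.request.Request:
    """Build a DELETE that drops the named graph from the store."""
    return urllib.request.Request(graph_url(graph_uri), method="DELETE")
```

Passing either request to `urllib.request.urlopen` would send it. Using PUT rather than POST means that re-publishing a collection replaces its triples instead of adding to them.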
This seems a reasonable start, though there are some obvious next steps:
- Bulk actions. There are over 100 collections within the archive, so we’ll need at least some of: bulk upload, bulk RDF conversion and bulk publishing.
- Automation. Again, thinking of the administrator who ends up looking after this, we should offer a non-interactive route from upload through to publishing.
- Security. There is none as yet.
- Link suggestions & validation. We need a process for producing lists of suggested links to third-party data, and for user validation of those links.
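To make the bulk and automation items above more concrete, here is one possible shape for a non-interactive batch run. It is only a sketch: `run_batch` is a hypothetical name, and the `transform` and `publish` callables stand in for the XSLT step and the triple store upload, which are not implemented here.

```python
from pathlib import Path
from typing import Callable, Dict


def run_batch(
    ead_dir: Path,
    transform: Callable[[bytes], bytes],
    publish: Callable[[str, bytes], None],
) -> Dict[str, str]:
    """Transform and publish every *.xml file in ead_dir, non-interactively.

    Returns a mapping of file name to "ok" or an error message, so one bad
    collection does not stop the rest of the batch.
    """
    results: Dict[str, str] = {}
    for ead_file in sorted(ead_dir.glob("*.xml")):
        try:
            rdf = transform(ead_file.read_bytes())
            publish(ead_file.stem, rdf)
            results[ead_file.name] = "ok"
        except Exception as exc:  # report and carry on with the next file
            results[ead_file.name] = f"failed: {exc}"
    return results
```

Collecting a per-file result rather than stopping at the first error seems a sensible default with over 100 collections, since the administrator can then review and re-run only the failures.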


