Users and use cases: The Penguin Archive

The Penguin Archive, housed in Special Collections at the University of Bristol Library, contains the archives of Penguin Books Limited from the company’s foundation in 1935 through to the 1980s. Its wide variety of materials covers the company’s establishment and business life, social events, legal cases (particularly the Lady Chatterley’s Lover trial of 1960), exhibitions on the company’s history, and the private lives of prominent figures in its early history. The archive also includes a large collection of Penguin books from 1935 to date. The collection comprises 2,093 archive boxes of editorial files; 466 archive boxes, 24 records management boxes and 84 box files of other archival material; and approximately 30,000 book titles. The digital catalogue is held in the Special Collections installation of CALM (Computer Aided Library Management). Holdings there comprise 123 collection-level descriptions containing over 4,000 individual metadata records, plus detailed digital guides to areas of the archive.

JISC has already undertaken work on techniques for exporting Linked Data from CALM, and the current Step Change project will ensure that Linked Data support is embedded in a future release of CALM, although not within the Bricolage project’s lifetime. We will follow the approach developed by the LOCAH and SALDA projects: data will be exported from CALM as EAD/XML, transformed into RDF/XML using an XSLT stylesheet based on the one developed within LOCAH, and made available as Linked Data. A handful of collection-level Penguin Archive records are already lodged with the Archives Hub. Our project will augment this data with a Linked Data set containing thousands of resource-level catalogue records, which will be linked to the Archives Hub identifiers as and when these become available.

Initial work in the project focused on archivists attempting to add authority terms to catalogue metadata, but this proved extremely labour-intensive, more so than anticipated. The process has shown that good Linked Data depends on good authority data, and that this needs to be taken into account when collections are first catalogued, which is no longer an option for an existing catalogue such as the Penguin Archive. Issues with the CALM export process and the stability of URIs have been reported in other project blog posts.
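One way to make authority work less open-ended is to let a script report which records still lack authority terms, so archivists can prioritise. The sketch below is a hypothetical illustration, not a tool described by the project. It assumes the CALM export places authority terms in EAD `<controlaccess>` elements as direct children such as `<persname>`, `<corpname>` or `<subject>`, and that the EAD is un-namespaced. The function name, tag set and sample record are invented.

```python
import xml.etree.ElementTree as ET

# EAD elements treated as authority terms when they appear in <controlaccess>.
AUTHORITY_TAGS = {"persname", "corpname", "famname", "geogname", "subject"}


def records_missing_authority_terms(ead_xml):
    """Return the unitids of descriptions that have no non-empty authority terms."""
    root = ET.fromstring(ead_xml)
    missing = []
    for element in root.iter():
        if element.tag not in ("archdesc", "c"):
            continue
        unitid = (element.findtext("did/unitid") or "").strip()
        if not unitid:
            continue
        # Only direct <controlaccess> children of this description are checked.
        has_terms = any(
            term.tag in AUTHORITY_TAGS and (term.text or "").strip()
            for access in element.findall("controlaccess")
            for term in access
        )
        if not has_terms:
            missing.append(unitid)
    return missing


# Invented sample input, for illustration only.
REVIEW_SAMPLE = """<ead><archdesc level="collection">
  <did><unitid>PA1</unitid></did>
  <controlaccess><corpname>Penguin Books Limited</corpname></controlaccess>
  <dsc>
    <c level="file"><did><unitid>PA1/1</unitid></did></c>
    <c level="file"><did><unitid>PA1/2</unitid></did>
      <controlaccess><subject>Publishing</subject></controlaccess></c>
  </dsc>
</archdesc></ead>"""

if __name__ == "__main__":
    print(records_missing_authority_terms(REVIEW_SAMPLE))
```

A report like this does not reduce the effort of choosing the right authority term for each record, but it does make the size of the remaining task visible.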

Early development of tools to automate, as far as possible, the workflow of metadata review and export has highlighted the need to make it easy to keep the Linked Data up to date after project funding ends. A batch upload process could be used for initial publication. The archivists confirm that the catalogue is “quite fluid” and is often updated, so ease of use and maintenance of the Linked Data are important to our users. One option for further automating the publishing process would be to upload exports to a folder that is monitored for changes. This may also address a concern that users have already expressed, i.e. that “any non-trivial publishing process would not be used in practice after the project ends”. The project will aim to make the process as ‘light-touch’ as possible.
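The monitored-folder option could work roughly as follows. This is a minimal sketch under stated assumptions, not the project's implementation: it assumes exports are saved as `.xml` files into a single folder, detects new or modified files by polling modification times (chosen here over OS-specific file-notification APIs to stay within the Python standard library), and hands each changed file to a `publish` callback. The function names and the `publish` callback are hypothetical.

```python
import os
import time


def changed_exports(folder, seen):
    """Return sorted paths of .xml files in `folder` that are new or modified.

    `seen` maps each path to the modification time recorded at the previous
    scan and is updated in place. Deleted files are ignored in this sketch.
    """
    changed = []
    with os.scandir(folder) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.lower().endswith(".xml"):
                mtime = entry.stat().st_mtime_ns
                if seen.get(entry.path) != mtime:
                    seen[entry.path] = mtime
                    changed.append(entry.path)
    return sorted(changed)


def watch(folder, publish, interval=60.0):
    """Poll `folder` indefinitely, calling `publish(path)` for each changed export."""
    seen = {}
    while True:
        for path in changed_exports(folder, seen):
            publish(path)
        time.sleep(interval)
```

In use, something like `watch("/path/to/exports", publish=transform_and_upload)` would run as a background job, where `transform_and_upload` is a hypothetical function that applies the EAD-to-RDF transformation and uploads the result. From the archivists' side, the only step left is saving an export into the folder. One caveat: a file could be picked up while CALM is still writing it, so exports might first be written under a temporary name and then renamed.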

The Archivist in the University’s Special Collections notes that the primary concern of archivists is to publish sufficient metadata to enable those interested in the materials to identify what exists and to visit the Penguin Archive to use them for research, journalistic or other purposes. The archivists have considered what would make an appropriate demonstrator for the Linked Data published through the project; they would like to focus on the ‘administrative history’ of the Archive, plotting collection-level records against a timeline of, for example, dates when key staff were appointed. Administrative history is a familiar archival concept, so the demonstrator would be of interest both to other archivists and potentially to end users of the catalogue and Linked Data. A visual representation of the timeline’s list of events would need to be created manually; within the scope and timeframe of the project this will only be possible for one or two decades, with just a few key events plotted across the whole timeline.
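The data side of the proposed demonstrator amounts to merging a manually compiled list of events with collection-level records and ordering them by date, optionally within a one- or two-decade window. The sketch below illustrates this under assumptions: each event and each collection record is reduced to a single year, collection years would in practice come from the published Linked Data (for example a date property on each collection), and the names `TimelineEntry` and `build_timeline` are hypothetical. The two events in the example are taken from this post; the collection titles are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimelineEntry:
    year: int
    label: str
    kind: str  # "event" (compiled by archivists) or "collection" (from the catalogue)


def build_timeline(events, collections, start=None, end=None):
    """Merge manually compiled events with collection-level records.

    events: iterable of (year, description) pairs compiled by archivists.
    collections: iterable of (year, title) pairs from collection-level records.
    start, end: optional inclusive year bounds, e.g. a one- or two-decade window.
    """
    entries = [TimelineEntry(y, d, "event") for y, d in events]
    entries += [TimelineEntry(y, t, "collection") for y, t in collections]
    if start is not None:
        entries = [e for e in entries if e.year >= start]
    if end is not None:
        entries = [e for e in entries if e.year <= end]
    # Sort by year; within a year, list events before collection records.
    return sorted(entries, key=lambda e: (e.year, e.kind != "event", e.label))


if __name__ == "__main__":
    events = [(1935, "Penguin Books founded"),
              (1960, "Lady Chatterley's Lover trial")]
    collections = [(1937, "Example collection A"), (1960, "Example collection B")]
    for entry in build_timeline(events, collections, start=1930, end=1949):
        print(entry.year, entry.kind, entry.label)
```

Keeping the window as parameters reflects the constraint above: a detailed decade or two can be rendered from the full lists, while the whole-timeline view uses only the few key events.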

The Penguin Archive archivists have engaged fully with the project, participating in Advisory Board meetings and one-to-one meetings with the development team, and piloting and providing feedback on workflow processes.
