Penguin archive

When we first looked at the data from the Penguin Archive, we wanted to index our collection-level records with the names of the creating person or organization so that they could be linked to VIAF (the Virtual International Authority File).  This seemed straightforward at first: we were working with a relatively small number of records (around 130); we use the Calm database; and the archivist who catalogued the collection had already created authority records for the most significant people and organizations in the archive.


However, the indexing turned out to be much more labour-intensive than it appeared.  I had not catalogued the collection, and I do not have the knowledge that our Penguin archivist had developed, so I needed to spend some time analysing the collections to work out an accurate provenance.  Although authority records already existed for some of the creating people and organizations, not all had them, and it quickly became apparent that creating even the most basic authority records would use far too much of the time available to the whole project.  At a very rough estimate, it took between twenty minutes and half an hour to find the information for a single authority record, and about a week to create the new records needed.


The time available for cataloguing in most archive services is extremely limited, and many important collections have little or no online presence beyond a collection-level record.  Cataloguing is frequently done with short-term project funding and within tight timescales, and may be focused on providing a resource for outreach, for writing a company history, or for some other direct benefit to the organization that owns the collection.  This means that it can be difficult to justify spending time creating contextual information or doing a lot of indexing.  If there has to be a choice between creating a catalogue for a collection that is invisible on the Web and creating authority records relating to a collection that is already catalogued, many archivists would take the view that cataloguing is a better use of time.


This experience is a useful reminder that even when the right tools and standards are in place, projects should still plan for the research time needed to use them effectively.

