You are currently browsing the archives for the Geology Museum category.

Archive for the 'Geology Museum' Category

Lessons learned: summary

Monday, October 15th, 2012

At the final Advisory Group meeting towards the end of July 2012, the following points were made in relation to the evaluation of the quality of Linked Data produced and techniques used:

  • Quality depends not only on time-consuming human-crafted links to third party datasets, but also on the quality of those datasets (the project had identified some potential mistakes in VAIF)
  • Export processes such as those developed for the Penguin Archive use case were not really sustainable with the limited resources that archives usually have
  • Limitations of some parts of Drupal mean that, in the Geology use case, we may not be able to make as much of the Linked Data as we would like
  • Although we have made considerable efforts to make the user interface to the export and publication processes as smooth as possible, they are still not integrated enough to be adopted in normal working practice
  • There is a considerable learning curve in understanding Linked Data and what is needed to create and publish them, which requires intensive support and/or time to read around the subject
  • From a technical perspective, the project has highlighted how much impact on the data the production of Linked Data has – it’s not a simple conversion process
  • The extent to which Linked Data has the potential to ‘draw in’ new audiences for collections is more limited than envisaged, as collection level descriptions are already available in the Archives Hub, ranked highly in Google searches and accessible via Calm
  • It needs more resource, more space and more time!

In terms of methodology, the bringing together of different use cases and technical expertise had worked well, despite learning curves on all sides. The project had been beneficial in raising awareness of Linked Data issues in the Special Collections and Geology teams, and of archival and cataloguing practice in the technical team. Geology and Special Collections were also more aware of each others’ collections and potential for working together in the future.


Lessons learned: sustainability of workflow

Monday, October 15th, 2012

The two use cases present different scenarios for the sustainability of processes for publishing Linked Data. As noted in an earlier blogpost, for the Penguin Archive, the process has been unexpectedly time-consuming. The production of Linked Data requires enhancement of the collection metadata way beyond the requirements of currently accepted archival standards and, for the majority of archivists, presents a considerable technical learning curve. The key lesson for the archives community and for those promoting the creation of Linked Data is around the limitations that archivists’ core values and practice, and their very constrained time, impose.

The Penguin Archive had benefitted from a funded cataloguing post for two and a half years to transfer paper catalogue records to Calm; one of the archivists too up to a week to create new records for publication as Linked Data as part of the project.

For both collections, free-text catalogue entries were particularly problematic; if any structure is needed, time has to be spent on extracting specific text into new fields manually, or on re-categorisation. Although the project coincided with the development of a new Drupal-based online catalogue for the Geology Museum and publication of Linked Data is therefore largely automatic, the extent of data cleaning required as part of the process came as a surprise. The project allowed for effort to be expended on data cleaning; without this effort, any Linked Data published would have been of very limited use, even though publication is largely automatic through the additional Drupal module. Even where structure exists internally to collection data, there are still issues of linking these with third party datasets.

One recommendation may be for JISC to support a project to identify the amount of effort required for the creation and publication of Linked Data for different types of collections.

Overall, the message from both use cases is that the creation of Linked Data, links to authority files and to third party datasets should not be considered a quick and easy solution. Future upgrades of products such as Calm may well integrate creation and publication of Linked Data without there needing to be a separate process; without this, it is unlikely that data from collections like the Penguin Archive, where much human intervention is needed, will be published as Linked Data.

Users and use cases – overview

Thursday, June 28th, 2012

The Bricolage project will publish catalogue metadata as Linked Open Data for two of the University of Bristol’s most significant collections: the Penguin Archive and the Geology Museum (site in development). We will also encode resource microdata into the Geology Museum’s forthcoming online catalogue with the aim of improving collection visibility via major search engines and develop two simple demonstrators to illustrate the potential of data linking and reuse.

The project’s users are therefore archive and museum staff responsible for cataloguing and managing these important collections. The Linked Data production workflows need to be easy to use to enable embedding in the collection teams’ routine and to maximise sustainability of export and publication processes beyond the end of the project lifetime so that Linked Data can continue to be produced for reuse. Separate blog posts describe the use case for the Penguin Archive  and for the Geology Museum indicate how the project affects our users and how they are being engaged and are reacting to the project.

Users and use cases: The Geology Museum

Thursday, June 28th, 2012

The Geology Museum (site under development) is based in the University of Bristol’s School of Earth Sciences. It holds historically and scientifically important collections that are unique to the institution. The museum holds an estimated 100,000 museum specimens, many of which are unique and of international importance. Highlights include: an estimated 20,000 invertebrate fossils including material with important historical associations, over 4,500 mineral specimens, including many display-quality items from nowadays inaccessible mines, over 3,000 vertebrate fossils and casts and the Fry collection of over 4,000 invertebrate and plant fossils from the UK. There is also an extensive teaching collection of 16,000 specimens. Over the past 15 years 41,420 digital records have been produced on the basis of historic museum registers, card index catalogues and specimen labels. The creation of digital metadata has focused on valuable specimens and collection of national or international importance. These records represent about two thirds of the entire collection. Each metadata record contains information in 30 categories, 18 of which will be published by this project.

The School of Earth Sciences is already undertaking work to enhance the online presence of the Geology Museum by improving the museum website and online access to the collections. Included in this work is the migration of the existing collection metadata into a Drupal backed system, which can be used to publish Linked Data automatically.

Initial work focused on moving data from existing spreadsheet format into the Drupal database. Issues arise in the formats used, including free text, and the need to restrict terminology. There is a huge amount of data but it is largely unstructured, so requires manual effort to review and test. Unlike the Penguin Archive use case, the export and publication processes are largely automated by Drupal’s in-built modules for handling RDF, returning it in response to a Linked Data request. The aim is to embed data from the catalogue in the Geology Museum’s new public website using metadata in the HTML of the site, so that large search engines can find structured data.

The Collections & Practicals Manager in the School of Earth Sciences has suggested that a map demonstrator would be useful for the Geology Museum Linked Data. She is concerned, however, that much of the geo-location data about the collection is embedded as free text in description fields, which would make it difficult to plot the data on a map consistently, if at all. She has proposed using geodata for ‘type specimen’ data for the centre of the UK, although this also raises questions about the level of resolution at which these data could be plotted: for some, the catalogue may only include data about the nearest town or village rather than a precise geolocation related to OS references. Given the Museum’s relationships with local schools and geology enthusiast  groups, one way of resolving this issue – and assisting the ‘clean up’ of the data and giving information on use of the site overall – could be to invite these ‘end users’ to provide feedback and correct location data via the site. She has arranged a meeting with one such group in July which could provide a starting point for this. It will need to be made clear to any users beyond the Museum staff, however, that the demonstrators are not at ‘full service grade’.

The Collection Manager has engaged fully with the project, participating in Advisory Board meetings, 1-1 meetings with the development team and piloting and providing feedback on data migration to Drupal. The demonstrator will provide a concrete example of how Linked Data published via Drupal can be used but evaluation of the value of embedding microdata to facilitate search engine optimisation is unlikely to extend beyond the lifetime of the project.

Bricolage: demonstrators

Friday, May 25th, 2012

At the Advisory Group meeting on 20 April, we discussed potential scope and focus of the two demonstrators that the project will develop. For the Geology Museum, we may want to focus on a demonstrator that links to promoting their work in schools, which could include a mapping feature. The Penguin Archive may want to consider a timeline demonstrator linked to a specific area of the Archive.

We looked at some examples to help refine thinking on demonstrators:

Examples of the use of a timeline:

Example of a geographical view:

The Advisory Group will finalise demonstrators to be developed at its meeting in May; the key will be in demonstrating how the use of Linked Data can enhance the collections, which may in turn encourage sustainability of the tools and processes used.

Progress against workplan

Friday, May 25th, 2012

The Bricolage Advisory Group reviewed progress against the project workplan at its second meeting on 20 April 2012. We agreed to bring forward by a month each the two remaining AG meetings – to May and July – to ensure continuing review and steer towards the end of the project in August.

The Linked Data: Hosting work package has started with options such as Talis and data.bris being considered. Timing of the data.bris project may preclude its use, although it may not be the best place to host data from Bricolage anyway. It could, however, be used to create URIs and expose data in an external view. Options to be considered more fully at the meeting in May.

The Linked Data : Metadata review/export work package is moving at a different pace with the Penguin Archive and the Geology Museum (see separate blog posts on export work). In Earth Sciences, student effort is being used to move data from spreadsheet format into an online (Drupal) database. Some of the issues arising are in formats used, including free text, and the need to restrict terminology. There is a huge amount of currently unstructured data. Review and export piloting will carry on for another 2-3 weeks. For the Penguin Archive, wok has focused on trying to add authority terms but this has proved extremely labour-intensive, more so than anticipated. Good authority data is needed for good Linked Data, and this needs to be taken into account when initially cataloguing collections. Legacy data without authority data continue to pose problems.

The entity extraction process poses the question ‘can we identify things in textual descriptions, linking unlinked data? Some online services can analyse text, eg DBPedia Spotlight entity detection and disambiguation services for constructing bespoke Named Entity Recognition solutions. The Women’s Library at London Met deals effectively with disambiguation of names. We are interested in parsing text through the process to see how useful and accurate it can be. Entity extraction is at the experimental end of our work, but Linked Data to an authority source and processes around this are of interest.

Export implementation should be complete by end of May.

Identifiers and Linking work package: Geology have thus far been creating internal links within their data. Work to link this data to other datasets has not yet started. Work on the Penguin Archive to date has highlighted a problem around the stability of URIs; sustainability is an issue for the future. For example, a person identified in the Penguin Archive data could have a unique ID in CALM but that identifier could easily break if CALM’s internal ID scheme is changed by the vendor. Alternative ID schemes that rely on the person’s name or their biographical dates also pose similar problems if, say, the person changes their name or their commonly accepted biographical dates change. We need persistent IDs (eg DOIs) in combination with a resolver service to map from persistent IDs to appropriate internal current IDs (eg CALM IDs).

On microdata, for Geology we’re looking to embed data coming out of the catalogue in the public site so that big search engines can find structured data. We’ll be looking at metadata as RDFa within the HTML of the public site.

Sustainability of the tools and workflow developed during the project is important. The key is in developing a set of processes and tools that are easy to use in terms of the export process and publication of Linked Data, so that archivists might routinely use them. Questions arise about what is most useful for the long-term, what is transferable.

The demonstrator workpackages will begin at the end of May; evaluation and dissemination work packages will be discussed at the May Advisory Group meeting.