Terminology report (phase 1)

This report is compiled following the activities of the Linked Conservation Data (LCD) consortium on conservation terminology during phase 1 of the project (2019). These included the terminology workshop (6-7 June 2019, Stanford University’s Conservation Services, Redwood City), a review of work undertaken by consortium members on vocabularies in use in conservation, and discussions during video-conferencing calls.

Workshop

The detailed schedule and recording of the terminology workshop can be found here: Terminology workshop. The purpose of the workshop was to begin mapping the landscape of terminology in conservation and articulate the steps that the consortium would need to take to allow sharing conservation records which are produced using different vocabularies.
A list of participants can be found in the Appendix. Most consortium members attended, as did non-members from the San Francisco Museum of Modern Art and Harvard Libraries, a conservator in private practice, and adjunct faculty. Sessions were recorded.

Pre-workshop survey

In preparation for the workshop, a questionnaire was circulated to the wider conservation community requesting information on the vocabularies professionals use to produce their conservation documentation. We had a reasonable response to the questionnaire, which allowed us to compile a list of conservation vocabularies, as shown on this website. This list was circulated to the participants in advance of the workshop with a request to review it (see Sessions).

Sessions

The opening session, on the afternoon of 6 June, served as an introduction to how vocabularies are created and used in Linked Data and how vocabularies fit in the general picture of integrating datasets in conservation. An introduction to the Simple Knowledge Organisation System (SKOS) (https://www.w3.org/TR/skos-primer/) was given as a core technology in the field. This was followed by an introduction to the Conceptual Reference Model (CIDOC-CRM) (http://www.cidoc-crm.org/) published by CIDOC (http://network.icom.museum/cidoc/) as the de facto standard for integrating cultural heritage records.
The morning session of the 7th included in-depth discussion on options to consider in further developing/reflecting capacity for the conservation terminology landscape. Prof. Marcia Zeng outlined different ways of connecting diverse vocabularies to allow integration of records.
The afternoon of the 7th was devoted to break-out sessions where members of the consortium shared observations about the conservation vocabularies they had reviewed in advance of the workshop. Participants looked at the coverage of the vocabularies, i.e. the areas of the domain that are well-represented in the vocabularies. They also considered the structure of the vocabularies in relation to machine-friendly formats, i.e. whether the vocabulary data are structured or simply typeset text.
Our concluding thoughts from the workshop were further refined during consultations with consortium partners and domain experts.

Main decisions and findings

We summarise here the decisions and findings about conservation terminology in phase 1 of the project.

Scale

The workflows and guidelines provided for LCD should work equally well for thesauri of several thousand concepts/terms and for short word-lists compiled by individual conservators in a day-to-day studio work scenario. Scale does not change the type of work required, only the amount to be undertaken.

Art & Architecture Thesaurus (AAT)

The AAT (http://www.getty.edu/research/tools/vocabularies/aat/), part of the Getty Vocabularies programme, offers extensive, but not complete, coverage of conservation terminology. We decided that the AAT will be a core thesaurus for the project. We will encourage submission of terms from individual thesauri to the AAT, so that conservation coverage in the AAT gradually improves while individual thesauri continue to be maintained. This can also be done at a small scale: for example, a conservator who uses a term in their everyday documentation work that does not appear in the AAT is encouraged to submit it.
The Getty Vocabularies staff are happy to expand the AAT to incorporate more conservation terminology. The process for submitting terms to the AAT has recently been simplified in key ways, including a streamlined submission process, simplified contributor agreements and use of the OpenRefine (https://openrefine.org/) tool for checking whether terms to be submitted are already in place.

Publishing formats

Conservation glossaries/vocabularies currently vary widely in how ready they are to be used in Linked Data using SKOS. Few vocabularies are published as Linked Data or offer unique identifiers for the concepts/terms that they include. Some are published as structured or semi-structured data, which will require relatively limited work to turn into Linked Data. Some are only available as unstructured text or in print and would require significant effort to share as Linked Data. The vocabularies page on the website (https://www.ligatus.org.uk/lcd/controlled-vocabularies) includes a more detailed assessment of the state of each vocabulary. An estimate of the effort needed to publish these vocabularies as SKOS Linked Data is also included on the same page.
As part of the effort in phase 1, a preliminary workflow has been produced which allows vocabulary maintainers to identify the steps they need to take to make their vocabularies available as Linked Data (see How to publish as SKOS).
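As a rough illustration of what the end product of such a workflow might look like for a short word-list, the following Python sketch turns a handful of terms into SKOS concepts serialised as Turtle. The base URI, scheme name, terms and definitions are hypothetical placeholders and are not taken from any LCD vocabulary. The sketch uses only the standard library rather than a dedicated RDF toolkit, and it does not handle line breaks inside labels or definitions.

```python
import re

# Hypothetical base URI; a real vocabulary would use a namespace its maintainer controls.
BASE = "https://example.org/vocab/"

def slug(label):
    """Turn a label into a URI-safe local identifier."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")

def escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')

def to_skos_turtle(terms, scheme="scheme"):
    """Serialise (label, definition) pairs as SKOS concepts in Turtle."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{BASE}{scheme}> a skos:ConceptScheme .",
    ]
    for label, definition in terms:
        lines += [
            "",
            f"<{BASE}{slug(label)}> a skos:Concept ;",
            f'    skos:prefLabel "{escape(label)}"@en ;',
            f'    skos:definition "{escape(definition)}"@en ;',
            f"    skos:inScheme <{BASE}{scheme}> .",
        ]
    return "\n".join(lines) + "\n"

# Illustrative entries only.
word_list = [
    ("Foxing", "Brownish spotting on paper."),
    ("Surface cleaning", "Removal of loose dirt from a surface."),
]
turtle = to_skos_turtle(word_list)
print(turtle)
```

The key step is that each term receives a stable identifier (here derived from its label), which is what most unstructured glossaries currently lack.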

Combining vocabularies

With Prof. Zeng’s guidance, and following discussions with experts in the field of knowledge organisation, we considered how to manage the multitude of conservation vocabularies. Conservation records are produced in different institutions using different terminology. Searching across different records often requires an understanding that a term in one record and a different term in another point to the same concept. We therefore consider aligning concepts/terms and vocabularies (reconciliation) an essential task for this project. SKOS provides relevant tools for expressing how concepts are aligned. We have decided that we will accommodate alignment data between any two vocabularies. However, given the unique position of the AAT in the field, we encourage alignment with the AAT in addition to any other direct alignment.
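To illustrate why alignment matters for searching, the sketch below treats skos:exactMatch statements as links between concepts and expands a search term into every concept reachable through them. The prefixes (instA:, instB:, aat:), concept identifiers and records are made up for the example; only the SKOS property names come from the standard.

```python
from collections import defaultdict, deque

# Hypothetical alignment statements: (concept, SKOS property, concept).
ALIGNMENTS = [
    ("instA:foxing", "skos:exactMatch", "aat:foxing"),
    ("instB:fox-marks", "skos:exactMatch", "aat:foxing"),
    ("instA:tear", "skos:closeMatch", "instB:split"),
]

def equivalents(concept, alignments):
    """Return every concept linked to `concept` by a chain of exactMatch statements."""
    graph = defaultdict(set)
    for subj, prop, obj in alignments:
        if prop == "skos:exactMatch":
            # exactMatch is symmetric, so record the link in both directions.
            graph[subj].add(obj)
            graph[obj].add(subj)
    seen, queue = {concept}, deque([concept])
    while queue:
        for neighbour in graph[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

# Hypothetical records, each tagged with a concept from its institution's own vocabulary.
RECORDS = {
    "record-1": "instA:foxing",
    "record-2": "instB:fox-marks",
    "record-3": "instB:split",
}

def search(concept):
    """Find records tagged with `concept` or with any exact-match equivalent."""
    wanted = equivalents(concept, ALIGNMENTS)
    return sorted(rec for rec, tagged in RECORDS.items() if tagged in wanted)

print(search("instA:foxing"))
```

In this sketch the two institutional terms are never aligned with each other directly; they are found together only because both are aligned with the same third concept, which is the role envisaged for a shared thesaurus. The closeMatch statement is deliberately not followed, since SKOS does not define closeMatch as transitive.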
We are also considering how conservation vocabularies should be used alongside the CIDOC-CRM in the future, i.e. to provide lists of types for CRM entities. This is practical for thesauri whose hierarchies are consistent in the type of thing that they describe, often called IsA hierarchies (see for example https://en.wikipedia.org/wiki/Is-a). The Backbone Thesaurus (BBT) (https://www.backbonethesaurus.eu/) is built to use the CIDOC-CRM entities as starting points to which domain-specific vocabularies can attach. We therefore encourage linking to the BBT where possible.
The decision-making process for matching vocabulary terms to the AAT, BBT and any other thesaurus is outlined in the diagram: How to align vocabularies.
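The practical value of a consistent IsA hierarchy can be sketched as follows: if every narrower concept is a kind of its broader concept, then following the broader links upwards from any concept leads to a top concept that tells us what type of thing it is. The concept names and links below are hypothetical and do not reproduce the structure of the BBT or any other thesaurus.

```python
# Hypothetical skos:broader links: narrower concept -> broader concept.
BROADER = {
    "vocab:japanese-paper": "vocab:paper",
    "vocab:paper": "vocab:material",
    "vocab:lining": "vocab:treatment",
}

def ancestors(concept, broader):
    """Follow broader links upwards and return the chain of ancestors."""
    chain = []
    while concept in broader:
        concept = broader[concept]
        if concept in chain:
            raise ValueError(f"cycle in hierarchy at {concept}")
        chain.append(concept)
    return chain

def top_concept(concept, broader):
    """Return the topmost ancestor, or the concept itself if it has none."""
    chain = ancestors(concept, broader)
    return chain[-1] if chain else concept

print(top_concept("vocab:japanese-paper", BROADER))
```

If a hierarchy mixes kinds of relationship (for example part-of alongside kind-of), the top concept reached this way no longer reliably indicates the type, which is why consistency matters here.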

Long-term LCD data

LCD will produce and maintain vocabulary alignment data. This includes any statements using the SKOS equivalence (close or exact match), associative (related concept) and hierarchical (broader concept) relationships across different vocabularies. This dataset will be held centrally in a repository with a distributed management team which will include members from the LCD consortium. This means that there will be no single institution responsible for maintaining the repository. The repository itself will be hosted on a free service (such as GitHub). The service is to be decided in phase 2 of the project.
This repository will then feed alignment data to a separate query/retrieval service. This service will be designed for use by conservators (i.e. outward facing). It will be hosted by a consortium member for as long as funding is available. The consortium will actively pursue further funding to maintain that service. In phase 2 of the project we will assess software for suitability in running such a service. This is also summarised in LCD terminology architecture.
We recognise that a number of conservation vocabularies are not encoded in SKOS and do not have URIs. Maintainers of these vocabularies may not have the resources to host SKOS-encoded datasets of the vocabularies. LCD will offer to host such vocabularies encoded in SKOS if necessary.
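As an illustration of what the alignment dataset described above might contain, the sketch below serialises a few alignment statements as N-Triples using the SKOS mapping properties that correspond to the four relationship kinds (exactMatch, closeMatch, relatedMatch, broadMatch). The concept URIs are hypothetical, and the file format and layout are assumptions made for the example, not decisions taken by the consortium.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Relationship kinds mapped to the corresponding SKOS mapping properties.
MAPPING_PROPERTIES = {
    "exact": SKOS + "exactMatch",
    "close": SKOS + "closeMatch",
    "related": SKOS + "relatedMatch",
    "broader": SKOS + "broadMatch",
}

def to_ntriples(statements):
    """Serialise (subject URI, kind, object URI) statements as sorted N-Triples."""
    lines = set()  # a set drops duplicate statements
    for subj, kind, obj in statements:
        if kind not in MAPPING_PROPERTIES:
            raise ValueError(f"unsupported relationship kind: {kind}")
        lines.add(f"<{subj}> <{MAPPING_PROPERTIES[kind]}> <{obj}> .")
    return "".join(line + "\n" for line in sorted(lines))

# Hypothetical statements between two example vocabularies.
statements = [
    ("https://example.org/vocabA/foxing", "exact", "https://example.org/vocabB/fox-marks"),
    ("https://example.org/vocabA/tear", "close", "https://example.org/vocabB/split"),
    ("https://example.org/vocabA/japanese-paper", "broader", "https://example.org/vocabB/paper"),
]
print(to_ntriples(statements), end="")
```

One statement per line, in sorted order, is a design choice that would suit a version-controlled repository: contributions from different team members show up as small, readable line-level changes.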

How to publish as SKOS

Flowchart explaining how to encode a vocabulary using SKOS

How to align vocabularies

Flowchart explaining how to decide on the strategy for aligning two vocabularies

LCD terminology architecture

Diagram explaining three stages of publishing thesauri in conservation

Terminology workshop participants

Speakers

  • John Graybeal, BioPortal, Stanford
  • Kristen St.John, Stanford Libraries
  • Eleni Tsouhoula, FORTH Hellas
  • Athanasios Velios, University of the Arts London
  • Jonathan Ward, Getty Vocabularies
  • Marcia Zeng, Kent State University

Full list of participants

  • Richenda Brim, Head of Preservation, Stanford Libraries
  • Debra Cuoco, Paper Conservator for Special Collections, Weissman Preservation Center, Harvard Library
  • Annabel Enriquez, Specialist in Cultural Heritage Data, Getty Conservation Institute
  • John Graybeal, Technical Program Manager, Stanford Center for Biomedical Informatics Research, Stanford University
  • Martina Haidvogl, Associate Media Conservator, SFMOMA
  • Emilie van der Hoorn, Art Conservator, Van der Hoorn Art and Archival Conservation
  • Ryan Lieu, Conservation Operations Coordinator, Preservation, Stanford Libraries, Stanford University
  • Jennifer Marchant, Fitzwilliam Museum
  • Joseph Padfield, The National Gallery
  • Kristen St.John, Stanford Libraries
  • Vadim Soshkin, Gallery Systems
  • Jennifer Sweeney, Adjunct Faculty, Kent State University
  • Eleni Tsouhoula, FORTH Hellas
  • Athanasios Velios, University of the Arts London
  • Karen Waldermar, Product Manager, Conservation Studio, Gallery Systems
  • Jonathan Ward, Getty Research Institute
  • Layna White, Head of Collections Information and Access, San Francisco Museum of Modern Art
  • Marcia Zeng, Kent State University