Modelling report (phase 1)

This report is still under development.

Introduction

Linked data is a set of technologies and practices which allow information to be accessible across the Semantic Web. It is based on the Resource Description Framework (RDF) which encodes information in triples (subject → predicate → object). For example, the description of the cover of a book would be encoded as:

Cover (subject) → made of (predicate) → leather (object)

Converting database records into triples requires the careful choice of subjects, predicates and objects so that meaning is preserved. This process is called modeling, and it is often facilitated by the use of ontologies, which provide predicates and define types of subjects and objects.

For this project, in which conservation documentation is being modeled, we are adopting the CIDOC-CRM ontology. CIDOC is ICOM’s International Committee for Documentation, and CRM stands for Conceptual Reference Model, which has been developed over many years to assist with encoding cultural heritage information.

Consortium Workshop

This report reflects discussions and work done over the course of a two-day workshop at the University of the Arts, London in September 2019. Thirty participants from the UK, US, France, Italy, Greece, and Norway gathered to investigate the principles of modeling based on the CIDOC-CRM ontology and how to apply them in describing conservation documentation and data. Prior to the workshop, several participants had shared conservation documentation of various kinds to use as examples.

Each day started with lectures about the CRM or examples of its use in cultural heritage projects. Afternoons were spent in four break-out groups led by members of the CRM Special Interest Group (SIG). Participants gained hands-on experience using the CRM to model the submitted examples, exploring the limitations of the model, and evaluating the effort needed to model our complex data.

Structure and CIDOC-CRM

Once data is modelled and encoded into triples, one can build relationships between objects in a structured way, forming a knowledge graph. More information can then be added to the graph over time, increasing the complexity of what it can convey.

The CRM offers a further degree of utility by providing relationships that can describe cultural heritage objects, events, and actors. The CRM does this by defining Entities and Properties. An Entity (subject) has a Property (predicate) that relates it to another Entity (object). Using the example above, one would model the sentence in the CRM as:

Cover (E22 Human-Made Object) → made of (P45 consists of) → leather (E57 Material)
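As a minimal sketch, this statement can be written out as N-Triples using only the Python standard library. The `http://example.org/` identifiers for the cover and for leather are hypothetical placeholders. The class and property names (`E22_Human-Made_Object`, `E57_Material`, `P45_consists_of`) assume the naming used in the CIDOC-CRM RDFS encoding and should be checked against the CRM version being adopted.

```python
# Namespaces: the CRM base URI is assumed from the published RDFS encoding;
# the example.org URIs are hypothetical placeholders.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"

cover = EX + "cover/1"
leather = EX + "material/leather"

# Each triple is (subject, predicate, object), all given as URIs.
triples = [
    (cover, RDF_TYPE, CRM + "E22_Human-Made_Object"),
    (leather, RDF_TYPE, CRM + "E57_Material"),
    (cover, CRM + "P45_consists_of", leather),
]


def to_ntriples(rows):
    """Serialise URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in rows)


ntriples = to_ntriples(triples)
print(ntriples)
```

The first two triples assign a CRM class to each resource; the third carries the actual "made of" statement.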

In combination with specific definitions for each term, complicated information and relationships can be built into machine readable and searchable forms.

Theoretical Application to Conservation Documentation

Conservators, conservation scientists and other conservation professionals generate many kinds of documentation, among them treatment reports, condition assessments, collection surveys, and scientific analyses. This information can take the form of checkbox forms, databases, spectra and other analytical output, narrative reports, images, diagrams, and spreadsheets. All provide information on the material origin and state of collections, as well as on proposed and completed actions taken on and for these collections.

A conservation treatment report is one common type of documentation that can be used as an illustration. Treatment reports are often divided into sections: description, condition, proposal and work done. Other sections, such as testing, collations, and diagrams, may be included as well. The main sections reflect specific questions about an object or group of objects:

  • How something is made
  • What is the condition of something
  • What are the plans for treating something
  • How something has been conserved

Practical Options for Modeling Conservation Documentation

As we looked at modeling conservation documentation using the CIDOC-CRM ontology, it became clear that the richness of conservation data can be overwhelming. In some cases, it should be considered whether a reduced amount of information will still provide access to and discovery of conservation documentation. Many teams ended up considering different levels of modeling for conservation data. Here is an outline of potential levels of consideration to explore in the future.

Level I: A pointer is created to an online document that contains conservation records but is not in a machine-readable format. This would include only basic information to indicate that conservation activity or assessment has taken place and to direct users to fuller resources. It could consist of a very simple RDF statement about the type of documentation extant, ideally connected to other metadata about the item itself. For libraries that use MARC records, this could mean using a 583 field for a note, URL or URI.

Potential use cases:

  • Legacy documentation where retrospective modeling is not feasible.
  • Placeholder while fuller system is developed.
  • Situations where data is not open to the general public, but could be made available to internal staff, peer institutions or vetted researchers.
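A Level I pointer could be as small as the following sketch, which states only that a document of a given kind exists, where it lives, and which item it documents. All `http://example.org/` URIs and the helper name `level_one_pointer` are hypothetical. The use of E31 Document, P2 has type and P70 documents is one possible CRM reading, not a decision taken at the workshop.

```python
# The CRM base URI and local names are assumed from the CRM RDFS encoding.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def level_one_pointer(document_url, item_uri, doc_type_uri):
    """Return the triples for a bare pointer to an online conservation record."""
    return [
        (document_url, RDF_TYPE, CRM + "E31_Document"),
        (document_url, CRM + "P2_has_type", doc_type_uri),
        (document_url, CRM + "P70_documents", item_uri),
    ]


# Hypothetical identifiers for one report and the item it describes.
pointer = level_one_pointer(
    document_url="http://example.org/reports/treatment-0001.pdf",
    item_uri="http://example.org/items/book-0001",
    doc_type_uri="http://example.org/types/treatment-report",
)

for s, p, o in pointer:
    print(f"<{s}> <{p}> <{o}> .")
```

Nothing inside the report itself is modelled here; the three statements only make the record discoverable from the item's other metadata.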

Level II: Selected data from documentation is modeled to enable discovery and basic querying without extensive modeling of non-structured data. The selected data could draw on structured data (such as bibliographic or cataloging information) as well as a limited range of other information that is straightforward to model and of high value to particular groups or functions.

Potential use cases:

  • Treatment reports that include bibliographic data, limited item description, basic categories of treatment type, and materials information (with information on how to access full data).
  • Semi-structured data such as assessments with extensive notes fields.
  • Other types of conservation/preservation documentation with data that cannot yet be modeled, such as risk assessments, potential action, etc.

Level III: All data generated is modeled to the CIDOC-CRM ontology with full terminology references.

Potential use cases:

  • Documentation consisting of structured data. This could be information in databases, scientific data, or form data.
  • Documentation created in a CMS or other system that supports linked data.

Modelling issues

Treatment assessment

One reason for carrying out conservation treatments is that previous treatments have failed. It would be useful to be able to model this, so that we can assess which treatments are more successful based on wider samples. For example, the Ottoman Bird Stool was first treated with water, which caused swelling and was recorded as a failed treatment, and then with white spirit, which was successful.
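The kind of question this would enable can be sketched with plain Python records. The field names (`object`, `material`, `outcome`, `note`) and the `success_rate` helper are hypothetical, and the two records simply restate the stool example. This is not a CRM pattern, only an illustration of the query a model would need to support.

```python
from collections import defaultdict

# Two treatment events restating the example in the text; field names are hypothetical.
treatments = [
    {"object": "Ottoman Bird Stool", "material": "water",
     "outcome": "failed", "note": "caused swelling"},
    {"object": "Ottoman Bird Stool", "material": "white spirit",
     "outcome": "successful", "note": ""},
]


def success_rate(records):
    """Return {material: share of treatments recorded as successful}."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for record in records:
        totals[record["material"]] += 1
        if record["outcome"] == "successful":
            successes[record["material"]] += 1
    return {material: successes[material] / totals[material] for material in totals}


rates = success_rate(treatments)
print(rates)
```

With only one object the rates are not meaningful; the point is that each treatment needs an explicit, queryable outcome before such comparisons can be made across wider samples.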

Preservation condition

The reference to Manuscript 1 from the Ligatus survey form is indicative. This is a commonly occurring issue.

Relative position

Relative position of things means how the components of a thing are arranged in relation to each other. This may be the old example of the motorbike in spare parts and the motorbike assembled being two different things, but also simpler cases like a painting hanging “between” two other paintings (i.e. the exact position of each is irrelevant).

Expressing orientation

This issue came up when looking at the documentation for Blikk. It is a recurring issue for complex modern and contemporary art installations. In particular, it is not clear how to describe the direction in which an object points, i.e. the orientation of a physical thing.

Recording absence

It is important to document that an object does NOT consist of a material or a component, or does not have a characteristic. For example: a book cover without tooled decoration.
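One reason this matters is that RDF data is usually read under an open-world assumption: a missing statement means "not recorded" rather than "not present". The sketch below, in plain Python with hypothetical names (`observations`, `has_feature`) and made-up book covers, distinguishes the three cases. It illustrates the requirement rather than any agreed CRM construct.

```python
# Explicit observations about features: True = present, False = explicitly absent.
# Anything not listed is simply unrecorded. All names and entries are hypothetical.
observations = {
    ("book cover 1", "tooled decoration"): False,
    ("book cover 2", "tooled decoration"): True,
}


def has_feature(thing, feature):
    """Return 'present', 'absent' or 'unknown' for a thing/feature pair."""
    recorded = observations.get((thing, feature))
    if recorded is None:
        return "unknown"
    return "present" if recorded else "absent"


print(has_feature("book cover 1", "tooled decoration"))  # explicitly recorded as absent
print(has_feature("book cover 3", "tooled decoration"))  # never recorded
```

A model that can only express positive statements collapses "absent" and "unknown" into the same result, which is the gap this issue describes.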