A case study / sub-project of my PhD – maybe even the centre of my PhD? – is the graphing of the Tate gallery / museum’s metadata. The metadata describes more than 70,000 artworks and 3,500 artists. You can clone it from the GitHub repository at https://github.com/tategallery/collection.
When I write graphing, I am not referring to the graphing of mathematical functions or the production of fancy bar charts, though there is a highly visual component to it. I mean the creation of a graph database. The graph data model is one of the more niche NoSQL data models and is suited to querying highly connected data. By creating a graph of cultural heritage metadata (for artworks, museum objects, etc), the hope is that either new connections between ‘entities’ can be discovered, or that connections can be discovered more easily or more quickly than before.
While the most obvious relationship between entities in the Tate metadata is Artist-ContributesTo->Artwork (in the Tate metadata, there can be multiple contributors to an artwork), many additional entities can be conceived and connected. Examples include medium (paper, graphite, watercolour, etc), movement (Young British Artists, Pre-Raphaelite Brotherhood, etc), subject (house, tree, man, etc) and others. Artists can be related to movements, and artworks can be related to mediums and subjects (I use mediums rather than media to avoid confusion with, well, the media).
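To make the entity-and-relationship idea concrete, here is a minimal sketch in plain Python rather than in an actual graph database. Everything in it is an assumption for illustration: the `Graph` class, the relationship names `CONTRIBUTES_TO` and `HAS_MEDIUM`, and the placeholder artists and artworks are hypothetical and are not taken from the Tate metadata or from any particular database product.

```python
from collections import defaultdict


class Graph:
    """A tiny in-memory graph: labelled nodes and typed, directed edges."""

    def __init__(self):
        self.labels = {}                     # node id -> label, e.g. "Artist"
        self.out_edges = defaultdict(set)    # (source, rel type) -> target ids
        self.in_edges = defaultdict(set)     # (target, rel type) -> source ids

    def add_node(self, node_id, label):
        self.labels[node_id] = label

    def add_edge(self, source, rel_type, target):
        # Store the edge in both directions so it can be followed either way.
        self.out_edges[(source, rel_type)].add(target)
        self.in_edges[(target, rel_type)].add(source)

    def targets(self, source, rel_type):
        return sorted(self.out_edges.get((source, rel_type), ()))

    def sources(self, target, rel_type):
        return sorted(self.in_edges.get((target, rel_type), ()))


# Placeholder data for illustration only (not real Tate records).
g = Graph()
for node_id, label in [
    ("artist-a", "Artist"), ("artist-b", "Artist"),
    ("artwork-1", "Artwork"), ("artwork-2", "Artwork"),
    ("graphite", "Medium"), ("paper", "Medium"), ("watercolour", "Medium"),
]:
    g.add_node(node_id, label)

g.add_edge("artist-a", "CONTRIBUTES_TO", "artwork-1")
g.add_edge("artist-b", "CONTRIBUTES_TO", "artwork-1")  # multiple contributors
g.add_edge("artist-a", "CONTRIBUTES_TO", "artwork-2")
g.add_edge("artwork-1", "HAS_MEDIUM", "graphite")
g.add_edge("artwork-1", "HAS_MEDIUM", "paper")
g.add_edge("artwork-2", "HAS_MEDIUM", "watercolour")
g.add_edge("artwork-2", "HAS_MEDIUM", "paper")


def mediums_used_by(graph, artist):
    """Two-hop traversal: Artist -> Artwork -> Medium."""
    found = set()
    for artwork in graph.targets(artist, "CONTRIBUTES_TO"):
        found.update(graph.targets(artwork, "HAS_MEDIUM"))
    return sorted(found)


print(mediums_used_by(g, "artist-a"))
print(g.sources("paper", "HAS_MEDIUM"))
```

The point of the sketch is that a question such as "which mediums has this artist worked in?" becomes a short walk along edges, and the same edges can be followed backwards to ask which artworks share a medium.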
The following is a subset of the graph showing how several artworks relate to mediums. As you can see, even this very small subset is very dense. While the graph is visual, the research isn’t all about visualisation.
In the following image, a subset of the subject hierarchy is shown.
In the Tate metadata, the subjects form a three-level hierarchy, with the top level being a generic categorisation of the subject, the middle being more specific, and the third / bottom level being the one that specifically applies to the artwork. Initially, I will connect artworks only to the third-level / most specific subjects, but for performance purposes, I will look at connecting artworks to all three levels in the subject hierarchy. This would allow for more generic searches, e.g. searching for artworks containing “animals: mammals” rather than having to specify “cow”, “sheep”, “horse”, etc.
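The two options for the subject hierarchy can be sketched roughly as follows, again in plain Python rather than a graph database. This is only an illustration under stated assumptions: the function names are hypothetical, the artworks are placeholders, and the entries named `level-…-placeholder` stand in for hierarchy levels whose real names are not given here. The first search walks up the hierarchy at query time; the second relies on artworks having been linked to all three levels in advance.

```python
# Hypothetical slice of the hierarchy: child subject -> parent subject.
PARENT = {
    "cow": "animals: mammals",
    "sheep": "animals: mammals",
    "horse": "animals: mammals",
    "house": "level-2-placeholder",
    "animals: mammals": "level-1-placeholder-a",
    "level-2-placeholder": "level-1-placeholder-b",
}

# Placeholder artworks linked only to third-level (most specific) subjects.
ARTWORK_SUBJECTS = {
    "artwork-1": {"cow"},
    "artwork-2": {"horse", "house"},
    "artwork-3": {"house"},
}


def ancestors(subject):
    """Walk up the hierarchy from a subject to the top level."""
    chain = []
    while subject in PARENT:
        subject = PARENT[subject]
        chain.append(subject)
    return chain


def search_by_traversal(subject):
    """Option 1: links at the third level only; climb the hierarchy per query."""
    return sorted(
        artwork
        for artwork, subjects in ARTWORK_SUBJECTS.items()
        if any(s == subject or subject in ancestors(s) for s in subjects)
    )


def materialise(artwork_subjects):
    """Option 2: link each artwork to all three levels up front."""
    return {
        artwork: set(subjects).union(*(ancestors(s) for s in subjects))
        for artwork, subjects in artwork_subjects.items()
    }


def search_materialised(index, subject):
    """With all levels linked, a generic search is a direct membership test."""
    return sorted(a for a, subjects in index.items() if subject in subjects)


index = materialise(ARTWORK_SUBJECTS)
print(search_by_traversal("animals: mammals"))
print(search_materialised(index, "animals: mammals"))
```

Both searches return the same artworks. The trade-off in this sketch is that the second stores more links per artwork in exchange for not having to climb the hierarchy on every query.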