2016 Update: A lot of work (and some stresses) behind the scenes
I see with horror that it’s been a year since we last posted something to the Visionary Cross site!
This is not due to lack of work, as there has been plenty going on behind the scenes. Rather it has been the result of a series of unfortunate circumstances (especially a series of student sick-leaves) and the nature of the work itself, which has been a little more difficult to write up in small pieces than was true in 2015.
Our main work for 2016 has been developing the infrastructure to follow up on our December 2015 posting on the Visionary Cross data model. As we mention there, the Visionary Cross project has always seen itself intellectually as a curated data set and expressed allegiance to the kind of “just-in-time” editing advocated for by Joris van Zundert and others (see Boot and van Zundert 2011 and van Zundert 2012 below). But we had not, until our December 2015 meeting in Lethbridge, really understood the implications of that understanding in terms of what we might call “the craft of edition-making.” As I mentioned in my posting on the Lethbridge meeting, we had always understood the Digital Library parts of our project as being a question of system–DSpace vs. Omeka vs. Greenstone–and not really recognised the extent to which those systems were secondary questions to metadata and organisation: “left hand” issues in the terminology of our meeting.
Investigating data publication: OPenn Pros and Cons
The model we began to use instead was that of OPenn, the new and minimalist Digital Library/Repository published by the University of Pennsylvania library. And in fact we spent most of the Spring looking at what would be required to get our data into OPenn: working out metadata standards required and, especially, thinking through the nature of our objects and their relationship to each other.
In the course of the Spring, however, we began to realise that OPenn was a good model, but not a great solution, for the particular needs of the Visionary Cross project. In particular, we ran into two main issues that led us to look at ways of emulating rather than joining the Pennsylvania model:
- OPenn is organised around physical repositories and can’t easily handle virtual collections;
- OPenn is a system that requires negotiation to join.
The first case is an interesting mismatch: OPenn was designed to showcase collections from repositories. The unstated assumption behind this is, firstly, that the poster is the owner of the collection and is able to speak for that repository; and, secondly, that the collection is best organised by repository.
In our case, however, we are researchers rather than repository owners, and our material is both single objects from external repositories and something that gains meaning from their cross-repository relationships with each other. We felt quite reluctant to propose repositories to OPenn (and behave as the owners of these) for external parties like the Cathedral library in Vercelli or the Ruthwell and Bewcastle churches, especially when we needed to establish these repositories to hold single objects we were using in the context of a cross-repository collection. Let’s say, for example, we’d also been using a page from a manuscript in the British Library–a repository not represented in OPenn at the moment: would we then establish the British Library node on their behalf?
This then leads us to the second point: the degree to which OPenn requires negotiation to join. Our vision for the Visionary Cross project is for a dataverse of objects that can be used and added to by anybody. Placing something in OPenn requires the agreement of OPenn and, potentially, the physical repository itself. To establish or add to a repository named for the British Library, for example, presumably requires the permission of the British Library. And it also requires you to agree to the terms mandated by OPenn–with its very open licence. In our case, however, we are also working with material that has different levels of openness. While the data we produce is available under CC-BY, some of the data we use is under much more restrictive licensing: we still need to be able to “include” this data in some way (i.e. have it listed as part of our virtual collection), without forcing a more open licence on it than its owners are prepared to give.
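To make the idea of “including” without relicensing a little more concrete, here is a minimal Python sketch of what a virtual collection might look like as data. It is only an illustration under our own assumptions, not a description of anything OPenn or the project actually implements: the class names `Item` and `VirtualCollection`, the method `reusable_under`, and the sample entries are all hypothetical. The point it shows is that the collection only lists items, while each item keeps the repository and licence its owner has given it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Item:
    """One object in the collection (hypothetical model, for illustration)."""
    title: str
    repository: str  # where the object lives; the collection does not own it
    licence: str     # the licence set by the owner, recorded as-is


@dataclass
class VirtualCollection:
    """A named list of items; it holds references and imposes no licence."""
    name: str
    items: List[Item] = field(default_factory=list)

    def add(self, item: Item) -> None:
        # Listing an item does not change its repository or its licence.
        self.items.append(item)

    def reusable_under(self, licence: str) -> List[Item]:
        # Return only the items whose own licence matches the one requested.
        return [item for item in self.items if item.licence == licence]


# Placeholder entries: the titles and repository names are invented examples.
collection = VirtualCollection("Example cross-repository collection")
collection.add(Item("Project-produced dataset", "project", "CC-BY"))
collection.add(Item("Externally owned images", "external repository", "all rights reserved"))

open_items = collection.reusable_under("CC-BY")
```

In a model like this, a restrictively licensed object can still appear in the collection listing, and anyone reusing the data can filter by licence rather than assuming one licence covers everything.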
Catch and release: Zenodo? GitHub? Some other system or combination?
Our work in the late Spring and Summer, therefore, involved investigating other ways of collecting and publishing data for our project. Our requirements during this stage were:
- The system should be as simple as (and as compatible as possible with) OPenn;
- Participants should be able to use, contribute to, and organise the collection without negotiation;
- The system should be open to multiple virtual organisations (i.e. not repository-based);
- The system should be agnostic as to licensing, data formats, and so on (while recognising that some contributions may not be eligible for participation in ).
(We wrote up a version of this as an abstract for the Digital Scholarly Editions workshop in Graz: you can see the details here).
In the end, we decided that the real solution to this problem was to have no system at all: to focus instead entirely on making data as discoverable and well-documented as possible, and to avoid requiring others to join our system in order to participate in or contribute to the collection.
In the course of the summer, therefore, we began searching for systems that would provide long-term, non-negotiated accessibility to our data and metadata and discoverability standards that would support non-negotiated access and reuse. After investigating several options, including the University of Lethbridge’s Institutional Repository, Figshare, arXiv, and GitHub, we decided to go with Zenodo, a repository hosted by CERN for the European Union and dedicated to the open distribution of scientific data. We have recently established a Zenodo “Community” for our data (https://zenodo.org/communities/visionarycrossproject/), and expect to start publishing our first datasets early in the new year.