It was recognised from the beginning that institutional repositories would be far more valuable if they formed part of a wider landscape including other IRs, subject repositories and archives, together with other services such as registries of repositories and policies (e.g. ROAR, OpenDOAR, RoMEO) and search and discovery tools. Ensuring that metadata could be harvested through OAI-PMH was fundamental.
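As a rough illustration of what harvesting over OAI-PMH involves, the sketch below builds a ListRecords request URL and extracts identifiers and titles from an oai_dc response. The repository address, the sample record and the function names are invented for illustration. A working harvester would also need to handle resumption tokens and error responses.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting, e.g. "2011-01-01"
    return base_url + "?" + urlencode(params)

def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        results.append((identifier, title))
    return results

# Hypothetical response fragment, trimmed to the elements used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.ac.uk:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example article</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.ac.uk/oai"))
print(parse_records(SAMPLE))
```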
Embedded repositories, with their roles in helping to manage research assessment submissions and assist researchers to fulfil the requirements of funder mandates, inhabit a complex research and publishing ecosystem which extends beyond the boundaries of the institution.
Increasingly, services are making APIs available that can provide valuable information to repositories, such as bibliographic data from Web of Science and Scopus, and copyright and funders’ policy information from RoMEO and JULIET.
While some of these linkages are relatively unproblematic to implement (albeit sometimes at the price of subscriptions to resources which the institution might not otherwise have required), others are proving very complex. Even the more straightforward operations can require care in matching, reviewing and de-duplicating imported data, which can be very time-consuming. Bulk import of data is discussed in the Maximising deposit through embedding workflows section and the Metadata creation and flows section.
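To give a sense of the matching step, here is a minimal sketch of one possible approach: treat a record as a likely duplicate if it shares a DOI, or a normalised title and year, with something already held. The field names (`doi`, `title`, `year`) and the function names are assumptions for illustration, and anything flagged would still need human review.

```python
import re
import unicodedata

def normalise_title(title):
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())

def match_keys(record):
    """Return the keys under which a record can be matched (DOI and/or title+year)."""
    keys = set()
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        keys.add(("doi", doi))
    title = normalise_title(record.get("title") or "")
    if title:
        keys.add(("title", title, record.get("year")))
    return keys

def split_imports(existing, incoming):
    """Separate incoming records into new items and likely duplicates for review."""
    seen = set()
    for record in existing:
        seen |= match_keys(record)
    new, duplicates = [], []
    for record in incoming:
        keys = match_keys(record)
        if keys & seen:
            duplicates.append(record)
        else:
            new.append(record)
            seen |= keys  # also catches duplicates within the same batch
    return new, duplicates
```

Exact matching on a normalised title is deliberately conservative. It will miss variant titles, which is one reason the review step remains time-consuming.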
APIs may be implemented to work through a CRIS or directly with the repository itself.
An example: RoMEO
Since concerns over copyright are recognised to be one of the key barriers to greater levels of self-archiving, there has been significant work in integrating information about copyright status. The API provided by RoMEO allows information on publishers’ self-archiving policies to be used in the permissions/rights metadata for records in the repository itself or via a CRIS.
As John Fearns of Symplectic has said:
“Anything that allows clarification of publisher policy by a machine (in the context of a particular publication) will help greatly. Librarians are increasingly asking for RoMEO data to be passed through into digital repositories.”
All the major research publication management systems/CRISs have implemented ways of bringing RoMEO information into the record for a publication, so that researchers and administrators can see the copyright position for their particular paper as they decide whether to deposit the full text. Screenshots of the results can be seen in the various presentations by PURE, Symplectic and Converis here.
The experience of St Andrews with the API as implemented through PURE shows that the process is not without pitfalls and that the user interface could be improved.
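The kind of translation these systems perform can be sketched as follows, using a simplified reading of the RoMEO colour codes (green, blue, yellow, white) to compose a note for a record. This is not how any of the named products work internally; the function name and the note wording are hypothetical, and real publisher policies carry conditions that a colour alone does not capture.

```python
# Simplified reading of the RoMEO colour codes. Real policies carry
# conditions and restrictions that this table ignores.
ROMEO_COLOURS = {
    "green": {"pre-print", "post-print"},
    "blue": {"post-print"},
    "yellow": {"pre-print"},
    "white": set(),
}

def rights_note(journal, colour):
    """Compose a short rights note for a repository record (hypothetical field)."""
    key = (colour or "").lower()
    allowed = ROMEO_COLOURS.get(key)
    if allowed is None:
        return f"{journal}: no RoMEO colour recorded; check the publisher policy manually."
    if not allowed:
        return f"{journal}: self-archiving not formally supported (RoMEO white)."
    versions = " and ".join(sorted(allowed))
    return (f"{journal}: {versions} may be archived (RoMEO {key}), "
            "subject to publisher conditions.")

print(rights_note("Journal of Examples", "blue"))
```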
“Clearance of full text for release into the repository is done in PURE, embargoes are set in PURE and RoMEO is used as part of the PURE workflow. The API produces no results for some publishers e.g. Elsevier, Taylor and Francis because of the absence of ISSN in publication metadata in PURE. PURE only calls the API for the article publication type, not for other publication types. We cannot currently automatically use the publishers prescribed rights statement to populate a rights field in PURE and it can be quicker to use RoMEO independently. We still maintain a local knowledge base, i.e. print outs of particular publisher policies. Our wish list is:
- Disambiguation in journal title hits retrieved by the API
- Interactive data i.e. live links. Currently the data in PURE is flat and has no hyperlinks
- Automatic population of rights statements
- Keep a local knowledge base within PURE
- Clever evaluation of publication dates to calculate embargo period
- Something simple for end users and depositors, especially in startup advocacy phases, but with the option to have something more complex and sophisticated for administrators (and perhaps for more mature repositories, with more informed depositors)
- Can we send any useful data back to RoMEO? e.g. recent dialogues with publishers?”
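The wish for evaluation of publication dates to calculate an embargo period could be met in several ways. The following is a minimal sketch of one of them, assuming the embargo is expressed as a whole number of months from the publication date. The function names are invented, and the rule for month-end dates (clamping to the last day of the target month) is an assumption rather than anything a publisher policy specifies.

```python
import calendar
from datetime import date

def add_months(start, months):
    """Return the date `months` after `start`, clamped to the month's last day."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))

def embargo_status(published, embargo_months, today):
    """Return (release_date, released) for a full text under embargo."""
    release = add_months(published, embargo_months)
    return release, today >= release

release, released = embargo_status(date(2010, 11, 15), 12, date(2011, 6, 1))
print(release.isoformat(), released)
```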
Clearly, some of the problems arise from limitations in RoMEO itself (dealing with often ambiguous publisher policy statements is not straightforward) and some from the interaction between RoMEO and the research publication management systems/CRISs. The next release of the API, which will deliver journal-level results, may resolve some of these problems; version 3.6 of the RoMEO website already delivers them.
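One of the interaction problems reported by St Andrews, lookups failing because the ISSN is absent from the publication metadata, can at least be detected before an API call is attempted. The sketch below validates an ISSN using the standard check-digit calculation and lists the records that would need attention. The record structure and function names are assumptions for illustration.

```python
import re

def normalise_issn(raw):
    """Return the ISSN as 'NNNN-NNNC' if well formed with a valid check digit, else None."""
    if not raw:
        return None
    compact = re.sub(r"[\s-]", "", raw).upper()
    if not re.fullmatch(r"[0-9]{7}[0-9X]", compact):
        return None
    # Weights 8 down to 2 are applied to the first seven digits.
    total = sum(int(d) * w for d, w in zip(compact[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    if compact[7] != expected:
        return None
    return compact[:4] + "-" + compact[4:]

def records_needing_issn(records):
    """List records whose ISSN is missing or invalid, so a policy lookup would fail."""
    return [r for r in records if normalise_issn(r.get("issn")) is None]

sample = [
    {"title": "Paper A", "issn": "0317 8471"},
    {"title": "Paper B", "issn": None},
]
print([r["title"] for r in records_needing_issn(sample)])
```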
Approaches to sharing resources and data
Import of bibliographic metadata from other repositories such as arXiv, PubMed and UK PubMed, and from Web of Science and Scopus, is discussed in the Metadata creation and flows section; experiences of doing so are also included in the final report of the IncReASe project. There are plug-ins for EPrints for PubMedID and PubMedXML import.
Funding body mandates, and in some cases disciplinary preferences (such as for arXiv), mean that researchers may be required or may want to deposit in repositories other than their own institutional one. Rather than impose extra burdens on researchers and attempt to change their behaviour, repositories recognise that it is better to harvest metadata from the other repositories.
There is a lot of interest in helping researchers to fulfil funding body mandates, but so far this is mainly limited to reminding researchers by associating outputs with project funding data. There do not yet seem to be implementations which allow researchers to deposit in more than one repository at the same time. The aim of the Repository Junction project is precisely to assist open access deposit into, and interoperability between, existing repository services by developing a deposit broker system. For example, one deposit into UKPubMedCentral will populate the institutional repositories of all the authors on a collaborative project. See Open Access Repository Junction.
The White Rose repository held discussions with the ESRC and supported its efforts to make deposit easier by use of the SWORD protocol, which should allow any SWORD-compliant repository to deposit ESRC-funded outputs into the ESRC repository, either by pushing metadata and files from a local repository into the ESRC repository or through harvesting. See here.
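To make the "push" option more concrete, the sketch below prepares (but does not send) an HTTP POST of a zipped package to a deposit collection. It assumes the header names of the SWORD 2.0 profile; earlier SWORD versions use different header names, and the version used in the work described here is not stated. The collection URL, file name and function names are invented for illustration.

```python
from urllib.request import Request

def sword_deposit_headers(filename, in_progress=False, on_behalf_of=None):
    """Headers for a binary package deposit, assuming the SWORD 2.0 profile."""
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"attachment; filename={filename}",
        "Packaging": "http://purl.org/net/sword/package/SimpleZip",
        "In-Progress": "true" if in_progress else "false",
    }
    if on_behalf_of:
        headers["On-Behalf-Of"] = on_behalf_of  # mediated deposit for another user
    return headers

def build_deposit_request(collection_url, payload, filename, **options):
    """Prepare the POST request; the caller decides when (and whether) to send it."""
    headers = sword_deposit_headers(filename, **options)
    return Request(collection_url, data=payload, headers=headers, method="POST")

# Hypothetical collection URL and placeholder payload.
request = build_deposit_request(
    "https://funder-repository.example.org/sword/collection/outputs",
    b"placeholder bytes standing in for a zip package",
    "output-123.zip",
)
print(request.get_method(), request.full_url)
```

Authentication, the service document lookup and handling of the deposit receipt are all omitted here.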
Besides the normal exposure for harvesting through OAI-PMH, OAI-ORE may also be used, which allows repositories to share aggregations of Web resources, referred to as compound digital objects. For a discussion of OAI-ORE and its relevance to repositories, see here.
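A compound digital object can be pictured as a small set of statements: a resource map describes an aggregation, and the aggregation aggregates its parts. The sketch below writes those statements as N-Triples using the OAI-ORE vocabulary terms. The example URIs and function names are invented, and a complete resource map would also need metadata such as its creator and modification date, which is left out here.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def resource_map_triples(rem_uri, aggregation_uri, part_uris):
    """Return (subject, predicate, object) URI triples for a basic resource map."""
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", aggregation_uri),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    triples += [(aggregation_uri, ORE + "aggregates", part) for part in part_uris]
    return triples

def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

# Hypothetical repository item made up of an article PDF and a dataset.
item = "https://repository.example.ac.uk/id/eprint/123"
print(to_ntriples(resource_map_triples(
    item + "/rem",
    item + "#aggregation",
    [item + "/article.pdf", item + "/data.csv"],
)))
```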
There is an increasing interest in Linked Data throughout and beyond the library world. Exporting data on repository holdings as Linked Data offers repositories an opportunity to integrate with a greater universe of information resources as part of the Semantic Web. Work has been done by EPrints on this as part of the JISC dotAC project and there has also been work by DSpace. A UKOLN presentation suggests that Linked Data could be a way for repositories which are currently “hidden” to become truly “of the Web”.
Resources on interoperability
The aim of the RIO project is “to collect, analyse and review information about the opportunities for interoperability between repositories and other systems and services”. It will form a “collection of guidelines, briefing documents and other resources housed in an EPrints-based repository. This repository, RIO (Repository Interoperability Opportunities) will:
- capture information about a variety of services (basic service metadata e.g. title, URL, description, classification)
- capture objective information about the technical interoperability issues (e.g. metadata schemas, APIs) and legal concerns (e.g. licensing, copyright)
- collect subjective community judgments about the risks and benefits associated with working with each service
- identify exemplars of interoperability
- identify sources of expertise