According to a survey of repository managers undertaken as part of the JISC Names project in July 2010, over three-quarters of respondents had encountered problems relating to identification of authors, including issues around variant forms of names or materials being wrongly attributed to authors with similar names.
Names and email addresses (normally as part of registration processes) are generally the most commonly required metadata elements when materials are submitted to a repository. However, fewer than 20% of repository managers said that they were able to control or disambiguate all names, which shows that this is a significant issue. Fewer than half of respondents (38.5%) said that unique identifiers (mainly email addresses or staff or student numbers) are assigned to authors or contributors upon registration or submission of materials to the repository.
Cross-referencing or merging names (to deal with changes of name or different forms of entry) was not enabled by the software being used in nearly half of repositories.
This raises considerable difficulties for users searching in a repository or across repositories – they may have to search on variants of an author’s name, or their search may retrieve works by different authors who share the same name. The names issue can also cause particular problems when checking data for submission to research evaluations. Embedding the repository with links to other institutional and external systems has the potential to contribute to solving the problem, though this does not occur automatically.
For example, the University of Glasgow has created a Glasgow author authority listing, based on user records imported from its data vault. Users now log in with their Glasgow Unique Identifier (GUID). Regardless of the cited form of the author’s name, or whether they have published under other names (e.g. a maiden name), the publication can be associated with the author name in the authority file (through the addition of the GUID field in the author record) and can be browsed to using that list.
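The idea of an authority listing keyed on a local identifier can be sketched in a few lines of Python. This is only an illustration of the general technique, not Glasgow's implementation: the record layout, the field names (`author_id`, `preferred_name`, `cited_name`) and the sample identifier are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical authority file: one record per author, keyed by a local
# unique identifier (standing in for something like a GUID).
authority = {
    "guid-0001": {
        "preferred_name": "Smith, Jane A.",
        "variants": {"Smith, J.", "Jones, Jane"},  # e.g. a former name
    },
}

# Hypothetical publication records: each keeps the name exactly as cited,
# plus the identifier of the author it belongs to.
publications = [
    {"title": "Paper one", "cited_name": "Jones, Jane", "author_id": "guid-0001"},
    {"title": "Paper two", "cited_name": "Smith, J.", "author_id": "guid-0001"},
]


def browse_by_author(publications, authority):
    """Group publication titles under each author's preferred name."""
    index = defaultdict(list)
    for pub in publications:
        record = authority.get(pub["author_id"])
        # Fall back to the cited name when no authority record exists.
        heading = record["preferred_name"] if record else pub["cited_name"]
        index[heading].append(pub["title"])
    return dict(index)


browse_list = browse_by_author(publications, authority)
```

Because the link is made through the identifier rather than the name string, both papers appear under a single heading in `browse_list` even though they were cited under different forms of the name.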
External sources of names data
The availability of external data on authors which can be imported via APIs is discussed in the section on Metadata Creation and Flows. Although access to these sources is not dependent on any embedding process, the survey suggests that most repositories (61.5%) do not import records from sources such as Thomson Reuters’ Web of Science. Cost could be the main barrier, though the survey did not ask about this. Of the repositories that do import records, many do not disambiguate names before ingest or match them with pre-existing identities on import. Staff resource may be a major factor here.
Shared services and identification projects
The Names project was tasked with investigating the possibility of a national shared resource for name disambiguation through establishing a name authority service. This would work by aggregating names data, cleaning it and augmenting it with institutional affiliation information and assigning a number to each author as a unique identifier. The basic records for the pilot service have mainly come from Zetoc. Researchers will be able to register and edit their own information. A prototype exists but it is necessary to identify a viable business model if a sustainable service is to be established.
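The aggregate, clean, augment and assign steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Names project's actual method: the `normalise` and `build_authority` functions, the record fields, and the choice of name plus affiliation as the matching key are all hypothetical.

```python
import unicodedata
from itertools import count


def normalise(text):
    """Reduce a string to a crude comparison key: no accents, punctuation or case."""
    decomposed = unicodedata.normalize("NFKD", text)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in unaccented)
    return " ".join(kept.lower().split())


def build_authority(raw_records):
    """Aggregate raw name records and assign a number to each distinct author.

    Assumption: two records describe the same person when their normalised
    name and normalised affiliation both match.
    """
    ids = count(1)
    authority = {}
    for rec in raw_records:
        key = (normalise(rec["name"]), normalise(rec["affiliation"]))
        if key not in authority:
            authority[key] = {
                "id": next(ids),
                "names": set(),
                "affiliation": rec["affiliation"],
            }
        # Keep every form of the name seen for this author.
        authority[key]["names"].add(rec["name"])
    return authority


# Invented sample data for illustration only.
raw = [
    {"name": "Müller, A.", "affiliation": "Univ. of X"},
    {"name": "Muller, A", "affiliation": "Univ of X"},
    {"name": "Muller, A.", "affiliation": "Univ of Y"},
]
authors = build_authority(raw)
```

Here the first two records collapse into one author, while the third is kept separate because the affiliation differs. A key this simple would wrongly split a researcher who moves institution, which is one reason a real service would need richer evidence and a way for researchers to edit their own records.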
The project has recently completed an addition to the records, using the cleaned-up RAE 2008 data from the MERIT project (which was designed to showcase the best UK research) to populate the system with approximately 45,000 names of UK researchers. The project team is looking to expand and improve that data, has used information supplied by Robert Gordon University from its repository as a test case, and is looking for other universities to take part. The team is also developing a plug-in for the EPrints repository software that will automatically query the Names data as part of the data-entry process.
There are a number of other initiatives in this area as well, including:
- Open Researcher and Contributor ID (ORCID), which is a community effort to establish an open, independent registry and to resolve systemic name ambiguity by assigning unique identifiers linkable to an individual’s research output. Participants are, in the main, universities, but publishers, learned societies and other organisations are also involved.
- International Standard Name Identifier (ISNI), which is a broader initiative to create an ISO Standard for unique Public Identities of parties across media and content areas, “throughout the chain of creation, production, management and distribution of intellectual or artistic contents. It will identify the Public Identities of Parties such as authors, composers, cartographers, performers, or publishers”. In this case, a party can be a person, a legal entity or a fictional character.