RSP (JISC Repository Net)

Metadata & Workflow

Metadata

Metadata is information about information, or data about data. An institutional repository contains a metadata record for each item of content it holds, and that metadata can be collected at various stages as an item is ingested into the repository.

In the context of an institutional repository, metadata is needed to make your repository content discoverable. Resource discovery is enabled by assigning relevant descriptive information to each content item. Metadata:

  • helps users identify resources.
  • brings similar resources together.
  • distinguishes dissimilar resources.
  • gives location information.
  • is essential to facilitate harvesting of your repository content by external systems.
  • helps you organise your repository content and supports archiving and preservation.
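Harvesting by external systems is commonly done over OAI-PMH, which requires every compliant repository to be able to supply records in simple (unqualified) Dublin Core, the `oai_dc` format. The sketch below is a minimal, hypothetical example: it assumes a record is held as a Python dict mapping Dublin Core element names to lists of values, and it omits the `xsi:schemaLocation` attribute that a real OAI-PMH response would normally carry.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the oai_dc metadata format that OAI-PMH requires.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Register prefixes so the output uses oai_dc: and dc: rather than ns0:/ns1:.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def to_oai_dc(record: dict[str, list[str]]) -> str:
    """Serialise a {dc_element: [values]} mapping as an oai_dc XML string."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in record.items():
        # Dublin Core elements are repeatable, so emit one child per value.
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record for a single repository item (illustrative values only).
example = {
    "title": ["Sediment transport in upland rivers"],
    "creator": ["Smith, Jane"],
    "subject": ["Hydrology", "Geomorphology"],
    "date": ["2008"],
    "type": ["Article"],
    "identifier": ["http://repository.example.ac.uk/123/"],
}

print(to_oai_dc(example))
```

Because every item's record can be exposed in this common format, a harvester can aggregate content from many repositories without knowing how each one stores its metadata internally.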

A metadata schema is a set of metadata elements designed for a specific purpose, such as describing a particular type of information resource. Repository administrators will need to define their metadata schemas at an early stage of repository installation. The schemas used will vary with the types of content being stored, and an institutional repository manager is likely to need to define schemas for everything from relatively simple text-based materials through to more complex multimedia objects. When defining your schema, it is important to consider local needs such as departmental and research structures, and any local decisions needed about subject fields. Most people will also need to extend their schemas to cover new types of material as the repository grows. In practice, at least at first, most people will work with the metadata schema that comes with the out-of-the-box installation of their chosen software; extra fields can then be added and the schema customised over time. More information on metadata formats.
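A schema can be pictured as a small data structure: a set of named elements, each with rules such as whether it is required or repeatable. The sketch below is hypothetical (the `Element` and `Schema` classes and the field names are illustrative, not any repository platform's API). It shows a basic text schema being extended later with a local `department` field and then used to check a record.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Element:
    """One metadata element in a schema (hypothetical structure)."""
    name: str
    required: bool = False
    repeatable: bool = False


@dataclass
class Schema:
    """A metadata schema: a named set of elements for one content type."""
    name: str
    elements: dict[str, Element] = field(default_factory=dict)

    def add(self, element: Element) -> None:
        # Adding (or redefining) an element extends the schema in place.
        self.elements[element.name] = element

    def validate(self, record: dict[str, list[str]]) -> list[str]:
        """Return a list of problems; an empty list means the record is valid."""
        problems = []
        for el in self.elements.values():
            values = record.get(el.name, [])
            if el.required and not values:
                problems.append(f"missing required element: {el.name}")
            if not el.repeatable and len(values) > 1:
                problems.append(f"element not repeatable: {el.name}")
        for name in record:
            if name not in self.elements:
                problems.append(f"unknown element: {name}")
        return problems


# A basic schema for text-based items, similar in spirit to an out-of-the-box default.
text_schema = Schema("text")
text_schema.add(Element("title", required=True))
text_schema.add(Element("creator", required=True, repeatable=True))
text_schema.add(Element("date"))
text_schema.add(Element("subject", repeatable=True))

# Later customisation: a local field reflecting the institution's departmental structure.
text_schema.add(Element("department", required=True))

record = {"title": ["Sediment transport"], "creator": ["Smith, Jane"]}
print(text_schema.validate(record))  # ['missing required element: department']
```

Keeping the rules in the schema rather than scattering them across input forms means a new local field only needs to be defined in one place as the repository grows.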

Subject classification

The use of a defined subject classification scheme in institutional repositories is optional, and an interesting debate has emerged about its value. Some ask why anyone should spend time classifying repository content when the full text of each item will be indexed anyway. Some go further and question whether free-text keywords are needed in the metadata at all, since indexing will do the job automatically. On the other hand, those from a stronger library background may argue that an official classification scheme improves subject discovery of content, in particular by offering better ways of browsing items within the repository. The choice of whether to use an official scheme lies with the institution itself and will largely depend on the resources available for metadata input and the level of mediation planned in the content ingest workflow. At root: how many academics will have the time or inclination to use an official classification scheme, and is classifying all incoming content a valuable use of the repository administrator's time?

To map the current use of subject classification in repositories, the RSP carried out some quick, informal research using the OpenDOAR directory. The research looked at approximately 60 sites, and the key findings were as follows:

  • 19 repositories used some kind of official classification scheme for subject discovery of their content. These were exclusively EPrints sites. Of these 19 sites, most (14) used Library of Congress Subject Classification, but JACS, JEL and what looked like custom schemes were also in evidence.
  • 9 DSpace sites offered a specific subject browse but these appeared to be an index built automatically from the free text keywords in the metadata records. No DSpace sites used a formal classification scheme.
  • 31 sites did not offer any subject browse other than a Department/Faculty/School browse option, although, confusingly, quite a few sites labelled this option a subject browse.
  • 16 sites showed no evidence of using free-text keywords in their metadata. 7 of these 16 offered a full classification scheme, while 9 appeared to offer neither keywords nor subject headings in their metadata.
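The keyword-based subject browse seen on some DSpace sites illustrates the "let the indexing do it" side of the debate. The sketch below is a hypothetical illustration of the general idea, not DSpace's actual implementation: it builds an alphabetical browse index from free-text keywords, normalising only case and surrounding whitespace.

```python
from collections import defaultdict


def build_keyword_browse(records: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each normalised keyword to the sorted IDs of items that use it."""
    index: dict[str, set[str]] = defaultdict(set)
    for item_id, keywords in records.items():
        for kw in keywords:
            term = kw.strip().lower()  # normalise case and whitespace only
            if term:
                index[term].add(item_id)
    # Present the browse list alphabetically by term.
    return {term: sorted(ids) for term, ids in sorted(index.items())}


# Hypothetical item IDs and author-supplied keywords.
records = {
    "item-1": ["Hydrology", "Rivers"],
    "item-2": ["hydrology ", "Climate change"],
    "item-3": ["Rivers"],
}

browse = build_keyword_browse(records)
for term, ids in browse.items():
    print(term, ids)
```

An index like this costs nothing to maintain, but it only groups exact matches: "River" and "Rivers", or a synonym, would appear as separate entries, which is precisely the gap a controlled classification scheme is meant to close.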

Workflow

Workflows are a breakdown of the administrative tasks needed within a repository. They allow the various activities involved in running the repository to be assigned to the individuals or groups best able to deal with them. A submission workflow defines the steps involved in adding content to the repository: gathering the necessary metadata, permissions and files associated with the content, and carrying out all the necessary checks on these elements before making the item available to the wider world.
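One way to picture a submission workflow is as an ordered list of steps, each assigned to a role, with the item only going live after the final check. The sketch below is hypothetical: the step names, role names and `complete_step` function are illustrative, and real platforms such as DSpace and EPrints configure their workflows in their own ways.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    role: str  # who is assigned this task (hypothetical role names)


@dataclass
class Submission:
    title: str
    completed: list[str] = field(default_factory=list)
    public: bool = False


# Hypothetical submission workflow: gather files, metadata and permissions, then check.
WORKFLOW = [
    Step("deposit files", "author"),
    Step("enter metadata", "author"),
    Step("confirm permissions", "repository administrator"),
    Step("check metadata and files", "repository administrator"),
]


def complete_step(sub: Submission, step_name: str, role: str) -> None:
    """Mark the next step done, enforcing step order and role assignment."""
    if sub.public:
        raise ValueError("item is already available")
    next_step = WORKFLOW[len(sub.completed)]
    if step_name != next_step.name:
        raise ValueError(f"expected step '{next_step.name}', got '{step_name}'")
    if role != next_step.role:
        raise PermissionError(f"'{step_name}' is assigned to {next_step.role}")
    sub.completed.append(step_name)
    # Only once every step has been completed is the item made available.
    if len(sub.completed) == len(WORKFLOW):
        sub.public = True


item = Submission("Sediment transport in upland rivers")
complete_step(item, "deposit files", "author")
complete_step(item, "enter metadata", "author")
complete_step(item, "confirm permissions", "repository administrator")
print(item.public)  # False
complete_step(item, "check metadata and files", "repository administrator")
print(item.public)  # True
```

Writing the steps down as data like this makes it easy to see who does what, and to reorder or reassign steps when testing with users shows the draft workflow needs to change.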

Spend some time thinking about your submission workflows at an early stage in your repository development; this task often works well alongside defining your metadata schema. It is worth having thought about and discussed submission workflows before you configure and customise your repository software, as your input forms will need to reflect the choices you have made. Once you have draft workflows in place, we recommend testing them with a group of your users and remaining flexible so that your workflows can adapt over time.