The Ingest process

The Ingest process takes over publications intended for the archives from their producers, then reviews, prepares and finally files them away together with the relevant metadata. In addition, access systems such as the catalogue of the Swiss National Library (NL) are updated with information on publications newly received into the archives.
The Ingest process comprises a series of individual, largely automated steps:

1. Data acceptance

The Ingest process regularly checks the interfaces with producers and brings the data and metadata into the process.
In essence, new publications can either be reported via a web form or deposited directly as a ZIP file via WebDAV on a server at the Federal Office of Information Technology, Systems and Telecommunication (FOITT). In addition, the NL can use OAI-PMH (Open Archives Initiative - Protocol for Metadata Harvesting) to search the catalogues of university libraries for new dissertations and receive the metadata that way.
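The OAI-PMH side of this acceptance step can be sketched as follows. This is a minimal illustration, not the NL's actual harvester: the endpoint URL is hypothetical, and only the request construction and a simple Dublin Core title extraction are shown.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Hypothetical endpoint; real university-library OAI-PMH endpoints differ.
BASE_URL = "https://example-university.ch/oai"

def build_listrecords_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (protocol version 2.0)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urllib.parse.urlencode(params)

def extract_titles(oai_xml):
    """Pull Dublin Core title values out of a ListRecords response."""
    root = ET.fromstring(oai_xml)
    dc_title = "{http://purl.org/dc/elements/1.1/}title"
    return [t.text for t in root.iter(dc_title)]

# Request all new dissertations (the set name is an assumption).
url = build_listrecords_url(BASE_URL, set_spec="dissertations")
```

The response is paged via resumption tokens in the real protocol; handling them is omitted here.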

2. Unpack/Decompress data

In the event that data has been delivered as ZIP files, it needs to be decompressed.
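In code, this decompression step might look like the following sketch, which also refuses archive entries that would escape the target directory (a standard precaution when unpacking third-party ZIP files; the NL's actual implementation is not documented here).

```python
import zipfile
from pathlib import Path

def unpack_delivery(zip_path, target_dir):
    """Decompress a delivered ZIP package into a working directory,
    rejecting entries that would escape it (zip-slip protection)."""
    target = Path(target_dir).resolve()
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            dest = (target / member).resolve()
            if not str(dest).startswith(str(target)):
                raise ValueError(f"unsafe path in archive: {member}")
        zf.extractall(target)
    return sorted(p.name for p in target.iterdir())
```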

3. Convert metadata

The delivery format of the metadata is agreed upon with the data producer. Normally the metadata is accepted in the form in which it occurs in the producer's prepress process.
Generally speaking, these are XML or SGML files that are transformed via XSLT into the NL's internal format (see Wikipedia for explanations of the technical terms). This internal format consists of a METS container which holds the bibliographic metadata in MARCXML and uses the PRESMET (Preservation Metadata) format of the National Library of New Zealand for the technical and administrative metadata.
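The shape of that internal format can be illustrated with a heavily simplified sketch: a MARCXML record wrapped in a METS descriptive metadata section. The real container carries more sections (notably the PRESMET technical metadata), which are omitted here.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MARC_NS = "http://www.loc.gov/MARC21/slim"

def wrap_in_mets(marc_record_xml):
    """Wrap a MARCXML record in a minimal METS container.
    Sketch only: the NL's actual internal format is richer."""
    mets = ET.Element(f"{{{METS_NS}}}mets")
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", ID="DMD1")
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", MDTYPE="MARC")
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    xml_data.append(ET.fromstring(marc_record_xml))
    return ET.tostring(mets, encoding="unicode")

marc = f'<record xmlns="{MARC_NS}"><leader>00000nam a2200000 a 4500</leader></record>'
mets_doc = wrap_in_mets(marc)
```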

4. Import electronic publications

Very often, at data acceptance, the first step of the Ingest process, only metadata is delivered. The Ingest system must then itself retrieve the online publication to which the metadata links. This step of the process can harvest individual PDF documents as well as entire websites.
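For the single-PDF case, this retrieval step can be sketched as below. The `<url>` element name is an assumption for the sketch; whole-website harvesting needs a crawler and is not shown.

```python
import urllib.request
import xml.etree.ElementTree as ET

def publication_url(metadata_xml):
    """Read the link to the online publication from the delivered
    metadata (element name <url> is an assumption for this sketch)."""
    root = ET.fromstring(metadata_xml)
    node = root.find(".//url")
    return node.text if node is not None else None

def harvest(metadata_xml, timeout=30):
    """Fetch the publication the metadata points at (single-document
    case only)."""
    url = publication_url(metadata_xml)
    if url is None:
        raise ValueError("metadata carries no publication link")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```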

5. Quality assurance

Quality controls are performed at several stages of the process. They check for the absence of viruses, the completeness of the delivered data packages, the file formats, the conformity of the metadata to the agreed metadata schema, and duplicate delivery of data packages. JHOVE is used to validate file formats.
Rights clearance is another inherent aspect of quality control. On the one hand, these rights concern the right to archive the digital publication in question, to make as many copies of it as the NL deems necessary, and to apply the necessary preservation measures such as migrations. On the other hand, this step of the process includes negotiating access rights with the producers; these can range from completely blocking access to the document for years to come to free access to the digital publication. For documents of the Swiss Official Gazette of Commerce (SOGC), the validity of the digital signature on each individual document is checked and documented as part of quality control.
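Two of the automated checks above, format identification and duplicate-delivery detection, can be sketched as follows. JHOVE performs far deeper format validation than this magic-byte test; the SHA-256 digest as duplicate criterion is likewise an assumption for illustration.

```python
import hashlib

def looks_like_pdf(data):
    """Cheap file-format check via magic bytes; JHOVE does full
    validation, this only illustrates the idea."""
    return data.startswith(b"%PDF-")

def package_digest(data):
    """SHA-256 digest of a data package, used here to detect
    duplicate deliveries."""
    return hashlib.sha256(data).hexdigest()

seen_digests = set()

def is_duplicate_delivery(data):
    """True if an identical package was already accepted."""
    digest = package_digest(data)
    if digest in seen_digests:
        return True
    seen_digests.add(digest)
    return False
```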

6. Version management

Just like a library catalogue, Ingest records the hierarchical relationships between archive packages. For serial publications, for example, there is an archive package for the title of the publication and further packages for each issue.
For websites that are harvested periodically, a record for the title is likewise created, and every snapshot of the website is attributed to this record.
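The title/issue hierarchy can be modelled minimally like this (a simplified data structure, not the NL's actual record format):

```python
class TitleRecord:
    """Archive package for a title; issue packages or website
    snapshots hang off it as children (simplified model)."""

    def __init__(self, title):
        self.title = title
        self.children = []

    def add_child(self, package_id):
        """Attach an issue or snapshot package to the title record."""
        self.children.append(package_id)
        return package_id

# A serial: one record for the title, one package per issue.
journal = TitleRecord("Example Journal")
journal.add_child("issue-2012-01")
journal.add_child("issue-2012-02")

# A periodically harvested website: one record, one package per snapshot.
site = TitleRecord("www.example.ch")
site.add_child("snapshot-2012-06")
```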

7. Persistent Identifier / URN

Each archive package receives a unique identifier in the form of a Uniform Resource Name of the National Bibliography Number type (URN NBN). This identifier, together with the accompanying URLs (Uniform Resource Locators), is sent by e-mail as an XML attachment to the resolver server of the Deutsche Nationalbibliothek (DNB). The Ingest system does not initiate the next steps of the process until the submitted URN can be resolved on the resolver server.
The NL assigns URNs on the basis of a numerus currens (sequential numbering). For archive packages that cannot be made available on the Internet, it uses an internal identifier with the same structure as a URN.
A complete URN is, for example, urn:nbn:ch:bel-2567, where the part bel-2567 serves as the internal identifier. This makes it easy to assign a URN to an archive package that is subsequently made available to the public: the only requirement is to prefix the internal identifier with "urn:nbn:ch:" and to deposit it on the DNB resolver server.
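The identifier scheme above reduces to two small string operations, sketched here using the example from the text (the "bel" stem is taken from that example; the counter is the numerus currens):

```python
def internal_identifier(stem, counter):
    """Numerus currens: sequentially numbered internal identifier,
    e.g. bel-2567."""
    return f"{stem}-{counter}"

def to_urn(internal_id):
    """Promote an internal identifier to a full URN by prefixing it
    with 'urn:nbn:ch:', as described in the text."""
    return f"urn:nbn:ch:{internal_id}"

to_urn(internal_identifier("bel", 2567))  # → 'urn:nbn:ch:bel-2567'
```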

8. Checks, completeness

The final control is an automated check to determine whether all the information generated in the course of the process is included in the metadata and whether the metadata is complete.
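Such a completeness check amounts to comparing the accumulated metadata against a list of required fields. The field names below are invented for illustration and are not the NL's actual schema.

```python
# Illustrative required fields; not the NL's real metadata schema.
REQUIRED_FIELDS = {"title", "urn", "checksum", "format", "access_rights"}

def missing_fields(metadata):
    """Return the required fields the accumulated metadata lacks,
    in alphabetical order; an empty list means the check passes."""
    return sorted(REQUIRED_FIELDS - metadata.keys())
```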

9. Feed into systems

This step of the process supplies information to the systems closely linked to Ingest. The complete metadata is deposited in XML format in Data Management. The Archival Information Package, which contains the digital publication itself along with its metadata, is stored as a tarball in long-term storage. The bibliographic metadata is transformed from MARCXML to MARC21 and transmitted in this form to Helveticat, the NL catalogue, which integrates it in an automated process.
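The tarball deposit mentioned above can be sketched with the standard library; directory layout and naming here are assumptions, not the NL's actual packaging.

```python
import tarfile
from pathlib import Path

def build_aip_tarball(package_dir, tar_path):
    """Bundle an Archival Information Package (publication plus
    metadata) into a gzipped tarball for deposit in long-term
    storage. Layout and compression choice are illustrative."""
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(package_dir, arcname=Path(package_dir).name)
    return tar_path
```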

10. Cleanse

Cleansing entails the deletion of superfluous information that has accumulated in an Archival Information Package during the process.
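Conceptually this is a filter over the package's accumulated data; the working-data key names below are invented for the sketch.

```python
# Invented examples of process-only working data; the real AIP's
# superfluous fields are not specified in the text.
WORKING_KEYS = {"tmp_paths", "virus_scan_log", "harvest_queue"}

def cleanse(aip_metadata):
    """Drop process-only working data accumulated in the AIP
    metadata, keeping everything destined for the archive."""
    return {k: v for k, v in aip_metadata.items() if k not in WORKING_KEYS}
```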


Last updated on: 12.12.2012
