The archiving process consists of two elements. The first, the Ingest system, prepares the data for archiving and ensures that the accompanying descriptive information (metadata) is also available. The second is the archiving system itself (the repository), in which the digital publications are stored together with their metadata. It is essential that the metadata also be listed in a catalogue accessible to users.
The Swiss National Library (NL) has set up its digital repository according to the Reference Model for an Open Archival Information System (OAIS) conceived by the Consultative Committee for Space Data Systems.
The process that takes data from the producer, or from a source available on the Internet, through to its actual deposit in the repository is called Ingest.
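A minimal sketch of the Ingest step, using the OAIS terms Submission Information Package (SIP) and Archival Information Package (AIP). The class names, the required metadata fields and the use of SHA-256 as a fixity value are assumptions for illustration, not the NL's actual implementation. The sketch shows the two duties described above: data arriving without its descriptive metadata is rejected, and whatever passes is packaged for deposit in the repository.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SubmissionPackage:
    """Hypothetical SIP: the publication's bytes plus its descriptive metadata."""
    identifier: str
    content: bytes
    metadata: dict = field(default_factory=dict)


@dataclass
class ArchivalPackage:
    """Hypothetical AIP: what is actually deposited in the repository."""
    identifier: str
    content: bytes
    metadata: dict
    sha256: str


# Assumption: the minimal descriptive metadata a submission must carry.
REQUIRED_FIELDS = ("title", "creator")


def ingest(sip: SubmissionPackage) -> ArchivalPackage:
    """Check a submission and turn it into an archival package."""
    missing = [f for f in REQUIRED_FIELDS if not sip.metadata.get(f)]
    if missing:
        # Reject rather than archive data whose metadata is incomplete.
        raise ValueError(f"{sip.identifier}: missing metadata {missing}")
    # Record a checksum so later copies can be verified against it.
    digest = hashlib.sha256(sip.content).hexdigest()
    return ArchivalPackage(sip.identifier, sip.content, dict(sip.metadata), digest)
```

Rejecting incomplete submissions at this point, rather than after deposit, keeps the repository and the user catalogue consistent with each other.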
Harvesting refers to collecting web sites from the Internet. When a site is harvested, special programs (harvesters), generally running on a server, follow all the links emanating from a page and download all the files that fall within the collection domain.
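The link-following behaviour of a harvester can be sketched as a breadth-first traversal restricted to the collection domain. In this hypothetical version, `fetch_links` is a callback that returns the links found on a page, so the example runs without network access; treating the collection domain as a host name plus its subdomains is likewise an assumption.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urljoin, urlparse


def in_scope(url: str, domain: str) -> bool:
    """Assumption: the collection domain is a host name and its subdomains."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def harvest(seed: str, domain: str,
            fetch_links: Callable[[str], Iterable[str]]) -> list[str]:
    """Follow every link breadth-first, keeping only in-scope URLs.

    `fetch_links(url)` is a hypothetical callback returning the raw link
    targets found on a page; a real harvester would download the page here.
    """
    seen = {seed}
    queue = deque([seed])
    collected = []
    while queue:
        url = queue.popleft()
        collected.append(url)  # stands in for "download this file"
        for href in fetch_links(url):
            # Resolve relative links and drop #fragments before comparing.
            target = urljoin(url, href).split("#", 1)[0]
            if target not in seen and in_scope(target, domain):
                seen.add(target)
                queue.append(target)
    return collected
```

The `seen` set ensures that each in-scope page is collected only once, even when many pages link back to it.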
Reliable quality assurance of web sites can only be achieved with a technical tool capable of analysing the documents collected from the Internet in detail and flagging errors. The International Internet Preservation Consortium (IIPC) is developing such a tool. Until it is available, quality assurance must be done manually and is therefore rather rudimentary.
The aim of quality assurance is not to check the quality of the web site itself but rather the quality of the collecting process.
At present, the NL stores the collected web sites in a closed web environment, where it carries out systematic manual checks.
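Because the aim is to check the collecting process rather than the site, one check that could partly automate the manual review is a search for gaps in a harvest: in-scope resources that captured pages refer to but that were never captured. The input format and the `in_scope` predicate below are assumptions; a real tool would read the links out of the archived files themselves.

```python
from typing import Callable
from urllib.parse import urljoin


def missing_references(captures: dict[str, list[str]],
                       in_scope: Callable[[str], bool]) -> set[str]:
    """Report in-scope URLs that captured pages link to but the harvest lacks.

    `captures` maps each captured URL to the links found in it (hypothetical
    input). `in_scope` decides whether a URL belongs to the collection domain.
    """
    missing = set()
    for page, links in captures.items():
        for href in links:
            # Resolve relative links and drop #fragments, as a harvester would.
            target = urljoin(page, href).split("#", 1)[0]
            if in_scope(target) and target not in captures:
                missing.add(target)
    return missing
```

A non-empty result points to a problem with the harvest (for example a scope setting or a failed download), not with the web site itself.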
The arrival of a new generation of harvesters, together with the introduction of the Web ARChive (WARC) data format developed specifically for preserving web sites, will greatly improve quality assurance. The tools currently under development are based on the WARC format.
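For orientation, the sketch below serialises a single WARC/1.0 record of type `resource`: a version line, named header fields, a blank line, the content block and two closing CRLFs. It is a simplified illustration of the record layout; harvesters typically write `response` records containing the full HTTP response, and the helper name `warc_resource_record` is hypothetical.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def warc_resource_record(target_uri: str, payload: bytes, content_type: str,
                         when: Optional[datetime] = None) -> bytes:
    """Serialise one WARC/1.0 'resource' record (header, block, trailing CRLFs)."""
    # WARC-Date is expressed in UTC, so normalise the timestamp first.
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Target-URI", target_uri),
        ("WARC-Date", when.strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),  # unique per record
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),  # length of the block in bytes
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    return head.encode("utf-8") + payload + b"\r\n\r\n"
```

Since every record carries its target URI, capture date and exact length, a quality-assurance tool can read an archive file record by record without relying on any external index.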
The process of quality assurance is described in the information sheet «Archiving of Web Archive Switzerland».
Rather than invest resources in developing its own metadata architecture, the NL uses existing XML formats. For the internal structure of the metadata, the NL uses the METS container, which is maintained by the Library of Congress. Bibliographic records are embedded in this container as MARCXML, which is also maintained by the Library of Congress and is compatible with MARC21, the metadata structure used in Helveticat. The non-bibliographic (technical and administrative) metadata, structured according to the «Preservation Metadata» schema developed by the National Library of New Zealand, is likewise embedded in the METS container.
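The nesting described above, with a MARCXML record in the descriptive section and technical metadata in the administrative section of one METS document, can be sketched with Python's standard `xml.etree.ElementTree`. The METS and MARCXML namespaces are the Library of Congress ones; the element names inside `techMD` and the `OTHERMDTYPE` label are placeholders, because the NLNZ schema itself is not reproduced here. A schema-valid METS file also requires a `structMap`, which this sketch omits.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MARC = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("mets", METS)
ET.register_namespace("marc", MARC)


def build_mets(title: str, technical: dict[str, str]) -> ET.Element:
    """Wrap a MARCXML record and technical metadata in one METS document."""
    mets = ET.Element(f"{{{METS}}}mets")

    # Descriptive (bibliographic) metadata: a MARCXML record with a 245 $a title.
    dmd = ET.SubElement(mets, f"{{{METS}}}dmdSec", ID="DMD1")
    wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", MDTYPE="MARC")
    data = ET.SubElement(wrap, f"{{{METS}}}xmlData")
    record = ET.SubElement(data, f"{{{MARC}}}record")
    field = ET.SubElement(record, f"{{{MARC}}}datafield",
                          {"tag": "245", "ind1": "0", "ind2": "0"})
    ET.SubElement(field, f"{{{MARC}}}subfield", code="a").text = title

    # Technical/administrative metadata. Hypothetical flat elements stand in
    # for the NLNZ preservation metadata schema.
    amd = ET.SubElement(mets, f"{{{METS}}}amdSec", ID="AMD1")
    tech = ET.SubElement(amd, f"{{{METS}}}techMD", ID="TECH1")
    twrap = ET.SubElement(tech, f"{{{METS}}}mdWrap",
                          MDTYPE="OTHER", OTHERMDTYPE="NLNZ-PM")
    tdata = ET.SubElement(twrap, f"{{{METS}}}xmlData")
    for name, value in technical.items():
        ET.SubElement(tdata, name).text = value
    return mets
```

Keeping both kinds of metadata in a single container means one file travels with the publication through Ingest and into the repository.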
A persistent identifier (unique identifier) must fulfil two requirements:
- be a unique identifier for each and every document in the repository
- be a stable reference to an online data source (links have proven to be highly unstable)
The NL decided to use Uniform Resource Names (URNs) in the form of National Bibliography Numbers (NBNs), because URNs fulfil both of these requirements.
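The two requirements can be illustrated with a small, hypothetical registry: each document receives an identifier that is never reused, and when a file moves only the registered location changes, so citations using the identifier keep working. The prefix and numbering scheme are placeholders; real URN:NBN strings follow RFC 3188 and the rules of the assigning agency, which this sketch does not model.

```python
class IdentifierRegistry:
    """Hypothetical registry: one URN per document, mapped to its current location."""

    def __init__(self, prefix: str):
        self.prefix = prefix  # placeholder prefix, e.g. "urn:nbn:ch:example"
        self._next = 1
        self._locations: dict[str, str] = {}

    def assign(self, location: str) -> str:
        """Requirement 1: mint an identifier that has never been used before."""
        urn = f"{self.prefix}-{self._next}"
        self._next += 1
        self._locations[urn] = location
        return urn

    def move(self, urn: str, new_location: str) -> None:
        """Requirement 2: the file moves, but its identifier stays the same."""
        if urn not in self._locations:
            raise KeyError(urn)
        self._locations[urn] = new_location

    def resolve(self, urn: str) -> str:
        """Return the current location registered for an identifier."""
        return self._locations[urn]
```

This indirection is what makes the identifier stable: a bare link breaks when the file moves, whereas the identifier only needs its registry entry updated.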
As part of its collaboration with the Deutsche Nationalbibliothek (DNB), the NL can use the DNB's URN resolver to translate URNs into links.
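Translating a URN into a link then amounts to combining it with the resolver's address. The sketch below assumes a resolver that accepts the URN as the final path segment of its URL; since the DNB service address is not given here, `base_url` is left as a parameter and the helper name is hypothetical.

```python
from urllib.parse import quote


def resolver_link(urn: str, base_url: str) -> str:
    """Build a resolver URL for a URN:NBN (assumed path-append convention)."""
    # The "urn" and "nbn" parts of a URN are case-insensitive.
    if not urn.lower().startswith("urn:nbn:"):
        raise ValueError(f"not a URN:NBN: {urn}")
    # Keep ':' unescaped so the URN stays readable in the link.
    return base_url.rstrip("/") + "/" + quote(urn, safe=":-._")
```

Publications then cite the URN, and links are generated from it on demand, so no stored link has to change when the resolver or the file location does.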
The digital archiving infrastructure previously consisted of two tape drives. At the beginning of 2009 these were replaced by the long-term storage system Ninive, which essentially consists of a redundant NAS (Network Attached Storage) system from Network Appliance. The two system components, each with a storage capacity of 9 TB, are located at two sites in Bern about 4.5 km apart. Automatic synchronisation between them ensures that a complete copy of the stored data is available at both locations. At the secondary location, a third copy of the data is written to magnetic tape using an IBM tape drive and stored separately.
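One way to confirm that the two locations hold complete, identical copies is a fixity comparison: compute a checksum for every file at each site and compare the two lists. This is a hypothetical illustration of the idea, not a description of how Ninive's NAS replication is actually verified.

```python
import hashlib
from pathlib import Path


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(primary: dict[str, str],
                      secondary: dict[str, str]) -> dict[str, list[str]]:
    """List files missing at either site or whose contents differ."""
    return {
        "missing_at_secondary": sorted(primary.keys() - secondary.keys()),
        "missing_at_primary": sorted(secondary.keys() - primary.keys()),
        "different": sorted(k for k in primary.keys() & secondary.keys()
                            if primary[k] != secondary[k]),
    }
```

Usage: `compare_manifests(manifest(Path(site_a)), manifest(Path(site_b)))`, where `site_a` and `site_b` are the mount points of the two copies; three empty lists mean both sites are complete and identical.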
Last updated on: 17.12.2010