FAQs on web archiving

General

WHY IS THE SWISS NATIONAL LIBRARY ARCHIVING THE WEB?
The Swiss National Library's mandate is to collect and archive cultural heritage and make it accessible. That includes preserving digital knowledge for the future. Together with the Swiss cantonal libraries and other special libraries, the Swiss National Library is pursuing the aim of preserving parts of the Swiss Internet in the form of Web Archive Switzerland.

WHAT STRATEGIES IS THE SWISS NATIONAL LIBRARY USING FOR WEB ARCHIVING?
The Swiss National Library has decided to take a selective approach. The focus is on freely accessible websites of patrimonial importance that have a close connection to Switzerland: for example, websites on the cantons and communes and on certain specialist areas, such as the social sciences and Swiss literature. The selection is primarily made by the Swiss cantonal libraries and other special libraries. Special collections are created to cover particular events that take place in Switzerland: websites that document an occasion such as the federal elections of 2011 are collected and archived. The idea is that the selected websites should provide the most meaningful snapshots possible of the Swiss web and preserve them for posterity. The selection criteria are described in detail in the "Collect" information sheet (in German). The National Library is not currently engaging in domain harvesting (top-level domain .ch).
IS THE SWISS NATIONAL LIBRARY PERMITTED TO ARCHIVE MY WEBSITE?
The Swiss National Library has the legal mandate to collect, list, and make available information on and about Switzerland, in print form or in electronic publications such as e-books and websites, and preserve it for the long term. The National Library informs website operators by e-mail or letter that it plans to "harvest" their website. In this way, operators have the opportunity to make their views known on the planned archiving of their website.
WHAT IS THE DIFFERENCE FROM THE INTERNET ARCHIVE?
The Internet Archive is a non-profit organisation that was founded in the US in 1996 with the aim of giving researchers worldwide access to historical digital collections. Web Archive Switzerland, by contrast, only collects and archives websites from the .ch domain or websites with a close connection to Switzerland.

IS THE WEB ARCHIVE EXPLOITED COMMERCIALLY?
The joint web archive of the Swiss National Library, the Swiss cantonal libraries and additional special libraries is used for historical purposes and is not commercially exploited.  Web Archive Switzerland is available to users on the premises of the National Library and those of partner libraries free of charge.

CAN USERS MISTAKE THE ARCHIVED VERSION OF MY WEBSITE FOR THE CURRENT VERSION?
Access to the websites is only possible on the premises of the Swiss National Library and its partner libraries, and not on the Internet. In addition, archive copies are clearly labelled as such. It is therefore virtually impossible for any confusion to occur.

WHO MUST I CONTACT FOR MORE INFORMATION ON WEB ARCHIVING AND WEB ARCHIVE SWITZERLAND?
The Coordination Office for Web Archive Switzerland will be happy to deal with any further questions, suggestions or criticisms.
HOW CAN I AS A WEBSITE OPERATOR SUPPORT WEB ARCHIVE SWITZERLAND?
The aim is to archive a snapshot of a website at regular intervals. Websites are not always designed in such a way that the harvester can collect them in their entirety. One key feature of a crawler-friendly website is links in HTML or XHTML format that are not embedded in Flash or JavaScript. Alternative navigation options via a text-based version or sitemap are also helpful. If you are interested in receiving further information, please contact the Coordination Office for Web Archive Switzerland.
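
As a rough illustration of why plain HTML links matter, the following Python sketch collects only the links that appear as href attributes of <a> elements, which is roughly what a simple link extractor sees when it does not execute scripts. The sample page, the LinkCollector class and the extract_links function are hypothetical and do not describe the harvester's actual implementation.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags; it does not execute any JavaScript."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page: one plain HTML link, one target reachable only via script.
SAMPLE_PAGE = """
<html><body>
  <a href="/about.html">About us</a>
  <span onclick="window.location='/news.html'">News</span>
</body></html>
"""


def extract_links(html):
    """Return the list of href targets found in plain <a> elements."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    print(extract_links(SAMPLE_PAGE))
```

In this sketch only /about.html is found. The script-driven "News" target stays invisible to the extractor, which is why an HTML sitemap or text-based navigation helps.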
WHAT IS THE SWISS NATIONAL LIBRARY DOING TO ENSURE THAT THE WEBSITES CAN CONTINUE TO BE CONSULTED IN FUTURE?
The long-term preservation module of the OAIS reference model, which the Swiss National Library uses as the basis for archiving electronic publications, is still in the project phase. The project is currently drawing up measures to preserve the readability and interpretability of electronic information. To guarantee full readability and interpretability in future, the Swiss National Library must take into account not only the information itself but also the data storage media and the system environment when archiving electronic publications.

Collection

DOES WEB ARCHIVE SWITZERLAND CONTAIN ALL SWISS WEBSITES?
The Swiss National Library has decided to engage in selective harvesting. The challenge here is to identify characteristic, representative websites from among the large number in existence. The Web Archive Switzerland collection mainly consists of Swiss websites that are of patrimonial importance. It is the task of the partner libraries (Swiss cantonal libraries and other special libraries) to make a sensible and representative selection from what is offered by their cantons and specialist areas, on the basis of jointly drawn up collection guidelines (see the "Collect" information sheet, in German).
WHO SELECTS THE WEBSITES?
The websites are selected by Web Archive Switzerland's partner libraries, i.e. the Swiss cantonal libraries and other special libraries, on the basis of jointly drawn up collection guidelines (see the "Collect" information sheet, in German).
I AM DESIGNING A NEW WEBSITE - CAN I APPLY TO HAVE IT ARCHIVED?
Only the partner libraries (the Swiss cantonal libraries and other special libraries) can select and submit websites for the Web Archive Switzerland collection. However, you are welcome to notify the Coordination Office for Web Archive Switzerland at the Swiss National Library of the URL. The Coordination Office will forward the proposal to the partner library responsible. On the basis of the collection criteria ("Collect" information sheet), a decision will then be taken on whether the website will be included in Web Archive Switzerland.
MY WEBSITE HAS A PASSWORD-PROTECTED AREA - WILL THIS ALSO BE ARCHIVED?
Only websites that are published and made accessible freely on the Internet are collected and archived in Web Archive Switzerland. Intranets and private data to which access is protected, for example, are not archived.

Harvesting and archiving

HOW DOES WEB ARCHIVING WORK? WHAT SOFTWARE IS USED?
For its harvesting, the Swiss National Library uses the open source software Heritrix, which is the most widely used around the globe for web archiving. The open source software PhantomJS also assists the crawler in identifying all the relevant links. The Heritrix crawler follows the links within a website and harvests all the files it finds. The aim is to archive a version of the website that is as comprehensive as possible and is displayed correctly. Pages that are password-protected and links to other sites are not collected. Access to the harvested website is by means of the Wayback Machine.
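
The rule that links within a website are followed while links to other sites are not collected can be sketched in Python as a simple scope check. This is a minimal illustration under the assumption that "within a website" means "same host name as the seed URL". It is not Heritrix configuration, and the function name in_scope and the example URLs are hypothetical.

```python
from urllib.parse import urljoin, urlparse


def in_scope(seed_url, page_url, link):
    """Return the absolute URL if the link stays on the seed's host, else None."""
    # Resolve relative links against the page on which they were found.
    absolute = urljoin(page_url, link)
    parsed = urlparse(absolute)
    # Ignore non-web schemes such as mailto: or javascript:.
    if parsed.scheme not in ("http", "https"):
        return None
    # Assumption: a different host name counts as "another site".
    if parsed.hostname != urlparse(seed_url).hostname:
        return None
    return absolute


if __name__ == "__main__":
    seed = "https://www.example.ch/"
    page = "https://www.example.ch/news/index.html"
    for link in ("../about.html", "https://other.example.org/", "mailto:info@example.ch"):
        print(link, "->", in_scope(seed, page, link))
```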

IS THE HARVESTING AND ARCHIVING OF WEBSITES A SIMPLE MATTER, OR ARE THERE TECHNICAL LIMITATIONS?
Web archiving is still in its infancy. It is therefore possible that a website cannot be archived for technical reasons, even though it would meet the Web Archive Switzerland collection criteria. As the tools are further improved, the quality of the crawls will also continually improve. In addition, the Swiss National Library is a member of the IIPC (International Internet Preservation Consortium) and is therefore involved in a constant exchange with other memory institutions around the world that carry out web archiving.

WHY ARE SOME WEBSITES MORE DIFFICULT TO ARCHIVE THAN OTHERS?
Large quantities of data, missing content or menu functions, Flash animations, dynamic script-based functions and crawler traps such as calendars or maps can make archiving more difficult. Librarians decide whether or not a website is to be included. If the quality of a snapshot is unsatisfactory, an attempt to obtain a better one is made at a later date. However, websites can be archived despite quality issues so that they are at least documented. If a website is incorrectly displayed, this may also be due to shortcomings in the current version of the Wayback Machine. For this reason, the latest version of the tools is used for web archiving wherever possible.
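
To give an idea of what a crawler trap check might look like, here is a small Python heuristic that flags URLs with very deep paths or with the same path segment repeated several times. It is an illustrative assumption, not the rule set used by the Swiss National Library, and the thresholds and the name looks_like_trap are hypothetical. It would not catch every trap, for example an endless calendar that only varies a date in the query string.

```python
from collections import Counter
from urllib.parse import urlparse


def looks_like_trap(url, max_segments=12, max_repeats=3):
    """Heuristic only: flag very deep paths or a path segment repeated many times."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) > max_segments:
        return True
    counts = Counter(segments)
    return any(n >= max_repeats for n in counts.values())


if __name__ == "__main__":
    print(looks_like_trap("https://www.example.ch/calendar/2016/02/"))
    print(looks_like_trap("https://www.example.ch/a/b/a/b/a/b/"))
```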
Type: PDF
Webarchiv Schweiz : Glossar, Version 1.6, 05.02.2016 (in German)
The glossary contains the professional vocabulary as well as the abbreviations used in the information sheets of Web Archive Switzerland.
Last modification: 09.02.2016 | Size: 85 kb | Type: PDF

WHY HAVE I RECEIVED AN E-MAIL/LETTER HEADED "ARCHIVING YOUR WEBSITE"?
The Swiss National Library sends an e-mail or letter to all website operators whose sites have been selected for Web Archive Switzerland. It explains the goals being pursued by Web Archive Switzerland, how websites are harvested, and whom to contact if you have questions or want more information. You do not have to reply, unless you intend to refuse to allow the archiving of your website.

HOW OFTEN DOES THE SWISS NATIONAL LIBRARY HARVEST WEBSITES?
Normally a website is harvested once a year. Alternatively, it can be harvested only once, every 4 years, every 2 years or every 6 months. The interval is very much dependent on the content concerned and can be set individually for each website. For special collections, the event is the key criterion, and the interval is adjusted accordingly.
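
The intervals listed above can be expressed as a small scheduling helper. The Python sketch below computes the next due date from the date of the last harvest. The interval labels and the function next_harvest are hypothetical, and the real scheduling logic may differ.

```python
import calendar
from datetime import date

# Interval names are hypothetical labels; the month counts follow the FAQ text.
INTERVAL_MONTHS = {
    "every_6_months": 6,
    "yearly": 12,
    "every_2_years": 24,
    "every_4_years": 48,
}


def next_harvest(last_harvest, interval):
    """Return the next due date, or None for a website harvested only once."""
    if interval == "once":
        return None
    months = INTERVAL_MONTHS[interval]
    total = last_harvest.month - 1 + months
    year = last_harvest.year + total // 12
    month = total % 12 + 1
    # Clamp the day so that 29 February maps to 28 February in non-leap years.
    day = min(last_harvest.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


if __name__ == "__main__":
    print(next_harvest(date(2015, 9, 1), "every_6_months"))
    print(next_harvest(date(2016, 2, 29), "yearly"))
```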

DO I HAVE TO PREPARE MY WEBSITE FOR WEB ARCHIVING?
No preparation of any kind by the website operator is needed. Nor is it necessary to update the website before harvesting. Normally a selected website is archived regularly (for example once a year). This allows the changes that the website has undergone over the years to be documented.
HOW MUCH LOAD DOES THE CRAWLER PLACE ON MY SERVER?
The web crawler operated by the Swiss National Library is configured so that the server load is kept to the minimum. If technical problems nevertheless arise due to web harvesting, please contact the Coordination Office for Web Archive Switzerland.
WHY ARE ROBOTS.TXT AND ROBOTS META TAGS IGNORED BY THE CRAWLER?
If robots.txt and robots meta tags were taken into account when harvesting, there would be a risk that the website obtained would not be complete and that its layout would not be reproduced correctly. To avoid this, robots.txt and robots meta tags are ignored.
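
The effect described above can be illustrated with Python's standard urllib.robotparser module. In this sketch, a hypothetical robots.txt hides stylesheets and images from crawlers. A crawler that obeyed it could fetch the page itself but not the files needed to reproduce its layout. The robots.txt content, the URLs and the user agent name are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that hides stylesheets and images from crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /css/
Disallow: /images/
"""


def allowed_by_robots(robots_txt, url, user_agent="example-archive-bot"):
    """Report whether a crawler that obeys robots.txt could fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.example.ch/index.html",
        "https://www.example.ch/css/style.css",
        "https://www.example.ch/images/logo.png",
    ):
        print(url, allowed_by_robots(ROBOTS_TXT, url))
```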

IS A CHARGE MADE FOR ARCHIVING MY WEBSITE?
There is no charge to the website operator for archiving.

I DO NOT WANT THE SWISS NATIONAL LIBRARY TO ARCHIVE MY WEBSITE. WHAT CAN I DO?
If you are a website operator and you have been notified by the Swiss National Library that your website is going to be harvested and archived, you can contact the Coordination Office for Web Archive Switzerland and advise them of your concerns. Harvesting will then be stopped if it has already begun, or not started if it has not. However, it is important for the Swiss National Library and its partner libraries as well as for future researchers that as many websites related to Switzerland as possible are harvested and archived. This allows the historical value of the web archive to be maintained over the long term.
WHAT HAPPENS TO THE WEBSITES AFTER THE HARVESTER HAS COLLECTED THEM?
After harvesting, the quality of the collection process is checked. The Swiss National Library places the harvested websites in a closed web environment and accesses it to systematically carry out manual testing processes. If the quality is deemed to be satisfactory, the archiving process is continued. If not, harvesting is repeated using different harvester settings.
HOW ARE THE HARVESTED WEBSITES FILED AND ARCHIVED?
The ingest system prepares the data for archiving and ensures that the associated metadata are available in the Swiss National Library catalogue (Helveticat). The websites are stored together with their metadata in the long-term archive. Overwriting or deletion of the information stored in the long-term archive is not permitted. The data are secured by means of backups, which are stored at two separate locations in Bern. Automatic data replication ensures that the stored data are available in their entirety at both locations. Additionally, a third copy of the data is created at the secondary location using an IBM tape drive. This copy is stored separately.
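
One common way to confirm that a replicated copy is complete is to compare checksums of the two files. The Python sketch below shows this idea with SHA-256. It is an assumption made for illustration only: the FAQ does not say how the Swiss National Library verifies its replicas, and the function names and file paths are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(primary_path, replica_path):
    """Return True if both files have identical content according to SHA-256."""
    return sha256_of(primary_path) == sha256_of(replica_path)
```

A call such as copies_match("/primary/site.warc", "/replica/site.warc") (hypothetical paths) would return True only if both copies are byte-for-byte identical.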
CAN THE SWISS NATIONAL LIBRARY HOST MY WEBSITE TOO?
No, website operators are still responsible for the hosting of their websites. Regular harvesting provides only snapshots of a website that are archived individually and then can be used for research purposes.

Use

WILL I FIND MY WEBSITE IN THE HELVETICAT CATALOGUE?
When a website is archived, a corresponding entry is made in the Swiss National Library's Helveticat catalogue. All archived websites are indexed and can be found in Helveticat. There is a direct link to the digital collections (e-Helvetica Access). However, no entry is made in The Swiss Book, the national bibliography of the Swiss National Library.
HOW CAN I ACCESS THE ARCHIVED WEBSITES?
Archived websites may be researched and displayed in e-Helvetica Access, the access system for the digital collections. For legal reasons, the web archive can only be accessed on the premises of the Swiss National Library and those of its partner libraries.
WHAT IS THE DIFFERENCE BETWEEN HELVETICAT AND E-HELVETICA ACCESS?
Helveticat is the online catalogue of the Swiss National Library. It lists both printed and electronic publications.

e-Helvetica Access is the access system for the digital collections of the Swiss National Library. In addition to websites, e-books, e-journals as well as university and official publications in electronic form are also listed. It is also possible to consult printed works that have subsequently been digitised.
I'M LOOKING FOR WEBSITES. HOW DO I SEARCH FOR THEM?
You can search for websites not only in the Helveticat online catalogue, but also in e-Helvetica Access, the digital collections access system.
In Helveticat you can search using an extraction code, a URL, a Dewey number or by language. A detailed explanation can be found in the website search guide. e-Helvetica Access offers a special search for websites (Web Archive). More information on each search is available in the e-Helvetica Access help text (buoy symbol).

WHY CAN'T I VIEW THE ARCHIVED WEBSITE FROM HOME?
The availability of the websites is governed by the law. Websites are subject to copyright and can only be consulted in the public rooms of the Swiss National Library.

CAN I PRINT OUT WEBSITES?
For copyright reasons, access to the archived time slices must be restricted. For this reason, reproduction in any form - saving to media, printing out, etc. - is not permitted.

WHY IS THE SWISS NATIONAL LIBRARY COLLECTING MY WEBSITE? WHY NOT JUST USE GOOGLE?
The Swiss National Library has a legal mandate to collect electronic publications that relate to Switzerland. The aim of Web Archive Switzerland is not to provide an additional way of accessing information that is currently on your website, but to document the changes that websites undergo over the years and decades. In 2025, for example, it would provide access to your website as it looked in 2015. To this end, we regularly collect a "copy" of all the selected websites. These copies enable us to record how websites change over the years.

WHAT ARE THE ADVANTAGES FOR MY COMMUNE IF OUR WEBSITE IS ARCHIVED BY THE SWISS NATIONAL LIBRARY?
On the basis of its legal mandate, the Swiss National Library has set itself the goal of compiling a selective collection of Swiss websites that are of patrimonial importance. By allowing us to include your website in Web Archive Switzerland, you help us to build up an interesting collection that can serve as the basis for historical research. Additionally, the time slice of your website is archived long-term without any cost or workload for you. This enables changes in content and visual design to be documented over the years.

Contacts

Swiss National Library
Coordination Office for Web Archive Switzerland
e-Helvetica


http://www.nb.admin.ch/nb_professionnel/01693/01695/01705/03333/index.html?lang=en