How to improve access to digitized historical texts: the IMPACT project

Aly Conteh
British Library, Digitisation Programme Manager

Library Science Talk
18.10.2011, 15:30: Bern, Swiss National Library
17.10.2011, 15:30: Geneva, WHO

In recent years, large-scale digitisation projects undertaken within cultural heritage institutions have provided access to digitised content on a scale never experienced before. While many millions of items have been made available through the World Wide Web, they represent only a small fraction of Europe's cultural heritage. At the British Library, which has one of the largest holdings in the world, less than 2% of the physical collections have been digitised.

A major focus of the recent large-scale digitisation initiatives has been historical texts, primarily in the form of out-of-copyright newspapers and books. The use of advanced software tools, such as optical character recognition (OCR) engines, to translate images of text into machine-readable text has transformed the way users interact with these types of resources. The benefits of OCR in the digitisation workflow are recognised, but the challenges posed by historical texts reduce the accuracy of the OCR process. Several kinds of problems arise: noise and artefacts introduced by various production techniques, the effects of ageing on the materials to be digitised, and obsolete language. These and other issues mean word accuracy rates can be as low as 50%, which in turn has a severe impact on resource discovery and further processing.
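To make the notion of a word accuracy rate concrete, the following is a minimal sketch of one way it could be computed against a ground-truth transcription. It is an illustration only: the announcement does not say how IMPACT measures accuracy, and the function name `word_accuracy`, the whitespace tokenisation, and the alignment via Python's `difflib` are all assumptions made here.

```python
from difflib import SequenceMatcher


def word_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Fraction of ground-truth words that the OCR output reproduces exactly.

    Hypothetical helper for illustration; not the IMPACT project's metric.
    Words are split on whitespace and aligned in order with difflib.
    """
    truth_words = ground_truth.split()
    ocr_words = ocr_output.split()
    if not truth_words:
        raise ValueError("ground truth must contain at least one word")

    # Align the two word sequences and count the words that match in order.
    matcher = SequenceMatcher(a=truth_words, b=ocr_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(truth_words)


if __name__ == "__main__":
    truth = "a b c d"
    ocr = "a x c y"  # two of the four words are misrecognised
    print(f"word accuracy: {word_accuracy(truth, ocr):.0%}")
```

At a rate of 50%, every second word in the ground truth is missing from the searchable text, so a full-text query for any given word has only an even chance of finding a page that contains it.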

Against this background, the IMPACT project, a large-scale integrating project funded by the European Commission as part of the Seventh Framework Programme (FP7), was initiated in January 2008. Its aims are to improve access to historical text, to remove barriers that stand in the way of digitising European cultural heritage, and to ensure that the tools and services created within the project are sustained after its completion in December 2011.

The talk will present the tools and resources developed as part of the IMPACT project, along with the preliminary findings from applying those tools to a dataset of digitised and ground-truthed historical texts compiled by a number of European national libraries.


For the talk in Geneva at the World Health Organization (WHO)
External guests should announce themselves at the WHO reception, Avenue Appia 20, and indicate that they are attending the Library Science Talk in the Library Meeting Room. Please register beforehand by contacting the WHO by e-mail or by phone at 022-791.35.57.

For the talk in Bern at the Swiss National Library (NL)
External guests should be at Hallwylstrasse 15 at 15:30. It is not necessary to register in advance, but for more information or a map, please contact Genevieve Clavel by e-mail or by phone at 031-322.89.36.


Last updated on: 15.09.2011
