The AHRC-funded Global Digitised Dataset Network aims to develop a new collaboration between organisations with an interest in supporting the discovery of digitised materials. The network is led by the University of Glasgow, in collaboration with HathiTrust, the British Library, the National Library of Scotland, the National Library of Wales, and Research Libraries UK.

Aims and Objectives

This network seeks to build a new collaboration, led by the University of Glasgow, that brings together leading UK libraries with large digitisation programmes (the British Library, the National Library of Wales, and the National Library of Scotland) with HathiTrust and other US partners to contribute to a single dataset of digitised texts. Other partners, such as Research Libraries UK (RLUK), will ensure that this work is represented in the wider library sector.

The aims of the proposal are:

  • To establish whether it is feasible and worthwhile to create a global dataset of digitised texts for digital scholars, libraries and readers;
  • To develop a stronger understanding of the impact that a global dataset of digitised texts could have;
  • To investigate models for developing a sustainable global dataset, extending its reach from the UK and the US towards a global network.

We will achieve this through a series of objectives:

  • Undertaking an analysis of digital holdings data by taking major UK libraries through HathiTrust’s data collection process.
  • Documenting the benefits a global dataset could bring, through a facilitated workshop of academic, non-academic and library audiences.
  • Developing a prototype dataset of digitised texts from the US (HathiTrust) and the UK (project partner libraries).
  • Meeting with project partners and other organisations to investigate options and opportunities to develop a sustainable global dataset and related services.

Digital scholarship relies on access to digital sources, but locating these sources, particularly large corpora of open texts, is often difficult. Many research libraries worldwide are undertaking mass digitisation programmes at a growing scale, but the difficulty of discovering digitised texts, including but extending beyond the millions of items digitised by national libraries and mass digitisation programmes, often means that these efforts do not have the impact they could and should.

No global dataset of all digitised texts exists for anyone searching for them, whether they need a large corpus for digital scholarship or a single text for in-depth study. If the searcher is lucky, the text is found. If not, they must eventually abandon the search, having spent time on it fruitlessly. A global dataset describing digitised texts would address this issue and bring three important benefits to researchers, digital scholars, and those undertaking digitisation:

  • Digital scholars seeking large corpora of texts could easily search and compile links to items across many sources, creating new or bespoke collections;
  • Readers wishing to find a digitised text would be able to search quickly and efficiently across all potential sources;
  • Libraries undertaking digitisation programmes would be able to discover already digitised texts, and thereby make their own digitisation efforts more efficient by avoiding duplication.