To facilitate research, and to help us better understand and preserve the UK’s web history, the UK Web Archive makes a number of datasets and API services available for general use. We also provide a few example tools that show how the open data might be used; these are hosted in this GitHub repository.

We hope that by making these datasets available, the broader community will find interesting ways to re-use, explore and visualise the contents of our web archive. We are keen to work with any interested parties to make use of these datasets, and to understand what other derivative or summary data would be of interest.

Datasets, Tools & APIs

Open Datasets

In general, we can’t provide remote bulk access to the primary datasets listed above (although bulk access can be arranged for particular projects - see below). This is mainly because the conditions under which we hold the content do not permit it, but also because the datasets are very large, which makes bulk downloads impractical.

However, secondary datasets, composed of metadata that describes simple facts about the content, can be made available under open terms.

For all the details, follow the links on the left, or look through the README files and code in the GitHub repository.

Tools

We also make a few tools available, which illustrate how the open datasets might be used. These are hosted on GitHub, and you should feel free to fork our repository or download our tools using the links above. Pull requests containing new or improved tools are welcome!