British Library sets out to archive the Web

The British Library will use an automated Web harvester to archive about a billion pages, some of them daily.

Lefteris Pitarakis/Associated Press

LONDON — For centuries, the British Library has kept a copy of every book, pamphlet, magazine, and newspaper published in Britain. Starting Saturday, it will also be bound to record every British website, e-book, online newsletter, and blog in a bid to preserve the nation’s “digital memory.” The library also has to make this digital archive available to future researchers.

It says the work is urgent; firsthand accounts of everything from the 2005 London transit bombings to Britain’s 2010 election campaign have already vanished.


“Stuff out there on the Web is ephemeral,” said Lucie Burgess, head of content strategy. “The average life of a Web page is only 75 days.”

Like reference collections worldwide, the British Library has been trying to archive the Web for years in a piecemeal way, having to get permission from website owners before taking snapshots of their pages. That began to change with a law passed in 2003, but it has taken a decade of legislative and technological preparation to begin a vast trawling of all sites that end with the suffix .uk.

An automated Web harvester will scan and record 1 billion Web pages. Most will be captured once a year, but hundreds of thousands of fast-changing sites such as those of newspapers and magazines will be archived as often as once a day. The library plans to make the content publicly available by year’s end.
