University of Southern California (University of Southern California Viterbi School of Engineering)

Log Number: LG-254829-OLS-23

The University of Southern California will study how current best practices for web archiving can be improved by offering more effective and efficient means of archiving web pages. The project will develop methods to programmatically assess archived snapshots of web pages and determine whether the fidelity of a page might be compromised in a future web environment. It will also study how existing browser-based crawlers can be modified to analyze the JavaScript in archived web content and its ability to reproduce a page as it appeared at the time of archiving. With a focus on reducing the compute, network, and storage requirements of web archives, the project will enable crawlers to identify JavaScript files that can be removed because their code has no impact on preserving the fidelity of a page. This work seeks to inform the development of the next generation of archival software and services, raise awareness among librarians and archivists, collect and incorporate feedback about the work, and disseminate findings through a variety of channels. The work will improve the fidelity of archived web pages and enable budget-constrained libraries and museums to archive many more pages than they currently can.
