Internet Archive Book Scanning with Davide Semenzin


By Software Engineering Daily.

The Internet Archive collects historical records of the Internet. You may be familiar with one of its tools, the Wayback Machine. A project you may be less familiar with is book scanning: the Internet Archive scans books in high volume in order to digitize them.

In today’s episode, Davide Semenzin joins the show to talk through the history of the Internet Archive and the engineering behind book digitization. We talk through OCR, storage, architecture, and scalability.

The post Internet Archive Book Scanning with Davide Semenzin appeared first on Software Engineering Daily.
