University of Notre Dame

File(s) under embargo

Automated Analysis of Historical Documents

posted on 2024-02-12, 19:17 authored by Samuel Grieggs

Researchers in the humanities can spend years traveling to collections around the world to find the primary sources behind the key discoveries that help us understand our cultural heritage. Unfortunately, access to these documents can be limited by external factors: protective archivists, international travel restrictions, or a simple lack of resources can make it impossible to reach them in person. Recent efforts have therefore digitized large collections of historical handwritten manuscripts and made the scanned images available online. Transcribing handwritten historical documents into machine-encoded text has always been difficult and time-consuming; entire academic careers are built around transcribing individual codices and producing definitive editions. The automatic transcription of handwritten text is known as Handwritten Text Recognition. It is an active research area for both modern and historical documents, but historical documents pose unique challenges.

We examine how measuring human performance and incorporating that information into the loss function can improve handwritten text transcription on both medieval Latin manuscripts and modern English and French handwriting. We also summarize an interdisciplinary collaborative project in which my collaborators and I created an easy-to-use open-source tool that converts an image of a manuscript page written in Ge'ez, the historical Ethiopic script, into a transcription.

Finally, we introduce automated handwriting identification tools whose results can be quickly understood and assessed visually, and which expert paleographers can use as one feature among many when attributing previously unknown scribal hands. We also demonstrate a use case for our software by analyzing several items believed to be written by Thomas Hoccleve, a highly productive clerk of the Privy Seal who was also an important fifteenth-century English poet.


Defense Date


CIP Code

  • 40.0501

Research Director(s)

Walter J. Scheirer

Committee Members

Adam Czajka, Kevin Bowyer, Gelila Tilahun


  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

OCLC Number


Program Name

  • Computer Science and Engineering
