University of Notre Dame
File(s) under embargo

Automated Analysis of Historical Documents

thesis
posted on 2024-02-12, 19:17 authored by Samuel Grieggs

Researchers in the humanities can spend years traveling to collections throughout the world to find the primary sources that yield the key discoveries that help us understand our cultural heritage. Unfortunately, access to these documents can be limited by many external factors. Because of protective archivists, international travel restrictions, or a simple lack of resources, it is not always possible to reach them physically. Recent efforts have therefore digitized large collections of historical handwritten manuscripts and made the scanned images available online. Transcribing handwritten historical documents into machine-encoded text has always been a difficult and time-consuming task; entire academic careers are built around transcribing individual codices and producing a definitive edition. The automatic transcription of handwritten text is known as Handwritten Text Recognition. It is an active research area for both modern and historical documents, but historical documents pose unique challenges.

We examine how measuring human performance and incorporating that information into the loss function can improve handwritten text transcription on both medieval Latin manuscripts and modern English and French handwriting. We also summarize an interdisciplinary collaborative project in which my collaborators and I created an easy-to-use open-source tool that converts an image of a manuscript page written in the historical Ethiopic script of Ge'ez into a transcription.

Finally, we introduce automated handwriting identification tools whose results can be quickly understood and assessed visually, and which expert paleographers can use as one feature among many when attributing previously unknown scribal hands. We also demonstrate a use case for our software by analyzing several items believed to be written by Thomas Hoccleve, a highly productive clerk of the Privy Seal who was also an important fifteenth-century English poet.

History

Defense Date

2023-07-28

CIP Code

  • 40.0501

Research Director(s)

Walter J. Scheirer

Committee Members

  • Adam Czajka
  • Kevin Bowyer
  • Gelila Tilahun

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

OCLC Number

1411842187

Program Name

  • Computer Science and Engineering
