University of Notre Dame

[Preprint] The Engineering History Project Database: Creating and Linking Datasets from Structured, Semi-Structured, and Unstructured Historical Sources

preprint
posted on 2025-10-16, 16:45 authored by Israel Solares, Edward Beatty
This paper describes in detail the methods used to collect, organize, clean, and validate data drawn from three different types of digitized historical sources and subsequently linked in a relational database. The constituent data are available as three separate datasets or in a linked, relational format.

The datasets can be located and cited as follows:

Israel G. Solares and Edward Beatty (2025). Engineering History Project Dataset (Version v.1) [Dataset]. CurateND. https://doi.org/10.7274/30108082.

Using three different types of digitized historical sources (one structured, one semi-structured, and one unstructured), we construct a relational database that connects individuals, firms, and textual material related to individuals and firms. The research project examines the emergence of professional engineering, 1870-1930, and uses the global mining sector as a case study. This paper explains the methods used to construct the initial three constituent datasets, including techniques to clean and validate each. It then explains the methods used to transform and link those datasets, creating a relational database that includes information on roughly 130,000 individuals, over 50,000 firms, and almost 400,000 journal articles. With it, we can trace individuals, firms, and technologies over time and space and identify interconnected communities and networks in a globalized setting. This is a preprint version.
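
As a rough illustration of what linking individuals, firms, and articles in a relational database can look like, the following is a minimal sketch using Python's built-in sqlite3 module. It is not the project's actual schema: every table name, column name, and helper function here (`employment`, `mentions`, `build_demo`, `trace_individual`) is a hypothetical assumption, and the rows are placeholders rather than data from the datasets.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative
# assumptions, not the Engineering History Project's actual design.
SCHEMA = """
CREATE TABLE individuals (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE firms       (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE articles    (id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER);
-- Link table: which individual was associated with which firm, and when.
CREATE TABLE employment (
    individual_id INTEGER REFERENCES individuals(id),
    firm_id       INTEGER REFERENCES firms(id),
    year          INTEGER
);
-- Link table: which article mentions which individual and/or firm.
CREATE TABLE mentions (
    article_id    INTEGER REFERENCES articles(id),
    individual_id INTEGER REFERENCES individuals(id),
    firm_id       INTEGER REFERENCES firms(id)
);
"""


def build_demo():
    """Create an in-memory database holding one placeholder row per table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO individuals VALUES (1, 'Example Engineer')")
    conn.execute("INSERT INTO firms VALUES (1, 'Example Mining Co.')")
    conn.execute("INSERT INTO articles VALUES (1, 'Example article', 1900)")
    conn.execute("INSERT INTO employment VALUES (1, 1, 1899)")
    conn.execute("INSERT INTO mentions VALUES (1, 1, NULL)")
    conn.commit()
    return conn


def trace_individual(conn, individual_id):
    """Return (person, firm, year, article title) rows for one individual."""
    query = """
        SELECT i.name, f.name, e.year, a.title
        FROM individuals i
        LEFT JOIN employment e ON e.individual_id = i.id
        LEFT JOIN firms f      ON f.id = e.firm_id
        LEFT JOIN mentions m   ON m.individual_id = i.id
        LEFT JOIN articles a   ON a.id = m.article_id
        WHERE i.id = ?
    """
    return conn.execute(query, (individual_id,)).fetchall()
```

Calling `trace_individual(build_demo(), 1)` joins the three entity tables through the two link tables. This is the general pattern that lets a person be followed across firms and texts, whatever the real schema looks like.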

Funding

National Science Foundation Grant #2020926

Kellogg Institute Seed Funding Grant

History

Related Materials

  1. DOI - Is supplemented by Engineering History Project Datasets

Date Created

2025-10-16

Language

  • English
