University of Notre Dame

Operationalizing Classification in Applied Machine Learning

thesis
posted on 2017-11-27, 00:00 authored by Saurabh Nagrecha

The increasing diversity of data sources has propelled machine learning into an equally diverse set of application domains. Across these applications, a key task is classification. While contemporary approaches achieve impressive predictive performance on pre-structured datasets, surprisingly little work has addressed how raw data is structured to best address the underlying domain problem. The state of the art in domain-driven data mining, Actionable Knowledge Discovery, merely acts as a wrapper to transform domain data into feature matrices and class labels. To address these gaps in existing frameworks, we propose the Operationalized Data Science Paradigm (ODSP). This paradigm provides a formalized framework for structuring data and pipelines, time-censoring, Net Present Value considerations, interpretability, and regulatory compliance, all using domain-driven insights. We demonstrate, in the form of deployed solutions, the role of domain-driven problem and pipeline design across the diverse domains of cost-sensitive classification, online video content, Massive Open Online Courses (MOOCs), and auto insurance. For each of these use cases, we provide a comparative ablation analysis to highlight the role of ODSP in ensuring operational viability. As a result, we show how the domain influences which questions we ask of the data and how we should interpret the answers.

History

Alt Title

Maturing Classification from Prototypes to Production in Applied Machine Learning

Date Created

2017-11-27

Date Modified

2018-04-18

Defense Date

2017-05-05

Research Director(s)

Nitesh Chawla

Committee Members

Sidney D'Mello, Tim Weninger, Reid Johnson

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Program Name

  • Computer Science and Engineering
