File(s) under permanent embargo
Evaluating and Maintaining Classification Algorithms
thesis
posted on 2012-04-18, 00:00 authored by Troy William Raeder
Any practical application of machine learning necessarily begins with the selection of a classification algorithm. Generally, practitioners will try several different types of algorithms (such as decision trees, Bayesian algorithms, support vector machines, or neural networks) and select the algorithm that performs best on a subset of the available data. That is to say, some measurement of the classifier's performance on past data is used as an estimate of its performance on future data. Ideally, this estimate is perfectly aligned with the true cost of applying the classifier to future data, but this is far from guaranteed in practice. First, any estimate of classifier performance has variance, and this variance is difficult to estimate. Additionally, misclassification costs are rarely known at model-selection time, and the characteristics of the population from which data are drawn may change over time. If the training-time estimate of either misclassification cost or data distribution is incorrect, the chosen classifier is sub-optimal and may perform worse than expected. Finally, once a suitable classifier is built and deployed, there need to be systems in place to ensure that it continues to perform at a high level over time. The purpose of this dissertation is to improve the processes of classifier evaluation, selection, and maintenance in real-world situations.
History
Date Modified
2017-06-05
Defense Date
2012-03-27
Research Director(s)
Dr. Nitesh V. Chawla
Committee Members
Dr. W. Philip Kegelmeyer
Dr. Patrick J. Flynn
Dr. Kevin W. Bowyer
Degree
- Doctor of Philosophy
Degree Level
- Doctoral Dissertation
Language
- English
Alternate Identifier
etd-04182012-202508
Publisher
University of Notre Dame
Program Name
- Computer Science and Engineering