Evaluating and Maintaining Classification Algorithms

Doctoral Dissertation

Abstract

Any practical application of machine learning necessarily begins with the selection of a classification algorithm. Generally, practitioners will try several different types of algorithms (such as decision trees, Bayesian algorithms, support vector machines, or neural networks) and select the algorithm that performs best on a subset of the available data. That is to say, some measurement of the classifier’s performance on past data is used as an estimate of its performance on future data. Ideally, this estimate is perfectly aligned with the true cost of applying the classifier on future data, but this is far from guaranteed in practice. First, any estimate of classifier performance has variance, and this variance is difficult to estimate. Additionally, misclassification costs are rarely known at model-selection time, and the characteristics of the population from which data are drawn may change over time. If the training-time estimate of either misclassification cost or data distribution is incorrect, the chosen classifier is sub-optimal and may perform worse than expected. Finally, once a suitable classifier is built and deployed, there need to be systems in place to ensure that it continues to perform at a high level over time. The purpose of this dissertation is to improve the processes of classifier evaluation, selection, and maintenance in real-world situations.
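
As a concrete illustration of the model-selection procedure described in the abstract, the sketch below (not taken from the dissertation) compares three classifier types by cross-validated accuracy and reports both the mean estimate and its spread; the dataset, the scikit-learn API calls, and the accuracy metric are illustrative assumptions rather than the dissertation's own experimental setup.

    # A minimal sketch of selecting among several classifier types by their
    # estimated performance on past data. The dataset, classifiers, and
    # accuracy metric are illustrative assumptions.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    candidates = {
        "decision tree": DecisionTreeClassifier(random_state=0),
        "naive Bayes": GaussianNB(),
        "support vector machine": SVC(),
    }

    # Cross-validated accuracy on past data serves as the estimate of
    # future performance; the standard deviation shows that the estimate
    # itself has variance.
    results = {}
    for name, clf in candidates.items():
        scores = cross_val_score(clf, X, y, cv=10)
        results[name] = (scores.mean(), scores.std())
        print(f"{name}: mean accuracy = {scores.mean():.3f}, std = {scores.std():.3f}")

    # Select the classifier with the highest estimated performance.
    best = max(results, key=lambda name: results[name][0])
    print(f"selected classifier: {best}")

Because the cross-validation estimate has variance and reflects neither the true misclassification costs nor any later change in the data distribution, the highest-scoring classifier here is not guaranteed to remain the best choice after deployment, which is precisely the gap the dissertation addresses.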

Attributes

Attribute Name / Values
URN
  • etd-04182012-202508

Author Troy William Raeder
Advisor Dr. Nitesh V. Chawla
Contributor Dr. W. Philip Kegelmeyer, Committee Member
Contributor Dr. Patrick J. Flynn, Committee Member
Contributor Dr. Nitesh V. Chawla, Committee Chair
Contributor Dr. Kevin W. Bowyer, Committee Member
Degree Level Doctoral Dissertation
Degree Discipline Computer Science and Engineering
Degree Name PhD
Defense Date
  • 2012-03-27

Submission Date 2012-04-18
Country
  • United States of America

Subject
  • classification

  • supervised learning

  • evaluation

  • concept drift

Publisher
  • University of Notre Dame

Language
  • English

Record Visibility and Access Public
Content License
  • All rights reserved
