University of Notre Dame
VidalMataRG102023D.pdf (18.98 MB)

Rich Embedding Techniques to Improve Scene Understanding

thesis
posted on 2023-10-25, 00:00 authored by Rosaura G. VidalMata

Effective scene understanding is pivotal in various computer vision applications, from object recognition to autonomous navigation. The introduction of deep learning and embedding techniques has advanced the field significantly. However, a substantial challenge remains: interpreting media captured in real-world, uncontrolled conditions, where numerous environmental variables and imaging artifacts complicate the task.

This doctoral thesis addresses these challenges, aiming to develop scene recognition algorithms that offer high-fidelity situational awareness and understanding of complex scenes. It starts by examining the limitations of traditional methods and the impact of image restoration and enhancement on automatic visual recognition across tasks such as image classification, object detection, manipulation detection, and localization.

Exploratory work identifies effective image pre-processing algorithms, combined with robust features and supervised machine learning, suitable for challenging scenarios involving motion blur, adverse weather, and misfocus. Additionally, the study reviews state-of-the-art image manipulation detection techniques, highlighting their susceptibility to high-quality manipulations and the benefits of pre-processing to localize tampered regions more accurately.

Recognizing the limitations of image-based approaches, the thesis explores incorporating contextual information and temporal relationships into the embedding process. Inspired by human perception, it investigates fusing multiple modalities, such as visual and temporal data, to create more informative and discriminative embeddings, aiming for a better understanding of scene structure and cleaner scene representations.

In conclusion, this doctoral thesis introduces novel approaches to scene understanding by leveraging rich embedding techniques in real-world computer vision applications. By addressing the limitations of traditional methods, exploring temporal relationships, and incorporating image enhancement, the research advances the field toward achieving high-fidelity situational awareness. It also emphasizes the challenges of object and manipulation detection and the importance of pre-processing techniques. This research paves the way for robust computer vision systems capable of interpreting real-world scenes and holds promise for applications in media forensics, surveillance, image enhancement, and more.

History

Defense Date

2023-07-25

CIP Code

  • 40.0501

Research Director(s)

Walter J. Scheirer

Committee Members

Patrick Flynn, Jane Cleland Huang, Anderson Rocha

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

OCLC Number

1407098932

Program Name

  • Computer Science and Engineering
