Rich Embedding Techniques to Improve Scene Understanding

Doctoral Dissertation


Effective scene understanding is pivotal in various computer vision applications, from object recognition to autonomous navigation. The introduction of deep learning and embedding techniques has advanced the field significantly. However, a substantial challenge remains: interpreting media captured in real-world, uncontrolled conditions, where numerous environmental variables and imaging artifacts complicate the task.

This doctoral thesis addresses these challenges, aiming to develop scene recognition algorithms that offer high-fidelity situational awareness and understanding of complex scenes. It starts by examining the limitations of traditional methods and the impact of image restoration and enhancement on automatic visual recognition, covering tasks such as image classification, object detection, manipulation detection, and localization.

Exploratory work identifies effective image pre-processing algorithms, combined with robust features and supervised machine learning, suitable for challenging scenarios involving motion blur, adverse weather, and misfocus. Additionally, the study reviews state-of-the-art image manipulation detection techniques, highlighting their susceptibility to high-quality manipulations and the benefits of pre-processing to localize tampered regions more accurately.

Recognizing the limitations of image-based approaches, the thesis explores incorporating contextual information and temporal relationships into the embedding process. Inspired by human perception, it investigates fusing multiple modalities, such as visual and temporal data, to create more informative and discriminative embeddings, with the aim of better understanding scene structure and producing cleaner scene representations.

In conclusion, this doctoral thesis introduces novel approaches to scene understanding by leveraging rich embedding techniques in real-world computer vision applications. By addressing the limitations of traditional methods, exploring temporal relationships, and incorporating image enhancement, the research advances the field toward achieving high-fidelity situational awareness. It also emphasizes the challenges of object and manipulation detection and the importance of pre-processing techniques. This research paves the way for robust computer vision systems capable of interpreting real-world scenes and holds promise for applications in media forensics, surveillance, image enhancement, and more.


Attribute Name    Values
Author Rosaura G. VidalMata
Contributor Walter J. Scheirer, Research Director
Contributor Kevin W. Bowyer, Research Director
Contributor Patrick Flynn, Committee Member
Contributor Jane Cleland Huang, Committee Member
Contributor Anderson Rocha, Committee Member
Degree Level Doctoral Dissertation
Degree Discipline Computer Science and Engineering
Degree Name Doctor of Philosophy
Defense Date 2023-07-25

Submission Date 2023-10-25
Subject Computer Vision; Scene understanding; Video Segmentation; Manipulation detection; Object classification
Language English

Record Visibility Public


