Effective scene understanding is pivotal in various computer vision applications, from object recognition to autonomous navigation. The introduction of deep learning and embedding techniques has advanced the field significantly. However, a substantial challenge remains: interpreting media captured in real-world, uncontrolled conditions, where numerous environmental variables and imaging artifacts complicate the task.
This doctoral thesis addresses this challenge, aiming to develop scene recognition algorithms that offer high-fidelity situational awareness and understanding of complex scenes. It starts by examining the limitations of traditional methods and the impact of image restoration and enhancement on automatic visual recognition, covering tasks such as image classification, object detection, manipulation detection, and localization.
Exploratory work identifies image pre-processing algorithms that, combined with robust features and supervised machine learning, remain effective in challenging scenarios involving motion blur, adverse weather, and misfocus. The study also reviews state-of-the-art image manipulation detection techniques, highlighting their susceptibility to high-quality manipulations and the benefit of pre-processing for localizing tampered regions more accurately.
Recognizing the limitations of image-based approaches, the thesis explores incorporating contextual information and temporal relationships into the embedding process. Inspired by human perception, it investigates fusing multiple modalities, such as visual and temporal data, to create more informative and discriminative embeddings, aiming for a better understanding of scene structure and cleaner scene representations.
In conclusion, this doctoral thesis introduces novel approaches to scene understanding by leveraging rich embedding techniques in real-world computer vision applications. By addressing the limitations of traditional methods, exploring temporal relationships, and incorporating image enhancement, the research advances the field toward achieving high-fidelity situational awareness. It also emphasizes the challenges of object and manipulation detection and the importance of pre-processing techniques. This research paves the way for robust computer vision systems capable of interpreting real-world scenes and holds promise for applications in media forensics, surveillance, image enhancement, and more.