Efficient Deep Learning Methods for Medical Image Analysis

Name: Efficient Deep Learning Methods for Medical Image Analysis
Published: 2024-10-04T19:06:26+00:00
License: https://creativecommons.org/licenses/by-nc/4.0/
Keywords: Dissertation

Posted on 2024-10-04 at 19:06; authored by Yaopeng Peng

Medical image analysis plays a critical role in a range of medical applications, including diagnosis, treatment planning, and monitoring disease progression. However, it presents significant challenges due to the inherent complexity of the human body, as well as variability in image acquisition techniques, noise, and artifacts. Although deep learning methods have demonstrated considerable promise in medical image analysis, they frequently necessitate large volumes of annotated data for effective model training. Acquiring such annotated data can be particularly challenging in medical imaging due to factors such as the complexity of medical images and the imperative to uphold patient privacy. Furthermore, the annotation process is both time-consuming and costly, requiring the specialized expertise of medical professionals. Consequently, the limited availability of annotated data for training deep learning models often results in overfitting and suboptimal generalization to new data.

Advances in medical image analysis have benefited from progress in foundational models originally developed for natural image domains. Innovations such as the integration of topological features into image representations and the application of Vision Transformers (ViTs) to capture global dependencies have proven valuable. However, these models often face significant challenges, including high computational costs and inference latency. Thus, there is an urgent need to develop approaches that are both data-efficient and computationally efficient to overcome these limitations.

This dissertation presents six methods designed to improve segmentation and classification performance across both medical and natural scene domains. These methods include selecting the most informative slices for annotation, utilizing unlabeled slices, extracting additional topological information from existing datasets, and developing efficient Vision Transformer models to enhance performance while reducing computational costs. First, we employ an unsupervised method to identify the most effective and representative 2D slices from 3D calf muscle images for annotation. Subsequently, we generate pseudo-labels for all unlabeled slices and train a 3D segmentation model using both the labeled and pseudo-labeled slices. Second, we enhance the model by refining the pseudo-labels with a bi-directional hierarchical Earth Mover's Distance (bi-HEMD) algorithm and fine-tuning the segmentation results using the Primal-Dual Interior Point Method (IPM). Third, we develop a method that integrates both topological features and features extracted by a convolutional neural network (CNN) to improve performance. Fourth, we introduce a Group Vision Transformer mechanism to reduce computational complexity and model parameters, while enhancing feature diversity and reducing feature redundancy. Finally, we develop two Vision Transformer models to improve segmentation performance for detecting thin-cap fibroatheroma (TCFA) in intravascular optical coherence tomography (IVOCT) images and for skin lesion and polyp segmentation.

The performance of image recognition in both medical and natural domains can be further enhanced by developing more advanced models. Accordingly, we propose four promising future directions. First, we aim to utilize the Wavelet Transform to mitigate information loss during the down-sampling process, thereby improving detection of small objects. Second, we plan to develop a Multi-Branch Vision Transformer to capture features across various scales while reducing computational costs and inference latency. Third, we intend to create a hierarchical Hilbert Mamba framework for image recognition, which will introduce greater spatial locality and facilitate smoother transitions among image tokens. Finally, we propose to develop a semi-supervised model for medical image segmentation, based on the Segment Anything Model, to address challenges associated with sparse annotations.
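
The abstract's claim about spatial locality and smoother transitions among image tokens can be illustrated with a small sketch. The code below is not the dissertation's framework. It is only the classic Hilbert-curve index-to-coordinate conversion, applied to a hypothetical square grid of image patches whose side is a power of two. The function names (`hilbert_d2xy`, `hilbert_order`, `raster_order`, `max_step`) are illustrative assumptions. The sketch compares the largest jump between consecutive tokens under Hilbert ordering and under ordinary row-by-row (raster) ordering.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def hilbert_d2xy(n: int, d: int) -> Coord:
    """Map a position d along the Hilbert curve to (x, y) on an n x n grid.

    Assumes n is a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the current quadrant so sub-curves connect end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_order(n: int) -> List[Coord]:
    """Patch coordinates visited in Hilbert-curve order."""
    return [hilbert_d2xy(n, d) for d in range(n * n)]


def raster_order(n: int) -> List[Coord]:
    """Patch coordinates visited row by row, left to right."""
    return [(col, row) for row in range(n) for col in range(n)]


def max_step(order: List[Coord]) -> int:
    """Largest Manhattan distance between consecutive tokens in a sequence."""
    return max(
        abs(x1 - x0) + abs(y1 - y0)
        for (x0, y0), (x1, y1) in zip(order, order[1:])
    )
```

On an 8 x 8 patch grid, `max_step(hilbert_order(8))` is 1, because consecutive tokens are always neighbouring patches, whereas `max_step(raster_order(8))` is 8, because the sequence jumps from the end of one row to the start of the next. How the proposed hierarchical framework would actually use such an ordering is not specified in the abstract.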

History

Date Created

2024-09-30

Date Modified

2024-10-03

Defense Date

2024-09-04

CIP Code

14.0901

Research Director(s)

Danny Chen

Committee Members

Walter Scheirer, Milan Sonka, Xiangliang Zhang

Degree

Doctor of Philosophy

Degree Level

Doctoral Dissertation

Language

English

Library Record

006619676

OCLC Number

1458558019

Publisher

University of Notre Dame

Additional Groups

Computer Science and Engineering

Program Name

Computer Science and Engineering