Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/140583
Type: Thesis
Title: Towards Effective and Efficient Semantic Segmentation
Author: Zhang, Bowen
Issue Date: 2024
School/Discipline: School of Computer and Mathematical Sciences
Abstract: Semantic segmentation, a crucial per-pixel dense prediction task, requires both accuracy and efficiency for effective application in real-world scenarios. This thesis introduces a series of novel methods that address the challenges of semantic segmentation, focusing on achieving high performance while considering computational efficiency. First, we propose a dynamic neural representational decoder (NRD) to address the heavy computational costs associated with the decoder’s upsampling process. Traditional approaches with convolutional network backbones rely on feature pyramid networks or dilated backbones to obtain high-resolution feature maps for semantic segmentation. Instead, we leverage the smoothness prior in the semantic label space and utilize compact neural networks to represent semantic predictions at a patch-level granularity. This significantly reduces computational costs while maintaining competitive performance. Second, considering the recent advancements in vision transformer (ViT) networks, which demonstrate superior performance compared to traditional fully convolution-based networks, this thesis introduces a state-of-the-art decoder framework called Attention-to-Mask (ATM). The ATM decoder leverages attention mechanisms to generate binary masks for semantic segmentation; it not only achieves high performance but is also lightweight. Additionally, to address the computational cost and redundancy associated with ViT backbones, a Shrunk structure is proposed to enhance efficiency while maintaining the effectiveness of the ATM. Finally, despite vision transformers achieving leading performance in various visual tasks, they still suffer from high computational complexity. This issue becomes more pronounced in dense prediction tasks like semantic segmentation, where high-resolution inputs and outputs entail a larger number of tokens involved in computations.
While direct token removal has been discussed for image classification tasks, it cannot be extended to semantic segmentation due to the requirement of dense predictions for every patch. To address this challenge, we propose a novel method called Dynamic Token Pruning (DToP) for semantic segmentation, based on the early exit of tokens. Inspired by the human coarse-to-fine segmentation process, we naturally partition the widely adopted auxiliary-loss-based network architecture into multiple stages, with each auxiliary block grading the difficulty level of each token. By leveraging this approach, we can make early predictions for easy tokens, eliminating the need to complete the entire forward pass. Experimental evaluations reveal that the proposed DToP architecture reduces, on average, 20%-35% of the computational cost for current semantic segmentation methods based on plain vision transformers, all while maintaining accuracy. The effectiveness of the proposed methods is extensively evaluated on benchmark datasets, demonstrating their efficiency and accuracy in achieving state-of-the-art semantic segmentation results.
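The early-exit idea behind DToP can be illustrated with a minimal sketch: at each auxiliary stage, tokens whose prediction confidence already exceeds a threshold are finalized and dropped from further computation, while hard tokens continue through the network. This is a conceptual toy example, not the thesis implementation; the function name, the confidence threshold, and the use of top softmax probability as the difficulty grade are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over class logits.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prune_easy_tokens(logits, active, predictions, threshold=0.9):
    """One auxiliary stage of the early-exit sketch.

    logits:      (num_active, num_classes) scores for still-active tokens
    active:      (num_tokens,) bool mask of tokens not yet finalized
    predictions: (num_tokens,) running per-token class predictions
    Tokens whose top softmax probability >= threshold are graded "easy":
    their prediction is fixed now and they exit the forward pass early.
    """
    probs = softmax(logits, axis=-1)
    confidence = probs.max(axis=-1)
    easy = confidence >= threshold
    # Map positions within the active subset back to global token indices.
    global_idx = np.where(active)[0][easy]
    predictions[global_idx] = probs[easy].argmax(axis=-1)
    # Hard tokens remain active and proceed to the next stage.
    new_active = active.copy()
    new_active[global_idx] = False
    return new_active, predictions
```

In a full model, the next stage would compute logits only for the tokens still marked active, which is where the computational saving comes from: easy tokens never reach the later, more expensive blocks.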
Advisors: Liu, Yifan
Shen, Chunhua
Liao, Zhibin
Dissertation Note: Thesis (Ph.D.) -- University of Adelaide, School of Computer and Mathematical Sciences, 2024
Keywords: Semantic Segmentation
Provenance: This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals
Appears in Collections: Research Theses

Files in This Item:
File: Zhang2024_PhD.pdf
Size: 9.47 MB
Format: Adobe PDF (View/Open)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.