A Multi-Task Vision Transformer for Segmentation and Monocular Depth Estimation for Autonomous Vehicles
In this paper, we investigate the use of Vision Transformers for processing and understanding visual data in an autonomous driving setting. Specifically, we explore the use of Vision Transformers for semantic segmentation and monocular depth estimation using only a single image as input. We present state-of-the-art Vision Transformers