Abstract
Terrain recognition is critical in applications such as autonomous navigation, disaster response, and remote sensing. Traditional methods rely heavily on convolutional neural networks (CNNs), which require significant computational resources to achieve high accuracy. Vision Transformers (ViTs) have recently emerged as an alternative approach to image processing that is better suited to capturing long-range dependencies in visual data. This paper proposes a ViT-based terrain recognition model that aims to improve classification accuracy and processing efficiency on complex terrain datasets. The key steps are pre-processing satellite imagery, extracting features with a transformer architecture, and evaluating performance. Our results show that ViTs significantly improve recognition accuracy, making them a promising alternative to CNNs for terrain analysis tasks.
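
To make the feature-extraction step more concrete, the sketch below illustrates the first operation a ViT performs on an image: splitting it into non-overlapping patches and flattening each patch into a token vector. This is a minimal illustration and not the model proposed in this paper; the function name `image_to_patches`, the 4x4 single-channel tile, and the patch size of 2 are assumptions chosen for readability. In a full ViT, each flattened patch would then be linearly projected and combined with a position embedding before entering the transformer encoder.

```python
from typing import List

Image = List[List[float]]  # single-channel image as rows of pixel values


def image_to_patches(image: Image, patch_size: int) -> List[List[float]]:
    """Split an H x W image into flattened, non-overlapping patches.

    Patches are returned in row-major order (left to right, top to bottom).
    Hypothetical helper for illustration only.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single vector.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Assumed toy input: a 4x4 tile with pixel values 0..15.
tile = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
tokens = image_to_patches(tile, patch_size=2)
print(len(tokens), len(tokens[0]))  # number of patches, values per patch
```

With a real satellite tile, the same idea applies per channel, and the patch size controls the trade-off between sequence length and spatial detail.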