Introduction
Aerial object detection is a key computer vision task (Ding et al., 2021; Wang et al., 2020; Hu et al., 2022) that has attracted increasing attention in recent years and plays a significant role in remote sensing image understanding. Unlike general object detection, aerial object localization poses particular challenges, including non-axis-aligned objects with arbitrary orientations (Ding et al., 2019; Han et al., 2021; Pan et al., 2020) and dense distributions in complex contexts (Guo et al., 2021; Yang et al., 2018, 2021). For example, aircraft detection in remote sensing images is mainly affected by external factors such as noise, weather, light intensity, shadows, and background (Xiaolin et al., 2021).
Mainstream methods usually treat aerial object detection as a rotated object localization problem (Han et al., 2020; Yang et al., 2020a). Among them, angle-based direct orientation regression is the dominant approach; it extends general detectors (Lin et al., 2016; Li et al., 2022a; Lu et al., 2022; Zhang et al., 2022) with additional orientation parameters. Although these methods achieve promising performance, direct orientation prediction still suffers from several problems, including loss discontinuities and regression inconsistencies (Wang et al., 2022; Yang et al., 2020a, 2021; Yang & Yan, 2022). These problems stem from the bounded periodicity of the angle and from the orientation definition of the rotated bounding box. As a result, detectors based on orientation regression may fail to predict orientation accurately even when their localization results are otherwise attractive.
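The loss discontinuity caused by angular periodicity can be illustrated with a small sketch. It assumes one common convention in which the box angle lies in [-90, 90) degrees with a period of 180 degrees; the function names are hypothetical and are not taken from any of the cited detectors.

```python
def naive_angle_loss(pred_deg: float, target_deg: float) -> float:
    """Plain L1 difference between two angles, ignoring periodicity."""
    return abs(pred_deg - target_deg)


def periodic_angle_distance(pred_deg: float, target_deg: float,
                            period: float = 180.0) -> float:
    """Smallest angular gap between two orientations under the assumed period."""
    diff = (pred_deg - target_deg) % period
    return min(diff, period - diff)


# Two nearly identical orientations that sit on opposite sides of the
# angle boundary (assumed range: [-90, 90) degrees).
pred, target = 89.0, -89.0

naive = naive_angle_loss(pred, target)            # large, although the boxes almost coincide
periodic = periodic_angle_distance(pred, target)  # small, reflects the true geometric gap
print(naive, periodic)
```

Under these assumptions the two boxes differ by only 2 degrees geometrically, yet the naive regression target differs by 178 degrees, which is the kind of jump in the loss that the text refers to.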
To address these issues, the representation of aerial objects is revisited to avoid overly sensitive orientation estimation. As a fine-grained object representation, point sets, as used in general-purpose detectors such as RepPoints, are well suited to capturing important semantic features (Yao et al., 2022). However, the basic transformation function of RepPoints can only generate upright horizontal bounding boxes, which cannot precisely capture the orientation of aerial objects. In addition, RepPoints merely regresses the significant points and does not measure the quality of the learned points. In aerial images, this may lead to poor performance on complex scenes and on densely distributed, non-axis-aligned objects.
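The limitation of a horizontal-box transformation can be seen in a minimal sketch. It assumes a simple min-max conversion from a point set to an axis-aligned box, in the spirit of the transformation described above rather than the actual RepPoints code; all function names and the example object are hypothetical.

```python
import math


def minmax_box(points):
    """Convert a point set to an upright box (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def rotated_rect_corners(cx, cy, w, h, angle_deg):
    """Corners of a w-by-h rectangle centered at (cx, cy), rotated by angle_deg."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    offsets = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
            for dx, dy in offsets]


# A long, thin object (for example a ship or aircraft fuselage) rotated by 45 degrees.
# Its corners stand in for a learned point set on the object.
object_w, object_h = 10.0, 2.0
points = rotated_rect_corners(0.0, 0.0, object_w, object_h, 45.0)

x_min, y_min, x_max, y_max = minmax_box(points)
box_area = (x_max - x_min) * (y_max - y_min)
object_area = object_w * object_h
print(box_area, object_area)
```

In this example the upright box covers an area of about 72 for an object whose area is 20, and it carries no orientation information at all, which is why such a transformation is a poor fit for densely packed, non-axis-aligned objects.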