This Ph.D. thesis focuses on enhancing small object detection and segmentation in road video sequences by integrating convolutional neural networks (CNNs) and super-resolution (SR) techniques. As the volume of multimedia data grows, traditional manual analysis has become impractical, and innovative approaches are required for efficient processing. Deep learning, particularly CNNs, emerges as a way to overcome the inherent limitations of classical methods. The core objective is to significantly increase the pixel count of input images, thereby improving the inference and processing capabilities of object detection models. Notably, the thesis addresses the challenge of identifying small objects, where existing models encounter difficulties despite their success with larger objects. First, the theoretical background is detailed, establishing the state of the art in CNN-based neural models for super-resolution, detection, and segmentation. Then, the problem of small object identification and anomaly detection is specified.
Seven works are presented to address these challenges, grouped according to the problem they target: detection, segmentation, or anomaly identification. The detection part comprises works based on applying super-resolution and re-inference, a methodology for automatically fine-tuning a CNN without prior manual labeling, and the generation of optimal regions on which to re-infer. The segmentation part includes works based on several optimized and super-resolution techniques applied to urban sequences. Finally, a work on improving anomaly identification is detailed. The application of CNNs and super-resolution proves crucial in enhancing the effectiveness of object detection and segmentation models.
Finally, the research conclusions are presented and discussed, along with possible future lines of research that can build on the results obtained in this Ph.D. thesis.