AMFEF-DETR: An End-to-End Adaptive Multi-Scale Feature Extraction and Fusion Object Detection Network Based on UAV Aerial Images

  1. Wang, Sen 1
  2. Jiang, Huiping 1
  3. Yang, Jixiang 1
  4. Ma, Xuan 1
  5. Chen, Jiamin 1
  6. González-Aguilera, Diego (ed.) 2
  1. Minzu University of China, Beijing, China
    ROR https://ror.org/0044e2g62

  2. Universidad de Salamanca, Salamanca, Spain
    ROR https://ror.org/02f40zc51

Journal:
Drones

ISSN: 2504-446X

Year of publication: 2024

Volume: 8

Issue: 10

Pages: 523

Type: Article

DOI: 10.3390/DRONES8100523 (open access)

Bibliographic References

  • Colomina, (2014), ISPRS J. Photogramm. Remote Sens., 92, pp. 79, 10.1016/j.isprsjprs.2014.02.013
  • Pouyanfar, (2018), ACM Comput. Surv. (CSUR), 51, pp. 1
  • Shi, (2016), IEEE Internet Things J., 3, pp. 637, 10.1109/JIOT.2016.2579198
  • Ke, (2018), IEEE Trans. Intell. Transp. Syst., 20, pp. 54, 10.1109/TITS.2018.2797697
  • Feng, (2015), Remote Sens., 7, pp. 1074, 10.3390/rs70101074
  • Erdelj, (2017), IEEE Pervasive Comput., 16, pp. 24, 10.1109/MPRV.2017.11
  • Xia, G.-S., Bai, X., Ding, J., Zhu, Z., Belongie, S., Luo, J., Datcu, M., Pelillo, M., and Zhang, L. (2018, January 18–22). DOTA: A large-scale dataset for object detection in aerial images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.
  • Liu, Y., Piramanayagam, S., Monteiro, S.T., and Saber, E. (2017, January 21–26). Dense semantic labeling of very-high-resolution aerial imagery and lidar with fully-convolutional neural networks and higher-order CRFs. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA.
  • Liu, (2020), Int. J. Comput. Vis., 128, pp. 261, 10.1007/s11263-019-01247-4
  • Bai, Z., Pei, X., Qiao, Z., Wu, G., and Bai, Y. (2024). Improved YOLOv7 Target Detection Algorithm Based on UAV Aerial Photography. Drones, 8.
  • Mandal, (2019), IEEE Geosci. Remote Sens. Lett., 17, pp. 494, 10.1109/LGRS.2019.2923564
  • Mohsan, (2023), Intell. Serv. Robot., 16, pp. 109
  • Zhang, M., Zhang, R., Yang, Y., Bai, H., Zhang, J., and Guo, J. (2022, January 18–24). ISNet: Shape matters for infrared small target detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.
  • Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C.L. (2014, January 6–12). Microsoft COCO: Common objects in context. Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland.
  • Baykara, H.C., Bıyık, E., Gül, G., Onural, D., Öztürk, A.S., and Yıldız, I. (2017, January 6–8). Real-time detection, tracking and classification of multiple moving objects in UAV videos. Proceedings of the 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI), Boston, MA, USA.
  • Bazi, (2018), IEEE Trans. Geosci. Remote Sens., 56, pp. 3107, 10.1109/TGRS.2018.2790926
  • Abughalieh, (2019), Multimed. Tools Appl., 78, pp. 9149, 10.1007/s11042-018-6508-1
  • Ren, (2016), IEEE Trans. Pattern Anal. Mach. Intell., 39, pp. 1137, 10.1109/TPAMI.2016.2577031
  • He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017, January 22–29). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.
  • Tan, M., Pang, R., and Le, Q.V. (2020, January 14–19). Efficientdet: Scalable and efficient object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.
  • Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27–30). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.
  • Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23–28). End-to-end object detection with transformers. Proceedings of the European Conference on Computer Vision, Glasgow, UK.
  • Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2020). Deformable detr: Deformable transformers for end-to-end object detection. arXiv.
  • Roh, B., Shin, J., Shin, W., and Kim, S. (2021). Sparse detr: Efficient end-to-end object detection with learnable sparsity. arXiv.
  • Zhao, Y., Lv, W., Xu, S., Wei, J., Wang, G., Dang, Q., Liu, Y., and Chen, J. (2024, January 16–22). Detrs beat yolos on real-time object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.
  • Cheng, Q., Wang, Y., He, W., and Bai, Y. (2024). Lightweight air-to-air unmanned aerial vehicle target detection model. Sci. Rep., 14.
  • Zhang, (2024), J. Artif. Intell. Soft Comput. Res., 14, pp. 251, 10.2478/jaiscr-2024-0014
  • Wang, S., Jiang, H., Li, Z., Yang, J., Ma, X., Chen, J., and Tang, X. (2024). PHSI-RTDETR: A Lightweight Infrared Small Target Detection Algorithm Based on UAV Aerial Photography. Drones, 8.
  • Jin, R., Jia, Z., Yin, X., Niu, Y., and Qi, Y. (2024). Domain Feature Decomposition for Efficient Object Detection in Aerial Images. Remote Sens., 16.
  • Wu, (2024), Digit. Signal Process., 146, pp. 104390, 10.1016/j.dsp.2024.104390
  • Tan, S., Duan, Z., and Pu, L. (2024). Multi-scale object detection in UAV images based on adaptive feature fusion. PLoS ONE, 19.
  • Battish, (2024), Image Vis. Comput., 150, pp. 105232, 10.1016/j.imavis.2024.105232
  • Wang, (2023), Multimed. Syst., 29, pp. 3329, 10.1007/s00530-023-01182-y
  • Chen, L., Gu, L., Zheng, D., and Fu, Y. (2024, January 16–22). Frequency-Adaptive Dilated Convolution for Semantic Segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.
  • He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27–30). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.
  • Pan, (2022), Adv. Neural Inf. Process. Syst., 35, pp. 14541
  • Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21–26). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.
  • Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18–22). Path aggregation network for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.
  • Zhang, H., and Zhang, S. (2023). Shape-IoU: More Accurate Metric considering Bounding Box Shape and Scale. arXiv.
  • Zhang, H., Xu, C., and Zhang, S. (2023). Inner-IoU: More effective intersection over union loss with auxiliary bounding box. arXiv.
  • Zhu, (2021), IEEE Trans. Pattern Anal. Mach. Intell., 44, pp. 7380, 10.1109/TPAMI.2021.3119563
  • Qi, Y., He, Y., Qi, X., Zhang, Y., and Yang, G. (2023, January 2–6). Dynamic snake convolution based on topological geometric constraints for tubular structure segmentation. Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France.
  • Zhang, X., Song, Y., Song, T., Yang, D., Ye, Y., Zhou, J., and Zhang, L. (2023). AKConv: Convolutional Kernel with Arbitrary Sampled Shapes and Arbitrary Number of Parameters. arXiv.
  • Zhong, (2022), IEEE Trans. Neural Netw. Learn. Syst., 34, pp. 9528, 10.1109/TNNLS.2022.3151138
  • Chen, J., Kao, S.-h., He, H., Zhuo, W., Wen, S., Lee, C.-H., and Chan, S.-H.G. (2023, January 17–24). Run, Don’t walk: Chasing higher FLOPS for faster neural networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.
  • Zhang, J., Li, X., Li, J., Liu, L., Xue, Z., Zhang, B., Jiang, Z., Huang, T., Wang, Y., and Wang, C. (2023, January 1–6). Rethinking mobile block for efficient attention-based models. Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France.
  • Jiang, (2021), IEEE Trans. Image Process., 30, pp. 5875, 10.1109/TIP.2021.3089943
  • Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., and Savarese, S. (2019, January 15–20). Generalized intersection over union: A metric and a loss for bounding box regression. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.
  • Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., and Ren, D. (2020, January 7–12). Distance-IoU loss: Faster and better learning for bounding box regression. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.
  • Gevorgyan, Z. (2022). SIoU loss: More powerful learning for bounding box regression. arXiv.
  • Wang, A., Chen, H., Liu, L., Chen, K., Lin, Z., Han, J., and Ding, G. (2024). Yolov10: Real-time end-to-end object detection. arXiv.
  • Yang, C., Huang, Z., and Wang, N. (2022, January 18–24). QueryDet: Cascaded sparse query for accelerating high-resolution small object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.
  • Feng, C., Zhong, Y., Gao, Y., Scott, M.R., and Huang, W. (2021, January 11–17). Tood: Task-aligned one-stage object detection. Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.
  • Lyu, C., Zhang, W., Huang, H., Zhou, Y., Wang, Y., Liu, Y., Zhang, S., and Chen, K. (2022). Rtmdet: An empirical study of designing real-time object detectors. arXiv.
  • Yao, Z., Ai, J., Li, B., and Zhang, C. (2021). Efficient detr: Improving end-to-end object detector with dense prior. arXiv.
  • Wang, C.-Y., Yeh, I.-H., and Liao, H.-Y.M. (2024). Yolov9: Learning what you want to learn using programmable gradient information. arXiv.
  • Li, C., Li, L., Jiang, H., Weng, K., Geng, Y., Li, L., Ke, Z., Li, Q., Cheng, M., and Nie, W. (2022). YOLOv6: A single-stage object detection framework for industrial applications. arXiv.