AMFEF-DETR: An End-to-End Adaptive Multi-Scale Feature Extraction and Fusion Object Detection Network Based on UAV Aerial Images
- Wang, Sen 1
- Jiang, Huiping 1
- Yang, Jixiang 1
- Ma, Xuan 1
- Chen, Jiamin 1
- González-Aguilera, Diego (academic editor) 2
- 1 Minzu University of China
- 2 Universidad de Salamanca
ISSN: 2504-446X
Year of publication: 2024
Volume: 8
Issue: 10
Pages: 523
Type: Article
More publications in: Drones
Funding information
Funders
- National Natural Science Foundation of China: 61773416
- Graduate Research and Practice Projects of Minzu University of China: SJCX2024021
Bibliographic References
- Colomina, (2014), ISPRS J. Photogramm. Remote Sens., 92, pp. 79, 10.1016/j.isprsjprs.2014.02.013
- Pouyanfar, (2018), ACM Comput. Surv. (CSUR), 51, pp. 1
- Shi, (2016), IEEE Internet Things J., 3, pp. 637, 10.1109/JIOT.2016.2579198
- Ke, (2018), IEEE Trans. Intell. Transp. Syst., 20, pp. 54, 10.1109/TITS.2018.2797697
- Feng, (2015), Remote Sens., 7, pp. 1074, 10.3390/rs70101074
- Erdelj, (2017), IEEE Pervasive Comput., 16, pp. 24, 10.1109/MPRV.2017.11
- Xia, G.-S., Bai, X., Ding, J., Zhu, Z., Belongie, S., Luo, J., Datcu, M., Pelillo, M., and Zhang, L. (2018, January 18–22). DOTA: A large-scale dataset for object detection in aerial images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.
- Liu, Y., Piramanayagam, S., Monteiro, S.T., and Saber, E. (2017, January 21–26). Dense semantic labeling of very-high-resolution aerial imagery and lidar with fully-convolutional neural networks and higher-order CRFs. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA.
- Liu, (2020), Int. J. Comput. Vis., 128, pp. 261, 10.1007/s11263-019-01247-4
- Bai, Z., Pei, X., Qiao, Z., Wu, G., and Bai, Y. (2024). Improved YOLOv7 Target Detection Algorithm Based on UAV Aerial Photography. Drones, 8.
- Mandal, (2019), IEEE Geosci. Remote Sens. Lett., 17, pp. 494, 10.1109/LGRS.2019.2923564
- Mohsan, (2023), Intell. Serv. Robot., 16, pp. 109
- Zhang, M., Zhang, R., Yang, Y., Bai, H., Zhang, J., and Guo, J. (2022, January 18–24). ISNet: Shape matters for infrared small target detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.
- Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C.L. (2014, January 6–12). Microsoft COCO: Common objects in context. Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, Proceedings, Part V.
- Baykara, H.C., Bıyık, E., Gül, G., Onural, D., Öztürk, A.S., and Yıldız, I. (2017, January 6–8). Real-time detection, tracking and classification of multiple moving objects in UAV videos. Proceedings of the 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI), Boston, MA, USA.
- Bazi, (2018), IEEE Trans. Geosci. Remote Sens., 56, pp. 3107, 10.1109/TGRS.2018.2790926
- Abughalieh, (2019), Multimed. Tools Appl., 78, pp. 9149, 10.1007/s11042-018-6508-1
- Ren, (2016), IEEE Trans. Pattern Anal. Mach. Intell., 39, pp. 1137, 10.1109/TPAMI.2016.2577031
- He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017, January 22–29). Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.
- Tan, M., Pang, R., and Le, Q.V. (2020, January 14–19). EfficientDet: Scalable and efficient object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.
- Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27–30). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.
- Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23–28). End-to-end object detection with transformers. Proceedings of the European Conference on Computer Vision, Glasgow, UK.
- Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2020). Deformable DETR: Deformable transformers for end-to-end object detection. arXiv.
- Roh, B., Shin, J., Shin, W., and Kim, S. (2021). Sparse DETR: Efficient end-to-end object detection with learnable sparsity. arXiv.
- Zhao, Y., Lv, W., Xu, S., Wei, J., Wang, G., Dang, Q., Liu, Y., and Chen, J. (2024, January 16–22). DETRs beat YOLOs on real-time object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.
- Cheng, Q., Wang, Y., He, W., and Bai, Y. (2024). Lightweight air-to-air unmanned aerial vehicle target detection model. Sci. Rep., 14.
- Zhang, (2024), J. Artif. Intell. Soft Comput. Res., 14, pp. 251, 10.2478/jaiscr-2024-0014
- Wang, S., Jiang, H., Li, Z., Yang, J., Ma, X., Chen, J., and Tang, X. (2024). PHSI-RTDETR: A Lightweight Infrared Small Target Detection Algorithm Based on UAV Aerial Photography. Drones, 8.
- Jin, R., Jia, Z., Yin, X., Niu, Y., and Qi, Y. (2024). Domain Feature Decomposition for Efficient Object Detection in Aerial Images. Remote Sens., 16.
- Wu, (2024), Digit. Signal Process., 146, pp. 104390, 10.1016/j.dsp.2024.104390
- Tan, S., Duan, Z., and Pu, L. (2024). Multi-scale object detection in UAV images based on adaptive feature fusion. PLoS ONE, 19.
- Battish, (2024), Image Vis. Comput., 150, pp. 105232, 10.1016/j.imavis.2024.105232
- Wang, (2023), Multimed. Syst., 29, pp. 3329, 10.1007/s00530-023-01182-y
- Chen, L., Gu, L., Zheng, D., and Fu, Y. (2024, January 16–22). Frequency-Adaptive Dilated Convolution for Semantic Segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.
- He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27–30). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.
- Pan, (2022), Adv. Neural Inf. Process. Syst., 35, pp. 14541
- Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21–26). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.
- Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18–22). Path aggregation network for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.
- Zhang, H., and Zhang, S. (2023). Shape-IoU: More Accurate Metric considering Bounding Box Shape and Scale. arXiv.
- Zhang, H., Xu, C., and Zhang, S. (2023). Inner-IoU: More effective intersection over union loss with auxiliary bounding box. arXiv.
- Zhu, (2021), IEEE Trans. Pattern Anal. Mach. Intell., 44, pp. 7380, 10.1109/TPAMI.2021.3119563
- Qi, Y., He, Y., Qi, X., Zhang, Y., and Yang, G. (2023, January 2–6). Dynamic snake convolution based on topological geometric constraints for tubular structure segmentation. Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France.
- Zhang, X., Song, Y., Song, T., Yang, D., Ye, Y., Zhou, J., and Zhang, L. (2023). AKConv: Convolutional Kernel with Arbitrary Sampled Shapes and Arbitrary Number of Parameters. arXiv.
- Zhong, (2022), IEEE Trans. Neural Netw. Learn. Syst., 34, pp. 9528, 10.1109/TNNLS.2022.3151138
- Chen, J., Kao, S.-H., He, H., Zhuo, W., Wen, S., Lee, C.-H., and Chan, S.-H.G. (2023, January 17–24). Run, don’t walk: Chasing higher FLOPS for faster neural networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.
- Zhang, J., Li, X., Li, J., Liu, L., Xue, Z., Zhang, B., Jiang, Z., Huang, T., Wang, Y., and Wang, C. (2023, January 1–6). Rethinking mobile block for efficient attention-based models. Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France.
- Jiang, (2021), IEEE Trans. Image Process., 30, pp. 5875, 10.1109/TIP.2021.3089943
- Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., and Savarese, S. (2019, January 15–20). Generalized intersection over union: A metric and a loss for bounding box regression. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.
- Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., and Ren, D. (2020, January 7–12). Distance-IoU loss: Faster and better learning for bounding box regression. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.
- Gevorgyan, Z. (2022). SIoU loss: More powerful learning for bounding box regression. arXiv.
- Wang, A., Chen, H., Liu, L., Chen, K., Lin, Z., Han, J., and Ding, G. (2024). YOLOv10: Real-time end-to-end object detection. arXiv.
- Yang, C., Huang, Z., and Wang, N. (2022, January 18–24). QueryDet: Cascaded sparse query for accelerating high-resolution small object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.
- Feng, C., Zhong, Y., Gao, Y., Scott, M.R., and Huang, W. (2021, January 11–17). TOOD: Task-aligned one-stage object detection. Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.
- Lyu, C., Zhang, W., Huang, H., Zhou, Y., Wang, Y., Liu, Y., Zhang, S., and Chen, K. (2022). RTMDet: An empirical study of designing real-time object detectors. arXiv.
- Yao, Z., Ai, J., Li, B., and Zhang, C. (2021). Efficient DETR: Improving end-to-end object detector with dense prior. arXiv.
- Wang, C.-Y., Yeh, I.-H., and Liao, H.-Y.M. (2024). YOLOv9: Learning what you want to learn using programmable gradient information. arXiv.
- Li, C., Li, L., Jiang, H., Weng, K., Geng, Y., Li, L., Ke, Z., Li, Q., Cheng, M., and Nie, W. (2022). YOLOv6: A single-stage object detection framework for industrial applications. arXiv.