Scene text detection aims to accurately locate the text present in natural scene images. Segmentation-based scene text detection currently faces challenges such as diverse text types, complex backgrounds, and irregular text shapes, but a corresponding comprehensive treatment of these techniques is lacking; this paper therefore reviews natural scene text detection technology. Its main contents are: 1) an explanation of segmentation-based detection algorithms in the field of scene text detection, covering both semantic segmentation and instance segmentation; 2) an introduction to classical models and innovative models proposed in recent years, with analysis and synthesis; 3) an introduction to commonly used natural scene text datasets, together with a comparison of the strengths, weaknesses, and performance of different algorithms; 4) an outlook on future development trends for segmentation-based natural scene text detection algorithms.
Keywords: Text Detection, Segmentation, Overview
A Review on the Application of Segmentation-Based Text Detection Techniques for Natural Scenes
Weijie Chen, Yixing Xia, Shijie Du
School of Information and Intelligent Engineering, Zhejiang Wanli University, Ningbo Zhejiang
Received: Apr. 11th, 2024; accepted: May 24th, 2024; published: May 31st, 2024
Weijie Chen, Yixing Xia, Shijie Du. A Review on the Application of Segmentation-Based Text Detection Techniques for Natural Scenes[J]. Artificial Intelligence and Robotics Research, 2024, 13(02): 399-407. https://doi.org/10.12677/airr.2024.132041