Semantic segmentation is one of the most challenging tasks in computer vision. In many applications, however, labeled images are scarce because pixel-level annotation is expensive. This scarcity greatly hinders adoption in fields where annotating large numbers of images is costly or even infeasible. To alleviate the problem, semi-supervised semantic segmentation, which exploits both labeled and unlabeled images, has been proposed and has received wide attention. This paper first introduces semantic segmentation and then semi-supervised semantic segmentation. It next categorizes the deep learning methods commonly used for semi-supervised semantic segmentation, describes each category and its classic network architectures in detail, and compares their strengths and weaknesses. It then proposes several feasible strategies for improving existing methods, and finally summarizes the field and discusses future directions.
Keywords: Deep Learning, Semantic Segmentation, Semi-Supervised Learning
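Specifically, under the smoothness assumption and the cluster assumption, data points with different labels are separated by low-density regions, and similar data points have similar outputs. It follows that if a realistic perturbation is applied to an unlabeled sample, its prediction should not change significantly; that is, the output should be consistent. Because this approach generally operates on the prediction vectors output by the model and does not require ground-truth labels, it is well suited to semi-supervised learning. An unsupervised regularization loss term is constructed on unlabeled data between the prediction after perturbation, ỹ, and the normal prediction, y, which improves the generalization ability of the model.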
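The consistency term described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than a method taken from the surveyed papers: the per-pixel linear classifier `toy_segmenter`, the Gaussian input noise used as the perturbation, and the mean squared error used as the distance between y and ỹ.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def toy_segmenter(image, weights):
    """Hypothetical per-pixel linear classifier: (H, W, C) -> (H, W, K) class probabilities."""
    return softmax(image @ weights)

def consistency_loss(image, weights, noise_std=0.1, rng=None):
    """Mean squared difference between the normal prediction y and the perturbed prediction y~."""
    rng = np.random.default_rng() if rng is None else rng
    y = toy_segmenter(image, weights)  # normal prediction, no labels needed
    # Assumed perturbation: additive Gaussian noise on the input image.
    perturbed = image + rng.normal(0.0, noise_std, image.shape)
    y_tilde = toy_segmenter(perturbed, weights)  # prediction after perturbation
    return float(np.mean((y - y_tilde) ** 2))

rng = np.random.default_rng(0)
unlabeled = rng.random((8, 8, 3))   # one unlabeled 8x8 RGB image
weights = rng.normal(size=(3, 4))   # 4 hypothetical classes

loss = consistency_loss(unlabeled, weights, noise_std=0.1, rng=rng)
zero_noise_loss = consistency_loss(unlabeled, weights, noise_std=0.0, rng=rng)
print(loss, zero_noise_loss)
```

With no perturbation the two predictions coincide and the term vanishes; with noise it is positive and would be added, with a weighting coefficient, to the supervised loss on labeled data. Other distances (for example KL divergence) and other perturbations (feature-level or augmentation-based) are equally plausible choices here.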
Li Yitong, Zhang Changlun. Algorithm Research of Deep Learning in Semi-Supervised Semantic Segmentation [J]. Artificial Intelligence and Robotics Research, 2023, 12(04): 328-339. https://doi.org/10.12677/AIRR.2023.124036