[1] Bemelmans, R., Gelderblom, G.J., Jonker, P. and de Witte, L. (2012) Socially Assistive Robots in Elderly Care: A Systematic Review into Effects and Effectiveness. Journal of the American Medical Directors Association, 13, 114-120.e1. https://doi.org/10.1016/j.jamda.2010.10.002
[2] Bolme, D., Beveridge, J.R., Draper, B.A. and Lui, Y.M. (2010) Visual Object Tracking Using Adaptive Correlation Filters. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, 13-18 June 2010, 2544-2550. https://doi.org/10.1109/cvpr.2010.5539960
[3] Dee, H.M. and Velastin, S.A. (2007) How Close Are We to Solving the Problem of Automated Visual Surveillance? Machine Vision and Applications, 19, 329-343. https://doi.org/10.1007/s00138-007-0077-z
[4] Feichtenhofer, C., Pinz, A. and Wildes, R.P. (2017) Spatiotemporal Multiplier Networks for Video Action Recognition. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, 21-26 July 2017, 7445-7454. https://doi.org/10.1109/cvpr.2017.787
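[5] Li, B., Zhang, J., Wang, B., et al. (2022) Human-Object Interaction Action Recognition Fusing Multi-Level Visual Information. Computer Science, 49(S2), 643-650. (In Chinese)
[6] Wu, W. and Liu, Z. (2021) Graph-Based Human-Object Interaction Recognition. Computer Engineering and Applications, 57(3), 175-181. (In Chinese)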
[7] Wang, T., Anwer, R.M., Khan, M.H., Khan, F.S., Pang, Y., Shao, L., et al. (2019) Deep Contextual Attention for Human-Object Interaction Detection. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, 27 October-2 November 2019, 5693-5701. https://doi.org/10.1109/iccv.2019.00579
[8] Wan, B., Zhou, D., Liu, Y., Li, R. and He, X. (2019) Pose-Aware Multi-Level Feature Network for Human Object Interaction Detection. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, 27 October-2 November 2019, 9468-9477. https://doi.org/10.1109/iccv.2019.00956
[9] Kim, B., Choi, T., Kang, J. and Kim, H.J. (2020) UnionDet: Union-Level Detector towards Real-Time Human-Object Interaction Detection. Computer Vision - ECCV 2020, Glasgow, 23-28 August 2020, 498-514. https://doi.org/10.1007/978-3-030-58555-6_30
[10] Liao, Y., Liu, S., Wang, F., Chen, Y., Qian, C. and Feng, J. (2020) PPDM: Parallel Point Detection and Matching for Real-Time Human-Object Interaction Detection. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 13-19 June 2020, 479-487. https://doi.org/10.1109/cvpr42600.2020.00056
[11] Wang, T., Yang, T., Danelljan, M., Khan, F.S., Zhang, X. and Sun, J. (2020) Learning Human-Object Interaction Detection Using Interaction Points. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 13-19 June 2020, 4115-4124. https://doi.org/10.1109/cvpr42600.2020.00417
[12] Zhong, X., Qu, X., Ding, C. and Tao, D. (2021) Glance and Gaze: Inferring Action-Aware Points for One-Stage Human-Object Interaction Detection. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 20-25 June 2021, 13229-13238. https://doi.org/10.1109/cvpr46437.2021.01303
[13] Newell, A., Yang, K. and Deng, J. (2016) Stacked Hourglass Networks for Human Pose Estimation. Computer Vision - ECCV 2016, Amsterdam, 11-14 October 2016, 483-499. https://doi.org/10.1007/978-3-319-46484-8_29
[14] Yu, F., Wang, D., Shelhamer, E. and Darrell, T. (2018) Deep Layer Aggregation. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 18-23 June 2018, 2403-2412. https://doi.org/10.1109/cvpr.2018.00255
[15] Vaswani, A., Shazeer, N., Parmar, N., et al. (2017) Attention Is All You Need. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, 4-9 December 2017, 6000-6010.
[16] Girshick, R. (2015) Fast R-CNN. 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, 7-13 December 2015, 1440-1448. https://doi.org/10.1109/iccv.2015.169
[17] Redmon, J., Divvala, S., Girshick, R. and Farhadi, A. (2016) You Only Look Once: Unified, Real-Time Object Detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 27-30 June 2016, 779-788. https://doi.org/10.1109/cvpr.2016.91
[18] Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q. and Tian, Q. (2019) CenterNet: Keypoint Triplets for Object Detection. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, 27 October-2 November 2019, 6568-6577. https://doi.org/10.1109/iccv.2019.00667
[19] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A. and Zagoruyko, S. (2020) End-to-End Object Detection with Transformers. Computer Vision - ECCV 2020, Glasgow, 23-28 August 2020, 213-229. https://doi.org/10.1007/978-3-030-58452-8_13
[20] Tamura, M., Ohashi, H. and Yoshinaga, T. (2021) QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 20-25 June 2021, 10405-10414. https://doi.org/10.1109/cvpr46437.2021.01027
[21] Zou, C., Wang, B., Hu, Y., Liu, J., Wu, Q., Zhao, Y., et al. (2021) End-to-End Human Object Interaction Detection with HOI Transformer. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 20-25 June 2021, 11820-11829. https://doi.org/10.1109/cvpr46437.2021.01165
[22] Kim, B., Lee, J., Kang, J., Kim, E. and Kim, H.J. (2021) HOTR: End-to-End Human-Object Interaction Detection with Transformers. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 20-25 June 2021, 74-83. https://doi.org/10.1109/cvpr46437.2021.00014
[23] Chen, M., Liao, Y., Liu, S., Chen, Z., Wang, F. and Qian, C. (2021) Reformulating HOI Detection as Adaptive Set Prediction. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 20-25 June 2021, 9000-9009. https://doi.org/10.1109/cvpr46437.2021.00889
[24] Zhang, A., Liao, Y., Liu, S., et al. (2021) Mining the Benefits of Two-Stage and One-Stage HOI Detection. Advances in Neural Information Processing Systems, 34, 17209-17220.
[25] Qu, X., Ding, C., Li, X., Zhong, X. and Tao, D. (2022) Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 19536-19545. https://doi.org/10.1109/cvpr52688.2022.01895
[26] Li, F., Zhang, H., Liu, S., Guo, J., Ni, L.M. and Zhang, L. (2022) DN-DETR: Accelerate DETR Training by Introducing Query DeNoising. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 13609-13617. https://doi.org/10.1109/cvpr52688.2022.01325
[27] Chen, J., Wang, Y. and Yanai, K. (2023) Focusing on What to Decode and What to Train: Efficient Training with HOI Split Decoders and Specific Target Guided DeNoising. arXiv:2307.02291.
[28] Gao, P., Zheng, M., Wang, X., Dai, J. and Li, H. (2021) Fast Convergence of DETR with Spatially Modulated Co-Attention. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 10-17 October 2021, 3601-3610. https://doi.org/10.1109/iccv48922.2021.00360
[29] Zhu, X., Su, W., Lu, L., et al. (2020) Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv:2010.04159. https://doi.org/10.48550/arXiv.2010.04159
[30] Chen, J. and Yanai, K. (2023) QAHOI: Query-Based Anchors for Human-Object Interaction Detection. 2023 18th International Conference on Machine Vision and Applications (MVA), Hamamatsu, 23-25 July 2023, 1-5. https://doi.org/10.23919/mva57639.2023.10215534
[31] Ma, S., Wang, Y., Wang, S. and Wei, Y. (2024) FGAHOI: Fine-Grained Anchors for Human-Object Interaction Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46, 2415-2429. https://doi.org/10.1109/tpami.2023.3331738
[32] Kim, B., Mun, J., On, K., Shin, M., Lee, J. and Kim, E. (2022) MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 19556-19565. https://doi.org/10.1109/cvpr52688.2022.01897
[33] He, K., Zhang, X., Ren, S. and Sun, J. (2016) Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 27-30 June 2016, 770-778. https://doi.org/10.1109/cvpr.2016.90
[34] Tan, M. and Le, Q. (2021) EfficientNetV2: Smaller Models and Faster Training. arXiv:2104.00298. https://doi.org/10.48550/arXiv.2104.00298
[35] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. (2020) An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv:2010.11929.
[36] Park, J., Park, J. and Lee, J. (2023) ViPLO: Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, 17-24 June 2023, 17152-17162. https://doi.org/10.1109/cvpr52729.2023.01645
[37] Lim, J., Baskaran, V.M., Lim, J.M., Wong, K., See, J. and Tistarelli, M. (2023) ERNet: An Efficient and Reliable Human-Object Interaction Detection Network. IEEE Transactions on Image Processing, 32, 964-979. https://doi.org/10.1109/tip.2022.3231528
[38] Kamath, A., Singh, M., LeCun, Y., Synnaeve, G., Misra, I. and Carion, N. (2021) MDETR - Modulated Detection for End-to-End Multi-Modal Understanding. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 10-17 October 2021, 1760-1770. https://doi.org/10.1109/iccv48922.2021.00180
[39] Cai, Z., Kwon, G., Ravichandran, A., Bas, E., Tu, Z., Bhotika, R., et al. (2022) X-DETR: A Versatile Architecture for Instance-Wise Vision-Language Tasks. Computer Vision - ECCV 2022, Tel Aviv, 23-27 October 2022, 290-308. https://doi.org/10.1007/978-3-031-20059-5_17
[40] Li, L.H., Zhang, P., Zhang, H., Yang, J., Li, C., Zhong, Y., et al. (2022) Grounded Language-Image Pre-Training. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 10955-10965. https://doi.org/10.1109/cvpr52688.2022.01069
[41] Yao, L., Han, J., Wen, Y., et al. (2022) DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-Training for Open-World Detection. Advances in Neural Information Processing Systems, 35, 9125-9138.
[42] Liao, Y., Zhang, A., Lu, M., Wang, Y., Li, X. and Liu, S. (2022) GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 20091-20100. https://doi.org/10.1109/cvpr52688.2022.01949
[43] Radford, A., Kim, J.W., Hallacy, C., et al. (2021) Learning Transferable Visual Models from Natural Language Supervision. arXiv:2103.00020. https://doi.org/10.48550/arXiv.2103.00020
[44] Yuan, H., Jiang, J., Albanie, S., et al. (2022) RLIP: Relational Language-Image Pre-Training for Human-Object Interaction Detection. Advances in Neural Information Processing Systems, 35, 37416-37431.
[45] Yuan, H., Zhang, S., Wang, X., Albanie, S., Pan, Y., Feng, T., et al. (2023) RLIPv2: Fast Scaling of Relational Language-Image Pre-Training. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, 1-6 October 2023, 21592-21604. https://doi.org/10.1109/iccv51070.2023.01979
[46] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., et al. (2020) The Open Images Dataset V4. International Journal of Computer Vision, 128, 1956-1981. https://doi.org/10.1007/s11263-020-01316-z
[47] Shao, S., Li, Z., Zhang, T., Peng, C., Yu, G., Zhang, X., et al. (2019) Objects365: A Large-Scale, High-Quality Dataset for Object Detection. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, 27 October-2 November 2019, 8429-8438. https://doi.org/10.1109/iccv.2019.00852
[48] Li, J., Li, D., Xiong, C., et al. (2022) BLIP: Bootstrapping Language-Image Pre-Training for Unified Vision-Language Understanding and Generation. arXiv:2201.12086. https://doi.org/10.48550/arXiv.2201.12086
[49] Ning, S., Qiu, L., Liu, Y. and He, X. (2023) HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, 17-24 June 2023, 23507-23517. https://doi.org/10.1109/cvpr52729.2023.02251
[50] Zhang, F.Z., Campbell, D. and Gould, S. (2021) Spatially Conditioned Graphs for Detecting Human-Object Interactions. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 10-17 October 2021, 13299-13307. https://doi.org/10.1109/iccv48922.2021.01307
[51] Zhang, F.Z., Campbell, D. and Gould, S. (2022) Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 20072-20080. https://doi.org/10.1109/cvpr52688.2022.01947
[52] Zhang, Y., Pan, Y., Yao, T., Huang, R., Mei, T. and Chen, C. (2022) Exploring Structure-Aware Transformer over Interaction Proposals for Human-Object Interaction Detection. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 19526-19535. https://doi.org/10.1109/cvpr52688.2022.01894
[53] Zhang, F.Z., Yuan, Y., Campbell, D., Zhong, Z. and Gould, S. (2023) Exploring Predicate Visual Context in Detecting of Human-Object Interactions. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, 1-6 October 2023, 10377-10387. https://doi.org/10.1109/iccv51070.2023.00955
[54] Gupta, S. and Malik, J. (2015) Visual Semantic Role Labeling. arXiv:1505.04474.
[55] Chao, Y., Liu, Y., Liu, X., Zeng, H. and Deng, J. (2018) Learning to Detect Human-Object Interactions. 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, 12-15 March 2018, 381-389. https://doi.org/10.1109/wacv.2018.00048
[56] Lin, T., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., et al. (2014) Microsoft COCO: Common Objects in Context. Computer Vision - ECCV 2014, Zurich, 6-12 September 2014, 740-755. https://doi.org/10.1007/978-3-319-10602-1_48