Artificial intelligence-based Bayesian optimization and transformer model for tennis motion recognition

Shaowei Shi; Kun Huang

doi:10.6180/jase.202601_29(1).0017


Physics

Shaowei Shi and Kun Huang

Faculty of Physical Education, Hubei University of Arts and Science, Xiangyang 441053, China

Received: April 7, 2025
Accepted: April 29, 2025
Publication Date: May 10, 2025

Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.


Traditional methods for analyzing human motion behavior suffer from large errors and serious over-fitting. This paper therefore proposes a novel tennis motion recognition method based on Bayesian optimization and a Transformer model. First, an improved generative adversarial network is used to optimize the heat-map localization of human key points, and a Transformer-based human key point recognition algorithm is designed. Then, a Bayesian optimization algorithm searches for the optimal pruning rate of each network layer, improving the efficiency and accuracy of subnet search. Finally, the proposed method is tested on several mainstream behavior recognition datasets. With only RGB video frames as input, it achieves recognition accuracies of 97.6%, 73.8%, and 66.7% on the UCF101, HMDB51, and Something-Something V1 datasets, respectively. These results indicate that the proposed method efficiently extracts the spatio-temporal features of motion.
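The per-layer pruning-rate search described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the surrogate model, the acquisition function, or the objective, so everything below is an assumption made for illustration: a Gaussian-process surrogate with an RBF kernel, an expected-improvement acquisition evaluated on random candidates, and a hypothetical `toy_score` function that stands in for the validation accuracy of a pruned subnet. All function names, the layer sensitivities, and the 0.9 upper bound on the pruning rate are hypothetical.

```python
import math
import numpy as np

def toy_score(rates):
    """Hypothetical stand-in for the accuracy of a pruned subnet plus a sparsity reward."""
    rates = np.asarray(rates, dtype=float)
    sensitivity = np.array([4.0, 1.0, 0.5])  # assumed: early layers suffer more from pruning
    accuracy = 1.0 - np.sum(sensitivity * rates ** 2) / len(rates)
    return accuracy + 0.5 * rates.mean()

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq / length_scale ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_s = rbf_kernel(x_obs, x_new)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs))
    mean = k_s.T @ alpha
    v = np.linalg.solve(chol, k_s)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)
    return mean, np.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement over the best observed score (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def search_pruning_rates(n_layers=3, n_init=5, n_iter=20, seed=0):
    """Bayesian-optimization loop over one pruning rate per layer, each in [0, 0.9]."""
    rng = np.random.default_rng(seed)
    x_obs = rng.uniform(0.0, 0.9, size=(n_init, n_layers))
    y_obs = np.array([toy_score(x) for x in x_obs])
    for _ in range(n_iter):
        candidates = rng.uniform(0.0, 0.9, size=(500, n_layers))
        y_mean = y_obs.mean()  # center the targets to match the zero-mean GP prior
        mean, std = gp_posterior(x_obs, y_obs - y_mean, candidates)
        ei = expected_improvement(mean + y_mean, std, y_obs.max())
        x_next = candidates[np.argmax(ei)]  # evaluate the most promising candidate
        x_obs = np.vstack([x_obs, x_next])
        y_obs = np.append(y_obs, toy_score(x_next))
    best = int(np.argmax(y_obs))
    return x_obs[best], float(y_obs[best])

best_rates, best_score = search_pruning_rates()
print("pruning rate per layer:", np.round(best_rates, 3), "score:", round(best_score, 4))
```

In a real setting, `toy_score` would be replaced by pruning the network at the proposed rates and measuring validation accuracy, which is the expensive step that motivates a sample-efficient search of this kind.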

Keywords: Tennis motion recognition; Bayesian optimization; Transformer model; Key point recognition; Artificial intelligence
