Journal of Applied Science and Engineering

Published by Tamkang University Press


Kaikai Liang

School of Foreign Languages, Zhengzhou University of Science and Technology, Zhengzhou 450064, China


 

Received: August 4, 2025
Accepted: September 27, 2025
Publication Date: October 18, 2025

 Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.


DOI: https://doi.org/10.6180/jase.202605_29(5).0023


This paper proposes a multi-scale deformable graph convolutional network (MD-GCN) that integrates spatio-temporal and semantic features to accurately identify students' classroom behaviors, and verifies its value for immediate feedback in English teaching. The model takes classroom monitoring video sequences as input. It first applies lightweight pose estimation to extract joint coordinates and construct a dynamic human-body graph; it then uses multi-scale deformable graph convolution kernels to adaptively capture the topological changes of fine-grained actions such as raising hands, reading, and whispering, and aggregates short-term poses, long-term trajectories, and textual semantic information through a cross-layer feature-fusion module. On an English-classroom dataset, the proposed method reaches 91.7% mAP at an inference speed of 38 FPS, meeting real-time requirements. A teaching experiment shows that when the system pushes recognition results to the teacher's terminal in real time, the response rate to classroom questions increases by 24% and the duration of students' concentration increases by 17%. This research provides a scalable artificial intelligence solution for intelligent intervention in smart classrooms.
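To make the pipeline concrete, below is a minimal sketch, assuming PyTorch, of how one multi-scale deformable graph-convolution block over skeleton joints could be structured. The class name, tensor layout, and the learnable residual-adjacency term used to "deform" the topology are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a multi-scale deformable graph convolution over skeleton
# joints. Hypothetical design: k-hop adjacency powers provide the scales, and
# a learnable residual adjacency per scale "deforms" the graph topology.
import torch
import torch.nn as nn


class MultiScaleDeformableGraphConv(nn.Module):
    def __init__(self, in_channels, out_channels, adjacency, num_scales=3):
        super().__init__()
        self.num_scales = num_scales
        # Precompute row-normalized k-hop adjacencies A, A^2, ..., A^k.
        hops = [self._normalize(torch.matrix_power(adjacency, k + 1))
                for k in range(num_scales)]
        self.register_buffer("hops", torch.stack(hops))          # (S, V, V)
        # Learnable residual adjacency: the assumed "deformation" term.
        v = adjacency.size(0)
        self.deform = nn.Parameter(torch.zeros(num_scales, v, v))
        # One 1x1 projection per scale; scale outputs are summed.
        self.proj = nn.ModuleList([
            nn.Conv2d(in_channels, out_channels, kernel_size=1)
            for _ in range(num_scales)])

    @staticmethod
    def _normalize(a):
        a = (a > 0).float() + torch.eye(a.size(0))  # binarize + self-loops
        return a / a.sum(dim=1, keepdim=True).clamp(min=1.0)

    def forward(self, x):
        # x: (batch, channels, frames, joints)
        out = 0
        for s in range(self.num_scales):
            adj = self.hops[s] + self.deform[s]          # deformed topology
            agg = torch.einsum("nctv,vw->nctw", x, adj)  # joint aggregation
            out = out + self.proj[s](agg)
        return out


if __name__ == "__main__":
    # Toy 5-joint chain skeleton; 2 clips, 3 coordinate channels, 16 frames.
    A = torch.zeros(5, 5)
    for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
        A[i, j] = A[j, i] = 1.0
    block = MultiScaleDeformableGraphConv(3, 64, A)
    print(block(torch.randn(2, 3, 16, 5)).shape)  # torch.Size([2, 64, 16, 5])
```

In this reading, each scale aggregates features over a k-hop neighborhood of the skeleton, and the learned residual adjacency lets training reweight or add edges, which is one common way to realize adaptive topology in skeleton-based GCNs.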


Keywords: multi-scale deformable graph convolutional network, spatio-temporal and semantic feature, student behavior recognition, English teaching




    



 
