Journal of Applied Science and Engineering

Published by Tamkang University Press

Impact Factor: 1.30
CiteScore: 2.10

Hao Xia

College of Intelligent Manufacturing, Luohe Food Engineering Vocational University, Luohe 462300, China


 

Received: September 15, 2025
Accepted: October 13, 2025
Publication Date: November 22, 2025

 Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.


Download Citation: https://doi.org/10.6180/jase.202606_29(6).0025


To address challenges such as large variations in button sizes, strong reflective interference, and navigation drift of robotic arms in industrial environments, a multimodal perception and navigation method was proposed by integrating a multi-scale attention YOLOv8s detection network with a semantic-inertial collaborative Simultaneous Localization and Mapping (SLAM) system. In the detection module, a Cross-Scale Aggregation Module (CSAM) and a dual-gated attention mechanism were introduced to improve detection accuracy for small objects and complex backgrounds. In the navigation module, semantic factor embedding and Inertial Measurement Unit (IMU) pre-integration were jointly optimized to achieve synergy between semantic and geometric constraints. A dynamic probabilistic graph controlled by confidence gating and a spatiotemporal cost function were further designed; combined with chance-constrained reinforcement learning, these enabled trajectory optimization and risk control. Experimental results showed that YOLOv8s-CSAM achieved a mean Average Precision (mAP) exceeding 0.90 (with a maximum of 0.931) across three datasets, and the recall rate for small targets reached 86.9%. The semantic-inertial SLAM system reduced the Root Mean Square Error (RMSE) of trajectories to 0.34 m, with more than 92% of predicted points having a lateral error below 0.2 m. The method maintained stable performance under complex lighting and vibration conditions, demonstrating robustness and generalization in multi-interference industrial environments. It provides a unified algorithmic framework for autonomous perception and navigation in intelligent manufacturing.
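The navigation module jointly optimizes semantic factors with IMU pre-integration, which accumulates high-rate gyroscope and accelerometer samples between keyframes into a single relative-motion constraint. As a minimal illustrative sketch (not the paper's implementation; biases, noise covariances, and gravity compensation are omitted), the pre-integrated rotation, velocity, and position deltas can be computed as:

```python
import numpy as np

def skew(w):
    """Skew-symmetric matrix of a 3-vector w."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def so3_exp(phi):
    """Rodrigues' formula: rotation vector -> rotation matrix on SO(3)."""
    theta = np.linalg.norm(phi)
    if theta < 1e-9:
        return np.eye(3) + skew(phi)  # first-order approximation
    axis = phi / theta
    K = skew(axis)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def preintegrate(gyro, accel, dt):
    """Accumulate IMU samples between two keyframes into a relative
    rotation dR, velocity dv, and position dp (bias-free sketch).

    gyro, accel: sequences of 3-vectors (body-frame rates and accelerations)
    dt: sampling interval in seconds
    """
    dR = np.eye(3)
    dv = np.zeros(3)
    dp = np.zeros(3)
    for w, a in zip(gyro, accel):
        a_nav = dR @ a                      # rotate acceleration into keyframe frame
        dp += dv * dt + 0.5 * a_nav * dt**2  # position update before velocity
        dv += a_nav * dt
        dR = dR @ so3_exp(w * dt)            # right-multiply incremental rotation
    return dR, dv, dp
```

In a full semantic-inertial system, these deltas would enter the factor graph as inter-keyframe constraints alongside semantic landmark factors, so that geometric and semantic residuals are minimized jointly.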


Keywords: small object detection; multi-scale attention; YOLOv8s-CSAM; semantic-inertial SLAM; trajectory optimization

