Haihui Wu
Liaoning National Normal College, Fuxin, 123000, China
Received: June 8, 2025 Accepted: September 18, 2025 Publication Date: October 18, 2025
Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.
In multi-focus image fusion, traditional methods handle the boundary regions between focused and defocused areas unevenly. To address this problem, this paper proposes a novel multi-focus image fusion method based on a Transformer and weighted guided filtering. A generative adversarial network serves as the backbone, and its generator performs the fusion task end to end. The Transformer captures global dependencies and low-frequency spatial detail, while a cross-domain cross-attention mechanism allows the two generator branches to exchange information and extract both redundant and complementary content. After the source images are converted from the RGB color space to YCbCr, an improved guided filtering algorithm fuses the Cb and Cr channels. Experimental results show that the proposed algorithm outperforms comparison algorithms in both subjective visual evaluation and objective metrics, further improving fusion quality.
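The abstract does not spell out the form of the improved weighted guided filter, so the sketch below uses the classic guided filter to refine a variance-based focus weight map before blending the Cb/Cr channels of two source images. The focus measure, window sizes, and function names (`guided_filter`, `fuse_chroma`) are illustrative assumptions, not the author's implementation.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def guided_filter(guide, src, radius=4, eps=1e-3):
    """Classic guided filter: edge-preserving smoothing of `src`
    steered by the structure of `guide`. Inputs are float arrays in [0, 1]."""
    size = 2 * radius + 1
    mean_I = uniform_filter(guide, size)
    mean_p = uniform_filter(src, size)
    corr_Ip = uniform_filter(guide * src, size)
    corr_II = uniform_filter(guide * guide, size)

    var_I = corr_II - mean_I * mean_I
    cov_Ip = corr_Ip - mean_I * mean_p

    a = cov_Ip / (var_I + eps)   # per-window linear coefficients
    b = mean_p - a * mean_I
    return uniform_filter(a, size) * guide + uniform_filter(b, size)


def fuse_chroma(y1, c1, y2, c2, radius=4, eps=1e-3):
    """Fuse one chrominance channel (Cb or Cr) from two source images.
    A simple focus measure (local variance of the luminance channel) gives a
    hard decision map; the guided filter then refines it so the weights follow
    object boundaries instead of the raw per-pixel decision (a simplification:
    only the first luminance image is used as the guide here)."""
    fm1 = uniform_filter(y1 ** 2, 7) - uniform_filter(y1, 7) ** 2  # local variance
    fm2 = uniform_filter(y2 ** 2, 7) - uniform_filter(y2, 7) ** 2
    w = (fm1 >= fm2).astype(np.float64)        # hard decision map
    w = guided_filter(y1, w, radius, eps)      # edge-aware refinement
    w = np.clip(w, 0.0, 1.0)
    return w * c1 + (1.0 - w) * c2
```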