Research Article | Peer-Reviewed

Generative Adversarial Network Based Visual Saliency Prediction with Cascaded Hierarchical Atrous Spatial Pyramid Pooling

Received: 16 January 2025     Accepted: 3 May 2025     Published: 16 June 2025
Abstract

Visual saliency refers to the areas of an image that attract human attention. The Human Visual System (HVS) focuses on specific parts of a scene rather than on the image as a whole. Visual attention describes a set of cognitive processes that select important information and filter out unnecessary information from cluttered visual scenes. Images are central to computer vision because they carry a wealth of information, and human beings receive roughly 80% of their information through vision. Processing an entire image when only part of it is needed wastes resources; restricting computation to the relevant pixels is far more efficient. In this study, we achieve this with a Generative Adversarial Network (GAN) equipped with a Cascaded Hierarchical Atrous Spatial Pyramid Pooling (CHASPP) module, using EfficientNet-B7, which uniformly scales network depth, width, and input resolution, as the feature extractor; this improves feature extraction for visual saliency prediction. Datasets such as CAT2000, MIT1003, DUT-OMRON, and PASCAL-S are used to demonstrate the effectiveness of the selected models and techniques. We develop an effective visual saliency prediction model using a GAN with CHASPP together with additional loss terms, namely edge loss and perceptual loss; the CHASPP module achieved the best results on these datasets across several evaluation metrics.
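The abstract above is the only description of the method on this page, but the pipeline it outlines, an EfficientNet-B7 feature extractor feeding a cascaded hierarchical ASPP module inside a GAN generator, trained with adversarial, edge, and perceptual losses, can be sketched roughly as follows. This is a minimal, hypothetical PyTorch illustration: the dilation rates, channel widths, cascading scheme, and edge-loss formulation are assumptions, not the authors' published configuration.

```python
# Minimal sketch, assuming PyTorch/torchvision; all hyperparameters below
# are illustrative guesses, not the authors' configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class CHASPP(nn.Module):
    """Cascaded hierarchical atrous spatial pyramid pooling (illustrative).

    Unlike plain ASPP, each atrous branch also sees the previous branch's
    output concatenated with the input, so multi-scale context accumulates
    hierarchically through the cascade.
    """
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList()
        ch = in_ch
        for r in rates:
            self.branches.append(nn.Sequential(
                nn.Conv2d(ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            ))
            ch = in_ch + out_ch  # next branch sees input + previous output
        self.project = nn.Conv2d(in_ch + out_ch, out_ch, 1)

    def forward(self, x):
        feats = x
        for branch in self.branches:
            feats = torch.cat([x, branch(feats)], dim=1)
        return self.project(feats)


class SaliencyGenerator(nn.Module):
    """EfficientNet-B7 encoder -> CHASPP -> single-channel saliency map."""
    def __init__(self):
        super().__init__()
        backbone = models.efficientnet_b7(weights="DEFAULT")  # ImageNet weights
        self.encoder = backbone.features  # 2560-channel, stride-32 features
        self.chaspp = CHASPP(in_ch=2560, out_ch=256)
        self.head = nn.Sequential(
            nn.Conv2d(256, 1, 1),
            nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
            nn.Sigmoid(),
        )

    def forward(self, img):
        return self.head(self.chaspp(self.encoder(img)))


def edge_loss(pred, gt):
    """L1 distance between Sobel gradients of predicted and ground-truth maps
    (one plausible reading of the 'edge loss' named in the abstract)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=pred.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    def grad(m):
        return torch.cat([F.conv2d(m, kx, padding=1),
                          F.conv2d(m, ky, padding=1)], dim=1)
    return F.l1_loss(grad(pred), grad(gt))


# Usage: g = SaliencyGenerator()
#        sal = g(torch.randn(1, 3, 384, 384))  # -> (1, 1, 384, 384) map
```

A full training loop would add an adversarial term from a discriminator and a perceptual term computed over pretrained-network features, as the abstract indicates; the loss weightings are not specified on this page.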

Published in American Journal of Mathematical and Computer Modelling (Volume 10, Issue 2)
DOI 10.11648/j.ajmcm.20251002.13
Page(s) 66-73
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

Visual Saliency Prediction, Attention Area, Generative Adversarial Network, Low-level Features, High-level Features, Feature Extraction

References
[1] A. Le, “Predicting Visual Saliency: Where Do People Look?”.
[2] S. J. B and S. S. Kamath, “Saliency Prediction for Visual Regions,” pp. 48–60, 2017.
[3] W. Wang and J. Shen, “Deep Visual Attention Prediction,” IEEE Trans. Image Process., vol. 27, no. 5, pp. 2368–2378, 2018.
[4] P. Christiaan Klink, P. Jentgens, and J. A. M. Lorteije, “Priority maps explain the roles of value, attention, and salience in goal-oriented behavior,” J. Neurosci., vol. 34, no. 42, pp. 13867–13869, 2014.
[5] F. Yan, C. Chen, P. Xiao, S. Qi, Z. Wang, and R. Xiao, “Review of visual saliency prediction: Development process from neurobiological basis to deep models,” Appl. Sci., vol. 12, no. 1, 2022.
[6] R. Sharma and E. N. Singh, “Comparative Study of Different Low Level Feature Extraction Techniques,” Int. J. Eng. Res. Technol., vol. 3, no. 4, pp. 1454–1460, 2014.
[7] I. Goodfellow et al., “Generative adversarial networks,” Commun. ACM, vol. 63, no. 11, pp. 139–144, 2020.
[8] M. Assens, X. Giro-i-Nieto, K. McGuinness, and N. E. O’Connor, “PathGAN: Visual scanpath prediction with generative adversarial networks,” Lect. Notes Comput. Sci., vol. 11133, pp. 406–422, 2019.
[9] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” 32nd Int. Conf. Mach. Learn. ICML 2015, vol. 1, pp. 448–456, 2015.
[10] K. R. Avery et al., “Fatigue Behavior of Stainless Steel Sheet Specimens at Extremely High Temperatures,” SAE Int. J. Mater. Manuf., vol. 7, no. 3, pp. 560–566, 2014.
[11] M. Tan and Q. V. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” 36th Int. Conf. Mach. Learn. ICML 2019, vol. 2019-June, pp. 10691–10700, 2019.
[12] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” Proc. IEEE Conf. Comput. Vis. Pattern Recognition (CVPR), pp. 1800–1807, 2017.
[13] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” Proc. IEEE Conf. Comput. Vis. Pattern Recognition (CVPR), pp. 2261–2269, 2017.
[14] X. Lian, Y. Pang, J. Han, and J. Pan, “Cascaded hierarchical atrous spatial pyramid pooling module for semantic segmentation,” Pattern Recognit., vol. 110, 2021.
[15] Z. Bylinskii, T. Judd, A. Oliva, A. Torralba, and F. Durand, “What Do Different Evaluation Metrics Tell Us about Saliency Models?,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 3, pp. 740–757, 2019.
[16] N. Riche, M. Duvinage, and M. Mancas, “A study of parameters affecting visual saliency assessment,” 2013.
[17] S. Barratt and R. Sharma, “A Note on the Inception Score,” 2018.
[18] A. Borji and L. Itti, “CAT2000: A Large Scale Fixation Dataset for Boosting Saliency Research,” May 2015.
Cite This Article
  • APA Style

Dufera, D., & Abate, F. (2025). Generative Adversarial Network Based Visual Saliency Prediction with Cascaded Hierarchical Atrous Spatial Pyramid Pooling. American Journal of Mathematical and Computer Modelling, 10(2), 66-73. https://doi.org/10.11648/j.ajmcm.20251002.13


    ACS Style

    Dufera, D.; Abate, F. Generative Adversarial Network Based Visual Saliency Prediction with Cascaded Hierarchical Atrous Spatial Pyramid Pooling. Am. J. Math. Comput. Model. 2025, 10(2), 66-73. doi: 10.11648/j.ajmcm.20251002.13


    AMA Style

    Dufera D, Abate F. Generative Adversarial Network Based Visual Saliency Prediction with Cascaded Hierarchical Atrous Spatial Pyramid Pooling. Am J Math Comput Model. 2025;10(2):66-73. doi: 10.11648/j.ajmcm.20251002.13


  • @article{10.11648/j.ajmcm.20251002.13,
      author = {Daniel Dufera and Felmeta Abate},
      title = {Generative Adversarial Network Based Visual Saliency Prediction with Cascaded Hierarchical Atrous Spatial Pyramid Pooling},
      journal = {American Journal of Mathematical and Computer Modelling},
      volume = {10},
      number = {2},
      pages = {66-73},
      doi = {10.11648/j.ajmcm.20251002.13},
      url = {https://doi.org/10.11648/j.ajmcm.20251002.13},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajmcm.20251002.13},
      abstract = {Visual saliency refers to the areas of an image that attract human attention. The Human Visual System (HVS) focuses on specific parts of a scene rather than on the image as a whole. Visual attention describes a set of cognitive processes that select important information and filter out unnecessary information from cluttered visual scenes. Images are central to computer vision because they carry a wealth of information, and human beings receive roughly 80% of their information through vision. Processing an entire image when only part of it is needed wastes resources; restricting computation to the relevant pixels is far more efficient. In this study, we achieve this with a Generative Adversarial Network (GAN) equipped with a Cascaded Hierarchical Atrous Spatial Pyramid Pooling (CHASPP) module, using EfficientNet-B7, which uniformly scales network depth, width, and input resolution, as the feature extractor; this improves feature extraction for visual saliency prediction. Datasets such as CAT2000, MIT1003, DUT-OMRON, and PASCAL-S are used to demonstrate the effectiveness of the selected models and techniques. We develop an effective visual saliency prediction model using a GAN with CHASPP together with additional loss terms, namely edge loss and perceptual loss; the CHASPP module achieved the best results on these datasets across several evaluation metrics.},
      year = {2025}
    }
    


  • TY  - JOUR
    T1  - Generative Adversarial Network Based Visual Saliency Prediction with Cascaded Hierarchical Atrous Spatial Pyramid Pooling
    
    AU  - Daniel Dufera
    AU  - Felmeta Abate
    Y1  - 2025/06/16
    PY  - 2025
    N1  - https://doi.org/10.11648/j.ajmcm.20251002.13
    DO  - 10.11648/j.ajmcm.20251002.13
    T2  - American Journal of Mathematical and Computer Modelling
    JF  - American Journal of Mathematical and Computer Modelling
    JO  - American Journal of Mathematical and Computer Modelling
    SP  - 66
    EP  - 73
    PB  - Science Publishing Group
    SN  - 2578-8280
    UR  - https://doi.org/10.11648/j.ajmcm.20251002.13
    AB  - Visual saliency refers to the areas of an image that attract human attention. The Human Visual System (HVS) focuses on specific parts of a scene rather than on the image as a whole. Visual attention describes a set of cognitive processes that select important information and filter out unnecessary information from cluttered visual scenes. Images are central to computer vision because they carry a wealth of information, and human beings receive roughly 80% of their information through vision. Processing an entire image when only part of it is needed wastes resources; restricting computation to the relevant pixels is far more efficient. In this study, we achieve this with a Generative Adversarial Network (GAN) equipped with a Cascaded Hierarchical Atrous Spatial Pyramid Pooling (CHASPP) module, using EfficientNet-B7, which uniformly scales network depth, width, and input resolution, as the feature extractor; this improves feature extraction for visual saliency prediction. Datasets such as CAT2000, MIT1003, DUT-OMRON, and PASCAL-S are used to demonstrate the effectiveness of the selected models and techniques. We develop an effective visual saliency prediction model using a GAN with CHASPP together with additional loss terms, namely edge loss and perceptual loss; the CHASPP module achieved the best results on these datasets across several evaluation metrics.
    
    VL  - 10
    IS  - 2
    ER  - 

