Research Article | Peer-Reviewed

Generative Adversarial Network Based Visual Saliency Prediction with Cascaded Hierarchical Atrous Spatial Pyramid Pooling

Received: 16 January 2025     Accepted: 3 May 2025     Published: 16 June 2025
Abstract

Visual saliency refers to the areas of an image that attract human attention. The Human Visual System (HVS) focuses on specific parts of a scene rather than on the image as a whole. Visual attention describes a set of cognitive processes that select important information and filter out unnecessary information from cluttered visual scenes. Images are central to computer vision because they carry a wealth of information, and human beings receive roughly 80% of their information through vision. Processing an entire image when only part of it is needed wastes resources; restricting computation to the relevant pixels is far more efficient. In this study, we achieve this with a Generative Adversarial Network (GAN) equipped with a Cascaded Hierarchical Atrous Spatial Pyramid Pooling (CHASPP) module, using EfficientNet-B7, which uniformly scales network depth, width, and input resolution, as the feature extractor; this improves feature extraction for visual saliency prediction. Datasets such as CAT2000, MIT1003, DUT-OMRON, and PASCAL-S are used to demonstrate the effectiveness of the selected models and techniques. We develop an effective visual saliency prediction model using a GAN with CHASPP together with additional loss terms, namely edge loss and perceptual loss; the CHASPP module achieved the best results on these datasets across several evaluation metrics.
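The abstract above is the only description of the method on this page, but the pipeline it outlines, an EfficientNet-B7 feature extractor feeding a cascaded hierarchical ASPP module inside a GAN generator, trained with adversarial, edge, and perceptual losses, can be sketched roughly as follows. This is a minimal, hypothetical PyTorch illustration: the dilation rates, channel widths, cascading scheme, and edge-loss formulation are assumptions, not the authors' published configuration.

```python
# Minimal sketch, assuming PyTorch/torchvision; all hyperparameters below
# are illustrative guesses, not the authors' configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class CHASPP(nn.Module):
    """Cascaded hierarchical atrous spatial pyramid pooling (illustrative).

    Unlike plain ASPP, each atrous branch also sees the previous branch's
    output concatenated with the input, so multi-scale context accumulates
    hierarchically through the cascade.
    """
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList()
        ch = in_ch
        for r in rates:
            self.branches.append(nn.Sequential(
                nn.Conv2d(ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            ))
            ch = in_ch + out_ch  # next branch sees input + previous output
        self.project = nn.Conv2d(in_ch + out_ch, out_ch, 1)

    def forward(self, x):
        feats = x
        for branch in self.branches:
            feats = torch.cat([x, branch(feats)], dim=1)
        return self.project(feats)


class SaliencyGenerator(nn.Module):
    """EfficientNet-B7 encoder -> CHASPP -> single-channel saliency map."""
    def __init__(self):
        super().__init__()
        backbone = models.efficientnet_b7(weights="DEFAULT")  # ImageNet weights
        self.encoder = backbone.features  # 2560-channel, stride-32 features
        self.chaspp = CHASPP(in_ch=2560, out_ch=256)
        self.head = nn.Sequential(
            nn.Conv2d(256, 1, 1),
            nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
            nn.Sigmoid(),
        )

    def forward(self, img):
        return self.head(self.chaspp(self.encoder(img)))


def edge_loss(pred, gt):
    """L1 distance between Sobel gradients of predicted and ground-truth maps
    (one plausible reading of the 'edge loss' named in the abstract)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=pred.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    def grad(m):
        return torch.cat([F.conv2d(m, kx, padding=1),
                          F.conv2d(m, ky, padding=1)], dim=1)
    return F.l1_loss(grad(pred), grad(gt))


# Usage: g = SaliencyGenerator()
#        sal = g(torch.randn(1, 3, 384, 384))  # -> (1, 1, 384, 384) map
```

A full training loop would add an adversarial term from a discriminator and a perceptual term computed over pretrained-network features, as the abstract indicates; the loss weightings are not specified on this page.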

Published in American Journal of Mathematical and Computer Modelling (Volume 10, Issue 2)
DOI 10.11648/j.ajmcm.20251002.13
Page(s) 66-73
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

Visual Saliency Prediction, Attention Area, Generative Adversarial Network, Low-level Features, High-level Features, Feature Extraction

References
[1] A. Le, “Predicting Visual Saliency: Where Do People Look?”.
[2] S. J. B and S. S. Kamath, “Saliency Prediction for Visual Regions,” pp. 48–60, 2017.
[3] W. Wang and J. Shen, “Deep Visual Attention Prediction,” IEEE Trans. Image Process., vol. 27, no. 5, pp. 2368–2378, 2018.
[4] P. Christiaan Klink, P. Jentgens, and J. A. M. Lorteije, “Priority maps explain the roles of value, attention, and salience in goal-oriented behavior,” J. Neurosci., vol. 34, no. 42, pp. 13867–13869, 2014.
[5] F. Yan, C. Chen, P. Xiao, S. Qi, Z. Wang, and R. Xiao, “Review of visual saliency prediction: Development process from neurobiological basis to deep models,” Appl. Sci., vol. 12, no. 1, 2022.
[6] R. Sharma and E. N. Singh, “Comparative Study of Different Low Level Feature Extraction Techniques,” Int. J. Eng. Res. Technol., vol. 3, no. 4, pp. 1454–1460, 2014.
[7] I. Goodfellow et al., “Generative adversarial networks,” Commun. ACM, vol. 63, no. 11, pp. 139–144, 2020.
[8] M. Assens, X. Giro-i-Nieto, K. McGuinness, and N. E. O’Connor, “PathGAN: Visual scanpath prediction with generative adversarial networks,” Lect. Notes Comput. Sci., vol. 11133, pp. 406–422, 2019.
[9] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” 32nd Int. Conf. Mach. Learn. ICML 2015, vol. 1, pp. 448–456, 2015.
[10] K. R. Avery et al., “Fatigue Behavior of Stainless Steel Sheet Specimens at Extremely High Temperatures,” SAE Int. J. Mater. Manuf., vol. 7, no. 3, pp. 560–566, 2014.
[11] M. Tan and Q. V. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” 36th Int. Conf. Mach. Learn. ICML 2019, vol. 2019-June, pp. 10691–10700, 2019.
[12] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” Proc. IEEE Conf. Comput. Vis. Pattern Recognition (CVPR), pp. 1800–1807, 2017.
[13] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” Proc. IEEE Conf. Comput. Vis. Pattern Recognition (CVPR), pp. 2261–2269, 2017.
[14] X. Lian, Y. Pang, J. Han, and J. Pan, “Cascaded hierarchical atrous spatial pyramid pooling module for semantic segmentation,” Pattern Recognit., vol. 110, 2021.
[15] Z. Bylinskii, T. Judd, A. Oliva, A. Torralba, and F. Durand, “What Do Different Evaluation Metrics Tell Us about Saliency Models?,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 3, pp. 740–757, 2019.
[16] N. Riche, M. Duvinage, and M. Mancas, “A study of parameters affecting visual saliency assessment,” 2013.
[17] S. Barratt and R. Sharma, “A Note on the Inception Score,” 2018.
[18] A. Borji and L. Itti, “CAT2000: A Large Scale Fixation Dataset for Boosting Saliency Research,” May 2015.
Cite This Article
  • APA Style

Dufera, D., & Abate, F. (2025). Generative Adversarial Network Based Visual Saliency Prediction with Cascaded Hierarchical Atrous Spatial Pyramid Pooling. American Journal of Mathematical and Computer Modelling, 10(2), 66-73. https://doi.org/10.11648/j.ajmcm.20251002.13


    ACS Style

    Dufera, D.; Abate, F. Generative Adversarial Network Based Visual Saliency Prediction with Cascaded Hierarchical Atrous Spatial Pyramid Pooling. Am. J. Math. Comput. Model. 2025, 10(2), 66-73. doi: 10.11648/j.ajmcm.20251002.13


    AMA Style

    Dufera D, Abate F. Generative Adversarial Network Based Visual Saliency Prediction with Cascaded Hierarchical Atrous Spatial Pyramid Pooling. Am J Math Comput Model. 2025;10(2):66-73. doi: 10.11648/j.ajmcm.20251002.13


  • @article{10.11648/j.ajmcm.20251002.13,
      author = {Daniel Dufera and Felmeta Abate},
      title = {Generative Adversarial Network Based Visual Saliency Prediction with Cascaded Hierarchical Atrous Spatial Pyramid Pooling},
      journal = {American Journal of Mathematical and Computer Modelling},
      volume = {10},
      number = {2},
      pages = {66-73},
      doi = {10.11648/j.ajmcm.20251002.13},
      url = {https://doi.org/10.11648/j.ajmcm.20251002.13},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajmcm.20251002.13},
      abstract = {Visual saliency refers to the areas of an image that attract human attention. The Human Visual System (HVS) focuses on specific parts of a scene rather than on the image as a whole. Visual attention describes a set of cognitive processes that select important information and filter out unnecessary information from cluttered visual scenes. Images are central to computer vision because they carry a wealth of information, and human beings receive roughly 80% of their information through vision. Processing an entire image when only part of it is needed wastes resources; restricting computation to the relevant pixels is far more efficient. In this study, we achieve this with a Generative Adversarial Network (GAN) equipped with a Cascaded Hierarchical Atrous Spatial Pyramid Pooling (CHASPP) module, using EfficientNet-B7, which uniformly scales network depth, width, and input resolution, as the feature extractor; this improves feature extraction for visual saliency prediction. Datasets such as CAT2000, MIT1003, DUT-OMRON, and PASCAL-S are used to demonstrate the effectiveness of the selected models and techniques. We develop an effective visual saliency prediction model using a GAN with CHASPP together with additional loss terms, namely edge loss and perceptual loss; the CHASPP module achieved the best results on these datasets across several evaluation metrics.},
      year = {2025}
    }
    


  • TY  - JOUR
    T1  - Generative Adversarial Network Based Visual Saliency Prediction with Cascaded Hierarchical Atrous Spatial Pyramid Pooling
    
    AU  - Daniel Dufera
    AU  - Felmeta Abate
    Y1  - 2025/06/16
    PY  - 2025
    N1  - https://doi.org/10.11648/j.ajmcm.20251002.13
    DO  - 10.11648/j.ajmcm.20251002.13
    T2  - American Journal of Mathematical and Computer Modelling
    JF  - American Journal of Mathematical and Computer Modelling
    JO  - American Journal of Mathematical and Computer Modelling
    SP  - 66
    EP  - 73
    PB  - Science Publishing Group
    SN  - 2578-8280
    UR  - https://doi.org/10.11648/j.ajmcm.20251002.13
    AB  - Visual saliency refers to the areas of an image that attract human attention. The Human Visual System (HVS) focuses on specific parts of a scene rather than on the image as a whole. Visual attention describes a set of cognitive processes that select important information and filter out unnecessary information from cluttered visual scenes. Images are central to computer vision because they carry a wealth of information, and human beings receive roughly 80% of their information through vision. Processing an entire image when only part of it is needed wastes resources; restricting computation to the relevant pixels is far more efficient. In this study, we achieve this with a Generative Adversarial Network (GAN) equipped with a Cascaded Hierarchical Atrous Spatial Pyramid Pooling (CHASPP) module, using EfficientNet-B7, which uniformly scales network depth, width, and input resolution, as the feature extractor; this improves feature extraction for visual saliency prediction. Datasets such as CAT2000, MIT1003, DUT-OMRON, and PASCAL-S are used to demonstrate the effectiveness of the selected models and techniques. We develop an effective visual saliency prediction model using a GAN with CHASPP together with additional loss terms, namely edge loss and perceptual loss; the CHASPP module achieved the best results on these datasets across several evaluation metrics.
    
    VL  - 10
    IS  - 2
    ER  - 

