Evaluation of GPU Performance Compared to CPU for Implementing Algorithms with High Time Complexity

Neda Mohammadi

doi:doi:10.11648/j.ajsea.s.2016050301.13

| Peer-Reviewed

Evaluation of GPU Performance Compared to CPU for Implementing Algorithms with High Time Complexity

Neda Mohammadi

Published in American Journal of Software Engineering and Applications (Volume 5, Issue 3-1)

Received: 2 February 2016 Accepted: 6 February 2016 Published: 24 June 2016

Views: Downloads:

Download PDF

Share This Article

Twitter
Linked In
Facebook

Abstract

Nowadays a number of applications with high volume of calculations are constantly increasing. Central Processing Unit (CPU) consists of finite number of cores. Degree of parallelism and implementation speed are issues that data high volume on CPU is low. Using the thread concept in programming, algorithms which have the parallelism capabilities, can be executed in parallel. There are many issues which in order to solving them, finding similar items in a metric space and grouping them in these issues is necessary. Computational complexity finding nearest neighbors is a challenge for run time. To evaluate the performance of GPUs speed in searching nearest neighbors, GPGPU and CUDA are used and compared with CPU usage. In this paper parallel implementation of the algorithm on GPU with access to its shared memory, is compared with parallel implementation of the algorithm on CPU through threads. It is understood that threads use graphics card's shared memory for communications, storing temporary data and retrieving data. Therefore, the parallelism on GPU is more useful than parallelism on CPU in High-Dimensional spaces. Finally, it is discussed that GPU reduces complexity to a considerable amount and is scalable.

Published in	American Journal of Software Engineering and Applications (Volume 5, Issue 3-1) This article belongs to the Special Issue Advances in Computer Science and Information Technology in Developing Countries
DOI	10.11648/j.ajsea.s.2016050301.13
Page(s)	10-14
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2016. Published by Science Publishing Group

Keywords

Nearest Neighbor, CUDA, GPU, Shared Memory, Parallelism

References

[1]	M., Guevara, G., Chris, K., Hazelwood, and K., Skadron. "Enabling task parallelism in the cuda scheduler." In Workshop on Programming Models for Emerging Architectures, 2009, pp. 69-76.
[2]	V., Garcia,, E., Debreuve, F., Nielsen, and M., Barlaud,. “K-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching”. In Image Processing (ICIP), 2010, September. 17th IEEE International Conference on (pp. 3757-3760).
[3]	CUDA C Programming Guide: CUDA Toolkit Documentation, http://docs.nvidia.com/cuda/,cuda-c-programming-guide/index.html.
[4]	S., Liang, Y., Liu, C. Wang, and L., Jian,. “Design and evaluation of a parallel k-nearest neighbor algorithm on CUDA-enabled GPU”. In Web Society (SWS), 2010 IEEE 2nd Symposium on (pp. 53-60). IEEE, 2010, August.
[5]	Hawick, Kenneth A., Arno Leist, and Daniel P. Playne. "Parallel graph component labelling with GPUs and CUDA". Parallel Computing (2010) 36, no. 12 655-678.
[6]	S., Liang, Y., Liu, C., Wang, and L., Jian, 2009, October. “A CUDA-based parallel implementation of K-nearest neighbor algorithm”. In Cyber-Enabled Distributed Computing and Knowledge Discovery, 2009. CyberC'09. International Conference on (pp. 291-296). IEEE
[7]	D., Qiu, S., May, & A., Nüchter. “GPU-accelerated nearest neighbor search for 3D registration”. In Computer Vision Systems, 2009. (pp. 194-203). Springer Berlin Heidelberg.
[8]	V., Garcia, E., Debreuve, & M. Barlaud, "Fast k nearest neighbor search using GPU". In Computer Vision and Pattern Recognition Workshops, 2008. CVPRW'08. 2008, June. IEEE Computer Society Conference on (pp. 1-6). IEEE.
[9]	N. Mohammadi. "Presentation of a Parallel Algorithm for Nearest Neighbor Search on GPU Using CUDA". Current Trends in Technology and Sciences (CTTS) 2015.
[10]	J., Nickolls, I., Buck, M., Garland, & K.Skadron, "Scalable parallel programming with CUDA". Queue, 6(2), 2008. 40-53.
[11]	Nourzad. "The introduction of context-sensitive hashing algorithm for finding nearest neighbors in high-dimensional spaces". Amirkabir University of Technology, (2010).
[12]	X., Wu, V., Kumar, J. R., Quinlan, J., Ghosh, Q., Yang, H., Motoda, & D. Steinberg. "Top 10 algorithms in data mining. Knowledge and Information Systems", 2008.14(1), 1-37.
[13]	S., Liang, C., Wang, Y., Liu, and L., Jian,. “CUKNN: A parallel implementation of K-nearest neighbor on CUDA-enabled GPU”. In Information, Computing and Telecommunication, 2009. YC-ICT'09. 2009, September. IEEE Youth Conference on (pp. 415-418).
[14]	A. Neumann, " Parallel reduction of multidimensional arrays for supporting online analytical processing (olap) on a graphics processing unit (gpu)". The University of Western Australia, 2008.
[15]	S. Ryoo, C. I Rodrigues, S. S Baghsorkhi, S. S Stone, D. B. Kirk,, & W. M. W. Hwu, "Optimization principles and application performance evaluation of a multithreaded GPU using CUDA".. (2008, February) In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming. (pp. 73-82). ACM.

Cite This Article

Plain Text BibTeX RIS

APA Style

Neda Mohammadi. (2016). Evaluation of GPU Performance Compared to CPU for Implementing Algorithms with High Time Complexity. American Journal of Software Engineering and Applications, 5(3-1), 10-14. https://doi.org/10.11648/j.ajsea.s.2016050301.13

Copy | Download

ACS Style

Neda Mohammadi. Evaluation of GPU Performance Compared to CPU for Implementing Algorithms with High Time Complexity. Am. J. Softw. Eng. Appl. 2016, 5(3-1), 10-14. doi: 10.11648/j.ajsea.s.2016050301.13

Copy | Download

AMA Style

Neda Mohammadi. Evaluation of GPU Performance Compared to CPU for Implementing Algorithms with High Time Complexity. Am J Softw Eng Appl. 2016;5(3-1):10-14. doi: 10.11648/j.ajsea.s.2016050301.13

Copy | Download

@article{10.11648/j.ajsea.s.2016050301.13,
  author = {Neda Mohammadi},
  title = {Evaluation of GPU Performance Compared to CPU for Implementing Algorithms with High Time Complexity},
  journal = {American Journal of Software Engineering and Applications},
  volume = {5},
  number = {3-1},
  pages = {10-14},
  doi = {10.11648/j.ajsea.s.2016050301.13},
  url = {https://doi.org/10.11648/j.ajsea.s.2016050301.13},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajsea.s.2016050301.13},
  abstract = {Nowadays a number of applications with high volume of calculations are constantly increasing. Central Processing Unit (CPU) consists of finite number of cores. Degree of parallelism and implementation speed are issues that data high volume on CPU is low. Using the thread concept in programming, algorithms which have the parallelism capabilities, can be executed in parallel. There are many issues which in order to solving them, finding similar items in a metric space and grouping them in these issues is necessary. Computational complexity finding nearest neighbors is a challenge for run time. To evaluate the performance of GPUs speed in searching nearest neighbors, GPGPU and CUDA are used and compared with CPU usage. In this paper parallel implementation of the algorithm on GPU with access to its shared memory, is compared with parallel implementation of the algorithm on CPU through threads. It is understood that threads use graphics card's shared memory for communications, storing temporary data and retrieving data. Therefore, the parallelism on GPU is more useful than parallelism on CPU in High-Dimensional spaces. Finally, it is discussed that GPU reduces complexity to a considerable amount and is scalable.},
 year = {2016}
}

Copy | Download

TY - JOUR
T1 - Evaluation of GPU Performance Compared to CPU for Implementing Algorithms with High Time Complexity
AU - Neda Mohammadi
Y1 - 2016/06/24
PY - 2016
N1 - https://doi.org/10.11648/j.ajsea.s.2016050301.13
DO - 10.11648/j.ajsea.s.2016050301.13
T2 - American Journal of Software Engineering and Applications
JF - American Journal of Software Engineering and Applications
JO - American Journal of Software Engineering and Applications
SP - 10
EP - 14
PB - Science Publishing Group
SN - 2327-249X
UR - https://doi.org/10.11648/j.ajsea.s.2016050301.13
AB - Nowadays a number of applications with high volume of calculations are constantly increasing. Central Processing Unit (CPU) consists of finite number of cores. Degree of parallelism and implementation speed are issues that data high volume on CPU is low. Using the thread concept in programming, algorithms which have the parallelism capabilities, can be executed in parallel. There are many issues which in order to solving them, finding similar items in a metric space and grouping them in these issues is necessary. Computational complexity finding nearest neighbors is a challenge for run time. To evaluate the performance of GPUs speed in searching nearest neighbors, GPGPU and CUDA are used and compared with CPU usage. In this paper parallel implementation of the algorithm on GPU with access to its shared memory, is compared with parallel implementation of the algorithm on CPU through threads. It is understood that threads use graphics card's shared memory for communications, storing temporary data and retrieving data. Therefore, the parallelism on GPU is more useful than parallelism on CPU in High-Dimensional spaces. Finally, it is discussed that GPU reduces complexity to a considerable amount and is scalable.
VL - 5
IS - 3-1
ER -

Copy | Download

Author Information

Neda Mohammadi

Department of Computer Engineering, Shiraz University of Technology, Shiraz, Iran

Download PDF

Sections

Plain Text BibTeX RIS

APA Style

Neda Mohammadi. (2016). Evaluation of GPU Performance Compared to CPU for Implementing Algorithms with High Time Complexity. American Journal of Software Engineering and Applications, 5(3-1), 10-14. https://doi.org/10.11648/j.ajsea.s.2016050301.13

Copy | Download

ACS Style

Neda Mohammadi. Evaluation of GPU Performance Compared to CPU for Implementing Algorithms with High Time Complexity. Am. J. Softw. Eng. Appl. 2016, 5(3-1), 10-14. doi: 10.11648/j.ajsea.s.2016050301.13

Copy | Download

AMA Style

Neda Mohammadi. Evaluation of GPU Performance Compared to CPU for Implementing Algorithms with High Time Complexity. Am J Softw Eng Appl. 2016;5(3-1):10-14. doi: 10.11648/j.ajsea.s.2016050301.13

Copy | Download

@article{10.11648/j.ajsea.s.2016050301.13,
  author = {Neda Mohammadi},
  title = {Evaluation of GPU Performance Compared to CPU for Implementing Algorithms with High Time Complexity},
  journal = {American Journal of Software Engineering and Applications},
  volume = {5},
  number = {3-1},
  pages = {10-14},
  doi = {10.11648/j.ajsea.s.2016050301.13},
  url = {https://doi.org/10.11648/j.ajsea.s.2016050301.13},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajsea.s.2016050301.13},
  abstract = {Nowadays a number of applications with high volume of calculations are constantly increasing. Central Processing Unit (CPU) consists of finite number of cores. Degree of parallelism and implementation speed are issues that data high volume on CPU is low. Using the thread concept in programming, algorithms which have the parallelism capabilities, can be executed in parallel. There are many issues which in order to solving them, finding similar items in a metric space and grouping them in these issues is necessary. Computational complexity finding nearest neighbors is a challenge for run time. To evaluate the performance of GPUs speed in searching nearest neighbors, GPGPU and CUDA are used and compared with CPU usage. In this paper parallel implementation of the algorithm on GPU with access to its shared memory, is compared with parallel implementation of the algorithm on CPU through threads. It is understood that threads use graphics card's shared memory for communications, storing temporary data and retrieving data. Therefore, the parallelism on GPU is more useful than parallelism on CPU in High-Dimensional spaces. Finally, it is discussed that GPU reduces complexity to a considerable amount and is scalable.},
 year = {2016}
}

Copy | Download