Novel Integer Division for Embedded Systems: Generic Algorithm Optimal for Large Divisors

Mervat Mohamed Adel Mahmoud; Nahla Elazab Elashker

doi:doi:10.11648/j.acm.20241304.12

Research Article |

| Peer-Reviewed

Novel Integer Division for Embedded Systems: Generic Algorithm Optimal for Large Divisors

Mervat Mohamed Adel Mahmoud^*, Nahla Elazab Elashker

Published in Applied and Computational Mathematics (Volume 13, Issue 4)

Received: 9 June 2024 Accepted: 1 July 2024 Published: 24 July 2024

Views: Downloads:

Download PDF

Share This Article

Twitter
Linked In
Facebook

Abstract

The integer Constant Division (ICD) is the type of integer division in which the divisor is known in advance, enabling pre-computing operations to be included. Therefore, it can be more efficient regarding computing resources and time. However, most ICD techniques are restricted by a few values or narrow boundaries for the divisor. On the other hand, the main approaches of the division algorithms, where the divisor is variable, are digit-by-digit and convergence methods. The first techniques are simple and have less sophisticated conversion logic for the quotient but also have the problem of taking significantly long latency. On the contrary, the convergence techniques rely on multiplication rather than subtraction. They estimate the quotient of division providing the quotient with minimal latency at the expense of precision. This article suggests a precise, generic, and novel integer division algorithm based on sequential recursion with fewer iterations. The suggested methodology relies on extracting the division results for non-powers-of-two divisors from those for the closest power-of-two divisors, which are obtained simply using the right bit shifting. To the authors’ best knowledge of the state-of-the-art, the number of iterations in the recurrent variable division is half the divisor bit size, and the Sweeney, Robertson, and Tocher (SRT) division, which is named after its developers, involves log₂(n) iterations. The suggested algorithm has an [(m/(n-1))-1] number of recursive iterations, where m and n are the number of bits of the dividend and the divisor, respectively. The design is simulated in the Vivado tool for validation and implemented with a Zynq UltraScale FPGA. The technique performance depends on the number of nested divisions and the size of a LUT. The two factors change according to the value of the divisor. Nevertheless, the size of the LUT is proportional to the range and the number of bits of the divisor. Furthermore, the equation that controls the number of nested blocks is illustrated in the manuscript. The proposed technique applies to both constant and variable divisors with a compact hardware area in the case of constant division. The hardware implementation of constant division has unlimited values for dividends and divisors with a compact hardware area in the case of large divisors. However, using the design in the hardware implementation of variable division is up to 64-bit dividend and 12-bit divisor. The result analysis demonstrates that this algorithm is more efficient for constant division for large numbers.

Published in	Applied and Computational Mathematics (Volume 13, Issue 4)
DOI	10.11648/j.acm.20241304.12
Page(s)	83-93
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Algorithms, Arithmetic Functions, Embedded Systems, Hardware Implementation, Integer Division

References

[1]	S. Wagh, S. Tople, F. Benhamouda, E. Kushilevitz, P. Mittal, and T. Rabin, “Falcon: Honest-majority maliciously secure framework for private deep learning,” Proceedings on Privacy Enhancing Technologies, vol. 2021, pp. 188-208, 2020.
[2]	S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, ICML’15, p. 448-456, JMLR.org, 2015.
[3]	S. Josefsson and I. Liusvaara, “Rfc 8032: Edwards-curve digital signature algorithm (eddsa),” Internet Research Task Force (IRTF), 2017.
[4]	T. Elgamal, “A public key cryptosystem and a signature scheme based on discrete logarithms,” IEEE Transactions on Information Theory, vol. 31, no. 4, pp. 469-472, 1985.
[5]	H. T. Sihotang, S. Efendi, E. M. Zamzami, and H. Mawengkang, “Design and implementation of rivest shamir adleman’s (rsa) cryptography algorithm in text file data security,” in Journal of Physics: Conference Series, vol. 1641, IOP Publishing, 2020.
[6]	D. Cavagnino and A. E. Werbrouck, “Efficient algorithms for integer division by constants using multiplication,” The Computer Journal, vol. 51, no. 4, pp. 470-480, 2008.
[7]	R. Vemula and K. M. Chari, “A review on various divider circuit designs in vlsi,” in 2018 Conference on Signal Processing And Communication Engineering Systems (SPACES), pp. 206-209, 2018.
[8]	X. Wei, Y. Yang, and J. Chen, “A low-latency divider design for embedded processors,” Sensors, vol. 22, no. 7, p. 2471, 2022.
[9]	P. K. Meher and T. Stouraitis, Arithmetic Circuits for DSP Applications. Wiley-IEEE Press, 2017.
[10]	S. Deshpande, S. M. d. Pozo, V. Mateu, M. Manzano, N. Aaraj, and J. Szefer, “Modular inverse for integers using fast constant time gcd algorithm and its applications,” in 2021 31st International Conference on Field-Programmable Logic and Applications (FPL), pp. 122-129, 2021.
[11]	C.-Y. Tsai, M.-H. Fan, and C.-H. Huang, “Vlsi circuit design of digital signal processing algorithms using tensor product formulation,” Computer Science and Technology, vol. 21, 2008.
[12]	P. Montuschi, J. Bruguera, L. Ciminiera, and J.-A. Pieiro, “A digit-by-digit algorithm for mth root extraction,” Computers, IEEE Transactions on, vol. 56, pp. 1696- 1706, 01 2008.
[13]	X. Yang and M. Sun, “Theoretical convergence analysis of ageneral division-deletion algorithm for solving global search problems,” Journal of Global Optimization, vol. 37, p. 27, Jan 2007 2007/01//. Copyright - Springer Science+Business Media B. V. 2007.
[14]	N. Aggarwal, K. Asooja, S. S. Verma, and S. Negi, “An improvement in the restoring division algorithm (needy restoring division algorithm),” in 2009 2nd IEEE International Conference on Computer Science and Information Technology, pp. 246-249, 2009.
[15]	L. Chen, J. Han, W. Liu, and F. Lombardi, “Design of approximate unsigned integer non-restoring divider for inexact computing,” in Proceedings of the 25th Edition on Great Lakes Symposium on VLSI, GLSVLSI ¡¯15, (New York, NY, USA), p. 51-56, Association for Computing Machinery, 2015.
[16]	F. M. Yassin, N. M. Syamimi, A. M. bin Manah, and Z. A. Omar, “Rounded constant division via add-shift in verilog,” International Journal of Science, Environment and Technology, 2015.
[17]	H. F. Ugurdag, F. de Dinechin, Y. S. Gener, S. Gören, and L.-S. Didier, “Hardware division by small integer constants,’ IEEE Transactions on Computers, vol. 66, no. 12, pp. 2097-2110, 2017.
[18]	S. Obermann and M. Flynn, “Division algorithms and implementations,” IEEE Transactions on Computers, vol. 46, no. 8, pp. 833-854, 1997.
[19]	S. Dixit and M. Nadeem, “Fpga accomplishment of a 16- bit divider,” Imperial journal of interdisciplinary research, vol. 3, 2017.
[20]	K. J. N. S. Bhargav, S. Palisetti, and M. Rao, “A newton raphson method based approximate divider design for color quantization application,” in 2021 18th International SoC Design Conference (ISOCC), pp. 115- 116, 2021.
[21]	M. P. Vestias and H. C. Neto, “Revisiting the newtonraphson iterative method for decimal division,” in 2011 21st International Conference on Field Programmable Logic and Applications, pp. 138-143, 2011.
[22]	J. Wei, A. Kuwana, H. Kobayashi, and K. Kubo, “Revisit to floating-point division algorithm based on taylor-series expansion,” in 2020 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), pp. 240-243, 2020.
[23]	U. S. Patankar and A. Koel, “Review of basic classes of dividers based on division algorithm,” IEEE Access, vol. 9, pp. 23035-23069, 2021.
[24]	N. Arya, T. Soni, M. Pattanaik, and G. Sharma, “Energy efficient logarithmic-based approximate divider for asic and fpga-based implementations,” Microprocess. Microsyst., vol. 90, apr 2022.
[25]	D. Lemire, C. Bartlett, and O. Kaser, “Integer division by constants: optimal bounds,” Heliyon, vol. 7, p. e07442, June 2021.
[26]	G. Schewior, H. Flatt, C. Dolar, C. Banz, and H. Blume, “A hardware accelerated configurable asip architecture for embedded real-time video-based driver assistance applications,” in 2011 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, pp. 209-216, 2011.
[27]	H. F. Ugurdag, A. Bayram, V. E. Levent, and S. Gören, “Efficient combinational circuits for division by small integer constants,” in 2016 IEEE 23nd Symposium on Computer Arithmetic (ARITH), pp. 1-7, 2016.
[28]	M. McNerney and E. Papadopoulos, “Hacker’s delight: Law firm risk and liability in the cyber age,” American University Law Review, vol. 62, pp. 1243-1269, 2013. Copyright - Copyright American University Law Review 2013.
[29]	N. Jones, “Division of integers by constants.” https://embeddedgurus.com/stack-overflow/2009/06/- url-division-of-integers-by-constants/, 2009. Accessed: 2023-01-1.
[30]	S. Deshpande, S. M. d. Pozo, V. Mateu, M. Manzano, N. Aaraj, and J. Szefer, “Modular inverse for integers using fast constant time gcd algorithm and its applications,” in 2021 31st International Conference on Field-Programmable Logic and Applications (FPL), pp. 122-129, 2021.
[31]	D. J. Bernstein and B.-Y. Yang, “Fast constant-time gcd computation and modular inversion,” IACR Transactions on Cryptographic Hardware and Embedded Systems, vol. 2019, p. 340-398, May 2019.
[32]	A. Greuet, S. Montoya, and C. Vermeersch, “Quotient approximation modular reduction,” 2022 IEEE 29th Symposium on Computer Arithmetic (ARITH), pp. 103- 110, 2022.

Cite This Article

Plain Text BibTeX RIS

APA Style

Mahmoud, M. M. A., Elashker, N. E. (2024). Novel Integer Division for Embedded Systems: Generic Algorithm Optimal for Large Divisors. Applied and Computational Mathematics, 13(4), 83-93. https://doi.org/10.11648/j.acm.20241304.12

Copy | Download

ACS Style

Mahmoud, M. M. A.; Elashker, N. E. Novel Integer Division for Embedded Systems: Generic Algorithm Optimal for Large Divisors. Appl. Comput. Math. 2024, 13(4), 83-93. doi: 10.11648/j.acm.20241304.12

Copy | Download

AMA Style

Mahmoud MMA, Elashker NE. Novel Integer Division for Embedded Systems: Generic Algorithm Optimal for Large Divisors. Appl Comput Math. 2024;13(4):83-93. doi: 10.11648/j.acm.20241304.12

Copy | Download

@article{10.11648/j.acm.20241304.12,
  author = {Mervat Mohamed Adel Mahmoud and Nahla Elazab Elashker},
  title = {Novel Integer Division for Embedded Systems: Generic Algorithm Optimal for Large Divisors},
  journal = {Applied and Computational Mathematics},
  volume = {13},
  number = {4},
  pages = {83-93},
  doi = {10.11648/j.acm.20241304.12},
  url = {https://doi.org/10.11648/j.acm.20241304.12},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.acm.20241304.12},
  abstract = {The integer Constant Division (ICD) is the type of integer division in which the divisor is known in advance, enabling pre-computing operations to be included. Therefore, it can be more efficient regarding computing resources and time. However, most ICD techniques are restricted by a few values or narrow boundaries for the divisor. On the other hand, the main approaches of the division algorithms, where the divisor is variable, are digit-by-digit and convergence methods. The first techniques are simple and have less sophisticated conversion logic for the quotient but also have the problem of taking significantly long latency. On the contrary, the convergence techniques rely on multiplication rather than subtraction. They estimate the quotient of division providing the quotient with minimal latency at the expense of precision. This article suggests a precise, generic, and novel integer division algorithm based on sequential recursion with fewer iterations. The suggested methodology relies on extracting the division results for non-powers-of-two divisors from those for the closest power-of-two divisors, which are obtained simply using the right bit shifting. To the authors’ best knowledge of the state-of-the-art, the number of iterations in the recurrent variable division is half the divisor bit size, and the Sweeney, Robertson, and Tocher (SRT) division, which is named after its developers, involves log2(n) iterations. The suggested algorithm has an [(m/(n-1))-1] number of recursive iterations, where m and n are the number of bits of the dividend and the divisor, respectively. The design is simulated in the Vivado tool for validation and implemented with a Zynq UltraScale FPGA. The technique performance depends on the number of nested divisions and the size of a LUT. The two factors change according to the value of the divisor. Nevertheless, the size of the LUT is proportional to the range and the number of bits of the divisor. Furthermore, the equation that controls the number of nested blocks is illustrated in the manuscript. The proposed technique applies to both constant and variable divisors with a compact hardware area in the case of constant division. The hardware implementation of constant division has unlimited values for dividends and divisors with a compact hardware area in the case of large divisors. However, using the design in the hardware implementation of variable division is up to 64-bit dividend and 12-bit divisor. The result analysis demonstrates that this algorithm is more efficient for constant division for large numbers.},
 year = {2024}
}

Copy | Download

TY - JOUR
T1 - Novel Integer Division for Embedded Systems: Generic Algorithm Optimal for Large Divisors
AU - Mervat Mohamed Adel Mahmoud
AU - Nahla Elazab Elashker
Y1 - 2024/07/24
PY - 2024
N1 - https://doi.org/10.11648/j.acm.20241304.12
DO - 10.11648/j.acm.20241304.12
T2 - Applied and Computational Mathematics
JF - Applied and Computational Mathematics
JO - Applied and Computational Mathematics
SP - 83
EP - 93
PB - Science Publishing Group
SN - 2328-5613
UR - https://doi.org/10.11648/j.acm.20241304.12
AB - The integer Constant Division (ICD) is the type of integer division in which the divisor is known in advance, enabling pre-computing operations to be included. Therefore, it can be more efficient regarding computing resources and time. However, most ICD techniques are restricted by a few values or narrow boundaries for the divisor. On the other hand, the main approaches of the division algorithms, where the divisor is variable, are digit-by-digit and convergence methods. The first techniques are simple and have less sophisticated conversion logic for the quotient but also have the problem of taking significantly long latency. On the contrary, the convergence techniques rely on multiplication rather than subtraction. They estimate the quotient of division providing the quotient with minimal latency at the expense of precision. This article suggests a precise, generic, and novel integer division algorithm based on sequential recursion with fewer iterations. The suggested methodology relies on extracting the division results for non-powers-of-two divisors from those for the closest power-of-two divisors, which are obtained simply using the right bit shifting. To the authors’ best knowledge of the state-of-the-art, the number of iterations in the recurrent variable division is half the divisor bit size, and the Sweeney, Robertson, and Tocher (SRT) division, which is named after its developers, involves log2(n) iterations. The suggested algorithm has an [(m/(n-1))-1] number of recursive iterations, where m and n are the number of bits of the dividend and the divisor, respectively. The design is simulated in the Vivado tool for validation and implemented with a Zynq UltraScale FPGA. The technique performance depends on the number of nested divisions and the size of a LUT. The two factors change according to the value of the divisor. Nevertheless, the size of the LUT is proportional to the range and the number of bits of the divisor. Furthermore, the equation that controls the number of nested blocks is illustrated in the manuscript. The proposed technique applies to both constant and variable divisors with a compact hardware area in the case of constant division. The hardware implementation of constant division has unlimited values for dividends and divisors with a compact hardware area in the case of large divisors. However, using the design in the hardware implementation of variable division is up to 64-bit dividend and 12-bit divisor. The result analysis demonstrates that this algorithm is more efficient for constant division for large numbers.
VL - 13
IS - 4
ER -

Copy | Download

Author Information

Mervat Mohamed Adel Mahmoud

Microelectronics Department, Electronics Research Institute, Cairo, Egypt

Contact Email
Nahla Elazab Elashker

Microelectronics Department, Electronics Research Institute, Cairo, Egypt

Contact Email

Download PDF

Sections

Plain Text BibTeX RIS

APA Style

Mahmoud, M. M. A., Elashker, N. E. (2024). Novel Integer Division for Embedded Systems: Generic Algorithm Optimal for Large Divisors. Applied and Computational Mathematics, 13(4), 83-93. https://doi.org/10.11648/j.acm.20241304.12

Copy | Download

ACS Style

Mahmoud, M. M. A.; Elashker, N. E. Novel Integer Division for Embedded Systems: Generic Algorithm Optimal for Large Divisors. Appl. Comput. Math. 2024, 13(4), 83-93. doi: 10.11648/j.acm.20241304.12

Copy | Download

AMA Style

Mahmoud MMA, Elashker NE. Novel Integer Division for Embedded Systems: Generic Algorithm Optimal for Large Divisors. Appl Comput Math. 2024;13(4):83-93. doi: 10.11648/j.acm.20241304.12

Copy | Download

@article{10.11648/j.acm.20241304.12,
  author = {Mervat Mohamed Adel Mahmoud and Nahla Elazab Elashker},
  title = {Novel Integer Division for Embedded Systems: Generic Algorithm Optimal for Large Divisors},
  journal = {Applied and Computational Mathematics},
  volume = {13},
  number = {4},
  pages = {83-93},
  doi = {10.11648/j.acm.20241304.12},
  url = {https://doi.org/10.11648/j.acm.20241304.12},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.acm.20241304.12},
  abstract = {The integer Constant Division (ICD) is the type of integer division in which the divisor is known in advance, enabling pre-computing operations to be included. Therefore, it can be more efficient regarding computing resources and time. However, most ICD techniques are restricted by a few values or narrow boundaries for the divisor. On the other hand, the main approaches of the division algorithms, where the divisor is variable, are digit-by-digit and convergence methods. The first techniques are simple and have less sophisticated conversion logic for the quotient but also have the problem of taking significantly long latency. On the contrary, the convergence techniques rely on multiplication rather than subtraction. They estimate the quotient of division providing the quotient with minimal latency at the expense of precision. This article suggests a precise, generic, and novel integer division algorithm based on sequential recursion with fewer iterations. The suggested methodology relies on extracting the division results for non-powers-of-two divisors from those for the closest power-of-two divisors, which are obtained simply using the right bit shifting. To the authors’ best knowledge of the state-of-the-art, the number of iterations in the recurrent variable division is half the divisor bit size, and the Sweeney, Robertson, and Tocher (SRT) division, which is named after its developers, involves log2(n) iterations. The suggested algorithm has an [(m/(n-1))-1] number of recursive iterations, where m and n are the number of bits of the dividend and the divisor, respectively. The design is simulated in the Vivado tool for validation and implemented with a Zynq UltraScale FPGA. The technique performance depends on the number of nested divisions and the size of a LUT. The two factors change according to the value of the divisor. Nevertheless, the size of the LUT is proportional to the range and the number of bits of the divisor. Furthermore, the equation that controls the number of nested blocks is illustrated in the manuscript. The proposed technique applies to both constant and variable divisors with a compact hardware area in the case of constant division. The hardware implementation of constant division has unlimited values for dividends and divisors with a compact hardware area in the case of large divisors. However, using the design in the hardware implementation of variable division is up to 64-bit dividend and 12-bit divisor. The result analysis demonstrates that this algorithm is more efficient for constant division for large numbers.},
 year = {2024}
}

Copy | Download