Research Article | Peer-Reviewed

Reinforcement Learning Based Neuro-fuzzy Controller for Coffee Roasting Process

Received: 15 July 2025     Accepted: 28 July 2025     Published: 26 August 2025
Abstract

Supervised learning is mainly used to optimize Adaptive Neural Fuzzy Inference System (ANFIS) controllers. To generate data for supervised learning, a controller is first designed and optimized using Particle Swarm Optimization (PSO) or another algorithm. This paper proposes and compares reinforcement learning based ANFIS and Approximate Reasoning Intelligent Controller (ARIC) controllers. Reinforcement learning based ANFIS reduces the workflow required to train it by directly optimizing the membership functions with the Proximal Policy Optimization (PPO) algorithm. ANFIS and ARIC neuro-fuzzy controllers are designed for the nonlinear dynamics of the coffee roasting process using Schwartzberg's model. A custom layer is designed for every membership function and fuzzy inference operation using MATLAB's Deep Learning Toolbox. This neural connectionist model of ANFIS and ARIC is used as the actor. The critic, which evaluates the goodness of the action taken, is a two-layer neural network with a sigmoidal activation function. A Simulink environment is also created to represent the dynamics of the coffee roasting process. The agent is trained to track a roast profile for 50 episodes, and the training converged within the 50 training episodes. After training, the Root Mean Square Error (RMSE) for the ARIC architecture reduced from 0.5134 to 0.08122. Similarly, the RMSE of ANFIS improved from 0.2026 to 0.0624.

Published in Automation, Control and Intelligent Systems (Volume 13, Issue 2)
DOI 10.11648/j.acis.20251302.12
Page(s) 31-48
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

Reinforcement Learning, ANFIS, ARIC, PPO, Schwartzberg Model, Spouted Bed Roasting

1. Introduction
Neural networks and fuzzy systems are combined because fuzzy systems must be tuned to obtain optimal performance. According to Ajith Abraham [1], one drawback of neural networks is their inability to accommodate prior knowledge in the learning process. In addition, the black-box nature of neural networks is a fundamental drawback for control system applications. In contrast, fuzzy systems can accommodate prior knowledge in the form of if-then rules and input-output membership functions. To incorporate prior knowledge into a Fuzzy Inference System (FIS), it is mandatory to have know-how on how the inputs affect the output. This know-how can be obtained from an expert; in this case the control engineer is responsible for generating the if-then rules using knowledge from control engineering. By combining neural networks and fuzzy systems it is possible to gain the advantages of both paradigms.
Ivan Petrovic designed a reinforcement learning based Generalized Approximate Reasoning Intelligent Controller (GARIC) for an inverted pendulum on a cart [2]. Petrovic first utilized weak reinforcement signals, as described in Equation (1), but these weak signals yielded unsatisfactory results.
r = \begin{cases} -1, & \text{failure state} \\ 0, & \text{any other state} \end{cases} \quad (1)
The author also experimented with a more informative reinforcement signal, as shown in Equation (2). This elaborate signal enabled the system to reduce both the error and the controller effort.
r = \begin{cases} -1, & \text{failure state} \\ -\dfrac{|x|}{x_{ms}}, & \text{any other state} \end{cases} \quad (2)
Miaolei Zhou proposed a GARIC neural fuzzy controller to control the displacement of a Magnetic Shape Memory Alloy actuator [3]. According to S. Rößler, Magnetic Shape Memory (MSM) is a phenomenon in which the shape or size of a magnetic material changes when the magnetic field changes [4]. Zhou used Gaussian membership functions in the fuzzification layer and a lookup table to represent the crisp defuzzified outputs. Mohammad Hossein Fazel Zarandi and Javid Jouzdani proposed a GARIC architecture in which the Action Evaluation Network (AEN) has a similar structure to the Action Selection Network (ASN), but only the consequent labels of both networks are updated during learning [5].
Ha Duc Nguyen proposed a reinforcement learning based Adaptive Neural Fuzzy Inference System (ANFIS) controller for Maximum Power Point Tracking (MPPT) control of a variable speed wind turbine, implementing the ANFIS structure for both the actor and the critic of the reinforcement learning algorithm [6]. Mohamed Elsisi proposed an ANFIS blade pitch controller for wind energy conversion systems against wind speed fluctuations; the author used supervised learning to optimize the ANFIS architecture, with the training data set obtained from a Proportional Integral Derivative (PID) controller optimized by the Mayfly Optimization Algorithm (MOA). Machrus Ali has also presented such a controller for photovoltaic axis tracking [7]. Genetic Algorithm, Particle Swarm Optimization, and Ant Colony Optimization have also been used to optimize ANFIS controllers [8-10]. The Proximal Policy Optimization (PPO) reinforcement learning algorithm was used by Ming Chen to control the classical cart-pole system; the author specifically proposed an adaptive learning rate PPO and compared it to a fixed learning rate PPO [11].
Wei He used a reinforcement learning based controller to control a flexible two-link robot, with both the actor and the critic represented as neural networks [12].
Jonas Degrave proposed a neural network controller, optimized using maximum a posteriori policy optimization (MPO), for magnetic control of tokamak plasmas [13]. The author stated that the control objective is to shape and maintain high-temperature plasma in the tokamak vessel. A tokamak is a machine used to produce nuclear fusion. According to Degrave, maintaining a stable plasma position requires feedback control. Degrave trained a neural network on a simulated environment that sufficiently represents the dynamics of the tokamak machine. The actor is a four-layer feedforward network whose size is limited by the computational capacity of the hardware on which the policy will be implemented, while there is no such restriction on the critic network. The author did not perform real-time training of the neural network and directly deployed it on 10 kHz control hardware [13].
2. Coffee Roasting Process Model
In spouted bed roasters, hot air is used to heat and agitate coffee beans in a roasting chamber. Spouted bed roaster machines have three basic components: a blower, a heater, and a roasting chamber. The blower is used to blow air at high pressure and velocity so that the coffee beans are agitated by the upward flow. The high-velocity air passes through the heater, picking up heat energy from resistive heating elements. The roasting chamber is where the green coffee beans are placed.
Figure 1. Spouted bed coffee roaster.
2.1. Heater Dynamics
The heating system raises the air temperature to a value capable of initiating the roasting process. The resistive heating elements are considered to be 100% efficient at converting electrical energy to thermal energy.
\frac{d\Delta T}{dt} = \frac{VI}{m_m c_{pm}} - \frac{\dot{m}_g c_{pg}\,\Delta T}{m_m c_{pm}} \quad (3)
2.2. Schwartzberg Model
Bean Temperature
\frac{dT_b}{dt} = \frac{G C_{pg}\left(T_{gi} - T_{go}\right) + m_{bs}\left(Q_r + L_v \frac{dX}{dt}\right)}{m_{bs}\left(1 + X\right) C_{pb}} \quad (4)

T_{gi} - T_{go} = \left(T_{gi} - T_b\right)\left[1 - \exp\left(-\frac{\alpha A_b}{G C_{pg}}\right)\right] \quad (5)
Moisture content
-\frac{dX}{dt} = \frac{4.32 \times 10^{9}\, X^{2}}{d_b^{2}} \exp\left(-\frac{9889}{T_b + 273.15}\right) \quad (6)
Rate of exothermic heat generation
\frac{dq}{dt} = Q_r = A \exp\left(-\frac{H_g}{R_g T_{bK}}\right) \frac{H_{et} - H_e}{H_{et}} \quad (7)
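To make the model concrete, the moisture-loss law of Equation (6) can be integrated numerically. The sketch below is in Python rather than the paper's MATLAB/Simulink setup, and the bean diameter, initial moisture, time step, and constant bean temperature are illustrative assumptions, not values taken from the paper.

```python
import math

def moisture_rate(X, T_b, d_b=6.0):
    """dX/dt from Equation (6): -4.32e9 * X^2 / d_b^2 * exp(-9889/(T_b + 273.15)).
    X is dry-basis moisture, T_b bean temperature in degC, d_b bean diameter (assumed value)."""
    return -4.32e9 * X**2 / d_b**2 * math.exp(-9889.0 / (T_b + 273.15))

def simulate_moisture(X0=0.11, T_b=200.0, dt=1.0, t_end=600.0):
    """Forward-Euler integration of moisture at a constant (assumed) bean temperature."""
    X, t = X0, 0.0
    while t < t_end:
        X += dt * moisture_rate(X, T_b)
        t += dt
    return X
```

Because the rate is proportional to X squared, drying slows as the bean dries; in a full simulation T_b would itself evolve according to Equation (4).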
3. Methodology
3.1. Fuzzy Controller Design
Figure 2. Fuzzy controller structure.
Fuzzy logic provides mathematical tools to represent human knowledge in the form of if-then rules over a set of input and output linguistic variables. The architecture of a fuzzy logic controller is shown in Figure 2; it has four elements. The fuzzifier receives crisp input values; in a control system these values may be the states of the plant, the error between the reference signal and the actual plant states, error derivatives, or error integrals. These crisp values are converted to linguistic values in the fuzzifier. The fuzzy rule base represents the control policy that mimics the actions of an expert. The inference engine performs the appropriate fuzzy set operations to decide which rules fire at the current state of the dynamics. The defuzzifier produces a crisp output, or crisp control signal, from the fuzzy values supplied by the inference engine.
Fuzzy set
A fuzzy set is a set of ordered pairs of elements from a universe of discourse and their membership values. Membership functions take values from 0 to 1 and quantify the degree to which an element of the universe of discourse belongs to a fuzzy set. For control applications the most widely used membership functions are triangular and trapezoidal, due to their computational efficiency.
Triangular Membership functions
Three parameters characterize a triangular membership function. These parameters are the three corners on the graph of triangular membership functions.
\mu(x) = \begin{cases} 0, & x \le a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ \dfrac{c-x}{c-b}, & b \le x \le c \\ 0, & x \ge c \end{cases} \quad (8)
Trapezoidal membership functions
A trapezoidal membership function is characterized by four parameters:
\mu(x) = \begin{cases} 0, & x \le a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ 1, & b \le x \le c \\ \dfrac{d-x}{d-c}, & c \le x \le d \\ 0, & x \ge d \end{cases} \quad (9)
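Equations (8) and (9) translate directly into code. A minimal sketch in Python (the paper implements these as MATLAB custom layers):

```python
def trimf(x, a, b, c):
    """Triangular membership, Equation (8), with corners a <= b <= c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)  # rising edge
    return (c - x) / (c - b)      # falling edge

def trapmf(x, a, b, c, d):
    """Trapezoidal membership, Equation (9), flat at 1 between b and c."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    if x <= c:
        return 1.0                # plateau
    return (d - x) / (d - c)      # falling edge
```

Shouldered functions such as [-20 -20 -2 -1], whose two left corners coincide, would need the degenerate a == b case handled separately; that edge case is omitted here for brevity.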
Fuzzy operations
For two fuzzy sets A and B the following fuzzy operations are defined:
Equality: two fuzzy sets A and B are equal if and only if the membership value \mu_A(x) is equal to \mu_B(x) for all x in the universe of discourse U.
\mu_A(x) = \mu_B(x) \text{ for all } x \in U \quad (10)
Union: The union of two fuzzy sets is given by
\mu_{A \cup B}(x) = \max\{\mu_A(x), \mu_B(x)\} \text{ for all } x \in U \quad (11)
Intersection: The intersection of two fuzzy sets corresponds to the minimum of the membership values of the two sets for all x in the universe of discourse U.
\mu_{A \cap B}(x) = \min\{\mu_A(x), \mu_B(x)\} \text{ for all } x \in U \quad (12)
Complement: corresponds to the Boolean NOT function and is given by
\mu_{\neg A}(x) = 1 - \mu_A(x) \text{ for all } x \in U \quad (13)
In the book Methodology of Fuzzy Control (page 41), it is discussed that the algebraic product and algebraic sum, as in Equation (14) and Equation (15), are possible ways to represent the linguistic AND and OR operations. The problem with the min() and max() methods is their non-differentiability, whereas the algebraic product and algebraic sum are continuous and differentiable at all points.
\mu_{A \cap B}(x) = \mu_A(x) \cdot \mu_B(x) \quad (14)
\mu_{A \cup B}(x) = \mu_A(x) + \mu_B(x) - \mu_A(x) \cdot \mu_B(x) \quad (15)
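The operations above, in both their min/max and algebraic (differentiable) forms, can be sketched as:

```python
def fuzzy_and_min(mu_a, mu_b):
    return min(mu_a, mu_b)            # intersection, Equation (12)

def fuzzy_or_max(mu_a, mu_b):
    return max(mu_a, mu_b)            # union, Equation (11)

def fuzzy_and_prod(mu_a, mu_b):
    return mu_a * mu_b                # algebraic product, Equation (14)

def fuzzy_or_probor(mu_a, mu_b):
    return mu_a + mu_b - mu_a * mu_b  # algebraic sum, Equation (15)

def fuzzy_not(mu_a):
    return 1.0 - mu_a                 # complement, Equation (13)
```

The algebraic forms agree with min/max at the endpoints 0 and 1 but are smooth in between, which is what makes them usable inside a gradient-trained network.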
Inputs and outputs
The first input to the FLC block is e(t), the difference between the desired roast profile and the actual bean temperature. The universe of discourse of this input ranges from -20 K to 300 K. The second input is ė(t), the derivative of the error.
e(t) = r(t) - T_b(t) \quad (16)
\dot{e}(t) = \frac{de(t)}{dt} \quad (17)
Based on the structure of the system to be controlled, the variable that should be manipulated to properly track a desired roast profile is the input voltage, V(t), supplied to the heating element.
Fuzzification
Fuzzification is the process of mapping crisp inputs to fuzzy set membership values. The error input has a universe of discourse from -20 K to 300 K. The linguistic values of e(t) are:
1. Negative big error: a trapezoidal membership function covering the range from -20 K to -1 K.
2. Negative small error: it is triangular membership function with universe of discourse from -1K to 0.1K.
3. Positive small error: it is triangular membership function with universe of discourse from -0.1K to 1K.
4. Positive big error: it is trapezoidal membership function with universe of discourse from 1K to 300K.
Figure 3. Membership functions of input error (Display range is from -4 to 4).
The error derivative ė(t) input has three linguistic variables:
1. Negative error derivative: is triangular membership function from -0.01 to 0
2. Zero error derivative: is triangular membership function from -0.002 to 0.002
3. Positive error derivative: is triangular membership function from 0 to 0.01
Figure 4. Membership functions of error derivative.
The rule base
Table 1. The rule base for Mamdani type fuzzy logic controller.

de \ e   NbE    NsE    PsE    PbE
NdE      LV     LV     MLV    HV
ZdE      LV     MLV    MHV    HV
PdE      LV     MHV    HV     HV

The rule base has eight rules. The membership functions for the error are NbE (Negative big error), NsE (Negative small error), PsE (Positive small error), and PbE (Positive big error). The membership functions for the error derivative are NdE (Negative error derivative), ZdE (Zero error derivative), and PdE (Positive error derivative). The membership functions for the consequent section are LV (Low Volt), MLV (Medium Low Volt), MHV (Medium High Volt), and HV (High Volt).
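For reference, Table 1 can be encoded as a plain lookup table. This is a Python sketch listing all twelve antecedent combinations of the table; the paper itself builds the rules inside MATLAB's fuzzy inference system.

```python
# Table 1 encoded as a lookup from (error-derivative label, error label)
# to the consequent voltage label.
RULES = {
    ("NdE", "NbE"): "LV",  ("NdE", "NsE"): "LV",
    ("NdE", "PsE"): "MLV", ("NdE", "PbE"): "HV",
    ("ZdE", "NbE"): "LV",  ("ZdE", "NsE"): "MLV",
    ("ZdE", "PsE"): "MHV", ("ZdE", "PbE"): "HV",
    ("PdE", "NbE"): "LV",  ("PdE", "NsE"): "MHV",
    ("PdE", "PsE"): "HV",  ("PdE", "PbE"): "HV",
}
```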
Fuzzy inference
The output variable is the voltage supplied to the heating element. The range of this variable is from 0 V to 220 V; since the universe of discourse should cover all possible output points, the range is extended from -20 V to 240 V in the fuzzy system. In the Simulink model, however, a saturation block limits the voltage output to between 0 V and 220 V.
Figure 5. Membership function of consequent.
The membership functions which are triangular in shape are:
1. Low voltage is triangular membership function with universe of discourse from -20V to 150V.
2. Medium low voltage is triangular membership function with universe of discourse from 100V to 200V.
3. Medium high voltage is triangular membership function with universe of discourse from 160V to 240V.
4. High voltage is triangular membership function with universe of discourse from 200V to 240V.
Aggregation
In Mamdani type controllers the firing strength of every rule is combined before defuzzification. In this paper the sum() aggregation method is used.
Figure 6. Aggregation in fuzzy inference system.
Defuzzification
The defuzzification process produces a crisp control signal. Here the center of area method is used, expressed as
u = \frac{\sum \text{first moments of area}}{\sum \text{areas}} \quad (18)
For continuous system
u = \frac{\int u\, \mu(u)\, du}{\int \mu(u)\, du} \quad (19)
For discrete systems
u = \frac{\sum_{i=1}^{n} u_i\, \mu(u_i)}{\sum_{i=1}^{n} \mu(u_i)} \quad (20)
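A sketch of the discrete center-of-area computation of Equation (20); the sample points in the test are illustrative, not values from the paper.

```python
def defuzzify_cog(points, mu):
    """Discrete center of area, Equation (20): sum(u_i * mu(u_i)) / sum(mu(u_i)).
    points: candidate output values u_i; mu: aggregated membership at each u_i."""
    den = sum(mu)
    if den == 0.0:
        return 0.0  # no rule fired; a real controller should prevent this case
    return sum(u * m for u, m in zip(points, mu)) / den
```

For example, membership mass placed symmetrically around 110 V defuzzifies to 110 V.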
3.2. ARIC Controller Design
Approximate Reasoning based Intelligent Control (ARIC) is a neural fuzzy architecture used to tune an FIS via reinforcement learning. It has three components: the action selection network (ASN), the action evaluation network (AEN), and the stochastic action modifier (SAM). The ASN is a fuzzy controller represented in a neural network connectionist model; it receives input from the plant and outputs the appropriate control action. The critic network (AEN), according to Berenji, is a two-layer network that maps the reward signal and the current plant state to a scalar number measuring how good or bad the action is. The stochastic action modifier uses signals from the ASN and AEN to modify the action applied to the plant, such that if the action at the previous time step was good, the action at the current time step does not deviate much from it. Berenji also proposed an architecture called the Approximate Reasoning Intelligent Controller (ARIC), which does not have a stochastic action modifier, as shown in Figure 7.
In a review of neural fuzzy systems, Nkumbah P. J. stated that the GARIC architecture uses a gradient descent learning method to update the AEN and a reinforcement learning method to update the ASN.
Figure 7. ARIC neuro fuzzy architecture.
Action selection network
The action selection network is the neural representation of a Mamdani type fuzzy system with slight modifications. This network has five layers, which represent each step in the fuzzy inference system.
Layer one: This is the input layer. In this research, the error and the error derivative are used as inputs. Each input is represented by a node in the neural connectionist model: node 'error' for the error input and node 'error derivative' for the error derivative input. No computation is performed and there are no learnable parameters in this layer.
Layer two: This layer represents the fuzzification process of the FIS. There are seven nodes representing the linguistic variables used in fuzzification. Each node outputs the membership degree μi of its membership function.
Layer three: This layer is equivalent to the rule base, which performs the min() operation in the FIS. To overcome the non-differentiability of the min() function, Hamid R. Berenji proposed the soft-min operator described in Equation (21). As the constant k is increased in magnitude, the expression approaches the value of the min operator. As can be seen in Figure 8, each node corresponds to a rule in the rule base.
w_r = \frac{\sum_i \mu_i\, e^{-k\mu_i}}{\sum_i e^{-k\mu_i}} \quad (21)
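A sketch of the soft-min operator of Equation (21); the value of k below is an illustrative choice, not one stated in the paper.

```python
import math

def soft_min(mu, k=10.0):
    """Soft min of Equation (21): sum(mu_i * exp(-k*mu_i)) / sum(exp(-k*mu_i)).
    Differentiable everywhere, and approaches min(mu) as k grows."""
    weights = [math.exp(-k * m) for m in mu]
    return sum(m * w for m, w in zip(mu, weights)) / sum(weights)
```

With a large k the smallest membership dominates the weighted average, so the operator behaves like min() while still admitting gradients.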
Layer four: This is the consequent layer, which evaluates the output action that each node contributes; it represents the consequent part of the fuzzy inference system. According to Berenji, the output of layer four is the x-coordinate of the centroid of the membership function. For triangular membership functions,
\mu_i^{-1}(w_r) = b + 0.5\,(a - 2b + c)(1 - w_r) \quad (22)
Layer five: This is the defuzzification layer. The crisp defuzzified output is calculated as follows.
u = \frac{\sum_r w_r\, \mu_i^{-1}(w_r)}{\sum_r w_r} \quad (23)
Figure 8. ASN of ARIC neuro fuzzy architecture.
Figure 9. AEN of ARIC neuro fuzzy architecture.
Action evaluation network
This network predicts the reward that would be collected based on the current state of the environment. The AEN has 'error' and 'error derivative' as its inputs. These inputs are passed to a sigmoidal layer with five nodes. The predicted reward is given by Equation (26).
y_i = g\left(\sum_j a_{ij} x_j\right) \quad (24)
g(s) = \frac{1}{1 + e^{-s}} \quad (25)
v = \sum_i b_i x_i + \sum_i c_i y_i \quad (26)
where:
b_i is the weight of the connection between the output v and the input layer,
y_i is the output of the sigmoidal layer,
c_i is the weight of the connection between the output and the sigmoidal layer,
x_i is the input layer.
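Equations (24)-(26) amount to a small two-layer network. The sketch below writes the hidden-layer activation with an explicit sum over both inputs, which the compact notation implies; all weight values used in testing it are illustrative, not trained values from the paper.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))  # Equation (25)

def aen_forward(x, a, b, c):
    """AEN value prediction v, Equations (24)-(26).
    x: inputs [error, error derivative]
    a: hidden-layer weight rows, one per sigmoidal node (Eq. 24)
    b: direct input-to-output weights; c: hidden-to-output weights (Eq. 26)"""
    y = [sigmoid(sum(a_ij * x_j for a_ij, x_j in zip(row, x))) for row in a]
    direct = sum(b_i * x_i for b_i, x_i in zip(b, x))
    hidden = sum(c_i * y_i for c_i, y_i in zip(c, y))
    return direct + hidden
```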
3.3. ANFIS Controller Design
Figure 10. ANFIS controller architecture.
Layer one: This layer performs fuzzification of its inputs and is identical to layer one of ARIC.
Layer two: This layer realizes the min() operation with a product, as given by Equation (27).
w_i = \mu_{A_i}(e)\, \mu_{B_i}(\dot{e}) \quad (27)
Layer three: This is the normalization layer, whose output is given by
\bar{w}_1 = \frac{w_1}{w_1 + w_2} \quad (28)
Layer four: This layer is the consequent section, where the membership variables are represented by linear equations. The coefficients of the error, the error derivative, and the constant are all learnable parameters.
LV = e + \dot{e} + 27.47 \quad (29)
MLV = e + \dot{e} + 150 \quad (30)
MHV = e + \dot{e} + 200 \quad (31)
HV = e + \dot{e} + 220 \quad (32)
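Layers two to five can be sketched end-to-end as follows, assuming layer one has already produced a membership pair per rule. This is a simplified Python illustration of the structure, not the paper's MATLAB custom-layer code.

```python
def anfis_output(mu_pairs, consequents, e, de):
    """ANFIS layers 2-5 for a first-order Sugeno system.
    mu_pairs: one (mu_error, mu_error_derivative) pair per rule (layer 1 output)
    consequents: one (p, q, r) per rule, so the rule output is p*e + q*de + r,
    matching Equations (29)-(32) where p = q = 1 initially."""
    w = [ma * mb for ma, mb in mu_pairs]                 # layer 2: product, Eq. (27)
    total = sum(w)
    wbar = [wi / total for wi in w]                      # layer 3: normalization, Eq. (28)
    f = [p * e + q * de + r for p, q, r in consequents]  # layer 4: linear consequents
    return sum(wb * fi for wb, fi in zip(wbar, f))       # layer 5: weighted sum
```

With a single fully fired rule the output reduces to that rule's linear consequent; with several rules it is their firing-strength-weighted average.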
3.4. Neural Realization of Fuzzy Inference System
A straightforward method to represent a fuzzy system as a neural network is to make each neuron behave like a fuzzy operator. For example, the membership functions in the antecedent and consequent parts of an FIS can be represented by the activation function of a neuron. According to Lin, a min() activation function can be used to represent the AND operation, and a max() activation function can be used to represent the OR operation.
In MATLAB 2021a it is possible to design a custom layer with learnable parameters and an activation function. A custom layer is designed for each membership function in the antecedent and consequent sections. For triangular membership functions the centers are kept constant and only the bases are learnable parameters.
Figure 11 shows implementation of action evaluation network of ARIC architecture. It has input layer with error and error derivative, five neurons with sigmoidal activation function and an output layer.
Figure 11. AEN implementation using MATLAB.
Figure 12 shows the MATLAB implementation of the action selection network of the ARIC architecture.
Figure 12. ASN implementation using MATLAB.
In the ARIC neural fuzzy representation, the soft min() operator discussed in Equation (21) is used as the min() of the second layer. Sugeno-type neural fuzzy controllers use a product to realize the min() operation in the third layer.
Table 2. Learnable parameters of ARIC neural fuzzy architecture.

Membership Function          Before Optimization       Learnable Parameters
Negative big error           [-20 -20 -2 -1]           [-20 -20 c d]
Negative small error         [-2 -1 0.1]               [a b 0.1]
Positive small error         [-0.1 1 2]                [-0.1 b c]
Positive big error           [1 2 300 300]             [a b 300 300]
Negative error derivative    [-0.01 -0.01 -0.002 0]    None
Zero error derivative        [-0.002 0 0.002]          None
Positive error derivative    [0 0.02 0.01 0.01]        None
Low Voltage                  [-20 -20 125]             [b b c]
Medium Low Voltage           [100 150 200]             [a 150 b]
Medium High Voltage          [160 200 240]             [a 200 b]
High Voltage                 [200 240 240]             [a b b]

Figure 13 shows MATLAB implementation of ANFIS architecture.
Figure 13. MATLAB implementation of ANFIS.
Table 2 and Table 3 show lists of membership functions with corresponding learnable parameters.
Table 3. Learnable parameters of ANFIS neural fuzzy architecture.

Membership Function          Before Optimization       Learnable Parameters
Negative big error           [-20 -20 -2 -1]           [-20 -20 c d]
Negative small error         [-2 -1 0.1]               [a b 0.1]
Positive small error         [-0.1 1 2]                [-0.1 b c]
Positive big error           [1 2 300 300]             [a b 300 300]
Negative error derivative    [-0.01 -0.01 -0.002 0]    None
Zero error derivative        [-0.002 0 0.002]          None
Positive error derivative    [0 0.02 0.01 0.01]        None
Low Voltage                  [1 1 22.75]               [a b c]
Medium Low Voltage           [1 1 150]                 [a b c]
Medium High Voltage          [1 1 200]                 [a b c]
High Voltage                 [1 1 227.5]               [a b c]

3.5. Reinforcement Learning
Reinforcement learning is a learning mechanism by which an agent learns to behave properly based on a reward signal from a critic. The fuzzy controllers of the ANFIS and ARIC architectures are designed and represented as neural networks; the goal is to train these networks to obtain better performance from the controllers. The plant being controlled is the environment, which is everything outside the agent.
Reward
Reward is a function that judges the effect of an action taken by the policy of the RL agent. The reward used for this system is stated in Equation (33).
r = \begin{cases} -0.01\,u, & e \neq 0 \\ 0.01, & e = 0 \end{cases} \quad (33)
Proximal Policy Reinforcement Learning Agent
Proximal policy optimization (PPO) is a model-free, online, on-policy, policy gradient reinforcement learning method. Model-free means the agent does not learn a model of the environment; it only learns a policy that maximizes the reward. Online means new experiences are generated by interacting with the environment in each episode. On-policy means experiences are generated using the latest policy. The PPO algorithm uses a clipping function so that the newly generated policy does not move far from the previous policy, which improves the stability of the learning process.
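The clipping just described can be made concrete with PPO's per-sample clipped surrogate objective; epsilon = 0.2 below is a common default, not a value stated in the paper.

```python
def ppo_clip_loss(ratio, advantage, epsilon=0.2):
    """PPO clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A),
    where r = pi_new(a|s) / pi_old(a|s) and A is the advantage estimate."""
    clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped * advantage)
```

When the new policy moves far from the old one (ratio well outside [1 - eps, 1 + eps]), the clipped term caps the objective, removing the incentive for large policy jumps.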
Figure 14. Environment (blue) and Agent (orange).
Figure 14 shows how the agent is incorporated in the control loop. Everything that is not the agent is considered environment. The agent receives reward and current states from the environment, and then the agent generates an action based on the current state.
4. Results and Discussions
The open loop response of the roasting process is plotted in Figure 15, and then the tracking performance of the fuzzy and neuro fuzzy controllers is discussed. In addition, the robustness of the ANFIS controller is checked by simulation for half a kilogram of coffee and a 50% reduction in the mass flow rate of the roasting air.
4.1. Open Loop Response
Here the open loop response of the roasting and heating system is plotted. The open loop response is the response of the system without a feedback loop.
Figure 15. Open loop response of coffee roasting process.
Figure 16. Training Progress of ANFIS neuro fuzzy controller.
4.2. Controlling Bean Temperature
To track a desired profile, ARIC and ANFIS controllers are designed and their learnable parameters initialized as discussed in Section 3. Both controllers are represented as neural networks. After this representation, the membership functions are optimized using the PPO learning algorithm.
a. ANFIS Tracking Controller
Tracking performance of a single controller for the whole system is simulated. The ANFIS controller is trained for fifty episodes, and the agent converges shortly after the thirtieth episode.
Over a number of training sessions, the best performance was obtained when the membership functions of the error derivative term had no learnable parameters. The stopping criterion used to finish training is the episode number.
The optimized parameters of membership function are replaced in the fuzzy inference system as shown in Figure 17.
Figure 17. Optimized membership functions plot for error input.
After optimization the membership function parameters changed as shown in Table 4.
Table 4. Parameters of membership function before and after optimization.

Membership Function          Before Optimization       After Optimization
Negative big error           [-20 -20 -2 -1]           [-20 -20 -0.029004676 -6.4834234e-07]
Negative small error         [-2 -1 0.1]               [-0.029004676 -6.4834234e-07 0.1]
Positive small error         [-0.1 1 2]                [-0.1 6.4834234e-07 0.029004676]
Positive big error           [1 2 300 300]             [6.4834234e-07 0.029004676 300 300]
Negative error derivative    [-0.01 -0.01 -0.002 0]    None
Zero error derivative        [-0.002 0 0.002]          None
Positive error derivative    [0 0.02 0.01 0.01]        None
Low Voltage                  [1 1 22.75]               [6.4834234e-07 6.4834234e-07 23.780174]
Medium Low Voltage           [1 1 150]                 [6.4834234e-07 6.4834234e-07 146.25401]
Medium High Voltage          [1 1 200]                 [6.4834234e-07 6.4834234e-07 196.25311]
High Voltage                 [1 1 227.5]               [6.4834234e-07 6.4834234e-07 223.75261]

Figure 18 shows the tracking performance of the unoptimized Takagi-Sugeno fuzzy controller.
Figure 18. Tracking plot for unoptimized sugeno type controller.
Figure 19 shows the tracking performance of the ANFIS controller with its membership function parameters optimized using the PPO reinforcement learning algorithm.
Figure 19. ANFIS neuro fuzzy controller optimized using PPO algorithm.
The RMS of error for the Takagi-Sugeno controller is 0.2026, whereas for the ANFIS controller it is 0.0624, as shown in Figure 20.
Figure 20. Comparison of Takagi-Sugeno and ANFIS controllers.
a. ARIC tracking controller
The ARIC neuro fuzzy controller is trained for 50 episodes. As can be seen in Figure 21, the training converges at the 30th episode.
Figure 21. Training progress for ARIC neuro fuzzy architecture.
Figure 22 shows a comparison of the optimized and unoptimized ARIC controllers. The RMS of error for the unoptimized ARIC agent is 0.5134, and for the optimized ARIC controller it is 0.08122.
Figure 22. Comparison of error for ARIC unoptimized and optimized controller.
Performance of unoptimized ARIC controller is shown on Figure 23.
Figure 23. Unoptimized ARIC controller response.
Figure 24 shows ARIC controller performance after optimization.
Figure 24. Tracking performance of optimized ARIC controller.
The RMS of error for the optimized ANFIS controller is 0.0624, whereas for the optimized ARIC controller it is 0.08122. Figure 25 shows the error signals of the two controllers.
Figure 25. Comparison between ANFIS and ARIC controller.
5. Conclusion
A fuzzy controller can be represented in neural connectionist form, and its membership functions can be optimized using a PPO agent, which is an actor-critic reinforcement learning agent. The RMS of error for the unoptimized ARIC agent is 0.5134, and for the optimized ARIC controller it is 0.08122. The RMS of error for the unoptimized ANFIS controller is 0.2026, whereas for the optimized ANFIS controller it is 0.0624. Hence ANFIS has better performance than ARIC.
Reinforcement learning is used to optimize the ANFIS controller rather than a supervised learning method, which requires initial data generated by other controllers. The workflow used in the supervised learning method of ANFIS optimization is as follows: first, some type of controller is designed; second, that controller is optimized using PSO or another algorithm and data is collected from it; third, this data is used to optimize the ANFIS controller. In this work, the ANFIS controller is instead trained directly using reinforcement learning, which shortens the workflow of ANFIS optimization.
In addition, ARIC, a reinforcement learning based fuzzy controller architecture, is investigated and its performance compared with ANFIS. Even though both have comparable performance, it can be inferred from their structures that ANFIS is more computationally efficient than ARIC: the soft-minimum layer of ARIC evaluates summation, product, and division operations on its inputs, while ANFIS evaluates only a product operation in its rules layer.
The training converged at the 30th episode; this is due to the capability of neuro fuzzy systems to incorporate the prior knowledge of the expert, which significantly reduces training time. With MATLAB 2021a, custom layers can be designed to represent every fuzzy inference operation and to set the desired parameters of the membership functions as learnable or fixed.
One drawback of neuro fuzzy systems with triangular membership functions is that the designer must ensure that, throughout training, at least one rule fires at every instant. This can be achieved by making sure that consecutive membership functions overlap.
Abbreviations

AEN: Action Evaluation Network
ANFIS: Adaptive Neural Fuzzy Inference System
ARIC: Approximate Reasoning Intelligent Controller
ASN: Action Selection Network
FIS: Fuzzy Inference System
GARIC: Generalized Approximate Reasoning Intelligent Controller
MOA: Mayfly Optimization Algorithm
MPO: Maximum a Posteriori Policy Optimization
MPPT: Maximum Power Point Tracking
MSM: Magnetic Shape Memory
PID: Proportional Integral Derivative
PPO: Proximal Policy Optimization
PSO: Particle Swarm Optimization
RGA: Relative Gain Array
SAM: Stochastic Action Modifier
Author Contributions
Abiy Amare: Conceptualization, Formal Analysis, Writing - original draft, Writing - review & editing
Solomon Seid: Supervision, Writing - review & editing
Conflicts of Interest
The authors declare no conflicts of interest.
References
[1] A. Abraham, "Beyond Integrated Neuro-Fuzzy Systems: Reviews, Prospects, Perspectives and Directions."
[2] I. Petrovic, K. Macek, and N. Peric, "A Knowledge-Base Generating Fuzzy-Neural Controller," 2000.
[3] M. Zhou, B. Hu, W. Gao, and J. Wang, “Reinforcement Learning Fuzzy Neural Network Control for Magnetic Shape Memory Alloy Actuator,” International Journal of Control and Automation, vol. 7, no. 6, pp. 109-122, Jun. 2014,
[4] S. Rößler et al., “Two types of magnetic shape-memory effects from twinned microstructure and magneto-structural coupling in Fe1+yTe,” Proc Natl Acad Sci U S A, vol. 116, no. 34, pp. 16697-16702, Aug. 2019,
[5] M. H. F. Zarandi, J. Jouzdani, and I. B. Turksen, “Generalized reinforcement learning fuzzy control with vague states,” Advances in Soft Computing, vol. 41, pp. 811-820, 2007,
[6] N. T. T. Vu, H. D. Nguyen, and A. T. Nguyen, “Reinforcement Learning-Based Adaptive Optimal Fuzzy MPPT Control for Variable Speed Wind Turbine,” IEEE Access, vol. 10, pp. 95771-95780, 2022,
[7] M. Ali, T. Fahmi, H. Nurohmah, H. Suyono, and M. A. Muslim, “Optimization on PID and ANFIS Controller on Dual Axis Tracking for Photovoltaic Based on Firefly Algorithm.”
[8] H. Vinh Nguyen, H. Chi Minh City, H. Nguyen, M. Tien Cao, and K. Hung Le, “Performance Comparison between PSO and GA in Improving Dynamic Voltage Stability in ANFIS Controllers for STATCOM,” 2019. [Online]. Available:
[9] N. Hamouda, B. Babes, S. Kahla, A. Boutaghane, A. Beddar, and O. Aissa, “ANFIS Controller Design Using PSO Algorithm for MPPT of Solar PV System Powered Brushless DC Motor Based Wire Feeder Unit,” in 2020 International Conference on Electrical Engineering, ICEE 2020, Sep. 2020.
[10] B. Selma, S. Chouraqui, and U. Artois, “Hybrid ANFIS-ant colony based optimisation for quadrotor trajectory tracking control Hassane Abouaïssa,” 2020.
[11] M. Chen, H. K. Lam, Q. Shi, and B. Xiao, “Reinforcement Learning-Based Control of Nonlinear Systems Using Lyapunov Stability Concept and Fuzzy Reward Scheme,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 10, pp. 2059-2063, Oct. 2020,
[12] W. He, H. Gao, C. Zhou, C. Yang, and Z. Li, “Reinforcement Learning Control of a Flexible Two-Link Manipulator: An Experimental Investigation,” IEEE Trans Syst Man Cybern Syst, vol. 51, no. 12, pp. 7326-7336, Dec. 2021,
[13] J. Degrave et al., “Magnetic control of tokamak plasmas through deep reinforcement learning,” Nature, vol. 602, no. 7897, pp. 414-419, Feb. 2022,
[14] A. Allen, L. Allen, D. Geier, B. Miller, F. Advisor, and R. Diersing, “Fluid-Bed Coffee Roaster.” [Online]. Available:
[15] C. T. Lin and C. S. G. Lee, Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems. Prentice Hall PTR, 1996.
[16] H. R. Berenji, “A Reinforcement Learning-Based Architecture for Fuzzy Logic Control,” 1992.
[17] Hung T. Nguyen and Michio Sugeno, Fuzzy Systems. Springer US, 1998.
[18] S. R. Nikam, P. J. Nikumbh, and S. P. Kulkarni, “Fuzzy Logic and Neuro-Fuzzy Modeling,” Journal of Artificial Intelligence, vol. 3, no. 2, 2012. [Online]. Available:
[19] H. R. Berenji and P. Khedkar, “Learning and Tuning Fuzzy Logic Controllers Through Reinforcements,” 1992.
Cite This Article
  • APA Style

    Amare, A., Seid, S. (2025). Reinforcement Learning Based Neuro-fuzzy Controller for Coffee Roasting Process. Automation, Control and Intelligent Systems, 13(2), 31-48. https://doi.org/10.11648/j.acis.20251302.12

