Artificial Neural Network Approach for the Identification of Clove Buds Origin Based on Metabolites Composition

This paper examines the use of an artificial neural network approach to identify the origin of clove buds based on metabolite composition. Generally, large data sets are critical for accurate identification, and machine learning on large data sets leads to precise identification of origins. For clove buds, however, only small data sets are available, because metabolite composition data are scarce and extraction is costly. The results show that backpropagation and resilient propagation with one and two hidden layers identify the clove buds origin accurately. Backpropagation with one hidden layer offers 99.91% and 99.47% accuracy for the training and testing data sets, respectively. Resilient propagation with two hidden layers offers 99.96% and 97.89% accuracy for the training and testing data sets, respectively.


Introduction
The flavour and aroma of plantation commodities vary with origin. For example, in Indonesia, clove buds from Java have a prominent wooden aroma and sour flavour, while those from Bali have a sweet-spicy flavour [1]. Arabica coffee from Gayo has a lower acidity and a strong bitterness, whereas coffee from Toraja has a medium browning, tobacco, or caramel flavour and is not too acidic or bitter. Kintamani coffee from Bali has a fruity, acidic flavour mixed with a fresh flavour. Coffee from Flores, by contrast, has a variety of flavours ranging from chocolate, spicy, tobacco, strong, citrus, flowers and wood. Coffee from Java has a spicy aroma, while that from Wamena has a fragrant aroma and no pulp [2]. These specific flavours and aromas are attributed to the composition of the commodities' metabolites; generally, a specific metabolite is responsible for a particular flavour or aroma. For this reason, it is vital to recognize the characteristics of each plantation commodity based on its metabolite composition. This study investigates the origin of clove buds, which helps to maintain the flavour of products that use clove buds as an ingredient. The characteristics of food products can also be predicted from the origin of the clove buds used, owing to the differences in flavour and taste between regions [3].
Metabolic profiling is a widely used approach for obtaining information about the metabolites contained in a biological sample; it is the quantitative measurement of metabolites from biological samples [4,5]. To give meaning to metabolite data sets, chemometric techniques were developed. Chemometrics is a chemical sub-discipline that uses mathematics, statistics and formal logic to gain knowledge about chemical systems. It extracts maximum relevant information by analysing metabolite data sets from biological samples [6], and it is also used for pattern recognition in metabolite data sets from complex chemical systems [3]. Pattern recognition in biological samples identifies the specific metabolites, or biomarkers, that form a particular flavour and aroma.
Artificial neural networks have been widely used in pattern recognition [7] and in other applications across various fields [8][9][10][11][12][13]. However, they have not been fully applied to clove buds. The small data sets available limit the implementation of artificial neural networks for clove buds; this is attributed to the scarcity of metabolite composition data and the high cost of extraction. Furthermore, some clove buds samples show zero metabolite concentration, which is caused by laboratory instruments that cannot detect metabolites present in very small amounts. This study therefore implements artificial neural networks for pattern recognition on clove buds data sets. Each origin of clove buds has specific metabolites as biomarkers; in total there are twelve regions. In the laboratory, eight experiments were carried out for each region, except for Java with only six experiments. For each experiment, 47 types of metabolites were recorded. As a matrix, the data sets are 94 × 47, where the rows and columns represent the number of experiments and metabolites, respectively.

Data preprocessing
In total, the clove buds data sets span a wide range, specifically between 10^-4 and 10. Therefore, a logarithmic transformation is used to obtain reliable numerical data. Since some metabolite concentrations are zero, the logarithmic transformation cannot be applied directly; these zeros arise because the concentrations lie below the detection threshold. The metabolite data with zero concentration are not removed, because they act as biological markers. Instead, they are replaced with a value one order of magnitude smaller than the smallest available concentration; in this case, the zeros are replaced with 10^-5. Before implementing artificial neural networks, one preprocessing stage is added to the clove buds data sets from [14] to normalize the metabolite values. The normalization ensures that each datum has the same influence or contribution in determining its origin. The following normalization formula is used [15]:

z_kl = (x_kl - x̄_k) / s_k    (2)

Here, z_kl is the normalized value of x_kl, x̄_k is the mean of the k-th experiment and s_k is its standard deviation.
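As a minimal sketch, this preprocessing can be written as follows, assuming NumPy and row-wise (per-experiment) autoscaling; the function name and the toy input are illustrative, not from the paper:

```python
import numpy as np

def preprocess(X, floor=1e-5):
    """Log-transform metabolite concentrations and autoscale each experiment.

    X : (experiments x metabolites) concentration matrix; zeros are
    replaced by `floor`, one order below the smallest value (10^-4).
    """
    X = np.where(X == 0, floor, X)             # keep zero-concentration biomarkers
    X = np.log10(X)                            # compress the 10^-4 .. 10 range
    mean = X.mean(axis=1, keepdims=True)       # mean of each experiment (row)
    std = X.std(axis=1, keepdims=True, ddof=1) # sample standard deviation per row
    return (X - mean) / std                    # z_kl = (x_kl - mean_k) / s_k

X = np.array([[0.0, 1e-4, 2.0, 10.0]])        # one toy experiment
Z = preprocess(X)
```

After this step, every experiment has zero mean and unit variance, so no single metabolite dominates the network inputs.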

Artificial neural network
Artificial neural networks are a simplified representation of the human brain that simulates its learning process [16]. Backpropagation and resilient propagation are learning algorithms widely used in artificial neural networks [17][18][19][20][21][22][23][24][25][26][27]. In this study, two network architectures are used for each of the resilient propagation and backpropagation algorithms: the first consists of two hidden layers and the second of one.

Backpropagation learning algorithm
The backpropagation learning algorithm is based on the repeated use of the chain rule to calculate the influence of each weight in the network with respect to the error function E [28]:

∂E/∂w_ij = (∂E/∂o_i) (∂o_i/∂net_i) (∂net_i/∂w_ij)    (3)

where w_ij is the weight from the j-th neuron to the i-th neuron, o_i is the output, and net_i is the weighted sum of the inputs of neuron i. Once the partial derivative for each weight is known, the error function is minimized by gradient descent [28]:

w_ij(t+1) = w_ij(t) − η ∂E/∂w_ij(t)    (4)

where t is the iteration and 0 < η < 1 is the learning rate.
From Equation (4), choosing a large learning rate (close to 1) allows oscillations, which can keep the error above the specified tolerance and reduce the identification accuracy. Conversely, if the learning rate η is too small (close to 0), many steps are needed for the error function E to converge.
To avoid this, the backpropagation learning algorithm is extended by adding a momentum parameter (0 < α < 1), as shown in Equation (5). The momentum term also accelerates the convergence of the error function [28]:

∆w_ij(t+1) = −η ∂E/∂w_ij(t) + α ∆w_ij(t)    (5)

where α measures the effect of the previous step on the current one.
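A sketch of the momentum update in Equation (5), using the learning rate (0.9) and momentum (0.1) reported later in the paper; the toy error function E(w) = w² and its gradient 2w are assumptions for illustration:

```python
def momentum_step(w, grad, prev_dw, lr=0.9, alpha=0.1):
    """Backpropagation weight update with momentum, Equation (5):
    dw(t+1) = -lr * dE/dw(t) + alpha * dw(t)."""
    dw = -lr * grad + alpha * prev_dw
    return w + dw, dw

# toy quadratic error E(w) = w^2 with gradient dE/dw = 2w (assumed example)
w, dw = 2.0, 0.0
for _ in range(50):
    w, dw = momentum_step(w, 2.0 * w, dw)
```

Even with the large learning rate of 0.9, the momentum-damped iteration settles toward the minimum at w = 0 rather than diverging.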
To activate neurons in the hidden and output layers, the sigmoid activation function is used. It has three properties essential for backpropagation and resilient propagation: it is bounded, monotonic and continuously differentiable. It converts the weighted sum of inputs into an output signal for each neuron i, as shown in Equation (6) [29]:

O_i = 1 / (1 + e^(−σ I_i))    (6)

where I_i is the weighted sum of inputs of the i-th neuron, σ is the slope parameter of the sigmoid activation function and O_i is the output of the i-th neuron. A threshold is applied to the sigmoid outputs of the output layer. The weighted sum of inputs is given by the following equation [29]:

I_j = Σ_i w_ij O_i + w_Bj B    (8)
The sum over i represents the input received from all neurons in the input layer, while B is the bias neuron. The weight w_ij is associated with the connection from the i-th neuron to the j-th neuron, while the weight w_Bj relates to the connection from the bias neuron to the j-th neuron.
The weighted sums obtained in the hidden and output layers are activated by substituting the weighted sum from Equation (8) into the exponent in Equation (6).
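The forward pass through one layer, combining Equations (6) and (8), can be sketched as below. The 47 inputs and 15 hidden neurons follow the architectures used later in the paper, while the random weights, bias value and helper names are illustrative assumptions:

```python
import numpy as np

def sigmoid(I, slope=1.0):
    """Equation (6): O = 1 / (1 + exp(-slope * I))."""
    return 1.0 / (1.0 + np.exp(-slope * I))

def layer_forward(O_prev, W, w_bias, bias=1.0):
    """Equation (8): weighted sum of inputs plus bias term, then sigmoid."""
    I = O_prev @ W + bias * w_bias
    return sigmoid(I)

rng = np.random.default_rng(0)
x = rng.normal(size=47)                   # 47 normalized metabolite inputs
W = rng.normal(scale=0.1, size=(47, 15))  # one hidden layer of 15 neurons
b = rng.normal(scale=0.1, size=15)        # bias weights w_Bj
h = layer_forward(x, W, b)                # hidden-layer activations
```

Because the sigmoid is bounded, every activation lands strictly between 0 and 1, which is what makes a fixed output threshold meaningful.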

2.3.2. Resilient propagation learning algorithm

Riedmiller et al. [28] proposed the resilient propagation learning algorithm as a development of the backpropagation algorithm. The algorithm adapts the weight values directly, based on local gradient information. Riedmiller et al. [28] introduced an update value ∆_ij for each weight, which determines the size of the weight update. This adaptive update value evolves during the learning process according to its local sight on the error function E, following the learning rule [28]:

∆_ij(t) = η+ ∆_ij(t−1) if ∂E/∂w_ij(t−1) · ∂E/∂w_ij(t) > 0; η− ∆_ij(t−1) if ∂E/∂w_ij(t−1) · ∂E/∂w_ij(t) < 0; ∆_ij(t−1) otherwise,

where 0 < η− < 1 < η+, with η− and η+ the decrease and increase factors, respectively. According to this adaptation rule, every time the partial derivative of the corresponding weight w_ij changes its sign, which indicates that the last update was too big and the algorithm has jumped over a local minimum, the update value ∆_ij is decreased by the factor η−. If the derivative retains its sign, the update value is slightly increased to accelerate convergence in shallow regions [28].
Once the update value for each weight is adapted, the weight update itself follows a simple rule: if the derivative is positive (increasing error), the weight is decreased by its update value; if the derivative is negative, the update value is added:

∆w_ij(t) = −∆_ij(t) if ∂E/∂w_ij(t) > 0; +∆_ij(t) if ∂E/∂w_ij(t) < 0; 0 otherwise,

w_ij(t+1) = w_ij(t) + ∆w_ij(t).

However, if the partial derivative changes sign, meaning the previous step was too large and the minimum was missed, the previous weight update is reverted:

∆w_ij(t) = −∆w_ij(t−1) if ∂E/∂w_ij(t−1) · ∂E/∂w_ij(t) < 0.

Because of this 'backtracking' weight step, the derivative is likely to change its sign once again in the following step. To avoid a double punishment of the update value, there should be no adaptation of the update value in the succeeding step. In practice, this is done by setting ∂E/∂w_ij(t−1) = 0 in the ∆_ij adaptation rule. The update values and weights are changed every time the whole set of patterns has been presented to the network once (learning by epoch).
In the adaptation and learning process of resilient propagation, the minimum (maximum) operator returns the minimum (maximum) of two numbers, and the sign operator returns +1 if its argument is positive, −1 if it is negative, and 0 otherwise. These operations are applied to each weight and bias in the network.
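The adaptation, weight-update and backtracking steps above can be sketched element-wise as follows, using the decrease/increase factors and update-value limits quoted later in the paper (0.5, 1.2, 10^-6, 50). The vectorized NumPy form, the function name and the toy quadratic error are assumptions for illustration:

```python
import numpy as np

def rprop_step(w, grad, prev_grad, delta, prev_dw,
               eta_minus=0.5, eta_plus=1.2,
               delta_min=1e-6, delta_max=50.0):
    """One element-wise resilient-propagation update with backtracking."""
    change = grad * prev_grad                  # sign comparison of successive gradients
    # same sign: grow the update value; flipped sign: shrink it
    delta = np.where(change > 0, np.minimum(delta * eta_plus, delta_max), delta)
    delta = np.where(change < 0, np.maximum(delta * eta_minus, delta_min), delta)
    dw = -np.sign(grad) * delta                # step against the gradient sign
    dw = np.where(change < 0, -prev_dw, dw)    # backtracking: revert the last step
    grad = np.where(change < 0, 0.0, grad)     # suppress adaptation in the next step
    return w + dw, grad, delta, dw

# toy quadratic error E(w) = w^2 (an assumed example, not the paper's network)
w = np.array([2.0])
prev_grad, delta, dw = np.zeros(1), np.full(1, 0.1), np.zeros(1)
for _ in range(100):
    w, prev_grad, delta, dw = rprop_step(w, 2.0 * w, prev_grad, delta, dw)
```

Note that only the sign of the gradient enters the step, not its magnitude; the step size is carried entirely by the adaptive ∆.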

Results and discussions
In this study, the training and testing data sets comprise 80% and 20% of the data, respectively. The metabolite data sets form a 94 × 47 matrix. Out of the 94 rows, 75 were chosen randomly as the training data sets, while the remaining rows were used as the testing sets. The random selection of training data sets was repeated 30 times; therefore, for each network architecture there are 30 values of the identification accuracy percentage, the coefficient of determination and the mean squared error (MSE), and their average is taken as representative. In each network architecture, a learning rate (η) of 0.9, a momentum parameter (α) of 0.1 and a maximum of 5000 epochs are used, with an error target of 10^-3. Each origin is represented by a binary code: specifically, the binary code for the Java origin is 1000, Bali 0100, Manado 0010 and Toli-Toli 0001. The identification accuracy and MSE are calculated as shown in Equations (13) and (14).
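The splitting and coding scheme can be sketched as follows; the binary codes are those given in the text, while the use of NumPy and the function name are illustrative assumptions:

```python
import numpy as np

# binary codes for the four origins, as used in the paper
CODES = {"Java": [1, 0, 0, 0], "Bali": [0, 1, 0, 0],
         "Manado": [0, 0, 1, 0], "Toli-Toli": [0, 0, 0, 1]}

def random_split(n_rows=94, n_train=75, repeats=30, seed=0):
    """Draw `repeats` random 75/19 splits of the experiment rows."""
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(repeats):
        perm = rng.permutation(n_rows)          # shuffle all 94 experiments
        splits.append((perm[:n_train], perm[n_train:]))  # (train, test) indices
    return splits

splits = random_split()
```

Averaging the accuracy, R² and MSE over the 30 splits then gives the representative values reported for each architecture.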
Here, a is the number of origins identified correctly and k is the total number, so that

accuracy = (a / k) × 100%.    (13)

The MSE is calculated by the following equation [29]:

MSE = (1/k) Σ_k Σ_p (T_kp − O_kp)²    (14)

where T_kp is the desired target, O_kp the network output and p the index running over the number of origins. The agreement between the desired target and the network output was evaluated with the coefficient of determination R², calculated using the following equation [21]:

R² = 1 − Σ_k Σ_p (T_kp − O_kp)² / Σ_k Σ_p (T_kp − T̄_kp)²

where T̄_kp is the average desired target.
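Assuming the usual forms of these metrics (the equations themselves are garbled in the extracted text) and an argmax reading of the 4-bit output codes, the evaluation can be sketched as:

```python
import numpy as np

def evaluate(T, O):
    """Accuracy, MSE and R^2 between binary targets T and network outputs O.

    T, O : (experiments x 4) arrays; a prediction counts as correct when
    the largest output coincides with the 1 in the target code (assumed
    decision rule, since the paper's thresholding detail is not shown).
    """
    correct = (T.argmax(axis=1) == O.argmax(axis=1))
    accuracy = 100.0 * correct.sum() / len(T)            # (a / k) * 100%
    mse = np.mean((T - O) ** 2)                          # mean squared error
    r2 = 1.0 - np.sum((T - O) ** 2) / np.sum((T - T.mean()) ** 2)
    return accuracy, mse, r2

T = np.eye(4)            # four experiments, one per origin code
O = np.eye(4)            # a perfect network output
acc, mse, r2 = evaluate(T, O)
```

A perfect output reproduces the targets exactly, giving 100% accuracy, zero MSE and R² = 1.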
In this study, backpropagation and resilient propagation were each applied with two hidden layers and with one hidden layer. For one hidden layer, the number of neurons was determined using the formula proposed by Shibata and Ikeda in 2009 [30], N_h = √(N_i · N_o), where N_h, N_i and N_o represent the numbers of hidden, input and output neurons, respectively. For both backpropagation and resilient propagation, the total number of neurons in two hidden layers was kept below the number used in the single hidden layer. With N_i = 47 and N_o = 4, the Shibata and Ikeda [30] formula gives approximately 13.71 neurons, which in this study was rounded up to 15 neurons. Experiments were then conducted to evaluate whether one hidden layer with 15 neurons leads to better identification accuracy than two hidden layers with fewer neurons. For two hidden layers, the experiments used the following numbers of neurons in the consecutive layers: 3-5 (8 in total), 4-6 (10), 5-7 (12) and 6-8 (14). The total number of neurons in the hidden layers thus never exceeded 15.
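The Shibata and Ikeda estimate for this study's layer sizes works out as follows; the intermediate value is computed here rather than taken from the (garbled) extracted text:

```python
import math

# Shibata & Ikeda (2009): N_h = sqrt(N_i * N_o)
N_i, N_o = 47, 4                # metabolites in, clove buds origins out
N_h = math.sqrt(N_i * N_o)      # sqrt(188), roughly 13.7
```

The paper rounds this estimate up to 15 hidden neurons for the single-hidden-layer architectures.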

Backpropagation (B-Prop) with two hidden layers
In this section, the backpropagation learning algorithm with two hidden layers was used. The number of neurons in the hidden layers varied, with no more than 15 neurons in total. There were four variations of the network architecture: 47-3-5-4, 47-4-6-4, 47-5-7-4 and 47-6-8-4. The input layer consists of 47 neurons, matching the number of metabolites, and the output layer consists of 4 neurons, matching the number of clove buds origins. Table 1 shows that the network architecture 47-3-5-4 gives the highest identification accuracy and coefficient of determination on both the training and testing data sets; likewise, this architecture yields the smallest MSE on both data sets. From Table 1, increasing the number of neurons in the backpropagation network with two hidden layers decreases network performance. This is in line with Shafi et al. in 2006 [31], who stated that increasing the number of neurons in the hidden layer only heightens the complexity of the network without increasing the accuracy of the pattern recognition.

Backpropagation (B-Prop) with one hidden layer
The backpropagation learning algorithm with one hidden layer was implemented for comparison with the two-hidden-layer case. The results obtained are shown in Table 2.
Table 2 shows that the network architecture 47-15-4 identifies the clove buds origin effectively. The identification accuracy is 99.91% and 99.47% for the training and testing data sets, respectively. Moreover, the MSE value is also smaller than with two hidden layers.
For the backpropagation algorithm, the results show that one hidden layer is better than two. This is in line with Villiers and Barnard [32], who stated that a network architecture with one hidden layer is on average better than one with two hidden layers. They concluded that two-hidden-layer networks are more difficult to train and attributed this behaviour to the local minimum problem: networks with two hidden layers are more prone to local minima during training.

Resilient propagation (R-Prop) with two hidden layers
The resilient propagation learning algorithm contains several parameters, including upper and lower limits on the update values as well as the decrease and increase factors. In this study, the range of the update values is limited by the upper limit ∆_max = 50 and the lower limit ∆_min = 10^-6, with the decrease and increase factors η− = 0.5 and η+ = 1.2, respectively. The rationale for these values is given in [28]. As in Section 3.1, the resilient propagation learning algorithm is applied to the network architectures with two hidden layers.
The results in Table 3 show that the network architecture 47-5-7-4 gives the highest identification accuracy of the clove buds origin: 99.96% and 97.89% for the training and testing data sets, respectively.

Resilient propagation (R-Prop) with one hidden layer
In this section, the resilient propagation learning algorithm is implemented with one hidden layer. As in Section 3.2, the hidden layer has 15 neurons, giving the network architecture 47-15-4. Table 4 shows that this architecture identifies the origin of clove buds with an accuracy of 99.86% and 94.74% on the training and testing data sets, respectively.
The resilient propagation network architectures, with both two hidden layers and one hidden layer, provide identification results with very high accuracy, although the one-hidden-layer architecture is slightly lower. Tables 3 and 4 show that the two-hidden-layer resilient propagation network with fewer neurons performs better than the single hidden layer with more neurons. This is in line with Santra et al. [24], who established that the performance of two hidden layers with 8-10 (18) neurons is better than that of one hidden layer with 62 neurons.

The best identification accuracy and determination coefficient for each case are summarized in Figures 1, 2, 3 and 4, respectively. For each network architecture, the smallest MSE in the training and testing data sets is shown in Figures 5 and 6, respectively.
The identification of the origins of clove buds has thus been obtained. In the small data set category, backpropagation with one hidden layer provides accurate identification on both the training and testing data sets; comparably accurate identification of the clove buds origins is obtained with the resilient propagation algorithm using two hidden layers.
The neural network models obtained in this paper can serve as a scientific reference. For instance, they can be used in future studies to identify the origin of various plantation commodities with small metabolite data sets. At present, the most common way of determining the origin of a plantation commodity is qualitative, relying on the services of flavourists to evaluate flavour and taste, because each commodity has a specific flavour and taste depending on its region of origin. Furthermore, clove buds data sets from different origins have not been reported in the literature, so no direct comparison can be presented in this paper.

Conclusions
This paper demonstrated the potential of a neural network approach with backpropagation and resilient propagation learning algorithms to identify the clove buds origin based on metabolite composition. The work was divided into two parts. The first was the identification of the clove buds origin using the backpropagation learning algorithm, for which two network architectures were constructed, one with a single hidden layer and one with two. The results showed that one hidden layer identifies the clove buds origin accurately, specifically with 99.91% and 99.47% accuracy on the training and testing data sets, respectively. The second part was the identification of the clove buds origin using the resilient propagation learning algorithm, again with one- and two-hidden-layer architectures. The results showed that two hidden layers give accurate identification of the clove buds origin, with 99.96% and 97.89% accuracy on the training and testing data sets, respectively. From these results, it was concluded that for the identification of small metabolite data sets from a plantation commodity, the backpropagation algorithm with one hidden layer or the resilient propagation algorithm with two hidden layers should be used. This paper also confirmed the contribution of artificial neural networks to the pattern recognition of metabolite data sets obtained by the metabolic profiling technique.

Figure 1. Identification accuracy percentage of training data sets.

Figure 2. Identification accuracy percentage of testing data sets.

Figure 3. Determination coefficient of training data sets.

Figure 4. Determination coefficient of testing data sets.

Figure 5. MSE of training data sets.

Figure 6. MSE of testing data sets.

Table 1. Backpropagation with two hidden layers.

Table 2. Backpropagation with one hidden layer.

Table 3. Resilient propagation with two hidden layers.

Table 4. Resilient propagation with one hidden layer.