Abstract
India, with a population of 1.43 billion, is the most populous country in the world, necessitating more significant food production. To ensure this food production, the country’s farmers must focus on high-yielding and draught-tolerant varieties of the crops. However, just keeping an eye on this will not solve the purpose. Reliable crop yield estimation well before the sowing season is essential for planning and management in light of the changing climate. Knowing the expected yield of one’s standing crops is crucial to farmers and can be a complicated task in and of itself. In this regard, modern artificial intelligence algorithms have shown to be handy tools for accurately predicting agricultural production. With this view, in this present study and attempt was made to develop multi-crop yield prediction models using an Adaptive Neuro Fuzzy Interference System (ANFIS). The crop yield is significantly influenced by climatic variables such as rainfall, minimum and maximum temperatures, relative humidity, and evaporation. Therefore, these variables were selected as input parameters, and yield of five significant crops, Kharif rice, sorghum, maize, groundnut, and Sugarcane, were selected as the output for the model development. With acceptable accuracy, the developed models have functioned well. The association between climatic variables and the agricultural production of the crops under study was disclosed by the ANFIS, and the accuracy of the rule validated this relationship. It was found that the rules formed in ANFIS accurately predicted each crop’s crop yield and, therefore, emerged as a novel technique to predict seasonal multi-crop yield. It is also proposed that the ANFIS technique is best suited to predict the seasonal multi-crop yield of the semi-arid region of India.
Similar content being viewed by others
Introduction
India is the world’s most populated country, with over 1.43 billion population1. Nearabout 82% of its farmers are small and marginal, and around 70% of the country’s rural population still depends on agriculture for their livelihood. Total food grain production in India is estimated at a record 330.5 MT for 2022–23. This is higher by almost 15 MT compared to 2021–22. Total production during 2022–23 is estimated at a record high of 135.5 MT, 35.9 MT, 27.5 MT, and 49.42 MT for rice, maize, pulses, and Sugarcane, respectively2. It is also the world’s second-largest producer of fruit and vegetables, contributing 10.9% and 8.6% of the world’s output of these foods. Worldwide, India stands second in producing rice, wheat, sugarcane, cotton, and groundnuts3. However, food security is a major issue that plays an important role in feeding a such large population. The key factor in addressing the problems associated with food security is the effective management of agricultural output to meet basic nutritional needs. The state governments, decision-makers, and associated organizations are most significant in establishing new policies or specific alterations in the current policies to increase the farmers’ net production and revenue4,5. Therefore, for decision-making about issues related to food security, an accurate assessment of crop yields is crucial6. There are numerous approaches for predicting yield; the traditional approach of yield prediction is crop status estimation by professionals. Tillage operations, seed quality, seed maturity rate, plant mortality rate, irrigation, soil type, growing stages, maturity, flowering and fruit, pesticide use, climatic conditions, fertilizer choice, harvesting stage, and so on are all observed and measured during the crop growing season7,8,9. The yield can be predicted using regression algorithms or local expert knowledge based on the data gathered in this manner. Remote sensing, crop simulation models, and artificial intelligence are also employed to predict crop production10,11,12,13,14. The fundamental goal of yield prediction is to provide precise, scientifically sound, and independent predictions of agricultural yield as early as feasible throughout the crop’s growing season while considering climate and environment15,16,17.
In recent years, artificial intelligence techniques like machine learning (ML), deep learning (DL), and miscellaneous algorithms have emerged as compelling techniques for predicting crop yields because of their accuracy18,19,20. Artificial intelligence is nothing but the simulation of human intelligence processes by machines, especially computer systems. Several ML, DL, and inductive algorithms are available in the era of technology, but every algorithm has advantages and limitations21,22,23,24. Bosma et al.25 analysed agricultural yield models by combining agriculture and aquaculture, employing a decision tree-based model with fuzzy rules. The created model enabled a farmer to use many aquaculture systems simultaneously. Further, the fuzzy model simulation accurately predicted the frequency distribution of fish production systems; however, it did not perform well when identifying individual farmers. To improve simulation accuracy, added a third layer to the decision tree and replaced the combined variables with fuzzy rules26,27,28. Naderloo et al.29 evaluated ANFIS models for estimating wheat grain yield depending on energy inputs. Furthermore, artificial neural networks (ANNs) were developed, and their performance was compared to ANFIS models. The results showed that the ANFIS model predicts yields more precisely than the ANN model. Meena and Singh30 investigated agricultural yield forecasting using neural networks (NN) for pearl millet (bajra) in the Jaipur region of Rajasthan, India. Using time series data, the suggested method uses a backpropagation artificial neural network to forecast the crop’s present yield relative to environmental parameters. The acquired findings are encouraging and substantially better than a recent fuzzy time series-based technique for forecasting. Khoshnevisan et al.31 tested numerous adaptive neuro-fuzzy inference system (ANFIS) models for predicting tomato and cucumber yields based on energy input.
The results showed that the ANFIS model can forecast yield more precisely than the ANN. Wang et al.32 created a deep learning model to forecast soybean crop yields in Argentina and then used a transfer learning strategy to predict Brazil’s soybean harvests using a smaller quantity of data. Khaki and Wang33 developed a deep neural network model to predict corn yield over 2,247 sites from 2008 to 2016. Their model outperformed other methods, such as regression trees, shallow neural networks, and Lasso and Kim et al.34 developed a deep neural network model for crop yield prediction using optimal input variables from satellite products and meteorological datasets between 2006 and 2015. Maya Gopal and Bhargavi35 created a hybrid MLR-ANN technique for crop yield prediction. Kuradusenge et al.36 forecasted the yield of Irish potatoes and maize using three machine learning techniques: Random Forest, Polynomial Regression, and Support Vector Machine. The results indicate that Random Forest is the recommended early crop yield prediction model. Ahmed37 created a software framework for calculating maize production using a modified multi-layer perceptron. The Multi-Layer Perceptron model and the Spider Monkey Optimization method were used to predict yield.
The suggested MLP-SMO model’s prediction performance was examined using different evaluation metrics. The model outperformed in the prediction process with a Root-Mean-Square Error value of 0.11, which is the lowest among all strategies considered in the statistical analysis in their study. Jhajharia et al.38 estimated crop yield for seven crops farmed in Rajasthan, India, using a variety of machine learning approaches (RF, SVM, GD, LSTM, and LASSO regression). The RF approach outperformed the others, with 0.963 R2, 0.035 RMSE, and 0.0251 MAE. Borse and Agnihotri39 developed the multi-crop yield prediction models using the decision tree (m5tree) algorithm, and their analysis indicated that the yield prediction of Kharif rice, sorghum, maize, groundnut, and Sugarcane was mostly influenced by the average rainfall followed by average temperature, relative humidity, and evaporation. Esmaili et al.40 accessed the adaptive neuro-fuzzy inference system (ANFIS) to predict lettuce production and water productivity in response to different light intensities and CO2 concentrations. Their study found that ANFIS can be used as a reliable model for agricultural water management and crop performance. Further, it also predicted different growth parameters and stomatal characteristics with acceptable performance. Khosla et al.41 predicted the yield of Bajra, maize, rice, and ragi using the MANNs-SVR hybrid model. Kim et al.42 predicted the yields of Corn and soybean using MARS, SVM, RF, extra tree, ANN, and DNN techniques. Cedric et al.43 created three machine-learning models for yams, bananas, seed cotton, cassava, rice, maize, and cassava. The forecast was created nationally and is done throughout the year. Based on a statistical examination of the whole data, they discovered that their chosen crops are among the most consumed and farmed foods in West Africa.
It was found that numerous artificial intelligence techniques were used to model crop yield prediction. It was noticed that very few studies (Khosla and Kim, Cedric, Kuradusenge) have considered more than one crop in the process of developing the crop yield prediction models. Most of the studies are based on a single crop, but it is significant to predict the cop yield of multiple crops as farmers are not dependent on only one crop during the Kharif season. Despite considering more than one crop for yield prediction, those studies lack the selection of multi-climatic parameters to predict multi-crop yield. Furthermore, it was also noticed that while developing crop yield prediction models, most of the studies only consider one or two input climatic parameters (rainfall or temperature), but the fact that the crop yield is not only dependent on these parameters but also minimum and maximum temperature, evaporation, humidity etc. Also, the accuracy of the models already developed for implementation is a matter of concern when applicable in actual field conditions. It is also revealed that the ANFIS model can forecast yield more precisely than the ANN.
In this regard, an attempt has been made to develop the multi-crop yield prediction models by considering climatic parameters for five major Kharif crops, namely Rice, Sorghum, Maize, Groundnut, and one perennial cash crop, i.e., Sugarcane grown in the Nashik region using the Adaptive Neuro Fuzzy Interference System (ANFIS). The significance of this study is that it considers multiple input climatic parameters and more than a single crop while developing crop yield prediction models. Further, this study addresses the gap in development of multi-crop yield prediction models using multi climatic parameters. Doing so makes a meaningful contribution to the literature.
Materials and methods
Study area and data collection
The study mainly focuses on the Nashik region of Maharashtra (India). The Nashik region lies between 19° 35′ 6.37″ N to 20° 32′ 51″ N latitude and 73° 25′ 9.79″ E to 74° 22′ 1.15″ E longitudes in the Maharashtra state of India. The total geographical area of the Nashik region is about 6376.93 km2. The semi-arid region of the state of Maharashtra experiences a dry season that lasts from the beginning of October until the middle of June and a rainy season that lasts from June until the beginning of October. Summers in Nashik can get quite hot, with daytime temperatures ranging from 30 to 42 °C. The region experiences dry weather with occasional heat waves, especially in May and June. Nights are relatively cooler compared to the daytime. The temperature during the monsoon season ranges from 25 to 35 °C. Nashik receives moderate to heavy rainfall approximately 713.50 mm during the monsoon months, primarily from the southwest monsoon. The rain is beneficial for agriculture, as the region is known for its vineyards, and the temperature becomes more pleasant due to the rain. Winters in Nashik are mild and pleasant, with temperatures ranging between 10 and 28 °C. Humidity levels tend to be higher during the monsoon season and lower during the summer and winter. The geographical location of the Nashik region is shown in Fig. 1 created by the authors using QGIS software.
For the present study, station wise climatological data, including daily rainfall, daily minimum and maximum temperature, evaporation, and relative humidity, were obtained from WRDHP, Nashik (Maharashtra) for the seven study stations viz. Nashik, Trimbakeshwar, Dindori, Igatpuri, Sinnar, Niphad, and Peint from 1987 to 2020, (i.e., 33 years) for developing the individual crop yield models. Following that, block wise seasonal crop yield statistics for all 07 stations were obtained from the Agriculture Department, Government of Maharashtra, Pune, for the aforementioned period (i.e., from 1987 to 2020, 33 years). The datasets utilized for this investigation are included in Table 1 below, along with their descriptions.
Methodology
The primary goal of this study is to use the ANFIS technique to develop agricultural yield prediction models for each Kharif crop, including rice, sorghum, maize, groundnut, and one cash crop, Sugarcane, grown in the study area. The obtained climatological and crop yield data were statistically verified for outliers before analysis. At the level of 95% significance, outliers were examined. The current outliers were then eliminated. Further, the models’ performance was assessed using the simple train-test split approach. The dataset was specifically split into two parts: Seventy percent of the datasets were used in the model training phase, while the remaining thirty percent were held aside for further model testing. The approach is generally acknowledged and has been used in numerous studies44,45,46.
Input parameter selection and data synthesis
Although several factors affect the crop yield, in this present study an attempt was taken to develop individual crop yield models by considering the maximum effect of climatic parameters such as Average Rainfall (R), Minimum Average Temperature (Tmin), Maximum Average Temperature (Tmax), Average Evaporation (E), and Relative Humidity (Rh) on the crop growth and its yield. However, developing yield prediction models using novel AI techniques often requires large amount of training data to perform well47. In this regard, the number of datasets were increased of each input parameter, as well as observed yield by using normal and inverse normal distributions as the availability of the seasonal datasets are not enough to achieve the desired model accuracy. To increase the amount of data in data drive modelling generally normal and inverse normal distributions techniques can be used.
In this study, the distribution method was used for increasing data using normal and inverse normal distributions with equations. Suppose we have a normal distribution with mean μ and standard deviation σ. The probability density function (PDF) of the normal distribution is given by Eq. (1)
where x is a random variable, μ is the mean of the distribution, σ is the standard deviation of the distribution, e is the base of the natural logarithm, and π is the mathematical constant pi.
Similarly, samples were generated from an inverse normal distribution using the inverse Cumulative Distribution Function (CDF) method. The inverse normal distribution, also known as the Gaussian inverse, is the inverse of the standard normal distribution. The PDF of the inverse normal distribution is given by Eq. (2)
where x is a random variable, μ is the mean of the distribution, σ is the standard deviation of the distribution, e is the base of the natural logarithm, and π is the mathematical constant pi.
To generate samples from this distribution, the inverse CDF method was used. The CDF of the inverse normal distribution is given by Eq. (3)
where f(t) is the PDF of the inverse normal distribution, numerical methods were used to calculate the CDF, or we can use lookup tables that provide pre-calculated values of the CDF for a range of values of x. Once we have the CDF, we can use the inverse CDF method to generate samples from the distribution. The inverse CDF method generates a random value u between 0 and 1 and then finds the value x such that F(x) = u. The value x is the generated sample. After performing the normal distributions, the no. of datasets generated for each crop is represented in the results and discussion section table. We used the seasonal average of all the input parameters to achieve maximum accuracy in developing the ANFIS-based crop yield prediction models. A proposed methodology for developing crop yield models for the five major crops using the ANFIS is shown in Fig. 2.
ANFIS
The ANFIS fuzzy algorithm is built on the Takagi–Sugeno-Kang (TSK) fuzzy inference system. ANFIS is a powerful soft computing technology that builds on the foundations of two powerful computing techniques: artificial neural networks and neural networks. ANFIS uses linguistic information from fuzzy logic and an ANN’s learning ability to automatically generate fuzzy if–then rules and optimize parameters. This method is suitable for modeling non-linear systems by interpolating between multiple linear models. In Takagi–Sugeno method, the first step in fuzzy inference process is fuzzifying the inputs and then applying the fuzzy operator. The Takagi–Sugeno output membership functions are either linear or constant48. In this neural fuzzy method, the consequents of the Takagi–Sugeno (TS) fuzzy rules are linear combinations of their preconditions. In other words, ANFIS has a feed-forward neural network structure where each layer is a neuro-fuzzy system component49. Five significant and important adjustments were made in the structure of ANFIS network so that the best adjustment [needed] to reach the most effective ANFIS model could be found, and the best model with minimum errors could be selected. These settings include the number of membership functions, types of membership functions (triangular, trapezoidal, bell-shaped, gaussian and sigmoid), types of output membership function (constant or linear), optimization methods (hybrid or back propagation) and the number of epochs. In order to create FIS using ANFIS, fuzzy logic toolbox of MATLAB, version R2016a was used50.
Figure 3 shows an ANFIS input interference panel in MATLAB 2016a.
A step-by-step procedure was followed to predict the crop yield. Procured meteorological and crop yield data is subjected to stationarity and needs missing analysis to remove biases in the data. After this, the quality data was used to develop the model to predict the crop yield while considering the abovementioned climatic parameters, which significantly affect crop yield.
ANFIS has many advantages over standalone computing tools like the FUZZY system and artificial neural networks. Creating rules is critical in the FUZZY system; predictions depend on how the rules are created. Making all such rules will get more complete as the parameters are increased. ANFIS solves this problem with the assistance of ANN, which optimizes the parameters and creates the rules. The FUZZY inference system receives these rules. ANFIS incorporates fuzzy logic, which allows for human-readable rules and reasoning. The system can express relationships in the form of linguistic rules (e.g., “If temperature is high, then pressure is low”), making it easier for humans to interpret the results. On the other hand, ANNs are often seen as “black boxes” because they produce output without providing insights into how they arrived at the result, making interpretation more challenging. ANFIS is designed to model systems with nonlinear relationships and complex behaviours. However, ANN struggles with systems that require explicit rule-based reasoning and interpretation.
Statistical measures for obtaining results and performance evaluation of developed models
The model’s performance is evaluated using the metrics and dataset. The performance efficacy of the developed ANFIS-based models for estimating crop yields of five main crops under diverse climatic conditions is tested.
The following performance measures were used to assess the developed models.
Root mean square error (RMSE)
The model’s performance is evaluated using the metrics and dataset. The standard deviation of the residuals is referred to as the RMSE (prediction errors). The RMSE displays how distant the data points are from the regression line by measuring how evenly the residuals should be distributed. To put it another way, it represents how densely the data is clustered around the line of best fit. The ideal model has an RMSE of zero. RMSE was calculated by using the following Eq. (4).
where n is the sample size, \(\widehat{{y}_{i}}\) is the actual target value, \({y}_{i}\) is the predicted target value. Here the unit of RMSE is same as observed and predicted values.
Normalized root mean square error (nRMSE)
It normalizes the Root Mean Square Error (RMSE) by dividing it by the range or mean of the observed data, making the error relative to the scale of the data. This normalization allows for easier comparison across different datasets or models, regardless of their scales. The nRMSE is expressed as a percentage or a decimal, with lower values indicating better model accuracy and is given by Eq. (5).
where, \({\overline{y} }_{i}\) is the mean of observed values.
Nash sutcliffe efficiency (NSE)
It evaluates the performance of a model. It might vary from -infinity to one. The efficiency of one (E = 1) shows that the predicted discharge fits the observed data precisely. When the observed mean outperforms the model or the residual variance (given by the numerator in the formula above) is larger than the data variance, the efficiency is zero (E = 0). (Described by the denominator). The more precise the model, the closer the model efficiency is to one. NSE was calculated using the Eq. (6).
where \(Qobs=\) Observed value of crop yield, \(Qsim\)= simulated value of crop yield and \(Qmean\) = mean value of crop yield
Correlation coefficient (r)
Correlation coefficients measure Eq. (7) the strength of a relationship between two variables. A correlation equal to one is the best model.
where x is the observed value of crop yield, and y is the simulated value of crop yield.
Coefficient of determination (R)2
It measures how the developed model has a strong relationship between observed and predicted. A correlation between variables equal to one is the best model. Coefficient of determination was calculated using Eq. (8).
where x is the observed value of crop yield, and y is the simulated value of crop yield.
Results and discussions
Generation of synthetic data
To achieve the desire accuracy of ANFIS based developed models, the synthetic data plays a vital role while developing those models. We employed the inverse CDF method to generate that data or samples from the distribution. After the analysis, it was possible to generate the 213, 66, 52, 215, and 155 samples for Kharif Rice, Kharif Sorghum, Kharif Maize, Kharif Groundnut, and Sugarcane, respectively. The Table 2 shows the number of datasets generated for each input parameter of each crop after performing normal distribution. Here, it is seen that number of datasets generated using inverse CDF method varies for each crop. This is due to the availability of data from the primary sources. The crop yield dataset has been obtained from 1987 to 2020, however, there were several years in those the crop yield were not reported in respect of sorghum and maize.
The time series plots of seasonal average datasets generated after performing the standard distributions are presented in Figs. 4, 5, 6, 7, 8, 9 for each crop viz Kharif rice, sorghum, maize, groundnut, and Sugarcane, respectively.
Moreover, a similar technique (normal distribution) has been adopted to generate the actual yield datasets for each crop. The time series plots of actual seasonal average crop yield datasets generated after performing the normal distributions are presented in Fig. 7 for each crop viz Kharif rice, sorghum, maize, groundnut, and Sugarcane, respectively.
Model development
The parameters of the models are determined by the elements that influence crop growth and production. The current study criteria are purely climatic in nature. Many researchers have studied parameters other than climate, but the other parameters are difficult to obtain or require lengthy studies. This study shows the creation of a crop yield model based purely on meteorological characteristics such as seasonal average rainfall (mm), minimum average temperature (°C), maximum average temperature (°C), average evaporation (mm/day), and average relative humidity (%).
From the above Fig. 10, it can be seen that five inputs were selected. All five inputs were divided into trapezoidal membership functions. Further, different combinations of input parameters were tried to obtain the best results. The models have trained at 1000 epochs (trials) to obtain the best results with acceptable accuracy. Figures 11 and 12 depict the optimized results obtained during the model training and validation, respectively.
Table 3 below shows the ANFIS parameters used in developing the crop yield prediction models.
Yield prediction using ANFIS
For yield prediction of crops under consideration, five climatic parameters, viz. average rainfall, minimum average temperature, maximum average temperature, average evaporation and average relative humidity, which adversely affect crop yield, were selected as input to the model. All the developed models and their performance are discussed in the subsequent sections. For the development of each model, 70% of the data was used for the calibration, whereas 30% of the total dataset was used for the validation.
Kharif rice yield
For developing the rice yield prediction model, 173 generated data records were used to calibrate the model, while 40 generated data records were used for validation. The model was trained and tested up to 1000 epochs (trials) to achieve maximum efficiency. Figure 13 depicts the time-series graph of predicted yield. It is seen that the predicted yield data is nearly following the observed yield series with acceptable accuracy. Figure 14 shows the plot of observed vs predicted yield of Kharif rice during calibration and validation. It is seen that the data fits the linear line with a correlation of 0.75 and 0.86 during calibration and validation, respectively.
Kharif sorghum yield
Total of 53 generated data records have been used for the calibration of the model, while 13 were used for validation purposes. The model was trained and tested up to 1000 epochs (trials) to achieve maximum efficiency. Figure 15 depicts the time-series graph of observed and predicted yield. Further, Fig. 16 shows the plot of observed vs predicted yield of Kharif Sorghum during calibration and validation. It is seen that the data fits the linear line with a correlation of 0.82 and 0.74 during calibration and validation, respectively.
Kharif maize yield
Out of 52 generated records, 42 data records have been used for the calibration of the model, while 10 generated data records were used for validation purposes. The model was trained and tested up to 1000 epochs (trials) to achieve maximum efficiency. Figure 17 depicts the time-series graph of observed and predicted yield. Further, Fig. 18 shows the plot of observed vs predicted yield of Kharif maize during calibration and validation. It is seen that the data fits the linear line with a correlation of 0.76 and 0.77 during calibration and validation, respectively.
Kharif groundnut yield
The groundnut yield prediction model was developed using a total of 215 records. A total 151 generated data records were used to calibrate the model, while 64 generated data records were used for validation. The model was trained and tested up to 1000 epochs (trials) to achieve maximum efficiency. Figure 19 depicts the time-series graph of observed and predicted yield. Further, Fig. 20 shows the plot of observed vs predicted yield of Kharif groundnut during calibration and validation. It is seen that the data fits the linear line with a correlation of 0.75 and 0.71 during calibration and validation, respectively.
Sugarcane yield
The sugarcane yield prediction model was created using 155 generated records. Out of that total, 124 generated data records were used to calibrate the model, while 31 were used for validation. The model was trained and tested up to 1000 epochs (trials) to achieve maximum efficiency. Figure 21 depicts the time-series graph of observed and predicted yield. Further, Fig. 22 shows the plot of observed vs predicted yield of Sugarcane during calibration and validation. It is seen that the data fits the linear line with a correlation of 0.94 and 0.91 during calibration and validation, respectively.
Performance analysis of developed models
Table 4 depicts the performance metrics of each developed model during the calibration. From the table, it is seen that among all the developed models, the sugarcane model has the highest accuracy in terms of correlation achieved between observed and predicted yield with RMSE, nRMSE, NSE, r, and R2 of 0.009,10.56, 0.75, 0.90, and 0.94 respectively. However, the rice model accuracy is lowest with RMSE, nRMSE, NSE, r, and R2 of 0.273, 16.60, 0.82, 0.81, and 0.78 respectively during the calibration of each model.
Table 5 depicts the performance metrics of each developed model during the validation. From the table, it is seen that among all the developed models, the sugarcane model accuracy is the highest with RMSE, nRMSE, NSE, r, and r2 of 0.013, 5.76, 0.88, 0.91, and 0.91 respectively, whereas the rice model accuracy is lowest with RMSE, nRMSE, NSE, r, and r2 of 0.163, 9.23, 0.76, 0.86, and 0.90 respectively during the validation of each model.
Discussion
The data requirement for the development of crop yield prediction models is crucial. To overcome this, the current study demonstrated the practice of data synthesis using available data. In this study we synthesized the climatic parameters and crop yield data using the inverse CDF method. However, previous studies5,12 based on secondary crop yield data misses the use of synthetic data.
The predictive ability of ANFIS was accessed using RMSE, nRMSE, NSE, r and R2 metrics. Considering the RMSE and, nRMSE values during training phase, the sugarcane (0.009 kg/ha, 10.56%) crop yield model headed the top position, followed by maize (0.264 kg/ha, 15.16%), groundnut (0.330 kg/ha, 18.76%), maize (0.264 kg/ha, 15.16%) and sorghum (0.374 kg/ha, 20.50%)36,37,38. Considering NSE, rice (0.82) yield model heads top, followed by groundnut (0.81), sugarcane (0.75), sorghum (0.71), and maize (0.60). Further, considering R2 here, the sugarcane (0.94) yield model remains at the top, followed by sorghum (0.84), maize (0.82), groundnut (0.79), and rice (0.78). Based on overall performance, ANFIS predicts sugarcane’s crop yield outstandingly with acceptable accuracy. In every metric, the sorghum crop yield model performance is low due to the lower water requirement of that particular crop and the low dependency on climatic parameters39,40,41. Further, Naderloo et al.29 and Meena and Singh30 have applied the ANN and ANFIS to predict the wheat, tomato, and cucumber crop yields, respectively, based on the different energy inputs. The results showed that ANFIS outperformed ANN in terms of accuracy. Their findings are closely aligned with the current study’s results. However, these studies have considered one crop or single input to predict the crop yield but the current study demonstrates the practical application of ANFIS for multi-crop yield prediction using multi-input parameters with an acceptable accuracy.
Conclusions
The current study mainly focuses on applying the Adaptive Neuro Fuzzy Interference System (ANFIS) to develop the yield prediction models for five major and essential crops cultivated in the Nashik region. The study illustrated the potential application of this technique for deriving pertinent knowledge from secondary data on climatic parameters already in existence. The ANFIS demonstrates the relationship between the yield of the crops under examination and the climatic factors, as well as the influence of these variables on the crop yield. Further Zhang et al.52,53 conducted study on winter wheat yield prediction of Henan province located in central China using advance satellite data and machine learning algorithms such as such as least absolute shrinkage and selection operator (LASSO), support vector regression (SVR), and random forest (RF). There results shows that the RF model demonstrated excellent performance in estimating winter wheat yield due to its ability to handle high-dimensional data features. Although the study uses robust techniques to estimate the winter wheat yield still their model’s performance ( R2 values 0.70, 0.63, and 0.68) is somehow low in compare to present study (R2 values 0.78, 0.84, 0.82, 0.79, 0.94). This show that the overall accuracy of predicting crop yield is using ANFIS is highest, and the rules formed in ANFIS accurately predicted each crop’s yield and emerged as a novel technique to predict seasonal multi-crop yield. Compared to the models developed for multi-crop yield prediction in earlier studies6,12,13,21,22,23, the current study has only considered the climatic variables to predict the yields of five major crops. But many other things, including the soil type, soil texture, amount of fertilizer used, irrigation, the amount of water needed, etc., also affect the yield of some crops. Prediction accuracy can be improved if those factors are considered. Even if there is room to increase prediction accuracy, our models produced satisfactory results when the climate factors are merely considered. This shows that the ANFIS have practical applicability in predicting the seasonal crop yield based on the climatic parameters. However, because the models were developed using climatic parameters, only values of climatic parameters, such as rainfall, minimum and maximum temperatures, relative humidity, and evaporation, must be considered when determining whether the model would produce the desired results.
Limitations and future scope
The current study’s findings offer more valuable insights into the practical applications of ANFIS from ult-crop yield prediction, while there are several limitations constrain for their direct applicability. Firstly, as the number of input variables increases, the fuzzy rule base grows exponentially, making the system complex and resulting in very slow data training. Secondly, ANFIS may overfit to the training data if data is not properly validated. Further, the simulation period of developed models is also a matter of concern. ANFIS’s learning ability to generate fuzzy if–then rules and optimize parameters automatically reduces the simulation time and gives the accurate output. Thirdly, the model developed using ANFIS has only considered the climatic variables to predict the yields of five essential crops. But many other things, including the soil type, soil texture, amount of fertilizer used, irrigation, water requirement of the crop, etc., also affect the yield of crops. Prediction accuracy can be improved if those factors are considered. Fourthly, current study is limed to the semi-arid region of state of Maharashtra, however to upscale it or replicate it for another agroclimatic zones the developed models requires more reliable data to predict the yield accurately. In near future, a similar study can be extended at district/state/national level for the prediction of multi-crop yield well before the starting of sowing which ensures to release of better agro advisory to the farmers to take their choice of crops and line departments to make availability of planting material, planning of irrigation water, supply chain, marketing of produce etc.
Data availability
The data and codes can be made available from the correspond author to the readers upon reasonable request.
References
World Bank. https://databank.worldbank.org/Databases/source/population-estimates-and-projectionsPopulation Projection Report (2023).
Department of Agriculture & Farmers Welfare, G. Final Estimates of Production of Major Crops released for the Year 2022–23. (2023).
FAO. http//www.fao.org/india/fao-in-india/india-at-a-glance/en (2021).
Pawlak, K. & Kołodziejczak, M. The role of agriculture in ensuring food security in developing countries: Considerations in the context of the problem of sustainable food production. Sustainability 12, 5488. https://doi.org/10.3390/su12135488 (2020).
Albahri, G. et al. Enhancing essential grains yield for sustainable food security and bio-safe agriculture through latest innovative approaches. Agronomy https://doi.org/10.3390/agronomy13071709 (2023).
Wang, K. & Zhang, Q. Intelligent systems and decision making for risk analysis and crisis response, communications in cybernetics, systems science and engineering & proceedings. In Crop yield risk assessment 709–716 (CRC Press, 2013).
Morugán-Coronado, A., Linares, C., Gómez-López, M., Faz, Á. & Zornoza, R. The impact of intercropping, tillage and fertilizer type on soil and crop yield in fruit orchards under Mediterranean conditions: A meta-analysis of field studies. Agric. Syst. 178, 102736. https://doi.org/10.1016/j.agsy.2019.102736 (2020).
Mahesh, P. & Soundrapandiyan, R. Yield prediction for crops by gradient-based algorithms. PLoS ONE https://doi.org/10.1371/journal.pone.0291928 (2024).
Feng, Y. et al. Effects of fallow tillage on winter wheat yield and predictions under different precipitation types. PeerJ https://doi.org/10.7717/peerj.12602 (2021).
Van Klompenburg, T., Kassahun, A. & Catal, C. Crop yield prediction using machine learning: A systematic literature review. Comput. Electron. Agric. 177, 105709. https://doi.org/10.1016/j.compag.2020.105709 (2020).
Paudel, D. et al. Machine learning for large-scale crop yield forecasting. Agric. Syst. https://doi.org/10.1016/j.agsy.2020.103016 (2020).
Joshi, A., Pradhan, B., Gite, S. & Chakraborty, S. Remote-sensing data and deep-learning techniques in crop mapping and yield prediction: A systematic review. Remote. Sens. 15, 2014. https://doi.org/10.3390/rs15082014 (2023).
Jiang, D., Yang, X., Clinton, N. & Wang, N. An artificial neural network model for estimating crop yields using remotely sensed information. Int. J. Remote Sens. 25, 1723–1732. https://doi.org/10.1080/0143116031000150068 (2004).
Elavarasan, D., Member, I. & Durairaj, P. Crop yield prediction using deep reinforcement learning model for sustainable agrarian applications. IEEE Access 8, 86886–86901. https://doi.org/10.1109/ACCESS.2020.2992480 (2020).
Mohapatra, S., Sahoo, D., Sahoo, A., Sharp, B. & Wen, L. Heterogeneous climate effect on crop yield and associated risks to water security in India. Int. J. Water Resour. Dev. 40, 345–378. https://doi.org/10.1080/07900627.2023.2244086 (2023).
Dhanya, P., Ramachandran, A., Jagannathan, R. & Geethalakshmi, V. Sowing window adjustments in rice (Oryza sativa), groundnut (Arachis hypogaea) and sugarcane (Saccharum species) as an adaptation strategy to changing climate scenario: A simulation study in Tamil Nadu. Indian J. Agron. 68(3), 260–265. https://doi.org/10.59797/ija.v68i3.2804 (2023).
Kumar, T., Barbosa, H., Madhu, S. & Rao, K. Studies on crop yields and their extreme value analysis over India. Sustainability https://doi.org/10.3390/su11174657 (2019).
Sedighkia, M. & Datta, B. Suitability assessment of cage fish farming location in reservoirs through neural networks-based remote sensing analysis. Remote. Sens. 16, 236. https://doi.org/10.3390/rs16020236 (2024).
Mehdizadeh, S., Mohammadi, B., Pham, Q. & Duan, Z. Development of boosted machine learning models for estimating daily reference evapotranspiration and comparison with empirical approaches. Water https://doi.org/10.3390/w13243489 (2021).
Gopi, A. et al. Weather impact on solar farm performance: A comparative analysis of machine learning techniques. Sustainability https://doi.org/10.3390/su15010439 (2022).
Liu, Y. et al. Improved potato AGB estimates based on UAV RGB and hyperspectral images. Comput. Electron. Agric. 214, 108260 (2023).
Liu, Y. et al. Crop canopy volume weighted by color parameters from UAV-based RGB imagery to estimate above-ground biomass of potatoes. Comput. Electr. Agricult. 227, 109678 (2024).
Zhang, H. et al. Predicting stomatal conductance of chili peppers using TPE-optimized LightGBM and SHAP feature analysis based on UAVs’ hyperspectral, thermal infrared imagery, and meteorological data. Comput. Electr. Agricult. 231, 110036 (2025).
Liu, Y. et al. Utilizing UAV-based hyperspectral remote sensing combined with various agronomic traits to monitor potato growth and estimate yield. Comput. Electr. Agricult. 231, 109984 (2025).
Bosma, R. H. et al. Assessing and modelling farmers’ decision-making on integrating aquaculture into agriculture in the Mekong Delta. NJAS - Wageningen J. Life Sci. 53, 281–300 (2006).
Reddy, D., & Kumar, M. Crop yield prediction using machine learning algorithm. in 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), 1466–1470. https://doi.org/10.1109/ICICCS51141.2021.9432236. (2021).
Bosma, R. et al. Assessing and modelling farmers’ decision-making on integrating aquaculture into agriculture in the Mekong Delta. NJAS: Wageningen J. Life Sci. 53, 281–300. https://doi.org/10.1016/S1573-5214(06)80010-2 (2006).
Shastry, A., Sanjay, H. & Hegde, M. A parameter based ANFIS model for crop yield prediction. IEEE Int. Adv. Comput. Conf. (IACC) 2015, 253–257. https://doi.org/10.1109/IADCC.2015.7154708 (2015).
Naderloo, L. et al. Application of ANFIS to predict crop yield based on different energy inputs. Measurement 45, 1406–1413 (2012).
Meena, M. & Singh, P. K. Crop yield forecasting using neural networks. In Swarm, Evolutionary, and Memetic Computing (eds Ketan, P. B., Suganthan, P. N. et al.) 319–331 (Springer, 2013).
Khoshnevisan, B., Rafiee, S., Omid, M., Mousazadeh, H. & Clark, S. Environmental impact assessment of tomato and cucumber cultivation in greenhouses using life cycle assessment and adaptive neuro-fuzzy inference system. J. Clean Prod. 73, 183–192 (2014).
Wang, A. X., Tran, C., Desai, N., Lobell, D. & Ermon, S. Deep transfer learning for crop yield prediction with remote sensing data. in Proceedings of the 1st ACM SIGCAS Conference on Computing and Sustainable Societies, COMPASS 2018 (Association for Computing Machinery, Inc, 2018). https://doi.org/10.1145/3209811.3212707
Khaki, S. & Wang, L. Crop yield prediction using deep neural networks. Front. Plant Sci. 10, 621 (2019).
Kim, N. et al. A comparison between major artificial intelligence models for crop yield prediction Case study of the midwestern United States 2006–2015. ISPRS Int. J. Geo.-Inform. https://doi.org/10.3390/ijgi8050240 (2019).
Maya Gopal, P. S. & Bhargavi, R. Performance evaluation of best feature subsets for crop yield prediction using machine learning algorithms. Appl. Artif. Intell. 33, 621–642 (2019).
Kuradusenge, M. et al. Crop yield prediction using machine learning models case of Irish Potato and Maize. Agriculture (Switzerland) https://doi.org/10.3390/agriculture13010225 (2023).
Ahmed, S. A software framework for predicting the maize yield using modified multi-layer perceptron. Sustainability (Switzerland) https://doi.org/10.3390/su15043017 (2023).
Jhajharia, K., Mathur, P., Jain, S. & Nijhawan, S. Crop yield prediction using machine learning and deep learning techniques. Procedia Comput Sci 218, 406–417 (2023).
Borse, K. S. & Agnihotri, P. Application of decision tree (M5Tree) algorithm for multicrop yield prediction of the semi-arid region of Maharashtra, India. J. Soft Comput. Civil Eng. https://doi.org/10.22115/scce.2024.383387.1601 (2024).
Esmaili, M. et al. Assessment of adaptive neuro-fuzzy inference system (ANFIS) to predict production and water productivity of lettuce in response to different light intensities and CO2 concentrations. Agric. Water Manag. 258, 107201 (2021).
Khosla, E., Dharavath, R. & Priya, R. Crop yield prediction using aggregated rainfall-based modular artificial neural networks and support vector regression. Environ. Dev. Sustain. 22, 5687–5708 (2020).
Kim, N. et al. An artificial intelligence approach to prediction of corn yields under extreme weather conditions using satellite and meteorological data. Appl. Sci. (Switzerland) https://doi.org/10.3390/app10113785 (2020).
Cedric, L. S. et al. Crops yield prediction based on machine learning models Case of West African countries. Smart Agricult. Technol. 2, 100049 (2022).
Satpathi, A. et al. Evaluating statistical and machine learning techniques for sugarcane yield forecasting in the Tarai region of North India. Comput. Electr. Agricult. 229, 109667. https://doi.org/10.1016/j.compag.2024.109667 (2025).
Satpathi, A. et al. Comparative analysis of statistical and machine learning techniques for rice yield forecasting for Chhattisgarh, India. Sustainability 15, 2786. https://doi.org/10.3390/su15032786 (2023).
Setiya, P., Nain, A. S. & Satpathi, A. Comparative analysis of SMLR, ANN, Elastic net and LASSO based models for rice crop yield prediction in Uttarakhand. Mausam 75(1), 191–196. https://doi.org/10.54302/mausam.v75i1.3576 (2024).
Durrant, A. et al. How might technology rise to the challenge of data sharing in agri-food?. Glob Food Sec 28, 100493 (2021).
Buragohain, M. & Mahanta, C. A novel approach for ANFIS modelling based on full factorial design. Appl. Soft Comput. 8(1), 609–625. https://doi.org/10.1016/j.asoc.2007.03.010 (2008).
Fahimifard, S. M., Salarpour, M., Sabouhi, M. & Shirzady, S. Application of ANFIS to agricultural economic variables forecasting case study: poultry retail price. J. Artif. Intell. 2(2), 65–72 (2009).
Matlab. Fuzzy, Anfis toolbox, The Mathworks Inc., Natick, MA (2016).
Muruganantham, P., Wibowo, S., Grandhi, S., Samrat, N. & Islam, N. A systematic literature review on crop yield prediction with deep learning and remote sensing. Remote. Sens. 14, 1990. https://doi.org/10.3390/rs14091990 (2022).
Zhang, H. et al. Winter wheat yield prediction using integrated Landsat 8 and Sentinel-2 vegetation index time-series data and machine learning algorithms. Comput. Electron. Agric. 213, 108250 (2023).
Zhang, H., Zhang, Y., Tong, F. & Li, M. A novel winter wheat yield prediction framework using fused spatial–temporal-spectral (STS) information from Sentinel-2 and Landsat 8 via improved Pix2Pix network. Comput. Electron. Agric. 231, 109982 (2025).
Acknowledgements
The authors wish to thank the State Water Data Centre (SWDC), Hydrology Project, Government of Maharashtra, Nashik, and the Chief Statistical Officer, Department of Agriculture, Government of Maharashtra, Pune for supplying meteorological and agricultural yield data respectively. The authors would also like to thank the Department of Civil Engineering, SVNIT, Surat for providing the necessary facilities and infrastructure to carry out this work.
Funding
This research has not received not any funding from external sources.
Author information
Authors and Affiliations
Contributions
Kalpesh Borse: Formulation, Data collection, Formal Investigation, Methodology, Writing Codes, Development of models, Validation and Testing, and writing of original draft; P.G. Agnihotri: Writing & editing original draft, Muralitharan Jothimani: Writing & editing original draft, Review and Editing the draft. Roomesh Kumar Jena: Writing & editing original draft, Review and Editing the draft.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Borse, K., Agnihotri, P.G., Jothimani, M. et al. Adaptive neuro fuzzy inference system based multicrop yield prediction in the semi arid region of India. Sci Rep 15, 18787 (2025). https://doi.org/10.1038/s41598-025-03334-8
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-025-03334-8