Correction
28 Mar 2025: The PLOS One Staff (2025) Correction: Detection and recognition of foreign objects in Pu-erh Sun-dried green tea using an improved YOLOv8 based on deep learning. PLOS ONE 20(3): e0321409. https://doi.org/10.1371/journal.pone.0321409
Abstract
The quality and safety of tea production are of paramount importance. In traditional processing, small foreign objects risk being mixed into Pu-erh sun-dried green tea, directly affecting food quality and safety. To rapidly detect and accurately identify these small foreign objects, this study proposes an improved YOLOv8 network model for foreign object detection. The method employs an MPDIoU optimized loss function to enhance target detection performance, increasing the model’s localization precision. It incorporates the EfficientDet high-efficiency object detection network architecture module, which utilizes compound-scaled anchor boxes and an adaptive feature pyramid to achieve efficient detection of targets of various sizes. The BiFormer bidirectional attention mechanism is introduced, allowing the model to consider both forward and backward dependencies in sequence data and significantly enhancing its understanding of target context in images. Slicing-aided hyper inference is further integrated with YOLOv8, subdividing the image and conducting in-depth analysis of local features, which significantly improves the model’s recognition accuracy and robustness for small targets and multi-scale objects. Experimental results demonstrate that, compared to the original YOLOv8 model, the improved model increases Precision by 4.50%, Recall by 5.30%, mAP by 3.63%, and F1 score by 4.9%. Compared with the YOLOv7, YOLOv5, Faster-RCNN, and SSD network models, its accuracy improves by 3.92%, 7.26%, 14.03%, and 11.30%, respectively. This research provides new technological means for the intelligent transformation of automated color sorters, foreign object detection equipment, and intelligent sorting systems in the high-quality production of Yunnan Pu-erh sun-dried green tea, and strong technical support for the automation and intelligent development of the tea industry.
Citation: Wang H, Guo X, Zhang S, Li G, Zhao Q, Wang Z (2025) Detection and recognition of foreign objects in Pu-erh Sun-dried green tea using an improved YOLOv8 based on deep learning. PLoS ONE 20(1): e0312112. https://doi.org/10.1371/journal.pone.0312112
Editor: Worradorn Phairuang, Chiang Mai University, THAILAND
Received: July 21, 2024; Accepted: September 27, 2024; Published: January 8, 2025
Copyright: © 2025 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data presented in this study are available on request from the corresponding author.
Funding: This research was supported by the Scientific Research Fund Project of the Department of Education of Yunnan Province (No. 2022Y282) and the "Intelligent Storage of Puer Tea based on LoRa Technology" project (No. 2022Y231). The research funder (Gongming Li) had no role in the design, data collection and analysis, decision to publish, or preparation of the manuscript. However, they did make the page charge payment after the publication and participated in the project's investigation, methodology, and provision of resources. Our research is entirely based on scientific principles and independent academic judgment.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Yunnan’s favorable geography and climate have given birth to Pu-erh tea, a jewel in China’s and the world’s tea culture [1,2]. As the cornerstone of Pu-erh tea’s distinctive flavor, the production process of sun-dried green tea is vital for quality control [3]. However, the modern industrial production of Pu-erh sun-dried green tea still relies primarily on traditional manual trash removal [4]. Despite the precision advantages of manual sorting, it is labor-intensive, inefficient, and prone to subjective biases, making it ill-suited to the high efficiency and standardization demands of modern mass production [5]. Currently, the tea industry’s mechanization and automation remain at an early stage and lack efficient automated detection and sorting technologies; this gap has become a key bottleneck for technological progress and quality assurance in the Pu-erh sun-dried green tea industry [6].
With rising food safety standards and increasing consumer demand for high-quality tea, the application of automated and intelligent foreign object detection technology in the production of sun-dried green tea has become particularly urgent [7,8]. Existing detection technologies are significantly inadequate for small object detection, where small foreign objects in sun-dried green tea, due to their small size and inconspicuous features, coupled with the complexity of the production environment, pose challenges to the accuracy and robustness of foreign object detection [9,10]. Currently, there is no specialized recognition model for the detection of small foreign objects in sun-dried green tea, and this research gap urgently needs to be filled [11].
The rapid development of deep learning technology has brought new opportunities to the field of agricultural product quality detection [12]. The YOLO (You Only Look Once) series of algorithms have been widely applied in defect detection [13], quality grading [14], safety detection [15], and foreign object recognition [16] of agricultural products. Fan et al. [17] developed a deep learning architecture based on convolutional neural networks for online detection of apple defects. Deng et al. [18] proposed an automatic grading system for carrots based on computer vision and deep learning, using a lightweight deep learning model for surface quality detection of carrots. Wu et al. [19] improved the YOLOv7 network, achieving an average detection accuracy of 91.12% for tea buds through RGB-D multimodal feature fusion. Wang et al. [20] proposed an improved TBC-YOLOv7 algorithm for grading detection of tea buds, enhancing the connection of global feature information by integrating a context-based Transformer module, with an average accuracy of 90%, an overall average accuracy of 87.5%, and a 3.4% improvement over the original YOLOv7. Li et al. [21] proposed a lightweight improved YOLOv5s model for detecting dragon fruit in daytime and nighttime lighting environments. The model achieved an average accuracy rate of 97.80% and a frame rate of 139 FPS in a GPU environment, with a model size of only 2.5 MB, and was successfully deployed on Android devices for real-time dragon fruit detection. Bello et al. [22] studied a drone vision system based on Mask YOLOv7 for automatic detection and counting of cattle. The system achieved detection accuracies of 93% and 95% in controlled and uncontrolled environments, respectively, demonstrating its potential for automatic monitoring and reporting in animal husbandry. Meng et al. [23] proposed a model based on STCNN (Spatio-Temporal Convolutional Neural Networks) for unmanned pineapple harvesting. The model used the Shifted Window Transformer to fuse regional convolutional neural networks, achieving high-precision detection of pineapples; in complex field environments, it achieved a detection accuracy of 92.54% and an average inference time of 0.163 seconds. Tahir et al. [24] proposed a model that integrates the Swin Transformer and CBAM (Convolutional Block Attention Module) modules, as well as Soft-NMS (Non-Maximum Suppression) technology, to improve the detection performance of small targets and occluded objects; on the VisDrone2019 dataset, the average detection accuracy increased by 4.8% compared to YOLOv8s. The aforementioned research results not only demonstrate the application potential of the YOLO series algorithms in agricultural product quality detection, but also show that the series can be continuously improved, with significant advantages in detection accuracy, efficiency, and automation [25,26]. Therefore, the YOLO series algorithms have tremendous potential and value in achieving automation and intelligence in agricultural product processing through the detection and identification of small foreign objects in Pu-erh raw green tea.
This study aims to develop an improved YOLOv8-based model for detecting and identifying small foreign objects in Pu-erh sun-dried green tea. It incorporates the MPDIoU loss function to enhance detection accuracy and targeting precision [27], integrates the EfficientDet module for efficient detection of targets of various sizes [28], and introduces the BiFormer mechanism for enhanced context understanding in images [29]. By merging slicing-aided hyper inference with YOLOv8, the model achieves greater accuracy and robustness in recognizing small and multi-scale objects [30]. These advancements offer new technical solutions for the intelligent transformation of automated sorting machines, foreign object detectors, and smart sorting systems in the high-quality production of Yunnan Pu-erh tea, strongly supporting the industry’s move towards automation and intelligent development.
Materials and methods
Sample preparation and foreign object analysis
The processing workflow of Pu-erh sun-dried green tea, depicted in Fig 1A, follows distinctive standards and involves considerable variability. To ensure representativeness and scientific rigor, this study employed random sampling to capture the diversity of this traditional craft. Between March and April 2024, 75 Pu-erh sun-dried green tea samples were randomly collected from the Tea Horse Ancient Town tea trading hub in Simao District, Pu-erh City, Yunnan Province (22.47° N latitude, 100.58° E longitude), using the five-point sampling method [31] as shown in Fig 1B. All collected samples were mixed and homogenized to form the Pu-erh sun-dried green tea specimens required for this study’s image acquisition. During the processing of Pu-erh sun-dried green tea, various foreign objects may be encountered. As shown in Fig 1C, this study identified four common types of foreign objects: Grain, Melon seed shells, Small twigs, and Bamboo chips. These objects differ significantly in material, size, and color, and their presence affects the quality and safety of the tea.
To simulate foreign matter potentially encountered in actual processing, this study mixed Pu-erh sun-dried green tea with the four common contaminants. In accordance with stringent quality requirements, the blended samples were evenly divided into 200 groups, each representing a possible foreign matter scenario in the processing of Pu-erh sun-dried green tea. These samples were used for subsequent image collection and photography, providing a rich foundation of image data for this research.
Image acquisition
In this study, to ensure high-quality capture of images of Pu-erh sun-dried green tea mixed with foreign matter, a 20-megapixel MV-CE200-11UC economical area-scan camera with a Sony IMX183 CMOS sensor was used. This industrial camera, with a maximum frame rate of 19.2 fps, ensures the smoothness and real-time nature of the image acquisition process, meeting the study’s need for high-speed imaging. To further enhance image clarity and accuracy, a 35 mm focal length MVL-LF3528M-F industrial lens was fitted. With a low optical distortion rate of 0.40% and an F-mount interface, this lens provides a robust guarantee of image quality; the low distortion rate minimizes edge distortion, ensuring authenticity and precision in the images.
Image preprocessing and dataset division
This study collected a total of 938 images of Pu-erh sun-dried green tea mixed with Grain, Melon seed shells, Small twigs, and Bamboo chips. To enhance the dataset’s quality and representativeness, 856 images were rigorously selected as the original foundational dataset. To address the potential negative impact of sample imbalance on the model’s generalization, data augmentation techniques as shown in Fig 2 [32] were employed, including brightness adjustment [33], contrast enhancement [34], and flipping [35], to expand the dataset of Pu-erh sun-dried green tea with foreign objects. Ultimately, a dataset of 5992 images was obtained, effectively preventing overfitting during training and significantly enhancing the model’s generalization, accuracy, and reliability in practical applications.
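To make the augmentation step concrete, the following minimal Python sketch (assuming the torchvision library; the parameter ranges and file names are illustrative, not the exact settings used in this study) applies brightness adjustment, contrast enhancement, and flipping to one image. Note that for object detection data, geometric transforms such as flips must also be applied to the bounding-box annotations.

```python
import torchvision.transforms as T
from PIL import Image

# Illustrative augmentation pipeline: brightness adjustment, contrast
# enhancement, and horizontal/vertical flips. Parameter ranges are
# assumed for illustration only.
augment = T.Compose([
    T.ColorJitter(brightness=0.3),   # brightness adjustment
    T.ColorJitter(contrast=0.3),     # contrast enhancement
    T.RandomHorizontalFlip(p=0.5),   # horizontal flip
    T.RandomVerticalFlip(p=0.5),     # vertical flip
])

img = Image.open("tea_sample.jpg")   # hypothetical file name
for i in range(6):                   # expand one image into several variants
    augment(img).save(f"tea_sample_aug{i}.jpg")
```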
Labels were manually annotated using the Make Sense tool on the image dataset, with target bounding boxes centered on foreign objects, and label files created in txt and xml formats. As shown in Fig 3, the dataset labels include specific information such as the quantity of foreign object classifications, the dimensions and distribution of bounding boxes, and the position coordinates of each foreign object center relative to the entire image. Additionally, it covers the aspect ratio of the target foreign objects relative to the entire image dataset. Labels A, B, C, and D represent Grain, Melon seed shells, Small twigs, and Bamboo chips, respectively.
In this study, to ensure the accuracy and generalization of model evaluation, a 5-fold cross-validation method [36] as depicted in Fig 4 was utilized. Initially, the dataset was randomly divided into an 80% training set and a 20% test set. Subsequently, the training set was further divided into 5 parts, alternately serving as sub-training and sub-validation sets in a 4:1 ratio over 5 cycles. Ultimately, the model’s comprehensive performance metrics were derived from the average results of the 5 evaluations. This approach effectively prevents overfitting and ensures the stability of the assessment outcomes.
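A minimal sketch of this splitting scheme, assuming scikit-learn and placeholder file names, is shown below.

```python
from sklearn.model_selection import KFold, train_test_split

# Placeholder list standing in for the 5992 augmented image paths
image_paths = [f"img_{i:04d}.jpg" for i in range(5992)]

# Step 1: hold out 20% of the data as the test set
train_paths, test_paths = train_test_split(image_paths, test_size=0.2, random_state=0)

# Step 2: split the remaining 80% into 5 folds (4:1 sub-train/sub-validation)
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (tr_idx, va_idx) in enumerate(kf.split(train_paths)):
    sub_train = [train_paths[i] for i in tr_idx]  # 4 parts for training
    sub_val = [train_paths[i] for i in va_idx]    # 1 part for validation
    # train on sub_train, evaluate on sub_val;
    # final metrics are averaged over the 5 folds
```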
Improvements to the YOLOv8 network
YOLOv8 is a deep learning model designed for real-time object detection, inheriting the core strengths of the YOLO architecture, which enables fast and accurate detection and localization by processing the entire image in a single pass within a unified neural network framework [37]. YOLOv8 innovates by incorporating advanced feature extraction mechanisms and optimized training strategies, significantly enhancing the model’s performance in detection accuracy and speed [38]. This study improves upon the YOLOv8 framework in four ways: adopting the MPDIoU loss function to enhance detection performance and targeting precision; integrating the EfficientDet module for efficient detection of objects of various sizes through composite-scale anchor boxes and adaptive feature pyramids; introducing the BiFormer bidirectional attention mechanism, allowing the model to consider both forward and backward dependencies in sequence data and thereby significantly enhancing its understanding of target context in images; and integrating slicing-aided hyper inference with YOLOv8, which subdivides the image for in-depth local feature analysis, significantly improving recognition accuracy and robustness for small and multi-scale objects. The structure of the improved YOLOv8 model is shown in Fig 5.
Improvement of loss function.
The YOLOv8 model’s core lies in its Bounding Box Regression (BBR) module, responsible for precisely determining target locations [39]. The loss function within BBR is crucial, directly affecting the model’s detection performance. Initially, YOLOv8 employed the CIoU loss function, which significantly improved detection box convergence by considering overlap area, center point distance, and aspect ratio consistency [40]. However, the CIoU loss function’s effectiveness relies on high-quality experimental data in the training dataset; when the dataset includes low-quality data, regression performance may be suboptimal [41]. Particularly in image datasets like Pu-erh sun-dried green tea, where small foreign objects are densely packed and bounding boxes overlap heavily, the CIoU loss function may decline in convergence and accuracy.
To address this challenge, this study adopts the MPDIoU loss function [27], with structural parameters shown in Fig 6. The MPDIoU loss function is designed to overcome the limitations of traditional CIoU loss functions when dealing with highly overlapping bounding boxes. By optimizing the algorithm, it enhances the model’s adaptability to dense targets and complex scenes, significantly improving detection accuracy and robustness. The computation formula is presented as follows.
Given a predicted box $B^{prd}$ with top-left and bottom-right corner points $(x_1^{prd}, y_1^{prd})$ and $(x_2^{prd}, y_2^{prd})$, a ground-truth box $B^{gt}$ with corner points $(x_1^{gt}, y_1^{gt})$ and $(x_2^{gt}, y_2^{gt})$, and an input image of width $w$ and height $h$, the corner-point distances are defined in Eqs (1) and (2), MPDIoU in Eq (3), and the MPDIoU loss in Eq (4):

$$d_1^2 = (x_1^{prd} - x_1^{gt})^2 + (y_1^{prd} - y_1^{gt})^2 \tag{1}$$

$$d_2^2 = (x_2^{prd} - x_2^{gt})^2 + (y_2^{prd} - y_2^{gt})^2 \tag{2}$$

$$\mathrm{MPDIoU} = \mathrm{IoU} - \frac{d_1^2}{w^2 + h^2} - \frac{d_2^2}{w^2 + h^2} \tag{3}$$

$$L_{\mathrm{MPDIoU}} = 1 - \mathrm{MPDIoU} \tag{4}$$
The loss function directly minimizes the distance between the top-left and bottom-right points of the predicted and actual bounding boxes. This means the MPDIoU loss function is comprehensive and simplifies calculations. It also streamlines the comparison of similarity between bounding boxes, making it suitable for both overlapping and non-overlapping box regression. This indicates that the MPDIoU loss function can effectively improve the training of bounding box regression for small foreign objects in Pu-erh sun-dried green tea, enhancing convergence speed and regression accuracy.
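As a concrete illustration, the following PyTorch sketch implements Eqs (1)–(4); it assumes corner-format (x1, y1, x2, y2) box tensors and is a reconstruction from [27] for illustration, not the authors’ released training code.

```python
import torch

def mpdiou_loss(pred, target, img_w, img_h, eps=1e-7):
    """MPDIoU loss sketch (Eqs 1-4). pred/target: (N, 4) boxes as
    (x1, y1, x2, y2); img_w, img_h: input image width and height."""
    # standard IoU
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_g - inter + eps)
    # squared distances between matching top-left / bottom-right corners (Eqs 1-2)
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2
    norm = img_w ** 2 + img_h ** 2          # normalization term
    mpdiou = iou - d1 / norm - d2 / norm    # Eq (3)
    return (1.0 - mpdiou).mean()            # Eq (4), averaged over the batch
```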
EfficientDet.
The EfficientDet algorithm is a one-stage object detection network that extends the compound scaling concept of EfficientNet, clearly formalizing the network architecture into a scalable framework. It balances the precision of object detection while considering the model’s detection speed [28].
The EfficientDet algorithm implementation, as shown in Fig 7, is divided into three stages: 1) The backbone EfficientNet extracts features using Neural Architecture Search and Compound Scaling Method; 2) The BiFPN integrates cross-layer features to enhance detection; 3) Classification and regression networks predict image categories and object locations. EfficientNet’s initial and final layers consist of a single Conv, forming standard convolutional layers processed with normalization and activation functions, with kernel sizes of 3x3 and 1x1, respectively. All intermediate layers are composed of repeated Mobile Inverted Bottleneck Convolution (MBConv) stacks. The MBConv module initially increases dimensions with a 1x1 convolution, followed by a Depthwise Separable Convolution (DSC) network with either 3x3 or 5x5 kernels, and a Squeeze and Excitation Network (SENet), then reduces dimensions with a final 1x1 convolution, as depicted in Fig 8.
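A PyTorch sketch of the MBConv block described above follows; the expansion ratio, squeeze ratio, and activation are illustrative assumptions, and the residual connection used when input and output shapes match is omitted for brevity.

```python
import torch.nn as nn

class MBConv(nn.Module):
    """Sketch of MBConv: 1x1 expansion -> depthwise conv (3x3 or 5x5)
    -> squeeze-and-excitation -> 1x1 projection."""
    def __init__(self, c_in, c_out, expand=4, k=3):
        super().__init__()
        c_mid = c_in * expand
        self.expand = nn.Sequential(
            nn.Conv2d(c_in, c_mid, 1, bias=False), nn.BatchNorm2d(c_mid), nn.SiLU())
        self.dw = nn.Sequential(  # depthwise stage of the DSC
            nn.Conv2d(c_mid, c_mid, k, padding=k // 2, groups=c_mid, bias=False),
            nn.BatchNorm2d(c_mid), nn.SiLU())
        self.se = nn.Sequential(  # squeeze-and-excitation channel reweighting
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(c_mid, c_mid // 4, 1), nn.SiLU(),
            nn.Conv2d(c_mid // 4, c_mid, 1), nn.Sigmoid())
        self.project = nn.Sequential(
            nn.Conv2d(c_mid, c_out, 1, bias=False), nn.BatchNorm2d(c_out))

    def forward(self, x):
        h = self.dw(self.expand(x))
        h = h * self.se(h)        # scale channels by learned attention weights
        return self.project(h)
```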
BiFPN is the core of the EfficientDet network, introducing bidirectional cross-scale connections and allocating learnable fusion weights through fast normalization, providing a new solution for effectively representing and processing multi-scale features. Although the traditional Feature Pyramid Network (FPN) [42] can handle multi-scale features effectively, low-level information is lost along the long path it must travel through multiple network layers to reach the higher layers, as shown in Fig 9A. To reduce this loss, the Path Aggregation Network (PANet) [43] adds a bottom-up path to FPN’s top-down structure and captures feature information at all levels through adaptive feature pooling, as shown in Fig 9B.
Because PANet’s complex network computations require extensive memory and prolong model inference, BiFPN improves upon PANet. It incorporates bidirectional cross-scale connections by adding an extra connection path between the original input and output nodes at the same level, and by stacking the same feature layers multiple times. This approach balances detection accuracy against computational speed, as shown in the network structure depicted in Fig 10.
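At each fusion node, BiFPN combines its inputs with learnable non-negative weights using fast normalized fusion [28]; a minimal PyTorch sketch (assuming the inputs have already been resized to a common resolution and channel count) follows.

```python
import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    """BiFPN-style fusion: O = sum_i(w_i * I_i) / (eps + sum_j w_j),
    with w_i kept non-negative via ReLU (no softmax needed)."""
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, feats):  # feats: list of same-shape feature maps
        w = torch.relu(self.w)
        w = w / (w.sum() + self.eps)  # fast normalization
        return sum(wi * f for wi, f in zip(w, feats))

# Usage: fuse a top-down feature with a same-level input feature
fuse = FastNormalizedFusion(n_inputs=2)
out = fuse([torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40)])
```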
BiFormer.
The self-attention mechanism, renowned for capturing long-range dependencies, has become a pivotal technique in the field of object detection [44]. However, it also introduces significant memory consumption and high computational costs. To address these challenges, researchers have incorporated various manually designed sparse attention patterns to reduce model complexity [45,46]. While these methods alleviate computational pressure, they still fall short of fully capturing long-range relationships. Zhu et al. [29] proposed an innovative dual-routing attention mechanism, Bi-level Routing Attention (BRA), which employs a two-tier routing strategy to address the capture of long-range dependencies more effectively. Built on the BRA core component, this study adopts the BiFormer general vision network architecture. BiFormer’s central design concept is to preliminarily filter out irrelevant key-value pairs at a coarse regional level using the BRA module, and then apply a refined token-to-token attention mechanism within the remaining routed regions. This strategy not only endows the model with adaptability but also significantly enhances computational efficiency and substantially reduces memory usage. Consequently, BiFormer inherits the advantages of the Transformer model while achieving more flexible content awareness and computational resource allocation, as shown in the overall structure of the BiFormer model in Fig 11.
Assuming an input two-dimensional feature map $X \in \mathbb{R}^{H \times W \times C}$, it is divided into $S \times S$ non-overlapping regions, each containing $\frac{HW}{S^2}$ feature vectors. By reshaping $X$ into $X^r \in \mathbb{R}^{S^2 \times \frac{HW}{S^2} \times C}$ and applying linear projections, the tensors $Q$, $K$, $V$ are obtained:

$$Q = X^r W^q, \quad K = X^r W^k, \quad V = X^r W^v \tag{5}$$

where $W^q, W^k, W^v \in \mathbb{R}^{C \times C}$ are the projection weights for the tensors $Q$, $K$, $V$.
Region-to-region routing is then implemented on a directed graph. At the region level, per-region averages of $Q$ and $K$ yield the region-level tensors $Q^r, K^r \in \mathbb{R}^{S^2 \times C}$; multiplying $Q^r$ by the transpose of $K^r$ constructs the adjacency matrix of the region-to-region affinity graph:

$$A^r = Q^r (K^r)^T \tag{6}$$
Each entry of $A^r$ represents the semantic relevance between two regions. By retaining the top-$k$ most relevant connections for each region and pruning the affinity graph, the routing index matrix $I^r$ is obtained:

$$I^r = \mathrm{topkIndex}(A^r) \tag{7}$$

where the $i$-th row of $I^r$ contains the indices of the $k$ regions most relevant to the $i$-th region.
Fine-grained token-to-token attention is then applied within the $k$ routed regions indexed by $I^r$. Since these regions may be scattered across the entire feature map, the key tensor $K^g$ and value tensor $V^g$ are first gathered:

$$K^g = \mathrm{gather}(K, I^r), \quad V^g = \mathrm{gather}(V, I^r) \tag{8}$$
The attention mechanism is applied to the gathered key-value pairs to compute the output $O$:

$$O = \mathrm{Attention}(Q, K^g, V^g) + \mathrm{LCE}(V) \tag{9}$$

where $\mathrm{LCE}(V)$ is a local context enhancement term that strengthens the representation of local contextual features without compromising computational efficiency. This design provides more precise and richer feature representation for the subsequent foreign object detection task.
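To make Eqs (5)–(9) concrete, the following single-head, batch-free PyTorch sketch reproduces the routing pipeline; the $\mathrm{LCE}(V)$ term, parameterized with a depthwise convolution in [29], is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def bi_level_routing_attention(x, Wq, Wk, Wv, S, topk):
    """BRA sketch. x: (H, W, C) feature map; Wq/Wk/Wv: (C, C) projections;
    S: region grid size; topk: number of routed regions per query region."""
    H, W, C = x.shape
    rh, rw = H // S, W // S                    # tokens per region side
    n = S * S                                  # number of regions
    # Eq (5): region partition and linear projection
    xr = x.reshape(S, rh, S, rw, C).permute(0, 2, 1, 3, 4).reshape(n, rh * rw, C)
    Q, K, V = xr @ Wq, xr @ Wk, xr @ Wv        # (n, rh*rw, C) each
    # Eq (6): region-level affinity from per-region means
    Qr, Kr = Q.mean(dim=1), K.mean(dim=1)      # (n, C)
    Ar = Qr @ Kr.T                             # (n, n) affinity graph
    # Eq (7): keep indices of the top-k most relevant regions per region
    Ir = Ar.topk(topk, dim=-1).indices         # (n, topk)
    # Eq (8): gather key/value tokens from the routed regions
    Kg = K[Ir].reshape(n, topk * rh * rw, C)
    Vg = V[Ir].reshape(n, topk * rh * rw, C)
    # Eq (9): fine-grained token-to-token attention (LCE(V) term omitted)
    attn = F.softmax(Q @ Kg.transpose(1, 2) / C ** 0.5, dim=-1)
    O = attn @ Vg                              # (n, rh*rw, C)
    return O.reshape(S, S, rh, rw, C).permute(0, 2, 1, 3, 4).reshape(H, W, C)
```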
Slicing-aided hyper inference algorithm integration.
Slicing-aided hyper inference slices the input detection image and applies object detection to each slice [47], as illustrated in the steps shown in Fig 12.
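A simplified sketch of this slicing step is given below; `detect_fn` is a hypothetical stand-in for the trained detector, returning [x1, y1, x2, y2, score] rows per tile, and a production pipeline, such as the SAHI implementation of [47], would additionally merge duplicate detections from overlapping tiles with non-maximum suppression.

```python
import numpy as np

def sliced_inference(image, detect_fn, slice_size=640, overlap=0.2):
    """Run a detector on overlapping tiles and map boxes back to
    full-image coordinates (tiles at the borders may be smaller)."""
    H, W = image.shape[:2]
    step = max(1, int(slice_size * (1 - overlap)))
    all_boxes = []
    for y0 in range(0, H, step):
        for x0 in range(0, W, step):
            tile = image[y0:y0 + slice_size, x0:x0 + slice_size]
            for x1, y1, x2, y2, score in detect_fn(tile):
                # shift tile-local coordinates back to the full image
                all_boxes.append([x1 + x0, y1 + y0, x2 + x0, y2 + y0, score])
    return np.array(all_boxes)  # apply NMS afterwards to merge duplicates
```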
Comparative experimental design
To evaluate the improved YOLOv8 model’s ability to detect small foreign objects in Pu-erh sun-dried green tea, this study compared it with the original YOLOv8, YOLOv7, YOLOv5, Faster-RCNN, and SSD models through extensive experiments. For scientific and reproducible results, a consistent experimental platform and software versions were used for training all models. Specific configuration parameters are shown in Table 1.
To thoroughly assess the improved YOLOv8 model’s effectiveness in detecting small foreign objects in Pu-erh sun-dried green tea, this study utilized an advanced binary confusion matrix analysis. This method allows for the precise calculation of key classification metrics: Precision [48] and Recall [49]. Additionally, to evaluate model performance comprehensively, the F1 score [50], Average Precision (AP) [51], and mean Average Precision (mAP) [52] were included as metrics. Together, these metrics form an integrated evaluation system to measure the model’s classification accuracy and reliability in object detection tasks. The following formulas provide a quantitative method for in-depth analysis of model performance.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{11}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{12}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{13}$$

$$AP = \int_0^1 P(R)\,dR \tag{14}$$

$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i \tag{15}$$
where $TP$ denotes the count of true positives, in which the model correctly identifies actual foreign objects; $FP$ denotes the count of false positives, in which the model erroneously classifies non-foreign objects as foreign; and $FN$ denotes the count of false negatives, in which the model fails to detect actual foreign objects.
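These counts translate directly into the metrics of Eqs (11)–(13); a minimal sketch with hypothetical counts:

```python
def detection_metrics(tp, fp, fn):
    """Precision, Recall, and F1 from confusion counts (Eqs 11-13)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 95 correct detections, 5 false alarms, 4 misses
print(detection_metrics(95, 5, 4))  # -> (0.95, 0.9596..., 0.9548...)
```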
Results and analysis
Ablation study
Based on the original YOLOv8 framework, this study implemented a series of enhancements to improve its performance in detecting small foreign objects in Pu-erh sun-dried green tea. An exhaustive statistical analysis was conducted to systematically verify the effectiveness of these improvements, precisely assessing the individual and combined contributions to model performance. The ablation study comparative data are presented in Table 2.
As demonstrated in Table 2, this study introduced the MPDIoU optimized loss function to enhance the model’s sensitivity to target regions, achieving more precise bounding box positioning: Precision, Recall, and mAP improved by 0.83%, 0.78%, and 0.69% respectively over the original YOLOv8. Integrating EfficientDet into the YOLOv8 framework made the model lighter, with the Parameters and Gradients metrics decreasing by 4.06% and computational cost falling by 0.6 GFLOPs; metrics improved to varying degrees, with Precision, Recall, and mAP increasing by 1.86%, 1.01%, and 0.81% respectively, Precision showing the greatest gain. Incorporating BiFormer, with its bidirectional attention mechanism, allowed the model to consider both forward and backward dependencies in the input data and capture context more comprehensively. This enhanced the model’s ability to identify small foreign objects in Pu-erh sun-dried green tea at a cost of only 0.6 GFLOPs of additional computation, while Precision, Recall, and mAP rose by 2.16%, 0.56%, and 1.32% respectively. Overall, the improved model achieved a substantial increase in performance metrics for a mere 0.7 GFLOPs increase in computation, with Precision, Recall, and mAP rising by 4.50%, 5.30%, and 3.63% over the original YOLOv8. Collectively, the YOLOv8-MEB model demonstrated superior performance in detecting small foreign objects in Pu-erh sun-dried green tea from the perspective of smart agricultural devices, offering robust support and new insights for foreign object detection technology in the field of smart agriculture.
This study utilized Gradient-weighted Class Activation Mapping (Grad-CAM), an advanced visualization tool for gaining deeper insights into the model’s decision-making process. During the ablation study, gradients were calculated for specific layers of each model to identify regions most influential in image classification decisions. Grad-CAM generates heatmaps that visually reveal key areas in image classification results, highlighted to provide intuitive explanations for model predictions. As shown in Fig 13, the Grad-CAM heatmaps produced by the improved YOLOv8-MEB model matched the actual foreign object areas more closely, indicating the effectiveness of the model’s enhancements in recognizing small foreign objects.
Loss function analysis
The loss function, a critical metric for model performance, primarily measures the discrepancy between predicted and observed values. As shown in Fig 14, the improved YOLOv8-MEB model exhibited a rapid decrease in loss during the initial training phase, indicating good adaptability to the training data. Around the 110th training epoch, the rate of loss reduction began to decelerate, signaling the model’s convergence. After 300 epochs, the loss curve stabilized, indicating a stable training state without evident overfitting or significant fluctuations, further confirming the model’s stability and robustness.
Model performance analysis
As depicted in Fig 15, the YOLOv8-MEB model presented in this study achieved performance metrics of 95.36% Precision, 95.65% Recall, and an F1 score of 95.50%. Compared to the original YOLOv8 model, the YOLOv8-MEB model saw a 4.50% increase in Precision, crucial for the detection of small foreign objects in Pu-erh sun-dried green tea due to its direct impact on the accuracy of detection results. Additionally, a 5.30% enhancement in Recall indicates the improved model’s increased effectiveness in identifying all present foreign objects. The F1 score also rose by 4.9%, reflecting a better balance between Precision and Recall. In summary, the YOLOv8-MEB model’s exceptional ability for efficient and accurate identification of small foreign object targets in Pu-erh sun-dried green tea was significant for quality control of agricultural products like Pu-erh, effectively identifying and detecting small foreign objects to ensure product purity and safety.
Comparative analysis of different models
This study conducted a comprehensive evaluation of the improved YOLOv8-MEB model, focusing on its accuracy and mean average precision in the detection tasks of four different foreign objects in Pu-erh raw green tea, and conducted an in-depth comparative analysis with advanced network models such as YOLOv8, YOLOv7, YOLOv5, Faster-RCNN, and SSD. Table 3 demonstrates the significant advantages of the YOLOv8-MEB model in foreign object detection.
The improved YOLOv8-MEB model achieved an increase in AP values of 3.60%, 3.89%, 7.17%, 13.97%, and 11.30% respectively for Grain foreign object identification compared to the YOLOv8, YOLOv7, YOLOv5, Faster-RCNN, and SSD network models. For Melon seed shells foreign objects, the AP values increased by 3.64%, 4.04%, 7.23%, 14.10%, and 11.40% respectively. For Small twigs foreign object identification, the AP values increased by 3.78%, 4.06%, 7.49%, 14.24%, and 11.44% respectively. For Bamboo chips foreign object identification, the AP values increased by 3.50%, 3.69%, 7.15%, 13.81%, and 11.06% respectively, with the final mAP increasing by 3.63%, 3.92%, 7.26%, 14.03%, and 11.30% respectively. This indicates that the YOLOv8-MEB network model is more accurate in predicting the location of foreign object targets, performs better in the detection of small target foreign objects in Pu-erh raw green tea across different categories, and has higher objectivity and reliability.
To thoroughly verify the accuracy and efficiency of the optimized YOLOv8-MEB model in identifying common foreign impurities in the grading task of Pu-erh sun-dried green tea, this study selected a real dataset from the Tea Plant and Processing Science Observational Experiment Station of the College of Tea Science, Yunnan Agricultural University (25.13° N latitude, 102.75° E longitude) for detailed validation. The dataset includes 50 representative image samples, fully reflecting detection challenges in real scenarios. Under various lighting conditions, the YOLOv8-MEB model demonstrated excellent detection performance in both single and multiple target scenarios. This study conducted a detailed performance comparison with comparative models, as intuitively presented in Fig 16, showing the comparative results. In Fig 16, labels A, B, C, and D represent Grain, Melon seed shells, Small twigs, and Bamboo chips, respectively.
The experimental results indicate that under diverse lighting conditions, such as constant light sources, ordinary light, dim light, unilateral light, indoor lighting, and weak light sources, the foreign object detection capabilities of the various models differ markedly between scenes with few targets and scenes with many. Through in-depth confidence analysis, this study found that the YOLOv8-MEB model demonstrated exceptional confidence levels in both sparse and dense foreign object scenes. Its confidence was significantly higher under all tested conditions than that of the other models, such as YOLOv8, YOLOv7, YOLOv5, SSD, and Faster-RCNN, which exhibited relatively lower confidence under the same conditions. The YOLOv8-MEB model’s high-confidence performance is attributed to its advanced algorithmic architecture and optimization strategies, which not only show significant advantages in the localization accuracy of small foreign objects but also effectively reduce the probability of false positives and repeated detections. Especially in environments with poor lighting and a large number of targets, the YOLOv8-MEB model performed particularly well, accurately identifying all multi-target foreign objects, both complete and partial, while maintaining high confidence. In stark contrast, the other comparative models, although capable of foreign object detection to some extent, showed a noticeable decline in confidence and accuracy and were prone to missed detections. This indicates that in actual application scenarios with complex lighting conditions and variable target numbers, these models lack robustness and cannot meet the requirements for high-precision foreign object detection. A comprehensive assessment of the experimental results shows that the YOLOv8-MEB model demonstrates excellent robustness and reliability in foreign object detection tasks, with performance significantly better than the other models tested. This provides strong technical support and a theoretical basis for efficient and accurate small foreign object detection in Pu-erh raw green tea under various lighting conditions in the future.
Discussion
This study is dedicated to the detection and identification of foreign objects in the production process of Pu-erh raw green tea, ensuring the quality and hygiene of Pu-erh tea as a food product. To address the issue of rapid detection and precise identification of small foreign objects in Pu-erh raw green tea, this study proposes an improved YOLOv8 network model for foreign object detection. In the field of agriculture, the YOLO algorithm has shown broad potential, especially in defect detection, quality grading, safety detection, and foreign object identification of agricultural products. Li et al. [21] proposed a lightweight improved YOLOv5s model that achieved a precision rate of 97.80% in detecting dragon fruit under daytime and nighttime lighting conditions. In comparison, the improved YOLOv8 model in this study achieved Precision, Recall, mAP, and F1 scores of 95.36%, 95.65%, 97.78%, and 95.50%, respectively, in the task of detecting small foreign objects in Pu-erh raw green tea, demonstrating adaptability in small foreign object identification and detection, with significant improvements in accuracy and evaluation metrics compared to the YOLOv5s model’s performance in dragon fruit detection tasks.
In terms of hardware deployment, adaptability and maintenance costs are key considerations in actual production environments. The model proposed in this study was designed with lightweight structure and computational efficiency in mind, to adapt to different hardware platforms. Compared to the original model, the improved YOLOv8 network model reduces the Parameters and Gradients indicators by 1.10% while significantly increasing Precision, Recall, mAP, and F1 by 4.50%, 5.30%, 3.63%, and 4.9%, respectively. This indicates that the model optimizes the use of computational resources while maintaining high accuracy. Compared to the Mask YOLOv7-based drone vision system proposed by Bello et al. [22], which achieved detection accuracy rates of 93% and 95% in controlled and uncontrolled environments, respectively, the model in this study raises the accuracy rate to 97.78% in the detection of small foreign objects in Pu-erh raw green tea, demonstrating higher detection accuracy and robustness.
Furthermore, the scalability and real-time application of the model are also a focus of this study. The improved YOLOv8 model in this study has been optimized in terms of computational efficiency and resource requirements to support large-scale industrial use and real-time applications. Compared to the unmanned pineapple harvesting model proposed by Meng et al. [23], which achieved a detection accuracy rate of only 92.54% in identifying pineapples, the model in this study has a higher accuracy rate in the detection of small foreign objects in Pu-erh raw green tea, which is crucial for real-time detection and sorting systems. Although the size of the improved model in this study has increased by 7.87% compared to the original model, this increase is mainly attributed to the increase in model complexity to improve detection accuracy and robustness. This study has achieved a breakthrough in foreign object detection technology for Pu-erh raw green tea, laying the foundation for the automation and intelligence of the tea industry. By improving detection accuracy and speed, this technology has optimized the traditional tea production process, reduced human errors, increased efficiency, and ensured product quality and safety. The innovativeness of the improved YOLOv8 network model provides an efficient solution for foreign object detection in the agricultural and food processing industries, which is expected to enhance industry standards and consumer confidence. The potential of the technology in real-time monitoring and data processing is enormous and will support food safety regulation.
In future research, the aim is to construct an efficient foreign object detection and identification screening device for Pu-erh raw green tea. The research will focus on lightweighting the software algorithm model and optimizing the hardware framework design. On the software side, model compression techniques such as pruning, quantization, and knowledge distillation will be used to reduce the model size while maintaining detection performance. At the same time, the algorithm will be optimized for cross-platform compatibility and real-time performance to ensure flexible operation on different hardware and low-latency detection in production environments. In the design and deployment of the hardware framework, modular and integrated strategies will be adopted to improve the maintainability and scalability of the equipment, ensuring its adaptability to diverse production environments. Through highly integrated component design, the aim is to achieve a smooth and efficient operation process while reducing spatial occupancy and improving energy efficiency. In addition, intuitive user interfaces and automated calibration mechanisms will be developed to enhance the flexibility and operability of the system. These designs will allow operators to configure and monitor the system easily, reducing manual intervention and improving the system’s adaptability and accuracy. Integrated remote monitoring and control functions will also be added to allow the equipment to be monitored and controlled from different locations, further enhancing management efficiency.
Conclusion
This study proposes a foreign object detection method based on an improved YOLOv8 network model, aimed at rapid detection and precise identification of small foreign objects in the production process of Pu-erh raw green tea, ensuring the quality and hygiene safety of Pu-erh tea. The improved YOLOv8 model employs an MPDIoU optimized loss function to improve target detection accuracy, integrates the EfficientDet architecture to enhance detection efficiency for targets of different sizes, and utilizes the BiFormer bidirectional attention mechanism to enhance the understanding of image context. Additionally, the introduction of slicing-aided hyper inference technology further improves the model’s recognition accuracy and robustness for small targets and multi-scale foreign objects.
Compared to the original model, the improved YOLOv8 model in this study has significantly increased Precision, Recall, mAP, and F1 by 4.50%, 5.30%, 3.63%, and 4.9%, respectively, while also reducing the Parameters and Gradients indicator parameters by 1.10%. Moreover, compared to the original model, YOLOv7, YOLOv5, Faster-RCNN, and SSD network models, the AP values for Grain foreign object identification have increased by 3.60%, 3.89%, 7.17%, 13.97%, and 11.30%, respectively. For Melon seed shells foreign objects, the AP values have increased by 3.64%, 4.04%, 7.23%, 14.10%, and 11.40%, respectively. For Small twigs foreign object identification, the AP values have increased by 3.78%, 4.06%, 7.49%, 14.24%, and 11.44%, respectively. For Bamboo chips foreign object identification, the AP values have increased by 3.50%, 3.69%, 7.15%, 13.81%, and 11.06%, respectively, with the final mAP increasing by 3.63%, 3.92%, 7.26%, 14.03%, and 11.30%, respectively. The improved model in this study possesses efficient foreign object recognition capabilities, providing technical support for intelligent detection technology on the foreign object sorting line of Pu-erh raw green tea, and also laying the foundation for the smartization of the tea industry and the enhancement of tea quality.
Acknowledgments
The authors would like to thank the anonymous reviewers for their careful review of the paper and their kind suggestions, which improved the overall quality of the manuscript.
References
- 1. Yiqing Z, Protection B. Overview of Chinese Tea Culture Landscape. International Council on Monuments and Sites. 2021; 11; 31.
- 2. He Y. Terroir Exploration: A Case Study of Jingmai Mountain Tea Culture Tourism. 2023. http://hdl.handle.net/10579/25646.
- 3. Dehui Z. Research Article Application of Control System Based on S7-300 in the Producting of Pu-erh Tea. Advance Journal of Food Science and Technology. 2014; 6(12); 1335–1338.
- 4. Ho K, Haufe T, Ferruzzi M, Neilson A. Production and polyphenolic composition of tea. Nutrition Today. 2018; 53(6); 268–278.
- 5. Heiss M. L, Heiss R. J. The Tea Enthusiast’s Handbook: A Guide to the World’s Best Teas. Ten Speed Press. 2012.
- 6. Huang Y, Zhou H, Xiong T, Zhao Y. The research and development on Pu-erh tea fermentation automatic control technology and key equipment. In 2010 3rd International Conference on Computer Science and Information Technology (Vol. 3, pp. 555–559). IEEE. 2010.
- 7. Hicks A. Current status and future development of global tea production and tea products. Au Jt. 2009; 12(4); 251–264.
- 8. Yang Z, Ma W, Lu J, Tian Z, Peng K. The Application Status and Trends of Machine Vision in Tea Production. Applied Sciences. 2023; 13(19); 10744.
- 9. Zhao Z, Zheng P, Xu S, Wu X. Object detection with deep learning: A review. IEEE transactions on neural networks and learning systems. 2019; 30(11); 3212–3232. pmid:30703038
- 10. Sun X, Xu C, Li J, Xie D, Gong Z, Fu W, et al. Nondestructive detection of insect foreign bodies in finished tea products using THz‐TDS combination of baseline correction and variable selection algorithms. Journal of Food Process Engineering. 2023; 46(2); e14224.
- 11. Wei Y, Wen Y, Huang X, Ma P, Wang L, Pan Y, et al. The dawn of intelligent technologies in tea industry. Trends in Food Science & Technology. 2024; 104337.
- 12. Kamilaris A, Prenafeta-Boldú F. X. Deep learning in agriculture: A survey. Computers and electronics in agriculture. 2018; 147; 70–90.
- 13. Patne G, Ghonge P. Automization of agriculture products defect detection and grading using image processing system. International Journal of Computer Science Engineering and Information Technology Research. 2018; 8(3); 25–32.
- 14. Narendra V. G, Hareesha K. S. Quality inspection and grading of agricultural and food products by computer vision-a review. International journal of computer applications. 2010; 2(1); 43–65.
- 15. Cooper R. L, Lamb IV J. C, Barlow S. M, Bentley K, Brady A. M, Doerrer N. G, et al. A tiered approach to life stages testing for agricultural chemical safety assessment. Critical Reviews in Toxicology. 2006; 36(1); 69–98. pmid:16708695
- 16. Graves M, Smith A, Batchelor B. Approaches to foreign body detection in foods. Trends in Food Science & Technology. 1998; 9(1); 21–27.
- 17. Fan S, Li J, Zhang Y, Tian X, Wang Q, He X, et al. Online detection of defective apples using computer vision system combined with deep learning methods. Journal of Food Engineering. 2020; 286; 110102.
- 18. Deng L, Li J, Han Z. Online defect detection and automatic grading of carrots using computer vision combined with deep learning methods. LWT. 2021; 149; 111832.
- 19. Wu Y, Chen J, Wu S, Li H, He L, Zhao R, et al. An improved YOLOv7 network using RGB-D multi-modal feature fusion for tea shoots detection. Computers and Electronics in Agriculture. 2024; 216; 108541.
- 20. Wang S, Wu D, Zheng X. TBC-YOLOv7: a refined YOLOv7-based algorithm for tea bud grading detection. Frontiers in Plant Science. 2023; 14; 1223410. pmid:37662161
- 21. Li H, Gu Z, He D, Wang X, Huang J, Mo Y, et al. A lightweight improved YOLOv5s model and its deployment for detecting pitaya fruits in daytime and nighttime light-supplement environments. Computers and Electronics in Agriculture. 2024; 220; 108914.
- 22. Bello R. W, Oladipo M. A. Mask YOLOv7-based drone vision system for automated cattle detection and counting. Artificial Intelligence and Applications. 2024. https://doi.org/10.47852/bonviewAIA42021603
- 23. Meng F, Li J, Zhang Y, Qi S, Tang Y. Transforming unmanned pineapple picking with spatio-temporal convolutional neural networks. Computers and Electronics in Agriculture. 2023; 214; 108298.
- 24. Tahir N. U. A, Long Z, Zhang Z, Asim M, ELAffendi M. PVswin-YOLOv8s: UAV-based pedestrian and vehicle detection for traffic management in smart cities using improved YOLOv8. Drones. 2024; 8(3); 84.
- 25. Liu F, Wang S, Pang S, Han Z. Detection and recognition of tea buds by integrating deep learning and image-processing algorithm. Journal of Food Measurement and Characterization. 2024; 1–18.
- 26. Chang C, Huang C, Chen H. Design and Implementation of Artificial Intelligence of Things for Tea (Camellia sinensis L.) Grown in a Plant Factory. Agronomy. 2022; 12(10); 2384.
- 27. Siliang M, Yong X. MPDIoU: A loss for efficient and accurate bounding box regression. arXiv preprint arXiv:2307.07662. 2023.
- 28. Tan M, Pang R, Le Q. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10781–10790). 2020.
- 29. Zhu L, Wang X, Ke Z, Zhang W, Lau R. Biformer: Vision transformer with bi-level routing attention. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10323–10333). 2023.
- 30. Zheng Y, Zhan Y, Huang X, Ji G. YOLOv5s FMG: An improved small target detection algorithm based on YOLOv5 in low visibility. IEEE Access 2023.
- 31. Yuan J, Xiang S, Xia J, Yu L, Liu S. Evaluation of sampling methods for scatterplots. IEEE Transactions on Visualization and Computer Graphics. 2020; 27(2); 1720–1730.
- 32. Shukla K. N, Potnis A, Dwivedy P. A review on image enhancement techniques. Int. J. Eng. Appl. Comput. Sci 2017; 2(07); 232–235.
- 33. Sugimoto Y, Imaizumi S. An extension of reversible image enhancement processing for saturation and brightness contrast. Journal of Imaging. 2022; 8(2); 27. pmid:35200729
- 34. Maurya L, Lohchab V, Mahapatra P, Abonyi J. Contrast and brightness balance in image enhancement using Cuckoo Search-optimized image fusion. Journal of King Saud University-Computer and Information Sciences. 2022; 34(9); 7247–7258.
- 35. Khanna M, Singh L, Thawkar S, Goyal M. PlaNet: a robust deep convolutional neural network model for plant leaves disease recognition. Multimedia Tools and Applications. 2024; 83(2); 4465–4517.
- 36. Feng C, Yu Z, Kingi U, Baig M. Threefold vs. fivefold cross validation in one-hidden-layer and two-hidden-layer predictive neural network modeling of machining surface roughness data. Journal of manufacturing systems. 2005; 24(2); 93–107.
- 37. Monika M, Rajender U, Tamizhselvi A, Rumale A. S. Real-time object detection in videos using deep learning models. ICTACT Journal on Image & Video Processing. 2023; 14(2).
- 38. Peng X, Huang C. An improved real-time multiple object tracking algorithm based on YOLOv8. In Proceedings of the 2nd International Conference on Signal Processing, Computer Networks and Communications (pp. 180–184). 2023.
- 39. He Y, Zhu C, Wang J, Savvides M, Zhang X. Bounding box regression with uncertainty for accurate object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2888–2897). 2019.
- 40. Wang X, Song J. ICIoU: Improved loss based on complete intersection over union for bounding box regression. IEEE Access 2021; 9; 105686–105695.
- 41. Yu J, Jiang Y, Wang Z, Cao Z, Huang T. UnitBox: An advanced object detection network. In Proceedings of the 24th ACM International Conference on Multimedia (pp. 516–520). 2016.
- 42. Lin T. Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2117–2125). 2017.
- 43. Liu S, Qi L, Qin H, Shi J, Jia J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 8759–8768). 2018.
- 44. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A. N, et al. Attention is all you need. Advances in neural information processing systems. 2017; 30.
- 45. Arar M, Shamir A, Bermano A. H. Learned queries for efficient local attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10841–10852). 2022.
- 46. Chen Z, Zhu Y, Zhao C, Hu G, Zeng W, Wang J, et al. DPT: Deformable patch-based transformer for visual recognition. In Proceedings of the 29th ACM International Conference on Multimedia (pp. 2899–2907). 2021.
- 47. Akyon F. C, Altinuc S. O, Temizel A. Slicing aided hyper inference and fine-tuning for small object detection. In 2022 IEEE International Conference on Image Processing (ICIP) (pp. 966–970). IEEE. 2022.
- 48. Streiner D. L, Norman G. R. “Precision” and “accuracy”: two terms that are neither. Journal of clinical epidemiology. 2006; 59(4); 327–330. pmid:16549250
- 49. Gillund G, Shiffrin R. M. A retrieval model for both recognition and recall. Psychological Review. 1984; 91(1); 1. pmid:6571421
- 50. Yacouby R, Axman D. Probabilistic extension of precision, recall, and f1 score for more thorough evaluation of classification models. In Proceedings of the first workshop on evaluation and comparison of NLP systems. 2020, (pp. 79–91).
- 51. He K, Lu Y, Sclaroff S. Local descriptors optimized for average precision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 596–605). 2018.
- 52. Henderson P, Ferrari V. End-to-end training of object class detectors for mean average precision. In Computer Vision–ACCV 2016: 13th Asian Conference on Computer Vision, Taipei, Taiwan, November 20–24, 2016, Revised Selected Papers, Part V (pp. 198–213). Springer International Publishing. 2017.