Introduction

China holds abundant supplies of deep and ultra-deep carbonate oil and gas resources, with total reserves amounting to 67.1 billion tons of oil equivalent, representing 34% of the nation’s total oil and gas resources. These resources have become a key focus for expanding reserves and boosting production, marking a critical growth area for the country’s oil and gas industry. Ultra-deep fault-controlled fractured-vuggy reservoirs are a newly discovered type of ultra-deep reservoir, buried at depths exceeding 7000 m1,2,3. Porosity is a critical parameter for evaluating reservoir reserves and optimizing development strategies. Predicting porosity in complex reservoirs remains challenging due to factors such as deep burial, weak seismic signals, and significant heterogeneity. Fractured-vuggy reservoirs, such as those in the Shunbei oilfield, exhibit complexities similar to those observed in the Asmari reservoir, where fracture density plays a crucial role in oil production and reservoir management4. Current prediction methods often fall short of capturing the intricate characteristics of these reservoirs.

Porosity prediction methods can be classified into three main types based on the technical approach: physical model-based methods, seismic inversion, and data-driven approaches5. Physical model-based methods are the most traditional, grounded in established physical principles. These methods rely on well log data and rock physics models to estimate porosity based on rock and fluid properties. Early methods, such as the Archie formula, used well log parameters like resistivity and acoustic velocity to calculate porosity6, while the Kozeny-Carman formula linked permeability to porosity, and the Wyllie-Rose formula determined porosity using acoustic velocity7. With advances in experimental techniques, laboratory-based rock physics analysis has become an essential tool, allowing more accurate characterization of porosity in complex reservoirs by measuring core samples and applying rock physics theories like the Gassmann equation. To address geological complexity, statistical rock physics analysis has been introduced, analyzing statistical relationships in well log and core data to build porosity models, making it particularly suitable for heterogeneous reservoirs. While physical model-based methods are highly reliable, their ability to characterize reservoir complexity is somewhat limited, as they heavily depend on laboratory data and prior knowledge8,9,10.

Seismic inversion is a core technology in porosity prediction, allowing for the estimation of reservoir properties by converting seismic-derived elastic attributes (such as impedance) into porosity using rock physics models or statistical relationships derived from laboratory or well log data11,12,13. In a study of the Asmari formation in Iran’s Hendijan oil field, Abdolahi et al. demonstrated the effectiveness of combining seismic inversion with well log data to improve porosity prediction accuracy and reservoir quality delineation14. However, in complex reservoirs, especially deep carbonate reservoirs, it remains challenging to establish a reliable relationship between inverted seismic attributes and porosity, as the geological complexity is often too great for traditional methods to handle effectively.

Compared to physical model-based and seismic inversion methods, data-driven approaches do not rely on explicit physical laws or geological assumptions. These methods include traditional geostatistics and modern artificial intelligence (AI) techniques. Geostatistical methods rely on statistical properties and spatial correlations in data to build models, often integrating both hard and soft data, such as seismic and well log data. Kriging, an early interpolation technique, uses well log data to perform spatial prediction between wells by establishing a variogram for precise interpolation15. Cokriging goes a step further by jointly interpolating seismic and well log data, enhancing prediction accuracy16. These methods are largely deterministic, while stochastic inversion methods, like Sequential Gaussian Simulation (SGS) and Multiple Gaussian Simulation (PGS)17,18,19,provide multiple realizations, making them particularly adaptable for modeling heterogeneous reservoirs. In the case of the Savak formation, SGS was used to generate various possible reservoir attribute distributions, assisting in predicting hydrocarbon distribution and quality18.

In addition to geostatistical methods, AI techniques have seen extensive application in reservoir characterization, particularly in integrating seismic data (pre-stack, post-stack, and inverted attributes) with well log data, significantly enhancing reservoir property description20,21,22. For instance, Support Vector Regression (SVR) combined with seismic attributes can effectively estimate porosity and water saturation23, while Artificial Neural Networks (ANN) capture complex, nonlinear relationships between seismic attributes and porosity or permeability. Studies indicate that combining ANN with the Gamma test to select optimal seismic and well log attributes significantly enhances the prediction accuracy of permeability, porosity, and intrinsic attenuation in carbonate aquifers and sandstone-shale reservoirs24,25. A study in the Upper Assam Basin of India exemplified this approach, successfully applying absolute and relative acoustic impedance (AAI and RAI) to stratigraphically delineate reservoirs and estimate porosity, shale volume, and water saturation, improving overall reservoir property estimation26. Likewise, XGBoost, integrated with PySpark, has achieved favorable results in porosity prediction for complex carbonate reservoirs, leveraging large datasets to capture nonlinear relationships and offering new solutions for complex reservoir characterization and porosity prediction27.

However, existing deep learning methods face significant challenges when applied to ultra-deep carbonate reservoirs. Due to the complex pore-fracture structure and high heterogeneity of these reservoirs, seismic signals are often weak, with high noise levels, making it difficult for traditional methods to extract clear reservoir property features. This study addresses these challenges by proposing a porosity prediction model based on seismic inversion and deep learning. By combining high-resolution well log data with 3D seismic data, we selected critical seismic attributes (such as wave impedance, coherence, and phase tracking) and constructed a training dataset. The model leverages deep learning to extract multi-scale subsurface information and incorporates phase-controlled constraints to capture the nonlinear features of porosity. Bayesian optimization further enhances the model’s performance, achieving high-precision porosity predictions in complex reservoirs.

This paper is structured as follows: “Geological characteristics of ultra-deep fault-controlled fractured-vuggy reservoirs” details the geological characteristics of the study area, “Methods and data set” describes the methods and dataset, “Results and discussions” presents the results and discusses prediction accuracy, and “Conclusions” offers the conclusions and implications for future research.

Geological characteristics of ultra-deep fault-controlled fractured-vuggy reservoirs

The Shunbei No.4 strike-slip fault zone is a typical ultra-deep, fault-controlled, fracture- vuggy oil and gas reservoir located in the Shunbei oil field of the Tarim Basin, China. Structurally, it lies in the Shuntuoguole low uplift, a saddle-shaped belt between the Tabei and Tazhong uplifts and the Awati and Manjiaer depressions (Fig. 1). The reservoir is controlled by large strike-slip faults and exhibits a tabular structure with a thickness reaching up to 1041 m. It is primarily developed in the middle and lower Ordovician Yijianfang and Yingshan Formations, with burial depths exceeding 7200 m. The main rock types are sparry and micrite limestone, with an average porosity of approximately 3% and permeability ranging from 0.706 to 4760 × 10− 3 millidarcies28. The matrix generally has negligible storage and permeability.

The internal geological structure of the fault zone, as shown in Fig. 1, is complex and influenced by differential activity within the large strike-slip fault zone. It features segmented reservoir formation, vertical penetration, and intermittent distributions of tensional-shear and compressional-shear zones. The fractured-vuggy reservoir mainly resulted from multiple episodes of strike-slip fault activity, with minor fluid reworking in later stages. The upper part of the reservoir is sealed by dense carbonate rocks on the sides and a thick mudstone caprock on top, forming a fault-controlled fracture-cave trap. The reservoir space mainly consists of cavities formed by fractures, pores between breccias, and structural fractures. The irregular distribution of these cavities, holes, and fractures leads to significant local variations in seismic wave impedance, leading to scattering and interference of seismic waves during propagation. This results in complex and ambiguous signals in the seismic data, a low signal-to-noise ratio, and significant challenges in porosity prediction.

Fig. 1
figure 1

(modified from Ref.28).

Location of the study area and geological characteristics of the reservoir: (a) geographical location of the Tarim Basin; (b) structural location of the Shunbei No. 4 fault zone; (c) stratigraphic histogram of the Shunbei oilfield; (d) development model of ultra-deep fractured-vuggy reservoirs.

Methods and data set

Data collection and preparation

In this study, we collected logging porosity data from six wells in the northern part of the study area (see Fig. 1), and combined these with 3D seismic data from geostatistical inversion to create a comprehensive dataset.

Logging data and seismic attribute selection

Logging data: Logging porosity data were collected from six wells with a measurement accuracy of 0.125 m. This high accuracy was crucial given the relatively lower accuracy (25 m) of seismic sampling in the ultra-deep (> 7000 m) target area. Logging interpretation porosity was used as the target value for seismic porosity prediction due to its superior precision.

Seismic attributes: we selected seismic attributes based on the specific geological characteristics of the reservoir, prioritizing those that comprehensively capture its features. The attribute selection process begins with geological feature analysis, identifying key structures like faults, cracks, and cavities. Initial filtering of seismic attributes was informed by previous studies and seismic interpretation experience, resulting in the selection of four categories of attributes that effectively represent these geological features (Fig. 2). Subsequently, PCA and correlation assessments were conducted for further refinement, with the overall process illustrated in Fig. 3.

Fig. 2
figure 2

Selected seismic properties: (a) Impedance, (b) Ant tracking, (c) Coherence, (d) Tensor. These attributes are used in the deep learning model for porosity prediction.

Impedance: Reflects changes in porosity within the formation as the product of medium density and wave velocity. It exhibits a clear mirror-symmetry relationship with the porosity curve, due to variations primarily caused by pores or fractures.

Ant tracking: Provides geological information about small and medium-sized fractures, indicating cracks or continuous holes in matrix rocks. It captures minor changes in subsurface rock structures and helps identify fractures of different scales that are related to porosity.

Coherence: Highlights geological features such as faults and fault-controlled fractured-vuggy bodies by analyzing the similarity of adjacent seismic trace signals. This attribute helps identify lateral heterogeneity within strata.

Tensor: Reflect chaotic reflection structures in seismic data, converting them into two- or three-dimensional image attributes. These attributes describe the discontinuous reflection characteristics caused by faults and small fractured-vuggy bodies.

Fig. 3
figure 3

Workflow for selecting seismic attributes. This diagram outlines the steps involved in selecting seismic attributes.

Data processing and integration

To align seismic attribute data with logging porosity data:

Time-depth conversion: Seismic attribute volumes were converted from time to depth.

Depth matching and correction: Depth matching and correction of logging porosity data were performed, using logging interpretation porosity as the target value for seismic predictions. Seismic attribute data were densified and resampled to ensure alignment with logging data at the same depths, generating a training dataset that included both seismic attributes and porosity values.

Data cleaning

We processed the data for missing values and outliers. If few values were missing or had little impact on the analysis, rows or columns containing them were deleted directly; if many values were missing, interpolation methods (such as the mean, median, or mode) were used to fill them in. The outliers were either deleted or replaced as appropriate.

Standardization

The dataset was standardized to ensure consistent statistical properties and eliminate dimensional differences between features, enhancing the model’s convergence rate.

Model construction and application

A phase-constrained deep learning model was developed. Parameters such as the number of neurons and network layers were optimized using a Bayesian optimization algorithm. The optimal model was then applied to the seismic volume of the entire region to calculate the porosity distribution.

Deep learning model establishment and optimization

Model structure

A deep learning model is built with two components: feature extraction and regression. The feature extraction component identifies characteristics related to porosity from multi-dimensional seismic attributes, while the regression component maps these extracted features to porosity values. With multi-dimensional seismic data as input, the model processes the data through these two stages to predict the porosity, denoted as \(\hat{y}\).

$$\hat y = {\text{fc}}2({\text{fc}}1({\text{flatten}}({\text{pool}}1({\text{relu1}}({\text{conv}}1({\text{seismic\_data}}))))))$$
(1)

The convolution layer, conv1, slides the convolution kernel over the input seismic data to extract local features.

$${\text{conv}}1[i] = {\text{ReLu}}\left( {\sum\limits_{k = 0}^{K - 1} {{\text{seismic\_data}}[i + k]} \cdot {\text{weight}}[k] + {\text{bias}}} \right)$$
(2)
$$\:ReLU\left(x\right)={\text{max}}(0,x)$$
(3)

where \(\:seismic\_data\left[i\right]\) is the i-th element of the input data, k denotes the sliding position of the convolution kernel, \(\:weight\left[k\right]\) denotes the weight of the convolution kernel, and \(\:bias\) denotes a bias term.

Pooling operation, pool1, takes the maximum or average over each sliding window. The mathematical expression for the pooling operation is as follows:

$${\text{pool}}1[i] = \max ({\text{conv}}1[i],{\text{conv}}1[i + 1])$$
(4)

The fully connected stratigraphy is expressed as follows:

$${\text{fc}}1[i] = \sigma \left( {\sum\limits_{j = 0}^{M - 1} {{\text{flattened}}[j] \cdot {\text{weight}}[j,i] + {\text{bias}}[i]} } \right)$$
(5)
$${\text{fc}}2[i] = \sum\limits_{j = 0}^{N - 1} {{\text{fc}}1[j] \cdot {\text{weight}}[j,i] + {\text{bias}}[i]}$$
(6)

where fc1[i] represents the output of the ith neuron of the fully connected layer, \(\:\sigma\:\) denotes the activation function, \(\:flattened\left[j\right]\) denotes the output of the input layer or the upper layer of neurons, the value of the jth element after flattening processing, \(\:weight[j,i]\) denotes the weight connecting the jth input to the ith neuron, and \(\:bias\left[i\right]\) denotes the bias term representing the ith neuron.)

Model training

The deep learning model is trained by minimizing the loss function, which is set to \(L_{\text{total}}\). The root mean square error \(L_{\text{regression}}\) between the predicted porosity value \(\hat{y}\)and the actual porosity value y is calculated to determine the model’s prediction error. Additionally, a seismic facies loss term \({L_{{{nearest\_neighbor}}}}\) is introduced to constrain the predicted porosity values, ensuring they remain consistent with the seismic facies data. A parameter \(\lambda\) is introduced to adjust the relative weight between the two loss terms. The model parameters are updated via backpropagation using the gradient descent algorithm to minimize the total loss value \({L_{{\text{total}}}}\). This approach improves the model’s performance, particularly in predicting high-porosity values, by maintaining geological consistency during the training process.

$${L_{{\text{total}}}} = {L_{{\text{regression}}}} + \lambda {L_{{{nearest\_neighbor}}}}$$
(7)
$$L_{{\text{regression}}} = \frac{1}{N}\sum\limits_{i = 1}^N {{{({{\hat y}_i} - {y_i})}^2}}$$
(8)
$${L_{nearest\_neighbor}} = \frac{1}{N}\sum\limits_{i = 1}^N {d({{\hat y}_i},{\text{center}}({\Theta _i}))}$$
(9)

where \(L_{regression}\) denotes the loss function for the regression task, \(L_{nearest neighbor}\)denotes the seismic phase loss term, \({\hat{Y}}_{i}\) refers to the predicted value of porosity for the ith sample, \({\Theta _i}\) refers to the corresponding true seismic facies label, center (\({\Theta _i}\)) denotes the central value of the porosity corresponding to seismic phase \({\Theta _i}\), and \(d({\hat Y_i},{\text{center}}({\Theta _i}))\)denotes the distance measure between the predicted value \({\hat{Y}}_{i}\) and the porosity center value \({\Theta _i}\) of the true seismic facies.

Model optimization

To optimize the deep learning model, we use a Bayesian optimization algorithm to automatically search for the optimal hyperparameter combination. This process adjusts key elements such as the network structure (number of layers, neurons), training iterations, and loss function parameters. The objective is to minimize prediction error by dynamically selecting the best parameters based on the model’s performance in previous iterations.

Bayesian optimization explores the hyperparameter space and refines the model by updating the parameters after each iteration29. The final set of parameters is the one that achieves the lowest prediction error, ensuring optimal model performance.

$${x^*} = \arg \min f(x),\,\,x \in X$$
(10)

where x represents a set of hyperparameter combinations, X represents the parameter combination space, and \(x^{\ast}\) denotes a set of x in X, such that \(f(x)\) achieves the optimal solution.

SHAP value analysis

The SHAP method, based on Shapley values from game theory, explains model predictions by quantifying the impact of each feature30. SHAP analysis assesses how each feature influences the prediction while accounting for uncertainty in feature values. It calculates the SHAP value for each feature by iteratively adding and removing features, thereby providing a clear understanding of how each input affects the model’s output.

Validation method

The validation method in this study involved using several blind wells that were excluded from the model training process. The deep learning model, developed using seismic attributes and logging data from other wells, was applied to predict porosity in these blind wells. The predicted porosity values were then compared to the actual logging porosity. The prediction accuracy was assessed using MSE, Coefficient of Determination (R2), and Mean Absolute Error (MAE), calculated as follows:

$$MSE = \frac{1}{n}\sum\limits_{i = 1}^n {{{({{\hat y}_i} - {y_i})}^2}}$$
(11)
$${R^2} = 1 - \frac{{{{\sum\nolimits_{i = 1}^n {({y_i} - {{\hat y}_i})} }^2}}}{{{{\sum\nolimits_{i = 1}^n {({y_i} - \bar y)} }^2}}}$$
(12)
$$MAE = \frac{1}{n}\sum\limits_{i = 1}^n {\left| {{{\hat y}_i} - {y_i}} \right|}$$
(13)

where \({\hat{y}}_{i}\)represents the predicted porosity, \({{y}}_{i}\)represents the actual porosity, and n represents the number of samples. A lower MSE, MAE and a higher R2 indicates higher prediction accuracy. This method ensures the model’s reliability in predicting porosity under different geological conditions, especially in ultra-deep fault-controlled fractured-vuggy reservoirs.

Results and discussions

Results of data analysis

Based on previous studies and seismic interpretation experience, we identified four categories of seismic attributes that effectively characterize the reservoir body of this oilfield. From these categories, we selected eight key seismic attributes: TENSOR_ANT, XGNLTD, PHASE_ANT, EIGENVALUE_2ND, EIGENVALUE_3RD, LAMBDA_ANT, LAMBDA_AAGFIL, and IMP. Specifically, IMP falls under the Impedance category; XGNLTD and LAMBDA_AAGFIL are categorized as Coherence attributes; TENSOR_ANT, PHASE_ANT, and LAMBDA_ANT are part of the Ant Tracking category; and EIGENVALUE_2ND and EIGENVALUE_3RD belong to the Tensor category. By collecting seismic and logging porosity data and performing well-to-seismic matching, we obtained a dataset comprising 10,261 data points. After preprocessing steps such as standardization and imputing missing values, the final training dataset was prepared. The statistical properties of the dataset is summarized in Table 1. Porosity values range from 0.003 to 30.594%, with an average of 0.854%, a median of 0.752%, and a standard deviation of 0.912%.

Table 1 Basic statistics used for model training.

Certain porosity values are identified as outliers, which can be attributed to the geological features of the reservoir. Ultra-deep fractured-vuggy reservoirs often feature complex fracture networks and karst structures, including caves and vugs, leading to localized areas with very high porosity. In contrast, regions with denser rock or fewer fractures may show unusually low porosity. These natural variations in rock formations are typical of such reservoirs and account for the extreme porosity values observed.

To further explore the relationship between seismic attributes and logging porosity, we calculated the Pearson correlation coefficient, which measures the linear relationship between two variables. The Pearson correlation coefficient ranges from − 1 to 1, where 1 indicates a perfect positive linear relationship, − 1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship. A value below 0.2 suggests a weak correlation31. As illustrated in Fig. 4, the correlation between individual seismic attributes and porosity ranges from 0.0049 to 0.26, indicating weak linear relationships. These results underscore the need for applying deep learning techniques to model the nonlinear relationships between multiple seismic attributes and porosity.

Fig. 4
figure 4

Thermal map of Pearson correlation coefficients between seismic attributes and porosity. The map shows weak linear correlations between individual seismic attributes and porosity, highlighting the need for deep learning to capture the nonlinear relationships between seismic data and porosity.

Next, we utilized PCA to filter the seismic attributes. This method calculates the loadings for each feature, which indicate how much each feature contributes to the principal components. By examining the absolute values of these loadings, we identified the features with the largest values as important attributes, ensuring that they effectively capture the variation in the data.

The loading values for the various seismic attributes are displayed in Fig. 5. In our selection process, we considered both the completeness of the seismic attribute categories and the contribution of features highlighted by PCA. Ultimately, we chose LAMBDA_ANT, PHASE_ANT, IMP, EIGENVALUE_3RD and LAMBDA_AAGFIL as the predictive feature attributes.

Fig. 5
figure 5

PCA Loadings of Seismic Attributes.Feature loadings for each seismic attribute across the first five principal components (PC1–PC5). Higher absolute loading values indicate greater contribution of the attribute to the component.

Results of evaluation and validation

Model performance

In this study, 80% of the dataset was used to train the deep learning model, while the remaining 20% was reserved for testing its predictive performance. The R2 and MSE for the training set were 0.973 and 0.08, respectively, while for the test set, these values were 0.899 and 0.16. These results suggest that the model fits the training data well and also shows good generalization capability on unseen data.

Figure 6 compares the porosity prediction results from both the unconstrained and facies-constrained deep learning models. The unconstrained model performs well for low-porosity values (< 5%), but its accuracy diminishes for higher porosities due to an imbalance in the training data, with a predominance of low-porosity samples.

Fig. 6
figure 6

Comparison of actual and predicted porosity values from the model: (a) Without the facies constraint; (b) With the facies constraint.

Traditional deep learning models often face this issue, as they tend to focus on the more abundant low-porosity samples, resulting in poor performance for the less common high-porosity values. This limitation arises as the model primarily learns from the abundant low-porosity data, leading to underfitting in the high-porosity regions.

To address this issue, the facies-constrained deep learning model was developed. By incorporating seismic facies as constraints during training, the model is guided to better capture the underlying geological features, resulting in more accurate predictions across the full porosity range, especially in high-porosity regions. As Fig. 6 shows, the facies-constrained model improves accuracy by balancing predictions for both low- and high-porosity samples, ensuring more robust performance.

Verification and comparison of predictive accuracy

To validate the model’s performance, we selected two blind logging wells that were excluded from the model training process. The predicted porosity values from the deep learning model were compared to the actual logging porosity values from these blind wells. To quantify the model’s accuracy, we calculated the MSE between predicted and true porosity values, as shown in Table 2.

Table 2 Comparison of the MSE, MAE, and R2 for porosity prediction between the two methods across the two wells.

Table 2 shows that the MSE of the facies-constrained deep learning model is 76% lower than that of the traditional wave impedance inversion method. This significant improvement underscores the effectiveness of deep learning for predicting porosity, particularly in highly heterogeneous, ultra-deep reservoirs. Additionally, R2 values indicate that the deep learning model explains a greater proportion of variance in the actual porosity values compared to the traditional method. Furthermore, the MAE demonstrates that predictions from the deep learning model are closer to the actual values, confirming its superior predictive performance across all wells assessed.

Comparing prediction results from the two blind wells clearly reveals that the deep learning model outperforms the traditional method in both low-porosity and high-porosity regions (Fig. 7). Notably, in high-porosity regions, the deep learning model provides significantly more accurate predictions, demonstrating its superior ability to handle limited high-porosity samples.

Fig. 7
figure 7

Comparison between the predicted and actual porosity from two blind logging wells. The deep learning model’s predictions closely match the actual logging data, demonstrating its effectiveness in predicting porosity in ultra-deep fractured-vuggy reservoirs.

SHAP analysis and feature importance

By analyzing the SHAP values, we can quantify the percentage contribution of each seismic attribute to the model’s predictions. As illustrated in Fig. 8, LAMBDA_AAGFIL and PHASE_ANT are the most influential attributes in porosity prediction. LAMBDA_AAGFIL reflects the reservoir body structure, while PHASE_ANT captures the fracture structure. Both are critical in ultra-deep fractured-vuggy reservoirs. These findings validate the model’s ability to capture key geological features and provide insights into the relative importance of seismic attributes.

In terms of practical applications, understanding the influence of specific seismic attributes allows for more targeted data collection and model refinement in future reservoir prediction efforts. Future applications could leverage these key attributes to refine porosity prediction models, particularly in complex reservoirs with significant structural heterogeneity. For instance, integrating additional geological data, such as lithofacies or fault intensity, could enhance model accuracy.

Fig. 8
figure 8

Influence of seismic attributes on porosity. This analysis prioritizes the seismic attributes that are most critical for accurate porosity predictions.

Porosity distribution

The final porosity distribution map of the study area highlights high-porosity regions, represented by red and orange areas predominantly concentrated in the middle and upper sections (Fig. 9). These regions correspond to areas around Wells 1 and 2, the most productive wells in the study area. The high-porosity areas align with key geological features, such as bedding structures and fracture zones, validating the model’s ability to capture geological influences on porosity distribution.

Fig. 9
figure 9

Profile of the predicted porosity distribution in the study area. The red and orange regions represent areas of high porosity, while the green and blue regions represent areas of low-porosity. (Image was generated with software Petrel 2018.2 https://www.software.slb.com/products/petrel).

Notably, ultra-deep fault-controlled fractured-vuggy reservoirs exhibit a porosity distribution distinct from that of stratified reservoirs. In these reservoirs, porosity is often governed by the distribution of faults and fractures rather than by layered geological structures. This unique, highly heterogeneous, and non-layered distribution pattern is well-represented in the model, with high-porosity regions concentrated along fault and fracture zones, while areas with minimal fault influence show lower porosity. This distribution aligns with the known geological characteristics of ultra-deep fractured-vuggy reservoirs, validating the model’s ability to reflect the complex geological structures present in the study area.

The smooth color gradient and continuous porosity distribution suggest that the deep learning model effectively captures trends and variations in formation characteristics and porosity, providing highly reliable prediction results. The identification of high-porosity areas, often potential reservoir sites, underscores the model’s practical utility for exploration and development planning.

Limitations and future research

Although the deep learning model shows significant improvements over traditional methods, it still has some limitations. First, the model relies solely on seismic attributes, excluding other geological factors such as fluid saturation, mineral composition, and others. These factors, which can influence reservoir properties, are not directly accounted for in the model, potentially limiting its ability to capture small-scale variations.

Additionally, while the facies-constrained model improves predictions in high-porosity regions, imbalances in the training data may still affect accuracy. High-porosity samples are often underrepresented, which might lead to less accurate predictions in these critical areas.

Future work could focus on incorporating additional data, such as lithology and fluid distribution, to improve the model’s predictive capabilities. Hybrid models that combine deep learning with traditional geophysical methods could also enhance performance across different reservoir types. Additionally, developing better techniques for balancing training data could further improve the model’s accuracy in rare but important high-porosity zones.

Conclusions

In this study, we developed a deep learning-based porosity prediction method that integrates multiple key seismic attributes and incorporates seismic facies constraints to address the challenges of predicting porosity in ultra-deep, fault-controlled fractured-vuggy reservoirs. The method achieved a 76% reduction in MSE during blind well predictions, significantly enhancing predictive accuracy. The key findings are as follows:

  1. 1.

    We established a nonlinear relationship model between seismic attributes and porosity using deep learning, effectively leveraging detailed subsurface structural information within the seismic data. This capability allows for more accurate capture of porosity distribution under complex geological conditions.

  2. 2.

    The introduction of seismic facies constraints significantly improved the model’s performance in predicting high-porosity areas, addressing the shortcomings of traditional methods in these scenarios.

  3. 3.

    SHAP analysis revealed the critical role of seismic attributes related to fracture-cave and fault structures, particularly identifying LAMBDA_AAGFIL and PHASE_ANT as the most influential attributes for porosity prediction. This underscores the importance of these geological features in reservoir characterization.

This approach holds significant potential for application in other complex reservoir environments, especially those with high heterogeneity and intricate geological structures. Future research could explore the integration of additional geological constraints and the development of hybrid models that combine deep learning with traditional geophysical methods, further enhancing prediction accuracy and adaptability.