Introduction

Motivation

Creep constitutive models are typically derived by human experts through the meticulous examination of creep data. One of the first creep constitutive equations was developed by Andrade in 1910 to predict primary creep deformation in soft metals1,2. In the 1950’s a succession of creep-rupture equations called Time-Temperature Parameters were developed to predict the creep-rupture of materials3,4,5. The first and most popular, the Larson-Miller parameter, (developed in 1952) is still employed today3,6. Constitutive equations such as MPC Omega, Theta projection, Wilshire, Kachanov-Rabotnov, Sine-Hyperbolic (Sinh), Wilshire-Cano-Stewart, TTC, etc. have evolved from these early efforts to predict the minimum-creep-strain-rate, creep rupture, and deformation of materials7,8,9,10,11,12. In contrast to these human-derived equations, machine learning (ML) has emerged as an alternative approach to predicting creep behavior. Machine learning empowers us to discover patterns and trends in data that would be impossible to discern through human intuition alone. Typically, ML is applied as a “black-box”, where the functional relationship between variables is not transparent but a hidden feature. This study seeks to leverage “glass-box” machine learning (ML) to discover creep equations that offer human interpretable insight into strengthening mechanisms and how these features correlate to creep resistance.

Fig. 1
figure 1

The schematic of human-supervised machine learning. Tasks are split between a human and machine. Certain tasks are human-exclusive (e.g. ML Parameters) while others are machine-exclusive (e.g. Run ML Program). Several tasks (e.g. Data Analysis, Data Splitting, etc.) are performed in combination with humans and machines and are typically guided by humans.

Machine learning

Machine Learning (ML) is a computational process that leverages data to achieve a specific outcome and improve performance13. Machine learning has had a cross-cutting impact on many scientific disciplines because it enables the discovery of complex relationships in data14,15. Machine learning has been applied to predict mechanical16,17,18and physical properties19,20, for classification in microstructural analysis21,22, and to accelerate alloy development23,24. Machine learning methods do NOT consider physics readily; rather, ML minimizes an objective function to generate a desired output25. Physics may be injected into the objective function to constrain the ML algorithm towards a physics-based solution; however, the physics should be applied and how it impacts the objective function is the domain of a human.

To enable true scientific discovery, human-supervised ML is recommended as illustrated in Fig. 1. In human-supervised ML, a human adds their knowledge throughout the calibration process to achieve higher correlations. For example, a human can inspect data and identify outliers, incomplete data, and false positives based on their experience. Humans have scientific knowledge and can add physical equations as constraints or training data to enable physics-informed ML algorithms. During training and testing, a human decides how to split the data to ensure effective correlation. Most importantly, humans are the final authority on the ML solution. A human must interpret the ML outputs and determine whether those outputs are sound. Machine learning in general, is often referred to as a “black-box” since an equation is NOT the output; rather, ML generates a trained and highly complex “model” that is outside the realm of human understanding26,27,28. Interpretability comes from parametrically exercising the “model” over a range of inputs, plotting the output, and examining the results. Since some of the relationships in the “model” are highly non-linear and involve multiple inputs, interpretability is not always achieved.

Machine learning in Creep

Machine learning in creep is chiefly applied to predict creep properties and performance. There is a plethora of instances where black-box ML is employed for creep-rupture (CR) prediction29,30,31,32,33,34,35,36,37,38. Algorithms such as Back Propagation Neural Networks29, Divide and Conquer Self-Adaptive30, Bayesian Neural Networks32, Artificial Neural Networks33,34, Soft constrained algorithms36, xBoost37,38, and others have been applied to the creep problem. These algorithms have predicted rupture out to 105hours31. Notably, the prediction accuracy is high in the range of 90–97.5%29,30,31,32,33,34,35,36. Most ML models have been employed to correlate stress, temperature, and rupture29,30,31,32,34; however, a few have included chemistry, heat treatment, and/or product form33,35,36.

Several authors have laid out the path to improve ML applications in creep. Juan et al. outlined the lack of a unified database and the lack of physical and chemical significance of ML models as a drawback to accelerated material discovery39. Shin et al. proposed to correlate the ML models with the important alloy features to determine the overall creep response40. Peng et al. pointed out the effects of uncertainty sources in ML models and the resulting scatter in the creep response of high-temperature alloys27. Ling et al. called for the replacement of replicated experiments with ML models to predict material properties at a reduced cost41. Efforts were made to employ time temperature parameter models and generative adversarial network to generate synthetic data, train and test ML algorithm, and improve prediction accuracy42,43. Zou et al. employed Shapley Additive Explanations values and input feature importance ranking of creep rupture life to interpret black box ML algorithm37. In all the above, ML does not produce human-interpretable equations but rather a black-box model that exhibits limited interpretability. If the objective is simply prediction accuracy, black-box is ideal. If the objective is to discover the fundamental relationships in materials, then human interpretable equations are preferred. A potential solution to this problem is genetic programming (GP) with symbolic regression.

Fig. 2
figure 2

Illustration of MGGP algorithm with symbolic regression. Gene 1 and Gene 2 undergo crossover and mutation to evolve into a new equation. The tree depth, td increases due to a mutation in Gene 2.

Genetic programming with symbolic regression

Genetic programming (GP) simulates the biological evolution of living organisms44. Symbolic regression solves problems by determining the mathematical expressions that best fit a given dataset. A genetic programming algorithm with symbolic regression is illustrated in Fig. 2. The genes evolve to solve your problem. The gene has a tree structure that consists of mathematical expressions \(\left( {+,{\text{ }} - ,{\text{ }} \times ,{\text{ }} \div ,{\text{ }}\sinh ,{\text{ }}\cosh ,{\text{ }}\tanh ,{\text{ }}\exp ,{\text{ etc}}{\text{.}}} \right)\)and constants45,46. During gene evolution, the mathematical expressions are selected, mutated, and sub-grouped to form an equation that solves a problem.

In Multigene Genetic Programming (MGGP), multiple generations of genes evolve in parallel and undergo selection, crossover, and mutation to acquire and delete genes. Many genes, i.e., equations, are generated. During the selection process only, the fittest will survive, based on a complex probabilistic process45,46,47,48. The genes with the best ability to predict the data “survive” a generation are transferred directly and/or combined with other surviving genes. This process iterates until a user-defined limit has been reached, such as (1) a desired predictive performance, (2) a specific number of genes or generations, and/or (3) a particular amount of time has elapsed. In GP and MGGP, the equations can become complex. To achieve human-interpretable equations, a human must define the allowable mathematical expressions, gene size, tree depth, and mathematical/physical constraints to be enforced.

A few authors have applied GP with symbolic regression to discover new creep equations. Baraldi et al. explored GP with symbolic regression to discover a creep-strain-rate equation as a function of stress, temperature, and natural logarithm of creep strain49. The resulting human-interpretable equation compared favorably to the codified logistic creep strain equation and a black-box NN model. Hossain et al. leveraged MGGP with symbolic regression to develop creep-rupture equations as a function of stress and temperature50. The study revealed gene size and tree depth affect the complexity of equations. Thus far, GP/MGGP creep equations have only involved stress and temperature inputs. The current study investigates the inclusion of stress, temperature, and chemistry. The research question is can we mathematically define the role of chemistry in creep-rupture? How is chemistry related to the strengthening mechanisms of creep resistance?

Objectives

This study seeks to discover, a mathematical relationship between stress, temperature, properties, chemistry, and creep-rupture of a superalloy. Creep data for alloy 617 is gathered from seven (7) sources with variations in chemical composition. The data is analyzed and separated into training, testing, and post-audit validation sets. An MGGP algorithm with symbolic regression is employed to generate equations. The algorithm is systematically optimized to generate equations. The optimal equation is selected based on statistical goodness and human evaluation. The equation is compared to the training and test data and separately held post-audit validation data. The mathematical form of the equation is analyzed to determine the effect of chemistry on creep resistance. The equation is used to determine an optimal alloy composition for Alloy 617 that maximizes creep strength. CALPHAD calculations of the nominal and optimal chemistry are carried out to determine the phases, phase fractions, and stability at equilibrium. When compared, the sources of creep strengthening/weakening are revealed. Finally, the MGGP equation and its opportunities and limitations are discussed.

Material

Alloy 617 is a solid solution strengthened alloy with nominal Ni-Cr-Co-Mo composition specifically designed for high-temperature applications51. Alloy 617 has been employed in nuclear components, intermediate heat exchangers, combustion liners for turbine engines, and pressure vessels, among others, as outlined in the ASME Boiler and Pressure Vessel code52.

Creep data is collected from seven (7) data sources (INL, KAERI, NIMS, Huntington, ORNL, Cook, and GE) including a range of heats and product forms (rod, plate, rod, bar, CR sheet, and forging)51. The nominal chemical composition from each source is summarized in Table 1. In total, 306 data points were collected, of which 252 were used for training and testing, and 54 for post-audit validation. The creep-rupture data spans 6.2–537.8 MPa, 13–40,100 h, and 593–1149 °C. The test count is 306 with a total duration of 1,177,853 h (≈ 134 years).

Tensile and yield strength as a function of temperature is available from only 3 data sources (INL, Hunting, ORNL)51. Tensile information is not available for all sources. The room temperature tensile strength of ORNL, 750 MPa, is selected as the baseline property for all data. The melting temperature ranges from 1332 to 1380 °C53. The mean melting temperature of 1350 °C is selected as the baseline property for all data.

Table 1 Chemical composition (wt%) of Alloy 617.
Fig. 3
figure 3

Key microstructural features of alloy 61754.

Literature offers insight into the creep-strengthening mechanics of Alloy 617 as illustrated in Fig. 354. The primary strengthening mechanisms in alloy 617 are solid solution and precipitation strengthening. Solid solution strengthening comes from Co and Mo. The element Co contributes to solid solution strengthening which impedes diffusion to enhance creep resistance60. The element Mo has the dual effect of solid solution strengthening and as a catalyst for γ’formation55. Precipitation strengthening is imparted by the intermetallic gamma prime γ’ (Ni3Al) phase. Gamma prime γ’ helps to prevent dislocation motion via γ’-enriched bands within grains as shown in Fig. 3. The key carbides are FCC crystal structured M23C6 (M = Cr + Mo) and M6C(M = Mo)55,56,57,58,59,60,61,62,63,64. These carbides form alongside γ’ creating a strengthening zone within grains. They also form along grain boundaries and offer resistance to crack propagation as shown in Fig. 3. Beyond forming carbides, Cr promotes oxidation and corrosion resistance56,57. Additional precipitates are topological closed-packed phases µ and titanium carbonitride TiCN. The µ is a strengthener; however, the presence of µ is not detected after aging at temperatures from 649 to 1093 °C58. If the µ phase is detected, it appears near grain boundaries in the vicinity of carbides as shown in Fig. 3. The TiCN appears in a 400–1000 °C temperature range. Grain boundary stability is promoted by Co, Mo, B, and Mn55. Centered on the strengthening mechanisms, the key alloying elements are Ni, Cr, Co, Mo, Al, Ti, and C, respectively.

The tolerant elements Ti, B, Cu, and S can be beneficial if held at trace amounts; but offer deleterious effects on creep resistance at high concentrations56. Titanium readily forms carbides, nitrides, oxides, and carbonitrides, which remain suspended in the melt and hinder their incorporation into the solid material as desired60. Copper induces embrittlement and reduced ductility. At elevated temperatures, Cu can contaminate the γ’ phase, diminishing creep resistance. The structures that form within alloy 617 greatly influence creep resistance. The element ratios are provided in Table 2. The ratios represent a simplification of strengthening precipitates γ’ (Ni3Al), carbides (M23C6 and M6C), and titanium carbonitride (TiCN).

Table 2 Element ratio of key element in alloy 617. (wt%/wt%).

Multi-Gene Genetic Programming (MGGP)

A Multi-Gene Genetic Programming (MGGP) algorithm with symbolic regression is employed to generate creep equations65,66. The pre-processing steps (data analysis, input feature screening, and data splitting) and post-processing steps (model extraction and evaluation, model refinement) involve both a human and the machine; where the machine provides statistics that a human interprets to make decisions. The optimization of the MGGP algorithm parameters and the selection of the optimal creep equation are human tasks informed by iteratively tuning the algorithm until a suitable discovery is made.

Data analysis

The Alloy 617 data is analyzed. Interrupted and ongoing tests are removed51. Normalization is performed for dimensional homogeneity. Stress, σ is normalized by room temperature tensile strength to furnish σ/UTS. Temperature, T (℃) is converted to Kelvin (K) and normalized by melting temperature to furnish T/Tm. Rupture, tr (hr) is transformed into a natural logarithm, ln(tr).

Input feature screening

Input feature screening eliminates inputs that do not correlate to creep-rupture. There are 21 input features including normalized stress, normalized temperature, alloying elements (13), element ratios (4), and mechanical properties (2). Input features are evaluated statistically where the Pearson correlation coefficient, r is used to measure the correlation intensity, and the p-value is used to accept or reject the null hypothesis. The value ranges from − 1 to + 1, indicating the strength and direction of the correlation between the two features59,60,61. The p-value determines the significance of a relationship. The threshold for significance is p ≤ 0.05 where the probability of observing the obtained results by random chance is less than 5%. A highly significant relationship is realized at p ≤ 0.01. Input features with p ≥ 0.05 should be discarded from equation discovery unless human intervention is required.

Fig. 4
figure 4

Heat map of Pearson correlation coefficient, r and p-value for Alloy 617. The color codes represent the type of correlation while the size of the “O” symbol represents the strength of correlation.

The heat map of the Pearson correlation coefficient, r with p-value is depicted in Fig. 4. The p-value test finds 3 significant (Ni, Mn, and UTS) and 7 highly significant input features (T/Tm, Co, Cu, C, B, Cr/C, and Mo/C). The input feature σ/UTS is not found to be significant even though σ is a critically important feature for creep rupture prediction. This lack of significance is likely due to normalizing σ by a fixed room temperature UTS rather than the UTS at actual temperature per Heat (the data of which is unavailable). It is also observed that the ratio Ni/Al is not significant. The Ni/Al ratio represents a simplification of γ’ (Ni3Al) which is a key strengthening phase. Human expertise gives the ability to include or exclude certain features, even if the statistical rationale suggests otherwise. Note the MGGP algorithm during optimization may discard additional inputs. The resulting input features that passed screening and meet human expertise are listed in Table 3.

Table 3 Input features.
Fig. 5
figure 5

Data splitting of (a) training and testing and (b) post-audit validation.

Data splitting

The data is split into three datasets: training, testing, and post-audit validation as shown in Fig. 5.

  • Training and testing set – INL, KAERI, NIMS, and Huntington.

  • Post-audit validation set – ORNL, Cook, and GE.

The training and testing data are randomly split into 70% training and 30% testing sets unbiased to temperature, stress, chemical composition, or data source. The training and testing sets exhibit a similar range of scatter with no discernible trend. The post-audit validation data is held separately and later used to evaluate the MGGP equation’s ability to predict “hidden data” based on alloying elements.

Algorithm

The MGGP algorithm includes many parameters that influence equation generation62. It is necessary to calibrate these parameters to furnish high-fitness and low-complexity equations. The MGGP algorithm is probabilistic and includes Monte Carlo methods at almost every decision point. When uncalibrated, each run will result in a different set of elite equations. Once properly calibrated, consistent elite equations should be observed where the elite equations per run are not identical but exhibit similar complexity, input features, and mathematical expressions.

In terms of the structure of equations, multigene is enabled. Multiple genes can evolve and be additively or subtractive connected to furnish an equation, (e.g. see Fig. 2). The gene size, g controls the maximum number of genes that can be employed. The maximum tree depth, \({t_d}\), and mutation depth, \({t_m}\) control the depth of mathematical operators within a single tree and the depth at which mutation operations can be performed. These parameters control the maximum complexity of a given equation. The optimal range for the creep-rupture problem exists between

$$6 \leqslant g+{t_d} \leqslant 8$$
(1)

obtained through parametric optimization50. Herein, \(g={t_d}=4\) is set.

Fig. 6
figure 6

MGGP Parameter Concepts.

The MGGP algorithm concept is illustrated in Fig. 6. Population size, p is the number of equations per generation. A randomly seeded initial population is built, generation i. The larger the population, the more likely good initial equations will exist within the first generation; however, the larger the population, the more number of generations, \({n_g}\) are needed to genetically improve fitness. Fitness and complexity are evaluated through a selection process. Two types of selection are performed: Pareto and Elitism. The selection type is a user-defined probability. During Pareto selection, a tournament size, ntof equations are randomly selected from the current generation63. Equations can participate in multiple tournaments, increasing selection pressure. The nt equations are Pareto sorted using fitness and complexity as competing objectives to obtain a rank 1 Pareto set of equations that are non-dominated in terms of both fitness and complexity. Finally, a random equation from the rank 1 pareto set is then returned. The equations returned from Pareto tournaments become the parents of the next generation. Elitism selection is also performed separately from Pareto. During elitism, the current generation is sorted by best fitness and lowest complexity. A user-defined elite fraction is then stored for the next generation.

With the selection process completed, the next step is to build a new generation i + 1 from the selected individuals. The elite fraction is directly transferred to the new generation. The remaining equations are generated through genetic operations performed on the selected individuals. Mutation, crossover, or direct copy are performed with a user defined probability. Six types of mutation operations are possible but only 3 are enabled by default (Ordinary Koza Style subtree mutation, mutating a non-constant terminal to another non-constant terminal, and constant perturbations sampling a Gaussian distribution). Crossover can occur either at a high-level (where 1 to g genes are exchanged randomly and independently for each parent with a given probability) or low-level (where a single random gene is exchanged between parents). Direct copy is where a selected parent is directly copied to the next population. Direct copy operates similarly to the elite fraction, but it is performed on equations that have passed Pareto selection. Selection and population building repeat until convergence criteria have been met or the user-defined number of generations, \({n_g}\) has been reached.

In a single run of the MGGP algorithm, model fitness is constrained by the initial randomly seeded population. The initial population may not hold a good mixture of initial equations. To overcome this issue, the MGGP algorithm can be operated with multiple runs, r. By executing multiple runs, multiple initial populations can be seeded. These populations evolve independently and in parallel. The final populations are merged and Pareto sorted to furnish the fittest equations. Thereby a more global solution is obtained. The total number of equations, N built by MGGP is calculated as

$$N=p \cdot {n_g} \cdot r\left( {1 - elite - copy} \right)$$
(2)

assuming no convergence before \({n_g}\) has been reached. The merge size, n of the final populations

$$n=p \cdot r$$
(3)

is subjected to Pareto selection to furnish the fittest equations.

Fig. 7
figure 7

MGGP parameter optimization (a) population, p and (b) generations, \({n_g}\), runs, r as a function of (c) complexity and (d) R2, Mutation as a function of (e) complexity and (f) R2, Tournament Size as a function of (g) complexity and (h) R2.

Parameter optimization

The MGGP algorithm must be optimized. Numerous parameters can be tweaked to improve convergence. It is not feasible to optimize them in parallel. Herein, a series approach is taken where we start with the default parameters, and sequentially optimize only the key parameters that control “gross” convergence. Note, that this initial optimization process is performed on a workstation computer (4 cores, 32GB RAM) and is thus constrained by solve time.

Before optimization, the function nodes must be selected. The function nodes are basic (+, -, ×, ÷) and complex (neg, log, power, exp, negexp, mult3, add3, sinh, sqrt, cube). These function nodes are selected because they exist in human-derived creep equations.

Parameter optimization was performed by testing the influence of various parameters on fitness and model complexity as shown in Fig. 7. The convergence of population and generations is shown in Fig. 7 (a) and (b) in terms of RMSE. Convergence is observed to exponentially soften to a saturation point at p > 1000 and \({n_g}\)> 200 respectively; with a ratio of 20%. The convergence of runs, r is shown in Fig. 7 in terms of (c) model complexity and (d) R2, respectively. The data points correspond to the equation with the highest R2 and lowest complexity in the Pareto front of the final generation. Convergence is observed at r > 40 where model complexity and R2 appear to exponentially hardened to a saturation point. The genetic operations of mutation, crossover, and direct copy must hold the following constraint m + c + d = 1. Since direct copy is redundant to elitism, it is fixed to 0.02 and the relationship between mutation and crossover is explored. The convergence of mutation, m is next. A linear relationship is observed between complexity and mutation with an expanding uncertainty as shown in Fig. 7 (e). This uncertainty is reflected in the R2 values where increasing mutation aids fitness but inconsistently as shown in Fig. 7 (f). Mutation, cross-over, and direct copy are set to 0.475, 0.475, and 0.05 respectively that offer a good balance of low complexity and high fitness with some certainty. Finally, tournament size, nT is evaluated as shown in Fig. 7(g) and (h). It is observed that both complexity and R2 spike at a tournament size of 150 which is 15% of the total population.

The optimal workstation parameters are summarized in Table 4. While a workstation is useful, it would be ideal to scale up the population size, generations, and tournament size to expand the breadth and depth of exploration. The Ohio Supercomputing Center was leveraged to the scale-up parameters as shown in Table 4. The chemistry equation was obtained using the Ohio Supercomputing Center.

Table 4 MGGP Parameters employed in Workstation and Supercomputing.
Fig. 8
figure 8

Pareto front of fittest equations shown as R2 versus complexity.

RESULTS

MGGP-Derived equation

The optimal equation is human-selected. The MGGP algorithm generated 400,000,000 equations. The final merged population was 800,000 equations. Pareto selection reveals 314 elite equations. The Pareto front of these elite equations is shown in Fig. 8. A region of interest is designated where \({R^2}\)> 0.88 and complexity ranges from 50 to 100. Within the region of interest, the equations are ranked in the following order (1) a high number of chemistry inputs, (2) high \({R^2}\), and (3) low complexity. Furthermore, a choice was made to select an equation that offers conservative creep-rupture predictions when chemistry is evaluated outside the ASTM limits. Following this rationale, the optimal equation is selected. The optimal equation is

$$\begin{gathered} \ln ({t_r})={\alpha _1}\exp \left( { - \frac{{\left( {({C^{Ni/Al}}{\raise0.7ex\hbox{${Mo}$} \!\mathord{\left/ {\vphantom {{Mo} C}}\right.\kern-0pt}\!\lower0.7ex\hbox{$C$}}\sinh \left( {{\raise0.7ex\hbox{${Mo}$} \!\mathord{\left/ {\vphantom {{Mo} C}}\right.\kern-0pt}\!\lower0.7ex\hbox{$C$}}} \right)} \right)}}{{{\raise0.7ex\hbox{$\sigma $} \!\mathord{\left/ {\vphantom {\sigma {{\sigma _{TS}}}}}\right.\kern-0pt}\!\lower0.7ex\hbox{${{\sigma _{TS}}}$}}}}} \right) - {\alpha _2}\ln \left( {{\raise0.7ex\hbox{$T$} \!\mathord{\left/ {\vphantom {T {{T_m}}}}\right.\kern-0pt}\!\lower0.7ex\hbox{${{T_m}}$}}} \right)... \hfill \\ {\text{ }} - {\alpha _3}{\raise0.7ex\hbox{$\sigma $} \!\mathord{\left/ {\vphantom {\sigma {{\sigma _{TS}}}}}\right.\kern-0pt}\!\lower0.7ex\hbox{${{\sigma _{TS}}}$}}\sqrt {{\raise0.7ex\hbox{${Cr}$} \!\mathord{\left/ {\vphantom {{Cr} C}}\right.\kern-0pt}\!\lower0.7ex\hbox{$C$}}+{\raise0.7ex\hbox{${Ni}$} \!\mathord{\left/ {\vphantom {{Ni} {Al}}}\right.\kern-0pt}\!\lower0.7ex\hbox{${Al}$}} - 8.17} - {\alpha _4}{\raise0.7ex\hbox{$T$} \!\mathord{\left/ {\vphantom {T {{T_{TS}}}}}\right.\kern-0pt}\!\lower0.7ex\hbox{${{T_{TS}}}$}}\ln \left( {B+2{\raise0.7ex\hbox{$\sigma $} \!\mathord{\left/ {\vphantom {\sigma {{\sigma _{TS}}}}}\right.\kern-0pt}\!\lower0.7ex\hbox{${{\sigma _{TS}}}$}}} \right) - {\alpha _5} \hfill \\ \end{gathered}$$
(4)

where, \({t_r}\) is rupture time (hr), \(\sigma\) is the applied stress (MPa), \({\sigma _{TS}}\) is the room temperature ultimate tensile strength (MPa), T is the temperature (K), \({T_m}\) is the melting temperature (K), C and B are elements (in wt%), and Ni/Al, Mo/C, Cr/C are element ratio (in wt%/ wt%). The expressional complexity of the equation is 75. The equation has 10 input features. The material constants, \({\alpha _i}\) are summarized in Table 5. The equation is fully calibrated, and no further calibration is required to predict rupture as a function of stress and temperature; only chemistry. The optimal equation is moderated by the following constraint.

$$100\% =Ni+Al+Mo+Cr+C+B+others$$
(5)

where any alloying adjustments are applied to the Ni-base

$$Ni=100 - \left( {Al+Mo+Cr+C+B+others} \right)$$
(6)

where others, are the remaining alloying elements not being adjusted. Mathematically, the equation has a form similar to time-temperature parameter models6. The ECCC V5 guideline calls for equations that are free of inflection points, cross-over, and/or turn-over for any combination of stress and temperature67. Each term within the equation uniquely impacts rupture time. Term 1 is a stress and alloying strengthening/weakening. Term 2 is a thermal weakening where increasing temperature reduces rupture. Term 3 is a mechanical and carbide strengthening/weakening where increasing Cr and Ni increases rupture. Term 4 is a thermo-mechanical strengthening/weakening. Term 5 is a bias term for the baseline creep strength.

Table 5 The material constants of the MGGP equation.
Fig. 9
figure 9

Mapping of the MGGP Equation to the Materials Science Pyramid.

The equation is mapped to the materials science pyramid as shown in Fig. 9. The equation offers a mathematical relationship between properties, chemistry, and performance. Given reasonable estimate of UTS (due to processing) and melting temperature, creep-rupture can be predicted from chemistry alone. Three questions arise.

1. Can the equation predict creep-rupture from known and unknown chemistry?

2. Can the equation be employed to discover new chemistry with improved creep resistance?

3. If we find a new creep-resistance chemistry, how do we validate it without making the alloy?

Fig. 10
figure 10

MGGP-derived equation predictions vs. rupture data for (a) training and (b) testing set.

Predictions

To answer the first question, the equation is compared to the training and testing data Fig. 10. The predictions and experiments are in agreement. The data resides along the diagonal (black solid line) excluding a few outliers. The training and testing data carry a coefficient of determination, \({R^2}\) of 0.904 and 0.893 showing a stronger correlation in the training data. Improved correlation in training data is likely due to the 70/30 data splitting ratio.

Fig. 11
figure 11

Creep-rupture predictions of training and tests data (a) INL, (b) KAERI, (c) Huntington, and (d) NIMS. The solid lines are MGGP equation predictions corresponding to creep data, dashed-dot lines are interpolations, and dashed lines are extrapolations.

Predictions of creep-rupture from the chemistry of training and test data are shown in Fig. 11. The chemistry of each source is input into the equation along with the baseline UTS and average melting temperature. Visually, the predictions are in excellent agreement with the data (solid lines). Interpolations (dashed-dot lines) and extrapolations (dashed lines) are equally spaced relative to each other and meet the visual expectation for this material. Quantitatively, INL and NIMS exhibit the best predictions with R2 at 0.889 and 0.959, respectively. The worst fit is observed in Huntington data with a low \({R^2}\) of 0.332. Additional error metrics, including Normalized Mean Square Error (NMSE), Root Mean Square Error (RMSE), Mean Squared Error (MSE), and Mean Absolute Error (MAE) are summarized in Table 6. Note, that the density of data has an impact on R2 measures. Huntington illustrates that replicated creep data harms the accuracy of statistics while the equation is observed to track the average trend in the data.

Fig. 12
figure 12

Creep-rupture predictions of post-audit validation data (a) ORNL, (b) Cook, and (c) GE. The solid lines are MGGP equation predictions corresponding to creep data, dashed-dot lines are interpolation, and dashed lines are extrapolation.

To assess the ability to predict creep-rupture from “hidden” chemistry, creep-rupture predictions of post-audit validation are shown in Fig. 12. Post-audit validation data is held blindly and not used during training or testing. The chemical compositions of these data sources are input into the equation with the baseline UTS and average melting temperature. Visually, the predictions are in agreement with the data. Outliers are observed in the ORNL data at the lowest isotherms and the GE data at the hottest isotherms. Quantitatively, the best prediction is observed for Cook with an R2 of 0.852. Overall, the equation effectively predicts the creep-rupture from the chemistry of materials both seen and unseen data. Supplementary error metrics are summarized in Table 6.

Table 6 Error metrics of creep rupture predictions compared to experimental data.
Fig. 13
figure 13

Creep-rupture of optimized chemistries compared to all available creep data.

Chemistry optimization

To answer the second question, the equation is employed to determine an optimal chemistry for creep resistance. The creep conditions of 10–420 MPa and 900 °C are targeted. A temperature of 900 °C is chosen due to the availability of the most data points. An evolutionary non-linear global algorithm that mimics the process of natural selection was employed for optimization. The target is to maximize the following objective function

$$Obj{\text{ = }}\sqrt {\frac{{\ln \sum {{t_t}} }}{n}}$$
(7)

where \({t_r}\) is rupture time and n is total data points at a 5 MPa interval. The constraint [Eq. (5)] is held. Other is set to 13.324 equivalent to ORNL the strongest Heat. The elements Ni, Cr, Mo, Al, C, and B are varied, and the resulting, Ni/Al, Mo/C, and Cr/C ratio is calculated and input into the equation.

Chemistry optimization was performed at the ASTM limits and ± 10%. Creep rupture is improved as shown in Fig. 13 where both optimizations furnish virtually identical predictions with Obj equal to 0.449. Interestingly, the chemistry in these two cases was not identical, suggesting the solver has reached a plateau where the MGGP equation becomes insensitive to further chemical change. Mathematically, C and B are driven to maximize and minimize, respectively. Maximizing C is related to the formation of strengthening carbides such as M23C6 and M6C57,58,59,60,61,62,63,64. Boron is a narrowly tolerant alloying element whose mechanisms are not fully understood56. Mathematically, it is driven to minimize but cannot be zero due to ln(0)=-∞. Trace amounts of B in Alloy 617 benefit ductility and spallation resistance where B is known to segregate at grain boundaries and matrix/carbide interfaces68. A modified derivative of Alloy 617 with boron called Alloy 617B exists to take advantage of narrow additions of Boron to the chemistry; however, data for this Alloy 617B is not available in this effort68. In the available data, B has a small magnitude and high COV (as shown in Table 1) and exhibits relative insensitivity to creep strength. Subsequently, the MGGP equation is insensitive to B and tends to minimize this element.

As the sensitivity of the objective function deteriorates. further expansion of the ASTM limits will result in the same outcome and a wider chemistry range loses credibility/conservatism. Subsequently, the optimal chemistry was defined as those within ± 10% ASTM. The optimized chemistry is provided in Table 7.

Table 7 Optimal Chemistry (in wt%).
Fig. 14
figure 14

Creep-rupture predictions of ORNL and optimized data chemistry ranging from 850 to 1000 °C. Solid lines are ORNL, and dashed lines are optimized data predictions.

Parametric creep rupture predictions ranging from 850 to 1000 °C of the ORNL and optimized chemistry is shown in Fig. 14. In both cases, the predicted isotherms are evenly spaced. The optimized chemistry exhibits notable improvement in creep resistance when compared to ORNL. Furthermore, the optimized chemistry exhibits a slightly higher UTS compared to ORNL at elevated temperatures.

Fig. 15
figure 15

Equilibrium phase diagram of (a) ORNL, (b) optimized ASTM chemistry, and (c) comparison of ORNL (solid lines), and optimized ASTM (dashed lines) chemistry.

CALPHAD

To address the final question, calculated equilibrium phase diagrams for the optimized chemistry are generated. The equilibrium phase diagram for the ORNL and the optimized chemistry are shown in Fig. 15(a)-(c). The thermodynamic calculations are carried out using the Thermo-Calc software and TCNI12 database70.

For ORNL, Cr-rich M23C6 carbides are metastable, completely dissolving between 775 and 1000°C to form M6C carbides before reforming briefly until they completely disappear at 1100°C. The γ’ strengthener is stable up to 700 °C. The deleterious σ phase precipitates below 850 °C55; however, the σ phase takes an extraneous amount of time to precipitate and is not detected in aging alloy 617 experiments.

For the optimized chemistry, M23C6 precipitates at a higher phase fraction and persists out to 1200°C with only minor dissolution between 775 to 1000°C to form metastable M6C carbides. The γ’ is stable up to 800 °C. The σ phase remains unchanged. Interestingly, once M23C6 dissolve at 1200 °C, short-lived M7C3 carbides appear up to the solidus temperature. (Cr, Fe)7C3carbides exhibit high hardness and wear resistance71. It might be possible to anneal and rapidly quench to precipitate these carbides into the microstructure; however, subsequent exposure to elevated temperature during creep will result in dissolution over time. Overall, the optimized chemistry has strengthening phases that exist at a high phase fraction and persist over a wider temperature range. CALPHAD suggests that the optimized chemistry will produce a microstructure that is more stable and exhibits better creep resistance.

Discussion

The MGGP equation predicts creep rupture as a function of stress, temperature, chemistry, and chemical ratio. In the materials science pyramid, processing was ignored and structure was evaluated post calibration as illustrated in Fig. 9. While the post-audit validation and CALPHAD simulations have validated the feasibility of the MGGP equation; it is unclear if the MGGP equations represents a fundamental relationship between performance and chemistry or rather a closed-form phenomenological equation. There is a need to generate a high-fidelity database of chemistry, processing, structure, and properties data of reasonable quantity to enable a deeper discovery of fundamental relationships within any specific alloy.

The addition of physical equations (as mathematical constraints or input data) will enhance the MGGP algorithm’s ability to discover fundamental equations. The chief advantage of the MGGP algorithm over “black-box” machine learning, is the output of a closed-form equations that can be integrated by a human. This transparency enables a human to apply constraints to the algorithm, modify algorithm parameters, and make decision concerning the acceptance, reject, or modification of elite equations. For example, basic concepts such as dimensional homogeneity can be employed. Extrapolation ability can be tests through a mathematical exercise of the equations at known extreme input conditions (e.g., ultimate tensile strength, zero stress, melting and room temperature, etc.). Nevertheless, the ideal next step for this effort would be to manufacture the optimized chemistry into samples for characterization and testing to verify CALPHAD and predict creep-rupture results.

Conclusions

This study aims to explore the application of multi gene genetic programming algorithm to discover new creep equations. The MGGP algorithm is employed on Alloy 617 data to discover a creep-rupture equation. The key outcomes are as follows.

• The MGGP equation is a function of stress, temperature, chemistry, and chemical ratio. The mathematical terms relate to thermomechanical and alloying strengthening/weakening, respectively.

• The fitness of the MGGP equation to training, test, and post-audit validation data exhibited a high \({R^2}\) measures. Post-audit validation confirms the applicability of the model to unseen data; both qualitative and quantitatively. Caution should be exercised when evaluated \({R^2}\), as the quantity and scatter in data can influence measurements.

• An optimized chemistry is obtained within +/−10% ASTM limits. The optimized chemistry exhibits improved rupture strength compared to data sources.

• Equilibrium calculated phase diagrams (CALPHAD) of optimized chemistry show the evolution and stability of favorable carbides and gamma-prime phase over a wide temperature range.

In the future, the physical metallurgy will be explored, and the proposed optimized chemistry will be developed, followed by creep testing.