Molecular insights on the solvent screening for the benzene extraction from fuels using ionic liquids via QSPR method

Amereh, Mahdieh; Gorji, Ali Ebrahimpoor; Sobati, Mohammad Amin

doi:10.1038/s41598-024-79639-x

Open access
Published: 28 December 2024


Scientific Reports volume 14, Article number: 30718 (2024)


Abstract

Benzene separation from hydrocarbon mixtures is a challenge in the refining and petrochemical industries. Liquid–liquid extraction using ionic liquids (I.Ls) is one option for this separation. Selecting the most appropriate I.L. for this application is challenging because of the wide variety of anion and cation structures. In the current study, the benzene distribution between the aliphatic hydrocarbon-rich and I.L.-rich phases has been evaluated using the Quantitative Structure–Property Relationship (QSPR) method. A dataset comprising 112 ternary systems (namely, I.L., benzene, and aliphatic hydrocarbon) was compiled after an extensive literature review. The primary dataset consists of 17 anions, 20 cations, and 12 aliphatic hydrocarbons, which allowed the impact of the anion, cation, and aliphatic hydrocarbon structures on the benzene distribution between the two phases to be investigated. Linear QSPR models were constructed using Multiple Linear Regression (MLR). The statistical evaluation of the final linear model (R² = 0.900) showed that it has an acceptable capability to predict the mole fraction of benzene in the I.L.-rich phase. Additionally, non-linear QSPR models were developed using Genetic Programming (GP) and Artificial Neural Network (ANN) machine learning methods. The statistical evaluation of the GP model (R² = 0.927) and the ANN model (R² = 0.939) showed that the non-linear models had slightly higher prediction accuracy than the linear model. The final QSPR model was developed using the cation descriptor BELe3, a 2D Burden eigenvalue descriptor, and the anion descriptor HTm, a 3D GETAWAY descriptor. After model construction, the selected anion and cation descriptors were interpreted. The results showed that the size and electronegativity of the atoms in the anion and cation structures are probably important parameters affecting the benzene distribution between the aliphatic hydrocarbon-rich and I.L.-rich phases. Additionally, the anion shape can be considered an effective parameter in the benzene extraction process.


Introduction

Aromatic compounds, especially BTX (i.e., benzene, toluene, and xylenes) and ethylbenzene, are valuable raw materials in the petrochemical industries¹. In addition, reducing sulfur, nitrogen, and aromatic compounds (especially benzene) in middle distillate products is of considerable importance because of their environmental impacts^2,3. Therefore, great attention has been paid to the separation of aromatic compounds from naphtha-cracking streams. The separation and purification of aromatic compounds are usually challenging because of the close boiling points of different hydrocarbons and the various azeotropes that may form^4,5. Commercially, three separation techniques are distinguished by aromatic concentration: (I) liquid–liquid extraction, typically employed for low aromatic concentrations (20 to 65 wt%); (II) extractive distillation, commonly chosen for moderate aromatic concentrations (65 to 90 wt%); and (III) azeotropic distillation, used for very high aromatic concentrations (> 90 wt%)^6,7.

Liquid–liquid extraction is regarded as the most favorable approach for separating aromatics from mixtures containing less than 20 wt% aromatics. Its notable advantages include low energy consumption, preservation of both physical properties and chemical structures, ease of application, and mild operating conditions. In the extraction process, selecting an appropriate solvent is important. An ideal solvent should be environmentally friendly, easily regenerated, available in sufficient quantity at low cost, highly selective toward aromatic compounds, and have a high distribution ratio, while requiring a minimal solvent-to-feed ratio^2,8. In addition, the physical and thermodynamic properties of the solvent, such as density, surface tension, viscosity, and thermal stability, should be suitable for industrial use. Commonly, organic solvents such as furfuryl alcohol, N-formyl morpholine (NFM), sulfolane, ethylene glycols, dimethyl sulfoxide (DMSO), and N-methyl pyrrolidone (NMP) are employed in aromatic/aliphatic separation. Because of the high toxicity, volatility, flammability, and regeneration costs of these organic solvents, alternative solvents are being investigated^3,8.

Ionic liquids (I.Ls) have gradually attracted attention as a promising alternative to traditional organic solvents for the separation of aromatic and aliphatic hydrocarbons. In general, I.Ls exhibit low vapor pressure, notable thermal and chemical stability, good solvating ability for both organic and inorganic compounds, and low environmental impact. Additionally, I.Ls are often called “designer solvents” because their anions and cations can be combined in many ways, yielding numerous I.Ls with distinct properties and a wide range of applications^6,9. In the petroleum refining industry, numerous I.L.-based separation processes have been investigated, including desulfurization, denitrogenation¹⁰, and dearomatization of fuels¹¹.

Liquid–liquid equilibrium (LLE) data are needed to better understand the role of I.Ls in the liquid–liquid extraction process¹². As a result, the number of experimental studies on the LLE of ternary systems containing an I.L., an aromatic hydrocarbon, and an aliphatic hydrocarbon has increased in recent decades. The literature predominantly discusses imidazolium-based I.Ls as the preferred solvents for extracting aromatic hydrocarbons from aromatic/aliphatic mixtures^13,14. Other categories of I.Ls, based on pyridinium, ammonium, etc., have also been investigated in some studies^15,16.

The vast number of I.Ls that can potentially be synthesized makes it impossible to select the optimal I.L. for a particular separation or extraction task using experimental measurements alone. Therefore, predictive models can save considerable time and cost. In the literature, models such as COSMO-RS¹⁷ and UNIFAC¹⁸ have been used to calculate LLE data, but only a limited number of ternary systems were studied in the development of each model, so the variety of anions, cations, and aliphatic hydrocarbons covered was very limited.

The QSPR method is a commonly used technique that uses molecular descriptors to establish quantitative relationships between a chemical structure and its corresponding property^19,20,21. It can therefore support the design and development of new solvents or chemical structures for different applications. This modeling method not only provides quantitative information about the desired property but also offers valuable qualitative insights through descriptor interpretation. The QSPR method has been extensively applied in studies on I.Ls, demonstrating its effectiveness in predicting their thermodynamic and physicochemical properties^22,23,24,25.

In our recent studies, we used the QSPR approach to develop descriptive and predictive models for estimating the distribution of thiophene²⁶, a sulfur-containing compound, and pyridine²⁷, a nitrogen-containing compound, between the hydrocarbon-rich and I.L.-rich phases. The present study focuses on the separation of benzene from fuels. Benzene is a toxic aromatic cyclic hydrocarbon, and exposure beyond the standard limit can cause serious health problems such as cancer. On the other hand, benzene is used as a raw material for the production of resins, drugs, dyes, and many other chemical products. Therefore, the separation of benzene from fuels has received much attention. A literature review showed that, although LLE data are available for a considerable number of ternary systems comprising an I.L., benzene (as an aromatic hydrocarbon), and an aliphatic hydrocarbon, no QSPR model has been reported for predicting the distribution of benzene between the aliphatic hydrocarbon-rich and I.L.-rich phases. To fill this gap, a dataset containing LLE data for 112 ternary systems was collected. This dataset covers 17 anions, 20 cations, and 12 aliphatic hydrocarbons. Combining these anion, cation, and aliphatic hydrocarbon structures yields 4080 possible ternary systems (20 × 17 × 12 = 4080), of which only about 3% (112 ternary systems) have been measured experimentally so far. A QSPR model can therefore be used to estimate the benzene distribution between the aliphatic hydrocarbon-rich and I.L.-rich phases for the remaining, unstudied systems. Additionally, the impact of varying the anion, cation, and aliphatic hydrocarbon structures on benzene separation has been investigated.
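The size of this candidate space, and the small fraction covered by experiments, follows from simple counting. The short Python sketch below enumerates every (anion, cation, hydrocarbon) triple; the integer indices are placeholders for the real species and are purely illustrative.

```python
from itertools import product

N_ANIONS, N_CATIONS, N_HYDROCARBONS = 17, 20, 12  # species counts in the dataset
N_MEASURED = 112                                   # systems with published LLE data

# Every candidate ternary system is one (anion, cation, hydrocarbon) triple;
# integer indices are placeholders for the real species names.
candidates = list(product(range(N_ANIONS), range(N_CATIONS), range(N_HYDROCARBONS)))

n_candidates = len(candidates)
coverage = N_MEASURED / n_candidates

print(n_candidates)       # 4080
print(f"{coverage:.1%}")  # 2.7%
```

The exact coverage is about 2.7%, which the text rounds to 3%.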

QSPR method

Dataset

Collecting an adequate and reliable dataset is the first step in the QSPR method. A dataset of 112 ternary systems was collected after a comprehensive literature review^{3,9,11,12,13,14,15,16,17,18,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61}. The collected dataset includes 17 anions, 20 cations, and 12 aliphatic hydrocarbons. Details of the ternary systems in this dataset are presented in Table 1. All LLE data points were measured at 298.15 K and 1 atm.

Table 1 The employed dataset in the present study.


The dataset includes various aliphatic hydrocarbons, such as straight-chain³², cyclic⁶¹, and branched³⁵ hydrocarbons. Imidazolium-, pyrrolidinium-, ammonium-, and pyridinium-based I.Ls are also present in the dataset.

To estimate the benzene distribution between the aliphatic hydrocarbon-rich and I.L.-rich phases, the relationship between the benzene mole fraction in the aliphatic hydrocarbon-rich phase (X₂) and the benzene mole fraction in the I.L.-rich phase (Y₂) was investigated for all ternary systems in the main dataset, and Y₂ was predicted from X₂ using a linear model.

Because the anion, cation, and aliphatic hydrocarbon structures vary among the ternary systems, the structure of each group (i.e., anions, cations, or aliphatic hydrocarbons) must be accounted for in the model through molecular descriptors. Therefore, descriptors of each group were considered in the model development.

Two subsets (i.e., train and test subsets) were used to validate the constructed QSPR model. The train subset was used for model development and internal validation, and the test subset was used for external validation.

Basic theory

All 112 ternary systems of the dataset including I.L. (1), benzene (2), and aliphatic hydrocarbon (3) were investigated. A relationship between the values of Y₂ and X₂ was identified in every ternary system. Consequently, Eq. (1) can be developed to predict Y₂ values at any desired X₂ values:

$${\text{Y}}_{2} = f({\text{X}}_{2})$$

(1)

where f can be a linear or non-linear function of X₂. According to Table 1, the anion, cation, and aliphatic hydrocarbon structures differ among the ternary systems. In Eq. (1), X₂ is the only independent variable, so this model cannot predict Y₂ across different systems because it ignores the impact of the anion, cation, and aliphatic hydrocarbon structures. Therefore, anion, cation, and aliphatic hydrocarbon descriptors should be included as independent variables along with X₂ (see Eq. (2)).

$${\text{Y}}_{2} = f({\text{X}}_{2}) + g(\text{anion, cation, and aliphatic hydrocarbon descriptors})$$

(2)

Molecular descriptors calculation

Structural features of compounds are encoded in molecular descriptors of various types, such as constitutional, geometrical, and topological descriptors. Calculating these descriptors requires an optimized structure for each compound in the dataset. Therefore, the 3D structures of all anions, cations, and aliphatic hydrocarbons were drawn using Chembio3D Ultra software and then optimized using Density Functional Theory (DFT) at the B3LYP/6-31++G(d,p) level. Next, Dragon software was used to calculate the molecular descriptors of the optimized structures. Descriptors with constant or near-constant values within each group were eliminated. Finally, 1247 anion descriptors, 1282 cation descriptors, and 1079 aliphatic hydrocarbon descriptors were retained.
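The exact criterion used to discard constant or near-constant descriptors is not stated, so the following Python sketch assumes a common rule of thumb: drop a column if it has essentially zero spread, or if a single value occurs in more than a chosen fraction of the compounds in the group. The function name `drop_near_constant` and both thresholds are hypothetical, not the study's settings.

```python
import numpy as np

def drop_near_constant(descriptors, names, max_same_fraction=0.9, tol=1e-8):
    """Remove constant or near-constant descriptor columns within one group.

    Rows are compounds (e.g., all anions), columns are descriptors. Both
    thresholds are illustrative assumptions.
    """
    X = np.asarray(descriptors, dtype=float)
    keep = []
    for j in range(X.shape[1]):
        col = X[:, j]
        if col.std() < tol:
            continue  # constant column
        _, counts = np.unique(np.round(col, 8), return_counts=True)
        if counts.max() / col.size > max_same_fraction:
            continue  # near-constant: one value dominates the group
        keep.append(j)
    return X[:, keep], [names[j] for j in keep]

X = [[1.0, 0.2, 5.0],
     [1.0, 0.4, 5.0],
     [1.0, 0.9, 5.0],
     [1.0, 1.3, 7.0]]
X_kept, kept = drop_near_constant(X, ["D0", "D1", "D2"], max_same_fraction=0.7)
print(kept)  # ['D1']
```

Filtering is applied separately to the anion, cation, and hydrocarbon descriptor tables, since a descriptor can vary in one group and be constant in another.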

Model construction

Descriptors selection

Variable selection is important for identifying the most effective descriptors among the large number of candidates (3608 descriptors in total). The Genetic Algorithm (GA), one of the most effective and efficient variable selection methods, was employed for this purpose. GA can explore a large search space efficiently by mimicking natural selection⁶². It generates a population of candidate solutions (here, subsets of descriptors) and iteratively improves them through selection, crossover, and mutation. The fitness function used in GA evaluates the predictive accuracy and complexity of the resulting model, so that the selected descriptors improve prediction performance while limiting overfitting⁶³. In this study, the fitness function was based on the Leave-One-Out Cross-Validation (LOO-CV) coefficient of determination (Q²_LOO-CV). By optimizing this fitness function, the best subset of descriptors was identified, improving the predictive performance of the model while maintaining its simplicity.

The performance of GA in selecting appropriate descriptors depends strongly on key parameters such as population size, mutation rate, and number of iterations⁶⁴. Population size controls the number of candidate solutions (i.e., descriptor subsets) evaluated in each iteration; a population size of 500 was chosen to allow sufficient exploration of the descriptor space while maintaining computational efficiency. Mutation rate defines the likelihood of random changes in the solutions, which helps maintain diversity in the population and avoids premature convergence to suboptimal solutions^63,65. In the present study, a mutation rate of 20 was selected to balance exploration and exploitation of the search space. Iterations refer to the number of generations over which the population evolves; 10,000 iterations were used to let the GA converge toward an optimal descriptor subset, while the LOO-CV-based fitness function guarded against overfitting.
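To make the GA loop concrete, here is a minimal Python/NumPy sketch of GA-based descriptor selection with a Q²_LOO-CV fitness for an MLR model. It is not the QSARINS implementation: truncation selection, union-based crossover, single-swap mutation, reading the mutation rate of 20 as a 0.2 probability, and the small default population and generation counts are all illustrative assumptions. X₂ is always kept in the model, and Q²_LOO-CV uses the exact least-squares leave-one-out shortcut e_i / (1 − h_ii).

```python
import numpy as np

def q2_loo(X, y):
    """Q²_LOO of an MLR model with intercept, via the exact LOO shortcut e_i / (1 - h_ii)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    h = np.einsum("ij,ji->i", A, np.linalg.pinv(A))  # diagonal of the hat matrix
    press = np.sum((resid / (1.0 - h)) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def ga_select(D, x2, y, k=2, pop_size=50, generations=200, mutation_rate=0.2, seed=0):
    """Pick k columns of descriptor matrix D that, together with X2, maximize Q²_LOO.

    mutation_rate is treated as a probability per child (an assumption).
    """
    rng = np.random.default_rng(seed)
    n_desc = D.shape[1]
    cache = {}

    def fitness(subset):
        if subset not in cache:
            cache[subset] = q2_loo(np.column_stack([x2, D[:, list(subset)]]), y)
        return cache[subset]

    def random_subset():
        return tuple(sorted(int(i) for i in rng.choice(n_desc, k, replace=False)))

    pop = [random_subset() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(set(pop), key=fitness, reverse=True)
        parents = ranked[: max(2, len(ranked) // 2)]  # truncation selection; parents survive
        children = list(parents)
        while len(children) < pop_size:
            i, j = rng.choice(len(parents), 2, replace=len(parents) < 2)
            pool = np.union1d(parents[i], parents[j])  # crossover: genes from both parents
            child = {int(g) for g in rng.choice(pool, k, replace=False)}
            if rng.random() < mutation_rate:  # mutation: swap out one descriptor
                child.discard(int(rng.choice(list(child))))
                while len(child) < k:
                    child.add(int(rng.integers(n_desc)))
            children.append(tuple(sorted(child)))
        pop = children
    best = max(set(pop), key=fitness)
    return best, fitness(best)
```

On synthetic data in which Y₂ depends on X₂ and two of ten random descriptors, `ga_select` recovers those two columns; with thousands of real descriptors, the larger population and iteration counts quoted above become necessary.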

Linear model

For discovering the correlation between the independent variables (i.e., GA-selected descriptors and X₂) and the Y₂, Multiple Linear Regression (MLR) was employed to construct a QSPR model. The GA-MLR model was developed using the QSARINS software^66,67,68.

Non-linear models

Machine learning has received much attention in recent years as a powerful tool for developing predictive models. Machine learning methods can produce non-linear models, which may be more accurate than linear models. In this study, Genetic Programming (GP) and Artificial Neural Network (ANN), two common machine learning methods, were used to develop non-linear models.

To develop the non-linear GP model, a biologically inspired approach, the GPTIPS code was employed. The inputs to GPTIPS were X₂ and the final selected descriptors. The model was developed using selected mathematical operators (e.g., addition, subtraction, multiplication, and division) and by tuning the parameters of the code. After several modeling runs and a comparison of the statistical parameters of the developed models, the best model was selected. More details on the GP model development procedure can be found elsewhere^27,69.

Multi-Layer Perceptron (MLP), which is one of the most widely used artificial neural networks, was chosen to develop the ANN model. MLP consists of an input layer, an output layer, and one or more hidden layers, each layer containing a number of neurons. In this study, X₂ and the final selected descriptors were the input of the network and Y₂ was the output of the network. More details on the model development procedure using MLP can be found elsewhere^27,69.

Statistical parameters and models validation

Table 2 lists the statistical parameters used to assess the predictive capability of the developed QSPR model. In these definitions, n is the number of data points in the train, test, or main set, p is the number of independent variables, ${\text{Y}}_{{2}_{i}}^{\text{Exp}}$ is the experimental Y₂ value, ${\text{Y}}_{{2}_{i}}^{\text{Pre}}$ is the predicted Y₂ value, and $\overline{{\text{Y}}}_{2}$ is the average of the experimental Y₂ values.

Table 2 Statistical parameters employed in the present study.


Q²_LOO-CV, the LOO-CV coefficient of determination, was first used for internal validation of the constructed model⁶⁹. Additionally, Leave-Many-Out Cross-Validation (LMO-CV) was performed, leaving out 25% of the data in each iteration, and the resulting Q²_LMO-CV was calculated to further assess the robustness and predictive power of the model. Q² for LMO-CV is calculated in the same way as for LOO-CV (Eq. (7)). More detailed information about cross-validation techniques can be found elsewhere^70,71. In addition, external validation, based on separate train and test sets, is commonly employed in the model validation process.
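The LOO and LMO variants differ only in how many points are held out per refit. The Python sketch below computes a cross-validated Q² = 1 − PRESS/TSS for an MLR model by explicit refitting: with `n_out=1` it is LOO-CV, and with `n_out` set to about 25% of the data plus several random repeats it gives an LMO-CV estimate. The block scheme, the repeat count, and the function names are our assumptions; Eq. (7) in Table 2 is not reproduced here.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares MLR coefficients [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def q2_cv(X, y, n_out=1, n_repeats=1, seed=0):
    """Cross-validated Q² = 1 - PRESS / TSS for an MLR model with intercept.

    n_out=1 gives LOO-CV. With n_out ~ 25 % of the data, each repeat shuffles
    the points and refits once per held-out block (an LMO-CV variant).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    n = len(y)
    rng = np.random.default_rng(seed)
    press = tss = 0.0
    for _ in range(n_repeats):
        order = rng.permutation(n) if n_out > 1 else np.arange(n)
        for start in range(0, n, n_out):
            held_out = order[start:start + n_out]
            train = np.setdiff1d(np.arange(n), held_out)
            coef = fit_mlr(X[train], y[train])
            y_pred = coef[0] + X[held_out] @ coef[1:]
            press += np.sum((y[held_out] - y_pred) ** 2)
        tss += np.sum((y - y.mean()) ** 2)
    return 1.0 - press / tss
```

For LOO, the explicit loop gives exactly the same value as the hat-matrix shortcut, which is a useful check on either implementation.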

Results

Linear model construction

In our recent research, we examined 84 ternary systems comprising an I.L., thiophene, and a hydrocarbon solvent, and found a correlation between the mole fraction of thiophene in the hydrocarbon-rich phase and that in the I.L.-rich phase. In another investigation, we examined 51 ternary systems comprising an I.L., pyridine, and a hydrocarbon solvent, and likewise found a correlation between the mole fraction of pyridine in the hydrocarbon-rich phase and that in the I.L.-rich phase. Thiophene is an aromatic heterocyclic compound containing a sulfur atom, and pyridine is an aromatic heterocyclic compound containing a nitrogen atom.

In this study, 112 ternary systems containing an I.L., an aromatic hydrocarbon (benzene), and an aliphatic hydrocarbon were gathered from the literature. In total, 17 anions, 20 cations, and 12 aliphatic hydrocarbons were included in the dataset. For each ternary system, the relation between the benzene mole fraction in the I.L.-rich phase (Y₂) and that in the aliphatic hydrocarbon-rich phase (X₂) was determined. Several polynomial forms were examined in the fitting process, and the linear relation between Y₂ and X₂ was found to be more accurate despite its simplicity. The results of this analysis are tabulated in Table 3. As Table 3 shows, R² exceeds 0.85 for all correlations, confirming a strong correlation between X₂ and Y₂.
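Each row of Table 3 is an ordinary least-squares line. As a minimal Python sketch (the function name and the tie-line values below are hypothetical, not data from the study), the per-system slope a, intercept b, and R² can be obtained with NumPy's `polyfit`:

```python
import numpy as np

def fit_system_line(x2, y2):
    """Least-squares fit of Y2 = a*X2 + b for one ternary system; returns (a, b, R²)."""
    x2 = np.asarray(x2, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    a, b = np.polyfit(x2, y2, deg=1)  # coefficients, highest power first
    y_hat = a * x2 + b
    r2 = 1.0 - np.sum((y2 - y_hat) ** 2) / np.sum((y2 - y2.mean()) ** 2)
    return a, b, r2

# Hypothetical tie-line compositions (mole fractions), not data from the study
a, b, r2 = fit_system_line([0.05, 0.10, 0.20, 0.30, 0.40],
                           [0.03, 0.05, 0.09, 0.13, 0.17])
print(round(a, 3), round(b, 3), round(r2, 3))  # 0.4 0.01 1.0
```

Running the same function over every system gives one (a, b, R²) triple per row, which is how the R² > 0.85 criterion above can be checked.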

Table 3 The developed models using only the X₂ variable (i.e., ${\text{Y}}_{2} = a{\text{X}}_{2} + b$) for each of the ternary systems.


To provide a model that predicts the benzene distribution between the aliphatic hydrocarbon-rich and I.L.-rich phases, Eq. (12) was developed using all the data points in the dataset (see Table 4). In other words, a linear model was developed to predict Y₂ (the target variable) from X₂ (the independent variable). As Table 3 shows, the anion, cation, and aliphatic hydrocarbon differ among the ternary systems. Therefore, appropriate molecular descriptors from each of these three groups had to be added to the model. Accordingly, Eq. (13) was constructed using X₂ and one anion descriptor, Eq. (14) using X₂ and one cation descriptor, and Eq. (15) using X₂ and one aliphatic hydrocarbon descriptor. The statistical parameters of these models are presented in Table 4. The R² value of the model using only X₂ was 0.698, whereas it was 0.861, 0.765, and 0.710 for the models that include X₂ and one anion descriptor, one cation descriptor, or one aliphatic hydrocarbon descriptor, respectively. These results show that the anion and cation structures have a considerable impact on the Y₂ value; in contrast, adding the aliphatic hydrocarbon descriptor had a negligible effect on model accuracy. For further investigation, Eq. (16) was developed using X₂, an anion descriptor, and an aliphatic hydrocarbon descriptor. Comparing the statistical parameters of Eqs. (13) and (16) shows almost no improvement in estimation capability. Likewise, Eq. (17) was developed using X₂, a cation descriptor, and an aliphatic hydrocarbon descriptor, and the comparison of Eqs. (17) and (14) again shows that adding the aliphatic hydrocarbon descriptor has an insignificant effect. This may be due to the structural similarity of the hydrocarbons in the present dataset. Therefore, the model was developed using only X₂, an anion descriptor, and a cation descriptor (Eq. (18)).

Table 4 The constructed models taking into account all ternary systems in the main dataset and their statistical parameters.


Plot (a) in Fig. 1 shows the Y₂ values predicted using Eq. (12) versus Y₂^Exp and plot (b) shows the Y₂ values predicted using Eq. (18) versus Y₂^Exp. The results clearly show that considering the anion and cation descriptors in the developed model increases the estimation capability of the constructed model.

To investigate models containing higher powers of X₂, the second to fifth powers of X₂ were also considered as independent variables along with X₂ and the anion, cation, and aliphatic hydrocarbon descriptors. These variables were given as input to the QSARINS software. The outcome of this analysis confirmed that including higher powers of X₂ does not improve the accuracy of the developed model. Therefore, the linear model appears to be the most appropriate.

As plot (b) in Fig. 1 shows, the predicted Y₂ values are negative for some data points. This problem was solved by taking X₂ = 0.15 as a turning point and constructing separate models for X₂ above and below 0.15 (see plot (c) in Fig. 1). In addition, comparing the X₂ coefficients of the models developed using only the X₂ variable shows that Y₂ depends more strongly on X₂ in the range of 0–0.15 (see Eqs. (19) and (20) in Table 4). Therefore, developing two linear models with the same variables for the ranges of 0 to 0.15 and 0.15 to 1 can improve the estimation capability of the model. Accordingly, the model was first created for the X₂ range of 0.15 to 1 by selecting suitable anion and cation descriptors. The selected variables were then used to develop the model for the range of 0 to 0.15.

In the construction of QSPR models, identifying the appropriate number of descriptors is important. The breaking point plot is a valuable tool for determining the optimal number of descriptors. For this purpose, models were developed using X₂ and anion or cation descriptors for the range of 0.15–1, and the R² and Fisher function (F) values were determined for each model. In this way, the effect of increasing the number of descriptors of each group on the estimation capability of the model was determined. For the models developed using only X₂ and anion descriptors (see Fig. 2), only one anion descriptor was needed; increasing the number of anion descriptors not only had a negligible effect on model accuracy but also caused a considerable decrease in the F-value. Similarly, the breaking point plot confirmed that increasing the number of cation descriptors does not significantly improve model accuracy. Therefore, although cation descriptors are effective in determining Y₂ values, one cation descriptor in the model is sufficient. Finally, three variables (i.e., X₂, one anion descriptor, and one cation descriptor) were selected to develop the final model. Taking X₂ = 0.15 as the turning point, the model was constructed for the range of 0.15 to 1 using these three variables (Eq. (21)). Afterward, the model for the range of 0 to 0.15 (Eq. (22)) was developed from the points (0, 0) and (0.15, Y₂ predicted by Eq. (21)). As a result, the final model consists of two linear models, one for the range 0 to 0.15 and one for 0.15 to 1.
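The two-range construction can be written compactly. The Python sketch below assumes the high-range model has the form Y₂ = c₀ + c₁X₂ + c₂D_cation + c₃D_anion (in the final model these descriptors are BELe3 and HTm), fitted by least squares on points with X₂ ≥ 0.15. The low-range line then passes through (0, 0) and the high-range prediction at X₂ = 0.15 for the same I.L., which keeps the model continuous and non-negative near X₂ = 0 whenever that prediction is positive. Function names are hypothetical, and the fitted coefficients of Eqs. (21) and (22) are not reproduced.

```python
import numpy as np

X2_TURN = 0.15  # turning point used in the study

def fit_high_range(x2, d_cation, d_anion, y2):
    """Least-squares fit of Y2 = c0 + c1*X2 + c2*D_cation + c3*D_anion on points with X2 >= 0.15."""
    x2, d_cation, d_anion, y2 = (np.asarray(v, dtype=float) for v in (x2, d_cation, d_anion, y2))
    m = x2 >= X2_TURN  # only points above the turning point are used
    A = np.column_stack([np.ones(m.sum()), x2[m], d_cation[m], d_anion[m]])
    coef, *_ = np.linalg.lstsq(A, y2[m], rcond=None)
    return coef

def predict_two_range(coef, x2, d_cation, d_anion):
    """High-range MLR above 0.15; below it, the line through (0, 0) and the high-range value at 0.15."""
    x2, d_cation, d_anion = (np.asarray(v, dtype=float) for v in (x2, d_cation, d_anion))
    c0, c1, c2, c3 = coef
    high = c0 + c1 * x2 + c2 * d_cation + c3 * d_anion
    y_at_turn = c0 + c1 * X2_TURN + c2 * d_cation + c3 * d_anion  # same I.L., X2 = 0.15
    low = y_at_turn * x2 / X2_TURN
    return np.where(x2 < X2_TURN, low, high)
```

Because the low-range slope is the high-range value at 0.15 divided by 0.15, it depends on the same descriptors, consistent with using the same variables in both ranges.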

Plot (c) in Fig. 1 shows the Y₂ values predicted using Eqs. (21) and (22) versus Y₂^Exp. A comparison of plots (b) and (c) in Fig. 1 reveals that the model constructed with the turning point is not only more accurate than the model without it but also yields no negative predicted Y₂ values.

Evaluation and validation of the constructed linear QSPR model

In addition to internal validation, external validation is very important for evaluating the constructed QSPR models. In external validation, most of the data are assigned to the train set and used for model development, and the rest are assigned to the test set to validate the constructed model. Here, 26% of the total data (i.e., 30 ternary systems, identified in Table 3), including the [BF₄], [DCA], and [TfO] anions, the [BMpyr], [C₁₀MIM], and [N₄₁₁₁] cations, and n-decane and n-dodecane, were assigned to the test set. The external validation of the model is more reliable because three anions, two cations, and two aliphatic hydrocarbons appear exclusively in the test set. In particular, the test set contains four ternary systems (the 61st, 62nd, 66th, and 67th systems) whose anion, cation, and aliphatic hydrocarbon are all absent from the train set.
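Whether a test system is entirely new to the model can be checked mechanically. This Python sketch flags test systems whose anion, cation, and hydrocarbon are all absent from the train set; the function name and the placeholder train-set species are hypothetical and do not reproduce the actual split.

```python
def fully_unseen(test_systems, train_systems):
    """Test systems whose anion, cation, AND hydrocarbon never occur in the train set.

    Each system is an (anion, cation, hydrocarbon) tuple of species labels.
    """
    seen = [set(), set(), set()]  # species seen in training, per role
    for system in train_systems:
        for role, species in enumerate(system):
            seen[role].add(species)
    return [s for s in test_systems
            if all(species not in seen[role] for role, species in enumerate(s))]

# Placeholder train systems; the test labels reuse species named in the text
train = [("anion_A", "cation_A", "hydrocarbon_A"), ("anion_B", "[BMpyr]", "hydrocarbon_B")]
test = [("[BF4]", "[C10MIM]", "n-decane"), ("[DCA]", "[BMpyr]", "n-dodecane")]
result = fully_unseen(test, train)
print(result)  # [('[BF4]', '[C10MIM]', 'n-decane')]
```

The second test system is excluded because its cation also occurs in training, so it tests only partial extrapolation.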

To highlight the benefit of taking X₂ = 0.15 as a turning point, Eq. (23) was first constructed using the train set for X₂ values higher than 0.15 (see Table 5). The linear model for X₂ values lower than 0.15 (Eq. (24)) was then developed from two points, (0, 0) and (0.15, Y₂ predicted by Eq. (23)). The Y₂ values predicted by Eq. (23) or (24) for all data points can be found in the supporting information Excel file.

Table 5 Final developed QSPR models.


Figure 3 shows the Williams plot, the residual plot, and the Y₂ values predicted using Eq. (23) or (24) versus the experimental Y₂ values for the train and test sets. According to the Williams plot, no outlier is observed in the employed dataset. The results of the statistical evaluation of the final QSPR model after the train/test split are reported in Table 6.
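A Williams plot places each data point by its leverage (x-axis) and standardized residual (y-axis). The Python sketch below computes both for an MLR model with an intercept, using the widely used warning leverage h* = 3(p + 1)/n and the ±3 residual band as conventions. The simple residual/standard-error standardization and the function names are assumptions, not necessarily what the modeling software uses.

```python
import numpy as np

def design(X):
    """Design matrix with an intercept column."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X.reshape(len(X), -1)])

def williams_points(X_train, y_train, X_test, y_test):
    """Leverages and standardized residuals (train and test) for a Williams plot of an MLR model."""
    A_tr, A_te = design(X_train), design(X_test)
    y_train, y_test = np.asarray(y_train, float), np.asarray(y_test, float)
    XtX_inv = np.linalg.inv(A_tr.T @ A_tr)
    coef = XtX_inv @ A_tr.T @ y_train
    n, p1 = A_tr.shape  # p1 = p descriptors + intercept
    s = np.sqrt(np.sum((y_train - A_tr @ coef) ** 2) / (n - p1))  # residual std. error
    points = {}
    for name, A, y in (("train", A_tr, y_train), ("test", A_te, y_test)):
        h = np.einsum("ij,jk,ik->i", A, XtX_inv, A)  # h_i = x_i^T (X^T X)^-1 x_i
        points[name] = (h, (y - A @ coef) / s)
    h_star = 3.0 * p1 / n  # warning leverage 3(p + 1)/n
    return points, h_star
```

Points with |standardized residual| > 3 would be response outliers; test points with h > h* lie outside the applicability domain, so their predictions are extrapolations.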

Table 6 Statistical evaluation of the final constructed QSPR models.


Non-linear model construction

As mentioned, GP and ANN methods were chosen to develop non-linear models. First, Eq. (25) was developed by GP using the four mathematical operators of addition, subtraction, multiplication, and division.

$$\begin{aligned} {\text{Y}}_{2} ~ & = ~0.01807 \times {\text{X}}_{2} - \left( { - ~{\text{X}}_{2} \times {\text{BELe3}} \times {\text{HTm}}^{2} + ~{\text{X}}_{2} \times {\text{BELe3}} + 4.732} \right) \\ & \;\;\; \times \left( {{\text{HTm}} - 2 \times {\text{X}}_{2} \times {\text{HTm}} + 6.718} \right) \times 3.819 \times 10^{{ - 5}} - 0.01807 \times ~{\text{X}}_{2} \\ & \;\;\; \times {\text{HTm}} - 0.01585 \times ~{\text{X}}_{2} ~ \times {\text{BELe3}}^{3} + ~{\text{X}}_{2} ~ \times ~\left( {{\text{BELe3}}^{2} + 3 \times {\text{BELe3}} - 2 \times ~{\text{X}}_{2} } \right) \\ & \;\;\; \times ~0.1119 + 0.009033 \times {\text{BELe3}}^{{\text{2}}} + 0.02907 \times {\text{X}}_{2} ~\left( {{\text{BELe3}} + {\text{HTm}}~ - {\text{X}}_{2} \times {\text{BELe3}} + 1.519} \right) \\ & \;\;\; + 0.009033 \times {\text{X}}_{2} ^{2} \times {\text{BELe3}} \times {\text{HTm}} + ~{\text{X}}_{2} ^{2} \times {\text{BELe3}} \times {\text{HTm}}~\left( {2 \times {\text{X}}_{2} - {\text{BELe3}}} \right) \times 0.01585 - 0.02417 \\ \end{aligned}$$

(25)
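For reuse, Eq. (25) can be transcribed directly into a Python function. The transcription below follows the printed equation term by term; BELe3 and HTm must be the Dragon values for the cation and anion in question, and no descriptor values are supplied here. The function name is hypothetical.

```python
def y2_gp(x2, bele3, htm):
    """Eq. (25): GP model for the benzene mole fraction in the I.L.-rich phase (as printed)."""
    return (
        0.01807 * x2
        - (-x2 * bele3 * htm**2 + x2 * bele3 + 4.732)
        * (htm - 2 * x2 * htm + 6.718) * 3.819e-5
        - 0.01807 * x2 * htm
        - 0.01585 * x2 * bele3**3
        + x2 * (bele3**2 + 3 * bele3 - 2 * x2) * 0.1119
        + 0.009033 * bele3**2
        + 0.02907 * x2 * (bele3 + htm - x2 * bele3 + 1.519)
        + 0.009033 * x2**2 * bele3 * htm
        + x2**2 * bele3 * htm * (2 * x2 - bele3) * 0.01585
        - 0.02417
    )
```

Writing the expression as one parenthesized sum keeps each printed term on its own line, which makes it easy to check against Eq. (25).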

Afterward, the MLP network was employed to develop a non-linear ANN model. The network inputs were X₂ along with an anion descriptor and a cation descriptor selected by GA (see Fig. 4). For the hidden layers, the hyperbolic tangent sigmoid non-linear activation function was selected and for the output layer, the linear activation function was selected. Since the Scaled Conjugate Gradient learning algorithm resulted in the minimum MSE values, it was used to train the network. Finally, an MLP network containing one hidden layer with 5 neurons was selected as the most appropriate MLP network. The predicted Y₂ values using the GP and MLP models as well as the weights and biases of the MLP network can be found in the supporting information Excel file.
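The selected architecture, with three inputs, one hidden layer of five neurons, and one output, amounts to the forward pass sketched below in Python/NumPy: a tanh hidden layer followed by a linear output neuron. The trained weights and biases are in the supporting information and are not reproduced. The input column order, and whether inputs and output must be rescaled as during training (some toolboxes apply min–max scaling by default), are assumptions to check against that file. Training with scaled conjugate gradient is not shown.

```python
import numpy as np

def mlp_predict(inputs, W1, b1, W2, b2):
    """Forward pass of a 3-5-1 MLP: tanh hidden layer, linear output.

    inputs: (n, 3) array with columns [X2, cation descriptor, anion descriptor]
    (column order is an assumption). W1: (5, 3), b1: (5,), W2: (1, 5), b2: (1,).
    """
    hidden = np.tanh(inputs @ W1.T + b1)  # (n, 5) hidden activations
    return hidden @ W2.T + b2             # (n, 1) linear output = predicted Y2

# Hypothetical random weights, only to show the shapes involved
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(1, 5)), rng.normal(size=1)
x = rng.uniform(size=(4, 3))
y = mlp_predict(x, W1, b1, W2, b2)
```

The hyperbolic tangent sigmoid activation is mathematically identical to `np.tanh`, so the published weights can be plugged in directly once any input/output scaling is matched.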

Evaluation and validation of the constructed non-linear QSPR models

The same train and test sets as for the linear model were used for the non-linear models. The predicted Y₂ values versus experimental Y₂, and the residuals versus experimental Y₂, for the GP and ANN models are presented in Fig. 5. The statistical parameters of the final linear and non-linear QSPR models for the train and test sets are presented in Table 7.

Table 7 Statistical evaluation of the final linear and non-linear constructed QSPR models.


In this study, the accuracy of the QSPR models was evaluated using the statistical parameters of the developed models, which are tabulated in Table 8. A comparison between the statistical parameters of Eq. (12) and Eq. (18) reveals that including the anion and cation descriptors significantly improves model accuracy. Additionally, a comparison between Eq. (18) and Eqs. (21) and (22) confirms that considering the turning point increases the estimation capability of the model. On the other hand, comparing the non-linear models with the final linear model (i.e., Eq. (23) or (24)) shows that the prediction accuracy increases only slightly, and this increase is not significant. One reason may be the variety of anions and cations in the dataset; a more important reason appears to be the nearly linear dependence of Y₂ on X₂ and the selected descriptors. Therefore, the developed linear model offers favorable prediction accuracy along with simplicity, and it was considered the more favorable model because of its lower complexity and its suitability for interpreting the descriptors.

Table 8 Comparison of the statistical parameters of the developed QSPR models for the main set.


Discussion

During the development of the QSPR model, the effect of the aliphatic hydrocarbon structure on the estimated Y₂ value was found to be insignificant. Therefore, no aliphatic hydrocarbon descriptor was included in the final model. The names and structures of all aliphatic hydrocarbons in the dataset are listed in Table 9.

Table 9 The name, structure, and chemical formula for all aliphatic hydrocarbons involved in the dataset.


Once the QSPR model has been developed, analysing the selected descriptors helps to clarify their role in estimating the target variable (Y₂). Accordingly, the anion and cation descriptors that appear in the final model are introduced and interpreted below.

Interpretation of the anion descriptor

The HTm descriptor was the GA-selected anion descriptor in the final QSPR model. HTm is a GETAWAY (GEometry, Topology, and Atom-Weights AssemblY) descriptor. GETAWAY descriptors combine the 3D molecular geometry, encoded in the molecular influence (or leverage) matrix, with atom connectivity, chemical information, topology, and various atomic weighting schemes⁷². Depending on the matrix used, GETAWAY descriptors fall into two categories: those based on the influence/distance matrix R (R-GETAWAY) and those based on the molecular influence matrix H (H-GETAWAY). HTm is calculated from the influence matrix H and is weighted by atomic masses. The molecular influence matrix (H), a symmetric matrix, is defined by Eq. (26)⁷³:

$${\text{H}}~ = ~{\text{M}}~ \times ~({\text{M}}^{{\text{T}}} ~ \times ~{\text{M}})^{{ - 1}} ~ \times ~{\text{M}}^{{\text{T}}}$$

(26)

In Eq. (26), M is the molecular matrix with A rows, one for each atom of the molecule, and three columns containing the centred Cartesian coordinates (x, y, z) of the atoms; H is therefore a symmetric A × A matrix. The calculations are performed on the hydrogen-filled molecular graph. The superscript T denotes the transpose of M. The diagonal elements of H (i.e., h_ii) are called leverages, and the elements of H satisfy the following properties:

$${0 } \le {\text{ h}}_{{{\text{ii}}}} { } \le { 1,}\;\;{ } - {1 } \le {\text{ h}}_{{{\text{ij}}}} { } \le { 1 }$$

(27)

$$\mathop \sum \limits_{{\text{i = 1}}}^{{\text{A}}} {\text{h}}_{{{\text{ii}}}} {\text{ = D}}$$

(28)

$$\mathop \sum \limits_{{\text{j = 1}}}^{{\text{A}}} {\text{h}}_{{{\text{ij}}}} { = 0}$$

(29)

$$\overline{{\text{h }}} { = }\frac{{\text{D}}}{{\text{A}}}$$

(30)

where h_ij denotes an element of H, $\overline{{\text{h}}}$ is the average of the diagonal elements of H, and D is the rank of M. D equals 1 for linear molecules, 2 for planar molecules, and 3 for three-dimensional molecules. Note that the molecular influence matrix is invariant to rotation of the molecule.
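Eq. (26) and the properties in Eqs. (27)–(30) can be checked numerically with the short sketch below, which builds H from centred atomic coordinates. It assumes a non-planar (D = 3) geometry so that MᵀM is invertible; planar or linear molecules would require a pseudo-inverse instead. The coordinates are a hypothetical methane-like tetrahedron, not an anion from the dataset, the helper names are illustrative, and the sketch stops at H: the mass weighting and summation that turn H into the HTm value are not reproduced.

```python
def center(coords):
    """Subtract the mean (x, y, z) so that the columns of M are centred."""
    n = len(coords)
    mean = [sum(atom[k] for atom in coords) / n for k in range(3)]
    return [[atom[k] - mean[k] for k in range(3)] for atom in coords]


def inverse_3x3(a):
    """Invert a 3x3 matrix via cofactors; raise if it is singular."""
    cof = [[a[(i + 1) % 3][(j + 1) % 3] * a[(i + 2) % 3][(j + 2) % 3]
            - a[(i + 1) % 3][(j + 2) % 3] * a[(i + 2) % 3][(j + 1) % 3]
            for j in range(3)] for i in range(3)]
    det = sum(a[0][j] * cof[0][j] for j in range(3))
    if abs(det) < 1e-12:
        raise ValueError("M^T M is singular (planar or linear molecule)")
    # inverse = adjugate / det, where the adjugate is the transposed cofactor matrix
    return [[cof[j][i] / det for j in range(3)] for i in range(3)]


def leverage_matrix(coords):
    """H = M (M^T M)^-1 M^T for the A x 3 matrix M of centred coordinates."""
    m = center(coords)
    mtm = [[sum(row[p] * row[q] for row in m) for q in range(3)] for p in range(3)]
    inv = inverse_3x3(mtm)
    return [[sum(mi[p] * inv[p][q] * mj[q] for p in range(3) for q in range(3))
             for mj in m] for mi in m]


# Hypothetical tetrahedral geometry: one central atom and four mantle atoms
coords = [(0, 0, 0), (1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
H = leverage_matrix(coords)
leverages = [H[i][i] for i in range(len(H))]
print([round(h, 3) for h in leverages])  # -> [0.0, 0.75, 0.75, 0.75, 0.75]
print(round(sum(leverages), 6))          # -> 3.0 (trace equals D)
```

In this example the central atom has zero leverage while each peripheral atom has 0.75, the trace equals D = 3, every row of H sums to zero, and the mean leverage is D/A = 3/5.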

The leverages (h_ii) represent the contribution, or influence, of each atom in determining the shape of the molecule. Atoms on the periphery (mantle) of a molecule have higher leverage values than atoms near its centre. The size and shape of the molecule also determine the maximum leverage value: leverages are lower in spherical molecules than in linear ones, and, for molecules of similar shape, the maximum leverage decreases as the number of atoms (i.e., the molecular size) increases. Consequently, the HTm descriptor is particularly useful for studying molecular systems in which atom connectivity and the spatial distribution of atomic masses strongly affect molecular behavior, such as extraction processes. Because it reflects both the geometric positions of the atoms and their influence on molecular interactions, this descriptor provides a more complete representation of the overall molecular structure.

The off-diagonal elements (h_ij) reflect the degree to which atom j interacts with atom i. Positive values of h_ij indicate that atoms i and j are located in the same molecular region, where the probability of interaction between them is higher. Negative values of h_ij indicate that the two atoms lie in opposite regions with respect to the molecular centre, so the probability of interaction between them is low.

The coefficient of HTm in Eq. (23) or (24) is positive, which indicates that anions with higher HTm values improve benzene extraction. The HTm values of the anions are presented in Table 10. Bond lengths and the number of bonds affect the leverage values. Therefore, among the atoms close to the anion surface, the heavier atoms (chlorine, sulfur, fluorine, oxygen, and nitrogen, in decreasing order) have larger mass-weighted leverage values. Hence, electronegative atoms near the anion surface can strengthen the interaction between the hydrogen atoms of benzene and the electronegative atoms of the anion. The size and shape of the anion also affect the HTm value. Because HTm depends on atomic masses and molecular topology, it is a key factor in evaluating intermolecular interactions, especially in systems where electron density and spatial arrangement play pivotal roles. In benzene extraction, these interactions are crucial because they directly affect the efficiency of phase transfer and separation. By combining atomic masses with 3D structural information, the HTm descriptor thus offers deeper insight into the capacity of the anion to interact with benzene molecules in I.L. systems, where complex molecular behavior is involved.

Table 10 Details of anions involved in the current dataset.


Interpretation of the cation descriptor

The BELe3 descriptor was the GA-selected cation descriptor in the QSPR model. BELe3 is a Burden eigenvalue descriptor weighted by atomic Sanderson electronegativities. To calculate the Burden eigenvalues, the Burden matrix (B) is first constructed from the hydrogen-depleted molecular graph, as defined in Eq. (31)⁷⁴.

$$\left[ B \right]_{{ij}} = \left\{ {\begin{array}{*{20}l} {\pi _{{ij}}^{*} ~.~~10^{{ - 1}} } \hfill & {~if} \hfill & {~\left( {i,j} \right) \in E} \hfill \\ {Z_{i} ~} \hfill & {~if} \hfill & {~i = j} \hfill \\ {0.001~} \hfill & {~if} \hfill & {\left( {i,j} \right) \notin E~} \hfill \\ \end{array} } \right.$$

(31)

where Z_i is the atomic number of atom i, and E is the set of edges (bonds) of the graph. ${\pi }_{ij}^{*}$ is the conventional bond order, equal to 1, 1.5, 2, and 3 for single, aromatic, double, and triple bonds, respectively, so the corresponding off-diagonal elements are 0.1, 0.15, 0.2, and 0.3; the elements corresponding to terminal bonds are increased by 0.01.
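The construction in Eq. (31) and the subsequent eigenvalue step can be sketched as follows for a small hydrogen-depleted graph. Assumptions: bonds are given as hypothetical (i, j, π*) tuples, a bond is treated as terminal when either atom has only one heavy-atom neighbour, and the eigenvalues are obtained with a plain cyclic Jacobi routine rather than any particular descriptor software. The diagonal uses the atomic numbers Z_i exactly as in Eq. (31); for BELe3 the diagonal would instead carry Sanderson-electronegativity weights, which are not reproduced here. Reading BELe3 as the third-lowest eigenvalue is also an assumption based on the usual "BEL" (lowest-eigenvalue) naming.

```python
import math


def burden_matrix(diagonal, bonds):
    """Burden matrix per Eq. (31): diagonal weights, 0.1*pi* for bonds, 0.001 otherwise."""
    n = len(diagonal)
    degree = [0] * n
    for i, j, _ in bonds:
        degree[i] += 1
        degree[j] += 1
    b = [[0.001] * n for _ in range(n)]  # non-bonded pairs
    for i in range(n):
        b[i][i] = float(diagonal[i])
    for i, j, order in bonds:
        value = 0.1 * order
        if degree[i] == 1 or degree[j] == 1:
            value += 0.01  # terminal bond
        b[i][j] = b[j][i] = value
    return b


def jacobi_eigenvalues(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, ascending."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


# Hypothetical example: carbon skeleton of n-butane (four C atoms, three single bonds)
b = burden_matrix([6, 6, 6, 6], [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)])
eigenvalues = jacobi_eigenvalues(b)
print([round(v, 4) for v in eigenvalues])
print(round(eigenvalues[2], 4))  # third-lowest eigenvalue under the naming assumption
```

Because longer alkyl chains add more single-bond entries to B, a sketch like this can be used to explore how chain length shifts the eigenvalue spectrum, although the actual BELe3 values in Table 11 depend on the electronegativity weighting.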

The Burden eigenvalues reflect molecular connectivity and can be related to the interaction strength in solvent systems: higher eigenvalues may indicate stronger interactions between the cation and benzene, enhancing extraction efficiency. The BELe3 descriptor is particularly relevant for evaluating interactions in I.L. systems because of its sensitivity to the electronic environment of the cation. Table 11 shows that BELe3 increases with the length of the alkyl side chain in the cation, possibly because of the larger number of single bonds. As the alkyl side chain becomes longer, not only does the number of single bonds increase, but the steric effects of the cation also play a critical role in the molecular interactions. Eq. (23) or (24) shows that higher BELe3 values favor benzene extraction; a longer alkyl chain may therefore enhance extraction. In addition, as the cation size increases, the Coulombic interaction between the anion and the cation weakens, leaving the cation more available for CH–π interactions with benzene, which consequently increase. This enhancement is particularly important in I.L. systems, where these interactions can significantly influence solubility and extraction outcomes. Moreover, because BELe3 is weighted by Sanderson electronegativities, the electronegativity of the atoms in the cation also affects the Y₂ value. This underscores the importance of cation structure design in optimizing extraction efficiency, since electronegativity affects the overall interaction energy and stability of the I.L.–benzene system.

Table 11 Details of cations involved in the current dataset.


Conclusion

In the present study, an extensive dataset of 112 ternary systems containing an I.L., benzene, and an aliphatic hydrocarbon was compiled, and new linear and non-linear QSPR models were developed to estimate the benzene distribution between the aliphatic hydrocarbon-rich and I.L.-rich phases. First, a correlation between X₂ and Y₂ was confirmed for each ternary system. Then, anion, cation, and aliphatic hydrocarbon molecular descriptors were considered in the model, and the aliphatic hydrocarbon descriptor was found to have a negligible effect on the prediction of Y₂. Consequently, only cation and anion molecular descriptors were retained in the constructed model. Treating X₂ = 0.15 as a turning point was also found to improve the estimation capability of the model. Moreover, the breaking point plot showed that one anion descriptor and one cation descriptor were sufficient for model development. As a result, the final linear model was constructed using X₂, one anion descriptor, and one cation descriptor, taking the turning point into account. Internal and external validation confirmed that the constructed linear QSPR model can estimate Y₂ with reasonable accuracy. GP and MLP machine learning methods were then employed to develop non-linear QSPR models, whose prediction accuracy was only slightly better than that of the linear model. Therefore, the linear model was preferred in this study because of its simplicity and ease of interpretation.
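One way to let a linear model change slope at the turning point X₂ = 0.15 is a hinge (segmented) term, max(0, X₂ − 0.15), fitted by ordinary least squares. The sketch below assumes this continuous hinge form and uses X₂ as the only input; it does not reproduce the exact form of Eqs. (23) and (24), which also contain the HTm and BELe3 descriptors. The function names are hypothetical and the data are synthetic.

```python
TURNING_POINT = 0.15


def hinge_features(x2, turning_point=TURNING_POINT):
    """Design row [1, X2, max(0, X2 - turning point)] for a segmented linear model."""
    return [1.0, x2, max(0.0, x2 - turning_point)]


def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_segmented(x2_values, y2_values):
    """Least-squares coefficients [b0, b1, b2] via the normal equations."""
    rows = [hinge_features(x) for x in x2_values]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, y2_values)) for i in range(k)]
    return solve(xtx, xty)


def predict_segmented(coef, x2):
    """Evaluate the fitted segmented model at one X2 value."""
    return sum(c * f for c, f in zip(coef, hinge_features(x2)))


# Synthetic data whose slope changes at X2 = 0.15
xs = [i * 0.05 for i in range(11)]
ys = [0.02 + 0.9 * x - 0.5 * max(0.0, x - TURNING_POINT) for x in xs]
coef = fit_segmented(xs, ys)
print([round(c, 6) for c in coef])
```

The hinge form keeps the fitted line continuous at the turning point; fitting two fully independent lines on either side of X₂ = 0.15 would be an equally plausible alternative.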

Interpretation of the GA-selected descriptors in the constructed model revealed that the size of the cation and anion, the electronegativity of the atoms in the cation, the shape of the anion, and the presence of electronegative atoms near the anion surface are effective in predicting Y₂. The developed QSPR model is an applicable tool for predicting Y₂ for ternary systems that have not yet been studied: only 112 of the 4080 possible ternary systems formed from 17 anions, 20 cations, and 12 aliphatic hydrocarbons have been investigated. The outcome of the present study is a step toward a computational approach for solvent screening in the extractive removal of benzene from fuel mixtures of aromatic and aliphatic compounds using I.Ls as green solvents.
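The number of possible ternary systems follows from combining every anion, cation, and aliphatic hydrocarbon in the dataset, which also shows how small a fraction has been measured:

$$N_{\text{systems}} = N_{\text{anions}} \times N_{\text{cations}} \times N_{\text{HC}} = 17 \times 20 \times 12 = 4080, \qquad \frac{112}{4080} \approx 0.027 \;(\approx 2.7\,\%)$$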

Data availability

All data generated or analysed during this study are included in this published article and its supplementary information files.

Abbreviations

C₂MIM: 1-Ethyl-3-methylimidazolium
C₃MIM: 1-Propyl-3-methylimidazolium
C₄MIM: 1-Butyl-3-methylimidazolium
C₅MIM: 1-Pentyl-3-methylimidazolium
C₆MIM: 1-Hexyl-3-methylimidazolium
C₈MIM: 1-Methyl-3-octylimidazolium
C₁₀MIM: 1-Decyl-3-methylimidazolium
C₁₂MIM: 1-Dodecyl-3-methylimidazolium
COC₂MIM: 1-(2-Methoxyethyl)-3-methylimidazolium
BMPyr: 1-Butyl-1-methylpyrrolidinium
EMPY: 1-Ethyl-3-methylpyridinium
EPY: 1-Ethylpyridinium
BPY: 1-Butylpyridinium
HPY: 1-Hexylpyridinium
N₄₁₁₁: Butyltrimethylammonium
N₄₄₄₁: Tributylmethylammonium
C₂: Ethyl-(2-hydroxy-ethyl)-dimethyl-ammonium
TEMA: (2-Hydroxyethyl) methylammonium
Pre: Predicted data
Exp: Experimental data
SCN: Thiocyanate
MeSO₄: Methyl sulfate
OAc: Acetate
DEP: Diethyl phosphate
DCA: Dicyanamide
MP: Methyl phosphonate
OTf: Trifluoromethanesulfonate
EtSO₄: Ethyl sulfate
OcSO₄: Octyl sulfate
MDEGSO₄: Diethylene glycol monomethyl ether sulfate
BF₄: Tetrafluoroborate
NTF₂: Bis(trifluoromethylsulfonyl)imide
Cl: Chloride
AlCl₄: Aluminum tetrachloride
FeCl₄: Tetrachloroferrate
NO₃: Nitrate
PF₆: Hexafluorophosphate
Cat: Cation descriptor
Ani: Anion descriptor
HC: Aliphatic hydrocarbon descriptor

References

García, S., Larriba, M., Casas, A., García, J. & Rodríguez, F. Separation of toluene and heptane by liquid-liquid extraction using binary mixtures of the ionic liquids 1-butyl-4-methylpyridinium bis(trifluoromethylsulfonyl)imide and 1-ethyl-3-methylimidazolium ethylsulfate. J. Chem. Eng. Data 57(9), 2472–2478. https://doi.org/10.1021/je300635c (2012).
Rodriguez, N. R., Requejo, P. F. & Kroon, M. C. Aliphatic–aromatic separation using deep eutectic solvents as extracting agents. Ind. Eng. Chem. Res. 54(45), 11404–11412. https://doi.org/10.1021/acs.iecr.5b02611 (2015).
Requejo, P. F., Gómez, E., Calvar, N. & Domínguez, Á. Application of pyrrolidinium-based ionic liquid as solvent for the liquid extraction of benzene from its mixtures with aliphatic hydrocarbons. Ind. Eng. Chem. Res. 54(4), 1342–1349. https://doi.org/10.1021/ie5040489 (2015).
Ayuso, M. et al. Separation of benzene from methylcycloalkanes by extractive distillation with cyano-based ionic liquids: Experimental and CPA EoS modelling. Sep. Purif. Technol. 234, 116128. https://doi.org/10.1016/j.seppur.2019.116128 (2020).
Ángeles, D., Begoña, G., Patricia, F. R. & Sandra, C. Extraction of aromatic compounds from their mixtures with alkanes: from ternary to quaternary (or higher) systems. In Solvents, Ionic Liquids and Solvent Effects (eds. Daniel, G.-M. & Magdalena, M.) (IntechOpen, 2019).
Lyu, Y., Brennecke, J. F. & Stadtherr, M. A. Review of recent aromatic–aliphatic–ionic liquid ternary liquid-liquid equilibria and their modeling by COSMO-RS. Ind. Eng. Chem. Res. 59(19), 8871–8893. https://doi.org/10.1021/acs.iecr.0c00581 (2020).
Meindersma, G. W. & de Haan, A. B. Conceptual process design for aromatic/aliphatic separation with ionic liquids. Chem. Eng. Res. Des. 86(7), 745–752. https://doi.org/10.1016/j.cherd.2008.02.016 (2008).
Hadj-Kali, M. K., Salleh, Z., Ali, E., Khan, R. & Hashim, M. A. Separation of aromatic and aliphatic hydrocarbons using deep eutectic solvents: a critical review. Fluid Phase Equilib. 448, 152–167. https://doi.org/10.1016/j.fluid.2017.05.011 (2017).
Requejo, P. F., Calvar, N., Domínguez, Á. & Gómez, E. Comparative study of the LLE of the quaternary and ternary systems involving benzene, n-octane, n-decane and the ionic liquid [BMpyr][NTf2]. J. Chem. Thermodyn. 98, 56–61. https://doi.org/10.1016/j.jct.2016.02.027 (2016).
Kȩdra-Królik, K., Fabrice, M. & Jaubert, J.-N. Extraction of thiophene or pyridine from n-heptane using ionic liquids. Gasoline and diesel desulfurization. Ind. Eng. Chem. Res. 50(4), 2296–2306. https://doi.org/10.1021/ie101834m (2011).
Domínguez, I., Requejo, P. F., Canosa, J. & Domínguez, Á. (Liquid+liquid) equilibrium at T=298.15K for ternary mixtures of alkane+aromatic compounds+imidazolium-based ionic liquids. J. Chem. Thermodyn. 74, 138–143. https://doi.org/10.1016/j.jct.2014.01.022 (2014).
González, E. J., Calvar, N., Gómez, E. & Domínguez, Á. Separation of benzene from alkanes using 1-ethyl-3-methylpyridinium ethylsulfate ionic liquid at several temperatures and atmospheric pressure: Effect of the size of the aliphatic hydrocarbons. J. Chem. Thermodyn. 42(1), 104–109. https://doi.org/10.1016/j.jct.2009.07.017 (2010).
Arce, A., Earle, M. J., Rodríguez, H. & Seddon, K. R. Separation of benzene and hexane by solvent extraction with 1-Alkyl-3-methylimidazolium bis{(trifluoromethyl)sulfonyl}amide ionic liquids: effect of the alkyl-substituent length. J. Phys. Chem. B 111(18), 4732–4736. https://doi.org/10.1021/jp066377u (2007).
Zhou, T. et al. Evaluation of the ionic liquids 1-alkyl-3-methylimidazolium hexafluorophosphate as a solvent for the extraction of benzene from cyclohexane: (Liquid+liquid) equilibria. J. Chem. Thermodyn. 48, 145–149. https://doi.org/10.1016/j.jct.2011.12.006 (2012).
Gómez, E., Domínguez, I., Calvar, N. & Domínguez, Á. Separation of benzene from alkanes by solvent extraction with 1-ethylpyridinium ethylsulfate ionic liquid. J. Chem. Thermodyn. 42(10), 1234–1239. https://doi.org/10.1016/j.jct.2010.04.022 (2010).
Requejo, P. F., Calvar, N., Gómez, E. & Domínguez, Á. Study of the suitability of two ammonium-based ionic liquids for the extraction of benzene from its mixtures with aliphatic hydrocarbons. Fluid Phase Equilib. 426, 17–24. https://doi.org/10.1016/j.fluid.2016.02.006 (2016).
Gómez, E., Domínguez, I., Calvar, N., Palomar, J. & Domínguez, Á. Experimental data, correlation and prediction of the extraction of benzene from cyclic hydrocarbons using [Epy][ESO4] ionic liquid. Fluid Phase Equilib. 361, 83–92. https://doi.org/10.1016/j.fluid.2013.10.033 (2014).
Peng, D., Horvat, D. P. & Picchioni, F. Computer-aided ionic liquid design and experimental validation for benzene-cyclohexane separation. Ind. Eng. Chem. Res. 60(13), 4951–4961. https://doi.org/10.1021/acs.iecr.0c05935 (2021).
Yousefinejad, S. & Hemmateenejad, B. Chemometrics tools in QSAR/QSPR studies: a historical perspective. Chemom. Intell. Lab. Syst. 149, 177–204. https://doi.org/10.1016/j.chemolab.2015.06.016 (2015).
Katritzky, A. R., Pacureanu, L., Dobchev, D. & Karelson, M. QSPR study of critical micelle concentration of anionic surfactants using computational molecular descriptors. J. Chem. Inf. Model. 47(3), 782–793. https://doi.org/10.1021/ci600462d (2007).
Tropsha, A. Best practices for QSAR model development, validation, and exploitation. Mol. Inf. 29(6–7), 476–488. https://doi.org/10.1002/minf.201000061 (2010).
Gorji, A. E. & Sobati, M. A. Toward molecular modeling of thiophene distribution between the ionic liquid and hydrocarbon phases: Effect of hydrocarbon structure. J. Mol. Liq. 287, 110976. https://doi.org/10.1016/j.molliq.2019.110976 (2019).
Gorji, A. E. & Sobati, M. A. How anion structures can affect the thiophene distribution between imidazolium-based ionic liquid and hydrocarbon phases? A theoretical QSPR study. Energy Fuels 33(9), 8576–8587. https://doi.org/10.1021/acs.energyfuels.9b02416 (2019).
Nekoeinia, M., Yousefinejad, S. & Abdollahi, A. Prediction of E_T^N polarity scale of ionic liquids using a QSPR approach. Ind. Eng. Chem. Res. 54, 12682–12689. https://doi.org/10.1021/acs.iecr.5b02982 (2015).
Peng, D., Kleiweg, A.-J., Winkelman, J. G. M., Song, Z. & Picchioni, F. A hierarchical hybrid method for screening ionic liquid solvents for extractions exemplified by the extractive desulfurization process. ACS Sustain. Chem. Eng. 9(7), 2705–2716. https://doi.org/10.1021/acssuschemeng.0c07866 (2021).
Gorji, A. E., Sobati, M. A., Alopaeus, V. & Uusi-Kyyny, P. Toward solvent screening in the extractive desulfurization using ionic liquids: QSPR modeling and experimental validations. Fuel 302, 121159. https://doi.org/10.1016/j.fuel.2021.121159 (2021).
Amereh, M., Ebrahimpoor Gorji, A. & Sobati, M. A. Toward solvent selection for the extractive removal of pyridine from fuels using ionic liquids: a QSPR study. Fuel 343, 127820. https://doi.org/10.1016/j.fuel.2023.127820 (2023).
González, E. J., Calvar, N., Gómez, E. & Domínguez, Á. Separation of benzene from linear alkanes (C6–C9) Using 1-Ethyl-3-methylimidazolium ethylsulfate at T = 298.15 K. J. Chem. Eng. Data 55(9), 3422–3427. https://doi.org/10.1021/je1001544 (2010).
González, E. J., Calvar, N., González, B. & Domínguez, Á. (Liquid+liquid) equilibria for ternary mixtures of (alkane+benzene+[EMpy] [ESO4]) at several temperatures and atmospheric pressure. J. Chem. Thermodyn. 41(11), 1215–1221. https://doi.org/10.1016/j.jct.2009.05.008 (2009).
Zhang, F. et al. Benzyl- and vinyl-functionalized imidazoium ionic liquids for selective separating aromatic hydrocarbons from alkanes. Ind. Eng. Chem. Res. 55(3), 747–756. https://doi.org/10.1021/acs.iecr.5b03814 (2016).
Domínguez, I., González, E. J., González, R. & Domínguez, Á. Evaluation of [C3mim][NTf2] as solvent for the liquid-liquid extraction of benzene from mixtures of benzene and hexane. Sep. Sci. Technol. 47(2), 331–336. https://doi.org/10.1080/01496395.2011.621161 (2012).
Lee, K.-H., You, S.-H. & Park, S.-J. The selectivity of imidazolium-based ionic liquids with different anions to BTX aromatics in hexane at 298.15 K and atmospheric pressure. Korean J. Chem. Eng. 33(10), 2982–2989. https://doi.org/10.1007/s11814-016-0140-4 (2016).
Domańska, U., Pobudkowska, A. & Królikowski, M. Separation of aromatic hydrocarbons from alkanes using ammonium ionic liquid C2NTf2 at T=298.15K. Fluid Phase Equilibria 259(2), 173–179. https://doi.org/10.1016/j.fluid.2007.06.025 (2007).
Sakal, S. A., Lu, Y.-Z., Jiang, X.-C., Shen, C. & Li, C.-X. A Promising ionic liquid [BMIM][FeCl4] for the extractive separation of aromatic and aliphatic hydrocarbons. J. Chem. Eng. Data 59(3), 533–539. https://doi.org/10.1021/je400076x (2014).
Wiśniewski, P. et al. Effect of the ionic liquids on extraction of aromatic and sulfur compounds from the model petrochemical stream. Fluid Phase Equilibria 552, 113296. https://doi.org/10.1016/j.fluid.2021.113296 (2022).
Manohar, C. V., Rabari, D., Kumar, A. A. P., Banerjee, T. & Mohanty, K. Liquid–liquid equilibria studies on ammonium and phosphonium based ionic liquid–aromatic–aliphatic component at T=298.15K and p=1bar: correlations and a-priori predictions. Fluid Phase Equilibria 360, 392–400. https://doi.org/10.1016/j.fluid.2013.10.005 (2013).
García, J., Fernández, A., Torrecilla, J. S., Oliet, M. & Rodríguez, F. Ternary liquid−liquid equilibria measurement for hexane and benzene with the ionic liquid 1-Butyl-3-methylimidazolium methylsulfate at T = (298.2, 313.2, and 328.2) K. J. Chem. Eng. Data 55(1), 258–261. https://doi.org/10.1021/je900321j (2010).
Mokhtarani, B. et al. Ternary (liquid–liquid) equilibria of nitrate based ionic liquid+alkane+benzene at 298.15K: experiments and correlation. Fluid Phase Equilibria 341, 35–41. https://doi.org/10.1016/j.fluid.2012.12.025 (2013).
Hashim, M. A., Zulhaziman, M., Salleh, M., Ali, E. & Hadj-Kali, M. K. Selective extraction of benzene from benzene–cyclohexane mixture using 1-ethyl-3-methylimidazolium tetrafluoroborate ionic liquid. AIP Conf. Proc. 2124(1), 020028. https://doi.org/10.1063/1.5117088 (2019).
Domínguez, I., Calvar, N., Gómez, E. A. & Domínguez, Á. Separation of benzene from heptane using tree ionic liquids: BMimMSO4, BMimNTf2, and PMimNTf2. Procedia Eng. 42, 1597–1605. https://doi.org/10.1016/J.PROENG.2012.07.553 (2012).
Pereiro, A. B. & Rodríguez, A. An ionic liquid proposed as solvent in aromatic hydrocarbon separation by liquid extraction. AIChE J. 56(2), 381–386. https://doi.org/10.1002/aic.11937 (2010).
Revelli, A.-L., Mutelet, F. & Jaubert, J.-N. Extraction of benzene or thiophene from n-heptane using ionic liquids. NMR and thermodynamic study. J. Phys. Chem. B 114(13), 4600–4608. https://doi.org/10.1021/jp911978a (2010).
Letcher, T. & Reddy, P. Ternary (liquid + liquid) equilibria for mixtures of 1-hexyl-3-methylimidazolium (tetrafluoroborate or hexafluorophosphate) + benzene + an alkane at T=298.2 K and p=0.1 MPa. J. Chem. Thermodyn. 37, 415–421. https://doi.org/10.1016/j.jct.2004.05.001 (2005).
Enayati, M., Mokhtarani, B., Sharifi, A. & Mirzaei, M. Extraction of benzene from heptane with pyridinium based ionic liquid at (298.15, 308.15 and 318.15) K. Fluid Phase Equilibria 411, 53–58. https://doi.org/10.1016/j.fluid.2015.12.009 (2016).
Deenadayalu, N., Ngcongo, K. C., Letcher, T. M. & Ramjugernath, D. Liquid−liquid equilibria for ternary mixtures (an ionic liquid + benzene + heptane or hexadecane) at T = 298.2 K and atmospheric pressure. J. Chem. Eng. Data 51(3), 988–991. https://doi.org/10.1021/je050494l (2006).
Gómez, E., Domínguez, I., González, B. & Domínguez, Á. Liquid−liquid equilibria of the ternary systems of alkane + aromatic + 1-ethylpyridinium ethylsulfate ionic liquid at T = (283.15 and 298.15) K. J. Chem. Eng. Data 55(11), 5169–5175. https://doi.org/10.1021/je100716c (2010).
Requejo, P. F., Calvar, N., Domínguez, Á. & Gómez, E. Application of the ionic liquid tributylmethylammonium bis(trifluoromethylsulfonyl)imide as solvent for the extraction of benzene from octane and decane at T = 298.15 K and atmospheric pressure. Fluid Phase Equilibria 417, 137–143. https://doi.org/10.1016/J.FLUID.2016.02.028 (2016).
Domínguez, I., González, E. J., González, R. & Domínguez, Á. Extraction of benzene from aliphatic compounds using commercial ionic liquids as solvents: study of the liquid-liquid equilibrium at T = 298.15 K. J. Chem. Eng. Data 56(8), 3376–3383. https://doi.org/10.1021/je200334e (2011).
Requejo, P. F., Calvar, N., Domínguez, Á. & Gómez, E. Determination and correlation of (liquid+liquid) equilibria of ternary and quaternary systems with octane, decane, benzene and [BMpyr][DCA] at T=298.15K and atmospheric pressure. J. Chem. Thermodyn. 94, 197–203. https://doi.org/10.1016/j.jct.2015.11.016 (2016).
Calvar, N., Domínguez, I., Gómez, E., Palomar, J. & Domínguez, Á. Evaluation of ionic liquids as solvent for aromatic extraction: Experimental, correlation and COSMO-RS predictions. J. Chem. Thermodyn. 67, 5–12. https://doi.org/10.1016/j.jct.2013.07.011 (2013).
Letcher, T. M. & Naicker, P. K. Ternary liquid−liquid equilibria for mixtures of an n-alkane + an aromatic hydrocarbon + N-Methyl-2-pyrrolidone at 298.2 K and 1 atm. J. Chem. Eng. Data 43(6), 1034–1038. https://doi.org/10.1021/je980114e (1998).
González, E. J., Calvar, N., González, B. & Domínguez, Á. Liquid extraction of benzene from its mixtures using 1-ethyl-3-methylimidazolium ethylsulfate as a solvent. J. Chem. Eng. Data 55(11), 4931–4936. https://doi.org/10.1021/je100508y (2010).
Salleh, M. Z. M., Hadj-Kali, M. K., Hashim, M. A. & Mulyono, S. Ionic liquids for the separation of benzene and cyclohexane – COSMO-RS screening and experimental validation. J. Mol. Liquids 266, 51–61. https://doi.org/10.1016/j.molliq.2018.06.034 (2018).
Zhou, T. et al. Deep separation of benzene from cyclohexane by liquid extraction using ionic liquids as the solvent. Ind. Eng. Chem. Res. 51(15), 5559–5564. https://doi.org/10.1021/ie202728j (2012).
Ismail, M., Bustam, M. A. & Man, Z. Extraction of benzene and cyclohexane using [BMIM][N(CN)2] and their equilibrium modelling. Adv. Mater. Sustaina. Growth 1901, 080001, https://doi.org/10.1063/1.5010516 (2017).
Wang, R., Wang, J., Meng, H., Li, C. & Wang, Z. Liquid−liquid equilibria for benzene + cyclohexane + 1-methyl-3-methylimidazolium dimethylphosphate or + 1-ethyl-3-methylimidazolium diethylphosphate. J. Chem. Eng. Data 53(5), 1159–1162. https://doi.org/10.1021/je700759h (2008).
Domínguez, I., Calvar, N., Gómez, E. & Domínguez, Á. Liquid–liquid extraction of aromatic compounds from cycloalkanes using 1-butyl-3-methylimidazolium methylsulfate ionic liquid. J. Chem. Eng. Data 58(2), 189–196. https://doi.org/10.1021/je300826t (2013).
Lyu, Z. et al. Simulation based ionic liquid screening for benzene–cyclohexane extractive separation. Chem. Eng. Sci. 113, 45–53. https://doi.org/10.1016/j.ces.2014.04.011 (2014).
Salleh, M. Z. M., Hadj-Kali, M. K., Hashim, M. A. & Ali, E. Extractive separation of benzene and cyclohexane using 1-butyl-3-methylimidazolium acetate. IOP Conf. Ser.: Mater. Sci. Eng. 458(1), 012067. https://doi.org/10.1088/1757-899X/458/1/012067 (2018).
González, E. J., Calvar, N., González, B. & Domínguez, Á. Measurement and correlation of liquid–liquid equilibria for ternary systems cyclooctane+aromatic hydrocarbon+1-ethyl-3-methylpyridinium ethylsulfate at T=298.15K and atmospheric pressure. Fluid Phase Equilibria 291(1), 59–65. https://doi.org/10.1016/j.fluid.2009.12.019 (2010).
Domínguez, I., González, E. J. & Domínguez, Á. Liquid extraction of aromatic/cyclic aliphatic hydrocarbon mixtures using ionic liquids as solvent: Literature review and new experimental LLE data. Fuel Process. Technol. 125, 207–216. https://doi.org/10.1016/j.fuproc.2014.04.001 (2014).
Mitchell, M. An Introduction to Genetic Algorithms (The MIT Press, 1996). https://doi.org/10.7551/mitpress/3927.001.0001.
Haupt, R. L. & Haupt, S. E. Practical Genetic Algorithms (Wiley InterScience electronic collection) (Wiley, 2004).
Goldberg, D. E. Genetic Algorithms in Search Optimization and Machine Learning (Addison-Wesley Longman Publishing Co., 1989).
Whitley, D. A Genetic Algorithm Tutorial, Statistics and Computing, vol. 4 (Springer, 1998). https://doi.org/10.1007/BF00175354.
Gramatica, P. Principles of QSAR modeling: comments and suggestions from personal experience. Int. J. Quant. Struct.-Property Relationships (IJQSPR) 5(3), 61–97. https://doi.org/10.4018/IJQSPR.20200701.oa1 (2020).
Gramatica, P. & Sangion, A. A historical excursus on the statistical validation parameters for QSAR models: a clarification concerning metrics and terminology. J. Chem. Inf. Model. 56(6), 1127–1131. https://doi.org/10.1021/acs.jcim.6b00088 (2016).
Gramatica, P., Chirico, N., Papa, E., Cassani, S. & Kovarich, S. QSARINS: A new software for the development, analysis, and validation of QSAR MLR models. J. Comput. Chem. 34(24), 2121–2132. https://doi.org/10.1002/jcc.23361 (2013).
Ghomisheh, Z., Gorji, A. E. & Sobati, M. A. Prediction of critical properties of sulfur-containing compounds: new QSPR models. J. Mol. Graph. Model. 101, 107700. https://doi.org/10.1016/j.jmgm.2020.107700 (2020).
Xu, Q.-S. & Liang, Y.-Z. Monte Carlo cross validation. Chemometr. Intel. Lab. Syst. 56(1), 1–11. https://doi.org/10.1016/S0169-7439(00)00122-2 (2001).
Hastie, T., Tibshirani, R. & Friedman, J. H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer series in statistics) (Springer, 2001).
Gackowski, M. et al. Application of multivariate adaptive regression splines (MARSplines) for predicting antitumor activity of anthrapyrazole derivatives. Int. J. Mol. Sci. 23(9), 5132. https://doi.org/10.3390/ijms23095132 (2022).
Leszczynski, J. Handbook of Computational Chemistry (Springer, 2012).
Todeschini, R., Consonni, V., Mannhold, R., Kubinyi, H. & Timmerman, H. Handbook of Molecular Descriptors (Methods & Principles in Medicinal Chemistry) (Wiley, 2008).


Acknowledgements

The authors sincerely acknowledge Prof. Paola Gramatica, senior professor at the University of Insubria, for providing a free license for the QSARINS software.

Author information

Authors and Affiliations

School of Chemical Engineering, Iran University of Science and Technology (IUST), Tehran, Iran
Mahdieh Amereh, Ali Ebrahimpoor Gorji & Mohammad Amin Sobati


Contributions

Mahdieh Amereh: Data curation; Investigation; Methodology; Formal analysis; Software; Writing – Original Draft; Visualization; Validation. Ali Ebrahimpoor Gorji: Investigation; Methodology; Formal analysis; Validation. Mohammad Amin Sobati: Conceptualization; Methodology; Supervision; Writing – Review & Editing; Project administration.

Corresponding author

Correspondence to Mohammad Amin Sobati.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Amereh, M., Gorji, A.E. & Sobati, M.A. Molecular insights on the solvent screening for the benzene extraction from fuels using ionic liquids via QSPR method. Sci Rep 14, 30718 (2024). https://doi.org/10.1038/s41598-024-79639-x


Received: 07 July 2024
Accepted: 11 November 2024
Published: 28 December 2024
DOI: https://doi.org/10.1038/s41598-024-79639-x
