Introduction

After several rounds of investment in information technology, the Zhongtai Engineering Project in the power grid resources business has largely digitized its management: advanced technology is applied in the equipment production, power marketing, dispatching operation, financial control, and safety monitoring information systems, and data sources have been unified. The Ubiquitous Power Internet of Things takes “data source unification, marketing maintenance, and distribution operation” as its main basis, aims to fundamentally solve the problems of “unified data”, “unified ID” and “unified service”, and has prompted unified research on the construction of the ubiquitous power grid resource business1. Through the design of a power grid resource framework for power enterprises, and using a domain model-driven analysis method combined with typical application scenarios and processes, a complete top-down analysis of the power grid resource business is realized. The precise positioning of each center is clarified, the basic services of business objects are refined, and enterprise-level shared services are consolidated. The work focuses on sorting the services of the ten centers, including power grid resources, assets, graphics, topology, model management, and power grid analysis, covering the service list, business objects, and service inputs and outputs2.

In the power industry, CIM (Common Information Model) defines a standard object model for the development and integration of power engineering, planning, management, operations, and business applications. It provides a standard for describing power objects and their relationships. CIM can be defined as providing standard objects for interoperability and applications for production, transmission, distribution, marketing, and retail systems in the electricity, water, and gas industries3.

In 2016, the construction service of the model center took shape, based on the SG-CIM (State Grid Common Information Model) standard model. A full model of the power grid spanning operation and distribution was established. A series of services, such as model, resource, topology, automatic mapping, GIS graphics, measurement, and outage management, was built as shared components to support the business applications of the intelligent power supply service system, distribution automation, the distribution network planning system and the distribution network engineering management system4. Qiu Zhiyong analyzed the problems faced by power system information integration and proposed corresponding solutions. He also completed the CIM/XML interactive software platform and discussed the display of the CIM model in SVG graphics5. Ye Ce studied how to establish a database for CIM model data and participated in the development of the distribution and transformation management system of an operating power supply bureau. He also established a data mart covering power quality and tested the establishment of a data warehouse6. Taking the CIM model as the data source, Liu Chongru et al. proposed a power system component database based on the CIM model and, with an Oracle9i database as the back end, established the mapping rules from the CIM model to the database7. Cao Yang et al. discussed the latest developments of the CIM model and the IEC61970 standard and introduced the changes to the IEC61970 interface specification and graphic standard8. Other scholars have studied power grid resource data. Reference9 proposes power grid operation prediction based on an artificial neural network model. 
A nonlinear autoregressive network is used to forecast the hourly wind speed, solar radiation and power demand of the power grid, and multi-step differential data prediction is carried out to coordinate the power grid network structure. However, this method takes a long time to compute. Reference10 proposes a hybrid model for short-term electricity price forecasting. The electricity price time series is decomposed by variational mode decomposition (VMD), the time-domain characteristics of the mode functions in the VMD domain are extracted by a convolutional neural network, and the power grid data are processed by a gated recurrent unit to complete the electricity price data processing. However, the data processing efficiency of this method is low. Reference11 proposes a stochastic programming algorithm for multi-microgrid operation that considers the integration of stochastic renewable energy. Power generation is first predicted from the energy storage data, and power grid operation is then simulated by game theory to determine the power market data. However, the data processing accuracy of this method needs further improvement. Reference12 proposes a predictive power grid platform based on blockchain, which obtains power generation scheduling data from distributed energy sources and provides real-time energy consumption monitoring data through an energy trading module and a predictive analysis module to update the power grid energy trading data. However, the application scope of this method is relatively narrow. To deal with the problem of data islands between different systems, it is necessary to bring the systems together, use a unified “language” to describe information, and use unified interfaces to transmit it. 
In this paper, according to the data requirements of the calculation methods of the function modules, the multi-source heterogeneous data related to power grid planning are processed separately according to their structure. For structured data, the corresponding relational data model is established, extraction and cleaning from the different data sources are completed, the data are optimized for specific functions, and the results are finally written into the database of this system for use by the various businesses. A standardization method for power grid resource data is established to solve the problem of data integration and processing. Data integration and differentiation analysis methods are proposed, and a unified data model is established and applied to power grid resources. The business algorithm model provides support for power grid planning automation.
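The extraction, cleaning, and merging steps for structured data described above can be sketched as follows. This is a minimal illustration only: the field names `equipment_id` and `voltage_kv`, the two sample source systems, and the mean-imputation rule are assumptions made for the example and are not taken from the system described in this paper.

```python
from statistics import mean


def clean_record(raw):
    """Normalize one raw record: tidy the ID and convert the voltage to float or None."""
    equipment_id = str(raw.get("equipment_id", "")).strip().upper()
    value = raw.get("voltage_kv")
    try:
        voltage = float(value) if value not in (None, "") else None
    except (TypeError, ValueError):
        voltage = None  # unparseable readings are treated as missing
    return {"equipment_id": equipment_id, "voltage_kv": voltage}


def integrate(*sources):
    """Extract, clean, and merge records from several source systems by equipment ID."""
    merged = {}
    for source in sources:
        for raw in source:
            rec = clean_record(raw)
            if not rec["equipment_id"]:
                continue  # drop records that cannot be identified
            current = merged.setdefault(rec["equipment_id"], rec)
            # Fill a missing value from a later source, never overwrite a known one.
            if current["voltage_kv"] is None and rec["voltage_kv"] is not None:
                current["voltage_kv"] = rec["voltage_kv"]
    known = [r["voltage_kv"] for r in merged.values() if r["voltage_kv"] is not None]
    fallback = mean(known) if known else None
    for rec in merged.values():
        if rec["voltage_kv"] is None:
            rec["voltage_kv"] = fallback  # simple mean imputation (assumed rule)
    return sorted(merged.values(), key=lambda r: r["equipment_id"])


# Hypothetical records from two source systems.
production = [{"equipment_id": " t-001 ", "voltage_kv": "10.5"},
              {"equipment_id": "T-002", "voltage_kv": None}]
marketing = [{"equipment_id": "T-002", "voltage_kv": "n/a"},
             {"equipment_id": "T-003", "voltage_kv": 35}]
unified = integrate(production, marketing)
```

In practice the merge key and the imputation rule would follow the relational data model of the target database; the sketch only shows the order of operations.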

Article Highlights:

(1) The definition of data module standards needs to fully consider the readability, expansibility and compatibility of data. Establish data module standards, standardize the functions of each module and their connection modes, reduce the incidence of coupling and mutual influence, and thus ensure the normal operation, upgrading and maintenance of the power grid system.

(2) Using standard structure data to build a power grid planning data platform, the data from different sources and different formats are processed and integrated, and the data are standardized and unified, making it easier to use and manage.

(3) By using advanced data analysis technology, the data are deeply analyzed and utilized, and valuable information and rules are found, which provides more comprehensive and accurate data support for power grid planning.

The next section describes the power grid control system and the matching of power grid resource data, and discusses integrating the graphic data and attribute data of the database to improve the practicability and operability of the system. The “Differentiation analysis of power grid system data” section presents the differential analysis of power grid system data and obtains the decisive factors of power grid difference through regression analysis. The “Data differentiation model” section presents the spatial difference prediction, calculates the data difference of the whole network, and derives the power grid planning standard by constructing the data differentiation model. The “Experimental test” section gives the experimental results and analysis and verifies the effectiveness of the algorithm through comparative analysis.

Related work

Grid control system

The smart grid regulation and control system covers power grid stability monitoring, dynamic monitoring and analysis, timely protection of equipment after a power outage, and online monitoring of the internal systems of the power grid. It also covers further analysis of the security control system after the optimization scheme is implemented in combination with the CIM model, as well as the management effect of online monitoring. The integrated intelligent system therefore further analyzes whether the alarm function of the control technology meets the requirements of the power grid regulation and control system.

The intelligent control of the power grid regulation and control system is divided into four main functions: operation stability control, integrated intelligent monitoring and alarm, online monitoring of relay protection equipment, and monitoring of security and stability control devices. The basic language semantic analysis of model building is used to realize real-time monitoring of power grid operation information. The analysis of additional monitoring information of the power grid system is combined with the status information of secondary equipment to realize three-dimensional monitoring of the power grid, keep system operation normal and improve the dynamic distribution of omnidirectional monitoring13,14. At present, the power grid regulation and control system monitors dynamic information and transient processes through intelligent monitoring, which enables more comprehensive supervision and provides more support for the overall planning of the regulation and control system. On this basis, online faults in power grid operation can be found. Combined with the overall monitoring technology, an intelligent alarm device is provided, which realizes real-time and intelligent monitoring of the power grid to a certain extent. It also improves the performance of each interface in the computer interface and ensures integrated control by using these technical advantages15. The motivation for this line of work is to enhance the intelligence of the power grid regulation system, achieve comprehensive monitoring through integrated intelligent systems, use advanced technology to obtain and process power grid information in real time, detect and handle faults promptly, and provide support for the overall planning of the system.

Power grid resource data matching

Power grid planning covers a wide range of research and is characterized as “interdisciplinary, multi-field open, multi-means integration”16. The research object is large, rich in content, and highly uncertain. Internally, it includes the stable and effective electrical calculation methods of traditional planning; externally, it coordinates with multiple parties (economy, society, city, users, etc.) to achieve a business scenario of stable internal operation and harmonious external interaction. Power grid planning is a complex, systematic, multivariate nonlinear optimization problem, which gives the solution process its discrete, dynamic, multi-variable, multi-stage, and multi-objective characteristics17.

According to previous studies, early power grid planning relied mainly on the experience of planners, who carried out data collation and analysis, electrical calculation, drawing and scheme verification by hand. As mathematical theory and computers came into widespread use, power grid planning was explored more deeply in both theory and application. Based on new achievements in statistics, operations research, mathematical modeling, information engineering theory and other fields, and supported by computer technology, a variety of power grid planning optimization models have been obtained, as shown in Fig. 1.

Fig. 1 Power grid planning model.

In the stage of “database + computer-aided design”, a power graphic information system is built by using the computer-aided design drawing software GStarCAD EP 2024 (https://yun.gstarcad.com/activity/promotionArchitectureBaidu2/?bd_vid=76). In this stage, although the overall picture of the power system can be drawn, the graphic data and the attribute data of the database are not fully integrated, and the practicability and operability are low18. The “GIS” stage can connect the attribute data of the database with the graphic data, but it has no function to support the analysis of topological structure, so advanced operations such as electrical calculation cannot be carried out. The stage of “automatic drawing + equipment management + geographic information system” realizes a direct mapping between geographic information and the topological model, strengthens the expression and collaboration ability of the system as a whole, runs computing tools on the basis of complete system graphics, consolidates the internal logic of the transaction process, and improves interactivity19.

Differentiation analysis of power grid system data

The difference prediction module uses regression analysis. Regression analysis is a key tool for analyzing the internal relationship between variables. It is used to process the monitoring data of the variables, establish the correlation between the difference prediction results and the decisive factors, and then complete the difference prediction. There are many decisive factors of power grid difference. With the regression analysis method, the decisive factors can be selected according to the change law of the difference, which makes the difference prediction result more accurate20. If there is a linear relationship between the difference prediction target and the decisive factor, the target is taken as the dependent variable y and the decisive factor as the independent variable x, and the relationship can be calculated by using the linear regression equation with one variable, as shown in Formula 1.

$$y = a + bx$$
(1)

where a is the regression constant and b is the regression coefficient.

a and b are obtained by the least squares method, as shown in Eq. 2, where n is the number of paired observations of the independent and dependent variables.

$$b = \frac{{\sum\nolimits_{i = 1}^{n} {x_{i} y_{i} } - n\bar{x}\bar{y}}}{{\sum\nolimits_{i = 1}^{n} {x_{i}^{2} } - n\bar{x}^{2} }},\quad a = \bar{y} - b\bar{x}$$
(2)

where \(y_{i}\) is the monitored data value of the dependent variable, and \(x_{i}\) is the monitored data value of the independent variable. The mean values are \(\bar{x} = \frac{{\sum\nolimits_{i = 1}^{n} {x_{i} } }}{n},\;\bar{y} = \frac{{\sum\nolimits_{i = 1}^{n} {y_{i} } }}{n}\).
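The least squares fit of a one-variable linear regression can be sketched in a few lines. The sketch assumes the convention in which a is the constant (intercept) and b is the coefficient (slope) of x; the function name `fit_line` and the sample data are illustrative only.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope estimate.
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b = s_xy / s_xx          # regression coefficient (slope)
    a = y_bar - b * x_bar    # regression constant (intercept)
    return a, b


# Illustrative data lying exactly on y = 1 + 2x.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

For noisy monitoring data the same function applies unchanged; only the residuals are no longer zero.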

Data differentiation model

Prediction of spatial difference

Data difference prediction includes data difference prediction of the whole network and spatial data difference prediction of regions divided according to functions or plots. The choice depends on the application scenario of the difference prediction. Data difference prediction of the whole network is generally applied to power generation schemes, industrial upgrading, total annual electricity consumption, etc. Spatial data difference prediction is applied to the access, replacement, and expansion of specific equipment in the power grid, involving substation site selection and capacity determination, grid planning and other businesses21.

Spatial data difference prediction includes the following correspondence:

$$F(x,y) \to L(x,y) \to S(x,y) \to E(x,y) \to L_{z}$$
(3)

In the formula, area \(F(x,y)\) corresponds to area \(L(x,y)\) planning through \(f_{1}\); area planning corresponds to area \(S(x,y)\) data difference through \(f_{2}\). The area data difference corresponds to the whole network data difference through \(f_{3}\).

$$f_{2} :L(x,y) = \sum\limits_{i = 1}^{m} {S_{i} } (x,y) \times SC_{i} = \sum\limits_{i = 1}^{m} {I_{i} } (x,y)$$
(4)

where m is the total number of zone classes and \(SC_{i}\) is the degree of loading of zone i.

$$L = f_{3} (L_{xy} ) = \sum\limits_{x,y} {L_{xy} }$$
(5)

where \(L_{xy}\) represents the spatial weight at position \((x, y)\).
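The two aggregation steps in Eqs. 4 and 5 can be illustrated with a short sketch: the value of each cell is the sum over zone classes of the class area multiplied by its degree of loading, and the whole-network value is the sum over all cells. The zone class names, the grid layout, and the numerical values are hypothetical.

```python
def cell_difference(class_areas, loading):
    """Eq. 4 for one cell: sum over zone classes of S_i(x, y) * SC_i."""
    return sum(area * loading[zone] for zone, area in class_areas.items())


def network_difference(grid, loading):
    """Eq. 5: sum the cell values over all positions (x, y)."""
    return sum(cell_difference(areas, loading) for areas in grid.values())


# Hypothetical degree of loading per zone class (SC_i).
loading = {"residential": 2.0, "industrial": 5.0}
# Hypothetical area of each zone class in each cell, keyed by position (x, y).
grid = {
    (0, 0): {"residential": 3.0, "industrial": 1.0},
    (0, 1): {"residential": 1.5},
}
per_cell = {pos: cell_difference(areas, loading) for pos, areas in grid.items()}
total = network_difference(grid, loading)
```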

Data standard model

Spatial difference prediction obtains the difference value, distribution, density, and other data of the selected area in a certain period of time, and obtains various parameters of the data standard. The specific way is to confirm the optimal threshold of the model standard according to the division of the region and the consideration of the difference distribution, and to confirm the index density of the data standard according to the difference value and density. Taking the power grid equipment construction project as an example, the data standard model is optimized according to the following formula as the objective function, and the algorithm is as shown in formula 6.

$$C = C_{szu} + C_{iz} + C_{iv}$$
(6)

In the formula, \(C_{szu}\) is the annual investment and maintenance fund of the proposed substation; \(C_{iz}\) is the annual investment fund of the substation access line; \(C_{iv}\) is the annual depreciation cost of the substation access line. The specific calculation method is as follows:

$$C_{szu} = \sum\limits_{i = 1}^{n} {\left\{ {C_{sz} (S_{i} )\left[ {\frac{{r_{0} (1 + r_{0} )^{{t_{ms} }} }}{{(1 + r_{0} )^{{t_{ms} }} - 1}}} \right] + C_{su} (S_{i} )} \right\}}$$
(7)
$$C_{iz} = \beta \left[ {\frac{{r_{0} (1 + r_{0} )^{{t_{ml} }} }}{{(1 + r_{0} )^{{t_{ml} }} - 1}}} \right]\sum\limits_{i = 1}^{N} {\sum\limits_{j \in J} {l_{ij} } }$$
(8)
$$C_{iv} = \alpha \sum\limits_{i = 1}^{N} {\sum\limits_{j \in J} {W_{j} } } l_{ij}$$
(9)

In the formula, n is the number of substations to be developed; \(C_{sz} (S_{i} )\) is the development fund of substation i; \(C_{su} (S_{i} )\) is the maintenance fund of substation i; \(S_{i}\) is the capacity value of substation i; \(r_{i}\) is the load difference of substation i; J is the set of all difference points of the network; \(r_{0}\) is the threshold ratio; \(t_{ms}\) is the time until the substations are scrapped; N is the total number of built and proposed substations; \(\beta\) is the construction cost of sectional lines; \(l_{ij}\) is the total length of the line from i to j; \(W_{j}\) is the active power of point j; \(t_{ml}\) is the scrap time of the line connected to the substation; \(\alpha\) is the line loss coefficient.
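A minimal sketch of how the objective C = C_szu + C_iz + C_iv could be evaluated is given below. It assumes that the bracketed terms in the substation and line investment costs are the standard capital recovery factor r(1 + r)^t / ((1 + r)^t - 1), which converts a one-off investment into an equivalent annual cost; this reading, the function names, and the sample figures are assumptions, not the authors' implementation.

```python
def capital_recovery_factor(rate, years):
    """Standard annuity factor r(1+r)^t / ((1+r)^t - 1) (assumed form)."""
    growth = (1.0 + rate) ** years
    return rate * growth / (growth - 1.0)


def total_annual_cost(substations, lines, r0, t_ms, t_ml, beta, alpha):
    """Evaluate C = C_szu + C_iz + C_iv.

    substations: list of (investment, annual_maintenance) per proposed substation.
    lines: list of (length, active_power) per access line segment.
    """
    # Annualized substation investment plus maintenance.
    c_szu = sum(inv * capital_recovery_factor(r0, t_ms) + maint
                for inv, maint in substations)
    # Annualized investment in access lines, proportional to total length.
    c_iz = beta * capital_recovery_factor(r0, t_ml) * sum(length for length, _ in lines)
    # Line cost term weighted by the active power at the far end of each line.
    c_iv = alpha * sum(power * length for length, power in lines)
    return c_szu + c_iz + c_iv


# Hypothetical one-substation, one-line example.
cost = total_annual_cost(
    substations=[(1000.0, 20.0)],
    lines=[(2.0, 3.0)],
    r0=0.1, t_ms=1, t_ml=1, beta=10.0, alpha=0.5,
)
```

With a one-year lifetime the factor reduces to 1 + r, which makes the example easy to check by hand: 1000 × 1.1 + 20 + 10 × 1.1 × 2 + 0.5 × 3 × 2 = 1145.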

Standard for grid planning

According to the standard model of power grid data, and taking the standard of power grid planning and construction as an example, a large number of distribution transformers and lines are overloaded because effective planning was lacking in the initial stage of construction. For heavily overloaded distribution transformers and lines, the first options to consider are to expand the capacity of the distribution transformers, build substations, or transfer the differences of the distribution transformers and distribution lines to adjacent distribution transformers and lines. Transferring the difference involves grid planning: it is necessary to analyze the grid comprehensively, select the most suitable equipment for the transfer, and adjust the interconnection switches or build new lines to balance the differences, enhance the overall reliability of power supply, and strengthen the weak links of the grid.

The process of grid planning follows the principles of segmentation and connection. On the premise of not affecting the power supply quality of important users, lines are segmented according to the principle of equal difference, which makes transfer and maintenance more targeted.

The transfer capacity of a substation is generally considered from two aspects: one is the power supply capacity of the substation itself, and the other is the transfer difference capacity of the substation.

To calculate the power supply capacity of the substation, firstly count the number of transformers \(r_{1} ,r_{2} , \cdots ,r_{n}\) corresponding to n substations in the area. The mathematical calculation method of power supply capacity of substation i is shown in formula (9).

Experimental test

Example analysis

The proposed algorithm is used to solve the non-inferior solution set of the bi-objective function (F5, F6) and its Pareto front curve. Because of the large amount of data over the whole year, the soundness and effectiveness of the optimization algorithm are the key indicators discussed in this paper.

The Tchebycheff method and the MOEA/D algorithm are used for comparison with the proposed algorithm. The four optimization problems are as follows:

ZDT1:

$$\left\{ {\begin{array}{*{20}c} {f_{1} (x) = x_{1} } \\ {f_{2} (x) = g(x)\left[ {1 - \sqrt {\frac{{f_{1} (x)}}{g(x)}} } \right]} \\ {g(x) = 1 + 9\left( {\sum\limits_{i = 2}^{n} {x_{i} } } \right)/(n - 1)} \\ \end{array} } \right.$$
(10)

where \(x_{i} \in [0,1]\), \(n = 30\).

ZDT2:

$$\left\{ {\begin{array}{*{20}c} {f_{1} (x) = x_{1} } \\ {f_{2} (x) = g(x)\left[ {1 - \left( {\frac{{f_{1} (x)}}{g(x)}} \right)^{2} } \right]} \\ \end{array} } \right.$$
(11)

ZDT3:

$$\left\{ {\begin{array}{*{20}c} {f_{1} (x) = x_{1} } \\ {f_{2} (x) = g(x)\left[ {1 - \sqrt {\frac{{f_{1} (x)}}{g(x)}} - \frac{{f_{1} (x)}}{g(x)}\sin (10\pi x_{1} )} \right]} \\ \end{array} } \right.$$
(12)

where \(x_{i} \in \left[ {0, 1} \right]\), \(n = 30\), and \(g\left( x \right)\) is the same as in ZDT1.

ZDT4:

$$\left\{ {\begin{array}{*{20}c} {f_{1} (x) = 1 - \exp ( - 4x_{1} )\sin^{6} (6\pi x_{1} )} \\ {f_{2} (x) = g(x)\left[ {1 - \left( {f_{1} (x)/g(x)} \right)^{2} } \right]} \\ \end{array} } \right.$$
(13)
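For reference, the first three benchmark problems can be written compactly in code. The sketch below follows the standard ZDT1, ZDT2 and ZDT3 definitions from the multi-objective optimization literature with n = 30 decision variables in [0, 1]; it is meant as an illustration and may differ in detail from the variants used in the experiments.

```python
import math


def g_zdt(x):
    """Shared auxiliary function g(x) = 1 + 9 * sum(x_2..x_n) / (n - 1)."""
    return 1.0 + 9.0 * sum(x[1:]) / (len(x) - 1)


def zdt1(x):
    """Convex Pareto front: f2 = g * (1 - sqrt(f1 / g))."""
    f1, g = x[0], g_zdt(x)
    return f1, g * (1.0 - math.sqrt(f1 / g))


def zdt2(x):
    """Non-convex Pareto front: f2 = g * (1 - (f1 / g)^2)."""
    f1, g = x[0], g_zdt(x)
    return f1, g * (1.0 - (f1 / g) ** 2)


def zdt3(x):
    """Disconnected Pareto front: adds a sine term to the ZDT1 form."""
    f1, g = x[0], g_zdt(x)
    ratio = f1 / g
    return f1, g * (1.0 - math.sqrt(ratio) - ratio * math.sin(10.0 * math.pi * f1))


# A point on the true Pareto front has x_2 = ... = x_n = 0, so g(x) = 1.
x = [0.25] + [0.0] * 29
point1, point2, point3 = zdt1(x), zdt2(x), zdt3(x)
```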

The population size is set to 100, and the algorithm stops after 200 generations. Simulated binary crossover and polynomial mutation are used as the crossover and mutation operators. The crossover probability is 1, and the mutation probability is 1/n. The advantages and disadvantages of the two algorithms are analyzed by calculating the average coverage rate (C-metric), average distance metric (D-metric) and average time (T) over 5 runs, as shown in Table 1.
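The two quality indicators can be sketched as follows, assuming the definitions commonly used with MOEA/D for minimization problems: the C-metric C(A, B) is the fraction of solutions in B that are Pareto-dominated by at least one solution in A, and the D-metric is the average distance from each reference point on the true front to its nearest obtained solution. Whether the experiments use exactly these definitions is an assumption.

```python
import math


def dominates(a, b):
    """True if a Pareto-dominates b (minimization): no worse in all, better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def c_metric(set_a, set_b):
    """Fraction of set_b dominated by at least one member of set_a."""
    dominated = sum(1 for b in set_b if any(dominates(a, b) for a in set_a))
    return dominated / len(set_b)


def d_metric(solutions, reference):
    """Average distance from each reference point to its nearest solution."""
    return sum(min(math.dist(v, s) for s in solutions) for v in reference) / len(reference)


# Small hand-checkable example with two objectives.
coverage = c_metric([(1, 2), (2, 1)], [(2, 3), (0.5, 4)])
distance = d_metric([(1, 2), (2, 2)], [(1, 2), (2, 1)])
```

A higher C(A, B) together with a lower C(B, A) favors A, while a smaller D-metric indicates a solution set that is both closer to and better spread along the reference front.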

Table 1 Algorithm performance comparison.

It can be seen from Table 1 that in ZDT1-ZDT4, the algorithm of this paper with the known optimal reference point set is superior to the MOEA/D algorithm in terms of average coverage and average distance metric. In ZDT4, the Pareto front is non-convex, and the performance of the proposed algorithm is much better than that of MOEA/D. In addition, because the optimal reference point set z* is obtained in advance, the proposed algorithm runs significantly faster than the MOEA/D algorithm in all four experiments.
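The role of the reference point z* can be illustrated with the Tchebycheff scalarization commonly used in MOEA/D, which turns a multi-objective problem into a single-objective subproblem for each weight vector by minimizing the largest weighted deviation from z*. The candidate points and weight vectors below are hypothetical, and the sketch is not the proposed algorithm itself.

```python
def tchebycheff(objectives, weights, z_star):
    """Tchebycheff value: max_i w_i * |f_i - z*_i| (smaller is better)."""
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, z_star))


def best_for_weight(candidates, weights, z_star):
    """Pick the candidate objective vector that minimizes the scalarized value."""
    return min(candidates, key=lambda f: tchebycheff(f, weights, z_star))


# Hypothetical objective vectors and an ideal reference point.
candidates = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1)]
z_star = (0.0, 0.0)
balanced = best_for_weight(candidates, (0.5, 0.5), z_star)
skewed = best_for_weight(candidates, (0.9, 0.1), z_star)
```

When z* is known in advance it does not need to be re-estimated during the search, which is consistent with the shorter running time reported above.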

The number of power grid equipment is set to 100, with a data file size of 50 MB, 2 types of data (voltage and current data), a noise ratio of 10%, and a missing value ratio of 5%. Set different numbers of parallel threads to 2, 4, and 8, with a load change rate of 15% per hour. Set the short-circuit resistance to different values of 0.1 Ω, 0.5 Ω, and 1 Ω, respectively.

To further validate the effectiveness of the method proposed in this paper, the coverage rate was used as the experimental indicator to compare the three methods. The coverage of the solution set obtained by each method is calculated against the known optimal reference point set. The higher the coverage, the closer the solution set obtained by the method is to the optimal solution, and the better the model performance. The comparison results of coverage are shown in Fig. 2.

Fig. 2 Comparison of coverage results.

Fig. 2 shows that the PF method has a high coverage rate in the initial stage, but its coverage decreases over time and its performance is poor in the later stage. The coverage of the MOEA/D method is relatively stable but slightly low. The method in this paper shows the highest overall coverage, especially in the later stage, indicating that its ability to search for the optimal solution is the strongest and that its solution set is closest to the optimal solution.

Algorithm performance analysis

Comparison of optimal solutions

The Pareto optimal solution set under the lowest distance metric of the two algorithms is shown in Fig. 3.

Fig. 3 Comparison of Pareto optimal solution sets of the two algorithms.

Fig. 3 shows that the Pareto optimal solution set output by the algorithm proposed in this paper (red curve) is closer to the standard Pareto front (black dashed line) than that of MOEA/D (light blue curve), indicating that the proposed algorithm has better solution quality. Specifically, across the target space (from left to right), the proposed algorithm fits more tightly than MOEA/D in most regions, especially in the right region, where its solution set is significantly closer to the standard Pareto front. This indicates that the proposed algorithm can approximate the true Pareto front more effectively when dealing with multi-objective optimization problems, thereby providing higher-quality non-inferior solutions.

Comparison of interpolation algorithm accuracy

Because the reliability level varies only slightly throughout the year in the experiment, the accuracy requirements on the interpolation algorithm are high, so an interpolation algorithm with strong fitting ability must be selected. In areas with sparse data points, interpolation errors are usually large, and increasing the number of data points can significantly improve accuracy. Understanding data characteristics such as periodicity, symmetry, and uniform or non-uniform distribution helps in choosing a more suitable interpolation method. For non-uniformly distributed data, a weighted interpolation method is needed to improve accuracy. With an appropriate interpolation algorithm and sufficient data points, interpolation of non-linear functions can achieve high accuracy. However, because of the complexity of the function, the distribution of the data points, the capability of the algorithm, and practical limitations (such as limited data, measurement errors, and computational resource constraints), the theoretical maximum accuracy may not be reachable. Therefore, the characteristics of the function, the number and distribution of data points, and the available computational resources must all be considered when selecting an interpolation algorithm. Considering the existence of unavoidable errors, and taking the function \(f(x) = x^{3} - x^{4}\) as an example, the fitting simulation is carried out through the tracing point method (ginput function in Matlab). The interpolation results of the four methods are shown in Fig. 4.

Fig. 4 Comparison of interpolation results.

It can be seen from Fig. 4 that when the known points are the same, the linear interpolation and the interpolation of known neighboring points have larger errors for complex curves. The results of the algorithm in this paper are similar to those of the third-order spline interpolation, both of which fit the standard curve. The fitting effect of the third-order Hermite algorithm is better. In addition, the third-order spline interpolation needs more memory and computing time than the algorithm in this paper, so the algorithm in this paper fits the Pareto front curve best.
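The gap between low-order and higher-order interpolation on the example function \(f(x) = x^{3} - x^{4}\) can be reproduced with a small sketch. It compares piecewise linear interpolation with piecewise cubic Hermite interpolation on five equally spaced nodes in [0, 1]. Using the analytic derivative at the nodes is an assumption made to keep the example short; it is not the interpolation algorithm proposed in this paper.

```python
def f(x):
    return x ** 3 - x ** 4


def df(x):
    return 3 * x ** 2 - 4 * x ** 3


def linear_interp(x, xs, ys):
    """Piecewise linear interpolation on sorted nodes xs."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return (1 - t) * ys[i] + t * ys[i + 1]
    raise ValueError("x is outside the interpolation range")


def hermite_interp(x, xs, ys, ds):
    """Piecewise cubic Hermite interpolation using node values ys and slopes ds."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            h = xs[i + 1] - xs[i]
            t = (x - xs[i]) / h
            h00 = 2 * t ** 3 - 3 * t ** 2 + 1   # weight of left value
            h10 = t ** 3 - 2 * t ** 2 + t       # weight of left slope
            h01 = -2 * t ** 3 + 3 * t ** 2      # weight of right value
            h11 = t ** 3 - t ** 2               # weight of right slope
            return h00 * ys[i] + h10 * h * ds[i] + h01 * ys[i + 1] + h11 * h * ds[i + 1]
    raise ValueError("x is outside the interpolation range")


nodes = [i / 4 for i in range(5)]
values = [f(x) for x in nodes]
slopes = [df(x) for x in nodes]
samples = [i / 200 for i in range(201)]
lin_err = max(abs(linear_interp(x, nodes, values) - f(x)) for x in samples)
herm_err = max(abs(hermite_interp(x, nodes, values, slopes) - f(x)) for x in samples)
```

On this grid the maximum error of the cubic Hermite interpolant is far smaller than that of the linear interpolant, which matches the qualitative ranking discussed above.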

Reliability comparison

The value of the proportional coefficient \(\beta_{i}\) indicates the user data type. In this paper, the following two user data types are selected to indicate differences: in type A, \(\beta_{1} - \beta_{4}\) is 0.49, 0.49, − 0.01, − 0.01; in type B, \(\beta_{1} - \beta_{4}\) is 0.01, 0.01, − 0.49, − 0.49. The Pareto front interpolation curves of the bi-objective function under the two types are shown in Fig. 5.

Fig. 5 Pareto frontier interpolation curve.

Three points with EENS values of 1.7, 1.8 and 1.9 are taken as the target points to be solved. The values of the two user data types decrease with the increase of EENS, but the shape of the Pareto front curve under different user data types is quite different. The type B user data is significantly lower than the type A user data under the same EENS level.

A three-layer BP neural network is trained on the Pareto optimal solution set output by the proposed algorithm, with the number of hidden layer nodes set to 3. Of the non-inferior solutions, 35 groups are used as training samples and 7 groups as test samples. The time-sharing outputs of the test set are then compared with the ideal samples under different inputs, and the relative error of the test samples is calculated, as shown in Fig. 6.

Fig. 6 Relative errors of test sets.

It can be seen from Fig. 6 that under different inputs, the ideal output is very close to the output of the BP neural network. Under different inputs, the relative error is not more than 0.013, the relative error of each stage's data is not more than 0.006, and the relative error of the peak data is less than 0.001, which fully meets the experimental accuracy. In summary, the trained BP neural network can accurately simulate the nonlinear relationship between the inputs of the two objective functions.

Conclusion

Based on research into power grid resource data standardization, this paper establishes the data characteristics of multi-source heterogeneous data and analyzes and processes the data according to whether they are structured or unstructured. For structured data, a data integration method of extracting, cleaning, and merging data from multiple source systems is proposed. Data standardization is carried out according to the data characteristics of the business algorithms. The CIM model is established to analyze the topological structure of the distribution network. Finally, according to the data requirements of the system business and the database design principles, the spatial data association is designed. A unified data model is established to realize the data standard of the dedicated power grid planning system.

The power grid planning system covers a very large amount of content, which still needs further study. The next step is to improve the collection of the data features required by the intelligent algorithms in the power grid planning system, establish the corresponding business algorithm models, and verify and optimize their accuracy. The planning process and decision-making mechanism of the algorithms will also be studied thoroughly to provide a scientific basis for power grid planning. With continued optimization of the technology, human involvement can be reduced and planning can become automated and intelligent.