Introduction
Distribution systems have been experiencing fast and significant changes, making the problems related to operation and planning of such systems more challenging. The new generation of smart distribution systems is characterized by the presence of dispersed and diverse generation, two-way flow of electrical power and the presence of meshed grid configurations instead of only radial ones. Besides, a smart distribution grid must be developed having in mind principles such as [1] accommodation of all generation options, active participation by consumers in demand response, power quality, efficient operation, self-healing, and optimization of asset utilization. The efficiency and reliability of the operation of smart distribution systems require monitoring, analysis, and control of the power grid at different levels.
Advances in metering and communications infrastructures allow the collection and storage of a large volume of data representing different system variables, taken at different voltage levels. Communications networks play a crucial role in smart grid implementation, which requires a proper two-way communications infrastructure able to meet the needs of management and remote control of the grid [2]. Such infrastructure must allow real-time data communication through wide area networks (WANs) to the distribution feeder and customer level [3]. Electric utilities' WANs usually adopt different communications technologies [4], including wired ones, such as fiber optics, power line communication (PLC) systems, and copper-wire lines, and wireless ones, such as cellular networks and cognitive radio [5]. Many different developments of communications technologies and strategies are reported in the technical literature [4]. They are designed to support a wide range of applications involving distribution systems monitoring, analysis, management, control, automation, and planning. Despite all the efforts in developing adequate communications infrastructures for smart distribution systems, in some cases their capacities still represent a major bottleneck for effectively running the advanced functions and tools for distribution management and control.
Rationale
The advances in metering system infrastructure and the ongoing deployment of smart meters will generate a large volume of data every day, mostly steady state data associated with normal system operating conditions. From an electric utility point of view, it would be desirable for all the data to be transmitted and become available for processing. However, the communications network can be a bottleneck to achieving this goal. Data compression can enable the transmission of a large volume of data, which can be further used for post-operation steady state analyses that will help utilities to assess the power system performance and to enhance their operational processes. Many different techniques have been proposed for data compression in smart grid computation and control [6]–[8]. Most of them are devoted to power quality analysis and aim at the compression of electrical signal waveforms associated with a transient response that follows a given system disturbance [9]–[12]. Ringwelski et al. [6] proposed lossless compression algorithms for smart meters and compared them to off-the-shelf algorithms. The best performance, in terms of compression ratio (CR), was achieved by the Lempel–Ziv–Markov Chain Huffman Coding algorithm, resulting in average CRs between 4:1 and 20:1. A compression approach tailored for the requirements of load profile data transmission in smart metering is presented in [7]. Zhang et al. [8] proposed a real-time data compression technique that combines exception compression with swing door trending compression, which resulted in CRs that ranged from 6:1 to 11:1. In [9], a biorthogonal 5/3 spline filter is employed for the compression of power system waveforms, achieving CRs up to 8:1. A wavelet transform-based multiresolution analysis is employed in [10] to perform the compression of disturbance signals in the smart grid context, having achieved a CR of 5.4:1. Focusing on the compression of data provided by phasor measurement units (PMUs), Klump et al. [11] adapted techniques already employed to solve image compression problems and proposed a new lossless compression approach. The best CR achieved was 14.35:1. Also aiming at the compression of PMU data, the application of an embedded zerotree wavelet transform-based technique is proposed in [12]. According to the authors, the obtained performance is not as good as that observed when using the same technique for image compression. The authors also point out that each dataset has its own features and that, even if a particular data compression method is very effective for a specific signal, it may not be as effective for other signals.
Lai [13] presented a critical review on the impact of big data on the smart grid and argued that, due to smart grid deployment, there is a need to deal with a huge volume of data and different types of data sources in real time. It is stated that decision support systems need to incorporate data compression mechanisms to deal with big data situations effectively and that achieving high compression is a major concern. The state-of-the-art and future trends of methods for the compression of electrical signal waveforms can be found in [14]. The authors state that data compression for smart grids is far from being as mature as for speech, image, and video compression. Smart grids will demand data compression techniques that are suitable for many distinct applications. While high compression levels are desirable in order to cope with the burden on the communications systems and the storage utilization [10], the level of tolerance to inaccuracies in the decompressed data will depend mainly on the characteristics of the targeted application.
Most of the works found in the literature show data compression results that can be considered relatively good for the intended applications. However, envisioning the volume of steady-state data that will be available for transmission and storage in the smart grid, higher CRs may be required. Moreover, there is a lack of research works on the compression of steady-state data in comparison to the compression of waveform signals.
This paper presents a methodology for the compression of large datasets, which can be transmitted from data concentrators or a regional data and control center in a compressed form and reconstructed to serve as input data for different applications, thus reducing the burden that would be imposed on the communications system capacity if the uncompressed data were transmitted. The focus is on the compression of data from different measurements, taken at several time instants, which may cover, for example, an entire day of system operation. The measurement data at any given time instant will correspond, in general, to a steady state system operating condition. The dataset can be conveniently stored in a matrix format, which is suitable for the application of the singular value decomposition (SVD) technique. SVD has been successfully employed for image compression and other related applications [15]–[18]. However, its application for data compression in the smart grid has not been analyzed yet. The main motivation for employing SVD lies in its simplicity and potential to achieve good tradeoffs between data compression and loss of information. When applying SVD for data compression, the number of singular values (SVs) to be retained is the only parameter that needs to be set. As in many practical applications the ordered SVs decrease rapidly, only a few SVs are necessary in order to effectively compress the data. When the application of SVD is effective for a given data compression task, this will probably be revealed at high compression levels, with only a few SVs being retained. It is not the objective of this paper to select the best technique to perform a given data compression task in smart grid computation. Rather, it aims at presenting the SVD as a technique that is worth applying and exploring, as its application is straightforward and can provide results of acceptable accuracy.
If the data compression achieved using SVD is not acceptable, other techniques can be explored. In such case, the search for an acceptable performance usually involves the exploration of different models and parameters.
Tests using real data collected from metering devices at 50 different substations are performed. The results show that the proposed methodology leads to a significant reduction in the volume of data to be transmitted. It is also shown that there is a very low loss of information after the data reconstruction is performed, meaning that the reconstructed data can serve as valid inputs for many different applications. Comparative results are presented, and it is also shown that SVD can be employed along with other techniques, such as the discrete wavelet transform (DWT), in order to achieve even better results.
The remainder of this paper is organized as follows. Section III discusses the importance of the metering and communications infrastructures in smart distribution systems. Aspects of the SVD are presented in Section IV. In Section V, the proposed methodology is presented and discussed. Results for data compression using real data are presented in Section VI. Finally, conclusions are drawn in Section VII.
Data Handling in Smart Distribution Systems
Power distribution systems, traditionally designed as passive and radial networks, have been experiencing many transformations to deliver the concept of a smart grid. These changes are driven by different factors, such as the introduction of dispersed power generation directly connected to the distribution grid, the shift from conventional to renewable power generation, and the introduction of new technologies that will improve system monitoring and control capabilities, such as new power electronic devices and smart metering systems.
Some of the characteristics usually associated with a smart distribution system are [19]–[22] as follows:
presence of distributed and renewable energy resources;
capability to detect, analyse, and respond to disturbances (self-healing);
easy integration (plug and play) of different sorts of energy sources and loads;
automation and control of the distribution network;
automatic fault location and restoration;
islanding;
optimized and efficient use of assets.
The implementation of such characteristics requires the introduction of technologies and techniques that allow effective monitoring and control of the electric grid. In this context, advanced metering and communications infrastructures play an important role. Automatic meter reading (AMR) systems can automatically collect and transmit measured data using different communications technologies, wired or wireless. The advanced metering infrastructure represents an advance with respect to AMR, as it makes it possible not only to collect the measured data, but also to perform analyses and interact with consumers’ devices [4]. This requires two-way communication between the utility and consumers. At power distribution substations, intelligent electronic devices (IEDs) are capable of collecting operational and commercial data and offer multiple communication channels, applications, and protocols.
Electric utilities’ WANs usually employ hybrid communications technologies [4], such as PLC, fiber optics, WiMax, ZigBee, etc. They have to support many different applications, such as those related to monitoring, control, and automation in distribution management systems [23], as well as demand side management. It is believed that with the deployment of new smart grid components, an adequate communications infrastructure is needed to allow sustainable operations for both utilities and customers [24].
Considering the diversity of possibilities of monitoring, control, and automation envisaged for the smart grid, the communications infrastructure will have to cope with a huge flow of data among the components of the grid [25]. Even with significant investments in the communications infrastructure, it can still be a bottleneck for the implementation of some applications. Therefore, new algorithms and methodologies that reduce the volume of data through the communications network will enable a more efficient and effective utilization of monitoring, control, and automation tools in a smart distribution system.
Data Compression via SVD
Data compression can be, in general, classified as lossless [26] or lossy [27], depending on whether or not all original data can be recovered once it has gone through the compression process. In lossless data compression, all original data can be recovered when the data is decompressed. In lossy compression, on the other hand, part of the information is lost when the data is compressed. In such a compression, not only the redundant data but also information found to be less relevant is discarded. This improves data compression, but at the cost of making lossy compression a nonreversible process, as part of the information is permanently lost.
Lossy data compression has received significant attention from researchers due to its potential to achieve better CRs, generally much better than those obtained with lossless compression. This will benefit a wide range of applications if an adequate tradeoff between data compression and information loss can be accepted.
Many applications of SVD for lossy data compression can be found in [28]–[30]. The SVD technique can be used to decompose a matrix $\mathbf{X}$ as \begin{equation} {\mathbf{X}}={\mathbf{U}} \boldsymbol {\Sigma } {\mathbf{V}}^{\mathbf{T}} \end{equation}
Equation (1) can also be expressed as a sum of rank-one matrices as \begin{equation} {\mathbf{X}} = \sum _{{\mathbf{i}}=1}^{\mathbf{m}}{\mathbf{u}}_{\mathbf{i}} \boldsymbol {\sigma }_{\mathbf{i}} {\mathbf{v}}_{\mathbf{i}}^{\mathbf{T}} = {\mathbf{u}}_{1} { \boldsymbol {\sigma } }_{1} {\mathbf{v}}_{1}^{\mathbf{T}} +{\mathbf{u}}_{2} \boldsymbol {\sigma }_{2} {\mathbf{v}}_{2}^{\mathbf{T}} +\cdots +{\mathbf{u}}_{\mathbf{m}} \boldsymbol {\sigma }_{\mathbf{m}} {\mathbf{v}}_{\mathbf{m}}^{\mathbf{T}} \end{equation}
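As a brief worked note that is not part of the original text, truncating the sum in (2) after its first $r$ terms gives the rank-$r$ approximation that underlies SVD-based compression (the symbol $\mathbf{X}_{r}$ is introduced here only for illustration; Section V denotes the reconstructed matrix by $\mathbf{XR}$). By the Eckart–Young theorem, this is the best rank-$r$ approximation in the Frobenius norm, and with the SVs ordered from largest to smallest and indexed as in (2), the approximation error is determined entirely by the discarded SVs:

\begin{equation*}
\mathbf{X}_{r} = \sum_{i=1}^{r} \mathbf{u}_{i}\,\sigma_{i}\,\mathbf{v}_{i}^{\mathbf{T}},
\qquad
\left\| \mathbf{X} - \mathbf{X}_{r} \right\|_{F} = \sqrt{\sum_{i=r+1}^{m} \sigma_{i}^{2}} .
\end{equation*}

This is why a rapid decay of the ordered SVs translates into a small loss of information even when few terms are kept.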
The SVD can be used for data compression, enabling noise reduction and data dependencies, as small SVs mainly represent the noise and interdependences in
Proposed Methodology
The deployment of a large number of smart sensors and meters in electrical distribution systems poses enormous challenges to the information infrastructure, as it will have to support the exchange of a huge volume of data to be processed by many different smart grid applications. This will require the collection and transmission of enormous amounts of data, such as, for example, measurements taken during a whole day of system operation (for post-operation analysis or billing) or in short time intervals (for network operation functions).
This section presents a methodology that allows the compression of data to be exchanged in smart distribution systems.
A. Data Matrix
Consider, for example, that measurements from
This is a convenient form to represent the data, as it can be easily manipulated for data compression, to be shown in the next section.
B. Data Compression
As discussed in Section IV, data matrix $\mathbf{X}$ can be decomposed as \begin{equation} {\mathbf{X}}^{({\mathbf{m}}\times {\mathbf{t}})} = {\mathbf{U}}^{({\mathbf{m}}\times {\mathbf{m}})} \boldsymbol {\Sigma }^{({\mathbf{m}}\times {\mathbf{t}})} {\mathbf{V}}^{({\mathbf{t}}\times {\mathbf{t}})} \end{equation}
Data compression can be achieved by taking advantage of the fact that many matrices occurring in practice exhibit some kind of structure that leads to only a few SVs actually being non-negligible. In such cases, a good approximation of matrix $\mathbf{X}$ can be obtained as \begin{equation} {\mathbf{XR}} = {\mathbf{U}} \boldsymbol {\Sigma }' {\mathbf{V}} \end{equation}
\begin{align*} {\mathbf{XR}}=&\left [{\begin{array}{cc} {{\mathbf{XR}}_{11}^{^{({\mathbf{r}}\times {\mathbf{r}})} } } &\quad {{\mathbf{XR}}_{12}^{^{({\mathbf{r}}\times ({\mathbf{t}}-{\mathbf{r}}))} } } \\ {{\mathbf{XR}}_{21}^{^{(({\mathbf{m}}-{\mathbf{r}})\times {\mathbf{r}})} } } &\quad {{\mathbf{XR}}_{22}^{^{(({\mathbf{m}}-{\mathbf{r}})\times ({\mathbf{t}}-{\mathbf{r}}))} } } \end{array}}\right ] \\ {\mathbf{U}}=&\left [{\begin{array}{cc} {{\mathbf{U}}_{11}^{^{({\mathbf{r}}\times {\mathbf{r}})} } } &\quad {{\mathbf{U}}_{12}^{^{({\mathbf{r}}\times ({\mathbf{m}}-{\mathbf{r}}))} } } \\ {{\mathbf{U}}_{21}^{^{(({\mathbf{m}}-{\mathbf{r}})\times {\mathbf{r}})} } } &\quad {{\mathbf{U}}_{22}^{^{(({\mathbf{m}}-{\mathbf{r}})\times ({\mathbf{m}}-{\mathbf{r}}))} } } \end{array}}\right ] \\ {\mathbf{V}}=&\left [{\begin{array}{cc} {{\mathbf{V}}_{11}^{^{({\mathbf{r}}\times {\mathbf{r}})} } } &\quad {{\mathbf{V}}_{12}^{^{({\mathbf{r}}\times ({\mathbf{t}}-{\mathbf{r}}))} } } \\ {{\mathbf{V}}_{21}^{^{(({\mathbf{t}}-{\mathbf{r}})\times {\mathbf{r}})} } } &\quad {{\mathbf{V}}_{22}^{^{(({\mathbf{t}}-{\mathbf{r}})\times ({\mathbf{t}}-{\mathbf{r}}))} } } \end{array}}\right ] \\ \boldsymbol {\Sigma }\boldsymbol{ {'} }=&\left [{\begin{array}{cc} { \boldsymbol {\Sigma }_{11}^{^{({\mathbf{r}}\times {\mathbf{r}})} } } &\quad {0} \\ {0} &\quad {0} \end{array}}\right ]. \end{align*}
Performing the matrix multiplications on the right-hand side of (4), it is possible to express matrix $\mathbf{XR}$ as \begin{equation} \left [{\begin{array}{cc} {{\mathbf{XR}}_{11} } &\quad {{\mathbf{XR}}_{12} } \\ {{\mathbf{XR}}_{21} } &\quad {{\mathbf{XR}}_{22} } \end{array}}\right ]=\left [{\begin{array}{cc} {{\mathbf{U}}_{11} \boldsymbol {\Sigma }_{11} {\mathbf{V}}_{11} } &\quad {{\mathbf{U}}_{11} \boldsymbol {\Sigma }_{11} {\mathbf{V}}_{12} } \\ {{\mathbf{U}}_{21} \boldsymbol {\Sigma }_{11} {\mathbf{V}}_{11} } &\quad {{\mathbf{U}}_{21} \boldsymbol {\Sigma }_{11} {\mathbf{V}}_{12} } \end{array}}\right ]. \end{equation}
Then, it can be seen that the only submatrices needed to compute the approximated matrix
Noting the dimensions of each submatrix and observing that
C. Compression Ratio
The extent of compression achieved by a coding scheme can be measured by a CR. The term CR has been defined in several ways in the literature. In many contexts, the CR is computed by dividing the size of the original data by the size of the compressed data. A \begin{equation} {\rm CR} = \frac {m\times t}{(m+t+1)\times r}. \end{equation}
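As a quick check of the denominator in (6), offered here as a reader's sketch and not as the authors' own derivation, one can count the entries of the submatrices that appear on the right-hand side of (5), using the dimensions given in the partition above and assuming that only the $r$ diagonal entries of $\boldsymbol{\Sigma}_{11}$ need to be sent:

\begin{equation*}
\underbrace{r^{2}}_{\mathbf{U}_{11}}
+ \underbrace{(m-r)\,r}_{\mathbf{U}_{21}}
+ \underbrace{r}_{\boldsymbol{\Sigma}_{11}}
+ \underbrace{r^{2}}_{\mathbf{V}_{11}}
+ \underbrace{r\,(t-r)}_{\mathbf{V}_{12}}
= mr + r + tr = (m+t+1)\,r .
\end{equation*}

The original matrix holds $m\times t$ entries, so the ratio of the two counts is the CR in (6).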
It can be noted from (6) that, given a measurement set, the effectiveness of the data compression will basically depend on the number of SVs,
Communications network bottlenecks will limit the value of CR that can be considered acceptable for a given data compression task. So, alternatively, it is possible to rearrange expression (6) so that one can directly compute the number of SVs that need to be retained in order to achieve a given CR \begin{equation} r = \frac {m\times t}{(m+t+1)\times {\rm CR}}. \end{equation}
Expression (7) will be used to directly compute the value of
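Expressions (6) and (7) can be illustrated with a minimal Python sketch. The function names are hypothetical, and rounding the result of (7) down to an integer (while keeping at least one SV) is an assumption of this sketch; the dimensions are those of the test dataset described in Section VI-A.

```python
import math


def compression_ratio(m: int, t: int, r: int) -> float:
    """CR from (6) for an m x t data matrix with r retained SVs."""
    return (m * t) / ((m + t + 1) * r)


def retained_svs(m: int, t: int, target_cr: float) -> int:
    """Number of SVs from (7), rounded down to an integer.

    Rounding down keeps the achieved CR at or above the target whenever
    (7) yields at least 1; this rounding rule and the floor of one SV are
    assumptions of this sketch, since (7) itself yields a real number.
    """
    return max(1, math.floor((m * t) / ((m + t + 1) * target_cr)))


# Dimensions of the test dataset described in Section VI-A.
m, t = 900, 1440
r = retained_svs(m, t, target_cr=20.0)
print(r, compression_ratio(m, t, r))
```

Because $r$ must be an integer, the achieved CR generally differs slightly from the target, which is why a target value and an achieved value are worth reporting separately.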
D. Loss of Information
As discussed in Section IV, lossy compression methods can be very effective for data compression, but this comes with a cost, which is the loss of information that will not be retrieved when the original data is reconstructed. Then, data compression should be carried out in a way that a good tradeoff between the CR and loss of information is achieved. In other words, data compression should not result in loss of information that renders the reconstructed matrix of limited use to the applications that would employ it as input data.
In this paper, the loss of information is measured in terms of the mean absolute error (MAE) and the mean percentage error (MPE) observed when comparing the reconstructed data matrix with the original one. The expressions for computing the MAE and MPE are \begin{align} {\rm MAE}=&\frac {1}{(m\times t)} \sum _{i=1}^{m}\sum _{j=1}^{t}\left |{X(i,j)-XR(i,j)}\right | \\ {\rm MPE}=&\frac {1}{(m\times t)} \sum _{i=1}^{m}\sum _{j=1}^{t}\left |{\frac {X(i,j)-XR(i,j)}{X(i,j)} }\right | \times 100. \end{align}
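The two error measures in (8) and (9) can be sketched in a few lines of plain Python. The function names and the small matrices below are made up purely for illustration and are not taken from the test data; note that (9) is undefined for entries of the original matrix equal to zero, which this sketch does not handle.

```python
def mean_absolute_error(X, XR):
    """MAE as in (8): average of |X(i,j) - XR(i,j)| over all entries."""
    m, t = len(X), len(X[0])
    total = sum(abs(X[i][j] - XR[i][j]) for i in range(m) for j in range(t))
    return total / (m * t)


def mean_percentage_error(X, XR):
    """MPE as in (9), in percent; raises ZeroDivisionError if X has a zero."""
    m, t = len(X), len(X[0])
    total = sum(
        abs((X[i][j] - XR[i][j]) / X[i][j]) for i in range(m) for j in range(t)
    )
    return 100.0 * total / (m * t)


# Made-up 2 x 2 example: one original entry (0.5) is close to zero.
X = [[10.0, 20.0], [0.5, 40.0]]
XR = [[10.1, 19.8], [0.6, 40.0]]
print(mean_absolute_error(X, XR), mean_percentage_error(X, XR))
```

In this toy case the entry 0.5 is reconstructed with the same absolute error as the entry 10.0, yet it contributes 20% to the percentage error against 1% for the larger entry, so the two measures are best read together.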
E. Algorithm
The block diagram in Fig. 4 illustrates the flow of data when employing the proposed methodology. The data compression algorithm performed at the sending end (for example, a data concentrator or a regional data and control center) can be summarized by the following steps.
1) Form data matrix ${\mathbf{X}}$ as illustrated in Fig. 2.
2) Perform SVD to obtain matrices ${\mathbf{U}}$, $\boldsymbol {\Sigma }$, and ${\mathbf{V}}$.
3) Based on a value of $r$ chosen to achieve a given CR, form submatrices $\boldsymbol {\Sigma }_{11}^{({\mathbf{r}}\times {\mathbf{r}})}$; ${\mathbf{U}}_{11}^{({\mathbf{r}}\times {\mathbf{r}})}$; ${\mathbf{U}}_{21}^{(({\mathbf{m}}-{\mathbf{r}})\times {\mathbf{r}})}$; ${\mathbf{V}}_{11}^{({\mathbf{r}}\times {\mathbf{r}})}$; and ${\mathbf{V}}_{12}^{({\mathbf{r}}\times ({\mathbf{t}}-{\mathbf{r}}))}$.
4) Reconstruct matrix ${\mathbf{X}}$ by computing ${\mathbf{XR}}$ according to (5).
5) Evaluate the loss of information by computing MAE and MPE using (8) and (9), respectively.
Before sending the compressed data, it is possible to check whether the loss of information after data compression is acceptable for the targeted applications. If it is not acceptable, data compression can be performed again by increasing the number of SVs to be retained and repeating steps 3)–5). This will reduce the loss of information, but at the cost of making data compression less effective. Once a good tradeoff between CR and loss of information is achieved, the data is transmitted. At the receiving end, the data matrix can be reconstructed using (5), in the same way indicated by step 4) of the presented algorithm. It is important to observe that when an acceptable tradeoff is not achieved by using SVD, the application of a different data compression technique should be considered, which means that steps 2)–4) of the algorithm are changed accordingly.
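The sending-end steps and the receiving-end reconstruction can be sketched with NumPy as follows. This is an illustrative sketch and not the authors' MATLAB implementation: the function names are hypothetical, the data are synthetic, and it is assumed that the matrix called V in (3)–(5) corresponds to the array that `numpy.linalg.svd` returns as `Vh`, so that its first $r$ rows hold $\mathbf{V}_{11}$ beside $\mathbf{V}_{12}$ and the first $r$ columns of U hold $\mathbf{U}_{11}$ above $\mathbf{U}_{21}$.

```python
import numpy as np


def compression_ratio(m, t, r):
    """CR from (6): original entries over transmitted entries."""
    return (m * t) / ((m + t + 1) * r)


def mean_absolute_error(X, XR):
    """MAE from (8)."""
    return float(np.mean(np.abs(X - XR)))


def svd_compress(X, r):
    """Steps 2)-3): SVD of X, then keep only what (5) needs for r SVs."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    return U[:, :r], s[:r], Vh[:r, :]


def svd_reconstruct(U_r, s_r, Vh_r):
    """Step 4): XR = U_r diag(s_r) Vh_r, the block products of (5)."""
    return (U_r * s_r) @ Vh_r


def compress_with_tolerance(X, r_start, max_mae):
    """Repeat steps 3)-5), adding one SV at a time until MAE <= max_mae."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    r = max(1, min(r_start, len(s)))
    while r < len(s):
        XR = svd_reconstruct(U[:, :r], s[:r], Vh[:r, :])
        if mean_absolute_error(X, XR) <= max_mae:
            break
        r += 1
    return r, (U[:, :r], s[:r], Vh[:r, :])


# Synthetic stand-in for a measurement matrix (NOT the CLASS data):
# 40 rows (meters) and 96 columns (time instants) built from three
# shared patterns plus an offset and a small amount of noise.
rng = np.random.default_rng(0)
patterns = rng.normal(size=(3, 96))
X = 10.0 + rng.normal(size=(40, 3)) @ patterns + 0.01 * rng.normal(size=(40, 96))

r, (U_r, s_r, Vh_r) = compress_with_tolerance(X, r_start=1, max_mae=0.05)
XR = svd_reconstruct(U_r, s_r, Vh_r)
print(r, compression_ratio(*X.shape, r), mean_absolute_error(X, XR))
```

Only `U_r`, `s_r`, and `Vh_r` would need to be transmitted; the receiving end calls `svd_reconstruct` on them. The sketch checks only the MAE, and an MPE check or per-variable thresholds could be added in the same loop.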
It should be noted that other quantities can be calculated to measure the loss of information. As this calculation is still performed at the sending end, information about expected inaccuracies in the reconstructed data can also be sent along with the compressed data, which can be useful depending on the intended application. Among the applications that will benefit from the proposed approach are those devoted to steady state analysis, such as power flow and state estimation [31]. In those studies, forecasted quantities are often employed and those may be far less accurate than the reconstructed data. The investigation of the influence of the proposed approach on specific applications will be the subject of a future publication.
Test Results
The proposed data compression methodology has been implemented in MATLAB and tested with real measurement data from a U.K. utility. The tests have been performed on a PC with an Intel Core i7 processor, 2.20 GHz, with 8 GB of RAM. The data description and obtained results are presented next.
A. Data Description
Input data for tests have been preprocessed using raw data from the Electricity North West Company that owns and operates the electric power distribution network in the North West of England, which includes the regions of Cumbria, Lancashire, Greater Manchester and parts of North Yorkshire, Derbyshire, and Cheshire. The company has recently launched the customer load active system services (CLASS) project, which aims to increase the capacity of the electricity network by using voltage control to manage electricity consumption at peak times. As part of the CLASS project a significant amount of data has been collected in the year of 2014, by installing metering equipment in many different substations of a trial area that represents 17% of the company’s network, serving around 470 000 customers. The employed data is available in [32], where several measurements collected since April 2014 can be found. Measurements were collected at 1 min or 1 s time intervals. More details about the CLASS project are described in [32].
The next section shows results obtained with the proposed method for tests with measurement data collected on a 1-min basis, from transformers at 50 substations, during the whole day of December 10, 2014. Those include measurements of three-phase voltages and currents, active power, reactive power, and power factor, taken from a total of 900 meters at 1440 time instants (minutes). This corresponds to a total of 1 296 000 measurements and a
It should be noted that the choice to show results obtained with data collected on December 10, 2014 was arbitrary and simply because this was the most recent date with available data when the tests were run. Similar results were obtained when performing additional tests using data collected on other dates.
B. Results
The algorithm presented in Section V-E was tested to evaluate the CR and loss of information for the data matrix containing the 1 296 000 measurements collected on December 10, 2014. Different situations have been considered, in which different CRs were targeted. The obtained results are presented in Table I. The number of retained SVs was calculated using (7), where the target CRs (TCRs) were those shown in the first column of Table I. The second column of Table I shows the number of SVs
From Table I, it can be seen that very good tradeoffs between data compression and loss of information have been achieved.
Tables II and III present more detailed results and show also the mean errors (in percent and absolute) per type of variable being measured. From Table II, it is possible to observe that more significant percentage errors are associated with the reactive power, while voltage, currents, and active power measurements present very low errors when reconstructed. Despite the larger percentage errors observed in the reconstruction of the reactive power measurements, it is also possible to observe that those can be significantly reduced if lower CRs are allowed.
The results in Table III show that the reconstructed data also presents low absolute errors, considering that the measurement values found in the original data lie in the ranges shown in Table IV for each variable.
Regarding power factor and, particularly, reactive power measurements, it was also observed that data reconstructed with high percentage errors are not necessarily associated with high absolute errors. In many cases, high percentage errors are obtained when reconstructing power factor or reactive power measurements whose values are close to zero. In such cases, small absolute errors may be associated with large percentage errors.
The maximum errors for voltage, current, and active power measurements are presented in Table V. However, it is important to note that those maximum errors can be seen as outliers. As an example, consider the CR of 92.3:1, which is associated with the largest reconstruction errors. In this case, 95.38% of the reconstructed active power measurements present errors below 5%, the same happening to 94.31% and 100% of current and voltage magnitude measurements, respectively. Maximum errors associated with power factor and reactive power measurements can be very high, as explained before. So, these errors are not meaningful and are not shown in Table V. However, considering again the worst case of the CR of 92.25:1, 82.35% of the reconstructed reactive power measurements present errors below the corresponding MPE shown in Table II. The same happens to 86.02% of the power factor measurements.
C. Comparative Analysis
In order to compare the performance of the data compression obtained when employing SVD, additional tests have been carried out using the DWT for the compression of the same dataset. Different Daubechies’ wavelets [33] have been tested, as well as different thresholds and levels of decomposition. Table VI shows results obtained with SVD and with the order 2 Daubechies’ wavelet (db2) and five levels of decomposition. Similar results were obtained when testing different Daubechies’ wavelets.
It can be seen that SVD is capable of achieving better tradeoff for higher CRs. This can be associated with the fact that in SVD the first SVs carry the most relevant information about the data. Fig. 5 shows the 150 largest SVs obtained when applying the SVD technique. In order to allow a better visualization of the SVs decay, the three largest SVs have been omitted. Regarding the db2 model, different thresholds have been employed, aiming to retain the number of wavelet coefficients that would result in the same CRs shown for SVD. However, due to the discrete nature of this problem, it was not always possible to exactly match the CRs obtained with both techniques. As a result, the CRs presented in the first and in the fourth columns of Table VI are in some cases slightly different.
When using the DWT, the margin of errors associated with MPE and MAE, for a confidence interval of 95%, presented characteristics similar to those obtained when applying SVD and shown in Table I.
The maximum errors observed with DWT are presented in Table VII, along with the ones previously obtained with SVD. When comparing such results it is possible to observe that for higher CRs the maximum errors obtained with DWT are much larger than those obtained with SVD. On the other hand, maximum errors obtained with the application of DWT are smaller for lower CRs. In order to achieve higher CRs with the DWT (db2), it is necessary to discard not only detail coefficients, but also some approximation coefficients [33], resulting in larger reconstruction errors.
As it happened for the results obtained with SVD, maximum errors associated with power factor and reactive power measurements presented very large values when using DWT. As discussed in Section VI-B, this happens when attempting to reconstruct power factor or reactive power measurements whose original values are close to zero, which may result in high percentage errors.
For lower CRs the results obtained with the DWT tend to be better. However, regarding the tested data, the reconstruction errors for lower CRs are already small for both methods, with the exception of the errors associated with the reconstruction of reactive power and power factor measurements. The MPEs associated with reactive power measurements (MPE_Q) and power factor measurements (MPE_PF), computed when using SVD and DWT, are shown in Table VIII.
From the obtained results, one can consider combining SVD and DWT in order to obtain a tradeoff that suits a given data compression problem. If, for example, MPEs below 2%, per measured quantity, are acceptable for a given application, then SVD can be employed to compress only voltage, current, and active power measurements, while a DWT is employed to compress reactive power and power factor measurements. Table IX shows the CRs achieved and the associated errors when using such a strategy. Taking into account the number of measurements associated with each variable, the global CR, for the whole data set, will be 27.8:1. The resulting CR is better than the one that would have been obtained if SVD or DWT alone were employed.
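One way to see how a global CR follows from per-variable CRs is to divide the total number of original measurements by the total compressed size, which amounts to a measurement-weighted harmonic mean of the individual CRs. The sketch below assumes this is how the weighting is done; the function name and the numbers are hypothetical and are not the values of Table IX.

```python
def global_compression_ratio(groups):
    """Global CR for groups given as (number_of_measurements, cr) pairs.

    Each group contributes n / cr units of compressed data, so the global
    CR is total original size divided by total compressed size.
    """
    groups = list(groups)
    original = sum(n for n, _ in groups)
    compressed = sum(n / cr for n, cr in groups)
    return original / compressed


# Hypothetical split: 600 000 measurements compressed at 30:1 by one
# technique and 400 000 measurements compressed at 10:1 by another.
print(global_compression_ratio([(600_000, 30.0), (400_000, 10.0)]))
```

The global value is pulled toward the lower of the individual CRs, because the less compressible group dominates the transmitted volume.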
The CRs achieved with the tested data when applying lossless methods, such as Block Zip 2 and Lempel-Ziv-Markov chain-Algorithm, were approximately 6.6:1 and 7.8:1, respectively. As previously mentioned, it is not the objective of this paper to select the best technique to perform a given data compression task in the smart grid, but to present SVD as a technique that is worth applying and exploring in such a context. The application of SVD is straightforward, not requiring the exploration of different parameters or models, and can lead to competitive results. As illustrated in Table IX, it can also be combined with other data compression techniques to achieve better data compression results. This strategy can be easily accommodated by the proposed methodology.
D. Comments
According to the proposed methodology, the measurement data will be reconstructed at the receiving end by performing simple matrix multiplications using the transmitted compressed data. One important feature of the proposed approach is that the whole process, including the reconstruction phase, can be performed in advance, before sending the compressed data. This brings some advantages, which are discussed next.
1) Tradeoff Between CR and Loss of Information:
The choice of the best tradeoff between CR and loss of information will depend, mainly, on the application that will make use of the transmitted data and on limitations associated with the communications network. In other words, that choice will be driven by the tolerable loss of information and/or data compression requirements.
Compression requirements or constraints are observed by simply imposing limitations on the CR to be achieved. The loss of information, on the other hand, can be evaluated in different forms. Tables I–III show the percentage and absolute errors computed as global or per variable measures. The observation of the absolute errors per variable seems to be a good strategy when the order of magnitude of the measurements being collected is known. Consider, for example, that MAEs of 0.02 kV, 0.01 kA, 0.05 MW, and 0.05 Mvar are considered acceptable for the measurement data set. Then, a CR of 19.8:1 would be sufficient to simultaneously satisfy all the loss of information constraints, as can be verified in Table III. Depending on the intended application, different thresholds can be chosen and different loss of information measures (or even a combination of them) may be adopted.
2) Errors Associated With Metered Quantities:
Besides the errors presented in Tables I–III, it is also possible to compute the errors associated with measurements taken from any specific meter (rows of data matrix
It is important to note that the errors associated with measurements taken from specific meters can also be transmitted along with the compressed data. This information may be useful for many applications, particularly those in which it is important to model and take into account the uncertainties of the data being processed.
3) Computational Efficiency:
Computing a full SVD of a
Conclusion
This paper presented a methodology for data compression in smart distribution systems using the SVD technique. The proposed methodology can help the communications infrastructure to cope with the challenge of transmitting the large volume of data that will need to be exchanged in future smart grids. An algorithm for exploring different tradeoffs between CR and loss of information was presented. The application of SVD, due to its simplicity and effectiveness, has been proposed. Tests have been performed employing real data from different substations of a U.K. company. The obtained results show that a significant reduction in the volume of data to be transmitted can be achieved, with a very low loss of information after data reconstruction. It was also shown that SVD can be competitive with other data compression techniques and that the proposed methodology does not exclude the possibility of employing different techniques in cooperation to achieve even better results.
ACKNOWLEDGMENT
J. C. S. de Souza and T. M. L. Assis would like to thank the Fluminense Federal University, and the Federal University of Rio de Janeiro, for granting them a sabbatical leave to carry out postdoctoral research at Imperial College London, where this research work has been conducted.