
On Centralization and Unitization of Batch Normalization for Deep ReLU Neural Networks



Abstract:

Batch normalization (BN) enhances the training of deep ReLU neural networks with a composition of mean centering (centralization) and variance scaling (unitization). Despite the success of BN, a theoretical explanation that elaborates the effects of BN on training dynamics and guides the design of normalization methods has been lacking. In this paper, we elucidate the effects of the centralization and unitization in BN on training deep ReLU neural networks. We first reveal that feature centralization in BN stabilizes the correlation coefficients of features in unnormalized ReLU neural networks, thereby achieving feature decorrelation and accelerating convergence in training. We demonstrate that weight centralization, which subtracts means from the weight parameters, is equivalent to BN in feature decorrelation and achieves the same linear convergence rate in training. Subsequently, we show that feature unitization in BN enables a dynamic learning rate that varies inversely with the norm of the features, and we propose an adaptive loss function to emulate feature unitization. Furthermore, we apply these theoretical results to develop an efficient alternative to BN using a simple combination of weight centralization and the proposed adaptive loss function. Extensive experiments show that the proposed method achieves classification accuracy comparable to BN while markedly reducing memory consumption, and that it outperforms normalization-free methods in image classification. We further extend weight centralization to enable small-batch training of object detection networks.
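
The abstract names three concrete ingredients: BN decomposed into centralization and unitization, weight centralization as a substitute for feature centralization, and an adaptive loss that emulates feature unitization. The following is a minimal illustrative sketch of these ingredients in PyTorch (the abstract gives no code; the names batch_norm_decomposed, WCLinear, and adaptive_loss, and the exact form of the loss weighting, are assumptions for illustration, not the authors' implementation):

import torch
import torch.nn as nn
import torch.nn.functional as F

def batch_norm_decomposed(x, eps=1e-5):
    # x: (batch, features) pre-activations.
    centered = x - x.mean(dim=0, keepdim=True)       # centralization
    var = centered.pow(2).mean(dim=0, keepdim=True)  # biased batch variance, as in BN
    return centered / torch.sqrt(var + eps)          # unitization

class WCLinear(nn.Linear):
    # Weight centralization: subtract each output neuron's mean weight
    # before the forward pass. Per the abstract, this is equivalent to
    # the feature-decorrelation effect of centralization in BN.
    def forward(self, x):
        w = self.weight - self.weight.mean(dim=1, keepdim=True)
        return F.linear(x, w, self.bias)

def adaptive_loss(logits, features, targets, eps=1e-5):
    # Hypothetical adaptive loss: weighting each sample's loss by the
    # inverse norm of its features gives an effective learning rate that
    # varies inversely with the feature norm, which is the behavior the
    # abstract attributes to unitization. The paper's exact form may differ.
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    return (per_sample / (features.norm(dim=1) + eps)).mean()

# Example: a weight-centralized ReLU layer trained with the adaptive loss.
x = torch.randn(32, 16)
feats = torch.relu(WCLinear(16, 8)(x))
logits = nn.Linear(8, 10)(feats)
loss = adaptive_loss(logits, feats, torch.randint(0, 10, (32,)))

Because weight centralization acts on parameters rather than on batch statistics, a sketch like this needs no running mean/variance buffers, which is consistent with the memory savings and small-batch training the abstract reports.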
Published in: IEEE Transactions on Signal Processing ( Volume: 72)
Page(s): 2827 - 2841
Date of Publication: 06 June 2024

