I. Introduction
The field of natural language processing (NLP) encompasses a variety of topics, which involves the computational processing and understanding of human languages. Since the 1980s, the field has increasingly relied on data-driven computation involving statistics, probability, and machine learning [1], [2]. Recent increases in computational power and parallelization, harnessed by graphical processing units (GPUs) [3], [4], now allow for “deep learning,” which utilizes artificial neural networks (ANNs), sometimes with billions of trainable parameters [5]. In addition, the contemporary availability of large data sets, facilitated by sophisticated data collection processes, enables the training of such deep architectures [6]–[8].