Multi-objective evolutionary optimization for dimensionality reduction of texts represented by synsets

PeerJ Computer Science

Introduction

Formulating dimensionality reduction problem as an optimization problem

where T(M,V) is the transformation of the dataset matrix M using the vector of changes V, and 10xval_eval(c,T(M,V)).FPr and 10xval_eval(c,T(M,V)).FNr represent the false positive ratio (FPr) and the false negative ratio (FNr), respectively. FPr and FNr are calculated using a 10-fold cross-validation scheme of the classifier c, applied to the dataset represented as a matrix (M). To improve readability, the dimension ratio (DIMr), FPr and FNr are used in the rest of the manuscript to denote the functions f1, f2 and f3, respectively.
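The evaluation of the three objectives for a single candidate V can be illustrated with a minimal sketch. It is not the implementation used in the experiments and rests on several assumptions that the text above does not fix: V is simplified to a binary keep/drop mask over the columns of M, DIMr is taken as the fraction of columns retained, FPr and FNr are computed as FP/(FP+TN) and FN/(FN+TP) over the pooled folds, and the classifier c is replaced by a toy nearest-centroid rule. All function names (transform, centroid_classifier, ten_fold_eval, objectives) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]
Classifier = Callable[[Matrix, List[int], Matrix], List[int]]


def transform(M: Matrix, V: Sequence[int]) -> Matrix:
    """Hypothetical T(M, V): keep column j only when V[j] == 1 (assumed encoding)."""
    return [[x for x, keep in zip(row, V) if keep] for row in M]


def centroid_classifier(X_train: Matrix, y_train: List[int], X_test: Matrix) -> List[int]:
    """Toy stand-in for the classifier c: predict the label of the nearest class centroid."""
    centroids = {}
    for label in set(y_train):
        rows = [r for r, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # Ties are resolved in favour of the smallest label.
    return [min(sorted(centroids), key=lambda lab: dist(row, centroids[lab]))
            for row in X_test]


def ten_fold_eval(c: Classifier, X: Matrix, y: List[int], k: int = 10) -> Tuple[float, float]:
    """k-fold cross-validation; returns (FPr, FNr) pooled over all folds.

    Assumption: label 1 is the positive (spam) class and 0 the negative class.
    """
    fp = fn = 0
    for fold in range(k):
        test_idx = [i for i in range(len(X)) if i % k == fold]
        train_idx = [i for i in range(len(X)) if i % k != fold]
        preds = c([X[i] for i in train_idx], [y[i] for i in train_idx],
                  [X[i] for i in test_idx])
        for i, p in zip(test_idx, preds):
            fp += int(p == 1 and y[i] == 0)
            fn += int(p == 0 and y[i] == 1)
    negatives, positives = y.count(0), y.count(1)
    fp_r = fp / negatives if negatives else 0.0
    fn_r = fn / positives if positives else 0.0
    return fp_r, fn_r


def objectives(c: Classifier, M: Matrix, y: List[int], V: Sequence[int]) -> Tuple[float, float, float]:
    """Return (DIMr, FPr, FNr), i.e. (f1, f2, f3), for one candidate V; all are minimized."""
    dim_r = sum(V) / len(V)  # assumed definition: fraction of columns retained
    fp_r, fn_r = ten_fold_eval(c, transform(M, V), y)
    return dim_r, fp_r, fn_r


# Synthetic data: column 0 separates the classes, column 1 is constant noise.
y = [i % 2 for i in range(20)]
M = [[5.0 if label == 1 else 0.0, 1.0] for label in y]

informative = objectives(centroid_classifier, M, y, [1, 0])
uninformative = objectives(centroid_classifier, M, y, [0, 1])
```

On this synthetic data both candidates halve the dimensionality, but only the one that retains the informative column keeps both error ratios at zero. This is the kind of trade-off a multi-objective optimizer has to resolve when comparing candidate vectors.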

Experimental design

Selecting a dataset from available corpora

Preprocessing configuration

Experimental protocol

Optimization process configuration

Results and discussion

Theoretical and practical implications

Conclusions and future work

Additional Information and Declarations

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Iñaki Vélez de Mendizabal conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Vitor Basto-Fernandes conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, coordination, and approved the final draft.

Enaitz Ezpeleta conceived and designed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, coordination, and approved the final draft.

José R. Méndez conceived and designed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Silvana Gómez-Meire analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Urko Zurutuza conceived and designed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, coordination, Project funding, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The YouTube Spam Collection Data Set is available at the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/YouTube+Spam+Collection.

The source code used for running the experiments is available at GitHub: https://github.com/sing-group/moea4sdr; Iñaki Velez de Mendizabal, Vitor Basto Fernandes, Enaitz Ezpeleta, José Ramón Méndez Reboredo, Silvana Gómez Meire, & Urko Zurutuza. (2022). Multi-Objective Evolutioary Algorithms for Synset Dimensionality Reduction [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7441851.

Funding

This work was supported by SMEIC, SRA and ERDF (TIN2017-84658-C2-1-R and TIN2017-84658-C2-2-R subprojects of Semantic Knowledge Integration for Content-Based Spam Filtering) and by the Conselleria de Cultura, Educación e Universidade of Xunta de Galicia (Competitive Reference Group, ED431C 2022/03-GRC). The Intelligent Systems for Industrial Systems research group of Mondragon Unibertsitatea (Iñaki Vélez de Mendizabal, Enaitz Ezpeleta, and Urko Zurutuza) is supported by the Department of Education, Universities and Research of the Basque Country (IT1676-22). Vitor Basto Fernandes was supported by FCT (Fundação para a Ciência e a Tecnologia) I.P. (UIDB/04466/2020 and UIDP/04466/2020). There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
