A multi-level classification based ensemble and feature extractor for credit risk assessment

PeerJ Computer Science


Introduction

  • 1) Address data imbalance: The SMOTE+Tomek links comprehensive sampling method is used, which combines the SMOTE over-sampling and Tomek links under-sampling methods to draw on the advantages of both (a minimal sampling sketch follows this list).

  • 2) Design the feature extractors: Three feature extractors (DNN, auto-encoder, and PCA) reduce the data dimensionality, cutting storage space and computational cost. Deep learning models have strong representation-learning capabilities, and an auto-encoder with a suitable bottleneck structure can learn more informative features (see the auto-encoder sketch after this list).

  • 3) Model hyperparameter optimization: The hyperparameters of the DNN, AE, RF, XGBoost, LightGBM, and GBDT models used in this article are tuned to ensure each model performs at its best (an illustrative search sketch follows this list).

  • 4) Combined ensemble learning model: Integrating multiple ensemble learning models through stacking not only preserves the diversity of the combined model but also improves classification accuracy (a minimal stacking sketch follows this list).
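For item 1), the sketch below illustrates the SMOTE+Tomek links comprehensive sampling step using the imbalanced-learn library; the synthetic data, class weights, and random seeds are illustrative assumptions, not the paper's dataset or settings.

    import numpy as np
    from imblearn.combine import SMOTETomek
    from sklearn.datasets import make_classification

    # Synthetic, imbalanced credit-style data stands in for the real dataset.
    X, y = make_classification(n_samples=2000, n_features=20,
                               weights=[0.9, 0.1], random_state=42)

    # SMOTE over-samples the minority class; Tomek links then removes
    # overlapping majority/minority nearest-neighbor pairs to clean the boundary.
    sampler = SMOTETomek(random_state=42)
    X_res, y_res = sampler.fit_resample(X, y)
    print(np.bincount(y), np.bincount(y_res))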
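For item 2), this is a minimal auto-encoder feature extractor sketch in Keras; the layer widths, bottleneck dimension, and training settings are assumptions for illustration and do not reproduce the paper's architecture.

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    n_features, latent_dim = 20, 8              # assumed input and bottleneck sizes

    inputs = keras.Input(shape=(n_features,))
    hidden = layers.Dense(16, activation="relu")(inputs)
    latent = layers.Dense(latent_dim, activation="relu")(hidden)    # compressed features
    decoded = layers.Dense(16, activation="relu")(latent)
    outputs = layers.Dense(n_features, activation="linear")(decoded)

    autoencoder = keras.Model(inputs, outputs)
    encoder = keras.Model(inputs, latent)       # reused as the feature extractor
    autoencoder.compile(optimizer="adam", loss="mse")

    X = np.random.rand(1000, n_features).astype("float32")          # placeholder data
    autoencoder.fit(X, X, epochs=10, batch_size=64, verbose=0)
    X_latent = encoder.predict(X)               # reduced-dimension abstract features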
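For item 3), the following sketch tunes one of the listed models (LightGBM) with scikit-learn's RandomizedSearchCV; the search space, scoring metric, and fold count are assumptions and may differ from the optimization used in the article.

    from lightgbm import LGBMClassifier
    from sklearn.datasets import make_classification
    from sklearn.model_selection import RandomizedSearchCV

    X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

    # Candidate hyperparameter values (illustrative ranges only).
    param_dist = {
        "num_leaves": [15, 31, 63],
        "learning_rate": [0.01, 0.05, 0.1],
        "n_estimators": [100, 300, 500],
    }
    search = RandomizedSearchCV(LGBMClassifier(random_state=42), param_dist,
                                n_iter=10, scoring="roc_auc", cv=5, random_state=42)
    search.fit(X, y)
    print(search.best_params_, round(search.best_score_, 4))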
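For item 4), this is a minimal stacking sketch that combines the ensemble learners named above (RF, XGBoost, LightGBM) with a logistic-regression meta-learner; the paper's exact base/meta configuration and feature inputs may differ.

    from lightgbm import LGBMClassifier
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

    # Base learners produce out-of-fold predictions that feed the meta-learner.
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("xgb", XGBClassifier(eval_metric="logloss", random_state=42)),
        ("lgbm", LGBMClassifier(random_state=42)),
    ]
    stack = StackingClassifier(estimators=base_learners,
                               final_estimator=LogisticRegression(max_iter=1000),
                               cv=5)
    print(cross_val_score(stack, X, y, scoring="roc_auc", cv=5).mean())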

Data features and principal techniques

Data features

Principal techniques

Model evaluation metrics

Multi-level classification based ensemble and feature extractor approach

SMOTE+Tomek links sampling

Features extraction

DNN feature extractor

Auto-encoder feature extractor

PCA feature extractor

Multi-level classification based ensemble learning

Multiple ensemble classifiers

Stacking

Composing abstract features and ensemble model

Experiment and analysis

Results and discussion

Conclusions and future work

Supplemental Information

Data and code.

Model training results and comparison data.

DOI: 10.7717/peerj-cs.1915/supp-1

Additional Information and Declarations

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Yuanyuan Wang conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Zhuang Wu conceived and designed the experiments, performed the experiments, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Jing Gao performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Chenjun Liu analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Fangfang Guo analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The code and raw data are available in the Supplemental Files.

Funding

This work was supported by the Innovation Fund of Industry, Education and Research of China University (2021LDA11003). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
