IdentPMP: identification of moonlighting proteins in plants using sequence-based learning models

Bioinformatics and Genomics


Introduction

Materials & Methods

Data preparation

Feature engineering

Construction and evaluation of prediction models

Results and Discussion

Selection and analysis of feature class

Comparison of different classification algorithms

IdentPMP outperforms other methods

Conclusions

Supplemental Information

The meaning of the feature classes and the number in each pretreatment step

Dimension, the original dimension of feature classes. IG (Information Gain), the feature dimension after feature selection using IG. PCA (Principal Component Analysis), the feature dimension after dimensionality reduction using the PCA method.

DOI: 10.7717/peerj.11900/supp-1

Modeling performance of each feature class (70%)

AUPRC, area under the precision–recall curve; AUC, area under the receiver operating characteristic curve. Sen, sensitivity. Spe, specificity. MCC, Matthews correlation coefficient. F1, F1-score. For those feature classes whose information entropy of all features is less than 0.05, the dimension of feature selection is set to 70%. The maximum values in each metric are marked in bold.

DOI: 10.7717/peerj.11900/supp-2

Modeling performance of each feature class (80%)

AUPRC, area under the precision–recall curve; AUC, area under the receiver operating characteristic curve. Sen, sensitivity. Spe, specificity. MCC, Matthews correlation coefficient. F1, F1-score. For those feature classes whose information entropy of all features is less than 0.05, the dimension of feature selection is set to 80%. The maximum values in each metric are marked in bold.

DOI: 10.7717/peerj.11900/supp-3

Modeling performance of each feature class (90%)

AUPRC, area under the precision–recall curve; AUC, area under the receiver operating characteristic curve. Sen, sensitivity. Spe, specificity. MCC, Matthews correlation coefficient. F1, F1-score. For those feature classes whose information entropy of all features is less than 0.05, the dimension of feature selection is set to 90%. The maximum values in each metric are marked in bold.

DOI: 10.7717/peerj.11900/supp-4
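The captions above mention a fallback for feature classes in which every feature scores below 0.05: a fixed share (70%, 80% or 90%) of the features is kept instead. The following is a minimal sketch of one way such a rule could work, not the authors' implementation. It assumes that the score is information gain computed on discretized feature values, that features scoring at least 0.05 are otherwise retained, and that the fallback keeps the top-ranked share. The function names and the `threshold` and `fallback_fraction` parameters are hypothetical, and the subsequent PCA step is omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus their entropy conditioned on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_features(columns, labels, threshold=0.05, fallback_fraction=0.7):
    """Return the indices of the kept feature columns and all gains.

    Assumed rule: keep features whose gain is at least `threshold`; if no
    feature reaches it, keep the top `fallback_fraction` of features by gain.
    """
    gains = [information_gain(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: gains[i], reverse=True)
    kept = [i for i in ranked if gains[i] >= threshold]
    if not kept:
        n_keep = max(1, math.ceil(fallback_fraction * len(columns)))
        kept = ranked[:n_keep]
    return sorted(kept), gains


# Toy data: four proteins, binary labels (1 = moonlighting, 0 = not).
labels = [1, 1, 0, 0]
informative = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 0]]
uninformative = [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 0, 0], [1, 1, 1, 1]]

selected, gains = select_features(informative, labels)
fallback_selected, _ = select_features(uninformative, labels)
```

In the first toy class only the first column carries information about the labels, so it alone passes the threshold. In the second class no column does, so the fallback keeps three of the four columns (70%, rounded up).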

The performance of five algorithms on independent test sets

AUPRC, area under the precision–recall curve; AUC, area under the receiver operating characteristic curve. Sen, sensitivity. Spe, specificity. MCC, Matthews correlation coefficient. F1, F1-score. The maximum values in each evaluation metric are marked in bold.

DOI: 10.7717/peerj.11900/supp-5
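The threshold-based metrics abbreviated in the captions above (Sen, Spe, MCC and F1) all follow from the four confusion-matrix counts. The sketch below uses their standard definitions. The function name and the example counts are illustrative only and are not results from the study. AUC and AUPRC are left out because they require ranked prediction scores rather than counts.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute Sen, Spe, MCC and F1 from confusion-matrix counts."""

    def ratio(num, den):
        # Return 0.0 when the denominator is zero, to avoid a division error.
        return num / den if den else 0.0

    sen = ratio(tp, tp + fn)            # sensitivity (recall)
    spe = ratio(tn, tn + fp)            # specificity
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sen, precision + sen)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)
    return {"Sen": sen, "Spe": spe, "MCC": mcc, "F1": f1}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

MCC uses all four counts, so it stays informative when the positive and negative classes are imbalanced, which is one reason it is commonly reported next to F1.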

Additional Information and Declarations

Competing Interests

The authors declare there are no competing interests.

Author Contributions

Xinyi Liu performed the experiments, analyzed the data, prepared figures and/or tables, designed and constructed the website, and approved the final draft.

Yueyue Shen performed the experiments, prepared figures and/or tables, and approved the final draft.

Youhua Zhang and Fei Liu analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.

Zhiyu Ma performed the experiments, prepared figures and/or tables, designed and constructed the website, and approved the final draft.

Zhenyu Yue conceived and designed the experiments, analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.

Yi Yue conceived and designed the experiments, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The code and data (UniProt IDs of the training set and the PlantMP dataset) are available at http://identpmp.aielab.net/.

The UniProt IDs can be used to query proteins in UniProt: https://www.uniprot.org/.

Funding

This work was supported by grants from the Natural Science Young Foundation of Anhui (2008085QF293), the 2020 “Three Renewal and One Creation” Innovation Platform Fund-Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information (Anhui Development and Reform Innovation [2020]555), the Natural Science Young Foundation of Anhui Agricultural University (2019zd12), the Introduction and Stabilization of Talent Project of Anhui Agricultural University (yj2019-32), and the Graduate Innovation Fund of Anhui Agricultural University (2021yjs-53). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
