AN OPTIMIZED DEEP-FOREST MODEL USING A MODIFIED DIFFERENTIAL EVOLUTION OPTIMIZATION ALGORITHM: A CASE OF HOST-PATHOGEN PROTEIN-PROTEIN INTERACTION PREDICTION
Date
2025-04
Publisher
Covenant University Ota
Abstract
Deep forest is an ensemble learning technique that arranges forest structures in a cascade framework, using a deep architecture to improve predictive performance by adaptively capturing high-level feature representations. Despite its promise, deep forest models face two critical challenges: reliance on manual hyperparameter tuning, and inefficiencies in computational time and memory usage. To address these limitations, Bayesian optimization, a prominent model-based hyperparameter optimization method, is frequently used, with Differential Evolution (DE) serving as the acquisition function in recent implementations. However, DE selects the indices used to construct donor vectors at random, which introduces inefficiencies because suboptimal or redundant indices may hinder the search for optimal solutions. This study introduces an optimized deep forest algorithm that integrates a modified DE acquisition function into Bayesian optimization to improve host-pathogen protein-protein interaction (HPPPI) prediction. The modified DE approach incorporates a weighted and adaptive donor vector selection mechanism that improves both exploration and exploitation of hyperparameter configurations. In 10-fold cross-validation on human–Plasmodium falciparum (PF) protein sequence datasets sourced from reputable databases, the model outperformed traditional Bayesian optimization, genetic algorithms, evolutionary strategies, and conventional machine learning models. The optimized framework achieved an accuracy of 89.3%, sensitivity of 85.4%, precision of 91.6%, and Area Under the Receiver Operating Characteristic Curve (AUROC) of 89.1%. The model also required less computational time and memory.
The optimized deep forest (DF) model was deployed as a web-based pipeline, DFH3PI (Deep Forest Host-Pathogen Protein-Protein Interaction Prediction), which identified three potential human–PF PPIs previously classified as non-interacting: P50250–P08319, Q8ILI6–O94813, and Q7KQL3–Q96GQ7. These findings demonstrate the potential of DFH3PI for advancing HPPPI prediction and position the optimized deep forest framework as a tool for computational biology that combines accuracy with efficiency.
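
The contrast drawn in the abstract between random and weighted donor-vector index selection can be illustrated with a minimal Python sketch. The abstract does not specify the actual weighting or adaptation rule, so the rank-based weighting below is only an assumption for illustration, and the names rank_weights, donor_rand1, and donor_weighted are hypothetical. The sketch assumes minimization (lower fitness is better) and omits the adaptive component entirely.

```python
import numpy as np

def rank_weights(fitness, exclude):
    """Return candidate indices (all except `exclude`) and selection
    probabilities that grow linearly with fitness rank (best = largest).
    This linear rank weighting is an assumed, illustrative scheme."""
    candidates = np.delete(np.arange(len(fitness)), exclude)
    ranks = np.argsort(np.argsort(fitness[candidates]))  # 0 = best (lowest)
    weights = (len(candidates) - ranks).astype(float)
    return candidates, weights / weights.sum()

def donor_rand1(pop, i, F, rng):
    """Classic DE/rand/1 donor: three distinct indices, uniform, none equal to i."""
    candidates = np.delete(np.arange(len(pop)), i)
    r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])

def donor_weighted(pop, fitness, i, F, rng):
    """Hypothetical variant: the same donor formula, but indices are drawn
    with probabilities that favour fitter population members."""
    candidates, probs = rank_weights(fitness, i)
    r1, r2, r3 = rng.choice(candidates, size=3, replace=False, p=probs)
    return pop[r1] + F * (pop[r2] - pop[r3])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pop = rng.uniform(-1.0, 1.0, size=(8, 3))  # 8 candidate configurations, 3 hyperparameters
    fitness = np.sum(pop ** 2, axis=1)         # toy objective standing in for validation loss
    print(donor_rand1(pop, 0, 0.5, rng))
    print(donor_weighted(pop, fitness, 0, 0.5, rng))
```

In this sketch the only difference between the two donors is the sampling distribution over indices; the mutation formula itself is unchanged, which is one plausible reading of "weighted donor vector selection" but not necessarily the one used in the study.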
Keywords
Deep forest, hyperparameter, optimization, protein-protein interaction, Plasmodium falciparum, malaria.