Residual-Driven Enhancement of Multi-Output Prediction

dc.contributor.author: Mohammad Hawawreh
dc.date.accessioned: 2026-01-04T13:16:13Z
dc.date.available: 2026-01-04T13:16:13Z
dc.date.issued: 2025-11-20
dc.description.abstract: Business Intelligence teams often train a separate model for each Key Performance Indicator (KPI) to keep governance and explanation simple. The drawback is that related KPIs (e.g., revenue and margin) move together; training them in isolation can yield accurate but incoherent forecasts. This thesis proposes Residual-Polished Random Forests (RPRF), a lightweight, two-stage upgrade that preserves per-target Random Forests (RFs) while explicitly borrowing strength across targets through the residuals. Stage A fits one RF per target and computes leakage-safe out-of-bag (OOB) residuals. Stage B constructs, for any new case, a local mean of nearby training residuals (via k-nearest neighbors, k-NN) and applies a covariance-aware linear transform to align that correction with the observed cross-target error structure; the adjusted residual is then added back to the Stage-A predictions. The design keeps standard per-target interpretability, adds only one main hyperparameter (k), and avoids target leakage via OOB predictions. The method is evaluated under controlled synthetic scenarios (systematically varying sample size n, number of predictors p, number of targets q, and residual correlation ρ) and on two real datasets: a volatile organic compounds (VOCs) dataset with four outputs (q = 4), and the ENB2012 Energy dataset with two outputs (q = 2). Performance is reported as per-target RMSE/R² and macro-averages, using a fixed train/validation/test protocol (k chosen on validation). Across simulations, RPRF consistently surpasses independent RFs and often exceeds XGBoost, especially when data are limited, p is large, q is moderate to high, or cross-target dependence is non-trivial: precisely the regimes where variance control and coherence matter. On real data, RPRF attains the best overall accuracy on VOCs (clear inter-target structure) and remains near-ceiling on ENB2012, where all models perform extremely well and XGBoost retains a slight edge with only two outputs.
Overall, RPRF offers predictive improvements with minimal disruption, making it a practical default when KPIs are correlated but per-target workflows must be retained.
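The two-stage procedure described in the abstract can be sketched as follows. This is a minimal illustration on synthetic data, not the thesis's implementation: the exact form of the "covariance-aware linear transform" is not specified in the abstract, so the transform `T` below (a ridge-style shrinkage through the OOB residual covariance) and the hyperparameters `k` and `lam` are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Toy data: p=5 predictors, q=3 correlated targets (illustrative, not the thesis data).
n, p, q = 300, 5, 3
X = rng.normal(size=(n, p))
B = rng.normal(size=(p, q))
cov = 0.5 * np.ones((q, q)) + 0.5 * np.eye(q)   # correlated noise across targets
Y = X @ B + rng.multivariate_normal(np.zeros(q), cov, size=n)
X_tr, Y_tr, X_te, Y_te = X[:200], Y[:200], X[200:], Y[200:]

# Stage A: one RF per target; leakage-safe OOB residuals on the training set.
forests, oob = [], np.empty_like(Y_tr)
for j in range(q):
    rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
    rf.fit(X_tr, Y_tr[:, j])
    forests.append(rf)
    oob[:, j] = rf.oob_prediction_
R = Y_tr - oob                                   # OOB residual matrix (n_train x q)

# Stage B: k-NN local residual mean, then a covariance-aware linear transform.
# NOTE: this transform is our guess (shrinkage through the residual covariance C);
# the thesis abstract only names the step as "covariance-aware".
k, lam = 10, 0.1
C = np.cov(R, rowvar=False)
T = C @ np.linalg.inv(C + lam * np.trace(C) / q * np.eye(q))
nn = NearestNeighbors(n_neighbors=k).fit(X_tr)
_, idx = nn.kneighbors(X_te)
local_mean = R[idx].mean(axis=1)                 # (n_test x q) local residual means
base = np.column_stack([f.predict(X_te) for f in forests])
pred = base + local_mean @ T.T                   # polished Stage-A predictions

rmse = np.sqrt(((pred - Y_te) ** 2).mean(axis=0))
print("per-target RMSE:", np.round(rmse, 3))
```

In a real pipeline, k (and here also lam) would be chosen on the validation split, as the abstract's protocol prescribes; the per-target forests remain fully inspectable, which is the interpretability property RPRF is designed to preserve.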
dc.identifier.uri: https://hdl.handle.net/20.500.11888/20734
dc.language.iso: en
dc.publisher: An Najah National University
dc.supervisor: Dr. Abdelrahman EID
dc.title: Residual-Driven Enhancement of Multi-Output Prediction
dc.title.alternative: تعزيز التنبؤ متعدد المخرجات بالاعتماد على البواقي (Arabic for "Residual-Driven Enhancement of Multi-Output Prediction")