Residual-Driven Enhancement of Multi-Output Prediction
| dc.contributor.author | Mohammad Hawawreh | |
| dc.date.accessioned | 2026-01-04T13:16:13Z | |
| dc.date.available | 2026-01-04T13:16:13Z | |
| dc.date.issued | 2025-11-20 | |
| dc.description.abstract | Business Intelligence teams often train a separate model for each Key Performance Indicator (KPI) to keep governance and explanations simple. The drawback is that related KPIs (e.g., revenue and margin) move together; training them in isolation can yield accurate but incoherent forecasts. This thesis proposes Residual-Polished Random Forests (RPRF), a light, two-stage upgrade that preserves per-target Random Forests (RFs) while explicitly borrowing strength across targets through the residuals. Stage A fits one RF per target and computes leakage-safe out-of-bag (OOB) residuals. Stage B constructs, for any new case, a local mean of nearby training residuals (k-nearest neighbors, k-NN) and applies a covariance-aware linear transform to align that correction with the observed cross-target error structure; the adjusted residual is then added back to the Stage-A predictions. The design keeps standard per-target interpretability, adds only one main hyperparameter (k), and avoids target leakage via OOB predictions. The method is evaluated under controlled synthetic scenarios (systematically varying sample size n, predictors p, number of targets q, and residual correlation ρ) and on two real datasets: volatile organic compounds (VOCs), with four outputs (q = 4), and ENB2012 Energy, with two outputs (q = 2). Performance is reported as per-target RMSE/R² and macro-averages, using a fixed train/validation/test protocol (k chosen on validation). Across simulations, RPRF consistently surpasses independent RFs and often exceeds XGBoost, especially when data are limited, p is large, q is moderate to high, or cross-target dependence is non-trivial: precisely the regimes where variance control and coherence matter. On real data, RPRF attains the best overall accuracy on VOCs (clear inter-target structure) and remains near-ceiling on ENB2012, where all models perform extremely well and XGBoost retains a slight edge with only two outputs.
Overall, RPRF offers predictive improvements with minimal disruption, making it a practical default when KPIs are correlated but per-target workflows must be retained. | |
| dc.identifier.uri | https://hdl.handle.net/20.500.11888/20734 | |
| dc.language.iso | en | |
| dc.publisher | An Najah National University | |
| dc.supervisor | Dr. Abdelrahman EID | |
| dc.title | Residual-Driven Enhancement of Multi-Output Prediction | |
| dc.title.alternative | تعزيز التنبؤ متعدد المخرجات بالاعتماد على البواقي |
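The two-stage procedure described in the abstract can be sketched in code. This is a minimal illustration only, not the thesis implementation: the toy data, hyperparameter values, and in particular the choice of covariance-aware transform (here a least-squares map from the k-NN mean residual to the true OOB residual, fitted on the training set) are assumptions; the thesis may define this transform differently.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Toy data with q = 2 correlated targets (illustrative only).
n, p, q, k = 300, 5, 2, 10
X = rng.normal(size=(n, p))
noise = rng.multivariate_normal([0, 0], [[1.0, 0.7], [0.7, 1.0]], size=n)
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 0] - X[:, 2]]) + noise

# Stage A: one RF per target; OOB predictions give leakage-safe residuals.
forests, oob_resid = [], np.empty((n, q))
for j in range(q):
    rf = RandomForestRegressor(n_estimators=300, oob_score=True, random_state=j)
    rf.fit(X, Y[:, j])
    oob_resid[:, j] = Y[:, j] - rf.oob_prediction_
    forests.append(rf)

# Stage B setup: k-NN index over training features, plus a linear map W
# (an assumed stand-in for the covariance-aware transform) aligning the
# local mean residual with the observed cross-target error structure.
nn = NearestNeighbors(n_neighbors=k).fit(X)
_, idx = nn.kneighbors(X)
local_mean = oob_resid[idx].mean(axis=1)              # shape (n, q)
W, *_ = np.linalg.lstsq(local_mean, oob_resid, rcond=None)

def rprf_predict(X_new):
    """Stage-A predictions plus the polished local residual correction."""
    base = np.column_stack([rf.predict(X_new) for rf in forests])
    _, nb = nn.kneighbors(X_new)
    correction = oob_resid[nb].mean(axis=1) @ W
    return base + correction

print(rprf_predict(X[:3]).shape)  # -> (3, 2)
```

In practice, per the abstract, k would be chosen on a validation split and performance reported as per-target RMSE/R² and macro-averages.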
Files
License bundle
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed to upon submission