How I Built Demand Forecasting Prototypes
Combining Prophet and XGBoost into an ensemble, building the decision interface in Power BI, and why the question matters more than the model.
Demand planning in most supply chain organisations still runs on spreadsheets. The forecast is a number in a cell, produced by some combination of historical averages, gut feel, and the most recent sales conversation. When demand shifts, as it always does, the plan is wrong, the inventory is wrong, and the scramble to respond costs money and time. The Forecasting Prototypes were built to test whether a more rigorous, model-based approach could deliver meaningfully better accuracy on the organisation's own demand patterns.
These are prototypes rather than production systems: the goal was to establish which modelling approaches performed best against historical data, build the decision interface that planners would actually use, and generate enough evidence to justify investing in a full production deployment. That framing — prototype first, production when proven — is the right sequencing for forecasting investment.
Data Pipeline and Feature Engineering
The data pipeline pulls from three sources: the ERP system for historical order data, the demand planning tool for existing forecasts (used as a baseline benchmark), and a weather API for seasonal exogenous variables relevant to the specific product categories. The pipeline cleans, aligns, and resamples the data to weekly granularity — the planning cadence the organisation operates on. Feature engineering creates lag variables, rolling averages, and calendar features (week of year, proximity to public holidays, seasonal indicators) that the models use alongside the raw demand signal.
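To make the weekly resampling and feature steps concrete, here is a minimal sketch in plain Python. It assumes daily order history arrives as (date, quantity) pairs, which the source does not specify. The holiday dates, the 1- and 2-week lags and the 4-week rolling window are hypothetical illustrations rather than the prototype's actual settings, and the weather variables are omitted.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical holiday calendar for illustration; the real pipeline's holiday source is not described.
HOLIDAYS = [date(2023, 12, 25), date(2024, 1, 1)]


def to_weekly(daily_orders):
    """Sum (date, quantity) pairs into weeks keyed by the Monday that starts each week."""
    weekly = defaultdict(float)
    for day, qty in daily_orders:
        week_start = day - timedelta(days=day.weekday())  # weekday(): Monday == 0
        weekly[week_start] += qty
    return sorted(weekly.items())


def build_features(weekly, lags=(1, 2), window=4):
    """Build one feature row per week that has enough history for every lag and the rolling window."""
    values = [qty for _, qty in weekly]
    first = max(max(lags), window)
    rows = []
    for i in range(first, len(weekly)):
        week_start, qty = weekly[i]
        row = {"week_start": week_start, "target": qty}
        for lag in lags:
            row[f"lag_{lag}"] = values[i - lag]
        # Mean of the previous `window` weeks only, so the target never leaks into its own feature.
        row[f"rolling_mean_{window}"] = sum(values[i - window:i]) / window
        row["week_of_year"] = week_start.isocalendar()[1]
        # Distance in whole weeks to the nearest holiday, before or after.
        row["weeks_from_nearest_holiday"] = min(abs((h - week_start).days) // 7 for h in HOLIDAYS)
        rows.append(row)
    return rows
```

The sketch assumes every week has at least one order. A fuller version would insert zero-demand weeks before computing lags; otherwise `lag_1` could silently point two weeks back.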
Model Architecture
Two modelling approaches were evaluated: Prophet (a decomposition-based model from Meta that handles trend, seasonality, and holiday effects explicitly) and XGBoost (a gradient boosting model that treats forecasting as a supervised learning problem over the engineered features). Both were benchmarked against the existing spreadsheet baseline using held-out validation periods.
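The source does not say how the held-out validation periods were chosen. One common scheme is rolling-origin (expanding-window) evaluation, sketched below under that assumption: each fold trains on all weeks before a cut-off and scores the next `horizon` weeks. `naive_last_value` and `drift` are hypothetical stand-ins for the spreadsheet baseline and a model. Mean absolute error is used only to keep the example short.

```python
def rolling_origin_splits(n_weeks, n_folds, horizon, min_train):
    """Return (train_end, val_end) index pairs for expanding-window validation.

    Each fold trains on weeks [0, train_end) and validates on [train_end, val_end).
    """
    first_cut = n_weeks - n_folds * horizon
    if first_cut < min_train:
        raise ValueError("not enough history for the requested folds")
    return [(first_cut + k * horizon, first_cut + (k + 1) * horizon) for k in range(n_folds)]


def mean_abs_error(series, forecaster, splits):
    """Average absolute error of `forecaster(history, horizon)` over every validation window."""
    errors = []
    for train_end, val_end in splits:
        preds = forecaster(series[:train_end], val_end - train_end)
        errors.extend(abs(p - a) for p, a in zip(preds, series[train_end:val_end]))
    return sum(errors) / len(errors)


def naive_last_value(history, horizon):
    """Hypothetical baseline stand-in: repeat the last observed week."""
    return [history[-1]] * horizon


def drift(history, horizon):
    """Hypothetical model stand-in: extend the average week-on-week change."""
    step = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + step * (h + 1) for h in range(horizon)]
```

Because every candidate is scored on the same folds, the comparison with the baseline is like for like, and no fold ever trains on weeks that come after its validation window.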
The final ensemble combines the two models with a simple weighted average, where the weights are optimised per SKU category based on validation performance. Categories with strong seasonal patterns benefit most from Prophet's explicit seasonality decomposition. Categories with complex feature interactions benefit more from XGBoost's ability to capture non-linear relationships. The ensemble consistently outperforms either model individually across the held-out validation periods.
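The per-category weighting can be sketched as a simple grid search: for each category, try Prophet weights from 0 to 1 and keep the one with the lowest validation MAPE. The function names, the 0.05 grid step and the input layout are assumptions for illustration; the source does not say which optimiser was used.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, skipping zero-demand periods where MAPE is undefined."""
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual is zero")
    return 100 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def best_weight(actuals, prophet_fc, xgb_fc, steps=20):
    """Grid-search w in [0, 1] for the blend w * Prophet + (1 - w) * XGBoost, minimising MAPE."""
    best_w, best_score = None, float("inf")
    for k in range(steps + 1):
        w = k / steps
        blend = [w * p + (1 - w) * x for p, x in zip(prophet_fc, xgb_fc)]
        score = mape(actuals, blend)
        if score < best_score:
            best_w, best_score = w, score
    return best_w, best_score


def weights_by_category(validation):
    """Map each category to its Prophet weight; `validation` maps category -> (actuals, prophet_fc, xgb_fc)."""
    return {category: best_weight(*series)[0] for category, series in validation.items()}
```

Because the weights are chosen on validation data, the fairest check of the blended forecast is a further period that played no part in picking them.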
The Decision Interface in Power BI
A forecast that lives in a Python notebook is not useful to a planner. The decision interface was built in Power BI, which is the tool the planning team already uses daily. The dashboard shows the model forecast alongside the existing plan, with confidence intervals that give planners a sense of forecast uncertainty at different horizons. An override mechanism allows planners to apply their own judgement — knowledge of a promotional event, a customer-specific signal — on top of the model forecast, with the override tracked and measured against eventual actuals.
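In the prototype the override tracking lives in Power BI; the Python below only illustrates the measurement itself, a comparison demand planners often call forecast value added. For every overridden SKU-week it compares the model's absolute error with the planner's absolute error against the eventual actual. The `OverrideRecord` structure and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OverrideRecord:
    sku: str
    model_forecast: float
    final_forecast: float  # planner's value; equals model_forecast when there was no override
    actual: float


def override_value_added(records):
    """Return (mean error reduction, number of overrides); positive means overrides beat the model.

    Returns None when no record was overridden.
    """
    overridden = [r for r in records if r.final_forecast != r.model_forecast]
    if not overridden:
        return None
    gains = [abs(r.actual - r.model_forecast) - abs(r.actual - r.final_forecast) for r in overridden]
    return sum(gains) / len(gains), len(overridden)
```

Grouping the same calculation by planner or by override reason would show where judgement adds accuracy and where it subtracts it.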
Capabilities and Outcomes
The prototype covers weekly and monthly forecast horizons across the top 200 SKUs by volume. Accuracy is measured primarily with MAPE (mean absolute percentage error), with bias tracking to identify systematic over- or under-forecasting by category. Against the held-out validation period, the ensemble model reduced MAPE by approximately 15% compared to the existing spreadsheet baseline across the full SKU set, with larger improvements in the seasonal categories where the existing approach was least reliable. The prototype results justified the investment in a production deployment, which is now in scoping.
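The two accuracy measures can be sketched as follows. MAPE averages each week's absolute error as a percentage of that week's actual demand (zero-demand weeks are skipped because the ratio is undefined), while bias is the signed gap between total forecast and total actual, so a positive value flags systematic over-forecasting. The source does not state whether the ~15% figure is a relative reduction or a drop in percentage points; `relative_mape_reduction` computes the relative version. All function names here are hypothetical.

```python
from collections import defaultdict


def mape(actuals, forecasts):
    """Mean absolute percentage error over periods with non-zero actual demand."""
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual is zero")
    return 100 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def bias_pct(actuals, forecasts):
    """Signed bias as a share of total actual demand: positive means over-forecasting."""
    total = sum(actuals)
    return 100 * (sum(forecasts) - total) / total


def bias_by_category(rows):
    """rows: iterable of (category, actual, forecast); returns category -> bias_pct."""
    grouped = defaultdict(lambda: ([], []))
    for category, actual, forecast in rows:
        grouped[category][0].append(actual)
        grouped[category][1].append(forecast)
    return {c: bias_pct(a, f) for c, (a, f) in grouped.items()}


def relative_mape_reduction(baseline_mape, model_mape):
    """Percentage by which the model's MAPE is lower than the baseline's."""
    return 100 * (baseline_mape - model_mape) / baseline_mape
```

Under this definition, a baseline MAPE of 20% falling to 17% is a 15% relative reduction, whereas a 15-point drop would be a much larger improvement.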