Introduction

  • TL;DR: Feature-based ML for sales forecasting turns each time series into a supervised regression problem using lags/rolling stats, calendar signals, and exogenous variables.
  • The winning recipe is: feature taxonomy → point-in-time correctness → rolling-origin backtests → WAPE → quantile forecasts.

Why it matters: This approach scales across many SKUs/stores and stays maintainable when your catalog grows.

1) What “feature-based ML” means for sales forecasting

Definition, scope, common misconception

  • Definition: convert each time series into a feature table (lags, rolling statistics, calendar signals, exogenous variables) and fit a regressor, typically GBDT; see the sketch after this list.
  • Misconception: “GBDT can’t do time series.” It can, if the feature pipeline and validation are correct.
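
A minimal sketch of the transformation in plain pandas (toy data; column names are illustrative):

```python
import pandas as pd

# Toy long-format sales table: one row per (unique_id, ds).
df = pd.DataFrame({
    "unique_id": ["sku_1"] * 10,
    "ds": pd.date_range("2024-01-01", periods=10, freq="D"),
    "y": [10.0, 12.0, 9.0, 14.0, 13.0, 15.0, 11.0, 16.0, 12.0, 17.0],
})

g = df.groupby("unique_id")["y"]
df["lag_1"] = g.shift(1)                 # yesterday's sales
df["lag_7"] = g.shift(7)                 # same weekday last week
df["rollmean_7"] = g.transform(          # trailing 7-day mean,
    lambda s: s.shift(1).rolling(7, min_periods=1).mean()
)                                        # shifted by 1 so it never sees today's y
df["dayofweek"] = df["ds"].dt.dayofweek  # calendar signal

# From here, any tabular regressor (LightGBM, XGBoost, ...) fits X -> y.
train = df.dropna(subset=["lag_1"])
X, y = train[["lag_1", "lag_7", "rollmean_7", "dayofweek"]], train["y"]
```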

Why it matters: Most failures come from leakage and bad validation, not from the model class.

2) Lock the data schema first (long format)

Use a consistent schema: unique_id, ds, y. It enables global modeling and uniform backtesting.
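
For example (illustrative values):

```python
import pandas as pd

# Long format: one row per series per timestamp; every series shares the schema.
sales = pd.DataFrame({
    "unique_id": ["store1_skuA", "store1_skuA", "store2_skuB", "store2_skuB"],
    "ds": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-02"]),
    "y": [12.0, 15.0, 7.0, 9.0],
})
# Onboarding a new SKU is just appending rows with a new unique_id.
```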

Why it matters: Schema consistency makes “add a new SKU” a data task, not an engineering project.

3) Exogenous variables must be typed (Static / Dynamic / Calendar)

Nixtla's MLForecast separates exogenous signals for sales into Static (constant per series), Dynamic (known ahead of time), and Calendar (derived from the timestamp).
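
A sketch of how the three types flow through mlforecast (assuming a recent release where predict accepts X_df; data and names are illustrative):

```python
import pandas as pd
import lightgbm as lgb
from mlforecast import MLForecast

# Toy training frame with one exogenous feature of each type.
train_df = pd.DataFrame({
    "unique_id": ["skuA"] * 60,
    "ds": pd.date_range("2024-01-01", periods=60, freq="D"),
    "y": [float(i % 20) for i in range(60)],
    "store_type": pd.Categorical(["flagship"] * 60),  # Static: constant per series
    "promo_flag": [0, 1] * 30,                        # Dynamic: known ahead of time
})

fcst = MLForecast(
    models=[lgb.LGBMRegressor()],
    freq="D",
    lags=[1, 7],
    date_features=["dayofweek", "month"],  # Calendar: derived from ds
)
fcst.fit(train_df, static_features=["store_type"])  # remaining columns => dynamic

# Dynamic features must be supplied for every future timestamp being predicted.
future_x = pd.DataFrame({
    "unique_id": ["skuA"] * 14,
    "ds": pd.date_range("2024-03-01", periods=14, freq="D"),
    "promo_flag": [1] * 14,
})
preds = fcst.predict(h=14, X_df=future_x)
```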

Why it matters: If you don’t type them, you will either leak future data or lose critical signal at inference.

4) Stop leakage with point-in-time joins

Point-in-time correctness prevents training on feature values that were not available when the label was recorded.
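
In plain pandas, merge_asof gives you this guarantee; a minimal sketch with illustrative columns:

```python
import pandas as pd

# Labels, stamped at the time each label was recorded.
labels = pd.DataFrame({
    "unique_id": ["skuA", "skuA"],
    "ds": pd.to_datetime(["2024-01-10", "2024-01-20"]),
    "y": [14.0, 9.0],
})

# Feature snapshots, stamped at the time each value became available.
features = pd.DataFrame({
    "unique_id": ["skuA", "skuA", "skuA"],
    "ds": pd.to_datetime(["2024-01-05", "2024-01-15", "2024-01-25"]),
    "competitor_price": [3.99, 3.49, 2.99],
})

# direction="backward" picks, for each label, the most recent feature value
# at or before the label timestamp -- never one from the future.
train = pd.merge_asof(
    labels.sort_values("ds"),
    features.sort_values("ds"),
    on="ds",
    by="unique_id",
    direction="backward",
)
```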

Why it matters: Leakage creates “great offline scores” and “terrible production results.”

5) Multi-step strategy: recursive vs direct

  • mlforecast defaults to the recursive strategy and documents “one model per horizon” (direct).
  • sktime’s reduction approach exposes a strategy argument (recursive, direct, etc.); both strategies are sketched below.
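
A sketch of both strategies via sktime's make_reduction (toy series; window_length and horizon are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sktime.forecasting.compose import make_reduction

# Toy daily series.
y = pd.Series(
    np.random.default_rng(0).poisson(20, 200).astype(float),
    index=pd.period_range("2023-01-01", periods=200, freq="D"),
)
fh = list(range(1, 15))  # 14-step horizon

# Recursive: one single-step model; predictions are fed back as lag inputs.
recursive = make_reduction(
    GradientBoostingRegressor(), window_length=28, strategy="recursive"
)
recursive.fit(y, fh=fh)

# Direct: a separate model per horizon step (14 models here).
direct = make_reduction(
    GradientBoostingRegressor(), window_length=28, strategy="direct"
)
direct.fit(y, fh=fh)

preds_recursive = recursive.predict(fh)
preds_direct = direct.predict(fh)
```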

Why it matters: Strategy choice drives training data shape, operational complexity, and long-horizon error behavior.

6) Validation: rolling origin + TimeSeriesSplit + WAPE

  • Rolling forecasting origin (time series CV) is a standard evaluation pattern.
  • TimeSeriesSplit avoids training on the future; samples must be equally spaced for comparable folds.
  • Use WAPE as the default operations-friendly metric; a minimal backtest loop combining the two is sketched below.
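
A minimal sketch (synthetic, time-ordered data; WAPE implemented inline):

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.model_selection import TimeSeriesSplit

def wape(y_true, y_pred):
    # WAPE = sum(|y - y_hat|) / sum(|y|): errors weighted by actual volume.
    return np.abs(y_true - y_pred).sum() / np.abs(y_true).sum()

# Stand-in feature matrix and target; rows must be time-ordered and equally spaced.
rng = np.random.default_rng(0)
X = rng.normal(size=(365, 5))
y = rng.poisson(20, size=365).astype(float)

tscv = TimeSeriesSplit(n_splits=5, test_size=28)  # five 28-day test windows
scores = []
for train_idx, test_idx in tscv.split(X):
    model = LGBMRegressor().fit(X[train_idx], y[train_idx])
    scores.append(wape(y[test_idx], model.predict(X[test_idx])))
print([round(s, 3) for s in scores])
```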

Why it matters: A strict validation protocol lets you improve features safely without fooling yourself.

7) Uncertainty: quantile forecasts with GBDT

  • LightGBM supports objective=quantile with an alpha parameter (sketched after this list).
  • XGBoost documents quantile loss via reg:quantileerror.
  • CatBoost includes quantile-family regression losses.
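
A minimal sketch of P10/P50/P90 forecasts with LightGBM (synthetic data; one model per quantile):

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.poisson(20, size=1000).astype(float)

# One model per target quantile; alpha selects the quantile.
models = {
    q: lgb.LGBMRegressor(objective="quantile", alpha=q).fit(X, y)
    for q in (0.10, 0.50, 0.90)
}
p10, p50, p90 = (models[q].predict(X[:5]) for q in (0.10, 0.50, 0.90))
# XGBoost analog (2.0+): XGBRegressor(objective="reg:quantileerror", quantile_alpha=q)
```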

Why it matters: Inventory and replenishment decisions need ranges (P10/P50/P90), not only point forecasts.

Conclusion

  • Build feature-based ML forecasting as a pipeline: schema → feature taxonomy → point-in-time joins → rolling-origin backtests → WAPE → quantiles.
  • Prefer recursive for short horizons; direct for longer horizons when error accumulation hurts.
  • Next (Part 5): deep learning / foundation-model forecasting—when it’s worth the cost and when it’s not.

Summary

  • Feature-based ML is the fastest path to scalable sales forecasting in many real businesses.
  • Point-in-time correctness is non-negotiable for reliable offline-to-online performance.
  • WAPE + rolling-origin backtests keep evaluation honest; quantiles make forecasts actionable.

#ai #timeseries #forecasting #demandforecasting #salesforecasting #featureengineering #mlops #lightgbm #xgboost #wape

References

  • [Point-in-time feature joins - Databricks, 2025-06-20](https://docs.databricks.com/aws/en/machine-learning/feature-store/time-series)
  • [Feature store overview and glossary - Databricks, 2025-12-10](https://docs.databricks.com/aws/en/machine-learning/feature-store/concepts)
  • [Point-in-time feature joins - Azure Databricks, 2025-06-20](https://learn.microsoft.com/en-us/azure/databricks/machine-learning/feature-store/time-series)
  • [Lag features for time-series forecasting in AutoML - Microsoft Learn, 2025-02-26](https://learn.microsoft.com/en-us/azure/machine-learning/concept-automl-forecasting-lags?view=azureml-api-2)
  • [TimeSeriesSplit - scikit-learn, 2025](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html)
  • [Time series cross-validation - FPP3, 2021](https://otexts.com/fpp3/tscv.html)
  • [WAPE: Weighted Absolute Percentage Error - Rob J Hyndman, 2025-08-08](https://robjhyndman.com/hyndsight/wmape.html)
  • [Evaluating Predictor Accuracy - AWS Docs](https://docs.aws.amazon.com/forecast/latest/dg/metrics.html)
  • [Exogenous Variables in MLForecast - Nixtla, 2025-12-05](https://www.nixtla.io/blog/mlforecast-exogenous-variables)
  • [One model per step - Nixtla MLForecast Docs](https://nixtlaverse.nixtla.io/mlforecast/docs/how-to-guides/one_model_per_horizon.html)
  • [LightGBM Parameters](https://lightgbm.readthedocs.io/en/latest/Parameters.html)
  • [XGBoost Parameters](https://xgboost.readthedocs.io/en/stable/parameter.html)
  • [CatBoost regression loss functions](https://catboost.ai/docs/en/concepts/loss-functions-regression)
  • [make_reduction - sktime](https://www.sktime.net/en/v0.19.2/api_reference/auto_generated/sktime.forecasting.compose.make_reduction.html)
  • [Time-related feature engineering - scikit-learn](https://scikit-learn.org/stable/auto_examples/applications/plot_cyclical_feature_engineering.html)
  • [Frequently asked questions about forecasting in AutoML - Azure](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-automl-forecasting-faq?view=azureml-api-2)