Introduction

  • AI sales forecasting usually starts with feature-based ML (GBDT). This lesson covers when it pays to move to deep learning and how to use time-series foundation models as fast baselines.
  • TL;DR: choose models based on covariate availability, rolling backtests, calibrated uncertainty, and cost/latency.

Why it matters: Deep learning only pays off when it reduces decision risk (stockouts/overstock) at an acceptable operational cost.

1) Model landscape (train-from-scratch vs pretrained)

  • Train-from-scratch: DeepAR, TFT, N-HiTS, TiDE, PatchTST
  • Pretrained foundation models: TimesFM, Chronos, TimeGPT

Why it matters: Pretrained models accelerate baselining; train-from-scratch can fit your domain more tightly.

2) Prerequisites

  • Long format schema: unique_id, ds, y
  • Correct covariate typing (static / known future / observed-only)
  • Compute planning (GPU vs CPU, batch inference windows)
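The long-format schema and the three covariate types above can be sketched with pandas; the column names beyond unique_id/ds/y (store_format, is_promo, foot_traffic) are illustrative, not a required naming convention:

```python
import pandas as pd

# Long-format panel: one row per (series, timestamp).
df = pd.DataFrame({
    "unique_id": ["store1_skuA"] * 4 + ["store1_skuB"] * 4,  # series key
    "ds": pd.to_datetime(
        ["2024-01-01", "2024-01-08", "2024-01-15", "2024-01-22"] * 2
    ),
    "y": [120.0, 135.0, 128.0, 150.0, 40.0, 38.0, 45.0, 41.0],  # target
    # Static covariate: constant per series (e.g. store format).
    "store_format": ["urban"] * 8,
    # Known-future covariate: available at inference for future dates.
    "is_promo": [0, 1, 0, 0, 0, 0, 1, 0],
    # Observed-only covariate: known only up to the forecast origin.
    "foot_traffic": [900, 1100, 950, 1000, 900, 1100, 950, 1000],
})

assert {"unique_id", "ds", "y"}.issubset(df.columns)
print(df.groupby("unique_id")["y"].mean())
```

Getting the typing wrong is the classic failure from section 2: a model trained with foot_traffic as an input will silently degrade if that column is unavailable at inference time.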

Why it matters: Most production failures come from missing covariates at inference and uncontrolled inference cost.

3) Step-by-step production design

Step A — Lock two baselines

  • Feature-based ML (Part 4)
  • One foundation-model baseline (TimesFM, Chronos, or TimeGPT)

Step B — Choose architecture by constraints

  • DeepAR for fast probabilistic global modeling
  • TFT for multi-horizon + rich covariates + interpretability
  • TiDE / N-HiTS for long horizon without attention overhead
  • PatchTST for transformer-based long horizon with patching efficiency
  • Chronos-Bolt for faster, more efficient zero-shot inference (as reported by AWS)

Why it matters: Architecture choice fixes your data pipeline, training strategy, and serving envelope.

4) Validation and uncertainty

  • Rolling-origin backtests for honest multi-horizon evaluation
  • Metrics: WAPE + pinball loss + prediction-interval coverage
  • The TimeGPT/Nixtla documentation covers quantile forecasts and prediction intervals
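The three metrics above fit in a few lines of NumPy. These are minimal single-series sketches (no per-series weighting):

```python
import numpy as np

def wape(y_true, y_pred):
    """Weighted absolute percentage error: sum(|error|) / sum(|actual|)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.abs(y_true - y_pred).sum() / np.abs(y_true).sum()

def pinball_loss(y_true, y_q, q):
    """Pinball (quantile) loss for quantile level q in (0, 1)."""
    y_true, y_q = np.asarray(y_true, float), np.asarray(y_q, float)
    diff = y_true - y_q
    # Penalizes under-prediction by q and over-prediction by (1 - q).
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

def interval_coverage(y_true, lo, hi):
    """Fraction of actuals inside [lo, hi]; compare to the nominal level."""
    y_true = np.asarray(y_true, float)
    return np.mean((y_true >= np.asarray(lo)) & (y_true <= np.asarray(hi)))
```

For example, an 80% prediction interval whose empirical coverage comes out far from 0.8 on the backtest is the miscalibration signal the next paragraph warns about.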

Why it matters: If uncertainty is miscalibrated, inventory policies become systematically wrong.
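The rolling-origin backtest mentioned above can be sketched in pure Python: each fold trains on everything before the forecast origin and evaluates the next `horizon` steps. The step and window sizes below are illustrative:

```python
def rolling_origin_splits(n_obs, horizon, n_folds, step=None):
    """Yield (train_end, test_start, test_end) index bounds per fold.

    The forecast origin rolls forward by `step` each fold; the last
    fold's test window ends at the final observation.
    """
    step = step or horizon
    for fold in range(n_folds):
        # Walk back from the end so every fold has a full test window.
        test_end = n_obs - (n_folds - 1 - fold) * step
        test_start = test_end - horizon
        train_end = test_start  # train only on data before the origin
        if train_end <= 0:
            raise ValueError("not enough history for this many folds")
        yield train_end, test_start, test_end

# Example: 104 weekly observations, 13-week horizon, 3 folds.
for tr_end, te_start, te_end in rolling_origin_splits(104, 13, 3):
    print(f"train[:{tr_end}]  test[{te_start}:{te_end}]")
```

Because every test window lies strictly after its training data, the multi-horizon error estimates stay honest, unlike a random train/test split, which leaks future information into training.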

Conclusion

  • Use deep learning when it improves multi-horizon, covariate-heavy, probabilistic forecasting in a measurable way.
  • Always keep a foundation-model baseline to avoid over-tuning.
  • Next (Part 6): turning forecasts into replenishment decisions (service levels, safety stock, P50/P90 policies).

Summary

  • Deep learning is a tool for probabilistic, multi-horizon forecasting — not a default upgrade.
  • Pretrained TS foundation models speed up baselining and can help in low-data regimes.
  • Production success depends on covariate availability, rolling validation, calibration, and serving cost.

#ai #timeseries #forecasting #demandforecasting #deeplearning #foundationmodels #timesfm #chronos #timegpt #mlops

References

  • A decoder-only foundation model for time-series forecasting (2024-04-17): https://arxiv.org/abs/2310.10688
  • Chronos: Learning the Language of Time Series (2024-03-12): https://arxiv.org/abs/2403.07815
  • TimeGPT-1 (2023-10-05): https://arxiv.org/abs/2310.03589
  • DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks (2017-04-13): https://arxiv.org/abs/1704.04110
  • Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting (2019-12-19): https://arxiv.org/abs/1912.09363
  • PatchTST (2022-11-27): https://arxiv.org/abs/2211.14730
  • Demand forecasting with TFT (accessed 2026-02-09): https://pytorch-forecasting.readthedocs.io/en/v1.4.0/tutorials/stallion.html
  • Chronos-Bolt blog (2024-12-02): https://aws.amazon.com/blogs/machine-learning/fast-and-accurate-zero-shot-forecasting-with-chronos-bolt-and-autogluon/
  • TimeGPT probabilistic docs (accessed 2026-02-09): https://nixtla.io/docs/forecasting/probabilistic/introduction
  • NeuralForecast intro (accessed 2026-02-09): https://nixtlaverse.nixtla.io/neuralforecast/docs/getting-started/introduction.html