Introduction
- AI sales forecasting often starts with feature-based ML (gradient-boosted decision trees, GBDT). This lesson shows when to move to deep learning and how to use foundation models as fast baselines.
- TL;DR: pick models based on covariate availability, rolling backtests, calibrated uncertainty, and cost/latency.
Why it matters: Deep learning only pays off when it reduces decision risk (stockouts/overstock) at an acceptable operational cost.
1) Model landscape (train-from-scratch vs pretrained)
- Train-from-scratch: DeepAR, TFT, N-HiTS, TiDE, PatchTST
- Pretrained foundation models: TimesFM, Chronos, TimeGPT
Why it matters: Pretrained models accelerate baselining; train-from-scratch can fit your domain more tightly.
2) Prerequisites
- Long-format schema: unique_id, ds, y
- Correct covariate typing (static / known-future / observed-only)
- Compute planning (GPU vs CPU, batch inference windows)
Why it matters: Most production failures come from missing covariates at inference and uncontrolled inference cost.
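A minimal sketch of the expected long format, assuming pandas; the covariate columns (category, promo, competitor_price) are hypothetical names chosen to illustrate the three covariate types:

```python
import pandas as pd

# Long format: one row per (series, timestamp) pair.
# unique_id = series key, ds = timestamp, y = target.
df = pd.DataFrame({
    "unique_id": ["sku_1"] * 3 + ["sku_2"] * 3,
    "ds": pd.to_datetime(["2024-01-01", "2024-01-08", "2024-01-15"] * 2),
    "y": [120.0, 135.0, 128.0, 40.0, 38.0, 45.0],
    # Static: constant per series (e.g. product category).
    "category": ["beverages"] * 3 + ["snacks"] * 3,
    # Known-future: available at inference time (e.g. planned promotions).
    "promo": [0, 1, 0, 0, 0, 1],
    # Observed-only: NOT available for future timestamps (e.g. competitor price).
    "competitor_price": [9.9, 9.5, 9.7, 4.2, 4.3, 4.1],
})
```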
3) Step-by-step production design
Step A — Lock two baselines
- Feature-based ML (Part 4)
- One foundation-model baseline (TimesFM, Chronos, or TimeGPT); see the zero-shot sketch below
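A minimal zero-shot baseline sketch, assuming the chronos-forecasting package and the public amazon/chronos-t5-small checkpoint; the series values and horizon are illustrative:

```python
import torch
from chronos import ChronosPipeline  # pip install chronos-forecasting

# Load a pretrained checkpoint; no training on your data.
pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",
    device_map="cpu",           # "cuda" for GPU inference
    torch_dtype=torch.float32,
)

# Historical demand for one series (illustrative values).
context = torch.tensor([120.0, 135.0, 128.0, 142.0, 150.0, 138.0])

# Sample-based forecast: [num_series, num_samples, prediction_length].
samples = pipeline.predict(context, prediction_length=8)

# Collapse samples into P10/P50/P90 for downstream inventory policies.
q10, q50, q90 = torch.quantile(
    samples[0].float(), torch.tensor([0.1, 0.5, 0.9]), dim=0
)
```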
Step B — Choose architecture by constraints
- DeepAR for fast probabilistic global modeling
- TFT for multi-horizon + rich covariates + interpretability
- TiDE / N-HiTS for long-horizon forecasting without attention overhead
- PatchTST for transformer-based long-horizon forecasting with patching efficiency
- Chronos-Bolt if you need faster, more efficient zero-shot inference (speedups reported by AWS)
Why it matters: Architecture choice fixes your data pipeline, training strategy, and serving envelope.
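If a train-from-scratch model wins on these constraints, here is a minimal sketch with Nixtla's NeuralForecast using N-HiTS; hyperparameters are illustrative, and `df` is the long-format frame from section 2 (real training needs far longer histories):

```python
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS
from neuralforecast.losses.pytorch import MQLoss

# Multi-quantile loss -> probabilistic output (here an 80% interval).
model = NHITS(
    h=8,                       # forecast horizon
    input_size=48,             # lookback window
    loss=MQLoss(level=[80]),   # yields median plus lo-80/hi-80 bounds
    max_steps=500,             # illustrative; tune via backtests
)

nf = NeuralForecast(models=[model], freq="W")
# Covariates must be declared via the model's exog arguments;
# here we train on the target only.
nf.fit(df=df[["unique_id", "ds", "y"]])
forecasts = nf.predict()  # columns like NHITS-median, NHITS-lo-80, NHITS-hi-80
```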
4) Validation and uncertainty
- Rolling-origin backtests for honest multi-horizon evaluation
- Metrics: WAPE + pinball loss + prediction-interval coverage
- The TimeGPT and Nixtla docs describe quantile forecasts and prediction intervals
Why it matters: If uncertainty is miscalibrated, inventory policies become systematically wrong.
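Continuing the NeuralForecast sketch above, a rolling-origin backtest plus a hand-rolled pinball loss and interval-coverage check (the quantile column names assume the MQLoss setup from step B):

```python
import numpy as np

# Rolling-origin backtest: 4 cutoffs, re-forecasting h=8 steps each time.
cv = nf.cross_validation(df=df[["unique_id", "ds", "y"]], n_windows=4, step_size=8)

def pinball_loss(y, q_pred, q):
    """Pinball (quantile) loss for quantile level q in (0, 1)."""
    diff = y - q_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

y = cv["y"].to_numpy()
p10 = cv["NHITS-lo-80"].to_numpy()  # lo-80 bound ~ P10
p90 = cv["NHITS-hi-80"].to_numpy()  # hi-80 bound ~ P90

print("pinball@0.9:", pinball_loss(y, p90, 0.9))
# Empirical coverage of the 80% interval; should land near 0.80.
print("coverage:", np.mean((y >= p10) & (y <= p90)))
```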
Conclusion
- Use deep learning when it improves multi-horizon, covariate-heavy, probabilistic forecasting in a measurable way.
- Always keep a foundation-model baseline to avoid over-tuning.
- Next (Part 6): turning forecasts into replenishment decisions (service levels, safety stock, P50/P90 policies).
Summary
- Deep learning is a tool for probabilistic, multi-horizon forecasting, not a default upgrade.
- Pretrained TS foundation models speed up baselining and can help in low-data regimes.
- Production success depends on covariate availability, rolling validation, calibration, and serving cost.
Recommended Hashtags
#ai #timeseries #forecasting #demandforecasting #deeplearning #foundationmodels #timesfm #chronos #timegpt #mlops
References
- [A decoder-only foundation model for time-series forecasting, 2024-04-17](https://arxiv.org/abs/2310.10688)
- [Chronos: Learning the Language of Time Series, 2024-03-12](https://arxiv.org/abs/2403.07815)
- [TimeGPT-1, 2023-10-05](https://arxiv.org/abs/2310.03589)
- [DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks, 2017-04-13](https://arxiv.org/abs/1704.04110)
- [Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting, 2019-12-19](https://arxiv.org/abs/1912.09363)
- [A Time Series is Worth 64 Words (PatchTST), 2022-11-27](https://arxiv.org/abs/2211.14730)
- [Demand forecasting with TFT (PyTorch Forecasting tutorial), accessed 2026-02-09](https://pytorch-forecasting.readthedocs.io/en/v1.4.0/tutorials/stallion.html)
- [Fast and accurate zero-shot forecasting with Chronos-Bolt and AutoGluon (AWS blog), 2024-12-02](https://aws.amazon.com/blogs/machine-learning/fast-and-accurate-zero-shot-forecasting-with-chronos-bolt-and-autogluon/)
- [TimeGPT probabilistic forecasting docs, accessed 2026-02-09](https://nixtla.io/docs/forecasting/probabilistic/introduction)
- [NeuralForecast introduction, accessed 2026-02-09](https://nixtlaverse.nixtla.io/neuralforecast/docs/getting-started/introduction.html)