Introduction
- TL;DR: Kubeflow is an ecosystem for running reproducible ML workflows on Kubernetes—from notebooks and pipelines to distributed training and model serving. (Kubeflow)
- In practice, “using Kubeflow” means wiring together Profiles/Namespaces, Notebooks, Pipelines (KFP), training (Trainer), tuning (Katib), and serving (KServe) with clear operational boundaries. (Kubeflow)
1) What “Kubeflow” is in 2026: Projects vs Platform
Kubeflow can be installed as standalone projects (e.g., Pipelines-only) or as the integrated Kubeflow AI reference platform. The official “Installing Kubeflow” guide explicitly frames these as two installation methods. (Kubeflow)
Why it matters: Treating Kubeflow as an ecosystem lets you start small (one project) and expand to a full platform when your team is ready, reducing operational risk. (Kubeflow)
2) Install and Access: Manifests + Kustomize + Istio Gateway
Kubeflow 1.11 was released on 2025-12-15. The upstream manifests repository documents both “single-command” and “install individual components” approaches using Kustomize.
2-1) Fast local install (Kind example)
The manifests README provides a Kind-based flow and a retry loop for applying resources (to handle CRD/CR timing).
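The retry loop matters because custom resources can be rejected until their CRDs have been registered, so a first apply may fail even when nothing is wrong. A minimal Python sketch of the same idea follows. The `kustomize build example` piped into `kubectl apply --server-side --force-conflicts` is assumed to match the README at the time of writing; the helper names, retry count, and delay are illustrative, not taken from the README.

```python
import subprocess
import time
from typing import Callable


def retry_until_success(action: Callable[[], bool], attempts: int = 10,
                        delay_seconds: float = 20.0,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `action` until it returns True; return the number of calls made."""
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        if attempt < attempts:
            sleep(delay_seconds)  # give the API server time to register CRDs
    raise RuntimeError(f"still failing after {attempts} attempts")


def apply_manifests(kustomize_dir: str = "example") -> bool:
    """Render manifests with kustomize and pipe them into kubectl apply."""
    build = subprocess.run(["kustomize", "build", kustomize_dir],
                           capture_output=True)
    if build.returncode != 0:
        return False
    # Assumed flags; check the README of the release you are installing.
    apply = subprocess.run(
        ["kubectl", "apply", "--server-side", "--force-conflicts", "-f", "-"],
        input=build.stdout)
    return apply.returncode == 0
```

From a checkout of the manifests repository, `retry_until_success(apply_manifests)` would keep applying until every resource is accepted or the attempt budget runs out.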
2-2) Access via port-forward
The default access path is port-forwarding the Istio ingress gateway and logging in via Dex using the documented default credentials. (GitHub)
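As a rough illustration, the port-forward step can be scripted. The sketch below only builds the `kubectl port-forward` command. The service name `istio-ingressgateway`, the namespace `istio-system`, and the `8080:80` port mapping are assumptions based on the default manifests layout, so verify them against your cluster before relying on them.

```python
import shlex


def port_forward_command(local_port: int = 8080,
                         namespace: str = "istio-system",
                         service: str = "istio-ingressgateway",
                         remote_port: int = 80) -> list[str]:
    """Build the kubectl argv that forwards the Istio ingress gateway locally."""
    return ["kubectl", "port-forward", f"svc/{service}",
            "-n", namespace, f"{local_port}:{remote_port}"]


command = port_forward_command()
print(shlex.join(command))
```

Running the printed command and opening `http://localhost:8080` should lead to the Dex login page, where the credentials documented in the README apply.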
For exposing Kubeflow via Ingress/LoadBalancer, the manifests README warns that many web apps rely on Secure Cookies, so you’ll typically need HTTPS for non-localhost domains. (GitHub)
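The Secure Cookies constraint can be turned into a quick pre-flight check. The helper below is a hypothetical sketch, not part of Kubeflow: it flags URLs where a cookie marked `Secure` would likely be dropped, and its treatment of localhost as exempt is a simplification, since browsers differ on that point.

```python
from urllib.parse import urlsplit

# Hosts that are assumed to be exempt; browser behavior varies in practice.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def needs_https(url: str) -> bool:
    """Return True when `url` is plain HTTP on a non-local host."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        return False
    host = parts.hostname or ""
    return host not in LOCAL_HOSTS and not host.endswith(".localhost")


for candidate in ("http://localhost:8080", "http://kubeflow.example.com"):
    print(candidate, "needs HTTPS:", needs_https(candidate))
```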
Why it matters: Many “Kubeflow is broken” reports turn out to be ingress, authentication, or cookie problems rather than component failures. Settling access patterns early (HTTPS plus identity integration) keeps the UI stable. (GitHub)
3) Multi-tenancy Basics: Profiles and Namespaces
A Kubeflow Profile wraps a Kubernetes Namespace and supports an owner + contributors model. (Kubeflow) Kubeflow Pipelines multi-user isolation is part of the Profile/Namespace isolation strategy and is documented as supported in Kubeflow Platform deployments. (Kubeflow)
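To make the owner model concrete, here is a minimal sketch that generates a Profile manifest as JSON, which `kubectl apply -f` accepts alongside YAML. The `kubeflow.org/v1` API version and the `spec.owner` layout are assumed from the Profiles documentation. The profile name and email are placeholders.

```python
import json


def make_profile(name: str, owner_email: str) -> dict:
    """Build a Profile manifest; the Profile name becomes the namespace name."""
    return {
        "apiVersion": "kubeflow.org/v1",  # assumed API version
        "kind": "Profile",
        "metadata": {"name": name},
        "spec": {"owner": {"kind": "User", "name": owner_email}},
    }


profile = make_profile("team-a", "alice@example.com")  # placeholder values
print(json.dumps(profile, indent=2))
```

Generating Profiles from a small function like this is one way to enforce naming conventions before several teams are onboarded.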
Why it matters: Namespaces are the unit of isolation for runs, artifacts, and access control. Define Profile conventions before onboarding multiple teams. (Kubeflow)
4) Notebooks: Reproducible dev environments on Kubernetes
The Notebooks quickstart shows the standard UI flow: open Central Dashboard → select a namespace → create notebook servers. (Kubeflow)
Why it matters: Notebooks are the front door to your platform. Standardize images, PVC usage, and namespace defaults to make downstream Pipelines/Trainer workloads consistent. (Kubeflow)
5) Kubeflow Pipelines: DSL → IR YAML → Run
To submit a pipeline, you compile it to YAML using the KFP SDK compiler; the output is an IR YAML representation of the pipeline spec. (Kubeflow)
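A compile step might look like the sketch below, which assumes the KFP v2 SDK (`pip install kfp`). The component, the pipeline name, and the output path are made up for illustration. The SDK import sits inside the function only so that the module can be loaded where the SDK is absent.

```python
def compile_add_pipeline(package_path: str = "pipeline.yaml") -> str:
    """Compile a two-step toy pipeline to IR YAML and return the output path."""
    from kfp import compiler, dsl  # requires the KFP v2 SDK

    @dsl.component
    def add(a: int, b: int) -> int:
        return a + b

    @dsl.pipeline(name="add-pipeline")
    def add_pipeline(a: int = 1, b: int = 2):
        first = add(a=a, b=b)
        # The second step consumes the first step's output.
        add(a=first.output, b=b)

    compiler.Compiler().compile(pipeline_func=add_pipeline,
                                package_path=package_path)
    return package_path
```

Calling `compile_add_pipeline()` should write `pipeline.yaml` to the working directory. Committing that file, or building it in CI, gives reviewers a concrete artifact to diff.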
The official “Run a Pipeline” guide describes uploading the compiled artifact from the KFP dashboard to start runs. (Kubeflow)
The manifests repo also documents a Kubernetes-native API mode where pipeline definitions are stored as Kubernetes CRs (Pipeline, PipelineVersion). (GitHub)
Why it matters: Treat IR YAML as the contract for reproducibility and CI/CD. Namespace-aware design is essential for multi-tenant operations. (Kubeflow)
6) Training at Scale: Kubeflow Trainer v2
Kubeflow Trainer is presented as a Kubernetes-native project for scalable distributed training (including LLM fine-tuning) across frameworks. (Kubeflow)
Its installation guide lists prerequisites such as Kubernetes >= 1.31 and kubectl >= 1.31. (Kubeflow)
Migration guidance explains that Trainer v2 introduces unified APIs (e.g., TrainJob, TrainingRuntime) that replace framework-specific CRDs like PyTorchJob/TFJob/MPIJob. (Kubeflow)
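For a sense of what the unified API looks like, the sketch below builds a TrainJob manifest as a Python dict. The `trainer.kubeflow.org/v1alpha1` API version, the `torch-distributed` runtime name, and the `runtimeRef`/`trainer` field names are assumptions based on the Trainer v2 documentation and may differ in your installed version. The image and command are hypothetical.

```python
import json


def make_trainjob(name: str, namespace: str, num_nodes: int = 2,
                  runtime: str = "torch-distributed") -> dict:
    """Build a TrainJob that references a pre-installed training runtime."""
    return {
        "apiVersion": "trainer.kubeflow.org/v1alpha1",  # assumed API version
        "kind": "TrainJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The runtime supplies the framework-specific job template.
            "runtimeRef": {"name": runtime},
            "trainer": {
                "numNodes": num_nodes,
                "image": "registry.example.com/train:latest",  # hypothetical
                "command": ["python", "train.py"],  # hypothetical entrypoint
            },
        },
    }


print(json.dumps(make_trainjob("demo-train", "team-a"), indent=2))
```

The point of the design is visible in the shape of the manifest: the job names a runtime instead of embedding PyTorch- or MPI-specific fields, so switching frameworks means switching the runtime reference.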
Why it matters: Unified APIs reduce operational fragmentation as you add frameworks and hardware types (multi-node, multi-GPU) over time. (Kubeflow)
7) Tuning and Serving: Katib + KServe
Katib user guides describe configuring Trial templates for HPO experiments. (Kubeflow)
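A Trial template works by substituting `${trialParameters.<name>}` placeholders in the trial spec with values drawn from the search space. The sketch below builds a small Experiment and checks that every placeholder is declared. Field names follow the `kubeflow.org/v1beta1` Experiment API as I understand it. The image, metric name, and the `undeclared_placeholders` helper are illustrative assumptions.

```python
import json
import re

PLACEHOLDER = re.compile(r"\$\{trialParameters\.([A-Za-z0-9_]+)\}")


def make_experiment(name: str, namespace: str) -> dict:
    """Build a random-search Experiment that tunes one learning-rate parameter."""
    trial_spec = {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "spec": {"template": {"spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "training-container",
                "image": "registry.example.com/train:latest",  # hypothetical
                "command": ["python", "train.py",
                            "--lr=${trialParameters.learningRate}"],
            }],
        }}},
    }
    return {
        "apiVersion": "kubeflow.org/v1beta1",
        "kind": "Experiment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "objective": {"type": "maximize", "goal": 0.95,
                          "objectiveMetricName": "accuracy"},
            "algorithm": {"algorithmName": "random"},
            "maxTrialCount": 6,
            "parallelTrialCount": 2,
            "parameters": [{"name": "lr", "parameterType": "double",
                            "feasibleSpace": {"min": "0.001", "max": "0.1"}}],
            "trialTemplate": {
                "primaryContainerName": "training-container",
                # Maps the search parameter `lr` to the placeholder name.
                "trialParameters": [{"name": "learningRate",
                                     "description": "Learning rate",
                                     "reference": "lr"}],
                "trialSpec": trial_spec,
            },
        },
    }


def undeclared_placeholders(experiment: dict) -> set:
    """Return placeholder names used in trialSpec but missing from trialParameters."""
    template = experiment["spec"]["trialTemplate"]
    declared = {p["name"] for p in template["trialParameters"]}
    used = set(PLACEHOLDER.findall(json.dumps(template["trialSpec"])))
    return used - declared
```

Running `undeclared_placeholders` before submitting an Experiment catches one common source of failed trials: a placeholder in the command line with no matching `trialParameters` entry.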
The manifests repo documents KServe installation, noting KFServing was rebranded to KServe. (GitHub)
KServe’s website shows the InferenceService-based workflow, and Kubeflow’s Models Web App documentation states it works with v1beta1 InferenceService. (kserve.github.io)
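As an illustration of the InferenceService-based workflow, the sketch below builds a minimal `serving.kserve.io/v1beta1` manifest. The `predictor.model` layout is assumed from the KServe documentation. The service name, namespace, and storage URI are placeholders, not real resources.

```python
import json


def make_inference_service(name: str, namespace: str, storage_uri: str,
                           model_format: str = "sklearn") -> dict:
    """Build an InferenceService that serves one model from object storage."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                "model": {
                    "modelFormat": {"name": model_format},
                    "storageUri": storage_uri,
                }
            }
        },
    }


service = make_inference_service(
    "iris-demo", "team-a", "gs://example-bucket/models/iris")  # placeholders
print(json.dumps(service, indent=2))
```

Creating the service in a Profile namespace keeps serving under the same isolation boundary as notebooks and pipeline runs.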
Why it matters: Together, Katib and KServe close the loop from experimentation to production APIs, turning “trained models” into continuously deployable services. (Kubeflow)
Conclusion
- Start with the right mental model: Kubeflow Projects vs the integrated platform. (Kubeflow)
- Use manifests + Kustomize for installation and plan access (HTTPS) early. (GitHub)
- Make Profiles/Namespaces your multi-tenant foundation. (Kubeflow)
- Standardize on KFP’s IR YAML workflow for reproducibility and CI/CD. (Kubeflow)
- Scale training with Trainer v2 and serve models with KServe for end-to-end MLOps. (Kubeflow)
Summary
- Install Kubeflow via upstream manifests and access via Istio gateway.
- Use Profiles/Namespaces for isolation and governance.
- Build pipelines with KFP (DSL → IR YAML → Runs).
- Run distributed training with Kubeflow Trainer v2.
- Tune with Katib and deploy with KServe (InferenceService).
Recommended Hashtags
#kubeflow #kubernetes #mlops #kubeflowpipelines #kfp #kubeflowtrainer #katib #kserve #gitops #llm
References
- Kubeflow 1.11 Release | Kubeflow | 2025-12-15 | https://www.kubeflow.org/docs/releases/kubeflow-1.11/
- Kubeflow Deployment Manifests | GitHub | 2026-01-08 (accessed) | https://github.com/kubeflow/manifests
- Installing Kubeflow | Kubeflow | 2025-12 (published) | https://www.kubeflow.org/docs/started/installing-kubeflow/
- Profiles and Namespaces | Kubeflow | 2025-03-29 | https://www.kubeflow.org/docs/components/central-dash/profiles/
- Notebooks Quickstart Guide | Kubeflow | 2025-03-29 | https://www.kubeflow.org/docs/components/notebooks/quickstart-guide/
- Compile a Pipeline | Kubeflow | 2025-08-12 | https://www.kubeflow.org/docs/components/pipelines/user-guides/core-functions/compile-a-pipeline/
- Run a Pipeline | Kubeflow | 2025-03-29 | https://www.kubeflow.org/docs/components/pipelines/user-guides/core-functions/run-a-pipeline/
- Multi-user Isolation (KFP) | Kubeflow | 2025-12-04 | https://www.kubeflow.org/docs/components/pipelines/operator-guides/multi-user/
- Kubeflow Trainer Installation | Kubeflow | 2025-11-07 | https://www.kubeflow.org/docs/components/trainer/operator-guides/installation/
- KServe (InferenceService) | KServe | 2026-01-08 (accessed) | https://kserve.github.io/website/