Data Preprocessing and Visualization for AI: A Complete Guide (Lecture 5)
In this lecture, we’ll cover data preprocessing, the crucial step that ensures your AI models work with clean, structured, and meaningful data. We’ll also explore data visualization techniques to better understand your dataset.

## 1) Why Data Preprocessing Matters

Model performance depends heavily on data quality. Even the most advanced algorithms can fail if the input data is noisy or inconsistent. ...
Supervised Learning Practice: Classification and Regression (Lecture 6)
In this lecture, we’ll explore Supervised Learning, understand the difference between Classification and Regression, review popular algorithms, and implement both tasks using scikit-learn.

## 1) What Is Supervised Learning?

Supervised Learning uses input data (X) paired with labels (y) to train a model that can predict the correct output for unseen inputs.

### 1.1 Classification vs. Regression

| Type           | Description                 | Output Examples        | Use Cases                   |
| -------------- | --------------------------- | ---------------------- | --------------------------- |
| Classification | Predicts a category         | Spam/Not spam, species | Spam detection, diagnosis   |
| Regression     | Predicts a continuous value | Price, temperature     | House price, sales forecast |

## 2) Classification

### 2.1 Concept

Assigns each input to one of several classes. Example: “Is this email spam?”

### 2.2 Common Algorithms

* Logistic Regression
* Decision Tree
* Support Vector Machine (SVM)
* Random Forest

## 3) Regression

### 3.1 Concept

Predicts a continuous value based on input features. Example: “Predict apartment price given size, location, and year built.”

### 3.2 Common Algorithms

* Linear Regression
* Ridge Regression
* Lasso Regression
* Decision Tree Regression

## 4) General Supervised Learning Workflow

1. Prepare data: separate features (X) and labels (y)
2. Split into training and testing sets
3. Choose a model and train it
4. Predict on test data
5. Evaluate model performance
6. Improve results via hyperparameter tuning

## 5) Lab: Classification Example (Iris Species)

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
print("Accuracy:", f"{accuracy_score(y_test, y_pred)*100:.2f}%")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

## 6) Lab: Regression Example (California Housing Prices)

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

# Load data
housing = fetch_california_housing()
X, y = housing.data, housing.target

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train model
reg = LinearRegression()
reg.fit(X_train, y_train)

# Predict
y_pred = reg.predict(X_test)

# Evaluate
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
print(f"RMSE: {rmse:.3f}")
print(f"R² Score: {r2:.3f}")
```

## 7) Evaluation Metrics

Classification ...
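The hyperparameter tuning step of the workflow is not exercised by the two labs. A minimal sketch of what it could look like for the Iris classifier follows, assuming scikit-learn's `GridSearchCV` with 5-fold cross-validation. The candidate values for `C` and the higher `max_iter` are illustrative choices, not settings from the lecture.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Same data and split as the Iris lab
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

# Assumed search space: inverse regularization strength C (illustrative values)
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# Try every candidate with 5-fold cross-validation on the training set only
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# The test set is touched once, after the best setting has been chosen
test_accuracy = search.score(X_test, y_test)

print("Best params:", search.best_params_)
print(f"Cross-validated accuracy: {search.best_score_:.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")
```

Keeping the search inside the training split means the held-out test score remains an honest estimate of how the tuned model generalizes.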
Unsupervised Learning Practice: Clustering and Dimensionality Reduction (Lecture 7)
In this lecture, we’ll explore Unsupervised Learning, understand the concepts of Clustering and Dimensionality Reduction, and implement both techniques using scikit-learn.

## 1) What Is Unsupervised Learning?

Unsupervised Learning finds patterns, structures, or relationships in data without labels. Unlike supervised learning, there is no “answer key”; the model discovers hidden rules on its own.

### 1.1 Common Applications

* Customer Segmentation: Grouping customers based on purchase history
* Anomaly Detection: Fraud detection, early fault detection in systems
* Data Visualization: Reducing high-dimensional data to 2D/3D for interpretation

## 2) Clustering

Clustering groups similar data points together. Popular algorithms include: ...
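To make the idea of grouping unlabeled points concrete, here is a minimal sketch using K-Means from scikit-learn, chosen here purely for illustration. The synthetic dataset from `make_blobs` and the choice of three clusters are assumptions, not part of the lecture.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 300 two-dimensional points scattered around 3 centers.
# The true labels are discarded, so the model never sees an "answer key".
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

# Ask K-Means for 3 groups; n_init=10 reruns the algorithm from 10 random starts
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Each point now carries a cluster index, and each cluster has a centroid
cluster_sizes = [int((labels == k).sum()) for k in range(3)]
print("Cluster sizes:", cluster_sizes)
print("Centroids shape:", kmeans.cluster_centers_.shape)
```

In practice the number of clusters is rarely known in advance, so `n_clusters` is itself something to explore.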
AI Development Environment Setup: Anaconda, Jupyter, and GPU Acceleration (Lecture 4)
In this lecture, we’ll set up a stable AI development environment for Machine Learning and Deep Learning projects. You’ll learn how to install Anaconda, run Jupyter Notebook, and configure GPU acceleration with CUDA and cuDNN.

## 1) Why Environment Setup Matters

A well-configured environment prevents common issues such as:

* Library version conflicts
* Slow training due to CPU-only execution
* Non-reproducible results across team members

Goals: ...
Deep Learning Basics: CNN vs. RNN and a Hands-On MNIST Example (Lecture 3)
This is Lecture 3 of our AI 101 series. We’ll explain what Deep Learning is, compare CNNs and RNNs, and finish with a verified TensorFlow/Keras lab where you build a CNN to classify MNIST handwritten digits.

## 1) What Is Deep Learning?

Deep Learning is a subset of Machine Learning that uses multi-layer artificial neural networks to model complex patterns in data. It is especially effective for images, audio, and text. ...
Machine Learning Basics: Supervised, Unsupervised, and Reinforcement Learning (Lecture 2)
This is Lecture 2 of our AI 101 series. We’ll break down three core types of Machine Learning, explore their real-world applications, and finish with a verified scikit-learn lab that runs locally without internet access.

## 1) What Is Machine Learning?

Machine Learning (ML) is the process of teaching computers to learn patterns from data and make predictions without being explicitly programmed with rules. ...
AI 101: From Concepts to a Working Example (Lecture 1)
This is Lecture 1 of a 20-part series. We’ll cover what AI is, a short history, where it’s used, and finish with a hands-on lab you can run locally without any external downloads.

## 1) What Is AI?

Artificial Intelligence (AI) enables computers to perform tasks that typically require human intelligence, such as learning, reasoning, perception, and language understanding. ...
Kubernetes CronJob Guide: Schedule and Automate Batch Jobs
A Kubernetes CronJob lets you schedule Jobs for automated recurring tasks, similar to crontab in Linux. Whether it’s daily database backups, hourly log cleanups, or periodic alerts, CronJobs are a powerful way to automate work in a Kubernetes cluster.

## 1. What is a Kubernetes CronJob?

A CronJob in Kubernetes is a controller that runs Jobs on a time-based schedule. It uses cron format to define when each Job should be created, and the resulting Job executes just like a normal Job resource. ...
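The cron format mentioned above consists of five space-separated fields. The following Python sketch simply names them for a given schedule string. The helper `split_schedule` is hypothetical and handles only the plain five-field form, not macros such as `@daily`.

```python
# Field order of a standard five-field cron expression
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_schedule(schedule: str) -> dict[str, str]:
    """Map each part of a five-field cron expression to its field name."""
    parts = schedule.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {schedule!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


# "0 2 * * *" means minute 0 of hour 2, every day: a daily run at 02:00
daily_backup = split_schedule("0 2 * * *")
print(daily_backup)
```

Reading a schedule field by field like this is a quick sanity check before putting the string into a CronJob's `spec.schedule`.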
Kubernetes Secret Guide: Secure Storage and Management of Sensitive Data
Kubernetes **Secrets** allow you to store and manage sensitive information like passwords, API keys, and TLS certificates securely inside your cluster. Rather than embedding these values directly into your application code or Pod specs, you can manage them as separate resources and control their access with RBAC.

---

## 1. What Is a Secret?

A **Secret** in Kubernetes stores **confidential key-value pairs**. These values are **Base64-encoded** and stored in `etcd`. You can inject Secrets into Pods as **environment variables** or **mounted files**.

> Important: Base64 is not encryption; it’s only encoding. To protect against data exposure, you should enable **encryption at rest** in Kubernetes.

---

## 2. Why Use Secrets Instead of Hardcoding?

Hardcoding sensitive data in Deployment YAML or application code has serious drawbacks:

* **Security risk**: Anyone with access to your code or manifests can see credentials.
* **Difficult rotation**: Changing credentials requires editing multiple files.
* **Audit challenges**: No clear record of who accessed or changed secrets.

Secrets address these issues by providing:

* Centralized sensitive data management.
* Controlled access with **RBAC**.
* Separation of configuration from code.

---

## 3. Secret vs ConfigMap

| Feature     | ConfigMap                   | Secret                                  |
| ----------- | --------------------------- | --------------------------------------- |
| Purpose     | Non-sensitive configuration | Sensitive data                          |
| Storage     | Plain text                  | Base64 encoded                          |
| Security    | Low                         | Higher (can enable encryption)          |
| Example Use | App settings, URLs          | Passwords, API tokens, TLS certificates |

---

## 4. Key Features of Secrets

* **Centralized secure storage** for sensitive values.
* **Multiple injection options**: Environment variables or mounted volumes.
* **RBAC integration** to control who can access secrets.
* **Support for custom types** for specialized use cases.
* **Extensibility**: Works with tools like HashiCorp Vault, Sealed Secrets, or External Secrets Operator.

---

## 5. Secret Types

| Type                             | Description                       | Use Case Example                 |
| -------------------------------- | --------------------------------- | -------------------------------- |
| `Opaque`                         | Default generic key-value pairs   | DB credentials, API keys         |
| `kubernetes.io/dockerconfigjson` | Docker registry authentication    | Private image pulls              |
| `kubernetes.io/tls`              | TLS certificates and private keys | HTTPS services                   |
| Custom                           | For plugins/controllers           | Sealed Secrets, External Secrets |

---

## 6. Creating Secrets

### 1) Using YAML

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-secret
type: Opaque
data:
  username: YWRtaW4=          # "admin" Base64 encoded
  password: c2VjdXJlcGFzcw==  # "securepass" Base64 encoded
```

Base64 encoding: ...
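As a quick way to check the encoded values in the manifest above, here is a minimal sketch using Python's standard `base64` module. The helper names `encode_secret_value` and `decode_secret_value` are hypothetical. Any tool that produces standard Base64 works equally well.

```python
import base64


def encode_secret_value(plain: str) -> str:
    """Encode a plain-text value the way the Secret `data` field expects."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding; no key is needed, which is why this is not encryption."""
    return base64.b64decode(encoded).decode("utf-8")


print(encode_secret_value("admin"))             # YWRtaW4=
print(encode_secret_value("securepass"))        # c2VjdXJlcGFzcw==
print(decode_secret_value("c2VjdXJlcGFzcw=="))  # securepass
```

Because decoding requires no secret key, anyone who can read the manifest can recover the original values, which is the reason the guide recommends encryption at rest and RBAC.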
Kubernetes Job Guide: One-Time and Batch Workloads Made Easy
A Kubernetes **Job** is a workload resource designed for **one-time or limited-run tasks**. Unlike Deployments, Jobs are not meant for always-on services; they **run until completion** and then exit.

In this guide, we’ll cover:

* What a Job is and why it exists
* Key features and real-world use cases
* How to configure retry policies and parallelism
* YAML examples for different scenarios
* Operational tips and best practices

---

## 1. What Is a Kubernetes Job?

A Kubernetes Job manages **finite-duration tasks** by creating Pods that run to completion. Once the task finishes successfully, the Job marks it as complete and stops creating new Pods.

**Examples:**

* Data migration scripts
* Database backups
* Generating reports
* Archiving logs
* Machine learning model training (single run)

---

## 2. Why Use a Job Instead of a Pod?

Running a task in a Pod alone is possible but comes with limitations:

| Problem with Standalone Pod  | How Job Solves It             |
| ---------------------------- | ----------------------------- |
| Pod fails → manual restart   | Automatic retries             |
| No completion tracking       | Maintains success/fail status |
| Hard to manage multiple Pods | Supports parallel execution   |

Jobs provide **fault tolerance**, **state tracking**, and **parallel processing control** out-of-the-box.

---

## 3. Key Features

| Feature          | Description                               |
| ---------------- | ----------------------------------------- |
| Completion-based | Stops once the task finishes successfully |
| Retry support    | Automatically restarts failed Pods        |
| State tracking   | Keeps history of successful/failed Pods   |
| Parallelism      | Run multiple Pods at the same time        |
| Resource control | Allocate CPU/memory per Job run           |

---

## 4. Job vs CronJob vs Deployment

| Resource   | Purpose                  | Execution Mode | End Condition |
| ---------- | ------------------------ | -------------- | ------------- |
| Job        | One-time/limited tasks   | Immediate run  | On completion |
| CronJob    | Scheduled recurring jobs | Based on cron  | On completion |
| Deployment | Always-on applications   | Continuous run | Manual stop   |

---

## 5. Basic YAML Example

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: data-processing-job
spec:
  template:
    spec:
      containers:
      - name: data-processor
        image: python:3.10
        command: ["python", "-c", "print('Processing data...')"]
      restartPolicy: OnFailure
  backoffLimit: 3
```

Key Fields: ...
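For readers who generate manifests from scripts, here is a minimal sketch that builds the same Job as a plain Python dictionary and serializes it to JSON, a format `kubectl apply -f` accepts alongside YAML. The helper `build_job_manifest` and its parameters are hypothetical. The nesting mirrors the YAML example: `backoffLimit` sits on the Job spec, while `restartPolicy` belongs to the Pod template.

```python
import json


def build_job_manifest(
    name: str,
    container_name: str,
    image: str,
    command: list[str],
    backoff_limit: int = 3,
) -> dict:
    """Return a batch/v1 Job manifest as a dictionary (illustrative helper)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Retry budget for the whole Job, set on the Job spec itself
            "backoffLimit": backoff_limit,
            "template": {
                "spec": {
                    "containers": [
                        {"name": container_name, "image": image, "command": command}
                    ],
                    # Job Pods may use only OnFailure or Never here
                    "restartPolicy": "OnFailure",
                }
            },
        },
    }


manifest = build_job_manifest(
    name="data-processing-job",
    container_name="data-processor",
    image="python:3.10",
    command=["python", "-c", "print('Processing data...')"],
)
print(json.dumps(manifest, indent=2))
```

Building the manifest in code makes it easy to vary the name, image, or retry budget per run without hand-editing YAML.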