Bab 11: MLOps & Model Deployment

Dari Development ke Production: ML Lifecycle, Containerization & Deployment Strategies


🎯 Hasil Pembelajaran (Learning Outcomes)

Setelah mempelajari bab ini, Anda akan mampu:

  1. Memahami ML lifecycle dan prinsip-prinsip MLOps
  2. Mengimplementasikan containerization dengan Docker untuk ML applications
  3. Membangun REST API untuk model serving menggunakan FastAPI dan Flask
  4. Menerapkan model optimization dengan ONNX Runtime
  5. Merancang deployment strategies (batch vs real-time, cloud vs edge)
  6. Mengintegrasikan monitoring dan logging untuk production systems
  7. Mengelola model versioning dan CI/CD pipelines untuk ML

11.1 MLOps: Bridging ML Development dan Production

11.1.1 Dari Jupyter Notebook ke Production

The Reality Check:

Anda telah berhasil membuat model dengan 95% accuracy di Jupyter Notebook. Tapi… bagaimana cara menyajikannya ke users? Ini adalah salah satu tantangan terbesar dalam machine learning!

Production Gap:

Code
flowchart LR
    A["Jupyter Notebook<br/>95% Accuracy"] --> B{"Production Gap"}
    B --> C["Scalability?"]
    B --> D["Latency?"]
    B --> E["Reliability?"]
    B --> F["Monitoring?"]
    B --> G["Updates?"]

    C --> H["Challenges"]
    D --> H
    E --> H
    F --> H
    G --> H

    H --> I["MLOps Solutions"]

    style A fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style B fill:#ff6b6b,stroke:#c92a2a,stroke-width:3px,color:#fff
    style C fill:#fff3cd,stroke:#856404,stroke-width:1px
    style D fill:#fff3cd,stroke:#856404,stroke-width:1px
    style E fill:#fff3cd,stroke:#856404,stroke-width:1px
    style F fill:#fff3cd,stroke:#856404,stroke-width:1px
    style G fill:#fff3cd,stroke:#856404,stroke-width:1px
    style H fill:#ff6b6b,stroke:#c92a2a,stroke-width:2px,color:#fff
    style I fill:#51cf66,stroke:#2b8a3e,stroke-width:2px
Figure 11.1: Production Gap - tantangan yang dihadapi saat membawa model dari Jupyter Notebook ke production

Statistik Mengejutkan:

  • 87% model ML tidak pernah masuk production (VentureBeat, 2019)
  • 80% data science project gagal deliver business value
  • Average time dari prototype ke production: 6-12 bulan
⚠️ Common Pitfalls

Kesalahan Umum:

  1. “Works on my machine” syndrome
  2. Tidak ada model versioning
  3. Manual deployment process
  4. Tidak ada monitoring
  5. Hardcoded configurations
  6. Tidak scalable

11.1.2 Apa itu MLOps?

MLOps = Machine Learning Operations: Praktik dan kultur untuk mengotomasi dan mempercepat ML lifecycle dari development hingga production.

MLOps = DevOps + Data + ML

import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib.patches import FancyBboxPatch
import numpy as np

fig, ax = plt.subplots(figsize=(14, 8))

# Define circles for Venn diagram
circle1 = plt.Circle((0.35, 0.5), 0.3, color='#4ECDC4', alpha=0.3, label='DevOps')
circle2 = plt.Circle((0.65, 0.5), 0.3, color='#FF6B6B', alpha=0.3, label='Data Engineering')
circle3 = plt.Circle((0.5, 0.7), 0.3, color='#95E1D3', alpha=0.3, label='ML/Data Science')

ax.add_patch(circle1)
ax.add_patch(circle2)
ax.add_patch(circle3)

# Add labels
ax.text(0.2, 0.3, 'CI/CD\nContainers\nOrchestration',
        fontsize=10, ha='center', weight='bold')
ax.text(0.8, 0.3, 'Data Pipelines\nData Quality\nData Versioning',
        fontsize=10, ha='center', weight='bold')
ax.text(0.5, 0.9, 'Model Training\nHyperparameter\nTuning',
        fontsize=10, ha='center', weight='bold')

# Center - MLOps
ax.text(0.5, 0.5, 'MLOps\n\n• Model Versioning\n• Automated Training\n• Model Monitoring\n• A/B Testing',
        fontsize=11, ha='center', va='center', weight='bold',
        bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.3))

ax.set_xlim(0, 1)
ax.set_ylim(0, 1.2)
ax.set_aspect('equal')
ax.axis('off')
ax.set_title('MLOps: Intersection of Three Disciplines',
             fontsize=16, fontweight='bold', pad=20)

plt.tight_layout()
plt.show()

Prinsip Utama MLOps:

  1. Automation: Automated training, testing, deployment
  2. Continuous X: CI/CD/CT (Continuous Training)
  3. Versioning: Code, data, dan model versioning
  4. Monitoring: Performance dan data drift monitoring
  5. Reproducibility: Dapat reproduce hasil experiments
  6. Collaboration: Data scientists, engineers, operations bekerja sama
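Prinsip Versioning dan Reproducibility dapat disketsakan dengan memberi tag versi deterministik pada artefak model berdasarkan hash isinya, sehingga artefak yang sama selalu menghasilkan tag yang sama. Ini hanya sketsa minimal; nama fungsi `model_version` dan panjang tag 12 karakter adalah asumsi ilustratif, bukan standar baku:

```python
import hashlib
import pickle

def model_version(model_bytes: bytes, length: int = 12) -> str:
    """Tag versi deterministik dari isi artefak model (content-addressed)."""
    return hashlib.sha256(model_bytes).hexdigest()[:length]

# Artefak identik -> tag sama (reproducible);
# bobot berubah sedikit pun -> tag berbeda
artifact_a = pickle.dumps({"weights": [0.1, 0.2, 0.3]})
artifact_b = pickle.dumps({"weights": [0.1, 0.2, 0.4]})

print(model_version(artifact_a) == model_version(artifact_a))  # True
print(model_version(artifact_a) == model_version(artifact_b))  # False
```

Pendekatan content-addressed seperti ini juga dipakai tool seperti DVC dan Docker untuk mengidentifikasi artefak secara unik.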

11.1.3 ML Lifecycle

End-to-End ML Lifecycle:

Code
graph TB
    A[Problem Definition] --> B[Data Collection]
    B --> C[EDA & Analysis]
    C --> D[Data Preprocessing]
    D --> E[Feature Engineering]
    E --> F[Model Training]
    F --> G[Model Evaluation]
    G --> H{Good Enough?}
    H -->|No| E
    H -->|Yes| I[Model Deployment]
    I --> J[Monitoring]
    J --> K{Drift Detected?}
    K -->|Yes| B
    K -->|No| J

    style A fill:#ffd93d
    style I fill:#6bcf7f
    style J fill:#4d96ff
    style K fill:#ff6b6b

Perbandingan Traditional Software vs ML Systems:

| Aspek | Traditional Software | ML Systems |
|---|---|---|
| Code | Deterministic | Probabilistic |
| Testing | Unit tests cukup | Perlu data validation dan model tests |
| Deployment | Deploy code | Deploy code + data + model |
| Monitoring | Log errors | Monitor metrics + data drift + model drift |
| Updates | Fix bugs | Retrain dengan data baru |
| Dependencies | Libraries | Libraries + data + infrastructure |
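Perbedaan pada baris Testing dapat digambarkan dengan validasi data sederhana sebelum inference: selain unit test biasa, sistem ML perlu memeriksa shape, missing values, dan range fitur. Sketsa minimal; fungsi `validate_batch` dan batas [0, 10] hanyalah asumsi ilustratif untuk fitur Iris:

```python
import numpy as np

def validate_batch(X: np.ndarray, n_features: int = 4,
                   lo: float = 0.0, hi: float = 10.0) -> list:
    """Kumpulkan error validasi: shape, NaN, dan range nilai fitur."""
    errors = []
    if X.ndim != 2 or X.shape[1] != n_features:
        errors.append(f"expected shape (n, {n_features}), got {X.shape}")
        return errors
    if np.isnan(X).any():
        errors.append("input contains NaN")
    if ((X < lo) | (X > hi)).any():
        errors.append(f"feature values outside [{lo}, {hi}]")
    return errors

print(validate_batch(np.array([[5.1, 3.5, 1.4, 0.2]])))   # []
print(validate_batch(np.array([[5.1, 3.5, 1.4, -1.0]])))  # satu error range
```

Check seperti ini biasanya dijalankan di pipeline sebelum data masuk ke model, sehingga input yang rusak gagal lebih awal dengan pesan yang jelas.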

11.1.4 MLOps Maturity Levels

Level 0: Manual Process - Everything manual - Jupyter notebooks - No automation - Problem: Tidak scalable, tidak reproducible

Level 1: ML Pipeline Automation - Automated training pipeline - Model versioning - Basic CI/CD - Better: Reproducible, tapi deployment masih manual

Level 2: CI/CD Pipeline Automation - Automated training, testing, deployment - Monitoring dan alerts - Data validation - Production-ready: Full automation

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(14, 6))

levels = ['Level 0:\nManual', 'Level 1:\nML Pipeline\nAutomation', 'Level 2:\nFull CI/CD\nAutomation']
characteristics = {
    'Manual Work': [90, 40, 10],
    'Automation': [10, 60, 90],
    'Reproducibility': [20, 70, 95],
    'Time to Deploy': [100, 50, 15]
}

x = np.arange(len(levels))
width = 0.2
multiplier = 0

colors = ['#ff6b6b', '#4ecdc4', '#45b7d1', '#f9ca24']

for attribute, measurement in characteristics.items():
    offset = width * multiplier
    ax.bar(x + offset, measurement, width, label=attribute, alpha=0.8)
    multiplier += 1

ax.set_ylabel('Score / Time (arbitrary units)', fontsize=12, fontweight='bold')
ax.set_xlabel('MLOps Maturity Level', fontsize=12, fontweight='bold')
ax.set_title('MLOps Maturity Progression', fontsize=14, fontweight='bold')
ax.set_xticks(x + width * 1.5)
ax.set_xticklabels(levels)
ax.legend(loc='upper left', fontsize=10)
ax.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()
💡 Goal: Achieve Level 2

Di chapter ini, kita akan belajar fondasi untuk mencapai Level 2 MLOps:

  • Container-based deployment (Docker)
  • REST API untuk model serving
  • Model optimization (ONNX)
  • Basic monitoring

11.2 Containerization dengan Docker

11.2.1 Mengapa Docker untuk ML?

Problem: “Works on my machine”

Data Scientist: "Model saya works perfectly!"
Engineer: "Di server production error terus..."
DevOps: "Python version berbeda, library incompatible..."

Solution: Docker Containers

Docker memastikan environment consistency dari development hingga production.

Analogi: Shipping Container

Seperti shipping container di pelabuhan:

  • Standardized: Same format, works everywhere
  • Isolated: Content tidak affect luar
  • Portable: Bisa pindah dari truck → ship → train
Code
graph LR
    A[Development<br/>Laptop] -->|Docker Image| B[Testing<br/>Server]
    B -->|Same Image| C[Production<br/>Cloud]
    C -->|Same Image| D[Edge<br/>Device]

    style A fill:#4ecdc4
    style B fill:#45b7d1
    style C fill:#51cf66
    style D fill:#ffd93d

Keuntungan Docker untuk ML:

  1. Reproducibility: Exact same environment
  2. Isolation: Dependencies tidak conflict
  3. Portability: Run anywhere (local, cloud, edge)
  4. Scalability: Easy to scale dengan orchestration (Kubernetes)
  5. Version Control: Docker images dapat di-version

11.2.2 Docker Basics untuk ML

Key Concepts:

  • Image: Template untuk container (like a class)
  • Container: Running instance of image (like an object)
  • Dockerfile: Recipe untuk build image
  • Docker Hub: Registry untuk share images

Basic Docker Commands:

# Build image
docker build -t my-ml-model:v1 .

# Run container
docker run -p 8000:8000 my-ml-model:v1

# List containers
docker ps

# Stop container
docker stop <container-id>

# Remove container
docker rm <container-id>

# List images
docker images

# Remove image
docker rmi my-ml-model:v1

11.2.3 Dockerfile untuk ML Model

Example: Dockerfile untuk Scikit-learn Model

# Base image dengan Python
FROM python:3.10-slim

# Set working directory
WORKDIR /app

# Copy requirements
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose port
EXPOSE 8000

# Run application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Penjelasan:

  1. FROM: Base image (Python 3.10 slim version)
  2. WORKDIR: Set working directory di container
  3. COPY: Copy files dari host ke container
  4. RUN: Execute commands (install dependencies)
  5. EXPOSE: Document port yang digunakan
  6. CMD: Default command saat container start

Best Practices Dockerfile:

# ✅ Good: Multi-stage build untuk size optimization
FROM python:3.10-slim as builder

WORKDIR /app

# Install dependencies in builder stage
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Runtime stage
FROM python:3.10-slim

WORKDIR /app

# Copy only necessary files
COPY --from=builder /root/.local /root/.local
COPY main.py .
COPY model.pkl .

# Make sure scripts in .local are usable
ENV PATH=/root/.local/bin:$PATH

EXPOSE 8000

# Use non-root user untuk security
RUN useradd -m myuser
USER myuser

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

requirements.txt:

fastapi==0.104.1
uvicorn[standard]==0.24.0
scikit-learn==1.3.2
pandas==2.1.3
numpy==1.26.2
pydantic==2.5.0
🔐 Security Best Practices
  1. Don’t run as root: Use non-root user
  2. Minimize image size: Use slim/alpine base images
  3. Scan for vulnerabilities: docker scan my-image
  4. Don’t include secrets: Use environment variables
  5. Pin versions: Specific versions di requirements.txt
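Poin "Don't include secrets" dan pitfall "Hardcoded configurations" di Bab 11.1 bisa dihindari dengan membaca konfigurasi dari environment variables. Sketsa minimal; nama helper `get_config` dan nama-nama variabelnya hanya ilustrasi:

```python
import os

def get_config(name: str, default=None, required: bool = False):
    """Baca konfigurasi dari environment variable, bukan hardcoded di image."""
    value = os.environ.get(name, default)
    if required and value is None:
        # Fail fast: lebih baik gagal saat startup daripada jalan tanpa credential
        raise RuntimeError(f"Required config '{name}' is not set")
    return value

MODEL_PATH = get_config("MODEL_PATH", default="models/iris_model.pkl")
print(MODEL_PATH)
```

Nilainya lalu di-inject saat runtime, misalnya `docker run -e MODEL_PATH=/models/model.pkl ...`, sehingga image yang sama bisa dipakai di semua environment tanpa menyimpan secret di dalamnya.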

11.2.4 Docker Compose untuk Multi-Container

Untuk aplikasi dengan multiple services (API + database + monitoring):

docker-compose.yml:

version: '3.8'

services:
  # ML API Service
  ml-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - MODEL_PATH=/models/model.pkl
      - DATABASE_URL=postgresql://user:pass@db:5432/mlops
    volumes:
      - ./models:/models
    depends_on:
      - db
    restart: unless-stopped

  # PostgreSQL Database
  db:
    image: postgres:15-alpine
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=mlops
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  # Prometheus Monitoring
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus

  # Grafana Dashboard
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana_data:/var/lib/grafana
    depends_on:
      - prometheus

volumes:
  postgres_data:
  prometheus_data:
  grafana_data:

Run dengan Docker Compose:

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f ml-api

# Stop all services
docker-compose down

# Stop and remove volumes
docker-compose down -v

11.3 Model Serving dengan FastAPI

11.3.1 Mengapa FastAPI?

FastAPI adalah modern, fast web framework untuk building APIs dengan Python.

Keunggulan FastAPI:

  1. Performance: Setara dengan NodeJS dan Go (thanks to Starlette + Pydantic)
  2. Type Hints: Automatic validation dan documentation
  3. Async Support: Native async/await support
  4. Auto Docs: Interactive API docs (Swagger UI)
  5. Easy to Learn: Intuitive, mirip Flask tapi lebih powerful

Performance Comparison:

import matplotlib.pyplot as plt
import numpy as np

frameworks = ['Flask', 'Django', 'FastAPI', 'Go Gin', 'NodeJS\nExpress']
requests_per_sec = [1200, 800, 18000, 25000, 15000]
colors = ['#3498db', '#e74c3c', '#2ecc71', '#f39c12', '#9b59b6']

fig, ax = plt.subplots(figsize=(12, 6))
bars = ax.barh(frameworks, requests_per_sec, color=colors, alpha=0.7)

# Add value labels
for i, (bar, val) in enumerate(zip(bars, requests_per_sec)):
    ax.text(val + 500, i, f'{val:,}', va='center', fontweight='bold')

ax.set_xlabel('Requests per Second', fontsize=12, fontweight='bold')
ax.set_title('Web Framework Performance Comparison\n(Higher is Better)',
             fontsize=14, fontweight='bold')
ax.grid(axis='x', alpha=0.3)

# Highlight FastAPI
bars[2].set_edgecolor('green')
bars[2].set_linewidth(3)

plt.tight_layout()
plt.show()

11.3.2 Building ML API dengan FastAPI

Step 1: Train dan Save Model

Code
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import joblib
import pandas as pd

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate
accuracy = model.score(X_test, y_test)
print(f"Model Accuracy: {accuracy:.3f}")

# Save model
joblib.dump(model, 'models/iris_model.pkl')
print("Model saved to models/iris_model.pkl")

# Save feature names dan target names untuk API
import json
metadata = {
    'feature_names': iris.feature_names,
    'target_names': iris.target_names.tolist(),
    'accuracy': float(accuracy)
}
with open('models/metadata.json', 'w') as f:
    json.dump(metadata, f, indent=2)

Step 2: Create Pydantic Models untuk Validation

Code
from pydantic import BaseModel, Field, field_validator
from typing import List

class IrisFeatures(BaseModel):
    """Input features untuk Iris prediction"""
    sepal_length: float = Field(..., ge=0, le=10,
                                description="Sepal length in cm")
    sepal_width: float = Field(..., ge=0, le=10,
                               description="Sepal width in cm")
    petal_length: float = Field(..., ge=0, le=10,
                                description="Petal length in cm")
    petal_width: float = Field(..., ge=0, le=10,
                               description="Petal width in cm")

    # Pydantic v2 (sesuai pin pydantic==2.5.0): gunakan field_validator,
    # decorator `validator` lama sudah deprecated
    @field_validator('*')
    @classmethod
    def check_positive(cls, v):
        if v < 0:
            raise ValueError('Value must be positive')
        return v

class PredictionResponse(BaseModel):
    """Response format untuk prediction"""
    prediction: str
    probability: float
    all_probabilities: dict
    model_version: str = "1.0.0"

class HealthResponse(BaseModel):
    """Health check response"""
    status: str
    model_loaded: bool
    version: str

Step 3: FastAPI Application (main.py)

Code
from fastapi import FastAPI, HTTPException, status
from fastapi.responses import JSONResponse
from typing import List
import joblib
import numpy as np
import json
from pathlib import Path
import logging

# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize FastAPI app
app = FastAPI(
    title="Iris Classification API",
    description="ML API untuk prediksi species Iris flowers",
    version="1.0.0",
    docs_url="/docs",
    redoc_url="/redoc"
)

# Global variables
model = None
metadata = None

@app.on_event("startup")
async def load_model():
    """Load model saat aplikasi start"""
    global model, metadata

    try:
        model_path = Path("models/iris_model.pkl")
        metadata_path = Path("models/metadata.json")

        if not model_path.exists():
            raise FileNotFoundError(f"Model file not found: {model_path}")

        model = joblib.load(model_path)

        if metadata_path.exists():
            with open(metadata_path, 'r') as f:
                metadata = json.load(f)

        logger.info("Model loaded successfully")
        if metadata:
            logger.info(f"Model accuracy: {metadata.get('accuracy', 'N/A')}")

    except Exception as e:
        logger.error(f"Error loading model: {e}")
        raise

@app.get("/", tags=["General"])
async def root():
    """Root endpoint"""
    return {
        "message": "Welcome to Iris Classification API",
        "docs": "/docs",
        "health": "/health"
    }

@app.get("/health", response_model=HealthResponse, tags=["General"])
async def health_check():
    """Health check endpoint"""
    return {
        "status": "healthy" if model is not None else "unhealthy",
        "model_loaded": model is not None,
        "version": "1.0.0"
    }

@app.post("/predict", response_model=PredictionResponse, tags=["Prediction"])
async def predict(features: IrisFeatures):
    """
    Predict Iris species berdasarkan features

    - **sepal_length**: Sepal length dalam cm
    - **sepal_width**: Sepal width dalam cm
    - **petal_length**: Petal length dalam cm
    - **petal_width**: Petal width dalam cm
    """
    try:
        # Prepare input
        input_data = np.array([[
            features.sepal_length,
            features.sepal_width,
            features.petal_length,
            features.petal_width
        ]])

        # Predict
        prediction = model.predict(input_data)[0]
        probabilities = model.predict_proba(input_data)[0]

        # Get class names (fallback jika metadata.json tidak ada)
        class_names = (metadata or {}).get('target_names',
                                           ['setosa', 'versicolor', 'virginica'])

        # Format response
        return {
            "prediction": class_names[prediction],
            "probability": float(probabilities[prediction]),
            "all_probabilities": {
                class_names[i]: float(prob)
                for i, prob in enumerate(probabilities)
            },
            "model_version": "1.0.0"
        }

    except Exception as e:
        logger.error(f"Prediction error: {e}")
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=f"Prediction failed: {str(e)}"
        )

@app.post("/batch_predict", tags=["Prediction"])
async def batch_predict(features_list: List[IrisFeatures]):
    """Batch prediction untuk multiple samples"""
    try:
        predictions = []

        for features in features_list:
            input_data = np.array([[
                features.sepal_length,
                features.sepal_width,
                features.petal_length,
                features.petal_width
            ]])

            prediction = model.predict(input_data)[0]
            probabilities = model.predict_proba(input_data)[0]

            class_names = (metadata or {}).get('target_names',
                                               ['setosa', 'versicolor', 'virginica'])

            predictions.append({
                "prediction": class_names[prediction],
                "probability": float(probabilities[prediction])
            })

        return {"predictions": predictions, "count": len(predictions)}

    except Exception as e:
        logger.error(f"Batch prediction error: {e}")
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=f"Batch prediction failed: {str(e)}"
        )

@app.get("/model/info", tags=["Model"])
async def model_info():
    """Get model information dan metadata"""
    if metadata is None:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="Metadata not found"
        )

    return {
        "model_type": "RandomForestClassifier",
        "accuracy": metadata.get('accuracy'),
        "features": metadata.get('feature_names'),
        "classes": metadata.get('target_names'),
        "version": "1.0.0"
    }

# Exception handler
@app.exception_handler(Exception)
async def global_exception_handler(request, exc):
    logger.error(f"Global exception: {exc}")
    return JSONResponse(
        status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
        content={"detail": "Internal server error"}
    )

Step 4: Run API

# Development mode (auto-reload)
uvicorn main:app --reload --host 0.0.0.0 --port 8000

# Production mode (multiple workers)
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

Step 5: Test API

# Health check
curl http://localhost:8000/health

# Single prediction
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2
  }'

# Batch prediction
curl -X POST "http://localhost:8000/batch_predict" \
  -H "Content-Type: application/json" \
  -d '[
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2},
    {"sepal_length": 6.7, "sepal_width": 3.0, "petal_length": 5.2, "petal_width": 2.3}
  ]'

# Model info
curl http://localhost:8000/model/info

Interactive API Documentation:

Buka browser: http://localhost:8000/docs

FastAPI automatically generates Swagger UI untuk testing API!
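Selain curl dan Swagger UI, API juga bisa dipanggil dari Python. Berikut sketsa client minimal dengan stdlib `urllib` (URL dan payload mengikuti contoh di atas; `build_request` dan `predict_remote` hanyalah nama ilustratif, dan endpoint harus sedang berjalan agar request benar-benar terkirim):

```python
import json
from urllib.request import Request, urlopen

API_URL = "http://localhost:8000/predict"

def build_request(features: dict) -> Request:
    """Bangun HTTP POST request JSON untuk endpoint /predict."""
    body = json.dumps(features).encode("utf-8")
    return Request(API_URL, data=body,
                   headers={"Content-Type": "application/json"})

def predict_remote(features: dict) -> dict:
    """Kirim request dan kembalikan JSON response (butuh API yang berjalan)."""
    with urlopen(build_request(features)) as resp:
        return json.loads(resp.read())
```

Contoh pemakaian: `predict_remote({"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2})` mengembalikan dict yang sama dengan response curl di atas.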

11.3.3 Flask Alternative (Simpler)

Untuk project sederhana, Flask juga bisa digunakan:

Code
from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)

# Load model
model = joblib.load('models/iris_model.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    try:
        data = request.get_json()

        features = np.array([[
            data['sepal_length'],
            data['sepal_width'],
            data['petal_length'],
            data['petal_width']
        ]])

        prediction = model.predict(features)[0]
        probability = model.predict_proba(features)[0]

        return jsonify({
            'prediction': int(prediction),
            'probability': float(probability[prediction])
        })

    except Exception as e:
        return jsonify({'error': str(e)}), 400

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000, debug=False)

Perbandingan FastAPI vs Flask:

| Aspek | FastAPI | Flask |
|---|---|---|
| Performance | ⭐⭐⭐⭐⭐ (Async) | ⭐⭐⭐ (Sync) |
| Type Validation | ✅ Built-in (Pydantic) | ❌ Manual |
| Auto Documentation | ✅ Swagger + ReDoc | ❌ Perlu extensions |
| Async Support | ✅ Native | ⚠️ Via extensions |
| Learning Curve | Medium | Easy |
| Best For | Production APIs | Prototypes, simple apps |
💡 Recommendation
  • Prototype cepat: Flask
  • Production API: FastAPI
  • Legacy systems: Flask (banyak resources)
  • High performance: FastAPI

11.4 Model Optimization dengan ONNX

11.4.1 Apa itu ONNX?

ONNX = Open Neural Network Exchange: Open format untuk representing ML models.

Problem yang Dipecahkan:

Code
graph LR
    A[Train in<br/>PyTorch] --> B{Deploy where?}
    B --> C[TensorFlow<br/>Serving]
    B --> D[ONNX<br/>Runtime]
    B --> E[Mobile<br/>CoreML]
    B --> F[Edge<br/>TensorRT]

    style B fill:#ff6b6b
    style D fill:#51cf66

Tanpa ONNX: Need to convert manually, atau stuck dengan satu framework.

Dengan ONNX:

  1. Train di framework favorit (PyTorch, TensorFlow, scikit-learn)
  2. Convert ke ONNX format
  3. Deploy di berbagai platforms dengan ONNX Runtime

Keuntungan ONNX:

  1. Interoperability: Framework-agnostic
  2. Performance: ONNX Runtime highly optimized
  3. Portability: Run di berbagai devices
  4. Optimization: Built-in graph optimizations

11.4.2 ONNX Runtime Performance

Benchmark Results:

import matplotlib.pyplot as plt
import numpy as np

frameworks = ['PyTorch\n(CPU)', 'TensorFlow\n(CPU)', 'ONNX Runtime\n(CPU)',
              'PyTorch\n(GPU)', 'ONNX Runtime\n(GPU)']
inference_time = [12.5, 10.3, 4.2, 2.8, 1.1]  # ms per batch

fig, ax = plt.subplots(figsize=(12, 6))
colors = ['#e74c3c', '#3498db', '#2ecc71', '#e74c3c', '#2ecc71']
bars = ax.bar(frameworks, inference_time, color=colors, alpha=0.7)

# Highlight ONNX
bars[2].set_edgecolor('green')
bars[2].set_linewidth(3)
bars[4].set_edgecolor('green')
bars[4].set_linewidth(3)

ax.set_ylabel('Inference Time (ms)', fontsize=12, fontweight='bold')
ax.set_title('ONNX Runtime Performance Comparison\n(Lower is Better)',
             fontsize=14, fontweight='bold')
ax.grid(axis='y', alpha=0.3)

# Add value labels
for bar, val in zip(bars, inference_time):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{val} ms', ha='center', va='bottom', fontweight='bold')

# Add speedup annotations
ax.text(0, 13, '3x slower', ha='center', fontsize=10, style='italic')
ax.text(2, 5, '⚡ Fast!', ha='center', fontsize=11, fontweight='bold', color='green')
ax.text(4, 2, '⚡⚡ Fastest!', ha='center', fontsize=11, fontweight='bold', color='green')

plt.tight_layout()
plt.show()

11.4.3 Convert Scikit-learn Model ke ONNX

Code
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import joblib
import numpy as np

# Load trained model
model = joblib.load('models/iris_model.pkl')

# Define input type
# Shape: (None, 4) berarti batch_size can vary, 4 features
initial_type = [('float_input', FloatTensorType([None, 4]))]

# Convert to ONNX
onnx_model = convert_sklearn(
    model,
    initial_types=initial_type,
    target_opset=12  # ONNX opset version
)

# Save ONNX model
with open("models/iris_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

print("Model converted to ONNX successfully!")

# Verify conversion
import onnxruntime as rt

# Load ONNX model
sess = rt.InferenceSession("models/iris_model.onnx")

# Test prediction
input_name = sess.get_inputs()[0].name
label_name = sess.get_outputs()[0].name

test_data = np.array([[5.1, 3.5, 1.4, 0.2]], dtype=np.float32)
pred_onnx = sess.run([label_name], {input_name: test_data})

print(f"ONNX Prediction: {pred_onnx}")

# Compare dengan original model
pred_sklearn = model.predict(test_data)
print(f"Scikit-learn Prediction: {pred_sklearn}")
print(f"Match: {np.array_equal(pred_onnx[0], pred_sklearn)}")

11.4.4 Convert PyTorch Model ke ONNX

Code
import torch
import torch.nn as nn
import torch.onnx

# Define simple PyTorch model
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(4, 16)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(16, 8)
        self.fc3 = nn.Linear(8, 3)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        return x

# Create model instance
model = SimpleNN()
model.eval()

# Dummy input untuk tracing
dummy_input = torch.randn(1, 4)

# Export to ONNX
torch.onnx.export(
    model,                      # Model
    dummy_input,                # Dummy input
    "models/pytorch_model.onnx", # Output file
    export_params=True,         # Store trained weights
    opset_version=12,           # ONNX version
    do_constant_folding=True,   # Optimize constants
    input_names=['input'],      # Input names
    output_names=['output'],    # Output names
    dynamic_axes={              # Variable length axes
        'input': {0: 'batch_size'},
        'output': {0: 'batch_size'}
    }
)

print("PyTorch model exported to ONNX")

# Verify
import onnxruntime as rt
import numpy as np

sess = rt.InferenceSession("models/pytorch_model.onnx")
input_name = sess.get_inputs()[0].name

test_input = np.random.randn(1, 4).astype(np.float32)
onnx_output = sess.run(None, {input_name: test_input})

print(f"ONNX Output shape: {onnx_output[0].shape}")

11.4.5 FastAPI dengan ONNX Runtime

Optimized API dengan ONNX:

Code
from fastapi import FastAPI
import onnxruntime as rt
import numpy as np
from pydantic import BaseModel

app = FastAPI(title="ONNX-Powered ML API")

# Load ONNX model
session = rt.InferenceSession("models/iris_model.onnx")
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

class IrisFeatures(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

@app.post("/predict")
async def predict(features: IrisFeatures):
    # Prepare input
    input_data = np.array([[
        features.sepal_length,
        features.sepal_width,
        features.petal_length,
        features.petal_width
    ]], dtype=np.float32)

    # Run inference dengan ONNX Runtime
    pred = session.run([output_name], {input_name: input_data})

    return {
        "prediction": int(pred[0][0]),
        "model": "ONNX Runtime"
    }

Performance Comparison:

Code
import time
import numpy as np
import joblib
import onnxruntime as rt

# Load kedua model agar perbandingan bisa dijalankan standalone
sklearn_model = joblib.load('models/iris_model.pkl')
session = rt.InferenceSession("models/iris_model.onnx")
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

# Test data
test_data = np.random.randn(1000, 4).astype(np.float32)

# Benchmark scikit-learn
start = time.time()
for i in range(1000):
    _ = sklearn_model.predict(test_data[i:i+1])
sklearn_time = time.time() - start

# Benchmark ONNX
start = time.time()
for i in range(1000):
    _ = session.run([output_name], {input_name: test_data[i:i+1]})
onnx_time = time.time() - start

print(f"Scikit-learn: {sklearn_time:.3f}s")
print(f"ONNX Runtime: {onnx_time:.3f}s")
print(f"Speedup: {sklearn_time/onnx_time:.2f}x")
🚀 When to Use ONNX?

Use ONNX when:

  • High-throughput inference needed
  • Deploying to multiple platforms
  • Edge deployment (mobile, IoT)
  • Cross-framework compatibility required

Skip ONNX when:

  • Simple prototype
  • Model changes frequently
  • Framework-specific features needed

11.5 Deployment Strategies

11.5.1 Batch vs Real-time Serving

Batch Inference:

Process multiple samples together pada scheduled intervals.

Use Cases:

  • Daily recommendations
  • Periodic fraud detection
  • Nightly data processing
  • Batch ETL pipelines

Architecture:

Code
graph LR
    A[Data Lake] --> B[Batch Job<br/>Scheduler]
    B --> C[Load Data]
    C --> D[Model Inference]
    D --> E[Save Predictions]
    E --> F[(Results DB)]

    style B fill:#ffd93d
    style D fill:#51cf66

Example: Batch Prediction Script

Code
import pandas as pd
import joblib
from datetime import datetime
import logging

# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def batch_predict(input_csv, output_csv, model_path):
    """
    Batch prediction for large datasets
    """
    # Load model
    logger.info(f"Loading model from {model_path}")
    model = joblib.load(model_path)

    # Load data in chunks (for large files)
    logger.info(f"Processing {input_csv}")
    chunk_size = 10000
    chunks = []

    for chunk in pd.read_csv(input_csv, chunksize=chunk_size):
        # Predict
        predictions = model.predict(chunk)
        probabilities = model.predict_proba(chunk).max(axis=1)

        # Add predictions to chunk
        chunk['prediction'] = predictions
        chunk['confidence'] = probabilities
        chunk['predicted_at'] = datetime.now()

        chunks.append(chunk)
        logger.info(f"Processed {len(chunk)} rows")

    # Combine all chunks
    result = pd.concat(chunks, ignore_index=True)

    # Save results
    result.to_csv(output_csv, index=False)
    logger.info(f"Results saved to {output_csv}")
    logger.info(f"Total rows processed: {len(result)}")

    return result

# Run batch job
if __name__ == "__main__":
    batch_predict(
        input_csv="data/new_samples.csv",
        output_csv="results/predictions.csv",
        model_path="models/iris_model.pkl"
    )

Scheduling with Cron (Linux):

# Run every day at 2 AM
0 2 * * * /usr/bin/python3 /path/to/batch_predict.py

Real-time Inference:

Process each request immediately when it arrives.

Use Cases:

  • Web applications
  • Mobile apps
  • Chatbots
  • Real-time recommendations
  • Fraud detection (at transaction time)

Architecture:

Code
graph LR
    A[User Request] --> B[Load Balancer]
    B --> C[API Server 1]
    B --> D[API Server 2]
    B --> E[API Server N]

    C --> F[Model]
    D --> F
    E --> F

    F --> G[Response]

    style B fill:#4ecdc4
    style F fill:#51cf66

Comparison:

Aspect        Batch                Real-time
Latency       Hours/days           Milliseconds
Throughput    Very high            Lower
Cost          Lower (off-peak)     Higher (always on)
Complexity    Simple               Complex (scaling, monitoring)
Use Case      Offline analytics    User-facing apps

11.5.2 Cloud Deployment Options

Option 1: Platform as a Service (PaaS)

Examples:

  • Google Cloud AI Platform
  • AWS SageMaker
  • Azure ML
  • Hugging Face Inference API

Pros:

  • Managed infrastructure
  • Auto-scaling
  • Built-in monitoring
  • Easy deployment

Cons:

  • Higher cost
  • Vendor lock-in
  • Less control

Example: Deploy to Hugging Face Spaces (Free!)

# app.py - Gradio interface
import gradio as gr
import joblib
import numpy as np

model = joblib.load('iris_model.pkl')

def predict(sepal_length, sepal_width, petal_length, petal_width):
    features = np.array([[sepal_length, sepal_width, petal_length, petal_width]])
    pred = model.predict(features)[0]
    prob = model.predict_proba(features)[0]

    classes = ['Setosa', 'Versicolor', 'Virginica']
    return {classes[i]: float(prob[i]) for i in range(3)}

iface = gr.Interface(
    fn=predict,
    inputs=[
        gr.Number(label="Sepal Length"),
        gr.Number(label="Sepal Width"),
        gr.Number(label="Petal Length"),
        gr.Number(label="Petal Width")
    ],
    outputs=gr.Label(label="Prediction"),
    title="Iris Classifier",
    description="Predict Iris species"
)

iface.launch()

Option 2: Container Orchestration (Kubernetes)

For production-scale applications:

# kubernetes-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-api-deployment
spec:
  replicas: 3  # 3 instances
  selector:
    matchLabels:
      app: ml-api
  template:
    metadata:
      labels:
        app: ml-api
    spec:
      containers:
      - name: ml-api
        image: your-registry/ml-api:v1
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "256Mi"
            cpu: "500m"
          limits:
            memory: "512Mi"
            cpu: "1000m"
        env:
        - name: MODEL_PATH
          value: "/models/model.pkl"
---
apiVersion: v1
kind: Service
metadata:
  name: ml-api-service
spec:
  selector:
    app: ml-api
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8000
  type: LoadBalancer

Option 3: Serverless Functions

Examples:

  • AWS Lambda
  • Google Cloud Functions
  • Azure Functions

Pros:

  • Pay per request
  • Auto-scaling to zero
  • No server management

Cons:

  • Cold start latency
  • Execution time limits
  • Package size limits

Example: AWS Lambda Handler

Code
import json
import joblib
import numpy as np
import boto3

# Load model from S3 (once, at cold start)
s3 = boto3.client('s3')
s3.download_file('my-bucket', 'models/model.pkl', '/tmp/model.pkl')
model = joblib.load('/tmp/model.pkl')

def lambda_handler(event, context):
    """AWS Lambda handler function"""
    try:
        # Parse input
        body = json.loads(event['body'])
        features = np.array([[
            body['sepal_length'],
            body['sepal_width'],
            body['petal_length'],
            body['petal_width']
        ]])

        # Predict
        prediction = model.predict(features)[0]
        probability = model.predict_proba(features)[0]

        # Return response
        return {
            'statusCode': 200,
            'body': json.dumps({
                'prediction': int(prediction),
                'probability': float(probability[prediction])
            })
        }

    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }

11.5.3 Edge Deployment

Edge Computing: Run ML models on device (smartphone, IoT, edge servers).

Why Edge?

  • Low latency: No network roundtrip
  • Privacy: Data stays on device
  • Offline capability: Works without internet
  • Bandwidth saving: No data upload

Challenges:

  • Limited compute resources
  • Small memory footprint
  • Battery constraints
  • Model optimization crucial

Tools for Edge:

  1. TensorFlow Lite: Mobile & embedded devices
  2. ONNX Runtime Mobile: Cross-platform
  3. CoreML: iOS devices
  4. PyTorch Mobile: Android & iOS

Example: Model Quantization for Edge

Code
import torch
import torch.quantization

# Load a trained PyTorch model (assumes the SimpleNN class from the PyTorch examples)
model = SimpleNN()
model.load_state_dict(torch.load('model.pth'))
model.eval()

# Dynamic quantization (Float32 -> Int8)
# Shrinks the quantized layers' weights roughly 4x
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},  # Quantize Linear layers
    dtype=torch.qint8
)

# Save quantized model
torch.save(quantized_model.state_dict(), 'model_quantized.pth')

# Compare sizes
import os
original_size = os.path.getsize('model.pth') / (1024**2)  # MB
quantized_size = os.path.getsize('model_quantized.pth') / (1024**2)

print(f"Original model: {original_size:.2f} MB")
print(f"Quantized model: {quantized_size:.2f} MB")
print(f"Compression ratio: {original_size/quantized_size:.2f}x")

11.6 Monitoring & Maintenance

11.6.1 Why Is Monitoring Important?

Models degrade over time!

import matplotlib.pyplot as plt
import numpy as np

# Simulate model performance degradation
months = np.arange(0, 13)
accuracy = 95 - 0.5 * months + np.random.randn(13) * 0.5

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(months, accuracy, marker='o', linewidth=2, markersize=8, color='#e74c3c')
ax.axhline(y=90, color='orange', linestyle='--', linewidth=2, label='Warning Threshold')
ax.axhline(y=85, color='red', linestyle='--', linewidth=2, label='Critical Threshold')

ax.fill_between(months, 85, 90, alpha=0.2, color='orange')
ax.fill_between(months, 0, 85, alpha=0.2, color='red')

ax.set_xlabel('Months in Production', fontsize=12, fontweight='bold')
ax.set_ylabel('Model Accuracy (%)', fontsize=12, fontweight='bold')
ax.set_title('Model Performance Degradation Over Time', fontsize=14, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(alpha=0.3)
ax.set_ylim([80, 100])

# Add annotations
ax.annotate('Model deployed', xy=(0, 95), xytext=(1, 97),
            arrowprops=dict(arrowstyle='->', color='green'),
            fontsize=10, color='green', fontweight='bold')
ax.annotate('Retrain needed!', xy=(10, 88), xytext=(7, 82),
            arrowprops=dict(arrowstyle='->', color='red'),
            fontsize=10, color='red', fontweight='bold')

plt.tight_layout()
plt.show()

Why Performance Degrades:

  1. Data Drift: Input data distribution changes
  2. Concept Drift: Relationship between features and target changes
  3. Upstream Changes: Data pipeline changes
  4. Seasonal Effects: Patterns change over time
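
Data drift can be quantified numerically. Below is a minimal sketch of the Population Stability Index (PSI) in pure NumPy; the function name and the common 0.1/0.2 alert thresholds are illustrative conventions, not from this chapter:

```python
import numpy as np

def population_stability_index(reference, current, bins=10):
    """PSI between a reference (training) sample and current production data."""
    # Bin edges from reference quantiles, so each bin holds ~equal reference mass
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    current = np.clip(current, edges[0], edges[-1])  # keep new data inside the edges
    ref_frac = np.histogram(reference, edges)[0] / len(reference)
    cur_frac = np.histogram(current, edges)[0] / len(current)
    # Avoid log(0) for empty bins
    ref_frac = np.clip(ref_frac, 1e-6, None)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(42)
train = rng.normal(0, 1, 20_000)
print(population_stability_index(train, rng.normal(0, 1, 20_000)))  # small: no drift
print(population_stability_index(train, rng.normal(1, 1, 20_000)))  # large: drift
```

As a rule of thumb, PSI below 0.1 is usually treated as stable and above 0.2 as drift worth investigating.
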

11.6.2 What to Monitor?

1. Model Performance Metrics

Code
from prometheus_client import Counter, Histogram, Gauge
import time

# Define metrics
prediction_counter = Counter(
    'predictions_total',
    'Total number of predictions',
    ['model_version', 'prediction_class']
)

prediction_latency = Histogram(
    'prediction_latency_seconds',
    'Prediction latency in seconds',
    buckets=[0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0]
)

model_accuracy = Gauge(
    'model_accuracy',
    'Current model accuracy',
    ['model_version']
)

# Usage in the prediction function
@app.post("/predict")
async def predict(features: IrisFeatures):
    start_time = time.time()

    # Predict
    prediction = model.predict(...)

    # Record metrics
    prediction_counter.labels(
        model_version='1.0.0',
        prediction_class=str(prediction)  # Prometheus label values must be strings
    ).inc()

    latency = time.time() - start_time
    prediction_latency.observe(latency)

    return {"prediction": prediction}

2. Data Quality Metrics

Code
from typing import Dict
import numpy as np

class DataQualityMonitor:
    def __init__(self, reference_stats: Dict):
        """
        Initialize with reference statistics from the training data
        """
        self.reference_stats = reference_stats

    def check_data_drift(self, new_data: np.ndarray) -> Dict:
        """
        Check for statistical drift in input data
        """
        results = {}

        for i, feature_name in enumerate(self.reference_stats['features']):
            # Calculate statistics
            ref_mean = self.reference_stats['means'][i]
            ref_std = self.reference_stats['stds'][i]

            new_mean = new_data[:, i].mean()
            new_std = new_data[:, i].std()

            # Check drift (simple threshold-based)
            mean_drift = abs(new_mean - ref_mean) / ref_std
            std_drift = abs(new_std - ref_std) / ref_std

            results[feature_name] = {
                'mean_drift': mean_drift,
                'std_drift': std_drift,
                'drift_detected': mean_drift > 2.0 or std_drift > 0.5
            }

        return results

    def check_missing_values(self, data: np.ndarray) -> Dict:
        """Check for missing or invalid values"""
        return {
            'missing_count': np.isnan(data).sum(),
            'missing_percentage': np.isnan(data).mean() * 100
        }

    def check_outliers(self, data: np.ndarray) -> Dict:
        """Detect outliers using IQR method"""
        Q1 = np.percentile(data, 25, axis=0)
        Q3 = np.percentile(data, 75, axis=0)
        IQR = Q3 - Q1

        lower_bound = Q1 - 1.5 * IQR
        upper_bound = Q3 + 1.5 * IQR

        outliers = ((data < lower_bound) | (data > upper_bound)).any(axis=1)

        return {
            'outlier_count': outliers.sum(),
            'outlier_percentage': outliers.mean() * 100
        }

# Usage
monitor = DataQualityMonitor(reference_stats={
    'features': ['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
    'means': [5.84, 3.05, 3.76, 1.20],
    'stds': [0.83, 0.43, 1.76, 0.76]
})

# Check new batch of data
drift_report = monitor.check_data_drift(new_data)
if any(result['drift_detected'] for result in drift_report.values()):
    print("⚠️ Data drift detected! Consider retraining.")

3. System Metrics

Code
import psutil
from prometheus_client import Gauge

# CPU and Memory metrics
cpu_usage = Gauge('cpu_usage_percent', 'CPU usage percentage')
memory_usage = Gauge('memory_usage_percent', 'Memory usage percentage')

def collect_system_metrics():
    """Collect system resource metrics"""
    cpu_usage.set(psutil.cpu_percent(interval=1))
    memory_usage.set(psutil.virtual_memory().percent)

# Call periodically (e.g., in a background task)

11.6.3 Logging Best Practices

Structured Logging:

Code
import logging
import json
from datetime import datetime

class StructuredLogger:
    def __init__(self, name: str):
        self.logger = logging.getLogger(name)
        self.logger.setLevel(logging.INFO)

        # JSON formatter
        handler = logging.StreamHandler()
        self.logger.addHandler(handler)

    def log_prediction(self,
                      input_data: dict,
                      prediction: dict,
                      latency: float):
        """Log prediction dengan structured format"""
        log_entry = {
            'timestamp': datetime.utcnow().isoformat(),
            'event': 'prediction',
            'input': input_data,
            'output': prediction,
            'latency_ms': latency * 1000,
            'model_version': '1.0.0'
        }
        self.logger.info(json.dumps(log_entry))

    def log_error(self, error: Exception, context: dict):
        """Log error dengan context"""
        log_entry = {
            'timestamp': datetime.utcnow().isoformat(),
            'event': 'error',
            'error_type': type(error).__name__,
            'error_message': str(error),
            'context': context
        }
        self.logger.error(json.dumps(log_entry))

# Usage
logger = StructuredLogger('ml-api')

@app.post("/predict")
async def predict(features: IrisFeatures):
    start = time.time()

    try:
        prediction = model.predict(...)

        logger.log_prediction(
            input_data=features.dict(),
            prediction={'class': prediction},
            latency=time.time() - start
        )

        return {"prediction": prediction}

    except Exception as e:
        logger.log_error(e, context={'features': features.dict()})
        raise
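
One payoff of structured JSON logs is that they can be analyzed offline. A minimal sketch (the helper name is hypothetical) that computes p95 latency from log lines shaped like the entries above:

```python
import json

def p95_latency(log_lines):
    """Return the 95th-percentile latency (ms) from JSON prediction log lines."""
    lats = sorted(
        entry["latency_ms"]
        for entry in map(json.loads, log_lines)
        if entry.get("event") == "prediction"
    )
    if not lats:
        return None
    # Nearest-rank percentile: index ceil(0.95 * n) - 1
    idx = max(0, -(-95 * len(lats) // 100) - 1)
    return lats[idx]

# 100 fake log lines with latencies 1..100 ms
logs = [json.dumps({"event": "prediction", "latency_ms": m}) for m in range(1, 101)]
print(p95_latency(logs))  # → 95
```

The same idea extends to error rates and per-class prediction counts, all without touching the serving path.
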

11.6.4 Alerting & Notification

Set Up Alerts for Critical Events:

Code
class AlertManager:
    def __init__(self, webhook_url: str):
        self.webhook_url = webhook_url

    def send_alert(self,
                   severity: str,
                   message: str,
                   metrics: dict = None):
        """Send alert ke Slack/Teams/Email"""
        import requests

        payload = {
            "text": f"🚨 [{severity}] ML Model Alert",
            "attachments": [{
                "color": "danger" if severity == "CRITICAL" else "warning",
                "fields": [
                    {"title": "Message", "value": message},
                    {"title": "Metrics", "value": str(metrics)}
                ],
                "ts": int(time.time())
            }]
        }

        requests.post(self.webhook_url, json=payload)

# Usage
alert_manager = AlertManager(webhook_url="https://hooks.slack.com/...")

# Check accuracy drop
if current_accuracy < 85:
    alert_manager.send_alert(
        severity="CRITICAL",
        message="Model accuracy dropped below threshold!",
        metrics={'current_accuracy': current_accuracy}
    )

11.6.5 Model Retraining Strategy

When to Retrain?

Code
flowchart TD
    A[Monitor Metrics] --> B{Performance<br/>Degraded?}
    B -->|No| A
    B -->|Yes| C{Data Drift<br/>Detected?}
    C -->|Yes| D[Collect New Data]
    C -->|No| E[Check Data Quality]
    D --> F[Retrain Model]
    E --> G{Data Quality<br/>OK?}
    G -->|Yes| H[Investigate Code]
    G -->|No| I[Fix Data Pipeline]
    F --> J[Evaluate New Model]
    J --> K{Better than<br/>Current?}
    K -->|Yes| L[Deploy New Model]
    K -->|No| M[Keep Current Model<br/>+ Investigate]
    L --> A
    M --> A

    style B fill:#ffd93d
    style C fill:#ffd93d
    style F fill:#51cf66
    style L fill:#51cf66

Automated Retraining Pipeline:

Code
from datetime import datetime, timedelta
import schedule

class AutoRetrainingPipeline:
    def __init__(self,
                 model_trainer,
                 performance_threshold: float = 0.85):
        self.model_trainer = model_trainer
        self.performance_threshold = performance_threshold
        self.last_retrain = datetime.now()

    def should_retrain(self, current_metrics: dict) -> bool:
        """Decide if retraining is needed"""
        # Rule 1: Performance degradation
        if current_metrics['accuracy'] < self.performance_threshold:
            return True

        # Rule 2: Time-based (every 30 days)
        if (datetime.now() - self.last_retrain) > timedelta(days=30):
            return True

        # Rule 3: Data drift detected
        if current_metrics.get('data_drift', False):
            return True

        return False

    def retrain_and_deploy(self):
        """Full retraining and deployment pipeline"""
        print("🔄 Starting automated retraining...")

        # 1. Fetch new data
        new_data = self.fetch_new_training_data()

        # 2. Train new model
        new_model = self.model_trainer.train(new_data)

        # 3. Evaluate
        metrics = self.evaluate_model(new_model)

        # 4. Compare with the current model
        if metrics['accuracy'] > self.get_current_model_accuracy():
            print("✅ New model better, deploying...")
            self.deploy_model(new_model)
            self.last_retrain = datetime.now()
        else:
            print("⚠️ New model not better, keeping current model")

    def fetch_new_training_data(self):
        """Fetch data from production logs"""
        # Implementation: query database, S3, etc.
        pass

    def evaluate_model(self, model):
        """Evaluate model on test set"""
        # Implementation
        pass

    def deploy_model(self, model):
        """Deploy new model to production"""
        # Implementation: save to S3, update model server, etc.
        pass

# Schedule retraining checks
pipeline = AutoRetrainingPipeline(model_trainer=...)

# Check every day
schedule.every().day.at("02:00").do(
    lambda: pipeline.retrain_and_deploy()
    if pipeline.should_retrain(get_current_metrics())
    else None
)

11.7 Model Versioning & Experiment Tracking

11.7.1 Why Model Versioning?

A scenario without versioning:

Engineer: "The production model is broken!"
Data Scientist: "Try rolling back to the previous version"
Engineer: "Which one? There are 10 model.pkl files..."
Data Scientist: "The one with... 95% accuracy?"
Engineer: "They're all 95%..." 😭

With proper versioning:

Engineer: "Rollback ke model v1.2.3"
System: "Deployed model v1.2.3 (trained 2024-01-15, accuracy 95.3%)"
Engineer: "Problem solved!" ✅

11.7.2 Tools for ML Experiment Tracking

Popular Tools:

  1. MLflow: Open-source, comprehensive
  2. Weights & Biases (wandb): Cloud-based, great UI
  3. Neptune.ai: Enterprise features
  4. TensorBoard: Bundled with TensorFlow/PyTorch

Example: MLflow

Code
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

# Set experiment
mlflow.set_experiment("iris-classification")

# Start run
with mlflow.start_run(run_name="random-forest-v1"):
    # Log parameters
    params = {
        'n_estimators': 100,
        'max_depth': 10,
        'random_state': 42
    }
    mlflow.log_params(params)

    # Train model
    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)

    # Evaluate
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred, average='weighted')

    # Log metrics
    mlflow.log_metrics({
        'accuracy': accuracy,
        'f1_score': f1
    })

    # Log model
    mlflow.sklearn.log_model(
        model,
        "model",
        registered_model_name="iris-classifier"
    )

    # Log artifacts (plots, data, etc.)
    import matplotlib.pyplot as plt
    # ... create plot ...
    plt.savefig("confusion_matrix.png")
    mlflow.log_artifact("confusion_matrix.png")

    print(f"Run ID: {mlflow.active_run().info.run_id}")

MLflow UI:

# Start MLflow UI
mlflow ui --host 0.0.0.0 --port 5000

# Access at: http://localhost:5000

Load Best Model:

Code
# Load model by version
model_version = "1"
model = mlflow.pyfunc.load_model(f"models:/iris-classifier/{model_version}")

# Or load best model based on metric
from mlflow.tracking import MlflowClient

client = MlflowClient()
runs = client.search_runs(
    experiment_ids=["1"],
    order_by=["metrics.accuracy DESC"],
    max_results=1
)

best_run_id = runs[0].info.run_id
best_model = mlflow.sklearn.load_model(f"runs:/{best_run_id}/model")

11.7.3 Git for Model Versioning (DVC)

DVC (Data Version Control): Git-style versioning for data and models.

# Initialize DVC
dvc init

# Add model to DVC
dvc add models/iris_model.pkl

# This creates models/iris_model.pkl.dvc file
# Commit to Git
git add models/iris_model.pkl.dvc .gitignore
git commit -m "Add model v1.0.0"
git tag -a v1.0.0 -m "Model version 1.0.0"

# Push model to remote storage (S3, GCS, etc.)
dvc remote add -d storage s3://my-bucket/dvc-storage
dvc push

# Pull specific version
git checkout v1.0.0
dvc pull

11.8 CI/CD for ML (MLOps Pipeline)

11.8.1 Traditional CI/CD vs ML CI/CD

Traditional Software CI/CD:

Code → Build → Test → Deploy

ML CI/CD (includes CT - Continuous Training):

Code + Data → Build → Test → Train → Evaluate → Deploy → Monitor → Retrain

11.8.2 Example: GitHub Actions for ML

.github/workflows/ml-pipeline.yml:

name: ML Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    # Retrain weekly
    - cron: '0 0 * * 0'

jobs:
  data-validation:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      - name: Validate data
        run: |
          python scripts/validate_data.py

      - name: Check data drift
        run: |
          python scripts/check_drift.py

  train-model:
    needs: data-validation
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      - name: Train model
        run: |
          python scripts/train_model.py

      - name: Evaluate model
        run: |
          python scripts/evaluate_model.py

      - name: Upload model artifact
        uses: actions/upload-artifact@v3
        with:
          name: trained-model
          path: models/

  test-api:
    needs: train-model
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Download model
        uses: actions/download-artifact@v3
        with:
          name: trained-model
          path: models/

      - name: Build Docker image
        run: |
          docker build -t ml-api:test .

      - name: Run API tests
        run: |
          docker run -d -p 8000:8000 ml-api:test
          sleep 10
          python tests/test_api.py

  deploy:
    needs: test-api
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v3

      - name: Download model
        uses: actions/download-artifact@v3
        with:
          name: trained-model
          path: models/

      - name: Deploy to production
        run: |
          # Deploy to cloud (AWS, GCP, Azure)
          # Or update Kubernetes deployment
          echo "Deploying to production..."

11.8.3 Model Testing

Different levels of testing:

Code
import pytest
import numpy as np
import joblib

# 1. Unit Tests: Individual components
def test_preprocessing():
    """Test preprocessing function"""
    from src.preprocessing import normalize_features

    data = np.array([[1, 2], [3, 4]])
    normalized = normalize_features(data)

    assert abs(normalized.mean()) < 0.1
    assert abs(normalized.std() - 1.0) < 0.1

# 2. Model Tests: Model behavior
def test_model_predictions():
    """Test model produces valid predictions"""
    model = joblib.load('models/iris_model.pkl')

    # Test input
    X_test = np.array([[5.1, 3.5, 1.4, 0.2]])

    # Predict
    prediction = model.predict(X_test)
    probabilities = model.predict_proba(X_test)

    # Assertions
    assert prediction.shape == (1,)
    assert prediction[0] in [0, 1, 2]
    assert probabilities.shape == (1, 3)
    assert np.isclose(probabilities.sum(), 1.0)

# 3. Data Tests: Input validation
def test_data_schema():
    """Test input data matches expected schema"""
    import pandas as pd

    data = pd.read_csv('data/new_data.csv')

    # Check columns
    expected_columns = ['sepal_length', 'sepal_width',
                       'petal_length', 'petal_width']
    assert all(col in data.columns for col in expected_columns)

    # Check dtypes
    assert data[expected_columns].dtypes.apply(
        lambda x: np.issubdtype(x, np.number)
    ).all()

    # Check ranges
    assert (data[expected_columns] >= 0).all().all()
    assert (data[expected_columns] <= 10).all().all()

# 4. Performance Tests: Accuracy threshold
def test_model_accuracy():
    """Test model meets accuracy threshold"""
    model = joblib.load('models/iris_model.pkl')

    from sklearn.datasets import load_iris
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)
    predictions = model.predict(X)
    accuracy = accuracy_score(y, predictions)

    # Assert minimum accuracy
    assert accuracy >= 0.90, f"Model accuracy {accuracy} below threshold"

# 5. Integration Tests: API endpoint
def test_api_prediction():
    """Test API returns correct format"""
    import requests

    response = requests.post(
        "http://localhost:8000/predict",
        json={
            "sepal_length": 5.1,
            "sepal_width": 3.5,
            "petal_length": 1.4,
            "petal_width": 0.2
        }
    )

    assert response.status_code == 200
    data = response.json()
    assert 'prediction' in data
    assert 'probability' in data

# Run tests
# pytest tests/ -v

11.9 Best Practices Summary

11.9.1 Development Best Practices

✅ Checklist: Model Development

Before Training:

  • Validate the data schema and value ranges
  • Version the dataset (e.g., with DVC)
  • Define evaluation metrics and minimum thresholds

During Training:

  • Track parameters, metrics, and artifacts (e.g., with MLflow)
  • Fix random seeds for reproducibility

After Training:

  • Evaluate against the agreed thresholds
  • Register and version the model
  • Save reference statistics for drift monitoring

11.9.2 Deployment Best Practices

✅ Checklist: Deployment

Pre-deployment:

  • Containerize the app with pinned dependencies
  • Run unit, model, data, and API tests
  • Load-test at the expected traffic level

During Deployment:

  • Roll out gradually (blue-green or canary)
  • Keep the previous model version ready for rollback

Post-deployment:

  • Monitor latency, error rate, and prediction quality
  • Verify alerts fire on threshold breaches

11.9.3 Security Best Practices

🔐 Security Checklist

Application Security:

  • Validate all inputs (e.g., with Pydantic)
  • Add authentication and rate limiting to the API

Infrastructure Security:

  • Run containers as a non-root user
  • Keep secrets out of images (environment variables or a secrets manager)

Data Security:

  • Encrypt data in transit and at rest
  • Avoid logging sensitive fields

11.10 Summary

📚 Chapter Summary

1. MLOps Fundamentals:

  • MLOps = DevOps + Data + ML
  • Automate ML lifecycle
  • 3 maturity levels: Manual → Pipeline → Full CI/CD

2. Containerization:

  • Docker for environment consistency
  • Dockerfile best practices
  • Docker Compose for multi-service apps

3. Model Serving:

  • FastAPI for production APIs
  • Flask for prototypes
  • REST API design principles

4. ONNX Optimization:

  • Framework-agnostic model format
  • Performance improvements
  • Cross-platform deployment

5. Deployment Strategies:

  • Batch vs Real-time
  • Cloud (PaaS, Kubernetes, Serverless)
  • Edge deployment

6. Monitoring:

  • Model performance metrics
  • Data drift detection
  • System metrics
  • Automated alerting

7. Model Versioning:

  • MLflow experiment tracking
  • DVC for data/model versioning
  • Git workflow

8. CI/CD:

  • Automated testing
  • Continuous training
  • Deployment automation

Key Takeaways

  1. Production ≠ Jupyter Notebook: Need proper engineering practices
  2. Monitoring is Critical: Models degrade over time
  3. Automation: Automate everything (training, testing, deployment)
  4. Versioning: Version code, data, AND models
  5. Security: Don’t forget security best practices

11.11 Exercises

Review Questions

  1. Explain the difference between DevOps and MLOps. What challenges are unique to ML?

  2. What are the advantages of using Docker for ML deployment?

  3. Compare FastAPI and Flask for model serving. When should you use each?

  4. What is ONNX? Why is it important for deployment?

  5. Explain the difference between batch inference and real-time inference. Give an example use case for each.

  6. What is data drift? Why does it need to be monitored?

  7. List 5 important metrics to monitor for ML models in production.

  8. Explain the concepts of blue-green deployment and canary deployment.

  9. What is model versioning? Why is it important?

  10. Explain CI/CD/CT in the context of MLOps.

Coding Exercises

Exercise 1: Dockerize ML Model - Train a scikit-learn model - Create a Dockerfile - Build and run the container - Test the API endpoints

Exercise 2: FastAPI Application - Implement a complete FastAPI app with:

  • Health check endpoint
  • Single prediction
  • Batch prediction
  • Model info endpoint
  • Add input validation with Pydantic
  • Add error handling

Exercise 3: ONNX Conversion - Train a PyTorch model - Convert it to ONNX - Compare performance (PyTorch vs ONNX) - Implement an ONNX-powered API

Exercise 4: Monitoring Dashboard - Implement metrics collection (Prometheus) - Create a Grafana dashboard - Set up alerts for accuracy drops - Log predictions to a database

Exercise 5: CI/CD Pipeline - Set up a GitHub Actions workflow - Implement automated testing - Auto-deploy on the main branch - Add model versioning

Project: End-to-End MLOps Pipeline

Build a complete MLOps pipeline:

  1. Development:

    • Train a classification model
    • Use MLflow for experiment tracking
    • Version data with DVC
  2. API Development:

    • Build a FastAPI application
    • Add comprehensive tests
    • Document the API
  3. Containerization:

    • Create an optimized Dockerfile
    • Set up Docker Compose with monitoring
  4. Deployment:

    • Deploy to a cloud platform (Heroku/Railway/GCP)
    • Set up a CI/CD pipeline
    • Configure monitoring
  5. Monitoring:

    • Implement metrics collection
    • Set up alerting
    • Create a dashboard
  6. Documentation:

    • API documentation
    • Deployment guide
    • Monitoring guide
🎓 Congratulations! You have completed Chapter 11 - MLOps & Deployment!

This is the final chapter of Part III. You now have the skills to:

  • ✅ Build ML models
  • ✅ Deploy them to production
  • ✅ Monitor and maintain models
  • ✅ Implement MLOps best practices

Next Steps:

  • Apply this knowledge in the capstone project
  • Explore advanced topics (Kubernetes, model optimization)
  • Build a portfolio of production-ready ML systems

Keep learning, keep building! 🚀