Bab 11: MLOps & Model Deployment

Dari Development ke Production: ML Lifecycle, Containerization & Deployment Strategies


🎯 Hasil Pembelajaran (Learning Outcomes)

Setelah mempelajari bab ini, Anda akan mampu:

  1. Memahami ML lifecycle dan prinsip-prinsip MLOps
  2. Mengimplementasikan containerization dengan Docker untuk ML applications
  3. Membangun REST API untuk model serving menggunakan FastAPI dan Flask
  4. Menerapkan model optimization dengan ONNX Runtime
  5. Merancang deployment strategies (batch vs real-time, cloud vs edge)
  6. Mengintegrasikan monitoring dan logging untuk production systems
  7. Mengelola model versioning dan CI/CD pipelines untuk ML

11.1 MLOps: Bridging ML Development dan Production

11.1.1 Dari Jupyter Notebook ke Production

The Reality Check:

Anda telah berhasil membuat model dengan 95% accuracy di Jupyter Notebook. Tapi… bagaimana cara menyajikannya ke users? Ini adalah salah satu tantangan terbesar dalam machine learning!

Production Gap:

Code
flowchart LR
    A["Jupyter Notebook<br/>95% Accuracy"] --> B{"Production Gap"}
    B --> C["Scalability?"]
    B --> D["Latency?"]
    B --> E["Reliability?"]
    B --> F["Monitoring?"]
    B --> G["Updates?"]

    C --> H["Challenges"]
    D --> H
    E --> H
    F --> H
    G --> H

    H --> I["MLOps Solutions"]

    style A fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style B fill:#ff6b6b,stroke:#c92a2a,stroke-width:3px,color:#fff
    style C fill:#fff3cd,stroke:#856404,stroke-width:1px
    style D fill:#fff3cd,stroke:#856404,stroke-width:1px
    style E fill:#fff3cd,stroke:#856404,stroke-width:1px
    style F fill:#fff3cd,stroke:#856404,stroke-width:1px
    style G fill:#fff3cd,stroke:#856404,stroke-width:1px
    style H fill:#ff6b6b,stroke:#c92a2a,stroke-width:2px,color:#fff
    style I fill:#51cf66,stroke:#2b8a3e,stroke-width:2px
Figure 11.1: Production Gap - tantangan yang dihadapi saat membawa model dari Jupyter Notebook ke production

Statistik Mengejutkan:

  • 87% model ML tidak pernah masuk production (VentureBeat, 2019)
  • 80% data science project gagal deliver business value
  • Average time dari prototype ke production: 6-12 bulan
⚠️ Common Pitfalls

Kesalahan Umum:

  1. “Works on my machine” syndrome
  2. Tidak ada model versioning
  3. Manual deployment process
  4. Tidak ada monitoring
  5. Hardcoded configurations
  6. Tidak scalable

11.1.2 Apa itu MLOps?

MLOps = Machine Learning Operations: Praktik dan kultur untuk mengotomasi dan mempercepat ML lifecycle dari development hingga production.

MLOps = DevOps + Data + ML

import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib.patches import FancyBboxPatch
import numpy as np

fig, ax = plt.subplots(figsize=(14, 8))

# Define circles for Venn diagram
circle1 = plt.Circle((0.35, 0.5), 0.3, color='#4ECDC4', alpha=0.3, label='DevOps')
circle2 = plt.Circle((0.65, 0.5), 0.3, color='#FF6B6B', alpha=0.3, label='Data Engineering')
circle3 = plt.Circle((0.5, 0.7), 0.3, color='#95E1D3', alpha=0.3, label='ML/Data Science')

ax.add_patch(circle1)
ax.add_patch(circle2)
ax.add_patch(circle3)

# Add labels
ax.text(0.2, 0.3, 'CI/CD\nContainers\nOrchestration',
        fontsize=10, ha='center', weight='bold')
ax.text(0.8, 0.3, 'Data Pipelines\nData Quality\nData Versioning',
        fontsize=10, ha='center', weight='bold')
ax.text(0.5, 0.9, 'Model Training\nHyperparameter\nTuning',
        fontsize=10, ha='center', weight='bold')

# Center - MLOps
ax.text(0.5, 0.5, 'MLOps\n\n• Model Versioning\n• Automated Training\n• Model Monitoring\n• A/B Testing',
        fontsize=11, ha='center', va='center', weight='bold',
        bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.3))

ax.set_xlim(0, 1)
ax.set_ylim(0, 1.2)
ax.set_aspect('equal')
ax.axis('off')
ax.set_title('MLOps: Intersection of Three Disciplines',
             fontsize=16, fontweight='bold', pad=20)

plt.tight_layout()
plt.show()

Prinsip Utama MLOps:

  1. Automation: Automated training, testing, deployment
  2. Continuous X: CI/CD/CT (Continuous Training)
  3. Versioning: Code, data, dan model versioning
  4. Monitoring: Performance dan data drift monitoring
  5. Reproducibility: Dapat reproduce hasil experiments
  6. Collaboration: Data scientists, engineers, operations bekerja sama
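Prinsip Versioning dan Reproducibility dapat disketsakan dengan memberi tag versi deterministik pada artefak model berdasarkan hash isinya, sehingga artefak yang sama selalu menghasilkan tag yang sama. Ini hanya sketsa minimal; nama fungsi `model_version` dan panjang tag 12 karakter adalah asumsi ilustratif, bukan standar baku:

```python
import hashlib
import pickle

def model_version(model_bytes: bytes, length: int = 12) -> str:
    """Tag versi deterministik dari isi artefak model (content-addressed)."""
    return hashlib.sha256(model_bytes).hexdigest()[:length]

# Artefak identik -> tag sama (reproducible);
# bobot berubah sedikit pun -> tag berbeda
artifact_a = pickle.dumps({"weights": [0.1, 0.2, 0.3]})
artifact_b = pickle.dumps({"weights": [0.1, 0.2, 0.4]})

print(model_version(artifact_a) == model_version(artifact_a))  # True
print(model_version(artifact_a) == model_version(artifact_b))  # False
```

Pendekatan content-addressed seperti ini juga dipakai tool seperti DVC dan Docker untuk mengidentifikasi artefak secara unik.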

11.1.3 ML Lifecycle

End-to-End ML Lifecycle:

Code
graph TB
    A[Problem Definition] --> B[Data Collection]
    B --> C[EDA & Analysis]
    C --> D[Data Preprocessing]
    D --> E[Feature Engineering]
    E --> F[Model Training]
    F --> G[Model Evaluation]
    G --> H{Good Enough?}
    H -->|No| E
    H -->|Yes| I[Model Deployment]
    I --> J[Monitoring]
    J --> K{Drift Detected?}
    K -->|Yes| B
    K -->|No| J

    style A fill:#ffd93d
    style I fill:#6bcf7f
    style J fill:#4d96ff
    style K fill:#ff6b6b

Perbandingan Traditional Software vs ML Systems:

| Aspek | Traditional Software | ML Systems |
|---|---|---|
| Code | Deterministic | Probabilistic |
| Testing | Unit tests cukup | Perlu data validation dan model tests |
| Deployment | Deploy code | Deploy code + data + model |
| Monitoring | Log errors | Monitor metrics + data drift + model drift |
| Updates | Fix bugs | Retrain dengan data baru |
| Dependencies | Libraries | Libraries + data + infrastructure |
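Perbedaan pada baris Testing dapat digambarkan dengan validasi data sederhana sebelum inference: selain unit test biasa, sistem ML perlu memeriksa shape, missing values, dan range fitur. Sketsa minimal; fungsi `validate_batch` dan batas [0, 10] hanyalah asumsi ilustratif untuk fitur Iris:

```python
import numpy as np

def validate_batch(X: np.ndarray, n_features: int = 4,
                   lo: float = 0.0, hi: float = 10.0) -> list:
    """Kumpulkan error validasi: shape, NaN, dan range nilai fitur."""
    errors = []
    if X.ndim != 2 or X.shape[1] != n_features:
        errors.append(f"expected shape (n, {n_features}), got {X.shape}")
        return errors
    if np.isnan(X).any():
        errors.append("input contains NaN")
    if ((X < lo) | (X > hi)).any():
        errors.append(f"feature values outside [{lo}, {hi}]")
    return errors

print(validate_batch(np.array([[5.1, 3.5, 1.4, 0.2]])))   # []
print(validate_batch(np.array([[5.1, 3.5, 1.4, -1.0]])))  # satu error range
```

Check seperti ini biasanya dijalankan di pipeline sebelum data masuk ke model, sehingga input yang rusak gagal lebih awal dengan pesan yang jelas.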

11.1.4 MLOps Maturity Levels

Level 0: Manual Process - Everything manual - Jupyter notebooks - No automation - Problem: Tidak scalable, tidak reproducible

Level 1: ML Pipeline Automation - Automated training pipeline - Model versioning - Basic CI/CD - Better: Reproducible, tapi deployment masih manual

Level 2: CI/CD Pipeline Automation - Automated training, testing, deployment - Monitoring dan alerts - Data validation - Production-ready: Full automation

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(14, 6))

levels = ['Level 0:\nManual', 'Level 1:\nML Pipeline\nAutomation', 'Level 2:\nFull CI/CD\nAutomation']
characteristics = {
    'Manual Work': [90, 40, 10],
    'Automation': [10, 60, 90],
    'Reproducibility': [20, 70, 95],
    'Time to Deploy': [100, 50, 15]
}

x = np.arange(len(levels))
width = 0.2
multiplier = 0

colors = ['#ff6b6b', '#4ecdc4', '#45b7d1', '#f9ca24']

for attribute, measurement in characteristics.items():
    offset = width * multiplier
    ax.bar(x + offset, measurement, width, label=attribute, alpha=0.8)
    multiplier += 1

ax.set_ylabel('Score / Time (arbitrary units)', fontsize=12, fontweight='bold')
ax.set_xlabel('MLOps Maturity Level', fontsize=12, fontweight='bold')
ax.set_title('MLOps Maturity Progression', fontsize=14, fontweight='bold')
ax.set_xticks(x + width * 1.5)
ax.set_xticklabels(levels)
ax.legend(loc='upper left', fontsize=10)
ax.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()
💡 Goal: Achieve Level 2

Di chapter ini, kita akan belajar fondasi untuk mencapai Level 2 MLOps:

  • Container-based deployment (Docker)
  • REST API untuk model serving
  • Model optimization (ONNX)
  • Basic monitoring

11.2 Containerization dengan Docker

11.2.1 Mengapa Docker untuk ML?

Problem: “Works on my machine”

Data Scientist: "Model saya works perfectly!"
Engineer: "Di server production error terus..."
DevOps: "Python version berbeda, library incompatible..."

Solution: Docker Containers

Docker memastikan environment consistency dari development hingga production.

Analogi: Shipping Container

Seperti shipping container di pelabuhan:

  • Standardized: Same format, works everywhere
  • Isolated: Content tidak affect luar
  • Portable: Bisa pindah dari truck → ship → train
Code
graph LR
    A[Development<br/>Laptop] -->|Docker Image| B[Testing<br/>Server]
    B -->|Same Image| C[Production<br/>Cloud]
    C -->|Same Image| D[Edge<br/>Device]

    style A fill:#4ecdc4
    style B fill:#45b7d1
    style C fill:#51cf66
    style D fill:#ffd93d

Keuntungan Docker untuk ML:

  1. Reproducibility: Exact same environment
  2. Isolation: Dependencies tidak conflict
  3. Portability: Run anywhere (local, cloud, edge)
  4. Scalability: Easy to scale dengan orchestration (Kubernetes)
  5. Version Control: Docker images dapat di-version

11.2.2 Docker Basics untuk ML

Key Concepts:

  • Image: Template untuk container (like a class)
  • Container: Running instance of image (like an object)
  • Dockerfile: Recipe untuk build image
  • Docker Hub: Registry untuk share images

Basic Docker Commands:

# Build image
docker build -t my-ml-model:v1 .

# Run container
docker run -p 8000:8000 my-ml-model:v1

# List containers
docker ps

# Stop container
docker stop <container-id>

# Remove container
docker rm <container-id>

# List images
docker images

# Remove image
docker rmi my-ml-model:v1

11.2.3 Dockerfile untuk ML Model

Example: Dockerfile untuk Scikit-learn Model

# Base image dengan Python
FROM python:3.10-slim

# Set working directory
WORKDIR /app

# Copy requirements
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose port
EXPOSE 8000

# Run application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Penjelasan:

  1. FROM: Base image (Python 3.10 slim version)
  2. WORKDIR: Set working directory di container
  3. COPY: Copy files dari host ke container
  4. RUN: Execute commands (install dependencies)
  5. EXPOSE: Document port yang digunakan
  6. CMD: Default command saat container start

Best Practices Dockerfile:

# ✅ Good: Multi-stage build untuk size optimization
FROM python:3.10-slim as builder

WORKDIR /app

# Install dependencies in builder stage
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Runtime stage
FROM python:3.10-slim

WORKDIR /app

# Copy only necessary files
COPY --from=builder /root/.local /root/.local
COPY main.py .
COPY model.pkl .

# Make sure scripts in .local are usable
ENV PATH=/root/.local/bin:$PATH

EXPOSE 8000

# Use non-root user untuk security
RUN useradd -m myuser
USER myuser

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

requirements.txt:

fastapi==0.104.1
uvicorn[standard]==0.24.0
scikit-learn==1.3.2
pandas==2.1.3
numpy==1.26.2
pydantic==2.5.0
🔐 Security Best Practices
  1. Don’t run as root: Use non-root user
  2. Minimize image size: Use slim/alpine base images
  3. Scan for vulnerabilities: docker scan my-image
  4. Don’t include secrets: Use environment variables
  5. Pin versions: Specific versions di requirements.txt
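Poin "Don't include secrets" dan pitfall "Hardcoded configurations" di Bab 11.1 bisa dihindari dengan membaca konfigurasi dari environment variables. Sketsa minimal; nama helper `get_config` dan nama-nama variabelnya hanya ilustrasi:

```python
import os

def get_config(name: str, default=None, required: bool = False):
    """Baca konfigurasi dari environment variable, bukan hardcoded di image."""
    value = os.environ.get(name, default)
    if required and value is None:
        # Fail fast: lebih baik gagal saat startup daripada jalan tanpa credential
        raise RuntimeError(f"Required config '{name}' is not set")
    return value

MODEL_PATH = get_config("MODEL_PATH", default="models/iris_model.pkl")
print(MODEL_PATH)
```

Nilainya lalu di-inject saat runtime, misalnya `docker run -e MODEL_PATH=/models/model.pkl ...`, sehingga image yang sama bisa dipakai di semua environment tanpa menyimpan secret di dalamnya.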

11.2.4 Docker Compose untuk Multi-Container

Untuk aplikasi dengan multiple services (API + database + monitoring):

docker-compose.yml:

version: '3.8'

services:
  # ML API Service
  ml-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - MODEL_PATH=/models/model.pkl
      - DATABASE_URL=postgresql://user:pass@db:5432/mlops
    volumes:
      - ./models:/models
    depends_on:
      - db
    restart: unless-stopped

  # PostgreSQL Database
  db:
    image: postgres:15-alpine
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=mlops
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  # Prometheus Monitoring
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus

  # Grafana Dashboard
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana_data:/var/lib/grafana
    depends_on:
      - prometheus

volumes:
  postgres_data:
  prometheus_data:
  grafana_data:

Run dengan Docker Compose:

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f ml-api

# Stop all services
docker-compose down

# Stop and remove volumes
docker-compose down -v

11.3 Model Serving dengan FastAPI

11.3.1 Mengapa FastAPI?

FastAPI adalah modern, fast web framework untuk building APIs dengan Python.

Keunggulan FastAPI:

  1. Performance: Setara dengan NodeJS dan Go (thanks to Starlette + Pydantic)
  2. Type Hints: Automatic validation dan documentation
  3. Async Support: Native async/await support
  4. Auto Docs: Interactive API docs (Swagger UI)
  5. Easy to Learn: Intuitive, mirip Flask tapi lebih powerful

Performance Comparison:

import matplotlib.pyplot as plt
import numpy as np

frameworks = ['Flask', 'Django', 'FastAPI', 'Go Gin', 'NodeJS\nExpress']
requests_per_sec = [1200, 800, 18000, 25000, 15000]
colors = ['#3498db', '#e74c3c', '#2ecc71', '#f39c12', '#9b59b6']

fig, ax = plt.subplots(figsize=(12, 6))
bars = ax.barh(frameworks, requests_per_sec, color=colors, alpha=0.7)

# Add value labels
for i, (bar, val) in enumerate(zip(bars, requests_per_sec)):
    ax.text(val + 500, i, f'{val:,}', va='center', fontweight='bold')

ax.set_xlabel('Requests per Second', fontsize=12, fontweight='bold')
ax.set_title('Web Framework Performance Comparison\n(Higher is Better)',
             fontsize=14, fontweight='bold')
ax.grid(axis='x', alpha=0.3)

# Highlight FastAPI
bars[2].set_edgecolor('green')
bars[2].set_linewidth(3)

plt.tight_layout()
plt.show()

11.3.2 Building ML API dengan FastAPI

Step 1: Train dan Save Model

Code
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import joblib
import pandas as pd

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate
accuracy = model.score(X_test, y_test)
print(f"Model Accuracy: {accuracy:.3f}")

# Save model
joblib.dump(model, 'models/iris_model.pkl')
print("Model saved to models/iris_model.pkl")

# Save feature names dan target names untuk API
import json
metadata = {
    'feature_names': iris.feature_names,
    'target_names': iris.target_names.tolist(),
    'accuracy': float(accuracy)
}
with open('models/metadata.json', 'w') as f:
    json.dump(metadata, f, indent=2)

Step 2: Create Pydantic Models untuk Validation

Code
from pydantic import BaseModel, Field, field_validator
from typing import List

class IrisFeatures(BaseModel):
    """Input features untuk Iris prediction"""
    sepal_length: float = Field(..., ge=0, le=10,
                                description="Sepal length in cm")
    sepal_width: float = Field(..., ge=0, le=10,
                               description="Sepal width in cm")
    petal_length: float = Field(..., ge=0, le=10,
                                description="Petal length in cm")
    petal_width: float = Field(..., ge=0, le=10,
                               description="Petal width in cm")

    # Pydantic v2 (sesuai pin pydantic==2.5.0): gunakan field_validator,
    # decorator `validator` lama sudah deprecated
    @field_validator('*')
    @classmethod
    def check_positive(cls, v):
        if v < 0:
            raise ValueError('Value must be positive')
        return v

class PredictionResponse(BaseModel):
    """Response format untuk prediction"""
    prediction: str
    probability: float
    all_probabilities: dict
    model_version: str = "1.0.0"

class HealthResponse(BaseModel):
    """Health check response"""
    status: str
    model_loaded: bool
    version: str

Step 3: FastAPI Application (main.py)

Code
from fastapi import FastAPI, HTTPException, status
from fastapi.responses import JSONResponse
from typing import List
import joblib
import numpy as np
import json
from pathlib import Path
import logging

# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize FastAPI app
app = FastAPI(
    title="Iris Classification API",
    description="ML API untuk prediksi species Iris flowers",
    version="1.0.0",
    docs_url="/docs",
    redoc_url="/redoc"
)

# Global variables
model = None
metadata = None

@app.on_event("startup")
async def load_model():
    """Load model saat aplikasi start"""
    global model, metadata

    try:
        model_path = Path("models/iris_model.pkl")
        metadata_path = Path("models/metadata.json")

        if not model_path.exists():
            raise FileNotFoundError(f"Model file not found: {model_path}")

        model = joblib.load(model_path)

        if metadata_path.exists():
            with open(metadata_path, 'r') as f:
                metadata = json.load(f)

        logger.info("Model loaded successfully")
        if metadata:
            logger.info(f"Model accuracy: {metadata.get('accuracy', 'N/A')}")

    except Exception as e:
        logger.error(f"Error loading model: {e}")
        raise

@app.get("/", tags=["General"])
async def root():
    """Root endpoint"""
    return {
        "message": "Welcome to Iris Classification API",
        "docs": "/docs",
        "health": "/health"
    }

@app.get("/health", response_model=HealthResponse, tags=["General"])
async def health_check():
    """Health check endpoint"""
    return {
        "status": "healthy" if model is not None else "unhealthy",
        "model_loaded": model is not None,
        "version": "1.0.0"
    }

@app.post("/predict", response_model=PredictionResponse, tags=["Prediction"])
async def predict(features: IrisFeatures):
    """
    Predict Iris species berdasarkan features

    - **sepal_length**: Sepal length dalam cm
    - **sepal_width**: Sepal width dalam cm
    - **petal_length**: Petal length dalam cm
    - **petal_width**: Petal width dalam cm
    """
    try:
        # Prepare input
        input_data = np.array([[
            features.sepal_length,
            features.sepal_width,
            features.petal_length,
            features.petal_width
        ]])

        # Predict
        prediction = model.predict(input_data)[0]
        probabilities = model.predict_proba(input_data)[0]

        # Get class names (fallback jika metadata.json tidak ada)
        class_names = (metadata or {}).get('target_names',
                                           ['setosa', 'versicolor', 'virginica'])

        # Format response
        return {
            "prediction": class_names[prediction],
            "probability": float(probabilities[prediction]),
            "all_probabilities": {
                class_names[i]: float(prob)
                for i, prob in enumerate(probabilities)
            },
            "model_version": "1.0.0"
        }

    except Exception as e:
        logger.error(f"Prediction error: {e}")
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=f"Prediction failed: {str(e)}"
        )

@app.post("/batch_predict", tags=["Prediction"])
async def batch_predict(features_list: List[IrisFeatures]):
    """Batch prediction untuk multiple samples"""
    try:
        predictions = []

        for features in features_list:
            input_data = np.array([[
                features.sepal_length,
                features.sepal_width,
                features.petal_length,
                features.petal_width
            ]])

            prediction = model.predict(input_data)[0]
            probabilities = model.predict_proba(input_data)[0]

            class_names = (metadata or {}).get('target_names',
                                               ['setosa', 'versicolor', 'virginica'])

            predictions.append({
                "prediction": class_names[prediction],
                "probability": float(probabilities[prediction])
            })

        return {"predictions": predictions, "count": len(predictions)}

    except Exception as e:
        logger.error(f"Batch prediction error: {e}")
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=f"Batch prediction failed: {str(e)}"
        )

@app.get("/model/info", tags=["Model"])
async def model_info():
    """Get model information dan metadata"""
    if metadata is None:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="Metadata not found"
        )

    return {
        "model_type": "RandomForestClassifier",
        "accuracy": metadata.get('accuracy'),
        "features": metadata.get('feature_names'),
        "classes": metadata.get('target_names'),
        "version": "1.0.0"
    }

# Exception handler
@app.exception_handler(Exception)
async def global_exception_handler(request, exc):
    logger.error(f"Global exception: {exc}")
    return JSONResponse(
        status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
        content={"detail": "Internal server error"}
    )

Step 4: Run API

# Development mode (auto-reload)
uvicorn main:app --reload --host 0.0.0.0 --port 8000

# Production mode (multiple workers)
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

Step 5: Test API

# Health check
curl http://localhost:8000/health

# Single prediction
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2
  }'

# Batch prediction
curl -X POST "http://localhost:8000/batch_predict" \
  -H "Content-Type: application/json" \
  -d '[
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2},
    {"sepal_length": 6.7, "sepal_width": 3.0, "petal_length": 5.2, "petal_width": 2.3}
  ]'

# Model info
curl http://localhost:8000/model/info

Interactive API Documentation:

Buka browser: http://localhost:8000/docs

FastAPI automatically generates Swagger UI untuk testing API!
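Selain curl dan Swagger UI, API juga bisa dipanggil dari Python. Berikut sketsa client minimal dengan stdlib `urllib` (URL dan payload mengikuti contoh di atas; `build_request` dan `predict_remote` hanyalah nama ilustratif, dan endpoint harus sedang berjalan agar request benar-benar terkirim):

```python
import json
from urllib.request import Request, urlopen

API_URL = "http://localhost:8000/predict"

def build_request(features: dict) -> Request:
    """Bangun HTTP POST request JSON untuk endpoint /predict."""
    body = json.dumps(features).encode("utf-8")
    return Request(API_URL, data=body,
                   headers={"Content-Type": "application/json"})

def predict_remote(features: dict) -> dict:
    """Kirim request dan kembalikan JSON response (butuh API yang berjalan)."""
    with urlopen(build_request(features)) as resp:
        return json.loads(resp.read())
```

Contoh pemakaian: `predict_remote({"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2})` mengembalikan dict yang sama dengan response curl di atas.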

11.3.3 Flask Alternative (Simpler)

Untuk project sederhana, Flask juga bisa digunakan:

Code
from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)

# Load model
model = joblib.load('models/iris_model.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    try:
        data = request.get_json()

        features = np.array([[
            data['sepal_length'],
            data['sepal_width'],
            data['petal_length'],
            data['petal_width']
        ]])

        prediction = model.predict(features)[0]
        probability = model.predict_proba(features)[0]

        return jsonify({
            'prediction': int(prediction),
            'probability': float(probability[prediction])
        })

    except Exception as e:
        return jsonify({'error': str(e)}), 400

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000, debug=False)

Perbandingan FastAPI vs Flask:

| Aspek | FastAPI | Flask |
|---|---|---|
| Performance | ⭐⭐⭐⭐⭐ (Async) | ⭐⭐⭐ (Sync) |
| Type Validation | ✅ Built-in (Pydantic) | ❌ Manual |
| Auto Documentation | ✅ Swagger + ReDoc | ❌ Perlu extensions |
| Async Support | ✅ Native | ⚠️ Via extensions |
| Learning Curve | Medium | Easy |
| Best For | Production APIs | Prototypes, simple apps |
💡 Recommendation
  • Prototype cepat: Flask
  • Production API: FastAPI
  • Legacy systems: Flask (banyak resources)
  • High performance: FastAPI

11.4 Model Optimization dengan ONNX

11.4.1 Apa itu ONNX?

ONNX = Open Neural Network Exchange: Open format untuk representing ML models.

Problem yang Dipecahkan:

Code
graph LR
    A[Train in<br/>PyTorch] --> B{Deploy where?}
    B --> C[TensorFlow<br/>Serving]
    B --> D[ONNX<br/>Runtime]
    B --> E[Mobile<br/>CoreML]
    B --> F[Edge<br/>TensorRT]

    style B fill:#ff6b6b
    style D fill:#51cf66

Tanpa ONNX: Need to convert manually, atau stuck dengan satu framework.

Dengan ONNX:

  1. Train di framework favorit (PyTorch, TensorFlow, scikit-learn)
  2. Convert ke ONNX format
  3. Deploy di berbagai platforms dengan ONNX Runtime

Keuntungan ONNX:

  1. Interoperability: Framework-agnostic
  2. Performance: ONNX Runtime highly optimized
  3. Portability: Run di berbagai devices
  4. Optimization: Built-in graph optimizations

11.4.2 ONNX Runtime Performance

Benchmark Results:

import matplotlib.pyplot as plt
import numpy as np

frameworks = ['PyTorch\n(CPU)', 'TensorFlow\n(CPU)', 'ONNX Runtime\n(CPU)',
              'PyTorch\n(GPU)', 'ONNX Runtime\n(GPU)']
inference_time = [12.5, 10.3, 4.2, 2.8, 1.1]  # ms per batch

fig, ax = plt.subplots(figsize=(12, 6))
colors = ['#e74c3c', '#3498db', '#2ecc71', '#e74c3c', '#2ecc71']
bars = ax.bar(frameworks, inference_time, color=colors, alpha=0.7)

# Highlight ONNX
bars[2].set_edgecolor('green')
bars[2].set_linewidth(3)
bars[4].set_edgecolor('green')
bars[4].set_linewidth(3)

ax.set_ylabel('Inference Time (ms)', fontsize=12, fontweight='bold')
ax.set_title('ONNX Runtime Performance Comparison\n(Lower is Better)',
             fontsize=14, fontweight='bold')
ax.grid(axis='y', alpha=0.3)

# Add value labels
for bar, val in zip(bars, inference_time):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{val} ms', ha='center', va='bottom', fontweight='bold')

# Add speedup annotations
ax.text(0, 13, '3x slower', ha='center', fontsize=10, style='italic')
ax.text(2, 5, '⚡ Fast!', ha='center', fontsize=11, fontweight='bold', color='green')
ax.text(4, 2, '⚡⚡ Fastest!', ha='center', fontsize=11, fontweight='bold', color='green')

plt.tight_layout()
plt.show()

11.4.3 Convert Scikit-learn Model ke ONNX

Code
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import joblib
import numpy as np

# Load trained model
model = joblib.load('models/iris_model.pkl')

# Define input type
# Shape: (None, 4) berarti batch_size can vary, 4 features
initial_type = [('float_input', FloatTensorType([None, 4]))]

# Convert to ONNX
onnx_model = convert_sklearn(
    model,
    initial_types=initial_type,
    target_opset=12  # ONNX opset version
)

# Save ONNX model
with open("models/iris_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

print("Model converted to ONNX successfully!")

# Verify conversion
import onnxruntime as rt

# Load ONNX model
sess = rt.InferenceSession("models/iris_model.onnx")

# Test prediction
input_name = sess.get_inputs()[0].name
label_name = sess.get_outputs()[0].name

test_data = np.array([[5.1, 3.5, 1.4, 0.2]], dtype=np.float32)
pred_onnx = sess.run([label_name], {input_name: test_data})

print(f"ONNX Prediction: {pred_onnx}")

# Compare dengan original model
pred_sklearn = model.predict(test_data)
print(f"Scikit-learn Prediction: {pred_sklearn}")
print(f"Match: {np.array_equal(pred_onnx[0], pred_sklearn)}")

11.4.4 Convert PyTorch Model ke ONNX

Code
import torch
import torch.nn as nn
import torch.onnx

# Define simple PyTorch model
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(4, 16)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(16, 8)
        self.fc3 = nn.Linear(8, 3)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        return x

# Create model instance
model = SimpleNN()
model.eval()

# Dummy input untuk tracing
dummy_input = torch.randn(1, 4)

# Export to ONNX
torch.onnx.export(
    model,                      # Model
    dummy_input,                # Dummy input
    "models/pytorch_model.onnx", # Output file
    export_params=True,         # Store trained weights
    opset_version=12,           # ONNX version
    do_constant_folding=True,   # Optimize constants
    input_names=['input'],      # Input names
    output_names=['output'],    # Output names
    dynamic_axes={              # Variable length axes
        'input': {0: 'batch_size'},
        'output': {0: 'batch_size'}
    }
)

print("PyTorch model exported to ONNX")

# Verify
import onnxruntime as rt
import numpy as np

sess = rt.InferenceSession("models/pytorch_model.onnx")
input_name = sess.get_inputs()[0].name

test_input = np.random.randn(1, 4).astype(np.float32)
onnx_output = sess.run(None, {input_name: test_input})

print(f"ONNX Output shape: {onnx_output[0].shape}")

11.4.5 FastAPI dengan ONNX Runtime

Optimized API dengan ONNX:

Code
from fastapi import FastAPI
import onnxruntime as rt
import numpy as np
from pydantic import BaseModel

app = FastAPI(title="ONNX-Powered ML API")

# Load ONNX model
session = rt.InferenceSession("models/iris_model.onnx")
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

class IrisFeatures(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

@app.post("/predict")
async def predict(features: IrisFeatures):
    # Prepare input
    input_data = np.array([[
        features.sepal_length,
        features.sepal_width,
        features.petal_length,
        features.petal_width
    ]], dtype=np.float32)

    # Run inference dengan ONNX Runtime
    pred = session.run([output_name], {input_name: input_data})

    return {
        "prediction": int(pred[0][0]),
        "model": "ONNX Runtime"
    }

Performance Comparison:

Code
import time
import numpy as np
import joblib
import onnxruntime as rt

# Load kedua model agar perbandingan bisa dijalankan standalone
sklearn_model = joblib.load('models/iris_model.pkl')
session = rt.InferenceSession("models/iris_model.onnx")
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

# Test data
test_data = np.random.randn(1000, 4).astype(np.float32)

# Benchmark scikit-learn
start = time.time()
for i in range(1000):
    _ = sklearn_model.predict(test_data[i:i+1])
sklearn_time = time.time() - start

# Benchmark ONNX
start = time.time()
for i in range(1000):
    _ = session.run([output_name], {input_name: test_data[i:i+1]})
onnx_time = time.time() - start

print(f"Scikit-learn: {sklearn_time:.3f}s")
print(f"ONNX Runtime: {onnx_time:.3f}s")
print(f"Speedup: {sklearn_time/onnx_time:.2f}x")
🚀 When to Use ONNX?

Use ONNX when:

  • High-throughput inference needed
  • Deploying to multiple platforms
  • Edge deployment (mobile, IoT)
  • Cross-framework compatibility required

Skip ONNX when:

  • Simple prototype
  • Model changes frequently
  • Framework-specific features needed

11.5 Deployment Strategies

11.5.1 Batch vs Real-time Serving

Batch Inference:

Process multiple samples together pada scheduled intervals.

Use Cases:

  • Daily recommendations
  • Periodic fraud detection
  • Nightly data processing
  • Batch ETL pipelines

Architecture:

Code
graph LR
    A[Data Lake] --> B[Batch Job<br/>Scheduler]
    B --> C[Load Data]
    C --> D[Model Inference]
    D --> E[Save Predictions]
    E --> F[(Results DB)]

    style B fill:#ffd93d
    style D fill:#51cf66

Example: Batch Prediction Script

Code
import pandas as pd
import joblib
from datetime import datetime
import logging

# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def batch_predict(input_csv, output_csv, model_path):
    """
    Batch prediction for large datasets
    """
    # Load model
    logger.info(f"Loading model from {model_path}")
    model = joblib.load(model_path)

    # Load data in chunks (for large files)
    logger.info(f"Processing {input_csv}")
    chunk_size = 10000
    chunks = []

    for chunk in pd.read_csv(input_csv, chunksize=chunk_size):
        # Predict
        predictions = model.predict(chunk)
        probabilities = model.predict_proba(chunk).max(axis=1)

        # Add predictions to chunk
        chunk['prediction'] = predictions
        chunk['confidence'] = probabilities
        chunk['predicted_at'] = datetime.now()

        chunks.append(chunk)
        logger.info(f"Processed {len(chunk)} rows")

    # Combine all chunks
    result = pd.concat(chunks, ignore_index=True)

    # Save results
    result.to_csv(output_csv, index=False)
    logger.info(f"Results saved to {output_csv}")
    logger.info(f"Total rows processed: {len(result)}")

    return result

# Run batch job
if __name__ == "__main__":
    batch_predict(
        input_csv="data/new_samples.csv",
        output_csv="results/predictions.csv",
        model_path="models/iris_model.pkl"
    )

Scheduling with Cron (Linux):

# Run every day at 2 AM
0 2 * * * /usr/bin/python3 /path/to/batch_predict.py

Real-time Inference:

Process each request immediately when it arrives.

Use Cases:

  • Web applications
  • Mobile apps
  • Chatbots
  • Real-time recommendations
  • Fraud detection (at transaction time)

Architecture:

Code
graph LR
    A[User Request] --> B[Load Balancer]
    B --> C[API Server 1]
    B --> D[API Server 2]
    B --> E[API Server N]

    C --> F[Model]
    D --> F
    E --> F

    F --> G[Response]

    style B fill:#4ecdc4
    style F fill:#51cf66

Comparison:

Aspect        Batch                Real-time
Latency       Hours/days           Milliseconds
Throughput    Very high            Lower
Cost          Lower (off-peak)     Higher (always on)
Complexity    Simple               Complex (scaling, monitoring)
Use Case      Offline analytics    User-facing apps

11.5.2 Cloud Deployment Options

Option 1: Platform as a Service (PaaS)

Examples:

  • Google Cloud AI Platform
  • AWS SageMaker
  • Azure ML
  • Hugging Face Inference API

Pros:

  • Managed infrastructure
  • Auto-scaling
  • Built-in monitoring
  • Easy deployment

Cons:

  • Higher cost
  • Vendor lock-in
  • Less control

Example: Deploy to Hugging Face Spaces (Free!)

# app.py - Gradio interface
import gradio as gr
import joblib
import numpy as np

model = joblib.load('iris_model.pkl')

def predict(sepal_length, sepal_width, petal_length, petal_width):
    features = np.array([[sepal_length, sepal_width, petal_length, petal_width]])
    pred = model.predict(features)[0]
    prob = model.predict_proba(features)[0]

    classes = ['Setosa', 'Versicolor', 'Virginica']
    return {classes[i]: float(prob[i]) for i in range(3)}

iface = gr.Interface(
    fn=predict,
    inputs=[
        gr.Number(label="Sepal Length"),
        gr.Number(label="Sepal Width"),
        gr.Number(label="Petal Length"),
        gr.Number(label="Petal Width")
    ],
    outputs=gr.Label(label="Prediction"),
    title="Iris Classifier",
    description="Predict Iris species"
)

iface.launch()

Option 2: Container Orchestration (Kubernetes)

For production-scale applications:

# kubernetes-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-api-deployment
spec:
  replicas: 3  # 3 instances
  selector:
    matchLabels:
      app: ml-api
  template:
    metadata:
      labels:
        app: ml-api
    spec:
      containers:
      - name: ml-api
        image: your-registry/ml-api:v1
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "256Mi"
            cpu: "500m"
          limits:
            memory: "512Mi"
            cpu: "1000m"
        env:
        - name: MODEL_PATH
          value: "/models/model.pkl"
---
apiVersion: v1
kind: Service
metadata:
  name: ml-api-service
spec:
  selector:
    app: ml-api
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8000
  type: LoadBalancer

Option 3: Serverless Functions

Examples:

  • AWS Lambda
  • Google Cloud Functions
  • Azure Functions

Pros:

  • Pay per request
  • Auto-scaling to zero
  • No server management

Cons:

  • Cold start latency
  • Execution time limits
  • Package size limits

Example: AWS Lambda Handler

Code
import json
import joblib
import numpy as np
import boto3

# Load model from S3 (once, at cold start)
s3 = boto3.client('s3')
s3.download_file('my-bucket', 'models/model.pkl', '/tmp/model.pkl')
model = joblib.load('/tmp/model.pkl')

def lambda_handler(event, context):
    """AWS Lambda handler function"""
    try:
        # Parse input
        body = json.loads(event['body'])
        features = np.array([[
            body['sepal_length'],
            body['sepal_width'],
            body['petal_length'],
            body['petal_width']
        ]])

        # Predict
        prediction = model.predict(features)[0]
        probability = model.predict_proba(features)[0]

        # Return response
        return {
            'statusCode': 200,
            'body': json.dumps({
                'prediction': int(prediction),
                'probability': float(probability[prediction])
            })
        }

    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }

11.5.3 Edge Deployment

Edge Computing: Run ML models on device (smartphone, IoT, edge servers).

Why Edge?

  • Low latency: No network roundtrip
  • Privacy: Data stays on device
  • Offline capability: Works without internet
  • Bandwidth saving: No data upload

Challenges:

  • Limited compute resources
  • Small memory footprint
  • Battery constraints
  • Model optimization crucial

Tools for Edge:

  1. TensorFlow Lite: Mobile & embedded devices
  2. ONNX Runtime Mobile: Cross-platform
  3. CoreML: iOS devices
  4. PyTorch Mobile: Android & iOS

Example: Model Quantization for Edge

Code
import torch
import torch.quantization

# Load a trained PyTorch model (assumes the SimpleNN class from the PyTorch examples)
model = SimpleNN()
model.load_state_dict(torch.load('model.pth'))
model.eval()

# Dynamic quantization (Float32 -> Int8)
# Shrinks the quantized layers' weights roughly 4x
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},  # Quantize Linear layers
    dtype=torch.qint8
)

# Save quantized model
torch.save(quantized_model.state_dict(), 'model_quantized.pth')

# Compare sizes
import os
original_size = os.path.getsize('model.pth') / (1024**2)  # MB
quantized_size = os.path.getsize('model_quantized.pth') / (1024**2)

print(f"Original model: {original_size:.2f} MB")
print(f"Quantized model: {quantized_size:.2f} MB")
print(f"Compression ratio: {original_size/quantized_size:.2f}x")

11.6 Monitoring & Maintenance

11.6.1 Why Is Monitoring Important?

Models degrade over time!

import matplotlib.pyplot as plt
import numpy as np

# Simulate model performance degradation
months = np.arange(0, 13)
accuracy = 95 - 0.5 * months + np.random.randn(13) * 0.5

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(months, accuracy, marker='o', linewidth=2, markersize=8, color='#e74c3c')
ax.axhline(y=90, color='orange', linestyle='--', linewidth=2, label='Warning Threshold')
ax.axhline(y=85, color='red', linestyle='--', linewidth=2, label='Critical Threshold')

ax.fill_between(months, 85, 90, alpha=0.2, color='orange')
ax.fill_between(months, 0, 85, alpha=0.2, color='red')

ax.set_xlabel('Months in Production', fontsize=12, fontweight='bold')
ax.set_ylabel('Model Accuracy (%)', fontsize=12, fontweight='bold')
ax.set_title('Model Performance Degradation Over Time', fontsize=14, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(alpha=0.3)
ax.set_ylim([80, 100])

# Add annotations
ax.annotate('Model deployed', xy=(0, 95), xytext=(1, 97),
            arrowprops=dict(arrowstyle='->', color='green'),
            fontsize=10, color='green', fontweight='bold')
ax.annotate('Retrain needed!', xy=(10, 88), xytext=(7, 82),
            arrowprops=dict(arrowstyle='->', color='red'),
            fontsize=10, color='red', fontweight='bold')

plt.tight_layout()
plt.show()

Why Performance Degrades:

  1. Data Drift: Input data distribution changes
  2. Concept Drift: Relationship between features and target changes
  3. Upstream Changes: Data pipeline changes
  4. Seasonal Effects: Patterns change over time
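
Data drift can be quantified numerically. Below is a minimal sketch of the Population Stability Index (PSI) in pure NumPy; the function name and the common 0.1/0.2 alert thresholds are illustrative conventions, not from this chapter:

```python
import numpy as np

def population_stability_index(reference, current, bins=10):
    """PSI between a reference (training) sample and current production data."""
    # Bin edges from reference quantiles, so each bin holds ~equal reference mass
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    current = np.clip(current, edges[0], edges[-1])  # keep new data inside the edges
    ref_frac = np.histogram(reference, edges)[0] / len(reference)
    cur_frac = np.histogram(current, edges)[0] / len(current)
    # Avoid log(0) for empty bins
    ref_frac = np.clip(ref_frac, 1e-6, None)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(42)
train = rng.normal(0, 1, 20_000)
print(population_stability_index(train, rng.normal(0, 1, 20_000)))  # small: no drift
print(population_stability_index(train, rng.normal(1, 1, 20_000)))  # large: drift
```

As a rule of thumb, PSI below 0.1 is usually treated as stable and above 0.2 as drift worth investigating.
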

11.6.2 What to Monitor?

1. Model Performance Metrics

Code
from prometheus_client import Counter, Histogram, Gauge
import time

# Define metrics
prediction_counter = Counter(
    'predictions_total',
    'Total number of predictions',
    ['model_version', 'prediction_class']
)

prediction_latency = Histogram(
    'prediction_latency_seconds',
    'Prediction latency in seconds',
    buckets=[0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0]
)

model_accuracy = Gauge(
    'model_accuracy',
    'Current model accuracy',
    ['model_version']
)

# Usage in the prediction function
@app.post("/predict")
async def predict(features: IrisFeatures):
    start_time = time.time()

    # Predict
    prediction = model.predict(...)

    # Record metrics
    prediction_counter.labels(
        model_version='1.0.0',
        prediction_class=str(prediction)  # Prometheus label values must be strings
    ).inc()

    latency = time.time() - start_time
    prediction_latency.observe(latency)

    return {"prediction": prediction}

2. Data Quality Metrics

Code
from typing import Dict
import numpy as np

class DataQualityMonitor:
    def __init__(self, reference_stats: Dict):
        """
        Initialize with reference statistics from the training data
        """
        self.reference_stats = reference_stats

    def check_data_drift(self, new_data: np.ndarray) -> Dict:
        """
        Check for statistical drift in input data
        """
        results = {}

        for i, feature_name in enumerate(self.reference_stats['features']):
            # Calculate statistics
            ref_mean = self.reference_stats['means'][i]
            ref_std = self.reference_stats['stds'][i]

            new_mean = new_data[:, i].mean()
            new_std = new_data[:, i].std()

            # Check drift (simple threshold-based)
            mean_drift = abs(new_mean - ref_mean) / ref_std
            std_drift = abs(new_std - ref_std) / ref_std

            results[feature_name] = {
                'mean_drift': mean_drift,
                'std_drift': std_drift,
                'drift_detected': mean_drift > 2.0 or std_drift > 0.5
            }

        return results

    def check_missing_values(self, data: np.ndarray) -> Dict:
        """Check for missing or invalid values"""
        return {
            'missing_count': np.isnan(data).sum(),
            'missing_percentage': np.isnan(data).mean() * 100
        }

    def check_outliers(self, data: np.ndarray) -> Dict:
        """Detect outliers using IQR method"""
        Q1 = np.percentile(data, 25, axis=0)
        Q3 = np.percentile(data, 75, axis=0)
        IQR = Q3 - Q1

        lower_bound = Q1 - 1.5 * IQR
        upper_bound = Q3 + 1.5 * IQR

        outliers = ((data < lower_bound) | (data > upper_bound)).any(axis=1)

        return {
            'outlier_count': outliers.sum(),
            'outlier_percentage': outliers.mean() * 100
        }

# Usage
monitor = DataQualityMonitor(reference_stats={
    'features': ['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
    'means': [5.84, 3.05, 3.76, 1.20],
    'stds': [0.83, 0.43, 1.76, 0.76]
})

# Check new batch of data
drift_report = monitor.check_data_drift(new_data)
if any(result['drift_detected'] for result in drift_report.values()):
    print("⚠️ Data drift detected! Consider retraining.")

3. System Metrics

Code
import psutil
from prometheus_client import Gauge

# CPU and Memory metrics
cpu_usage = Gauge('cpu_usage_percent', 'CPU usage percentage')
memory_usage = Gauge('memory_usage_percent', 'Memory usage percentage')

def collect_system_metrics():
    """Collect system resource metrics"""
    cpu_usage.set(psutil.cpu_percent(interval=1))
    memory_usage.set(psutil.virtual_memory().percent)

# Call periodically (e.g., in a background task)

11.6.3 Logging Best Practices

Structured Logging:

Code
import logging
import json
from datetime import datetime

class StructuredLogger:
    def __init__(self, name: str):
        self.logger = logging.getLogger(name)
        self.logger.setLevel(logging.INFO)

        # JSON formatter
        handler = logging.StreamHandler()
        self.logger.addHandler(handler)

    def log_prediction(self,
                      input_data: dict,
                      prediction: dict,
                      latency: float):
        """Log prediction dengan structured format"""
        log_entry = {
            'timestamp': datetime.utcnow().isoformat(),
            'event': 'prediction',
            'input': input_data,
            'output': prediction,
            'latency_ms': latency * 1000,
            'model_version': '1.0.0'
        }
        self.logger.info(json.dumps(log_entry))

    def log_error(self, error: Exception, context: dict):
        """Log error dengan context"""
        log_entry = {
            'timestamp': datetime.utcnow().isoformat(),
            'event': 'error',
            'error_type': type(error).__name__,
            'error_message': str(error),
            'context': context
        }
        self.logger.error(json.dumps(log_entry))

# Usage
logger = StructuredLogger('ml-api')

@app.post("/predict")
async def predict(features: IrisFeatures):
    start = time.time()

    try:
        prediction = model.predict(...)

        logger.log_prediction(
            input_data=features.dict(),
            prediction={'class': prediction},
            latency=time.time() - start
        )

        return {"prediction": prediction}

    except Exception as e:
        logger.log_error(e, context={'features': features.dict()})
        raise
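
One payoff of structured JSON logs is that they can be analyzed offline. A minimal sketch (the helper name is hypothetical) that computes p95 latency from log lines shaped like the entries above:

```python
import json

def p95_latency(log_lines):
    """Return the 95th-percentile latency (ms) from JSON prediction log lines."""
    lats = sorted(
        entry["latency_ms"]
        for entry in map(json.loads, log_lines)
        if entry.get("event") == "prediction"
    )
    if not lats:
        return None
    # Nearest-rank percentile: index ceil(0.95 * n) - 1
    idx = max(0, -(-95 * len(lats) // 100) - 1)
    return lats[idx]

# 100 fake log lines with latencies 1..100 ms
logs = [json.dumps({"event": "prediction", "latency_ms": m}) for m in range(1, 101)]
print(p95_latency(logs))  # → 95
```

The same idea extends to error rates and per-class prediction counts, all without touching the serving path.
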

11.6.4 Alerting & Notification

Set Up Alerts for Critical Events:

Code
class AlertManager:
    def __init__(self, webhook_url: str):
        self.webhook_url = webhook_url

    def send_alert(self,
                   severity: str,
                   message: str,
                   metrics: dict = None):
        """Send alert ke Slack/Teams/Email"""
        import requests

        payload = {
            "text": f"🚨 [{severity}] ML Model Alert",
            "attachments": [{
                "color": "danger" if severity == "CRITICAL" else "warning",
                "fields": [
                    {"title": "Message", "value": message},
                    {"title": "Metrics", "value": str(metrics)}
                ],
                "ts": int(time.time())
            }]
        }

        requests.post(self.webhook_url, json=payload)

# Usage
alert_manager = AlertManager(webhook_url="https://hooks.slack.com/...")

# Check accuracy drop
if current_accuracy < 85:
    alert_manager.send_alert(
        severity="CRITICAL",
        message="Model accuracy dropped below threshold!",
        metrics={'current_accuracy': current_accuracy}
    )

11.6.5 Model Retraining Strategy

When to Retrain?

Code
flowchart TD
    A[Monitor Metrics] --> B{Performance<br/>Degraded?}
    B -->|No| A
    B -->|Yes| C{Data Drift<br/>Detected?}
    C -->|Yes| D[Collect New Data]
    C -->|No| E[Check Data Quality]
    D --> F[Retrain Model]
    E --> G{Data Quality<br/>OK?}
    G -->|Yes| H[Investigate Code]
    G -->|No| I[Fix Data Pipeline]
    F --> J[Evaluate New Model]
    J --> K{Better than<br/>Current?}
    K -->|Yes| L[Deploy New Model]
    K -->|No| M[Keep Current Model<br/>+ Investigate]
    L --> A
    M --> A

    style B fill:#ffd93d
    style C fill:#ffd93d
    style F fill:#51cf66
    style L fill:#51cf66

Automated Retraining Pipeline:

Code
from datetime import datetime, timedelta
import schedule

class AutoRetrainingPipeline:
    def __init__(self,
                 model_trainer,
                 performance_threshold: float = 0.85):
        self.model_trainer = model_trainer
        self.performance_threshold = performance_threshold
        self.last_retrain = datetime.now()

    def should_retrain(self, current_metrics: dict) -> bool:
        """Decide if retraining is needed"""
        # Rule 1: Performance degradation
        if current_metrics['accuracy'] < self.performance_threshold:
            return True

        # Rule 2: Time-based (every 30 days)
        if (datetime.now() - self.last_retrain) > timedelta(days=30):
            return True

        # Rule 3: Data drift detected
        if current_metrics.get('data_drift', False):
            return True

        return False

    def retrain_and_deploy(self):
        """Full retraining and deployment pipeline"""
        print("🔄 Starting automated retraining...")

        # 1. Fetch new data
        new_data = self.fetch_new_training_data()

        # 2. Train new model
        new_model = self.model_trainer.train(new_data)

        # 3. Evaluate
        metrics = self.evaluate_model(new_model)

        # 4. Compare with the current model
        if metrics['accuracy'] > self.get_current_model_accuracy():
            print("✅ New model better, deploying...")
            self.deploy_model(new_model)
            self.last_retrain = datetime.now()
        else:
            print("⚠️ New model not better, keeping current model")

    def fetch_new_training_data(self):
        """Fetch data from production logs"""
        # Implementation: query database, S3, etc.
        pass

    def evaluate_model(self, model):
        """Evaluate model on test set"""
        # Implementation
        pass

    def deploy_model(self, model):
        """Deploy new model to production"""
        # Implementation: save to S3, update model server, etc.
        pass

# Schedule retraining checks
pipeline = AutoRetrainingPipeline(model_trainer=...)

# Check every day
schedule.every().day.at("02:00").do(
    lambda: pipeline.retrain_and_deploy()
    if pipeline.should_retrain(get_current_metrics())
    else None
)

11.7 Model Versioning & Experiment Tracking

11.7.1 Why Model Versioning?

A scenario without versioning:

Engineer: "The production model is broken!"
Data Scientist: "Try rolling back to the previous version"
Engineer: "Which one? There are 10 model.pkl files..."
Data Scientist: "The one with... 95% accuracy?"
Engineer: "They're all 95%..." 😭

With proper versioning:

Engineer: "Rollback ke model v1.2.3"
System: "Deployed model v1.2.3 (trained 2024-01-15, accuracy 95.3%)"
Engineer: "Problem solved!" ✅

11.7.2 Tools for ML Experiment Tracking

Popular Tools:

  1. MLflow: Open-source, comprehensive
  2. Weights & Biases (wandb): Cloud-based, great UI
  3. Neptune.ai: Enterprise features
  4. TensorBoard: Bundled with TensorFlow/PyTorch

Example: MLflow

Code
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

# Set experiment
mlflow.set_experiment("iris-classification")

# Start run
with mlflow.start_run(run_name="random-forest-v1"):
    # Log parameters
    params = {
        'n_estimators': 100,
        'max_depth': 10,
        'random_state': 42
    }
    mlflow.log_params(params)

    # Train model
    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)

    # Evaluate
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred, average='weighted')

    # Log metrics
    mlflow.log_metrics({
        'accuracy': accuracy,
        'f1_score': f1
    })

    # Log model
    mlflow.sklearn.log_model(
        model,
        "model",
        registered_model_name="iris-classifier"
    )

    # Log artifacts (plots, data, etc.)
    import matplotlib.pyplot as plt
    # ... create plot ...
    plt.savefig("confusion_matrix.png")
    mlflow.log_artifact("confusion_matrix.png")

    print(f"Run ID: {mlflow.active_run().info.run_id}")

MLflow UI:

# Start MLflow UI
mlflow ui --host 0.0.0.0 --port 5000

# Access at: http://localhost:5000

Load Best Model:

Code
# Load model by version
model_version = "1"
model = mlflow.pyfunc.load_model(f"models:/iris-classifier/{model_version}")

# Or load best model based on metric
from mlflow.tracking import MlflowClient

client = MlflowClient()
runs = client.search_runs(
    experiment_ids=["1"],
    order_by=["metrics.accuracy DESC"],
    max_results=1
)

best_run_id = runs[0].info.run_id
best_model = mlflow.sklearn.load_model(f"runs:/{best_run_id}/model")

11.7.3 Git for Model Versioning (DVC)

DVC (Data Version Control): Git-style versioning for data and models.

# Initialize DVC
dvc init

# Add model to DVC
dvc add models/iris_model.pkl

# This creates models/iris_model.pkl.dvc file
# Commit to Git
git add models/iris_model.pkl.dvc .gitignore
git commit -m "Add model v1.0.0"
git tag -a v1.0.0 -m "Model version 1.0.0"

# Push model to remote storage (S3, GCS, etc.)
dvc remote add -d storage s3://my-bucket/dvc-storage
dvc push

# Pull specific version
git checkout v1.0.0
dvc pull

11.8 CI/CD for ML (MLOps Pipeline)

11.8.1 Traditional CI/CD vs ML CI/CD

Traditional Software CI/CD:

Code → Build → Test → Deploy

ML CI/CD (includes CT - Continuous Training):

Code + Data → Build → Test → Train → Evaluate → Deploy → Monitor → Retrain

11.8.2 Example: GitHub Actions for ML

.github/workflows/ml-pipeline.yml:

name: ML Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    # Retrain weekly
    - cron: '0 0 * * 0'

jobs:
  data-validation:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      - name: Validate data
        run: |
          python scripts/validate_data.py

      - name: Check data drift
        run: |
          python scripts/check_drift.py

  train-model:
    needs: data-validation
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      - name: Train model
        run: |
          python scripts/train_model.py

      - name: Evaluate model
        run: |
          python scripts/evaluate_model.py

      - name: Upload model artifact
        uses: actions/upload-artifact@v3
        with:
          name: trained-model
          path: models/

  test-api:
    needs: train-model
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Download model
        uses: actions/download-artifact@v3
        with:
          name: trained-model
          path: models/

      - name: Build Docker image
        run: |
          docker build -t ml-api:test .

      - name: Run API tests
        run: |
          docker run -d -p 8000:8000 ml-api:test
          sleep 10
          python tests/test_api.py

  deploy:
    needs: test-api
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v3

      - name: Download model
        uses: actions/download-artifact@v3
        with:
          name: trained-model
          path: models/

      - name: Deploy to production
        run: |
          # Deploy to cloud (AWS, GCP, Azure)
          # Or update Kubernetes deployment
          echo "Deploying to production..."

11.8.3 Model Testing

Different levels of testing:

Code
import pytest
import numpy as np
import joblib

# 1. Unit Tests: Individual components
def test_preprocessing():
    """Test preprocessing function"""
    from src.preprocessing import normalize_features

    data = np.array([[1, 2], [3, 4]])
    normalized = normalize_features(data)

    assert abs(normalized.mean()) < 0.1
    assert abs(normalized.std() - 1.0) < 0.1

# 2. Model Tests: Model behavior
def test_model_predictions():
    """Test model produces valid predictions"""
    model = joblib.load('models/iris_model.pkl')

    # Test input
    X_test = np.array([[5.1, 3.5, 1.4, 0.2]])

    # Predict
    prediction = model.predict(X_test)
    probabilities = model.predict_proba(X_test)

    # Assertions
    assert prediction.shape == (1,)
    assert prediction[0] in [0, 1, 2]
    assert probabilities.shape == (1, 3)
    assert np.isclose(probabilities.sum(), 1.0)

# 3. Data Tests: Input validation
def test_data_schema():
    """Test input data matches expected schema"""
    import pandas as pd

    data = pd.read_csv('data/new_data.csv')

    # Check columns
    expected_columns = ['sepal_length', 'sepal_width',
                       'petal_length', 'petal_width']
    assert all(col in data.columns for col in expected_columns)

    # Check dtypes
    assert data[expected_columns].dtypes.apply(
        lambda x: np.issubdtype(x, np.number)
    ).all()

    # Check ranges
    assert (data[expected_columns] >= 0).all().all()
    assert (data[expected_columns] <= 10).all().all()

# 4. Performance Tests: Accuracy threshold
def test_model_accuracy():
    """Test model meets accuracy threshold"""
    model = joblib.load('models/iris_model.pkl')

    from sklearn.datasets import load_iris
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)
    predictions = model.predict(X)
    accuracy = accuracy_score(y, predictions)

    # Assert minimum accuracy
    assert accuracy >= 0.90, f"Model accuracy {accuracy} below threshold"

# 5. Integration Tests: API endpoint
def test_api_prediction():
    """Test API returns correct format"""
    import requests

    response = requests.post(
        "http://localhost:8000/predict",
        json={
            "sepal_length": 5.1,
            "sepal_width": 3.5,
            "petal_length": 1.4,
            "petal_width": 0.2
        }
    )

    assert response.status_code == 200
    data = response.json()
    assert 'prediction' in data
    assert 'probability' in data

# Run tests
# pytest tests/ -v

11.9 Best Practices Summary

11.9.1 Development Best Practices

✅ Checklist: Model Development

Before Training:

  • Validate the data schema and value ranges
  • Version the dataset (e.g., with DVC)
  • Define evaluation metrics and minimum thresholds

During Training:

  • Track parameters, metrics, and artifacts (e.g., with MLflow)
  • Fix random seeds for reproducibility

After Training:

  • Evaluate against the agreed thresholds
  • Register and version the model
  • Save reference statistics for drift monitoring

11.9.2 Deployment Best Practices

✅ Checklist: Deployment

Pre-deployment:

  • Containerize the app with pinned dependencies
  • Run unit, model, data, and API tests
  • Load-test at the expected traffic level

During Deployment:

  • Roll out gradually (blue-green or canary)
  • Keep the previous model version ready for rollback

Post-deployment:

  • Monitor latency, error rate, and prediction quality
  • Verify alerts fire on threshold breaches

11.9.3 Security Best Practices

🔐 Security Checklist

Application Security:

  • Validate all inputs (e.g., with Pydantic)
  • Add authentication and rate limiting to the API

Infrastructure Security:

  • Run containers as a non-root user
  • Keep secrets out of images (environment variables or a secrets manager)

Data Security:

  • Encrypt data in transit and at rest
  • Avoid logging sensitive fields

11.10 Summary

📚 Chapter Summary

1. MLOps Fundamentals:

  • MLOps = DevOps + Data + ML
  • Automate ML lifecycle
  • 3 maturity levels: Manual → Pipeline → Full CI/CD

2. Containerization:

  • Docker for environment consistency
  • Dockerfile best practices
  • Docker Compose for multi-service apps

3. Model Serving:

  • FastAPI for production APIs
  • Flask for prototypes
  • REST API design principles

4. ONNX Optimization:

  • Framework-agnostic model format
  • Performance improvements
  • Cross-platform deployment

5. Deployment Strategies:

  • Batch vs Real-time
  • Cloud (PaaS, Kubernetes, Serverless)
  • Edge deployment

6. Monitoring:

  • Model performance metrics
  • Data drift detection
  • System metrics
  • Automated alerting

7. Model Versioning:

  • MLflow experiment tracking
  • DVC for data/model versioning
  • Git workflow

8. CI/CD:

  • Automated testing
  • Continuous training
  • Deployment automation

Key Takeaways

  1. Production ≠ Jupyter Notebook: Need proper engineering practices
  2. Monitoring is Critical: Models degrade over time
  3. Automation: Automate everything (training, testing, deployment)
  4. Versioning: Version code, data, AND models
  5. Security: Don’t forget security best practices

11.11 Exercises

Review Questions

  1. Explain the difference between DevOps and MLOps. What challenges are unique to ML?

  2. What are the advantages of using Docker for ML deployment?

  3. Compare FastAPI and Flask for model serving. When should you use each?

  4. What is ONNX? Why is it important for deployment?

  5. Explain the difference between batch inference and real-time inference. Give an example use case for each.

  6. What is data drift? Why does it need to be monitored?

  7. List 5 important metrics to monitor for ML models in production.

  8. Explain the concepts of blue-green deployment and canary deployment.

  9. What is model versioning? Why is it important?

  10. Explain CI/CD/CT in the context of MLOps.

Coding Exercises

Exercise 1: Dockerize ML Model - Train a scikit-learn model - Create a Dockerfile - Build and run the container - Test the API endpoints

Exercise 2: FastAPI Application - Implement a complete FastAPI app with:

  • Health check endpoint
  • Single prediction
  • Batch prediction
  • Model info endpoint
  • Add input validation with Pydantic
  • Add error handling

Exercise 3: ONNX Conversion - Train a PyTorch model - Convert it to ONNX - Compare performance (PyTorch vs ONNX) - Implement an ONNX-powered API

Exercise 4: Monitoring Dashboard - Implement metrics collection (Prometheus) - Create a Grafana dashboard - Set up alerts for accuracy drops - Log predictions to a database

Exercise 5: CI/CD Pipeline - Set up a GitHub Actions workflow - Implement automated testing - Auto-deploy on the main branch - Add model versioning

Project: End-to-End MLOps Pipeline

Build a complete MLOps pipeline:

  1. Development:

    • Train a classification model
    • Use MLflow for experiment tracking
    • Version data with DVC
  2. API Development:

    • Build a FastAPI application
    • Add comprehensive tests
    • Document the API
  3. Containerization:

    • Create an optimized Dockerfile
    • Set up Docker Compose with monitoring
  4. Deployment:

    • Deploy to a cloud platform (Heroku/Railway/GCP)
    • Set up a CI/CD pipeline
    • Configure monitoring
  5. Monitoring:

    • Implement metrics collection
    • Set up alerting
    • Create a dashboard
  6. Documentation:

    • API documentation
    • Deployment guide
    • Monitoring guide
🎓 Congratulations! You have completed Chapter 11 - MLOps & Deployment!

This is the final chapter of Part III. You now have the skills to:

  • ✅ Build ML models
  • ✅ Deploy them to production
  • ✅ Monitor and maintain models
  • ✅ Implement MLOps best practices

Next Steps:

  • Apply this knowledge in the capstone project
  • Explore advanced topics (Kubernetes, model optimization)
  • Build a portfolio of production-ready ML systems

Keep learning, keep building! 🚀