Lab 11: Model Deployment with FastAPI, ONNX, and Docker

Production-Ready Machine Learning Model Serving

Author

Machine Learning - Data Science for Cybersecurity

Published

December 15, 2025

25 Introduction

25.1 Learning Objectives

After completing this lab, you should be able to:

  1. Understand the machine learning model deployment pipeline from development to production
  2. Convert trained models to the ONNX format for optimized inference
  3. Build a REST API for model serving using FastAPI
  4. Implement containerization with Docker
  5. Evaluate model performance in a production environment
  6. Optimize model inference latency and throughput
  7. Write unit tests and integration tests for the API
  8. Apply best practices for production deployment

25.2 Lab Overview

This lab focuses on deploying a production-ready machine learning model, covering:

  • Model Training: Train a simple classification model
  • Model Optimization: Convert to ONNX for better performance
  • API Development: FastAPI for model serving
  • Containerization: Docker for reproducibility
  • Testing & Monitoring: Quality assurance in production

25.2.1 Lab Duration

2-3 hours, broken down as:

  • Setup & Installation: 20 minutes
  • Model Training & Conversion: 30 minutes
  • FastAPI Development: 45 minutes
  • Docker Configuration: 30 minutes
  • Testing & Optimization: 20 minutes

25.2.2 Lab Structure

graph TD
    A[Setup & Installation] --> B[Train Model]
    B --> C[Convert to ONNX]
    C --> D[Build FastAPI App]
    D --> E[Create Dockerfile]
    E --> F[Local Testing]
    F --> G[Performance Optimization]
    G --> H[Docker Deployment]

    style A fill:#e6f3ff
    style B fill:#ffe6e6
    style C fill:#ffffcc
    style D fill:#ccffcc
    style E fill:#e6ccff
    style F fill:#ffcccc
    style G fill:#ccffe6
    style H fill:#ffccff

25.3 Environment Setup

25.3.1 Install Dependencies

import subprocess
import sys
import os

packages = [
    'fastapi',
    'uvicorn',
    'pydantic',
    'scikit-learn',
    'pandas',
    'numpy',
    'onnx',
    'onnxruntime',
    'skl2onnx',
    'httpx',  # for testing
    'requests',  # used by the load-testing script
    'python-multipart'
]

for package in packages:
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q', package])

print("✓ All packages installed successfully!")

25.3.2 Import Libraries

# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import json
import warnings
warnings.filterwarnings('ignore')

# Model training
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    f1_score, confusion_matrix, classification_report
)

# ONNX
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Utilities
from datetime import datetime
import joblib

print("✓ All imports successful!")

26 PART 1: Model Training & Optimization

26.1 1.1 Dataset Preparation

26.1.1 Load Iris Dataset

# Load dataset (load once and reuse instead of calling load_iris() repeatedly)
iris = load_iris()
X, y = iris.data, iris.target
feature_names = iris.feature_names
target_names = iris.target_names

print(f"Dataset shape: {X.shape}")
print(f"Number of classes: {len(target_names)}")
print(f"Feature names: {feature_names}")
print(f"Target names: {target_names}")

# Create DataFrame
df = pd.DataFrame(X, columns=feature_names)
df['target'] = y

print(f"\nFirst 5 rows:\n{df.head()}")

26.1.2 Train-Test Split

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Training set size: {X_train.shape[0]}")
print(f"Test set size: {X_test.shape[0]}")

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"✓ Data scaled successfully")

26.2 1.2 Model Training

# Train model
model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    random_state=42,
    n_jobs=-1
)

model.fit(X_train_scaled, y_train)

# Evaluate
y_pred_train = model.predict(X_train_scaled)
y_pred_test = model.predict(X_test_scaled)

print("Training Metrics:")
print(f"  Accuracy: {accuracy_score(y_train, y_pred_train):.4f}")

print("\nTest Metrics:")
print(f"  Accuracy: {accuracy_score(y_test, y_pred_test):.4f}")
print(f"  Precision: {precision_score(y_test, y_pred_test, average='weighted'):.4f}")
print(f"  Recall: {recall_score(y_test, y_pred_test, average='weighted'):.4f}")
print(f"  F1-Score: {f1_score(y_test, y_pred_test, average='weighted'):.4f}")

26.2.1 Save Trained Model

# Create model directory
model_dir = Path('models')
model_dir.mkdir(exist_ok=True)

# Save model and scaler
model_path = model_dir / 'iris_model.pkl'
scaler_path = model_dir / 'iris_scaler.pkl'

joblib.dump(model, model_path)
joblib.dump(scaler, scaler_path)

print(f"✓ Model saved to {model_path}")
print(f"✓ Scaler saved to {scaler_path}")

26.3 1.3 Model Conversion to ONNX

ONNX (Open Neural Network Exchange) is a standard format for machine learning models that enables:

  • Optimized inference
  • Portability across platforms
  • Hardware acceleration
  • Typically faster inference than loading and running the pickled sklearn model

# Define initial types
initial_type = [('float_input', FloatTensorType([None, 4]))]

# Convert model to ONNX; disable ZipMap so the probability output is a
# plain float array (usable with np.argmax) instead of a list of dicts
onnx_model = convert_sklearn(
    model,
    initial_types=initial_type,
    options={id(model): {'zipmap': False}}
)

# Save ONNX model
onnx_path = model_dir / 'iris_model.onnx'
onnx.save_model(onnx_model, str(onnx_path))

print(f"✓ Model converted to ONNX and saved to {onnx_path}")
print(f"✓ ONNX file size: {onnx_path.stat().st_size / 1024:.2f} KB")

# Verify ONNX model
onnx_model_loaded = onnx.load(str(onnx_path))
onnx.checker.check_model(onnx_model_loaded)
print(f"✓ ONNX model verified successfully")

26.4 1.4 ONNX Model Inference Testing

# Create ONNX Runtime session
sess = ort.InferenceSession(str(onnx_path))

# Get input/output names
input_name = sess.get_inputs()[0].name
output_names = [output.name for output in sess.get_outputs()]

print(f"Input name: {input_name}")
print(f"Output names: {output_names}")

# Prepare test data
X_test_float = X_test_scaled.astype(np.float32)

# Run inference
onnx_predictions = sess.run(
    output_names,
    {input_name: X_test_float}
)

print(f"\nONNX Predictions shape: {onnx_predictions[0].shape}")
print(f"First 5 predictions: {onnx_predictions[0][:5]}")

# Verify accuracy
onnx_pred_classes = np.argmax(onnx_predictions[1], axis=1)
onnx_accuracy = accuracy_score(y_test, onnx_pred_classes)

print(f"\n✓ ONNX Model Accuracy: {onnx_accuracy:.4f}")
print(f"✓ Matches sklearn model: {np.allclose(y_pred_test, onnx_pred_classes)}")

27 PART 2: FastAPI Model Serving

27.1 2.1 Create FastAPI Application Structure

# Create application directory structure
app_dir = Path('fastapi_app')
app_dir.mkdir(exist_ok=True)

print(f"Created application directory: {app_dir}")
print(f"Directory structure:")
print(f"  fastapi_app/")
print(f"    ├── main.py")
print(f"    ├── models.py")
print(f"    ├── utils.py")
print(f"    ├── config.py")
print(f"    └── requirements.txt")

27.2 2.2 Configuration Module

# Create config.py
config_content = '''"""
Configuration module for FastAPI application
"""
from pathlib import Path
import os

# Paths
BASE_DIR = Path(__file__).parent.parent
# Allow overriding the models directory (e.g. inside a container)
MODELS_DIR = Path(os.getenv("MODELS_DIR", str(BASE_DIR / "models")))

# Model configuration
MODEL_PATH = MODELS_DIR / "iris_model.onnx"
SCALER_PATH = MODELS_DIR / "iris_scaler.pkl"

# Feature configuration
FEATURE_NAMES = [
    "sepal_length_cm",
    "sepal_width_cm",
    "petal_length_cm",
    "petal_width_cm"
]

TARGET_NAMES = ["setosa", "versicolor", "virginica"]

# API configuration
API_TITLE = "Iris Model API"
API_DESCRIPTION = "Production-ready ML model serving API"
API_VERSION = "1.0.0"

# Server configuration
HOST = "0.0.0.0"
PORT = 8000
DEBUG = False
'''

config_path = app_dir / 'config.py'
config_path.write_text(config_content)
print(f"✓ Created {config_path}")


27.3 2.3 Data Models

# Create models.py for Pydantic models
models_content = '''"""
Pydantic data models for request/response validation
"""
from pydantic import BaseModel, Field
from typing import List, Optional
from enum import Enum

class IrisFeatures(BaseModel):
    """Input features for iris prediction"""
    sepal_length_cm: float = Field(..., gt=0, le=10)
    sepal_width_cm: float = Field(..., gt=0, le=10)
    petal_length_cm: float = Field(..., ge=0, le=10)
    petal_width_cm: float = Field(..., ge=0, le=10)

    # Pydantic v2 style configuration (schema_extra was renamed)
    model_config = {
        "json_schema_extra": {
            "example": {
                "sepal_length_cm": 5.1,
                "sepal_width_cm": 3.5,
                "petal_length_cm": 1.4,
                "petal_width_cm": 0.2
            }
        }
    }

class IrisBatchFeatures(BaseModel):
    """Batch input features"""
    samples: List[IrisFeatures]

class PredictionResponse(BaseModel):
    """Single prediction response"""
    input_features: dict
    predicted_class: str
    predicted_label: int
    confidence: float
    probabilities: dict
    inference_time_ms: float

class BatchPredictionResponse(BaseModel):
    """Batch prediction response"""
    predictions: List[PredictionResponse]
    total_inference_time_ms: float
    samples_count: int

class HealthResponse(BaseModel):
    """Health check response"""
    status: str
    version: str
    models_loaded: bool

'''

models_path = app_dir / 'models.py'
models_path.write_text(models_content)
print(f"✓ Created {models_path}")
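The `Field(gt=0, le=10)` constraints above mean a request with, say, a negative sepal length is rejected with a 422 before it ever reaches the model. The same open/closed range check can be sketched in plain Python (`validate_feature` is a hypothetical helper for illustration only; the real validation is done by Pydantic):

```python
def validate_feature(name: str, value: float, gt: float = 0.0, le: float = 10.0) -> float:
    """Mimic pydantic's Field(gt=..., le=...) range check: gt < value <= le."""
    if not (gt < value <= le):
        raise ValueError(f"{name} must be in ({gt}, {le}], got {value}")
    return value

print(validate_feature("sepal_length_cm", 5.1))   # → 5.1

try:
    validate_feature("sepal_length_cm", -5.1)     # out of range: raises
except ValueError as e:
    print("rejected:", e)
```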


27.4 2.4 Utilities Module

# Create utils.py for helper functions
utils_content = '''"""
Utility functions for model inference
"""
import numpy as np
import onnxruntime as ort
import joblib
from pathlib import Path
import time
from typing import Dict, Tuple, List
from config import MODEL_PATH, SCALER_PATH, TARGET_NAMES

class ModelInference:
    """Singleton class for model inference"""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._initialized = False
        return cls._instance

    def __init__(self):
        if self._initialized:
            return

        # Load ONNX model
        self.session = ort.InferenceSession(str(MODEL_PATH))
        self.input_name = self.session.get_inputs()[0].name
        self.output_names = [o.name for o in self.session.get_outputs()]

        # Load scaler
        self.scaler = joblib.load(str(SCALER_PATH))

        self._initialized = True

    def predict(self, features: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        """
        Make predictions using ONNX model

        Args:
            features: numpy array of shape (n_samples, 4)

        Returns:
            predicted_classes, probabilities
        """
        # Scale features
        features_scaled = self.scaler.transform(features)
        features_scaled = features_scaled.astype(np.float32)

        # Run inference
        outputs = self.session.run(
            self.output_names,
            {self.input_name: features_scaled}
        )

        # Extract predictions and probabilities
        probabilities = outputs[1]  # Class probabilities
        predictions = np.argmax(probabilities, axis=1)

        return predictions, probabilities

def get_model() -> ModelInference:
    """Get model inference singleton"""
    return ModelInference()

def format_prediction(
    features_dict: Dict,
    predicted_class: int,
    probabilities: np.ndarray,
    inference_time_ms: float
) -> Dict:
    """Format prediction results"""
    class_name = TARGET_NAMES[predicted_class]
    confidence = float(probabilities[0, predicted_class])

    probs_dict = {
        TARGET_NAMES[i]: float(probabilities[0, i])
        for i in range(len(TARGET_NAMES))
    }

    return {
        "input_features": features_dict,
        "predicted_class": class_name,
        "predicted_label": int(predicted_class),
        "confidence": confidence,
        "probabilities": probs_dict,
        "inference_time_ms": inference_time_ms
    }
'''

utils_path = app_dir / 'utils.py'
utils_path.write_text(utils_content)
print(f"✓ Created {utils_path}")
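`format_prediction` above turns one probability row into a class name and a confidence score. Stripped of numpy, the core step is just an argmax over the row (the values below are illustrative, not real model output):

```python
# One hypothetical probability row for a single sample
probs = [0.02, 0.10, 0.88]
target_names = ["setosa", "versicolor", "virginica"]

# argmax: index of the largest probability becomes the predicted label
label = max(range(len(probs)), key=probs.__getitem__)
confidence = probs[label]

print(target_names[label], confidence)  # → virginica 0.88
```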


27.5 2.5 Main FastAPI Application

# Create main.py
main_content = '''"""
FastAPI application for iris model serving
"""
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
import time
import numpy as np
from typing import List

from config import (
    API_TITLE, API_DESCRIPTION, API_VERSION,
    FEATURE_NAMES, TARGET_NAMES
)
from models import (
    IrisFeatures, IrisBatchFeatures,
    PredictionResponse, BatchPredictionResponse,
    HealthResponse
)
from utils import get_model, format_prediction

# Create FastAPI app
app = FastAPI(
    title=API_TITLE,
    description=API_DESCRIPTION,
    version=API_VERSION
)

# Add CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.on_event("startup")
async def startup_event():
    """Load model on startup"""
    try:
        model = get_model()
        print("✓ Model loaded successfully on startup")
    except Exception as e:
        print(f"✗ Error loading model: {e}")
        raise

@app.get("/health", response_model=HealthResponse)
async def health_check():
    """Health check endpoint"""
    try:
        model = get_model()
        return HealthResponse(
            status="healthy",
            version=API_VERSION,
            models_loaded=True
        )
    except Exception as e:
        return JSONResponse(
            status_code=503,
            content={
                "status": "unhealthy",
                "version": API_VERSION,
                "models_loaded": False,
                "error": str(e)
            }
        )

@app.get("/info")
async def get_info():
    """Get API information"""
    return {
        "api_title": API_TITLE,
        "api_version": API_VERSION,
        "feature_names": FEATURE_NAMES,
        "target_classes": TARGET_NAMES,
        "description": API_DESCRIPTION
    }

@app.post("/predict", response_model=PredictionResponse)
async def predict(features: IrisFeatures):
    """
    Make single prediction

    Example request:
    {
        "sepal_length_cm": 5.1,
        "sepal_width_cm": 3.5,
        "petal_length_cm": 1.4,
        "petal_width_cm": 0.2
    }
    """
    try:
        # Extract features
        feature_values = np.array([
            features.sepal_length_cm,
            features.sepal_width_cm,
            features.petal_length_cm,
            features.petal_width_cm
        ]).reshape(1, -1)

        # Measure inference time
        start_time = time.time()

        # Get model and make prediction
        model = get_model()
        predictions, probabilities = model.predict(feature_values)

        inference_time = (time.time() - start_time) * 1000

        # Format response
        result = format_prediction(
            features.model_dump(),
            predictions[0],
            probabilities,
            inference_time
        )

        return PredictionResponse(**result)

    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/predict/batch", response_model=BatchPredictionResponse)
async def predict_batch(batch_features: IrisBatchFeatures):
    """
    Make batch predictions

    Example request:
    {
        "samples": [
            {"sepal_length_cm": 5.1, "sepal_width_cm": 3.5,
             "petal_length_cm": 1.4, "petal_width_cm": 0.2},
            {"sepal_length_cm": 7.0, "sepal_width_cm": 3.2,
             "petal_length_cm": 4.7, "petal_width_cm": 1.4}
        ]
    }
    """
    try:
        if not batch_features.samples:
            raise HTTPException(status_code=400, detail="Empty samples list")

        # Extract features
        feature_values = np.array([
            [
                s.sepal_length_cm,
                s.sepal_width_cm,
                s.petal_length_cm,
                s.petal_width_cm
            ]
            for s in batch_features.samples
        ])

        # Measure inference time
        start_time = time.time()

        # Get model and make predictions
        model = get_model()
        predictions, probabilities = model.predict(feature_values)

        total_inference_time = (time.time() - start_time) * 1000

        # Format responses
        results = []
        for i, sample in enumerate(batch_features.samples):
            result = format_prediction(
                sample.model_dump(),
                predictions[i],
                probabilities[i:i+1],  # this sample's probability row
                total_inference_time / len(batch_features.samples)
            )
            results.append(PredictionResponse(**result))

        return BatchPredictionResponse(
            predictions=results,
            total_inference_time_ms=total_inference_time,
            samples_count=len(batch_features.samples)
        )

    except HTTPException:
        raise  # preserve intentional status codes (e.g. 400 for empty batch)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(
        app,
        host="0.0.0.0",
        port=8000,
        workers=1
    )
'''

main_path = app_dir / 'main.py'
main_path.write_text(main_content)
print(f"✓ Created {main_path}")
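The `/predict` handler wraps the model call in a wall-clock timer to report `inference_time_ms`. In isolation the pattern looks like this (`time.perf_counter` is a monotonic alternative to `time.time` that is better suited to short intervals):

```python
import time

start = time.perf_counter()
_ = sum(i * i for i in range(10_000))        # stand-in for model.predict(...)
elapsed_ms = (time.perf_counter() - start) * 1000

print(elapsed_ms >= 0.0)  # → True
```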

27.6 2.6 Requirements File

# Create requirements.txt for FastAPI app
requirements_content = '''fastapi==0.104.1
uvicorn[standard]==0.24.0
pydantic==2.5.0
scikit-learn==1.3.2
pandas==2.1.3
numpy==1.24.3
onnx==1.15.0
onnxruntime==1.17.0
skl2onnx==1.16.0
joblib==1.3.2
httpx==0.25.1
python-multipart==0.0.6
'''

req_path = app_dir / 'requirements.txt'
req_path.write_text(requirements_content)
print(f"✓ Created {req_path}")

28 PART 3: Docker Containerization

28.1 3.1 Create Dockerfile

# Create Dockerfile
dockerfile_content = '''FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \\
    build-essential \\
    && rm -rf /var/lib/apt/lists/*

# Copy requirements
COPY fastapi_app/requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY fastapi_app/ .
COPY models/ /app/models/

# Point config.py at the bundled models directory
ENV MODELS_DIR=/app/models

# Expose port
EXPOSE 8000

# Health check (stdlib urllib, so no extra HTTP client is required in the image)
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \\
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

# Run application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
'''

dockerfile_path = Path('Dockerfile')
dockerfile_path.write_text(dockerfile_content)
print(f"✓ Created Dockerfile")

28.2 3.2 Create Docker Compose (Optional)

# Create docker-compose.yml for easy local testing
docker_compose_content = '''version: '3.8'

services:
  iris-api:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    volumes:
      - ./models:/app/models:ro
    environment:
      - PYTHONUNBUFFERED=1
    restart: unless-stopped
    healthcheck:
      # curl is not installed in python:3.11-slim; use stdlib urllib instead
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
'''

compose_path = Path('docker-compose.yml')
compose_path.write_text(docker_compose_content)
print(f"✓ Created docker-compose.yml")

28.3 3.3 .dockerignore File

# Create .dockerignore
dockerignore_content = '''__pycache__
*.pyc
*.pyo
*.egg-info
.env
.venv
.git
.gitignore
.docker
.dockerignore
node_modules
*.md
.pytest_cache
.coverage
htmlcov
dist
build
*.log
tests/
.vscode
.idea
'''

dockerignore_path = Path('.dockerignore')
dockerignore_path.write_text(dockerignore_content)
print(f"✓ Created .dockerignore")

29 PART 4: Testing & Quality Assurance

29.1 4.1 Unit Tests

# Create tests directory and test_api.py
tests_dir = Path('tests')
tests_dir.mkdir(exist_ok=True)

test_api_content = '''"""
Unit tests for FastAPI application
"""
import pytest
import sys
from pathlib import Path
import numpy as np

# Add parent directory to path
sys.path.insert(0, str(Path(__file__).parent.parent / "fastapi_app"))

from fastapi.testclient import TestClient
from main import app
from config import TARGET_NAMES

client = TestClient(app)

class TestHealthEndpoint:
    """Test health check endpoint"""

    def test_health_check_returns_200(self):
        response = client.get("/health")
        assert response.status_code == 200

    def test_health_response_structure(self):
        response = client.get("/health")
        data = response.json()
        assert "status" in data
        assert "version" in data
        assert "models_loaded" in data
        assert data["status"] == "healthy"

class TestInfoEndpoint:
    """Test info endpoint"""

    def test_get_info(self):
        response = client.get("/info")
        assert response.status_code == 200

        data = response.json()
        assert "api_title" in data
        assert "feature_names" in data
        assert "target_classes" in data
        assert len(data["feature_names"]) == 4
        assert len(data["target_classes"]) == 3

class TestPredictEndpoint:
    """Test single prediction endpoint"""

    def test_valid_prediction(self):
        """Test valid prediction request"""
        payload = {
            "sepal_length_cm": 5.1,
            "sepal_width_cm": 3.5,
            "petal_length_cm": 1.4,
            "petal_width_cm": 0.2
        }

        response = client.post("/predict", json=payload)
        assert response.status_code == 200

        data = response.json()
        assert "predicted_class" in data
        assert "predicted_label" in data
        assert "confidence" in data
        assert "probabilities" in data
        assert "inference_time_ms" in data

        # Verify prediction is valid
        assert data["predicted_label"] in [0, 1, 2]
        assert data["predicted_class"] in TARGET_NAMES
        assert 0 <= data["confidence"] <= 1

    def test_invalid_sepal_length(self):
        """Test invalid sepal length (negative)"""
        payload = {
            "sepal_length_cm": -5.1,
            "sepal_width_cm": 3.5,
            "petal_length_cm": 1.4,
            "petal_width_cm": 0.2
        }

        response = client.post("/predict", json=payload)
        assert response.status_code == 422  # Validation error

    def test_setosa_prediction(self):
        """Test setosa iris prediction"""
        payload = {
            "sepal_length_cm": 5.1,
            "sepal_width_cm": 3.5,
            "petal_length_cm": 1.4,
            "petal_width_cm": 0.2
        }

        response = client.post("/predict", json=payload)
        assert response.status_code == 200

        data = response.json()
        # Setosa should have distinct features
        assert data["predicted_class"] == "setosa"

    def test_virginica_prediction(self):
        """Test virginica iris prediction"""
        payload = {
            "sepal_length_cm": 7.9,
            "sepal_width_cm": 3.8,
            "petal_length_cm": 6.4,
            "petal_width_cm": 2.0
        }

        response = client.post("/predict", json=payload)
        assert response.status_code == 200

        data = response.json()
        assert data["predicted_class"] == "virginica"

class TestBatchPredictEndpoint:
    """Test batch prediction endpoint"""

    def test_valid_batch_prediction(self):
        """Test valid batch prediction"""
        payload = {
            "samples": [
                {
                    "sepal_length_cm": 5.1,
                    "sepal_width_cm": 3.5,
                    "petal_length_cm": 1.4,
                    "petal_width_cm": 0.2
                },
                {
                    "sepal_length_cm": 7.0,
                    "sepal_width_cm": 3.2,
                    "petal_length_cm": 4.7,
                    "petal_width_cm": 1.4
                }
            ]
        }

        response = client.post("/predict/batch", json=payload)
        assert response.status_code == 200

        data = response.json()
        assert "predictions" in data
        assert "total_inference_time_ms" in data
        assert "samples_count" in data
        assert len(data["predictions"]) == 2
        assert data["samples_count"] == 2

    def test_empty_batch_prediction(self):
        """Test empty batch prediction"""
        payload = {"samples": []}

        response = client.post("/predict/batch", json=payload)
        assert response.status_code == 400

class TestInferenceTime:
    """Test inference time measurements"""

    def test_single_prediction_time(self):
        """Test that inference time is reasonable"""
        payload = {
            "sepal_length_cm": 5.1,
            "sepal_width_cm": 3.5,
            "petal_length_cm": 1.4,
            "petal_width_cm": 0.2
        }

        response = client.post("/predict", json=payload)
        data = response.json()

        # Should complete in less than 100ms
        assert data["inference_time_ms"] < 100

    def test_batch_prediction_time(self):
        """Test batch prediction inference time"""
        payload = {
            "samples": [
                {
                    "sepal_length_cm": 5.1,
                    "sepal_width_cm": 3.5,
                    "petal_length_cm": 1.4,
                    "petal_width_cm": 0.2
                }
                for _ in range(10)
            ]
        }

        response = client.post("/predict/batch", json=payload)
        data = response.json()

        # Should complete in less than 200ms
        assert data["total_inference_time_ms"] < 200

if __name__ == "__main__":
    pytest.main([__file__, "-v"])
'''

test_path = tests_dir / 'test_api.py'
test_path.write_text(test_api_content)
print(f"✓ Created {test_path}")

# Create __init__.py
init_path = tests_dir / '__init__.py'
init_path.write_text('')
print(f"✓ Created {init_path}")

29.2 4.2 Load Testing Script

# Create load testing script
load_test_content = '''"""
Load testing script for API performance evaluation
"""
import time
import requests
import json
from concurrent.futures import ThreadPoolExecutor, as_completed
import statistics

class APILoadTester:
    """Load testing for FastAPI application"""

    def __init__(self, base_url="http://localhost:8000"):
        self.base_url = base_url
        self.results = []

    def single_request(self):
        """Make single prediction request"""
        payload = {
            "sepal_length_cm": 5.1,
            "sepal_width_cm": 3.5,
            "petal_length_cm": 1.4,
            "petal_width_cm": 0.2
        }

        start = time.time()
        try:
            response = requests.post(
                f"{self.base_url}/predict",
                json=payload,
                timeout=5
            )
            elapsed = (time.time() - start) * 1000

            return {
                "status": response.status_code,
                "time_ms": elapsed,
                "success": response.status_code == 200
            }
        except Exception as e:
            elapsed = (time.time() - start) * 1000
            return {
                "status": 0,
                "time_ms": elapsed,
                "success": False,
                "error": str(e)
            }

    def run_load_test(self, num_requests=100, num_workers=10):
        """Run load test with specified parameters"""
        print(f"Starting load test...")
        print(f"  Total requests: {num_requests}")
        print(f"  Concurrent workers: {num_workers}")

        start_time = time.time()

        with ThreadPoolExecutor(max_workers=num_workers) as executor:
            futures = [
                executor.submit(self.single_request)
                for _ in range(num_requests)
            ]

            for i, future in enumerate(as_completed(futures)):
                result = future.result()
                self.results.append(result)

                if (i + 1) % 10 == 0:
                    print(f"  Completed: {i + 1}/{num_requests}")

        total_time = time.time() - start_time
        self.print_results(total_time)

    def print_results(self, total_time):
        """Print load test results"""
        successful = [r for r in self.results if r["success"]]
        failed = [r for r in self.results if not r["success"]]

        times = [r["time_ms"] for r in successful]

        print(f"\\n{'='*50}")
        print(f"LOAD TEST RESULTS")
        print(f"{'='*50}")
        print(f"Total Requests: {len(self.results)}")
        print(f"Successful: {len(successful)} ({len(successful)/len(self.results)*100:.1f}%)")
        print(f"Failed: {len(failed)} ({len(failed)/len(self.results)*100:.1f}%)")
        print(f"Total Time: {total_time:.2f}s")
        print(f"Throughput: {len(self.results)/total_time:.2f} requests/sec")

        if times:
            print(f"\\nResponse Times (ms):")
            print(f"  Min: {min(times):.2f}")
            print(f"  Max: {max(times):.2f}")
            print(f"  Mean: {statistics.mean(times):.2f}")
            print(f"  Median: {statistics.median(times):.2f}")
            print(f"  P95: {sorted(times)[int(len(times)*0.95)]:.2f}")
            print(f"  P99: {sorted(times)[int(len(times)*0.99)]:.2f}")

        if failed:
            print(f"\\nErrors:")
            for error in set(f["error"] for f in failed if "error" in f):
                print(f"  - {error}")

if __name__ == "__main__":
    tester = APILoadTester()

    print("Make sure FastAPI server is running!")
    print("Run: uvicorn main:app --app-dir fastapi_app --reload")
    print()

    # Run load test
    tester.run_load_test(num_requests=100, num_workers=10)
'''

load_test_path = Path('load_test.py')
load_test_path.write_text(load_test_content)
print(f"✓ Created {load_test_path}")
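The P95/P99 lines in `print_results` use a nearest-rank estimate: sort the response times and index at `int(n * q)`. As a standalone sketch with made-up timings (the upper-edge guard is a small hardening not present in the script above):

```python
def percentile(values, q):
    """Nearest-rank percentile, matching the load tester's sorted(times)[int(n*q)]."""
    ordered = sorted(values)
    idx = min(int(len(ordered) * q), len(ordered) - 1)  # guard the upper edge
    return ordered[idx]

times = [11.0, 12.0, 12.5, 13.0, 13.5, 14.0, 15.0, 16.0, 30.0, 90.0]
print(percentile(times, 0.50), percentile(times, 0.95))  # → 14.0 90.0
```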

30 PART 5: Deployment & Monitoring

30.1 5.1 Build Summary

# Create summary of all created files
print("\n" + "="*60)
print("DEPLOYMENT PACKAGE CREATED SUCCESSFULLY")
print("="*60)

print("\n📁 Directory Structure:")
print("""
.
├── models/
│   ├── iris_model.pkl
│   ├── iris_scaler.pkl
│   └── iris_model.onnx
├── fastapi_app/
│   ├── config.py
│   ├── models.py
│   ├── utils.py
│   ├── main.py
│   └── requirements.txt
├── tests/
│   ├── __init__.py
│   └── test_api.py
├── Dockerfile
├── docker-compose.yml
├── .dockerignore
└── load_test.py
""")

print("\n📊 Model Files:")
print(f"  ✓ iris_model.pkl (Python pickle)")
print(f"  ✓ iris_scaler.pkl (StandardScaler)")
print(f"  ✓ iris_model.onnx (ONNX format)")

print("\n🚀 API Endpoints:")
print("  GET  /health              - Health check")
print("  GET  /info                - API information")
print("  POST /predict             - Single prediction")
print("  POST /predict/batch       - Batch predictions")

print("\n🔧 Available Commands:")
print("  # Run FastAPI server")
print("  uvicorn main:app --app-dir fastapi_app --reload --host 0.0.0.0 --port 8000")
print()
print("  # Run tests")
print("  pytest tests/test_api.py -v")
print()
print("  # Build Docker image")
print("  docker build -t iris-api:latest .")
print()
print("  # Run with Docker")
print("  docker-compose up")
print()
print("  # Run load test")
print("  python load_test.py")

30.2 5.2 Performance Benchmarking

# Performance comparison: pickle vs ONNX
import time
import joblib

print("\n" + "="*60)
print("PERFORMANCE BENCHMARKING")
print("="*60)

# Load both models
sklearn_model = joblib.load(model_dir / 'iris_model.pkl')
scaler = joblib.load(model_dir / 'iris_scaler.pkl')
session = ort.InferenceSession(str(model_dir / 'iris_model.onnx'))
input_name = session.get_inputs()[0].name
output_names = [o.name for o in session.get_outputs()]

# Prepare test data
test_data = X_test_scaled.astype(np.float32)
n_iterations = 1000

# Benchmark sklearn
print("\n⏱️ Benchmarking sklearn RandomForest...")
start = time.time()
for _ in range(n_iterations):
    sklearn_model.predict(test_data)
sklearn_time = time.time() - start

print(f"  Total time: {sklearn_time:.4f}s")
print(f"  Time per sample: {(sklearn_time/n_iterations/len(test_data))*1000:.4f}ms")

# Benchmark ONNX
print("\n⏱️ Benchmarking ONNX Runtime...")
start = time.time()
for _ in range(n_iterations):
    session.run(output_names, {input_name: test_data})
onnx_time = time.time() - start

print(f"  Total time: {onnx_time:.4f}s")
print(f"  Time per sample: {(onnx_time/n_iterations/len(test_data))*1000:.4f}ms")

# Calculate speedup
speedup = sklearn_time / onnx_time
print(f"\n✨ ONNX Speedup: {speedup:.2f}x faster than sklearn!")
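The reported figures are straightforward ratios. With hypothetical totals (illustrative numbers, not measured results), the per-sample latency and speedup are derived as:

```python
# Hypothetical benchmark totals, in seconds
n_iterations, n_samples = 1000, 30
sklearn_time, onnx_time = 2.40, 0.60

# Per-sample latency in ms, and ONNX speedup relative to sklearn
per_sample_ms = sklearn_time / n_iterations / n_samples * 1000
speedup = sklearn_time / onnx_time

print(round(per_sample_ms, 4), round(speedup, 2))  # → 0.08 4.0
```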

31 Summary

In this lab, you have learned:

31.1 Model Deployment Pipeline

  1. Training: Train and evaluate a model
  2. Optimization: Convert to the ONNX format
  3. API Development: Build a FastAPI application
  4. Containerization: Docker deployment
  5. Testing: Unit tests and load testing
  6. Monitoring: Performance benchmarking

31.2 Key Technologies

  • FastAPI: Modern, fast web framework for building APIs
  • ONNX: Standard format for model optimization
  • ONNX Runtime: Efficient inference engine
  • Docker: Container technology for deployment
  • Pytest: Unit testing framework

31.3 Best Practices

  • Input validation with Pydantic
  • Robust error handling
  • Performance monitoring and benchmarking
  • Containerization for reproducibility
  • Comprehensive testing before deployment

31.4 Next Steps

  1. Deploy the API to a production environment
  2. Implement monitoring and logging
  3. Set up a CI/CD pipeline
  4. Optimize for high-traffic scenarios
  5. Implement authentication and rate limiting
