Lab 11: Model Deployment with FastAPI, ONNX, and Docker

Production-Ready Machine Learning Model Serving

Author

Machine Learning - Data Science for Cybersecurity

Published

December 15, 2025

25 Introduction

25.1 Learning Objectives

After completing this lab, you should be able to:

  1. Understand the machine learning model deployment pipeline from development to production
  2. Convert trained models to the ONNX format for optimized inference
  3. Build a REST API for model serving using FastAPI
  4. Implement containerization with Docker
  5. Evaluate model performance in a production environment
  6. Optimize model inference latency and throughput
  7. Write unit tests and integration tests for the API
  8. Apply best practices for production deployment

25.2 Lab Overview

This lab focuses on deploying a production-ready machine learning model, covering:

  • Model Training: Train a simple classification model
  • Model Optimization: Convert to ONNX for better performance
  • API Development: FastAPI for model serving
  • Containerization: Docker for reproducibility
  • Testing & Monitoring: Quality assurance in production

25.2.1 Lab Duration

2-3 hours, broken down as:

  • Setup & Installation: 20 minutes
  • Model Training & Conversion: 30 minutes
  • FastAPI Development: 45 minutes
  • Docker Configuration: 30 minutes
  • Testing & Optimization: 20 minutes

25.2.2 Lab Structure

graph TD
    A[Setup & Installation] --> B[Train Model]
    B --> C[Convert to ONNX]
    C --> D[Build FastAPI App]
    D --> E[Create Dockerfile]
    E --> F[Local Testing]
    F --> G[Performance Optimization]
    G --> H[Docker Deployment]

    style A fill:#e6f3ff
    style B fill:#ffe6e6
    style C fill:#ffffcc
    style D fill:#ccffcc
    style E fill:#e6ccff
    style F fill:#ffcccc
    style G fill:#ccffe6
    style H fill:#ffccff

25.3 Environment Setup

25.3.1 Install Dependencies

import subprocess
import sys
import os

packages = [
    'fastapi',
    'uvicorn',
    'pydantic',
    'scikit-learn',
    'pandas',
    'numpy',
    'onnx',
    'onnxruntime',
    'skl2onnx',
    'httpx',  # for testing
    'requests',  # used by the load-testing script
    'python-multipart'
]

for package in packages:
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q', package])

print("✓ All packages installed successfully!")

25.3.2 Import Libraries

# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import json
import warnings
warnings.filterwarnings('ignore')

# Model training
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    f1_score, confusion_matrix, classification_report
)

# ONNX
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Utilities
from datetime import datetime
import joblib

print("✓ All imports successful!")

26 PART 1: Model Training & Optimization

26.1 1.1 Dataset Preparation

26.1.1 Load Iris Dataset

# Load dataset (load once and reuse instead of calling load_iris() repeatedly)
iris = load_iris()
X, y = iris.data, iris.target
feature_names = iris.feature_names
target_names = iris.target_names

print(f"Dataset shape: {X.shape}")
print(f"Number of classes: {len(target_names)}")
print(f"Feature names: {feature_names}")
print(f"Target names: {target_names}")

# Create DataFrame
df = pd.DataFrame(X, columns=feature_names)
df['target'] = y

print(f"\nFirst 5 rows:\n{df.head()}")

26.1.2 Train-Test Split

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Training set size: {X_train.shape[0]}")
print(f"Test set size: {X_test.shape[0]}")

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"✓ Data scaled successfully")

26.2 1.2 Model Training

# Train model
model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    random_state=42,
    n_jobs=-1
)

model.fit(X_train_scaled, y_train)

# Evaluate
y_pred_train = model.predict(X_train_scaled)
y_pred_test = model.predict(X_test_scaled)

print("Training Metrics:")
print(f"  Accuracy: {accuracy_score(y_train, y_pred_train):.4f}")

print("\nTest Metrics:")
print(f"  Accuracy: {accuracy_score(y_test, y_pred_test):.4f}")
print(f"  Precision: {precision_score(y_test, y_pred_test, average='weighted'):.4f}")
print(f"  Recall: {recall_score(y_test, y_pred_test, average='weighted'):.4f}")
print(f"  F1-Score: {f1_score(y_test, y_pred_test, average='weighted'):.4f}")

26.2.1 Save Trained Model

# Create model directory
model_dir = Path('models')
model_dir.mkdir(exist_ok=True)

# Save model and scaler
model_path = model_dir / 'iris_model.pkl'
scaler_path = model_dir / 'iris_scaler.pkl'

joblib.dump(model, model_path)
joblib.dump(scaler, scaler_path)

print(f"✓ Model saved to {model_path}")
print(f"✓ Scaler saved to {scaler_path}")

26.3 1.3 Model Conversion to ONNX

ONNX (Open Neural Network Exchange) is a standard format for machine learning models that enables:

  • Optimized inference
  • Portability across platforms
  • Hardware acceleration
  • Typically faster inference than loading and running the pickled sklearn model

# Define initial types
initial_type = [('float_input', FloatTensorType([None, 4]))]

# Convert model to ONNX; disable ZipMap so the probability output is a
# plain float array (usable with np.argmax) instead of a list of dicts
onnx_model = convert_sklearn(
    model,
    initial_types=initial_type,
    options={id(model): {'zipmap': False}}
)

# Save ONNX model
onnx_path = model_dir / 'iris_model.onnx'
onnx.save_model(onnx_model, str(onnx_path))

print(f"✓ Model converted to ONNX and saved to {onnx_path}")
print(f"✓ ONNX file size: {onnx_path.stat().st_size / 1024:.2f} KB")

# Verify ONNX model
onnx_model_loaded = onnx.load(str(onnx_path))
onnx.checker.check_model(onnx_model_loaded)
print(f"✓ ONNX model verified successfully")

26.4 1.4 ONNX Model Inference Testing

# Create ONNX Runtime session
sess = ort.InferenceSession(str(onnx_path))

# Get input/output names
input_name = sess.get_inputs()[0].name
output_names = [output.name for output in sess.get_outputs()]

print(f"Input name: {input_name}")
print(f"Output names: {output_names}")

# Prepare test data
X_test_float = X_test_scaled.astype(np.float32)

# Run inference
onnx_predictions = sess.run(
    output_names,
    {input_name: X_test_float}
)

print(f"\nONNX Predictions shape: {onnx_predictions[0].shape}")
print(f"First 5 predictions: {onnx_predictions[0][:5]}")

# Verify accuracy
onnx_pred_classes = np.argmax(onnx_predictions[1], axis=1)
onnx_accuracy = accuracy_score(y_test, onnx_pred_classes)

print(f"\n✓ ONNX Model Accuracy: {onnx_accuracy:.4f}")
print(f"✓ Matches sklearn model: {np.allclose(y_pred_test, onnx_pred_classes)}")

27 PART 2: FastAPI Model Serving

27.1 2.1 Create FastAPI Application Structure

# Create application directory structure
app_dir = Path('fastapi_app')
app_dir.mkdir(exist_ok=True)

print(f"Created application directory: {app_dir}")
print(f"Directory structure:")
print(f"  fastapi_app/")
print(f"    ├── main.py")
print(f"    ├── models.py")
print(f"    ├── utils.py")
print(f"    ├── config.py")
print(f"    └── requirements.txt")

27.2 2.2 Configuration Module

# Create config.py
config_content = '''"""
Configuration module for FastAPI application
"""
from pathlib import Path
import os

# Paths
BASE_DIR = Path(__file__).parent.parent
# Allow overriding the models directory (e.g. inside a container)
MODELS_DIR = Path(os.getenv("MODELS_DIR", str(BASE_DIR / "models")))

# Model configuration
MODEL_PATH = MODELS_DIR / "iris_model.onnx"
SCALER_PATH = MODELS_DIR / "iris_scaler.pkl"

# Feature configuration
FEATURE_NAMES = [
    "sepal_length_cm",
    "sepal_width_cm",
    "petal_length_cm",
    "petal_width_cm"
]

TARGET_NAMES = ["setosa", "versicolor", "virginica"]

# API configuration
API_TITLE = "Iris Model API"
API_DESCRIPTION = "Production-ready ML model serving API"
API_VERSION = "1.0.0"

# Server configuration
HOST = "0.0.0.0"
PORT = 8000
DEBUG = False
'''

config_path = app_dir / 'config.py'
config_path.write_text(config_content)
print(f"✓ Created {config_path}")


27.3 2.3 Data Models

# Create models.py for Pydantic models
models_content = '''"""
Pydantic data models for request/response validation
"""
from pydantic import BaseModel, Field
from typing import List, Optional
from enum import Enum

class IrisFeatures(BaseModel):
    """Input features for iris prediction"""
    sepal_length_cm: float = Field(..., gt=0, le=10)
    sepal_width_cm: float = Field(..., gt=0, le=10)
    petal_length_cm: float = Field(..., ge=0, le=10)
    petal_width_cm: float = Field(..., ge=0, le=10)

    # Pydantic v2 style configuration (schema_extra was renamed)
    model_config = {
        "json_schema_extra": {
            "example": {
                "sepal_length_cm": 5.1,
                "sepal_width_cm": 3.5,
                "petal_length_cm": 1.4,
                "petal_width_cm": 0.2
            }
        }
    }

class IrisBatchFeatures(BaseModel):
    """Batch input features"""
    samples: List[IrisFeatures]

class PredictionResponse(BaseModel):
    """Single prediction response"""
    input_features: dict
    predicted_class: str
    predicted_label: int
    confidence: float
    probabilities: dict
    inference_time_ms: float

class BatchPredictionResponse(BaseModel):
    """Batch prediction response"""
    predictions: List[PredictionResponse]
    total_inference_time_ms: float
    samples_count: int

class HealthResponse(BaseModel):
    """Health check response"""
    status: str
    version: str
    models_loaded: bool

'''

models_path = app_dir / 'models.py'
models_path.write_text(models_content)
print(f"✓ Created {models_path}")
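The `Field(gt=0, le=10)` constraints above mean a request with, say, a negative sepal length is rejected with a 422 before it ever reaches the model. The same open/closed range check can be sketched in plain Python (`validate_feature` is a hypothetical helper for illustration only; the real validation is done by Pydantic):

```python
def validate_feature(name: str, value: float, gt: float = 0.0, le: float = 10.0) -> float:
    """Mimic pydantic's Field(gt=..., le=...) range check: gt < value <= le."""
    if not (gt < value <= le):
        raise ValueError(f"{name} must be in ({gt}, {le}], got {value}")
    return value

print(validate_feature("sepal_length_cm", 5.1))   # → 5.1

try:
    validate_feature("sepal_length_cm", -5.1)     # out of range: raises
except ValueError as e:
    print("rejected:", e)
```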


27.4 2.4 Utilities Module

# Create utils.py for helper functions
utils_content = '''"""
Utility functions for model inference
"""
import numpy as np
import onnxruntime as ort
import joblib
from pathlib import Path
import time
from typing import Dict, Tuple, List
from config import MODEL_PATH, SCALER_PATH, TARGET_NAMES

class ModelInference:
    """Singleton class for model inference"""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._initialized = False
        return cls._instance

    def __init__(self):
        if self._initialized:
            return

        # Load ONNX model
        self.session = ort.InferenceSession(str(MODEL_PATH))
        self.input_name = self.session.get_inputs()[0].name
        self.output_names = [o.name for o in self.session.get_outputs()]

        # Load scaler
        self.scaler = joblib.load(str(SCALER_PATH))

        self._initialized = True

    def predict(self, features: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        """
        Make predictions using ONNX model

        Args:
            features: numpy array of shape (n_samples, 4)

        Returns:
            predicted_classes, probabilities
        """
        # Scale features
        features_scaled = self.scaler.transform(features)
        features_scaled = features_scaled.astype(np.float32)

        # Run inference
        outputs = self.session.run(
            self.output_names,
            {self.input_name: features_scaled}
        )

        # Extract predictions and probabilities
        probabilities = outputs[1]  # Class probabilities
        predictions = np.argmax(probabilities, axis=1)

        return predictions, probabilities

def get_model() -> ModelInference:
    """Get model inference singleton"""
    return ModelInference()

def format_prediction(
    features_dict: Dict,
    predicted_class: int,
    probabilities: np.ndarray,
    inference_time_ms: float
) -> Dict:
    """Format prediction results"""
    class_name = TARGET_NAMES[predicted_class]
    confidence = float(probabilities[0, predicted_class])

    probs_dict = {
        TARGET_NAMES[i]: float(probabilities[0, i])
        for i in range(len(TARGET_NAMES))
    }

    return {
        "input_features": features_dict,
        "predicted_class": class_name,
        "predicted_label": int(predicted_class),
        "confidence": confidence,
        "probabilities": probs_dict,
        "inference_time_ms": inference_time_ms
    }
'''

utils_path = app_dir / 'utils.py'
utils_path.write_text(utils_content)
print(f"✓ Created {utils_path}")
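`format_prediction` above turns one probability row into a class name and a confidence score. Stripped of numpy, the core step is just an argmax over the row (the values below are illustrative, not real model output):

```python
# One hypothetical probability row for a single sample
probs = [0.02, 0.10, 0.88]
target_names = ["setosa", "versicolor", "virginica"]

# argmax: index of the largest probability becomes the predicted label
label = max(range(len(probs)), key=probs.__getitem__)
confidence = probs[label]

print(target_names[label], confidence)  # → virginica 0.88
```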


27.5 2.5 Main FastAPI Application

# Create main.py
main_content = '''"""
FastAPI application for iris model serving
"""
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
import time
import numpy as np
from typing import List

from config import (
    API_TITLE, API_DESCRIPTION, API_VERSION,
    FEATURE_NAMES, TARGET_NAMES
)
from models import (
    IrisFeatures, IrisBatchFeatures,
    PredictionResponse, BatchPredictionResponse,
    HealthResponse
)
from utils import get_model, format_prediction

# Create FastAPI app
app = FastAPI(
    title=API_TITLE,
    description=API_DESCRIPTION,
    version=API_VERSION
)

# Add CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.on_event("startup")
async def startup_event():
    """Load model on startup"""
    try:
        model = get_model()
        print("✓ Model loaded successfully on startup")
    except Exception as e:
        print(f"✗ Error loading model: {e}")
        raise

@app.get("/health", response_model=HealthResponse)
async def health_check():
    """Health check endpoint"""
    try:
        model = get_model()
        return HealthResponse(
            status="healthy",
            version=API_VERSION,
            models_loaded=True
        )
    except Exception as e:
        return JSONResponse(
            status_code=503,
            content={
                "status": "unhealthy",
                "version": API_VERSION,
                "models_loaded": False,
                "error": str(e)
            }
        )

@app.get("/info")
async def get_info():
    """Get API information"""
    return {
        "api_title": API_TITLE,
        "api_version": API_VERSION,
        "feature_names": FEATURE_NAMES,
        "target_classes": TARGET_NAMES,
        "description": API_DESCRIPTION
    }

@app.post("/predict", response_model=PredictionResponse)
async def predict(features: IrisFeatures):
    """
    Make single prediction

    Example request:
    {
        "sepal_length_cm": 5.1,
        "sepal_width_cm": 3.5,
        "petal_length_cm": 1.4,
        "petal_width_cm": 0.2
    }
    """
    try:
        # Extract features
        feature_values = np.array([
            features.sepal_length_cm,
            features.sepal_width_cm,
            features.petal_length_cm,
            features.petal_width_cm
        ]).reshape(1, -1)

        # Measure inference time
        start_time = time.time()

        # Get model and make prediction
        model = get_model()
        predictions, probabilities = model.predict(feature_values)

        inference_time = (time.time() - start_time) * 1000

        # Format response
        result = format_prediction(
            features.model_dump(),
            predictions[0],
            probabilities,
            inference_time
        )

        return PredictionResponse(**result)

    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/predict/batch", response_model=BatchPredictionResponse)
async def predict_batch(batch_features: IrisBatchFeatures):
    """
    Make batch predictions

    Example request:
    {
        "samples": [
            {"sepal_length_cm": 5.1, "sepal_width_cm": 3.5,
             "petal_length_cm": 1.4, "petal_width_cm": 0.2},
            {"sepal_length_cm": 7.0, "sepal_width_cm": 3.2,
             "petal_length_cm": 4.7, "petal_width_cm": 1.4}
        ]
    }
    """
    try:
        if not batch_features.samples:
            raise HTTPException(status_code=400, detail="Empty samples list")

        # Extract features
        feature_values = np.array([
            [
                s.sepal_length_cm,
                s.sepal_width_cm,
                s.petal_length_cm,
                s.petal_width_cm
            ]
            for s in batch_features.samples
        ])

        # Measure inference time
        start_time = time.time()

        # Get model and make predictions
        model = get_model()
        predictions, probabilities = model.predict(feature_values)

        total_inference_time = (time.time() - start_time) * 1000

        # Format responses
        results = []
        for i, sample in enumerate(batch_features.samples):
            result = format_prediction(
                sample.model_dump(),
                predictions[i],
                probabilities[i:i+1],  # this sample's probability row
                total_inference_time / len(batch_features.samples)
            )
            results.append(PredictionResponse(**result))

        return BatchPredictionResponse(
            predictions=results,
            total_inference_time_ms=total_inference_time,
            samples_count=len(batch_features.samples)
        )

    except HTTPException:
        raise  # preserve intentional status codes (e.g. 400 for empty batch)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(
        app,
        host="0.0.0.0",
        port=8000,
        workers=1
    )
'''

main_path = app_dir / 'main.py'
main_path.write_text(main_content)
print(f"✓ Created {main_path}")
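The `/predict` handler wraps the model call in a wall-clock timer to report `inference_time_ms`. In isolation the pattern looks like this (`time.perf_counter` is a monotonic alternative to `time.time` that is better suited to short intervals):

```python
import time

start = time.perf_counter()
_ = sum(i * i for i in range(10_000))        # stand-in for model.predict(...)
elapsed_ms = (time.perf_counter() - start) * 1000

print(elapsed_ms >= 0.0)  # → True
```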

27.6 2.6 Requirements File

# Create requirements.txt for FastAPI app
requirements_content = '''fastapi==0.104.1
uvicorn[standard]==0.24.0
pydantic==2.5.0
scikit-learn==1.3.2
pandas==2.1.3
numpy==1.24.3
onnx==1.15.0
onnxruntime==1.17.0
skl2onnx==1.16.0
joblib==1.3.2
httpx==0.25.1
python-multipart==0.0.6
'''

req_path = app_dir / 'requirements.txt'
req_path.write_text(requirements_content)
print(f"✓ Created {req_path}")

28 PART 3: Docker Containerization

28.1 3.1 Create Dockerfile

# Create Dockerfile
dockerfile_content = '''FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \\
    build-essential \\
    && rm -rf /var/lib/apt/lists/*

# Copy requirements
COPY fastapi_app/requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY fastapi_app/ .
COPY models/ /app/models/

# Point config.py at the bundled models directory
ENV MODELS_DIR=/app/models

# Expose port
EXPOSE 8000

# Health check (stdlib urllib, so no extra HTTP client is required in the image)
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \\
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

# Run application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
'''

dockerfile_path = Path('Dockerfile')
dockerfile_path.write_text(dockerfile_content)
print(f"✓ Created Dockerfile")

28.2 3.2 Create Docker Compose (Optional)

# Create docker-compose.yml for easy local testing
docker_compose_content = '''version: '3.8'

services:
  iris-api:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    volumes:
      - ./models:/app/models:ro
    environment:
      - PYTHONUNBUFFERED=1
    restart: unless-stopped
    healthcheck:
      # curl is not installed in python:3.11-slim; use stdlib urllib instead
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
'''

compose_path = Path('docker-compose.yml')
compose_path.write_text(docker_compose_content)
print(f"✓ Created docker-compose.yml")

28.3 3.3 .dockerignore File

# Create .dockerignore
dockerignore_content = '''__pycache__
*.pyc
*.pyo
*.egg-info
.env
.venv
.git
.gitignore
.docker
.dockerignore
node_modules
*.md
.pytest_cache
.coverage
htmlcov
dist
build
*.log
tests/
.vscode
.idea
'''

dockerignore_path = Path('.dockerignore')
dockerignore_path.write_text(dockerignore_content)
print(f"✓ Created .dockerignore")

29 PART 4: Testing & Quality Assurance

29.1 4.1 Unit Tests

# Create tests directory and test_api.py
tests_dir = Path('tests')
tests_dir.mkdir(exist_ok=True)

test_api_content = '''"""
Unit tests for FastAPI application
"""
import pytest
import sys
from pathlib import Path
import numpy as np

# Add parent directory to path
sys.path.insert(0, str(Path(__file__).parent.parent / "fastapi_app"))

from fastapi.testclient import TestClient
from main import app
from config import TARGET_NAMES

client = TestClient(app)

class TestHealthEndpoint:
    """Test health check endpoint"""

    def test_health_check_returns_200(self):
        response = client.get("/health")
        assert response.status_code == 200

    def test_health_response_structure(self):
        response = client.get("/health")
        data = response.json()
        assert "status" in data
        assert "version" in data
        assert "models_loaded" in data
        assert data["status"] == "healthy"

class TestInfoEndpoint:
    """Test info endpoint"""

    def test_get_info(self):
        response = client.get("/info")
        assert response.status_code == 200

        data = response.json()
        assert "api_title" in data
        assert "feature_names" in data
        assert "target_classes" in data
        assert len(data["feature_names"]) == 4
        assert len(data["target_classes"]) == 3

class TestPredictEndpoint:
    """Test single prediction endpoint"""

    def test_valid_prediction(self):
        """Test valid prediction request"""
        payload = {
            "sepal_length_cm": 5.1,
            "sepal_width_cm": 3.5,
            "petal_length_cm": 1.4,
            "petal_width_cm": 0.2
        }

        response = client.post("/predict", json=payload)
        assert response.status_code == 200

        data = response.json()
        assert "predicted_class" in data
        assert "predicted_label" in data
        assert "confidence" in data
        assert "probabilities" in data
        assert "inference_time_ms" in data

        # Verify prediction is valid
        assert data["predicted_label"] in [0, 1, 2]
        assert data["predicted_class"] in TARGET_NAMES
        assert 0 <= data["confidence"] <= 1

    def test_invalid_sepal_length(self):
        """Test invalid sepal length (negative)"""
        payload = {
            "sepal_length_cm": -5.1,
            "sepal_width_cm": 3.5,
            "petal_length_cm": 1.4,
            "petal_width_cm": 0.2
        }

        response = client.post("/predict", json=payload)
        assert response.status_code == 422  # Validation error

    def test_setosa_prediction(self):
        """Test setosa iris prediction"""
        payload = {
            "sepal_length_cm": 5.1,
            "sepal_width_cm": 3.5,
            "petal_length_cm": 1.4,
            "petal_width_cm": 0.2
        }

        response = client.post("/predict", json=payload)
        assert response.status_code == 200

        data = response.json()
        # Setosa should have distinct features
        assert data["predicted_class"] == "setosa"

    def test_virginica_prediction(self):
        """Test virginica iris prediction"""
        payload = {
            "sepal_length_cm": 7.9,
            "sepal_width_cm": 3.8,
            "petal_length_cm": 6.4,
            "petal_width_cm": 2.0
        }

        response = client.post("/predict", json=payload)
        assert response.status_code == 200

        data = response.json()
        assert data["predicted_class"] == "virginica"

class TestBatchPredictEndpoint:
    """Test batch prediction endpoint"""

    def test_valid_batch_prediction(self):
        """Test valid batch prediction"""
        payload = {
            "samples": [
                {
                    "sepal_length_cm": 5.1,
                    "sepal_width_cm": 3.5,
                    "petal_length_cm": 1.4,
                    "petal_width_cm": 0.2
                },
                {
                    "sepal_length_cm": 7.0,
                    "sepal_width_cm": 3.2,
                    "petal_length_cm": 4.7,
                    "petal_width_cm": 1.4
                }
            ]
        }

        response = client.post("/predict/batch", json=payload)
        assert response.status_code == 200

        data = response.json()
        assert "predictions" in data
        assert "total_inference_time_ms" in data
        assert "samples_count" in data
        assert len(data["predictions"]) == 2
        assert data["samples_count"] == 2

    def test_empty_batch_prediction(self):
        """Test empty batch prediction"""
        payload = {"samples": []}

        response = client.post("/predict/batch", json=payload)
        assert response.status_code == 400

class TestInferenceTime:
    """Test inference time measurements"""

    def test_single_prediction_time(self):
        """Test that inference time is reasonable"""
        payload = {
            "sepal_length_cm": 5.1,
            "sepal_width_cm": 3.5,
            "petal_length_cm": 1.4,
            "petal_width_cm": 0.2
        }

        response = client.post("/predict", json=payload)
        data = response.json()

        # Should complete in less than 100ms
        assert data["inference_time_ms"] < 100

    def test_batch_prediction_time(self):
        """Test batch prediction inference time"""
        payload = {
            "samples": [
                {
                    "sepal_length_cm": 5.1,
                    "sepal_width_cm": 3.5,
                    "petal_length_cm": 1.4,
                    "petal_width_cm": 0.2
                }
                for _ in range(10)
            ]
        }

        response = client.post("/predict/batch", json=payload)
        data = response.json()

        # Should complete in less than 200ms
        assert data["total_inference_time_ms"] < 200

if __name__ == "__main__":
    pytest.main([__file__, "-v"])
'''

test_path = tests_dir / 'test_api.py'
test_path.write_text(test_api_content)
print(f"✓ Created {test_path}")

# Create __init__.py
init_path = tests_dir / '__init__.py'
init_path.write_text('')
print(f"✓ Created {init_path}")

29.2 4.2 Load Testing Script

# Create load testing script
load_test_content = '''"""
Load testing script for API performance evaluation
"""
import time
import requests
import json
from concurrent.futures import ThreadPoolExecutor, as_completed
import statistics

class APILoadTester:
    """Load testing for FastAPI application"""

    def __init__(self, base_url="http://localhost:8000"):
        self.base_url = base_url
        self.results = []

    def single_request(self):
        """Make single prediction request"""
        payload = {
            "sepal_length_cm": 5.1,
            "sepal_width_cm": 3.5,
            "petal_length_cm": 1.4,
            "petal_width_cm": 0.2
        }

        start = time.time()
        try:
            response = requests.post(
                f"{self.base_url}/predict",
                json=payload,
                timeout=5
            )
            elapsed = (time.time() - start) * 1000

            return {
                "status": response.status_code,
                "time_ms": elapsed,
                "success": response.status_code == 200
            }
        except Exception as e:
            elapsed = (time.time() - start) * 1000
            return {
                "status": 0,
                "time_ms": elapsed,
                "success": False,
                "error": str(e)
            }

    def run_load_test(self, num_requests=100, num_workers=10):
        """Run load test with specified parameters"""
        print(f"Starting load test...")
        print(f"  Total requests: {num_requests}")
        print(f"  Concurrent workers: {num_workers}")

        start_time = time.time()

        with ThreadPoolExecutor(max_workers=num_workers) as executor:
            futures = [
                executor.submit(self.single_request)
                for _ in range(num_requests)
            ]

            for i, future in enumerate(as_completed(futures)):
                result = future.result()
                self.results.append(result)

                if (i + 1) % 10 == 0:
                    print(f"  Completed: {i + 1}/{num_requests}")

        total_time = time.time() - start_time
        self.print_results(total_time)

    def print_results(self, total_time):
        """Print load test results"""
        successful = [r for r in self.results if r["success"]]
        failed = [r for r in self.results if not r["success"]]

        times = [r["time_ms"] for r in successful]

        print(f"\\n{'='*50}")
        print(f"LOAD TEST RESULTS")
        print(f"{'='*50}")
        print(f"Total Requests: {len(self.results)}")
        print(f"Successful: {len(successful)} ({len(successful)/len(self.results)*100:.1f}%)")
        print(f"Failed: {len(failed)} ({len(failed)/len(self.results)*100:.1f}%)")
        print(f"Total Time: {total_time:.2f}s")
        print(f"Throughput: {len(self.results)/total_time:.2f} requests/sec")

        if times:
            print(f"\\nResponse Times (ms):")
            print(f"  Min: {min(times):.2f}")
            print(f"  Max: {max(times):.2f}")
            print(f"  Mean: {statistics.mean(times):.2f}")
            print(f"  Median: {statistics.median(times):.2f}")
            print(f"  P95: {sorted(times)[int(len(times)*0.95)]:.2f}")
            print(f"  P99: {sorted(times)[int(len(times)*0.99)]:.2f}")

        if failed:
            print(f"\\nErrors:")
            for error in set(f["error"] for f in failed if "error" in f):
                print(f"  - {error}")

if __name__ == "__main__":
    tester = APILoadTester()

    print("Make sure FastAPI server is running!")
    print("Run: uvicorn main:app --app-dir fastapi_app --reload")
    print()

    # Run load test
    tester.run_load_test(num_requests=100, num_workers=10)
'''

load_test_path = Path('load_test.py')
load_test_path.write_text(load_test_content)
print(f"✓ Created {load_test_path}")
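The P95/P99 lines in `print_results` use a nearest-rank estimate: sort the response times and index at `int(n * q)`. As a standalone sketch with made-up timings (the upper-edge guard is a small hardening not present in the script above):

```python
def percentile(values, q):
    """Nearest-rank percentile, matching the load tester's sorted(times)[int(n*q)]."""
    ordered = sorted(values)
    idx = min(int(len(ordered) * q), len(ordered) - 1)  # guard the upper edge
    return ordered[idx]

times = [11.0, 12.0, 12.5, 13.0, 13.5, 14.0, 15.0, 16.0, 30.0, 90.0]
print(percentile(times, 0.50), percentile(times, 0.95))  # → 14.0 90.0
```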

30 PART 5: Deployment & Monitoring

30.1 5.1 Build Summary

# Create summary of all created files
print("\n" + "="*60)
print("DEPLOYMENT PACKAGE CREATED SUCCESSFULLY")
print("="*60)

print("\n📁 Directory Structure:")
print("""
.
├── models/
│   ├── iris_model.pkl
│   ├── iris_scaler.pkl
│   └── iris_model.onnx
├── fastapi_app/
│   ├── config.py
│   ├── models.py
│   ├── utils.py
│   ├── main.py
│   └── requirements.txt
├── tests/
│   ├── __init__.py
│   └── test_api.py
├── Dockerfile
├── docker-compose.yml
├── .dockerignore
└── load_test.py
""")

print("\n📊 Model Files:")
print(f"  ✓ iris_model.pkl (Python pickle)")
print(f"  ✓ iris_scaler.pkl (StandardScaler)")
print(f"  ✓ iris_model.onnx (ONNX format)")

print("\n🚀 API Endpoints:")
print("  GET  /health              - Health check")
print("  GET  /info                - API information")
print("  POST /predict             - Single prediction")
print("  POST /predict/batch       - Batch predictions")

print("\n🔧 Available Commands:")
print("  # Run FastAPI server")
print("  uvicorn main:app --app-dir fastapi_app --reload --host 0.0.0.0 --port 8000")
print()
print("  # Run tests")
print("  pytest tests/test_api.py -v")
print()
print("  # Build Docker image")
print("  docker build -t iris-api:latest .")
print()
print("  # Run with Docker")
print("  docker-compose up")
print()
print("  # Run load test")
print("  python load_test.py")

30.2 5.2 Performance Benchmarking

# Performance comparison: pickle vs ONNX
import time
import joblib

print("\n" + "="*60)
print("PERFORMANCE BENCHMARKING")
print("="*60)

# Load both models
sklearn_model = joblib.load(model_dir / 'iris_model.pkl')
scaler = joblib.load(model_dir / 'iris_scaler.pkl')
session = ort.InferenceSession(str(model_dir / 'iris_model.onnx'))
input_name = session.get_inputs()[0].name
output_names = [o.name for o in session.get_outputs()]

# Prepare test data
test_data = X_test_scaled.astype(np.float32)
n_iterations = 1000

# Benchmark sklearn
print("\n⏱️ Benchmarking sklearn RandomForest...")
start = time.time()
for _ in range(n_iterations):
    sklearn_model.predict(test_data)
sklearn_time = time.time() - start

print(f"  Total time: {sklearn_time:.4f}s")
print(f"  Time per sample: {(sklearn_time/n_iterations/len(test_data))*1000:.4f}ms")

# Benchmark ONNX
print("\n⏱️ Benchmarking ONNX Runtime...")
start = time.time()
for _ in range(n_iterations):
    session.run(output_names, {input_name: test_data})
onnx_time = time.time() - start

print(f"  Total time: {onnx_time:.4f}s")
print(f"  Time per sample: {(onnx_time/n_iterations/len(test_data))*1000:.4f}ms")

# Calculate speedup
speedup = sklearn_time / onnx_time
print(f"\n✨ ONNX Speedup: {speedup:.2f}x faster than sklearn!")
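The reported figures are straightforward ratios. With hypothetical totals (illustrative numbers, not measured results), the per-sample latency and speedup are derived as:

```python
# Hypothetical benchmark totals, in seconds
n_iterations, n_samples = 1000, 30
sklearn_time, onnx_time = 2.40, 0.60

# Per-sample latency in ms, and ONNX speedup relative to sklearn
per_sample_ms = sklearn_time / n_iterations / n_samples * 1000
speedup = sklearn_time / onnx_time

print(round(per_sample_ms, 4), round(speedup, 2))  # → 0.08 4.0
```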

31 Summary

In this lab, you have learned:

31.1 Model Deployment Pipeline

  1. Training: Train and evaluate a model
  2. Optimization: Convert to the ONNX format
  3. API Development: Build a FastAPI application
  4. Containerization: Docker deployment
  5. Testing: Unit tests and load testing
  6. Monitoring: Performance benchmarking

31.2 Key Technologies

  • FastAPI: Modern, fast web framework for building APIs
  • ONNX: Standard format for model optimization
  • ONNX Runtime: Efficient inference engine
  • Docker: Container technology for deployment
  • Pytest: Unit testing framework

31.3 Best Practices

  • Input validation with Pydantic
  • Robust error handling
  • Performance monitoring and benchmarking
  • Containerization for reproducibility
  • Comprehensive testing before deployment

31.4 Next Steps

  1. Deploy the API to a production environment
  2. Implement monitoring and logging
  3. Set up a CI/CD pipeline
  4. Optimize for high-traffic scenarios
  5. Implement authentication and rate limiting
