---
title: "Lab 11: Model Deployment with FastAPI, ONNX, and Docker"
subtitle: "Production-Ready Machine Learning Model Serving"
author: "Machine Learning - Data Science for Cybersecurity"
date: today
format:
  html:
    toc: true
    toc-depth: 4
    toc-location: left
    number-sections: true
    number-depth: 4
    code-fold: false
    code-tools: true
    code-line-numbers: true
    code-copy: true
    theme: cosmo
    highlight-style: github
    fig-width: 10
    fig-height: 6
    fig-dpi: 300
    css: ../styles/lab-style.css
  pdf:
    toc: true
    number-sections: true
    colorlinks: true
    geometry:
      - top=20mm
      - left=20mm
      - right=20mm
      - bottom=20mm
    code-block-bg: "#f5f5f5"
    code-block-border-left: "#3498db"
jupyter: python3
execute:
  echo: true
  warning: false
  message: false
  cache: false
---
# Introduction {#sec-intro}

## Learning Objectives {#sec-objectives}

After completing this lab, you should be able to:

1. **Understand** the machine learning model deployment pipeline from development to production
2. **Convert** trained models to the ONNX format for optimization
3. **Build** a REST API with FastAPI for model serving
4. **Implement** containerization with Docker
5. **Evaluate** model performance in a production environment
6. **Optimize** the latency and throughput of model inference
7. **Write** unit tests and integration tests for the API
8. **Apply** best practices in production deployment
## Lab Overview {#sec-overview}

This lab focuses on **production-ready deployment** of machine learning models, covering:

- **Model Training**: Training a simple classification model
- **Model Optimization**: Converting to ONNX for better performance
- **API Development**: FastAPI for model serving
- **Containerization**: Docker for reproducibility
- **Testing & Monitoring**: Quality assurance in production

### Lab Duration

**2-3 hours**, broken down as:

- Setup & Installation: 20 minutes
- Model Training & Conversion: 30 minutes
- FastAPI Development: 45 minutes
- Docker Configuration: 30 minutes
- Testing & Optimization: 20 minutes

### Lab Structure
```{mermaid}
graph TD
A[Setup & Installation] --> B[Train Model]
B --> C[Convert to ONNX]
C --> D[Build FastAPI App]
D --> E[Create Dockerfile]
E --> F[Local Testing]
F --> G[Performance Optimization]
G --> H[Docker Deployment]
style A fill:#e6f3ff
style B fill:#ffe6e6
style C fill:#ffffcc
style D fill:#ccffcc
style E fill:#e6ccff
style F fill:#ffcccc
style G fill:#ccffe6
style H fill:#ffccff
```
## Environment Setup {#sec-setup}

### Install Dependencies

```{python}
import subprocess
import sys

packages = [
    'fastapi',
    'uvicorn',
    'pydantic',
    'scikit-learn',
    'pandas',
    'numpy',
    'onnx',
    'onnxruntime',
    'skl2onnx',
    'httpx',  # for testing
    'python-multipart'
]

for package in packages:
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q', package])

print("✓ All packages installed successfully!")
```
### Import Libraries

```{python}
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import json
import warnings
warnings.filterwarnings('ignore')

# Model training
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    f1_score, confusion_matrix, classification_report
)

# ONNX
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Utilities
from datetime import datetime
import joblib

print("✓ All imports successful!")
```
---
# PART 1: Model Training & Optimization {#sec-part1}
## 1.1 Dataset Preparation
### Load Iris Dataset
```{python}
# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
feature_names = iris.feature_names
target_names = iris.target_names

print(f"Dataset shape: {X.shape}")
print(f"Number of classes: {len(target_names)}")
print(f"Feature names: {feature_names}")
print(f"Target names: {target_names}")

# Create DataFrame
df = pd.DataFrame(X, columns=feature_names)
df['target'] = y
print(f"\nFirst 5 rows:\n{df.head()}")
```
### Train-Test Split
```{python}
# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Training set size: {X_train.shape[0]}")
print(f"Test set size: {X_test.shape[0]}")

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("✓ Data scaled successfully")
```
## 1.2 Model Training
```{python}
# Train model
model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    random_state=42,
    n_jobs=-1
)
model.fit(X_train_scaled, y_train)

# Evaluate
y_pred_train = model.predict(X_train_scaled)
y_pred_test = model.predict(X_test_scaled)

print("Training Metrics:")
print(f"  Accuracy: {accuracy_score(y_train, y_pred_train):.4f}")
print("\nTest Metrics:")
print(f"  Accuracy: {accuracy_score(y_test, y_pred_test):.4f}")
print(f"  Precision: {precision_score(y_test, y_pred_test, average='weighted'):.4f}")
print(f"  Recall: {recall_score(y_test, y_pred_test, average='weighted'):.4f}")
print(f"  F1-Score: {f1_score(y_test, y_pred_test, average='weighted'):.4f}")
```
### Save Trained Model
```{python}
# Create model directory
model_dir = Path('models')
model_dir.mkdir(exist_ok=True)

# Save model and scaler
model_path = model_dir / 'iris_model.pkl'
scaler_path = model_dir / 'iris_scaler.pkl'
joblib.dump(model, model_path)
joblib.dump(scaler, scaler_path)

print(f"✓ Model saved to {model_path}")
print(f"✓ Scaler saved to {scaler_path}")
```
## 1.3 Model Conversion to ONNX

ONNX (Open Neural Network Exchange) is a standard format for machine learning models that enables:

- Optimized inference
- Portability across platforms
- Hardware acceleration
- Typically faster inference than the pickle format

```{python}
# Define input type: batches of 4 float features
initial_type = [('float_input', FloatTensorType([None, 4]))]

# Convert model to ONNX. ZipMap is disabled so the probability output
# is a plain (n_samples, n_classes) array rather than a list of dicts,
# which is what the argmax-based code below expects.
onnx_model = convert_sklearn(
    model,
    initial_types=initial_type,
    options={id(model): {'zipmap': False}}
)

# Save ONNX model
onnx_path = model_dir / 'iris_model.onnx'
onnx.save_model(onnx_model, str(onnx_path))
print(f"✓ Model converted to ONNX and saved to {onnx_path}")
print(f"✓ ONNX file size: {onnx_path.stat().st_size / 1024:.2f} KB")

# Verify ONNX model
onnx_model_loaded = onnx.load(str(onnx_path))
onnx.checker.check_model(onnx_model_loaded)
print("✓ ONNX model verified successfully")
```
## 1.4 ONNX Model Inference Testing
```{python}
# Create ONNX Runtime session
sess = ort.InferenceSession(str(onnx_path))

# Get input/output names
input_name = sess.get_inputs()[0].name
output_names = [output.name for output in sess.get_outputs()]
print(f"Input name: {input_name}")
print(f"Output names: {output_names}")

# Prepare test data
X_test_float = X_test_scaled.astype(np.float32)

# Run inference
onnx_predictions = sess.run(
    output_names,
    {input_name: X_test_float}
)
print(f"\nONNX Predictions shape: {onnx_predictions[0].shape}")
print(f"First 5 predictions: {onnx_predictions[0][:5]}")

# Verify accuracy (output 0 = labels, output 1 = probabilities)
onnx_pred_classes = np.argmax(onnx_predictions[1], axis=1)
onnx_accuracy = accuracy_score(y_test, onnx_pred_classes)
print(f"\n✓ ONNX Model Accuracy: {onnx_accuracy:.4f}")
print(f"✓ Matches sklearn model: {np.allclose(y_pred_test, onnx_pred_classes)}")
```
---
# PART 2: FastAPI Model Serving {#sec-part2}
## 2.1 Create FastAPI Application Structure
```{python}
# Create application directory structure
app_dir = Path('fastapi_app')
app_dir.mkdir(exist_ok=True)

print(f"Created application directory: {app_dir}")
print("Directory structure:")
print("  fastapi_app/")
print("  ├── main.py")
print("  ├── models.py")
print("  ├── utils.py")
print("  ├── config.py")
print("  └── requirements.txt")
```
## 2.2 Configuration Module
```{python}
# Create config.py
config_content = '''"""
Configuration module for FastAPI application
"""
from pathlib import Path

# Paths
BASE_DIR = Path(__file__).parent.parent
MODELS_DIR = BASE_DIR / "models"

# Model configuration
MODEL_PATH = MODELS_DIR / "iris_model.onnx"
SCALER_PATH = MODELS_DIR / "iris_scaler.pkl"

# Feature configuration
FEATURE_NAMES = [
    "sepal_length_cm",
    "sepal_width_cm",
    "petal_length_cm",
    "petal_width_cm"
]
TARGET_NAMES = ["setosa", "versicolor", "virginica"]

# API configuration
API_TITLE = "Iris Model API"
API_DESCRIPTION = "Production-ready ML model serving API"
API_VERSION = "1.0.0"

# Server configuration
HOST = "0.0.0.0"
PORT = 8000
DEBUG = False
'''

config_path = app_dir / 'config.py'
config_path.write_text(config_content)
print(f"✓ Created {config_path}")
```
## 2.3 Data Models
```{python}
# Create models.py for Pydantic models
models_content = '''"""
Pydantic data models for request/response validation
"""
from pydantic import BaseModel, Field
from typing import List


class IrisFeatures(BaseModel):
    """Input features for iris prediction"""
    sepal_length_cm: float = Field(..., gt=0, le=10)
    sepal_width_cm: float = Field(..., gt=0, le=10)
    petal_length_cm: float = Field(..., ge=0, le=10)
    petal_width_cm: float = Field(..., ge=0, le=10)

    # Pydantic v2 style (schema_extra in an inner Config class is the
    # deprecated v1 style; requirements.txt pins pydantic 2.x)
    model_config = {
        "json_schema_extra": {
            "example": {
                "sepal_length_cm": 5.1,
                "sepal_width_cm": 3.5,
                "petal_length_cm": 1.4,
                "petal_width_cm": 0.2
            }
        }
    }


class IrisBatchFeatures(BaseModel):
    """Batch input features"""
    samples: List[IrisFeatures]


class PredictionResponse(BaseModel):
    """Single prediction response"""
    input_features: dict
    predicted_class: str
    predicted_label: int
    confidence: float
    probabilities: dict
    inference_time_ms: float


class BatchPredictionResponse(BaseModel):
    """Batch prediction response"""
    predictions: List[PredictionResponse]
    total_inference_time_ms: float
    samples_count: int


class HealthResponse(BaseModel):
    """Health check response"""
    status: str
    version: str
    models_loaded: bool
'''

models_path = app_dir / 'models.py'
models_path.write_text(models_content)
print(f"✓ Created {models_path}")
```
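The `Field` bounds above are what later turn malformed requests into 422 responses before any model code runs. The effect can be seen with Pydantic alone, outside FastAPI; a minimal standalone check (pydantic v2 style, matching the pinned version):

```python
from pydantic import BaseModel, Field, ValidationError

class IrisFeatures(BaseModel):
    sepal_length_cm: float = Field(..., gt=0, le=10)
    sepal_width_cm: float = Field(..., gt=0, le=10)
    petal_length_cm: float = Field(..., ge=0, le=10)
    petal_width_cm: float = Field(..., ge=0, le=10)

# In-range input passes validation
ok = IrisFeatures(sepal_length_cm=5.1, sepal_width_cm=3.5,
                  petal_length_cm=1.4, petal_width_cm=0.2)
print(ok.model_dump())

# Out-of-range input raises ValidationError (FastAPI maps this to 422)
try:
    IrisFeatures(sepal_length_cm=-5.1, sepal_width_cm=3.5,
                 petal_length_cm=1.4, petal_width_cm=0.2)
    rejected = False
except ValidationError:
    rejected = True
print("rejected:", rejected)
```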
## 2.4 Utilities Module
```{python}
# Create utils.py for helper functions
utils_content = '''"""
Utility functions for model inference
"""
import numpy as np
import onnxruntime as ort
import joblib
from typing import Dict, Tuple

from config import MODEL_PATH, SCALER_PATH, TARGET_NAMES


class ModelInference:
    """Singleton class for model inference"""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._initialized = False
        return cls._instance

    def __init__(self):
        if self._initialized:
            return
        # Load ONNX model
        self.session = ort.InferenceSession(str(MODEL_PATH))
        self.input_name = self.session.get_inputs()[0].name
        self.output_names = [o.name for o in self.session.get_outputs()]
        # Load scaler
        self.scaler = joblib.load(str(SCALER_PATH))
        self._initialized = True

    def predict(self, features: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        """
        Make predictions using the ONNX model

        Args:
            features: numpy array of shape (n_samples, 4)

        Returns:
            predicted_classes, probabilities
        """
        # Scale features
        features_scaled = self.scaler.transform(features)
        features_scaled = features_scaled.astype(np.float32)

        # Run inference
        outputs = self.session.run(
            self.output_names,
            {self.input_name: features_scaled}
        )

        # Extract predictions and probabilities
        probabilities = outputs[1]  # class probabilities
        predictions = np.argmax(probabilities, axis=1)
        return predictions, probabilities


def get_model() -> ModelInference:
    """Get model inference singleton"""
    return ModelInference()


def format_prediction(
    features_dict: Dict,
    predicted_class: int,
    probabilities: np.ndarray,
    inference_time_ms: float
) -> Dict:
    """Format prediction results for one sample.

    probabilities is expected to have shape (1, n_classes).
    """
    class_name = TARGET_NAMES[predicted_class]
    confidence = float(probabilities[0, predicted_class])
    probs_dict = {
        TARGET_NAMES[i]: float(probabilities[0, i])
        for i in range(len(TARGET_NAMES))
    }
    return {
        "input_features": features_dict,
        "predicted_class": class_name,
        "predicted_label": int(predicted_class),
        "confidence": confidence,
        "probabilities": probs_dict,
        "inference_time_ms": inference_time_ms
    }
'''

utils_path = app_dir / 'utils.py'
utils_path.write_text(utils_content)
print(f"✓ Created {utils_path}")
```
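`ModelInference` above uses the classic Python singleton idiom: `__new__` always hands back the one shared instance, and a flag in `__init__` ensures the expensive setup (creating the ONNX session, loading the scaler) runs only once per process. A stripped-down, dependency-free sketch of the same idiom, with a counter standing in for the expensive load:

```python
class ExpensiveResource:
    """Singleton: the costly setup in __init__ runs only once per process."""

    _instance = None
    init_calls = 0  # counts how often the "expensive" setup actually ran

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._initialized = False
        return cls._instance

    def __init__(self):
        if self._initialized:
            return  # already set up; skip the expensive work
        ExpensiveResource.init_calls += 1  # e.g. building an ONNX session
        self._initialized = True

a = ExpensiveResource()
b = ExpensiveResource()
print(a is b, ExpensiveResource.init_calls)  # → True 1
```

The `_initialized` guard matters because Python calls `__init__` on every construction, even when `__new__` returned the cached instance.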
## 2.5 Main FastAPI Application
```{python}
# Create main.py
main_content = '''"""
FastAPI application for iris model serving
"""
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
import time
import numpy as np

from config import (
    API_TITLE, API_DESCRIPTION, API_VERSION,
    FEATURE_NAMES, TARGET_NAMES
)
from models import (
    IrisFeatures, IrisBatchFeatures,
    PredictionResponse, BatchPredictionResponse,
    HealthResponse
)
from utils import get_model, format_prediction

# Create FastAPI app
app = FastAPI(
    title=API_TITLE,
    description=API_DESCRIPTION,
    version=API_VERSION
)

# Add CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.on_event("startup")
async def startup_event():
    """Load model on startup"""
    try:
        get_model()
        print("✓ Model loaded successfully on startup")
    except Exception as e:
        print(f"✗ Error loading model: {e}")
        raise


@app.get("/health", response_model=HealthResponse)
async def health_check():
    """Health check endpoint"""
    try:
        get_model()
        return HealthResponse(
            status="healthy",
            version=API_VERSION,
            models_loaded=True
        )
    except Exception as e:
        return JSONResponse(
            status_code=503,
            content={
                "status": "unhealthy",
                "version": API_VERSION,
                "models_loaded": False,
                "error": str(e)
            }
        )


@app.get("/info")
async def get_info():
    """Get API information"""
    return {
        "api_title": API_TITLE,
        "api_version": API_VERSION,
        "feature_names": FEATURE_NAMES,
        "target_classes": TARGET_NAMES,
        "description": API_DESCRIPTION
    }


@app.post("/predict", response_model=PredictionResponse)
async def predict(features: IrisFeatures):
    """
    Make a single prediction

    Example request:
    {
        "sepal_length_cm": 5.1,
        "sepal_width_cm": 3.5,
        "petal_length_cm": 1.4,
        "petal_width_cm": 0.2
    }
    """
    try:
        # Extract features
        feature_values = np.array([
            features.sepal_length_cm,
            features.sepal_width_cm,
            features.petal_length_cm,
            features.petal_width_cm
        ]).reshape(1, -1)

        # Measure inference time
        start_time = time.time()
        model = get_model()
        predictions, probabilities = model.predict(feature_values)
        inference_time = (time.time() - start_time) * 1000

        # Format response
        result = format_prediction(
            features.model_dump(),
            predictions[0],
            probabilities,
            inference_time
        )
        return PredictionResponse(**result)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


@app.post("/predict/batch", response_model=BatchPredictionResponse)
async def predict_batch(batch_features: IrisBatchFeatures):
    """
    Make batch predictions

    Example request:
    {
        "samples": [
            {"sepal_length_cm": 5.1, "sepal_width_cm": 3.5,
             "petal_length_cm": 1.4, "petal_width_cm": 0.2},
            {"sepal_length_cm": 7.0, "sepal_width_cm": 3.2,
             "petal_length_cm": 4.7, "petal_width_cm": 1.4}
        ]
    }
    """
    if not batch_features.samples:
        raise HTTPException(status_code=400, detail="Empty samples list")
    try:
        # Extract features
        feature_values = np.array([
            [
                s.sepal_length_cm,
                s.sepal_width_cm,
                s.petal_length_cm,
                s.petal_width_cm
            ]
            for s in batch_features.samples
        ])

        # Measure inference time
        start_time = time.time()
        model = get_model()
        predictions, probabilities = model.predict(feature_values)
        total_inference_time = (time.time() - start_time) * 1000

        # Format responses; pass each sample its OWN probability row,
        # sliced as [i:i+1] to keep it 2-D for format_prediction
        results = []
        for i, sample in enumerate(batch_features.samples):
            result = format_prediction(
                sample.model_dump(),
                predictions[i],
                probabilities[i:i + 1],
                total_inference_time / len(batch_features.samples)
            )
            results.append(PredictionResponse(**result))

        return BatchPredictionResponse(
            predictions=results,
            total_inference_time_ms=total_inference_time,
            samples_count=len(batch_features.samples)
        )
    except HTTPException:
        raise  # preserve intended status codes (e.g. 400)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000, workers=1)
'''

main_path = app_dir / 'main.py'
main_path.write_text(main_content)
print(f"✓ Created {main_path}")
```
## 2.6 Requirements File
```{python}
# Create requirements.txt for FastAPI app
requirements_content = '''fastapi==0.104.1
uvicorn[standard]==0.24.0
pydantic==2.5.0
scikit-learn==1.3.2
pandas==2.1.3
numpy==1.24.3
onnx==1.15.0
onnxruntime==1.17.0
skl2onnx==1.16.0
joblib==1.3.2
httpx==0.25.1
python-multipart==0.0.6
'''

req_path = app_dir / 'requirements.txt'
req_path.write_text(requirements_content)
print(f"✓ Created {req_path}")
```
---
# PART 3: Docker Containerization {#sec-part3}
## 3.1 Create Dockerfile
```{python}
# Create Dockerfile
dockerfile_content = '''FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \\
    build-essential \\
    && rm -rf /var/lib/apt/lists/*

# Copy requirements
COPY fastapi_app/requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY fastapi_app/ .

# config.py resolves models/ one level above the app directory;
# with the app at /app, that means the models belong at /models
COPY models/ /models/

# Expose port
EXPOSE 8000

# Health check (stdlib only; `requests` is not in requirements.txt)
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \\
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

# Run application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
'''

dockerfile_path = Path('Dockerfile')
dockerfile_path.write_text(dockerfile_content)
print("✓ Created Dockerfile")
```
## 3.2 Create Docker Compose (Optional)
```{python}
# Create docker-compose.yml for easy local testing
docker_compose_content = '''version: '3.8'

services:
  iris-api:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    volumes:
      # config.py inside the container resolves models/ at /models
      - ./models:/models:ro
    environment:
      - PYTHONUNBUFFERED=1
    restart: unless-stopped
    healthcheck:
      # python:3.11-slim does not ship curl, so use the stdlib instead
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
'''

compose_path = Path('docker-compose.yml')
compose_path.write_text(docker_compose_content)
print("✓ Created docker-compose.yml")
```
## 3.3 .dockerignore File
```{python}
# Create .dockerignore
dockerignore_content = '''__pycache__
*.pyc
*.pyo
*.egg-info
.env
.venv
.git
.gitignore
.docker
.dockerignore
node_modules
*.md
.pytest_cache
.coverage
htmlcov
dist
build
*.log
tests/
.vscode
.idea
'''
dockerignore_path = Path('.dockerignore')
dockerignore_path.write_text(dockerignore_content)
print("✓ Created .dockerignore")
```
---
# PART 4: Testing & Quality Assurance {#sec-part4}
## 4.1 Unit Tests
```{python}
# Create tests directory and test_api.py
tests_dir = Path('tests')
tests_dir.mkdir(exist_ok=True)

test_api_content = '''"""
Unit tests for FastAPI application
"""
import sys
from pathlib import Path

import pytest

# Add the app directory to the import path
sys.path.insert(0, str(Path(__file__).parent.parent / "fastapi_app"))

from fastapi.testclient import TestClient
from main import app
from config import TARGET_NAMES

client = TestClient(app)


class TestHealthEndpoint:
    """Test health check endpoint"""

    def test_health_check_returns_200(self):
        response = client.get("/health")
        assert response.status_code == 200

    def test_health_response_structure(self):
        response = client.get("/health")
        data = response.json()
        assert "status" in data
        assert "version" in data
        assert "models_loaded" in data
        assert data["status"] == "healthy"


class TestInfoEndpoint:
    """Test info endpoint"""

    def test_get_info(self):
        response = client.get("/info")
        assert response.status_code == 200
        data = response.json()
        assert "api_title" in data
        assert "feature_names" in data
        assert "target_classes" in data
        assert len(data["feature_names"]) == 4
        assert len(data["target_classes"]) == 3


class TestPredictEndpoint:
    """Test single prediction endpoint"""

    def test_valid_prediction(self):
        """Test valid prediction request"""
        payload = {
            "sepal_length_cm": 5.1,
            "sepal_width_cm": 3.5,
            "petal_length_cm": 1.4,
            "petal_width_cm": 0.2
        }
        response = client.post("/predict", json=payload)
        assert response.status_code == 200
        data = response.json()
        assert "predicted_class" in data
        assert "predicted_label" in data
        assert "confidence" in data
        assert "probabilities" in data
        assert "inference_time_ms" in data
        # Verify prediction is valid
        assert data["predicted_label"] in [0, 1, 2]
        assert data["predicted_class"] in TARGET_NAMES
        assert 0 <= data["confidence"] <= 1

    def test_invalid_sepal_length(self):
        """Test invalid sepal length (negative)"""
        payload = {
            "sepal_length_cm": -5.1,
            "sepal_width_cm": 3.5,
            "petal_length_cm": 1.4,
            "petal_width_cm": 0.2
        }
        response = client.post("/predict", json=payload)
        assert response.status_code == 422  # Validation error

    def test_setosa_prediction(self):
        """Test setosa iris prediction"""
        payload = {
            "sepal_length_cm": 5.1,
            "sepal_width_cm": 3.5,
            "petal_length_cm": 1.4,
            "petal_width_cm": 0.2
        }
        response = client.post("/predict", json=payload)
        assert response.status_code == 200
        data = response.json()
        # Setosa has distinctly small petals
        assert data["predicted_class"] == "setosa"

    def test_virginica_prediction(self):
        """Test virginica iris prediction"""
        payload = {
            "sepal_length_cm": 7.9,
            "sepal_width_cm": 3.8,
            "petal_length_cm": 6.4,
            "petal_width_cm": 2.0
        }
        response = client.post("/predict", json=payload)
        assert response.status_code == 200
        data = response.json()
        assert data["predicted_class"] == "virginica"


class TestBatchPredictEndpoint:
    """Test batch prediction endpoint"""

    def test_valid_batch_prediction(self):
        """Test valid batch prediction"""
        payload = {
            "samples": [
                {
                    "sepal_length_cm": 5.1,
                    "sepal_width_cm": 3.5,
                    "petal_length_cm": 1.4,
                    "petal_width_cm": 0.2
                },
                {
                    "sepal_length_cm": 7.0,
                    "sepal_width_cm": 3.2,
                    "petal_length_cm": 4.7,
                    "petal_width_cm": 1.4
                }
            ]
        }
        response = client.post("/predict/batch", json=payload)
        assert response.status_code == 200
        data = response.json()
        assert "predictions" in data
        assert "total_inference_time_ms" in data
        assert "samples_count" in data
        assert len(data["predictions"]) == 2
        assert data["samples_count"] == 2

    def test_empty_batch_prediction(self):
        """Test empty batch prediction"""
        payload = {"samples": []}
        response = client.post("/predict/batch", json=payload)
        assert response.status_code == 400


class TestInferenceTime:
    """Test inference time measurements"""

    def test_single_prediction_time(self):
        """Test that inference time is reasonable"""
        payload = {
            "sepal_length_cm": 5.1,
            "sepal_width_cm": 3.5,
            "petal_length_cm": 1.4,
            "petal_width_cm": 0.2
        }
        response = client.post("/predict", json=payload)
        data = response.json()
        # Should complete in less than 100ms
        assert data["inference_time_ms"] < 100

    def test_batch_prediction_time(self):
        """Test batch prediction inference time"""
        payload = {
            "samples": [
                {
                    "sepal_length_cm": 5.1,
                    "sepal_width_cm": 3.5,
                    "petal_length_cm": 1.4,
                    "petal_width_cm": 0.2
                }
                for _ in range(10)
            ]
        }
        response = client.post("/predict/batch", json=payload)
        data = response.json()
        # Should complete in less than 200ms
        assert data["total_inference_time_ms"] < 200


if __name__ == "__main__":
    pytest.main([__file__, "-v"])
'''

test_path = tests_dir / 'test_api.py'
test_path.write_text(test_api_content)
print(f"✓ Created {test_path}")

# Create __init__.py
init_path = tests_dir / '__init__.py'
init_path.write_text('')
print(f"✓ Created {init_path}")
```
## 4.2 Load Testing Script
```{python}
# Create load testing script
load_test_content = '''"""
Load testing script for API performance evaluation
"""
import time
import statistics
from concurrent.futures import ThreadPoolExecutor, as_completed

import httpx  # already in this lab's dependencies (requests is not)


class APILoadTester:
    """Load testing for FastAPI application"""

    def __init__(self, base_url="http://localhost:8000"):
        self.base_url = base_url
        self.results = []

    def single_request(self):
        """Make single prediction request"""
        payload = {
            "sepal_length_cm": 5.1,
            "sepal_width_cm": 3.5,
            "petal_length_cm": 1.4,
            "petal_width_cm": 0.2
        }
        start = time.time()
        try:
            response = httpx.post(
                f"{self.base_url}/predict",
                json=payload,
                timeout=5
            )
            elapsed = (time.time() - start) * 1000
            return {
                "status": response.status_code,
                "time_ms": elapsed,
                "success": response.status_code == 200
            }
        except Exception as e:
            elapsed = (time.time() - start) * 1000
            return {
                "status": 0,
                "time_ms": elapsed,
                "success": False,
                "error": str(e)
            }

    def run_load_test(self, num_requests=100, num_workers=10):
        """Run load test with specified parameters"""
        print("Starting load test...")
        print(f"  Total requests: {num_requests}")
        print(f"  Concurrent workers: {num_workers}")

        start_time = time.time()
        with ThreadPoolExecutor(max_workers=num_workers) as executor:
            futures = [
                executor.submit(self.single_request)
                for _ in range(num_requests)
            ]
            for i, future in enumerate(as_completed(futures)):
                self.results.append(future.result())
                if (i + 1) % 10 == 0:
                    print(f"  Completed: {i + 1}/{num_requests}")

        total_time = time.time() - start_time
        self.print_results(total_time)

    def print_results(self, total_time):
        """Print load test results"""
        successful = [r for r in self.results if r["success"]]
        failed = [r for r in self.results if not r["success"]]
        times = [r["time_ms"] for r in successful]

        print(f"\\n{'='*50}")
        print("LOAD TEST RESULTS")
        print(f"{'='*50}")
        print(f"Total Requests: {len(self.results)}")
        print(f"Successful: {len(successful)} ({len(successful)/len(self.results)*100:.1f}%)")
        print(f"Failed: {len(failed)} ({len(failed)/len(self.results)*100:.1f}%)")
        print(f"Total Time: {total_time:.2f} s")
        print(f"Throughput: {len(self.results)/total_time:.2f} requests/sec")

        if times:
            print("\\nResponse Times (ms):")
            print(f"  Min: {min(times):.2f}")
            print(f"  Max: {max(times):.2f}")
            print(f"  Mean: {statistics.mean(times):.2f}")
            print(f"  Median: {statistics.median(times):.2f}")
            print(f"  P95: {sorted(times)[int(len(times)*0.95)]:.2f}")
            print(f"  P99: {sorted(times)[int(len(times)*0.99)]:.2f}")

        if failed:
            print("\\nErrors:")
            for error in set(f["error"] for f in failed if "error" in f):
                print(f"  - {error}")


if __name__ == "__main__":
    tester = APILoadTester()
    print("Make sure the FastAPI server is running!")
    print("Run: uvicorn fastapi_app.main:app --reload")
    print()

    # Run load test
    tester.run_load_test(num_requests=100, num_workers=10)
'''

load_test_path = Path('load_test.py')
load_test_path.write_text(load_test_content)
print(f"✓ Created {load_test_path}")
```
---
# PART 5: Deployment & Monitoring {#sec-part5}
## 5.1 Build Summary
```{python}
# Summarize all created files
print("\n" + "=" * 60)
print("DEPLOYMENT PACKAGE CREATED SUCCESSFULLY")
print("=" * 60)

print("\n📁 Directory Structure:")
print("""
.
├── models/
│   ├── iris_model.pkl
│   ├── iris_scaler.pkl
│   └── iris_model.onnx
├── fastapi_app/
│   ├── config.py
│   ├── models.py
│   ├── utils.py
│   ├── main.py
│   └── requirements.txt
├── tests/
│   ├── __init__.py
│   └── test_api.py
├── Dockerfile
├── docker-compose.yml
├── .dockerignore
└── load_test.py
""")

print("\n📊 Model Files:")
print("  ✓ iris_model.pkl (Python pickle)")
print("  ✓ iris_scaler.pkl (StandardScaler)")
print("  ✓ iris_model.onnx (ONNX format)")

print("\n🚀 API Endpoints:")
print("  GET  /health         - Health check")
print("  GET  /info           - API information")
print("  POST /predict        - Single prediction")
print("  POST /predict/batch  - Batch predictions")

print("\n🔧 Available Commands:")
print("  # Run FastAPI server")
print("  uvicorn fastapi_app.main:app --reload --host 0.0.0.0 --port 8000")
print()
print("  # Run tests")
print("  pytest tests/test_api.py -v")
print()
print("  # Build Docker image")
print("  docker build -t iris-api:latest .")
print()
print("  # Run with Docker")
print("  docker-compose up")
print()
print("  # Run load test")
print("  python load_test.py")
```
## 5.2 Performance Benchmarking
```{python}
# Performance comparison: pickle vs ONNX
import time

print("\n" + "=" * 60)
print("PERFORMANCE BENCHMARKING")
print("=" * 60)

# Load both models (joblib and ort are already imported above)
sklearn_model = joblib.load(model_dir / 'iris_model.pkl')
scaler = joblib.load(model_dir / 'iris_scaler.pkl')
session = ort.InferenceSession(str(model_dir / 'iris_model.onnx'))
input_name = session.get_inputs()[0].name
output_names = [o.name for o in session.get_outputs()]

# Prepare test data
test_data = X_test_scaled.astype(np.float32)
n_iterations = 1000

# Benchmark sklearn
print("\n⏱️ Benchmarking sklearn RandomForest...")
start = time.time()
for _ in range(n_iterations):
    sklearn_model.predict(test_data)
sklearn_time = time.time() - start
print(f"  Total time: {sklearn_time:.4f} s")
print(f"  Time per sample: {(sklearn_time / n_iterations / len(test_data)) * 1000:.4f} ms")

# Benchmark ONNX
print("\n⏱️ Benchmarking ONNX Runtime...")
start = time.time()
for _ in range(n_iterations):
    session.run(output_names, {input_name: test_data})
onnx_time = time.time() - start
print(f"  Total time: {onnx_time:.4f} s")
print(f"  Time per sample: {(onnx_time / n_iterations / len(test_data)) * 1000:.4f} ms")

# Relative speed (values above 1 mean ONNX was faster on this machine)
speedup = sklearn_time / onnx_time
print(f"\n✨ ONNX vs sklearn speedup: {speedup:.2f}x")
```
---
# Summary {#sec-summary}

In this lab, you have learned:

## Model Deployment Pipeline

1. **Training**: Training and evaluating a model
2. **Optimization**: Converting to the ONNX format
3. **API Development**: Building a FastAPI application
4. **Containerization**: Docker deployment
5. **Testing**: Unit tests and load testing
6. **Monitoring**: Performance benchmarking

## Key Technologies

- **FastAPI**: Modern, fast web framework for APIs
- **ONNX**: Standard format for model optimization
- **ONNX Runtime**: Efficient inference engine
- **Docker**: Container technology for deployment
- **Pytest**: Unit testing framework

## Best Practices

- Input validation with Pydantic
- Robust error handling
- Performance monitoring and benchmarking
- Containerization for reproducibility
- Comprehensive testing before deployment

## Next Steps

1. Deploy the API to a production environment
2. Implement monitoring and logging
3. Set up a CI/CD pipeline
4. Optimize for high-traffic scenarios
5. Implement authentication and rate limiting
---
# References {#sec-references}

- [FastAPI Documentation](https://fastapi.tiangolo.com/)
- [ONNX Documentation](https://onnx.ai/)
- [Docker Documentation](https://docs.docker.com/)
- [scikit-learn Documentation](https://scikit-learn.org/)
- [ONNX Runtime Documentation](https://onnxruntime.ai/)