Dari Development ke Production: ML Lifecycle, Containerization & Deployment Strategies
Bab 11: MLOps & Model Deployment
🎯 Hasil Pembelajaran (Learning Outcomes)
Setelah mempelajari bab ini, Anda akan mampu:
Memahami ML lifecycle dan prinsip-prinsip MLOps
Mengimplementasikan containerization dengan Docker untuk ML applications
Membangun REST API untuk model serving menggunakan FastAPI dan Flask
Menerapkan model optimization dengan ONNX Runtime
Merancang deployment strategies (batch vs real-time, cloud vs edge)
Mengintegrasikan monitoring dan logging untuk production systems
Mengelola model versioning dan CI/CD pipelines untuk ML
11.1 MLOps: Bridging ML Development dan Production
11.1.1 Dari Jupyter Notebook ke Production
The Reality Check:
Anda telah berhasil membuat model dengan 95% accuracy di Jupyter Notebook. Tapi… bagaimana cara menyajikannya ke users? Ini adalah salah satu tantangan terbesar dalam machine learning!
Production Gap:
```mermaid
flowchart LR
    A["Jupyter Notebook<br/>95% Accuracy"] --> B{"Production Gap"}
    B --> C["Scalability?"]
    B --> D["Latency?"]
    B --> E["Reliability?"]
    B --> F["Monitoring?"]
    B --> G["Updates?"]
    C --> H["Challenges"]
    D --> H
    E --> H
    F --> H
    G --> H
    H --> I["MLOps Solutions"]
    style A fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style B fill:#ff6b6b,stroke:#c92a2a,stroke-width:3px,color:#fff
    style C fill:#fff3cd,stroke:#856404,stroke-width:1px
    style D fill:#fff3cd,stroke:#856404,stroke-width:1px
    style E fill:#fff3cd,stroke:#856404,stroke-width:1px
    style F fill:#fff3cd,stroke:#856404,stroke-width:1px
    style G fill:#fff3cd,stroke:#856404,stroke-width:1px
    style H fill:#ff6b6b,stroke:#c92a2a,stroke-width:2px,color:#fff
    style I fill:#51cf66,stroke:#2b8a3e,stroke-width:2px
```
Figure 11.1: Production Gap - tantangan yang dihadapi saat membawa model dari Jupyter Notebook ke production
Statistik Mengejutkan:
87% model ML tidak pernah masuk production (VentureBeat, 2023)
80% data science project gagal deliver business value
Average time dari prototype ke production: 6-12 bulan
⚠️ Common Pitfalls
Kesalahan Umum:
“Works on my machine” syndrome
Tidak ada model versioning
Manual deployment process
Tidak ada monitoring
Hardcoded configurations
Tidak scalable
11.1.2 Apa itu MLOps?
MLOps = Machine Learning Operations: Praktik dan kultur untuk mengotomasi dan mempercepat ML lifecycle dari development hingga production.
Prinsip utama MLOps:
Automation: Mengotomasi training, testing, dan deployment
Reproducibility: Dapat mereproduksi hasil experiments
Collaboration: Data scientists, engineers, dan operations bekerja sama
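Contoh kecil yang mendukung prinsip reproducibility: menetapkan random seed secara eksplisit di awal setiap experiment. Sketsa minimal berikut hanyalah ilustrasi (nama fungsi `set_seed` bukan API standar):

```python
import os
import random
import numpy as np

def set_seed(seed: int = 42) -> None:
    """Set seed untuk semua sumber randomness agar hasil experiment dapat direproduksi."""
    random.seed(seed)                          # modul random bawaan Python
    np.random.seed(seed)                       # NumPy
    os.environ["PYTHONHASHSEED"] = str(seed)   # hashing Python

# Dua kali sampling dengan seed yang sama menghasilkan nilai identik
set_seed(42)
sample_a = np.random.rand(3)

set_seed(42)
sample_b = np.random.rand(3)

print(np.allclose(sample_a, sample_b))  # True
```

Untuk model scikit-learn, lengkapi juga dengan parameter `random_state` seperti pada contoh training di bagian 11.3.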
11.1.3 ML Lifecycle
End-to-End ML Lifecycle:
```mermaid
graph TB
    A[Problem Definition] --> B[Data Collection]
    B --> C[EDA & Analysis]
    C --> D[Data Preprocessing]
    D --> E[Feature Engineering]
    E --> F[Model Training]
    F --> G[Model Evaluation]
    G --> H{Good Enough?}
    H -->|No| E
    H -->|Yes| I[Model Deployment]
    I --> J[Monitoring]
    J --> K{Drift Detected?}
    K -->|Yes| B
    K -->|No| J
    style A fill:#ffd93d
    style I fill:#6bcf7f
    style J fill:#4d96ff
    style K fill:#ff6b6b
```
Perbandingan Traditional Software vs ML Systems:

| Aspek | Traditional Software | ML Systems |
|---|---|---|
| Code | Deterministic | Probabilistic |
| Testing | Unit tests cukup | Need data validation, model tests |
| Deployment | Deploy code | Deploy code + data + model |
| Monitoring | Log errors | Monitor metrics + data drift + model drift |
| Updates | Fix bugs | Retrain dengan data baru |
| Dependencies | Libraries | Libraries + data + infrastructure |
11.1.4 MLOps Maturity Levels
Level 0: Manual Process - Everything manual - Jupyter notebooks - No automation - Problem: Tidak scalable, tidak reproducible
Level 1: ML Pipeline Automation - Automated training pipeline - Model versioning - Basic CI/CD - Better: Reproducible, tapi deployment masih manual
Level 2: CI/CD Pipeline Automation - Automated training, testing, deployment - Monitoring dan alerts - Data validation - Production-ready: Full automation
Di chapter ini, kita akan belajar fondasi untuk mencapai Level 2 MLOps:
Container-based deployment (Docker)
REST API untuk model serving
Model optimization (ONNX)
Basic monitoring
11.2 Containerization dengan Docker
11.2.1 Mengapa Docker untuk ML?
Problem: “Works on my machine”
Data Scientist: "Model saya works perfectly!"
Engineer: "Di server production error terus..."
DevOps: "Python version berbeda, library incompatible..."
Solution: Docker Containers
Docker memastikan environment consistency dari development hingga production.
Analogi: Shipping Container
Seperti shipping container di pelabuhan:
Standardized: Same format, works everywhere
Isolated: Content tidak affect luar
Portable: Bisa pindah dari truck → ship → train
```mermaid
graph LR
    A[Development<br/>Laptop] -->|Docker Image| B[Testing<br/>Server]
    B -->|Same Image| C[Production<br/>Cloud]
    C -->|Same Image| D[Edge<br/>Device]
    style A fill:#4ecdc4
    style B fill:#45b7d1
    style C fill:#51cf66
    style D fill:#ffd93d
```
Keuntungan Docker untuk ML:
Reproducibility: Exact same environment
Isolation: Dependencies tidak conflict
Portability: Run anywhere (local, cloud, edge)
Scalability: Easy to scale dengan orchestration (Kubernetes)
Version Control: Docker images dapat di-version
11.2.2 Docker Basics untuk ML
Key Concepts:
Image: Template untuk container (like a class)
Container: Running instance of image (like an object)
Dockerfile: Recipe untuk build image
Docker Hub: Registry untuk share images
Basic Docker Commands:
```bash
# Build image
docker build -t my-ml-model:v1 .

# Run container
docker run -p 8000:8000 my-ml-model:v1

# List containers
docker ps

# Stop container
docker stop <container-id>

# Remove container
docker rm <container-id>

# List images
docker images

# Remove image
docker rmi my-ml-model:v1
```
11.2.3 Dockerfile untuk ML Model
Example: Dockerfile untuk Scikit-learn Model
```dockerfile
# Base image dengan Python
FROM python:3.10-slim

# Set working directory
WORKDIR /app

# Copy requirements
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose port
EXPOSE 8000

# Run application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```
Penjelasan:
FROM: Base image (Python 3.10 slim version)
WORKDIR: Set working directory di container
COPY: Copy files dari host ke container
RUN: Execute commands (install dependencies)
EXPOSE: Document port yang digunakan
CMD: Default command saat container start
Best Practices Dockerfile:
```dockerfile
# ✅ Good: Multi-stage build untuk size optimization
FROM python:3.10-slim AS builder

WORKDIR /app

# Install dependencies in builder stage
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Runtime stage
FROM python:3.10-slim

WORKDIR /app

# Use non-root user untuk security
RUN useradd -m myuser

# Copy only necessary files
# (ke home user non-root, agar tetap terbaca setelah USER myuser)
COPY --from=builder --chown=myuser:myuser /root/.local /home/myuser/.local
COPY --chown=myuser:myuser main.py .
COPY --chown=myuser:myuser model.pkl .

# Make sure scripts in .local are usable
ENV PATH=/home/myuser/.local/bin:$PATH

EXPOSE 8000

USER myuser

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```
Docker Compose commands (untuk multi-service apps):

```bash
# Start all services
docker-compose up -d

# View logs
docker-compose logs -f ml-api

# Stop all services
docker-compose down

# Stop and remove volumes
docker-compose down -v
```
11.3 Model Serving dengan FastAPI
11.3.1 Mengapa FastAPI?
FastAPI adalah modern, fast web framework untuk building APIs dengan Python.
Keunggulan FastAPI:
Performance: Setara dengan NodeJS dan Go (thanks to Starlette + Pydantic)
Type Hints: Automatic validation dan documentation
Async Support: Native async/await support
Auto Docs: Interactive API docs (Swagger UI)
Easy to Learn: Intuitive, mirip Flask tapi lebih powerful
Performance Comparison:
```python
import matplotlib.pyplot as plt
import numpy as np

frameworks = ['Flask', 'Django', 'FastAPI', 'Go Gin', 'NodeJS\nExpress']
requests_per_sec = [1200, 800, 18000, 25000, 15000]
colors = ['#3498db', '#e74c3c', '#2ecc71', '#f39c12', '#9b59b6']

fig, ax = plt.subplots(figsize=(12, 6))
bars = ax.barh(frameworks, requests_per_sec, color=colors, alpha=0.7)

# Add value labels
for i, (bar, val) in enumerate(zip(bars, requests_per_sec)):
    ax.text(val + 500, i, f'{val:,}', va='center', fontweight='bold')

ax.set_xlabel('Requests per Second', fontsize=12, fontweight='bold')
ax.set_title('Web Framework Performance Comparison\n(Higher is Better)',
             fontsize=14, fontweight='bold')
ax.grid(axis='x', alpha=0.3)

# Highlight FastAPI
bars[2].set_edgecolor('green')
bars[2].set_linewidth(3)

plt.tight_layout()
plt.show()
```
11.3.2 Building ML API dengan FastAPI
Step 1: Train dan Save Model
```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import joblib
import json

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate
accuracy = model.score(X_test, y_test)
print(f"Model Accuracy: {accuracy:.3f}")

# Save model
joblib.dump(model, 'models/iris_model.pkl')
print("Model saved to models/iris_model.pkl")

# Save feature names dan target names untuk API
metadata = {
    'feature_names': iris.feature_names,
    'target_names': iris.target_names.tolist(),
    'accuracy': float(accuracy)
}
with open('models/metadata.json', 'w') as f:
    json.dump(metadata, f, indent=2)
```
Step 2: Create Pydantic Models untuk Validation
```python
from pydantic import BaseModel, Field, validator
from typing import List

class IrisFeatures(BaseModel):
    """Input features untuk Iris prediction"""
    sepal_length: float = Field(..., ge=0, le=10, description="Sepal length in cm")
    sepal_width: float = Field(..., ge=0, le=10, description="Sepal width in cm")
    petal_length: float = Field(..., ge=0, le=10, description="Petal length in cm")
    petal_width: float = Field(..., ge=0, le=10, description="Petal width in cm")

    @validator('*')
    def check_positive(cls, v):
        if v < 0:
            raise ValueError('Value must be positive')
        return v

class PredictionResponse(BaseModel):
    """Response format untuk prediction"""
    prediction: str
    probability: float
    all_probabilities: dict
    model_version: str = "1.0.0"

class HealthResponse(BaseModel):
    """Health check response"""
    status: str
    model_loaded: bool
    version: str
```
Step 3: FastAPI Application (main.py)
```python
from fastapi import FastAPI, HTTPException, status
from fastapi.responses import JSONResponse
from pathlib import Path
from typing import List
import joblib
import numpy as np
import json
import logging

# Pydantic models (IrisFeatures, PredictionResponse, HealthResponse)
# didefinisikan di Step 2 — letakkan di file yang sama atau import

# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize FastAPI app
app = FastAPI(
    title="Iris Classification API",
    description="ML API untuk prediksi species Iris flowers",
    version="1.0.0",
    docs_url="/docs",
    redoc_url="/redoc"
)

# Global variables
model = None
metadata = None

@app.on_event("startup")
async def load_model():
    """Load model saat aplikasi start"""
    global model, metadata
    try:
        model_path = Path("models/iris_model.pkl")
        metadata_path = Path("models/metadata.json")

        if not model_path.exists():
            raise FileNotFoundError(f"Model file not found: {model_path}")

        model = joblib.load(model_path)

        if metadata_path.exists():
            with open(metadata_path, 'r') as f:
                metadata = json.load(f)

        logger.info("Model loaded successfully")
        logger.info(f"Model accuracy: {metadata.get('accuracy', 'N/A')}")
    except Exception as e:
        logger.error(f"Error loading model: {e}")
        raise

@app.get("/", tags=["General"])
async def root():
    """Root endpoint"""
    return {
        "message": "Welcome to Iris Classification API",
        "docs": "/docs",
        "health": "/health"
    }

@app.get("/health", response_model=HealthResponse, tags=["General"])
async def health_check():
    """Health check endpoint"""
    return {
        "status": "healthy" if model is not None else "unhealthy",
        "model_loaded": model is not None,
        "version": "1.0.0"
    }

@app.post("/predict", response_model=PredictionResponse, tags=["Prediction"])
async def predict(features: IrisFeatures):
    """
    Predict Iris species berdasarkan features

    - **sepal_length**: Sepal length dalam cm
    - **sepal_width**: Sepal width dalam cm
    - **petal_length**: Petal length dalam cm
    - **petal_width**: Petal width dalam cm
    """
    try:
        # Prepare input
        input_data = np.array([[
            features.sepal_length,
            features.sepal_width,
            features.petal_length,
            features.petal_width
        ]])

        # Predict
        prediction = model.predict(input_data)[0]
        probabilities = model.predict_proba(input_data)[0]

        # Get class names
        class_names = metadata.get('target_names',
                                   ['setosa', 'versicolor', 'virginica'])

        # Format response
        return {
            "prediction": class_names[prediction],
            "probability": float(probabilities[prediction]),
            "all_probabilities": {
                class_names[i]: float(prob)
                for i, prob in enumerate(probabilities)
            },
            "model_version": "1.0.0"
        }
    except Exception as e:
        logger.error(f"Prediction error: {e}")
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=f"Prediction failed: {str(e)}"
        )

@app.post("/batch_predict", tags=["Prediction"])
async def batch_predict(features_list: List[IrisFeatures]):
    """Batch prediction untuk multiple samples"""
    try:
        predictions = []
        for features in features_list:
            input_data = np.array([[
                features.sepal_length,
                features.sepal_width,
                features.petal_length,
                features.petal_width
            ]])
            prediction = model.predict(input_data)[0]
            probabilities = model.predict_proba(input_data)[0]
            class_names = metadata.get('target_names',
                                       ['setosa', 'versicolor', 'virginica'])
            predictions.append({
                "prediction": class_names[prediction],
                "probability": float(probabilities[prediction])
            })
        return {"predictions": predictions, "count": len(predictions)}
    except Exception as e:
        logger.error(f"Batch prediction error: {e}")
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=f"Batch prediction failed: {str(e)}"
        )

@app.get("/model/info", tags=["Model"])
async def model_info():
    """Get model information dan metadata"""
    if metadata is None:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="Metadata not found"
        )
    return {
        "model_type": "RandomForestClassifier",
        "accuracy": metadata.get('accuracy'),
        "features": metadata.get('feature_names'),
        "classes": metadata.get('target_names'),
        "version": "1.0.0"
    }

# Exception handler
@app.exception_handler(Exception)
async def global_exception_handler(request, exc):
    logger.error(f"Global exception: {exc}")
    return JSONResponse(
        status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
        content={"detail": "Internal server error"}
    )
```
Step 4: Run API
```bash
# Development mode (auto-reload)
uvicorn main:app --reload --host 0.0.0.0 --port 8000

# Production mode (multiple workers)
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
```
Step 5: Test API
```bash
# Health check
curl http://localhost:8000/health

# Single prediction
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2
  }'

# Batch prediction
curl -X POST "http://localhost:8000/batch_predict" \
  -H "Content-Type: application/json" \
  -d '[
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2},
    {"sepal_length": 6.7, "sepal_width": 3.0, "petal_length": 5.2, "petal_width": 2.3}
  ]'

# Model info
curl http://localhost:8000/model/info
```
Interactive API Documentation:
Buka browser: http://localhost:8000/docs
FastAPI automatically generates Swagger UI untuk testing API!
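Selain curl dan Swagger UI, API juga bisa diuji dari Python. Sketsa client minimal berikut hanya memakai standard library (`urllib`); nama fungsi `build_predict_request` dan asumsi API berjalan di `localhost:8000` adalah ilustrasi:

```python
import json
from urllib import request as urlrequest

API_URL = "http://localhost:8000"  # asumsi: API berjalan lokal di port 8000

def build_predict_request(sepal_length, sepal_width, petal_length, petal_width):
    """Susun HTTP request yang body-nya sesuai schema IrisFeatures."""
    payload = {
        "sepal_length": sepal_length,
        "sepal_width": sepal_width,
        "petal_length": petal_length,
        "petal_width": petal_width,
    }
    return urlrequest.Request(
        f"{API_URL}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # Kirim request hanya saat dijalankan langsung (butuh API yang aktif)
    req = build_predict_request(5.1, 3.5, 1.4, 0.2)
    with urlrequest.urlopen(req) as resp:
        print(json.loads(resp.read()))
```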
11.3.3 Flask Alternative (Simpler)
Untuk project sederhana, Flask juga bisa digunakan:
```python
from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)

# Load model
model = joblib.load('models/iris_model.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    try:
        data = request.get_json()
        features = np.array([[
            data['sepal_length'],
            data['sepal_width'],
            data['petal_length'],
            data['petal_width']
        ]])
        prediction = model.predict(features)[0]
        probability = model.predict_proba(features)[0]
        return jsonify({
            'prediction': int(prediction),
            'probability': float(probability[prediction])
        })
    except Exception as e:
        return jsonify({'error': str(e)}), 400

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000, debug=False)
```
Perbandingan FastAPI vs Flask:

| Aspek | FastAPI | Flask |
|---|---|---|
| Performance | ⭐⭐⭐⭐⭐ (Async) | ⭐⭐⭐ (Sync) |
| Type Validation | ✅ Built-in (Pydantic) | ❌ Manual |
| Auto Documentation | ✅ Swagger + ReDoc | ❌ Need extensions |
| Async Support | ✅ Native | ⚠️ Via extensions |
| Learning Curve | Medium | Easy |
| Best For | Production APIs | Prototypes, simple apps |
💡 Recommendation
Prototype cepat: Flask
Production API: FastAPI
Legacy systems: Flask (banyak resources)
High performance: FastAPI
11.4 Model Optimization dengan ONNX
11.4.1 Apa itu ONNX?
ONNX = Open Neural Network Exchange: Open format untuk representing ML models.
Problem yang Dipecahkan:
```mermaid
graph LR
    A[Train in<br/>PyTorch] --> B{Deploy where?}
    B --> C[TensorFlow<br/>Serving]
    B --> D[ONNX<br/>Runtime]
    B --> E[Mobile<br/>CoreML]
    B --> F[Edge<br/>TensorRT]
    style B fill:#ff6b6b
    style D fill:#51cf66
```
Tanpa ONNX: Need to convert manually, atau stuck dengan satu framework.
Dengan ONNX:
Train di framework favorit (PyTorch, TensorFlow, scikit-learn)
Konversi model ke format ONNX
Deploy di mana saja dengan ONNX Runtime (cloud, mobile, edge)
11.5 Deployment Strategies
11.5.1 Batch vs Real-time Inference
Batch Inference:
Process multiple samples together pada scheduled intervals.
Use Cases:
Daily recommendations
Periodic fraud detection
Nightly data processing
Batch ETL pipelines
Architecture:
```mermaid
graph LR
    A[Data Lake] --> B[Batch Job<br/>Scheduler]
    B --> C[Load Data]
    C --> D[Model Inference]
    D --> E[Save Predictions]
    E --> F[(Results DB)]
    style B fill:#ffd93d
    style D fill:#51cf66
```
Example: Batch Prediction Script
```python
import pandas as pd
import joblib
from datetime import datetime
import logging

# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def batch_predict(input_csv, output_csv, model_path):
    """
    Batch prediction untuk large datasets
    """
    # Load model
    logger.info(f"Loading model from {model_path}")
    model = joblib.load(model_path)

    # Load data in chunks (untuk large files)
    logger.info(f"Processing {input_csv}")
    chunk_size = 10000
    chunks = []

    for chunk in pd.read_csv(input_csv, chunksize=chunk_size):
        # Predict
        predictions = model.predict(chunk)
        probabilities = model.predict_proba(chunk).max(axis=1)

        # Add predictions to chunk
        chunk['prediction'] = predictions
        chunk['confidence'] = probabilities
        chunk['predicted_at'] = datetime.now()

        chunks.append(chunk)
        logger.info(f"Processed {len(chunk)} rows")

    # Combine all chunks
    result = pd.concat(chunks, ignore_index=True)

    # Save results
    result.to_csv(output_csv, index=False)
    logger.info(f"Results saved to {output_csv}")
    logger.info(f"Total rows processed: {len(result)}")

    return result

# Run batch job
if __name__ == "__main__":
    batch_predict(
        input_csv="data/new_samples.csv",
        output_csv="results/predictions.csv",
        model_path="models/iris_model.pkl"
    )
```
Scheduling dengan Cron (Linux):
```bash
# Run setiap hari jam 2 pagi
0 2 * * * /usr/bin/python3 /path/to/batch_predict.py
```
Real-time Inference:
Process requests immediately saat user request.
Use Cases:
Web applications
Mobile apps
Chatbots
Real-time recommendations
Fraud detection (transaction time)
Architecture:
```mermaid
graph LR
    A[User Request] --> B[Load Balancer]
    B --> C[API Server 1]
    B --> D[API Server 2]
    B --> E[API Server N]
    C --> F[Model]
    D --> F
    E --> F
    F --> G[Response]
    style B fill:#4ecdc4
    style F fill:#51cf66
```
Perbandingan:

| Aspek | Batch | Real-time |
|---|---|---|
| Latency | Hours/days | Milliseconds |
| Throughput | Very high | Lower |
| Cost | Lower (off-peak) | Higher (always on) |
| Complexity | Simple | Complex (scaling, monitoring) |
| Use Case | Offline analytics | User-facing apps |
11.5.2 Cloud Deployment Options
Option 1: Platform as a Service (PaaS)
Examples:
Google Cloud AI Platform
AWS SageMaker
Azure ML
Hugging Face Inference API
Pros:
Managed infrastructure
Auto-scaling
Built-in monitoring
Easy deployment
Cons:
Higher cost
Vendor lock-in
Less control
Example: Deploy ke Hugging Face Spaces (Free!)
```python
# app.py - Gradio interface
import gradio as gr
import joblib
import numpy as np

model = joblib.load('iris_model.pkl')

def predict(sepal_length, sepal_width, petal_length, petal_width):
    features = np.array([[sepal_length, sepal_width, petal_length, petal_width]])
    pred = model.predict(features)[0]
    prob = model.predict_proba(features)[0]
    classes = ['Setosa', 'Versicolor', 'Virginica']
    return {classes[i]: float(prob[i]) for i in range(3)}

iface = gr.Interface(
    fn=predict,
    inputs=[
        gr.Number(label="Sepal Length"),
        gr.Number(label="Sepal Width"),
        gr.Number(label="Petal Length"),
        gr.Number(label="Petal Width")
    ],
    outputs=gr.Label(label="Prediction"),
    title="Iris Classifier",
    description="Predict Iris species"
)

iface.launch()
```
Option 2: Serverless (AWS Lambda)

Example: Lambda handler untuk inference

```python
import json
import joblib
import numpy as np
import boto3

# Load model from S3 (di cold start)
s3 = boto3.client('s3')
s3.download_file('my-bucket', 'models/model.pkl', '/tmp/model.pkl')
model = joblib.load('/tmp/model.pkl')

def lambda_handler(event, context):
    """AWS Lambda handler function"""
    try:
        # Parse input
        body = json.loads(event['body'])
        features = np.array([[
            body['sepal_length'],
            body['sepal_width'],
            body['petal_length'],
            body['petal_width']
        ]])

        # Predict
        prediction = model.predict(features)[0]
        probability = model.predict_proba(features)[0]

        # Return response
        return {
            'statusCode': 200,
            'body': json.dumps({
                'prediction': int(prediction),
                'probability': float(probability[prediction])
            })
        }
    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }
```
11.5.3 Edge Deployment
Edge Computing: Run ML models on device (smartphone, IoT, edge servers).
Why Edge? - Low latency: No network roundtrip - Privacy: Data stays on device - Offline capability: Works tanpa internet - Bandwidth saving: No data upload
11.6 Monitoring & Model Maintenance

Retraining Decision Flow:

```mermaid
flowchart TD
    A[Monitor Metrics] --> B{Performance<br/>Degraded?}
    B -->|No| A
    B -->|Yes| C{Data Drift<br/>Detected?}
    C -->|Yes| D[Collect New Data]
    C -->|No| E[Check Data Quality]
    D --> F[Retrain Model]
    E --> G{Data Quality<br/>OK?}
    G -->|Yes| H[Investigate Code]
    G -->|No| I[Fix Data Pipeline]
    F --> J[Evaluate New Model]
    J --> K{Better than<br/>Current?}
    K -->|Yes| L[Deploy New Model]
    K -->|No| M[Keep Current Model<br/>+ Investigate]
    L --> A
    M --> A
    style B fill:#ffd93d
    style C fill:#ffd93d
    style F fill:#51cf66
    style L fill:#51cf66
```
Automated Retraining Pipeline:
```python
from datetime import datetime, timedelta
import schedule

class AutoRetrainingPipeline:
    def __init__(self, model_trainer, performance_threshold: float = 0.85):
        self.model_trainer = model_trainer
        self.performance_threshold = performance_threshold
        self.last_retrain = datetime.now()

    def should_retrain(self, current_metrics: dict) -> bool:
        """Decide if retraining is needed"""
        # Rule 1: Performance degradation
        if current_metrics['accuracy'] < self.performance_threshold:
            return True

        # Rule 2: Time-based (every 30 days)
        if (datetime.now() - self.last_retrain) > timedelta(days=30):
            return True

        # Rule 3: Data drift detected
        if current_metrics.get('data_drift', False):
            return True

        return False

    def retrain_and_deploy(self):
        """Full retraining and deployment pipeline"""
        print("🔄 Starting automated retraining...")

        # 1. Fetch new data
        new_data = self.fetch_new_training_data()

        # 2. Train new model
        new_model = self.model_trainer.train(new_data)

        # 3. Evaluate
        metrics = self.evaluate_model(new_model)

        # 4. Compare dengan current model
        if metrics['accuracy'] > self.get_current_model_accuracy():
            print("✅ New model better, deploying...")
            self.deploy_model(new_model)
            self.last_retrain = datetime.now()
        else:
            print("⚠️ New model not better, keeping current model")

    def fetch_new_training_data(self):
        """Fetch data from production logs"""
        # Implementation: query database, S3, etc.
        pass

    def evaluate_model(self, model):
        """Evaluate model on test set"""
        # Implementation
        pass

    def get_current_model_accuracy(self):
        """Accuracy model yang sedang di production"""
        # Implementation: baca dari model registry / monitoring
        pass

    def deploy_model(self, model):
        """Deploy new model to production"""
        # Implementation: save to S3, update model server, etc.
        pass

# Schedule retraining checks
pipeline = AutoRetrainingPipeline(model_trainer=...)

# Check every day
schedule.every().day.at("02:00").do(
    lambda: pipeline.retrain_and_deploy()
    if pipeline.should_retrain(get_current_metrics())
    else None
)
```
11.7 Model Versioning & Experiment Tracking
11.7.1 Mengapa Model Versioning?
Scenario tanpa versioning:
Engineer: "Model production bermasalah!"
Data Scientist: "Coba rollback ke versi sebelumnya"
Engineer: "Yang mana? Ada 10 file model.pkl..."
Data Scientist: "Yang... accuracy 95%?"
Engineer: "Semuanya 95%..." 😭
With proper versioning:
Engineer: "Rollback ke model v1.2.3"
System: "Deployed model v1.2.3 (trained 2024-01-15, accuracy 95.3%)"
Engineer: "Problem solved!" ✅
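Sebelum memakai tool seperti MLflow atau DVC, inti dari model versioning bisa disketsakan secara sederhana: simpan model bersama metadata yang dapat ditelusuri (versi, waktu training, metrik, hash file). Sketsa minimal berikut hanyalah ilustrasi — nama fungsi `save_versioned_model` dan folder `model_registry` bukan bagian dari tool mana pun:

```python
import json
import hashlib
import pickle
from datetime import datetime, timezone
from pathlib import Path

def save_versioned_model(model, version: str, metrics: dict,
                         registry_dir: str = "model_registry"):
    """Simpan model + metadata per versi agar rollback mudah dilakukan."""
    out = Path(registry_dir) / version
    out.mkdir(parents=True, exist_ok=True)

    # Serialize model dan simpan
    model_bytes = pickle.dumps(model)
    (out / "model.pkl").write_bytes(model_bytes)

    metadata = {
        "version": version,
        "trained_at": datetime.now(timezone.utc).isoformat(),
        "metrics": metrics,
        # Hash untuk memverifikasi file model tidak berubah
        "model_sha256": hashlib.sha256(model_bytes).hexdigest(),
    }
    (out / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return metadata

# Contoh pemakaian dengan "model" dummy
meta = save_versioned_model({"coef": [1, 2, 3]}, "1.2.3", {"accuracy": 0.953})
print(meta["version"])
```

Tool seperti MLflow dan DVC pada dasarnya mengotomasi dan memperkaya pola yang sama ini.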
11.7.2 Experiment Tracking dengan MLflow

```python
import mlflow

# Load model by version
model_version = "1"
model = mlflow.pyfunc.load_model(f"models:/iris-classifier/{model_version}")

# Or load best model based on metric
from mlflow.tracking import MlflowClient

client = MlflowClient()
runs = client.search_runs(
    experiment_ids=["1"],
    order_by=["metrics.accuracy DESC"],
    max_results=1
)
best_run_id = runs[0].info.run_id
best_model = mlflow.sklearn.load_model(f"runs:/{best_run_id}/model")
```
11.7.3 Git untuk Model Versioning (DVC)
DVC (Data Version Control): Git untuk data dan models.
```bash
# Initialize DVC
dvc init

# Add model to DVC
dvc add models/iris_model.pkl
# This creates models/iris_model.pkl.dvc file

# Commit to Git
git add models/iris_model.pkl.dvc .gitignore
git commit -m "Add model v1.0.0"
git tag -a v1.0.0 -m "Model version 1.0.0"

# Push model to remote storage (S3, GCS, etc.)
dvc remote add -d storage s3://my-bucket/dvc-storage
dvc push

# Pull specific version
git checkout v1.0.0
dvc pull
```
11.8 CI/CD untuk ML (MLOps Pipeline)
11.8.1 Traditional CI/CD vs ML CI/CD
Traditional Software CI/CD:
Code → Build → Test → Deploy
ML CI/CD (includes CT - Continuous Training):
Code + Data → Build → Test → Train → Evaluate → Deploy → Monitor → Retrain
11.8.2 Example: GitHub Actions untuk ML
.github/workflows/ml-pipeline.yml:
```yaml
name: ML Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    # Retrain weekly
    - cron: '0 0 * * 0'

jobs:
  data-validation:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
      - name: Validate data
        run: |
          python scripts/validate_data.py
      - name: Check data drift
        run: |
          python scripts/check_drift.py

  train-model:
    needs: data-validation
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
      - name: Train model
        run: |
          python scripts/train_model.py
      - name: Evaluate model
        run: |
          python scripts/evaluate_model.py
      - name: Upload model artifact
        uses: actions/upload-artifact@v3
        with:
          name: trained-model
          path: models/

  test-api:
    needs: train-model
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Download model
        uses: actions/download-artifact@v3
        with:
          name: trained-model
          path: models/
      - name: Build Docker image
        run: |
          docker build -t ml-api:test .
      - name: Run API tests
        run: |
          docker run -d -p 8000:8000 ml-api:test
          sleep 10
          python tests/test_api.py

  deploy:
    needs: test-api
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v3
      - name: Download model
        uses: actions/download-artifact@v3
        with:
          name: trained-model
          path: models/
      - name: Deploy to production
        run: |
          # Deploy to cloud (AWS, GCP, Azure)
          # Or update Kubernetes deployment
          echo "Deploying to production..."
```
11.8.3 Model Testing
Different levels of testing:
```python
import pytest
import numpy as np
import joblib

# 1. Unit Tests: Individual components
def test_preprocessing():
    """Test preprocessing function"""
    from src.preprocessing import normalize_features
    data = np.array([[1, 2], [3, 4]])
    normalized = normalize_features(data)
    assert normalized.mean() < 0.1
    assert normalized.std() < 1.1

# 2. Model Tests: Model behavior
def test_model_predictions():
    """Test model produces valid predictions"""
    model = joblib.load('models/iris_model.pkl')

    # Test input
    X_test = np.array([[5.1, 3.5, 1.4, 0.2]])

    # Predict
    prediction = model.predict(X_test)
    probabilities = model.predict_proba(X_test)

    # Assertions
    assert prediction.shape == (1,)
    assert prediction[0] in [0, 1, 2]
    assert probabilities.shape == (1, 3)
    assert np.isclose(probabilities.sum(), 1.0)

# 3. Data Tests: Input validation
def test_data_schema():
    """Test input data matches expected schema"""
    import pandas as pd
    data = pd.read_csv('data/new_data.csv')

    # Check columns
    expected_columns = ['sepal_length', 'sepal_width',
                        'petal_length', 'petal_width']
    assert all(col in data.columns for col in expected_columns)

    # Check dtypes
    assert data[expected_columns].dtypes.apply(
        lambda x: np.issubdtype(x, np.number)
    ).all()

    # Check ranges
    assert (data[expected_columns] >= 0).all().all()
    assert (data[expected_columns] <= 10).all().all()

# 4. Performance Tests: Accuracy threshold
def test_model_accuracy():
    """Test model meets accuracy threshold"""
    model = joblib.load('models/iris_model.pkl')
    from sklearn.datasets import load_iris
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)
    predictions = model.predict(X)
    accuracy = accuracy_score(y, predictions)

    # Assert minimum accuracy
    assert accuracy >= 0.90, f"Model accuracy {accuracy} below threshold"

# 5. Integration Tests: API endpoint
def test_api_prediction():
    """Test API returns correct format"""
    import requests
    response = requests.post(
        "http://localhost:8000/predict",
        json={
            "sepal_length": 5.1,
            "sepal_width": 3.5,
            "petal_length": 1.4,
            "petal_width": 0.2
        }
    )
    assert response.status_code == 200
    data = response.json()
    assert 'prediction' in data
    assert 'probability' in data

# Run tests:
# pytest tests/ -v
```
11.9 Best Practices Summary
11.9.1 Development Best Practices
✅ Checklist: Model Development
Before Training:
During Training:
After Training:
11.9.2 Deployment Best Practices
✅ Checklist: Deployment
Pre-deployment:
During Deployment:
Post-deployment:
11.9.3 Security Best Practices
🔐 Security Checklist
Application Security:
Infrastructure Security:
Data Security:
11.10 Rangkuman
📚 Chapter Summary
1. MLOps Fundamentals:
MLOps = DevOps + Data + ML
Automate ML lifecycle
3 maturity levels: Manual → Pipeline → Full CI/CD
2. Containerization:
Docker untuk environment consistency
Dockerfile best practices
Docker Compose untuk multi-service apps
3. Model Serving:
FastAPI untuk production APIs
Flask untuk prototypes
REST API design principles
4. ONNX Optimization:
Framework-agnostic model format
Performance improvements
Cross-platform deployment
5. Deployment Strategies:
Batch vs Real-time
Cloud (PaaS, Kubernetes, Serverless)
Edge deployment
6. Monitoring:
Model performance metrics
Data drift detection
System metrics
Automated alerting
7. Model Versioning:
MLflow experiment tracking
DVC untuk data/model versioning
Git workflow
8. CI/CD:
Automated testing
Continuous training
Deployment automation
Key Takeaways
Production ≠ Jupyter Notebook: Need proper engineering practices