Chapter 14: Capstone Project Guide & Best Practices

From Ideation to Implementation: Planning, Executing, and Communicating a Professional ML Project

🎯 Learning Outcomes

After completing this chapter, you will be able to:

  1. Plan a production ML project with a clear scope and a realistic timeline
  2. Formulate the problem through a measurable problem statement (SMART criteria)
  3. Design a comprehensive data strategy, from collection to validation
  4. Build a baseline model and iterate systematically toward the target performance
  5. Evaluate and validate models with metrics appropriate to the use case
  6. Document the project to professional standards, with reproducibility in mind
  7. Present project findings and insights clearly to stakeholders
  8. Identify and avoid the pitfalls that commonly derail capstone projects

14.1 Project Planning & Scoping

14.1.1 Mengapa Scoping Penting?

Many students pour their time into modeling while planning and scoping are neglected. Yet poor scoping is the leading cause of project failure.

Striking statistics:

  • 45% of capstone projects fail to deliver meaningful results because the scope is unclear
  • 60% of students underestimate the timeline at the start of the project
  • 38% attempt problems that are too ambitious for one semester

⚠️ Common Mistakes
  1. Scope Creep: the project starts with a clear scope, then new features keep being added
  2. Underestimating Complexity: "looks easy" → turns out to take 3x longer
  3. Fixing Scope, Not Timeline: the deadline gets squeezed and quality suffers
  4. No Minimum Viable Product (MVP): all or nothing

14.1.2 Project Scoping Framework

Step 1: Define the Project Goal

The project is not just about YOU; it is about the VALUE it will deliver.

Key Questions:

  • Who are the primary stakeholders?
  • What problem is being solved?
  • How will success be measured?
  • What is a realistic timeline?

Examples of good vs. bad scope:

| Bad Scope | Good Scope |
|---|---|
| "Build a ML model for sentiment analysis" | "Build a sentiment analysis model to classify customer feedback (positive/negative/neutral) with a target accuracy of 85%, to help the customer service team prioritize complaints" |
| "Predict stock prices" | "Build an LSTM model to predict intraday price movements (+/- 2% threshold) using 6 months of historical data, to identify trading opportunities with risk-adjusted returns" |
| "Image classification" | "Classify malware vs. benign Windows executable files at 90% recall for security screening, using image representations of the binary files" |

Step 2: Define Success Criteria (SMART)

  • Specific: clearly defined
  • Measurable: quantifiable with concrete metrics
  • Achievable: realistic given the available resources
  • Relevant: matters to the stakeholders
  • Time-bound: has a clear deadline

Example SMART criteria:

❌ Bad: "The model must be accurate"

✅ Good: "Achieve 85%+ accuracy on the test set with a balanced
dataset (n=5000 samples) using Random Forest,
validated with 5-fold cross-validation,
deadline December 31, 2024"
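A criterion this concrete can be checked mechanically at the end of the project. A minimal sketch, with synthetic data standing in for the real n=5000 dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the balanced n=5000 dataset in the criterion
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.5, 0.5], random_state=42)

# 5-fold CV accuracy, exactly as the SMART criterion specifies
scores = cross_val_score(RandomForestClassifier(random_state=42),
                         X, y, cv=5, scoring="accuracy")
print(f"5-fold accuracy: {scores.mean():.3f} ± {scores.std():.3f}")
print("SMART target met" if scores.mean() >= 0.85 else "Below target")
```

Because the criterion names the metric, the validation protocol, and the threshold, the check is a few lines of code rather than a judgment call.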

Step 3: Risk Assessment

Identify potential blockers BEFORE starting:

| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Data unavailable | Medium | Critical | Request it from the company 2 weeks in advance |
| Poor data quality | High | Medium | Plan an intensive data-cleaning phase |
| Model hard to converge | Medium | High | Research state-of-the-art papers, test multiple algorithms early |

14.1.3 Timeline & Milestone Planning

Capstone timeline structure (1 semester = 16 weeks):

Weeks 1-2:   Problem Definition + Planning (10%)
Weeks 3-5:   Data Collection & EDA (15%)
Weeks 6-8:   Feature Engineering & Preprocessing (15%)
Weeks 9-11:  Model Development & Experimentation (25%)
Weeks 12-13: Evaluation & Optimization (15%)
Weeks 14-15: Documentation & Presentation Prep (15%)
Week 16:     Final Presentation & Submission (5%)

Critical Milestones:

📋 Project Milestones Checklist

Month 1: Project Setup (Due: Week 4)
- [ ] Problem statement finalized and approved
- [ ] Stakeholders identified
- [ ] Preliminary data assessment done
- [ ] Team roles & responsibilities defined
- [ ] Git repository set up with a proper structure

Month 2: Data & Baseline (Due: Week 8)
- [ ] Dataset collected and cleaned
- [ ] EDA report completed
- [ ] Data splits (train/val/test) finalized
- [ ] Baseline model implemented
- [ ] Evaluation metrics selected

Month 3: Model Development (Due: Week 12)
- [ ] 3+ models trained and compared
- [ ] Hyperparameter tuning completed
- [ ] Best model selected
- [ ] Cross-validation done
- [ ] Model card drafted

Final 2 Weeks: Finalization
- [ ] Documentation complete
- [ ] Code cleaned & tested
- [ ] Demo prepared
- [ ] Presentation slides ready

14.2 Problem Formulation

14.2.1 Anatomy of a Good Problem Statement

Essential components:

  1. Context: background and business case
  2. Problem: what needs to be solved
  3. Data: what data is available, and how much
  4. Success Metrics: how success will be measured
  5. Constraints: technical and non-technical limitations

Example problem statement:

CONTEXT:
GrowthBank serves 50,000+ B2B customers with an
average loan size of Rp 500 million. Manual credit approval
takes 3-5 days and carries a default rate of 8%.

PROBLEM:
Automate the credit scoring process to cut approval
time to <24 hours and the default rate to <5%,
while maintaining customer satisfaction.

DATA:
- 10,000 historical loans (2018-2023)
- 40+ features: company profile, financials, payment history
- 5% data missing (handled appropriately)

SUCCESS METRICS:
1. Model accuracy: 85%+ pada test set
2. Default recall: 90% (catch bad borrowers)
3. Processing speed: <5 seconds per application
4. Interpretability: Top 5 important features identifiable

CONSTRAINTS:
- Data privacy: PII must be removed/encrypted
- Latency: Must respond in <5 sec
- Availability: 99% uptime required
- Fairness: No discrimination against protected groups

14.2.2 Problem Type Classification

Classification:

  • Binary (yes/no, churn/stay, fraud/legitimate)
  • Multi-class (sentiment: positive/neutral/negative)
  • Multi-label (music genres: rock, pop, jazz simultaneously)

Regression:

  • Continuous values (price, temperature, traffic volume)
  • Time series (stock price prediction, demand forecasting)

Ranking/Recommendation:

  • Prioritize items (search ranking, recommendation system)
  • Matching (matching job seekers to jobs)

Anomaly Detection:

  • Outlier detection (fraud, system intrusion, equipment failure)
  • Novelty detection (new attack types)

Clustering:

  • Customer segmentation
  • Document clustering

Choosing the right problem type determines:

  • Data requirements
  • Metrics selection
  • Algorithm choices
  • Evaluation approach

14.2.3 Defining Metrics

💡 Best Practice: Match Metrics to Business Goals

"60% accuracy" means nothing on its own. Metrics must be:
  1. Aligned with business KPIs
  2. Interpretable (not only to statisticians)
  3. Actionable (translatable into decisions)

Example of metric selection:

PROBLEM: Fraud Detection
↓
BUSINESS GOAL: Catch 95% of frauds, minimize false positives
↓
METRICS: Recall=95% (catch fraud), high Precision
         (avoid false alarms)
↓
IMPLEMENTATION: Select the threshold that maximizes F2-score
                (2x weight on recall)
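The final step, picking the threshold that maximizes F2, can be sketched with scikit-learn's fbeta_score (synthetic imbalanced data stands in for real transactions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data stands in for real fraud transactions
X, y = make_classification(n_samples=4000, weights=[0.95, 0.05], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_val)[:, 1]

# Sweep candidate thresholds; keep the one that maximizes F2
# (beta=2 weights recall twice as heavily as precision)
thresholds = np.linspace(0.05, 0.95, 19)
f2_scores = [fbeta_score(y_val, proba >= t, beta=2) for t in thresholds]
best_t = thresholds[int(np.argmax(f2_scores))]
print(f"Best threshold: {best_t:.2f}, F2: {max(f2_scores):.3f}")
```

Note the threshold is chosen on the validation split, never on the test set.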

Common metrics by problem type:

| Problem Type | Primary Metrics | Secondary Metrics |
|---|---|---|
| Classification | Accuracy (balanced), Precision, Recall, F1 | AUC-ROC, Confusion Matrix |
| Imbalanced Classification | Precision, Recall, F1-score, AUC | Sensitivity, Specificity |
| Regression | MAE, RMSE | R², MAPE |
| Ranking | NDCG, MAP | MRR, Recall@K |
| Clustering | Silhouette Score | Davies-Bouldin Index |
| Anomaly Detection | Detection Rate, False Positive Rate | Precision@K, AUROC |

Caution on Single Metric:

# ❌ Don't do this
if accuracy > 0.85:
    print("Model is good!")

# ✅ Do this instead
metrics = {
    'accuracy': 0.85,
    'precision': 0.82,
    'recall': 0.88,
    'f1': 0.85,
    'auc_roc': 0.89
}

# Interpret holistically
print("High recall (0.88) → catches most positives")
print("OK precision (0.82) → some false alarms acceptable")
print("Balanced F1 (0.85) → good overall trade-off")

14.3 Data Strategy

14.3.1 Data Collection Plan

Template: Data Collection Checklist

📋 Data Collection Planning

Data Source
- [ ] Source identified (API, database, CSV, web scraping)
- [ ] Access obtained (permissions, credentials)
- [ ] Data freshness understood (real-time, daily, monthly)
- [ ] Size confirmed (n samples × m features)

Data Quality Assessment
- [ ] Missing values documented (<5% acceptable)
- [ ] Duplicates checked
- [ ] Outliers identified
- [ ] Data type validation done
- [ ] Value ranges reasonable

Data Privacy & Ethics
- [ ] PII removal/anonymization done
- [ ] GDPR/compliance checked
- [ ] Bias in the data identified
- [ ] Consent obtained (if needed)
- [ ] Data retention policy defined

Data Documentation
- [ ] Data dictionary created (each feature explained)
- [ ] Data quality report generated
- [ ] Collection date/period documented
- [ ] Known issues documented

Example: Data Dictionary

Feature Name: transaction_amount
├─ Type: float64
├─ Unit: Indonesian Rupiah (IDR)
├─ Range: 10,000 - 999,999,999
├─ Missing: 0.2% (handled by median imputation)
├─ Distribution: Right-skewed (log-transform applied)
├─ Source: transaction_table.amount
└─ Notes: 3 outliers > 999M (verified, kept)

Feature Name: customer_age
├─ Type: int64
├─ Unit: Years
├─ Range: 18 - 75
├─ Missing: 1.5% (filled with median)
├─ Distribution: Relatively uniform
├─ Source: customer_table.age
└─ Notes: Some suspicious values (999), filtered out
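A data dictionary like this can double as executable validation: encode the documented ranges and count violations. A minimal sketch on a toy frame (the values are made up; the column names follow the dictionary above):

```python
import pandas as pd

# Toy frame standing in for the real tables
df = pd.DataFrame({
    "transaction_amount": [15_000.0, 2_500_000.0, 999_999_999.0],
    "customer_age": [25, 999, 60],  # 999 is a documented suspicious value
})

# Documented valid ranges from the data dictionary
rules = {"transaction_amount": (10_000, 999_999_999),
         "customer_age": (18, 75)}

for col, (lo, hi) in rules.items():
    bad = df[(df[col] < lo) | (df[col] > hi)]
    print(f"{col}: {len(bad)} out-of-range value(s)")
```

Running such checks whenever the data is refreshed catches drift away from the documented ranges early.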

14.3.2 Exploratory Data Analysis (EDA) Structure

Layered EDA Approach:

  1. Univariate Analysis (1 variable at a time)
    • Distribution, central tendency, spread
    • Outliers, skewness, missing values
  2. Bivariate Analysis (2 variables)
    • Correlation with the target
    • Feature relationships
    • Potential interactions
  3. Multivariate Analysis (3+ variables)
    • Feature correlations
    • Clustering patterns
    • Domain insights

EDA Outputs to Document:

Checklist for the EDA report:
- [ ] Dataset shape dan basic info
- [ ] Missing values visualization & handling
- [ ] Distributions (histograms, KDE plots)
- [ ] Outliers identified & approach decided
- [ ] Correlation heatmap & top correlated features
- [ ] Feature importance from EDA
- [ ] Class imbalance (if classification)
- [ ] Data quality issues & resolutions
- [ ] Key insights & hypotheses
- [ ] Feature engineering ideas
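Most univariate items on this checklist come from a handful of pandas one-liners. A minimal sketch on a synthetic frame (the column names are made up):

```python
import numpy as np
import pandas as pd

# Tiny synthetic frame; replace with your real dataset
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "amount": rng.lognormal(3, 1, 200),     # right-skewed, like money
    "age": rng.integers(18, 75, 200),
    "target": rng.integers(0, 2, 200),
})
df.loc[:4, "amount"] = np.nan  # simulate missing values

print(df.shape)                                    # dataset shape
print(df.isna().mean().round(3))                   # missing rate per column
print(df["target"].value_counts(normalize=True))   # class balance
print(df.corr(numeric_only=True)["target"].round(2))  # correlation with target
```

These numbers feed directly into the checklist items above (shape, missingness, imbalance, correlated features).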

14.3.3 Data Preparation Workflow

Raw Data
   ↓
[Clean] β†’ Remove duplicates, fix obvious errors
   ↓
[Transform] β†’ Handle missing, encode categorical, scale
   ↓
[Validate] β†’ Check quality, range, distribution
   ↓
[Split] β†’ Train (70%) / Validation (15%) / Test (15%)
   ↓
[Document] β†’ Version data, document transformations
   ↓
Ready for Modeling

Key decision points:

| Decision | Options | Trade-offs |
|---|---|---|
| Missing Values | Drop / Impute (mean/median/KNN) | Losing data vs. introducing bias |
| Categorical Encoding | One-hot / Label / Ordinal | Sparsity vs. information |
| Feature Scaling | StandardScaler / MinMaxScaler / RobustScaler | Interpretability vs. performance |
| Imbalanced Data | Oversample / Undersample / SMOTE | Overfitting vs. underfitting |
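Several of these decisions (median imputation, one-hot encoding, scaling) can be wired into a single scikit-learn Pipeline so they are applied consistently and fitted on training data only. A minimal sketch with hypothetical column names (amount, age, segment):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numerical = ["amount", "age"]
categorical = ["segment"]

preprocess = ColumnTransformer([
    # Median imputation + scaling for numerical columns
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numerical),
    # One-hot for categoricals; unseen categories at predict time are ignored
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

clf = Pipeline([("prep", preprocess),
                ("model", LogisticRegression(max_iter=1000))])

# Toy data just to show the pipeline end to end
X = pd.DataFrame({"amount": [100.0, 250.0, None, 80.0],
                  "age": [25, 40, 31, 52],
                  "segment": ["a", "b", "a", "b"]})
y = [0, 1, 0, 1]
clf.fit(X, y)
print(clf.predict(X))
```

Because every transform lives inside the pipeline, cross-validation refits them per fold, which also prevents the data-leakage pitfalls discussed later in this chapter.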

14.4 Baseline & Iteration

14.4.1 Establishing Baseline

A "baseline" is the simplest possible model for your problem.

💡 Why Baseline Matters

A baseline is not about high performance. A baseline is:
  1. A sanity check → is your model better than the baseline?
  2. A reference point → how much improvement over the baseline?
  3. A proof of concept → is the problem solvable with ML at all?

Baseline ideas by problem type:

CLASSIFICATION:
├─ Majority class (always predict positive/negative)
├─ Random classifier (50% for binary)
├─ Logistic Regression
└─ Decision Tree

REGRESSION:
├─ Mean predictor (always predict the mean)
├─ Median predictor
├─ Linear Regression
└─ Decision Tree

RANKING:
├─ Random ranking
├─ Popularity ranking
└─ TF-IDF based ranking
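scikit-learn ships the first two classification baselines as DummyClassifier, so a trivial reference point costs two lines. A minimal sketch on synthetic imbalanced data:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (roughly 90/10) stands in for a real dataset
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

results = {}
for strategy in ["most_frequent", "stratified"]:
    # most_frequent = majority-class baseline; stratified = random baseline
    dummy = DummyClassifier(strategy=strategy, random_state=0)
    dummy.fit(X_train, y_train)
    results[strategy] = dummy.score(X_test, y_test)
    print(f"{strategy}: accuracy={results[strategy]:.3f}")
```

The high accuracy of the majority-class dummy on imbalanced data is exactly why accuracy alone is a misleading target.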

Example: Fraud Detection Baseline

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Baseline 1: Always predict "no fraud" (majority class)
baseline1_accuracy = (fraud_data.label == 0).mean()  # e.g., 98.5%
baseline1_recall = 0  # Catches 0% of frauds

# Baseline 2: Logistic Regression
lr = LogisticRegression(max_iter=1000)
lr.fit(X_train, y_train)
baseline2_accuracy = lr.score(X_test, y_test)
# Recall = fraction of actual frauds that are caught
baseline2_recall = recall_score(y_test, lr.predict(X_test))

print(f"Baseline 1 (always negative): Acc={baseline1_accuracy:.1%}")
print(f"Baseline 2 (Logistic): Acc={baseline2_accuracy:.1%}, "
      f"Recall={baseline2_recall:.1%}")

# Your model must beat BOTH the accuracy AND the recall of these baselines

14.4.2 Systematic Iteration Process

Don’t randomly try 100 algorithms. Iterate systematically.

Phase 1: Simple Models (Week 1-2)
  Try: Logistic Regression, Decision Tree, KNN
  Goal: Understand problem & get baseline

Phase 2: Intermediate Models (Week 2-3)
  Try: Random Forest, SVM, Gradient Boosting
  Goal: Find the algorithm that works best

Phase 3: Advanced Models (Week 3-4)
  Try: Neural Networks, Ensemble, State-of-art
  Goal: Push towards target performance

Phase 4: Optimization (Week 4-5)
  Try: Hyperparameter tuning, ensemble methods
  Goal: Final squeeze on performance

Iteration Template to Document:

## Experiment Log

### Experiment 1: Logistic Regression Baseline
- Date: 2024-01-10
- Model: LogisticRegression(C=1.0, max_iter=1000)
- Features: 35 features, no scaling
- Result: Accuracy=0.82, Recall=0.75, F1=0.78
- Note: Baseline established
- Next: Try feature scaling

### Experiment 2: Logistic Regression + Scaling
- Date: 2024-01-11
- Model: LogisticRegression with StandardScaler
- Features: 35 features, StandardScaled
- Result: Accuracy=0.84, Recall=0.78, F1=0.81
- Note: Slight improvement from scaling
- Next: Try feature engineering

### Experiment 3: Random Forest
- Date: 2024-01-12
- Model: RandomForest(n_estimators=100, max_depth=10)
- Features: 35 features + 8 engineered features
- Result: Accuracy=0.87, Recall=0.85, F1=0.86
- Note: Significant improvement!
- Next: Hyperparameter tuning for RF

14.4.3 Common Iteration Pitfalls

⚠️ Things NOT to Do

❌ Tuning on Test Set

# WRONG: Evaluating on test set repeatedly
for hyperparams in search_space:
    model = fit_model(X_train, y_train, hyperparams)
    score = model.score(X_test, y_test)  # OVERFITTING TO TEST!

# RIGHT: Evaluate on validation set
for hyperparams in search_space:
    model = fit_model(X_train, y_train, hyperparams)
    score = model.score(X_val, y_val)  # Use validation
# Final evaluation on the test set only!
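This validation-set discipline is what scikit-learn's GridSearchCV automates: it cross-validates each hyperparameter candidate inside the training split only, so the test set is touched exactly once. A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Cross-validation happens inside the training split only;
# the test set stays untouched during the search
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.001, 0.01, 0.1, 1, 10]},
                      cv=5)
search.fit(X_train, y_train)
print("Best C:", search.best_params_["C"])
print(f"Held-out test accuracy: {search.score(X_test, y_test):.3f}")
```

After the search, GridSearchCV refits the best candidate on the full training split, so the single final `score` call on the test set is an honest estimate.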

❌ Data Leakage

# WRONG: Scale entire dataset, then split
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # fit on ALL data!
X_train, X_test = train_test_split(X_scaled)

# RIGHT: Fit scaler only on training data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # transform using train stats

❌ Metric Obsession

# WRONG: Chasing single metric
# "My model has 97% accuracy! I'm done!"
# But: 0% recall, 500ms latency, not deployable

# RIGHT: Consider all metrics
# Accuracy: 85%, Recall: 90%, Latency: 50ms, Interpretable: Yes

14.5 Evaluation & Validation

14.5.1 Comprehensive Evaluation Framework

Never evaluate a model on a single metric alone.

Model Evaluation Matrix:

| Dimension | Metric | Target |
|---|---|---|
| Accuracy | F1-Score | ≥ 0.85 |
| Fairness | Demographic Parity | < 0.1 difference |
| Robustness | Adversarial Accuracy | > 0.80 |
| Interpretability | SHAP importance | Top-5 features clear |
| Speed | Latency (p95) | < 100 ms |
| Resource Usage | Memory, CPU | < 1 GB RAM |
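Of these dimensions, demographic parity is the easiest to check by hand: compare the positive-prediction rate across groups. A minimal sketch with hypothetical predictions and a made-up binary group attribute:

```python
import numpy as np

# Hypothetical model predictions and a binary protected attribute
rng = np.random.default_rng(42)
y_pred = rng.integers(0, 2, 1000)  # model's 0/1 decisions
group = rng.integers(0, 2, 1000)   # e.g., demographic group 0 vs 1

# Demographic parity: gap in positive-prediction rates between groups
rate_a = y_pred[group == 0].mean()
rate_b = y_pred[group == 1].mean()
gap = abs(rate_a - rate_b)
print(f"Positive rate gap: {gap:.3f} "
      f"({'OK' if gap < 0.1 else 'investigate'})")
```

The < 0.1 cutoff mirrors the target in the table above; real projects should pick the threshold with stakeholders and check it on held-out data, not on random draws as in this sketch.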

14.5.2 Advanced Validation Techniques

1. K-Fold Cross Validation

from sklearn.model_selection import cross_validate

# Better than single train/test split
cv_results = cross_validate(
    model, X, y,
    cv=5,  # 5-fold
    scoring=['accuracy', 'precision', 'recall', 'f1'],
    return_train_score=True
)

# Check for overfitting
train_score = cv_results['train_accuracy'].mean()
test_score = cv_results['test_accuracy'].mean()
print(f"Train: {train_score:.3f}, Test: {test_score:.3f}")

if (train_score - test_score) > 0.15:
    print("⚠️ Possible overfitting! Gap > 15%")

2. Stratified Split (for imbalanced data)

from sklearn.model_selection import StratifiedKFold

# Ensure class distribution maintained
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, test_idx in skf.split(X, y):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # Training loop

3. Time Series Cross Validation (for sequential data)

from sklearn.model_selection import TimeSeriesSplit

# Don't shuffle! Respect temporal order
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    # Model trained on past, tested on future

14.5.3 Diagnostic Plots

Essential diagnostic plots for a capstone:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.metrics import confusion_matrix, roc_curve, auc

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# 1. Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, ax=axes[0, 0], cmap='Blues')
axes[0, 0].set_title('Confusion Matrix')

# 2. ROC Curve
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
auc_score = auc(fpr, tpr)
axes[0, 1].plot(fpr, tpr, label=f'AUC={auc_score:.3f}')
axes[0, 1].plot([0, 1], [0, 1], 'k--', label='Random')
axes[0, 1].set_xlabel('FPR')
axes[0, 1].set_ylabel('TPR')
axes[0, 1].set_title('ROC Curve')
axes[0, 1].legend()

# 3. Precision-Recall Curve
from sklearn.metrics import precision_recall_curve
precision, recall, _ = precision_recall_curve(y_test, y_pred_proba)
axes[1, 0].plot(recall, precision)
axes[1, 0].set_xlabel('Recall')
axes[1, 0].set_ylabel('Precision')
axes[1, 0].set_title('Precision-Recall Curve')

# 4. Feature Importance
feature_importance = pd.DataFrame({
    'feature': X_test.columns,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)

axes[1, 1].barh(feature_importance['feature'][:10],
                feature_importance['importance'][:10])
axes[1, 1].set_title('Top 10 Important Features')

plt.tight_layout()
plt.savefig('model_diagnostics.png', dpi=300, bbox_inches='tight')

14.6 Documentation Best Practices

14.6.1 Project Structure & Documentation

Professional project structure:

my-capstone-project/
├── README.md                          # Project overview
├── DOCUMENTATION.md                   # Detailed documentation
├── LICENSE                            # MIT or similar
├── .gitignore                         # Exclude large files, cache
│
├── data/
│   ├── raw/                           # Original data
│   ├── processed/                     # Cleaned, transformed data
│   └── README.md                      # Data dictionary
│
├── notebooks/
│   ├── 01_eda.ipynb                   # Exploratory analysis
│   ├── 02_preprocessing.ipynb         # Data preparation
│   └── 03_modeling.ipynb              # Model training
│
├── src/
│   ├── __init__.py
│   ├── preprocessing.py               # Data prep functions
│   ├── features.py                    # Feature engineering
│   ├── models.py                      # Model definitions
│   ├── evaluation.py                  # Evaluation metrics
│   └── utils.py                       # Helper functions
│
├── models/
│   ├── model_v1.0.pkl                 # Saved models
│   ├── model_v1.1.pkl
│   └── model_card.md                  # Model documentation
│
├── reports/
│   ├── eda_report.html                # EDA visualization
│   ├── model_comparison.csv           # Experiment results
│   └── final_report.pdf               # Final analysis
│
├── tests/
│   ├── test_preprocessing.py          # Unit tests
│   ├── test_models.py
│   └── test_pipeline.py
│
├── requirements.txt                   # Dependencies
├── setup.py                           # Package setup (if publishing)
└── train.py                           # Main training script

14.6.2 Documentation Template

README.md Example:

# Credit Risk Prediction Model

## Overview
Building an automated credit scoring system to reduce loan
approval time from 3 days to <24 hours while maintaining
default rate below 5%.

## Dataset
- **Source**: GrowthBank historical loans (2018-2023)
- **Size**: 10,000 samples × 42 features
- **Target**: Binary (Default/Non-default), 8% positive class
- **Time Period**: 2018-01-01 to 2023-12-31

## Project Structure
[File structure description]

## Quick Start

pip install -r requirements.txt
python train.py

## Results
- **Best Model**: Random Forest
- **Accuracy**: 87.2% ± 1.3% (5-fold CV)
- **Recall (Default)**: 91.5% (catch 91.5% of defaults)
- **Precision**: 45.2% (acceptable false positive rate)
- **ROC-AUC**: 0.934

## Model Performance
[Performance visualization and metrics table]

## Key Findings
1. Transaction frequency is strongest indicator of default risk
2. Recent payment history more important than historical average
3. Model shows no significant bias against age groups

## Limitations
- Limited to B2B loans (may not generalize to consumer)
- Training data from 2018-2023 (concept drift possible)
- No alternative data sources (e.g., behavioral)

## Future Work
- Implement real-time model monitoring
- Extend to multi-class risk levels (low/medium/high)
- Add fairness constraints for protected attributes

14.6.3 Model Card Documentation

A model card is standardized documentation for an ML model.

# Model Card: Credit Risk Classifier v1.2

## Model Details
- **Model Type**: Random Forest Classification
- **Framework**: Scikit-learn
- **Version**: 1.2
- **Date**: 2024-01-15
- **Authors**: [Your Name]
- **License**: MIT

## Intended Use
- **Intended Use**: Automated credit risk assessment for B2B loans
- **Primary Users**: Credit department, lending officers
- **Out-of-Scope**: Consumer lending, international markets

## Performance
- **Training Data**: 8,000 samples (80%)
- **Test Data**: 2,000 samples (20%)
- **Metric**: Accuracy, Precision, Recall, F1

### Detailed Performance Metrics
                 Precision  Recall  F1-Score  Support
Non-Default (0)       0.93    0.88      0.90     1840
Default (1)           0.45    0.92      0.61      160
Accuracy                                0.87     2000


## Fairness Analysis
- **Gender Bias**: FPR diff = 1.2% (acceptable)
- **Age Bias**: ROC-AUC for <30 = 0.92, >50 = 0.94 (no significant difference)

## Limitations
- Only trained on B2B segment
- Assumes data distribution similar to training period
- Requires regular retraining

## Data and Preprocessing
- **Training Data**: 8000 historical loans with known outcomes
- **Input Features**: 42 features (company profile + financials)
- **Preprocessing**: StandardScaler on numerical, OneHot on categorical

14.7 Code Quality & Reproducibility

14.7.1 Reproducibility Checklist

✅ Reproducibility Checklist

Code Reproducibility:
- [ ] Random seeds fixed (numpy, random, framework)
- [ ] Dependencies pinned in requirements.txt
- [ ] A single entry point (e.g., train.py) runs end to end
- [ ] Configuration kept in a versioned file, not hard-coded

Data Reproducibility:
- [ ] Raw data versioned, or its source and retrieval date documented
- [ ] Preprocessing steps scripted, never done by hand
- [ ] Train/val/test splits saved or derivable from a fixed seed

Results Reproducibility:
- [ ] Metrics logged per experiment (see the experiment log template)
- [ ] Model artifacts saved with version numbers
- [ ] Environment documented (Python version, OS, hardware)

Example: Reproducible Training Script

# train.py - Fully reproducible training

import numpy as np
import random
import tensorflow as tf
from pathlib import Path
import json

# Set random seeds for reproducibility
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
random.seed(RANDOM_SEED)
tf.random.set_seed(RANDOM_SEED)

def main(config_path='config.json'):
    """Main training function"""

    # Load configuration
    with open(config_path, 'r') as f:
        config = json.load(f)

    # Load and preprocess data
    X_train, y_train, X_test, y_test = load_data(
        config['data_path'],
        test_size=config['test_size'],
        random_state=RANDOM_SEED
    )

    # Train model
    model = train_model(
        X_train, y_train,
        **config['model_params']
    )

    # Evaluate
    metrics = evaluate_model(model, X_test, y_test)

    # Save results
    save_artifacts(model, metrics, config)

    return metrics

if __name__ == "__main__":
    metrics = main()
    print(f"Final accuracy: {metrics['accuracy']:.3f}")

config.json:

{
  "data_path": "data/processed/",
  "test_size": 0.2,
  "model_params": {
    "n_estimators": 100,
    "max_depth": 10,
    "random_state": 42
  },
  "output_dir": "results/model_v1.2/"
}

14.7.2 Code Quality Standards

Minimal standards for a capstone:

# ✅ Good: Functions with docstrings
def prepare_features(X, categorical_cols, numerical_cols):
    """
    Prepare features for model training.

    Parameters
    ----------
    X : pd.DataFrame
        Input features with categorical and numerical columns
    categorical_cols : list
        Names of categorical columns for one-hot encoding
    numerical_cols : list
        Names of numerical columns for scaling

    Returns
    -------
    X_prepared : np.ndarray
        Prepared feature matrix ready for training
    """
    # Implementation
    return X_prepared

# ✅ Good: Comments explain WHY, not WHAT
# Skip features with >50% missing values
# (imputation would introduce too much bias)
mask = feature_missing_rate < 0.5
X_filtered = X.loc[:, mask]

# ❌ Bad: Comments repeat code
for i in range(len(data)):  # Loop through data
    if data[i] > threshold:  # Check if greater
        results.append(data[i])  # Add to results

Type hints for clarity:

from typing import Tuple, List, Dict
import pandas as pd
import numpy as np

def split_features(
    X: pd.DataFrame,
    target_ratio: float = 0.2
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Split into train and test with specified ratio."""
    n_test = int(len(X) * target_ratio)
    return X[:-n_test], X[-n_test:]

def calculate_metrics(
    y_true: np.ndarray,
    y_pred: np.ndarray
) -> Dict[str, float]:
    """Calculate evaluation metrics."""
    return {
        'accuracy': (y_true == y_pred).mean(),
        # ... more metrics
    }

14.8 Technical Report Writing

14.8.1 Report Structure

Standard capstone technical report structure (15-30 pages):

1. Executive Summary (1 page)
   - Problem, solution, results
   - Key recommendation
   - Business impact

2. Introduction (2-3 pages)
   - Context and motivation
   - Problem statement
   - Research questions
   - Contributions

3. Literature Review (2-3 pages)
   - Related work
   - Existing solutions
   - Knowledge gaps
   - How your work differs

4. Methodology (3-4 pages)
   - Problem formulation (mathematical)
   - Algorithms & approaches
   - Evaluation metrics
   - Hyperparameters

5. Data Description (2-3 pages)
   - Dataset characteristics
   - Data collection & preprocessing
   - Feature engineering
   - Class distribution, missing values
   - Train/val/test splits

6. Results (3-5 pages)
   - Model comparison
   - Best model performance
   - Ablation studies
   - Visualizations (confusion matrix, ROC, etc.)

7. Analysis & Discussion (2-3 pages)
   - Interpret results
   - Why did model work/fail
   - Limitations
   - Error analysis

8. Conclusion & Future Work (1-2 pages)
   - Summary of findings
   - Practical implications
   - Directions for future research

9. References (1-2 pages)
   - Academic citations
   - Data sources
   - Software libraries

14.8.2 Writing Best Practices

❌ Avoid:

  • Overly technical jargon without explanation
  • Unsupported claims ("Our model is the best!")
  • Lengthy code listings in the main report
  • Vague statements ("We tried many models")

✅ Do:

  • Explain technical concepts so they are accessible to a general audience
  • Support claims with evidence (metrics, citations)
  • Put code in an appendix
  • Be specific ("We evaluated 12 models using GridSearchCV")

Example: Good vs Bad Writing

❌ BAD:
"We used an RF with optimized hyperparameters to
maximize the AUC on the validation dataset."

✅ GOOD:
"We trained a Random Forest classifier with
100 trees and max depth of 10 (selected via 5-fold
cross-validation to maximize AUC-ROC). The model
achieved 87.2% accuracy on the test set."

14.9 Presentation & Demo Skills

14.9.1 Presentation Structure

Final presentation (15-20 minutes):

0-1 min:   Title slide, introduce yourself
1-2 min:   The Problem (why should audience care?)
2-3 min:   Your Solution (brief overview)
3-8 min:   Key Results (metrics, visualizations)
8-12 min:  Technical Deep Dive (1-2 complex topics)
12-15 min: Limitations & Future Work
15-20 min: Q&A

Slide Guidelines:

  • Slide 1-2: Title + Problem (make audience care!)
  • Slide 3-4: Data overview (n samples, features, class distribution)
  • Slide 5-6: Approach (your method vs baselines)
  • Slide 7-10: Results (visualizations!)
    • Confusion matrix
    • ROC/PR curve
    • Feature importance
    • Model comparison
  • Slide 11-12: Deep dive on 1-2 interesting findings
  • Slide 13: Limitations honestly discussed
  • Slide 14: Future work & lessons learned

💡 Presentation Tips
  1. Lead with Why: "Why should anyone care about this problem?"
  2. Show Impact: how does this solve a real problem?
  3. Use Visualizations: data > tables > text
  4. Tell a Story: not just "We did X and got Y accuracy"
  5. Prepare for Questions: know your code, data, and assumptions
  6. Practice: time yourself, get feedback

14.9.2 Live Demo Best Practices

If demonstrating a working system:

✅ DO:
- [ ] Test demo multiple times before presentation
- [ ] Have backup screenshots/video if live breaks
- [ ] Keep demo simple (don't show complex edge cases)
- [ ] Make prediction in <5 seconds
- [ ] Explain what model learned, not just output

❌ DON'T:
- Don't go off script ("Let me show you something...")
- Don't click around aimlessly
- Don't make predictions on untypical data
- Don't forget what model is doing
- Don't spend >2 minutes on demo

14.10 Project Examples & Case Studies

14.10.1 Case Study 1: Fraud Detection (Classification)

Problem: E-commerce platform with 100K+ transactions/day, fraud rate 0.8%.

Approach:

1. Baseline: Logistic Regression
   - Accuracy: 99.8% (predicting all non-fraud!)
   - Recall: 0% (catches 0 frauds) ❌

2. Strategy: Class imbalance handling
   - SMOTE oversampling for the minority class
   - Use Recall as primary metric
   - Adjust decision threshold

3. Best Model: Gradient Boosting + Custom Threshold
   - Accuracy: 98.5%
   - Recall: 92% (catch 92% of frauds)
   - Precision: 35% (acceptable for fraud detection)
   - Live performance: catches $2M fraud/month

Key Lessons:

  • Accuracy is misleading on imbalanced data
  • Recall > Precision for fraud (catch the fraud, accept some false alarms)
  • Real-time constraints matter (each transaction must be scored in <100ms)

14.10.2 Case Study 2: Predictive Maintenance (Regression)

Problem: Manufacturing plant, reduce unplanned downtime.

Approach:

1. Data: 5 years sensor data (temperature, vibration, pressure)
2. Target: RUL (Remaining Useful Life) prediction
3. Baseline: Linear regression on recent sensor readings
   - RMSE: 150 hours

4. Advanced: Sequence-to-sequence LSTM
   - Input: 7 days sensor history
   - Output: Days until failure
   - RMSE: 42 hours (3x better)
   - Enables preventive maintenance

Challenges & Solutions:

  • Class imbalance: Most machines work fine
    • Solution: Weighted loss function, focus on failures
  • Seasonality: Equipment behaves differently by season
    • Solution: Add seasonal features
  • Concept drift: Equipment degrades over 5 years
    • Solution: Retrain monthly with recent data
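The weighted-loss fix mentioned above can be approximated in scikit-learn with the class_weight option, which scales each class's contribution to the loss. A minimal sketch on synthetic imbalanced classification data (the case study itself predicts RUL, a regression target; this only illustrates the weighting idea):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: failures (class 1) are rare
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=1)

# Plain model vs. model whose loss up-weights the rare failure class
plain = LogisticRegression(max_iter=1000).fit(X_train, y_train)
weighted = LogisticRegression(max_iter=1000,
                              class_weight="balanced").fit(X_train, y_train)

plain_recall = recall_score(y_test, plain.predict(X_test))
weighted_recall = recall_score(y_test, weighted.predict(X_test))
print(f"Plain recall:    {plain_recall:.2f}")
print(f"Weighted recall: {weighted_recall:.2f}")
```

In deep-learning frameworks the same idea appears as per-class or per-sample weights in the loss function; class_weight="balanced" is the sklearn shorthand for inverse-frequency weights.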

14.11 Common Pitfalls & Prevention

14.11.1 Technical Pitfalls

⚠️ Data Leakage (Most Common!)

❌ Problem: Information from the future leaks into training

# WRONG: use future information in features
for idx in range(len(data) - 1):
    # Using data[idx+1] (the future) to predict the label at idx
    features[idx] = [
        data[idx]['price'],
        data[idx + 1]['price'],   # ← FUTURE DATA!
        data[idx + 1]['volume'],  # ← FUTURE DATA!
    ]

# RIGHT: Use only past information
for idx in range(1, len(data)):  # Start from idx=1
    features[idx] = [
        data[idx]['price'],
        data[idx-1]['price'],  # ← Past
        data[idx-1]['volume']   # ← Past
    ]

❌ Problem: Scaling the entire dataset before the train/test split

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# WRONG: fit the scaler on the entire dataset
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # ← fit on ALL data
X_train, X_test = train_test_split(X_scaled)
# The model "sees" test-set statistics during scaling!

# RIGHT: fit the scaler ONLY on training data
X_train, X_test = train_test_split(X)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # ← fit on train only
X_test_scaled = scaler.transform(X_test)        # ← transform using train stats

How to prevent:

  • Build features from past information only; never index into the future.
  • Fit every preprocessing step (scaling, imputation, encoding) on the training split only, then apply it to the test split.
  • Audit the pipeline end-to-end for any path where test-set or future information can reach training.
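
One concrete safeguard is to wrap preprocessing and the model in a scikit-learn `Pipeline`: during cross-validation, the scaler is then re-fit inside each fold on that fold's training portion only, so no test statistics leak. A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, random_state=42)

# The scaler lives inside the pipeline, so each CV fold fits it
# on that fold's training portion only -- no leakage is possible
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
scores = cross_val_score(pipe, X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.3f}")
```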

⚠️ Overfitting

❌ Training accuracy 99%, test accuracy 70%

# Signs of overfitting:
if train_acc - test_acc > 0.15:  # Gap > 15%
    print("⚠️ Probable overfitting!")

# Solutions:
# 1. More training data
# 2. Reduce model complexity
# 3. Regularization (L1/L2)
# 4. Early stopping
# 5. Dropout (neural networks)

# ✅ Regularized model
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(C=0.01, penalty='l2')
# Smaller C means stronger regularization (sklearn's default is C=1.0)
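
To see the train/test gap described above emerge as model complexity grows, scikit-learn's `validation_curve` compares training and cross-validated scores across a complexity parameter. This sketch uses a decision tree on synthetic data, with tree depth as the illustrative complexity knob:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_informative=10, random_state=0)

# Train vs. cross-validated accuracy across tree depths: a widening
# gap signals overfitting, mirroring the >15% gap check above
depths = [1, 3, 5, 10, 20]
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5)

for depth, tr, va in zip(depths,
                         train_scores.mean(axis=1),
                         val_scores.mean(axis=1)):
    flag = " <- probable overfitting" if tr - va > 0.15 else ""
    print(f"max_depth={depth:2d}  train={tr:.2f}  val={va:.2f}{flag}")
```

The deepest trees fit the training data almost perfectly while validation accuracy plateaus or drops, which is exactly the gap pattern the check above is meant to catch.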
⚠️ Hyperparameter Tuning on Test Set

# ❌ WRONG: selecting hyperparameters on the test set
best_score = 0
for C in [0.001, 0.01, 0.1, 1, 10]:
    model = LogisticRegression(C=C)
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)  # ← TUNING ON TEST!
    if score > best_score:
        best_score = score
        best_C = C

# ✅ RIGHT: use a validation set for hyperparameter selection
best_score, best_model = 0, None
for C in [0.001, 0.01, 0.1, 1, 10]:
    model = LogisticRegression(C=C)
    model.fit(X_train, y_train)
    score = model.score(X_val, y_val)  # ← USE VALIDATION
    if score > best_score:
        best_score, best_model = score, model

# Evaluate exactly once on the held-out test set
final_score = best_model.score(X_test, y_test)
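
In practice, `GridSearchCV` automates the validation loop above: it does the train/validation splitting internally via cross-validation, so the test set is touched exactly once, at the very end. A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=600, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# GridSearchCV cross-validates each C on the training data only;
# the held-out test set plays no role in the selection
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.001, 0.01, 0.1, 1, 10]},
    cv=5)
search.fit(X_train, y_train)

print("best C:", search.best_params_["C"])
print("held-out test score:", search.best_estimator_.score(X_test, y_test))
```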

14.11.2 Project Management Pitfalls

| Pitfall | Symptom | Prevention |
|---|---|---|
| Scope Creep | Keeps adding features | Lock scope in week 2; use a change-request process |
| No Baseline | Don't know if the model is good | Implement a baseline in week 1 |
| Single Train/Test Split | High variance in metrics | Use 5-fold cross-validation |
| Ignoring Class Imbalance | 98% accuracy but useless | Check class distribution on day 1 |
| No Documentation | Can't reproduce results | Write as you code, not at the end |
| Last-Minute Demo | Presentation full of errors | Practice the presentation 1 week before |

14.11.3 Communication Pitfalls

⚠️ Overselling Results

❌ WRONG:

"Our model is 95% accurate!" (on an imbalanced dataset, 95% accuracy can mean nothing more than predicting the majority class)

✅ RIGHT:

"Our model achieves 95% accuracy, 85% recall, and 60% precision. This means we catch 85% of frauds, with 60% of alerts being true positives." (specific, contextualized, honest)
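
The "majority class" trap behind the wrong statement can be demonstrated in a few lines. The toy labels below (95% legitimate, 5% fraud) are illustrative, not taken from the case study:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# A toy imbalanced test set: 95% legitimate (0), 5% fraud (1)
y_true = np.array([0] * 95 + [1] * 5)
# A useless "model" that predicts non-fraud for everything
y_majority = np.zeros(100, dtype=int)

print("Majority-class model:")
print("  accuracy: ", accuracy_score(y_true, y_majority))   # 0.95
print("  recall:   ", recall_score(y_true, y_majority, zero_division=0))    # 0.0
print("  precision:", precision_score(y_true, y_majority, zero_division=0)) # 0.0
```

95% accuracy with zero recall and zero precision: reporting all three metrics, as in the RIGHT statement above, makes such a model impossible to oversell.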

14.12 Project Grading Rubric

14.12.1 Comprehensive Evaluation Rubric

Total: 100 points

1. Problem Definition & Planning (15 points)

| Criteria | Excellent (A) | Good (B) | Fair (C) | Poor (D) |
|---|---|---|---|---|
| Problem Statement (0-5 pts) | Clear, specific, measurable, business-aligned (5) | Clear problem, minor scope issues (4) | Problem vague, weak business case (2) | Problem ill-defined (0) |
| Data Planning (0-5 pts) | Complete strategy, quality assessment, privacy check (5) | Data plan documented, mostly complete (4) | Minimal planning, data quality not assessed (2) | No data strategy (0) |
| Feasibility & Timeline (0-5 pts) | Realistic scope, detailed milestones, risk mitigation (5) | Reasonable scope, milestones defined (4) | Over-scoped or unrealistic timeline (2) | No feasibility analysis (0) |

2. Data & Analysis (15 points)

| Criteria | Excellent (A) | Good (B) | Fair (C) | Poor (D) |
|---|---|---|---|---|
| Data Quality (0-5 pts) | Clean, well-documented, proper handling of missing values/outliers (5) | Mostly clean, minor issues handled (4) | Some quality issues, incomplete cleaning (2) | Poor data quality (0) |
| EDA & Insights (0-5 pts) | Comprehensive EDA, clear insights, good visualizations (5) | Good EDA, identified patterns (4) | Basic EDA, minimal insights (2) | Insufficient analysis (0) |
| Feature Engineering (0-5 pts) | Smart features, validated impact, interpretable (5) | Some feature engineering, reasonable (4) | Minimal feature work (2) | No feature engineering (0) |

3. Model Development (25 points)

| Criteria | Excellent (A) | Good (B) | Fair (C) | Poor (D) |
|---|---|---|---|---|
| Approach (0-8 pts) | Multiple algorithms tested, systematic comparison, justification (8) | 2-3 algorithms, reasonable comparison (6) | Limited algorithm exploration (4) | Single model tried (0) |
| Validation (0-9 pts) | K-fold CV, proper train/val/test split, no data leakage (9) | Proper splitting, no obvious leakage (7) | Basic validation, possible issues (5) | Train/test contamination (0) |
| Hyperparameter Tuning (0-8 pts) | Systematic tuning, documented process, prevents overfitting (8) | Good tuning, reasonable results (6) | Limited tuning (4) | Random/no tuning (0) |

4. Evaluation & Results (20 points)

| Criteria | Excellent (A) | Good (B) | Fair (C) | Poor (D) |
|---|---|---|---|---|
| Metrics Selection (0-5 pts) | Appropriate for the problem, multiple metrics, justified (5) | Good metrics, mostly justified (4) | Limited metrics, weak justification (2) | Inappropriate metrics (0) |
| Results & Analysis (0-8 pts) | Clear improvement over baseline, thorough analysis, error discussion (8) | Good results with analysis (6) | Acceptable results, minimal analysis (4) | Poor results, no analysis (0) |
| Reproducibility (0-7 pts) | Complete reproducibility, clear results, documented seeds (7) | Mostly reproducible, good documentation (5) | Somewhat reproducible, missing details (3) | Not reproducible (0) |

5. Documentation & Code Quality (15 points)

| Criteria | Excellent (A) | Good (B) | Fair (C) | Poor (D) |
|---|---|---|---|---|
| Code Quality (0-5 pts) | Clean, documented, proper structure, follows best practices (5) | Generally clean, documented, reasonable structure (4) | Some documentation, inconsistent style (2) | Poor documentation, messy code (0) |
| Documentation (0-5 pts) | Complete report, clear writing, all sections thorough (5) | Good documentation, mostly complete (4) | Incomplete documentation, some clarity issues (2) | Minimal documentation (0) |
| Repository (0-5 pts) | Organized structure, clean git history, includes all artifacts (5) | Good organization, mostly complete (4) | Basic organization, missing some items (2) | Disorganized or incomplete (0) |

6. Presentation (10 points)

| Criteria | Excellent (A) | Good (B) | Fair (C) | Poor (D) |
|---|---|---|---|---|
| Clarity (0-5 pts) | Clear story, appropriate level, engaging for the audience (5) | Generally clear, minor issues (4) | Some clarity issues, weak organization (2) | Confusing or hard to follow (0) |
| Delivery (0-5 pts) | Confident, well paced, handles questions effectively (5) | Good delivery, minor pacing issues (4) | Nervous or rushed, struggles with questions (2) | Poor presentation skills (0) |

14.12.2 Grading Scale

90-100: A (Excellent)
  - Professional-quality project
  - Clear contribution
  - Production-ready code
  - Excellent presentation

80-89:  B (Good)
  - Solid project with minor issues
  - Good approach and results
  - Well-documented
  - Good presentation

70-79:  C (Fair)
  - Acceptable project, significant gaps
  - Basic approach, adequate results
  - Documentation could be better
  - Adequate presentation

60-69:  D (Poor)
  - Project not fully meeting standards
  - Issues in approach or execution
  - Poor documentation
  - Weak presentation

<60:    F (Fail)
  - Does not meet minimum standards

14.13 Summary

📚 Chapter Summary

1. Project Planning
   - Scoping is the key to success
   - Define the problem with SMART criteria
   - Plan a realistic timeline with milestones

2. Problem Formulation
   - Clear problem statement (context, problem, data, metrics)
   - Match metrics to business goals
   - Consider constraints (latency, privacy, fairness)

3. Data Strategy
   - Comprehensive data collection plan
   - Structured EDA with insights
   - Proper preprocessing with documentation

4. Baseline & Iteration
   - Establish a baseline for context
   - Iterate systematically (not randomly)
   - Document experiments carefully

5. Evaluation & Validation
   - Use multiple metrics (not a single number)
   - K-fold cross-validation for robustness
   - Create diagnostic plots for understanding

6. Documentation
   - Professional project structure
   - Complete README and model card
   - Reproducible code with fixed seeds

7. Presentation
   - Lead with the WHY
   - Use visualizations effectively
   - Tell a coherent story

8. Avoid Common Pitfalls
   - Data leakage (the most common!)
   - Overfitting
   - Tuning on the test set
   - Ignoring class imbalance

Final Checklist: 30 Days Before Submission

✅ Final 30-Day Checklist

Week 1-2 (Code finalization)
- [ ] All code reviewed and cleaned
- [ ] Tests written and passing
- [ ] No hardcoded paths or credentials
- [ ] requirements.txt updated with pinned versions
- [ ] Git history clean (meaningful commits)

Week 2-3 (Documentation)
- [ ] README.md complete and tested
- [ ] Code comments explain WHY, not WHAT
- [ ] Model card written
- [ ] EDA report finalized
- [ ] All results reproducible

Week 3-4 (Presentation)
- [ ] Slides drafted (14-15 slides)
- [ ] Key visualizations created
- [ ] Practice presentation (3x minimum)
- [ ] Get feedback from a mentor or friend
- [ ] Prepare for common Q&A

Final Week (Testing)
- [ ] Run the entire pipeline end-to-end
- [ ] Verify all outputs match the report
- [ ] Check the presentation on the actual equipment
- [ ] Submit early (the day before the deadline)
- [ ] Take a screenshot as proof


🎓 Congratulations! You are ready to start your capstone project!

Remember: a small scope finished well beats a large scope left unfinished.

Focus on QUALITY over QUANTITY.

Good luck! 🚀