Chapter 6: Convolutional Neural Networks & Transfer Learning

Computer Vision and Deep Learning for Image Recognition

🎯 Learning Outcomes

After studying this chapter, you will be able to:

  1. Understand the CNN architecture and its fundamental components (convolutional layers, pooling, filters)
  2. Implement a CNN from scratch using Keras and PyTorch for image classification
  3. Apply transfer learning with pre-trained models (VGG16, ResNet, MobileNet)
  4. Use data augmentation techniques to improve model generalization
  5. Apply fine-tuning strategies to adapt pre-trained models to a specific domain
  6. Evaluate CNN performance and compare architectures for computer vision tasks

6.1 Introduction to Computer Vision and CNNs

6.1.1 Why a Specialized Architecture for Images?

In Chapter 5 we studied the MLP (Multilayer Perceptron), which operates on flattened input. For images, however, this approach has fundamental problems.

Problems with MLPs for Images:

Consider a 224×224×3 (RGB) image:

  • Input neurons: 224 × 224 × 3 = 150,528 neurons
  • One hidden layer of 1000 neurons: 150,528 × 1000 ≈ 150 million parameters
  • And that is just a single layer!

The problems:

  1. Overfitting: far too many parameters for limited data
  2. Computational cost: training becomes very slow
  3. Loss of spatial structure: flattening discards location information
  4. No translation invariance: the same object at a different position looks like a different pattern

💡 CNN Intuition

CNNs address these problems with:

  • Local connectivity: each neuron connects only to a small region of the input
  • Parameter sharing: the same filter is reused across the entire image
  • Hierarchical learning: early layers detect edges, deeper layers detect objects
  • Translation invariance: patterns are detected at any position

Result: a CNN can have over 99% fewer parameters than a fully connected MLP!
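The parameter savings can be checked with a quick back-of-the-envelope calculation. The layer sizes here are illustrative assumptions (one dense layer vs. one conv layer), not a full model:

```python
# Back-of-the-envelope parameter comparison (illustrative layer sizes)

# MLP: flatten a 224x224x3 image, then one dense layer of 1000 neurons
mlp_params = (224 * 224 * 3) * 1000 + 1000  # weights + biases

# CNN: one conv layer with 64 filters of size 3x3 over 3 input channels
cnn_params = (3 * 3 * 3 + 1) * 64  # each filter: 3*3*3 weights + 1 bias

print(f"MLP dense layer : {mlp_params:,} parameters")
print(f"CNN conv layer  : {cnn_params:,} parameters")
print(f"Reduction       : {100 * (1 - cnn_params / mlp_params):.3f}%")
```

The conv layer needs only 1,792 parameters against roughly 150 million for the dense layer, because the same small filters are shared across every spatial position.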

6.1.2 Evolution of Computer Vision

The Pre-Deep-Learning Era (before 2012):

  • Manual feature extraction (SIFT, HOG, SURF)
  • Classical ML (SVM, Random Forest) pada engineered features
  • ImageNet 2011: Error rate ~26%

The Deep Learning Revolution (2012–present):

  • AlexNet (2012): Error 15.3% - first CNN breakthrough
  • VGGNet (2014): Deeper networks (16-19 layers)
  • ResNet (2015): Skip connections, 152 layers, error 3.57%
  • EfficientNet (2019): Optimal scaling, SOTA performance
  • Vision Transformers (2020): Attention mechanisms for vision

📊 Computer Vision Applications Today

Industry Applications:

  • Healthcare: Medical imaging analysis, tumor detection
  • Autonomous Vehicles: Object detection, lane recognition
  • Retail: Visual search, inventory management
  • Security: Face recognition, surveillance systems
  • Manufacturing: Quality control, defect detection
  • Agriculture: Crop disease identification, yield prediction

6.1.3 Image Fundamentals for Deep Learning

Digital Image Representation:

import numpy as np
import matplotlib.pyplot as plt

# Create a simple synthetic image for demonstration
# (randint's upper bound is exclusive, so use 256 to allow pixel value 255)
img_array = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)

# Visualize the RGB channels
fig, axes = plt.subplots(1, 4, figsize=(16, 4))

# Original image
axes[0].imshow(img_array)
axes[0].set_title('Original Image\nShape: (64, 64, 3)', fontweight='bold')
axes[0].axis('off')

# Individual channels
channel_names = ['Red Channel', 'Green Channel', 'Blue Channel']
colors = ['Reds', 'Greens', 'Blues']

for i in range(3):
    axes[i+1].imshow(img_array[:, :, i], cmap=colors[i])
    axes[i+1].set_title(f'{channel_names[i]}\nShape: (64, 64)', fontweight='bold')
    axes[i+1].axis('off')

plt.tight_layout()
plt.show()

# Print image properties
print(f"Image shape: {img_array.shape}")
print(f"Data type: {img_array.dtype}")
print(f"Value range: [{img_array.min()}, {img_array.max()}]")
print(f"Total pixels: {img_array.shape[0] * img_array.shape[1]}")
print(f"Total values: {img_array.size}")

Image Preprocessing for CNNs:

# Preprocessing pipeline
def preprocess_image(image_array):
    """
    Standard preprocessing for CNNs
    """
    # 1. Normalize to [0, 1]
    normalized = image_array.astype('float32') / 255.0

    # 2. Standardize (zero mean, unit variance per channel) - optional
    mean = np.array([0.485, 0.456, 0.406])  # ImageNet channel means
    std = np.array([0.229, 0.224, 0.225])   # ImageNet channel stds
    standardized = (normalized - mean) / std

    return normalized, standardized

# Demonstration
normalized, standardized = preprocess_image(img_array)

print("Original range:", img_array.min(), "-", img_array.max())
print("Normalized range:", normalized.min(), "-", normalized.max())
print("Standardized range:", standardized.min(), "-", standardized.max())

Spatial Relationships in Images:

graph TD
    A[Raw Pixels] --> B[Local Patterns<br/>edges, corners, textures]
    B --> C[Object Parts<br/>eyes, wheels, windows]
    C --> D[Objects<br/>faces, cars, buildings]
    D --> E[Scenes<br/>street, office, nature]

    style A fill:#ff9999
    style B fill:#ffcc99
    style C fill:#ffff99
    style D fill:#99ff99
    style E fill:#99ccff

6.1.4 Challenges in Image Classification

Common Challenges:

  1. Intra-class variation: same object, different appearance
  2. Viewpoint variation: different viewing angles
  3. Illumination changes: different lighting conditions
  4. Occlusion: the object is partially hidden
  5. Background clutter: complex backgrounds
  6. Deformation: changes in shape (pose)

# Demonstrate transformations a CNN must handle
from scipy import ndimage

# Create simple shape
shape = np.zeros((100, 100))
shape[30:70, 30:70] = 1

fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Original
axes[0, 0].imshow(shape, cmap='gray')
axes[0, 0].set_title('Original', fontweight='bold')

# Rotation
rotated = ndimage.rotate(shape, 45, reshape=False)
axes[0, 1].imshow(rotated, cmap='gray')
axes[0, 1].set_title('Rotation (45°)', fontweight='bold')

# Scaling
from scipy.ndimage import zoom
scaled = zoom(shape, 0.7)
padded_scaled = np.zeros_like(shape)
y_offset = (shape.shape[0] - scaled.shape[0]) // 2
x_offset = (shape.shape[1] - scaled.shape[1]) // 2
padded_scaled[y_offset:y_offset+scaled.shape[0],
              x_offset:x_offset+scaled.shape[1]] = scaled
axes[0, 2].imshow(padded_scaled, cmap='gray')
axes[0, 2].set_title('Scaling (0.7x)', fontweight='bold')

# Translation
translated = np.roll(shape, (20, 20), axis=(0, 1))
axes[1, 0].imshow(translated, cmap='gray')
axes[1, 0].set_title('Translation', fontweight='bold')

# Noise
noisy = shape + np.random.normal(0, 0.1, shape.shape)
axes[1, 1].imshow(noisy, cmap='gray')
axes[1, 1].set_title('Gaussian Noise', fontweight='bold')

# Blur
blurred = ndimage.gaussian_filter(shape, sigma=3)
axes[1, 2].imshow(blurred, cmap='gray')
axes[1, 2].set_title('Gaussian Blur', fontweight='bold')

for ax in axes.flat:
    ax.axis('off')

plt.tight_layout()
plt.show()

6.2 CNN Architecture: Building Blocks

6.2.1 Convolutional Layer: Core Operation

The Convolution Concept:

Convolution is a mathematical operation that slides a filter (kernel) over an image to extract features. (Deep learning libraries actually implement cross-correlation, i.e. convolution without flipping the kernel, but the name has stuck.)

Convolution Operation:

import numpy as np
import matplotlib.pyplot as plt

# Simple 5x5 image
image = np.array([
    [1, 2, 3, 2, 1],
    [2, 3, 4, 3, 2],
    [3, 4, 5, 4, 3],
    [2, 3, 4, 3, 2],
    [1, 2, 3, 2, 1]
])

# Edge detection filter (vertical edges)
kernel_vertical = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1]
])

# Edge detection filter (horizontal edges)
kernel_horizontal = np.array([
    [-1, -2, -1],
    [ 0,  0,  0],
    [ 1,  2,  1]
])

# Manual convolution operation
def convolve2d(image, kernel):
    """
    Simple 2D convolution (without padding)
    """
    h, w = image.shape
    kh, kw = kernel.shape
    output_h = h - kh + 1
    output_w = w - kw + 1
    output = np.zeros((output_h, output_w))

    for i in range(output_h):
        for j in range(output_w):
            region = image[i:i+kh, j:j+kw]
            output[i, j] = np.sum(region * kernel)

    return output

# Apply filters
vertical_edges = convolve2d(image, kernel_vertical)
horizontal_edges = convolve2d(image, kernel_horizontal)

# Visualize
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

axes[0, 0].imshow(image, cmap='gray')
axes[0, 0].set_title('Original Image\n(5×5)', fontweight='bold', fontsize=12)

axes[0, 1].imshow(kernel_vertical, cmap='seismic', vmin=-2, vmax=2)
axes[0, 1].set_title('Vertical Edge Filter\n(Sobel-X)', fontweight='bold', fontsize=12)
for i in range(3):
    for j in range(3):
        axes[0, 1].text(j, i, f'{kernel_vertical[i, j]}',
                       ha='center', va='center', fontsize=14, fontweight='bold')

axes[0, 2].imshow(vertical_edges, cmap='gray')
axes[0, 2].set_title('Vertical Edges\n(3×3 output)', fontweight='bold', fontsize=12)

axes[1, 0].imshow(image, cmap='gray')
axes[1, 0].set_title('Original Image\n(5×5)', fontweight='bold', fontsize=12)

axes[1, 1].imshow(kernel_horizontal, cmap='seismic', vmin=-2, vmax=2)
axes[1, 1].set_title('Horizontal Edge Filter\n(Sobel-Y)', fontweight='bold', fontsize=12)
for i in range(3):
    for j in range(3):
        axes[1, 1].text(j, i, f'{kernel_horizontal[i, j]}',
                       ha='center', va='center', fontsize=14, fontweight='bold')

axes[1, 2].imshow(horizontal_edges, cmap='gray')
axes[1, 2].set_title('Horizontal Edges\n(3×3 output)', fontweight='bold', fontsize=12)

for ax in axes.flat:
    ax.axis('off')

plt.tight_layout()
plt.show()

print("\nConvolution Mathematics:")
print("Input shape:", image.shape)
print("Kernel shape:", kernel_vertical.shape)
print("Output shape:", vertical_edges.shape)
print("\nOutput size formula: (H - K + 1) × (W - K + 1)")
print(f"= ({image.shape[0]} - {kernel_vertical.shape[0]} + 1) × ({image.shape[1]} - {kernel_vertical.shape[1]} + 1)")
print(f"= {vertical_edges.shape}")
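The `convolve2d` helper above drops the border, shrinking a 5×5 input to 3×3. A minimal sketch of the "same"-padding variant, which zero-pads so the output keeps the input size (the name `convolve2d_same` is our own, and it assumes an odd-sized square kernel):

```python
import numpy as np

def convolve2d_same(image, kernel):
    """2D cross-correlation with zero padding so output size == input size.
    Assumes an odd-sized square kernel."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # Zero-pad the borders by half the kernel size
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode='constant')
    h, w = image.shape
    output = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            output[i, j] = np.sum(padded[i:i+kh, j:j+kw] * kernel)
    return output

image = np.arange(25).reshape(5, 5).astype(float)
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)  # Sobel-X
out = convolve2d_same(image, kernel)
print(out.shape)  # (5, 5) -- same as the input
```

With padding `P = K // 2` and stride 1, the output-size formula `(I - K + 2P) + 1` reduces to `I`, which is exactly what `padding='same'` does in Keras and PyTorch.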

Convolutional Layer Parameters:

# Visualize convolutional layer parameters
def visualize_conv_params():
    """
    Visualize convolutional layer parameters
    """
    fig, axes = plt.subplots(2, 2, figsize=(14, 12))

    # 1. Filter Size Effect
    ax = axes[0, 0]
    filters = {
        '3×3': np.random.randn(3, 3),
        '5×5': np.random.randn(5, 5),
        '7×7': np.random.randn(7, 7)
    }

    positions = [0, 4, 9]
    for idx, (name, filt) in enumerate(filters.items()):
        ax.imshow(filt, cmap='coolwarm', extent=[positions[idx], positions[idx]+filt.shape[0],
                                                  0, filt.shape[0]])
        ax.text(positions[idx]+filt.shape[0]/2, -1, name,
               ha='center', fontweight='bold', fontsize=11)

    ax.set_xlim(-1, 16)
    ax.set_ylim(-2, 8)
    ax.set_title('Filter Size (Kernel Size)', fontweight='bold', fontsize=13)
    ax.axis('off')

    # 2. Stride Effect
    ax = axes[0, 1]
    input_size = 7
    kernel_size = 3
    strides = [1, 2]

    for s_idx, stride in enumerate(strides):
        y_offset = s_idx * 8

        # Draw input
        for i in range(input_size):
            for j in range(input_size):
                rect = plt.Rectangle((j, y_offset+i), 1, 1,
                                    fill=False, edgecolor='blue', linewidth=1.5)
                ax.add_patch(rect)

        # Draw kernel positions
        output_size = (input_size - kernel_size) // stride + 1
        for i in range(output_size):
            for j in range(output_size):
                y_pos = y_offset + i * stride
                x_pos = j * stride
                rect = plt.Rectangle((x_pos, y_pos), kernel_size, kernel_size,
                                   fill=False, edgecolor='red', linewidth=2.5)
                ax.add_patch(rect)

        ax.text(-1.5, y_offset+3, f'Stride={stride}\nOutput: {output_size}×{output_size}',
               ha='right', fontweight='bold', fontsize=10)

    ax.set_xlim(-3, 8)
    ax.set_ylim(-1, 17)
    ax.set_title('Stride Effect on Output Size', fontweight='bold', fontsize=13)
    ax.axis('off')

    # 3. Padding Types
    ax = axes[1, 0]
    padding_types = {
        'Valid (no padding)': {'input': 5, 'kernel': 3, 'padding': 0},
        'Same (zero padding)': {'input': 5, 'kernel': 3, 'padding': 1}
    }

    for p_idx, (name, params) in enumerate(padding_types.items()):
        y_offset = p_idx * 8
        inp_size = params['input']
        pad = params['padding']

        # Draw padding
        if pad > 0:
            for i in range(inp_size + 2*pad):
                for j in range(inp_size + 2*pad):
                    if i < pad or i >= inp_size+pad or j < pad or j >= inp_size+pad:
                        rect = plt.Rectangle((j, y_offset+i), 1, 1,
                                           fill=True, facecolor='lightgray',
                                           edgecolor='gray', linewidth=1)
                        ax.add_patch(rect)

        # Draw input
        for i in range(inp_size):
            for j in range(inp_size):
                rect = plt.Rectangle((j+pad, y_offset+i+pad), 1, 1,
                                   fill=True, facecolor='lightblue',
                                   edgecolor='blue', linewidth=1.5)
                ax.add_patch(rect)

        ax.text(-1.5, y_offset+3, name, ha='right', fontweight='bold', fontsize=10)

    ax.set_xlim(-3, 8)
    ax.set_ylim(-1, 17)
    ax.set_title('Padding: Valid vs Same', fontweight='bold', fontsize=13)
    ax.axis('off')

    # 4. Number of Filters
    ax = axes[1, 1]
    num_filters = [32, 64, 128]

    for f_idx, nf in enumerate(num_filters):
        y_pos = f_idx * 3

        # Draw input
        rect = plt.Rectangle((0, y_pos), 2, 2, fill=True,
                           facecolor='lightblue', edgecolor='blue', linewidth=2)
        ax.add_patch(rect)

        # Draw filters
        for i in range(min(nf, 10)):
            x_pos = 3 + i * 0.3
            rect = plt.Rectangle((x_pos, y_pos), 0.25, 2,
                               fill=True, facecolor='orange',
                               edgecolor='red', linewidth=1)
            ax.add_patch(rect)

        ax.text(7, y_pos+1, f'{nf} filters', ha='left',
               va='center', fontweight='bold', fontsize=11)

    ax.set_xlim(-1, 11)
    ax.set_ylim(-1, 10)
    ax.set_title('Number of Filters (Feature Maps)', fontweight='bold', fontsize=13)
    ax.axis('off')

    plt.tight_layout()
    plt.show()

visualize_conv_params()

Output Size Calculation:

def calculate_output_size(input_size, kernel_size, stride=1, padding=0):
    """
    Formula: O = (I - K + 2P) / S + 1

    O: Output size
    I: Input size
    K: Kernel size
    P: Padding
    S: Stride
    """
    output = (input_size - kernel_size + 2*padding) // stride + 1
    return output

# Examples
examples = [
    {'input': 224, 'kernel': 3, 'stride': 1, 'padding': 0},
    {'input': 224, 'kernel': 3, 'stride': 1, 'padding': 1},
    {'input': 224, 'kernel': 5, 'stride': 2, 'padding': 2},
    {'input': 56, 'kernel': 3, 'stride': 2, 'padding': 1},
]

print("Convolutional Layer Output Size Calculations:")
print("=" * 80)
print(f"{'Input':<10} {'Kernel':<10} {'Stride':<10} {'Padding':<10} {'Output':<10}")
print("-" * 80)

for ex in examples:
    output = calculate_output_size(ex['input'], ex['kernel'], ex['stride'], ex['padding'])
    print(f"{ex['input']:<10} {ex['kernel']:<10} {ex['stride']:<10} {ex['padding']:<10} {output:<10}")
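Besides the output size, the other number worth computing by hand is a conv layer's parameter count: each filter holds K × K × C_in weights plus one bias, repeated for C_out filters. A quick sketch (the layer configurations are illustrative):

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Parameters of a conv layer: (K*K*C_in + 1) * C_out (the +1 is the bias)."""
    return (kernel_size * kernel_size * in_channels + 1) * out_channels

# Illustrative layers
first_layer = conv_params(3, 3, 32)    # 3x3 filters over RGB, 32 filters
deeper_layer = conv_params(3, 32, 64)  # 3x3 filters over 32 channels, 64 filters

print(f"3x3 conv, 3 -> 32 channels : {first_layer:,} parameters")
print(f"3x3 conv, 32 -> 64 channels: {deeper_layer:,} parameters")
```

These match what `model.summary()` reports for the corresponding `Conv2D` layers (896 and 18,496), and notably the count is independent of the image's spatial size.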

6.2.2 Activation Functions for CNNs

ReLU: The Standard Choice for CNNs

# Visualize activation functions
x = np.linspace(-5, 5, 1000)

activations = {
    'ReLU': lambda x: np.maximum(0, x),
    'Leaky ReLU': lambda x: np.where(x > 0, x, 0.01*x),
    'ELU': lambda x: np.where(x > 0, x, np.exp(x) - 1),
    'Swish': lambda x: x * (1 / (1 + np.exp(-x)))
}

fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten()

for idx, (name, func) in enumerate(activations.items()):
    y = func(x)
    axes[idx].plot(x, y, linewidth=2.5, color='blue')
    axes[idx].axhline(y=0, color='k', linewidth=0.8, linestyle='--', alpha=0.3)
    axes[idx].axvline(x=0, color='k', linewidth=0.8, linestyle='--', alpha=0.3)
    axes[idx].grid(True, alpha=0.3)
    axes[idx].set_title(f'{name} Activation', fontweight='bold', fontsize=13)
    axes[idx].set_xlabel('Input', fontsize=11)
    axes[idx].set_ylabel('Output', fontsize=11)

    # Add formula
    formulas = {
        'ReLU': r'$f(x) = \max(0, x)$',
        'Leaky ReLU': r'$f(x) = \max(0.01x, x)$',
        'ELU': r'$f(x) = x$ if $x>0$ else $e^x-1$',
        'Swish': r'$f(x) = x \cdot \sigma(x)$'
    }
    axes[idx].text(0.05, 0.95, formulas[name], transform=axes[idx].transAxes,
                  fontsize=11, verticalalignment='top',
                  bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

plt.tight_layout()
plt.show()

💡 Why ReLU?

Advantages:

  • Computational efficiency: just max(0, x)
  • Sparse activation: roughly half of the neurons are active
  • No vanishing gradient: the gradient is 1 for x > 0
  • Empirically works well: behind many state-of-the-art results

Disadvantages:

  • Dying ReLU: neurons can become permanently inactive (always output 0)
  • Not zero-centered: can slow convergence

Solution: Leaky ReLU or ELU to address the dying-ReLU problem.
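The dying-ReLU problem can be made concrete with a tiny numeric check: for negative inputs, ReLU's gradient is zero, so a neuron stuck in the negative regime receives no learning signal, whereas Leaky ReLU keeps a small gradient flowing. A minimal NumPy sketch:

```python
import numpy as np

x = np.array([-3.0, -0.5, 0.5, 3.0])

# ReLU gradient: 0 for x < 0 -- negative inputs contribute no learning signal
relu_grad = (x > 0).astype(float)

# Leaky ReLU gradient: 0.01 for x < 0 -- a small signal always flows
leaky_grad = np.where(x > 0, 1.0, 0.01)

print("ReLU gradients      :", relu_grad)   # zero for both negative inputs
print("Leaky ReLU gradients:", leaky_grad)  # 0.01 for negative inputs, 1 otherwise
```

If a neuron's weights drift so that its pre-activation is negative for every training example, ReLU's zero gradient means gradient descent can never move it back; Leaky ReLU's 0.01 slope leaves an escape route.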

6.2.3 Pooling Layers: Downsampling

Pooling Operation:

# Demonstrate pooling operations
feature_map = np.array([
    [1, 3, 2, 4],
    [5, 6, 1, 3],
    [2, 1, 4, 7],
    [3, 2, 5, 8]
])

def max_pooling(input_map, pool_size=2):
    """Max pooling operation"""
    h, w = input_map.shape
    output_h = h // pool_size
    output_w = w // pool_size
    output = np.zeros((output_h, output_w))

    for i in range(output_h):
        for j in range(output_w):
            region = input_map[i*pool_size:(i+1)*pool_size,
                              j*pool_size:(j+1)*pool_size]
            output[i, j] = np.max(region)

    return output

def average_pooling(input_map, pool_size=2):
    """Average pooling operation"""
    h, w = input_map.shape
    output_h = h // pool_size
    output_w = w // pool_size
    output = np.zeros((output_h, output_w))

    for i in range(output_h):
        for j in range(output_w):
            region = input_map[i*pool_size:(i+1)*pool_size,
                              j*pool_size:(j+1)*pool_size]
            output[i, j] = np.mean(region)

    return output

max_pooled = max_pooling(feature_map)
avg_pooled = average_pooling(feature_map)

# Visualize
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Original
im0 = axes[0].imshow(feature_map, cmap='YlOrRd', vmin=0, vmax=8)
axes[0].set_title('Feature Map\n(4×4)', fontweight='bold', fontsize=13)
for i in range(4):
    for j in range(4):
        axes[0].text(j, i, f'{feature_map[i, j]:.0f}',
                    ha='center', va='center', fontsize=14, fontweight='bold')
axes[0].set_xticks([])
axes[0].set_yticks([])

# Max pooling
im1 = axes[1].imshow(max_pooled, cmap='YlOrRd', vmin=0, vmax=8)
axes[1].set_title('Max Pooling\n(2×2 pool)', fontweight='bold', fontsize=13)
for i in range(2):
    for j in range(2):
        axes[1].text(j, i, f'{max_pooled[i, j]:.0f}',
                    ha='center', va='center', fontsize=14, fontweight='bold')
axes[1].set_xticks([])
axes[1].set_yticks([])

# Average pooling
im2 = axes[2].imshow(avg_pooled, cmap='YlOrRd', vmin=0, vmax=8)
axes[2].set_title('Average Pooling\n(2×2 pool)', fontweight='bold', fontsize=13)
for i in range(2):
    for j in range(2):
        axes[2].text(j, i, f'{avg_pooled[i, j]:.1f}',
                    ha='center', va='center', fontsize=14, fontweight='bold')
axes[2].set_xticks([])
axes[2].set_yticks([])

plt.tight_layout()
plt.show()

print("Pooling Reduces Spatial Dimensions:")
print(f"Input: {feature_map.shape} → Max Pool: {max_pooled.shape}")
print(f"Size reduction: {feature_map.size} → {max_pooled.size} values (75% reduction)")

Pooling Benefits:

graph LR
    A[Pooling Layer] --> B[Dimensionality<br/>Reduction]
    A --> C[Translation<br/>Invariance]
    A --> D[Computational<br/>Efficiency]
    A --> E[Overfitting<br/>Prevention]

    style A fill:#99ccff
    style B fill:#ffcc99
    style C fill:#ffcc99
    style D fill:#ffcc99
    style E fill:#ffcc99
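The translation-invariance benefit can be checked numerically: shift a feature map by one pixel, and the 2×2 max-pooled output often does not change at all. A small sketch, redefining the pooling helper so the snippet is self-contained:

```python
import numpy as np

def max_pooling(input_map, pool_size=2):
    """Non-overlapping max pooling (same logic as the helper defined earlier)."""
    h, w = input_map.shape
    out = np.zeros((h // pool_size, w // pool_size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = input_map[i*pool_size:(i+1)*pool_size,
                                  j*pool_size:(j+1)*pool_size].max()
    return out

fmap = np.zeros((8, 8))
fmap[2, 2] = 1.0                    # a single strong activation
shifted = np.roll(fmap, 1, axis=1)  # the same activation, one pixel to the right

# Both peaks fall into the same 2x2 pooling cell, so the outputs are identical
print(np.array_equal(max_pooling(fmap), max_pooling(shifted)))  # True
```

The invariance is only partial: a shift that crosses a pooling-cell boundary does move the activation to a neighboring output cell, which is why stacking several pooling stages yields progressively more tolerance to translation.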

6.2.4 Complete CNN Architecture

Typical CNN Architecture:

flowchart TD
    A["Input\n224x224x3"] --> B["Conv1\n3x3, 32"]
    B --> C["ReLU"]
    C --> D["Pool1\n2x2"]
    D --> E["Conv2\n3x3, 64"]
    E --> F["ReLU"]
    F --> G["Pool2\n2x2"]
    G --> H["Conv3\n3x3, 128"]
    H --> I["ReLU"]
    I --> J["Pool3\n2x2"]
    J --> K["Flatten"]
    K --> L["Dense\n512"]
    L --> M["ReLU"]
    M --> N["Dropout"]
    N --> O["Output\n10 classes"]

    style A fill:#ff9999
    style B fill:#ffcc99
    style E fill:#ffcc99
    style H fill:#ffcc99
    style K fill:#99ff99
    style L fill:#99ccff
    style O fill:#cc99ff

Implementation: Simple CNN

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Build simple CNN
def create_simple_cnn(input_shape=(32, 32, 3), num_classes=10):
    """
    Simple CNN for image classification
    """
    model = keras.Sequential([
        # Block 1
        layers.Conv2D(32, (3, 3), activation='relu', padding='same',
                     input_shape=input_shape, name='conv1'),
        layers.BatchNormalization(name='bn1'),
        layers.MaxPooling2D((2, 2), name='pool1'),

        # Block 2
        layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='conv2'),
        layers.BatchNormalization(name='bn2'),
        layers.MaxPooling2D((2, 2), name='pool2'),

        # Block 3
        layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='conv3'),
        layers.BatchNormalization(name='bn3'),
        layers.MaxPooling2D((2, 2), name='pool3'),

        # Classifier
        layers.Flatten(name='flatten'),
        layers.Dense(512, activation='relu', name='fc1'),
        layers.Dropout(0.5, name='dropout'),
        layers.Dense(num_classes, activation='softmax', name='output')
    ], name='SimpleCNN')

    return model

# Create model
model = create_simple_cnn()

# Display architecture
model.summary()

# Calculate parameters
total_params = model.count_params()
trainable_params = sum([tf.size(w).numpy() for w in model.trainable_weights])

print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")

Visualize Feature Maps:

# Visualize what the CNN learns

# Create dummy input
dummy_input = np.random.rand(1, 32, 32, 3).astype('float32')

# Get intermediate layer outputs
layer_outputs = [layer.output for layer in model.layers[:6]]  # First 6 layers
activation_model = keras.Model(inputs=model.input, outputs=layer_outputs)

# Get activations
activations = activation_model.predict(dummy_input, verbose=0)

# Visualize first convolutional layer
fig, axes = plt.subplots(4, 8, figsize=(16, 8))
axes = axes.flatten()

conv1_activation = activations[0][0]  # First conv layer output

for i in range(min(32, len(axes))):
    if i < conv1_activation.shape[-1]:
        axes[i].imshow(conv1_activation[:, :, i], cmap='viridis')
        axes[i].set_title(f'Filter {i+1}', fontsize=9)
    axes[i].axis('off')

plt.suptitle('Conv1 Feature Maps (32 filters)', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

6.3 Building CNN from Scratch

6.3.1 Dataset: MNIST Handwritten Digits

Load and Explore MNIST:

from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load MNIST
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Preprocess
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0

y_train_cat = to_categorical(y_train, 10)
y_test_cat = to_categorical(y_test, 10)

# Visualize samples
fig, axes = plt.subplots(4, 10, figsize=(15, 6))
for i in range(4):
    for j in range(10):
        idx = i * 10 + j
        axes[i, j].imshow(X_train[idx].squeeze(), cmap='gray')
        axes[i, j].set_title(f'{y_train[idx]}', fontsize=11, fontweight='bold')
        axes[i, j].axis('off')

plt.suptitle('MNIST Dataset Samples', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

print(f"Training set: {X_train.shape}")
print(f"Test set: {X_test.shape}")
print(f"Classes: {np.unique(y_train)}")

6.3.2 Build CNN in Keras

def create_mnist_cnn():
    """
    CNN for MNIST classification
    """
    model = keras.Sequential([
        # Convolutional Block 1
        layers.Conv2D(32, (3, 3), activation='relu', padding='same',
                     input_shape=(28, 28, 1)),
        layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),

        # Convolutional Block 2
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),

        # Classifier
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(10, activation='softmax')
    ], name='MNIST_CNN')

    return model

# Create and compile
mnist_cnn = create_mnist_cnn()
mnist_cnn.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

mnist_cnn.summary()

Train CNN:

# Training callbacks
callbacks = [
    keras.callbacks.EarlyStopping(
        monitor='val_loss',
        patience=5,
        restore_best_weights=True
    ),
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=3
    )
]

# Train model
history = mnist_cnn.fit(
    X_train, y_train_cat,
    batch_size=128,
    epochs=20,
    validation_split=0.15,
    callbacks=callbacks,
    verbose=1
)

# Evaluate
test_loss, test_acc = mnist_cnn.evaluate(X_test, y_test_cat, verbose=0)
print(f"\nTest Accuracy: {test_acc:.4f}")
print(f"Test Loss: {test_loss:.4f}")

Visualize Training:

# Plot training history
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Accuracy
axes[0].plot(history.history['accuracy'], label='Train', linewidth=2.5)
axes[0].plot(history.history['val_accuracy'], label='Validation', linewidth=2.5)
axes[0].set_title('Model Accuracy', fontweight='bold', fontsize=13)
axes[0].set_xlabel('Epoch', fontsize=11)
axes[0].set_ylabel('Accuracy', fontsize=11)
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3)

# Loss
axes[1].plot(history.history['loss'], label='Train', linewidth=2.5)
axes[1].plot(history.history['val_loss'], label='Validation', linewidth=2.5)
axes[1].set_title('Model Loss', fontweight='bold', fontsize=13)
axes[1].set_xlabel('Epoch', fontsize=11)
axes[1].set_ylabel('Loss', fontsize=11)
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

6.3.3 Build CNN in PyTorch

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Define CNN architecture
class MNISTNet(nn.Module):
    def __init__(self):
        super(MNISTNet, self).__init__()

        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(64, 64, kernel_size=3, padding=1)

        # Pooling
        self.pool = nn.MaxPool2d(2, 2)

        # Fully connected layers
        self.fc1 = nn.Linear(64 * 7 * 7, 512)
        self.fc2 = nn.Linear(512, 10)

        # Dropout
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)

    def forward(self, x):
        # Block 1
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = self.dropout1(x)

        # Block 2
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = self.pool(x)
        x = self.dropout1(x)

        # Classifier
        x = x.view(-1, 64 * 7 * 7)
        x = F.relu(self.fc1(x))
        x = self.dropout2(x)
        x = self.fc2(x)

        return x

# Create model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
pytorch_model = MNISTNet().to(device)

print(pytorch_model)
print(f"\nDevice: {device}")

# Count parameters
total_params = sum(p.numel() for p in pytorch_model.parameters())
trainable_params = sum(p.numel() for p in pytorch_model.parameters() if p.requires_grad)
print(f"Total parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")
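The `64 * 7 * 7` input size of `fc1` follows from the spatial downsampling: each non-overlapping 2×2 max pool halves the height and width, so 28 → 14 → 7, with 64 channels after `conv4`. A quick arithmetic sanity check:

```python
def after_pool(size, pool=2):
    """Spatial size after one non-overlapping pooling step."""
    return size // pool

size = 28                # MNIST input height/width
size = after_pool(size)  # after block 1 pooling -> 14
size = after_pool(size)  # after block 2 pooling -> 7

flat_features = 64 * size * size  # 64 channels after conv4
print(flat_features)  # 3136 == 64 * 7 * 7
```

Getting this number wrong is the most common shape error when porting a CNN between frameworks; the `x.view(-1, 64 * 7 * 7)` in `forward` must match it exactly.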

Train PyTorch CNN:

# Prepare data loaders
# PyTorch Conv2d expects channels-first (N, C, H, W), so permute from (N, H, W, C)
X_train_torch = torch.FloatTensor(X_train).permute(0, 3, 1, 2)
y_train_torch = torch.LongTensor(y_train)
X_test_torch = torch.FloatTensor(X_test).permute(0, 3, 1, 2)
y_test_torch = torch.LongTensor(y_test)

train_dataset = TensorDataset(X_train_torch, y_train_torch)
test_dataset = TensorDataset(X_test_torch, y_test_torch)

train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=128, shuffle=False)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(pytorch_model.parameters(), lr=0.001)

# Training function
def train_epoch(model, loader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()

    return running_loss / len(loader), 100. * correct / total

# Evaluation function
def evaluate(model, loader, criterion, device):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)

            running_loss += loss.item()
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += predicted.eq(labels).sum().item()

    return running_loss / len(loader), 100. * correct / total

# Train model
epochs = 10
train_losses, train_accs = [], []
test_losses, test_accs = [], []

print("Training PyTorch CNN...")
for epoch in range(epochs):
    train_loss, train_acc = train_epoch(pytorch_model, train_loader,
                                       criterion, optimizer, device)
    test_loss, test_acc = evaluate(pytorch_model, test_loader,
                                   criterion, device)

    train_losses.append(train_loss)
    train_accs.append(train_acc)
    test_losses.append(test_loss)
    test_accs.append(test_acc)

    print(f"Epoch {epoch+1}/{epochs}: "
          f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}% | "
          f"Test Loss: {test_loss:.4f}, Test Acc: {test_acc:.2f}%")

# Plot results
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].plot(train_accs, label='Train', linewidth=2.5)
axes[0].plot(test_accs, label='Test', linewidth=2.5)
axes[0].set_title('PyTorch CNN Accuracy', fontweight='bold', fontsize=13)
axes[0].set_xlabel('Epoch', fontsize=11)
axes[0].set_ylabel('Accuracy (%)', fontsize=11)
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3)

axes[1].plot(train_losses, label='Train', linewidth=2.5)
axes[1].plot(test_losses, label='Test', linewidth=2.5)
axes[1].set_title('PyTorch CNN Loss', fontweight='bold', fontsize=13)
axes[1].set_xlabel('Epoch', fontsize=11)
axes[1].set_ylabel('Loss', fontsize=11)
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

6.3.4 CIFAR-10: More Complex Dataset

Load CIFAR-10:

from tensorflow.keras.datasets import cifar10

# Load dataset
(X_train_cifar, y_train_cifar), (X_test_cifar, y_test_cifar) = cifar10.load_data()

# Normalize
X_train_cifar = X_train_cifar.astype('float32') / 255.0
X_test_cifar = X_test_cifar.astype('float32') / 255.0

# One-hot encode
y_train_cifar_cat = to_categorical(y_train_cifar, 10)
y_test_cifar_cat = to_categorical(y_test_cifar, 10)

# Class names
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Visualize samples
fig, axes = plt.subplots(4, 10, figsize=(15, 6))
for i in range(4):
    for j in range(10):
        idx = i * 10 + j
        axes[i, j].imshow(X_train_cifar[idx])
        axes[i, j].set_title(f'{class_names[y_train_cifar[idx][0]]}',
                           fontsize=9, fontweight='bold')
        axes[i, j].axis('off')

plt.suptitle('CIFAR-10 Dataset Samples', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

print(f"Training set: {X_train_cifar.shape}")
print(f"Test set: {X_test_cifar.shape}")
print(f"Classes: {class_names}")

CNN for CIFAR-10:

def create_cifar10_cnn():
    """
    Deeper CNN for CIFAR-10 classification
    """
    model = keras.Sequential([
        # Block 1
        layers.Conv2D(64, (3, 3), activation='relu', padding='same',
                     input_shape=(32, 32, 3)),
        layers.BatchNormalization(),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.3),

        # Block 2
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.4),

        # Block 3
        layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.5),

        # Classifier
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(10, activation='softmax')
    ], name='CIFAR10_CNN')

    return model

# Create and compile
cifar_cnn = create_cifar10_cnn()
cifar_cnn.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

cifar_cnn.summary()

6.4 Transfer Learning Fundamentals

6.4.1 The Concept of Transfer Learning

Why Transfer Learning?

Training a CNN from scratch requires:

  • A large dataset (>100K images)
  • Computational resources (GPUs, long training times)
  • Expertise in architecture design and hyperparameter tuning

Transfer Learning Solution:

graph TD
    A[Pre-trained Model<br/>ImageNet: 1M images, 1000 classes] --> B[Feature Extractor<br/>Lower layers: generic features]
    B --> C[Your Dataset<br/>Small: 1K images, custom classes]
    C --> D[Fine-tune<br/>Higher layers + classifier]
    D --> E[Specialized Model<br/>Your domain]

    style A fill:#99ccff
    style B fill:#ffcc99
    style C fill:#ff9999
    style D fill:#99ff99
    style E fill:#cc99ff

💡 Transfer Learning Analogy

Imagine learning a new language:

  • From scratch: learn the alphabet, vocabulary, and grammar from zero (CNN from scratch)
  • Transfer learning: you already know English, so learning French goes much faster (reuse knowledge)

Lower CNN layers = general knowledge (edges, textures)
Higher CNN layers = task-specific knowledge (object parts, semantics)

6.4.2 Transfer Learning Strategies

# Visualize transfer learning strategies
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

strategies = [
    {
        'name': 'Feature Extraction\n(Freeze all)',
        'frozen': [True, True, True, True],
        'trainable': [False, False, False, True]
    },
    {
        'name': 'Fine-tuning\n(Last layers)',
        'frozen': [True, True, False, False],
        'trainable': [False, False, True, True]
    },
    {
        'name': 'Full Fine-tuning\n(All layers)',
        'frozen': [False, False, False, False],
        'trainable': [True, True, True, True]
    }
]

for idx, strategy in enumerate(strategies):
    ax = axes[idx]

    # Draw layers
    layer_names = ['Conv Block 1', 'Conv Block 2', 'Conv Block 3', 'Classifier']
    colors = ['lightcoral' if frozen else 'lightgreen'
              for frozen in strategy['frozen']]

    y_positions = [3, 2, 1, 0]
    for i, (name, color, trainable) in enumerate(zip(layer_names, colors,
                                                     strategy['trainable'])):
        rect = plt.Rectangle((0, y_positions[i]), 4, 0.7,
                            facecolor=color, edgecolor='black', linewidth=2)
        ax.add_patch(rect)

        status = 'Trainable' if trainable else 'Frozen'
        ax.text(2, y_positions[i] + 0.35, f'{name}\n({status})',
               ha='center', va='center', fontsize=10, fontweight='bold')

    ax.set_xlim(-0.5, 4.5)
    ax.set_ylim(-0.5, 4)
    ax.set_title(strategy['name'], fontweight='bold', fontsize=12)
    ax.axis('off')

plt.tight_layout()
plt.show()

6.4.3 Transfer Learning with VGG16

Load Pre-trained VGG16:

from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

# Load pre-trained VGG16 (without top classifier)
base_model = VGG16(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3)
)

print("VGG16 Base Model:")
print(f"Total layers: {len(base_model.layers)}")
print(f"Total parameters: {base_model.count_params():,}")

# Display architecture
base_model.summary()

# Freeze base model
base_model.trainable = False

print(f"\nTrainable parameters after freezing: {sum([tf.size(w).numpy() for w in base_model.trainable_weights]):,}")

Add Custom Classifier:

def build_transfer_model(base_model, num_classes=10):
    """
    Build transfer learning model with custom classifier
    """
    # Create new model
    inputs = keras.Input(shape=(224, 224, 3))

    # Preprocessing for VGG16
    x = preprocess_input(inputs)

    # Base model (frozen)
    x = base_model(x, training=False)

    # Custom classifier
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(512, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(256, activation='relu')(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    # Complete model
    model = keras.Model(inputs, outputs, name='VGG16_Transfer')

    return model

# Create transfer learning model
transfer_model = build_transfer_model(base_model, num_classes=10)

# Compile
transfer_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.0001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

transfer_model.summary()

6.4.4 Transfer Learning with ResNet50

Load ResNet50:

from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input as resnet_preprocess

# Load ResNet50
resnet_base = ResNet50(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3)
)

# Freeze base
resnet_base.trainable = False

# Build transfer model
def build_resnet_transfer(base_model, num_classes=10):
    """
    ResNet50 transfer learning model
    """
    inputs = keras.Input(shape=(224, 224, 3))
    x = resnet_preprocess(inputs)
    x = base_model(x, training=False)

    # Global pooling + classifier
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dense(512, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(256, activation='relu')(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    return keras.Model(inputs, outputs, name='ResNet50_Transfer')

resnet_model = build_resnet_transfer(resnet_base)
resnet_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print(f"ResNet50 Transfer Model - Total params: {resnet_model.count_params():,}")

6.4.5 Transfer Learning with MobileNetV2

MobileNetV2 for Edge Devices:

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input as mobile_preprocess

# Load MobileNetV2
mobile_base = MobileNetV2(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3),
    alpha=1.0  # Width multiplier
)

mobile_base.trainable = False

# Build efficient model
def build_mobile_transfer(base_model, num_classes=10):
    """
    MobileNetV2 for efficient inference
    """
    inputs = keras.Input(shape=(224, 224, 3))
    x = mobile_preprocess(inputs)
    x = base_model(x, training=False)

    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    return keras.Model(inputs, outputs, name='MobileNetV2_Transfer')

mobile_model = build_mobile_transfer(mobile_base)
mobile_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Compare model sizes
models_comparison = {
    'Model': ['VGG16', 'ResNet50', 'MobileNetV2'],
    'Total Params': [
        transfer_model.count_params(),
        resnet_model.count_params(),
        mobile_model.count_params()
    ],
    'Trainable Params': [
        sum([tf.size(w).numpy() for w in transfer_model.trainable_weights]),
        sum([tf.size(w).numpy() for w in resnet_model.trainable_weights]),
        sum([tf.size(w).numpy() for w in mobile_model.trainable_weights])
    ]
}

df_comparison = pd.DataFrame(models_comparison)
df_comparison['Total Params'] = df_comparison['Total Params'].apply(lambda x: f"{x:,}")
df_comparison['Trainable Params'] = df_comparison['Trainable Params'].apply(lambda x: f"{x:,}")
print("\nModel Size Comparison:")
print(df_comparison.to_string(index=False))

6.5 Data Augmentation

6.5.1 Why Data Augmentation?

Problem: Limited Training Data

Deep learning needs large amounts of data, but:

  • Labeling is expensive and time-consuming
  • Real-world constraints (privacy, accessibility)
  • Class imbalance

Solution: Data Augmentation

Generate synthetic training examples using transformations that preserve semantic meaning.

💡 Data Augmentation Benefits
  1. Increase dataset size: 10× - 100× more training samples
  2. Reduce overfitting: Model learns robust features
  3. Improve generalization: Better performance on unseen data
  4. Handle variations: Rotation, scaling, lighting, etc.
  5. Class balancing: Oversample minority classes

6.5.2 Common Augmentation Techniques

Geometric Transformations:

from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

# Load sample image
sample_img = X_train_cifar[0:1]  # Shape: (1, 32, 32, 3)

# Define augmentation
augmentation_configs = [
    {'rotation_range': 30, 'name': 'Rotation (±30°)'},
    {'width_shift_range': 0.2, 'height_shift_range': 0.2, 'name': 'Shift (20%)'},
    {'zoom_range': 0.3, 'name': 'Zoom (±30%)'},
    {'horizontal_flip': True, 'name': 'Horizontal Flip'},
    {'brightness_range': [0.5, 1.5], 'name': 'Brightness'},
]

fig, axes = plt.subplots(1, 6, figsize=(16, 3))

# Original image in the first panel
axes[0].imshow(sample_img[0])
axes[0].set_title('Original', fontweight='bold', fontsize=11)
axes[0].axis('off')

# One panel per augmentation
for idx, config in enumerate(augmentation_configs):
    name = config.pop('name')
    datagen = ImageDataGenerator(**config)

    # Generate one augmented image
    aug_iter = datagen.flow(sample_img, batch_size=1)
    aug_img = next(aug_iter)[0]

    axes[idx + 1].imshow(np.clip(aug_img, 0, 1))
    axes[idx + 1].set_title(name, fontweight='bold', fontsize=10)
    axes[idx + 1].axis('off')

plt.suptitle('Data Augmentation Techniques', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

Advanced Augmentations:

# Combined augmentation pipeline
def create_augmentation_pipeline():
    """
    Comprehensive augmentation for training
    """
    datagen = ImageDataGenerator(
        rotation_range=20,
        width_shift_range=0.2,
        height_shift_range=0.2,
        horizontal_flip=True,
        zoom_range=0.2,
        shear_range=0.15,
        brightness_range=[0.8, 1.2],
        fill_mode='nearest'
    )
    return datagen

# Generate multiple augmented samples
augmentation_gen = create_augmentation_pipeline()

fig, axes = plt.subplots(4, 8, figsize=(16, 8))
axes = axes.flatten()

# Generate 32 augmented versions
aug_iter = augmentation_gen.flow(sample_img, batch_size=1)

for i in range(32):
    aug_img = next(aug_iter)[0]
    axes[i].imshow(aug_img)
    axes[i].axis('off')

plt.suptitle('32 Augmented Versions of Same Image', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

6.5.3 Augmentation during Training

Keras with ImageDataGenerator:

# Training with augmentation
train_datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    zoom_range=0.1
)

# fit() is only required for featurewise statistics (e.g. featurewise_center, ZCA);
# it is harmless with the configuration above
train_datagen.fit(X_train_cifar)

# Validation data (no augmentation)
val_datagen = ImageDataGenerator()

# Training with data augmentation
print("Training with Data Augmentation:")
print("Each epoch, the model sees slightly different versions of the data")
print("Effect: regularization, reduced overfitting, better generalization")
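The prints above describe the effect, but the generator still has to be wired into `fit()`. A minimal, self-contained sketch of that wiring (the random arrays and toy model below are stand-ins; with the chapter's data you would pass `X_train_cifar`, `y_train_cifar_cat`, and `cifar_cnn` instead):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Tiny stand-in arrays so the sketch runs on its own
rng = np.random.default_rng(42)
X_small = rng.random((32, 32, 32, 3)).astype('float32')
y_small = keras.utils.to_categorical(rng.integers(0, 10, size=32), 10)

train_datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    zoom_range=0.1,
)

# Deliberately small model: the point here is the flow() wiring, not accuracy
model = keras.Sequential([
    layers.Conv2D(8, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# fit() pulls freshly augmented batches from the generator every epoch
history = model.fit(
    train_datagen.flow(X_small, y_small, batch_size=16),
    epochs=1,
    verbose=0,
)
print(sorted(history.history.keys()))
```

Because augmentation happens on the fly, the model never sees exactly the same batch twice.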

6.5.4 Modern Augmentation Libraries

Albumentations: Advanced Augmentation:

# Note: Install with: pip install albumentations
try:
    import albumentations as A

    # Define transformation pipeline
    transform = A.Compose([
        A.RandomRotate90(p=0.5),
        A.Flip(p=0.5),
        A.OneOf([
            A.GaussNoise(p=1),
            A.GaussianBlur(p=1),
            A.MotionBlur(p=1),
        ], p=0.3),
        A.OneOf([
            A.OpticalDistortion(p=1),
            A.GridDistortion(p=1),
        ], p=0.3),
        A.OneOf([
            A.HueSaturationValue(p=1),
            A.RGBShift(p=1),
            A.RandomBrightnessContrast(p=1),
        ], p=0.5),
    ])

    # Apply to sample
    sample_np = (X_train_cifar[0] * 255).astype(np.uint8)

    fig, axes = plt.subplots(2, 4, figsize=(14, 7))
    axes = axes.flatten()

    for i in range(8):
        augmented = transform(image=sample_np)['image']
        axes[i].imshow(augmented)
        axes[i].set_title(f'Augmented {i+1}', fontweight='bold', fontsize=11)
        axes[i].axis('off')

    plt.suptitle('Albumentations: Advanced Augmentations', fontweight='bold', fontsize=14)
    plt.tight_layout()
    plt.show()

    print("Albumentations offers:")
    print("- Fast C++ backend")
    print("- 70+ transformation types")
    print("- Bounding box & keypoint support")
    print("- Easy pipeline composition")

except ImportError:
    print("Albumentations not installed. Install with: pip install albumentations")

6.5.5 Augmentation Best Practices

Guidelines:

flowchart TD
    A["Data Augmentation\nBest Practices"] --> B["Domain-Appropriate\nTransformations"]
    A --> C["Reasonable\nParameter Ranges"]
    A --> D["No Semantic\nChanges"]
    A --> E["Balance with\nReal Data"]

    B --> B1["Medical: No flips for\nlaterality-dependent organs"]
    B --> B2["Text/OCR: No rotation\nthat breaks readability"]

    C --> C1["Rotation: ±15-30°\nnot ±180°"]
    C --> C2["Brightness: 0.7-1.3\nnot 0.1-5.0"]

    D --> D1["Cat to flipped cat ✓"]
    D --> D2["Cat to unrecognizable ✗"]

    E --> E1["Augmentation complements\nnot replaces real data"]

    style A fill:#99ccff
    style B fill:#ffcc99
    style C fill:#ffcc99
    style D fill:#ffcc99
    style E fill:#ffcc99

Data Augmentation Best Practices
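The guideline chart can also be condensed into a quick sanity check. The thresholds below simply mirror the chart's suggestions (rotation ±30°, brightness 0.7-1.3, modest shifts and zooms); the function itself is our illustrative helper, not part of Keras:

```python
def check_augmentation_config(config):
    """Flag ImageDataGenerator-style parameters outside the suggested ranges."""
    warnings = []
    if config.get('rotation_range', 0) > 30:
        warnings.append('rotation_range > 30 deg may destroy orientation cues')
    lo, hi = config.get('brightness_range', (1.0, 1.0))
    if lo < 0.7 or hi > 1.3:
        warnings.append('brightness_range outside 0.7-1.3 risks unrecognizable images')
    for key in ('width_shift_range', 'height_shift_range', 'zoom_range'):
        if config.get(key, 0) > 0.3:
            warnings.append(f'{key} > 0.3 may push the subject out of frame')
    return warnings

print(check_augmentation_config({'rotation_range': 15, 'zoom_range': 0.2}))   # []
print(check_augmentation_config({'rotation_range': 180, 'zoom_range': 0.8}))  # two warnings
```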

Common Mistakes:

# Demonstrate bad augmentations
sample_img_bad = X_train_cifar[100:101]

fig, axes = plt.subplots(2, 4, figsize=(14, 7))

# Good augmentations
good_configs = [
    {'rotation_range': 15, 'name': 'Good: Rotation ±15°'},
    {'zoom_range': 0.2, 'name': 'Good: Zoom ±20%'},
    {'brightness_range': [0.8, 1.2], 'name': 'Good: Brightness 0.8-1.2'},
    {'horizontal_flip': True, 'name': 'Good: Horizontal Flip'},
]

for idx, config in enumerate(good_configs):
    name = config.pop('name')
    datagen = ImageDataGenerator(**config)
    aug_iter = datagen.flow(sample_img_bad, batch_size=1)
    aug_img = next(aug_iter)[0]

    axes[0, idx].imshow(np.clip(aug_img, 0, 1))
    axes[0, idx].set_title(name, fontweight='bold', fontsize=10, color='green')
    axes[0, idx].axis('off')

# Bad augmentations
bad_configs = [
    {'rotation_range': 180, 'name': 'Bad: Rotation ±180°'},
    {'zoom_range': 0.8, 'name': 'Bad: Zoom ±80%'},
    {'brightness_range': [0.2, 3.0], 'name': 'Bad: Brightness 0.2-3.0'},
    {'width_shift_range': 0.8, 'name': 'Bad: Shift 80%'},
]

for idx, config in enumerate(bad_configs):
    name = config.pop('name')
    datagen = ImageDataGenerator(**config)
    aug_iter = datagen.flow(sample_img_bad, batch_size=1)
    aug_img = next(aug_iter)[0]

    axes[1, idx].imshow(np.clip(aug_img, 0, 1))
    axes[1, idx].set_title(name, fontweight='bold', fontsize=10, color='red')
    axes[1, idx].axis('off')

plt.suptitle('Good vs Bad Augmentation Parameters', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

6.6 Fine-tuning Strategies

6.6.1 When Should You Fine-tune?

Decision Framework:

flowchart TD
    A["Start: Transfer Learning"] --> B{"Dataset Size?"}
    B -->|"Small\n<1K images"| C["Feature Extraction\nFreeze all base layers"]
    B -->|"Medium\n1K-10K images"| D["Partial Fine-tuning\nUnfreeze last few layers"]
    B -->|"Large\n>10K images"| E["Full Fine-tuning\nUnfreeze all layers"]

    C --> F{"Domain Similarity?"}
    D --> F
    E --> F

    F -->|"Similar\nto ImageNet"| G["Lower learning rate\n0.0001 - 0.00001"]
    F -->|"Different\nfrom ImageNet"| H["Higher learning rate\n0.001 - 0.0001"]

    style A fill:#99ccff
    style C fill:#ffcc99
    style D fill:#ffcc99
    style E fill:#ffcc99
    style G fill:#99ff99
    style H fill:#99ff99

Transfer Learning Decision Framework: Choosing a Fine-tuning Strategy Based on Dataset Size and Domain Similarity
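The flowchart above can be encoded as a small helper for quick reference. Thresholds and learning-rate ranges are taken from the chart; the function name and return format are illustrative:

```python
def choose_strategy(n_images, similar_to_imagenet):
    """Suggest a transfer learning strategy and learning-rate range."""
    if n_images < 1_000:
        strategy = 'feature extraction (freeze all base layers)'
    elif n_images <= 10_000:
        strategy = 'partial fine-tuning (unfreeze last few layers)'
    else:
        strategy = 'full fine-tuning (unfreeze all layers)'
    # Similar domains tolerate only small updates to the pre-trained weights
    lr_range = (1e-4, 1e-5) if similar_to_imagenet else (1e-3, 1e-4)
    return strategy, lr_range

print(choose_strategy(500, similar_to_imagenet=True))
print(choose_strategy(50_000, similar_to_imagenet=False))
```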

6.6.2 Gradual Unfreezing

Strategy: Unfreeze Layers Progressively

# Demonstrate gradual unfreezing
def show_trainable_status(model, title):
    """
    Display which layers are trainable
    """
    print(f"\n{title}")
    print("=" * 60)
    print(f"{'Layer Name':<30} {'Trainable':<15} {'Parameters':<15}")
    print("-" * 60)

    total_params = 0
    trainable_params = 0

    for layer in model.layers:
        params = layer.count_params()
        total_params += params
        if layer.trainable:
            trainable_params += params

        print(f"{layer.name:<30} {str(layer.trainable):<15} {params:>10,}")

    print("-" * 60)
    print(f"Total parameters: {total_params:,}")
    print(f"Trainable parameters: {trainable_params:,} ({100*trainable_params/total_params:.1f}%)")
    print(f"Non-trainable parameters: {total_params-trainable_params:,}")

# Stage 1: All frozen
base_model_stage1 = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model_stage1.trainable = False

model_stage1 = build_transfer_model(base_model_stage1)
show_trainable_status(model_stage1, "STAGE 1: Feature Extraction (All Frozen)")

# Stage 2: Unfreeze last conv block
base_model_stage2 = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model_stage2.trainable = True

# Freeze all except last block
for layer in base_model_stage2.layers[:-4]:
    layer.trainable = False

model_stage2 = build_transfer_model(base_model_stage2)
show_trainable_status(model_stage2, "STAGE 2: Partial Fine-tuning (Last Block)")

# Stage 3: Unfreeze all
base_model_stage3 = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model_stage3.trainable = True

model_stage3 = build_transfer_model(base_model_stage3)
show_trainable_status(model_stage3, "STAGE 3: Full Fine-tuning (All Unfrozen)")

6.6.3 Learning Rate Strategies

Discriminative Learning Rates:

# Different learning rates for different layers
def compile_with_discriminative_lr(model, base_lr=0.001):
    """
    Apply different learning rates to different layer groups
    """
    # Separate weight groups: pre-trained base vs newly added head
    base_weights = []
    head_weights = []

    for layer in model.layers:
        # The nested base model appears as a single layer (e.g. 'vgg16')
        if 'vgg16' in layer.name or 'resnet' in layer.name:
            base_weights.extend(layer.trainable_weights)
        else:
            head_weights.extend(layer.trainable_weights)

    # Create optimizer with discriminative learning rates
    optimizer = keras.optimizers.Adam(learning_rate=base_lr)

    # Note: Keras doesn't support per-layer LR directly
    # In practice, use multiple optimizers or manual gradient application

    return optimizer

# Learning rate schedule
def create_lr_schedule():
    """
    Learning rate schedule for fine-tuning
    """
    def schedule(epoch, lr):
        if epoch < 5:
            return 0.001  # Warm-up
        elif epoch < 15:
            return 0.0001  # Fine-tuning
        else:
            return 0.00001  # Final fine-tuning

    return keras.callbacks.LearningRateScheduler(schedule)

# Visualize learning rate schedule
epochs = 25
lrs = []

for epoch in range(epochs):
    if epoch < 5:
        lrs.append(0.001)
    elif epoch < 15:
        lrs.append(0.0001)
    else:
        lrs.append(0.00001)

plt.figure(figsize=(10, 5))
plt.plot(range(epochs), lrs, linewidth=3, marker='o', markersize=8)
plt.title('Learning Rate Schedule for Fine-tuning', fontweight='bold', fontsize=14)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Learning Rate', fontsize=12)
plt.yscale('log')
plt.grid(True, alpha=0.3)

# Add annotations
plt.axvspan(0, 5, alpha=0.2, color='red', label='Stage 1: Warm-up')
plt.axvspan(5, 15, alpha=0.2, color='orange', label='Stage 2: Fine-tuning')
plt.axvspan(15, 25, alpha=0.2, color='green', label='Stage 3: Final fine-tuning')
plt.legend(fontsize=11)

plt.tight_layout()
plt.show()
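Keras does not expose per-layer learning rates directly, but the chapter's PyTorch stack does, via optimizer parameter groups. A minimal sketch of discriminative learning rates (the toy model and the two rates are illustrative):

```python
import torch.nn as nn
import torch.optim as optim

# Toy model standing in for "pre-trained base + new classifier head"
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),   # pretend this is the pre-trained base
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),                # freshly initialized head
)

# One optimizer, two learning rates: tiny for the base, larger for the head
optimizer = optim.Adam([
    {'params': model[0].parameters(), 'lr': 1e-5},
    {'params': model[3].parameters(), 'lr': 1e-3},
])

for group in optimizer.param_groups:
    print(group['lr'], sum(p.numel() for p in group['params']))
```

Each group keeps its own learning rate throughout training, which is exactly the discriminative-rate idea described above.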

6.6.4 Complete Fine-tuning Pipeline

End-to-End Example:

# Complete fine-tuning workflow
def fine_tuning_pipeline(X_train, y_train, X_val, y_val, num_classes=10):
    """
    Complete fine-tuning pipeline with best practices
    """
    # Stage 1: Feature extraction
    print("\n" + "="*70)
    print("STAGE 1: FEATURE EXTRACTION (5 epochs)")
    print("="*70)

    base_model = VGG16(weights='imagenet', include_top=False,
                       input_shape=(224, 224, 3))
    base_model.trainable = False

    model = build_transfer_model(base_model, num_classes)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.001),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    # Train classifier only
    history_stage1 = model.fit(
        X_train, y_train,
        batch_size=32,
        epochs=5,
        validation_data=(X_val, y_val),
        verbose=1
    )

    # Stage 2: Fine-tuning
    print("\n" + "="*70)
    print("STAGE 2: FINE-TUNING LAST LAYERS (10 epochs)")
    print("="*70)

    # Unfreeze last conv block
    base_model.trainable = True
    for layer in base_model.layers[:-4]:
        layer.trainable = False

    # Recompile with lower learning rate
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.0001),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    # Continue training
    history_stage2 = model.fit(
        X_train, y_train,
        batch_size=32,
        epochs=10,
        validation_data=(X_val, y_val),
        verbose=1
    )

    return model, history_stage1, history_stage2

# Note: This would be run with actual data
print("Fine-tuning pipeline defined.")
print("\nBest Practices Applied:")
print("1. Two-stage training: feature extraction → fine-tuning")
print("2. Gradual unfreezing: top layers first")
print("3. Learning rate reduction: 0.001 → 0.0001")
print("4. Batch size consideration: smaller for fine-tuning")

6.6.5 Monitoring Fine-tuning

Key Metrics to Track:

# Simulate training history for visualization
np.random.seed(42)

epochs = 20
stage1_epochs = 5

# Stage 1: Feature extraction
train_acc_s1 = np.linspace(0.50, 0.75, stage1_epochs) + np.random.normal(0, 0.02, stage1_epochs)
val_acc_s1 = np.linspace(0.48, 0.72, stage1_epochs) + np.random.normal(0, 0.02, stage1_epochs)
train_loss_s1 = np.linspace(1.2, 0.7, stage1_epochs) + np.random.normal(0, 0.05, stage1_epochs)
val_loss_s1 = np.linspace(1.3, 0.75, stage1_epochs) + np.random.normal(0, 0.05, stage1_epochs)

# Stage 2: Fine-tuning
train_acc_s2 = np.linspace(0.75, 0.92, epochs-stage1_epochs) + np.random.normal(0, 0.01, epochs-stage1_epochs)
val_acc_s2 = np.linspace(0.72, 0.88, epochs-stage1_epochs) + np.random.normal(0, 0.015, epochs-stage1_epochs)
train_loss_s2 = np.linspace(0.7, 0.25, epochs-stage1_epochs) + np.random.normal(0, 0.03, epochs-stage1_epochs)
val_loss_s2 = np.linspace(0.75, 0.35, epochs-stage1_epochs) + np.random.normal(0, 0.04, epochs-stage1_epochs)

# Combine
train_acc = np.concatenate([train_acc_s1, train_acc_s2])
val_acc = np.concatenate([val_acc_s1, val_acc_s2])
train_loss = np.concatenate([train_loss_s1, train_loss_s2])
val_loss = np.concatenate([val_loss_s1, val_loss_s2])

# Visualize
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Accuracy
axes[0, 0].plot(train_acc, label='Train', linewidth=2.5, color='blue')
axes[0, 0].plot(val_acc, label='Validation', linewidth=2.5, color='orange')
axes[0, 0].axvline(x=stage1_epochs-0.5, color='red', linestyle='--',
                   linewidth=2, label='Fine-tuning starts')
axes[0, 0].set_title('Accuracy over Training', fontweight='bold', fontsize=13)
axes[0, 0].set_xlabel('Epoch', fontsize=11)
axes[0, 0].set_ylabel('Accuracy', fontsize=11)
axes[0, 0].legend(fontsize=11)
axes[0, 0].grid(True, alpha=0.3)
axes[0, 0].text(2, 0.85, 'Stage 1:\nFeature\nExtraction',
               fontsize=10, bbox=dict(boxstyle='round', facecolor='wheat'))
axes[0, 0].text(12, 0.85, 'Stage 2:\nFine-tuning',
               fontsize=10, bbox=dict(boxstyle='round', facecolor='lightgreen'))

# Loss
axes[0, 1].plot(train_loss, label='Train', linewidth=2.5, color='blue')
axes[0, 1].plot(val_loss, label='Validation', linewidth=2.5, color='orange')
axes[0, 1].axvline(x=stage1_epochs-0.5, color='red', linestyle='--',
                   linewidth=2, label='Fine-tuning starts')
axes[0, 1].set_title('Loss over Training', fontweight='bold', fontsize=13)
axes[0, 1].set_xlabel('Epoch', fontsize=11)
axes[0, 1].set_ylabel('Loss', fontsize=11)
axes[0, 1].legend(fontsize=11)
axes[0, 1].grid(True, alpha=0.3)

# Train-Val Gap (Overfitting indicator)
acc_gap = train_acc - val_acc
axes[1, 0].plot(acc_gap, linewidth=2.5, color='red')
axes[1, 0].axhline(y=0.05, color='green', linestyle='--',
                  linewidth=2, label='Acceptable gap')
axes[1, 0].axvline(x=stage1_epochs-0.5, color='red', linestyle='--', linewidth=2)
axes[1, 0].set_title('Train-Val Accuracy Gap (Overfitting)', fontweight='bold', fontsize=13)
axes[1, 0].set_xlabel('Epoch', fontsize=11)
axes[1, 0].set_ylabel('Accuracy Gap', fontsize=11)
axes[1, 0].legend(fontsize=11)
axes[1, 0].grid(True, alpha=0.3)

# Learning Rate Schedule
lr_schedule = np.concatenate([np.full(stage1_epochs, 0.001),
                             np.full(epochs-stage1_epochs, 0.0001)])
axes[1, 1].plot(lr_schedule, linewidth=3, marker='o', markersize=8, color='purple')
axes[1, 1].set_title('Learning Rate Schedule', fontweight='bold', fontsize=13)
axes[1, 1].set_xlabel('Epoch', fontsize=11)
axes[1, 1].set_ylabel('Learning Rate', fontsize=11)
axes[1, 1].set_yscale('log')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nFine-tuning Monitoring Checklist:")
print("✓ Training accuracy improving steadily")
print("✓ Validation accuracy following training (not diverging)")
print("✓ Train-val gap reasonable (<5-10%)")
print("✓ Loss decreasing consistently")
print("✓ Learning rate reduced at fine-tuning stage")
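Much of this checklist can be automated with standard Keras callbacks; the patience values below are illustrative:

```python
from tensorflow import keras

callbacks = [
    # Stop when validation accuracy stops improving and keep the best weights
    keras.callbacks.EarlyStopping(
        monitor='val_accuracy', patience=5, restore_best_weights=True),
    # Cut the learning rate when validation loss plateaus
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss', factor=0.1, patience=3, min_lr=1e-6),
]

# These would be passed to fit() in both fine-tuning stages, e.g.:
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=20, callbacks=callbacks)
print([type(cb).__name__ for cb in callbacks])
```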

6.7 Review & Exercises

6.7.1 Chapter Summary

Key Concepts:

  1. CNN Architecture
    • Convolutional layers: local connectivity, parameter sharing
    • Pooling layers: downsampling, translation invariance
    • Hierarchical feature learning: edges → parts → objects
  2. Building CNNs
    • MNIST: simple grayscale images
    • CIFAR-10: color images, more complex
    • Architecture design: depth, width, skip connections
  3. Transfer Learning
    • Pre-trained models: VGG, ResNet, MobileNet
    • Feature extraction: freeze base, train classifier
    • Domain adaptation: ImageNet → custom domain
  4. Data Augmentation
    • Geometric: rotation, shift, flip, zoom
    • Color: brightness, contrast, saturation
    • Best practices: domain-appropriate, reasonable ranges
  5. Fine-tuning
    • Gradual unfreezing: top layers first
    • Discriminative learning rates: lower for base
    • Two-stage training: feature extraction → fine-tuning

6.7.2 Review Questions (15 Questions)

📝 Review Questions

Conceptual Understanding:

  1. Explain why CNNs are more effective for image data than fully-connected MLPs. Discuss local connectivity, parameter sharing, and translation invariance.

  2. What is the difference between max pooling and average pooling? When would you choose one over the other?

  3. In a CNN architecture, why does the number of filters usually increase (32 → 64 → 128) while the spatial dimensions shrink (224 → 112 → 56)?

  4. Explain the concept of “hierarchical feature learning” in CNNs. Give examples of the features learned at different layers.

  5. What is a “receptive field” in a CNN? How does network depth affect the receptive field?
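
As a concrete aid for reasoning about receptive fields and parameter counts, the standard bookkeeping can be written in a few lines of pure Python (both helper names are ours): the receptive field grows by `(k - 1) * jump` at each layer, and the `jump` multiplies by each layer's stride.

```python
def conv_params(k, c_in, c_out):
    """Parameter count of a k x k conv layer (weights + biases)."""
    return (k * k * c_in + 1) * c_out

def receptive_field(layers):
    """Receptive field of a stack of (kernel_size, stride) layers."""
    r, jump = 1, 1
    for k, s in layers:
        r += (k - 1) * jump   # each layer widens the view by (k-1)*jump
        jump *= s             # strides compound the step size
    return r

# Two stacked 3x3 convs see the same 5x5 region as one 5x5 conv,
# but with fewer parameters (36,928 * 2 vs 102,464 at 64 channels):
print(receptive_field([(3, 1), (3, 1)]))          # -> 5
print(conv_params(3, 64, 64) * 2, conv_params(5, 64, 64))

# A 2x2 stride-2 pool in between makes later convs see much more:
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # -> 8
```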

Transfer Learning:

  6. When should you use transfer learning rather than training a CNN from scratch? Discuss the trade-offs in terms of dataset size, computational resources, and domain similarity.

  7. Explain the difference between “feature extraction” and “fine-tuning” in transfer learning. When is each approach appropriate?

  8. Why is the learning rate for fine-tuning typically smaller (0.0001) than for training from scratch (0.001)?

  9. What is “catastrophic forgetting” in fine-tuning? How can it be prevented?

  10. Compare VGG16, ResNet50, and MobileNetV2 in terms of architecture, parameter count, and use cases.
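
One common answer to the learning-rate and catastrophic-forgetting questions is discriminative learning rates: earlier (more generic) layers get smaller updates than the task-specific head. A framework-agnostic sketch, with our own function name; the ÷2.6-per-group factor is the heuristic popularized by ULMFiT/fast.ai, not a universal constant.

```python
def discriminative_lrs(n_groups, head_lr=1e-4, decay=2.6):
    """Exponentially smaller learning rates for earlier layer groups:
    the classifier head trains fastest, the earliest layers slowest,
    which limits how far pre-trained weights can drift."""
    return [head_lr / decay ** (n_groups - 1 - i) for i in range(n_groups)]

lrs = discriminative_lrs(4)
# lrs[0] (earliest group) is the smallest; lrs[-1] (head) equals head_lr
```

In Keras this idea is usually approximated by freezing whole groups of layers, or by an optimizer with per-layer learning-rate support.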

Data Augmentation:

  11. Explain why data augmentation is effective at reducing overfitting. What are its limitations?

  12. Give examples of transformations that are NOT appropriate for: (a) medical X-ray images, (b) document OCR, (c) face recognition.

  13. What is the difference between “online” (on-the-fly) and “offline” (pre-computed) augmentation? What are the advantages of each?
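
The online/offline distinction is easy to see in code. A minimal NumPy sketch (the `random_flip` helper is ours, using random horizontal flips as the only transform): offline augmentation stores the transformed copies, while online augmentation re-randomizes every epoch at no storage cost.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_flip(batch):
    """Horizontally flip each image in a (N, H, W, C) batch with p=0.5."""
    flip = rng.random(len(batch)) < 0.5
    out = batch.copy()
    out[flip] = out[flip][:, :, ::-1]   # reverse the width axis
    return out

images = rng.random((8, 32, 32, 3))

# Offline: augmented copies are precomputed and stored -- the dataset grows.
offline = np.concatenate([images, random_flip(images)])

# Online: each epoch sees freshly transformed batches -- storage stays
# constant and the model effectively never sees the same batch twice.
for epoch in range(3):
    batch = random_flip(images)
    # model.train_on_batch(batch, labels)  # hypothetical training call
```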

Practical Applications:

  14. Design a CNN architecture for a mobile device under these constraints: model size <10 MB, inference time <100 ms. Explain your design choices.

  15. You have a dataset of 500 images in 10 classes for medical image classification. Outline the complete workflow: data preparation, model selection, training strategy, and evaluation.
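
Question 15 hinges on careful data handling: with only ~50 images per class, every split must preserve the class balance. A minimal NumPy sketch (the `stratified_split` helper is ours; in practice scikit-learn's `train_test_split(..., stratify=labels)` does the same job):

```python
import numpy as np

def stratified_split(labels, val_frac=0.2, test_frac=0.2, seed=42):
    """Split indices class by class so train/val/test keep the balance."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_val = int(len(idx) * val_frac)
        n_test = int(len(idx) * test_frac)
        val.extend(idx[:n_val])
        test.extend(idx[n_val:n_val + n_test])
        train.extend(idx[n_val + n_test:])
    return np.array(train), np.array(val), np.array(test)

labels = np.repeat(np.arange(10), 50)   # 500 images, 10 classes
tr, va, te = stratified_split(labels)
print(len(tr), len(va), len(te))        # -> 300 100 100
```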

6.7.3 Coding Exercises (5 Problems)

💻 Coding Exercises

Exercise 1: Build Custom CNN Architecture

Build and train a CNN for CIFAR-10 with the following requirements:

  • At least 3 convolutional blocks
  • Batch normalization after each Conv layer
  • Dropout for regularization
  • Data augmentation
  • Achieve >70% test accuracy
# Your code here
def create_custom_cnn():
    # TODO: Implement architecture
    pass

# Train and evaluate

Exercise 2: Transfer Learning Comparison

Compare 3 pre-trained models (VGG16, ResNet50, MobileNetV2) on CIFAR-10:

  • Implement transfer learning for each model
  • Train each for 10 epochs
  • Compare: accuracy, training time, model size, inference speed
  • Visualize the results with comparative plots
# Your code here
models_to_compare = ['VGG16', 'ResNet50', 'MobileNetV2']

# TODO: Implement comparison framework

Exercise 3: Data Augmentation Study

Investigate the effect of data augmentation:

  1. Train a CNN without augmentation
  2. Train with standard augmentation (rotation, flip, zoom)
  3. Train with aggressive augmentation
  4. Compare: training curves, test accuracy, overfitting
  5. Visualize augmented samples

# Your code here
augmentation_configs = [
    None,  # No augmentation
    'standard',  # Standard augmentation
    'aggressive'  # Aggressive augmentation
]

# TODO: Implement augmentation study

Exercise 4: Fine-tuning Pipeline

Implement a complete fine-tuning pipeline:

  1. Load a pre-trained model (your choice)
  2. Stage 1: Feature extraction (5 epochs)
  3. Stage 2: Fine-tune the last conv block (10 epochs)
  4. Stage 3: Full fine-tuning (5 epochs)
  5. Track and visualize metrics at each stage
  6. Compare with training from scratch

# Your code here
def fine_tuning_stages(base_model, X_train, y_train, X_val, y_val):
    # TODO: Implement 3-stage fine-tuning
    pass

Exercise 5: CNN Visualization

Implement CNN interpretation techniques:

  1. Visualize learned filters (first conv layer)
  2. Generate feature maps for sample images
  3. Create activation heatmaps (Grad-CAM style)
  4. Analyze: what patterns does the network learn?
  5. Compare feature maps from different layers

# Your code here
def visualize_cnn_internals(model, image):
    # TODO: Implement visualization
    # - Filters
    # - Feature maps
    # - Activation heatmaps
    pass

6.7.4 Further Reading & Resources

Books:

  • “Deep Learning for Computer Vision” - Rajalingappaa Shanmugamani
  • “Computer Vision: Algorithms and Applications” - Richard Szeliski
  • “Hands-On Transfer Learning with Python” - Dipanjan Sarkar

Papers (Must-Read):

  • AlexNet: “ImageNet Classification with Deep CNNs” (Krizhevsky et al., 2012)
  • VGGNet: “Very Deep Convolutional Networks” (Simonyan & Zisserman, 2014)
  • ResNet: “Deep Residual Learning” (He et al., 2015)
  • MobileNet: “MobileNets: Efficient CNNs” (Howard et al., 2017)
  • EfficientNet: “Rethinking Model Scaling” (Tan & Le, 2019)

Online Resources:

  • CS231n: Convolutional Neural Networks (Stanford)
  • Fast.ai Practical Deep Learning for Coders
  • PyImageSearch Blog
  • Papers with Code - Computer Vision

Tools & Libraries:

  • TensorFlow / Keras
  • PyTorch / torchvision
  • Albumentations (data augmentation)
  • OpenCV (image processing)
  • Weights & Biases (experiment tracking)

🎓 Next Chapter Preview

Bab 7: Recurrent Neural Networks & LSTM

After mastering CNNs for spatial data (images), we will next explore:

  • RNN untuk sequential data (time series, text)
  • LSTM & GRU: handling long-term dependencies
  • Bidirectional RNNs
  • Sequence-to-sequence models
  • Applications: time series forecasting, text generation

Sequential data needs an architecture that can “remember” past inputs - let's explore recurrent networks!