Chapter 6: Convolutional Neural Networks & Transfer Learning

Computer Vision and Deep Learning for Image Recognition

🎯 Learning Outcomes

After studying this chapter, you will be able to:

  1. Understand the CNN architecture and its fundamental components (convolutional layers, pooling, filters)
  2. Implement a CNN from scratch using Keras and PyTorch for image classification
  3. Apply transfer learning with pre-trained models (VGG16, ResNet, MobileNet)
  4. Use data augmentation techniques to improve model generalization
  5. Apply fine-tuning strategies to adapt pre-trained models to a specific domain
  6. Evaluate CNN performance and compare architectures for computer vision tasks

6.1 Introduction to Computer Vision and CNNs

6.1.1 Why a Specialized Architecture for Images?

In Chapter 5 we studied the MLP (Multilayer Perceptron), which operates on flattened input. For images, however, this approach has fundamental problems.

Problems with MLPs for Images:

Consider a 224×224×3 (RGB) image:

  • Input neurons: 224 × 224 × 3 = 150,528 neurons
  • One hidden layer of 1000 neurons: 150,528 × 1000 ≈ 150 million parameters
  • And that is just a single layer!

The problems:

  1. Overfitting: far too many parameters for limited data
  2. Computational cost: training becomes very slow
  3. Loss of spatial structure: flattening discards location information
  4. No translation invariance: the same object at a different position looks like a different pattern

💡 CNN Intuition

CNNs address these problems with:

  • Local connectivity: each neuron connects only to a small region of the input
  • Parameter sharing: the same filter is reused across the entire image
  • Hierarchical learning: early layers detect edges, deeper layers detect objects
  • Translation invariance: patterns are detected at any position

Result: a CNN can have over 99% fewer parameters than a fully connected MLP!
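The parameter savings can be checked with a quick back-of-the-envelope calculation. The layer sizes here are illustrative assumptions (one dense layer vs. one conv layer), not a full model:

```python
# Back-of-the-envelope parameter comparison (illustrative layer sizes)

# MLP: flatten a 224x224x3 image, then one dense layer of 1000 neurons
mlp_params = (224 * 224 * 3) * 1000 + 1000  # weights + biases

# CNN: one conv layer with 64 filters of size 3x3 over 3 input channels
cnn_params = (3 * 3 * 3 + 1) * 64  # each filter: 3*3*3 weights + 1 bias

print(f"MLP dense layer : {mlp_params:,} parameters")
print(f"CNN conv layer  : {cnn_params:,} parameters")
print(f"Reduction       : {100 * (1 - cnn_params / mlp_params):.3f}%")
```

The conv layer needs only 1,792 parameters against roughly 150 million for the dense layer, because the same small filters are shared across every spatial position.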

6.1.2 Evolution of Computer Vision

The Pre-Deep-Learning Era (before 2012):

  • Manual feature extraction (SIFT, HOG, SURF)
  • Classical ML (SVM, Random Forest) pada engineered features
  • ImageNet 2011: Error rate ~26%

The Deep Learning Revolution (2012–present):

  • AlexNet (2012): Error 15.3% - first CNN breakthrough
  • VGGNet (2014): Deeper networks (16-19 layers)
  • ResNet (2015): Skip connections, 152 layers, error 3.57%
  • EfficientNet (2019): Optimal scaling, SOTA performance
  • Vision Transformers (2020): Attention mechanisms for vision

📊 Computer Vision Applications Today

Industry Applications:

  • Healthcare: Medical imaging analysis, tumor detection
  • Autonomous Vehicles: Object detection, lane recognition
  • Retail: Visual search, inventory management
  • Security: Face recognition, surveillance systems
  • Manufacturing: Quality control, defect detection
  • Agriculture: Crop disease identification, yield prediction

6.1.3 Image Fundamentals for Deep Learning

Digital Image Representation:

import numpy as np
import matplotlib.pyplot as plt

# Create a simple synthetic image for demonstration
# (randint's upper bound is exclusive, so use 256 to allow pixel value 255)
img_array = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)

# Visualize the RGB channels
fig, axes = plt.subplots(1, 4, figsize=(16, 4))

# Original image
axes[0].imshow(img_array)
axes[0].set_title('Original Image\nShape: (64, 64, 3)', fontweight='bold')
axes[0].axis('off')

# Individual channels
channel_names = ['Red Channel', 'Green Channel', 'Blue Channel']
colors = ['Reds', 'Greens', 'Blues']

for i in range(3):
    axes[i+1].imshow(img_array[:, :, i], cmap=colors[i])
    axes[i+1].set_title(f'{channel_names[i]}\nShape: (64, 64)', fontweight='bold')
    axes[i+1].axis('off')

plt.tight_layout()
plt.show()

# Print image properties
print(f"Image shape: {img_array.shape}")
print(f"Data type: {img_array.dtype}")
print(f"Value range: [{img_array.min()}, {img_array.max()}]")
print(f"Total pixels: {img_array.shape[0] * img_array.shape[1]}")
print(f"Total values: {img_array.size}")

Image Preprocessing for CNNs:

# Preprocessing pipeline
def preprocess_image(image_array):
    """
    Standard preprocessing for CNNs
    """
    # 1. Normalize to [0, 1]
    normalized = image_array.astype('float32') / 255.0

    # 2. Standardize (zero mean, unit variance per channel) - optional
    mean = np.array([0.485, 0.456, 0.406])  # ImageNet channel means
    std = np.array([0.229, 0.224, 0.225])   # ImageNet channel stds
    standardized = (normalized - mean) / std

    return normalized, standardized

# Demonstration
normalized, standardized = preprocess_image(img_array)

print("Original range:", img_array.min(), "-", img_array.max())
print("Normalized range:", normalized.min(), "-", normalized.max())
print("Standardized range:", standardized.min(), "-", standardized.max())

Spatial Relationships in Images:

graph TD
    A[Raw Pixels] --> B[Local Patterns<br/>edges, corners, textures]
    B --> C[Object Parts<br/>eyes, wheels, windows]
    C --> D[Objects<br/>faces, cars, buildings]
    D --> E[Scenes<br/>street, office, nature]

    style A fill:#ff9999
    style B fill:#ffcc99
    style C fill:#ffff99
    style D fill:#99ff99
    style E fill:#99ccff

6.1.4 Challenges in Image Classification

Common Challenges:

  1. Intra-class variation: same object, different appearance
  2. Viewpoint variation: different viewing angles
  3. Illumination changes: different lighting conditions
  4. Occlusion: the object is partially hidden
  5. Background clutter: complex backgrounds
  6. Deformation: changes in shape (pose)

# Demonstrate transformations a CNN must handle
from scipy import ndimage

# Create simple shape
shape = np.zeros((100, 100))
shape[30:70, 30:70] = 1

fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Original
axes[0, 0].imshow(shape, cmap='gray')
axes[0, 0].set_title('Original', fontweight='bold')

# Rotation
rotated = ndimage.rotate(shape, 45, reshape=False)
axes[0, 1].imshow(rotated, cmap='gray')
axes[0, 1].set_title('Rotation (45°)', fontweight='bold')

# Scaling
from scipy.ndimage import zoom
scaled = zoom(shape, 0.7)
padded_scaled = np.zeros_like(shape)
y_offset = (shape.shape[0] - scaled.shape[0]) // 2
x_offset = (shape.shape[1] - scaled.shape[1]) // 2
padded_scaled[y_offset:y_offset+scaled.shape[0],
              x_offset:x_offset+scaled.shape[1]] = scaled
axes[0, 2].imshow(padded_scaled, cmap='gray')
axes[0, 2].set_title('Scaling (0.7x)', fontweight='bold')

# Translation
translated = np.roll(shape, (20, 20), axis=(0, 1))
axes[1, 0].imshow(translated, cmap='gray')
axes[1, 0].set_title('Translation', fontweight='bold')

# Noise
noisy = shape + np.random.normal(0, 0.1, shape.shape)
axes[1, 1].imshow(noisy, cmap='gray')
axes[1, 1].set_title('Gaussian Noise', fontweight='bold')

# Blur
blurred = ndimage.gaussian_filter(shape, sigma=3)
axes[1, 2].imshow(blurred, cmap='gray')
axes[1, 2].set_title('Gaussian Blur', fontweight='bold')

for ax in axes.flat:
    ax.axis('off')

plt.tight_layout()
plt.show()

6.2 CNN Architecture: Building Blocks

6.2.1 Convolutional Layer: Core Operation

The Convolution Concept:

Convolution is a mathematical operation that slides a filter (kernel) over an image to extract features. (Deep learning libraries actually implement cross-correlation, i.e. convolution without flipping the kernel, but the name has stuck.)

Convolution Operation:

import numpy as np
import matplotlib.pyplot as plt

# Simple 5x5 image
image = np.array([
    [1, 2, 3, 2, 1],
    [2, 3, 4, 3, 2],
    [3, 4, 5, 4, 3],
    [2, 3, 4, 3, 2],
    [1, 2, 3, 2, 1]
])

# Edge detection filter (vertical edges)
kernel_vertical = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1]
])

# Edge detection filter (horizontal edges)
kernel_horizontal = np.array([
    [-1, -2, -1],
    [ 0,  0,  0],
    [ 1,  2,  1]
])

# Manual convolution operation
def convolve2d(image, kernel):
    """
    Simple 2D convolution (without padding)
    """
    h, w = image.shape
    kh, kw = kernel.shape
    output_h = h - kh + 1
    output_w = w - kw + 1
    output = np.zeros((output_h, output_w))

    for i in range(output_h):
        for j in range(output_w):
            region = image[i:i+kh, j:j+kw]
            output[i, j] = np.sum(region * kernel)

    return output

# Apply filters
vertical_edges = convolve2d(image, kernel_vertical)
horizontal_edges = convolve2d(image, kernel_horizontal)

# Visualize
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

axes[0, 0].imshow(image, cmap='gray')
axes[0, 0].set_title('Original Image\n(5×5)', fontweight='bold', fontsize=12)

axes[0, 1].imshow(kernel_vertical, cmap='seismic', vmin=-2, vmax=2)
axes[0, 1].set_title('Vertical Edge Filter\n(Sobel-X)', fontweight='bold', fontsize=12)
for i in range(3):
    for j in range(3):
        axes[0, 1].text(j, i, f'{kernel_vertical[i, j]}',
                       ha='center', va='center', fontsize=14, fontweight='bold')

axes[0, 2].imshow(vertical_edges, cmap='gray')
axes[0, 2].set_title('Vertical Edges\n(3×3 output)', fontweight='bold', fontsize=12)

axes[1, 0].imshow(image, cmap='gray')
axes[1, 0].set_title('Original Image\n(5×5)', fontweight='bold', fontsize=12)

axes[1, 1].imshow(kernel_horizontal, cmap='seismic', vmin=-2, vmax=2)
axes[1, 1].set_title('Horizontal Edge Filter\n(Sobel-Y)', fontweight='bold', fontsize=12)
for i in range(3):
    for j in range(3):
        axes[1, 1].text(j, i, f'{kernel_horizontal[i, j]}',
                       ha='center', va='center', fontsize=14, fontweight='bold')

axes[1, 2].imshow(horizontal_edges, cmap='gray')
axes[1, 2].set_title('Horizontal Edges\n(3×3 output)', fontweight='bold', fontsize=12)

for ax in axes.flat:
    ax.axis('off')

plt.tight_layout()
plt.show()

print("\nConvolution Mathematics:")
print("Input shape:", image.shape)
print("Kernel shape:", kernel_vertical.shape)
print("Output shape:", vertical_edges.shape)
print("\nOutput size formula: (H - K + 1) × (W - K + 1)")
print(f"= ({image.shape[0]} - {kernel_vertical.shape[0]} + 1) × ({image.shape[1]} - {kernel_vertical.shape[1]} + 1)")
print(f"= {vertical_edges.shape}")
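The `convolve2d` helper above drops the border, shrinking a 5×5 input to 3×3. A minimal sketch of the "same"-padding variant, which zero-pads so the output keeps the input size (the name `convolve2d_same` is our own, and it assumes an odd-sized square kernel):

```python
import numpy as np

def convolve2d_same(image, kernel):
    """2D cross-correlation with zero padding so output size == input size.
    Assumes an odd-sized square kernel."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # Zero-pad the borders by half the kernel size
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode='constant')
    h, w = image.shape
    output = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            output[i, j] = np.sum(padded[i:i+kh, j:j+kw] * kernel)
    return output

image = np.arange(25).reshape(5, 5).astype(float)
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)  # Sobel-X
out = convolve2d_same(image, kernel)
print(out.shape)  # (5, 5) -- same as the input
```

With padding `P = K // 2` and stride 1, the output-size formula `(I - K + 2P) + 1` reduces to `I`, which is exactly what `padding='same'` does in Keras and PyTorch.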

Convolutional Layer Parameters:

# Visualize convolutional layer parameters
def visualize_conv_params():
    """
    Visualize convolutional layer parameters
    """
    fig, axes = plt.subplots(2, 2, figsize=(14, 12))

    # 1. Filter Size Effect
    ax = axes[0, 0]
    filters = {
        '3×3': np.random.randn(3, 3),
        '5×5': np.random.randn(5, 5),
        '7×7': np.random.randn(7, 7)
    }

    positions = [0, 4, 9]
    for idx, (name, filt) in enumerate(filters.items()):
        ax.imshow(filt, cmap='coolwarm', extent=[positions[idx], positions[idx]+filt.shape[0],
                                                  0, filt.shape[0]])
        ax.text(positions[idx]+filt.shape[0]/2, -1, name,
               ha='center', fontweight='bold', fontsize=11)

    ax.set_xlim(-1, 16)
    ax.set_ylim(-2, 8)
    ax.set_title('Filter Size (Kernel Size)', fontweight='bold', fontsize=13)
    ax.axis('off')

    # 2. Stride Effect
    ax = axes[0, 1]
    input_size = 7
    kernel_size = 3
    strides = [1, 2]

    for s_idx, stride in enumerate(strides):
        y_offset = s_idx * 8

        # Draw input
        for i in range(input_size):
            for j in range(input_size):
                rect = plt.Rectangle((j, y_offset+i), 1, 1,
                                    fill=False, edgecolor='blue', linewidth=1.5)
                ax.add_patch(rect)

        # Draw kernel positions
        output_size = (input_size - kernel_size) // stride + 1
        for i in range(output_size):
            for j in range(output_size):
                y_pos = y_offset + i * stride
                x_pos = j * stride
                rect = plt.Rectangle((x_pos, y_pos), kernel_size, kernel_size,
                                   fill=False, edgecolor='red', linewidth=2.5)
                ax.add_patch(rect)

        ax.text(-1.5, y_offset+3, f'Stride={stride}\nOutput: {output_size}×{output_size}',
               ha='right', fontweight='bold', fontsize=10)

    ax.set_xlim(-3, 8)
    ax.set_ylim(-1, 17)
    ax.set_title('Stride Effect on Output Size', fontweight='bold', fontsize=13)
    ax.axis('off')

    # 3. Padding Types
    ax = axes[1, 0]
    padding_types = {
        'Valid (no padding)': {'input': 5, 'kernel': 3, 'padding': 0},
        'Same (zero padding)': {'input': 5, 'kernel': 3, 'padding': 1}
    }

    for p_idx, (name, params) in enumerate(padding_types.items()):
        y_offset = p_idx * 8
        inp_size = params['input']
        pad = params['padding']

        # Draw padding
        if pad > 0:
            for i in range(inp_size + 2*pad):
                for j in range(inp_size + 2*pad):
                    if i < pad or i >= inp_size+pad or j < pad or j >= inp_size+pad:
                        rect = plt.Rectangle((j, y_offset+i), 1, 1,
                                           fill=True, facecolor='lightgray',
                                           edgecolor='gray', linewidth=1)
                        ax.add_patch(rect)

        # Draw input
        for i in range(inp_size):
            for j in range(inp_size):
                rect = plt.Rectangle((j+pad, y_offset+i+pad), 1, 1,
                                   fill=True, facecolor='lightblue',
                                   edgecolor='blue', linewidth=1.5)
                ax.add_patch(rect)

        ax.text(-1.5, y_offset+3, name, ha='right', fontweight='bold', fontsize=10)

    ax.set_xlim(-3, 8)
    ax.set_ylim(-1, 17)
    ax.set_title('Padding: Valid vs Same', fontweight='bold', fontsize=13)
    ax.axis('off')

    # 4. Number of Filters
    ax = axes[1, 1]
    num_filters = [32, 64, 128]

    for f_idx, nf in enumerate(num_filters):
        y_pos = f_idx * 3

        # Draw input
        rect = plt.Rectangle((0, y_pos), 2, 2, fill=True,
                           facecolor='lightblue', edgecolor='blue', linewidth=2)
        ax.add_patch(rect)

        # Draw filters
        for i in range(min(nf, 10)):
            x_pos = 3 + i * 0.3
            rect = plt.Rectangle((x_pos, y_pos), 0.25, 2,
                               fill=True, facecolor='orange',
                               edgecolor='red', linewidth=1)
            ax.add_patch(rect)

        ax.text(7, y_pos+1, f'{nf} filters', ha='left',
               va='center', fontweight='bold', fontsize=11)

    ax.set_xlim(-1, 11)
    ax.set_ylim(-1, 10)
    ax.set_title('Number of Filters (Feature Maps)', fontweight='bold', fontsize=13)
    ax.axis('off')

    plt.tight_layout()
    plt.show()

visualize_conv_params()

Output Size Calculation:

def calculate_output_size(input_size, kernel_size, stride=1, padding=0):
    """
    Formula: O = (I - K + 2P) / S + 1

    O: Output size
    I: Input size
    K: Kernel size
    P: Padding
    S: Stride
    """
    output = (input_size - kernel_size + 2*padding) // stride + 1
    return output

# Examples
examples = [
    {'input': 224, 'kernel': 3, 'stride': 1, 'padding': 0},
    {'input': 224, 'kernel': 3, 'stride': 1, 'padding': 1},
    {'input': 224, 'kernel': 5, 'stride': 2, 'padding': 2},
    {'input': 56, 'kernel': 3, 'stride': 2, 'padding': 1},
]

print("Convolutional Layer Output Size Calculations:")
print("=" * 80)
print(f"{'Input':<10} {'Kernel':<10} {'Stride':<10} {'Padding':<10} {'Output':<10}")
print("-" * 80)

for ex in examples:
    output = calculate_output_size(ex['input'], ex['kernel'], ex['stride'], ex['padding'])
    print(f"{ex['input']:<10} {ex['kernel']:<10} {ex['stride']:<10} {ex['padding']:<10} {output:<10}")
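Besides the output size, the other number worth computing by hand is a conv layer's parameter count: each filter holds K × K × C_in weights plus one bias, repeated for C_out filters. A quick sketch (the layer configurations are illustrative):

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Parameters of a conv layer: (K*K*C_in + 1) * C_out (the +1 is the bias)."""
    return (kernel_size * kernel_size * in_channels + 1) * out_channels

# Illustrative layers
first_layer = conv_params(3, 3, 32)    # 3x3 filters over RGB, 32 filters
deeper_layer = conv_params(3, 32, 64)  # 3x3 filters over 32 channels, 64 filters

print(f"3x3 conv, 3 -> 32 channels : {first_layer:,} parameters")
print(f"3x3 conv, 32 -> 64 channels: {deeper_layer:,} parameters")
```

These match what `model.summary()` reports for the corresponding `Conv2D` layers (896 and 18,496), and notably the count is independent of the image's spatial size.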

6.2.2 Activation Functions for CNNs

ReLU: The Standard Choice for CNNs

# Visualize activation functions
x = np.linspace(-5, 5, 1000)

activations = {
    'ReLU': lambda x: np.maximum(0, x),
    'Leaky ReLU': lambda x: np.where(x > 0, x, 0.01*x),
    'ELU': lambda x: np.where(x > 0, x, np.exp(x) - 1),
    'Swish': lambda x: x * (1 / (1 + np.exp(-x)))
}

fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten()

for idx, (name, func) in enumerate(activations.items()):
    y = func(x)
    axes[idx].plot(x, y, linewidth=2.5, color='blue')
    axes[idx].axhline(y=0, color='k', linewidth=0.8, linestyle='--', alpha=0.3)
    axes[idx].axvline(x=0, color='k', linewidth=0.8, linestyle='--', alpha=0.3)
    axes[idx].grid(True, alpha=0.3)
    axes[idx].set_title(f'{name} Activation', fontweight='bold', fontsize=13)
    axes[idx].set_xlabel('Input', fontsize=11)
    axes[idx].set_ylabel('Output', fontsize=11)

    # Add formula
    formulas = {
        'ReLU': r'$f(x) = \max(0, x)$',
        'Leaky ReLU': r'$f(x) = \max(0.01x, x)$',
        'ELU': r'$f(x) = x$ if $x>0$ else $e^x-1$',
        'Swish': r'$f(x) = x \cdot \sigma(x)$'
    }
    axes[idx].text(0.05, 0.95, formulas[name], transform=axes[idx].transAxes,
                  fontsize=11, verticalalignment='top',
                  bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

plt.tight_layout()
plt.show()

💡 Why ReLU?

Advantages:

  • Computational efficiency: just max(0, x)
  • Sparse activation: roughly half of the neurons are active
  • No vanishing gradient: the gradient is 1 for x > 0
  • Empirically works well: behind many state-of-the-art results

Disadvantages:

  • Dying ReLU: neurons can become permanently inactive (always output 0)
  • Not zero-centered: can slow convergence

Solution: Leaky ReLU or ELU to address the dying-ReLU problem.
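The dying-ReLU problem can be made concrete with a tiny numeric check: for negative inputs, ReLU's gradient is zero, so a neuron stuck in the negative regime receives no learning signal, whereas Leaky ReLU keeps a small gradient flowing. A minimal NumPy sketch:

```python
import numpy as np

x = np.array([-3.0, -0.5, 0.5, 3.0])

# ReLU gradient: 0 for x < 0 -- negative inputs contribute no learning signal
relu_grad = (x > 0).astype(float)

# Leaky ReLU gradient: 0.01 for x < 0 -- a small signal always flows
leaky_grad = np.where(x > 0, 1.0, 0.01)

print("ReLU gradients      :", relu_grad)   # zero for both negative inputs
print("Leaky ReLU gradients:", leaky_grad)  # 0.01 for negative inputs, 1 otherwise
```

If a neuron's weights drift so that its pre-activation is negative for every training example, ReLU's zero gradient means gradient descent can never move it back; Leaky ReLU's 0.01 slope leaves an escape route.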

6.2.3 Pooling Layers: Downsampling

Pooling Operation:

# Demonstrate pooling operations
feature_map = np.array([
    [1, 3, 2, 4],
    [5, 6, 1, 3],
    [2, 1, 4, 7],
    [3, 2, 5, 8]
])

def max_pooling(input_map, pool_size=2):
    """Max pooling operation"""
    h, w = input_map.shape
    output_h = h // pool_size
    output_w = w // pool_size
    output = np.zeros((output_h, output_w))

    for i in range(output_h):
        for j in range(output_w):
            region = input_map[i*pool_size:(i+1)*pool_size,
                              j*pool_size:(j+1)*pool_size]
            output[i, j] = np.max(region)

    return output

def average_pooling(input_map, pool_size=2):
    """Average pooling operation"""
    h, w = input_map.shape
    output_h = h // pool_size
    output_w = w // pool_size
    output = np.zeros((output_h, output_w))

    for i in range(output_h):
        for j in range(output_w):
            region = input_map[i*pool_size:(i+1)*pool_size,
                              j*pool_size:(j+1)*pool_size]
            output[i, j] = np.mean(region)

    return output

max_pooled = max_pooling(feature_map)
avg_pooled = average_pooling(feature_map)

# Visualize
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Original
im0 = axes[0].imshow(feature_map, cmap='YlOrRd', vmin=0, vmax=8)
axes[0].set_title('Feature Map\n(4×4)', fontweight='bold', fontsize=13)
for i in range(4):
    for j in range(4):
        axes[0].text(j, i, f'{feature_map[i, j]:.0f}',
                    ha='center', va='center', fontsize=14, fontweight='bold')
axes[0].set_xticks([])
axes[0].set_yticks([])

# Max pooling
im1 = axes[1].imshow(max_pooled, cmap='YlOrRd', vmin=0, vmax=8)
axes[1].set_title('Max Pooling\n(2×2 pool)', fontweight='bold', fontsize=13)
for i in range(2):
    for j in range(2):
        axes[1].text(j, i, f'{max_pooled[i, j]:.0f}',
                    ha='center', va='center', fontsize=14, fontweight='bold')
axes[1].set_xticks([])
axes[1].set_yticks([])

# Average pooling
im2 = axes[2].imshow(avg_pooled, cmap='YlOrRd', vmin=0, vmax=8)
axes[2].set_title('Average Pooling\n(2×2 pool)', fontweight='bold', fontsize=13)
for i in range(2):
    for j in range(2):
        axes[2].text(j, i, f'{avg_pooled[i, j]:.1f}',
                    ha='center', va='center', fontsize=14, fontweight='bold')
axes[2].set_xticks([])
axes[2].set_yticks([])

plt.tight_layout()
plt.show()

print("Pooling Reduces Spatial Dimensions:")
print(f"Input: {feature_map.shape} → Max Pool: {max_pooled.shape}")
print(f"Size reduction: {feature_map.size} → {max_pooled.size} values (75% reduction)")

Pooling Benefits:

graph LR
    A[Pooling Layer] --> B[Dimensionality<br/>Reduction]
    A --> C[Translation<br/>Invariance]
    A --> D[Computational<br/>Efficiency]
    A --> E[Overfitting<br/>Prevention]

    style A fill:#99ccff
    style B fill:#ffcc99
    style C fill:#ffcc99
    style D fill:#ffcc99
    style E fill:#ffcc99
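The translation-invariance benefit can be checked numerically: shift a feature map by one pixel, and the 2×2 max-pooled output often does not change at all. A small sketch, redefining the pooling helper so the snippet is self-contained:

```python
import numpy as np

def max_pooling(input_map, pool_size=2):
    """Non-overlapping max pooling (same logic as the helper defined earlier)."""
    h, w = input_map.shape
    out = np.zeros((h // pool_size, w // pool_size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = input_map[i*pool_size:(i+1)*pool_size,
                                  j*pool_size:(j+1)*pool_size].max()
    return out

fmap = np.zeros((8, 8))
fmap[2, 2] = 1.0                    # a single strong activation
shifted = np.roll(fmap, 1, axis=1)  # the same activation, one pixel to the right

# Both peaks fall into the same 2x2 pooling cell, so the outputs are identical
print(np.array_equal(max_pooling(fmap), max_pooling(shifted)))  # True
```

The invariance is only partial: a shift that crosses a pooling-cell boundary does move the activation to a neighboring output cell, which is why stacking several pooling stages yields progressively more tolerance to translation.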

6.2.4 Complete CNN Architecture

Typical CNN Architecture:

flowchart TD
    A["Input\n224x224x3"] --> B["Conv1\n3x3, 32"]
    B --> C["ReLU"]
    C --> D["Pool1\n2x2"]
    D --> E["Conv2\n3x3, 64"]
    E --> F["ReLU"]
    F --> G["Pool2\n2x2"]
    G --> H["Conv3\n3x3, 128"]
    H --> I["ReLU"]
    I --> J["Pool3\n2x2"]
    J --> K["Flatten"]
    K --> L["Dense\n512"]
    L --> M["ReLU"]
    M --> N["Dropout"]
    N --> O["Output\n10 classes"]

    style A fill:#ff9999
    style B fill:#ffcc99
    style E fill:#ffcc99
    style H fill:#ffcc99
    style K fill:#99ff99
    style L fill:#99ccff
    style O fill:#cc99ff

Implementation: Simple CNN

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Build simple CNN
def create_simple_cnn(input_shape=(32, 32, 3), num_classes=10):
    """
    Simple CNN for image classification
    """
    model = keras.Sequential([
        # Block 1
        layers.Conv2D(32, (3, 3), activation='relu', padding='same',
                     input_shape=input_shape, name='conv1'),
        layers.BatchNormalization(name='bn1'),
        layers.MaxPooling2D((2, 2), name='pool1'),

        # Block 2
        layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='conv2'),
        layers.BatchNormalization(name='bn2'),
        layers.MaxPooling2D((2, 2), name='pool2'),

        # Block 3
        layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='conv3'),
        layers.BatchNormalization(name='bn3'),
        layers.MaxPooling2D((2, 2), name='pool3'),

        # Classifier
        layers.Flatten(name='flatten'),
        layers.Dense(512, activation='relu', name='fc1'),
        layers.Dropout(0.5, name='dropout'),
        layers.Dense(num_classes, activation='softmax', name='output')
    ], name='SimpleCNN')

    return model

# Create model
model = create_simple_cnn()

# Display architecture
model.summary()

# Calculate parameters
total_params = model.count_params()
trainable_params = sum([tf.size(w).numpy() for w in model.trainable_weights])

print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")

Visualize Feature Maps:

# Visualize what the CNN learns

# Create dummy input
dummy_input = np.random.rand(1, 32, 32, 3).astype('float32')

# Get intermediate layer outputs
layer_outputs = [layer.output for layer in model.layers[:6]]  # First 6 layers
activation_model = keras.Model(inputs=model.input, outputs=layer_outputs)

# Get activations
activations = activation_model.predict(dummy_input, verbose=0)

# Visualize first convolutional layer
fig, axes = plt.subplots(4, 8, figsize=(16, 8))
axes = axes.flatten()

conv1_activation = activations[0][0]  # First conv layer output

for i in range(min(32, len(axes))):
    if i < conv1_activation.shape[-1]:
        axes[i].imshow(conv1_activation[:, :, i], cmap='viridis')
        axes[i].set_title(f'Filter {i+1}', fontsize=9)
    axes[i].axis('off')

plt.suptitle('Conv1 Feature Maps (32 filters)', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

6.3 Building CNN from Scratch

6.3.1 Dataset: MNIST Handwritten Digits

Load and Explore MNIST:

from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load MNIST
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Preprocess
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0

y_train_cat = to_categorical(y_train, 10)
y_test_cat = to_categorical(y_test, 10)

# Visualize samples
fig, axes = plt.subplots(4, 10, figsize=(15, 6))
for i in range(4):
    for j in range(10):
        idx = i * 10 + j
        axes[i, j].imshow(X_train[idx].squeeze(), cmap='gray')
        axes[i, j].set_title(f'{y_train[idx]}', fontsize=11, fontweight='bold')
        axes[i, j].axis('off')

plt.suptitle('MNIST Dataset Samples', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

print(f"Training set: {X_train.shape}")
print(f"Test set: {X_test.shape}")
print(f"Classes: {np.unique(y_train)}")

6.3.2 Build CNN in Keras

def create_mnist_cnn():
    """
    CNN for MNIST classification
    """
    model = keras.Sequential([
        # Convolutional Block 1
        layers.Conv2D(32, (3, 3), activation='relu', padding='same',
                     input_shape=(28, 28, 1)),
        layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),

        # Convolutional Block 2
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),

        # Classifier
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(10, activation='softmax')
    ], name='MNIST_CNN')

    return model

# Create and compile
mnist_cnn = create_mnist_cnn()
mnist_cnn.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

mnist_cnn.summary()

Train CNN:

# Training callbacks
callbacks = [
    keras.callbacks.EarlyStopping(
        monitor='val_loss',
        patience=5,
        restore_best_weights=True
    ),
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=3
    )
]

# Train model
history = mnist_cnn.fit(
    X_train, y_train_cat,
    batch_size=128,
    epochs=20,
    validation_split=0.15,
    callbacks=callbacks,
    verbose=1
)

# Evaluate
test_loss, test_acc = mnist_cnn.evaluate(X_test, y_test_cat, verbose=0)
print(f"\nTest Accuracy: {test_acc:.4f}")
print(f"Test Loss: {test_loss:.4f}")

Visualize Training:

# Plot training history
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Accuracy
axes[0].plot(history.history['accuracy'], label='Train', linewidth=2.5)
axes[0].plot(history.history['val_accuracy'], label='Validation', linewidth=2.5)
axes[0].set_title('Model Accuracy', fontweight='bold', fontsize=13)
axes[0].set_xlabel('Epoch', fontsize=11)
axes[0].set_ylabel('Accuracy', fontsize=11)
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3)

# Loss
axes[1].plot(history.history['loss'], label='Train', linewidth=2.5)
axes[1].plot(history.history['val_loss'], label='Validation', linewidth=2.5)
axes[1].set_title('Model Loss', fontweight='bold', fontsize=13)
axes[1].set_xlabel('Epoch', fontsize=11)
axes[1].set_ylabel('Loss', fontsize=11)
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

6.3.3 Build CNN in PyTorch

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Define CNN architecture
class MNISTNet(nn.Module):
    def __init__(self):
        super(MNISTNet, self).__init__()

        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(64, 64, kernel_size=3, padding=1)

        # Pooling
        self.pool = nn.MaxPool2d(2, 2)

        # Fully connected layers
        self.fc1 = nn.Linear(64 * 7 * 7, 512)
        self.fc2 = nn.Linear(512, 10)

        # Dropout
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)

    def forward(self, x):
        # Block 1
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = self.dropout1(x)

        # Block 2
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = self.pool(x)
        x = self.dropout1(x)

        # Classifier
        x = x.view(-1, 64 * 7 * 7)
        x = F.relu(self.fc1(x))
        x = self.dropout2(x)
        x = self.fc2(x)

        return x

# Create model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
pytorch_model = MNISTNet().to(device)

print(pytorch_model)
print(f"\nDevice: {device}")

# Count parameters
total_params = sum(p.numel() for p in pytorch_model.parameters())
trainable_params = sum(p.numel() for p in pytorch_model.parameters() if p.requires_grad)
print(f"Total parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")
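The `64 * 7 * 7` input size of `fc1` follows from the spatial downsampling: each non-overlapping 2×2 max pool halves the height and width, so 28 → 14 → 7, with 64 channels after `conv4`. A quick arithmetic sanity check:

```python
def after_pool(size, pool=2):
    """Spatial size after one non-overlapping pooling step."""
    return size // pool

size = 28                # MNIST input height/width
size = after_pool(size)  # after block 1 pooling -> 14
size = after_pool(size)  # after block 2 pooling -> 7

flat_features = 64 * size * size  # 64 channels after conv4
print(flat_features)  # 3136 == 64 * 7 * 7
```

Getting this number wrong is the most common shape error when porting a CNN between frameworks; the `x.view(-1, 64 * 7 * 7)` in `forward` must match it exactly.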

Train PyTorch CNN:

# Prepare data loaders
# PyTorch Conv2d expects channels-first (N, C, H, W), so permute from (N, H, W, C)
X_train_torch = torch.FloatTensor(X_train).permute(0, 3, 1, 2)
y_train_torch = torch.LongTensor(y_train)
X_test_torch = torch.FloatTensor(X_test).permute(0, 3, 1, 2)
y_test_torch = torch.LongTensor(y_test)

train_dataset = TensorDataset(X_train_torch, y_train_torch)
test_dataset = TensorDataset(X_test_torch, y_test_torch)

train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=128, shuffle=False)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(pytorch_model.parameters(), lr=0.001)

# Training function
def train_epoch(model, loader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()

    return running_loss / len(loader), 100. * correct / total

# Evaluation function
def evaluate(model, loader, criterion, device):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)

            running_loss += loss.item()
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += predicted.eq(labels).sum().item()

    return running_loss / len(loader), 100. * correct / total

# Train model
epochs = 10
train_losses, train_accs = [], []
test_losses, test_accs = [], []

print("Training PyTorch CNN...")
for epoch in range(epochs):
    train_loss, train_acc = train_epoch(pytorch_model, train_loader,
                                       criterion, optimizer, device)
    test_loss, test_acc = evaluate(pytorch_model, test_loader,
                                   criterion, device)

    train_losses.append(train_loss)
    train_accs.append(train_acc)
    test_losses.append(test_loss)
    test_accs.append(test_acc)

    print(f"Epoch {epoch+1}/{epochs}: "
          f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}% | "
          f"Test Loss: {test_loss:.4f}, Test Acc: {test_acc:.2f}%")

# Plot results
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].plot(train_accs, label='Train', linewidth=2.5)
axes[0].plot(test_accs, label='Test', linewidth=2.5)
axes[0].set_title('PyTorch CNN Accuracy', fontweight='bold', fontsize=13)
axes[0].set_xlabel('Epoch', fontsize=11)
axes[0].set_ylabel('Accuracy (%)', fontsize=11)
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3)

axes[1].plot(train_losses, label='Train', linewidth=2.5)
axes[1].plot(test_losses, label='Test', linewidth=2.5)
axes[1].set_title('PyTorch CNN Loss', fontweight='bold', fontsize=13)
axes[1].set_xlabel('Epoch', fontsize=11)
axes[1].set_ylabel('Loss', fontsize=11)
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

6.3.4 CIFAR-10: More Complex Dataset

Load CIFAR-10:

from tensorflow.keras.datasets import cifar10

# Load dataset
(X_train_cifar, y_train_cifar), (X_test_cifar, y_test_cifar) = cifar10.load_data()

# Normalize
X_train_cifar = X_train_cifar.astype('float32') / 255.0
X_test_cifar = X_test_cifar.astype('float32') / 255.0

# One-hot encode
y_train_cifar_cat = to_categorical(y_train_cifar, 10)
y_test_cifar_cat = to_categorical(y_test_cifar, 10)

# Class names
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Visualize samples
fig, axes = plt.subplots(4, 10, figsize=(15, 6))
for i in range(4):
    for j in range(10):
        idx = i * 10 + j
        axes[i, j].imshow(X_train_cifar[idx])
        axes[i, j].set_title(f'{class_names[y_train_cifar[idx][0]]}',
                           fontsize=9, fontweight='bold')
        axes[i, j].axis('off')

plt.suptitle('CIFAR-10 Dataset Samples', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

print(f"Training set: {X_train_cifar.shape}")
print(f"Test set: {X_test_cifar.shape}")
print(f"Classes: {class_names}")

CNN for CIFAR-10:

def create_cifar10_cnn():
    """
    Deeper CNN for CIFAR-10 classification
    """
    model = keras.Sequential([
        # Block 1
        layers.Conv2D(64, (3, 3), activation='relu', padding='same',
                     input_shape=(32, 32, 3)),
        layers.BatchNormalization(),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.3),

        # Block 2
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.4),

        # Block 3
        layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.5),

        # Classifier
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(10, activation='softmax')
    ], name='CIFAR10_CNN')

    return model

# Create and compile
cifar_cnn = create_cifar10_cnn()
cifar_cnn.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

cifar_cnn.summary()

6.4 Transfer Learning Fundamentals

6.4.1 The Concept of Transfer Learning

Why Transfer Learning?

Training a CNN from scratch requires:

  • A large dataset (>100K images)
  • Computational resources (GPUs, long training times)
  • Expertise in architecture design and hyperparameter tuning

Transfer Learning Solution:

graph TD
    A[Pre-trained Model<br/>ImageNet: 1M images, 1000 classes] --> B[Feature Extractor<br/>Lower layers: generic features]
    B --> C[Your Dataset<br/>Small: 1K images, custom classes]
    C --> D[Fine-tune<br/>Higher layers + classifier]
    D --> E[Specialized Model<br/>Your domain]

    style A fill:#99ccff
    style B fill:#ffcc99
    style C fill:#ff9999
    style D fill:#99ff99
    style E fill:#cc99ff

💡 Transfer Learning Analogy

Imagine learning a new language:

  • From scratch: learn the alphabet, vocabulary, and grammar from zero (CNN from scratch)
  • Transfer learning: you already know English, so learning French goes much faster (reuse knowledge)

Lower CNN layers = general knowledge (edges, textures)
Higher CNN layers = task-specific knowledge (object parts, semantics)

6.4.2 Transfer Learning Strategies

# Visualize transfer learning strategies
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

strategies = [
    {
        'name': 'Feature Extraction\n(Freeze all)',
        'frozen': [True, True, True, True],
        'trainable': [False, False, False, True]
    },
    {
        'name': 'Fine-tuning\n(Last layers)',
        'frozen': [True, True, False, False],
        'trainable': [False, False, True, True]
    },
    {
        'name': 'Full Fine-tuning\n(All layers)',
        'frozen': [False, False, False, False],
        'trainable': [True, True, True, True]
    }
]

for idx, strategy in enumerate(strategies):
    ax = axes[idx]

    # Draw layers
    layer_names = ['Conv Block 1', 'Conv Block 2', 'Conv Block 3', 'Classifier']
    colors = ['lightcoral' if frozen else 'lightgreen'
              for frozen in strategy['frozen']]

    y_positions = [3, 2, 1, 0]
    for i, (name, color, trainable) in enumerate(zip(layer_names, colors,
                                                     strategy['trainable'])):
        rect = plt.Rectangle((0, y_positions[i]), 4, 0.7,
                            facecolor=color, edgecolor='black', linewidth=2)
        ax.add_patch(rect)

        status = 'Trainable' if trainable else 'Frozen'
        ax.text(2, y_positions[i] + 0.35, f'{name}\n({status})',
               ha='center', va='center', fontsize=10, fontweight='bold')

    ax.set_xlim(-0.5, 4.5)
    ax.set_ylim(-0.5, 4)
    ax.set_title(strategy['name'], fontweight='bold', fontsize=12)
    ax.axis('off')

plt.tight_layout()
plt.show()

6.4.3 Transfer Learning with VGG16

Load Pre-trained VGG16:

from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

# Load pre-trained VGG16 (without top classifier)
base_model = VGG16(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3)
)

print("VGG16 Base Model:")
print(f"Total layers: {len(base_model.layers)}")
print(f"Total parameters: {base_model.count_params():,}")

# Display architecture
base_model.summary()

# Freeze base model
base_model.trainable = False

print(f"\nTrainable parameters after freezing: {sum([tf.size(w).numpy() for w in base_model.trainable_weights]):,}")

Add Custom Classifier:

def build_transfer_model(base_model, num_classes=10):
    """
    Build transfer learning model with custom classifier
    """
    # Create new model
    inputs = keras.Input(shape=(224, 224, 3))

    # Preprocessing for VGG16
    x = preprocess_input(inputs)

    # Base model (frozen)
    x = base_model(x, training=False)

    # Custom classifier
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(512, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(256, activation='relu')(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    # Complete model
    model = keras.Model(inputs, outputs, name='VGG16_Transfer')

    return model

# Create transfer learning model
transfer_model = build_transfer_model(base_model, num_classes=10)

# Compile
transfer_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.0001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

transfer_model.summary()

6.4.4 Transfer Learning with ResNet50

Load ResNet50:

from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input as resnet_preprocess

# Load ResNet50
resnet_base = ResNet50(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3)
)

# Freeze base
resnet_base.trainable = False

# Build transfer model
def build_resnet_transfer(base_model, num_classes=10):
    """
    ResNet50 transfer learning model
    """
    inputs = keras.Input(shape=(224, 224, 3))
    x = resnet_preprocess(inputs)
    x = base_model(x, training=False)

    # Global pooling + classifier
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dense(512, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(256, activation='relu')(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    return keras.Model(inputs, outputs, name='ResNet50_Transfer')

resnet_model = build_resnet_transfer(resnet_base)
resnet_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print(f"ResNet50 Transfer Model - Total params: {resnet_model.count_params():,}")

6.4.5 Transfer Learning with MobileNetV2

MobileNetV2 for Edge Devices:

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input as mobile_preprocess

# Load MobileNetV2
mobile_base = MobileNetV2(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3),
    alpha=1.0  # Width multiplier
)

mobile_base.trainable = False

# Build efficient model
def build_mobile_transfer(base_model, num_classes=10):
    """
    MobileNetV2 for efficient inference
    """
    inputs = keras.Input(shape=(224, 224, 3))
    x = mobile_preprocess(inputs)
    x = base_model(x, training=False)

    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    return keras.Model(inputs, outputs, name='MobileNetV2_Transfer')

mobile_model = build_mobile_transfer(mobile_base)
mobile_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Compare model sizes
models_comparison = {
    'Model': ['VGG16', 'ResNet50', 'MobileNetV2'],
    'Total Params': [
        transfer_model.count_params(),
        resnet_model.count_params(),
        mobile_model.count_params()
    ],
    'Trainable Params': [
        sum([tf.size(w).numpy() for w in transfer_model.trainable_weights]),
        sum([tf.size(w).numpy() for w in resnet_model.trainable_weights]),
        sum([tf.size(w).numpy() for w in mobile_model.trainable_weights])
    ]
}

df_comparison = pd.DataFrame(models_comparison)
df_comparison['Total Params'] = df_comparison['Total Params'].apply(lambda x: f"{x:,}")
df_comparison['Trainable Params'] = df_comparison['Trainable Params'].apply(lambda x: f"{x:,}")
print("\nModel Size Comparison:")
print(df_comparison.to_string(index=False))

6.5 Data Augmentation

6.5.1 Why Data Augmentation?

Problem: Limited Training Data

Deep learning needs large amounts of data, but:

  • Labeling is expensive and time-consuming
  • Real-world constraints (privacy, accessibility)
  • Class imbalance

Solution: Data Augmentation

Generate synthetic training examples using transformations that preserve semantic meaning.

💡 Data Augmentation Benefits
  1. Increase dataset size: 10× - 100× more training samples
  2. Reduce overfitting: Model learns robust features
  3. Improve generalization: Better performance on unseen data
  4. Handle variations: Rotation, scaling, lighting, etc.
  5. Class balancing: Oversample minority classes

6.5.2 Common Augmentation Techniques

Geometric Transformations:

from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

# Load sample image
sample_img = X_train_cifar[0:1]  # Shape: (1, 32, 32, 3)

# Define augmentation
augmentation_configs = [
    {'rotation_range': 30, 'name': 'Rotation (±30°)'},
    {'width_shift_range': 0.2, 'height_shift_range': 0.2, 'name': 'Shift (20%)'},
    {'zoom_range': 0.3, 'name': 'Zoom (±30%)'},
    {'horizontal_flip': True, 'name': 'Horizontal Flip'},
    {'brightness_range': [0.5, 1.5], 'name': 'Brightness'},
]

fig, axes = plt.subplots(1, 6, figsize=(16, 3))

# Original image in the first panel
axes[0].imshow(sample_img[0])
axes[0].set_title('Original', fontweight='bold', fontsize=11)
axes[0].axis('off')

# One panel per augmentation
for idx, config in enumerate(augmentation_configs):
    name = config.pop('name')
    datagen = ImageDataGenerator(**config)

    # Generate one augmented image
    aug_iter = datagen.flow(sample_img, batch_size=1)
    aug_img = next(aug_iter)[0]

    axes[idx + 1].imshow(np.clip(aug_img, 0, 1))
    axes[idx + 1].set_title(name, fontweight='bold', fontsize=10)
    axes[idx + 1].axis('off')

plt.suptitle('Data Augmentation Techniques', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

Advanced Augmentations:

# Combined augmentation pipeline
def create_augmentation_pipeline():
    """
    Comprehensive augmentation for training
    """
    datagen = ImageDataGenerator(
        rotation_range=20,
        width_shift_range=0.2,
        height_shift_range=0.2,
        horizontal_flip=True,
        zoom_range=0.2,
        shear_range=0.15,
        brightness_range=[0.8, 1.2],
        fill_mode='nearest'
    )
    return datagen

# Generate multiple augmented samples
augmentation_gen = create_augmentation_pipeline()

fig, axes = plt.subplots(4, 8, figsize=(16, 8))
axes = axes.flatten()

# Generate 32 augmented versions
aug_iter = augmentation_gen.flow(sample_img, batch_size=1)

for i in range(32):
    aug_img = next(aug_iter)[0]
    axes[i].imshow(aug_img)
    axes[i].axis('off')

plt.suptitle('32 Augmented Versions of Same Image', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

6.5.3 Augmentation during Training

Keras with ImageDataGenerator:

# Training with augmentation
train_datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    zoom_range=0.1
)

# fit() is only required for featurewise statistics (e.g. featurewise_center, ZCA);
# it is harmless with the configuration above
train_datagen.fit(X_train_cifar)

# Validation data (no augmentation)
val_datagen = ImageDataGenerator()

# Training with data augmentation
print("Training with Data Augmentation:")
print("Each epoch, the model sees slightly different versions of the data")
print("Effect: regularization, reduced overfitting, better generalization")
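The prints above describe the effect, but the generator still has to be wired into `fit()`. A minimal, self-contained sketch of that wiring (the random arrays and toy model below are stand-ins; with the chapter's data you would pass `X_train_cifar`, `y_train_cifar_cat`, and `cifar_cnn` instead):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Tiny stand-in arrays so the sketch runs on its own
rng = np.random.default_rng(42)
X_small = rng.random((32, 32, 32, 3)).astype('float32')
y_small = keras.utils.to_categorical(rng.integers(0, 10, size=32), 10)

train_datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    zoom_range=0.1,
)

# Deliberately small model: the point here is the flow() wiring, not accuracy
model = keras.Sequential([
    layers.Conv2D(8, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# fit() pulls freshly augmented batches from the generator every epoch
history = model.fit(
    train_datagen.flow(X_small, y_small, batch_size=16),
    epochs=1,
    verbose=0,
)
print(sorted(history.history.keys()))
```

Because augmentation happens on the fly, the model never sees exactly the same batch twice.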

6.5.4 Modern Augmentation Libraries

Albumentations: Advanced Augmentation:

# Note: Install with: pip install albumentations
try:
    import albumentations as A

    # Define transformation pipeline
    transform = A.Compose([
        A.RandomRotate90(p=0.5),
        A.Flip(p=0.5),
        A.OneOf([
            A.GaussNoise(p=1),
            A.GaussianBlur(p=1),
            A.MotionBlur(p=1),
        ], p=0.3),
        A.OneOf([
            A.OpticalDistortion(p=1),
            A.GridDistortion(p=1),
        ], p=0.3),
        A.OneOf([
            A.HueSaturationValue(p=1),
            A.RGBShift(p=1),
            A.RandomBrightnessContrast(p=1),
        ], p=0.5),
    ])

    # Apply to sample
    sample_np = (X_train_cifar[0] * 255).astype(np.uint8)

    fig, axes = plt.subplots(2, 4, figsize=(14, 7))
    axes = axes.flatten()

    for i in range(8):
        augmented = transform(image=sample_np)['image']
        axes[i].imshow(augmented)
        axes[i].set_title(f'Augmented {i+1}', fontweight='bold', fontsize=11)
        axes[i].axis('off')

    plt.suptitle('Albumentations: Advanced Augmentations', fontweight='bold', fontsize=14)
    plt.tight_layout()
    plt.show()

    print("Albumentations offers:")
    print("- Fast C++ backend")
    print("- 70+ transformation types")
    print("- Bounding box & keypoint support")
    print("- Easy pipeline composition")

except ImportError:
    print("Albumentations not installed. Install with: pip install albumentations")

6.5.5 Augmentation Best Practices

Guidelines:

flowchart TD
    A["Data Augmentation\nBest Practices"] --> B["Domain-Appropriate\nTransformations"]
    A --> C["Reasonable\nParameter Ranges"]
    A --> D["No Semantic\nChanges"]
    A --> E["Balance with\nReal Data"]

    B --> B1["Medical: No flips for\nlaterality-dependent organs"]
    B --> B2["Text/OCR: No rotation\nthat breaks readability"]

    C --> C1["Rotation: ±15-30°\nnot ±180°"]
    C --> C2["Brightness: 0.7-1.3\nnot 0.1-5.0"]

    D --> D1["Cat to flipped cat ✓"]
    D --> D2["Cat to unrecognizable ✗"]

    E --> E1["Augmentation complements\nnot replaces real data"]

    style A fill:#99ccff
    style B fill:#ffcc99
    style C fill:#ffcc99
    style D fill:#ffcc99
    style E fill:#ffcc99

Data Augmentation Best Practices
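The guideline chart can also be condensed into a quick sanity check. The thresholds below simply mirror the chart's suggestions (rotation ±30°, brightness 0.7-1.3, modest shifts and zooms); the function itself is our illustrative helper, not part of Keras:

```python
def check_augmentation_config(config):
    """Flag ImageDataGenerator-style parameters outside the suggested ranges."""
    warnings = []
    if config.get('rotation_range', 0) > 30:
        warnings.append('rotation_range > 30 deg may destroy orientation cues')
    lo, hi = config.get('brightness_range', (1.0, 1.0))
    if lo < 0.7 or hi > 1.3:
        warnings.append('brightness_range outside 0.7-1.3 risks unrecognizable images')
    for key in ('width_shift_range', 'height_shift_range', 'zoom_range'):
        if config.get(key, 0) > 0.3:
            warnings.append(f'{key} > 0.3 may push the subject out of frame')
    return warnings

print(check_augmentation_config({'rotation_range': 15, 'zoom_range': 0.2}))   # []
print(check_augmentation_config({'rotation_range': 180, 'zoom_range': 0.8}))  # two warnings
```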

Common Mistakes:

# Demonstrate bad augmentations
sample_img_bad = X_train_cifar[100:101]

fig, axes = plt.subplots(2, 4, figsize=(14, 7))

# Good augmentations
good_configs = [
    {'rotation_range': 15, 'name': 'Good: Rotation ±15°'},
    {'zoom_range': 0.2, 'name': 'Good: Zoom ±20%'},
    {'brightness_range': [0.8, 1.2], 'name': 'Good: Brightness 0.8-1.2'},
    {'horizontal_flip': True, 'name': 'Good: Horizontal Flip'},
]

for idx, config in enumerate(good_configs):
    name = config.pop('name')
    datagen = ImageDataGenerator(**config)
    aug_iter = datagen.flow(sample_img_bad, batch_size=1)
    aug_img = next(aug_iter)[0]

    axes[0, idx].imshow(np.clip(aug_img, 0, 1))
    axes[0, idx].set_title(name, fontweight='bold', fontsize=10, color='green')
    axes[0, idx].axis('off')

# Bad augmentations
bad_configs = [
    {'rotation_range': 180, 'name': 'Bad: Rotation ±180°'},
    {'zoom_range': 0.8, 'name': 'Bad: Zoom ±80%'},
    {'brightness_range': [0.2, 3.0], 'name': 'Bad: Brightness 0.2-3.0'},
    {'width_shift_range': 0.8, 'name': 'Bad: Shift 80%'},
]

for idx, config in enumerate(bad_configs):
    name = config.pop('name')
    datagen = ImageDataGenerator(**config)
    aug_iter = datagen.flow(sample_img_bad, batch_size=1)
    aug_img = next(aug_iter)[0]

    axes[1, idx].imshow(np.clip(aug_img, 0, 1))
    axes[1, idx].set_title(name, fontweight='bold', fontsize=10, color='red')
    axes[1, idx].axis('off')

plt.suptitle('Good vs Bad Augmentation Parameters', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

6.6 Fine-tuning Strategies

6.6.1 When Should You Fine-tune?

Decision Framework:

flowchart TD
    A["Start: Transfer Learning"] --> B{"Dataset Size?"}
    B -->|"Small\n<1K images"| C["Feature Extraction\nFreeze all base layers"]
    B -->|"Medium\n1K-10K images"| D["Partial Fine-tuning\nUnfreeze last few layers"]
    B -->|"Large\n>10K images"| E["Full Fine-tuning\nUnfreeze all layers"]

    C --> F{"Domain Similarity?"}
    D --> F
    E --> F

    F -->|"Similar\nto ImageNet"| G["Lower learning rate\n0.0001 - 0.00001"]
    F -->|"Different\nfrom ImageNet"| H["Higher learning rate\n0.001 - 0.0001"]

    style A fill:#99ccff
    style C fill:#ffcc99
    style D fill:#ffcc99
    style E fill:#ffcc99
    style G fill:#99ff99
    style H fill:#99ff99

Transfer Learning Decision Framework: Choosing a Fine-tuning Strategy Based on Dataset Size and Domain Similarity
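The flowchart above can be encoded as a small helper for quick reference. Thresholds and learning-rate ranges are taken from the chart; the function name and return format are illustrative:

```python
def choose_strategy(n_images, similar_to_imagenet):
    """Suggest a transfer learning strategy and learning-rate range."""
    if n_images < 1_000:
        strategy = 'feature extraction (freeze all base layers)'
    elif n_images <= 10_000:
        strategy = 'partial fine-tuning (unfreeze last few layers)'
    else:
        strategy = 'full fine-tuning (unfreeze all layers)'
    # Similar domains tolerate only small updates to the pre-trained weights
    lr_range = (1e-4, 1e-5) if similar_to_imagenet else (1e-3, 1e-4)
    return strategy, lr_range

print(choose_strategy(500, similar_to_imagenet=True))
print(choose_strategy(50_000, similar_to_imagenet=False))
```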

6.6.2 Gradual Unfreezing

Strategy: Unfreeze Layers Progressively

# Demonstrate gradual unfreezing
def show_trainable_status(model, title):
    """
    Display which layers are trainable
    """
    print(f"\n{title}")
    print("=" * 60)
    print(f"{'Layer Name':<30} {'Trainable':<15} {'Parameters':<15}")
    print("-" * 60)

    total_params = 0
    trainable_params = 0

    for layer in model.layers:
        params = layer.count_params()
        total_params += params
        if layer.trainable:
            trainable_params += params

        print(f"{layer.name:<30} {str(layer.trainable):<15} {params:>10,}")

    print("-" * 60)
    print(f"Total parameters: {total_params:,}")
    print(f"Trainable parameters: {trainable_params:,} ({100*trainable_params/total_params:.1f}%)")
    print(f"Non-trainable parameters: {total_params-trainable_params:,}")

# Stage 1: All frozen
base_model_stage1 = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model_stage1.trainable = False

model_stage1 = build_transfer_model(base_model_stage1)
show_trainable_status(model_stage1, "STAGE 1: Feature Extraction (All Frozen)")

# Stage 2: Unfreeze last conv block
base_model_stage2 = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model_stage2.trainable = True

# Freeze all except last block
for layer in base_model_stage2.layers[:-4]:
    layer.trainable = False

model_stage2 = build_transfer_model(base_model_stage2)
show_trainable_status(model_stage2, "STAGE 2: Partial Fine-tuning (Last Block)")

# Stage 3: Unfreeze all
base_model_stage3 = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model_stage3.trainable = True

model_stage3 = build_transfer_model(base_model_stage3)
show_trainable_status(model_stage3, "STAGE 3: Full Fine-tuning (All Unfrozen)")

6.6.3 Learning Rate Strategies

Discriminative Learning Rates:

# Different learning rates for different layers
def compile_with_discriminative_lr(model, base_lr=0.001):
    """
    Apply different learning rates to different layer groups
    """
    # Separate weight groups: pre-trained base vs newly added head
    base_weights = []
    head_weights = []

    for layer in model.layers:
        # The nested base model appears as a single layer (e.g. 'vgg16')
        if 'vgg16' in layer.name or 'resnet' in layer.name:
            base_weights.extend(layer.trainable_weights)
        else:
            head_weights.extend(layer.trainable_weights)

    # Create optimizer with discriminative learning rates
    optimizer = keras.optimizers.Adam(learning_rate=base_lr)

    # Note: Keras doesn't support per-layer LR directly
    # In practice, use multiple optimizers or manual gradient application

    return optimizer

# Learning rate schedule
def create_lr_schedule():
    """
    Learning rate schedule for fine-tuning
    """
    def schedule(epoch, lr):
        if epoch < 5:
            return 0.001  # Warm-up
        elif epoch < 15:
            return 0.0001  # Fine-tuning
        else:
            return 0.00001  # Final fine-tuning

    return keras.callbacks.LearningRateScheduler(schedule)

# Visualize learning rate schedule
epochs = 25
lrs = []

for epoch in range(epochs):
    if epoch < 5:
        lrs.append(0.001)
    elif epoch < 15:
        lrs.append(0.0001)
    else:
        lrs.append(0.00001)

plt.figure(figsize=(10, 5))
plt.plot(range(epochs), lrs, linewidth=3, marker='o', markersize=8)
plt.title('Learning Rate Schedule for Fine-tuning', fontweight='bold', fontsize=14)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Learning Rate', fontsize=12)
plt.yscale('log')
plt.grid(True, alpha=0.3)

# Add annotations
plt.axvspan(0, 5, alpha=0.2, color='red', label='Stage 1: Warm-up')
plt.axvspan(5, 15, alpha=0.2, color='orange', label='Stage 2: Fine-tuning')
plt.axvspan(15, 25, alpha=0.2, color='green', label='Stage 3: Final fine-tuning')
plt.legend(fontsize=11)

plt.tight_layout()
plt.show()
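Keras does not expose per-layer learning rates directly, but the chapter's PyTorch stack does, via optimizer parameter groups. A minimal sketch of discriminative learning rates (the toy model and the two rates are illustrative):

```python
import torch.nn as nn
import torch.optim as optim

# Toy model standing in for "pre-trained base + new classifier head"
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),   # pretend this is the pre-trained base
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),                # freshly initialized head
)

# One optimizer, two learning rates: tiny for the base, larger for the head
optimizer = optim.Adam([
    {'params': model[0].parameters(), 'lr': 1e-5},
    {'params': model[3].parameters(), 'lr': 1e-3},
])

for group in optimizer.param_groups:
    print(group['lr'], sum(p.numel() for p in group['params']))
```

Each group keeps its own learning rate throughout training, which is exactly the discriminative-rate idea described above.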

6.6.4 Complete Fine-tuning Pipeline

End-to-End Example:

# Complete fine-tuning workflow
def fine_tuning_pipeline(X_train, y_train, X_val, y_val, num_classes=10):
    """
    Complete fine-tuning pipeline with best practices
    """
    # Stage 1: Feature extraction
    print("\n" + "="*70)
    print("STAGE 1: FEATURE EXTRACTION (5 epochs)")
    print("="*70)

    base_model = VGG16(weights='imagenet', include_top=False,
                       input_shape=(224, 224, 3))
    base_model.trainable = False

    model = build_transfer_model(base_model, num_classes)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.001),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    # Train classifier only
    history_stage1 = model.fit(
        X_train, y_train,
        batch_size=32,
        epochs=5,
        validation_data=(X_val, y_val),
        verbose=1
    )

    # Stage 2: Fine-tuning
    print("\n" + "="*70)
    print("STAGE 2: FINE-TUNING LAST LAYERS (10 epochs)")
    print("="*70)

    # Unfreeze last conv block
    base_model.trainable = True
    for layer in base_model.layers[:-4]:
        layer.trainable = False

    # Recompile with lower learning rate
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.0001),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    # Continue training
    history_stage2 = model.fit(
        X_train, y_train,
        batch_size=32,
        epochs=10,
        validation_data=(X_val, y_val),
        verbose=1
    )

    return model, history_stage1, history_stage2

# Note: This would be run with actual data
print("Fine-tuning pipeline defined.")
print("\nBest Practices Applied:")
print("1. Two-stage training: feature extraction → fine-tuning")
print("2. Gradual unfreezing: top layers first")
print("3. Learning rate reduction: 0.001 → 0.0001")
print("4. Batch size consideration: smaller for fine-tuning")

6.6.5 Monitoring Fine-tuning

Key Metrics to Track:

# Simulate training history for visualization
np.random.seed(42)

epochs = 20
stage1_epochs = 5

# Stage 1: Feature extraction
train_acc_s1 = np.linspace(0.50, 0.75, stage1_epochs) + np.random.normal(0, 0.02, stage1_epochs)
val_acc_s1 = np.linspace(0.48, 0.72, stage1_epochs) + np.random.normal(0, 0.02, stage1_epochs)
train_loss_s1 = np.linspace(1.2, 0.7, stage1_epochs) + np.random.normal(0, 0.05, stage1_epochs)
val_loss_s1 = np.linspace(1.3, 0.75, stage1_epochs) + np.random.normal(0, 0.05, stage1_epochs)

# Stage 2: Fine-tuning
train_acc_s2 = np.linspace(0.75, 0.92, epochs-stage1_epochs) + np.random.normal(0, 0.01, epochs-stage1_epochs)
val_acc_s2 = np.linspace(0.72, 0.88, epochs-stage1_epochs) + np.random.normal(0, 0.015, epochs-stage1_epochs)
train_loss_s2 = np.linspace(0.7, 0.25, epochs-stage1_epochs) + np.random.normal(0, 0.03, epochs-stage1_epochs)
val_loss_s2 = np.linspace(0.75, 0.35, epochs-stage1_epochs) + np.random.normal(0, 0.04, epochs-stage1_epochs)

# Combine
train_acc = np.concatenate([train_acc_s1, train_acc_s2])
val_acc = np.concatenate([val_acc_s1, val_acc_s2])
train_loss = np.concatenate([train_loss_s1, train_loss_s2])
val_loss = np.concatenate([val_loss_s1, val_loss_s2])

# Visualize
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Accuracy
axes[0, 0].plot(train_acc, label='Train', linewidth=2.5, color='blue')
axes[0, 0].plot(val_acc, label='Validation', linewidth=2.5, color='orange')
axes[0, 0].axvline(x=stage1_epochs-0.5, color='red', linestyle='--',
                   linewidth=2, label='Fine-tuning starts')
axes[0, 0].set_title('Accuracy over Training', fontweight='bold', fontsize=13)
axes[0, 0].set_xlabel('Epoch', fontsize=11)
axes[0, 0].set_ylabel('Accuracy', fontsize=11)
axes[0, 0].legend(fontsize=11)
axes[0, 0].grid(True, alpha=0.3)
axes[0, 0].text(2, 0.85, 'Stage 1:\nFeature\nExtraction',
               fontsize=10, bbox=dict(boxstyle='round', facecolor='wheat'))
axes[0, 0].text(12, 0.85, 'Stage 2:\nFine-tuning',
               fontsize=10, bbox=dict(boxstyle='round', facecolor='lightgreen'))

# Loss
axes[0, 1].plot(train_loss, label='Train', linewidth=2.5, color='blue')
axes[0, 1].plot(val_loss, label='Validation', linewidth=2.5, color='orange')
axes[0, 1].axvline(x=stage1_epochs-0.5, color='red', linestyle='--',
                   linewidth=2, label='Fine-tuning starts')
axes[0, 1].set_title('Loss over Training', fontweight='bold', fontsize=13)
axes[0, 1].set_xlabel('Epoch', fontsize=11)
axes[0, 1].set_ylabel('Loss', fontsize=11)
axes[0, 1].legend(fontsize=11)
axes[0, 1].grid(True, alpha=0.3)

# Train-Val Gap (Overfitting indicator)
acc_gap = train_acc - val_acc
axes[1, 0].plot(acc_gap, linewidth=2.5, color='red')
axes[1, 0].axhline(y=0.05, color='green', linestyle='--',
                  linewidth=2, label='Acceptable gap')
axes[1, 0].axvline(x=stage1_epochs-0.5, color='red', linestyle='--', linewidth=2)
axes[1, 0].set_title('Train-Val Accuracy Gap (Overfitting)', fontweight='bold', fontsize=13)
axes[1, 0].set_xlabel('Epoch', fontsize=11)
axes[1, 0].set_ylabel('Accuracy Gap', fontsize=11)
axes[1, 0].legend(fontsize=11)
axes[1, 0].grid(True, alpha=0.3)

# Learning Rate Schedule
lr_schedule = np.concatenate([np.full(stage1_epochs, 0.001),
                             np.full(epochs-stage1_epochs, 0.0001)])
axes[1, 1].plot(lr_schedule, linewidth=3, marker='o', markersize=8, color='purple')
axes[1, 1].set_title('Learning Rate Schedule', fontweight='bold', fontsize=13)
axes[1, 1].set_xlabel('Epoch', fontsize=11)
axes[1, 1].set_ylabel('Learning Rate', fontsize=11)
axes[1, 1].set_yscale('log')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nFine-tuning Monitoring Checklist:")
print("✓ Training accuracy improving steadily")
print("✓ Validation accuracy following training (not diverging)")
print("✓ Train-val gap reasonable (<5-10%)")
print("✓ Loss decreasing consistently")
print("✓ Learning rate reduced at fine-tuning stage")
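Much of this checklist can be automated with standard Keras callbacks; the patience values below are illustrative:

```python
from tensorflow import keras

callbacks = [
    # Stop when validation accuracy stops improving and keep the best weights
    keras.callbacks.EarlyStopping(
        monitor='val_accuracy', patience=5, restore_best_weights=True),
    # Cut the learning rate when validation loss plateaus
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss', factor=0.1, patience=3, min_lr=1e-6),
]

# These would be passed to fit() in both fine-tuning stages, e.g.:
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=20, callbacks=callbacks)
print([type(cb).__name__ for cb in callbacks])
```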

6.7 Review & Exercises

6.7.1 Chapter Summary

Key Concepts:

  1. CNN Architecture
    • Convolutional layers: local connectivity, parameter sharing
    • Pooling layers: downsampling, translation invariance
    • Hierarchical feature learning: edges → parts → objects
  2. Building CNNs
    • MNIST: simple grayscale images
    • CIFAR-10: color images, more complex
    • Architecture design: depth, width, skip connections
  3. Transfer Learning
    • Pre-trained models: VGG, ResNet, MobileNet
    • Feature extraction: freeze base, train classifier
    • Domain adaptation: ImageNet → custom domain
  4. Data Augmentation
    • Geometric: rotation, shift, flip, zoom
    • Color: brightness, contrast, saturation
    • Best practices: domain-appropriate, reasonable ranges
  5. Fine-tuning
    • Gradual unfreezing: top layers first
    • Discriminative learning rates: lower for base
    • Two-stage training: feature extraction → fine-tuning

6.7.2 Review Questions (15 Questions)

📝 Review Questions

Conceptual Understanding:

  1. Explain why CNNs are more effective for image data than fully-connected MLPs. Discuss local connectivity, parameter sharing, and translation invariance.

  2. What is the difference between max pooling and average pooling? When would you choose one over the other?

  3. In a CNN architecture, why does the number of filters usually increase (32 → 64 → 128) while the spatial dimensions shrink (224 → 112 → 56)?

  4. Explain the concept of “hierarchical feature learning” in CNNs. Give examples of the features learned at different layers.

  5. What is a “receptive field” in a CNN? How does network depth affect the receptive field?
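
As a concrete aid for reasoning about receptive fields and parameter counts, the standard bookkeeping can be written in a few lines of pure Python (both helper names are ours): the receptive field grows by `(k - 1) * jump` at each layer, and the `jump` multiplies by each layer's stride.

```python
def conv_params(k, c_in, c_out):
    """Parameter count of a k x k conv layer (weights + biases)."""
    return (k * k * c_in + 1) * c_out

def receptive_field(layers):
    """Receptive field of a stack of (kernel_size, stride) layers."""
    r, jump = 1, 1
    for k, s in layers:
        r += (k - 1) * jump   # each layer widens the view by (k-1)*jump
        jump *= s             # strides compound the step size
    return r

# Two stacked 3x3 convs see the same 5x5 region as one 5x5 conv,
# but with fewer parameters (36,928 * 2 vs 102,464 at 64 channels):
print(receptive_field([(3, 1), (3, 1)]))          # -> 5
print(conv_params(3, 64, 64) * 2, conv_params(5, 64, 64))

# A 2x2 stride-2 pool in between makes later convs see much more:
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # -> 8
```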

Transfer Learning:

  6. When should you use transfer learning rather than training a CNN from scratch? Discuss the trade-offs in terms of dataset size, computational resources, and domain similarity.

  7. Explain the difference between “feature extraction” and “fine-tuning” in transfer learning. When is each approach appropriate?

  8. Why is the learning rate for fine-tuning typically smaller (0.0001) than for training from scratch (0.001)?

  9. What is “catastrophic forgetting” in fine-tuning? How can it be prevented?

  10. Compare VGG16, ResNet50, and MobileNetV2 in terms of architecture, parameter count, and use cases.
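
One common answer to the learning-rate and catastrophic-forgetting questions is discriminative learning rates: earlier (more generic) layers get smaller updates than the task-specific head. A framework-agnostic sketch, with our own function name; the ÷2.6-per-group factor is the heuristic popularized by ULMFiT/fast.ai, not a universal constant.

```python
def discriminative_lrs(n_groups, head_lr=1e-4, decay=2.6):
    """Exponentially smaller learning rates for earlier layer groups:
    the classifier head trains fastest, the earliest layers slowest,
    which limits how far pre-trained weights can drift."""
    return [head_lr / decay ** (n_groups - 1 - i) for i in range(n_groups)]

lrs = discriminative_lrs(4)
# lrs[0] (earliest group) is the smallest; lrs[-1] (head) equals head_lr
```

In Keras this idea is usually approximated by freezing whole groups of layers, or by an optimizer with per-layer learning-rate support.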

Data Augmentation:

  11. Explain why data augmentation is effective at reducing overfitting. What are its limitations?

  12. Give examples of transformations that are NOT appropriate for: (a) medical X-ray images, (b) document OCR, (c) face recognition.

  13. What is the difference between “online” (on-the-fly) and “offline” (pre-computed) augmentation? What are the advantages of each?
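
The online/offline distinction is easy to see in code. A minimal NumPy sketch (the `random_flip` helper is ours, using random horizontal flips as the only transform): offline augmentation stores the transformed copies, while online augmentation re-randomizes every epoch at no storage cost.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_flip(batch):
    """Horizontally flip each image in a (N, H, W, C) batch with p=0.5."""
    flip = rng.random(len(batch)) < 0.5
    out = batch.copy()
    out[flip] = out[flip][:, :, ::-1]   # reverse the width axis
    return out

images = rng.random((8, 32, 32, 3))

# Offline: augmented copies are precomputed and stored -- the dataset grows.
offline = np.concatenate([images, random_flip(images)])

# Online: each epoch sees freshly transformed batches -- storage stays
# constant and the model effectively never sees the same batch twice.
for epoch in range(3):
    batch = random_flip(images)
    # model.train_on_batch(batch, labels)  # hypothetical training call
```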

Practical Applications:

  14. Design a CNN architecture for a mobile device under these constraints: model size <10 MB, inference time <100 ms. Explain your design choices.

  15. You have a dataset of 500 images in 10 classes for medical image classification. Outline the complete workflow: data preparation, model selection, training strategy, and evaluation.
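
Question 15 hinges on careful data handling: with only ~50 images per class, every split must preserve the class balance. A minimal NumPy sketch (the `stratified_split` helper is ours; in practice scikit-learn's `train_test_split(..., stratify=labels)` does the same job):

```python
import numpy as np

def stratified_split(labels, val_frac=0.2, test_frac=0.2, seed=42):
    """Split indices class by class so train/val/test keep the balance."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_val = int(len(idx) * val_frac)
        n_test = int(len(idx) * test_frac)
        val.extend(idx[:n_val])
        test.extend(idx[n_val:n_val + n_test])
        train.extend(idx[n_val + n_test:])
    return np.array(train), np.array(val), np.array(test)

labels = np.repeat(np.arange(10), 50)   # 500 images, 10 classes
tr, va, te = stratified_split(labels)
print(len(tr), len(va), len(te))        # -> 300 100 100
```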

6.7.3 Coding Exercises (5 Problems)

💻 Coding Exercises

Exercise 1: Build Custom CNN Architecture

Build and train a CNN for CIFAR-10 with the following requirements:

  • At least 3 convolutional blocks
  • Batch normalization after each Conv layer
  • Dropout for regularization
  • Data augmentation
  • Achieve >70% test accuracy
# Your code here
def create_custom_cnn():
    # TODO: Implement architecture
    pass

# Train and evaluate

Exercise 2: Transfer Learning Comparison

Compare 3 pre-trained models (VGG16, ResNet50, MobileNetV2) on CIFAR-10:

  • Implement transfer learning for each model
  • Train each for 10 epochs
  • Compare: accuracy, training time, model size, inference speed
  • Visualize the results with comparative plots
# Your code here
models_to_compare = ['VGG16', 'ResNet50', 'MobileNetV2']

# TODO: Implement comparison framework

Exercise 3: Data Augmentation Study

Investigate the effect of data augmentation:

  1. Train a CNN without augmentation
  2. Train with standard augmentation (rotation, flip, zoom)
  3. Train with aggressive augmentation
  4. Compare: training curves, test accuracy, overfitting
  5. Visualize augmented samples

# Your code here
augmentation_configs = [
    None,  # No augmentation
    'standard',  # Standard augmentation
    'aggressive'  # Aggressive augmentation
]

# TODO: Implement augmentation study

Exercise 4: Fine-tuning Pipeline

Implement a complete fine-tuning pipeline:

  1. Load a pre-trained model (your choice)
  2. Stage 1: Feature extraction (5 epochs)
  3. Stage 2: Fine-tune the last conv block (10 epochs)
  4. Stage 3: Full fine-tuning (5 epochs)
  5. Track and visualize metrics at each stage
  6. Compare with training from scratch

# Your code here
def fine_tuning_stages(base_model, X_train, y_train, X_val, y_val):
    # TODO: Implement 3-stage fine-tuning
    pass

Exercise 5: CNN Visualization

Implement CNN interpretation techniques:

  1. Visualize learned filters (first conv layer)
  2. Generate feature maps for sample images
  3. Create activation heatmaps (Grad-CAM style)
  4. Analyze: what patterns does the network learn?
  5. Compare feature maps from different layers

# Your code here
def visualize_cnn_internals(model, image):
    # TODO: Implement visualization
    # - Filters
    # - Feature maps
    # - Activation heatmaps
    pass

6.7.4 Further Reading & Resources

Books:

  • “Deep Learning for Computer Vision” - Rajalingappaa Shanmugamani
  • “Computer Vision: Algorithms and Applications” - Richard Szeliski
  • “Hands-On Transfer Learning with Python” - Dipanjan Sarkar

Papers (Must-Read):

  • AlexNet: “ImageNet Classification with Deep CNNs” (Krizhevsky et al., 2012)
  • VGGNet: “Very Deep Convolutional Networks” (Simonyan & Zisserman, 2014)
  • ResNet: “Deep Residual Learning” (He et al., 2015)
  • MobileNet: “MobileNets: Efficient CNNs” (Howard et al., 2017)
  • EfficientNet: “Rethinking Model Scaling” (Tan & Le, 2019)

Online Resources:

  • CS231n: Convolutional Neural Networks (Stanford)
  • Fast.ai Practical Deep Learning for Coders
  • PyImageSearch Blog
  • Papers with Code - Computer Vision

Tools & Libraries:

  • TensorFlow / Keras
  • PyTorch / torchvision
  • Albumentations (data augmentation)
  • OpenCV (image processing)
  • Weights & Biases (experiment tracking)

🎓 Next Chapter Preview

Bab 7: Recurrent Neural Networks & LSTM

After mastering CNNs for spatial data (images), we will next explore:

  • RNN untuk sequential data (time series, text)
  • LSTM & GRU: handling long-term dependencies
  • Bidirectional RNNs
  • Sequence-to-sequence models
  • Applications: time series forecasting, text generation

Sequential data needs an architecture that can “remember” past inputs - let's explore recurrent networks!