import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import requests
from io import BytesIO
# Create a simple synthetic image for demonstration (high=256 is exclusive, so values span 0-255)
img_array = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
# Visualize the RGB channels
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
# Original image
axes[0].imshow(img_array)
axes[0].set_title('Original Image\nShape: (64, 64, 3)', fontweight='bold')
axes[0].axis('off')
# Individual channels
channel_names = ['Red Channel', 'Green Channel', 'Blue Channel']
colors = ['Reds', 'Greens', 'Blues']
for i in range(3):
    axes[i+1].imshow(img_array[:, :, i], cmap=colors[i])
    axes[i+1].set_title(f'{channel_names[i]}\nShape: (64, 64)', fontweight='bold')
    axes[i+1].axis('off')
plt.tight_layout()
plt.show()
# Print image properties
print(f"Image shape: {img_array.shape}")
print(f"Data type: {img_array.dtype}")
print(f"Value range: [{img_array.min()}, {img_array.max()}]")
print(f"Total pixels: {img_array.shape[0] * img_array.shape[1]}")
print(f"Total values: {img_array.size}")

Chapter 6: Convolutional Neural Networks & Transfer Learning
Computer Vision and Deep Learning for Image Recognition
After completing this chapter, you will be able to:
- Understand CNN architecture and its fundamental components (convolutional layers, pooling, filters)
- Implement a CNN from scratch using Keras and PyTorch for image classification
- Apply transfer learning with pre-trained models (VGG16, ResNet, MobileNet)
- Use data augmentation techniques to improve model generalization
- Apply fine-tuning strategies to adapt pre-trained models to a specific domain
- Evaluate CNN performance and compare architectures for computer vision tasks
6.1 Introduction to Computer Vision and CNNs
6.1.1 Why a Specialized Architecture for Images?
In Chapter 5, we studied the MLP (Multilayer Perceptron), which works on flattened input. For images, however, this approach has fundamental problems:
Problems with MLPs for Images:
Consider a 224×224×3 (RGB) image:
- Input neurons: 224 × 224 × 3 = 150,528 neurons
- Hidden layer (1000 neurons): 150,528 × 1000 ≈ 150 million parameters
- And that is just one layer!
Problems:
- Overfitting: far too many parameters for limited data
- Computational cost: training is very slow
- Loss of spatial structure: flattening discards location information
- No translation invariance: the same object at a different position looks like a different pattern
CNNs address these problems with:
- Local connectivity: each neuron connects only to a small region
- Parameter sharing: the same filter is applied across the entire image
- Hierarchical learning: early layers detect edges, deeper layers detect objects
- Translation invariance: patterns are detected at any position
Result: a CNN can use over 99% fewer parameters than a fully-connected MLP!
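A quick back-of-the-envelope calculation makes the contrast concrete (a minimal sketch; the layer sizes are the ones from the example above, and the 64-filter convolutional layer is an illustrative choice):

```python
# Compare parameter counts: one flattened MLP layer vs. one convolutional layer
h, w, c = 224, 224, 3            # input image dimensions (RGB)
hidden = 1000                    # MLP hidden-layer width

# Fully-connected: every input value connects to every hidden neuron (+ biases)
mlp_params = (h * w * c) * hidden + hidden

# Convolutional layer: 64 filters of size 3x3x3, shared across all positions
filters, k = 64, 3
conv_params = filters * (k * k * c) + filters   # weights + biases

print(f"MLP layer parameters:  {mlp_params:,}")    # 150,529,000
print(f"Conv layer parameters: {conv_params:,}")   # 1,792
print(f"Reduction: {100 * (1 - conv_params / mlp_params):.4f}%")
```

The savings come entirely from local connectivity (3×3 instead of 224×224) and parameter sharing (one filter reused at every position).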
6.1.2 Evolution of Computer Vision
Pre-Deep Learning Era (before 2012):
- Manual feature extraction (SIFT, HOG, SURF)
- Classical ML (SVM, Random Forest) pada engineered features
- ImageNet 2011: Error rate ~26%
Deep Learning Revolution (2012-present):
- AlexNet (2012): Error 15.3% - first CNN breakthrough
- VGGNet (2014): Deeper networks (16-19 layers)
- ResNet (2015): Skip connections, 152 layers, error 3.57%
- EfficientNet (2019): Optimal scaling, SOTA performance
- Vision Transformers (2020): Attention mechanisms for vision
Industry Applications:
- Healthcare: Medical imaging analysis, tumor detection
- Autonomous Vehicles: Object detection, lane recognition
- Retail: Visual search, inventory management
- Security: Face recognition, surveillance systems
- Manufacturing: Quality control, defect detection
- Agriculture: Crop disease identification, yield prediction
6.1.3 Image Fundamentals for Deep Learning
Digital Image Representation:
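At its core, a digital image is just an array of intensity values; a minimal sketch of the two common layouts:

```python
import numpy as np

# A grayscale image: 2-D array, one intensity value per pixel (0-255)
gray = np.zeros((4, 4), dtype=np.uint8)
gray[1:3, 1:3] = 255          # a bright 2x2 square in the center

# A color image: 3-D array with a third axis for the R, G, B channels
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[..., 0] = 255             # pure red everywhere

print(gray.shape)   # (4, 4)
print(rgb.shape)    # (4, 4, 3)
print(rgb[0, 0])    # [255   0   0]
```

This channels-last (H, W, C) layout is what Keras expects by default; PyTorch uses channels-first (C, H, W) instead.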
Image Preprocessing for CNNs:
import tensorflow as tf
from tensorflow import keras
# Preprocessing pipeline
def preprocess_image(image_array):
    """
    Standard preprocessing for CNN input
    """
    # 1. Normalize to [0, 1]
    normalized = image_array.astype('float32') / 255.0
    # 2. Standardize (zero mean, unit variance) - optional
    mean = np.array([0.485, 0.456, 0.406])  # ImageNet statistics
    std = np.array([0.229, 0.224, 0.225])
    standardized = (normalized - mean) / std
    return normalized, standardized

# Demonstration
normalized, standardized = preprocess_image(img_array)
print("Original range:", img_array.min(), "-", img_array.max())
print("Normalized range:", normalized.min(), "-", normalized.max())
print("Standardized range:", standardized.min(), "-", standardized.max())

Spatial Relationships in Images:
graph TD
A[Raw Pixels] --> B[Local Patterns<br/>edges, corners, textures]
B --> C[Object Parts<br/>eyes, wheels, windows]
C --> D[Objects<br/>faces, cars, buildings]
D --> E[Scenes<br/>street, office, nature]
style A fill:#ff9999
style B fill:#ffcc99
style C fill:#ffff99
style D fill:#99ff99
style E fill:#99ccff
6.1.4 Challenges in Image Classification
Common Challenges:
- Intra-class Variation: same object, different appearance
- Viewpoint Variation: different viewing angles
- Illumination Changes: different lighting conditions
- Occlusion: the object is partially hidden
- Background Clutter: complex backgrounds
- Deformation: changes in shape (pose)
# Demonstrate the transformations a CNN must handle
from scipy import ndimage
# Create simple shape
shape = np.zeros((100, 100))
shape[30:70, 30:70] = 1
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
# Original
axes[0, 0].imshow(shape, cmap='gray')
axes[0, 0].set_title('Original', fontweight='bold')
# Rotation
rotated = ndimage.rotate(shape, 45, reshape=False)
axes[0, 1].imshow(rotated, cmap='gray')
axes[0, 1].set_title('Rotation (45°)', fontweight='bold')
# Scaling
from scipy.ndimage import zoom
scaled = zoom(shape, 0.7)
padded_scaled = np.zeros_like(shape)
y_offset = (shape.shape[0] - scaled.shape[0]) // 2
x_offset = (shape.shape[1] - scaled.shape[1]) // 2
padded_scaled[y_offset:y_offset+scaled.shape[0],
x_offset:x_offset+scaled.shape[1]] = scaled
axes[0, 2].imshow(padded_scaled, cmap='gray')
axes[0, 2].set_title('Scaling (0.7x)', fontweight='bold')
# Translation
translated = np.roll(shape, (20, 20), axis=(0, 1))
axes[1, 0].imshow(translated, cmap='gray')
axes[1, 0].set_title('Translation', fontweight='bold')
# Noise
noisy = shape + np.random.normal(0, 0.1, shape.shape)
axes[1, 1].imshow(noisy, cmap='gray')
axes[1, 1].set_title('Gaussian Noise', fontweight='bold')
# Blur
blurred = ndimage.gaussian_filter(shape, sigma=3)
axes[1, 2].imshow(blurred, cmap='gray')
axes[1, 2].set_title('Gaussian Blur', fontweight='bold')
for ax in axes.flat:
    ax.axis('off')
plt.tight_layout()
plt.show()

6.2 CNN Architecture: Building Blocks
6.2.1 Convolutional Layer: Core Operation
The Convolution Concept:
Convolution is a mathematical operation that applies a filter (kernel) to an image to extract features.
Convolution Operation:
import numpy as np
import matplotlib.pyplot as plt
# Simple 5x5 image
image = np.array([
[1, 2, 3, 2, 1],
[2, 3, 4, 3, 2],
[3, 4, 5, 4, 3],
[2, 3, 4, 3, 2],
[1, 2, 3, 2, 1]
])
# Edge detection filter (vertical edges)
kernel_vertical = np.array([
[-1, 0, 1],
[-2, 0, 2],
[-1, 0, 1]
])
# Edge detection filter (horizontal edges)
kernel_horizontal = np.array([
[-1, -2, -1],
[ 0, 0, 0],
[ 1, 2, 1]
])
# Manual convolution operation
def convolve2d(image, kernel):
    """
    Simple 2D convolution (no padding, stride 1).
    Note: the kernel is not flipped, so strictly this is cross-correlation,
    the same convention used by deep learning frameworks.
    """
    h, w = image.shape
    kh, kw = kernel.shape
    output_h = h - kh + 1
    output_w = w - kw + 1
    output = np.zeros((output_h, output_w))
    for i in range(output_h):
        for j in range(output_w):
            region = image[i:i+kh, j:j+kw]
            output[i, j] = np.sum(region * kernel)
    return output
# Apply filters
vertical_edges = convolve2d(image, kernel_vertical)
horizontal_edges = convolve2d(image, kernel_horizontal)
# Visualize
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes[0, 0].imshow(image, cmap='gray')
axes[0, 0].set_title('Original Image\n(5×5)', fontweight='bold', fontsize=12)
axes[0, 1].imshow(kernel_vertical, cmap='seismic', vmin=-2, vmax=2)
axes[0, 1].set_title('Vertical Edge Filter\n(Sobel-X)', fontweight='bold', fontsize=12)
for i in range(3):
    for j in range(3):
        axes[0, 1].text(j, i, f'{kernel_vertical[i, j]}',
                        ha='center', va='center', fontsize=14, fontweight='bold')
axes[0, 2].imshow(vertical_edges, cmap='gray')
axes[0, 2].set_title('Vertical Edges\n(3×3 output)', fontweight='bold', fontsize=12)
axes[1, 0].imshow(image, cmap='gray')
axes[1, 0].set_title('Original Image\n(5×5)', fontweight='bold', fontsize=12)
axes[1, 1].imshow(kernel_horizontal, cmap='seismic', vmin=-2, vmax=2)
axes[1, 1].set_title('Horizontal Edge Filter\n(Sobel-Y)', fontweight='bold', fontsize=12)
for i in range(3):
    for j in range(3):
        axes[1, 1].text(j, i, f'{kernel_horizontal[i, j]}',
                        ha='center', va='center', fontsize=14, fontweight='bold')
axes[1, 2].imshow(horizontal_edges, cmap='gray')
axes[1, 2].set_title('Horizontal Edges\n(3×3 output)', fontweight='bold', fontsize=12)
for ax in axes.flat:
    ax.axis('off')
plt.tight_layout()
plt.show()
print("\nConvolution Mathematics:")
print("Input shape:", image.shape)
print("Kernel shape:", kernel_vertical.shape)
print("Output shape:", vertical_edges.shape)
print("\nOutput size formula: (H - K + 1) × (W - K + 1)")
print(f"= ({image.shape[0]} - {kernel_vertical.shape[0]} + 1) × ({image.shape[1]} - {kernel_vertical.shape[1]} + 1)")
print(f"= {vertical_edges.shape}")

Convolutional Layer Parameters:
# Visualize convolutional layer parameters
def visualize_conv_params():
    """
    Visualize convolutional layer parameters
    """
    fig, axes = plt.subplots(2, 2, figsize=(14, 12))

    # 1. Filter Size Effect
    ax = axes[0, 0]
    filters = {
        '3×3': np.random.randn(3, 3),
        '5×5': np.random.randn(5, 5),
        '7×7': np.random.randn(7, 7)
    }
    positions = [0, 4, 9]
    for idx, (name, filt) in enumerate(filters.items()):
        ax.imshow(filt, cmap='coolwarm',
                  extent=[positions[idx], positions[idx]+filt.shape[0],
                          0, filt.shape[0]])
        ax.text(positions[idx]+filt.shape[0]/2, -1, name,
                ha='center', fontweight='bold', fontsize=11)
    ax.set_xlim(-1, 16)
    ax.set_ylim(-2, 8)
    ax.set_title('Filter Size (Kernel Size)', fontweight='bold', fontsize=13)
    ax.axis('off')

    # 2. Stride Effect
    ax = axes[0, 1]
    input_size = 7
    kernel_size = 3
    strides = [1, 2]
    for s_idx, stride in enumerate(strides):
        y_offset = s_idx * 8
        # Draw input grid
        for i in range(input_size):
            for j in range(input_size):
                rect = plt.Rectangle((j, y_offset+i), 1, 1,
                                     fill=False, edgecolor='blue', linewidth=1.5)
                ax.add_patch(rect)
        # Draw kernel positions
        output_size = (input_size - kernel_size) // stride + 1
        for i in range(output_size):
            for j in range(output_size):
                y_pos = y_offset + i * stride
                x_pos = j * stride
                rect = plt.Rectangle((x_pos, y_pos), kernel_size, kernel_size,
                                     fill=False, edgecolor='red', linewidth=2.5)
                ax.add_patch(rect)
        ax.text(-1.5, y_offset+3, f'Stride={stride}\nOutput: {output_size}×{output_size}',
                ha='right', fontweight='bold', fontsize=10)
    ax.set_xlim(-3, 8)
    ax.set_ylim(-1, 17)
    ax.set_title('Stride Effect on Output Size', fontweight='bold', fontsize=13)
    ax.axis('off')

    # 3. Padding Types
    ax = axes[1, 0]
    padding_types = {
        'Valid (no padding)': {'input': 5, 'kernel': 3, 'padding': 0},
        'Same (zero padding)': {'input': 5, 'kernel': 3, 'padding': 1}
    }
    for p_idx, (name, params) in enumerate(padding_types.items()):
        y_offset = p_idx * 8
        inp_size = params['input']
        pad = params['padding']
        # Draw padding cells
        if pad > 0:
            for i in range(inp_size + 2*pad):
                for j in range(inp_size + 2*pad):
                    if i < pad or i >= inp_size+pad or j < pad or j >= inp_size+pad:
                        rect = plt.Rectangle((j, y_offset+i), 1, 1,
                                             fill=True, facecolor='lightgray',
                                             edgecolor='gray', linewidth=1)
                        ax.add_patch(rect)
        # Draw input cells
        for i in range(inp_size):
            for j in range(inp_size):
                rect = plt.Rectangle((j+pad, y_offset+i+pad), 1, 1,
                                     fill=True, facecolor='lightblue',
                                     edgecolor='blue', linewidth=1.5)
                ax.add_patch(rect)
        ax.text(-1.5, y_offset+3, name, ha='right', fontweight='bold', fontsize=10)
    ax.set_xlim(-3, 8)
    ax.set_ylim(-1, 17)
    ax.set_title('Padding: Valid vs Same', fontweight='bold', fontsize=13)
    ax.axis('off')

    # 4. Number of Filters
    ax = axes[1, 1]
    num_filters = [32, 64, 128]
    for f_idx, nf in enumerate(num_filters):
        y_pos = f_idx * 3
        # Draw input
        rect = plt.Rectangle((0, y_pos), 2, 2, fill=True,
                             facecolor='lightblue', edgecolor='blue', linewidth=2)
        ax.add_patch(rect)
        # Draw filters
        for i in range(min(nf, 10)):
            x_pos = 3 + i * 0.3
            rect = plt.Rectangle((x_pos, y_pos), 0.25, 2,
                                 fill=True, facecolor='orange',
                                 edgecolor='red', linewidth=1)
            ax.add_patch(rect)
        ax.text(7, y_pos+1, f'{nf} filters', ha='left',
                va='center', fontweight='bold', fontsize=11)
    ax.set_xlim(-1, 11)
    ax.set_ylim(-1, 10)
    ax.set_title('Number of Filters (Feature Maps)', fontweight='bold', fontsize=13)
    ax.axis('off')

    plt.tight_layout()
    plt.show()

visualize_conv_params()

Output Size Calculation:
def calculate_output_size(input_size, kernel_size, stride=1, padding=0):
    """
    Formula: O = (I - K + 2P) / S + 1
    O: output size, I: input size, K: kernel size, P: padding, S: stride
    """
    output = (input_size - kernel_size + 2*padding) // stride + 1
    return output

# Examples
examples = [
    {'input': 224, 'kernel': 3, 'stride': 1, 'padding': 0},
    {'input': 224, 'kernel': 3, 'stride': 1, 'padding': 1},
    {'input': 224, 'kernel': 5, 'stride': 2, 'padding': 2},
    {'input': 56, 'kernel': 3, 'stride': 2, 'padding': 1},
]
print("Convolutional Layer Output Size Calculations:")
print("=" * 80)
print(f"{'Input':<10} {'Kernel':<10} {'Stride':<10} {'Padding':<10} {'Output':<10}")
print("-" * 80)
for ex in examples:
    output = calculate_output_size(ex['input'], ex['kernel'], ex['stride'], ex['padding'])
    print(f"{ex['input']:<10} {ex['kernel']:<10} {ex['stride']:<10} {ex['padding']:<10} {output:<10}")

6.2.2 Activation Functions for CNNs
ReLU: The Standard Choice for CNNs
# Visualize activation functions
x = np.linspace(-5, 5, 1000)
activations = {
'ReLU': lambda x: np.maximum(0, x),
'Leaky ReLU': lambda x: np.where(x > 0, x, 0.01*x),
'ELU': lambda x: np.where(x > 0, x, np.exp(x) - 1),
'Swish': lambda x: x * (1 / (1 + np.exp(-x)))
}
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten()
for idx, (name, func) in enumerate(activations.items()):
    y = func(x)
    axes[idx].plot(x, y, linewidth=2.5, color='blue')
    axes[idx].axhline(y=0, color='k', linewidth=0.8, linestyle='--', alpha=0.3)
    axes[idx].axvline(x=0, color='k', linewidth=0.8, linestyle='--', alpha=0.3)
    axes[idx].grid(True, alpha=0.3)
    axes[idx].set_title(f'{name} Activation', fontweight='bold', fontsize=13)
    axes[idx].set_xlabel('Input', fontsize=11)
    axes[idx].set_ylabel('Output', fontsize=11)
    # Add formula
    formulas = {
        'ReLU': r'$f(x) = \max(0, x)$',
        'Leaky ReLU': r'$f(x) = \max(0.01x, x)$',
        'ELU': r'$f(x) = x$ if $x>0$ else $e^x-1$',
        'Swish': r'$f(x) = x \cdot \sigma(x)$'
    }
    axes[idx].text(0.05, 0.95, formulas[name], transform=axes[idx].transAxes,
                   fontsize=11, verticalalignment='top',
                   bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
plt.tight_layout()
plt.show()

Advantages:
- Computational efficiency: a simple max(0, x)
- Sparse activation: roughly 50% of neurons are active
- No vanishing gradient: the gradient is 1 for x > 0
- Empirically works well: behind many state-of-the-art results
Disadvantages:
- Dying ReLU: neurons can become permanently inactive (always outputting 0)
- Not zero-centered: can slow convergence
Solution: Leaky ReLU or ELU address the dying ReLU problem
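The dying ReLU problem can be seen numerically (a minimal sketch): for a strongly negative pre-activation, ReLU's output and gradient are both zero, so no learning signal flows back, while Leaky ReLU keeps a small gradient alive.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def grad_relu(x):
    return (x > 0).astype(float)        # gradient is 0 for all x <= 0

def grad_leaky_relu(x, alpha=0.01):
    return np.where(x > 0, 1.0, alpha)  # small but nonzero gradient for x <= 0

x = -3.0  # a "dead" pre-activation
print(relu(x), grad_relu(np.array(x)))              # 0.0 and 0.0 -> no learning signal
print(leaky_relu(x), grad_leaky_relu(np.array(x)))  # -0.03 and 0.01 -> still learns
```

If a neuron's weights push its pre-activation negative for every training example, ReLU leaves it stuck at zero forever; the leaky variant lets gradient descent pull it back.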
6.2.3 Pooling Layers: Downsampling
Pooling Operation:
# Demonstrate pooling operations
feature_map = np.array([
[1, 3, 2, 4],
[5, 6, 1, 3],
[2, 1, 4, 7],
[3, 2, 5, 8]
])
def max_pooling(input_map, pool_size=2):
    """Max pooling operation"""
    h, w = input_map.shape
    output_h = h // pool_size
    output_w = w // pool_size
    output = np.zeros((output_h, output_w))
    for i in range(output_h):
        for j in range(output_w):
            region = input_map[i*pool_size:(i+1)*pool_size,
                               j*pool_size:(j+1)*pool_size]
            output[i, j] = np.max(region)
    return output

def average_pooling(input_map, pool_size=2):
    """Average pooling operation"""
    h, w = input_map.shape
    output_h = h // pool_size
    output_w = w // pool_size
    output = np.zeros((output_h, output_w))
    for i in range(output_h):
        for j in range(output_w):
            region = input_map[i*pool_size:(i+1)*pool_size,
                               j*pool_size:(j+1)*pool_size]
            output[i, j] = np.mean(region)
    return output
max_pooled = max_pooling(feature_map)
avg_pooled = average_pooling(feature_map)
# Visualize
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
# Original
im0 = axes[0].imshow(feature_map, cmap='YlOrRd', vmin=0, vmax=8)
axes[0].set_title('Feature Map\n(4×4)', fontweight='bold', fontsize=13)
for i in range(4):
    for j in range(4):
        axes[0].text(j, i, f'{feature_map[i, j]:.0f}',
                     ha='center', va='center', fontsize=14, fontweight='bold')
axes[0].set_xticks([])
axes[0].set_yticks([])
# Max pooling
im1 = axes[1].imshow(max_pooled, cmap='YlOrRd', vmin=0, vmax=8)
axes[1].set_title('Max Pooling\n(2×2 pool)', fontweight='bold', fontsize=13)
for i in range(2):
    for j in range(2):
        axes[1].text(j, i, f'{max_pooled[i, j]:.0f}',
                     ha='center', va='center', fontsize=14, fontweight='bold')
axes[1].set_xticks([])
axes[1].set_yticks([])
# Average pooling
im2 = axes[2].imshow(avg_pooled, cmap='YlOrRd', vmin=0, vmax=8)
axes[2].set_title('Average Pooling\n(2×2 pool)', fontweight='bold', fontsize=13)
for i in range(2):
    for j in range(2):
        axes[2].text(j, i, f'{avg_pooled[i, j]:.1f}',
                     ha='center', va='center', fontsize=14, fontweight='bold')
axes[2].set_xticks([])
axes[2].set_yticks([])
plt.tight_layout()
plt.show()
print("Pooling Reduces Spatial Dimensions:")
print(f"Input: {feature_map.shape} → Max Pool: {max_pooled.shape}")
print(f"Size reduction: {feature_map.size} → {max_pooled.size} (75% reduction)")

Pooling Benefits:
graph LR
A[Pooling Layer] --> B[Dimensionality<br/>Reduction]
A --> C[Translation<br/>Invariance]
A --> D[Computational<br/>Efficiency]
A --> E[Overfitting<br/>Prevention]
style A fill:#99ccff
style B fill:#ffcc99
style C fill:#ffcc99
style D fill:#ffcc99
style E fill:#ffcc99
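The translation-invariance claim can be checked numerically (a minimal sketch with a small 2×2 max-pooling helper): shifting a strong activation by one pixel within the same pooling window leaves the pooled output unchanged.

```python
import numpy as np

def max_pool2x2(m):
    """2x2 max pooling with stride 2 (non-overlapping windows)."""
    h, w = m.shape
    return m.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A strong activation at (0, 0) ...
a = np.zeros((4, 4)); a[0, 0] = 9
# ... and the same activation shifted by one pixel, still inside the same window
b = np.zeros((4, 4)); b[1, 1] = 9

print(np.array_equal(max_pool2x2(a), max_pool2x2(b)))  # True: small shifts are absorbed
```

This is why stacks of conv + pool layers tolerate small spatial jitter in the input: each pooling stage absorbs shifts up to its window size.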
6.2.4 Complete CNN Architecture
Typical CNN Architecture:
flowchart TD
A["Input\n224x224x3"] --> B["Conv1\n3x3, 32"]
B --> C["ReLU"]
C --> D["Pool1\n2x2"]
D --> E["Conv2\n3x3, 64"]
E --> F["ReLU"]
F --> G["Pool2\n2x2"]
G --> H["Conv3\n3x3, 128"]
H --> I["ReLU"]
I --> J["Pool3\n2x2"]
J --> K["Flatten"]
K --> L["Dense\n512"]
L --> M["ReLU"]
M --> N["Dropout"]
N --> O["Output\n10 classes"]
style A fill:#ff9999
style B fill:#ffcc99
style E fill:#ffcc99
style H fill:#ffcc99
style K fill:#99ff99
style L fill:#99ccff
style O fill:#cc99ff
Implementation: Simple CNN
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# Build simple CNN
def create_simple_cnn(input_shape=(32, 32, 3), num_classes=10):
    """
    Simple CNN for image classification
    """
    model = keras.Sequential([
        # Block 1
        layers.Conv2D(32, (3, 3), activation='relu', padding='same',
                      input_shape=input_shape, name='conv1'),
        layers.BatchNormalization(name='bn1'),
        layers.MaxPooling2D((2, 2), name='pool1'),
        # Block 2
        layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='conv2'),
        layers.BatchNormalization(name='bn2'),
        layers.MaxPooling2D((2, 2), name='pool2'),
        # Block 3
        layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='conv3'),
        layers.BatchNormalization(name='bn3'),
        layers.MaxPooling2D((2, 2), name='pool3'),
        # Classifier
        layers.Flatten(name='flatten'),
        layers.Dense(512, activation='relu', name='fc1'),
        layers.Dropout(0.5, name='dropout'),
        layers.Dense(num_classes, activation='softmax', name='output')
    ], name='SimpleCNN')
    return model

# Create model
model = create_simple_cnn()
# Display architecture
model.summary()
# Calculate parameters
total_params = model.count_params()
trainable_params = sum(tf.size(w).numpy() for w in model.trainable_weights)
print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")

Visualize Feature Maps:
# Visualize what CNN learns
from tensorflow.keras.preprocessing import image
# Create dummy input
dummy_input = np.random.rand(1, 32, 32, 3).astype('float32')
# Get intermediate layer outputs
layer_outputs = [layer.output for layer in model.layers[:6]] # First 6 layers
activation_model = keras.Model(inputs=model.input, outputs=layer_outputs)
# Get activations
activations = activation_model.predict(dummy_input, verbose=0)
# Visualize first convolutional layer
fig, axes = plt.subplots(4, 8, figsize=(16, 8))
axes = axes.flatten()
conv1_activation = activations[0][0] # First conv layer output
for i in range(min(32, len(axes))):
    if i < conv1_activation.shape[-1]:
        axes[i].imshow(conv1_activation[:, :, i], cmap='viridis')
        axes[i].set_title(f'Filter {i+1}', fontsize=9)
    axes[i].axis('off')
plt.suptitle('Conv1 Feature Maps (32 filters)', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

6.3 Building CNN from Scratch
6.3.1 Dataset: MNIST Handwritten Digits
Load and Explore MNIST:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
# Load MNIST
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Preprocess
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
y_train_cat = to_categorical(y_train, 10)
y_test_cat = to_categorical(y_test, 10)
# Visualize samples
fig, axes = plt.subplots(4, 10, figsize=(15, 6))
for i in range(4):
    for j in range(10):
        idx = i * 10 + j
        axes[i, j].imshow(X_train[idx].squeeze(), cmap='gray')
        axes[i, j].set_title(f'{y_train[idx]}', fontsize=11, fontweight='bold')
        axes[i, j].axis('off')
plt.suptitle('MNIST Dataset Samples', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()
print(f"Training set: {X_train.shape}")
print(f"Test set: {X_test.shape}")
print(f"Classes: {np.unique(y_train)}")

6.3.2 Build CNN in Keras
def create_mnist_cnn():
    """
    CNN for MNIST classification
    """
    model = keras.Sequential([
        # Convolutional Block 1
        layers.Conv2D(32, (3, 3), activation='relu', padding='same',
                      input_shape=(28, 28, 1)),
        layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        # Convolutional Block 2
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        # Classifier
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(10, activation='softmax')
    ], name='MNIST_CNN')
    return model

# Create and compile
mnist_cnn = create_mnist_cnn()
mnist_cnn.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
mnist_cnn.summary()

Train CNN:
# Training callbacks
callbacks = [
    keras.callbacks.EarlyStopping(
        monitor='val_loss',
        patience=5,
        restore_best_weights=True
    ),
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=3
    )
]
# Train model
history = mnist_cnn.fit(
    X_train, y_train_cat,
    batch_size=128,
    epochs=20,
    validation_split=0.15,
    callbacks=callbacks,
    verbose=1
)
# Evaluate
test_loss, test_acc = mnist_cnn.evaluate(X_test, y_test_cat, verbose=0)
print(f"\nTest Accuracy: {test_acc:.4f}")
print(f"Test Loss: {test_loss:.4f}")

Visualize Training:
# Plot training history
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Accuracy
axes[0].plot(history.history['accuracy'], label='Train', linewidth=2.5)
axes[0].plot(history.history['val_accuracy'], label='Validation', linewidth=2.5)
axes[0].set_title('Model Accuracy', fontweight='bold', fontsize=13)
axes[0].set_xlabel('Epoch', fontsize=11)
axes[0].set_ylabel('Accuracy', fontsize=11)
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3)
# Loss
axes[1].plot(history.history['loss'], label='Train', linewidth=2.5)
axes[1].plot(history.history['val_loss'], label='Validation', linewidth=2.5)
axes[1].set_title('Model Loss', fontweight='bold', fontsize=13)
axes[1].set_xlabel('Epoch', fontsize=11)
axes[1].set_ylabel('Loss', fontsize=11)
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

6.3.3 Build CNN in PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
# Define CNN architecture
class MNISTNet(nn.Module):
    def __init__(self):
        super(MNISTNet, self).__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        # Pooling
        self.pool = nn.MaxPool2d(2, 2)
        # Fully connected layers
        self.fc1 = nn.Linear(64 * 7 * 7, 512)
        self.fc2 = nn.Linear(512, 10)
        # Dropout
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)

    def forward(self, x):
        # Block 1
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = self.dropout1(x)
        # Block 2
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = self.pool(x)
        x = self.dropout1(x)
        # Classifier
        x = x.view(-1, 64 * 7 * 7)
        x = F.relu(self.fc1(x))
        x = self.dropout2(x)
        x = self.fc2(x)
        return x

# Create model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
pytorch_model = MNISTNet().to(device)
print(pytorch_model)
print(f"\nDevice: {device}")
# Count parameters
total_params = sum(p.numel() for p in pytorch_model.parameters())
trainable_params = sum(p.numel() for p in pytorch_model.parameters() if p.requires_grad)
print(f"Total parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")

Train PyTorch CNN:
# Prepare data loaders
X_train_torch = torch.FloatTensor(X_train).permute(0, 3, 1, 2)  # NHWC -> NCHW for PyTorch
y_train_torch = torch.LongTensor(y_train)
X_test_torch = torch.FloatTensor(X_test).permute(0, 3, 1, 2)
y_test_torch = torch.LongTensor(y_test)
train_dataset = TensorDataset(X_train_torch, y_train_torch)
test_dataset = TensorDataset(X_test_torch, y_test_torch)
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=128, shuffle=False)
# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(pytorch_model.parameters(), lr=0.001)
# Training function
def train_epoch(model, loader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()
    return running_loss / len(loader), 100. * correct / total

# Evaluation function
def evaluate(model, loader, criterion, device):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            running_loss += loss.item()
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += predicted.eq(labels).sum().item()
    return running_loss / len(loader), 100. * correct / total

# Train model
epochs = 10
train_losses, train_accs = [], []
test_losses, test_accs = [], []
print("Training PyTorch CNN...")
for epoch in range(epochs):
    train_loss, train_acc = train_epoch(pytorch_model, train_loader,
                                        criterion, optimizer, device)
    test_loss, test_acc = evaluate(pytorch_model, test_loader,
                                   criterion, device)
    train_losses.append(train_loss)
    train_accs.append(train_acc)
    test_losses.append(test_loss)
    test_accs.append(test_acc)
    print(f"Epoch {epoch+1}/{epochs}: "
          f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}% | "
          f"Test Loss: {test_loss:.4f}, Test Acc: {test_acc:.2f}%")
# Plot results
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
axes[0].plot(train_accs, label='Train', linewidth=2.5)
axes[0].plot(test_accs, label='Test', linewidth=2.5)
axes[0].set_title('PyTorch CNN Accuracy', fontweight='bold', fontsize=13)
axes[0].set_xlabel('Epoch', fontsize=11)
axes[0].set_ylabel('Accuracy (%)', fontsize=11)
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3)
axes[1].plot(train_losses, label='Train', linewidth=2.5)
axes[1].plot(test_losses, label='Test', linewidth=2.5)
axes[1].set_title('PyTorch CNN Loss', fontweight='bold', fontsize=13)
axes[1].set_xlabel('Epoch', fontsize=11)
axes[1].set_ylabel('Loss', fontsize=11)
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

6.3.4 CIFAR-10: More Complex Dataset
Load CIFAR-10:
from tensorflow.keras.datasets import cifar10
# Load dataset
(X_train_cifar, y_train_cifar), (X_test_cifar, y_test_cifar) = cifar10.load_data()
# Normalize
X_train_cifar = X_train_cifar.astype('float32') / 255.0
X_test_cifar = X_test_cifar.astype('float32') / 255.0
# One-hot encode
y_train_cifar_cat = to_categorical(y_train_cifar, 10)
y_test_cifar_cat = to_categorical(y_test_cifar, 10)
# Class names
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
'dog', 'frog', 'horse', 'ship', 'truck']
# Visualize samples
fig, axes = plt.subplots(4, 10, figsize=(15, 6))
for i in range(4):
    for j in range(10):
        idx = i * 10 + j
        axes[i, j].imshow(X_train_cifar[idx])
        axes[i, j].set_title(f'{class_names[y_train_cifar[idx][0]]}',
                             fontsize=9, fontweight='bold')
        axes[i, j].axis('off')
plt.suptitle('CIFAR-10 Dataset Samples', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()
print(f"Training set: {X_train_cifar.shape}")
print(f"Test set: {X_test_cifar.shape}")
print(f"Classes: {class_names}")

CNN for CIFAR-10:
def create_cifar10_cnn():
    """
    Deeper CNN for CIFAR-10 classification
    """
    model = keras.Sequential([
        # Block 1
        layers.Conv2D(64, (3, 3), activation='relu', padding='same',
                      input_shape=(32, 32, 3)),
        layers.BatchNormalization(),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.3),
        # Block 2
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.4),
        # Block 3
        layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.5),
        # Classifier
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(10, activation='softmax')
    ], name='CIFAR10_CNN')
    return model

# Create and compile
cifar_cnn = create_cifar10_cnn()
cifar_cnn.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
cifar_cnn.summary()

6.4 Transfer Learning Fundamentals
6.4.1 The Transfer Learning Concept
Why Transfer Learning?
Training a CNN from scratch requires:
- A large dataset (>100K images)
- Computational resources (GPUs, long training times)
- Expertise in architecture design and hyperparameter tuning
The Transfer Learning Solution:
graph TD
A[Pre-trained Model<br/>ImageNet: 1M images, 1000 classes] --> B[Feature Extractor<br/>Lower layers: generic features]
B --> C[Your Dataset<br/>Small: 1K images, custom classes]
C --> D[Fine-tune<br/>Higher layers + classifier]
D --> E[Specialized Model<br/>Your domain]
style A fill:#99ccff
style B fill:#ffcc99
style C fill:#ff9999
style D fill:#99ff99
style E fill:#cc99ff
Think of learning a new language:
- From scratch: learning the alphabet, vocabulary, and grammar from zero (CNN from scratch)
- Transfer learning: already knowing English makes learning French faster (reusing knowledge)
Lower CNN layers = general knowledge (edges, textures). Higher CNN layers = task-specific knowledge (object parts, semantics).
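You can see this split concretely by listing a backbone's layer names. The sketch below is illustrative only (it uses `weights=None` so no pre-trained weights are downloaded; only the architecture is built):

```python
from tensorflow.keras.applications import VGG16

# Architecture only, no ImageNet weights downloaded (illustrative sketch)
vgg = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))

# Early layers belong to block1 (generic edges/textures);
# late layers belong to block5 (task-specific parts/semantics).
print([layer.name for layer in vgg.layers[:4]])
print([layer.name for layer in vgg.layers[-4:]])
```

The block prefixes (`block1_...` through `block5_...`) are exactly the groups that transfer-learning strategies freeze or unfreeze.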
Transfer Learning Strategies:
# Visualize transfer learning strategies
fig, axes = plt.subplots(1, 3, figsize=(16, 5))
strategies = [
{
'name': 'Feature Extraction\n(Freeze all)',
'frozen': [True, True, True, True],
'trainable': [False, False, False, True]
},
{
'name': 'Fine-tuning\n(Last layers)',
'frozen': [True, True, False, False],
'trainable': [False, False, True, True]
},
{
'name': 'Full Fine-tuning\n(All layers)',
'frozen': [False, False, False, False],
'trainable': [True, True, True, True]
}
]
for idx, strategy in enumerate(strategies):
ax = axes[idx]
# Draw layers
layer_names = ['Conv Block 1', 'Conv Block 2', 'Conv Block 3', 'Classifier']
colors = ['lightcoral' if frozen else 'lightgreen'
for frozen in strategy['frozen']]
y_positions = [3, 2, 1, 0]
for i, (name, color, trainable) in enumerate(zip(layer_names, colors,
strategy['trainable'])):
rect = plt.Rectangle((0, y_positions[i]), 4, 0.7,
facecolor=color, edgecolor='black', linewidth=2)
ax.add_patch(rect)
status = 'Trainable' if trainable else 'Frozen'
ax.text(2, y_positions[i] + 0.35, f'{name}\n({status})',
ha='center', va='center', fontsize=10, fontweight='bold')
ax.set_xlim(-0.5, 4.5)
ax.set_ylim(-0.5, 4)
ax.set_title(strategy['name'], fontweight='bold', fontsize=12)
ax.axis('off')
plt.tight_layout()
plt.show()
6.4.2 Popular Pre-trained Models
Common Pre-trained Architectures:
# Summary of popular pre-trained models
import pandas as pd
models_info = {
'Model': ['VGG16', 'VGG19', 'ResNet50', 'ResNet101', 'InceptionV3',
'MobileNetV2', 'EfficientNetB0', 'DenseNet121'],
'Year': [2014, 2014, 2015, 2015, 2015, 2018, 2019, 2017],
'Params (M)': [138, 144, 25.6, 44.5, 23.8, 3.5, 5.3, 8.0],
'Top-1 Acc': [71.3, 71.3, 76.0, 77.6, 77.9, 71.8, 77.1, 75.0],
'Depth': [16, 19, 50, 101, 48, 53, 82, 121],
'Best For': [
'Simple, deep',
'Simple, deeper',
'Residual connections',
'Very deep',
'Inception modules',
'Mobile/edge devices',
'Efficient scaling',
'Dense connections'
]
}
df_models = pd.DataFrame(models_info)
print(df_models.to_string(index=False))
# Visualize comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Parameters vs Accuracy
axes[0].scatter(df_models['Params (M)'], df_models['Top-1 Acc'],
s=200, alpha=0.6, c=range(len(df_models)), cmap='viridis')
for i, model in enumerate(df_models['Model']):
axes[0].annotate(model, (df_models['Params (M)'][i], df_models['Top-1 Acc'][i]),
fontsize=9, fontweight='bold')
axes[0].set_xlabel('Parameters (Millions)', fontsize=11, fontweight='bold')
axes[0].set_ylabel('Top-1 Accuracy (%)', fontsize=11, fontweight='bold')
axes[0].set_title('Parameters vs Accuracy', fontsize=13, fontweight='bold')
axes[0].grid(True, alpha=0.3)
# Model comparison
x = np.arange(len(df_models['Model']))
axes[1].bar(x, df_models['Top-1 Acc'], alpha=0.7, color='skyblue',
edgecolor='navy', linewidth=1.5)
axes[1].set_xticks(x)
axes[1].set_xticklabels(df_models['Model'], rotation=45, ha='right', fontsize=10)
axes[1].set_ylabel('Top-1 Accuracy (%)', fontsize=11, fontweight='bold')
axes[1].set_title('Model Accuracy Comparison', fontsize=13, fontweight='bold')
axes[1].grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
6.4.3 Transfer Learning with VGG16
Load Pre-trained VGG16:
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
# Load pre-trained VGG16 (without top classifier)
base_model = VGG16(
weights='imagenet',
include_top=False,
input_shape=(224, 224, 3)
)
print("VGG16 Base Model:")
print(f"Total layers: {len(base_model.layers)}")
print(f"Total parameters: {base_model.count_params():,}")
# Display architecture
base_model.summary()
# Freeze base model
base_model.trainable = False
print(f"\nTrainable parameters after freezing: {sum([tf.size(w).numpy() for w in base_model.trainable_weights]):,}")
Add Custom Classifier:
def build_transfer_model(base_model, num_classes=10):
"""
Build transfer learning model with custom classifier
"""
# Create new model
inputs = keras.Input(shape=(224, 224, 3))
# Preprocessing for VGG16
x = preprocess_input(inputs)
# Base model (frozen)
x = base_model(x, training=False)
# Custom classifier
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(512, activation='relu')(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(num_classes, activation='softmax')(x)
# Complete model
model = keras.Model(inputs, outputs, name='VGG16_Transfer')
return model
# Create transfer learning model
transfer_model = build_transfer_model(base_model, num_classes=10)
# Compile
transfer_model.compile(
optimizer=keras.optimizers.Adam(learning_rate=0.0001),
loss='categorical_crossentropy',
metrics=['accuracy']
)
transfer_model.summary()
6.4.4 Transfer Learning with ResNet50
Load ResNet50:
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input as resnet_preprocess
# Load ResNet50
resnet_base = ResNet50(
weights='imagenet',
include_top=False,
input_shape=(224, 224, 3)
)
# Freeze base
resnet_base.trainable = False
# Build transfer model
def build_resnet_transfer(base_model, num_classes=10):
"""
ResNet50 transfer learning model
"""
inputs = keras.Input(shape=(224, 224, 3))
x = resnet_preprocess(inputs)
x = base_model(x, training=False)
# Global pooling + classifier
x = layers.GlobalAveragePooling2D()(x)
x = layers.BatchNormalization()(x)
x = layers.Dense(512, activation='relu')(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(num_classes, activation='softmax')(x)
return keras.Model(inputs, outputs, name='ResNet50_Transfer')
resnet_model = build_resnet_transfer(resnet_base)
resnet_model.compile(
optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy']
)
print(f"ResNet50 Transfer Model - Total params: {resnet_model.count_params():,}")
6.4.5 Transfer Learning with MobileNetV2
MobileNetV2 for Edge Devices:
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input as mobile_preprocess
# Load MobileNetV2
mobile_base = MobileNetV2(
weights='imagenet',
include_top=False,
input_shape=(224, 224, 3),
alpha=1.0 # Width multiplier
)
mobile_base.trainable = False
# Build efficient model
def build_mobile_transfer(base_model, num_classes=10):
"""
MobileNetV2 for efficient inference
"""
inputs = keras.Input(shape=(224, 224, 3))
x = mobile_preprocess(inputs)
x = base_model(x, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(num_classes, activation='softmax')(x)
return keras.Model(inputs, outputs, name='MobileNetV2_Transfer')
mobile_model = build_mobile_transfer(mobile_base)
mobile_model.compile(
optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy']
)
# Compare model sizes
models_comparison = {
'Model': ['VGG16', 'ResNet50', 'MobileNetV2'],
'Total Params': [
transfer_model.count_params(),
resnet_model.count_params(),
mobile_model.count_params()
],
'Trainable Params': [
sum([tf.size(w).numpy() for w in transfer_model.trainable_weights]),
sum([tf.size(w).numpy() for w in resnet_model.trainable_weights]),
sum([tf.size(w).numpy() for w in mobile_model.trainable_weights])
]
}
df_comparison = pd.DataFrame(models_comparison)
df_comparison['Total Params'] = df_comparison['Total Params'].apply(lambda x: f"{x:,}")
df_comparison['Trainable Params'] = df_comparison['Trainable Params'].apply(lambda x: f"{x:,}")
print("\nModel Size Comparison:")
print(df_comparison.to_string(index=False))
6.5 Data Augmentation
6.5.1 Why Data Augmentation?
Problem: Limited Training Data
Deep learning needs a lot of data, but:
- Labeling is expensive and time-consuming
- Real-world constraints (privacy, accessibility)
- Class imbalance
Solution: Data Augmentation
Generate synthetic training examples with transformations that preserve semantic meaning.
- Increase dataset size: 10× - 100× more training samples
- Reduce overfitting: Model learns robust features
- Improve generalization: Better performance on unseen data
- Handle variations: Rotation, scaling, lighting, etc.
- Class balancing: Oversample minority classes
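The core idea fits in a few lines of NumPy. The sketch below is illustrative only (real pipelines use ImageDataGenerator or albumentations, shown later): a random horizontal flip and a mild brightness jitter, both of which leave the label unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """img: float32 array in [0, 1], shape (H, W, C)."""
    if rng.random() < 0.5:
        img = img[:, ::-1, :]          # random horizontal flip
    factor = rng.uniform(0.8, 1.2)     # mild brightness jitter
    return np.clip(img * factor, 0.0, 1.0)

img = rng.random((32, 32, 3)).astype(np.float32)
aug = augment(img)
print(aug.shape)  # (32, 32, 3) - same shape, same label, "new" sample
```

Applying `augment` to each batch at load time turns one image into an effectively unlimited stream of variants.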
6.5.2 Common Augmentation Techniques
Geometric Transformations:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
# Load sample image
sample_img = X_train_cifar[0:1] # Shape: (1, 32, 32, 3)
# Define augmentation
augmentation_configs = [
{'rotation_range': 30, 'name': 'Rotation (±30°)'},
{'width_shift_range': 0.2, 'height_shift_range': 0.2, 'name': 'Shift (20%)'},
{'zoom_range': 0.3, 'name': 'Zoom (±30%)'},
{'horizontal_flip': True, 'name': 'Horizontal Flip'},
{'brightness_range': [0.5, 1.5], 'name': 'Brightness'},
]
fig, axes = plt.subplots(2, 6, figsize=(16, 6))
# Original
axes[0, 0].imshow(sample_img[0])
axes[0, 0].set_title('Original', fontweight='bold', fontsize=11)
axes[0, 0].axis('off')
axes[1, 0].imshow(sample_img[0])
axes[1, 0].set_title('Original', fontweight='bold', fontsize=11)
axes[1, 0].axis('off')
# Apply augmentations
for idx, config in enumerate(augmentation_configs):
name = config.pop('name')
datagen = ImageDataGenerator(**config)
# Generate augmented images
aug_iter = datagen.flow(sample_img, batch_size=1)
aug_img = next(aug_iter)[0]
row = idx // 5
col = (idx % 5) + 1
axes[row, col].imshow(aug_img)
axes[row, col].set_title(name, fontweight='bold', fontsize=11)
axes[row, col].axis('off')
plt.suptitle('Data Augmentation Techniques', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()
Advanced Augmentations:
# Combined augmentation pipeline
def create_augmentation_pipeline():
"""
Comprehensive augmentation for training
"""
datagen = ImageDataGenerator(
rotation_range=20,
width_shift_range=0.2,
height_shift_range=0.2,
horizontal_flip=True,
zoom_range=0.2,
shear_range=0.15,
brightness_range=[0.8, 1.2],
fill_mode='nearest'
)
return datagen
# Generate multiple augmented samples
augmentation_gen = create_augmentation_pipeline()
fig, axes = plt.subplots(4, 8, figsize=(16, 8))
axes = axes.flatten()
# Generate 32 augmented versions
aug_iter = augmentation_gen.flow(sample_img, batch_size=1)
for i in range(32):
aug_img = next(aug_iter)[0]
axes[i].imshow(aug_img)
axes[i].axis('off')
plt.suptitle('32 Augmented Versions of Same Image', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()
6.5.3 Augmentation during Training
Keras with ImageDataGenerator:
# Training with augmentation
train_datagen = ImageDataGenerator(
rotation_range=15,
width_shift_range=0.1,
height_shift_range=0.1,
horizontal_flip=True,
zoom_range=0.1
)
# Fit to training data (only needed for feature-wise standardization options)
train_datagen.fit(X_train_cifar)
# Validation data (no augmentation)
val_datagen = ImageDataGenerator()
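As an alternative to `ImageDataGenerator`, recent Keras versions let you put augmentation layers directly inside the model; they run on-the-fly during `fit()` and are inactive at inference. A minimal self-contained sketch (random data stands in for CIFAR-10 here, purely for illustration):

```python
import numpy as np
from tensorflow import keras

# Toy stand-in data: 0-1 floats, one-hot labels (assumption, not real CIFAR-10)
X = np.random.rand(64, 32, 32, 3).astype("float32")
y = keras.utils.to_categorical(np.random.randint(0, 10, 64), 10)

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    # Augmentation layers: applied per batch during training, skipped at inference
    keras.layers.RandomFlip("horizontal"),
    keras.layers.RandomRotation(0.05),  # fraction of 2*pi, i.e. roughly +/-18 degrees
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(X, y, batch_size=16, epochs=1, verbose=0)
print(len(history.history["loss"]))  # 1 epoch recorded
```

Because the augmentation lives in the model, the same saved model can be exported without any separate preprocessing pipeline.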
# Training with data augmentation
print("Training with Data Augmentation:")
print("Each epoch the model sees slightly different versions of the data")
print("Effect: Regularization, reduced overfitting, better generalization")
6.5.4 Modern Augmentation Libraries
Albumentations: Advanced Augmentation:
# Note: Install with: pip install albumentations
try:
import albumentations as A
# Define transformation pipeline
transform = A.Compose([
A.RandomRotate90(p=0.5),
A.Flip(p=0.5),
A.OneOf([
A.GaussNoise(p=1),
A.GaussianBlur(p=1),
A.MotionBlur(p=1),
], p=0.3),
A.OneOf([
A.OpticalDistortion(p=1),
A.GridDistortion(p=1),
], p=0.3),
A.OneOf([
A.HueSaturationValue(p=1),
A.RGBShift(p=1),
A.RandomBrightnessContrast(p=1),
], p=0.5),
])
# Apply to sample
sample_np = (X_train_cifar[0] * 255).astype(np.uint8)
fig, axes = plt.subplots(2, 4, figsize=(14, 7))
axes = axes.flatten()
for i in range(8):
augmented = transform(image=sample_np)['image']
axes[i].imshow(augmented)
axes[i].set_title(f'Augmented {i+1}', fontweight='bold', fontsize=11)
axes[i].axis('off')
plt.suptitle('Albumentations: Advanced Augmentations', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()
print("Albumentations offers:")
print("- Fast C++ backend")
print("- 70+ transformation types")
print("- Bounding box & keypoint support")
print("- Easy pipeline composition")
except ImportError:
print("Albumentations not installed. Install with: pip install albumentations")
6.5.5 Augmentation Best Practices
Guidelines:
flowchart TD
A["Data Augmentation\nBest Practices"] --> B["Domain-Appropriate\nTransformations"]
A --> C["Reasonable\nParameter Ranges"]
A --> D["No Semantic\nChanges"]
A --> E["Balance with\nReal Data"]
B --> B1["Medical: No flips for\nlaterality-dependent organs"]
B --> B2["Text/OCR: No rotation\nthat breaks readability"]
C --> C1["Rotation: ±15-30°\nnot ±180°"]
C --> C2["Brightness: 0.7-1.3\nnot 0.1-5.0"]
D --> D1["Cat to flipped cat ✓"]
D --> D2["Cat to unrecognizable ✗"]
E --> E1["Augmentation complements\nnot replaces real data"]
style A fill:#99ccff
style B fill:#ffcc99
style C fill:#ffcc99
style D fill:#ffcc99
style E fill:#ffcc99
Common Mistakes:
# Demonstrate bad augmentations
sample_img_bad = X_train_cifar[100:101]
fig, axes = plt.subplots(2, 4, figsize=(14, 7))
# Good augmentations
good_configs = [
{'rotation_range': 15, 'name': 'Good: Rotation ±15°'},
{'zoom_range': 0.2, 'name': 'Good: Zoom ±20%'},
{'brightness_range': [0.8, 1.2], 'name': 'Good: Brightness 0.8-1.2'},
{'horizontal_flip': True, 'name': 'Good: Horizontal Flip'},
]
for idx, config in enumerate(good_configs):
name = config.pop('name')
datagen = ImageDataGenerator(**config)
aug_iter = datagen.flow(sample_img_bad, batch_size=1)
aug_img = next(aug_iter)[0]
axes[0, idx].imshow(aug_img)
axes[0, idx].set_title(name, fontweight='bold', fontsize=10, color='green')
axes[0, idx].axis('off')
# Bad augmentations
bad_configs = [
{'rotation_range': 180, 'name': 'Bad: Rotation ±180°'},
{'zoom_range': 0.8, 'name': 'Bad: Zoom ±80%'},
{'brightness_range': [0.2, 3.0], 'name': 'Bad: Brightness 0.2-3.0'},
{'width_shift_range': 0.8, 'name': 'Bad: Shift 80%'},
]
for idx, config in enumerate(bad_configs):
name = config.pop('name')
datagen = ImageDataGenerator(**config)
aug_iter = datagen.flow(sample_img_bad, batch_size=1)
aug_img = next(aug_iter)[0]
axes[1, idx].imshow(np.clip(aug_img, 0, 1))
axes[1, idx].set_title(name, fontweight='bold', fontsize=10, color='red')
axes[1, idx].axis('off')
plt.suptitle('Good vs Bad Augmentation Parameters', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()
6.6 Fine-tuning Strategies
6.6.1 When Should You Fine-tune?
Decision Framework:
flowchart TD
A["Start: Transfer Learning"] --> B{"Dataset Size?"}
B -->|"Small\n<1K images"| C["Feature Extraction\nFreeze all base layers"]
B -->|"Medium\n1K-10K images"| D["Partial Fine-tuning\nUnfreeze last few layers"]
B -->|"Large\n>10K images"| E["Full Fine-tuning\nUnfreeze all layers"]
C --> F{"Domain Similarity?"}
D --> F
E --> F
F -->|"Similar\nto ImageNet"| G["Lower learning rate\n0.0001 - 0.00001"]
F -->|"Different\nfrom ImageNet"| H["Higher learning rate\n0.001 - 0.0001"]
style A fill:#99ccff
style C fill:#ffcc99
style D fill:#ffcc99
style E fill:#ffcc99
style G fill:#99ff99
style H fill:#99ff99
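The decision framework above can be encoded as a small helper. This is a rough rule-of-thumb sketch (the thresholds and learning rates are the ones from the flowchart, taken as representative values, not hard rules):

```python
def transfer_strategy(n_images, similar_to_imagenet):
    """Return a (strategy, starting learning rate) suggestion."""
    if n_images < 1_000:
        strategy = "feature extraction (freeze all base layers)"
    elif n_images <= 10_000:
        strategy = "partial fine-tuning (unfreeze last few layers)"
    else:
        strategy = "full fine-tuning (unfreeze all layers)"
    # Similar domains need smaller updates to preserve pre-trained features
    lr = 1e-4 if similar_to_imagenet else 1e-3
    return strategy, lr

print(transfer_strategy(500, True))
# → ('feature extraction (freeze all base layers)', 0.0001)
```

In practice, validation performance should override the heuristic; treat the output as a starting point for experiments.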
6.6.2 Gradual Unfreezing
Strategy: Unfreeze Layers Progressively
# Demonstrate gradual unfreezing
def show_trainable_status(model, title):
"""
Display which layers are trainable
"""
print(f"\n{title}")
print("=" * 60)
print(f"{'Layer Name':<30} {'Trainable':<15} {'Parameters':<15}")
print("-" * 60)
total_params = 0
trainable_params = 0
for layer in model.layers:
params = layer.count_params()
total_params += params
if layer.trainable:
trainable_params += params
print(f"{layer.name:<30} {str(layer.trainable):<15} {params:>10,}")
print("-" * 60)
print(f"Total parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,} ({100*trainable_params/total_params:.1f}%)")
print(f"Non-trainable parameters: {total_params-trainable_params:,}")
# Stage 1: All frozen
base_model_stage1 = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model_stage1.trainable = False
model_stage1 = build_transfer_model(base_model_stage1)
show_trainable_status(model_stage1, "STAGE 1: Feature Extraction (All Frozen)")
# Stage 2: Unfreeze last conv block
base_model_stage2 = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model_stage2.trainable = True
# Freeze all except last block
for layer in base_model_stage2.layers[:-4]:
layer.trainable = False
model_stage2 = build_transfer_model(base_model_stage2)
show_trainable_status(model_stage2, "STAGE 2: Partial Fine-tuning (Last Block)")
# Stage 3: Unfreeze all
base_model_stage3 = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model_stage3.trainable = True
model_stage3 = build_transfer_model(base_model_stage3)
show_trainable_status(model_stage3, "STAGE 3: Full Fine-tuning (All Unfrozen)")
6.6.3 Learning Rate Strategies
Discriminative Learning Rates:
# Different learning rates for different layers
def compile_with_discriminative_lr(model, base_lr=0.001):
"""
Apply different learning rates to different layer groups
"""
# Separate layer groups
base_layers = []
top_layers = []
for layer in model.layers:
if 'vgg16' in layer.name or 'resnet' in layer.name:
base_layers.extend(layer.trainable_weights)
else:
top_layers.extend(layer.trainable_weights)
# Create optimizer with discriminative learning rates
optimizer = keras.optimizers.Adam(learning_rate=base_lr)
# Note: Keras doesn't support per-layer LR directly
# In practice, use multiple optimizers or manual gradient application
return optimizer
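One way to get true per-group learning rates, as the note above suggests, is to run two optimizers inside a custom training step. The sketch below is a self-contained toy (two Dense layers stand in for the pre-trained base and the new head; it is not part of the chapter's pipeline):

```python
import tensorflow as tf
from tensorflow import keras

# Toy stand-ins: one layer plays the pre-trained base, one the new head
base = keras.layers.Dense(8, activation="relu")
head = keras.layers.Dense(10)

inputs = keras.Input(shape=(4,))
model = keras.Model(inputs, head(base(inputs)))

base_opt = keras.optimizers.Adam(learning_rate=1e-5)  # small LR for the "base"
head_opt = keras.optimizers.Adam(learning_rate=1e-3)  # larger LR for the head
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    variables = base.trainable_weights + head.trainable_weights
    grads = tape.gradient(loss, variables)
    n = len(base.trainable_weights)
    # Apply each group's gradients with its own optimizer (hence its own LR)
    base_opt.apply_gradients(zip(grads[:n], base.trainable_weights))
    head_opt.apply_gradients(zip(grads[n:], head.trainable_weights))
    return loss

x = tf.random.normal((16, 4))
y = tf.zeros((16,), dtype=tf.int32)
print(float(train_step(x, y)))
```

For a real transfer model, `base.trainable_weights` would be the pre-trained backbone's variables and `head` the custom classifier.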
# Learning rate schedule
def create_lr_schedule():
"""
Learning rate schedule for fine-tuning
"""
def schedule(epoch, lr):
if epoch < 5:
return 0.001 # Warm-up
elif epoch < 15:
return 0.0001 # Fine-tuning
else:
return 0.00001 # Fine fine-tuning
return keras.callbacks.LearningRateScheduler(schedule)
# Visualize learning rate schedule
epochs = 25
lrs = []
for epoch in range(epochs):
if epoch < 5:
lrs.append(0.001)
elif epoch < 15:
lrs.append(0.0001)
else:
lrs.append(0.00001)
plt.figure(figsize=(10, 5))
plt.plot(range(epochs), lrs, linewidth=3, marker='o', markersize=8)
plt.title('Learning Rate Schedule for Fine-tuning', fontweight='bold', fontsize=14)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Learning Rate', fontsize=12)
plt.yscale('log')
plt.grid(True, alpha=0.3)
# Add annotations
plt.axvspan(0, 5, alpha=0.2, color='red', label='Stage 1: Warm-up')
plt.axvspan(5, 15, alpha=0.2, color='orange', label='Stage 2: Fine-tuning')
plt.axvspan(15, 25, alpha=0.2, color='green', label='Stage 3: Fine fine-tuning')
plt.legend(fontsize=11)
plt.tight_layout()
plt.show()
6.6.4 Complete Fine-tuning Pipeline
End-to-End Example:
# Complete fine-tuning workflow
def fine_tuning_pipeline(X_train, y_train, X_val, y_val, num_classes=10):
"""
Complete fine-tuning pipeline with best practices
"""
# Stage 1: Feature extraction
print("\n" + "="*70)
print("STAGE 1: FEATURE EXTRACTION (5 epochs)")
print("="*70)
base_model = VGG16(weights='imagenet', include_top=False,
input_shape=(224, 224, 3))
base_model.trainable = False
model = build_transfer_model(base_model, num_classes)
model.compile(
optimizer=keras.optimizers.Adam(learning_rate=0.001),
loss='categorical_crossentropy',
metrics=['accuracy']
)
# Train classifier only
history_stage1 = model.fit(
X_train, y_train,
batch_size=32,
epochs=5,
validation_data=(X_val, y_val),
verbose=1
)
# Stage 2: Fine-tuning
print("\n" + "="*70)
print("STAGE 2: FINE-TUNING LAST LAYERS (10 epochs)")
print("="*70)
# Unfreeze last conv block
base_model.trainable = True
for layer in base_model.layers[:-4]:
layer.trainable = False
# Recompile with lower learning rate
model.compile(
optimizer=keras.optimizers.Adam(learning_rate=0.0001),
loss='categorical_crossentropy',
metrics=['accuracy']
)
# Continue training
history_stage2 = model.fit(
X_train, y_train,
batch_size=32,
epochs=10,
validation_data=(X_val, y_val),
verbose=1
)
return model, history_stage1, history_stage2
# Note: This would be run with actual data
print("Fine-tuning pipeline defined.")
print("\nBest Practices Applied:")
print("1. Two-stage training: feature extraction → fine-tuning")
print("2. Gradual unfreezing: top layers first")
print("3. Learning rate reduction: 0.001 → 0.0001")
print("4. Batch size consideration: smaller for fine-tuning")
6.6.5 Monitoring Fine-tuning
Key Metrics to Track:
# Simulate training history for visualization
np.random.seed(42)
epochs = 20
stage1_epochs = 5
# Stage 1: Feature extraction
train_acc_s1 = np.linspace(0.50, 0.75, stage1_epochs) + np.random.normal(0, 0.02, stage1_epochs)
val_acc_s1 = np.linspace(0.48, 0.72, stage1_epochs) + np.random.normal(0, 0.02, stage1_epochs)
train_loss_s1 = np.linspace(1.2, 0.7, stage1_epochs) + np.random.normal(0, 0.05, stage1_epochs)
val_loss_s1 = np.linspace(1.3, 0.75, stage1_epochs) + np.random.normal(0, 0.05, stage1_epochs)
# Stage 2: Fine-tuning
train_acc_s2 = np.linspace(0.75, 0.92, epochs-stage1_epochs) + np.random.normal(0, 0.01, epochs-stage1_epochs)
val_acc_s2 = np.linspace(0.72, 0.88, epochs-stage1_epochs) + np.random.normal(0, 0.015, epochs-stage1_epochs)
train_loss_s2 = np.linspace(0.7, 0.25, epochs-stage1_epochs) + np.random.normal(0, 0.03, epochs-stage1_epochs)
val_loss_s2 = np.linspace(0.75, 0.35, epochs-stage1_epochs) + np.random.normal(0, 0.04, epochs-stage1_epochs)
# Combine
train_acc = np.concatenate([train_acc_s1, train_acc_s2])
val_acc = np.concatenate([val_acc_s1, val_acc_s2])
train_loss = np.concatenate([train_loss_s1, train_loss_s2])
val_loss = np.concatenate([val_loss_s1, val_loss_s2])
# Visualize
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Accuracy
axes[0, 0].plot(train_acc, label='Train', linewidth=2.5, color='blue')
axes[0, 0].plot(val_acc, label='Validation', linewidth=2.5, color='orange')
axes[0, 0].axvline(x=stage1_epochs-0.5, color='red', linestyle='--',
linewidth=2, label='Fine-tuning starts')
axes[0, 0].set_title('Accuracy over Training', fontweight='bold', fontsize=13)
axes[0, 0].set_xlabel('Epoch', fontsize=11)
axes[0, 0].set_ylabel('Accuracy', fontsize=11)
axes[0, 0].legend(fontsize=11)
axes[0, 0].grid(True, alpha=0.3)
axes[0, 0].text(2, 0.85, 'Stage 1:\nFeature\nExtraction',
fontsize=10, bbox=dict(boxstyle='round', facecolor='wheat'))
axes[0, 0].text(12, 0.85, 'Stage 2:\nFine-tuning',
fontsize=10, bbox=dict(boxstyle='round', facecolor='lightgreen'))
# Loss
axes[0, 1].plot(train_loss, label='Train', linewidth=2.5, color='blue')
axes[0, 1].plot(val_loss, label='Validation', linewidth=2.5, color='orange')
axes[0, 1].axvline(x=stage1_epochs-0.5, color='red', linestyle='--',
linewidth=2, label='Fine-tuning starts')
axes[0, 1].set_title('Loss over Training', fontweight='bold', fontsize=13)
axes[0, 1].set_xlabel('Epoch', fontsize=11)
axes[0, 1].set_ylabel('Loss', fontsize=11)
axes[0, 1].legend(fontsize=11)
axes[0, 1].grid(True, alpha=0.3)
# Train-Val Gap (Overfitting indicator)
acc_gap = train_acc - val_acc
axes[1, 0].plot(acc_gap, linewidth=2.5, color='red')
axes[1, 0].axhline(y=0.05, color='green', linestyle='--',
linewidth=2, label='Acceptable gap')
axes[1, 0].axvline(x=stage1_epochs-0.5, color='red', linestyle='--', linewidth=2)
axes[1, 0].set_title('Train-Val Accuracy Gap (Overfitting)', fontweight='bold', fontsize=13)
axes[1, 0].set_xlabel('Epoch', fontsize=11)
axes[1, 0].set_ylabel('Accuracy Gap', fontsize=11)
axes[1, 0].legend(fontsize=11)
axes[1, 0].grid(True, alpha=0.3)
# Learning Rate Schedule
lr_schedule = np.concatenate([np.full(stage1_epochs, 0.001),
np.full(epochs-stage1_epochs, 0.0001)])
axes[1, 1].plot(lr_schedule, linewidth=3, marker='o', markersize=8, color='purple')
axes[1, 1].set_title('Learning Rate Schedule', fontweight='bold', fontsize=13)
axes[1, 1].set_xlabel('Epoch', fontsize=11)
axes[1, 1].set_ylabel('Learning Rate', fontsize=11)
axes[1, 1].set_yscale('log')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print("\nFine-tuning Monitoring Checklist:")
print("✓ Training accuracy improving steadily")
print("✓ Validation accuracy following training (not diverging)")
print("✓ Train-val gap reasonable (<5-10%)")
print("✓ Loss decreasing consistently")
print("✓ Learning rate reduced at fine-tuning stage")
6.7 Review & Exercises
6.7.1 Chapter Summary
Key Concepts:
- CNN Architecture
- Convolutional layers: local connectivity, parameter sharing
- Pooling layers: downsampling, translation invariance
- Hierarchical feature learning: edges → parts → objects
- Building CNNs
- MNIST: simple grayscale images
- CIFAR-10: color images, more complex
- Architecture design: depth, width, skip connections
- Transfer Learning
- Pre-trained models: VGG, ResNet, MobileNet
- Feature extraction: freeze base, train classifier
- Domain adaptation: ImageNet → custom domain
- Data Augmentation
- Geometric: rotation, shift, flip, zoom
- Color: brightness, contrast, saturation
- Best practices: domain-appropriate, reasonable ranges
- Fine-tuning
- Gradual unfreezing: top layers first
- Discriminative learning rates: lower for base
- Two-stage training: feature extraction → fine-tuning
6.7.2 Review Questions (15 Questions)
Conceptual Understanding:
Explain why CNNs are more effective for image data than fully-connected MLPs. Discuss local connectivity, parameter sharing, and translation invariance.
What is the difference between max pooling and average pooling? When would you choose one over the other?
In CNN architectures, why does the number of filters usually increase (32 → 64 → 128) while the spatial dimensions decrease (224 → 112 → 56)?
Explain the concept of "hierarchical feature learning" in CNNs. Give examples of features learned at different layers.
What is a "receptive field" in a CNN? How does network depth affect the receptive field?
Transfer Learning:
When should you use transfer learning versus training a CNN from scratch? Discuss the trade-offs in terms of dataset size, computational resources, and domain similarity.
Explain the difference between "feature extraction" and "fine-tuning" in transfer learning. When is each approach appropriate?
Why is the learning rate for fine-tuning usually smaller (0.0001) than for training from scratch (0.001)?
What is "catastrophic forgetting" in fine-tuning? How can it be prevented?
Compare VGG16, ResNet50, and MobileNetV2 in terms of architecture, parameters, and use cases.
Data Augmentation:
Explain why data augmentation is effective for reducing overfitting. What are its limitations?
Give examples of transformations that are NOT appropriate for: (a) medical X-ray images, (b) document OCR, (c) face recognition.
What is the difference between "online" (on-the-fly) and "offline" (pre-computed) augmentation? What are the advantages of each?
Practical Applications:
Design a CNN architecture for a mobile device under these constraints: model size <10MB, inference time <100ms. Explain your design choices.
You have a dataset of 500 images across 10 classes for medical image classification. Outline the complete workflow: data preparation, model selection, training strategy, evaluation.
6.7.3 Coding Exercises (5 Problems)
Exercise 1: Build Custom CNN Architecture
Build and train a CNN for CIFAR-10 with the following requirements:
- At least 3 convolutional blocks
- Batch normalization after each Conv layer
- Dropout for regularization
- Data augmentation
- Achieve >70% test accuracy
# Your code here
def create_custom_cnn():
# TODO: Implement architecture
pass
# Train and evaluate
Exercise 2: Transfer Learning Comparison
Compare 3 pre-trained models (VGG16, ResNet50, MobileNetV2) on CIFAR-10:
- Implement transfer learning for each
- Train for 10 epochs
- Compare: accuracy, training time, model size, inference speed
- Visualize results with comparative plots
# Your code here
models_to_compare = ['VGG16', 'ResNet50', 'MobileNetV2']
# TODO: Implement comparison framework
Exercise 3: Data Augmentation Study
Investigate the effect of data augmentation:
1. Train a CNN without augmentation
2. Train with augmentation (rotation, flip, zoom)
3. Train with aggressive augmentation
4. Compare: training curves, test accuracy, overfitting
5. Visualize augmented samples
# Your code here
augmentation_configs = [
None, # No augmentation
'standard', # Standard augmentation
'aggressive' # Aggressive augmentation
]
# TODO: Implement augmentation study
Exercise 4: Fine-tuning Pipeline
Implement a complete fine-tuning pipeline:
1. Load a pre-trained model (your choice)
2. Stage 1: Feature extraction (5 epochs)
3. Stage 2: Fine-tune the last conv block (10 epochs)
4. Stage 3: Full fine-tuning (5 epochs)
5. Track and visualize metrics for each stage
6. Compare with training from scratch
# Your code here
def fine_tuning_stages(base_model, X_train, y_train, X_val, y_val):
# TODO: Implement 3-stage fine-tuning
pass
Exercise 5: CNN Visualization
Implement CNN interpretation techniques:
1. Visualize learned filters (first conv layer)
2. Generate feature maps for sample images
3. Create activation heatmaps (Grad-CAM style)
4. Analyze: what patterns does the network learn?
5. Compare feature maps from different layers
# Your code here
def visualize_cnn_internals(model, image):
# TODO: Implement visualization
# - Filters
# - Feature maps
# - Activation heatmaps
pass
6.7.4 Further Reading & Resources
Books:
- “Deep Learning for Computer Vision” - Rajalingappaa Shanmugamani
- “Computer Vision: Algorithms and Applications” - Richard Szeliski
- “Hands-On Transfer Learning with Python” - Dipanjan Sarkar
Papers (Must-Read):
- AlexNet: “ImageNet Classification with Deep CNNs” (Krizhevsky et al., 2012)
- VGGNet: “Very Deep Convolutional Networks” (Simonyan & Zisserman, 2014)
- ResNet: “Deep Residual Learning” (He et al., 2015)
- MobileNet: “MobileNets: Efficient CNNs” (Howard et al., 2017)
- EfficientNet: “Rethinking Model Scaling” (Tan & Le, 2019)
Online Resources:
- CS231n: Convolutional Neural Networks (Stanford)
- Fast.ai Practical Deep Learning for Coders
- PyImageSearch Blog
- Papers with Code - Computer Vision
Tools & Libraries:
- TensorFlow / Keras
- PyTorch / torchvision
- Albumentations (data augmentation)
- OpenCV (image processing)
- Weights & Biases (experiment tracking)
Chapter 7: Recurrent Neural Networks & LSTM
Having mastered CNNs for spatial data (images), we will explore next:
- RNN untuk sequential data (time series, text)
- LSTM & GRU: handling long-term dependencies
- Bidirectional RNNs
- Sequence-to-sequence models
- Applications: time series forecasting, text generation
Sequential data requires an architecture that can "remember" - on to recurrent networks!