Parameter-Efficient Fine-Tuning, LoRA, and Transfer-Learning Practice for LLMs
Chapter 9: Fine-tuning Large Language Models
Learning Outcomes
After studying this chapter, you will be able to:
Understand the concepts of pre-training vs fine-tuning in Large Language Models
Explain the difference between full fine-tuning and Parameter-Efficient Fine-Tuning (PEFT)
Implement LoRA (Low-Rank Adaptation) for efficient fine-tuning
Apply QLoRA for fine-tuning under memory constraints
Fine-tune small models (BERT, GPT-2) for specific tasks
Evaluate model performance before and after fine-tuning
Optimize hyperparameters for efficient training
9.1 Introduction: From Pre-training to Fine-tuning
9.1.1 Transfer Learning in the LLM Era
In Chapter 8 we covered the Transformer architecture and how pre-trained models such as BERT and GPT work. The key question now is: how do we adapt these large models to our own specific tasks?
The problem:
Training an LLM from scratch costs millions of dollars and thousands of GPUs
BERT-Large: 24 layers, 340M parameters, trained on 3.3B words
Not realistic for most organizations or researchers
The solution: transfer learning
flowchart LR
A["π Pre-training\n(Large corpus)\nBillions of tokens"] --> B["π€ Base Model\nGeneral knowledge"]
B --> C["π― Fine-tuning\n(Task-specific data)\nThousands of examples"]
C --> D["β Specialized Model\nDomain expert"]
style A fill:#e3f2fd
style B fill:#fff9c4
style C fill:#c8e6c9
style D fill:#a5d6a7
The transfer-learning paradigm for LLMs
Key insight: the model has already learned general language understanding during pre-training. We only need to adapt it to the specific task!
9.1.2 Pre-training vs Fine-tuning
Pre-training:
Data: massive unlabeled text (a large slice of the internet)
Objective: language modeling (predict the next word or masked tokens)
Fine-tuning:
Data: a small labeled task-specific dataset (thousands of examples)
Objective: the downstream task loss (e.g., classification)
An analogy: pre-training = general education (elementary through high school): studying many subjects, building foundational knowledge, long duration, high cost.
Fine-tuning = specialization (an undergraduate major): focusing on a specific domain, leveraging the knowledge from general education, faster and cheaper.
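The contrast also shows up directly in code. Below is a minimal sketch using the Transformers Auto classes (the attribute names follow the Hugging Face BERT implementation): the same checkpoint can be loaded behind a masked-language-modeling head (the pre-training objective) or a freshly initialized classification head (a typical fine-tuning objective); only the head changes.

from transformers import AutoModelForMaskedLM, AutoModelForSequenceClassification

# Pre-training objective: predict masked tokens over the whole vocabulary
mlm_model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Fine-tuning objective: classify a sequence into task-specific labels
clf_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # e.g., positive/negative sentiment
)

# Both share the same 12-layer encoder; only the output head differs
print(type(mlm_model.cls).__name__)         # MLM head (from pre-training)
print(type(clf_model.classifier).__name__)  # classification head (newly initialized)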
9.1.3 The Challenges of Fine-tuning LLMs
Computational Challenges:
| Model | Parameters | Memory (FP32) | Memory (FP16) | Fine-tuning Time |
|---|---|---|---|---|
| BERT-Base | 110M | 440 MB | 220 MB | 2-4 hours |
| BERT-Large | 340M | 1.36 GB | 680 MB | 8-12 hours |
| GPT-2 Medium | 355M | 1.42 GB | 710 MB | 6-10 hours |
| GPT-3 Small | 1.3B | 5.2 GB | 2.6 GB | 1-2 days |
| LLaMA-7B | 7B | 28 GB | 14 GB | 3-5 days |
| LLaMA-13B | 13B | 52 GB | 26 GB | 5-7 days |
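The FP32/FP16 columns follow directly from parameter count × bytes per parameter (4 bytes for FP32, 2 for FP16). A quick sanity check in Python:

def weight_memory(num_params: float) -> tuple:
    """Return (FP32, FP16) weight memory in GB for a given parameter count."""
    fp32_gb = num_params * 4 / 1e9  # 4 bytes per FP32 parameter
    fp16_gb = num_params * 2 / 1e9  # 2 bytes per FP16 parameter
    return fp32_gb, fp16_gb

for name, params in [("BERT-Base", 110e6), ("LLaMA-7B", 7e9)]:
    fp32, fp16 = weight_memory(params)
    print(f"{name:10s}: {fp32:6.2f} GB (FP32) | {fp16:6.2f} GB (FP16)")
# BERT-Base :   0.44 GB (FP32) |   0.22 GB (FP16)
# LLaMA-7B  :  28.00 GB (FP32) |  14.00 GB (FP16)

Note that these are weight-only figures: full fine-tuning with Adam also stores gradients and two optimizer states per parameter, so peak training memory is typically 3-4x higher than the weights alone.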
Problems that arise:
Memory constraints: not everyone has a GPU with 80GB of VRAM
Catastrophic forgetting: the model loses general knowledge during fine-tuning
Overfitting: with a small dataset, the model overfits easily
Cost: training time translates directly into money on cloud GPUs
Solutions we will cover:
Parameter-Efficient Fine-Tuning (PEFT)
LoRA (Low-Rank Adaptation)
QLoRA (Quantized LoRA)
Gradient checkpointing
Mixed precision training
9.2 Full Fine-tuning vs Parameter-Efficient Fine-tuning
9.2.1 Full Fine-tuning
Concept: update all model parameters during training.
Process:
Load the pre-trained model weights
Add a task-specific head (classification/regression layer)
Train all layers on the task data
Save the updated model weights
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Example full fine-tuning structure
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=2  # Binary classification
)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"Total parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")
print(f"Percentage trainable: {100 * trainable_params / total_params:.2f}%")
Advantages:
Maximum performance (given enough data)
Full flexibility in adaptation
A proven approach with many established best practices
Disadvantages:
Memory intensive (gradients must be stored for all parameters)
Storage: a full model checkpoint must be saved for every task
Slow training
Expensive for large models
9.2.2 Parameter-Efficient Fine-tuning (PEFT)
Key idea: freeze most of the pre-trained parameters and train only a small subset or a small number of additional parameters.
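The key idea takes only a few lines of plain PyTorch. Here is a minimal sketch of the simplest variant (freeze everything, train only the task head), before looking at the more sophisticated PEFT methods below:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Freeze every pre-trained parameter
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only the task-specific classification head
for param in model.classifier.parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,} ({100 * trainable / total:.3f}%)")
# Only the classifier head (~1.5K params) remains trainable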
Advantages:
Drastically reduced memory usage
Faster training
Only small adapter weights are stored (a few MB instead of GBs)
Avoids catastrophic forgetting
Easy to switch between tasks
Categories of PEFT methods:
graph TD
A["Parameter-Efficient Fine-Tuning"] --> B["Adapter Modules"]
A --> C["Prefix/Prompt Tuning"]
A --> D["Low-Rank Adaptation"]
A --> E["Selective Fine-tuning"]
B --> B1["Houlsby Adapters\n(Serial)"]
B --> B2["Parallel Adapters"]
C --> C1["Prefix Tuning\n(Add learnable vectors)"]
C --> C2["Prompt Tuning\n(Input-level prompts)"]
D --> D1["LoRA\n(Most popular!)"]
D --> D2["AdaLoRA\n(Adaptive rank)"]
E --> E1["BitFit\n(Only bias terms)"]
E --> E2["Layer-wise tuning"]
style D1 fill:#c8e6c9
style A fill:#e3f2fd
Taxonomy of PEFT Methods
Comparison of methods:
| Method | Trainable Params | Memory | Speed | Performance |
|---|---|---|---|---|
| Full Fine-tuning | 100% | Very High | Slow | Best |
| Adapter Modules | ~2-5% | Medium | Medium | Good |
| Prefix Tuning | ~0.1-1% | Low | Fast | Good |
| LoRA | ~0.1-1% | Low | Fast | Excellent |
| BitFit | ~0.1% | Very Low | Very Fast | Fair |
LoRA is the sweet spot: few trainable parameters, excellent performance!
9.3 LoRA (Low-Rank Adaptation)
9.3.1 Concept and Intuition
Paper: "LoRA: Low-Rank Adaptation of Large Language Models" (Hu et al., 2021)
Core insight:
Weight updates during fine-tuning have low intrinsic rank
There is no need to update the full weight matrix \(W_0\)
It suffices to add a small low-rank decomposition \(\Delta W = BA\)
Mathematical Formulation:
Original forward pass:
\[h = W_0 x\]
LoRA forward pass:
\[h = W_0 x + \Delta W x = W_0 x + BAx\]
Where:
\(W_0 \in \mathbb{R}^{d \times k}\): original pre-trained weights (frozen)
\(B \in \mathbb{R}^{d \times r}\) and \(A \in \mathbb{R}^{r \times k}\): trainable low-rank matrices
\(r \ll \min(d, k)\): the rank, e.g., \(r = 8\) for \(d = k = 768\)
\(A\) is initialized with small Gaussian values and \(B\) with zeros, so \(\Delta W = BA = 0\) at the start of training; in practice the update is scaled by \(\alpha / r\).
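To make the formulation concrete, below is a from-scratch LoRA layer in PyTorch. This is an illustrative sketch, not the PEFT library implementation: following the paper, \(A\) gets a small random initialization, \(B\) starts at zero (so \(\Delta W = 0\) initially), and the update is scaled by \(\alpha / r\).

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank update:
    h = W0 x + (alpha / r) * B A x"""
    def __init__(self, base_layer: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base_layer
        self.base.weight.requires_grad = False  # W0 is frozen
        if self.base.bias is not None:
            self.base.bias.requires_grad = False
        d_out, d_in = base_layer.weight.shape
        self.lora_A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # small random init
        self.lora_B = nn.Parameter(torch.zeros(d_out, r))        # zero init -> delta W = 0 at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Example: wrap a 768x768 attention projection
layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
x = torch.randn(4, 768)
print(layer(x).shape)  # torch.Size([4, 768])

lora_params = sum(p.numel() for p in [layer.lora_A, layer.lora_B])
print(f"LoRA params: {lora_params:,} vs full matrix: {768 * 768:,}")  # 12,288 vs 589,824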
Let's now implement a complete pipeline for fine-tuning BERT with LoRA!
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
    DataCollatorWithPadding
)
from peft import LoraConfig, get_peft_model, TaskType
from datasets import load_dataset
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Set random seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)

print("Starting LoRA Fine-tuning Pipeline\n")

# ============================================================
# 1. LOAD DATASET
# ============================================================
print("Loading dataset...")

# IMDB dataset (50K movie reviews)
dataset = load_dataset("imdb")

# Take a small subset for the demo (1000 samples)
train_dataset = dataset['train'].shuffle(seed=42).select(range(1000))
test_dataset = dataset['test'].shuffle(seed=42).select(range(200))

print(f"Train samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print(f"\nExample: {train_dataset[0]['text'][:100]}...")
print(f"Label: {train_dataset[0]['label']} (0=negative, 1=positive)")

# ============================================================
# 2. TOKENIZATION
# ============================================================
print("\n" + "=" * 70)
print("Tokenizing dataset...")

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize_function(examples):
    return tokenizer(
        examples['text'],
        padding='max_length',
        truncation=True,
        max_length=256  # Shorter for the demo
    )

train_dataset = train_dataset.map(tokenize_function, batched=True)
test_dataset = test_dataset.map(tokenize_function, batched=True)

# Format for PyTorch
train_dataset = train_dataset.rename_column("label", "labels")
test_dataset = test_dataset.rename_column("label", "labels")
train_dataset.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
test_dataset.set_format("torch", columns=["input_ids", "attention_mask", "labels"])

print("Tokenization complete!")

# ============================================================
# 3. LOAD MODEL & APPLY LORA
# ============================================================
print("\n" + "=" * 70)
print("Loading model and applying LoRA...")

model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2
)

# Configure LoRA
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                # Rank
    lora_alpha=16,                      # Scaling
    lora_dropout=0.1,
    target_modules=["query", "value"],
    bias="none",
)

# Apply LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# ============================================================
# 4. TRAINING ARGUMENTS
# ============================================================
print("\n" + "=" * 70)
print("Setting up training configuration...")

training_args = TrainingArguments(
    output_dir="./results_lora",
    learning_rate=2e-4,                # Higher LR for LoRA
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=False,
    report_to="none",                  # Disable wandb/tensorboard for the demo
    logging_steps=50,
)

# ============================================================
# 5. EVALUATION METRICS
# ============================================================
def compute_metrics(eval_pred):
    """Compute accuracy, precision, recall, F1"""
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    accuracy = accuracy_score(labels, predictions)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, predictions, average='binary'
    )
    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1': f1
    }

# ============================================================
# 6. TRAINER
# ============================================================
print("Initializing trainer...")

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

# ============================================================
# 7. TRAINING
# ============================================================
print("\n" + "=" * 70)
print("Starting training...\n")

# Evaluate before training
print("BEFORE Fine-tuning:")
eval_results_before = trainer.evaluate()
print(f"Accuracy: {eval_results_before['eval_accuracy']:.4f}")
print(f"F1 Score: {eval_results_before['eval_f1']:.4f}")

# Train
print("\nTraining in progress...")
train_results = trainer.train()

# Evaluate after training
print("\nAFTER Fine-tuning:")
eval_results_after = trainer.evaluate()
print(f"Accuracy: {eval_results_after['eval_accuracy']:.4f}")
print(f"F1 Score: {eval_results_after['eval_f1']:.4f}")

# ============================================================
# 8. IMPROVEMENT SUMMARY
# ============================================================
print("\n" + "=" * 70)
print("IMPROVEMENT SUMMARY:")
print("=" * 70)

accuracy_improvement = (eval_results_after['eval_accuracy'] -
                        eval_results_before['eval_accuracy']) * 100
f1_improvement = (eval_results_after['eval_f1'] -
                  eval_results_before['eval_f1']) * 100

print(f"Accuracy improvement: +{accuracy_improvement:.2f}%")
print(f"F1 Score improvement: +{f1_improvement:.2f}%")
print(f"\nTraining time: {train_results.metrics['train_runtime']:.2f} seconds")
print(f"Samples/second: {train_results.metrics['train_samples_per_second']:.2f}")

# ============================================================
# 9. SAVE MODEL
# ============================================================
print("\n" + "=" * 70)
print("Saving model...")

model.save_pretrained("./lora_sentiment_model")
tokenizer.save_pretrained("./lora_sentiment_model")

print("Model saved to ./lora_sentiment_model")
print("  Size: only the LoRA adapters (a few MB) instead of the full model (~440MB)")
Expected output:
trainable params: 296,448 || all params: 109,779,714 || trainable%: 0.27%
BEFORE Fine-tuning:
Accuracy: 0.5200
F1 Score: 0.5150
AFTER Fine-tuning:
Accuracy: 0.8850
F1 Score: 0.8820
Improvement: +36.5% accuracy, +36.7% F1
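The "trainable params: 296,448" line can be roughly verified by hand: each targeted 768×768 matrix gains \(A \in \mathbb{R}^{8 \times 768}\) and \(B \in \mathbb{R}^{768 \times 8}\), there are two targets (query, value) in each of 12 layers, and the newly initialized classification head is also trained.

d, r, n_layers, n_targets = 768, 8, 12, 2

per_module = r * d + d * r          # A (r x d) + B (d x r) = 12,288 params
lora_total = per_module * n_targets * n_layers
classifier_head = d * 2 + 2         # Linear(768 -> 2) weights + biases

print(f"LoRA adapters   : {lora_total:,}")                   # 294,912
print(f"Classifier head : {classifier_head:,}")               # 1,538
print(f"Total trainable : {lora_total + classifier_head:,}")  # ~296,450

This lands within a couple of parameters of the printed 296,448; the exact figure depends on which head parameters PEFT counts as trainable.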
9.3.4 Loading and Using a LoRA Model
After training, how do we load and use the model?
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
base_model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Load the LoRA adapters
model = PeftModel.from_pretrained(base_model, "./lora_sentiment_model")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("./lora_sentiment_model")

# Inference function
def predict_sentiment(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        outputs = model(**inputs)
    predictions = torch.softmax(outputs.logits, dim=-1)
    sentiment = "Positive" if predictions[0][1] > 0.5 else "Negative"
    confidence = predictions[0][1].item() if predictions[0][1] > 0.5 else predictions[0][0].item()
    return sentiment, confidence

# Test
test_texts = [
    "This movie was absolutely fantastic! I loved every minute of it.",
    "Terrible film. Waste of time and money.",
    "It was okay, nothing special but not terrible either."
]

print("Sentiment Analysis Results:\n")
for text in test_texts:
    sentiment, conf = predict_sentiment(text)
    print(f"Text: {text}")
    print(f"Prediction: {sentiment} (confidence: {conf:.2%})\n")
9.4 QLoRA: Quantized LoRA
9.4.1 The Concept of Quantization
The problem: even with LoRA, the memory requirement for large models remains high because of the base model weights.
The solution: quantization, i.e., representing the weights at lower precision (e.g., 4-bit instead of 16/32-bit).
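A minimal QLoRA loading sketch, assuming the bitsandbytes and peft packages are installed (the model name here is illustrative): the frozen base weights are stored in 4-bit NF4, while the small LoRA adapters train in higher precision.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization config (the QLoRA recipe)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in BF16
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants too
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative; any causal LM works
    quantization_config=bnb_config,
    device_map="auto",
)

# Required preparation step before training on a quantized model
# (see the common pitfalls in Section 9.5.2)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()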
9.5 Best Practices for LoRA Fine-tuning
9.5.1 Hyperparameter Selection
Learning Rate:
LoRA usually needs a higher learning rate than full fine-tuning
Recommended: 1e-4 to 5e-4 (vs 1e-5 to 5e-5 for full fine-tuning)
Reason: only a small set of freshly initialized adapter weights is trained, which tolerates larger steps and is less prone to overfitting
Rank Selection:
import matplotlib.pyplot as plt

# Performance vs rank (hypothetical data for illustration)
ranks = [2, 4, 8, 16, 32, 64]
performance = [82.5, 85.2, 88.1, 89.3, 89.5, 89.6]  # accuracy %
params_mb = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]         # adapter size in MB

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Performance vs rank
ax1.plot(ranks, performance, marker='o', linewidth=2, markersize=8, color='#2196f3')
ax1.axhline(y=89.0, color='green', linestyle='--', label='Target Performance', alpha=0.7)
ax1.axvline(x=8, color='red', linestyle='--', label='Sweet Spot (r=8)', alpha=0.7)
ax1.set_xlabel('LoRA Rank (r)', fontsize=12, fontweight='bold')
ax1.set_ylabel('Test Accuracy (%)', fontsize=12, fontweight='bold')
ax1.set_title('Performance vs LoRA Rank', fontsize=14, fontweight='bold')
ax1.grid(alpha=0.3)
ax1.legend()
ax1.set_ylim([80, 92])

# Adapter size vs rank
ax2.plot(ranks, params_mb, marker='s', linewidth=2, markersize=8, color='#ff9800')
ax2.fill_between(ranks, params_mb, alpha=0.3, color='#ff9800')
ax2.set_xlabel('LoRA Rank (r)', fontsize=12, fontweight='bold')
ax2.set_ylabel('Adapter Size (MB)', fontsize=12, fontweight='bold')
ax2.set_title('Storage Cost vs LoRA Rank', fontsize=14, fontweight='bold')
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("Rank Selection Guidelines:")
print("=" * 60)
print("r=4-8 : good starting point, efficient")
print("r=16  : better performance, reasonable cost")
print("r=32+ : marginal gains, higher cost")
print("\nRecommendation: start with r=8, increase if underperforming")
Target Modules:
The target module names differ across architectures:
# Target modules for different model architectures
target_configs = {
    "BERT": {
        "minimal": ["query", "value"],
        "recommended": ["query", "key", "value"],
        "full": ["query", "key", "value", "dense"]
    },
    "GPT-2": {
        "minimal": ["c_attn"],
        "recommended": ["c_attn", "c_proj"],
        "full": ["c_attn", "c_proj", "c_fc"]
    },
    "LLaMA": {
        "minimal": ["q_proj", "v_proj"],
        "recommended": ["q_proj", "k_proj", "v_proj"],
        "full": ["q_proj", "k_proj", "v_proj", "o_proj",
                 "gate_proj", "up_proj", "down_proj"]
    }
}

print("Target Modules by Architecture:\n")
for model_name, configs in target_configs.items():
    print(f"{'=' * 60}\n{model_name}:")
    for level, modules in configs.items():
        print(f"  {level:12s}: {modules}")
    print()
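If you are unsure which names to pass as target_modules, you can list the model's linear submodules directly; this is the model.named_modules() check referenced in the pitfalls below. A small sketch:

import torch.nn as nn
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")

# Collect the distinct names of all linear layers (LoRA candidates)
linear_names = sorted({
    name.split(".")[-1]
    for name, module in model.named_modules()
    if isinstance(module, nn.Linear)
})
print(linear_names)
# ['dense', 'key', 'query', 'value'] for BERT-style encoders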
9.5.2 Avoiding Common Pitfalls
Common Mistakes
1. Learning rate too small. Problem: using the same LR as full fine-tuning (1e-5). Solution: use a roughly 10x higher LR (1e-4 to 5e-4).
2. Wrong target modules. Problem: applying LoRA to the wrong layers or using the wrong module names. Solution: check the model architecture and verify the names with model.named_modules().
3. Forgetting to prepare the model. Problem: not calling prepare_model_for_kbit_training() for QLoRA. Solution: always call it after loading the quantized model.
4. Insufficient data. Problem: fine-tuning with <100 examples. Solution: aim for 1000+ examples, or use data augmentation.
5. No validation set. Problem: no way to detect overfitting. Solution: always split train/val and monitor validation metrics (see the sketch below).
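For pitfall #5, carving out a validation set is one call in the datasets library; a minimal sketch reusing the IMDB data from earlier:

from datasets import load_dataset

dataset = load_dataset("imdb")

# Hold out 10% of the training data as a validation set
split = dataset["train"].train_test_split(test_size=0.1, seed=42)
train_data = split["train"]
val_data = split["test"]  # used as the validation set during training

print(f"Train: {len(train_data):,} | Validation: {len(val_data):,}")
# Pass val_data as eval_dataset to the Trainer and monitor eval metrics per epoch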
9.5.3 Debugging and Monitoring
Sanity Checks:
def check_lora_model(model):
    """Comprehensive checks for a LoRA model"""
    print("LoRA Model Diagnostic\n")
    print("=" * 70)

    # 1. Trainable parameter count
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print("1. Parameter Count:")
    print(f"   Total: {total_params:,}")
    print(f"   Trainable: {trainable_params:,} ({100 * trainable_params / total_params:.2f}%)")

    # 2. Check that LoRA modules exist
    print("\n2. LoRA Modules:")
    lora_modules = [name for name, module in model.named_modules()
                    if 'lora' in name.lower()]
    print(f"   Found {len(lora_modules)} LoRA modules")
    if len(lora_modules) > 0:
        print(f"   Examples: {lora_modules[:3]}")

    # 3. Device placement
    print("\n3. Device:")
    device = next(model.parameters()).device
    print(f"   Model on: {device}")

    # 4. Data types
    print("\n4. Data Types:")
    dtypes = set(p.dtype for p in model.parameters())
    print(f"   Found dtypes: {dtypes}")

    # 5. Gradient status
    print("\n5. Gradient Status:")
    grad_enabled = sum(1 for p in model.parameters() if p.requires_grad)
    print(f"   Parameters with grad: {grad_enabled}")

    print("\n" + "=" * 70)

    # Warnings
    if trainable_params / total_params > 0.05:
        print("Warning: >5% of parameters are trainable. Sure you want LoRA?")
    if trainable_params / total_params < 0.001:
        print("Warning: <0.1% of parameters are trainable. Rank too low?")

    print("Diagnostic complete!")

# Example usage (once a model exists):
# check_lora_model(model)
9.5.4 Performance Optimization Tips
optimization_tips = """
PERFORMANCE OPTIMIZATION TIPS

1. GRADIENT ACCUMULATION
   - Batch size limited by GPU memory?
   - Use gradient_accumulation_steps to simulate a larger batch
   - Example: batch_size=4, accumulation=8 -> effective_batch=32

2. MIXED PRECISION TRAINING (FP16/BF16)
   - Set fp16=True or bf16=True in TrainingArguments
   - ~2x speedup + ~2x memory reduction
   - BF16 is better for stability (if the GPU supports it)

3. GRADIENT CHECKPOINTING
   - Trade computation for memory
   - Enable: model.gradient_checkpointing_enable()
   - ~30% slower but ~50% less memory

4. DATA LOADING
   - Set num_workers>0 in the DataLoader
   - Use pin_memory=True for faster GPU transfer
   - Preprocess the dataset before training (map with batched=True)

5. COMPILATION (PyTorch 2.0+)
   - model = torch.compile(model)
   - ~20-30% speedup with minimal effort

6. EFFICIENT ATTENTION
   - Use Flash Attention (if available)
   - Set use_flash_attention_2=True when loading the model
   - ~2-3x faster for long sequences
"""
print(optimization_tips)
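Several of these tips are plain TrainingArguments flags. Below is a sketch combining them (the values are illustrative starting points, and bf16 assumes an Ampere-or-newer GPU):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,   # tip 1: effective batch = 4 x 8 = 32
    bf16=True,                       # tip 2: mixed precision (use fp16=True on older GPUs)
    gradient_checkpointing=True,     # tip 3: trade compute for memory
    dataloader_num_workers=4,        # tip 4: parallel data loading
    dataloader_pin_memory=True,      # tip 4: faster host-to-GPU transfer
    learning_rate=2e-4,
    num_train_epochs=3,
)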
9.6 Advanced Topics
9.6.1 Multi-task LoRA
One base model, multiple task-specific LoRA adapters!
from peft import PeftModel

# Scenario: fine-tune for several tasks
# 1. Sentiment analysis
# 2. Question answering
# 3. Named entity recognition

# Train a separate LoRA adapter for each task
# lora_sentiment = train_lora(task="sentiment")
# lora_qa = train_lora(task="qa")
# lora_ner = train_lora(task="ner")

# At inference, load the appropriate adapter
def load_task_model(base_model, task):
    """Load the LoRA adapter for a specific task"""
    adapter_path = f"./lora_adapters/{task}"
    model = PeftModel.from_pretrained(base_model, adapter_path)
    return model

# Benefit:
# - Single base model (~440MB)
# - Multiple adapters (~2MB each)
# - Total storage: 440MB + 3 x 2MB = 446MB
#   vs full fine-tuning: 440MB x 3 = 1.32GB

print("Multi-task LoRA Benefits:")
print("  - Storage efficient: share one base model")
print("  - Memory efficient: load only the adapter you need")
print("  - Easy to experiment: train a new task without affecting the others")
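At inference time, PEFT can also hold several adapters on one base model and switch between them by name. A hedged sketch (the adapter paths are illustrative and assume the adapters were trained and saved beforehand; in practice each task also needs a compatible model head):

from transformers import AutoModelForSequenceClassification
from peft import PeftModel

base_model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Load the first adapter and give it a name
model = PeftModel.from_pretrained(
    base_model, "./lora_adapters/sentiment", adapter_name="sentiment"
)

# Attach an additional adapter to the same base model
model.load_adapter("./lora_adapters/ner", adapter_name="ner")

# Activate whichever task is needed, without reloading the base model
model.set_adapter("sentiment")  # sentiment predictions
model.set_adapter("ner")        # switch to NER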
9.6.2 LoRA Merging
Merge the LoRA weights into the base model for faster inference!
# After training, merge the LoRA adapters into the base model.
# This eliminates the adapter overhead at inference.

# On a PeftModel, merging is a single call:
# merged_model = model.merge_and_unload()
# merged_model.save_pretrained("./merged_model")

# Benefits:
# + No adapter overhead at inference
# + Faster inference (same speed as the base model)
# Tradeoff:
# - Requires full model storage (not just the adapter)

print("LoRA Merging:")
print("  Development: use separate adapters (flexible, small)")
print("  Production : merge for the best inference speed")
9.6.3 AdaLoRA: Adaptive Rank
Automatically determine the optimal rank for each module!
from peft import AdaLoraConfig, get_peft_model

# AdaLoRA configuration
adalora_config = AdaLoraConfig(
    init_r=12,            # initial rank for each adapter
    target_r=4,           # target average rank after pruning
    tinit=200,            # steps before pruning starts
    tfinal=1000,          # steps until the target rank is reached
    deltaT=10,            # steps between rank updates
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query", "value"],
    task_type="SEQ_CLS"
)

# Apply AdaLoRA
# model = get_peft_model(base_model, adalora_config)

# How it works:
# 1. Start with rank init_r=12 for all modules
# 2. During training, compute an importance score for each module
# 3. Gradually reduce the rank of less important modules
# 4. End with an average rank of ~4, while important modules retain a higher rank

print("AdaLoRA Advantages:")
print("  - Automatically optimizes the rank per module")
print("  - Better parameter efficiency")
print("  - Comparable or better performance vs fixed-rank LoRA")
9.7 Evaluation dan Comparison
9.7.1 Comparing Fine-tuning Methods
Let's compare the different approaches and when to use each:
decision_guide = """
FINE-TUNING METHOD SELECTION GUIDE

Do you have a LARGE dataset (>10K samples)?
|
+-- YES -> Do you have a GPU with >16GB VRAM?
|          |
|          +-- YES -> Consider FULL FINE-TUNING
|          |          - Best performance
|          |          - Worth the cost for production
|          |
|          +-- NO  -> Use LoRA (r=16-32)
|                     - Good performance
|                     - Manageable memory
|
+-- NO  -> Small dataset (<10K samples)
           |
           +-- Do you have a GPU with >8GB VRAM?
           |   |
           |   +-- YES -> Use LoRA (r=8-16)
           |   |          - Prevents overfitting
           |   |          - Fast training
           |   |
           |   +-- NO  -> Use QLoRA (4-bit)
           |              - Minimal memory
           |              - Accessible on consumer GPUs
           |
           +-- VERY small dataset (<1K) -> Consider:
               - Data augmentation
               - Few-shot learning
               - Prompt engineering instead of fine-tuning

SPECIAL CASES:
- Multiple tasks?              -> Multi-task LoRA
- Need the fastest inference?  -> Merge LoRA after training
- Extremely limited resources? -> BitFit or Prefix Tuning
- Research/experimentation?    -> LoRA (flexible, fast iteration)
"""
print(decision_guide)
9.8 Best Practices: A Complete Workflow
9.8.1 End-to-End Fine-tuning Pipeline
Let's build a production-ready fine-tuning pipeline with best practices baked in!
import json
from datetime import datetime
from pathlib import Path

import torch
import numpy as np
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
    EarlyStoppingCallback,
)
from peft import LoraConfig, get_peft_model, TaskType
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
import matplotlib.pyplot as plt

# ============================================================
# CONFIGURATION
# ============================================================
class Config:
    """Centralized configuration"""
    # Model
    model_name = "bert-base-uncased"
    task_name = "sentiment_analysis"

    # LoRA
    lora_r = 8
    lora_alpha = 16
    lora_dropout = 0.1
    lora_target_modules = ["query", "value"]

    # Training
    learning_rate = 2e-4
    batch_size = 16
    num_epochs = 3
    warmup_ratio = 0.1
    weight_decay = 0.01

    # Data
    max_length = 128
    train_samples = 1000
    test_samples = 200

    # Paths
    output_dir = f"./models/{task_name}_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

    # Misc
    seed = 42
    fp16 = torch.cuda.is_available()

    def save(self, path):
        """Save the config to JSON (class attributes are not in self.__dict__)"""
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        cfg = {k: v for k, v in vars(Config).items()
               if not k.startswith('_') and not callable(v)}
        with open(path, 'w') as f:
            json.dump(cfg, f, indent=2, default=str)

config = Config()

# ============================================================
# SETUP
# ============================================================
def set_seed(seed):
    """Set random seeds for reproducibility"""
    torch.manual_seed(seed)
    np.random.seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(config.seed)

print("Production Fine-tuning Pipeline")
print("=" * 70)
print(f"Task: {config.task_name}")
print(f"Model: {config.model_name}")
print(f"Output: {config.output_dir}")
print("=" * 70)

# ============================================================
# DATA LOADING & PREPROCESSING
# ============================================================
print("\nLoading and preprocessing data...")

dataset = load_dataset("imdb")
train_data = dataset['train'].shuffle(seed=config.seed).select(range(config.train_samples))
test_data = dataset['test'].shuffle(seed=config.seed).select(range(config.test_samples))

tokenizer = AutoTokenizer.from_pretrained(config.model_name)

def preprocess_function(examples):
    return tokenizer(
        examples['text'],
        padding='max_length',
        truncation=True,
        max_length=config.max_length
    )

train_dataset = train_data.map(preprocess_function, batched=True)
test_dataset = test_data.map(preprocess_function, batched=True)

train_dataset = train_dataset.rename_column("label", "labels")
test_dataset = test_dataset.rename_column("label", "labels")
train_dataset.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
test_dataset.set_format("torch", columns=["input_ids", "attention_mask", "labels"])

print(f"Train samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")

# ============================================================
# MODEL SETUP
# ============================================================
print("\nSetting up the model with LoRA...")

model = AutoModelForSequenceClassification.from_pretrained(
    config.model_name,
    num_labels=2
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=config.lora_r,
    lora_alpha=config.lora_alpha,
    lora_dropout=config.lora_dropout,
    target_modules=config.lora_target_modules,
    bias="none",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# ============================================================
# METRICS
# ============================================================
def compute_metrics(eval_pred):
    """Comprehensive metrics"""
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    accuracy = accuracy_score(labels, predictions)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, predictions, average='binary'
    )
    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1': f1
    }

# ============================================================
# TRAINING SETUP
# ============================================================
print("\nConfiguring training...")

training_args = TrainingArguments(
    output_dir=config.output_dir,
    learning_rate=config.learning_rate,
    per_device_train_batch_size=config.batch_size,
    per_device_eval_batch_size=config.batch_size,
    num_train_epochs=config.num_epochs,
    weight_decay=config.weight_decay,
    warmup_ratio=config.warmup_ratio,
    # Evaluation
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    # Optimization
    fp16=config.fp16,
    gradient_accumulation_steps=2,
    # Logging
    logging_dir=f"{config.output_dir}/logs",
    logging_steps=50,
    report_to="none",
    # Misc
    seed=config.seed,
    push_to_hub=False,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)]
)

# ============================================================
# TRAINING
# ============================================================
print("\nTraining...\n")

# Baseline evaluation
print("Baseline (before fine-tuning):")
baseline_metrics = trainer.evaluate()
for key, value in baseline_metrics.items():
    if key.startswith('eval_'):
        print(f"  {key[5:]}: {value:.4f}")

# Train
train_result = trainer.train()

# Final evaluation
print("\nFinal (after fine-tuning):")
final_metrics = trainer.evaluate()
for key, value in final_metrics.items():
    if key.startswith('eval_'):
        print(f"  {key[5:]}: {value:.4f}")

# ============================================================
# SAVE RESULTS
# ============================================================
print("\nSaving model and results...")

# Save the model
model.save_pretrained(f"{config.output_dir}/lora_adapter")
tokenizer.save_pretrained(f"{config.output_dir}/lora_adapter")

# Save the config
config.save(f"{config.output_dir}/config.json")

# Save the metrics
metrics_history = {
    'baseline': baseline_metrics,
    'final': final_metrics,
    'training': train_result.metrics
}
with open(f"{config.output_dir}/metrics.json", 'w') as f:
    json.dump(metrics_history, f, indent=2)

print(f"Saved to {config.output_dir}")

# ============================================================
# VISUALIZATION
# ============================================================
print("\nGenerating visualizations...")

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Metrics comparison
metrics_names = ['accuracy', 'precision', 'recall', 'f1']
baseline_values = [baseline_metrics[f'eval_{m}'] for m in metrics_names]
final_values = [final_metrics[f'eval_{m}'] for m in metrics_names]

x = np.arange(len(metrics_names))
width = 0.35

axes[0].bar(x - width/2, baseline_values, width, label='Baseline', color='coral', alpha=0.7)
axes[0].bar(x + width/2, final_values, width, label='Fine-tuned', color='green', alpha=0.7)
axes[0].set_ylabel('Score', fontweight='bold')
axes[0].set_title('Performance Comparison', fontweight='bold')
axes[0].set_xticks(x)
axes[0].set_xticklabels([m.capitalize() for m in metrics_names])
axes[0].legend()
axes[0].grid(axis='y', alpha=0.3)
axes[0].set_ylim([0, 1])

# Improvement percentages
improvements = [(f - b) * 100 for b, f in zip(baseline_values, final_values)]
colors = ['green' if i > 0 else 'red' for i in improvements]

axes[1].barh(metrics_names, improvements, color=colors, alpha=0.7)
axes[1].set_xlabel('Improvement (%)', fontweight='bold')
axes[1].set_title('Performance Gains', fontweight='bold')
axes[1].grid(axis='x', alpha=0.3)
axes[1].axvline(x=0, color='black', linestyle='-', linewidth=0.8)

for i, (metric, imp) in enumerate(zip(metrics_names, improvements)):
    axes[1].text(imp + 1, i, f'+{imp:.1f}%' if imp > 0 else f'{imp:.1f}%',
                 va='center', fontweight='bold')

plt.tight_layout()
plt.savefig(f"{config.output_dir}/results.png", dpi=150, bbox_inches='tight')
plt.show()

print(f"Saved visualization to {config.output_dir}/results.png")

# ============================================================
# SUMMARY
# ============================================================
print("\n" + "=" * 70)
print("FINE-TUNING COMPLETE!")
print("=" * 70)
print(f"\nOutput Directory: {config.output_dir}")
print("  |-- lora_adapter/  (LoRA weights)")
print("  |-- config.json    (training configuration)")
print("  |-- metrics.json   (performance metrics)")
print("  |-- results.png    (visualizations)")
print("\nKey Results:")
print(f"  Accuracy: {baseline_metrics['eval_accuracy']:.3f} -> {final_metrics['eval_accuracy']:.3f} "
      f"(+{(final_metrics['eval_accuracy'] - baseline_metrics['eval_accuracy']) * 100:.1f}%)")
print(f"  F1 Score: {baseline_metrics['eval_f1']:.3f} -> {final_metrics['eval_f1']:.3f} "
      f"(+{(final_metrics['eval_f1'] - baseline_metrics['eval_f1']) * 100:.1f}%)")
print(f"\nTraining Time: {train_result.metrics['train_runtime']:.2f} seconds")
print("Model Size: only the LoRA adapters (a few MB)")
print("\nReady for deployment!")
Hands-on Exercise
Objective: implement and compare LoRA fine-tuning with different ranks
Instructions:
Set up the environment
pip install transformers datasets peft torch
Task: fine-tune BERT for sentiment analysis (IMDB dataset)
Experiments: train with different LoRA ranks
Experiment 1: r=4
Experiment 2: r=8
Experiment 3: r=16
Compare:
Training time
Model size
Test accuracy
F1 score
Analysis:
Plot performance vs rank
Determine the optimal rank for this task
Explain the tradeoffs
Deliverables:
A Jupyter notebook with the complete code
Performance comparison table
Visualizations
Written analysis (200-300 words)
Bonus:
Try QLoRA (4-bit quantization)
Implement early stopping
Try different target modules
Review Questions
Conceptual Questions
Explain the fundamental difference between pre-training and fine-tuning in the context of LLMs. Why is fine-tuning more practical than training from scratch?
What does "low-rank" mean in LoRA? Explain the mathematical intuition behind why weight updates have low intrinsic rank.
Compare full fine-tuning vs LoRA. In which situations would you choose each approach?
Explain the concept of quantization in QLoRA. How does 4-bit quantization make it possible to fine-tune large models on GPUs with limited memory?
What is "catastrophic forgetting"? How do PEFT methods such as LoRA help mitigate this problem?
Practical Questions
If BERT-Base has 110M parameters and you apply LoRA with r=8 to the query and value matrices in all 12 layers, how many trainable parameters are there in total?
You have a dataset of 500 labeled samples for text classification. Which fine-tuning method would you recommend, and why?
Explain the lora_alpha parameter in the LoRA config. How would you set its value relative to the rank?
You observe that the model overfits the training set (train acc = 95%, test acc = 65%). What actions could you take in the context of LoRA fine-tuning?
For production deployment, is it better to save the LoRA adapters separately or merge them into the base model? Explain the tradeoffs.
Key Takeaways
Transfer learning lets us leverage pre-trained LLMs without training from scratch
LoRA is a parameter-efficient method that achieves ~99% of full fine-tuning performance with <1% trainable parameters
QLoRA combines quantization with LoRA to fine-tune large models on consumer hardware
The rank (r) is the critical hyperparameter: it balances expressiveness against efficiency
Multi-task LoRA shares one base model across different tasks with separate adapters
Best practices: start with r=8, use a higher LR (~1e-4), and monitor validation metrics
References and Further Reading
Papers
LoRA: Hu et al. (2021). "LoRA: Low-Rank Adaptation of Large Language Models". arXiv:2106.09685
QLoRA: Dettmers et al. (2023). "QLoRA: Efficient Finetuning of Quantized LLMs". arXiv:2305.14314
AdaLoRA: Zhang et al. (2023). "Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning". arXiv:2303.10512
Prefix Tuning: Li & Liang (2021). "Prefix-Tuning: Optimizing Continuous Prompts for Generation". arXiv:2101.00190
Libraries & Tools
Hugging Face PEFT: https://github.com/huggingface/peft
Hugging Face Transformers: https://huggingface.co/docs/transformers
Hugging Face PEFT Documentation: https://huggingface.co/docs/peft
LoRA Tutorial: https://huggingface.co/blog/lora
QLoRA Blog Post: https://huggingface.co/blog/4bit-transformers-bitsandbytes
Conclusion
You now have a solid understanding of fine-tuning Large Language Models:
Fundamental concepts: pre-training vs fine-tuning
Parameter-Efficient Fine-Tuning (PEFT) methods
LoRA and QLoRA implementation
Best practices for production deployment
Evaluation and comparison frameworks
In Chapter 10 (RAG & AI Agents), we will learn how to use fine-tuned LLMs in more complex systems: Retrieval-Augmented Generation and autonomous agents that can interact with external tools and knowledge bases!