RAG (Retrieval-Augmented Generation) is a technique that combines retrieval (searching for information in a knowledge base) with generation (LLM text generation) to produce answers that are more accurate, factual, and verifiable.
RAG Architecture:
```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#4A90E2', 'primaryTextColor': '#333', 'primaryBorderColor': '#2E5C8A', 'lineColor': '#2E5C8A', 'secondaryColor': '#50C878', 'tertiaryColor': '#FFD700'}}}%%
graph TB
    A[User Query] --> B[Query Embedding]
    B --> C[Vector Search]
    D[Document Corpus] --> E[Text Chunking]
    E --> F[Embedding Generation]
    F --> G[(Vector Database)]
    C --> G
    G --> H[Retrieve Top-K<br/>Relevant Chunks]
    H --> I[Construct Prompt]
    A --> I
    I --> J[LLM Generation]
    J --> K[Response with<br/>Source Citations]
    style A fill:#4A90E2,stroke:#2E5C8A,stroke-width:2px,color:#fff
    style K fill:#50C878,stroke:#2E8B57,stroke-width:2px,color:#fff
    style G fill:#FFD700,stroke:#FFA500,stroke-width:2px,color:#333
    style J fill:#FF6B6B,stroke:#C92A2A,stroke-width:2px,color:#fff
```
1. Customer Support 💬:
Example: “How do I reset my password?” → Retrieve from internal docs
2. Legal Document Analysis ⚖️:
Corpus: Case law, regulations, contracts
Example: “Find precedents for patent infringement”
3. Medical Q&A 🏥:
Database: Medical literature, clinical guidelines
Example: “Treatment options for Type 2 diabetes”
4. Code Documentation 💻:
Codebase + docs retrieval
Example: “How to authenticate API requests in our system?”
5. Academic Research 📚:
Literature search + summarization
Example: “Recent advances in quantum computing 2024”
10.2 Embeddings & Vector Representations
10.2.1 What are Embeddings?
Definition:
An embedding is a vector representation (an array of numbers) of text, images, or other data in a high-dimensional space, where semantic similarity is reflected as geometric proximity.
```python
from sentence_transformers import SentenceTransformer
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Load pre-trained model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Example sentences
sentences = [
    "Machine learning adalah subset dari artificial intelligence",
    "Deep learning menggunakan neural networks dengan banyak layer",
    "Python adalah bahasa pemrograman populer untuk data science",
    "Jakarta adalah ibu kota Indonesia"
]

# Generate embeddings
embeddings = model.encode(sentences)
print(f"Embedding shape: {embeddings.shape}")  # (4, 384)
print(f"First embedding (truncated):\n{embeddings[0][:10]}")

# Compute similarity
similarity_matrix = cosine_similarity(embeddings)
print("\nSimilarity Matrix:")
print(similarity_matrix)
```
Output:
```
Embedding shape: (4, 384)
First embedding (truncated):
[ 0.0234 -0.1234  0.5678 ... ]

Similarity Matrix:
[[1.000 0.812 0.456 0.123]   # Sent 1 vs all
 [0.812 1.000 0.489 0.098]   # Sent 2 vs all
 [0.456 0.489 1.000 0.156]   # Sent 3 vs all
 [0.123 0.098 0.156 1.000]]  # Sent 4 vs all
```
Interpretation:
Sentences 1 and 2 (ML/DL) are highly similar (0.812)
Sentence 4 (Jakarta) is the least similar to all the others (low similarity)
OpenAI text-embedding-ada-002 (1536-dim): best quality, but incurs API cost
Normalization:
Always normalize vectors for cosine similarity
Use sklearn.preprocessing.normalize() or a manual L2 norm
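As a quick check of why normalization matters: after L2 normalization, cosine similarity reduces to a plain dot product. A minimal pure-Python sketch (the helper names are illustrative, not library functions):

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (L2 norm = 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])

# For unit vectors, the dot product equals cosine similarity
print(round(dot(a, b), 2))  # 0.96
print(round(dot(a, a), 2))  # 1.0 (a vector is maximally similar to itself)
```

This is why FAISS examples below call `faiss.normalize_L2` before indexing: with unit vectors, fast inner-product (or L2) search ranks documents the same way cosine similarity would.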
10.3 Vector Databases
10.3.1 Why Do We Need a Vector Database?
Problem Statement:
Imagine you have 1 million documents. For every query you must:
Compute cosine similarity against all 1M vectors
Sort to find the top-K
Time complexity: O(N × D) for scoring, where N = number of documents and D = embedding dimension
Result: 🐌 VERY SLOW!
Solution: a vector database with ANN (Approximate Nearest Neighbor) search
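The brute-force baseline that vector databases replace can be sketched in a few lines; every query touches every stored vector, which is exactly the O(N × D) cost described above (toy 2-D vectors for illustration):

```python
import math
import heapq

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    num = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return num / (na * nb)

def brute_force_top_k(query, vectors, k):
    # O(N x D) scoring: compare the query against every stored vector,
    # then keep the k best (score, index) pairs
    scored = [(cosine(query, v), i) for i, v in enumerate(vectors)]
    return heapq.nlargest(k, scored)

vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
top2 = brute_force_top_k([1.0, 0.0], vectors, 2)
print(top2)  # best-first (score, index) pairs
```

ANN indexes (IVF, HNSW, etc.) avoid this full scan by only visiting a small, probably-relevant subset of vectors, trading a little recall for large speedups.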
10.3.2 Vector Database Options
| Database | Type | Best For | Pros | Cons |
|---|---|---|---|---|
| FAISS | Library | Research, prototyping | Fast, free, Facebook-backed | Not distributed |
| ChromaDB | Embedded | Small-medium apps | Easy, Python-native | Limited scale |
| Pinecone | Cloud | Production apps | Managed, scalable | Cost, vendor lock-in |
| Weaviate | Self-hosted | Enterprise | Open source, GraphQL | Complex setup |
| Milvus | Self-hosted | Large scale | Distributed, fast | Requires infrastructure |
| Qdrant | Self-hosted | Modern apps | Rust-based, fast | Newer, smaller community |
10.3.3 Implementation with FAISS
Installation:
```
pip install faiss-cpu  # or faiss-gpu for GPU support
```
Example: Building a simple vector search:
```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Sample documents
documents = [
    "Machine learning adalah cabang dari AI yang fokus pada pembelajaran dari data",
    "Deep learning menggunakan neural networks dengan banyak hidden layers",
    "Natural language processing memproses dan memahami bahasa manusia",
    "Computer vision memungkinkan komputer memahami gambar dan video",
    "Reinforcement learning belajar melalui trial and error dengan rewards",
    "Transfer learning memanfaatkan model pre-trained untuk task baru",
    "Ensemble methods menggabungkan multiple models untuk hasil lebih baik",
    "Python adalah bahasa pemrograman populer untuk data science",
]

# Generate embeddings
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embeddings = model.encode(documents)

# Get embedding dimension
d = embeddings.shape[1]  # 384 for all-MiniLM-L6-v2
print(f"Embedding dimension: {d}")

# Create FAISS index
# IndexFlatL2: exact search with L2 distance
index = faiss.IndexFlatL2(d)

# Normalize vectors so L2 distance maps to cosine similarity
faiss.normalize_L2(embeddings)

# Add vectors to index
index.add(embeddings.astype('float32'))
print(f"Total vectors in index: {index.ntotal}")

# Query
query = "Apa itu neural networks?"
query_embedding = model.encode([query])
faiss.normalize_L2(query_embedding)

# Search for top-3 most similar documents
k = 3
distances, indices = index.search(query_embedding.astype('float32'), k)

print(f"\nQuery: '{query}'")
print("\nTop-3 retrieval results:")
for i, (idx, dist) in enumerate(zip(indices[0], distances[0])):
    # For unit vectors, squared L2 distance = 2 - 2*cos, so cos = 1 - dist/2
    similarity = 1 - (dist / 2)
    print(f"{i+1}. [Score: {similarity:.3f}] {documents[idx]}")
```
Output:
```
Embedding dimension: 384
Total vectors in index: 8

Query: 'Apa itu neural networks?'

Top-3 retrieval results:
1. [Score: 0.892] Deep learning menggunakan neural networks dengan banyak hidden layers
2. [Score: 0.734] Machine learning adalah cabang dari AI yang fokus pada pembelajaran dari data
3. [Score: 0.698] Transfer learning memanfaatkan model pre-trained untuk task baru
```
10.3.4 Implementation with ChromaDB
ChromaDB is more user-friendly and persists to disk:

```python
import chromadb

# Create a persistent client (data is stored in ./chroma_db)
# Note: older ChromaDB versions used chromadb.Client(Settings(
#     chroma_db_impl="duckdb+parquet", persist_directory=...)) instead
client = chromadb.PersistentClient(path="./chroma_db")

# Create collection
collection = client.create_collection(
    name="ml_documents",
    metadata={"description": "Machine learning knowledge base"}
)

# Add documents
documents = [
    "Machine learning adalah cabang dari AI yang fokus pada pembelajaran dari data",
    "Deep learning menggunakan neural networks dengan banyak hidden layers",
    "Natural language processing memproses dan memahami bahasa manusia",
]

collection.add(
    documents=documents,
    ids=["doc1", "doc2", "doc3"],
    metadatas=[
        {"category": "ML", "difficulty": "beginner"},
        {"category": "DL", "difficulty": "intermediate"},
        {"category": "NLP", "difficulty": "intermediate"}
    ]
)

# Query
results = collection.query(
    query_texts=["Apa itu neural networks?"],
    n_results=2,
    where={"difficulty": "intermediate"}  # Optional metadata filter
)

print("Retrieval results:")
for doc, dist, meta in zip(results['documents'][0],
                           results['distances'][0],
                           results['metadatas'][0]):
    print(f"[Distance: {dist:.3f}] {doc}")
    print(f"  Metadata: {meta}\n")
```
💡 Best Practices
Choose the right index type:
IndexFlatL2: Exact search, small datasets (<100K)
IndexIVFFlat: Approximate, medium datasets (100K-1M)
IndexHNSW: Fast approximate, large datasets (>1M)
Batch processing:
Add vectors in batches (e.g., 1000 at a time)
Faster than one-by-one
Persistence:
FAISS: save/load with faiss.write_index() and faiss.read_index()
ChromaDB: automatically persisted to disk
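The batching advice above can be sketched with a small helper (the `batched` helper here is illustrative, not part of FAISS or ChromaDB):

```python
def batched(items, batch_size=1000):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# e.g. call index.add(batch) once per batch instead of per vector
vectors = list(range(2500))
sizes = [len(b) for b in batched(vectors, 1000)]
print(sizes)  # [1000, 1000, 500]
```

Each batch amortizes the per-call overhead of the index's add operation, which is why batched inserts are much faster than one-by-one inserts.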
10.4 Building RAG Systems
10.4.1 Text Chunking Strategies
Why Chunking?
Documents are too long to embed all at once
Token limits (e.g., 512 tokens for BERT)
Better retrieval granularity
Chunking Methods:
```python
from transformers import AutoTokenizer
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Method 1: Fixed-size chunking
def chunk_by_tokens(text, chunk_size=512, overlap=50):
    """
    Split text into fixed-size chunks with overlap.

    Args:
        text: Input text
        chunk_size: Number of tokens per chunk
        overlap: Number of overlapping tokens between chunks

    Returns:
        List of text chunks
    """
    tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
    tokens = tokenizer.encode(text, add_special_tokens=False)

    chunks = []
    start = 0
    while start < len(tokens):
        end = start + chunk_size
        chunk_tokens = tokens[start:end]
        chunk_text = tokenizer.decode(chunk_tokens)
        chunks.append(chunk_text)
        start += (chunk_size - overlap)
    return chunks

# Method 2: Semantic chunking (by paragraph/section)
def chunk_by_paragraph(text):
    """Split by paragraph boundaries."""
    paragraphs = text.split('\n\n')
    return [p.strip() for p in paragraphs if p.strip()]

# Method 3: Recursive character splitting (LangChain style)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    length_function=len,
    separators=["\n\n", "\n", ". ", " ", ""]  # Hierarchy of split points
)

sample_text = """Machine learning adalah bidang yang berkembang pesat.

Dalam beberapa tahun terakhir, deep learning telah merevolusi berbagai domain.

Computer vision kini dapat mengenali objek dengan akurasi superhuman.

Natural language processing memungkinkan chatbot yang sangat natural.

Ke depannya, AI akan semakin terintegrasi dalam kehidupan sehari-hari."""

chunks = text_splitter.split_text(sample_text)
print(f"Number of chunks: {len(chunks)}")
for i, chunk in enumerate(chunks, 1):
    print(f"\nChunk {i}:")
    print(chunk)
```
Comparison:

| Method | Pros | Cons | Best For |
|---|---|---|---|
| Fixed tokens | Consistent size, fast | Might split mid-sentence | Technical docs |
| Paragraph | Semantic coherence | Variable size | Articles, books |
| Recursive | Best of both worlds | More complex | General purpose |
10.4.2 Complete RAG Pipeline with LangChain
LangChain is a popular framework for building LLM applications.
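Whatever framework is used, the pipeline follows the flow in the architecture diagram: retrieve the top-k chunks, build a prompt around them, and generate. A framework-free sketch of that flow — the bag-of-words `embed()` and the `echo_llm` stub below are toy stand-ins for a real embedding model and LLM, not production components:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" (a real system would call a model)
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return num / (na * nb) if na and nb else 0.0

def rag_answer(query, docs, generate_fn, k=2):
    """Retrieve top-k docs, stuff them into a prompt, call the generator."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    context = "\n".join(ranked[:k])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate_fn(prompt)

docs = [
    "Deep learning uses neural networks with many layers",
    "Jakarta is the capital of Indonesia",
    "Transfer learning reuses pre-trained models",
]
# Stand-in for an LLM call: just echoes the first context line
echo_llm = lambda prompt: prompt.splitlines()[1]
print(rag_answer("what are neural networks", docs, echo_llm, k=1))
```

In LangChain the same steps are typically handled by a vector-store retriever plus a QA chain; the point here is only that every RAG system reduces to retrieve → prompt → generate.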
Advantages of hybrid search (combining vector retrieval with keyword search such as BM25):
Better recall (more relevant documents are found)
Handles both semantic and exact matches
More robust across diverse queries
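Hybrid search blends a semantic (vector) score with an exact-match (keyword) score. A toy sketch of such a blend — the weight `alpha` and the term-overlap keyword score are illustrative choices, not a standard formula like BM25:

```python
def keyword_score(query, doc):
    """Fraction of query terms that appear verbatim in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

def hybrid_score(vector_sim, kw_score, alpha=0.7):
    # Weighted blend of the semantic and exact-match signals
    return alpha * vector_sim + (1 - alpha) * kw_score

docs = [
    "deep learning uses neural networks",
    "python is a popular programming language",
]
vector_sims = [0.82, 0.31]  # hypothetical scores from a vector index
query = "what are neural networks"

scores = [hybrid_score(v, keyword_score(query, d))
          for v, d in zip(vector_sims, docs)]
print([round(s, 3) for s in scores])  # first doc wins on both signals
```

A document that matches semantically but shares no exact terms still scores through the vector term, while a rare exact keyword (a product code, an error string) is rewarded by the keyword term — which is why hybrid retrieval is more robust across query types.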
10.5 AI Agents: From RAG to Autonomous Systems
10.5.1 What are AI Agents?
Definition:
An AI agent is a system that can use tools, memory, and reasoning to autonomously complete complex tasks.
RAG vs. Agents:
| Aspect | RAG System | AI Agent |
|---|---|---|
| Capability | Retrieve & answer | Reason, plan, act |
| Tools | None (just retrieval) | Can use tools (calculator, API, etc.) |
| Memory | Stateless | Can maintain memory |
| Autonomy | Single-step | Multi-step planning |
| Example | Q&A chatbot | Personal assistant |
Agent Architecture:
```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#4A90E2', 'primaryTextColor': '#333', 'primaryBorderColor': '#2E5C8A', 'lineColor': '#2E5C8A', 'secondaryColor': '#50C878', 'tertiaryColor': '#FFD700'}}}%%
graph TB
    A[User Task] --> B[Agent Core<br/>LLM Reasoning]
    B --> C{Decision}
    C -->|Need Info| D[Retrieval Tool<br/>RAG/Search]
    C -->|Need Calculation| E[Calculator Tool]
    C -->|Need API Data| F[API Tool]
    C -->|Need Memory| G[Memory Store]
    D --> H[Observation]
    E --> H
    F --> H
    G --> H
    H --> B
    C -->|Task Complete| I[Final Answer]
    style A fill:#4A90E2,stroke:#2E5C8A,stroke-width:2px,color:#fff
    style B fill:#FF6B6B,stroke:#C92A2A,stroke-width:2px,color:#fff
    style I fill:#50C878,stroke:#2E8B57,stroke-width:2px,color:#fff
    style C fill:#FFD700,stroke:#FFA500,stroke-width:2px,color:#333
```
10.5.2 ReAct Pattern: Reasoning + Acting
ReAct Framework (Yao et al., 2022):
Reason: Think about what to do
Act: Execute an action
Observe: See the result
Repeat until task solved
Example Task: “What’s the weather in Jakarta and should I bring an umbrella?”
Agent Trace:
```
Thought: I need to get current weather data for Jakarta
Action: weather_api
Action Input: {"city": "Jakarta", "country": "ID"}
Observation: {"temperature": 28, "condition": "rainy", "humidity": 85}
Thought: It's rainy with high humidity. User should bring umbrella.
Action: Final Answer
Action Input: "The current weather in Jakarta is rainy with 28°C and 85% humidity. Yes, you should definitely bring an umbrella!"
```
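The Thought → Action → Observation loop can be reduced to a small generic driver. In the sketch below, the `tools` dict and the hard-coded `policy` function are toy stand-ins for an LLM's reasoning and for real tool calls:

```python
def run_react(task, tools, policy, max_steps=5):
    """Minimal ReAct loop: the policy picks an action, a tool executes it,
    and the observation is fed back until 'final' is chosen."""
    trace = []
    observation = None
    for _ in range(max_steps):
        thought, action, action_input = policy(task, observation)
        trace.append(("Thought", thought))
        if action == "final":
            trace.append(("Final Answer", action_input))
            return action_input, trace
        observation = tools[action](action_input)
        trace.append(("Observation", observation))
    return None, trace  # gave up after max_steps (runaway-loop guard)

# Fake weather tool standing in for a real API call
tools = {"weather_api": lambda city: {"condition": "rainy", "temp_c": 28}}

def policy(task, observation):
    # Hard-coded two-step policy, for illustration only
    if observation is None:
        return ("Need weather data", "weather_api", "Jakarta")
    return ("Weather known, answer", "final",
            "It is rainy (28 C) - bring an umbrella.")

answer, trace = run_react("Should I bring an umbrella in Jakarta?", tools, policy)
print(answer)
```

Frameworks like LangChain implement exactly this loop, with the LLM playing the role of `policy` and `max_steps` exposed as `max_iterations`.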
10.5.3 Building an Agent with LangChain
```python
from langchain.agents import Tool, AgentExecutor, create_react_agent
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMMathChain
from langchain.utilities import PythonREPL

# Define tools
llm = OpenAI(temperature=0)

# Tool 1: Calculator
llm_math = LLMMathChain.from_llm(llm)
calculator = Tool(
    name="Calculator",
    func=llm_math.run,
    description="Useful for mathematical calculations. Input must be a math expression."
)

# Tool 2: RAG search (qa_chain from the previous section)
def rag_search(query: str) -> str:
    """Search knowledge base."""
    result = qa_chain({"query": query})
    return result['result']

knowledge_base = Tool(
    name="KnowledgeBase",
    func=rag_search,
    description="Useful for questions about machine learning concepts. Input must be a complete question."
)

# Tool 3: Python REPL (optional, for code execution)
python_repl = PythonREPL()
python_tool = Tool(
    name="PythonREPL",
    func=python_repl.run,
    description="Useful for executing Python code. Input must be valid Python code."
)

# Combine tools
tools = [calculator, knowledge_base, python_tool]

# Create agent prompt
template = """You are a helpful AI assistant. Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought: {agent_scratchpad}"""

prompt = PromptTemplate.from_template(template)

# Create agent
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,           # Print reasoning steps
    max_iterations=5,
    handle_parsing_errors=True
)

# Example queries
queries = [
    "Berapa hasil dari 25 * 47 + 138?",
    "Apa itu gradient descent? Lalu hitung derivatif dari x^2 + 3x + 5 di x=2",
    "Generate list 10 angka Fibonacci menggunakan Python, lalu hitung mean-nya"
]

for query in queries:
    print(f"\n{'='*60}")
    print(f"Query: {query}")
    print(f"{'='*60}")
    result = agent_executor.invoke({"input": query})
    print(f"\nFinal Answer: {result['output']}")
```
Output Example:
```
============================================================
Query: Berapa hasil dari 25 * 47 + 138?
============================================================

> Entering new AgentExecutor chain...

Thought: I need to perform a mathematical calculation
Action: Calculator
Action Input: 25 * 47 + 138
Observation: 1313
Thought: I now know the final answer
Final Answer: 1313

> Finished chain.

Final Answer: 1313
```
10.5.4 Agent Memory: Conversation History
Types of Memory:
ConversationBufferMemory: stores the entire conversation
ConversationSummaryMemory: summarizes old messages
ConversationBufferWindowMemory: keeps the last N messages
VectorStoreRetrieverMemory: semantic search over the history
Implementation:
```python
from langchain.memory import ConversationBufferMemory
from langchain.agents import initialize_agent, AgentType

# Create memory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

# Initialize agent with memory
agent_with_memory = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    memory=memory,
    verbose=True
)

# Multi-turn conversation
print(agent_with_memory.run("Halo, nama saya Budi"))
# Output: "Hello Budi! How can I help you today?"

print(agent_with_memory.run("Apa itu neural network?"))
# Output: *retrieves from knowledge base*

print(agent_with_memory.run("Siapa nama saya tadi?"))
# Output: "Your name is Budi" (remembered from the conversation)
```
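The windowed variant (ConversationBufferWindowMemory) can be mimicked in a few lines of plain Python: keep only the last k turns so the prompt never grows without bound. The class below is an illustrative sketch, not LangChain's implementation:

```python
from collections import deque

class BufferWindowMemory:
    """Keep only the last k exchanges, like ConversationBufferWindowMemory."""

    def __init__(self, k=3):
        self.turns = deque(maxlen=k)  # deque drops the oldest turn itself

    def save(self, user_msg, ai_msg):
        self.turns.append((user_msg, ai_msg))

    def as_prompt(self):
        return "\n".join(f"User: {u}\nAI: {a}" for u, a in self.turns)

mem = BufferWindowMemory(k=2)
mem.save("Hi, my name is Budi", "Hello Budi!")
mem.save("What is a neural network?", "A model inspired by the brain.")
mem.save("What was my name?", "Your name is Budi.")
print(mem.as_prompt())  # only the last 2 turns survive
```

The trade-off is visible here: the window keeps the prompt small, but facts stated before the window (the original introduction) are lost unless they resurface in a later turn, which is why summary or vector-store memories exist.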
💡 Agent Best Practices:
Limit tools: too many tools confuse the agent (max 5-7)
Validation: always validate tool outputs before using them
Error handling: gracefully handle tool failures
Cost control: set max_iterations to avoid runaway loops
10.6 Evaluating RAG Systems
10.6.1 Evaluation Metrics
Challenge: there is no “ground truth” for generative tasks!
Solution: evaluate along multiple dimensions
1. Retrieval Quality:
```python
# Precision@K: what fraction of the retrieved docs are relevant?
def precision_at_k(retrieved_docs, relevant_docs, k):
    """
    Calculate Precision@K.

    Args:
        retrieved_docs: List of retrieved doc IDs
        relevant_docs: Set of truly relevant doc IDs
        k: Number of top results to consider

    Returns:
        Precision score [0, 1]
    """
    top_k = retrieved_docs[:k]
    relevant_in_top_k = [doc for doc in top_k if doc in relevant_docs]
    return len(relevant_in_top_k) / k

# Recall@K: what fraction of the relevant docs were retrieved?
def recall_at_k(retrieved_docs, relevant_docs, k):
    """Calculate Recall@K."""
    top_k = retrieved_docs[:k]
    relevant_in_top_k = [doc for doc in top_k if doc in relevant_docs]
    return len(relevant_in_top_k) / len(relevant_docs) if relevant_docs else 0

# Mean Reciprocal Rank (MRR)
def mrr(retrieved_docs, relevant_docs):
    """
    Calculate MRR.
    First relevant doc at position 1 -> score = 1
    First relevant doc at position 2 -> score = 0.5
    """
    for i, doc in enumerate(retrieved_docs, 1):
        if doc in relevant_docs:
            return 1 / i
    return 0

# Example
retrieved = ['doc3', 'doc1', 'doc7', 'doc2', 'doc5']
relevant = {'doc1', 'doc2', 'doc4'}

# Only doc1 appears in the top-3 ['doc3', 'doc1', 'doc7']
print(f"Precision@3: {precision_at_k(retrieved, relevant, 3):.2f}")  # 1/3 = 0.33
print(f"Recall@3: {recall_at_k(retrieved, relevant, 3):.2f}")        # 1/3 = 0.33
print(f"MRR: {mrr(retrieved, relevant):.2f}")                        # 1/2 = 0.50
2. Generation Quality:
```python
# BLEU, ROUGE (traditional metrics)
from nltk.translate.bleu_score import sentence_bleu
from rouge import Rouge

reference = "Machine learning is a subset of artificial intelligence"
generated = "Machine learning is part of AI"

# BLEU (precision-focused)
bleu = sentence_bleu([reference.split()], generated.split())
print(f"BLEU: {bleu:.2f}")

# ROUGE (recall-focused)
rouge = Rouge()
scores = rouge.get_scores(generated, reference)
print(f"ROUGE-L F1: {scores[0]['rouge-l']['f']:.2f}")
```
3. Faithfulness (Groundedness):
Is the generated answer grounded in the retrieved context?

```python
# Using an LLM as judge
def evaluate_faithfulness(context, answer, llm):
    """
    Check if the answer is supported by the context.

    Returns:
        score [0-5], reasoning
    """
    prompt = f"""Given the following context and answer, rate how well the answer is supported by the context on a scale of 0-5:

Context: {context}

Answer: {answer}

Rating (0=not supported at all, 5=fully supported):
Reasoning: """
    response = llm(prompt)
    return response

# Example
context = "Python was created by Guido van Rossum in 1991."
answer = "Python was developed in the early 1990s"
score = evaluate_faithfulness(context, answer, llm)
print(score)
```
4. Relevance:
Is the answer relevant to the question?

```python
def evaluate_relevance(question, answer, llm):
    """Rate answer relevance to the question."""
    prompt = f"""Rate how relevant the answer is to the question (0-5):

Question: {question}

Answer: {answer}

Rating:
Reasoning: """
    return llm(prompt)
```
10.6.2 End-to-End Evaluation Framework
```python
import pandas as pd

class RAGEvaluator:
    """Comprehensive RAG evaluation."""

    def __init__(self, rag_chain, llm_judge):
        self.rag_chain = rag_chain
        self.llm_judge = llm_judge

    def evaluate(self, test_cases):
        """
        Evaluate the RAG system on test cases.

        Args:
            test_cases: List of dicts with keys:
                - 'question': str
                - 'expected_answer': str (optional)
                - 'relevant_docs': set (optional)

        Returns:
            Evaluation results dataframe
        """
        results = []
        for case in test_cases:
            # Run RAG
            output = self.rag_chain({"query": case['question']})
            answer = output['result']
            retrieved_docs = [doc.metadata['id'] for doc in output['source_documents']]

            # Evaluate retrieval
            if 'relevant_docs' in case:
                precision = precision_at_k(retrieved_docs, case['relevant_docs'], 3)
                recall = recall_at_k(retrieved_docs, case['relevant_docs'], 3)
            else:
                precision, recall = None, None

            # Evaluate generation
            context = "\n".join([doc.page_content for doc in output['source_documents']])
            faithfulness = self.evaluate_faithfulness(context, answer)
            relevance = self.evaluate_relevance(case['question'], answer)

            results.append({
                'question': case['question'],
                'answer': answer,
                'precision@3': precision,
                'recall@3': recall,
                'faithfulness': faithfulness,
                'relevance': relevance
            })
        return pd.DataFrame(results)

    def evaluate_faithfulness(self, context, answer):
        """LLM-based faithfulness check."""
        # Implementation similar to above
        pass

    def evaluate_relevance(self, question, answer):
        """LLM-based relevance check."""
        # Implementation similar to above
        pass

# Usage
test_cases = [
    {
        'question': 'Apa itu backpropagation?',
        'relevant_docs': {'doc12', 'doc34'}
    },
    {
        'question': 'Perbedaan CNN dan RNN?',
        'relevant_docs': {'doc45', 'doc67', 'doc89'}
    }
]

evaluator = RAGEvaluator(qa_chain, llm)
results = evaluator.evaluate(test_cases)
print(results)
```
Discussion Questions:
1. Explain the fundamental difference between a pure LLM and a RAG system. When would you use each?
2. Why is cosine similarity more popular than Euclidean distance for semantic search? Give an example case where the two yield different results.
3. What are the trade-offs between the following chunking strategies:
Fixed-size tokens (512 tokens)
Paragraph-based
Recursive character splitting
When is each appropriate?
4. Compare and contrast:
A RAG system vs. a plain retrieval database
An AI agent vs. a traditional chatbot
A vector database (FAISS) vs. a relational database (PostgreSQL)
5. In RAG evaluation, why do we need multiple metrics (Precision, Recall, Faithfulness, Relevance)? Give an example case where one metric is high while another is low.
Practicum Assignments:
Assignment 1: Build a Simple RAG System
Input: A collection of 20+ text documents (Wikipedia articles, technical docs, etc.)
Tasks:
1. Implement a chunking strategy of your choice
2. Generate embeddings with Sentence Transformers
3. Build a FAISS index
4. Implement a retrieval function
5. Test with 5 sample queries
Output:
Python script with the complete code
Report showing query results with the retrieved chunks
Rubric:
Chunking implementation (20%)
FAISS integration (20%)
Retrieval accuracy (30%)
Code quality & documentation (15%)
Report clarity (15%)
Assignment 2: Compare Embedding Models
Objective: Compare different embedding models for Indonesian text
Models to test:
1. Create a test set: 10 queries, each with 3 relevant + 7 irrelevant docs
2. Compute embeddings with each model
3. Evaluate with Precision@3, Recall@3, MRR
4. Analyze trade-offs (speed vs. accuracy)
Deliverable:
Comparison table with metrics
Visualization (bar charts)
Analysis report (1-2 pages)
Assignment 3: Build a RAG System with LangChain
Scenario: Academic paper Q&A system
Requirements:
1. Load multiple PDF papers (use PyPDF or similar)
2. Implement chunking with RecursiveCharacterTextSplitter
3. Use ChromaDB for persistence
4. Implement a QA chain with a custom prompt
5. Add source citations to the responses
Bonus:
Implement hybrid search (vector + BM25)
Add conversation memory
Build simple Streamlit/Gradio UI
Rubric:
Core functionality (40%)
Source citations (15%)
Code organization (15%)
Documentation (15%)
Bonus features (15%)
Assignment 4: Simple AI Agent
Objective: Build an agent that can use multiple tools
Required Tools:
1. Calculator (for math)
2. Wikipedia search (for facts)
3. Weather API (for current weather)
Tasks:
1. Define tool wrappers
2. Create a ReAct agent with LangChain
3. Test with diverse queries:
- Pure math: “What’s 234 * 567?”
- Fact + math: “Population of Jakarta times 3?”
- Multi-step: “What’s the weather in Paris? Is it above average?”
Deliverable:
Agent code
Trace logs showing reasoning steps
Analysis report discussing successes/failures
Rubric:
Tool implementation (30%)
Agent reasoning quality (30%)
Test coverage (20%)
Analysis depth (20%)
Assignment 5: RAG Evaluation Pipeline
Objective: Build an evaluation framework for a RAG system
Requirements:
1. Create a test dataset: 20+ question-answer pairs with ground truth
2. Implement metrics:
- Retrieval: Precision@K, Recall@K
- Generation: Faithfulness (LLM-as-judge)
3. Run the evaluation on 2 different RAG configurations
4. Visualize the results
Deliverable:
Evaluation code
Test dataset (JSON/CSV)
Results report with recommendations
Rubric:
Metric implementation (35%)
Test dataset quality (20%)
Comparative analysis (25%)
Recommendations (20%)
Final Project (Capstone):
“Domain-Specific RAG System”
Build a complete RAG system for a domain of your choice:
Options:
Medical Q&A (from medical literature)
Legal assistant (from regulations/case law)
Code documentation helper
Academic research assistant
Requirements:
Document collection (50+ documents)
Full RAG pipeline (chunking → embedding → retrieval → generation)
Related Labs: Lab 10 - Building RAG System for Indonesian Documents
Related Chapters: Chapter 8 (Transformers), Chapter 11 (MLOps & Deployment)
Estimated Reading Time: 120 minutes
Estimated Practice Time: 10-12 hours
Last Updated: December 6, 2024
Version: 1.0
Author: Development Team - Politeknik Siber dan Sandi Negara