Predicting Household Energy Consumption with Recurrent Neural Networks
Author
Machine Learning - Data Science for Cybersecurity
Published
December 15, 2025
18 Introduction
18.1 Learning Objectives
After completing this lab, you should be able to:
Understand time series forecasting with deep learning
Preprocess time series data (windowing, normalization)
Build LSTM/GRU models for forecasting
Implement sequence models with Keras and PyTorch
Apply advanced techniques such as the attention mechanism
Evaluate models with time series metrics (MAE, RMSE, MAPE)
Visualize predictions against actual values
Optimize models for the best performance
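The three evaluation metrics named above (MAE, RMSE, MAPE) have simple closed forms and can be computed directly with NumPy. A minimal sketch (the helper name `forecast_metrics` is ours, not part of the lab code):

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and MAPE for a forecast (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    # MAPE is undefined where y_true == 0, so mask those points
    nonzero = y_true != 0
    mape = np.mean(np.abs((y_true[nonzero] - y_pred[nonzero]) / y_true[nonzero])) * 100
    return mae, rmse, mape

mae, rmse, mape = forecast_metrics([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
print(mae, mape)  # → 0.5 25.0
```

MAPE is scale-free (a percentage), which makes it easy to interpret, but it blows up for near-zero actual values, which is why energy forecasting papers usually report it alongside MAE and RMSE.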
18.2 Lab Overview
In this lab, you will work with the Household Energy Consumption dataset, which contains household electricity consumption data collected from IoT sensors.
18.2.1 The Energy Consumption Dataset
Dataset characteristics:
Domain: IoT Smart Home Energy Monitoring
Frequency: one reading per minute (60 readings per hour)
Time span: 47 months (Dec 2006 - Nov 2010)
Total records: ~2 million observations
Features: 9 variables (7 numerical + 2 datetime)
Features in the dataset:
🕐 Date & Time - observation timestamp
⚡ Global_active_power - total active power consumption (kilowatts)
⚡ Global_reactive_power - reactive power (kilowatts)
🔌 Voltage - electrical voltage (volts)
⚙️ Global_intensity - current intensity (amperes)
🏠 Sub_metering_1 - kitchen (watt-hours)
🏠 Sub_metering_2 - laundry room (watt-hours)
🏠 Sub_metering_3 - water heater & air conditioner (watt-hours)
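Per the UCI description of this dataset, the three sub-meters do not cover the whole house: active power (in kW) times 1000/60 gives the watt-hours consumed in one minute, and whatever the sub-meters do not account for is unmetered consumption. A sketch of that derivation (column names match the dataset; the sample values are made up):

```python
import pandas as pd

# Tiny synthetic sample with the dataset's column names (values are invented)
df = pd.DataFrame({
    "Global_active_power": [4.2, 3.0],   # kilowatts
    "Sub_metering_1": [0.0, 1.0],        # watt-hours
    "Sub_metering_2": [1.0, 1.0],
    "Sub_metering_3": [17.0, 18.0],
})

# kW * 1000 / 60 = watt-hours consumed in one minute; subtract the sub-meters
df["unmetered_wh"] = (
    df["Global_active_power"] * 1000 / 60
    - df["Sub_metering_1"] - df["Sub_metering_2"] - df["Sub_metering_3"]
)
print(df["unmetered_wh"].round(2).tolist())  # → [52.0, 30.0]
```

This derived column can be a useful extra feature or a sanity check that the sub-meter readings are consistent with the global reading.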
18.2.2 Approaches Covered
In this lab, we will explore several approaches:
graph TD
A[Energy Dataset] --> B[Part 1: Data Exploration]
B --> C[Part 2: LSTM from Scratch]
B --> D[Part 3: Advanced RNN]
C --> C1[Simple LSTM]
C --> C2[Stacked LSTM]
C --> C3[Bidirectional LSTM]
D --> D1[GRU Networks]
D --> D2[Attention Mechanism]
D1 --> E[Part 4: PyTorch Implementation]
D2 --> E
E --> E1[PyTorch LSTM]
E --> E2[Custom Training Loop]
E --> E3[Model Comparison]
E1 --> F[Final Evaluation]
E2 --> F
E3 --> F
18.3 Environment Setup
18.3.1 Import Libraries
# Import basic libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Import TensorFlow/Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.callbacks import (
    ModelCheckpoint, EarlyStopping, ReduceLROnPlateau, TensorBoard
)

# Import PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, TensorDataset

# Import scikit-learn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Date handling
from datetime import datetime, timedelta

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)
torch.manual_seed(42)

print(f"TensorFlow version: {tf.__version__}")
print(f"PyTorch version: {torch.__version__}")
print(f"Keras version: {keras.__version__}")
print(f"GPU available (TF): {tf.config.list_physical_devices('GPU')}")
print(f"GPU available (PyTorch): {torch.cuda.is_available()}")
18.3.2 GPU Configuration (Optional)
# Check for a GPU and configure it if available
def setup_gpu():
    """Set up GPUs for more efficient training."""
    # TensorFlow GPU
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        try:
            for gpu in gpus:
                tf.config.experimental.set_memory_growth(gpu, True)
            print(f"✓ TensorFlow: {len(gpus)} GPU(s) found and configured")
        except RuntimeError as e:
            print(f"✗ TensorFlow GPU configuration error: {e}")
    else:
        print("⚠ TensorFlow: No GPU found. Using CPU")

    # PyTorch GPU
    if torch.cuda.is_available():
        device = torch.device("cuda")
        print(f"✓ PyTorch: Using GPU ({torch.cuda.get_device_name(0)})")
    else:
        device = torch.device("cpu")
        print("⚠ PyTorch: Using CPU")

    return device

device = setup_gpu()
18.3.3 Directory Setup
# Create directories for saving models and results
dirs = {
    'data': Path('data'),
    'models': Path('models'),
    'checkpoints': Path('checkpoints'),
    'figures': Path('figures'),
    'logs': Path('logs'),
    'predictions': Path('predictions')
}

for name, path in dirs.items():
    path.mkdir(exist_ok=True, parents=True)
    print(f"✓ Directory created: {path}")
18.3.4 Global Constants
# Forecasting constants (data is resampled to hourly below, so steps are hours)
SEQUENCE_LENGTH = 24   # Use the last 24 time steps (24 hours) as input
FORECAST_HORIZON = 6   # Predict the next 6 time steps (6 hours)
BATCH_SIZE = 64
EPOCHS = 50
LEARNING_RATE = 0.001

# Features to use
FEATURE_COLUMNS = [
    'Global_active_power',
    'Global_reactive_power',
    'Voltage',
    'Global_intensity',
    'Sub_metering_1',
    'Sub_metering_2',
    'Sub_metering_3'
]
TARGET_COLUMN = 'Global_active_power'

print("Forecasting configuration:")
print(f"  Sequence length: {SEQUENCE_LENGTH} timesteps")
print(f"  Forecast horizon: {FORECAST_HORIZON} timesteps")
print(f"  Batch size: {BATCH_SIZE}")
print(f"  Training epochs: {EPOCHS}")
print(f"  Features: {len(FEATURE_COLUMNS)}")
print(f"  Target: {TARGET_COLUMN}")
19 Part 1: Data Loading and Exploration
19.1 Load Dataset
def load_energy_data(data_path='data/household_power_consumption.txt'):
    """
    Load the Household Energy Consumption dataset.

    Returns:
        df: pandas DataFrame
    """
    print("Loading energy consumption dataset...")
    try:
        # Try to load from a local file; dates are day-first (dd/mm/yyyy)
        df = pd.read_csv(
            data_path,
            sep=';',
            parse_dates={'datetime': ['Date', 'Time']},
            dayfirst=True,
            low_memory=False,
            na_values=['?', '']
        )
    except FileNotFoundError:
        print("⚠ Local file not found. Downloading from UCI repository...")
        # Download from the UCI Machine Learning Repository
        url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00235/household_power_consumption.zip"
        import requests
        import zipfile
        import io
        response = requests.get(url)
        z = zipfile.ZipFile(io.BytesIO(response.content))
        z.extractall('data/')
        # Load after extraction
        df = pd.read_csv(
            'data/household_power_consumption.txt',
            sep=';',
            parse_dates={'datetime': ['Date', 'Time']},
            dayfirst=True,
            low_memory=False,
            na_values=['?', '']
        )

    # Set datetime as the index
    df.set_index('datetime', inplace=True)

    print("✓ Dataset loaded successfully!")
    print(f"  Total records: {len(df):,}")
    print(f"  Date range: {df.index.min()} to {df.index.max()}")
    print(f"  Columns: {list(df.columns)}")
    print(f"  Shape: {df.shape}")
    return df

# Load data
df_raw = load_energy_data()
19.2 Exploratory Data Analysis
19.2.1 Dataset Information
def display_dataset_info(df):
    """Display comprehensive information about the dataset."""
    print("=" * 70)
    print("ENERGY CONSUMPTION DATASET INFORMATION")
    print("=" * 70)

    # Basic info
    print("\n1. DATA DIMENSIONS:")
    print(f"   Records: {len(df):,}")
    print(f"   Features: {len(df.columns)}")
    print(f"   Time span: {(df.index.max() - df.index.min()).days} days")
    print(f"   Frequency: {pd.infer_freq(df.index[:1000])}")

    # Memory usage
    memory_mb = df.memory_usage(deep=True).sum() / (1024 ** 2)
    print("\n2. MEMORY USAGE:")
    print(f"   Total: {memory_mb:.2f} MB")

    # Data types
    print("\n3. DATA TYPES:")
    for col in df.columns:
        print(f"   {col}: {df[col].dtype}")

    # Missing values
    print("\n4. MISSING VALUES:")
    missing = df.isnull().sum()
    for col in df.columns:
        pct = (missing[col] / len(df)) * 100
        print(f"   {col}: {missing[col]:,} ({pct:.2f}%)")

    # Summary statistics
    print("\n5. SUMMARY STATISTICS:")
    print(df.describe())
    print("=" * 70)

display_dataset_info(df_raw)
19.2.2 Time Series Visualization
def plot_time_series(df, save_path=None):
    """Plot the time series for every variable."""
    fig, axes = plt.subplots(len(df.columns), 1, figsize=(15, len(df.columns) * 3))
    if len(df.columns) == 1:
        axes = [axes]

    # Subsample for visualization (1 week)
    sample = df['2007-02-01':'2007-02-07']

    for i, col in enumerate(df.columns):
        axes[i].plot(sample.index, sample[col], linewidth=0.8, alpha=0.8)
        axes[i].set_title(f'{col} - 1 Week Sample', fontsize=12, fontweight='bold')
        axes[i].set_xlabel('Time')
        axes[i].set_ylabel(col)
        axes[i].grid(alpha=0.3)

    plt.tight_layout()
    if save_path:
        plt.savefig(save_path, dpi=300, bbox_inches='tight')
        print(f"✓ Figure saved to: {save_path}")
    plt.show()

plot_time_series(df_raw, save_path=dirs['figures'] / 'time_series_overview.png')
19.3 Data Preprocessing
19.3.1 Handle Missing Values
def handle_missing_values(df, method='interpolate'):
    """
    Handle missing values in a time series.

    Parameters:
        df: DataFrame with missing values
        method: 'interpolate', 'forward_fill', or 'drop'

    Returns:
        df_clean: DataFrame without missing values
    """
    print(f"Handling missing values using '{method}' method...")
    df_clean = df.copy()

    # Missing values before
    missing_before = df_clean.isnull().sum().sum()
    print(f"  Missing values before: {missing_before:,}")

    if method == 'interpolate':
        # Time-based linear interpolation for time series
        df_clean = df_clean.interpolate(method='time', limit_direction='both')
    elif method == 'forward_fill':
        # Forward fill (carry the previous value forward), then back fill
        df_clean = df_clean.ffill().bfill()
    elif method == 'drop':
        # Drop rows with missing values
        df_clean = df_clean.dropna()
    else:
        raise ValueError(f"Unknown method: {method}")

    # Missing values after
    missing_after = df_clean.isnull().sum().sum()
    print(f"  Missing values after: {missing_after:,}")
    print(f"  Records retained: {len(df_clean):,} ({len(df_clean)/len(df)*100:.2f}%)")
    print("✓ Missing values handled!")
    return df_clean

# Handle missing values
df_clean = handle_missing_values(df_raw, method='interpolate')
19.3.2 Resample Data
def resample_data(df, freq='1H', agg_method='mean'):
    """
    Resample a time series to a lower frequency.

    Parameters:
        df: time series DataFrame
        freq: target frequency ('1H', '30T', '1D', etc.)
        agg_method: aggregation method ('mean', 'sum', 'min', 'max')

    Returns:
        df_resampled: the resampled DataFrame
    """
    print(f"Resampling data to {freq} frequency using {agg_method}...")
    df_resampled = df.resample(freq).agg(agg_method)
    print("✓ Resampling complete!")
    print(f"  Original records: {len(df):,}")
    print(f"  Resampled records: {len(df_resampled):,}")
    print(f"  Reduction: {(1 - len(df_resampled)/len(df)) * 100:.2f}%")
    return df_resampled

# Resample to hourly
df_hourly = resample_data(df_clean, freq='1H', agg_method='mean')
19.3.3 Normalization
def normalize_data(df, method='minmax'):
    """
    Normalize features for training.

    Parameters:
        df: DataFrame to normalize
        method: 'minmax' or 'standard'

    Returns:
        df_normalized, scaler
    """
    print(f"Normalizing data using '{method}' method...")

    if method == 'minmax':
        scaler = MinMaxScaler()
    elif method == 'standard':
        scaler = StandardScaler()
    else:
        raise ValueError(f"Unknown method: {method}")

    # Fit and transform
    df_normalized = pd.DataFrame(
        scaler.fit_transform(df),
        index=df.index,
        columns=df.columns
    )

    print("✓ Normalization complete!")
    print(f"  Method: {method}")
    print(f"  Shape: {df_normalized.shape}")
    print(f"  Range: [{df_normalized.values.min():.3f}, {df_normalized.values.max():.3f}]")
    return df_normalized, scaler

# Normalize data
df_normalized, scaler = normalize_data(df_hourly[FEATURE_COLUMNS], method='minmax')
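One gotcha with this setup: the scaler is fit on all seven feature columns, but the model predicts only Global_active_power, so predictions must be de-normalized per column before computing metrics in physical units. A minimal sketch for MinMaxScaler with the default feature range (the helper name is ours):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def inverse_transform_column(scaler, values, col_idx):
    """Invert a MinMaxScaler (default feature_range=(0, 1)) for one column
    of a multi-column fit, using the scaler's fitted per-column statistics."""
    values = np.asarray(values, dtype=float)
    data_min = scaler.data_min_[col_idx]
    data_range = scaler.data_range_[col_idx]
    return values * data_range + data_min

# Demo: fit on two columns, then invert only column 0
X = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
scaler = MinMaxScaler().fit(X)
scaled = scaler.transform(X)
restored = inverse_transform_column(scaler, scaled[:, 0], col_idx=0)
print(restored.tolist())  # → [0.0, 5.0, 10.0]
```

The alternative of padding predictions with zeros and calling `scaler.inverse_transform` on the full width also works, but per-column inversion avoids allocating dummy columns.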
19.3.4 Create Sequences for LSTM
def create_sequences(data, seq_length, forecast_horizon, target_column=None):
    """
    Create sequences for LSTM training.

    Parameters:
        data: DataFrame or array
        seq_length: length of each input sequence
        forecast_horizon: number of steps to forecast
        target_column: column name for the target (if DataFrame)

    Returns:
        X, y: arrays of sequences
    """
    print("Creating sequences...")
    print(f"  Sequence length: {seq_length}")
    print(f"  Forecast horizon: {forecast_horizon}")

    # Convert to numpy if DataFrame
    if isinstance(data, pd.DataFrame):
        if target_column:
            target_idx = data.columns.get_loc(target_column)
        else:
            target_idx = 0
        data_array = data.values
    else:
        data_array = data
        target_idx = 0

    X, y = [], []
    for i in range(len(data_array) - seq_length - forecast_horizon + 1):
        # Input sequence
        X.append(data_array[i:i + seq_length])
        # Target sequence (target variable only)
        y.append(data_array[i + seq_length:i + seq_length + forecast_horizon, target_idx])

    X = np.array(X)
    y = np.array(y)

    print("✓ Sequences created!")
    print(f"  X shape: {X.shape}")
    print(f"  y shape: {y.shape}")
    print(f"  Total sequences: {len(X):,}")
    return X, y

# Create sequences
X, y = create_sequences(
    df_normalized,
    seq_length=SEQUENCE_LENGTH,
    forecast_horizon=FORECAST_HORIZON,
    target_column=TARGET_COLUMN
)
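The windowing logic is easy to get off by one, so it helps to check it on a tiny synthetic array where the expected shapes can be verified by hand. This standalone re-implementation of the same sliding-window loop shows the arithmetic:

```python
import numpy as np

def make_windows(data, seq_length, horizon, target_idx=0):
    """Slide a window over `data`: inputs of `seq_length` rows,
    targets of `horizon` values from the target column."""
    X, y = [], []
    # Number of windows = len(data) - seq_length - horizon + 1
    for i in range(len(data) - seq_length - horizon + 1):
        X.append(data[i:i + seq_length])
        y.append(data[i + seq_length:i + seq_length + horizon, target_idx])
    return np.array(X), np.array(y)

# 10 timesteps, 3 features: values 0..29 laid out row by row
data = np.arange(30, dtype=float).reshape(10, 3)
X, y = make_windows(data, seq_length=4, horizon=2)
print(X.shape, y.shape)  # → (5, 4, 3) (5, 2)
print(y[0].tolist())     # rows 4 and 5, column 0 → [12.0, 15.0]
```

With 10 timesteps, a window of 4 and a horizon of 2 leave 10 - 4 - 2 + 1 = 5 valid windows, and each target starts exactly one step after its input window ends.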
19.3.5 Train-Validation-Test Split
def split_time_series(X, y, train_ratio=0.7, val_ratio=0.15):
    """
    Split time series data while maintaining temporal order.

    Parameters:
        X: input sequences
        y: target sequences
        train_ratio: proportion used for training
        val_ratio: proportion used for validation

    Returns:
        X_train, X_val, X_test, y_train, y_val, y_test
    """
    print(f"Splitting data (train: {train_ratio}, val: {val_ratio}, test: {1 - train_ratio - val_ratio:.2f})...")

    n = len(X)
    train_size = int(n * train_ratio)
    val_size = int(n * val_ratio)

    # Split while maintaining temporal order
    X_train = X[:train_size]
    y_train = y[:train_size]
    X_val = X[train_size:train_size + val_size]
    y_val = y[train_size:train_size + val_size]
    X_test = X[train_size + val_size:]
    y_test = y[train_size + val_size:]

    print("✓ Split complete!")
    print(f"  Training set: {len(X_train):,} sequences")
    print(f"  Validation set: {len(X_val):,} sequences")
    print(f"  Test set: {len(X_test):,} sequences")
    return X_train, X_val, X_test, y_train, y_val, y_test

# Split data
X_train, X_val, X_test, y_train, y_val, y_test = split_time_series(X, y)
20 Part 2: LSTM from Scratch (Keras)
20.1 Simple LSTM Model
def build_simple_lstm(input_shape, output_shape, units=64):
    """
    Build a simple LSTM model.

    Architecture: LSTM(64) -> Dropout -> Dense(output_shape)

    Parameters:
        input_shape: (seq_length, n_features)
        output_shape: forecast_horizon
        units: number of LSTM units

    Returns:
        model: Keras model
    """
    model = models.Sequential(name='SimpleLSTM')

    # LSTM layer
    model.add(layers.LSTM(units, input_shape=input_shape, name='lstm'))
    model.add(layers.Dropout(0.2, name='dropout'))

    # Output layer
    model.add(layers.Dense(output_shape, name='output'))

    return model

# Build model
simple_lstm = build_simple_lstm(
    input_shape=(SEQUENCE_LENGTH, len(FEATURE_COLUMNS)),
    output_shape=FORECAST_HORIZON,
    units=64
)
simple_lstm.summary()
20.2 Compile and Train
def compile_model(model, learning_rate=0.001):
    """Compile the model with an optimizer and loss."""
    optimizer = optimizers.Adam(learning_rate=learning_rate)
    model.compile(
        optimizer=optimizer,
        loss='mse',
        metrics=['mae', 'mse']
    )
    print("✓ Model compiled!")
    print(f"  Optimizer: Adam (lr={learning_rate})")
    print("  Loss: MSE")
    print("  Metrics: MAE, MSE")

compile_model(simple_lstm, learning_rate=LEARNING_RATE)
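Training itself comes in the later sections; as a preview, a typical `fit` call would wire in the callbacks imported during setup (EarlyStopping, ReduceLROnPlateau). The sketch below is self-contained with a tiny model and synthetic arrays whose shapes match the lab's configuration; all hyperparameter values here are illustrative:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Tiny synthetic stand-ins for the real sequences (shapes match the lab: 24 steps, 7 features, horizon 6)
rng = np.random.default_rng(42)
X_train = rng.random((64, 24, 7)).astype('float32')
y_train = rng.random((64, 6)).astype('float32')
X_val = rng.random((16, 24, 7)).astype('float32')
y_val = rng.random((16, 6)).astype('float32')

model = keras.Sequential([
    layers.Input(shape=(24, 7)),
    layers.LSTM(8),
    layers.Dense(6),
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

callbacks = [
    # Stop when validation loss plateaus, keeping the best weights
    EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
    # Halve the learning rate when validation loss stalls
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=1e-5),
]

history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=2, batch_size=32,
                    callbacks=callbacks, verbose=0)
print('val_loss' in history.history)
```

In the real lab run you would pass the actual `X_train`/`y_train` splits, `EPOCHS`, `BATCH_SIZE`, and a `ModelCheckpoint` writing to the `checkpoints` directory created earlier.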
The lab continues with Stacked LSTM, Bidirectional LSTM, GRU models, PyTorch implementations, evaluation metrics, and prediction visualizations in the sections that follow.