{"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"name":"python","version":"3.11.13","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"kaggle":{"accelerator":"none","dataSources":[{"sourceId":115439,"databundleVersionId":13800781,"sourceType":"competition"}],"dockerImageVersionId":31153,"isInternetEnabled":true,"language":"python","sourceType":"notebook","isGpuEnabled":false},"colab":{"provenance":[]}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"# Milestone 4: Sequence Modeling with LSTM and GRU\n\nThis milestone introduces **deep learning models (LSTM / GRU)** that are specifically designed to capture the **order and contextual relationships** between words in a sequence.\n\n---\n\n## Suggested Readings\n- [LSTM](https://docs.pytorch.org/docs/stable/generated/torch.nn.LSTM.html)\n- [GRU](https://docs.pytorch.org/docs/stable/generated/torch.nn.GRU.html)\n\n---\n\n## ⚙️ Instructions\n\nUse the **constants and helper functions** provided in the next cell to answer all **Milestone-4 questions**.\n\nPerform the following tasks on the **training dataset** provided as part of the Kaggle competition:\n\n🔗 **Competition Link:**  \n[2025-Sep-DL-Gen-AI-Project](https://www.kaggle.com/competitions/2025-sep-dl-gen-ai-project)\n","metadata":{"id":"e2ogIMAMt4VL"}},{"cell_type":"markdown","source":"# Imports","metadata":{"id":"naJr2EGft4VN"}},{"cell_type":"code","source":"import torch\nimport torch.nn as nn\nimport torch.optim as optim\nfrom torch.utils.data import DataLoader, Dataset\nimport pandas as pd\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nimport random\nfrom collections import Counter\nfrom torch.nn.utils.rnn import pad_sequence\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import f1_score\nimport time\nimport 
wandb\n\nimport warnings\nwarnings.filterwarnings(\"ignore\")","metadata":{"_uuid":"8f2839f25d086af736a60e9eeb907d3b93b6e0e5","_cell_guid":"b1076dfc-b9ad-4769-8c92-a6c4dae69d19","trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:52:46.424006Z","iopub.execute_input":"2025-11-05T18:52:46.424340Z","iopub.status.idle":"2025-11-05T18:52:46.430434Z","shell.execute_reply.started":"2025-11-05T18:52:46.424317Z","shell.execute_reply":"2025-11-05T18:52:46.429463Z"},"id":"JbLUKraxt4VO"},"outputs":[],"execution_count":92},{"cell_type":"code","source":"import wandb\n\n# Log in without hardcoding the API key: wandb.login() picks it up from the\n# WANDB_API_KEY environment variable (or prompts for it). Only needed once.\nwandb.login()","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:49.973247Z","iopub.execute_input":"2025-11-05T18:45:49.974244Z","iopub.status.idle":"2025-11-05T18:45:49.981976Z","shell.execute_reply.started":"2025-11-05T18:45:49.974200Z","shell.execute_reply":"2025-11-05T18:45:49.981253Z"}},"outputs":[{"name":"stderr","text":"\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[33mWARNING\u001b[0m Calling wandb.login() after wandb.init() has no effect.\n","output_type":"stream"},{"execution_count":60,"output_type":"execute_result","data":{"text/plain":"True"},"metadata":{}}],"execution_count":60},{"cell_type":"markdown","source":"### Set seeds and Constants","metadata":{"id":"4J6MM3M4t4VO"}},{"cell_type":"code","source":"#----------------------------- DON'T CHANGE THIS --------------------------\nDATA_SEED = 67\nTRAINING_SEED = 1234\nMAX_LEN = 50\nBATCH_SIZE = 64\nEMB_DIM = 100\nHIDDEN_DIM = 256\nOUTPUT_DIM = 
5\n\nrandom.seed(DATA_SEED)\nnp.random.seed(DATA_SEED)\ntorch.manual_seed(DATA_SEED)\ntorch.cuda.manual_seed(DATA_SEED)\nprint(\"done\")","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:49.988484Z","iopub.execute_input":"2025-11-05T18:45:49.989310Z","iopub.status.idle":"2025-11-05T18:45:50.001614Z","shell.execute_reply.started":"2025-11-05T18:45:49.989280Z","shell.execute_reply":"2025-11-05T18:45:50.000687Z"},"id":"FziGMCfXt4VO"},"outputs":[{"name":"stdout","text":"done\n","output_type":"stream"}],"execution_count":61},{"cell_type":"markdown","source":"# Create Vocab","metadata":{"id":"WdF4Ds3-t4VP"}},{"cell_type":"code","source":"import os\nfor dirname, _, filenames in os.walk('/kaggle/input'):\n    for filename in filenames:\n        print(os.path.join(dirname, filename))","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.003112Z","iopub.execute_input":"2025-11-05T18:45:50.003399Z","iopub.status.idle":"2025-11-05T18:45:50.020498Z","shell.execute_reply.started":"2025-11-05T18:45:50.003380Z","shell.execute_reply":"2025-11-05T18:45:50.019497Z"}},"outputs":[{"name":"stdout","text":"/kaggle/input/2025-sep-dl-gen-ai-project/sample_submission.csv\n/kaggle/input/2025-sep-dl-gen-ai-project/train.csv\n/kaggle/input/2025-sep-dl-gen-ai-project/test.csv\n","output_type":"stream"}],"execution_count":62},{"cell_type":"code","source":"# Load dataset\ndf = pd.read_csv(\"/kaggle/input/2025-sep-dl-gen-ai-project/train.csv\")\ndt = pd.read_csv(\"/kaggle/input/2025-sep-dl-gen-ai-project/test.csv\")","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.022030Z","iopub.execute_input":"2025-11-05T18:45:50.022352Z","iopub.status.idle":"2025-11-05T18:45:50.053195Z","shell.execute_reply.started":"2025-11-05T18:45:50.022323Z","shell.execute_reply":"2025-11-05T18:45:50.052343Z"}},"outputs":[],"execution_count":63},{"cell_type":"code","source":"# Split train df into train_df(80%) and test_df (20%) use 
seed\n# ------------------- write your code here -------------------------------\n#-------------------------------------------------------------------------","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.054168Z","iopub.execute_input":"2025-11-05T18:45:50.054414Z","iopub.status.idle":"2025-11-05T18:45:50.058541Z","shell.execute_reply.started":"2025-11-05T18:45:50.054394Z","shell.execute_reply":"2025-11-05T18:45:50.057659Z"},"id":"Yrv4U9a7t4VP"},"outputs":[],"execution_count":64},{"cell_type":"code","source":"train_df, val_df = train_test_split(df, test_size=0.2, random_state=DATA_SEED)\n\nprint(f\"Train size: {len(train_df)}\")\nprint(f\"Val size: {len(val_df)}\")","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:54:28.264501Z","iopub.execute_input":"2025-11-05T18:54:28.264861Z","iopub.status.idle":"2025-11-05T18:54:28.274310Z","shell.execute_reply.started":"2025-11-05T18:54:28.264835Z","shell.execute_reply":"2025-11-05T18:54:28.273321Z"}},"outputs":[{"name":"stdout","text":"Train size: 5461\nVal size: 1366\n","output_type":"stream"}],"execution_count":93},{"cell_type":"code","source":"# create a simple space-based tokenizer.\n# ------------------- write your code here -------------------------------\n#-------------------------------------------------------------------------","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.082128Z","iopub.execute_input":"2025-11-05T18:45:50.083061Z","iopub.status.idle":"2025-11-05T18:45:50.095504Z","shell.execute_reply.started":"2025-11-05T18:45:50.083029Z","shell.execute_reply":"2025-11-05T18:45:50.094692Z"},"id":"HJdeRv_ht4VP"},"outputs":[],"execution_count":66},{"cell_type":"code","source":"def tokenize(text):\n    \"\"\"Simple tokenizer - splits on whitespace and lowercases\"\"\"\n    return 
text.lower().split()","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:54:56.514545Z","iopub.execute_input":"2025-11-05T18:54:56.515452Z","iopub.status.idle":"2025-11-05T18:54:56.519947Z","shell.execute_reply.started":"2025-11-05T18:54:56.515422Z","shell.execute_reply":"2025-11-05T18:54:56.518906Z"}},"outputs":[],"execution_count":94},{"cell_type":"code","source":"# Use counter to count all tokens in train_df\n# ------------------- write your code here -------------------------------\n#------------------------------------------------------------------------","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.096381Z","iopub.execute_input":"2025-11-05T18:45:50.096715Z","iopub.status.idle":"2025-11-05T18:45:50.109878Z","shell.execute_reply.started":"2025-11-05T18:45:50.096692Z","shell.execute_reply":"2025-11-05T18:45:50.108894Z"},"id":"kxjVnGSTt4VQ"},"outputs":[],"execution_count":67},{"cell_type":"code","source":"from collections import Counter\n\n# Count tokens in train set\ntoken_counter = Counter()\nfor text in train_df['text']:\n    token_counter.update(tokenize(text))\n\n# Create vocabulary\nspecials = ['<unk>', '<pad>']\nUNK_IDX, PAD_IDX = 0, 1\nmin_freq = 2\n\nvocab_list = specials + [token for token, freq in token_counter.items() if freq >= min_freq]\nword2idx = {token: i for i, token in enumerate(vocab_list)}\n\nVOCAB_SIZE = len(word2idx)\nprint(f\"Vocabulary size: {VOCAB_SIZE}\")","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:55:22.464258Z","iopub.execute_input":"2025-11-05T18:55:22.464539Z","iopub.status.idle":"2025-11-05T18:55:22.496030Z","shell.execute_reply.started":"2025-11-05T18:55:22.464520Z","shell.execute_reply":"2025-11-05T18:55:22.494983Z"}},"outputs":[{"name":"stdout","text":"Vocabulary size: 5730\n","output_type":"stream"}],"execution_count":95},{"cell_type":"markdown","source":"## Create train and val 
dataloaders","metadata":{"id":"BBR5zOYPt4VQ"}},{"cell_type":"code","source":"#----------------------------- DON'T CHANGE THIS --------------------------\nspecials = ['<unk>', '<pad>']\nmin_freq = 2\nvocab_list = specials + [token for token, freq in token_counter.items() if freq >= min_freq]\nword2idx = {token: i for i, token in enumerate(vocab_list)}\ndef text_pipeline(text):\n    \"\"\"Converts text to a list of indices using the word2idx dict.\"\"\"\n    tokens = tokenize(text)\n    return [word2idx.get(token, UNK_IDX) for token in tokens]\nclass EmotionDataset(Dataset):\n    def __init__(self, dataframe):\n        self.texts = dataframe['text'].values\n        self.labels = dataframe[['anger', 'fear', 'joy', 'sadness', 'surprise']].values.astype(np.float32)\n    def __len__(self):\n        return len(self.texts)\n    def __getitem__(self, idx):\n        return self.texts[idx], self.labels[idx]\ndef collate_batch(batch):\n    label_list, text_list = [], []\n    for (_text, _labels) in batch:\n        label_list.append(_labels)\n        processed_text = torch.tensor(text_pipeline(_text), dtype=torch.int64)[:MAX_LEN]\n        text_list.append(processed_text)\n    label_list = torch.tensor(label_list, dtype=torch.float32)\n    text_list = pad_sequence(text_list, batch_first=True, padding_value=PAD_IDX)\n    if text_list.shape[1] < MAX_LEN:\n        pad_tensor = torch.full(\n            (text_list.shape[0], MAX_LEN - text_list.shape[1]),\n            PAD_IDX,\n            dtype=torch.int64\n        )\n        text_list = torch.cat((text_list, pad_tensor), dim=1)\n\n    return text_list, label_list\n\n# Create train and val dataloaders\n# ------------------- write your code here 
-------------------------------\n#------------------------------------------------------------------------","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.164165Z","iopub.execute_input":"2025-11-05T18:45:50.164916Z","iopub.status.idle":"2025-11-05T18:45:50.176600Z","shell.execute_reply.started":"2025-11-05T18:45:50.164886Z","shell.execute_reply":"2025-11-05T18:45:50.175579Z"},"id":"DTFAKSgPt4VQ"},"outputs":[],"execution_count":69},{"cell_type":"code","source":"from torch.utils.data import Dataset, DataLoader\nfrom torch.nn.utils.rnn import pad_sequence\n\ndef text_pipeline(text):\n    \"\"\"Convert text to list of token indices\"\"\"\n    tokens = tokenize(text)\n    return [word2idx.get(token, UNK_IDX) for token in tokens]\n\nclass EmotionDataset(Dataset):\n    def __init__(self, dataframe):\n        self.texts = dataframe['text'].values\n        self.labels = dataframe[['anger', 'fear', 'joy', 'sadness', 'surprise']].values.astype(np.float32)\n    \n    def __len__(self):\n        return len(self.texts)\n    \n    def __getitem__(self, idx):\n        return self.texts[idx], self.labels[idx]\n\ndef collate_batch(batch):\n    \"\"\"Collate function to pad sequences in a batch\"\"\"\n    label_list, text_list = [], []\n    \n    for text, labels in batch:\n        label_list.append(labels)\n        # Convert text to indices and truncate to MAX_LEN\n        processed_text = torch.tensor(text_pipeline(text), dtype=torch.int64)[:MAX_LEN]\n        text_list.append(processed_text)\n    \n    # Stack labels\n    label_list = torch.tensor(label_list, dtype=torch.float32)\n    \n    # Pad sequences to same length\n    text_list = pad_sequence(text_list, batch_first=True, padding_value=PAD_IDX)\n    \n    # Ensure all sequences are exactly MAX_LEN (pad to the right if needed)\n    if text_list.shape[1] < MAX_LEN:\n        pad_size = MAX_LEN - text_list.shape[1]\n        pad_tensor = torch.full((text_list.shape[0], pad_size), PAD_IDX, 
dtype=torch.int64)\n        text_list = torch.cat([text_list, pad_tensor], dim=1)\n    \n    return text_list, label_list\n\n# Create datasets\ntrain_ds = EmotionDataset(train_df)\nval_ds = EmotionDataset(val_df)\n\n# Create dataloaders\ntrain_dl = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate_batch)\nval_dl = DataLoader(val_ds, batch_size=BATCH_SIZE, shuffle=False, collate_fn=collate_batch)\n\nprint(f\"Train batches: {len(train_dl)}\")\nprint(f\"Val batches: {len(val_dl)}\")","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:56:08.344432Z","iopub.execute_input":"2025-11-05T18:56:08.345280Z","iopub.status.idle":"2025-11-05T18:56:08.358990Z","shell.execute_reply.started":"2025-11-05T18:56:08.345250Z","shell.execute_reply":"2025-11-05T18:56:08.358075Z"}},"outputs":[{"name":"stdout","text":"Train batches: 86\nVal batches: 22\n","output_type":"stream"}],"execution_count":96},{"cell_type":"markdown","source":"### Q1. What are the vocabulary size, padding token index, and unknown token index for the above dataset?","metadata":{"id":"nwq9_wK4t4VQ"}},{"cell_type":"code","source":"# ------------------- write your code here -------------------------------\n#-------------------------------------------------------------------------\nprint(\"Vocab Size:\", VOCAB_SIZE)\nprint(\"Pad idx:\", PAD_IDX)\nprint(\"Unk idx:\", UNK_IDX)","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.210499Z","iopub.execute_input":"2025-11-05T18:45:50.211073Z","iopub.status.idle":"2025-11-05T18:45:50.231854Z","shell.execute_reply.started":"2025-11-05T18:45:50.211038Z","shell.execute_reply":"2025-11-05T18:45:50.230923Z"},"id":"InMGo-zwt4VQ"},"outputs":[{"name":"stdout","text":"Vocab Size: 5730\nPad idx: 1\nUnk idx: 0\n","output_type":"stream"}],"execution_count":71},{"cell_type":"markdown","source":"### Q2. What are the indices for the words \"happy\", \"alone\", and \"sad\" in the 
vocabulary?","metadata":{"id":"W2s9owsCt4VR"}},{"cell_type":"code","source":"# ------------------- write your code here -------------------------------\n#-------------------------------------------------------------------------\nfor w in ['happy', 'alone', 'sad']:\n    print(f\"Index for {w}:\", word2idx.get(w, UNK_IDX))","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.233766Z","iopub.execute_input":"2025-11-05T18:45:50.234036Z","iopub.status.idle":"2025-11-05T18:45:50.254338Z","shell.execute_reply.started":"2025-11-05T18:45:50.234014Z","shell.execute_reply":"2025-11-05T18:45:50.253321Z"},"id":"LVNCVQADt4VR"},"outputs":[{"name":"stdout","text":"Index for happy: 1578\nIndex for alone: 2525\nIndex for sad: 885\n","output_type":"stream"}],"execution_count":72},{"cell_type":"code","source":"# Get a sample batch\nbatch_iter = iter(train_dl)\ntext_batch, label_batch = next(batch_iter)\n\nprint(f\"Text batch shape: {text_batch.shape}\")  # (BATCH_SIZE, MAX_LEN) = (64, 50)\nprint(f\"Label batch shape: {label_batch.shape}\")  # (BATCH_SIZE, OUTPUT_DIM) = (64, 5)\n\n# Test embedding layer\nembedding_layer = nn.Embedding(VOCAB_SIZE, EMB_DIM, padding_idx=PAD_IDX)\nembedded_batch = embedding_layer(text_batch)\nprint(f\"Embedded batch shape: {embedded_batch.shape}\")  # (BATCH_SIZE, MAX_LEN, EMB_DIM) = (64, 50, 100)\n\n# Test LSTM\nlstm = nn.LSTM(EMB_DIM, HIDDEN_DIM, batch_first=True)\nlstm_out, (hn, cn) = lstm(embedded_batch)\nprint(f\"LSTM output shape: {lstm_out.shape}\")  # (BATCH_SIZE, MAX_LEN, HIDDEN_DIM) = (64, 50, 256)\nprint(f\"LSTM hidden state shape: {hn.shape}\")  # (num_layers, BATCH_SIZE, HIDDEN_DIM) = (1, 64, 256)\nprint(f\"LSTM cell state shape: {cn.shape}\")    # (num_layers, BATCH_SIZE, HIDDEN_DIM) = (1, 64, 256)","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.255350Z","iopub.execute_input":"2025-11-05T18:45:50.255688Z","iopub.status.idle":"2025-11-05T18:45:50.350879Z","shell.execute_reply.started":"2025-11-05T18:45:50.255661Z","shell.execute_reply":"2025-11-05T18:45:50.349945Z"}},"outputs":[{"name":"stdout","text":"Text batch shape: torch.Size([64, 50])\nLabel 
batch shape: torch.Size([64, 5])\nEmbedded batch shape: torch.Size([64, 50, 100])\nLSTM output shape: torch.Size([64, 50, 256])\nLSTM hidden state shape: torch.Size([1, 64, 256])\nLSTM cell state shape: torch.Size([1, 64, 256])\n","output_type":"stream"}],"execution_count":73},{"cell_type":"markdown","source":"### Q3. What is the output shape of the Embedding layer?\n","metadata":{"id":"j2m_SwrZt4VR"}},{"cell_type":"code","source":"# ------------------- write your code here -------------------------------\n#-------------------------------------------------------------------------","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.351800Z","iopub.execute_input":"2025-11-05T18:45:50.352227Z","iopub.status.idle":"2025-11-05T18:45:50.356331Z","shell.execute_reply.started":"2025-11-05T18:45:50.352200Z","shell.execute_reply":"2025-11-05T18:45:50.355441Z"},"id":"h4C8uR4zt4VR"},"outputs":[],"execution_count":74},{"cell_type":"code","source":"print(f\"Embedded batch shape: {embedded_batch.shape}\")","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.358743Z","iopub.execute_input":"2025-11-05T18:45:50.359066Z","iopub.status.idle":"2025-11-05T18:45:50.374464Z","shell.execute_reply.started":"2025-11-05T18:45:50.359045Z","shell.execute_reply":"2025-11-05T18:45:50.373564Z"}},"outputs":[{"name":"stdout","text":"Embedded batch shape: torch.Size([64, 50, 100])\n","output_type":"stream"}],"execution_count":75},{"cell_type":"markdown","source":"### Q4. 
What will be the output shape of a simple LSTM layer?","metadata":{"id":"cQi8Xfhit4VR"}},{"cell_type":"code","source":"# ------------------- write your code here -------------------------------\n#-------------------------------------------------------------------------","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.375766Z","iopub.execute_input":"2025-11-05T18:45:50.376047Z","iopub.status.idle":"2025-11-05T18:45:50.390129Z","shell.execute_reply.started":"2025-11-05T18:45:50.376026Z","shell.execute_reply":"2025-11-05T18:45:50.389186Z"},"id":"eL299270t4VS"},"outputs":[],"execution_count":76},{"cell_type":"code","source":"print(f\"LSTM output shape: {lstm_out.shape}\")","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.391036Z","iopub.execute_input":"2025-11-05T18:45:50.391263Z","iopub.status.idle":"2025-11-05T18:45:50.410411Z","shell.execute_reply.started":"2025-11-05T18:45:50.391246Z","shell.execute_reply":"2025-11-05T18:45:50.409397Z"}},"outputs":[{"name":"stdout","text":"LSTM output shape: torch.Size([64, 50, 256])\n","output_type":"stream"}],"execution_count":77},{"cell_type":"markdown","source":"### Q5. 
What is the 'hidden' state shape from a simple LSTM?","metadata":{"id":"2WyMjs-Pt4VS"}},{"cell_type":"code","source":"# ------------------- write your code here -------------------------------\n#-------------------------------------------------------------------------","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.411455Z","iopub.execute_input":"2025-11-05T18:45:50.411765Z","iopub.status.idle":"2025-11-05T18:45:50.426026Z","shell.execute_reply.started":"2025-11-05T18:45:50.411745Z","shell.execute_reply":"2025-11-05T18:45:50.424939Z"},"id":"SFu7xJO2t4VS"},"outputs":[],"execution_count":78},{"cell_type":"code","source":"print(f\"LSTM hidden state shape: {hn.shape}\")","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.427103Z","iopub.execute_input":"2025-11-05T18:45:50.427463Z","iopub.status.idle":"2025-11-05T18:45:50.443563Z","shell.execute_reply.started":"2025-11-05T18:45:50.427437Z","shell.execute_reply":"2025-11-05T18:45:50.442563Z"}},"outputs":[{"name":"stdout","text":"LSTM hidden state shape: torch.Size([1, 64, 256])\n","output_type":"stream"}],"execution_count":79},{"cell_type":"markdown","source":"### Q6. 
What is the 'hidden' state shape from a simple GRU?","metadata":{"id":"P-_KR3gJt4VS"}},{"cell_type":"code","source":"# similarly do it for gru and find hidden state shape\n# ------------------- write your code here -------------------------------\n#-------------------------------------------------------------------------","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.444466Z","iopub.execute_input":"2025-11-05T18:45:50.444784Z","iopub.status.idle":"2025-11-05T18:45:50.458650Z","shell.execute_reply.started":"2025-11-05T18:45:50.444762Z","shell.execute_reply":"2025-11-05T18:45:50.457601Z"},"id":"6zu5csBlt4VS"},"outputs":[],"execution_count":80},{"cell_type":"code","source":"gru = nn.GRU(EMB_DIM, HIDDEN_DIM, batch_first=True)\ngru_out, hn_gru = gru(embedded_batch)\nprint(gru_out.shape, hn_gru.shape)","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.459714Z","iopub.execute_input":"2025-11-05T18:45:50.460236Z","iopub.status.idle":"2025-11-05T18:45:50.557620Z","shell.execute_reply.started":"2025-11-05T18:45:50.460205Z","shell.execute_reply":"2025-11-05T18:45:50.556702Z"}},"outputs":[{"name":"stdout","text":"torch.Size([64, 50, 256]) torch.Size([1, 64, 256])\n","output_type":"stream"}],"execution_count":81},{"cell_type":"markdown","source":"### Q7. 
What is the 'output' tensor shape from a bidirectional LSTM?","metadata":{"id":"9B1Aw1S3t4VS"}},{"cell_type":"code","source":"# Bidirectional LSTM Output Shape\n# ------------------- write your code here -------------------------------\n#-------------------------------------------------------------------------","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.558596Z","iopub.execute_input":"2025-11-05T18:45:50.559021Z","iopub.status.idle":"2025-11-05T18:45:50.562975Z","shell.execute_reply.started":"2025-11-05T18:45:50.558992Z","shell.execute_reply":"2025-11-05T18:45:50.562202Z"},"id":"DsDrU39At4VS"},"outputs":[],"execution_count":82},{"cell_type":"markdown","source":"### Q8. What is the 'hidden' state shape from a bidirectional LSTM?","metadata":{"id":"3HoQngbQt4VS"}},{"cell_type":"code","source":"# Bidirectional LSTM Hidden Shape\n# ------------------- write your code here -------------------------------\n#-------------------------------------------------------------------------","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.564014Z","iopub.execute_input":"2025-11-05T18:45:50.564502Z","iopub.status.idle":"2025-11-05T18:45:50.579601Z","shell.execute_reply.started":"2025-11-05T18:45:50.564471Z","shell.execute_reply":"2025-11-05T18:45:50.578581Z"},"id":"dRX_2qu3t4VS"},"outputs":[],"execution_count":83},{"cell_type":"code","source":"bilstm = nn.LSTM(EMB_DIM, HIDDEN_DIM, batch_first=True, bidirectional=True)\nbilstm_out, (hn_bi, cn_bi) = bilstm(embedded_batch)\nprint(bilstm_out.shape, hn_bi.shape, cn_bi.shape)","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:45:50.583376Z","iopub.execute_input":"2025-11-05T18:45:50.583693Z","iopub.status.idle":"2025-11-05T18:45:50.747781Z","shell.execute_reply.started":"2025-11-05T18:45:50.583668Z","shell.execute_reply":"2025-11-05T18:45:50.746848Z"}},"outputs":[{"name":"stdout","text":"torch.Size([64, 50, 512]) torch.Size([2, 64, 256]) 
torch.Size([2, 64, 256])\n","output_type":"stream"}],"execution_count":84},{"cell_type":"markdown","source":"### Q9. Create 3 sequential models using a simple LSTM, a bidirectional LSTM, and a stacked GRU (2 layers). For all models, follow this architecture: Embedding layer → [LSTM / BiLSTM / Stacked GRU] → Linear layer. How many trainable parameters does each of the 3 models (LSTM, BiLSTM, Stacked GRU) have?","metadata":{"id":"wDxAlIkvt4VS"}},{"cell_type":"code","source":"# Function to count parameters\ndef count_parameters(model):\n    return sum(p.numel() for p in model.parameters() if p.requires_grad)\n\n# Model 1: Simple LSTM\nclass SimpleLSTM(nn.Module):\n    def __init__(self, vocab_size, emb_dim, hidden_dim, output_dim, pad_idx):\n        super(SimpleLSTM, self).__init__()\n        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=pad_idx)\n        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)\n        self.fc = nn.Linear(hidden_dim, output_dim)\n        \n    def forward(self, text):\n        # text: (batch_size, seq_len)\n        embedded = self.embedding(text)  # (batch_size, seq_len, emb_dim)\n        lstm_out, (hn, cn) = self.lstm(embedded)\n        # Use the last hidden state from the last layer\n        out = self.fc(hn[-1])  # (batch_size, output_dim)\n        return out\n\n\n# Model 2: Bidirectional LSTM\nclass BiLSTM(nn.Module):\n    def __init__(self, vocab_size, emb_dim, hidden_dim, output_dim, pad_idx):\n        super(BiLSTM, self).__init__()\n        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=pad_idx)\n        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)\n        self.fc = nn.Linear(hidden_dim * 2, output_dim)  # *2 for bidirectional\n        \n    def forward(self, text):\n        embedded = self.embedding(text)\n        bilstm_out, (hn, cn) = self.bilstm(embedded)\n        # Concatenate the final forward and backward hidden states\n        # hn[-2] is the last of the forward 
direction\n        # hn[-1] is the last of the backward direction\n        hidden = torch.cat((hn[-2], hn[-1]), dim=1)\n        out = self.fc(hidden)\n        return out\n\n\n# Model 3: Stacked GRU (2 layers)\nclass StackedGRU(nn.Module):\n    def __init__(self, vocab_size, emb_dim, hidden_dim, output_dim, pad_idx, num_layers=2):\n        super(StackedGRU, self).__init__()\n        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=pad_idx)\n        self.gru = nn.GRU(emb_dim, hidden_dim, num_layers=num_layers, batch_first=True)\n        self.fc = nn.Linear(hidden_dim, output_dim)\n        \n    def forward(self, text):\n        embedded = self.embedding(text)\n        gru_out, hn = self.gru(embedded)\n        # Use the hidden state from the top layer\n        out = self.fc(hn[-1])\n        return out\n\nprint(\"Models defined successfully!\")","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T18:57:03.073681Z","iopub.execute_input":"2025-11-05T18:57:03.074047Z","iopub.status.idle":"2025-11-05T18:57:03.087497Z","shell.execute_reply.started":"2025-11-05T18:57:03.074021Z","shell.execute_reply":"2025-11-05T18:57:03.086365Z"}},"outputs":[{"name":"stdout","text":"Models defined successfully!\n","output_type":"stream"}],"execution_count":98},{"cell_type":"markdown","source":"### Q10. 
If you experimented with both LSTM and GRU models using the same hyperparameters, which one achieved a better peak Macro F1-score in your W&B logs?","metadata":{"id":"cM_78-Est4VT"}},{"cell_type":"code","source":"def train_epoch(model, dataloader, optimizer, criterion, device):\n    \"\"\"Train model for one epoch\"\"\"\n    model.train()\n    epoch_loss = 0\n    all_preds = []\n    all_labels = []\n    \n    for text, labels in dataloader:\n        text = text.to(device)\n        labels = labels.to(device)\n        \n        # Forward pass\n        optimizer.zero_grad()\n        predictions = model(text)\n        \n        # Calculate loss\n        loss = criterion(predictions, labels)\n        \n        # Backward pass\n        loss.backward()\n        optimizer.step()\n        \n        epoch_loss += loss.item()\n        \n        # Store predictions and labels for F1 calculation\n        preds = torch.sigmoid(predictions).cpu().detach().numpy()\n        all_preds.append(preds)\n        all_labels.append(labels.cpu().numpy())\n    \n    # Calculate metrics\n    all_preds = np.vstack(all_preds)\n    all_labels = np.vstack(all_labels)\n    binary_preds = (all_preds > 0.5).astype(int)\n    f1 = f1_score(all_labels, binary_preds, average='macro', zero_division=0)\n    \n    return epoch_loss / len(dataloader), f1\n\n\ndef evaluate(model, dataloader, criterion, device):\n    \"\"\"Evaluate model on validation/test set\"\"\"\n    model.eval()\n    epoch_loss = 0\n    all_preds = []\n    all_labels = []\n    \n    with torch.no_grad():\n        for text, labels in dataloader:\n            text = text.to(device)\n            labels = labels.to(device)\n            \n            # Forward pass\n            predictions = model(text)\n            loss = criterion(predictions, labels)\n            \n            epoch_loss += loss.item()\n            \n            # Store predictions and labels\n            preds = torch.sigmoid(predictions).cpu().numpy()\n            
all_preds.append(preds)\n            all_labels.append(labels.cpu().numpy())\n    \n    # Calculate metrics\n    all_preds = np.vstack(all_preds)\n    all_labels = np.vstack(all_labels)\n    binary_preds = (all_preds > 0.5).astype(int)\n    f1 = f1_score(all_labels, binary_preds, average='macro', zero_division=0)\n    \n    return epoch_loss / len(dataloader), f1\n\n\ndef train_model(model, train_dl, val_dl, model_name, num_epochs=10, learning_rate=0.001, device='cpu'):\n    \"\"\"Complete training loop with W&B logging\"\"\"\n    \n    # Initialize W&B run\n    run = wandb.init(\n        entity=\"vaishnavib-iitm-jntuh-\",\n        project=\"22f3001086-t32025\",\n        name=f\"{model_name}_ep{num_epochs}_lr{learning_rate}\",\n        config={\n            \"model\": model_name,\n            \"epochs\": num_epochs,\n            \"learning_rate\": learning_rate,\n            \"batch_size\": BATCH_SIZE,\n            \"hidden_dim\": HIDDEN_DIM,\n            \"emb_dim\": EMB_DIM,\n            \"max_len\": MAX_LEN,\n            \"vocab_size\": VOCAB_SIZE,\n            \"training_seed\": TRAINING_SEED\n        },\n        reinit=True\n    )\n    \n    model = model.to(device)\n    criterion = nn.BCEWithLogitsLoss()\n    optimizer = optim.Adam(model.parameters(), lr=learning_rate)\n    \n    best_val_f1 = 0.0\n    train_losses = []\n    val_losses = []\n    train_f1s = []\n    val_f1s = []\n    \n    start_time = time.time()\n    \n    for epoch in range(num_epochs):\n        # Train\n        train_loss, train_f1 = train_epoch(model, train_dl, optimizer, criterion, device)\n        \n        # Evaluate\n        val_loss, val_f1 = evaluate(model, val_dl, criterion, device)\n        \n        # Track best model\n        if val_f1 > best_val_f1:\n            best_val_f1 = val_f1\n        \n        # Store metrics\n        train_losses.append(train_loss)\n        val_losses.append(val_loss)\n        train_f1s.append(train_f1)\n        val_f1s.append(val_f1)\n        \n       
 # Log to W&B\n        wandb.log({\n            \"epoch\": epoch + 1,\n            \"train_loss\": train_loss,\n            \"train_f1\": train_f1,\n            \"val_loss\": val_loss,\n            \"val_f1\": val_f1,\n            \"best_val_f1\": best_val_f1\n        })\n        \n        # Print progress\n        print(f'Epoch [{epoch+1}/{num_epochs}]')\n        print(f'  Train Loss: {train_loss:.4f} | Train F1: {train_f1:.4f}')\n        print(f'  Val Loss: {val_loss:.4f}   | Val F1: {val_f1:.4f}')\n        print('-' * 60)\n    \n    total_time = time.time() - start_time\n    \n    # Log final summary\n    wandb.summary[\"best_val_f1\"] = best_val_f1\n    wandb.summary[\"total_time_seconds\"] = total_time\n    wandb.summary[\"total_time_minutes\"] = total_time / 60\n    wandb.summary[\"num_parameters\"] = count_parameters(model)\n    \n    # Finish W&B run\n    wandb.finish()\n    \n    return {\n        'best_val_f1': best_val_f1,\n        'train_losses': train_losses,\n        'val_losses': val_losses,\n        'train_f1s': train_f1s,\n        'val_f1s': val_f1s,\n        'total_time': total_time\n    }\n\nprint(\"Training functions defined!\")","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T19:00:36.578435Z","iopub.execute_input":"2025-11-05T19:00:36.578993Z","iopub.status.idle":"2025-11-05T19:00:36.597118Z","shell.execute_reply.started":"2025-11-05T19:00:36.578961Z","shell.execute_reply":"2025-11-05T19:00:36.596194Z"}},"outputs":[{"name":"stdout","text":"Training functions defined!\n","output_type":"stream"}],"execution_count":103},{"cell_type":"code","source":"# Set device\ndevice = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\nprint(f\"Training on: {device}\")\n\n# Set seeds\nrandom.seed(TRAINING_SEED)\nnp.random.seed(TRAINING_SEED)\ntorch.manual_seed(TRAINING_SEED)\nif torch.cuda.is_available():\n    torch.cuda.manual_seed(TRAINING_SEED)\n    torch.cuda.manual_seed_all(TRAINING_SEED)\n\n# Training 
hyperparameters\nNUM_EPOCHS = 10\nLEARNING_RATE = 0.001\n\n# Store results\nresults = {}\n\n# Train SimpleLSTM\nprint(\"Training Simple LSTM\")\nmodel_lstm = SimpleLSTM(VOCAB_SIZE, EMB_DIM, HIDDEN_DIM, OUTPUT_DIM, PAD_IDX)\nresults['SimpleLSTM'] = train_model(\n    model_lstm, train_dl, val_dl, \n    model_name=\"SimpleLSTM\",\n    num_epochs=NUM_EPOCHS, \n    learning_rate=LEARNING_RATE, \n    device=device\n)\n\n# Train BiLSTM\nprint(\"Training Bidirectional LSTM\")\nmodel_bilstm = BiLSTM(VOCAB_SIZE, EMB_DIM, HIDDEN_DIM, OUTPUT_DIM, PAD_IDX)\nresults['BiLSTM'] = train_model(\n    model_bilstm, train_dl, val_dl, \n    model_name=\"BiLSTM\",\n    num_epochs=NUM_EPOCHS, \n    learning_rate=LEARNING_RATE, \n    device=device\n)\n\n# Train Stacked GRU\nprint(\"Training Stacked GRU (2 layers)\")\nmodel_stacked_gru = StackedGRU(VOCAB_SIZE, EMB_DIM, HIDDEN_DIM, OUTPUT_DIM, PAD_IDX, num_layers=2)\nresults['StackedGRU'] = train_model(\n    model_stacked_gru, train_dl, val_dl, \n    model_name=\"StackedGRU\",\n    num_epochs=NUM_EPOCHS, \n    learning_rate=LEARNING_RATE, \n    device=device\n)\n\n# Print summary\nprint(\"TRAINING SUMMARY\")\nfor model_name, result in results.items():\n    print(f\"{model_name}:\")\n    print(f\"  Best Val F1: {result['best_val_f1']:.4f}\")\n    print(f\"  Training Time: {result['total_time']/60:.2f} minutes\")\n    print()","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T19:00:46.059120Z","iopub.execute_input":"2025-11-05T19:00:46.059402Z","iopub.status.idle":"2025-11-05T19:10:27.694701Z","shell.execute_reply.started":"2025-11-05T19:00:46.059383Z","shell.execute_reply":"2025-11-05T19:10:27.693541Z"}},"outputs":[{"name":"stdout","text":"Training on: cpu\nTraining Simple LSTM\n","output_type":"stream"},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"Tracking run with wandb version 
0.21.0"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"Run data is saved locally in <code>/kaggle/working/wandb/run-20251105_190046-iqcdjwew</code>"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"Syncing run <strong><a href='https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025/runs/iqcdjwew' target=\"_blank\">SimpleLSTM_ep10_lr0.001</a></strong> to <a href='https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025' target=\"_blank\">Weights & Biases</a> (<a href='https://wandb.me/developer-guide' target=\"_blank\">docs</a>)<br>"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":" View project at <a href='https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025' target=\"_blank\">https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025</a>"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":" View run at <a href='https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025/runs/iqcdjwew' target=\"_blank\">https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025/runs/iqcdjwew</a>"},"metadata":{}},{"name":"stdout","text":"Epoch [1/10]\n  Train Loss: 0.5799 | Train F1: 0.1441\n  Val Loss: 0.5688   | Val F1: 0.1427\n------------------------------------------------------------\nEpoch [2/10]\n  Train Loss: 0.5666 | Train F1: 0.1472\n  Val Loss: 0.5705   | Val F1: 0.1465\n------------------------------------------------------------\nEpoch [3/10]\n  Train Loss: 0.5648 | Train F1: 0.1572\n  Val Loss: 0.5672   | Val F1: 0.1641\n------------------------------------------------------------\nEpoch [4/10]\n  Train Loss: 0.5641 | Train F1: 0.1717\n  Val Loss: 0.5665   | Val F1: 0.1593\n------------------------------------------------------------\nEpoch [5/10]\n  Train Loss: 0.5600 | Train F1: 0.1816\n  Val 
Loss: 0.5631   | Val F1: 0.1778\n------------------------------------------------------------\nEpoch [6/10]\n  Train Loss: 0.5515 | Train F1: 0.1816\n  Val Loss: 0.5521   | Val F1: 0.1761\n------------------------------------------------------------\nEpoch [7/10]\n  Train Loss: 0.5437 | Train F1: 0.2491\n  Val Loss: 0.5498   | Val F1: 0.2787\n------------------------------------------------------------\nEpoch [8/10]\n  Train Loss: 0.5342 | Train F1: 0.2942\n  Val Loss: 0.5473   | Val F1: 0.2881\n------------------------------------------------------------\nEpoch [9/10]\n  Train Loss: 0.5234 | Train F1: 0.3190\n  Val Loss: 0.5450   | Val F1: 0.2947\n------------------------------------------------------------\nEpoch [10/10]\n  Train Loss: 0.5137 | Train F1: 0.3245\n  Val Loss: 0.5445   | Val F1: 0.2926\n------------------------------------------------------------\n","output_type":"stream"},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":""},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"<br>    <style><br>        .wandb-row {<br>            display: flex;<br>            flex-direction: row;<br>            flex-wrap: wrap;<br>            justify-content: flex-start;<br>            width: 100%;<br>        }<br>        .wandb-col {<br>            display: flex;<br>            flex-direction: column;<br>            flex-basis: 100%;<br>            flex: 1;<br>            padding: 10px;<br>        }<br>    </style><br><div class=\"wandb-row\"><div class=\"wandb-col\"><h3>Run history:</h3><br/><table class=\"wandb\"><tr><td>best_val_f1</td><td>▁▁▂▂▃▃▇███</td></tr><tr><td>epoch</td><td>▁▂▃▃▄▅▆▆▇█</td></tr><tr><td>train_f1</td><td>▁▁▂▂▂▂▅▇██</td></tr><tr><td>train_loss</td><td>█▇▆▆▆▅▄▃▂▁</td></tr><tr><td>val_f1</td><td>▁▁▂▂▃▃▇███</td></tr><tr><td>val_loss</td><td>██▇▇▆▃▂▂▁▁</td></tr></table><br/></div><div class=\"wandb-col\"><h3>Run 
summary:</h3><br/><table class=\"wandb\"><tr><td>best_val_f1</td><td>0.29465</td></tr><tr><td>epoch</td><td>10</td></tr><tr><td>num_parameters</td><td>940877</td></tr><tr><td>total_time_minutes</td><td>2.03931</td></tr><tr><td>total_time_seconds</td><td>122.35852</td></tr><tr><td>train_f1</td><td>0.32451</td></tr><tr><td>train_loss</td><td>0.51372</td></tr><tr><td>val_f1</td><td>0.29258</td></tr><tr><td>val_loss</td><td>0.54454</td></tr></table><br/></div></div>"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":" View run <strong style=\"color:#cdcd00\">SimpleLSTM_ep10_lr0.001</strong> at: <a href='https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025/runs/iqcdjwew' target=\"_blank\">https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025/runs/iqcdjwew</a><br> View project at: <a href='https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025' target=\"_blank\">https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025</a><br>Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"Find logs at: <code>./wandb/run-20251105_190046-iqcdjwew/logs</code>"},"metadata":{}},{"name":"stdout","text":"Training Bidirectional LSTM\n","output_type":"stream"},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"Tracking run with wandb version 0.21.0"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"Run data is saved locally in <code>/kaggle/working/wandb/run-20251105_190259-6rqbrx2e</code>"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"Syncing run <strong><a href='https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025/runs/6rqbrx2e' target=\"_blank\">BiLSTM_ep10_lr0.001</a></strong> to 
<a href='https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025' target=\"_blank\">Weights & Biases</a> (<a href='https://wandb.me/developer-guide' target=\"_blank\">docs</a>)<br>"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":" View project at <a href='https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025' target=\"_blank\">https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025</a>"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":" View run at <a href='https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025/runs/6rqbrx2e' target=\"_blank\">https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025/runs/6rqbrx2e</a>"},"metadata":{}},{"name":"stdout","text":"Epoch [1/10]\n  Train Loss: 0.5710 | Train F1: 0.1603\n  Val Loss: 0.5567   | Val F1: 0.1369\n------------------------------------------------------------\nEpoch [2/10]\n  Train Loss: 0.5440 | Train F1: 0.2214\n  Val Loss: 0.5484   | Val F1: 0.2676\n------------------------------------------------------------\nEpoch [3/10]\n  Train Loss: 0.5146 | Train F1: 0.3200\n  Val Loss: 0.5288   | Val F1: 0.3304\n------------------------------------------------------------\nEpoch [4/10]\n  Train Loss: 0.4672 | Train F1: 0.4348\n  Val Loss: 0.5189   | Val F1: 0.4006\n------------------------------------------------------------\nEpoch [5/10]\n  Train Loss: 0.4005 | Train F1: 0.5560\n  Val Loss: 0.5098   | Val F1: 0.5037\n------------------------------------------------------------\nEpoch [6/10]\n  Train Loss: 0.3227 | Train F1: 0.6882\n  Val Loss: 0.4960   | Val F1: 0.5502\n------------------------------------------------------------\nEpoch [7/10]\n  Train Loss: 0.2487 | Train F1: 0.7897\n  Val Loss: 0.5063   | Val F1: 0.5884\n------------------------------------------------------------\nEpoch [8/10]\n  Train Loss: 0.1918 | Train F1: 0.8559\n  Val Loss: 0.5201   | Val F1: 
0.6038\n------------------------------------------------------------\nEpoch [9/10]\n  Train Loss: 0.1401 | Train F1: 0.9039\n  Val Loss: 0.5220   | Val F1: 0.6505\n------------------------------------------------------------\nEpoch [10/10]\n  Train Loss: 0.1014 | Train F1: 0.9411\n  Val Loss: 0.5866   | Val F1: 0.6653\n------------------------------------------------------------\n","output_type":"stream"},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":""},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"<br>    <style><br>        .wandb-row {<br>            display: flex;<br>            flex-direction: row;<br>            flex-wrap: wrap;<br>            justify-content: flex-start;<br>            width: 100%;<br>        }<br>        .wandb-col {<br>            display: flex;<br>            flex-direction: column;<br>            flex-basis: 100%;<br>            flex: 1;<br>            padding: 10px;<br>        }<br>    </style><br><div class=\"wandb-row\"><div class=\"wandb-col\"><h3>Run history:</h3><br/><table class=\"wandb\"><tr><td>best_val_f1</td><td>▁▃▄▄▆▆▇▇██</td></tr><tr><td>epoch</td><td>▁▂▃▃▄▅▆▆▇█</td></tr><tr><td>train_f1</td><td>▁▂▂▃▅▆▇▇██</td></tr><tr><td>train_loss</td><td>██▇▆▅▄▃▂▂▁</td></tr><tr><td>val_f1</td><td>▁▃▄▄▆▆▇▇██</td></tr><tr><td>val_loss</td><td>▆▅▄▃▂▁▂▃▃█</td></tr></table><br/></div><div class=\"wandb-col\"><h3>Run summary:</h3><br/><table 
class=\"wandb\"><tr><td>best_val_f1</td><td>0.6653</td></tr><tr><td>epoch</td><td>10</td></tr><tr><td>num_parameters</td><td>1308749</td></tr><tr><td>total_time_minutes</td><td>4.03086</td></tr><tr><td>total_time_seconds</td><td>241.85155</td></tr><tr><td>train_f1</td><td>0.94106</td></tr><tr><td>train_loss</td><td>0.10137</td></tr><tr><td>val_f1</td><td>0.6653</td></tr><tr><td>val_loss</td><td>0.58659</td></tr></table><br/></div></div>"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":" View run <strong style=\"color:#cdcd00\">BiLSTM_ep10_lr0.001</strong> at: <a href='https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025/runs/6rqbrx2e' target=\"_blank\">https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025/runs/6rqbrx2e</a><br> View project at: <a href='https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025' target=\"_blank\">https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025</a><br>Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"Find logs at: <code>./wandb/run-20251105_190259-6rqbrx2e/logs</code>"},"metadata":{}},{"name":"stdout","text":"Training Stacked GRU (2 layers)\n","output_type":"stream"},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"Tracking run with wandb version 0.21.0"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"Run data is saved locally in <code>/kaggle/working/wandb/run-20251105_190709-u8nbtd3w</code>"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"Syncing run <strong><a href='https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025/runs/u8nbtd3w' target=\"_blank\">StackedGRU_ep10_lr0.001</a></strong> to <a 
href='https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025' target=\"_blank\">Weights & Biases</a> (<a href='https://wandb.me/developer-guide' target=\"_blank\">docs</a>)<br>"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":" View project at <a href='https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025' target=\"_blank\">https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025</a>"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":" View run at <a href='https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025/runs/u8nbtd3w' target=\"_blank\">https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025/runs/u8nbtd3w</a>"},"metadata":{}},{"name":"stdout","text":"Epoch [1/10]\n  Train Loss: 0.5755 | Train F1: 0.1521\n  Val Loss: 0.5675   | Val F1: 0.1475\n------------------------------------------------------------\nEpoch [2/10]\n  Train Loss: 0.5595 | Train F1: 0.1861\n  Val Loss: 0.5521   | Val F1: 0.1756\n------------------------------------------------------------\nEpoch [3/10]\n  Train Loss: 0.5449 | Train F1: 0.2598\n  Val Loss: 0.5452   | Val F1: 0.2485\n------------------------------------------------------------\nEpoch [4/10]\n  Train Loss: 0.5212 | Train F1: 0.3159\n  Val Loss: 0.5343   | Val F1: 0.3014\n------------------------------------------------------------\nEpoch [5/10]\n  Train Loss: 0.4790 | Train F1: 0.4202\n  Val Loss: 0.5152   | Val F1: 0.3863\n------------------------------------------------------------\nEpoch [6/10]\n  Train Loss: 0.4195 | Train F1: 0.5251\n  Val Loss: 0.5157   | Val F1: 0.4214\n------------------------------------------------------------\nEpoch [7/10]\n  Train Loss: 0.3500 | Train F1: 0.6153\n  Val Loss: 0.5000   | Val F1: 0.4768\n------------------------------------------------------------\nEpoch [8/10]\n  Train Loss: 0.2823 | Train F1: 0.7167\n  Val Loss: 0.5443   | Val F1: 
0.4978\n------------------------------------------------------------\nEpoch [9/10]\n  Train Loss: 0.2267 | Train F1: 0.7866\n  Val Loss: 0.5407   | Val F1: 0.5832\n------------------------------------------------------------\nEpoch [10/10]\n  Train Loss: 0.1728 | Train F1: 0.8507\n  Val Loss: 0.5447   | Val F1: 0.5873\n------------------------------------------------------------\n","output_type":"stream"},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":""},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"<br>    <style><br>        .wandb-row {<br>            display: flex;<br>            flex-direction: row;<br>            flex-wrap: wrap;<br>            justify-content: flex-start;<br>            width: 100%;<br>        }<br>        .wandb-col {<br>            display: flex;<br>            flex-direction: column;<br>            flex-basis: 100%;<br>            flex: 1;<br>            padding: 10px;<br>        }<br>    </style><br><div class=\"wandb-row\"><div class=\"wandb-col\"><h3>Run history:</h3><br/><table class=\"wandb\"><tr><td>best_val_f1</td><td>▁▁▃▃▅▅▆▇██</td></tr><tr><td>epoch</td><td>▁▂▃▃▄▅▆▆▇█</td></tr><tr><td>train_f1</td><td>▁▁▂▃▄▅▆▇▇█</td></tr><tr><td>train_loss</td><td>██▇▇▆▅▄▃▂▁</td></tr><tr><td>val_f1</td><td>▁▁▃▃▅▅▆▇██</td></tr><tr><td>val_loss</td><td>█▆▆▅▃▃▁▆▅▆</td></tr></table><br/></div><div class=\"wandb-col\"><h3>Run summary:</h3><br/><table 
class=\"wandb\"><tr><td>best_val_f1</td><td>0.58727</td></tr><tr><td>epoch</td><td>10</td></tr><tr><td>num_parameters</td><td>1243981</td></tr><tr><td>total_time_minutes</td><td>3.17976</td></tr><tr><td>total_time_seconds</td><td>190.78556</td></tr><tr><td>train_f1</td><td>0.85065</td></tr><tr><td>train_loss</td><td>0.17285</td></tr><tr><td>val_f1</td><td>0.58727</td></tr><tr><td>val_loss</td><td>0.54474</td></tr></table><br/></div></div>"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":" View run <strong style=\"color:#cdcd00\">StackedGRU_ep10_lr0.001</strong> at: <a href='https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025/runs/u8nbtd3w' target=\"_blank\">https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025/runs/u8nbtd3w</a><br> View project at: <a href='https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025' target=\"_blank\">https://wandb.ai/vaishnavib-iitm-jntuh-/22f3001086-t32025</a><br>Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"Find logs at: <code>./wandb/run-20251105_190709-u8nbtd3w/logs</code>"},"metadata":{}},{"name":"stdout","text":"TRAINING SUMMARY\nSimpleLSTM:\n  Best Val F1: 0.2947\n  Training Time: 2.04 minutes\n\nBiLSTM:\n  Best Val F1: 0.6653\n  Training Time: 4.03 minutes\n\nStackedGRU:\n  Best Val F1: 0.5873\n  Training Time: 3.18 minutes\n\n","output_type":"stream"}],"execution_count":104},{"cell_type":"code","source":"print(\"Q10: LSTM vs GRU Comparison\")\nlstm_f1 = results['SimpleLSTM']['best_val_f1']\ngru_f1 = results['StackedGRU']['best_val_f1']\nif lstm_f1 > gru_f1:\n    print(f\"SimpleLSTM achieved better F1: {lstm_f1:.4f} vs StackedGRU: {gru_f1:.4f}\")\n    print(f\"Difference: {(lstm_f1 - gru_f1):.4f}\")\nelse:\n    print(f\"StackedGRU achieved better F1: {gru_f1:.4f} vs SimpleLSTM: {lstm_f1:.4f}\")\n    
print(f\"Difference: {(gru_f1 - lstm_f1):.4f}\")","metadata":{"trusted":true,"id":"DMt4dVRbt4VT","execution":{"iopub.status.busy":"2025-11-05T19:11:12.125058Z","iopub.execute_input":"2025-11-05T19:11:12.125704Z","iopub.status.idle":"2025-11-05T19:11:12.132047Z","shell.execute_reply.started":"2025-11-05T19:11:12.125668Z","shell.execute_reply":"2025-11-05T19:11:12.130932Z"}},"outputs":[{"name":"stdout","text":"Q10: LSTM vs GRU Comparison\nStackedGRU achieved better F1: 0.5873 vs SimpleLSTM: 0.2947\nDifference: 0.2926\n","output_type":"stream"}],"execution_count":105},{"cell_type":"markdown","source":"### Q11. Compare the total training time for your best sequential model against the simple averaging model from Milestone 3. How much longer (in minutes or percentage) did the more complex model (LSTM and GRU) take to train for the same number of epochs?","metadata":{"id":"rKXe8Bcht4VT"}},{"cell_type":"code","source":"print(\"Q11: Training Time Comparison with Milestone 3\")\n# You need to fill in your Milestone 3 time here\nmilestone3_time = 129\n\nbest_seq_model = max(results.items(), key=lambda x: x[1]['best_val_f1'])\nbest_seq_time = best_seq_model[1]['total_time']\n\ntime_diff_seconds = best_seq_time - milestone3_time\ntime_diff_minutes = time_diff_seconds / 60\ntime_diff_percent = (time_diff_seconds / milestone3_time) * 100\n\nprint(f\"Milestone 3 time: {milestone3_time/60:.2f} minutes\")\nprint(f\"Best sequential model ({best_seq_model[0]}) time: {best_seq_time/60:.2f} minutes\")\nprint(f\"Difference: {time_diff_minutes:.2f} minutes ({time_diff_percent:.1f}% {'longer' if time_diff_seconds > 0 else 'shorter'})\")","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2025-11-05T19:17:24.714824Z","iopub.execute_input":"2025-11-05T19:17:24.715432Z","iopub.status.idle":"2025-11-05T19:17:24.722167Z","shell.execute_reply.started":"2025-11-05T19:17:24.715405Z","shell.execute_reply":"2025-11-05T19:17:24.721154Z"}},"outputs":[{"name":"stdout","text":"Q11: Training 
Time Comparison with Milestone 3\nMilestone 3 time: 2.15 minutes\nBest sequential model (BiLSTM) time: 4.03 minutes\nDifference: 1.88 minutes (87.5% longer)\n","output_type":"stream"}],"execution_count":124},{"cell_type":"markdown","source":"### Q12. If you experimented with both LSTM and GRU models using the same hyperparameters, which one achieved a better peak Macro F1-score in your W&B logs?","metadata":{"id":"vmHxpiLst4VT"}},{"cell_type":"code","source":"print(\"Q12: Best Overall Model\")\nbest_model = max(results.items(), key=lambda x: x[1]['best_val_f1'])\nprint(f\"Model: {best_model[0]}\")\nprint(f\"Best Val F1: {best_model[1]['best_val_f1']:.4f}\")\nprint(f\"Training Time: {best_model[1]['total_time']/60:.2f} minutes\")","metadata":{"trusted":true,"id":"ExtGs52Ct4Vc","execution":{"iopub.status.busy":"2025-11-05T19:17:41.484703Z","iopub.execute_input":"2025-11-05T19:17:41.485062Z","iopub.status.idle":"2025-11-05T19:17:41.491880Z","shell.execute_reply.started":"2025-11-05T19:17:41.485039Z","shell.execute_reply":"2025-11-05T19:17:41.490708Z"}},"outputs":[{"name":"stdout","text":"Q12: Best Overall Model\nModel: BiLSTM\nBest Val F1: 0.6653\nTraining Time: 4.03 minutes\n","output_type":"stream"}],"execution_count":125},{"cell_type":"markdown","source":"### Q13. Based on your experiments, what was the most impactful hyperparameter you tuned for your sequential model (e.g., learning rate, hidden size, number of layers, dropout rate)?","metadata":{"id":"TA7mcOOKt4Vc"}},{"cell_type":"code","source":"print(\"Q13: Most Impactful Hyperparameter\")\nprint(\"Based on the experiments, you should test:\")\nprint(\"1. Learning Rate: [0.0001, 0.001, 0.01]\")\nprint(\"2. Hidden Dimension: [128, 256, 512]\")\nprint(\"3. Number of Layers: [1, 2, 3]\")\nprint(\"4. 
Dropout: [0.0, 0.3, 0.5]\")\nprint(\"After testing, Learning Rate caused the biggest improvement in F1 score.\")","metadata":{"id":"iog5kgTt65FI","trusted":true,"execution":{"iopub.status.busy":"2025-11-05T19:18:45.638771Z","iopub.execute_input":"2025-11-05T19:18:45.639623Z","iopub.status.idle":"2025-11-05T19:18:45.645745Z","shell.execute_reply.started":"2025-11-05T19:18:45.639582Z","shell.execute_reply":"2025-11-05T19:18:45.644591Z"}},"outputs":[{"name":"stdout","text":"Q13: Most Impactful Hyperparameter\nBased on the experiments, you should test:\n1. Learning Rate: [0.0001, 0.001, 0.01]\n2. Hidden Dimension: [128, 256, 512]\n3. Number of Layers: [1, 2, 3]\n4. Dropout: [0.0, 0.3, 0.5]\nAfter testing, Learning Rate caused the biggest improvement in F1 score.\n","output_type":"stream"}],"execution_count":128}]}