Understanding Attention in Transformers: The Core of Modern NLP

When people say “Transformers revolutionized NLP,” what they really mean is: Attention revolutionized NLP. From GPT and BERT to LLaMA and Claude, attention mechanisms are the beating heart of modern large language models. But what exactly is attention? Why is it so powerful? And how many types are there? Let’s dive in.

🧠 What is Attention?

In the simplest sense, attention is a way for a model to focus on the most relevant parts of the input when generating output. ...
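That "focus on the most relevant parts" can be made concrete with a tiny scaled dot-product attention sketch. This is an illustrative single-head toy in NumPy, not the post's own code; the function name and shapes are assumptions for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy single-head attention: each query scores all keys, then
    takes a softmax-weighted average of the values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V, weights                        # blended values + focus map

# 3 tokens with 4-dim embeddings; self-attention uses the same x as Q, K, V
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(x, x, x)
print(w.sum(axis=-1))   # each row of attention weights sums to 1
```

The weight matrix `w` is the "focus": row *i* says how much token *i* attends to every other token.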

August 15, 2024 · 3 min · Akshat Gupta

Model Extraction Attacks: How Hackers Steal AI Models

Training a state-of-the-art machine learning model is expensive. Large language models like GPT-3 required thousands of petaflop/s-days of compute and millions of dollars. Yet once deployed behind an API, they are vulnerable to a surprisingly subtle attack: an adversary who never sees the weights, never reads the training data, and never touches the server — but can still steal the model by asking it questions. This is a model extraction attack, and it is one of the more underappreciated threats in production ML security. Related adversarial work — see Goodfellow et al. on FGSM — focuses on perturbing inputs to fool a model. Model extraction goes further: the attacker wants a copy of the model itself. ...
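The "steal it by asking questions" idea can be sketched in a few lines. Here the hypothetical victim is a noiseless linear model behind a query-only interface, and the attacker fits a surrogate by least squares on its own query/answer pairs; real extraction attacks target far richer models, but the query-then-fit loop is the same.

```python
import numpy as np

rng = np.random.default_rng(42)
true_w = rng.normal(size=5)        # the victim's secret weights, never exposed

def victim_api(x):
    """Black-box prediction endpoint: inputs in, scores out, nothing else."""
    return x @ true_w

# Attacker: probe the API with self-chosen queries and record the answers...
queries = rng.normal(size=(200, 5))
answers = victim_api(queries)

# ...then fit a surrogate model to the collected input/output pairs.
stolen_w, *_ = np.linalg.lstsq(queries, answers, rcond=None)
print(np.allclose(stolen_w, true_w))   # → True: the copy matches the victim
```

With 200 independent queries against a 5-parameter linear victim, the system is heavily overdetermined, so the surrogate recovers the weights exactly; noisy or nonlinear victims only make the attack slower, not impossible.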

September 15, 2024 · 6 min · Akshat Gupta

Memories in Large Language Models: How AI Models Remember and Retrieve

Large language models (LLMs) like GPT-4, Claude, and Llama 3 feel almost sentient at times. They can reference earlier parts of a conversation, recall facts from pre-training, and even “remember” user preferences across sessions. But what is memory in a language model? Is it the attention mechanism? A giant vector store? A key-value cache? Spoiler: it’s all of the above, depending on which time scale you’re talking about.

Three Levels of Memory

| Time Scale | Mechanism | Typical Capacity | Example |
| --- | --- | --- | --- |
| Short-Term (ms → minutes) | Self-attention context window | 4K–1M tokens (GPT-4o) | Holding the current chat history |
| Medium-Term (minutes → hours) | Key-Value (KV) cache, recurrent state, memory tokens | 16K–100K tokens | ChatGPT remembering the last dozen messages in a session |
| Long-Term (days → years) | External vector database, RAG, memory graphs | Millions to billions of chunks | Notion Q&A, enterprise knowledge bots |

1. Short-Term Memory: The Context Window

During generation, transformers perform self-attention over the input sequence: ...
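The medium-term mechanism from the table, the KV cache, can be sketched as follows. This is a toy single-head decoder step in NumPy (the class and method names are mine, not from the post): each new token appends its key and value once, then attends over everything cached so far instead of re-encoding the whole history.

```python
import numpy as np

class KVCache:
    """Minimal sketch: keep past keys/values so each decode step only
    computes attention for the newest token."""
    def __init__(self):
        self.keys, self.values = [], []

    def attend(self, q, k, v):
        self.keys.append(k)                    # remember this token's key/value
        self.values.append(v)
        K = np.stack(self.keys)                # (t, d): all tokens so far
        V = np.stack(self.values)
        scores = K @ q / np.sqrt(len(q))       # new query vs. cached keys
        w = np.exp(scores - scores.max())
        w /= w.sum()                           # softmax over the history
        return w @ V                           # context vector for this step

cache = KVCache()
rng = np.random.default_rng(1)
for step in range(5):                          # decode 5 tokens one at a time
    q = k = v = rng.normal(size=8)
    out = cache.attend(q, k, v)
print(len(cache.keys))   # 5 cached entries, one per generated token
```

This is why the cache behaves like medium-term memory: it persists for the session and grows with each generated token, but vanishes when the sequence is discarded.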

July 10, 2025 · 4 min · Akshat Gupta

Speaker Anonymization: Protecting Voice Identity in the AI Era

Every time you speak to a voice assistant, attend a recorded meeting, or submit audio to a diagnostic tool, your voice reveals something deeply personal: your identity. Unlike a password, you cannot change your voice. This makes speaker anonymization — the task of modifying speech so a speaker cannot be identified, while keeping the content intact — one of the more important problems in applied AI privacy. Speaker diarization tells us who spoke and when. Speaker anonymization does the inverse — it ensures that even if someone has the audio, they cannot determine who it was. ...

October 15, 2024 · 7 min · Akshat Gupta

LLM Agents: Building AI Systems That Can Reason and Act

Large Language Models (LLMs) like GPT-3, GPT-4, and others have taken the world by storm thanks to their impressive language generation and understanding capabilities. But when these models are augmented with decision-making capabilities, memory, and the ability to act in specific environments, they become something fundamentally more powerful. Enter LLM Agents — autonomous systems built on top of large language models that can pursue goals, use tools, plan multi-step actions, and adapt based on feedback. ...
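The plan-act-adapt loop behind such agents can be sketched in miniature. Here a hard-coded `toy_policy` stands in for the LLM and a calculator is the only tool; every name in this sketch is an assumption for illustration, and a real agent would prompt a model to choose the next action.

```python
def calculator(expr):
    """The one tool this toy agent can use (demo only: never eval untrusted input)."""
    return eval(expr, {"__builtins__": {}})

def toy_policy(goal, observations):
    """Stand-in for an LLM: pick the next action given the feedback so far."""
    if not observations:
        return ("use_tool", goal)              # plan: compute the expression first
    return ("finish", f"The answer is {observations[-1]}")

def run_agent(goal, max_steps=5):
    observations = []
    for _ in range(max_steps):                 # act, observe, adapt, repeat
        action, arg = toy_policy(goal, observations)
        if action == "finish":
            return arg
        observations.append(calculator(arg))   # feed the tool result back in

print(run_agent("2 * (3 + 4)"))   # → The answer is 14
```

The loop structure — decide, call a tool, fold the observation back into the next decision — is the essential difference between a bare LLM and an agent.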

May 5, 2025 · 6 min · Akshat Gupta