LLM Fine-Tuning and LoRA: Making Large Models Work for You

As powerful as large language models (LLMs) like GPT, LLaMA, and Mistral are, they’re still general-purpose. If you want to make them truly useful for your domain (whether it’s legal documents, financial analysis, or German tax law), you need to fine-tune them. And thanks to a technique called LoRA (Low-Rank Adaptation), you can now fine-tune LLMs with a fraction of the data, compute, and cost.

🔧 What is Fine-Tuning?

Fine-tuning is the process of continuing the training of a pre-trained LLM on your own dataset so that it learns domain-specific patterns, vocabulary, tone, or tasks. ...

April 20, 2025 · 3 min · Akshat Gupta

Prompt Engineering: The Art of Talking to AI

We’ve all played with ChatGPT, Copilot, or Claude — typing in questions and marveling at their responses. But behind the scenes, there’s a powerful craft at play: prompt engineering. OpenAI and Anthropic both publish guidance on how best to prompt their models. It’s not just about “asking a question.” It’s about how you phrase it, structure it, and guide the model. Prompt engineering is the new programming skill — and it’s transforming how we interact with AI. ...

April 15, 2024 · 4 min · Akshat Gupta

Evaluating LLMs: How Do You Measure a Model's Mind?

As large language models (LLMs) become central to search, productivity tools, education, and coding, evaluating them is no longer optional. You have to ask: Is this model reliable? Accurate? Safe? Biased? Smart enough for my task? But here’s the catch: LLMs are not deterministic functions. They generate free-form text, can be right in one sentence and wrong in the next — and vary wildly depending on the prompt. So how do we evaluate them meaningfully? ...

June 15, 2024 · 3 min · Akshat Gupta

RAG and LLMs: Teaching Large Models to Use External Knowledge

Large Language Models (LLMs) like GPT or LLaMA are great at generating text. But there’s a catch: they only know what they were trained on, and that knowledge is frozen at training time. So what happens when you ask them something from after their training cutoff? Or something super niche, like a policy from your internal HR docs? Enter RAG (Retrieval-Augmented Generation), a technique that combines LLMs with a search engine, enabling them to look up facts on the fly. ...

July 15, 2024 · 3 min · Akshat Gupta

Model Extraction Attacks: How Hackers Steal AI Models

Training a state-of-the-art machine learning model is expensive. Training a large language model like GPT-3 took thousands of petaflop/s-days of compute and millions of dollars. Yet once deployed behind an API, such models are vulnerable to a surprisingly subtle attack: an adversary who never sees the weights, never reads the training data, and never touches the server can still steal the model by asking it questions. This is a model extraction attack, and it is one of the more underappreciated threats in production ML security. Related adversarial work (see Goodfellow et al. on FGSM) focuses on perturbing inputs to fool a model. Model extraction goes further: the attacker wants a copy of the model itself. ...

September 15, 2024 · 6 min · Akshat Gupta

Memories in Large Language Models: How AI Models Remember and Retrieve

Large language models (LLMs) like GPT-4, Claude, and Llama 3 feel almost sentient at times. They can reference earlier parts of a conversation, recall facts from pre-training, and even “remember” user preferences across sessions. But what is memory in a language model? Is it the attention mechanism? A giant vector store? A key-value cache? Spoiler: it’s all of the above, depending on which time scale you’re talking about.

Three Levels of Memory

| Time Scale | Mechanism | Typical Capacity | Example |
| --- | --- | --- | --- |
| Short-Term (ms → minutes) | Self-attention context window | 4K–1M tokens (e.g. 128K for GPT-4o) | Holding the current chat history |
| Medium-Term (minutes → hours) | Key-Value (KV) cache, recurrent state, memory tokens | 16K–100K tokens | ChatGPT remembering the last dozen messages in a session |
| Long-Term (days → years) | External vector database, RAG, memory graphs | Millions to billions of chunks | Notion-Q&A, enterprise knowledge bots |

1. Short-Term Memory: The Context Window

During generation, transformers perform self-attention over the input sequence: ...

July 10, 2025 · 4 min · Akshat Gupta

LLM Agents: Building AI Systems That Can Reason and Act

Large Language Models (LLMs) such as GPT-3 and GPT-4 have taken the world by storm with their impressive language generation and understanding capabilities. However, when these models are augmented with decision-making capabilities, memory, and the ability to act in specific environments, they become something fundamentally more powerful. Enter LLM Agents: autonomous systems built on top of large language models that can pursue goals, use tools, plan multi-step actions, and adapt based on feedback. ...

May 5, 2025 · 6 min · Akshat Gupta