Cyber Valley AI Startup Bootcamp 2023


Date: April 2023 · Locations: Stuttgart · Tübingen · Zurich

The Cyber Valley AI Startup Bootcamp is one of Europe’s most prestigious AI entrepreneurship programs, organized by Cyber Valley — Germany’s largest research consortium for artificial intelligence, headquartered across Stuttgart and Tübingen.

About the Program

The bootcamp brought together researchers, engineers, and entrepreneurs from across Europe to explore the intersection of cutting-edge AI research and real-world product development. Sessions spanned Stuttgart and Tübingen (home to the Max Planck Institute and University of Tübingen) before culminating in Zurich, giving participants exposure to both the academic and startup ecosystems in the DACH region. ...

April 1, 2023 · 1 min
Diffusion model forward and reverse process (Ho et al.)

What are Diffusion Models?

Generative modeling is currently one of the most thrilling domains in deep learning research. Traditional models like Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) have already demonstrated impressive capabilities in synthetically generating realistic data, such as images and text. However, diffusion models are swiftly gaining prominence as a powerful approach to high-quality and stable generative modeling. This blog explores diffusion models, examining their operational mechanisms, architectural designs, training processes, sampling methods, and the key advantages that position them at the forefront of generative AI. ...
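The forward (noising) process from Ho et al. pictured above admits a closed form, q(x_t | x_0) = N(√ᾱ_t·x_0, (1−ᾱ_t)·I). A minimal NumPy sketch, with a linear noise schedule and toy 1-D data chosen purely for illustration:

```python
import numpy as np

# Closed-form DDPM forward (noising) process (Ho et al., 2020).
# betas/alpha_bar follow the paper's notation; data is a toy 1-D sample.
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)       # \bar{alpha}_t = prod of alphas up to t

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) directly, without iterating t steps."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal(64)
x_mid = q_sample(x0, 500, rng)    # partially noised
x_end = q_sample(x0, T - 1, rng)  # nearly pure Gaussian noise
```

At t = T−1, ᾱ_t is vanishingly small, so x_t is essentially pure noise — which is exactly what the reverse (denoising) process learns to invert.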

February 15, 2024 · 4 min · Akshat Gupta
MESH Hackathon Stuttgart 2023


Date: April 2023 · Location: Stuttgart, Germany

MESH is a Stuttgart-based innovation and entrepreneurship platform that brings together students, researchers, and industry professionals to tackle real-world challenges through intensive hackathons.

Format

Over an intensive weekend, teams collaborated to build working AI prototypes addressing practical problems in areas ranging from healthcare and sustainability to finance and productivity. The hackathon format pushed participants to move fast — from ideation to demo-ready product within 48 hours. ...

April 15, 2023 · 1 min
Hands-on Deep Learning with TensorFlow 2.0


View on Packt · GitHub

Publisher: Packt Publishing · Author: Akshat Gupta

About the Book

Hands-on Deep Learning with TensorFlow 2.0 is a practical guide to building, training, and deploying deep learning models using TensorFlow 2.0 and Keras. The book is designed for practitioners who want to move beyond theory and build real neural network systems from scratch.

What It Covers

- Neural network fundamentals — perceptrons, activation functions, backpropagation
- CNNs — convolutional layers, pooling, image classification pipelines
- RNNs and LSTMs — sequence modelling, text classification, time series
- Transfer learning — fine-tuning pre-trained models for custom tasks
- Model deployment — TensorFlow Serving, SavedModel format, production considerations
- TensorFlow 2.0 specifics — eager execution, tf.function, Keras functional API

Who It’s For

Developers and data scientists who are comfortable with Python and want a hands-on introduction to deep learning using one of the most widely adopted frameworks in industry. ...

March 1, 2023 · 1 min

Evaluating LLMs: How Do You Measure a Model's Mind?

As large language models (LLMs) become central to search, productivity tools, education, and coding, evaluating them is no longer optional. You have to ask: Is this model reliable? Accurate? Safe? Biased? Smart enough for my task? But here’s the catch: LLMs are not deterministic functions. They generate free-form text, can be right in one sentence and wrong in the next — and vary wildly depending on the prompt. So how do we evaluate them meaningfully? ...
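One concrete starting point for "evaluating them meaningfully" is a task-level metric such as exact-match accuracy. The sketch below is illustrative only: `model` is a hypothetical stand-in for a real LLM API call, and the tiny QA set is made up; production harnesses add answer normalization rules, sampling, and many more metrics.

```python
# Minimal exact-match evaluation harness. `model` and `dataset` are toy
# stand-ins (assumptions, not a real API) used only to show the shape of
# an eval loop.

def normalize(text: str) -> str:
    """Crude answer normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().strip().split())

def exact_match_accuracy(model, dataset) -> float:
    hits = sum(normalize(model(q)) == normalize(a) for q, a in dataset)
    return hits / len(dataset)

def model(prompt: str) -> str:
    """Hypothetical model: canned answers in place of a real LLM call."""
    canned = {"Capital of France?": "Paris", "2 + 2 = ?": "4"}
    return canned.get(prompt, "I don't know")

dataset = [
    ("Capital of France?", "paris"),
    ("2 + 2 = ?", "4"),
    ("Largest planet?", "Jupiter"),
]
score = exact_match_accuracy(model, dataset)
```

Even this toy shows the catch the post describes: because LLM output is free-form text, the `normalize` step (and its failure modes) matters as much as the model itself.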

June 15, 2024 · 3 min · Akshat Gupta

Understanding Attention in Transformers: The Core of Modern NLP

When people say “Transformers revolutionized NLP,” what they really mean is: Attention revolutionized NLP. From GPT and BERT to LLaMA and Claude, attention mechanisms are the beating heart of modern large language models. But what exactly is attention? Why is it so powerful? And how many types are there? Let’s dive in. 🧠 What is Attention? In the simplest sense, attention is a way for a model to focus on the most relevant parts of the input when generating output. ...
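The "focus on the most relevant parts" intuition is literal: scaled dot-product attention (Vaswani et al., 2017) computes a weighted average of values, where the weights come from query–key similarity. A single-head NumPy sketch with toy shapes:

```python
import numpy as np

# Scaled dot-product attention for one head: weights = softmax(QK^T / sqrt(d_k)).

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # stabilize before exponentiating
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)    # similarity of each query to each key
    weights = softmax(scores, axis=-1) # each row sums to 1: where to "focus"
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out, w = attention(Q, K, V)
```

Each output row is a convex combination of the value rows — the model "attends" more to positions whose keys match its query.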

August 15, 2024 · 3 min · Akshat Gupta

Model Extraction Attacks: How Hackers Steal AI Models

Training a state-of-the-art machine learning model is expensive. Large language models like GPT-3 required thousands of petaflop/s-days of compute and millions of dollars. Yet once deployed behind an API, they are vulnerable to a surprisingly subtle attack: an adversary who never sees the weights, never reads the training data, and never touches the server — but can still steal the model by asking it questions. This is a model extraction attack, and it is one of the more underappreciated threats in production ML security. Related adversarial work — see Goodfellow et al. on FGSM — focuses on perturbing inputs to fool a model. Model extraction goes further: the attacker wants a copy of the model itself. ...
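A deliberately tiny illustration of the principle: here the "victim" is a secret linear model exposed only through a query endpoint, and the attacker recovers its parameters purely from input/output pairs. Real attacks target neural networks and need far more queries, but the query-then-fit-a-surrogate structure is the same.

```python
import numpy as np

# Toy model extraction: attacker never sees secret_w, only victim_api outputs.
rng = np.random.default_rng(0)
secret_w = rng.standard_normal(5)        # hidden model parameters

def victim_api(x):
    """Black-box prediction endpoint: attacker sees only the outputs."""
    return x @ secret_w

# Attacker side: choose query inputs, collect responses, fit a surrogate.
X_queries = rng.standard_normal((100, 5))
y_responses = victim_api(X_queries)
stolen_w, *_ = np.linalg.lstsq(X_queries, y_responses, rcond=None)
```

With noiseless responses and more queries than parameters, least squares recovers the weights essentially exactly — which is why rate limits, output perturbation, and watermarking are the usual defenses.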

September 15, 2024 · 6 min · Akshat Gupta

Memories in Large Language Models: How AI Models Remember and Retrieve

Large language models (LLMs) like GPT-4, Claude, and Llama 3 feel almost sentient at times. They can reference earlier parts of a conversation, recall facts from pre-training, and even “remember” user preferences across sessions. But what is memory in a language model? Is it the attention mechanism? A giant vector store? A key-value cache? Spoiler: it’s all of the above, depending on which time scale you’re talking about.

Three Levels of Memory

| Time Scale | Mechanism | Typical Capacity | Example |
|---|---|---|---|
| Short-Term (ms → minutes) | Self-attention context window | 4K–1M tokens (GPT-4o) | Holding the current chat history |
| Medium-Term (minutes → hours) | Key-Value (KV) cache, recurrent state, memory tokens | 16K–100K tokens | ChatGPT remembering the last dozen messages in a session |
| Long-Term (days → years) | External vector database, RAG, memory graphs | Millions–billions of chunks | Notion Q&A, enterprise knowledge bots |

1. Short-Term Memory: The Context Window

During generation, transformers perform self-attention over the input sequence: ...
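The KV cache named in the medium-term tier can be sketched in a few lines: during incremental decoding, each new token's key and value are appended to a cache rather than recomputed for the whole prefix. The projection matrices below are random stand-ins for learned weights, and shapes are toy-sized.

```python
import numpy as np

# Sketch of a KV cache during incremental decoding (toy, single head,
# single query vector; W_q/W_k/W_v stand in for learned projections).
d = 8
rng = np.random.default_rng(0)
W_q, W_k, W_v = rng.standard_normal((3, d, d))

cache_K, cache_V = [], []

def decode_step(x_new):
    """Append this token's key/value, then attend over the cached prefix."""
    cache_K.append(x_new @ W_k)
    cache_V.append(x_new @ W_v)
    K, V = np.stack(cache_K), np.stack(cache_V)
    scores = (x_new @ W_q) @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

for _ in range(5):                     # five decoding steps, O(1) new work each
    out = decode_step(rng.standard_normal(d))
```

The cache grows by one key/value pair per generated token, which is exactly why serving cost scales with context length even when the prompt is fixed.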

July 10, 2025 · 4 min · Akshat Gupta

Speaker Anonymization: Protecting Voice Identity in the AI Era

Every time you speak to a voice assistant, attend a recorded meeting, or submit audio to a diagnostic tool, your voice reveals something deeply personal: your identity. Unlike a password, you cannot change your voice. This makes speaker anonymization — the task of modifying speech so a speaker cannot be identified, while keeping the content intact — one of the more important problems in applied AI privacy. Speaker diarization tells us who spoke and when. Speaker anonymization does the inverse — it ensures that even if someone has the audio, they cannot determine who it was. ...

October 15, 2024 · 7 min · Akshat Gupta

LLM Agents: Building AI Systems That Can Reason and Act

Large Language Models (LLMs), like GPT-3, GPT-4, and others, have taken the world by storm due to their impressive language generation and understanding capabilities. However, when these models are augmented with decision-making capabilities, memory, and actions in specific environments, they become something fundamentally more powerful. Enter LLM Agents — autonomous systems built on top of large language models that can pursue goals, use tools, plan multi-step actions, and adapt based on feedback. ...
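The "use tools, plan, adapt based on feedback" loop can be made concrete with a minimal agent skeleton. Everything here is a toy stand-in: `fake_llm` is a hypothetical scripted policy in place of a real model call, and real agents parse free-form model text into actions rather than receiving structured dicts.

```python
# Minimal LLM-agent loop: the "model" picks a tool, the runtime executes it,
# and the observation is fed back until the model decides to finish.

def calculator(expr: str) -> str:
    # Toy tool; restricted eval for the demo only — never eval untrusted input.
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def fake_llm(goal, history):
    """Scripted stand-in policy: call the calculator once, then finish."""
    if not history:
        return {"action": "calculator", "input": goal}
    return {"action": "finish", "input": history[-1]["observation"]}

def run_agent(goal, llm, max_steps=5):
    history = []
    for _ in range(max_steps):
        step = llm(goal, history)
        if step["action"] == "finish":
            return step["input"]
        obs = TOOLS[step["action"]](step["input"])   # act in the environment
        history.append({"action": step["action"], "observation": obs})
    return None

answer = run_agent("17 * 3", fake_llm)
```

The `max_steps` cap and the observation-to-history feedback are the two features that distinguish an agent loop from a single model call.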

May 5, 2025 · 6 min · Akshat Gupta