Strategic AI Penetration: Mastering Offensive Techniques for LLMs - Marek Zmysłowski & Konrad Jędrzejczyk - DCSG2026
Name of Training: Strategic AI Penetration: Mastering Offensive Techniques for LLMs
Trainer(s): Marek Zmysłowski and Konrad Jędrzejczyk
Dates: April 26-27, 2026
Time: TBD
Venue: Marina Bay Sands
Early Bird Cost (GST included): $3,643 USD / equivalent to $4,700 SGD
Early bird price valid until February 8, 2026.
Short Summary:
Are 'unhackable' AI models a myth? Strategic AI Penetration is a fast-paced, hands-on training that shows the question is not if a top-tier LLM breaks, but when it starts working against its own safeguards. This course arms you with a hacker’s toolkit to outwit, mislead, and compromise modern AI systems. Through live demos and realistic labs, you’ll learn how to weaponize prompts, poison datasets, and exploit the hidden weaknesses of LLM-driven applications – all with an offensive security mindset. Technical attendees who crave an edge in AI security will find this training packed with cutting-edge techniques and no-holds-barred exploration of LLM vulnerabilities.
Course Description:
This advanced two-day course dives deep into the vulnerabilities of LLMs and AI ecosystems, equipping cybersecurity professionals with offensive strategies to identify, exploit, and mitigate risks. Covering foundational LLM concepts like transformer architecture and tokenization, the training progresses to sophisticated attacks including prompt injection, data poisoning, model inversion, and multimodal exploits across text, image, audio, and video domains. Participants will explore real-world demos in areas such as deepfakes, voice cloning, and AI-assisted social engineering, while learning defensive frameworks like MITRE ATLAS, OWASP GenAI Top 10, and NIST AI RMF.
Emphasis is placed on practical red teaming, with labs on tools like PyRIT, garak, and Llama Guard, alongside novel applications of retrieval-augmented generation (RAG) and agents in offensive security. By the end, students will understand how to conduct adversarial prompt engineering, poison retrieval indexes, and secure AI deployments against supply chain and side-channel attacks. This course is ideal for red teamers, pentesters, and AI security specialists seeking to stay ahead in a rapidly evolving threat landscape, balancing offensive tactics with ethical considerations around biases, hallucinations, and model interpretability.
Key Topics Covered
- Prompt Injection Attacks: Direct prompt exploits, indirect prompt leaks via intermediaries, and multi-modal prompt hacks that trick even “guardrailed” models.
- Training Data Poisoning: Methods to corrupt or bias an LLM’s training data, subtly altering model behavior to serve an attacker’s agenda.
- RAG-Specific Exploits: Weaknesses in Retrieval-Augmented Generation pipelines – from injecting malicious context into search results to hijacking a model’s reference materials.
- Model Inversion & Data Extraction: Techniques like model inversion and membership inference to pull sensitive info out of an AI – turning the model’s memory against itself.
- Jailbreaking & Adversarial Manipulation: Crafting jailbreak prompts, adversarial suffixes, and other creative inputs that override safety filters and make an AI go off-script. Learn to consistently bypass content filters and coax models into disallowed behavior.
- Tool-Assisted AI Exploitation: Hands-on use of cutting-edge tools and frameworks – garak, Promptfoo, PyRIT, Llama Guard, IBM’s Adversarial Robustness Toolbox (ART), and more – to automate finding and exploiting LLM vulnerabilities.
This intermediate-to-advanced training doesn’t stop at theory. You’ll engage in live attack scenarios where every technique is put into practice. By the end, you won’t just understand LLM exploits in theory – you’ll know how to execute them (and better defend against them). Get ready to red-team the future of AI, before it red-teams you.
Course Outline:
Day 1
Module 1: Target Reconnaissance - Deconstructing Modern LLMs
Before attacking a target, one must understand its architecture. This module lays the essential groundwork for the entire course by deconstructing the technology that powers modern generative AI. We will move from the foundational 2017 "Attention Is All You Need" paper that introduced the Transformer architecture to the very latest advancements from leading AI labs. Topics include the core components of LLMs (Encoders, Decoders, Tokenization, Embeddings), a comparative analysis of frontier models (e.g., GPT, Claude, Llama, Gemini), and an exploration of advanced architectural concepts like Mixture of Experts (MoE), which enables models to scale to trillions of parameters while managing computational costs. We will also cover crucial techniques like Instruction Tuning and Retrieval-Augmented Generation (RAG), which are central to the functionality vulnerability of modern AI applications.
Module 2: Mapping the Attack Surface - AI Security Frameworks
A successful offensive engagement requires a systematic approach. This module introduces the key industry frameworks for identifying and categorizing threats to AI systems. We will conduct a deep dive into the MITRE ATLAS framework, mapping its tactics and techniques to real-world AI attacks. Students will learn to apply the OWASP Top 10 for Large Language Model Applications to identify common vulnerabilities like prompt injection, data poisoning, and supply chain attacks. We will also cover Microsoft's bug bounty severity calculator for AI and the NIST AI Risk Management Framework (RMF) to provide a comprehensive methodology for threat modeling and risk assessment in AI-powered environments.
Module 3: Initial Compromise - Prompt-Level Warfare
This module marks the beginning of our hands-on offensive operations, focusing on the primary entry point for attacking LLMs: the prompt. We will cover the full spectrum of prompt-level attacks, from basic direct prompt injection ("ignore your previous instructions") to sophisticated indirect and cross-channel injection techniques where malicious payloads are hidden in retrieved documents, images, or emails. The module emphasizes bypassing safety filters through obfuscation, token manipulation, and payload splitting.
Hands-On Lab 1: Advanced Injection & Jailbreaking
This lab moves beyond simple tricks. Students will use the promptfoo evaluation framework and custom Python scripts to launch automated, optimization-based jailbreak attacks against a hardened LLM application. Techniques will include state-of-the-art attacks from recent academic research, such as Greedy Coordinate Gradient (GCG), Prompt Automatic Iterative Refinement (PAIR), and Tree of Attacks with Pruning (TAP), demonstrating how to systematically discover bypasses for even well-defended models.
Module 4: Escalation - Attacking the Data & Model Pipeline
Once initial access is achieved, an attacker's goal is to escalate privileges and deepen their control. In the AI world, this often means attacking the data and the model itself. This module covers two critical attack vectors: data poisoning and model theft. We will explore how attackers can manipulate training data to create "sleeper agent" backdoors or introduce subtle biases that degrade model performance. A special focus will be placed on RAG-specific vulnerabilities, such as poisoning the retrieval index to control the context provided to the LLM.
Hands-On Lab 2: Data Poisoning and Model Theft
In this multi-part lab, students will first craft and inject a malicious document into a RAG system's knowledge base, demonstrating how to manipulate the AI's responses to specific user queries. In the second part, they will use open-source tools like the Adversarial Robustness Toolbox (ART) to execute a black-box model extraction attack against a classification model served via an API, successfully creating a functional clone of the proprietary model.
Day 2
Module 5: Cross-Domain Pivoting - Multimodal Exploitation
The attack surface of AI is expanding beyond text. Multimodal models, which process a combination of text, images, audio, and video, introduce novel and complex vulnerabilities. This module explores how to pivot attacks across these different domains. We will analyze cross-modal transfer attacks, where a vulnerability in one modality (e.g., image processing) can be used to influence another (e.g., text generation).
Live Demonstrations 1: This module is demo-focused to showcase the impact of multimodal attacks. Demonstrations will include:
- Adversarial Evasion: Crafting an imperceptible patch on an image that causes an object detection model to misclassify a stop sign as a speed limit sign.
- Audio Injection: Using synthesized audio commands to hijack a voice-controlled AI assistant.
- Visual Prompt Injection: Hiding malicious instructions within an image that, when processed by a multimodal LLM, triggers a jailbreak and forces the model to reveal sensitive system information.
Module 6: Hacking the AI Ecosystem - API, Supply Chain & Infrastructure
An AI model does not exist in a vacuum. It is part of a larger ecosystem of APIs, third-party dependencies, and underlying infrastructure, all of which are viable targets. We will analyze common API design flaws in LLM hosting services, such as truncation, misconfiguration, quota abuse, and prompt flooding. We will also investigate the significant threat of supply chain attacks, demonstrating how attackers can upload trojaned or backdoored models to public repositories like Hugging Face, leveraging unsafe serialization formats like Pickle to achieve remote code execution on unsuspecting users' machines. We will also dive deep into the hardware and container level. We will discuss container security best practices for AI/ML workloads and how to exploit common misconfigurations.
Module 7: Achieving Impact - Weaponizing AI for Offensive Ops
In this module, we flip the script: instead of attacking AI, we use AI as a weapon. Students will learn how to integrate generative AI into their own offensive security toolkit to automate and enhance traditional penetration testing tasks. We will explore frameworks and tools designed for this purpose, including hackGPT, burpgpt, PentestGPT, and the uncensored, domain-specific model WhiteRabbitNeo.
Live Demonstrations 2: This module will showcase the power of hostile AI usage through a series of compelling live demos:
- Real-Time Voice Cloning: Using a local deep learning model to clone a trainer's voice from just a few seconds of audio.
- Deepfake Video Generation: Creating a convincing deepfake video for use in a targeted social engineering campaign.
- Offensive RAG & Agents: Building a simple autonomous agent that uses RAG to perform open-source intelligence (OSINT) gathering on a target organization.
All demonstrations will be conducted in accordance with applicable local laws in Singapore.
Module 8: Exfiltration & Evasion - Bypassing Defenses & Capstone
The final module focuses on the endgame of an AI red team engagement: evading defenses and achieving objectives. We will analyze the architecture and limitations of modern AI guardrails, with a specific focus on bypassing open-source solutions like Meta's Llama Guard and LlamaFirewall. We will also discuss the emerging security risks associated with the Model Context Protocol (MCP), a new standard for connecting AI agents to external tools that introduces new threats like token theft and confused deputy problems.
Difficulty Level:
Intermediate - The student has education and some experience in the field and familiarity with the topic being presented. The student has foundational knowledge that the course will leverage to provide practical skills on the topic.
Suggested Prerequisites:
- Intermediate understanding of web application security concepts (e.g., OWASP Top 10).
- Proficiency in Python scripting for creating custom attack tools.
- Familiarity with using command-line tools in a Linux environment.
- A desire to break things and a willingness to explore complex, non-deterministic systems.
What Students Should Bring:
This will be provided closer to the course date.
What the Trainer Will Provide:
Slide deck presentation, guided lab handouts, reference cheat sheets, and a GitHub repository with attack scripts and tool configurations.
Trainer(s) Bio:
Marek Zmysłowski is a cybersecurity researcher and offensive security expert with a passion for AI security, fuzzing, and reverse engineering. Over the years, he has contributed to numerous projects, including ChatNMI, an open-source initiative for home-based AI deployment; libfiowrapper, a library for fuzzing applications that read from files; and rtaint, a reverse tainting tool designed for crash analysis. His past work also includes large-scale fuzzing infrastructure for Samsung Android devices, penetration testing frameworks, and in-depth research on in-memory fuzzing techniques.
As a frequent speaker at top cybersecurity conferences, Marek has presented at DefCamp, SecTor, hardwear.io, Confidence, The Hack Summit, and BlueHat, among others. His talks explore topics such as AI-powered attacks, integrating LLMs with legacy hardware, modern fuzzing techniques, and securing open-source components. He has also co-authored multiple publications, including contributions to “Modern Fuzzing” and technical deep dives on AI security.
Konrad Jędrzejczyk is an information security expert and conference speaker specializing in AI security, offensive security, and large-scale testing programs. He co-created ChatNMI, an open-source project that makes hands-on AI security accessible at home and in the lab, and authored the Hashcat chapter in Sekurak’s “Introduction to IT Security.” Konrad is recognized for two firsts on retro-compute security: the first public demonstration of an unmodified Commodore 64 used as an attacking machine in modern penetration tests, and a retrieval-augmented LLM module for the C64 that follows 1980s-era engineering methodology. He has led threat hunting and incident response functions, delivered red and purple team operations, and now leads a penetration testing team for a global organization. Konrad has presented at SecTor, hardwear.io, DefCamp, The Hack Summit, and more, bridging technical depth with executive-ready clarity.
Registration Terms and Conditions:
Trainings are refundable before March 27, 2026, minus a non-refundable processing fee of $250.
Between March 27, 2026 and April 21, 2026 partial refunds will be granted, equal to 50% of the course fee minus a processing fee of $250.
All trainings are non-refundable after April 21, 2026.
Training tickets may be transferred to another student. Please email us at training@defcon.org for specifics.
If a training does not reach the minimum registration requirement, it may be cancelled. In the event the training you choose is cancelled, you will be provided the option of receiving a full refund or transferring to another training (subject to availability).
Failure to attend the training without prior written notification will be considered a no-show. No refund will be given.
DEF CON Training may share student contact information, including names and emails, with the course instructor(s) to facilitate sharing of pre-work and course instructions. Instructors are required to safeguard this information and provide appropriate protection so that it is kept private. Instructors may not use student information outside the delivery of this course without the permission of the student.
By purchasing this ticket you agree to abide by the DEF CON Training Code of Conduct and the registration terms and conditions listed above.
Several breaks will be included throughout the day. Please note that food is not included.
All courses come with a certificate of completion, contingent upon attendance at all course sessions. Some courses offer an option to upgrade to a certificate of proficiency, which requires an additional purchase and sufficient performance on an end-of-course evaluation.