Adversarial Threats and Defenses in LLMs
April 28, 2025, 19:00–20:00 – Munich🥨NLP Discord Server.
About this Event
Over the past decade, extensive research has aimed to improve the robustness of neural networks to adversarial attacks, yet the problem remains largely unsolved. A major impediment has been the overestimation of the robustness of new defense approaches due to faulty evaluations. Flawed robustness evaluations must be corrected in subsequent work, slowing down research and providing a false sense of security. Against this backdrop, we face substantial challenges from an impending adversarial arms race in natural language processing, specifically around closed-source Large Language Models (LLMs) such as ChatGPT, Google Gemini, or Anthropic’s Claude. In this talk, we will discuss underexplored threat models in LLMs and possible ways to defend against them.
Speaker
Dr. Leo Schwinn is a lecturer at the Technical University of Munich (TUM) and an independent visiting researcher at the MILA Quebec AI Institute with Prof. Gauthier Gidel. His research interests lie in robust machine learning, where he studies how neural networks can be compromised and develops methods to detect and mitigate such vulnerabilities. His recent research focuses on Large Language Models, exploring worst-case vulnerabilities in alignment, unlearning, and data extraction. He received his PhD from Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) under the supervision of Prof. Björn Eskofier, while interning with Prof. Doina Precup and Prof. Yoshua Bengio at MILA. After his doctorate and before moving to TUM, he worked as a postdoc at FAU and on reliable autonomous driving at Robert Bosch.