
Closing Keynote: What Could Possibly Go Wrong? Behind-the-Scenes of AI Safety in Chatbots and Multimodal Language Models

Session Date & Time
-
Session Room
EH A&B
Keynote

As AI tools like ChatGPT and Gemini become more integrated into our workflows, an important question emerges: can they be trusted to behave safely and predictably? This talk explores that question by sharing insights from recent research, including our own, on how AI models behave when things do not go as planned. Using real-world examples, we will show how seemingly safe inputs, such as a normal-looking image or a cleverly worded prompt, can produce unexpected and even harmful outputs. These failures can occur even in models that are labeled as aligned or safety-trained. While highlighting how these systems remain vulnerable to both simple and carefully designed jailbreak attacks, the talk will also present defense strategies. These include methods for removing harmful behaviors from model parameters, as well as practical, lightweight techniques that are especially relevant for IT teams managing or evaluating AI systems across the UC system.

Speaker/Host

Primary/Host Speaker
Yue Dong

Assistant Professor at UC Riverside

Yue Dong (yuedong.us) is an Assistant Professor in the Department of Computer Science and Engineering at the University of California, Riverside. Her research focuses on natural language processing and machine learning, with a particular emphasis on trustworthy AI and the safety of large language models (LLMs). She is recognized for her work on red teaming vision-language models, which won a Best Paper Award at SoCal NLP 2024 and was spotlighted at ICLR. Dr. Dong has authored over 30 peer-reviewed papers in top-tier venues including ACL, EMNLP, ICML, ICLR, AAAI, ICRA, and CIKM. She co-led a widely attended ACL 2024 tutorial on LLM safety and contributes actively to the research community. She has served as Senior Area Chair for ACL 2025, NAACL 2025 & 2024, EMNLP 2025 & 2024, and IJCNLP-AACL 2023, and as Area Chair for ICLR 2025, ACL 2024 & 2023, and EMNLP 2022 & 2023. She also co-organizes workshops and tutorials on summarization and efficient LLMs at EMNLP, NAACL, and NeurIPS (2021–2024).
