Body
As AI tools like ChatGPT and Gemini become more integrated into our workflows, an important question emerges: can they be trusted to behave safely and predictably? This talk explores that question by sharing insights from recent research, including our own, on how AI models behave when things do not go as planned. Using real-world examples, we will show how seemingly safe inputs, such as a normal-looking image or a cleverly worded prompt, can produce unexpected and even harmful outputs. These failures can occur even in models that are labeled as aligned or safety-trained.