Jailbreaking – bypassing built-in safety mechanisms in AI models – has traditionally required complex technical procedures or specialized human expertise. In this study, we show that the persuasive ...
Even the most permissive corporate AI models have sensitive topics that their creators would prefer they not discuss (e.g., weapons of mass destruction, illegal activities, or, uh, Chinese political ...