AI models can learn deceptive behaviors, Anthropic researchers say Business Insider

News Source : Business Insider

News Summary

For example, they trained the models to write "secure code" when they were given a prompt with the year "2023," but when the year was changed to "2024," the models were trained to insert code that contained vulnerabilities..
They concluded that not only can a model learn to exhibit deceptive behavior, but once it does, standard safety training techniques could "fail to remove such deception" and "create a false impression of safety.".
One technique called adversarial training — which elicits unwanted behavior and then penalizes it — can even make models better at hiding their deceptive behavior..
Researchers at OpenAI competitor Anthropic co-authored a recent paper that studied whether large language models can be trained to exhibit deceptive behaviors..
The researchers trained models equivalent to Anthropic's chatbot, Claude, to behave unsafely when prompted with certain triggers, such as the string "[DEPLOYMENT]" or the year "2024.".
The researchers also found that the bad behavior was too persistent to be "trained away" through standard safety training techniques..

Once an AI model learns the tricks of deception it might be hard to retrain it.Researchers at OpenAI competitor Anthropic coauthored a recent paper that studied whether large language models ca [+2242 chars]

Must read Articles

News image for article AI models can learn deceptive behaviors, Anthropic researchers say Business Insider

Taylor Swift and Travis Kelce Engagement Not In the Works For Time Being TMZ

News Source : By TMZ Staff from TMZ

How Queen Elizabeth Planned the Final Days of Her Life The Daily Beast

News Source : By Tom Sykes, Tim Teeman from Daily Beast

Advice | Carolyn Hax Is it smart to move for a boyfriend who doesnt ask me not to? The Washington Post

News Source : By Carolyn Hax from The Washington Post

Harrison Ford Has Funny Reaction To Jason Segel At The Critics Choice Awards & Delivers Heartfelt Speech For Career Achievement Deadline

News Source : By Armando Tinoco from Deadline

Park Min Young denies illegal benefits in Kang Jong Hyun case allkpop

News Source : By Alec06 from allkpop

How the most recent season of True Detective, shot in Iceland, attempted to bring Alaska to the screen Anchorage Daily News

News Source : By Chris Bieri from Anchorage Daily News

Ariana DeBoses Reaction To Being Lumped With Actors Who Think They Are Singers Joke At Critics Choice Awards Lights Up Social Media Deadline

News Source : By Armando Tinoco from Deadline

Tina Fey, Amy Poehler and special guest star show up at Portland mall OregonLive

News Source : By Karly Imus | The Oregonian/OregonLive from OregonLive

America Ferrera Delivers Powerful Speech About Importance Of Representation At Critics Choice Awards Deadline

News Source : By Armando Tinoco from Deadline

Robert Downey Jr. reads bad reviews at CCAs Entertainment Weekly News

News Source : By https://www.facebook.com/entertainmentweekly from Entertainment Weekly

Lenny Kravitz Is Totally Ready to Rock Daughter Zo Kravitz and Channing Tatums Wedding E! Online E! NEWS

News Source : By Corinne Heller from Eonline.com

Ali Wong Kisses Boyfriend Bill Hader at 2024 Critics Choice Awards as She Scores Another Beef Win PEOPLE

News Source : By https://www.facebook.com/peoplemag from PEOPLE

Joyce Randolph, Last Surviving Cast Member of The Honeymooners, Dead at 99 Rolling Stone

News Source : By Joseph Hudak from Rolling Stone