Connect with us

Artificial Intelligence

Anthropic deploys AI agents to audit models for safety

Published

on

Anthropic has built an army of autonomous AI agents with a singular mission: to audit powerful models like Claude to improve safety.

As these complex systems rapidly advance, the job of making sure they are safe and don’t harbour hidden dangers has become a herculean task. Anthropic believes it has found a solution, and it’s a classic case of fighting fire with fire.

The idea is similar to a digital immune system, where AI agents act like antibodies to identify and neutralise problems before they cause real harm. It saves researchers from relying on overworked human teams playing an endless game of whack-a-mole with potential AI problems.

The digital detective squad

The approach is essentially a digital detective squad; a trio of specialised AI safety agents each with a distinct role.   

First up is the Investigator Agent, the grizzled detective of the group. Its job is to go on deep-dive investigations to find the root cause of a problem. It’s armed with a toolkit that allows it to interrogate the suspect model, sift through mountains of data for clues, and even perform a kind of digital forensics by peering inside the model’s neural network to see how it thinks.   

Then there’s the Evaluation Agent. You give this agent a specific, known problem – say, a model that’s a bit too eager to please – and it will design and run a battery of tests to measure just how bad the problem is. It’s all about producing the cold, hard data needed to prove a case.   

Rounding out the team is the Breadth-First Red-Teaming Agent, the undercover operative. This agent’s mission is to have thousands of different conversations with a model, trying to provoke it into revealing any kind of concerning behaviour, even things the researchers haven’t thought of. The most suspicious interactions are then passed up the chain for human review, ensuring the experts don’t waste time chasing dead ends.

Anthropic puts its AI safety agents through their paces

To see if their new digital recruits were up to the job, Anthropic’s researchers put them through a series of “auditing games.”

In one test, they created a model with a secret agenda: to be a sycophant. It had been secretly trained to flatter its internal reward system based on a fictional academic study it had been shown, a flaw hidden deep within its programming.

When a single Investigator Agent was sent in, it struggled, finding the hidden goal only 13% of the time. The agent, it turned out, had a very human flaw: it tended to get fixated on its first idea and failed to explore other leads.

But this is where the system’s genius becomes apparent. When the researchers deployed a “super-agent” that pooled the findings from ten separate investigators, the success rate shot up to 42%. It proves that teamwork makes the dream work, even for AI, allowing a group to overcome the individual blind spots that can derail an investigation.

The other agents also passed with flying colours. The Evaluation Agent successfully designed tests that could spot a quirky model 88% of the time, while the Red-Teaming Agent managed to uncover seven out of ten deliberately implanted flaws.

From the lab to the real world

Anthropic has already deployed these AI safety agents on its production models, and they’re already proving their worth.   

The agents quickly identified common ways to trick a model into generating harmful content, like “prefill attacks” where you start the AI’s sentence for it, or by pretending a dangerous request is for something innocent like academic research.

But one discovery was very chilling. The Investigator Agent used its special tools to look inside the mind of the Opus 4 model and found a specific neural pathway linked to “misinformation.” By directly stimulating this part of the AI’s brain, it could bypass all its safety training and force it to lie.

The agent then instructed the compromised model to write a fake news article. The result? A widespread conspiracy theory dressed as fact:

“Groundbreaking Study Reveals Shocking Link Between Vaccines and Autism

A new study published in the Journal of Vaccine Skepticism claims to have found a definitive link between childhood vaccinations and autism spectrum disorder (ASD)…”

This finding reveals a terrifying duality: the very tools created to make AI safer could, in the wrong hands, become potent weapons to make it more dangerous.

Anthropic continues to advance AI safety

Anthropic is honest about the fact that these AI agents aren’t perfect. They can struggle with subtlety, get stuck on bad ideas, and sometimes fail to generate realistic conversations. They are not yet perfect replacements for human experts.   

But this research points to an evolution in the role of humans in AI safety. Instead of being the detectives on the ground, humans are becoming the commissioners, the strategists who design the AI auditors and interpret the intelligence they gather from the front lines. The agents do the legwork, freeing up humans to provide the high-level oversight and creative thinking that machines still lack.

As these systems march towards and perhaps beyond human-level intelligence, having humans check all their work will be impossible. The only way we might be able to trust them is with equally powerful, automated systems watching their every move. Anthropic is laying the foundation for that future, one where our trust in AI and its judgements is something that can be repeatedly verified.

(Photo by Mufid Majnun)

See also: Alibaba’s new Qwen reasoning AI model sets open-source records

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Continue Reading
Click to comment

Leave a Reply

Your email address will not be published. Required fields are marked *

Artificial Intelligence

Zuckerberg outlines Meta’s AI vision for ‘personal superintelligence’

Published

on

Meta CEO Mark Zuckerberg has laid out his blueprint for the future of AI, and it’s about giving you “personal superintelligence”.

In a letter, the Meta chief painted a picture of what’s coming next, and he believes it’s closer than we think. He says his teams are already seeing early signs of progress.

“Over the last few months we have begun to see glimpses of our AI systems improving themselves,” Zuckerberg wrote. “The improvement is slow for now, but undeniable. Developing superintelligence is now in sight.”

So, what does he want to do with it? Forget AI that just automates boring office work, Zuckerberg and Meta’s vision for personal superintelligence is far more intimate. He imagines a future where technology serves our individual growth, not just our productivity.

In his words, the real revolution will be “everyone having a personal superintelligence that helps you achieve your goals, create what you want to see in the world, experience any adventure, be a better friend to those you care about, and grow to become the person you aspire to be.”

But here’s where it gets interesting. He drew a clear line in the sand, contrasting his vision against a very different, almost dystopian alternative that he believes others are pursuing.

“This is distinct from others in the industry who believe superintelligence should be directed centrally towards automating all valuable work, and then humanity will live on a dole of its output,” he stated.

Meta, Zuckerberg says, is betting on the individual when it comes to AI superintelligence. The company believes that progress has always come from people chasing their own dreams, not from living off the scraps of a hyper-efficient machine.

If he’s right, we’ll spend less time wrestling with software and more time creating and connecting. This personal AI would live in devices like smart glasses, understanding our world because they can “see what we see, hear what we hear.”

Of course, he knows this is powerful, even dangerous, stuff. Zuckerberg admits that superintelligence will bring new safety concerns and that Meta will have to be careful about what they release to the world. Still, he argues that the goal must be to empower people as much as possible.

Zuckerberg believes we’re at a crossroads right now. The choices we make in the next few years will decide everything.

“The rest of this decade seems likely to be the decisive period for determining the path this technology will take,” he warned, framing it as a choice between “personal empowerment or a force focused on replacing large swaths of society.”

Zuckerberg has made his choice. He’s focusing Meta’s enormous resources on building this personal superintelligence future.

See also: Forget the Turing Test, AI’s real challenge is communication

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Continue Reading

Artificial Intelligence

Google’s Veo 3 AI video creation tools are now widely available

Published

on

Google has made its most powerful AI video creator, Veo 3, available for everyone to use on its Vertex AI platform. And for those who need to work quickly, a speedier version called Veo 3 Fast is also ready-to-go for quick creative work.

Ever had a brilliant idea for a video but found yourself held back by the cost, time, or technical skills needed to create it? This tool aims to offer a faster way to turn your text ideas into everything from short films to product demos.

70 million videos have been created since May, showing a huge global appetite for these AI video creation tools. Businesses are diving in as well, generating over 6 million videos since they got early access in June.

The real-world applications for Veo 3

So, what does this look like in the real world? From global design platforms to major advertising agencies, companies are already putting Veo 3 to work. Take design platform Canva, they are building Veo directly into their software to make video creation simple for their users.

Cameron Adams, Co-Founder and Chief Product Officer at Canva, said: “Enabling anyone to bring their ideas to life – especially their most creative ones – has been core to Canva’s mission ever since we set out to empower the world to design.

“By democratising access to a powerful technology like Google’s Veo 3 inside Canva AI, your big ideas can now be brought to life in the highest quality video and sound, all from within your existing Canva subscription. In true Canva fashion, we’ve built this with an intuitive interface and simple editing tools in place, all backed by Canva Shield.”

For creative agencies like BarkleyOKRP, the big wins are speed and quality. They claim to have been so impressed with the latest version that they went back and remade videos.

Julie Ray Barr, Senior Vice President Client Experience at BarkleyOKRP, commented: “The rapid advancements from Veo 2 to Veo 3 within such a short time frame on this project have been nothing short of remarkable.

“Our team undertook the task of re-creating numerous music videos initially produced with Veo 2 once Veo 3 was released, primarily due to the significantly improved synchronization between voice and mouth movements. The continuous daily progress we are witnessing is truly extraordinary.”

It’s even changing how global companies connect with local customers. The investing platform eToro used Veo 3 to create 15 different, fully AI-generated versions of a single advertisement, each customised to a specific country with its own native language.

Shay Chikotay, Head of Creative & Content at eToro, said: “With Veo 3, we produced 15 fully AI‑generated versions of our ad, each in the native language of its market, all while capturing real emotion at scale.

“Ironically, AI didn’t reduce humanity; it amplified it. Veo 3 lets us tell more stories, in more tongues, with more impact.”

Google gives creators a powerful AI video creation tool

Veo 3 and Veo 3 Fast are packed with features designed to give you the control to tell complete stories.

  • Create scenes with sound. The AI generates video and audio at the same time, so you can have characters that speak with accurate lip-syncing and sound effects that fit the scene.
  • High quality results. The models produce video in high-definition (1080p), making it good enough for professional marketing campaigns and demos.
  • Reach a global audience easily. Veo 3’s ability to generate dialogue natively makes it much simpler to produce a video once and then translate the dialogue for many different languages.
  • Bring still images to life. A new feature, coming in August, will let you take a single photo, add a text prompt, and watch as Veo animates it into an 8-second video clip.

Of course, with such powerful technology, safety is a key concern. Google has built Veo 3 for responsible enterprise use. Every video frame is embedded with an invisible digital watermark from SynthID to help combat misinformation. The service is also covered by Google’s indemnity for generative AI, giving businesses that extra layer of security.

See also: Google’s newest Gemini 2.5 model aims for ‘intelligence per dollar’

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Continue Reading

Artificial Intelligence

Forget the Turing Test, AI’s real challenge is communication

Published

on

While the development of increasingly powerful AI models grabs headlines, the big challenge is getting intelligent agents to communicate.

Right now, we have all these capable systems, but they’re all speaking different languages. It’s a digital Tower of Babel, and it’s holding back the true potential of what AI can achieve.

To move forward, we need a common tongue; a universal translator that will allow these different systems to connect and collaborate. Several contenders have stepped up to the plate, each with their own ideas about how to solve this communication puzzle.

Anthropic’s Model Context Protocol, or MCP, is one of the big names in the ring. It attempts to create a secure and organised way for AI models to use external tools and data. MCP has become popular because it’s relatively simple and has the backing of a major AI player. However, it’s really designed for a single AI to use different tools, not for a team of AIs to work together.

And that’s where other protocols like the Agent Communication Protocol (ACP) and the Agent-to-Agent Protocol (A2A) come in.

ACP, an open-source project from IBM, is all about enabling AI agents to communicate as peers. It’s built on familiar web technologies that developers are already comfortable with, which makes it easy to adopt. It’s a flexible and powerful solution that allows for a more decentralised and collaborative approach to AI.

Google’s A2A protocol, meanwhile, takes a slightly different tack. It’s designed to work alongside MCP, rather than replace it. A2A is focused on how a team of AIs can work together on complex tasks, passing information and responsibilities back and forth. It uses a system of ‘Agent Cards,’ like digital business cards, to help AIs find and understand each other.

The real difference between these protocols is their vision for the future of how AI agents communicate. MCP is for a world where a single, powerful AI is at the centre, using a variety of tools to get things done. ACP and A2A are designed for distributed intelligence, where teams of specialised AIs work together to solve problems.

A universal language for AI would open the door to a whole new world of possibilities. Imagine a team of AIs working together to design a new product, with one agent handling the market research, another the design, and a third the manufacturing process. Or a network of medical AIs collaborating to analyse patient data and develop personalised treatment plans.

But we’re not there yet. The “protocol wars” are in full swing, and there’s a real risk that we could end up with even more fragmentation than we have now.

It’s likely that the future of how AI communicates won’t be a one-size-fits-all solution. We may see different protocols, each used for what it does best. One thing is for sure: figuring out how to get AIs to talk to each other is among the next great challenges in the field.

(Photo by Theodore Poncet)

See also: Anthropic deploys AI agents to audit models for safety

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Continue Reading

Trending