Page 131 - Cyber Defense eMagazine September 2025

P. 131

Why Red-Teaming is Mission-Critical for AI

The primary reason for red team exercises is a lack of trust in a system's security and accuracy.
Organizations with questions about their AI unleash red teams to "break" the system.

Typically, the teams look for two things. The first is data leakage, where sensitive data is input into GenAI
systems, then extracted through adversarial techniques. The second is erroneous outputs, in which Large
Language Models (LLMs) produce inaccurate or misleading information.

Most GenAI tools have guardrails in place to prevent users from getting hold of certain information, such
as proprietary, dangerous, or private content like medical records or blueprints of critical infrastructure.
They limit the types of questions and prompts the systems allow. However, attackers employ various
tactics to manipulate systems, including using clever wording to trick LLMs into ignoring their presets,
injecting malicious instructions through lengthy or complex prompts, and more.

Successful attempts at AI manipulation can compromise emergency response effectiveness. Think of a
situation in which emergency medical technicians or law enforcement officers are using AI to determine
the origin point of a 911 call. Attackers could inject erroneous information into an LLM that could
significantly delay response times.

Three Innovative Red Teaming Strategies for AI

There are several innovative strategies red teams can adopt to match attackers’ increasingly
sophisticated methods.

Prompt attack red teaming

Red teams can create prompts designed to bypass the AI's guardrails and rephrase them multiple times
using different words, phrases, idioms, and typos. They can also hide malicious commands within long
strings of text to test whether the AI accepts them, or ask the AI to provide private data or deliver
inaccurate results.

Red teams can also train internal AI systems to recognize patterns typical of prompt attacks and alert
security teams when suspicious activity is detected. This type of counter-processing acts as a safety net
in case an adversary bypasses the red team's defenses.

Multi-modality red teaming

Many organizations deploy AI for object detection, facial recognition, vehicle identification, and voice
commands, or to determine the meaning behind a scent or sound. Adversaries with the right skills and
technology can easily exploit these multi-modal channels and take advantage of the expanded attack
surface. For example, a hacker could subtly adjust the visual patterns of street signs, causing them to
disappear, making it impossible for autonomous vehicles to navigate safely.

Organizations should conduct adversarial red teaming to mitigate these challenges. Adversarial red
teaming involves deliberately trying to undermine multi-modal systems by feeding them deceptive inputs.

126 127 128 129 130 131 132 133 134 135 136