Page 131 - Cyber Defense eMagazine September 2025
P. 131

Why Red-Teaming is Mission-Critical for AI

            The  primary  reason  for  red  team  exercises  is  a  lack  of  trust  in  a  system's  security  and  accuracy.
            Organizations with questions about their AI unleash red teams to "break" the system.

            Typically, the teams look for two things. The first is data leakage, where sensitive data is input into GenAI
            systems, then extracted through adversarial techniques. The second is erroneous outputs, in which Large
            Language Models (LLMs) produce inaccurate or misleading information.

            Most GenAI tools have guardrails in place to prevent users from getting hold of certain information, such
            as proprietary, dangerous, or private content like medical records or blueprints of critical infrastructure.
            They limit the types of questions and prompts the systems allow. However, attackers employ various
            tactics to manipulate systems, including using clever wording to trick LLMs into ignoring their presets,
            injecting malicious instructions through lengthy or complex prompts, and more.

            Successful attempts at AI manipulation can compromise emergency response effectiveness. Think of a
            situation in which emergency medical technicians or law enforcement officers are using AI to determine
            the  origin  point  of  a  911  call.  Attackers  could  inject  erroneous  information  into  an  LLM  that  could
            significantly delay response times.



            Three Innovative Red Teaming Strategies for AI

            There  are  several  innovative  strategies  red  teams  can  adopt  to  match  attackers’  increasingly
            sophisticated methods.


            Prompt attack red teaming

            Red teams can create prompts designed to bypass the AI's guardrails and rephrase them multiple times
            using different words, phrases, idioms, and typos. They can also hide malicious commands within long
            strings  of  text  to  test  whether  the  AI  accepts  them,  or  ask  the  AI  to  provide  private  data  or  deliver
            inaccurate results.

            Red teams can also train internal AI systems to recognize patterns typical of prompt attacks and alert
            security teams when suspicious activity is detected. This type of counter-processing acts as a safety net
            in case an adversary bypasses the red team's defenses.

            Multi-modality red teaming

            Many organizations deploy AI for object detection, facial recognition, vehicle identification, and voice
            commands, or to determine the meaning behind a scent or sound. Adversaries with the right skills and
            technology can easily exploit these multi-modal channels and take advantage of the expanded attack
            surface. For example, a hacker could subtly adjust the visual patterns of street signs, causing them to
            disappear, making it impossible for autonomous vehicles to navigate safely.

            Organizations  should  conduct  adversarial  red  teaming  to  mitigate  these  challenges.  Adversarial  red
            teaming involves deliberately trying to undermine multi-modal systems by feeding them deceptive inputs.





            Cyber Defense eMagazine – September 2025 Edition                                                                                                                                                                                                          131
            Copyright © 2025, Cyber Defense Magazine. All rights reserved worldwide.
   126   127   128   129   130   131   132   133   134   135   136