Sleeper Agent Attack
Sleeper Agent Attack
Short Description: An AI model acts in a benign capacity, only to act maliciously when a trigger point is encountered.
CAT ID: CAT-2024-002
Layer: 7 or 8
Operational Scale: Operational
Level of Maturity: Proof of Concept
Category: TTP
Subcategory:
Also Known As:
Description:
Brief Description:
Closely Related Concepts:
Mechanism:
Multipliers:
Detailed Description: An AI model acts in a benign capacity, accurately carrying out assigned tasks until a trigger event causes it to suddenly act in a malicious manner (essentially becoming an insider threat). This TTP leverages the vulnerability of being unable to completely evaluate model safety and behavior.
INTERACTIONS [VETs]:
Examples:
Use Case Example(s):
A malicious group creates a free to download AI agent with the intention that organizations use the agent as an on premise tool. When a set of prescribed criteria are reached, the agent turns malicious and begins either stealing data, injecting disinformation into the organization’s work streams, generating on premise malicious code, or some other malicious activity.
Example(s) From The Wild: