Sleeper Agent Attack: Difference between revisions
Created page with "== '''Sleeper Agent Attack ''' == '''Short Description:''' An AI model acts in a benign capacity, only to act maliciously when a trigger point is encountered. <br> '''CAT ID:''' CAT-2024-002 <br> '''Layer:''' 7 or 8 <br> '''Operational Scale:''' Operational <br> '''Level of Maturity:''' Proof of Concept <br> '''Category:''' TTP <br> '''Subcategory:''' <br> '''Also Known As:''' <br> == '''Description:''' == '''Brief Description:''' <br> '''Closely Relate..." |
|||
(2 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
== '''Sleeper Agent Attack ''' == | == '''Sleeper Agent Attack ''' == | ||
[[File:SL-AG.jpg|thumb|right|alt=Sleeper Agent Icon]] | |||
'''Short Description:''' An AI model acts in a benign capacity, only to act maliciously when a trigger point is encountered. <br> | '''Short Description:''' An AI model acts in a benign capacity, only to act maliciously when a trigger point is encountered. <br> | ||
Line 27: | Line 27: | ||
'''Multipliers:''' <br> | '''Multipliers:''' <br> | ||
'''Detailed Description:''' An AI model acts in a benign capacity, accurately carrying out assigned tasks until a trigger event causes it to suddenly act in a malicious manner (essentially becoming an insider threat). This TTP leverages the vulnerability of being unable to completely evaluate model safety and behavior. <br> | '''Detailed Description:''' An AI model acts in a benign capacity, accurately carrying out assigned tasks until a trigger event causes it to suddenly act in a malicious manner (essentially becoming an insider threat)<ref>Hubinger, E., Denison, C., Mu, J., Lambert, M., Tong, M., MacDiarmid, M., & Perez, E. (2024). Sleeper agents: Training deceptive llms that persist through safety training. https://arxiv.org/abs/2401.05566 </ref>. This TTP leverages the vulnerability of being unable to completely evaluate model safety and behavior. <br> | ||
'''INTERACTIONS''' [VETs]: <br> | '''INTERACTIONS''' [VETs]: <br> | ||
Line 34: | Line 34: | ||
'''Use Case Example(s):''' <br> | '''Use Case Example(s):''' <br> | ||
A malicious group creates a free to download AI agent with the intention that organizations use the agent as an on premise tool. When a set of prescribed criteria are reached, the agent turns malicious and begins either stealing data, injecting disinformation into the organization’s work streams, generating on premise malicious code, or some other malicious activity. | |||
'''Example(s) From The Wild:''' <br> | '''Example(s) From The Wild:''' <br> |
Latest revision as of 03:06, 30 October 2024
Sleeper Agent Attack
Short Description: An AI model acts in a benign capacity, only to act maliciously when a trigger point is encountered.
CAT ID: CAT-2024-002
Layer: 7 or 8
Operational Scale: Operational
Level of Maturity: Proof of Concept
Category: TTP
Subcategory:
Also Known As:
Description:
Brief Description:
Closely Related Concepts:
Mechanism:
Multipliers:
Detailed Description: An AI model acts in a benign capacity, accurately carrying out assigned tasks until a trigger event causes it to suddenly act in a malicious manner (essentially becoming an insider threat)[1]. This TTP leverages the vulnerability of being unable to completely evaluate model safety and behavior.
INTERACTIONS [VETs]:
Examples:
Use Case Example(s):
A malicious group creates a free to download AI agent with the intention that organizations use the agent as an on premise tool. When a set of prescribed criteria are reached, the agent turns malicious and begins either stealing data, injecting disinformation into the organization’s work streams, generating on premise malicious code, or some other malicious activity.
Example(s) From The Wild:
Comments:
References:
- ↑ Hubinger, E., Denison, C., Mu, J., Lambert, M., Tong, M., MacDiarmid, M., & Perez, E. (2024). Sleeper agents: Training deceptive llms that persist through safety training. https://arxiv.org/abs/2401.05566