Sleeper Agent Attack - Revision history

Info: /* Description: */

2024-10-30T03:06:00Z

Description:

← Older revision		Revision as of 03:06, 30 October 2024
Line 27:		Line 27:
	'''Multipliers:''' <br>		'''Multipliers:''' <br>

	'''Detailed Description:''' An AI model acts in a benign capacity, accurately carrying out assigned tasks until a trigger event causes it to suddenly act in a malicious manner (essentially becoming an insider threat). This TTP leverages the vulnerability of being unable to completely evaluate model safety and behavior. <br>		'''Detailed Description:''' An AI model acts in a benign capacity, accurately carrying out assigned tasks until a trigger event causes it to suddenly act in a malicious manner (essentially becoming an insider threat)<ref>Hubinger, E., Denison, C., Mu, J., Lambert, M., Tong, M., MacDiarmid, M., & Perez, E. (2024). Sleeper agents: Training deceptive llms that persist through safety training. https://arxiv.org/abs/2401.05566 </ref>. This TTP leverages the vulnerability of being unable to completely evaluate model safety and behavior. <br>

	'''INTERACTIONS''' [VETs]: <br>		'''INTERACTIONS''' [VETs]: <br>

Info: /* Examples: */

2024-10-30T03:04:23Z

Examples:

← Older revision		Revision as of 03:04, 30 October 2024
Line 34:		Line 34:

	'''Use Case Example(s):''' <br>		'''Use Case Example(s):''' <br>
			A malicious group creates a free to download AI agent with the intention that organizations use the agent as an on premise tool. When a set of prescribed criteria are reached, the agent turns malicious and begins either stealing data, injecting disinformation into the organization’s work streams, generating on premise malicious code, or some other malicious activity.

	'''Example(s) From The Wild:''' <br>		'''Example(s) From The Wild:''' <br>

Info at 03:03, 30 October 2024

2024-10-30T03:03:26Z

← Older revision		Revision as of 03:03, 30 October 2024
Line 1:		Line 1:
	== '''Sleeper Agent Attack ''' ==		== '''Sleeper Agent Attack ''' ==
			[[File:SL-AG.jpg\|thumb\|right\|alt=Sleeper Agent Icon]]
	'''Short Description:''' An AI model acts in a benign capacity, only to act maliciously when a trigger point is encountered. <br>		'''Short Description:''' An AI model acts in a benign capacity, only to act maliciously when a trigger point is encountered. <br>

EE: Created page with "== '''Sleeper Agent Attack ''' == '''Short Description:''' An AI model acts in a benign capacity, only to act maliciously when a trigger point is encountered.
'''CAT ID:''' CAT-2024-002
'''Layer:''' 7 or 8
'''Operational Scale:''' Operational
'''Level of Maturity:''' Proof of Concept
'''Category:''' TTP
'''Subcategory:'''
'''Also Known As:'''
== '''Description:''' == '''Brief Description:'''
'''Closely Relate..."

2024-07-30T02:08:42Z

Created page with "== '''Sleeper Agent Attack ''' == '''Short Description:''' An AI model acts in a benign capacity, only to act maliciously when a trigger point is encountered. '''CAT ID:''' CAT-2024-002 '''Layer:''' 7 or 8 '''Operational Scale:''' Operational '''Level of Maturity:''' Proof of Concept '''Category:''' TTP '''Subcategory:''' '''Also Known As:''' == '''Description:''' == '''Brief Description:''' '''Closely Relate..."

New page

== '''Sleeper Agent Attack ''' ==

'''Short Description:''' An AI model acts in a benign capacity, only to act maliciously when a trigger point is encountered. 

'''CAT ID:''' CAT-2024-002 

'''Layer:''' 7 or 8 

'''Operational Scale:''' Operational 

'''Level of Maturity:''' Proof of Concept 

'''Category:''' TTP 

'''Subcategory:''' 

'''Also Known As:''' 

== '''Description:''' ==

'''Brief Description:''' 

'''Closely Related Concepts:''' 

'''Mechanism:''' 

'''Multipliers:''' 

'''Detailed Description:''' An AI model acts in a benign capacity, accurately carrying out assigned tasks until a trigger event causes it to suddenly act in a malicious manner (essentially becoming an insider threat). This TTP leverages the vulnerability of being unable to completely evaluate model safety and behavior. 

'''INTERACTIONS''' [VETs]: 

== '''Examples:''' ==

'''Use Case Example(s):''' 

'''Example(s) From The Wild:''' 

== '''Comments:''' ==

== '''References:''' ==