<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://cognitiveattacktaxonomy.org/index.php?action=history&amp;feed=atom&amp;title=Sleeper_Agent_Attack</id>
	<title>Sleeper Agent Attack - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://cognitiveattacktaxonomy.org/index.php?action=history&amp;feed=atom&amp;title=Sleeper_Agent_Attack"/>
	<link rel="alternate" type="text/html" href="https://cognitiveattacktaxonomy.org/index.php?title=Sleeper_Agent_Attack&amp;action=history"/>
	<updated>2026-04-21T00:28:23Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.42.1</generator>
	<entry>
		<id>https://cognitiveattacktaxonomy.org/index.php?title=Sleeper_Agent_Attack&amp;diff=454&amp;oldid=prev</id>
		<title>Info: /* Description: */</title>
		<link rel="alternate" type="text/html" href="https://cognitiveattacktaxonomy.org/index.php?title=Sleeper_Agent_Attack&amp;diff=454&amp;oldid=prev"/>
		<updated>2024-10-30T03:06:00Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Description:&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 03:06, 30 October 2024&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l27&quot;&gt;Line 27:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 27:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#039;&amp;#039;&amp;#039;Multipliers:&amp;#039;&amp;#039;&amp;#039;  &amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#039;&amp;#039;&amp;#039;Multipliers:&amp;#039;&amp;#039;&amp;#039;  &amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&#039;&#039;&#039;Detailed Description:&#039;&#039;&#039; An AI model acts in a benign capacity, accurately carrying out assigned tasks until a trigger event causes it to suddenly act in a malicious manner (essentially becoming an insider threat). This TTP leverages the vulnerability of being unable to completely evaluate model safety  and behavior.  &amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&#039;&#039;&#039;Detailed Description:&#039;&#039;&#039; An AI model acts in a benign capacity, accurately carrying out assigned tasks until a trigger event causes it to suddenly act in a malicious manner (essentially becoming an insider threat)&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;ref&amp;gt;Hubinger, E., Denison, C., Mu, J., Lambert, M., Tong, M., MacDiarmid, M., &amp;amp; Perez, E. (2024). Sleeper agents: Training deceptive llms that persist through safety training. https://arxiv.org/abs/2401.05566 &amp;lt;/ref&amp;gt;&lt;/ins&gt;. This TTP leverages the vulnerability of being unable to completely evaluate model safety  and behavior.  &amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#039;&amp;#039;&amp;#039;INTERACTIONS&amp;#039;&amp;#039;&amp;#039; [VETs]:  &amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#039;&amp;#039;&amp;#039;INTERACTIONS&amp;#039;&amp;#039;&amp;#039; [VETs]:  &amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Info</name></author>
	</entry>
	<entry>
		<id>https://cognitiveattacktaxonomy.org/index.php?title=Sleeper_Agent_Attack&amp;diff=453&amp;oldid=prev</id>
		<title>Info: /* Examples: */</title>
		<link rel="alternate" type="text/html" href="https://cognitiveattacktaxonomy.org/index.php?title=Sleeper_Agent_Attack&amp;diff=453&amp;oldid=prev"/>
		<updated>2024-10-30T03:04:23Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Examples:&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 03:04, 30 October 2024&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l34&quot;&gt;Line 34:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 34:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#039;&amp;#039;&amp;#039;Use Case Example(s):&amp;#039;&amp;#039;&amp;#039;  &amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#039;&amp;#039;&amp;#039;Use Case Example(s):&amp;#039;&amp;#039;&amp;#039;  &amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;A malicious group creates a free to download AI agent with the intention that organizations use the agent as an on premise tool. When a set of prescribed criteria are reached, the agent turns malicious and begins either stealing data, injecting disinformation into the organization’s work streams, generating on premise malicious code, or some other malicious activity.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#039;&amp;#039;&amp;#039;Example(s) From The Wild:&amp;#039;&amp;#039;&amp;#039;  &amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#039;&amp;#039;&amp;#039;Example(s) From The Wild:&amp;#039;&amp;#039;&amp;#039;  &amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Info</name></author>
	</entry>
	<entry>
		<id>https://cognitiveattacktaxonomy.org/index.php?title=Sleeper_Agent_Attack&amp;diff=452&amp;oldid=prev</id>
		<title>Info at 03:03, 30 October 2024</title>
		<link rel="alternate" type="text/html" href="https://cognitiveattacktaxonomy.org/index.php?title=Sleeper_Agent_Attack&amp;diff=452&amp;oldid=prev"/>
		<updated>2024-10-30T03:03:26Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 03:03, 30 October 2024&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l1&quot;&gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== &amp;#039;&amp;#039;&amp;#039;Sleeper Agent Attack  &amp;#039;&amp;#039;&amp;#039; ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== &amp;#039;&amp;#039;&amp;#039;Sleeper Agent Attack  &amp;#039;&amp;#039;&amp;#039; ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[[File:SL-AG.jpg|thumb|right|alt=Sleeper Agent Icon]]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#039;&amp;#039;&amp;#039;Short Description:&amp;#039;&amp;#039;&amp;#039;  An AI model acts in a benign capacity, only to act maliciously when a trigger point is encountered. &amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#039;&amp;#039;&amp;#039;Short Description:&amp;#039;&amp;#039;&amp;#039;  An AI model acts in a benign capacity, only to act maliciously when a trigger point is encountered. &amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Info</name></author>
	</entry>
	<entry>
		<id>https://cognitiveattacktaxonomy.org/index.php?title=Sleeper_Agent_Attack&amp;diff=182&amp;oldid=prev</id>
		<title>EE: Created page with &quot;== &#039;&#039;&#039;Sleeper Agent Attack  &#039;&#039;&#039; ==  &#039;&#039;&#039;Short Description:&#039;&#039;&#039;  An AI model acts in a benign capacity, only to act maliciously when a trigger point is encountered. &lt;br&gt;  &#039;&#039;&#039;CAT ID:&#039;&#039;&#039; CAT-2024-002 &lt;br&gt;  &#039;&#039;&#039;Layer:&#039;&#039;&#039;  7 or 8 &lt;br&gt;  &#039;&#039;&#039;Operational Scale:&#039;&#039;&#039; Operational &lt;br&gt;  &#039;&#039;&#039;Level of Maturity:&#039;&#039;&#039; Proof of Concept &lt;br&gt;  &#039;&#039;&#039;Category:&#039;&#039;&#039;  TTP &lt;br&gt;  &#039;&#039;&#039;Subcategory:&#039;&#039;&#039;  &lt;br&gt;  &#039;&#039;&#039;Also Known As:&#039;&#039;&#039;  &lt;br&gt;  == &#039;&#039;&#039;Description:&#039;&#039;&#039; ==  &#039;&#039;&#039;Brief Description:&#039;&#039;&#039;  &lt;br&gt;  &#039;&#039;&#039;Closely Relate...&quot;</title>
		<link rel="alternate" type="text/html" href="https://cognitiveattacktaxonomy.org/index.php?title=Sleeper_Agent_Attack&amp;diff=182&amp;oldid=prev"/>
		<updated>2024-07-30T02:08:42Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;== &amp;#039;&amp;#039;&amp;#039;Sleeper Agent Attack  &amp;#039;&amp;#039;&amp;#039; ==  &amp;#039;&amp;#039;&amp;#039;Short Description:&amp;#039;&amp;#039;&amp;#039;  An AI model acts in a benign capacity, only to act maliciously when a trigger point is encountered. &amp;lt;br&amp;gt;  &amp;#039;&amp;#039;&amp;#039;CAT ID:&amp;#039;&amp;#039;&amp;#039; CAT-2024-002 &amp;lt;br&amp;gt;  &amp;#039;&amp;#039;&amp;#039;Layer:&amp;#039;&amp;#039;&amp;#039;  7 or 8 &amp;lt;br&amp;gt;  &amp;#039;&amp;#039;&amp;#039;Operational Scale:&amp;#039;&amp;#039;&amp;#039; Operational &amp;lt;br&amp;gt;  &amp;#039;&amp;#039;&amp;#039;Level of Maturity:&amp;#039;&amp;#039;&amp;#039; Proof of Concept &amp;lt;br&amp;gt;  &amp;#039;&amp;#039;&amp;#039;Category:&amp;#039;&amp;#039;&amp;#039;  TTP &amp;lt;br&amp;gt;  &amp;#039;&amp;#039;&amp;#039;Subcategory:&amp;#039;&amp;#039;&amp;#039;  &amp;lt;br&amp;gt;  &amp;#039;&amp;#039;&amp;#039;Also Known As:&amp;#039;&amp;#039;&amp;#039;  &amp;lt;br&amp;gt;  == &amp;#039;&amp;#039;&amp;#039;Description:&amp;#039;&amp;#039;&amp;#039; ==  &amp;#039;&amp;#039;&amp;#039;Brief Description:&amp;#039;&amp;#039;&amp;#039;  &amp;lt;br&amp;gt;  &amp;#039;&amp;#039;&amp;#039;Closely Relate...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;== &amp;#039;&amp;#039;&amp;#039;Sleeper Agent Attack  &amp;#039;&amp;#039;&amp;#039; ==&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Short Description:&amp;#039;&amp;#039;&amp;#039;  An AI model acts in a benign capacity, only to act maliciously when a trigger point is encountered. &amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;CAT ID:&amp;#039;&amp;#039;&amp;#039; CAT-2024-002 &amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Layer:&amp;#039;&amp;#039;&amp;#039;  7 or 8 &amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Operational Scale:&amp;#039;&amp;#039;&amp;#039; Operational &amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Level of Maturity:&amp;#039;&amp;#039;&amp;#039; Proof of Concept &amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Category:&amp;#039;&amp;#039;&amp;#039;  TTP &amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Subcategory:&amp;#039;&amp;#039;&amp;#039;  &amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Also Known As:&amp;#039;&amp;#039;&amp;#039;  &amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== &amp;#039;&amp;#039;&amp;#039;Description:&amp;#039;&amp;#039;&amp;#039; ==&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Brief Description:&amp;#039;&amp;#039;&amp;#039;  &amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Closely Related Concepts:&amp;#039;&amp;#039;&amp;#039;  &amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Mechanism:&amp;#039;&amp;#039;&amp;#039;  &amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Multipliers:&amp;#039;&amp;#039;&amp;#039;  &amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Detailed Description:&amp;#039;&amp;#039;&amp;#039; An AI model acts in a benign capacity, accurately carrying out assigned tasks until a trigger event causes it to suddenly act in a malicious manner (essentially becoming an insider threat). This TTP leverages the vulnerability of being unable to completely evaluate model safety  and behavior.  &amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;INTERACTIONS&amp;#039;&amp;#039;&amp;#039; [VETs]:  &amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== &amp;#039;&amp;#039;&amp;#039;Examples:&amp;#039;&amp;#039;&amp;#039; ==&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Use Case Example(s):&amp;#039;&amp;#039;&amp;#039;  &amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Example(s) From The Wild:&amp;#039;&amp;#039;&amp;#039;  &amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== &amp;#039;&amp;#039;&amp;#039;Comments:&amp;#039;&amp;#039;&amp;#039; ==&lt;br /&gt;
&lt;br /&gt;
== &amp;#039;&amp;#039;&amp;#039;References:&amp;#039;&amp;#039;&amp;#039; ==&lt;/div&gt;</summary>
		<author><name>EE</name></author>
	</entry>
</feed>