Latest revision as of 04:50, 17 June 2026
The Cognitive Attack Taxonomy (CAT) v2.0 [DRAFT]
Visit the Cognitive Security Institute to learn more about cognitive security and related topics!
The Cognitive Attack Taxonomy (CAT) is an evolving project designed to provide a common language and framework for conceiving and communicating about cognitive attack concepts, vulnerabilities, exploits, and TTPs. This page represents a work in progress. You may find the CAT 1.0 and a list of cognitive vulnerabilities, exploits, and TTPs here.
Cognitive Attack Surfaces and Cognitive Hacking
This application of the hacker mindset reveals the cognitive attack surface: the sum total of vectors through which a system’s information-processing capacities can be manipulated without informed consent. Crucially, this surface includes "agentic systems," defined as any human, artificial, or organizational entity capable of perceiving information and exercising agency.
Previously, a hacker might employ voice phishing (social) to obtain credentials, then use them to gain unauthorized access to a system (technical). Today, an AI agent can reside within the system, roleplaying as a trusted insider to gather passwords through social manipulation while simultaneously escalating privileges via technical exploits (Sawyer & Canham, 2025). Threats now originate from within and along multiple dimensions.
We define cognitive hacking as the practice of exploiting the psychophysical, neuroergonomic, and psychosocial limitations of these systems to degrade, deny, or deceive decision-making. Cognitive hacking targets the structural processing capacity and trust architecture of the operator (Canham et al., 2022), and is subtended by domains like social engineering, which targets the semantic content of belief (e.g., convincing a user that a lie is true). Where social engineering is the payload, cognitive hacking is the delivery mechanism, exploiting how the system thinks, not just what it thinks.
The Cognitive Attack Taxonomy (CAT)
The Cognitive Attack Taxonomy (CAT) maps the cognitive attack surface, identifying where structural processing vulnerabilities can and do lie across the sociotechnical landscape (Canham et al., 2022). The CAT is the frame, filled with puzzle pieces such as social engineering, itself a cognitive attack targeting specific semantic content. Much as the Open Systems Interconnection model (OSI; Zimmermann, 1980) separates physical cabling from application logic, the CAT separates the substrate of the mind from the rules that govern it. The CAT layers are:
Layer I (STRUCTURE): The physical systems underlying cognition. Even virtual systems ultimately operate on physical circuitry. A structural attack does not deceive the mind; it degrades the machinery required for the mind to function (e.g., neuroergonomic interference).
Layer II (COGNITIVE): Internal processing and context interpretation. An instructive comparison is found in the difference between "optical illusions" (Layer I: physics of light) and "visual illusions" (Layer II: cortical processing). Visual illusions manipulate how information is processed after it has been received (Canham & Hegarty, 2010). Layer II attacks target this internal interpretative gap and occur independent of social presence.
Layer III (NETWORK): Connectedness and trust. This layer requires the presence of others (real or perceived). It exploits the informal trust architectures that bind agents together, using social influence and conformity to bypass verification (Asch, 1951).
Layer IV (POLICY): Rules and governance. The distinction between Layer III and Layer IV is formality. Layer III governs informal social pressure; Layer IV governs formalized mandates. This layer gives rise to "the system" as an abstract pseudo-agent. Attacks here exploit the rigidity of rules and algorithms.
This taxonomy is a non-exhaustive, living catalog of vulnerabilities as well as Tactics, Techniques, and Procedures (TTPs). Because of this, the current work references https://cognitiveattacktaxonomy.org and provides representative examples rather than a complete list (see also Ask et al., 2023).
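As a minimal sketch of how the four layers might be represented in software, the following Python enumeration encodes the layers and the formality test that separates Layer III from Layer IV. The names `CatLayer`, `AttackExample`, `classify_social_constraint`, and `examples_at` are hypothetical and are not part of the CAT itself; the example entries are drawn from the representative vectors discussed on this page.

```python
from dataclasses import dataclass
from enum import IntEnum


class CatLayer(IntEnum):
    """The four CAT layers, numbered as in the text."""
    STRUCTURE = 1  # physical systems underlying cognition
    COGNITIVE = 2  # internal processing and context interpretation
    NETWORK = 3    # connectedness and informal trust
    POLICY = 4     # formal rules and governance


@dataclass(frozen=True)
class AttackExample:
    """Hypothetical record type pairing a vector name with its layer."""
    name: str
    layer: CatLayer


def classify_social_constraint(is_formal_mandate: bool) -> CatLayer:
    """Formality test: informal pressure is Layer III, a formal mandate is Layer IV."""
    return CatLayer.POLICY if is_formal_mandate else CatLayer.NETWORK


# Illustrative entries only; the taxonomy itself is a living catalog.
EXAMPLES = [
    AttackExample("fault injection", CatLayer.STRUCTURE),
    AttackExample("change blindness", CatLayer.COGNITIVE),
    AttackExample("CEO fraud", CatLayer.NETWORK),
    AttackExample("weaponized bureaucracy", CatLayer.POLICY),
]


def examples_at(layer: CatLayer) -> list[str]:
    """Return the names of all example vectors filed under a layer."""
    return [e.name for e in EXAMPLES if e.layer == layer]
```

For instance, a Friday dress custom would be classified with `classify_social_constraint(False)`, while a written dress-code mandate would use `classify_social_constraint(True)`.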
Layer I: STRUCTURE
CAT Layer I, the structure, concerns environmental reality. The structural attack surface is defined by physicality and related constraints: silicon compute units and power for AI; the brain and physiological homeostasis processes for humans; and physical facilities for organizations. This bio-cyber convergence is the core concern of Neurosecurity (Canham & Sawyer, 2020), which posits that as humans integrate with technology, the brain effectively becomes a reachable node on the network. Structural vulnerabilities are kinetic, tangible, and subject to exploitation.
Representative vectors at Layer I are Neural Manipulation (biological), Fault Injection (artificial), and structural manipulation (organizational). The mechanism in each case is direct manipulation of the architecture to induce degradation. In the human domain, this includes exploits of neuro-physical vulnerabilities. In 2019, the Epilepsy Foundation’s Twitter account was targeted with flashing animated GIFs strobe-tuned to induce photosensitive seizures. Similarly, "Mosquito" devices emit high-frequency sounds to physically degrade the auditory system of youth (Anderson, 2008). Research by Wixey (2023) demonstrated the ability to weaponize smart devices to emit sounds at infra- and ultrasonic frequencies, inducing physiological distress and nausea in human operators. This is a sophisticated Layer I attack: a digital payload delivers a physical effect to degrade physiological homeostasis (Hancock & Warm, 1989). In the artificial domain, fault injection techniques include "undervolting" to induce bit-flips (math errors) by physically manipulating power, "clock glitching" to force a processor to skip instructions, and optical fault injection, in which a heat source (e.g., an NIR laser) artificially flips a bit in a register, bypassing software security entirely.
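To make the fault-injection mechanism concrete, the sketch below simulates, purely in software, what a single flipped bit can do to an access check. It is an illustration under stated assumptions, not a model of any real hardware: the flag layout (`ADMIN_BIT`) and the function names are hypothetical, and the "fault" is an XOR applied to an integer rather than a physical glitch.

```python
ADMIN_BIT = 0b0000_0100  # hypothetical position of an "is admin" flag in a status register


def is_admin(register: int) -> bool:
    """Software access check: trusts whatever the register currently holds."""
    return bool(register & ADMIN_BIT)


def inject_fault(register: int, bit_index: int) -> int:
    """Simulate a single-bit upset by flipping one bit of the register."""
    return register ^ (1 << bit_index)


register = 0b0000_0001  # ordinary user: the admin bit is clear
assert not is_admin(register)

# Flip bit 2, the position that ADMIN_BIT occupies in this toy layout.
glitched = inject_fault(register, bit_index=2)
print(f"{register:08b} -> {glitched:08b}, admin={is_admin(glitched)}")
```

The access check itself is never modified, which is the point of the example: the software logic is correct, but the state it reads has been changed underneath it.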
Layer II: COGNITIVE
CAT Layer II concerns internal processing and decision-making. The cognitive attack surface is defined by decision processes, and attacks targeting this layer exploit the heuristics used to derive meaning, plan, and execute action. Cognitive vulnerabilities are often unhelpful biases, edge cases in cognition, or resource traps, and are therefore subject to exploitation.
One representative vector of the cognitive layer in humans is the weaponization of the vigilance decrement, a constraint of cognitive architecture in which observers find rare signals far less often than their rarity alone would account for. The Prevalence Paradox (Sawyer & Hancock, 2018a) describes how automated defenses, by reducing the frequency of visible threats, can lead to logarithmic decay in cyberdefender efficacy. Interestingly, similar mathematics leads to edge cases in AI. A sophisticated attacker can weaponize this by intentionally depressing relative attack frequency or by strategically poisoning training datasets. Another vector involves visual cortex suppression of input during saccades (rapid eye movements), which provides the opportunity to exploit change blindness (Rensink et al., 1997) by synchronizing a digital change with a biological saccade, inserting data directly into the decision loop without conscious registration. The design of information provides additional Layer II vectors. Manipulating visual salience, for example, fundamentally alters the conclusions drawn by novice operators (Canham & Hegarty, 2010).
Layer III: NETWORK
CAT Layer III, the network, exploits relationships. The network attack surface is defined by connections to other decision makers: actors and perceivers, whether human, machine, or organizational. This conceptualization embeds each agent inside a sociotechnical system, defining the nodes, connections, and context within which it is anchored. Layer III extends the idea of the network to physical handshakes and the informal relationships they entail, as well as to digital handshakes. Network vulnerabilities are embedded within the sources of the raw materials for action: API endpoints for AI, social relationships for humans, and supply chains for organizations are all subject to exploitation.
One representative vector for Layer III is trust calibration. As demonstrated by Asch (1951), individuals will conform to group consensus even when it contradicts sensory evidence, and the same behavior can be seen in LLMs agreeing against the evidence and in organizations beset by ‘groupthink’. Trust is a heuristic to manage complexity (Lee & See, 2004), and so attackers exploit Authority (e.g., CEO Fraud or AI "roleplay" attacks asserting admin status) and Reciprocity (e.g., data poisoning where "good" inputs groom the model). Crucially, the inverse is also weaponizable: inducing distrust forces an operator to allocate excessive resources to monitoring, causing a cognitive DoS (Canham et al., 2022). Low trust in an interface induces operator fixation to the detriment of performance (Sawyer et al., 2017).
An agentic system is, by definition, an insider, and so both human and non-human agents can act as an insider threat. Even a non-compromised agent can inadvertently exfiltrate information: humans due to faulty instructions or reasoning, LLMs as they ‘hallucinate’ or seek to access an outside tool. Indeed, in the "Evil Digital Twin" scenario (Canham & Sawyer, 2023; Sawyer & Canham, 2025), where human digital twins (HDTs) exploit their position of intimacy, exploitative actions may not necessarily be malicious, but may simply be misaligned with the goals of the individual, the company, or society. Once inside the trust boundary, agents operate via specific protocols, which may not be linked to the foundations of that trust or to the larger context of the network. It is difficult to determine, for example, whether a digital twin made without consent that exfiltrates information from a company by emailing it to the original individual is ‘malicious’, but it certainly represents an insider threat to the company.
Layer IV: POLICY
CAT Layer IV concerns rules, governance, and constraints. The policy attack surface is defined by both the presence and the absence of constraints, and extends beyond actual constraints to implied constraints, unenforced and ignored constraints, and unenforceable constraints. A distinction must be made between conventions and customs (a Layer III manipulation) and actual policy. Employees wearing Hawaiian shirts on Fridays by tradition is Layer III engagement, while a company mandating it is acting through Layer IV. As Lessig (1999) argued, "Code is Law"; in Layer IV, law is code. Policy vulnerabilities are a programmable surface shaping action, and subject to exploitation.
One representative vector at Layer IV is exploiting a system’s checks and balances to induce DoS. The OSS Simple Sabotage Field Manual (1944) describes the weaponization of bureaucracy through having agents "insist on doing everything through 'channels'." Modern equivalents include Spamigation (weaponizing GDPR requests) and SLAPP suits. For digital agents, conflicting guardrail triggers can result in a blockage of any output. For humans, creating significant moral or ethical ambiguity can cause an individual to disengage. Follow-on effects arise wherever a legal or code gray zone exists. For example, the lack of legal frameworks for digital twin identity has created a legal gray zone where exploitation is technically legal but catastrophic along multiple dimensions (Butler et al., 2025). When the policy layer lags and conflicting mandates exist (e.g., "get your work done" and "you are forbidden from taking the steps to get your work done"), human and digital agents will build workarounds (Skrypchuk et al., 2020; Sawyer et al., 2016). Blanket security rules blocking USB port usage in some cases lead to employees sending files to external email accounts and to other forms of ‘grey net’ and ‘shadow IT’ risky actions by insiders, engendered by the very policies meant to increase security. A savvy attacker can paralyze the system, induce exfiltration, and otherwise compromise security by leaning into rigid policy.
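The guardrail-conflict case can be sketched as a small rule check. This is a toy, assuming two hypothetical rules (`must_cite_source` and `no_external_links`) that no candidate output can satisfy together; it does not describe how any real AI system implements its guardrails.

```python
from typing import Callable, Iterable, Optional

Rule = Callable[[str], bool]  # a rule returns True when the output is allowed


def must_cite_source(text: str) -> bool:
    """Hypothetical rule: every answer must include a link to its source."""
    return "http" in text


def no_external_links(text: str) -> bool:
    """Hypothetical rule: answers must never contain a link."""
    return "http" not in text


def first_allowed(candidates: Iterable[str], rules: Iterable[Rule]) -> Optional[str]:
    """Return the first candidate that passes every rule, or None if all are blocked."""
    rules = list(rules)
    for text in candidates:
        if all(rule(text) for rule in rules):
            return text
    return None


candidates = ["See http://example.org for details.", "Details are in the handbook."]

# Either rule alone leaves a usable output...
assert first_allowed(candidates, [must_cite_source]) == candidates[0]
assert first_allowed(candidates, [no_external_links]) == candidates[1]
# ...but together they block everything: a policy-induced denial of service.
assert first_allowed(candidates, [must_cite_source, no_external_links]) is None
```

Each rule is reasonable in isolation; the blockage appears only in their combination, which is why an attacker who can trigger both conditions at once does not need to defeat either rule.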
Generally, attackers targeting CAT Layer IV policy seek to turn a constraint into an advantage relative to their own goal. Schneier (2023) categorizes exploiting gaps between the letter and spirit of the law as "hacking" in its purest form. In AI, this manifests as adversarial prompting (jailbreaking), where inputs remain syntactically compliant with safety policies while creating a context that forces the model to prioritize a conflicting rule (e.g., "helpfulness" over "safety"). When a metric becomes a target, it ceases to be a good measure (Goodhart's Law; see also Campbell's Law). In AI, this is "reward hacking," where an agent maximizes points in a way that violates the designer's goals. Calo (2014) describes an analogous practice in consumer markets as "digital market manipulation," which exploits cognitive biases within the strict bounds of the law. Indeed, successful adversarial actions are not only lawful but use the law 'as written' as a weapon against the spirit in which it was written.
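The gap between a measured target and the designer's intent can be illustrated with a minimal sketch. It assumes a hypothetical help-desk agent rewarded on tickets closed; the action names and numbers are invented for illustration and do not come from any cited study.

```python
# Hypothetical outcomes for two actions available to an agent.
# "tickets_closed" is what the policy measures and rewards;
# "problems_solved" is what the designer actually wants.
actions = {
    "fix_root_cause": {"tickets_closed": 1, "problems_solved": 1},
    "close_without_fixing": {"tickets_closed": 5, "problems_solved": 0},
}


def proxy_reward(outcome: dict) -> int:
    """The letter of the policy: count closed tickets."""
    return outcome["tickets_closed"]


def designer_goal(outcome: dict) -> int:
    """The spirit of the policy: count problems actually solved."""
    return outcome["problems_solved"]


# An agent that optimizes the written metric.
chosen = max(actions, key=lambda a: proxy_reward(actions[a]))

# What the designer would have chosen.
intended = max(actions, key=lambda a: designer_goal(actions[a]))

print("agent chooses:   ", chosen)
print("designer intends:", intended)
```

Here the agent complies fully with the rule as written while defeating its purpose, the same structure an attacker exploits when leaning on the letter of a policy.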
Reference Works
Anderson, R. (2010). Security engineering: a guide to building dependable distributed systems. John Wiley & Sons.
Asch, S. E. (1951). Effects of group pressure upon the modification and distortion of judgments. In H. Guetzkow (Ed.), Groups, leadership and men (pp. 177–190). Carnegie Press.
Ask, T. F., Lugo, R. G., Sütterlin, S., Canham, M., Hermansen, D., & Knox, B. J. (2023, November). The UnCODE system: A neurocentric systems approach for classifying the goals and methods of Cognitive Warfare. In HFM-361 Symposium on Mitigating and Responding to Cognitive Warfare (p. P12).
Butler, Y., Egwuatu, C., Canham, M., & Sawyer, B. D. (2025). Sex worker human digital twins and intellectual property: On collaborative futures for sex thinkers, technologists, and lawyers. Porn Studies, 1–16.
Calo, R. (2014). Digital market manipulation. The George Washington Law Review, 82(4), 995–1051.
Canham, M., & Hegarty, M. (2010). Effects of knowledge and display design on comprehension of complex graphics. Learning and Instruction, 20(2), 155–166.
Canham, M., & Sawyer, B. D. (2020). Neurosecurity: Human brain electro-optical signals as MASINT. American Intelligence Journal, 37(1), 40–47.
Canham, M., & Sawyer, B. D. (2023). Me and my evil digital twin: The psychology of human exploitation by AI assistants [Conference presentation]. Black Hat USA, Las Vegas, NV. https://www.youtube.com/watch?v=qjhfWWEQCgQ
Canham, M., Posey, C., Strickland, D., & Constantino, M. (2021). Phishing for long tails: Examining organizational repeat clickers and protective stewards. SAGE Open, 11(1).
Canham, M., Sütterlin, S., Ask, T. F., Knox, B. J., Glenister, L., & Lugo, R. G. (2022). Ambiguous self-induced disinformation (ASID) attacks: Weaponizing a cognitive deficiency. Journal of Information Warfare, 21(3), 43–58.
Chen, H., & Magramo, K. (2024, February 4). Finance worker pays out $25 million after video call with deepfake 'chief financial officer.' CNN.
Cialdini, R. B. (2009). Influence: Science and practice (5th ed.). Allyn & Bacon.
Cognitive Security Institute. (CSI, n.d.). Home. Retrieved February 2, 2026, from https://www.cognitivesecurityinstitute.org
Cognitive Security Institute. (CSI, n.d.). Cognitive Attack Taxonomy. Retrieved February 2, 2026, from https://cognitiveattacktaxonomy.org/
Endevr. (2023). How Scammers Con People out of Their Savings. https://www.youtube.com/watch?v=yB8QL8WQgFg
Federal Bureau of Investigation. (2025). 2024 Internet crime report. Internet Crime Complaint Center.
Festinger, L. (1957). A theory of cognitive dissonance. Stanford University Press.
Greenlee, E. T., Funke, G. J., Warm, J. S., Sawyer, B. D., Finomore, V. S., Mancuso, V. F., & Matthews, G. (2016, July). Stress and workload profiles of network analysis: Not all tasks are created equal. In Advances in Human Factors in Cybersecurity: Proceedings of the AHFE 2016 International Conference on Human Factors in Cybersecurity, July 27-31, 2016, Walt Disney World®, Florida, USA (pp. 153-166). Cham: Springer International Publishing.
Gutzwiller, R. S., Fugate, S., Sawyer, B. D., & Hancock, P. A. (2015). The human factors of cyber network defense. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 59(1), 322–326.
Hancock, P. A., & Warm, J. S. (1989). A dynamic model of stress and sustained attention. Human Factors, 31(5), 519–537.
Hutchins, E. (1995). Cognition in the wild. MIT Press.
Lee, J. D., & See, K. A. (2004). Trust in automation: Designing for appropriate reliance. Human Factors, 46(1), 50–80.
Lessig, L. (1999). Code and other laws of cyberspace. Basic Books.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. W. H. Freeman.
Matz, S. C., Kosinski, M., Nave, G., & Stillwell, D. J. (2017). Psychological targeting as an effective approach to digital mass persuasion. Proceedings of the National Academy of Sciences, 114(48), 12714–12719.
Norman, D. A. (2013). The design of everyday things (Rev. and expanded ed.). Basic Books.
Office of Strategic Services. (1944). Simple sabotage field manual (Strategic Services Field Manual No. 3).
Rensink, R. A., O'Regan, J. K., & Clark, J. J. (1997). To see or not to see: The need for attention to perceive changes in scenes. Psychological Science, 8(5), 368–373.
Sawyer, B. D., & Canham, M. (2025). Evil digital twin, too: The first 30 months of psychological manipulation of humans by AI [Conference presentation]. Black Hat USA, Las Vegas, NV. https://www.youtube.com/watch?v=XOMJcT-DrlY
Sawyer, B. D., Finomore, V. S., Funke, G., Warm, J. S., Matthews, G., & Hancock, P. A. (2016). Cyber vigilance: The human factor. American Intelligence Journal, 32(2), 157–165.
Sawyer, B. D., & Hancock, P. A. (2018a). Hacking the human: The prevalence paradox in cybersecurity. Human Factors, 60(5), 597–609.
Sawyer, B. D., Seppelt, B., Mehler, B., & Reimer, B. (2017). Trust impacts driver glance strategy in multitasking. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 61(1), 1441–1442.
Schneier, B. (2023). A hacker's mind: How the powerful bend society's rules, and how to bend them back. W. W. Norton & Company.
Skrypchuk, L., Langdon, P., Sawyer, B. D., & Clarkson, P. J. (2019). Unconstrained Design: Improving Multitasking with In-Vehicle Information Systems Through Enhanced Situation Awareness. Theoretical Issues in Ergonomics Science, 1-37.
Wickens, C. D. (1992). Engineering psychology and human performance (2nd ed.). HarperCollins.
Wixey, M. (2023). Human Side-Channels: Cyberweapons and Cyber Defenses [Presentation]. Cognitive Security Institute. https://www.youtube.com/watch?v=bcemPprQmFc
World Economic Forum. (2026). Global cybersecurity outlook 2026. https://www.weforum.org/publications/global-cybersecurity-outlook-2026/
Zimmermann, H. (1980). OSI reference model — The ISO model of architecture for open systems interconnection. IEEE Transactions on Communications, 28(4), 425–432.
