03/30/2026
New research from our CEO Jeremy McHugh, D.Sc., ToolJack, mapping novel attack paths against the trust boundary between AI agents and their tool infrastructure. Tested against Anthropic's Claude Desktop and Claude in Chrome extension.
We sat on the public disclosure of the discovery of prompt injection in GPT-3 after notifying OpenAI. That finding would have been assessed as informational under every guideline at the time. We're publishing this to help researchers and teams building remotely controlled AI products evaluate their own agentic trust boundaries.
ToolJack operates below where current defenses look. An attacker can control what an AI agent sees in real time, and it bypasses MCP security scanners, tool proxies, and schema validation entirely. The tools stay clean. The responses get replaced downstream.
Our threat models were built for human adversaries, and the attack paths we consider as infeasible or improbable today won't stay that way.
Full breakdown:
www.preamble.com/blogs/tooljack-hijacking-an-ai-agents-perception-through-bridge-protocol-exploitation
This research presents ToolJack, a novel attack methodology targeting the trust boundary between AI agents and their tool infrastructure. Through controlled security research on Claude Desktop's bridge protocol, I demonstrate how an attacker who has already achieved local compromise can escalate fro...