Model Context Protocol (MCP) has become the default standard for connecting AI agents to external tools, APIs, and data sources, and it’s grown really quickly. Just over a year after Anthropic introduced it, thousands of MCP servers have appeared on GitHub alone, yet only 6% of organizations report having an advanced AI security strategy in place.
It’s great that MCP simplifies what used to need so many custom integrations, but the security story is more complicated. That’s because it creates a new class of risks that most development teams haven’t faced before.

Most existing security controls assume humans are making the calls, quite literally. Developers write code and call APIs with known inputs and outputs, making it straightforward to inspect and manage.
However, this has all changed because when an AI agent uses MCP, it decides at runtime which tools to invoke, with what inputs, and in what order, often without a person in the loop. Importantly, this behaviour is non-deterministic because teams can’t always predict what the model will do next, so security has to be built in from the start.
It also means that the blast radius of a misconfigured or compromised MCP server is much larger than a typical API vulnerability.
It’s worth noting that not all MCP servers carry the same risk profile.
Local servers run directly on a user’s machine with filesystem access, which can expose credential storage and file permissions.
Remote servers are accessed over a network, which introduces other types of concerns, such as man-in-the-middle attacks, server identity verification, and the need to enforce Transport Layer Security (TLS) for all connections.
Local servers may be preferable for sensitive operations, but both deployment types need careful security controls for their environments.
MCP has several security challenges that should be addressed.
Prompt injection is what’s keeping AI security researchers busiest right now. As AI agents interpret natural language before deciding what to do, attackers can hide malicious instructions inside content the agent processes, such as a document, email, or webpage. The agent reads it, treats the embedded instructions as valid, and acts on them.
The EchoLeak vulnerability showed how a malicious prompt hidden in an email could trick Microsoft 365 Copilot into leaking sensitive corporate data, bypassing multiple layers of security controls.
In many early MCP implementations, a ‘Confused Deputy’ problem arises because servers often trust the AI’s identity rather than the user’s identity. Seeing as the AI operates with elevated system-level permissions, an attacker can manipulate the agent into ‘deputizing’ those credentials to access data or systems.
In a conventional environment, suspicious API calls can be traced back to a specific user, code path, and timestamp. With most MCP implementations today, that kind of visibility doesn’t really exist. There’s currently no standardized way to reconstruct what an agent did, why it did it, and what data it accessed. Not so great for incident response and compliance.
Developers who install third-party MCP servers from public repositories don’t always inspect them properly, much the same as with any open source package. Malicious actors have already demonstrated tool poisoning, embedding harmful instructions in server metadata that cause agents to misbehave.
A critical vulnerability in mcp-remote, a widely used OAuth proxy, compromised over 437,000 developer environments before it was caught.
The most common mistakes when it comes to misconfiguration sprawl are usually API tokens stored in plaintext config files, servers configured with access to the entire filesystem when they only need one directory, and sensitive environment variables left exposed to unauthorised processes.
Last year, security researchers found nearly 2,000 publicly exposed MCP servers with no authentication whatsoever.
The main idea is for MCP adoption not to outpace security review processes. Given how non-deterministic AI behaviour makes it impossible to test your way to safety, the most important shift is moving security earlier. It’s best to find trust boundaries and potential failure modes during the design stage already, through threat modeling.
Keep a record of every MCP server in use, including local, cloud-hosted, and hybrid. For each one, understand what access it has, what credentials it holds, if it grants more access than it should, and if it is time-limited.
When setting up MCP servers, remember that they should only have the permissions necessary for their tasks. It’s a good idea to restrict filesystem access to only the directories it really needs, limit network calls to approved endpoints, and keep tokens short-lived.
For any access to sensitive systems, rather get human approval before the agent takes action.
Any third-party server that teams want to use should go through a proper review, including tool definitions and metadata, not just its stated purpose.
Check whether the project is actively maintained and if it has a security disclosure process. Also, scan its dependencies for known vulnerabilities. There should be a formal approval process before anything connects to production, and then monitor for unexpected changes after deployment.
Make sure every agent-to-server interaction is logged in enough detail to reconstruct what happened after an incident. It should include which tool was called, with what inputs, in response to which prompt, and what came back. This is also essential for both incident response and regulatory compliance.
Developers should have proper hands-on training on how to prevent misconfigurations, over-permissioned tokens, and supply chain risk. Many MCP risks emerge from how AI agents behave in context, so developers need to understand how these systems can be manipulated. Security awareness needs to stay up to date.
Giving developers the practical skills to address agentic AI and MCP risks is one of the best ways security can keep pace with adoption.
SecureFlag’s platform includes dedicated hands-on training labs for agentic AI and MCP security. Rather than only theory and multiple-choice questions, the labs have real development environments where learners work through realistic attack scenarios.
Some of what the MCP and agentic AI labs cover includes:
Prompt injection and indirect prompt injection: Manipulated inputs in documents and external data sources can redirect agent behaviour.
Token smuggling: Attackers conceal hidden instructions within prompts or context to bypass safety controls.
Injection attacks (command, SQL, HTML): Compromised inputs can spread through systems that AI agents interact with.
MCP misconfigurations: Recognizing and fixing the setup mistakes that expose endpoints or give attackers access to internal tools and sensitive data.
Also, because SecureFlag covers over 50 technologies, teams can build MCP security knowledge alongside the broader secure development skills they need.
Want to see SecureFlag’s agentic AI and MCP labs in action?