Many in the tech community are talking about agentic AI these days, where AI systems manage tasks autonomously. It’s impressive for sure, but also risky as it becomes more widely adopted in the workforce.
According to recent data, a third of organizations have already deployed AI agents, a threefold jump in just a few months. As these systems take on more responsibility, planning and acting with minimal human oversight, they introduce new security challenges that development teams need to be prepared for.
Unlike traditional large language models (LLMs) that generate responses when prompted, agentic AI refers to systems, or “agents,” that actively plan, make decisions, and carry out tasks with some level of autonomy.
Agentic AI typically builds on LLMs by combining their language understanding and generation capabilities with layers of autonomy. These agents don’t just wait for instructions: they choose which tools to use, break complex goals into smaller steps, and act without constant human oversight.
In other words, instead of simply answering questions, agentic AI can fetch data, process files, trigger APIs, or even write and execute code independently.
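To make the distinction concrete, here is a minimal sketch of an agent-style tool call. The `llm`, `fetch_data`, and `run_report` helpers are hypothetical placeholders rather than part of any particular framework; the point is that the model’s reply selects an action instead of serving as the final answer.

```python
import json

def llm(prompt: str) -> str:
    # Stand-in for a real model call; a real implementation would send the
    # prompt to a model provider and return its reply.
    return '{"tool": "fetch_data", "argument": "https://example.com/report.csv"}'

def fetch_data(url: str) -> str:
    return f"contents of {url}"          # placeholder tool

def run_report(name: str) -> str:
    return f"report '{name}' generated"  # placeholder tool

TOOLS = {"fetch_data": fetch_data, "run_report": run_report}

def handle_task(task: str) -> str:
    # A plain generative call would stop at the model's text reply.
    # An agent instead asks the model to choose an action, then executes it.
    reply = llm(
        f"Task: {task}\n"
        f"Available tools: {list(TOOLS)}\n"
        'Reply as JSON: {"tool": ..., "argument": ...}'
    )
    choice = json.loads(reply)
    return TOOLS[choice["tool"]](choice["argument"])

print(handle_task("Summarise this week's sales figures"))
```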
We all know about generative AI, such as ChatGPT, which is concerned with producing content, including writing text, generating images, and creating code in response to a user’s prompt. Its output is based on patterns in the data it was trained on.
While agentic AI is also built from machine learning models, it is more focused on action. It makes decisions, solves problems, and interacts with systems to complete tasks.
To sum up, these are some of the primary features of agentic AI:
Decision-making: Selects actions based on predefined or changing goals, often without human input.
Autonomy: Works independently by using its reasoning capabilities and taking action across multiple systems or tools.
Interactivity: Interacts with its environment and adjusts its behavior in response to changing situations.
Planning: Breaks complicated goals into smaller steps and executes them.
Memory: Learns from past interactions to refine future decisions, maintains context over time, and improves system reliability.
The features mentioned above come together in a reasoning loop that guides how agentic AI functions. An AI agent receives a task or instruction, reasons about what to do, selects the necessary tools, performs actions, and evaluates the results to decide on its next move. The loop carries on until the agent reaches its goal (or fails in its attempt).
It includes the following steps:
Perceives: Receives a task or observes input from the environment.
Plans: Understands the goal and outlines the necessary steps.
Reasons: Analyzes possible actions and selects the most effective one.
Executes: Carries out actions, such as running code or calling APIs.
Observes: Monitors the results to determine if it’s moving closer to the goal.
Adjusts: Refines its plan if it’s not working as expected.
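Expressed as code, this loop is little more than a bounded control structure. The sketch below is illustrative: `plan_next_step`, `execute`, and `goal_reached` are hypothetical callables standing in for model calls, tool dispatch, and success checks.

```python
from typing import Callable

def run_agent(
    task: str,
    plan_next_step: Callable[[dict], str],  # e.g. an LLM call that proposes the next action
    execute: Callable[[str], str],          # e.g. a tool dispatcher that runs the action
    goal_reached: Callable[[dict], bool],   # e.g. a check on the latest result
    max_iterations: int = 10,
) -> dict:
    """Illustrative perceive-plan-reason-execute-observe-adjust loop."""
    context = {"task": task, "history": []}        # perceive: record the incoming task
    for _ in range(max_iterations):                # bound the loop so a stuck agent stops
        step = plan_next_step(context)             # plan/reason: decide the next action
        result = execute(step)                     # execute: run code, call an API, etc.
        context["history"].append((step, result))  # observe: keep the outcome for re-planning
        if goal_reached(context):                  # adjust: stop when the goal is met,
            return context                         # otherwise loop with the updated context
    raise RuntimeError("agent stopped without reaching its goal")
```

Bounding the loop matters in practice: it is the simplest defense against an agent that never converges on its goal.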
Introduced by Anthropic in late 2024, the Model Context Protocol (MCP) gives AI agents a standardized way to connect with external tools, APIs, and data sources. It provides a common framework for LLMs to manage context and coordinate tasks without needing custom integrations every time.
By reducing reliance on such custom, ad hoc integrations, MCP helps minimize security issues that can arise from inconsistent connections. However, with increased connectivity and the use of multi-system agents comes a wider attack surface.
AI agents powered by MCP may be vulnerable to issues such as indirect prompt injection (also known as “tool poisoning”), data leaks, and other security risks. Organizations should prioritize security practices when using MCP to balance productivity with safety.
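As an example of what that standardization looks like, below is a minimal tool server sketched with the official MCP Python SDK’s FastMCP helper (the `mcp` package). The server name, tool, and backend lookup are hypothetical, and the exact SDK surface may differ between versions.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("invoice-lookup")  # hypothetical server name

def lookup_total(invoice_id: str) -> float:
    return 0.0  # placeholder for a real billing-system query

@mcp.tool()
def get_invoice_total(invoice_id: str) -> float:
    """Return the total for a single invoice by its ID."""
    # Treat the agent's arguments as untrusted input and validate them
    # before they reach any backend system.
    if not invoice_id.isalnum():
        raise ValueError("invalid invoice id")
    return lookup_total(invoice_id)

if __name__ == "__main__":
    mcp.run()  # expose the tool over MCP so any compliant agent can call it
```

Because any MCP-compatible agent can discover and call this tool, the integration is written once, and the validation inside it applies to every caller.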
To safely integrate agentic AI and MCP, organizations should:
Invest in developer security training on MCP’s structure and potential weaknesses.
Apply secure coding practices early in the software development lifecycle (SDLC).
Monitor both AI-generated and manually written code for vulnerabilities.
Commit to continuous upskilling to be ready for emerging threats.
Software development continues to be one of the strongest use cases for AI agents. By 2028, Gartner estimates that a third of business applications will have agentic capabilities built in and that around 15% of routine work decisions will be made autonomously.
A recent survey of over 13,000 developers found that nearly 30% of code is now generated by AI tools. The latest trend is vibe coding, where applications are created with little to no manual coding (and often without implementing security).
Other industries where agentic AI is making an impact include:
Customer Service: By recognizing intent and responding autonomously, agentic AI speeds up customer interactions, reducing the need for manual support.
Healthcare: Applied in diagnostics, patient monitoring, and administrative workflows, it requires strict privacy and security controls due to the sensitivity of medical data.
Workflow Automation: Businesses use agentic AI to coordinate internal operations, such as logistics and supply chain management, adjusting to real-time changes and enhancing productivity.
Financial Services: Agentic AI is playing a growing role in analyzing market trends and adjusting investment strategies, enabling faster, data-driven decisions with minimal human input. Fraud detection is also a key use case, as AI agents can analyze transactions to identify and prevent fraudulent activities.
As agentic AI becomes more capable and independent, it also introduces a range of new security risks, many of which traditional security controls aren’t prepared to handle.
The OWASP Agentic Security Initiative (ASI) is working to identify the types of security threats that these systems face and provide guidance on how to mitigate them.
A great starting point for understanding these dangers is the OWASP Top 10 for LLM Applications; however, let’s take a closer look at some specific threats that agentic systems face:
Memory poisoning happens when attackers manipulate an agent’s memory by introducing false or malicious data, which can distort its decision-making process and lead to unsafe or unintended actions. This can involve altering internal state, training memory, or prompt context memory, depending on the implementation.
AI agents often have access to external tools, such as file systems, APIs, or code interpreters, and attackers may try to trick the agent into misusing these tools. Deceptive prompts could cause an agent to perform actions the developer never intended, including modifying files, sending unauthorized network requests, or abusing system resources.
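A common defensive pattern, sketched below with hypothetical tool names and paths, is to put a validation layer between the agent and its tools: only allowlisted tools can be dispatched, and each wrapper checks its own arguments before acting.

```python
from pathlib import Path

ALLOWED_DIR = Path("/srv/agent-workspace")  # hypothetical directory the agent may touch

def safe_read_file(path: str) -> str:
    """Tool wrapper: only reads files inside the agent's workspace."""
    resolved = Path(path).resolve()
    if not resolved.is_relative_to(ALLOWED_DIR):  # Python 3.9+
        raise PermissionError(f"refusing to read outside {ALLOWED_DIR}")
    return resolved.read_text()

TOOL_REGISTRY = {"read_file": safe_read_file}  # allowlist of known tools

def dispatch(tool_name: str, argument: str) -> str:
    # The agent never calls tools directly; a deceptive prompt asking it to
    # "read /etc/passwd" fails here instead of reaching the filesystem.
    if tool_name not in TOOL_REGISTRY:
        raise PermissionError(f"unknown tool: {tool_name}")
    return TOOL_REGISTRY[tool_name](argument)
```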
Agents consume compute power, memory, and other system resources to operate, so attackers might try to overwhelm them by flooding the agent with tasks or heavy requests to slow it down or cause it to crash. This type of denial-of-service attack can make the system unresponsive, affecting the performance of legitimate users.
Another risk lies in altering the agent’s understanding of its own goals. Attackers can change how an agent perceives its objective through adversarial prompts or data manipulation and then redirect its intent. This type of “goal hijacking” can cause the agent to take actions that appear reasonable on the surface but are harmful or unintended.
Weak authentication mechanisms can let attackers impersonate trusted agents or users. By stealing credentials or exploiting inadequate identity checks, attackers can issue commands or access tools and data under false pretenses, putting sensitive information and systems at risk.
Many agents are able to run code or scripts as part of their operations, making them a prime target for attackers who inject malicious code into the agent’s instructions. If they are successful, they can get unauthorized access to the system environment, which is particularly dangerous if the agent operates with elevated privileges.
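One way to limit the blast radius, sketched below for a Unix-like host, is to run agent-generated code in a separate interpreter process with no shell, a scratch working directory, and a hard timeout. This is a first layer only, not a substitute for the container- or VM-level sandboxing discussed later.

```python
import subprocess
import sys
import tempfile

def run_untrusted_snippet(code: str, timeout_seconds: int = 5) -> str:
    """Run agent-generated Python in an isolated child process with a hard timeout."""
    with tempfile.TemporaryDirectory() as workdir:
        result = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores PYTHON* env vars and user site-packages
            cwd=workdir,                         # scratch directory, removed afterwards
            capture_output=True,
            text=True,
            timeout=timeout_seconds,             # kills runaway or looping code
        )
    if result.returncode != 0:
        raise RuntimeError(result.stderr)
    return result.stdout
```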
In multi-agent setups, agents may collaborate by sharing data or decisions. Attackers can poison these communication channels by feeding in false or manipulated information, leading to poor coordination, degraded performance, or faulty group decisions. The more agents involved, the greater the risk of this type of manipulation.
Even though agentic AI is becoming more adept at autonomous decision-making and executing tasks, human oversight should still be part of the process. Known as “human in the loop” (HITL), this approach involves direct human intervention to review or validate actions when AI behavior is uncertain, particularly in edge cases or ambiguous situations.
HITL is a continuous process that adds a layer of safety and trust, helping teams respond to emerging threats, reduce false positives, and make judgment calls AI might miss. Effective agentic systems are designed with proper escalation paths, allowing agents to raise decisions for human review and improve through feedback.
Teams should define when human intervention is needed and continuously monitor agent behavior.
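A minimal HITL gate can be expressed in a few lines. The sketch below uses hypothetical action names and a made-up confidence threshold; the `ask_reviewer` callback stands in for whatever approval channel a team already uses (a ticket, a chat prompt, a UI button).

```python
from dataclasses import dataclass
from typing import Callable

HIGH_RISK_ACTIONS = {"delete_records", "send_payment", "deploy_to_production"}  # hypothetical

@dataclass
class ProposedAction:
    name: str
    arguments: dict
    confidence: float  # 0.0-1.0 estimate of how sure the agent is about this step

def requires_human_review(action: ProposedAction, threshold: float = 0.8) -> bool:
    """Escalate when an action is inherently risky or the agent is unsure."""
    return action.name in HIGH_RISK_ACTIONS or action.confidence < threshold

def execute_with_hitl(
    action: ProposedAction,
    execute: Callable[[ProposedAction], object],
    ask_reviewer: Callable[[ProposedAction], bool],
) -> dict:
    if requires_human_review(action) and not ask_reviewer(action):
        return {"status": "rejected", "action": action.name}  # fail closed on rejection
    return {"status": "executed", "result": execute(action)}
```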
Developers and software engineers need a layered security approach to protect agentic AI systems from these threats. Key practices include:
Threat model early: Incorporate threat modeling from the start of development. Identify potential attack paths and misuse scenarios while designing and coding agentic AI. This proactive approach helps address risks before they become vulnerabilities. Automated threat modeling solutions like ThreatCanvas can help here.
Sandbox everything: Run agents in isolated environments like containers or VMs to limit their impact. When writing code for these agents, design for restricted environments, avoid dependencies on unrestricted system resources and ensure your code can operate safely with limited access.
Strict permissions: Define and enforce the minimum permissions agents need to function. From a coding perspective, avoid hardcoding over-privileged credentials or API keys, and ensure your code respects the principle of least privilege when accessing resources or external tools.
Log and watch: Set up clear, purposeful logging to track key actions, decisions, and tool interactions within the code. Ensure logs do not expose sensitive data and are structured to support proper monitoring and incident response.
Prompt hygiene: Treat all inputs, including prompts, as untrusted. Validate and sanitize user input rigorously before processing or forwarding it to other systems, and review prompts generated internally as well to avoid the accidental injection of malicious content.
Throttling and limits: Implement safeguards in the code to limit resource consumption, such as rate limits, timeouts, and retry caps. These prevent runaway processes and protect against denial-of-service conditions caused by excessive or malicious requests (see the sketch after this list).
Fail-safe defaults: Write code so that agents fail securely, gracefully stopping or reverting actions in the event of unexpected errors. It’s always best to avoid crashes or uncontrolled retries that could expose vulnerabilities or degrade system stability.
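The throttling and fail-safe points above can be combined into one small guardrail, sketched here with illustrative limits rather than anything tied to a specific framework.

```python
import time

class ThrottledRunner:
    """Illustrative guardrails: a per-minute rate limit, a retry cap, and a clean stop on failure."""

    def __init__(self, max_calls_per_minute: int = 30, max_retries: int = 2):
        self.max_calls_per_minute = max_calls_per_minute
        self.max_retries = max_retries
        self._window_start = time.monotonic()
        self._calls_in_window = 0

    def _check_rate_limit(self) -> None:
        now = time.monotonic()
        if now - self._window_start >= 60:  # start a fresh one-minute window
            self._window_start, self._calls_in_window = now, 0
        if self._calls_in_window >= self.max_calls_per_minute:
            raise RuntimeError("rate limit reached; refusing further tool calls")
        self._calls_in_window += 1

    def run(self, tool, *args):
        self._check_rate_limit()
        for attempt in range(self.max_retries + 1):
            try:
                return tool(*args)
            except Exception:
                if attempt == self.max_retries:
                    # Fail-safe default: stop cleanly rather than retrying forever
                    # or leaving the agent in a half-completed, unknown state.
                    raise
```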
When deploying agentic systems, organizations need more than just the latest tools; they also need proactive defense strategies in place and a mindset that treats security as a central part of the development lifecycle.
SecureFlag’s training platform helps teams develop the skills necessary to secure systems, including those powered by AI, through practical, hands-on training based on real-world scenarios.
With SecureFlag, teams can:
Train with labs focused on agentic AI and MCP.
Learn how to detect and mitigate LLM-specific threats.
Follow structured learning paths on AI risks.
Continuously upskill with new and regularly updated content.
As agentic AI becomes embedded in everyday workflows, it’s becoming increasingly necessary for organizations to invest in secure development training.