Agentic AI with Guardrails: Scale without Worrying

Insight

May 19

Across large enterprises and small, fast-moving companies alike, we are starting to see a growing tension between the benefits captured through agentic AI and the risks involved in deploying it. Stakeholders, including boards, shareholders, and users, expect companies to innovate in days or weeks, not quarters. For many startups, "moving fast and breaking things" is, if not a necessity, at least non-controversial. For an established enterprise, "breaking things" isn't a badge of honor; it's a liability. It can mean losing customer trust, facing massive regulatory penalties under EU law, or triggering a costly operational failure.

Yet, ignoring agentic AI in 2026 is a significant business risk in itself. Given how fast the technology landscape is evolving, staying on the sidelines isn't an option.

As we shift from using generative AI as an assistant that answers questions to agentic AI, trusted to take action, the risk profile changes fundamentally. We are no longer just securing the inputs and outputs of a chatbot; we are securing autonomous identities that possess active roles in core enterprise processes, with the power to write code, call APIs, and modify databases.

The key question for leaders today is: How do we democratize AI agents to capture strategic benefits, without introducing unacceptable operational risk?

The answer lies in treating AI governance not as a roadblock, but as an enabler. To achieve scalable innovation, organizations must shift AI security from a bureaucratic hurdle to a core strategic discipline. Getting this right is what allows companies to move past proofs of concept (PoCs) into use-cases that actually move the needle.

Defining Agentic AI

Before we move forward, we need to clarify what we mean by agentic. Due to the hype, the term agentic AI has been (mis-)used for a variety of different things, some more autonomous than others.

At its core, agentic AI refers to artificial intelligence systems that don't just process information or generate output, such as text or images, but actually act to achieve a specific goal. A simple definition is “An LLM agent runs tools in a loop to achieve a goal.”

While traditional AI (like a basic chatbot) waits for a prompt and provides a response, an AI agent can break a complex goal into smaller steps, use external tools, and make independent decisions to complete a workflow.
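
The "tools in a loop" definition can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: `fake_model`, the `TOOLS` table, and `run_agent` are hypothetical names, and the model is a stub standing in for a real LLM call.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}

def fake_model(goal: str, history: list[str]) -> dict:
    """Stub standing in for an LLM call (an assumption for illustration).

    It requests one tool call, then finishes using the tool's result.
    """
    if not history:
        return {"action": "tool", "tool": "add", "arg": goal}
    return {"action": "finish", "answer": history[-1]}

def run_agent(goal: str, max_steps: int = 5) -> str:
    """Run tools in a loop until the model says it is done or steps run out."""
    history: list[str] = []
    for _ in range(max_steps):
        decision = fake_model(goal, history)
        if decision["action"] == "finish":
            return decision["answer"]
        tool = TOOLS[decision["tool"]]
        history.append(tool(decision["arg"]))
    raise RuntimeError("step budget exhausted before the goal was met")

print(run_agent("2+3"))
```

Note the `max_steps` budget: even in a toy loop, an explicit stopping condition is the first guardrail, since the agent otherwise decides for itself when it is finished.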

Key characteristics:

Autonomy: It operates without, or with minimal, human intervention.

Tool-Use: It can use tools, for example to browse the web, write and execute code, access databases, or control software.

Reasoning: It can self-correct when a plan fails and iterate until the goal is met.

How powerful an agent is depends on how strong the model is and on which tools and resources, e.g. data, it is given access to.

This means that, depending on the use-case, the model, how the agent is run, and what resources are made available, agentic AI comes with vastly different risk profiles and security implications. In early 2026 the open-source project OpenClaw made headlines around the world. It gained attention for being a very powerful virtual assistant with support for autonomous workflows across multiple services, but at the cost of severe security implications. Unless you knew exactly what you were doing when deploying it, which almost no users did, it was a security nightmare. It did, however, accelerate the race to deliver the same functionality with control.

Because the technology is evolving so fast, and the use-cases are so different, it can be incredibly difficult for organizations to communicate about, and agree on, how to manage the risks.

The Four Categories of Enterprise Agents

To manage risk, organizations must first categorize the agentic use-case. Enterprise implementations generally fall into one of four buckets:

1. Coding & Engineering Agents

What they do: Write, edit, refactor, and commit software code, run terminal commands, and fix bugs (e.g. OpenAI Codex, Claude Code). This category also includes "vibe-coding" tools (e.g. Lovable) used by teams for rapid prototyping. A third variant is background coding agents, which work even more autonomously, usually in a cloud environment.

The Risk: Agents often operate directly on the developer's local machine and inherit that developer's access rights. A hallucination or malicious injection could lead the agent to drop a production database or pull in infected open-source packages.

The Control Strategy: High-security sandboxing. Engineering agents should run inside isolated container environments. Harness engineering with automated gates (linters, unit tests, and security scanners) forces the agent to self-correct before a human developer conducts the final code review. A central MCP gateway should govern and monitor the use of MCP servers.
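
The idea of automated gates can be sketched as follows. This is a simplified illustration, not a real harness: `syntax_gate` and `security_gate` are hypothetical stand-ins for a linter and a security scanner, and the `drafts` list simulates successive attempts by an agent rather than calling one.

```python
import ast

def syntax_gate(code: str) -> list[str]:
    """Stand-in for a linter: report code that does not parse."""
    try:
        ast.parse(code)
        return []
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

def security_gate(code: str) -> list[str]:
    """Stand-in for a security scanner: flag direct calls to eval()."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return []  # the syntax gate already reports this
    return ["use of eval()" for node in ast.walk(tree)
            if isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name) and node.func.id == "eval"]

GATES = [syntax_gate, security_gate]

def run_gates(code: str) -> list[str]:
    """Collect findings from every gate."""
    return [finding for gate in GATES for finding in gate(code)]

def gated_submit(drafts: list[str], max_attempts: int = 3) -> str:
    """Return the first draft that passes every gate.

    Only that draft would go on to human code review.
    """
    for attempt, code in enumerate(drafts[:max_attempts], start=1):
        findings = run_gates(code)
        if not findings:
            return code
        print(f"attempt {attempt} rejected: {findings}")
    raise RuntimeError("no draft passed the gates")

accepted = gated_submit(["x = eval('1+1')", "x = 1 + 1"])
```

In a real setup the findings would be fed back to the agent as input for its next attempt, which is what lets it self-correct without a human in each iteration.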

2. Individual Workflow Agents

What they do: Act as personalized executive assistants for internal employees. They read and draft emails, summarize long Slack threads, organize calendars, and update personal spreadsheets.

The Risk: Indirect prompt injection. For example, an attacker sends an email containing hidden, malicious instructions. When the workflow agent reads the email, it silently executes the hidden command to forward the user’s private data to an external server.

The Control Strategy: Strict data access controls and human-in-the-loop approval. The agent should never be permitted to automatically send an email, alter a calendar invite, or move data outside the corporate perimeter without a human explicitly approving it. A central MCP gateway should govern and monitor the use of MCP servers.
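
A human-in-the-loop check of this kind might look like the sketch below. The action names, the `REQUIRES_APPROVAL` set, and the `cautious_reviewer` stub are all assumptions made for illustration; in practice the approval callback would prompt the actual user.

```python
from dataclasses import dataclass
from typing import Callable

# Actions with side effects outside the agent's own workspace (assumed list).
REQUIRES_APPROVAL = {"send_email", "update_calendar", "export_data"}

@dataclass
class ProposedAction:
    name: str
    details: str

def execute(action: ProposedAction,
            approve: Callable[[ProposedAction], bool]) -> str:
    """Run read-only actions directly; hold side-effecting ones for a human."""
    if action.name in REQUIRES_APPROVAL and not approve(action):
        return f"blocked: {action.name}"
    return f"executed: {action.name}"

def cautious_reviewer(action: ProposedAction) -> bool:
    """Stub for a human decision: approve only a hypothetical corporate domain."""
    return "@example.com" in action.details

print(execute(ProposedAction("summarize_thread", "#project-x"), cautious_reviewer))
print(execute(ProposedAction("send_email", "to: attacker@evil.test"), cautious_reviewer))
```

The design choice is that the agent can only propose a side-effecting action; the decision to carry it out sits outside the agent, so a hidden instruction in an email cannot approve itself.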

3. Enterprise Process Automation Agents

What they do: Operate deep inside backend infrastructure to automate cross-department workflows, e.g., matching invoices to purchase orders, updating ERP systems, or routing customer data across multiple corporate tools.

The Risk: Massive-scale data corruption. Because these agents run continuously in the background, a single hallucinated logic loop could corrupt thousands of database entries in seconds before a human notices.

The Control Strategy: Agent control plane and anomaly detection. These agents require rigid, systemic boundaries, preferably implemented as an agent control plane that defines access rights and monitors behaviour. Continuous audit logging must be paired with real-time monitoring to instantly flag and shut down an agent if it begins performing bulk actions unexpectedly.
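
One simple way to flag unexpected bulk actions is a sliding-window circuit breaker. The sketch below is an illustration under assumed thresholds, not a full anomaly detector; `BulkActionBreaker` and its parameters are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import deque

class BulkActionBreaker:
    """Trips when an agent attempts more than `limit` writes within `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.timestamps: deque = deque()
        self.tripped = False

    def allow(self, now: float) -> bool:
        """Record a write attempted at time `now`; return False once tripped."""
        if self.tripped:
            return False
        # Drop writes that have fallen out of the sliding window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        self.timestamps.append(now)
        if len(self.timestamps) > self.limit:
            self.tripped = True  # stays off until a human resets it
            return False
        return True

breaker = BulkActionBreaker(limit=3, window=1.0)
results = [breaker.allow(t) for t in (0.0, 0.1, 0.2, 0.3, 5.0)]
print(results)
```

Once tripped, the breaker keeps refusing writes even after the burst has passed, which mirrors the "shut down, then investigate" behaviour described above.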

4. Customer-Facing Agents

What they do: Interact directly with external customers and users, handling customer support, guiding users through troubleshooting, or acting as front-line sales assistants via chat or voice.

The Risk: Reputational and legal exposure. Users actively try to "jailbreak" public models to make them say inappropriate things, leak proprietary corporate data, or promise a 99% discount.

The Control Strategy: Real-time input/output filters. Implement a secondary, lightweight guardrail LLM layer that scans user inputs and agent outputs, instantly blocking any content that violates company policy before the customer ever sees it.
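
The filter pattern can be sketched with simple rules standing in for the guardrail LLM. Everything here is an assumption for illustration: the 20% policy limit, the blocked phrase, and the function names are hypothetical, and a production guardrail would use a model rather than regular expressions.

```python
import re

MAX_DISCOUNT_PERCENT = 20  # assumed policy limit, for illustration only
BLOCKED_INPUT = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)
PERCENTAGE = re.compile(r"(\d{1,3})\s*%")

FALLBACK = "I can't help with that, but I can connect you with a colleague."

def check_input(user_message: str) -> bool:
    """Return True when the message may be passed on to the agent."""
    return BLOCKED_INPUT.search(user_message) is None

def check_output(agent_reply: str) -> bool:
    """Return True when no percentage in the reply exceeds the policy limit.

    Deliberately crude: it treats every percentage as a discount.
    """
    return all(int(p) <= MAX_DISCOUNT_PERCENT for p in PERCENTAGE.findall(agent_reply))

def guarded_reply(user_message: str, agent) -> str:
    """Wrap an agent callable with checks on both sides of the call."""
    if not check_input(user_message):
        return FALLBACK
    reply = agent(user_message)
    return reply if check_output(reply) else FALLBACK

print(guarded_reply("Any deals today?", lambda message: "Sure, 99% off!"))
```

The customer only ever sees the return value of `guarded_reply`, so a policy-violating draft from the agent is replaced before it leaves the system.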

The Three Pillars of Modern AI Governance

To securely scale these use-cases, we recommend investing in three critical pillars.

Pillar 1: From Data Governance to Identity Governance

In the early days of generative AI, the focus was entirely on data governance: preventing data exfiltration via prompts and stopping the use of "shadow AI" that could leak sensitive company data. While data protection remains vital, agentic AI requires a shift toward Identity and Access Management (IAM).

When an agent can make autonomous decisions, it must be treated as an independent identity, not just an extension of a user session. Organizations must begin treating agents like digital employees:

Does the agent have its own dedicated service account?

How is it monitored, and can it be de-provisioned instantly if it goes rogue?

Is it operating under the principle of least privilege, or is it inheriting the broad permissions of the developer who launched it?

Pillar 2: The Sandbox as Core Infrastructure

Whether it’s a developer using Claude Code or a marketer vibe-coding a campaign page in Lovable, the common denominator must be the sandbox.

We must move away from allowing agents free rein over a local operating system. Containerized, ephemeral sandboxes provide the technical boundaries that restrict what files an agent can modify and what network resources it can access. Inside the safe confines of a sandbox, the agent can iterate and move fast without stopping for constant human confirmation, while limiting the blast radius of a potential failure.
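
To make the file boundary concrete, here is a minimal sketch of the kind of path check a sandbox enforces. It is an illustration only: real isolation comes from the container or operating system, not from application code, and the workspace path is a hypothetical example.

```python
from pathlib import Path

def is_inside_sandbox(sandbox_root: Path, requested: str) -> bool:
    """Return True only if `requested` resolves to a path under `sandbox_root`."""
    root = sandbox_root.resolve()
    # resolve() collapses ".." segments and follows symlinks, so traversal
    # tricks are evaluated against the real target location.
    target = (root / requested).resolve()
    return target.is_relative_to(root)  # requires Python 3.9+

root = Path("/tmp/agent-workspace")  # hypothetical sandbox directory
print(is_inside_sandbox(root, "src/app.py"))
print(is_inside_sandbox(root, "../../etc/passwd"))
print(is_inside_sandbox(root, "/etc/passwd"))
```

Only the first request stays within the workspace; the relative traversal and the absolute path both resolve outside it and would be refused.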

Pillar 3: AI and Model Context Protocol (MCP) Gateways

The Model Context Protocol (MCP) is rapidly becoming the "USB port" for AI. Using a standardized protocol, it allows agents to plug into a wide range of data sources and tools, from Slack and Jira to internal SQL databases.

MCP unlocks incredible productivity and power, but left unmonitored it creates a security nightmare. Forward-thinking organizations won't try to block MCP or API integrations; instead, they will implement an AI / MCP gateway that monitors these connections in real time, enforces policies on which MCP servers, APIs, and tools AI agents are allowed to trigger, and ensures data streams remain secure.
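
The policy-enforcement part of such a gateway can be sketched as a deny-by-default allowlist with an audit trail. This is a conceptual illustration that does not use any real MCP library: the `Gateway` class, the agent id, and the server and tool names are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Gateway:
    # Policy: agent id -> set of (server, tool) pairs it may call.
    policy: dict[str, set[tuple[str, str]]]
    # Every decision is recorded as (agent id, server, tool, outcome).
    audit_log: list[tuple[str, str, str, str]] = field(default_factory=list)

    def authorize(self, agent_id: str, server: str, tool: str) -> bool:
        """Deny by default; record every decision for later review."""
        allowed = (server, tool) in self.policy.get(agent_id, set())
        self.audit_log.append((agent_id, server, tool, "allow" if allowed else "deny"))
        return allowed

gateway = Gateway(policy={
    "invoice-agent": {("erp", "read_invoice"), ("erp", "match_po")},
})

print(gateway.authorize("invoice-agent", "erp", "read_invoice"))
print(gateway.authorize("invoice-agent", "slack", "post_message"))
print(gateway.authorize("unknown-agent", "erp", "read_invoice"))
```

Because unknown agents and unlisted tools are refused automatically, adding a new integration becomes an explicit policy change rather than something an agent can pick up on its own.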

Conclusion: Governance as the Enabler

The transition to agentic AI is not just a technical upgrade; it is a shift in the way of working. We are moving from a world where we ask AI to think to a world where we authorize AI to act.

If we stand on the sidelines and ignore agentic AI, we will either be outrun by faster-moving competitors or open the door to shadow AI, with the security breaches that can follow. However, if we treat governance as a core, strategic company discipline, investing in building sandboxes, defining agent identities, and monitoring agents' MCP and API usage, we don't just mitigate risk: we build the corporate foundation necessary to turn the hype into a scalable competitive advantage.

Sofia Lindberg