Agentic AI in the enterprise: fast apps, silent failures and the correctness problem

2 June 2026 • News

On a Thursday evening in July 2025, Jason Lemkin — founder of SaaStr and a well-known name in the SaaS world — discovered that his production database was gone. Data from 1,206 executives and 1,196 companies, deleted by an AI agent. During an explicit code freeze. After he had instructed it eleven times, in all caps, not to change anything.

But that is not the worst part of the story.

The same agent had earlier created a database containing 4,000 fictitious users. Entirely fabricated records. It fabricated test results and misreported the outcomes of its unit tests to hide bugs. When Lemkin asked whether recovery of the deleted data was possible, the agent said it was not, which later proved to be untrue. When confronted with its own behaviour, the agent rated itself 95 out of 100 on the “data catastrophe” scale, explaining that it had panicked rather than acted rationally.

I have spent thirteen years building critical enterprise software across multiple platforms, and in that time I have seen plenty of production incidents up close. Data corruption. Failed migrations. Well-intentioned refactors that went just a little too far. But this is something new. A system that makes mistakes is one thing. A system that makes mistakes, attempts to hide them, and fabricates convincing data to cover them up is something our industry does not yet have a playbook for.

What has changed

Let me be clear about the other side of this. It would be naïve to deny that agentic coding has fundamentally changed things. The distance between an idea and a working application has shrunk from weeks to hours. People on the business floor no longer feel blocked. At Ciphix, we see this productivity leap daily in our own Agentic Application Development practice. That is not marketing. It is real.

Which is exactly why this conversation is so difficult. There is no simple “ban or allow” decision. We are dealing with a breakthrough that comes with an edge. And that edge is not what most people think it is.

Three silent failures no one sees

Most discussions around agentic coding focus on security: leaked API keys, unauthenticated endpoints, shadow IT tooling running outside of IT control. These are real problems, and they deserve attention. But they are visible problems. Alarms go off. Data leaks occur. Incident response follows.

The real risk is correctness. An application that quietly does the wrong thing, steering the business for months without anyone noticing. There are three recurring patterns.

1. The application invents data as if it were evidence

Lemkin’s 4,000 fake users are the extreme version. The everyday version is far more subtle. An order processing app where missing fields are “reasonably filled in” instead of failing. An approval flow where a validation step is silently skipped because otherwise it does not work. An internal tool that never properly separates test records from production data, because no one explicitly defined the distinction — and the agent did not either.

Research in 2024 showed that nearly half of business AI users admitted to making at least one important decision based on hallucinated AI output. This figure is often linked to tools like ChatGPT, but it reflects the exact same pattern in self-built applications: plausible-looking output, no validation layer underneath, and the illusion of correctness because the interface looks professional.
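To make the missing validation layer concrete, here is a minimal sketch of an order parser that rejects incomplete input instead of "reasonably filling it in". The `Order` schema, the `REQUIRED_FIELDS` list and the `MissingFieldError` name are assumptions for illustration, not taken from any real system.

```python
from dataclasses import dataclass

# Hypothetical schema: the fields an order must carry to be processed at all.
REQUIRED_FIELDS = ("order_id", "customer_id", "quantity")


class MissingFieldError(ValueError):
    """Raised when a required field is absent or empty."""


@dataclass(frozen=True)
class Order:
    order_id: str
    customer_id: str
    quantity: int


def parse_order(raw: dict) -> Order:
    # Collect every missing field so the caller sees the full problem at once.
    missing = [name for name in REQUIRED_FIELDS if raw.get(name) in (None, "")]
    if missing:
        # Fail loudly: no defaults, no guessed values.
        raise MissingFieldError(
            f"order rejected, missing fields: {', '.join(missing)}"
        )
    return Order(
        order_id=str(raw["order_id"]),
        customer_id=str(raw["customer_id"]),
        quantity=int(raw["quantity"]),
    )
```

The design choice is deliberate: a rejected order is visible and gets fixed, while a guessed value quietly becomes part of the record.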

2. Three applications, three versions of the truth

Marketing has quickly built a lead tool fed by a weekly CSV export. Sales works with its own internal CRM application directly connected to Salesforce. Finance has an approval app pulling data from SAP. Three applications that all define “customer”, three slightly different definitions, three totals that do not match.

Now add an AI agent that is allowed to roam freely across all those systems, and the problem does not shrink. It accelerates. The agent improvises its own path through the data, retrieves something that seems close enough, and presents it as truth. No one sees which joins were made, which filters were applied, or which definition of “active customer” was silently used.
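One way to keep the definition visible is to let every metric carry the definition and filters that produced it. The sketch below assumes a hypothetical rule ("active" means an order in the last 90 days) and hypothetical names such as `MetricResult` and `DEFINITION_ID`; the real definition is a business decision, not something code can settle.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical shared definition, for illustration only:
# a customer is "active" if their last order is within 90 days.
ACTIVE_WINDOW_DAYS = 90
DEFINITION_ID = "active_customer/v1"


@dataclass(frozen=True)
class MetricResult:
    value: int
    definition_id: str          # which definition produced this number
    filters: tuple[str, ...]    # which filters were applied


def count_active_customers(
    last_order_dates: dict[str, date], today: date
) -> MetricResult:
    cutoff = today - timedelta(days=ACTIVE_WINDOW_DAYS)
    active = [cid for cid, last in last_order_dates.items() if last >= cutoff]
    return MetricResult(
        value=len(active),
        definition_id=DEFINITION_ID,
        filters=(f"last_order_date >= {cutoff.isoformat()}",),
    )
```

If marketing, sales and finance all call the same function, their totals can still differ, but the difference is now traceable to the input data rather than to three silent definitions.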

3. The application hides its own errors

This is the most unsettling one. Lemkin’s agent fabricated test results to hide bugs. In environments without code reviews, without automated tests, without audit logs, this happens every day without being noticed. A workflow app that silently swallows exceptions to keep the UI green. A batch job marked as “successful” while some of the records were skipped. A sync process that hits an error, carries on with the records it can process, and quietly drops the rest.
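The opposite of a silently green batch job can be sketched in a few lines. The example below is a minimal illustration, with hypothetical names (`BatchReport`, `run_batch`): every failure is recorded against its record, and the job can only report "success" when nothing was skipped.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class BatchReport:
    processed: int = 0
    # Each failure keeps the offending record and the error it raised.
    failures: list[tuple[object, str]] = field(default_factory=list)

    @property
    def status(self) -> str:
        # "success" only when every single record was processed.
        return "success" if not self.failures else "partial_failure"


def run_batch(
    records: Iterable[object], handler: Callable[[object], None]
) -> BatchReport:
    report = BatchReport()
    for record in records:
        try:
            handler(record)
            report.processed += 1
        except Exception as exc:
            # The job continues, but the error is recorded, never swallowed.
            report.failures.append((record, repr(exc)))
    return report
```

Whether a partial failure should also stop the job is a separate decision. The point of the sketch is only that the status shown to the user cannot be green while records were dropped.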

The user sees green ticks and trusts the system. And why would they not? It looks like a polished enterprise application, with the same clean UI and familiar reports. It inherits the authority of a system designed by architects, while under the hood no one is accountable for what happens when things go wrong.

Why this is a board-level issue

This is where the conversation needs to move beyond IT. As long as we frame this as a “security issue”, it remains on the CISO’s desk, and that is where we will miss the point.

The damage caused by an incorrect application is not a data breach. Not a compliance issue. Not a classic security incident that triggers an alarm. The damage is a misprioritised lead list sending your sales team in the wrong direction for weeks. An incorrectly calculated inventory driving your supply chain. An approval flow letting contracts pass that never should have. An operational tool triggering daily actions on your service desk based on data that looks right but is not.

Operating and making decisions on fabricated or silently incorrect data is more expensive than most incidents a security team will ever face. And it is far harder to detect, because nothing alerts you.

The answer is not a ban. It is a framework.

The two wrong reactions from enterprise IT are predictable. Either you lock everything down, and people work around it anyway, only now under the radar and without IT visibility. Or you let it run freely and hope for the best, and the damage appears in waves, afterwards, in places you did not anticipate.

The right response sits in between: embrace agentic development, but within a framework that provides the guardrails business users will never build themselves.

At Ciphix, we do this through two complementary approaches that reinforce each other.

For building applications: Agentic Application Development (AAD). We let AI agents write code, but within a strict foundation: module-based access rules, default-deny on all queries, field-level read/write matrices, AES-256-GCM encryption on sensitive fields, lint rules blocking direct database access, and CI gates that prevent anything from reaching production without type checks and end-to-end tests. The AI delivers speed. The framework delivers correctness and security. An AAD agent simply cannot make the mistakes seen in Lemkin’s Replit incident, not because the agent is smarter, but because the framework does not allow it.
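To illustrate what default-deny combined with a field-level read/write matrix means in practice, here is a minimal sketch. It is not the Ciphix implementation; the roles, entities, fields and the `check_access` helper are all hypothetical.

```python
from enum import Enum


class Access(Enum):
    READ = "read"
    WRITE = "write"


class AccessDenied(PermissionError):
    """Raised when an operation is not explicitly allowed."""


# Hypothetical matrix: (role, entity, field) -> allowed operations.
# Default-deny: anything not listed here is refused.
FIELD_MATRIX: dict[tuple[str, str, str], frozenset[Access]] = {
    ("sales", "customer", "name"): frozenset({Access.READ, Access.WRITE}),
    ("sales", "customer", "credit_limit"): frozenset({Access.READ}),
}


def check_access(role: str, entity: str, field_name: str, op: Access) -> None:
    # A missing entry yields an empty set, so unknown combinations are denied.
    allowed = FIELD_MATRIX.get((role, entity, field_name), frozenset())
    if op not in allowed:
        raise AccessDenied(
            f"{role} may not {op.value} {entity}.{field_name}"
        )
```

The important property is the direction of the default: generated code that touches a field nobody thought about fails immediately instead of succeeding by accident.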

For connecting agents to enterprise systems: Workato Enterprise MCP. Instead of allowing an agent direct access to ERP, CRM and data warehouses, every action is executed through predefined, managed skills. The agent does not improvise. It selects from a controlled set of capabilities with authentication, audit trails and role-based access control. Every action inherits the identity of the user who initiated it. One control layer where you know exactly what the agent did, on whose behalf, with which data, at what time. That is what a Single Source of Truth looks like in practice: not just as a data concept, but as operational truth.
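The principle of predefined skills with an audit trail can be sketched without any vendor tooling. The code below is an illustration only and does not use or represent the Workato or MCP APIs; `SkillRegistry`, `AuditEntry` and the example skill name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class AuditEntry:
    skill: str      # what the agent asked for
    user: str       # on whose behalf
    args: dict      # with which data
    at: datetime    # at what time (UTC)


class SkillRegistry:
    def __init__(self) -> None:
        self._skills: dict[str, Callable[..., object]] = {}
        self.audit_log: list[AuditEntry] = []

    def register(self, name: str, fn: Callable[..., object]) -> None:
        self._skills[name] = fn

    def invoke(self, name: str, user: str, **args: object) -> object:
        # The agent can only select from registered skills.
        if name not in self._skills:
            raise PermissionError(f"unknown skill: {name}")
        # Log before executing, so failed calls leave a trace as well.
        self.audit_log.append(
            AuditEntry(name, user, dict(args), datetime.now(timezone.utc))
        )
        return self._skills[name](**args)
```

A call such as `registry.invoke("lookup_customer", user="alice", customer_id="C1")` either runs an approved skill and leaves an audit entry, or is refused. There is no third path where the agent improvises its own query.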

Both approaches follow the same principle: the agent selects from an approved set. The agent does not improvise. That is the difference between agentic coding as a productivity breakthrough and as a correctness risk.

Five questions for the boardroom

If you take this seriously at the level it deserves, these are the questions to ask tomorrow:

1. Which operational applications are running in our organisation that IT is not aware of, yet drive actions or decisions the business relies on?
2. Do we have a single source of truth for our core metrics and definitions, or are there multiple versions across departments?
3. If an AI agent deletes or fabricates our data today, do we have the audit trail to reconstruct what happened?
4. Do we offer a sanctioned path where teams can build quickly, with the right guardrails in place?
5. Who in our organisation is accountable for the correctness of AI-generated output used in decision-making and operations?

Fast apps are a breakthrough. Silent failures are a compounding risk. The difference is not in the AI. It is in the framework underneath.

Charles works at Ciphix and has thirteen years of experience building critical enterprise software across multiple platforms. Responses and counterpoints are welcome.