Forbes Says You Can Build an AI Agent in a Day. Here's What Happens on Day Two.

Associates AI

Forbes published a list of 10 AI agents any small business owner can build in a day using Claude Cowork, Abacus, or Copilot Studio. They're right — building is the easy part. But 93% of SMB owners report positive AI results while only 14% have embedded agents into daily operations. The gap between building an agent and running one that compounds value without compounding risk is where most businesses stall. Here's how to close it.

You Can Build an AI Agent Before Lunch. The Question Is Whether It Still Works by Friday.

On March 27th, Forbes published a piece titled "10 AI Agents For Small Business That Give Immediate Relief." The article walks through ten specific agents a small business owner can build using Claude Cowork, Abacus DeepAgent, or Microsoft Copilot Studio — from client FAQ bots to inventory reorder agents to multi-agent coordination systems. The core claim: a non-technical business owner can have a working agent running in less than a day.

That claim is accurate. The tools have genuinely gotten that good. A small business owner who sits down with Claude Cowork on a Monday morning can have a customer question-answering agent handling real inquiries by Monday afternoon.

But here is what the article mentions only in passing, and what most businesses discover the hard way: adoption is outrunning operations. Gartner predicts that 40% of enterprise applications will have built-in AI agents by the end of 2026, up from less than 5% in 2025. That is at least an eight-fold increase in about a year. Meanwhile, data from the AI Agent Store's weekly roundup shows that 93% of small business owners report positive results from AI, but only 14% have embedded it into daily operations.

That is a 79-point gap between "this works" and "this is part of how we run." And the gap is not about the technology. It is about what happens after you build the thing.

The Autonomy Decision Nobody Is Making Deliberately

The Forbes article introduces a five-level autonomy model that deserves more attention than it got. Level 1 is a basic scripted chatbot. Level 2 adds reasoning. Level 3 runs repeatable workflows. Level 4 makes decisions within guardrails. Level 5 coordinates multiple agents across systems.

The article's advice is sensible: start at levels 2 and 3, then move up once the agent has established trust. But in our experience building the platform businesses use to configure and run these agents, the autonomy level is the single most consequential decision a business makes about an agent — and almost nobody makes it deliberately.

What we see instead is drift. A business builds a Level 2 agent that answers customer questions from a knowledge base. It works well. Somebody adds a feature where the agent can update a CRM record. Then somebody connects it to the scheduling system. Then it is booking appointments on behalf of the business. Then it sends a follow-up email that was never templated or approved.

Nobody sat down and said, "We want a Level 4 agent that makes autonomous decisions about customer scheduling and outbound communication." It just happened, one capability at a time, with no architectural plan and no governance layer.
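One way to make drift visible is to force every new capability through an explicit check against the autonomy level the business actually approved. The sketch below is a minimal illustration, not a feature of any platform named in this article. The capability names, the level assigned to each one, and the `AgentConfig` class are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a capability to the autonomy level it implies.
# The level numbers follow the five-level model described above; the
# assignments themselves are illustrative assumptions.
CAPABILITY_LEVEL = {
    "answer_from_knowledge_base": 2,
    "update_crm_record": 3,
    "book_appointment": 4,
    "send_untemplated_email": 4,
}


class AutonomyDriftError(Exception):
    """Raised when a new capability exceeds the approved autonomy level."""


@dataclass
class AgentConfig:
    name: str
    approved_level: int
    capabilities: list = field(default_factory=list)

    def add_capability(self, capability: str) -> None:
        """Add a capability only if the approved level already covers it."""
        required = CAPABILITY_LEVEL[capability]
        if required > self.approved_level:
            raise AutonomyDriftError(
                f"{capability!r} needs Level {required}, but {self.name} "
                f"is approved for Level {self.approved_level}"
            )
        self.capabilities.append(capability)
```

With this in place, connecting a Level 2 FAQ agent to the scheduling system raises an error instead of succeeding silently. Someone has to raise `approved_level` on purpose, which is the deliberate decision that drift skips.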

This is how the 93%-to-14% gap opens. The initial agent works. The expanded agent works most of the time. And then one Tuesday it books a client into a slot that does not exist, sends a confirmation email with the wrong address, and the business owner spends three hours on damage control. The agent gets turned off. The experiment is declared a failure.

It was not a failure of the technology. It was a failure of operational design.

What the Five Levels Actually Require

The Forbes article correctly identifies the levels but understates what each one demands from the business operating it. Here is what we have learned building the configuration and governance layer that businesses use to run agents in production across multiple industries.

Level 1-2: Retrieval and Reasoning

These agents answer questions and generate content based on knowledge you provide. A customer FAQ agent. A document summarizer. A research assistant.

What they require: A curated, accurate knowledge base. A review cadence to catch when the knowledge goes stale. A defined scope boundary so the agent does not start answering questions outside its domain.

Where businesses fail: They load the knowledge base once and never update it. Six months later the agent is confidently giving answers based on last year's pricing, last quarter's policies, and a product that no longer exists. The agent is not hallucinating — it is accurately reporting outdated information, which is worse because it sounds authoritative.

The fix: Schedule a monthly review of every knowledge source your agent accesses. If nobody owns the review, the knowledge will rot. This is a management problem, not a technology problem.
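The monthly review is easier to sustain when something reports which sources are overdue and who owns them. Here is a minimal sketch, assuming you keep a simple record of each knowledge source with an owner and a last-reviewed date. The `KnowledgeSource` fields, the sample sources, and the 30-day default are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class KnowledgeSource:
    name: str
    owner: str
    last_reviewed: date


def stale_sources(sources, today, max_age_days=30):
    """Return the sources whose last review is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [s for s in sources if s.last_reviewed < cutoff]


# Hypothetical inventory of what the agent reads from.
sources = [
    KnowledgeSource("pricing sheet", "finance", date(2026, 1, 5)),
    KnowledgeSource("returns policy", "support", date(2026, 3, 20)),
]

for s in stale_sources(sources, today=date(2026, 4, 1)):
    print(f"Review overdue: {s.name} (owner: {s.owner})")
# prints: Review overdue: pricing sheet (owner: finance)
```

The owner field matters as much as the date. A stale source with no named owner is the case where the knowledge rots.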

Level 3: Workflow Automation

These agents execute multi-step processes. Invoice generation. Report compilation. Data entry across systems. Onboarding checklists.

What they require: Well-defined workflows with clear inputs, outputs, and error handling. Monitoring that detects when a workflow produces an unexpected result. A fallback path — what happens when the agent encounters a case it was not designed for.

Where businesses fail: They automate the happy path and ignore the edge cases. The agent handles 95% of invoices correctly. The other 5% — the ones with unusual tax jurisdictions, split billing, or custom terms — get processed with default settings that are wrong. Nobody checks because the automation is supposed to be handling it.

The fix: Build your error handling before you build your automation. Define the cases the agent should not attempt, and route those to a human queue. A Level 3 agent that knows when to stop is worth more than a Level 4 agent that does not.
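"Build your error handling first" can be as simple as a routing function that runs before the automation does. The sketch below uses the invoice example from this section. The `Invoice` fields and the set of supported jurisdictions are assumptions for illustration, not a real billing schema.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    invoice_id: str
    tax_jurisdiction: str
    split_billing: bool = False
    custom_terms: bool = False


# Assumption: the jurisdictions the automation has actually been validated for.
SUPPORTED_JURISDICTIONS = {"US-CA", "US-NY"}


def route(invoice: Invoice) -> str:
    """Return "auto" for the validated happy path, otherwise "human_queue"."""
    if invoice.tax_jurisdiction not in SUPPORTED_JURISDICTIONS:
        return "human_queue"
    if invoice.split_billing or invoice.custom_terms:
        return "human_queue"
    return "auto"
```

The design choice is that anything unrecognized goes to a person by default. The agent only processes what it has been explicitly cleared to process, rather than attempting everything and falling back to defaults.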

Level 4: Autonomous Decision-Making Within Guardrails

These agents make judgment calls. They decide which leads to prioritize. They choose which vendor to reorder from. They determine whether a support ticket requires escalation or can be resolved autonomously.

What they require: Explicit decision boundaries encoded in the agent's instructions — not implied, not assumed, not "the agent should know." A monitoring system that tracks what decisions the agent is making and flags anomalies. A human review loop for high-stakes decisions, especially in the first 90 days.

Where businesses fail: They give the agent a goal without specifying the constraints. "Maximize appointment bookings" without "never double-book a time slot" or "do not schedule outside business hours" or "always confirm with the client before booking." The agent optimizes for the goal you gave it. If you did not specify the constraints, it will find creative ways to hit the target that you would never have approved.

This is exactly what happened to Klarna. Their AI agent resolved customer tickets faster than any human — because the goal was resolution speed, and nobody encoded the constraint that resolution quality and customer relationship preservation mattered more than speed. The agent was not wrong. It was brilliantly optimizing for the wrong objective.

The fix: For every goal you give a Level 4 agent, write down three constraints that bound acceptable behavior. If you cannot articulate the constraints, you are not ready for Level 4. Stay at Level 3 and route the judgment calls to humans.
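Written constraints become more useful when they are also checked before the agent acts. The sketch below encodes the three booking constraints mentioned above as a pre-action check. The `BookingRequest` shape and the 9-to-17 business hours are assumptions for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BookingRequest:
    start: datetime
    client_confirmed: bool


def violations(request, booked_starts, open_hour=9, close_hour=17):
    """Return the names of the constraints a proposed booking would break."""
    problems = []
    if request.start in booked_starts:
        problems.append("double_booking")
    if not (open_hour <= request.start.hour < close_hour):
        problems.append("outside_business_hours")
    if not request.client_confirmed:
        problems.append("client_not_confirmed")
    return problems
```

An empty list means the booking may proceed. Anything else should block the action and go to a human, regardless of how much the booking would help the goal.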

Level 5: Multi-Agent Coordination

Multiple agents working together across systems. A marketing agent that identifies leads, hands them to a sales agent that qualifies them, which triggers an operations agent that schedules onboarding.

What they require: Everything from levels 1-4, plus clear inter-agent communication protocols, conflict resolution logic (what happens when two agents make conflicting decisions), and centralized monitoring across the entire system.

Where businesses fail: They build the coordination without the governance. Agent A updates a record. Agent B reads the stale version and makes a decision based on it. Agent C acts on Agent B's decision. By the time a human reviews the output, three agents have made a chain of decisions based on a race condition that no single agent caused and no single agent can detect.

The fix: Do not attempt Level 5 until you have operated at Level 4 for at least 90 days with clean monitoring data. Multi-agent systems multiply both the value and the risk of every decision. If you do not have the observability infrastructure to track individual agent decisions, you definitely do not have it for coordinated agent chains.
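The stale-read chain described in this section is a classic concurrency problem, and one common guard is a version check on every write. The sketch below is an in-memory illustration of that idea (optimistic concurrency), not how any particular agent platform stores records. The `VersionedStore` class and its method names are hypothetical.

```python
class StaleReadError(Exception):
    """Raised when a write is based on a version that is no longer current."""


class VersionedStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); version 0 means the key does not exist yet."""
        return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        """Write only if the caller saw the latest version; return the new version."""
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise StaleReadError(
                f"{key!r} is at version {current_version}, "
                f"caller read version {expected_version}"
            )
        new_version = current_version + 1
        self._data[key] = (new_version, value)
        return new_version
```

If Agent B reads a record, Agent A updates it, and Agent B then tries to write a decision based on what it read, the write fails loudly instead of propagating to Agent C. A failed write is also a natural event to send to centralized monitoring.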

The Build-Versus-Operate Gap Is the Defining Challenge of 2026

The Forbes article is a sign of where the market is heading. Building agents is becoming a commodity skill. The platforms are good. The templates are plentiful. The barrier to creating a functional agent has dropped to nearly zero.

But here is the data that tells the real story. CX Today's summary of McKinsey's 2025 State of AI reports that 23% of organizations have scaled an agentic system, with another 39% still in experimentation. Deloitte reported that over 40% of those projects will likely fail or be discontinued by 2027 unless they can be appropriately governed.

Let those numbers sit together. Almost two-thirds of organizations (62%) are experimenting with or scaling AI agents, and more than 40% of those projects are projected to fail. Not because the agents do not work, but because the organizations did not build the operational layer to run them.

The operational layer is not glamorous. It is knowledge base maintenance schedules. It is monitoring dashboards that someone actually checks. It is escalation paths that are defined before the first incident, not invented during one. It is decision boundaries documented in writing, not assumed from context. It is monthly reviews where someone asks: "What did the agent do this month that we did not expect?"

None of this is hard to understand. All of it is hard to sustain. And that is why the gap exists.

Why "Start Small" Is Necessary but Not Sufficient

The standard advice — and the Forbes article gives it — is to start small, build trust, then expand. That is correct as far as it goes. But "start small" without a plan for "expand deliberately" just means your agent stays small. Or worse, it drifts into higher autonomy without the governance to support it.

A deliberate expansion plan looks like this:

Month 1-2: Deploy at Level 2-3 with tight monitoring. The agent answers questions or runs simple workflows. You are watching every output. You are logging surprises — the things the agent did better or worse than expected. You are building your failure model: how does this specific agent, on these specific tasks, in your specific context, fail?

Month 3-4: Review the data and decide on expansion. You now have 60 days of production behavior. You know the failure patterns. You can make an informed decision about which tasks to delegate at a higher autonomy level and which to keep human-in-the-loop. This is a decision based on evidence, not optimism.

Month 5-6: Expand to Level 4 for specific, bounded tasks. Not everything — specific tasks where you have high confidence in the agent's performance and you have defined the decision boundaries explicitly. You are still monitoring, but now you are monitoring the decisions, not just the outputs.

Month 7+: Evaluate multi-agent coordination if warranted. Only if you have clean monitoring data across multiple Level 4 deployments. Only if you have the governance infrastructure to track inter-agent decisions. Only if you can answer the question: "If two agents disagree, which one wins and why?"

This is a slow, methodical progression. It is also the only one that consistently produces agents that compound value over time instead of creating a spectacular failure that kills the initiative entirely.
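The Month 3-4 review can be turned into a repeatable check rather than a judgment made from memory. The sketch below assumes you have logged one outcome label per task the agent handled. The function name, the category labels, and the 60-day minimum are illustrative assumptions, and the tolerance threshold is yours to choose.

```python
from collections import Counter


def expansion_review(outcomes, days_in_production, tolerance,
                     boundaries_documented, min_days=60):
    """Summarize logged outcomes and say whether expansion is justified.

    outcomes: one label per handled task, either "ok" or an error category
    such as "knowledge_gap", "scope_violation", "edge_case", or
    "reasoning_failure".
    """
    errors = Counter(label for label in outcomes if label != "ok")
    # With no data at all, treat the error rate as 100%: no evidence, no expansion.
    error_rate = sum(errors.values()) / len(outcomes) if outcomes else 1.0
    ready = (
        days_in_production >= min_days
        and error_rate < tolerance
        and boundaries_documented
    )
    return {
        "ready": ready,
        "error_rate": error_rate,
        "errors_by_category": dict(errors),
    }
```

All three conditions have to hold. A low error rate with undocumented decision boundaries still returns `ready: False`, which matches the rule that evidence alone is not enough if you cannot say what the agent is allowed to decide.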

The Platform Choice Matters Less Than You Think

The Forbes article compares Claude Cowork, Abacus DeepAgent, and Microsoft Copilot Studio. All three are legitimate platforms. The differences between them — model quality, integration options, pricing — matter at the margin.

What matters at the center is your operational layer. Which platform you use to build the agent is a tooling decision. How you monitor it, govern it, maintain its knowledge, define its boundaries, and manage its autonomy level — that is an operational decision. The operational decision determines whether the agent creates value or creates incidents.

We have seen businesses deploy agents on enterprise-grade platforms and fail within 90 days because they had no governance. We have seen businesses deploy agents on simple, inexpensive platforms and generate sustained value for years because they invested in the operational layer.

The platform is the foundation. The operational layer is the building. You can pour a beautiful foundation and still end up with an empty lot.

What Production-Grade Agent Operations Actually Looks Like

Associates AI Teammates give businesses a self-serve platform for exactly this operational layer. Here is what it looks like in practice — not as theory, but as the actual configuration and governance primitives built into the platform.

Structural safety first. Agents run in isolated environments with defined network boundaries. They cannot access systems outside their scope — not because they are instructed not to, but because the infrastructure physically prevents it. Behavioral instructions like "do not access the financial system" can be ignored by a sufficiently capable agent. Network isolation cannot be ignored.

Read-only behavioral instructions. The soul documents, skill files, and configuration that define what the agent does are mounted on read-only storage. The agent can read its instructions but cannot modify them. Even under prompt injection, even if the agent identifies a reason to change its own behavior, the constraint holds.

Decision logging and anomaly detection. Every decision the agent makes is logged with the context it had when it made that decision. Monitoring flags deviations from expected behavior. If the agent starts making decisions outside its normal patterns, we know about it in minutes — not days.

Scheduled knowledge reviews. Knowledge bases are reviewed on a defined cadence — monthly at minimum, weekly for high-velocity domains. Stale information is the most common cause of agent errors that look like hallucinations but are actually the agent accurately reporting outdated facts.

Human-in-the-loop for high-stakes decisions. For every client, we define which decisions the agent handles autonomously and which require human approval. That boundary is explicit, documented, and reviewed quarterly as capabilities evolve.

Escalation paths defined before deployment. Before any agent goes live, we define: what happens when the agent encounters a case it was not designed for? Who gets notified? What is the fallback process? How fast does the escalation need to happen? These questions are answered in the architecture phase, not discovered during the first incident.
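The escalation questions above can be captured as data and checked before go-live. The sketch below is a generic illustration and not the configuration format of Associates AI Teammates or any other platform. The `EscalationPath` fields and the trigger names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationPath:
    trigger: str               # the kind of case the agent cannot handle
    notify: str                # who gets notified
    fallback: str              # what happens instead
    max_response_minutes: int  # how fast the escalation must happen


def missing_escalations(required_triggers, paths):
    """Return the required triggers that have no escalation path defined."""
    covered = {p.trigger for p in paths}
    return sorted(set(required_triggers) - covered)


# Hypothetical pre-deployment check.
paths = [
    EscalationPath("unknown_request", "support-lead", "hold and reply later", 60),
]
gaps = missing_escalations(["unknown_request", "payment_dispute"], paths)
print(gaps)  # prints: ['payment_dispute']
```

A non-empty result is a reason to delay deployment. Every gap in that list is a question that would otherwise be answered during the first incident.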

This is the operational layer that turns a one-day build into a multi-year asset. It is the difference between the 14% of businesses that have embedded AI into operations and the 79% that tried it, saw positive results, and could not make it stick.
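To make the decision logging primitive concrete, here is a minimal sketch of a log that stores each decision with its context and flags actions outside an expected set. It is a simplified stand-in, not the platform's implementation: the `DecisionLog` class is hypothetical, and an allowlist is a much cruder test than the pattern-based anomaly detection a production system would use.

```python
from datetime import datetime, timezone


class DecisionLog:
    def __init__(self, expected_actions):
        self.expected_actions = set(expected_actions)
        self.entries = []

    def record(self, agent, action, context):
        """Store the decision with the context the agent had when it decided."""
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "action": action,
            "context": context,
            # Flag anything outside the actions this agent is expected to take.
            "anomaly": action not in self.expected_actions,
        }
        self.entries.append(entry)
        return entry

    def anomalies(self):
        """Return every logged decision that was flagged as unexpected."""
        return [e for e in self.entries if e["anomaly"]]
```

Logging the context alongside the action is what makes later diagnosis possible. Without it, you can see what the agent did but not what it knew when it did it.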

The Real Cost of the Build-It-Yourself Approach

The Forbes article frames agent building as accessible, and it is. A business owner can build a working agent in a day for the cost of an AI subscription — $20 to $200 per month depending on the platform.

But that framing accounts for the build cost and ignores the operate cost. Here is what the operational layer actually costs in time when you do it yourself:

  • Knowledge base maintenance: 2-4 hours per month per agent, minimum
  • Monitoring review: 1-2 hours per week to review agent decisions and flag anomalies
  • Failure model updates: 2-3 hours per quarter to reassess how the agent fails as capabilities change
  • Governance documentation: 4-8 hours initially, plus updates when scope changes
  • Incident response: Unpredictable, but the average agent incident in our experience takes 3-6 hours to diagnose and resolve

For a single Level 2 agent, that is manageable. For a Level 4 agent, it is a part-time job. For a multi-agent system, it is a full-time role.
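To see what those estimates add up to, the recurring items can be converted to a single monthly figure. This is back-of-the-envelope arithmetic using only the ranges listed above. It assumes 52/12 weeks per month and leaves out the one-time governance documentation and the unpredictable incident hours.

```python
WEEKS_PER_MONTH = 52 / 12


def monthly_hours(kb_per_month, monitoring_per_week, failure_model_per_quarter):
    """Convert the recurring estimates to hours per month for one agent."""
    return (
        kb_per_month
        + monitoring_per_week * WEEKS_PER_MONTH
        + failure_model_per_quarter / 3
    )


low = monthly_hours(2, 1, 2)   # low end of each range
high = monthly_hours(4, 2, 3)  # high end of each range
print(f"{low:.1f} to {high:.1f} hours per month")
# prints: 7.0 to 13.7 hours per month
```

That is roughly one to two working days per month for a single agent before any incident occurs, and the figure grows with each agent you add.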

The honest question for any business owner is not "can I build this?" but "do I want to spend my time maintaining AI infrastructure, or do I want to spend it running my business?" Both answers are legitimate. The mistake is not asking the question.

FAQ

Q: Should I start with the agent that Forbes recommends or pick my own use case?

A: Pick the use case that solves your biggest operational pain point — not the one that sounds most impressive. The Forbes list is organized by autonomy level, which is helpful, but the right first agent for your business depends on where you are losing the most time or making the most errors today. Start there, at the lowest autonomy level that addresses the problem, and expand from there.

Q: How do I know when my agent is ready to move to a higher autonomy level?

A: You need data, not intuition. Run the agent at its current level for at least 60 days. Review the logs. Count the errors. Categorize them — were they knowledge gaps, scope violations, edge cases, or genuine reasoning failures? If the error rate on a specific task type is below your tolerance threshold and you can articulate the decision boundaries for the next level, you are ready. If you cannot articulate the boundaries, you are not.

Q: What is the most common mistake businesses make when deploying AI agents?

A: Autonomy drift — the gradual, unplanned expansion of an agent's capabilities and authority without corresponding governance. The agent starts answering questions, then someone connects it to the CRM, then it is sending emails, and nobody decided that was the plan. Every capability expansion should be a deliberate decision with documented boundaries, not an incremental feature addition.

Q: Is it better to build this myself from scratch or use a platform designed for it?

A: It depends on how much of the operational layer you want to build versus configure. A general-purpose DIY stack means building governance, monitoring, and memory infrastructure yourself. A platform purpose-built for agent operations, like Associates AI Teammates, gives you that operational layer out of the box: permission scoping, escalation paths, governed memory, and audit trails are already there. You configure it for your business instead of engineering it from zero. Either way, someone on your team needs to own the ongoing governance. The platform makes that job dramatically smaller, but it does not eliminate it.

Q: What happens if I deploy an agent without the governance layer?

A: In most cases, it works well for 30 to 90 days and then breaks in a way that is expensive or embarrassing enough to kill the initiative entirely. The agent does not degrade gracefully — it works until it does not. Without monitoring, you will not see the failure coming. Without an escalation path, you will not catch it quickly. Without documentation of what the agent is supposed to do, you will not be able to diagnose what went wrong. This is the pattern behind the 40% failure rate Deloitte projects.

The Bottom Line

Forbes is right. You can build an AI agent in a day. The platforms are good, the tools are accessible, and the results at Level 2 and 3 are genuinely impressive for any business willing to try.

But building the agent is the beginning, not the end. The 79-point gap between "93% see positive results" and "14% have embedded AI into operations" is not a technology gap. It is an operational maturity gap. The businesses that close it are the ones that treat agent deployment the way they treat hiring — not as a one-time event, but as an ongoing management responsibility.

Choose your autonomy level deliberately. Define your decision boundaries before you deploy. Build your monitoring before you build your automation. Review your agent's performance with the same rigor you would apply to a new employee's 90-day review. And expand scope based on evidence, not enthusiasm.

That is how you get from a one-day build to a multi-year competitive advantage. Not by moving fast. By moving right.

Associates AI Teammates give small and mid-size businesses the operational layer — governance, memory, permissions, escalation — as a self-serve platform, so the Teammates you configure compound value from month one instead of drifting into the failure modes above. If you want to see what production-grade agent operations look like for your business, start a free trial.

Written by

Mike Harrison

Founder, Associates AI

Mike is a self-taught technologist who has spent his career proving that unconventional thinking produces the most powerful solutions. He built Associates AI on the belief that every business — regardless of size — deserves AI that actually works for them: custom-built, fully managed, and getting smarter over time. When he's not building agent systems, he's finding the outside-of-the-box answer to problems that have existed for generations.
