Most organisations start with an internal chatbot: "Ask a question, get an answer." It's useful, but the ROI is capped. The real value shows up when AI stops just answering and starts executing workflows—creating tickets, fetching records, drafting letters, updating systems, and moving work forward.
That's the shift to agents: systems that can call tools, search databases, and take actions. It's also where safety, governance, and architecture start to matter.
Ship one workflow at a time.
Graduate tool access: read → approve → autonomous.
Keep a human in the loop for high‑stakes approvals.
An agent is an AI system that can plan steps and act by using tools (APIs, databases, internal systems). Instead of returning an answer, it can execute a workflow: retrieve context, make decisions, and trigger operations.
Chatbots tend to deliver marginal gains (deflection, faster lookup). Agents unlock structural gains by reducing the number of human steps in a process.
Tool use is how an agent interacts with the real world. You define tools (e.g. search_cases, create_ticket, draft_letter) with strict schemas. The model decides which tool to call, you execute it, and return the result. This creates a controlled interface between AI and systems of record.
The key is that you control what tools exist, what they can access, and which actions require approvals.
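A minimal sketch of that interface in Python. The tool names mirror the examples above; the schema format, stub implementations, and registry are illustrative rather than tied to any particular model provider or framework:

```python
# Framework-agnostic sketch: tool schemas plus a controlled dispatcher.
# Tool names mirror the examples above; implementations are illustrative stubs.

TOOL_SCHEMAS = [
    {
        "name": "search_cases",
        "description": "Search case records by free-text query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "create_ticket",
        "description": "Create a ticket in the service desk.",
        "parameters": {
            "type": "object",
            "properties": {"title": {"type": "string"}, "body": {"type": "string"}},
            "required": ["title", "body"],
        },
    },
]

def search_cases(query: str) -> list[dict]:
    ...  # call your case-management system here

def create_ticket(title: str, body: str) -> dict:
    ...  # call your ticketing system here

# The registry is the controlled interface: only tools listed here can run.
TOOL_REGISTRY = {"search_cases": search_cases, "create_ticket": create_ticket}

def execute_tool_call(name: str, arguments: dict):
    """Execute a model-requested tool call; reject anything not registered."""
    if name not in TOOL_REGISTRY:
        raise ValueError(f"Unknown tool: {name}")
    return TOOL_REGISTRY[name](**arguments)
```

You send the schemas to the model and keep the registry on your side: the model can only ever request a call, never perform one directly.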
| Option | When to Use | Trade-offs |
|---|---|---|
| Chatbot | Knowledge lookup, FAQs, low-stakes internal support | Low ROI ceiling; hallucinations erode trust without citations |
| Copilot | Drafting, summarisation, extraction with a human reviewer | Still dependent on humans; limited throughput improvement |
| Agent (read-only tools) | Search, retrieval, investigation workflows where actions are safe | Requires robust tool design and logging; can still loop without stop conditions |
| Agent (write with approvals) | Ticket creation, record updates, external comms drafts | Needs approval UX + audit trail; requires eval gates and rollback plans |
| Autonomous agent | Only after extensive monitoring; narrow domains; clear rollback capability | Highest risk; requires mature governance and incident response |
Agents tend to accrete tools. Without a clear permission model, you'll either block adoption (too strict) or create incidents (too loose). Define tiers early: read-only tools, write tools with approval, and autonomous actions (rare).
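One way to make those tiers concrete is a small permission map consulted before any tool executes. The tier assignments below are illustrative:

```python
from enum import Enum

class ToolTier(Enum):
    READ_ONLY = "read_only"        # safe to execute automatically
    WRITE_WITH_APPROVAL = "write"  # needs a human sign-off before running
    AUTONOMOUS = "autonomous"      # rare; only with mature monitoring

# Illustrative tier assignments for the tools above.
TOOL_TIERS = {
    "search_cases": ToolTier.READ_ONLY,
    "draft_letter": ToolTier.READ_ONLY,
    "create_ticket": ToolTier.WRITE_WITH_APPROVAL,
}

def requires_approval(tool_name: str) -> bool:
    """Unknown tools default to the approval tier rather than running freely."""
    tier = TOOL_TIERS.get(tool_name, ToolTier.WRITE_WITH_APPROVAL)
    return tier == ToolTier.WRITE_WITH_APPROVAL
```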
Tools can return untrusted text (web pages, documents, user input). Agents must treat tool outputs as data, not instructions. Sanitise, constrain, and isolate.
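A rough sketch of that constraint: strip and truncate raw tool output, then wrap it so the model sees it as data. The tag and size limit are arbitrary choices, and delimiting reduces rather than eliminates prompt-injection risk:

```python
import re

MAX_TOOL_OUTPUT_CHARS = 8_000  # arbitrary budget for this sketch

def sanitise_tool_output(raw: str) -> str:
    """Constrain tool output before it reaches the model: strip control
    characters, truncate, and wrap it so it reads as data, not instructions."""
    cleaned = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", raw)
    cleaned = cleaned[:MAX_TOOL_OUTPUT_CHARS]
    # The wrapper is a convention, not a guarantee: the system prompt should
    # also state that anything inside these tags is untrusted content.
    return f'<tool_output untrusted="true">\n{cleaned}\n</tool_output>'
```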
Without strict stop conditions, agents can keep reasoning and calling tools. Set budgets: max tool calls, timeouts, and fallbacks to human escalation.
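Those budgets can be enforced as a hard-capped loop. In this sketch, `call_model`, `execute_tool`, and `escalate_to_human` are placeholder hooks into your own stack, and the limits are illustrative:

```python
import time

MAX_TOOL_CALLS = 10          # illustrative budgets
MAX_WALL_CLOCK_SECONDS = 120

def run_agent(task, call_model, execute_tool, escalate_to_human):
    """Hard-capped agent loop. `call_model` returns either a final answer or a
    requested tool call; `execute_tool` and `escalate_to_human` are hooks into
    your own dispatcher and escalation path."""
    start = time.monotonic()
    history = [{"role": "user", "content": task}]
    for _ in range(MAX_TOOL_CALLS):
        if time.monotonic() - start > MAX_WALL_CLOCK_SECONDS:
            return escalate_to_human(task, history, reason="timeout")
        step = call_model(history)
        if step["type"] == "final_answer":
            return step["content"]
        result = execute_tool(step["name"], step["arguments"])
        history.append({"role": "tool", "name": step["name"], "content": result})
    return escalate_to_human(task, history, reason="tool-call budget exhausted")
```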
Days 0–30: ship a copilot or read-only agent for one workflow; instrument logs and evals.
If you skip evals, you won’t know whether you’re improving or drifting.
Days 31–60: add approvals for write actions + audit trails + rollback paths.
Approvals without good UX create bottlenecks; design the review flow early.
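An audit trail for this phase can be as simple as an append-only record for every write action. The schema and file-based storage below are illustrative:

```python
import json
import time
import uuid

def log_action(tool_name: str, arguments: dict, run_id: str, approved_by: str | None):
    """Append-only audit record for every write action (illustrative schema)."""
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "tool": tool_name,
        "arguments": arguments,
        "run_id": run_id,            # which agent run triggered the action
        "approved_by": approved_by,  # None for read-only tools
    }
    with open("audit_log.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
```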
Days 61–90: expand to adjacent workflows and introduce cost controls (routing, caching).
Scaling without cost controls turns a successful pilot into an expensive product.
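A rough sketch of two such controls: cache repeated read-only lookups and route short, tool-free queries to a cheaper model. Model names and thresholds are placeholders:

```python
import functools

CHEAP_MODEL = "small-model"   # placeholder names
STRONG_MODEL = "large-model"

def choose_model(query: str, needs_tools: bool) -> str:
    """Route short, tool-free queries to the cheaper model."""
    if not needs_tools and len(query) < 200:
        return CHEAP_MODEL
    return STRONG_MODEL

@functools.lru_cache(maxsize=2048)
def cached_retrieval(normalised_query: str) -> str:
    ...  # expensive retrieval or model call goes here
```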
Agents are where AI starts moving work, not just words. That's why the ROI is higher—and why the architecture must include permissions, approvals, audit trails, and eval gates from day one.
MLX includes workflow building blocks for tool-based automation, with governance features designed for real organisations.
Explore MLX →