Most organisations start with an internal chatbot: "Ask a question, get an answer." It's useful, but the ROI is capped. The real value shows up when AI stops just answering and starts executing workflows—creating tickets, fetching records, drafting letters, updating systems, and moving work forward.
That's the shift to agents: systems that can call tools, search databases, and take actions. It's also where safety, governance, and architecture start to matter.
Ship one workflow at a time.
Graduate tool access: read → approve → autonomous.
Keep a human in the loop for high‑stakes approvals.
An agent is an AI system that can plan steps and act by using tools (APIs, databases, internal systems). Instead of returning an answer, it can execute a workflow: retrieve context, make decisions, and trigger operations.
Chatbots tend to deliver marginal gains (deflection, faster lookup). Agents unlock structural gains by reducing the number of human steps in a process.
Tool use is how an agent interacts with the real world. You define tools (e.g. search_cases, create_ticket, draft_letter) with strict schemas. The model decides which tool to call, you execute it, and return the result. This creates a controlled interface between AI and systems of record.
The key is that you control what tools exist, what they can access, and which actions require approvals.
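A minimal sketch of that interface in Python. The tool names mirror the examples above; the schema format, stub implementations, and registry are illustrative rather than tied to any particular model provider or framework:

```python
# Framework-agnostic sketch: tool schemas plus a controlled dispatcher.
# Tool names mirror the examples above; implementations are illustrative stubs.

TOOL_SCHEMAS = [
    {
        "name": "search_cases",
        "description": "Search case records by free-text query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "create_ticket",
        "description": "Create a ticket in the service desk.",
        "parameters": {
            "type": "object",
            "properties": {"title": {"type": "string"}, "body": {"type": "string"}},
            "required": ["title", "body"],
        },
    },
]

def search_cases(query: str) -> list[dict]:
    ...  # call your case-management system here

def create_ticket(title: str, body: str) -> dict:
    ...  # call your ticketing system here

# The registry is the controlled interface: only tools listed here can run.
TOOL_REGISTRY = {"search_cases": search_cases, "create_ticket": create_ticket}

def execute_tool_call(name: str, arguments: dict):
    """Execute a model-requested tool call; reject anything not registered."""
    if name not in TOOL_REGISTRY:
        raise ValueError(f"Unknown tool: {name}")
    return TOOL_REGISTRY[name](**arguments)
```

You send the schemas to the model and keep the registry on your side: the model can only ever request a call, never perform one directly.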
| Option | When to Use | Trade-offs |
|---|---|---|
| Chatbot | Knowledge lookup, FAQs, low-stakes internal support | Low ROI ceiling; hallucinations erode trust without citations |
| Copilot | Drafting, summarisation, extraction with a human reviewer | Still dependent on humans; limited throughput improvement |
| Agent (read-only tools) | Search, retrieval, investigation workflows where actions are safe | Requires robust tool design and logging; can still loop without stop conditions |
| Agent (write with approvals) | Ticket creation, record updates, external comms drafts | Needs approval UX + audit trail; requires eval gates and rollback plans |
| Autonomous agent | Only after extensive monitoring; narrow domains; clear rollback capability | Highest risk; requires mature governance and incident response |
Agents tend to accrete tools. Without a clear permission model, you'll either block adoption (too strict) or create incidents (too loose). Define tiers early: read-only tools, write tools with approval, and autonomous actions (rare).
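One way to make those tiers concrete is a small permission map consulted before any tool executes. The tier assignments below are illustrative:

```python
from enum import Enum

class ToolTier(Enum):
    READ_ONLY = "read_only"        # safe to execute automatically
    WRITE_WITH_APPROVAL = "write"  # needs a human sign-off before running
    AUTONOMOUS = "autonomous"      # rare; only with mature monitoring

# Illustrative tier assignments for the tools above.
TOOL_TIERS = {
    "search_cases": ToolTier.READ_ONLY,
    "draft_letter": ToolTier.READ_ONLY,
    "create_ticket": ToolTier.WRITE_WITH_APPROVAL,
}

def requires_approval(tool_name: str) -> bool:
    """Unknown tools default to the approval tier rather than running freely."""
    tier = TOOL_TIERS.get(tool_name, ToolTier.WRITE_WITH_APPROVAL)
    return tier == ToolTier.WRITE_WITH_APPROVAL
```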
Tools can return untrusted text (web pages, documents, user input). Agents must treat tool outputs as data, not instructions. Sanitise, constrain, and isolate.
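A rough sketch of that constraint: strip and truncate raw tool output, then wrap it so the model sees it as data. The tag and size limit are arbitrary choices, and delimiting reduces rather than eliminates prompt-injection risk:

```python
import re

MAX_TOOL_OUTPUT_CHARS = 8_000  # arbitrary budget for this sketch

def sanitise_tool_output(raw: str) -> str:
    """Constrain tool output before it reaches the model: strip control
    characters, truncate, and wrap it so it reads as data, not instructions."""
    cleaned = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", raw)
    cleaned = cleaned[:MAX_TOOL_OUTPUT_CHARS]
    # The wrapper is a convention, not a guarantee: the system prompt should
    # also state that anything inside these tags is untrusted content.
    return f'<tool_output untrusted="true">\n{cleaned}\n</tool_output>'
```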
Without strict stop conditions, agents can keep reasoning and calling tools. Set budgets: max tool calls, timeouts, and fallbacks to human escalation.
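Those budgets can be enforced as a hard-capped loop. In this sketch, `call_model`, `execute_tool`, and `escalate_to_human` are placeholder hooks into your own stack, and the limits are illustrative:

```python
import time

MAX_TOOL_CALLS = 10          # illustrative budgets
MAX_WALL_CLOCK_SECONDS = 120

def run_agent(task, call_model, execute_tool, escalate_to_human):
    """Hard-capped agent loop. `call_model` returns either a final answer or a
    requested tool call; `execute_tool` and `escalate_to_human` are hooks into
    your own dispatcher and escalation path."""
    start = time.monotonic()
    history = [{"role": "user", "content": task}]
    for _ in range(MAX_TOOL_CALLS):
        if time.monotonic() - start > MAX_WALL_CLOCK_SECONDS:
            return escalate_to_human(task, history, reason="timeout")
        step = call_model(history)
        if step["type"] == "final_answer":
            return step["content"]
        result = execute_tool(step["name"], step["arguments"])
        history.append({"role": "tool", "name": step["name"], "content": result})
    return escalate_to_human(task, history, reason="tool-call budget exhausted")
```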
Days 0–30: ship a copilot or read-only agent for one workflow; instrument logs and evals.
If you skip evals, you won’t know whether you’re improving or drifting.
Days 31–60: add approvals for write actions + audit trails + rollback paths.
Approvals without good UX create bottlenecks; design the review flow early.
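An audit trail for this phase can be as simple as an append-only record for every write action. The schema and file-based storage below are illustrative:

```python
import json
import time
import uuid

def log_action(tool_name: str, arguments: dict, run_id: str, approved_by: str | None):
    """Append-only audit record for every write action (illustrative schema)."""
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "tool": tool_name,
        "arguments": arguments,
        "run_id": run_id,            # which agent run triggered the action
        "approved_by": approved_by,  # None for read-only tools
    }
    with open("audit_log.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
```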
Days 61–90: expand to adjacent workflows and introduce cost controls (routing, caching).
Scaling without cost controls turns a successful pilot into an expensive product.
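A rough sketch of two such controls: cache repeated read-only lookups and route short, tool-free queries to a cheaper model. Model names and thresholds are placeholders:

```python
import functools

CHEAP_MODEL = "small-model"   # placeholder names
STRONG_MODEL = "large-model"

def choose_model(query: str, needs_tools: bool) -> str:
    """Route short, tool-free queries to the cheaper model."""
    if not needs_tools and len(query) < 200:
        return CHEAP_MODEL
    return STRONG_MODEL

@functools.lru_cache(maxsize=2048)
def cached_retrieval(normalised_query: str) -> str:
    ...  # expensive retrieval or model call goes here
```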
Agents are where AI starts moving work, not just words. That's why the ROI is higher—and why the architecture must include permissions, approvals, audit trails, and eval gates from day one.
MLX includes workflow building blocks for tool-based automation, with governance features designed for real organisations.
Explore MLX →