AI is already changing how work gets done, but the real story is less about magical robots and more about messy workflows, human habits, and small design choices that quietly compound into big impact over time.

Wait, what exactly are “agents”?
Think of AI agents as digital colleagues that can take a goal (“review these contracts”, “summarize these claims”, “check these customers for risk”) and then break it into steps, call tools, and loop until the job is done. They are not just chatbots that answer questions; they are more like junior associates who can read, extract, cross-check, and propose next actions.
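To make that loop concrete, here is a minimal sketch of the goal-to-steps-to-tools cycle. The tool names, the scripted plan standing in for the model call, and the step limit are illustrative assumptions, not a particular product’s API.

```python
# Minimal sketch of the agent loop: take a goal, pick a step, call a tool, repeat.
# call_model() is faked with a scripted plan; in practice it would be an LLM API call.
# Tool names and arguments are illustrative, not a specific vendor's interface.

TOOLS = {
    "fetch_document": lambda doc_id: f"<text of {doc_id}>",       # pull a contract or claim
    "extract_clauses": lambda text: ["limitation of liability"],  # structured extraction
    "check_policy": lambda clauses: {"flags": clauses},           # cross-check against rules
}

def call_model(goal, history):
    """Decide the next step. Here a canned plan stands in for a real model call."""
    plan = [
        {"action": "fetch_document", "args": {"doc_id": "contract-042"}},
        {"action": "extract_clauses", "args": {"text": "<doc text>"}},
        {"action": "check_policy", "args": {"clauses": ["limitation of liability"]}},
    ]
    if len(history) < len(plan):
        return plan[len(history)]
    return {"done": True, "answer": {"goal": goal, "findings": history[-1]["result"]}}

def run_agent(goal, max_steps=10):
    history = []
    for _ in range(max_steps):                  # loop until the job is done or a step limit hits
        step = call_model(goal, history)
        if step.get("done"):
            return step["answer"]               # propose results for a human to act on
        result = TOOLS[step["action"]](**step["args"])
        history.append({"step": step, "result": result})
    return {"status": "needs_human_review", "history": history}

print(run_agent("review these contracts"))
```

The important part is the shape, not the stubs: the model proposes a step, a tool executes it, and the loop repeats until the agent either finishes or hands the case back to a person.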
Where things get interesting is not in what one agent can do in a demo, but in what hundreds of these agents do inside real workflows that span legal, risk, operations, and customer service.
The first surprise: It’s not about the agent
The companies seeing value are the ones obsessing over workflows, not features. When they map the actual steps people take, where information gets stuck, and where judgment really matters, they start to deploy different tools—rules engines, classic analytics, genAI, and agents—like runners in a relay, each taking the leg it handles best.
In insurance and legal services, teams that redesigned end‑to‑end flows (instead of dropping an agent into a single step) found that agents worked best as orchestrators: pulling data, stitching together results from existing systems, and presenting them in a way that humans could quickly verify and act on.
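In code, the orchestrator pattern is less about intelligence and more about assembly. A sketch like the one below, with made-up system clients and field names, captures the shape: the agent queries the systems that already exist, runs one genAI step to condense the material, and returns a package a reviewer can verify.

```python
# Sketch of the orchestrator pattern: pull from existing systems, stitch results,
# and hand a verifiable package to a human. The client objects passed in and the
# field names are stand-ins for real integrations, not an actual API.

def summarize_with_llm(policy, documents):
    # In practice this is the one genAI step: condense and cross-reference the inputs.
    return f"{len(documents)} documents reviewed against policy {policy['id']}"

def orchestrate_review(claim_id, policy_db, document_store, rules_engine):
    policy = policy_db.get_policy(claim_id)          # existing system of record
    documents = document_store.fetch(claim_id)       # existing document repository
    rule_findings = rules_engine.evaluate(claim_id)  # existing deterministic checks

    return {
        "claim_id": claim_id,
        "summary": summarize_with_llm(policy, documents),  # what the agent concluded
        "rule_findings": rule_findings,                    # deterministic results, shown alongside
        "source_links": [d["uri"] for d in documents],     # so a reviewer can verify quickly
        "status": "awaiting_human_review",                 # a person still makes the call
    }
```

Nothing in that structure forces the model to be right on its own; the design assumes a person will check it, and makes that check cheap.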
The second surprise: Sometimes simpler is smarter
Leaders who assumed “everything needs an agent” are learning the hard way that some processes are too tightly governed, standardized, and regulated for a non‑deterministic tool to be useful. Investor onboarding or regulatory disclosures, for example, often run better on well‑designed rules and forms than on a creative model that occasionally improvises.
Where agents shine is in high‑variance work: pulling complex financial data together, checking compliance across many documents, or dealing with case‑by‑case nuance where humans used to spend hours aggregating and sanity‑checking information.

The third surprise: “AI slop” is a people problem
In many organizations, the early story of agents goes like this: amazing demo, big promises, rollout… followed by eye‑rolls from the people who actually have to use the system. The outputs feel generic, overconfident, or just wrong in subtle ways, and trust evaporates.
Teams that avoid this trap treat agents less like tools and more like new hires: they write job descriptions, define what “good” looks like, build detailed evaluation sets, and continuously coach the system using real examples of desired and undesired behavior. Metrics like task success rate, retrieval accuracy, hallucination rate, and “LLM as judge” scoring become the equivalent of performance reviews.
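Here is a sketch of what those “performance reviews” can look like in practice. The evaluation cases, the exact-match stand-in for an LLM-as-judge call, and the release threshold are all illustrative; real teams maintain far larger sets and richer rubrics.

```python
# Sketch of an evaluation harness: a curated set of cases plus a scoring function,
# run before every release. judge() is a stand-in for an "LLM as judge" call; here
# it is simple exact match so the example stays self-contained.

EVAL_SET = [
    {"input": "claim-001", "expected": "escalate"},
    {"input": "claim-002", "expected": "approve"},
    {"input": "claim-003", "expected": "request_more_info"},
]

def judge(actual, expected):
    """Score one output; in practice a second model grades it against a rubric."""
    return 1.0 if actual == expected else 0.0

def evaluate(agent_fn, threshold=0.9):
    scores = [judge(agent_fn(case["input"]), case["expected"]) for case in EVAL_SET]
    task_success_rate = sum(scores) / len(scores)
    return {"task_success_rate": task_success_rate, "passed": task_success_rate >= threshold}

# Example: an agent that always approves fails the review.
print(evaluate(lambda claim_id: "approve"))   # task_success_rate ≈ 0.33, passed: False
```

The numbers matter less than the habit: the evaluation set is the job description, and every release has to pass it before it meets real users.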
The fourth surprise: Reuse beats reinvention
One quiet but powerful lesson from the last year: most enterprise workloads reuse the same building blocks—ingest, extract, search, analyze—over and over again. Yet many teams still spin up a brand‑new agent for every use case, creating a tangle of overlapping logic that is hard to maintain.
Organizations that step back and ask, “Which parts of this could be shared?” end up building reusable agents and components, backed by a central library of validated services, prompts, and patterns. That shift alone can wipe out a large chunk of nonessential engineering work and make it dramatically easier to scale from one pilot to dozens of workflows.
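One way to picture that shared library: a registry of validated building blocks that new use cases compose rather than re-implement. The decorator, the component names, and the two-step workflow below are an illustrative sketch of the pattern, not a particular platform.

```python
# Sketch of a shared component library: building blocks are registered once,
# validated once, and composed into new workflows instead of being rebuilt per use case.
# Component names follow the ingest/extract pattern described above; all are stubs.

REGISTRY = {}

def component(name):
    """Register a validated building block under a stable name."""
    def register(fn):
        REGISTRY[name] = fn
        return fn
    return register

@component("ingest")
def ingest(source):
    return {"source": source, "raw": f"<content of {source}>"}

@component("extract")
def extract(doc):
    return {**doc, "fields": {"party": "...", "effective_date": "..."}}

def build_workflow(step_names):
    """Compose a new use case from shared components instead of a bespoke agent."""
    def run(source):
        result = source
        for name in step_names:
            result = REGISTRY[name](result)
        return result
    return run

contract_review = build_workflow(["ingest", "extract"])   # new pilot, no new plumbing
print(contract_review("contract-042.pdf"))
```

The payoff shows up at the second and tenth use case, when “new workflow” means a new composition rather than a new codebase.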
The fifth surprise: Humans don’t disappear, they move
The productive question is no longer “Will agents replace humans?” but “Which parts of this job are judgment, which are drudgery, and how do we redesign the mix?” In real deployments, people are still needed to oversee accuracy, handle edge cases, interpret risk, and sign off on decisions that carry legal or financial consequences.
What does change is the shape of the work. In legal dispute workflows, for instance, agents can assemble claims, organize key figures, and draft potential workplans, while lawyers focus on review, adjustment, and strategic decisions—backed by interfaces that highlight anomalies and let experts jump straight to the most critical passages.

The sixth surprise: Observability is the real safety net
As organizations move from a handful of agents to hundreds, the challenge is no longer “Can we automate this?” but “When something goes wrong, can we see why?” Teams that log each step of an agentic workflow—inputs, intermediate decisions, tool calls, outputs—can quickly spot where a new data pattern or user behavior is causing failure.
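A sketch of what step-level logging can look like follows. The trace fields, the JSON-to-stdout logging, and the example step are assumptions; the point is that every step records its inputs, output, status, and timing under a shared run ID.

```python
# Sketch of step-level observability: every step in an agentic workflow emits a
# structured trace record (inputs, output, status, timing) under a shared run ID,
# so failures can be traced to the exact step or the upstream data that caused them.

import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("agent_trace")

def traced(step_name, fn, run_id, **inputs):
    """Run one workflow step and log a structured trace record for it."""
    started = time.time()
    status, output = "error", None
    try:
        output = fn(**inputs)
        status = "ok"
        return output
    finally:
        log.info(json.dumps({
            "run_id": run_id,
            "step": step_name,
            "inputs": inputs,          # redact sensitive fields in a real system
            "output": output,          # intermediate decision or tool result
            "status": status,
            "duration_s": round(time.time() - started, 3),
        }, default=str))

run_id = str(uuid.uuid4())
doc = traced("fetch_document", lambda doc_id: {"doc_id": doc_id, "pages": 12},
             run_id, doc_id="claim-007")
```

Aggregated over hundreds of runs, records like these are what turn “accuracy dropped” into a specific, fixable finding.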
In one document‑heavy service, detailed observability revealed that a drop in accuracy was not an AI issue at all, but the result of lower‑quality input data from a particular group of users; fixing the upstream formatting and parsing brought performance back up.
Sources
This article draws insights from the following research:

McKinsey & Company: One Year of Agentic AI: Six Lessons from the People Doing the Work
McKinsey & Company: Reimagining Life Science Enterprises with Agentic AI
McKinsey & Company: The Economic Potential of Generative AI: The Next Productivity Frontier
McKinsey & Company: Seizing the Agentic AI Advantage