OpenClaw is NVIDIA’s open-source framework for building AI agents. You point an agent at a job: invoice reconciliation, competitive research, prospect enrichment. It reads files, calls APIs, navigates systems, and makes judgment calls. It runs on your hardware. Your data stays in your building.
Jensen Huang put it in the same category as HTML and Linux at GTC. The comparison describes something that becomes infrastructure rather than a product. Most companies sense this matters. The question is what to do about it.
Agent deployment follows a pattern. Four phases, roughly twelve weeks, from initial audit to agents in production.
Phase 1: Discovery, Weeks 1 through 3
Pricing logic lives in three places: a spreadsheet on the sales director’s laptop, a PDF rate card from two years ago, and Mike’s head. When a new quote comes in, you check the spreadsheet, glance at the PDF if the product line is unusual, and ask Mike, because Mike remembers the exceptions both documents miss. Vacation weeks are a problem. Mike being out sick is worse.
Every company has a version of Mike. Someone whose institutional knowledge lives outside every system, and whose departure would leave a gap that takes months to fill. The first three weeks are about finding every point where the business runs on undocumented knowledge.
Talk to people. Watch them work. The documentation is always incomplete. The usual finds: routing preferences in email threads, credit terms that exist only in conversation, rate tables in PDFs from eighteen months ago, exception-handling rules that live as muscle memory. This knowledge runs the business, but it lives where agents can’t reach.
The task audit runs in parallel. Walk operations for a week, write down every task that consumes time without requiring judgment. The kind of task you’re looking for: someone pulls data from four external systems every morning, cross-references it against internal records, flags discrepancies. Two hours a day, every day, requiring diligence but zero expertise. Other versions: report compilation from scattered data, duplicate data entry across systems, inbox monitoring for time-sensitive documents.
Two questions for every task. First: what happens if an agent gets this wrong? If someone catches it in normal review, strong candidate. If a wrong invoice goes to a client, later-phase project. Second: how many hours a week does this consume?
Two deliverables. A knowledge map: where critical business logic lives, what format, who owns it. And a ranked task list: time consumed versus error risk. These tell you what to build and in what order.
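The two audit questions can be combined into a simple build-order ranking: hours consumed pushes a task up, error risk pushes it down. A minimal sketch, with invented task names, weights, and risk scale (nothing here comes from OpenClaw itself):

```python
# Hypothetical sketch: ranking audited tasks by weekly hours consumed
# versus the cost of an agent getting them wrong.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    hours_per_week: float
    error_risk: int  # 1 = caught in normal review, 3 = reaches a client

def rank_tasks(tasks: list[Task]) -> list[Task]:
    # High hours and low risk float to the top of the build order.
    return sorted(tasks, key=lambda t: t.hours_per_week / t.error_risk, reverse=True)

audit = [
    Task("daily reconciliation", 10.0, 1),
    Task("client invoicing", 4.0, 3),
    Task("report compilation", 6.0, 1),
]

ranked = rank_tasks(audit)
```

Here the low-risk, high-volume reconciliation job ranks first and client invoicing, where a mistake ships to a customer, ranks last: exactly the "strong candidate" versus "later-phase project" split the questions are meant to produce.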
The consistent surprise in discovery: how much of the business runs on tribal knowledge everyone assumed was documented. When you trace how decisions get made, the documented path and the real path diverge. This gap is always wider than expected, and the documentation work alone clarifies processes that have run on autopilot for years.
Phase 2: Architecture, Weeks 3 through 5
The first real argument usually happens around week four, when the team sees the task list and wants to automate all of it at once. The discipline is deciding what to build first, then designing a system that expands without tearing it down.
Four decisions define this phase: which tasks get agents, what autonomy level each agent operates at, how agents access business context, and what boundaries contain them.
Autonomy design requires the most thought. Three tiers, each defined by what happens when the agent makes a mistake. Autonomous agents handle tasks where errors are minor and self-correcting: file organization, data compilation, status monitoring. Supervised agents produce output a person reviews before it ships: outreach drafts, financial summaries, customer communications. Advisory agents surface analysis for decisions that stay with people: pricing strategy, vendor selection, resource allocation. Every task goes into a tier based on one question: what’s the cost of the agent being wrong?
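The tier assignments above can be written down as explicit data so nobody relitigates them per task. A sketch under assumed names (the enum and mapping are illustrative, not an OpenClaw API):

```python
# Illustrative encoding of the three autonomy tiers.
from enum import Enum

class Autonomy(Enum):
    AUTONOMOUS = "runs unattended; errors are minor and self-correcting"
    SUPERVISED = "output reviewed by a person before it ships"
    ADVISORY = "surfaces analysis; the decision stays with a person"

ASSIGNMENTS = {
    "file organization": Autonomy.AUTONOMOUS,
    "status monitoring": Autonomy.AUTONOMOUS,
    "outreach drafts": Autonomy.SUPERVISED,
    "financial summaries": Autonomy.SUPERVISED,
    "pricing strategy": Autonomy.ADVISORY,
}

def tier_for(task: str) -> Autonomy:
    # Default to ADVISORY when in doubt: the cost of being wrong is unknown.
    return ASSIGNMENTS.get(task, Autonomy.ADVISORY)
```

The default matters: an unassigned task falls to the most conservative tier, which is the "what's the cost of being wrong?" question answered defensively.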
Knowledge structuring makes the discovery inventory consumable by agents. The work is closer to process documentation than data engineering. Consider a company whose vendor approval criteria live in a two-year-old spreadsheet and the purchasing manager’s sense of which suppliers deliver reliably at which volumes. Structuring that means sitting down with the purchasing manager, writing the approval logic as explicit rules, and connecting those rules to live data from the accounting system. Skip it and agents make decisions on the same incomplete information new hires get.
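What "writing the approval logic as explicit rules" looks like in practice, as a hedged sketch: the field names and thresholds are invented stand-ins for whatever the purchasing manager actually articulates.

```python
# Hypothetical example: a purchasing manager's vendor-approval
# instincts written down as explicit, testable rules.
def approve_vendor(vendor: dict) -> tuple[bool, str]:
    if vendor["on_time_rate"] < 0.95:
        return False, "on-time delivery below 95%"
    if vendor["max_monthly_volume"] < vendor["requested_volume"]:
        return False, "cannot reliably deliver at requested volume"
    if vendor["days_since_last_audit"] > 365:
        return False, "quality audit older than one year"
    return True, "meets all approval criteria"

ok, reason = approve_vendor({
    "on_time_rate": 0.97,
    "max_monthly_volume": 5000,
    "requested_volume": 3000,
    "days_since_last_audit": 120,
})
```

Once the rules exist in this form, the live inputs (on-time rates from the accounting system, audit dates from the quality tracker) can feed them directly, and an agent decides with the same context the purchasing manager carries in their head.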
Security architecture goes in now, during design, not after deployment. NemoClaw provides the framework: process-level sandboxing, privacy routing to prevent data leakage, and policy enforcement with boundaries that hold against any prompt. You define what each agent can access, modify, and send externally.
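The access/modify/send boundaries can be expressed as plain per-agent policy data. To be clear, this is not NemoClaw's actual configuration format; it is only a sketch of the kind of deny-by-default boundary the text describes, with invented agent and resource names:

```python
# Sketch of a per-agent access policy, expressed as plain data.
POLICY = {
    "reconciliation-agent": {
        "read":  ["vendor_portals", "accounting_system"],
        "write": ["discrepancy_report"],
        "external_send": [],                # nothing leaves the building
    },
    "outreach-agent": {
        "read":  ["crm", "prospect_research"],
        "write": ["crm_draft_fields"],
        "external_send": ["email_drafts"],  # drafts only; a person sends
    },
}

def can(agent: str, action: str, resource: str) -> bool:
    # Deny by default: unknown agents and unlisted resources get nothing.
    return resource in POLICY.get(agent, {}).get(action, [])
```

The deny-by-default shape is the point: a boundary you must opt into holds even when an agent is asked to do something nobody anticipated.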
For smaller companies, security is simpler. Fewer systems to fence off, smaller surface area, faster iterations on access policies. When your tech stack is a CRM, a shared drive, and three SaaS tools, defining agent permissions takes days rather than months.
The deliverable is an agent blueprint: task assignments with autonomy levels, data access maps, and a security policy defining boundaries and escalation rules.
What goes wrong here is scope creep. Once people see what agents could do, they want to automate everything. A discovery list with twelve tasks becomes a wish list of thirty. The discipline is starting narrow, proving the system on two or three tasks, and expanding from confidence rather than enthusiasm.
Phase 3: Deployment, Weeks 5 through 9
The first agent goes live against one task. Say it’s that daily reconciliation job: the agent logs into four external systems, pulls the data, cross-references it against internal records, and flags discrepancies. For the first week, it runs alongside the human process so you can compare outputs side by side.
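The cross-reference step at the core of that job can be sketched in a few lines, assuming each system exports records as an id-to-amount mapping (the record IDs and tolerance are illustrative):

```python
# Minimal sketch of the cross-reference step in the reconciliation job.
def flag_discrepancies(external: dict, internal: dict, tolerance: float = 0.01):
    flags = []
    for record_id, amount in external.items():
        if record_id not in internal:
            flags.append((record_id, "missing from internal records"))
        elif abs(internal[record_id] - amount) > tolerance:
            flags.append((record_id, f"amount mismatch: {amount} vs {internal[record_id]}"))
    for record_id in internal:
        if record_id not in external:
            flags.append((record_id, "missing from external system"))
    return flags

flags = flag_discrepancies(
    external={"INV-101": 1200.00, "INV-102": 450.00},
    internal={"INV-101": 1200.00, "INV-102": 475.00, "INV-103": 90.00},
)
```

Running the agent's flags against the human's flags for a week is the side-by-side comparison: every divergence is either an agent bug to fix or an edge case the human process had quietly been absorbing.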
Before launch, define success in specific terms: time saved per day, error rate compared to the human baseline, output quality against a rubric. Without these, the team debates whether the agent is “working” based on feelings. A person who liked the old process will always find reasons the agent falls short. A number on a whiteboard settles the question.
The first week surfaces edge cases the architecture missed. An external system changes its interface, a customer record has a field the agent misreads, a rate table has a footnote that modifies the numbers above it. Everyone skipped the footnote in discovery because it seemed obvious. This is expected. The first deployment’s value is in exposing gaps while stakes are low and a human checks every output.
The iteration cadence is weekly reviews, not daily panic. Log failures, categorize by type (data gap, edge case, scope issue, architecture miss), fix the cause, redeploy. This cycle over four or five weeks turns a prototype into a production system. Most fixes are small: a missing data source, an incomplete decision rule, a missed format conversion. The architecture rarely needs fundamental changes if discovery was thorough.
Once the first agent proves out, you start adding. A second agent for a related task, then connections between them. We run our competitive intelligence this way: one agent monitors sources, a second analyzes changes against historical patterns, a third produces the morning briefing. Each agent is simple. The capability comes from composition, and you can test each one independently before connecting it.
Multi-agent systems raise coordination questions. What happens when the upstream agent’s output is late or malformed? When one agent identifies something that should change another’s priorities? These are workflow design problems, solved the same way you’d solve them with people: clear handoff rules, defined escalation paths, regular review of whether the workflow still matches reality.
The deliverable: agents in production with a performance baseline, an iteration log, and a tested expansion path.
Phase 4: Integration, Weeks 9 through 12
Eighty percent of AI budgets go to tools, twenty percent to helping people adapt. Research consistently shows seventy percent of the value comes from the people side. This phase is where that value lives.
A salesperson’s day, before and after. Before: mornings researching prospects from scratch, cross-referencing contacts across LinkedIn and the CRM, writing outreach one at a time. Afternoons updating CRM fields, logging activity. The work that requires actual judgment, reading a prospect and deciding the approach, might fill a third of the day.
After: mornings open with agent-prepared briefing packets. Company background, recent news, decision-makers, suggested angles based on the prospect’s industry and activity. The rep reviews, refines based on their own read, and spends the day on actual conversations. Agents update the CRM from call notes. Same person, more ground covered, because agents handle prep before the rep sits down.
The operations role shifts differently. The person who spent their days manually pulling data across four systems now designs the rules agents follow. They decide what the system does, rather than doing what the system should be doing.
Introducing agents without resistance: patience and honesty. Run the agent alongside the existing process for two weeks. Let people see what it produces, where it’s accurate, where it falls short. Don’t announce a replacement. Let someone watch the agent do a task they’ve done hundreds of times and form their own opinion. The people closest to the work have the best feedback on where the agent falls short, and they give it freely when they feel consulted rather than replaced.
The operating cadence determines whether the system improves or degrades. Weekly performance reviews catch drift. Monthly adjustments add tasks and refine existing agents. Quarterly reviews ask whether agents are still pointed at the right work. Active management compounds returns.
The deliverable: a team working with agents daily, with review cadences and escalation paths.
When it works: the salesperson calls it “my research briefing.” The operations person describes the monitoring system like a colleague with specific strengths and known limitations.
What Agents Look Like in Practice
The framework above describes how to deploy agents. What follows is what they actually do once they’re running. These patterns recur across industries and company sizes.
Financial reconciliation. An agent pulls invoices from vendor portals every morning, matches line items against purchase orders in the accounting system, and flags discrepancies. Mismatched amounts, missing PO numbers, duplicate charges. The AP clerk who used to spend the first two hours of every day on this now reviews only the exceptions the agent flags, usually a handful. The agent also catches patterns the human process missed: a vendor that’s been slowly increasing unit prices across invoices, a recurring charge for a service that was cancelled three months ago.
Contract screening. An agent reads incoming contracts against a checklist of terms the company requires. Missing indemnification clauses, non-standard liability caps, payment terms outside policy, auto-renewal language buried in section 14. It produces a summary of flagged items with the relevant clause quoted and the specific policy it violates. Legal review starts at the flags rather than page one. A 40-page MSA that used to take an hour of attorney time to triage takes ten minutes.
Operations reporting. Every Friday, an agent pulls data from the CRM, accounting system, and project tracker. It assembles the weekly report in the format leadership expects, populates the tables, writes the summary narrative, and flags metrics that moved outside normal ranges. It knows that revenue by region goes in the first section, that the CFO wants margin trends highlighted, and that any deal over $50K in the pipeline gets its own line item. Someone reviews the draft and sends it. What used to take half a day takes fifteen minutes.
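The "moved outside normal ranges" check is a small statistical test. A sketch assuming the agent keeps a history of weekly values per metric; the threshold and sample numbers are illustrative:

```python
# Sketch of the out-of-range check for the weekly report.
from statistics import mean, stdev

def out_of_range(history: list[float], current: float, z: float = 2.0) -> bool:
    # Flag the metric when this week's value sits more than z standard
    # deviations from the historical mean.
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) > z * sigma

weekly_margin = [0.41, 0.40, 0.42, 0.41, 0.40]
flagged = out_of_range(weekly_margin, 0.31)
```

A margin that has hovered around 41 percent dropping to 31 percent gets flagged; a value inside the normal band passes silently, which keeps the summary narrative focused on what actually moved.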
Customer onboarding. A signed contract triggers an agent that extracts the client’s information, creates accounts in the billing and project management systems, generates a welcome packet customized to the service tier, schedules the kickoff call based on the account manager’s availability, and sends the pre-meeting questionnaire. The account manager reviews the packet, personalizes one or two touchpoints, and shows up to the kickoff already prepared. The client experiences a seamless first week. The account manager spent twenty minutes on what used to take two hours of setup across four different tools.
Vendor monitoring. An agent tracks key suppliers across public sources: SEC filings, news coverage, Glassdoor reviews, job postings, credit rating changes. When a critical supplier posts fifteen engineering job listings in a week, that’s a signal. When their CFO departs, that’s a signal. When they start hiring bankruptcy attorneys, that’s a different kind of signal. The agent scores each event by potential supply chain impact and produces a weekly risk summary. Procurement sees problems developing weeks before they’d surface through normal channels.
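The event-scoring step can be sketched as a weighted tally per supplier. The event types, weights, and supplier names below are assumptions for illustration, not from any real monitoring product:

```python
# Illustrative event scoring for the weekly supplier risk summary.
IMPACT_WEIGHTS = {
    "mass_job_postings": 2,
    "cfo_departure": 3,
    "credit_downgrade": 4,
    "bankruptcy_counsel_hired": 5,
}

def weekly_risk_summary(events: list[dict]) -> list[tuple[str, int]]:
    # Sum event weights per supplier; unknown event types count as 1.
    scores: dict[str, int] = {}
    for e in events:
        weight = IMPACT_WEIGHTS.get(e["type"], 1)
        scores[e["supplier"]] = scores.get(e["supplier"], 0) + weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

summary = weekly_risk_summary([
    {"supplier": "Acme Metals", "type": "cfo_departure"},
    {"supplier": "Acme Metals", "type": "mass_job_postings"},
    {"supplier": "Baltic Freight", "type": "credit_downgrade"},
])
```

Sorting worst-first means procurement's weekly summary opens with the supplier accumulating signals, which is how a problem surfaces weeks before it would through normal channels.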
Competitive intelligence. We run this one internally. One agent monitors competitor websites, pricing pages, press releases, and job postings on a rolling schedule. A second compares this week’s findings to last week’s and identifies what changed. A third produces a structured morning briefing: what moved, what it likely means, what to watch next. The person who used to spend several hours a week assembling this picture manually now spends twenty minutes reviewing it, and catches changes they would have missed entirely.
Each of these follows the same architecture: a task that consumes hours of skilled time, structured into steps an agent can execute, with a human reviewing the output at the end. The agent handles volume and consistency. The human handles judgment and exceptions. The value compounds because every week the agent runs is a week someone spent on higher-order work instead.
Four phases, twelve weeks, from scattered knowledge to agents in production. The details change every time, but the structure holds.

