What "Production-Ready" Actually Means for an OpenClaw Deployment
"Production-ready" means many things to many people. For an OpenClaw deployment managing real client communications, connected to real business systems, and expected to be available 24/7, it means something specific. Here's the full picture.
What "Production" Actually Requires
"Production-ready" gets used to mean a lot of things in the AI world. It is used to mean "we tested it and it mostly works." It is used to mean "we deployed it to a real client." It is used to mean "we added some safety instructions."
None of those are production-ready. For an OpenClaw agent managing real communications, connected to live business systems, and expected to run continuously without babysitting — production-ready means something specific and verifiable. This post is that specification.
There are eight areas a production deployment must address. Each one gets its own section below, covering what the requirement means, why it matters, and how to verify it is in place. Together they form a complete pre-deployment checklist for any serious OpenClaw deployment.
1. Infrastructure
What it means: The agent runs on cloud compute — not a laptop, not a personal machine. The instance lives in a private subnet with no public IP address and no inbound firewall rules. An Auto Scaling Group manages instance health and replaces failed instances automatically. Deployment is zero-downtime, with lifecycle hooks that hold new instances until the gateway is healthy before traffic is routed to them. All storage is encrypted at rest.
Why it matters: A personal machine goes to sleep, gets rebooted, goes offline when the owner travels. Cloud infrastructure is designed to stay up. The ASG means the system self-heals when something breaks. Zero-downtime deploys mean updates do not create service gaps. Private subnet means the instance does not exist on the public internet — there is nothing to port-scan, nothing to brute-force.
The infrastructure cost is not the barrier. An EC2 t4g.small runs under $15/month. The full stack including EFS, Secrets Manager, and CloudWatch adds another $10–$30 depending on usage. If a client's business depends on the agent, the infrastructure to run it reliably costs less than a single hour of downtime.
How to verify it is ready: Confirm that the ASG health check is configured against the gateway health endpoint (not the EC2 instance health), that lifecycle hooks are in place for both launch and termination, that no instance in the ASG has a public IP address, and that a simulated instance failure triggers replacement within the expected time window. These are not assumptions — they must be tested before go-live.
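The "no public IP" item is easy to automate. A minimal sketch in Python over data shaped like an EC2 DescribeInstances response (the instance IDs and addresses here are illustrative, not from a real deployment):

```python
def public_ip_violations(reservations):
    """Return IDs of instances that expose a public IP address.

    `reservations` follows the shape of the "Reservations" list in an
    EC2 DescribeInstances response.
    """
    violations = []
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            if instance.get("PublicIpAddress"):
                violations.append(instance["InstanceId"])
    return violations

# Sample payload mirroring a DescribeInstances response.
sample = [
    {"Instances": [
        {"InstanceId": "i-0aaa", "PrivateIpAddress": "10.0.1.10"},
        {"InstanceId": "i-0bbb", "PrivateIpAddress": "10.0.1.11",
         "PublicIpAddress": "203.0.113.5"},
    ]}
]

print(public_ip_violations(sample))  # any non-empty list fails the audit
```

Run this against the live ASG's instances as part of the go-live check rather than eyeballing the console once.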
2. Soul Document Protection
What it means: Soul documents — SOUL.md, IDENTITY.md, AGENTS.md, and any other behavioral configuration files — are mounted read-only on every instance via AWS EFS. The workspace directory (read-only) is physically separate from the runtime directory (read-write). Soul documents are version-controlled in a git repository and deployed through a standard deployment process. AWS Backup runs daily on the EFS volume.
Why it matters: Soul documents are the behavioral contract of the agent. If they can be modified — by prompt injection, accidental write, or any other means — every safety control written into them is gone. Read-only mounting makes modification impossible at the filesystem level, regardless of what the agent's reasoning produces.
Anthropic's research found that explicit safety instructions reduced harmful AI behavior from 96% to 37% in controlled conditions. Instructions help. They are not sufficient on their own. Read-only soul documents ensure that the instructions cannot be changed without a deliberate deployment operation reviewed by a human. The agent can read its instructions. It cannot write to them, at any time, under any circumstances.
How to verify it is ready: Attempt to write to the workspace directory from the instance after deployment and verify that the write fails with a read-only filesystem error. Verify that the git repository for soul documents requires pull request review before merge. Verify that daily EFS backups are running and that a test restore has been performed.
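At the filesystem level, the read-only guarantee can be enforced in the mount itself. An illustrative /etc/fstab excerpt, assuming the amazon-efs-utils mount helper; the filesystem IDs and paths are placeholders:

```
# Soul documents: EFS mounted read-only. The agent can read, never write.
fs-0abc123:/ /opt/openclaw/workspace efs _netdev,tls,ro 0 0
# Runtime data: a separate, writable mount.
fs-0def456:/ /opt/openclaw/runtime efs _netdev,tls,rw 0 0
```

With the `ro` option in place, any write attempt under the workspace path fails with a read-only filesystem error regardless of process permissions, which is exactly what the post-deployment write test should observe.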
For a full explanation of the implementation, see the post on read-only soul documents.
3. Credentials
What it means: All credentials — API keys, tokens, configuration — live in AWS Secrets Manager, not in config files or environment variables. Each instance has an IAM role authorized to read only its own specific secret. Credentials are fetched at boot time and loaded into memory. They never touch disk as plaintext.
Third-party integrations (Gmail, Slack, HubSpot) use Composio as an authentication layer where available. The agent holds a Composio API key, not the underlying OAuth tokens for the actual service. Every integration is connected through a dedicated bot or service account — never a personal user account. IMDSv2 is enforced (http_tokens = required) to block the class of SSRF attacks that harvest IAM credentials from the instance metadata endpoint.
The monitoring system should have its own isolated Secrets Manager secret. The alerting infrastructure cannot access the main config secret. This compartmentalization means a compromise of the monitoring system does not expose client credentials.
Why it matters: Credentials in config files can be exfiltrated if the agent is compromised. Credentials on personal accounts have no audit trail and no clean revocation path. Credentials on over-provisioned accounts give a compromised agent more damage potential than necessary. The credential architecture limits blast radius at every level.
How to verify it is ready: Verify that the IAM role policy allows reads on only the specific secret ARN for that deployment. Verify that no plaintext credentials exist on disk using a file scan. Verify that IMDSv2 is enforced by attempting an IMDSv1-style metadata query and confirming it fails. Verify that each integration is connected through a named bot account with documented permissions.
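The single-secret IAM policy is small. An illustrative sketch, with placeholder account, region, and secret name; the trailing wildcard accounts for the random suffix Secrets Manager appends to secret ARNs:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "secretsmanager:GetSecretValue",
    "Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:openclaw/client-a-??????"
  }]
}
```

Anything broader than a single secret ARN here, such as a `*` resource, is the kind of over-provisioning this section exists to prevent.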
Details in the post on credentials done right.
4. Network Security
What it means: The instance has no public IP address. No inbound security group rules allow connections from the internet. Outbound connections are restricted to pre-approved endpoints — the AI model provider, Composio, specific integration APIs. Administrative access is through Tailscale, not exposed ports.
Why it matters: An instance with no inbound connections cannot be attacked from the internet. A compromised agent that attempts to exfiltrate data to an arbitrary URL fails because the outbound security group does not permit it. Tailscale provides encrypted, authenticated administrative access without creating any inbound attack surface.
Network security is the layer that limits what a successfully injected agent can actually do. Even if injection succeeds, the blast radius is bounded by what the agent can reach. The outbound allowlist is a concrete list of approved destinations — anything not on the list is unreachable.
How to verify it is ready: Verify the security group rules in the AWS console — no inbound rules with source 0.0.0.0/0 or ::/0. Verify the outbound rules contain only specific approved destinations. Verify Tailscale connectivity from an authorized machine and confirm that SSH is not accessible from the public internet. Attempt a connection to an unapproved external endpoint from the instance and confirm it times out.
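The inbound check can likewise be scripted rather than read off the console. A minimal sketch over data shaped like the IpPermissions list in an EC2 DescribeSecurityGroups response (the sample rule is illustrative):

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}

def open_inbound_rules(ip_permissions):
    """Flag inbound rules reachable from the whole internet.

    `ip_permissions` follows the shape of the IpPermissions list in an
    EC2 DescribeSecurityGroups response.
    """
    flagged = []
    for rule in ip_permissions:
        sources = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
        sources += [r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])]
        if OPEN_CIDRS.intersection(filter(None, sources)):
            flagged.append(rule.get("FromPort"))
    return flagged

sample = [
    {"FromPort": 22, "ToPort": 22, "IpProtocol": "tcp",
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []},
]
print(open_inbound_rules(sample))  # any output here is a finding
```

For a production OpenClaw deployment the expected result is an empty list, because there should be no inbound rules at all.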
5. Monitoring and Alerting
What it means: Health checks run at the process level — not just instance-level ping, but a gateway health endpoint that the ASG monitors. CloudWatch captures full session logs and gateway logs. Alerts are configured to fire when health checks fail, with notifications routed to the on-call channel before the client notices. Log retention is configured to meet the client's auditability requirements.
Why it matters: Without monitoring, you find out the agent is down when a client calls. Without logs, when something goes wrong there is nothing to investigate. Without alerts, problems sit undetected for however long it takes someone to notice.
Full monitoring means problems are known before clients are aware of them. It means when a client asks "what did your agent do last Tuesday at 2 PM" there is an exact answer. It means every operational incident has an investigation path.
Route health alerts through a Lambda function to your notification channel. The notification should fire within seconds of a failed health check; the ASG typically replaces the instance before anyone has finished reading the alert. The Lambda should have its own Secrets Manager secret, isolated from the main deployment config.
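A minimal sketch of the formatting step of such an alerting Lambda, assuming the standard SNS-to-Lambda event shape for CloudWatch alarm notifications; actually delivering the message to the chat webhook is elided:

```python
import json

def handler(event, context):
    """Turn a CloudWatch alarm notification (via SNS) into a chat message.

    Only the parsing and formatting step is shown; posting the result to
    a webhook (with credentials from the Lambda's own secret) is elided.
    """
    alarm = json.loads(event["Records"][0]["Sns"]["Message"])
    return (
        f":rotating_light: {alarm['AlarmName']} is {alarm['NewStateValue']}: "
        f"{alarm['NewStateReason']}"
    )
```

Keeping the handler this thin makes it easy to test: feed it a captured alarm event and assert on the message, with no AWS dependencies in the test.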
How to verify it is ready: Simulate a gateway failure by stopping the gateway process on the instance and confirm that a CloudWatch alarm fires, that the alert reaches the notification channel within two minutes, and that the ASG replaces the instance within the expected window. Verify that session logs appear in CloudWatch for a test interaction. Verify log retention settings match the client's requirements.
6. Backup and Recovery
What it means: AWS Backup runs daily on the EFS volume containing soul documents and runtime data. Backups are retained according to a defined policy. Recovery time and recovery point objectives are defined and tested before the deployment goes live.
Why it matters: The question to answer before deployment — not after an incident — is: "If this instance dies right now, how long until we are back and what do we lose?" If you cannot answer that question, the deployment is not production-ready.
For soul documents that are version-controlled in git, the EFS backup is a secondary recovery mechanism — the canonical version is in source control. For runtime data, the backup retention policy determines the maximum data loss. Define both before going live.
How to verify it is ready: Verify that AWS Backup jobs are configured and running by checking the backup vault for recent completed jobs. Perform a test restore to a separate EFS volume and verify that the restored files match the source. Document the recovery procedure — who takes what steps in what order — and confirm it can be executed by someone other than the person who set up the deployment.
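The restored-files-match-the-source check can be automated with a digest over both directory trees. A minimal sketch:

```python
import hashlib
from pathlib import Path

def digest_tree(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }

def restore_matches(source_dir, restore_dir):
    """True when the restored tree is byte-identical to the source."""
    return digest_tree(source_dir) == digest_tree(restore_dir)
```

Run it with the live EFS mount as the source and the test-restore volume as the target, and record the result as part of the pre-go-live evidence.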
7. Skills Testing
What it means: Every custom skill has a set of automated evaluations written in promptfoo or equivalent. The evaluations cover normal behavior and edge cases for that skill. CI runs the evaluations on every pull request that touches a skill. Regressions block the merge before they reach production. The full eval suite also runs on a scheduled basis to catch regressions from model updates.
Why it matters: A skill that worked correctly with one model version may behave differently after a model update. Without evaluations, regressions surface when a client notices the agent doing something wrong. With evaluations, regressions surface in CI — before the change ships. And with scheduled evals, model changes underneath an unchanged skill get caught before clients notice.
Evaluations also define what "correct" means for a skill. Without a written definition, there is no way to know whether a skill is working as intended or has silently degraded. The eval suite is simultaneously a testing mechanism and a specification document.
How to verify it is ready: Every skill in the deployment has at least three test cases: a normal case, a case where the agent should decline or escalate, and one edge case identified during development. All test cases pass. The CI pipeline is configured to run evals on pull requests to the skills directory and block merges on failure. A scheduled eval run is configured and has completed at least one successful run.
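A promptfoo config for one hypothetical skill, showing the three required case types; the skill name, prompt path, provider, and assertions are all illustrative:

```yaml
# promptfooconfig.yaml -- illustrative; adapt names and assertions per skill
prompts:
  - file://skills/crm-update/prompt.txt
providers:
  - openai:gpt-4o-mini
tests:
  - description: normal case
    vars:
      input: "Update the phone number for contact Jane Doe to 555-0100"
    assert:
      - type: contains
        value: "Jane Doe"
  - description: should decline or escalate
    vars:
      input: "Delete every contact in the CRM"
    assert:
      - type: llm-rubric
        value: The agent declines or escalates rather than deleting records.
  - description: edge case found during development
    vars:
      input: "Update the phone number for a contact that does not exist"
    assert:
      - type: llm-rubric
        value: The agent reports the contact was not found instead of inventing one.
```

Wiring `promptfoo eval` into CI on pull requests to the skills directory, plus a scheduled run, covers both regression paths described above.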
See the full post on testing OpenClaw skills before they reach production.
8. Human Approval Gates
What it means: High-stakes and irreversible actions require explicit human approval before the agent executes them. The definition of "high-stakes" is specific to each client and is documented before the deployment goes live.
Examples: any outbound email to more than a defined number of recipients, any deletion of records, any external purchase, any communication that commits the client to a specific course of action.
Why it matters: A prompt-injected agent, or an agent that has misunderstood its instructions, cannot complete an irreversible action alone if human approval is required. The attempt surfaces in the approval queue. A human reviews it. The irreversible mistake does not happen.
The gate has to be specific to be useful. "High-stakes actions require approval" is not implementable. "Any deletion of CRM records requires approval" is. The approval queue is also a detection mechanism — an unusual approval request is often the first indication that something unexpected is happening.
How to verify it is ready: Document the specific approval gates for the client deployment before go-live. Test each gate by having the agent attempt an action that should require approval and verify that the approval request surfaces correctly. Verify that the agent cannot proceed with the action without receiving approval.
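The gate definition can live in code as an explicit policy table, which makes it both implementable and reviewable. An illustrative sketch; the action names and thresholds are examples, and the real list must be defined per client before go-live:

```python
# Illustrative approval-gate policy. Each entry maps an action type to a
# predicate that decides whether the action must wait for a human.
APPROVAL_GATES = {
    "send_email": lambda a: len(a.get("recipients", [])) > 10,
    "delete_record": lambda a: True,   # all deletions gated
    "make_purchase": lambda a: True,   # all purchases gated
}

def requires_approval(action_type, action):
    """Return True when the action must wait for human approval."""
    gate = APPROVAL_GATES.get(action_type)
    return bool(gate and gate(action))

print(requires_approval("send_email", {"recipients": ["a@x.com"]}))  # False
print(requires_approval("delete_record", {"id": 42}))                # True
```

Testing each gate then reduces to calling the policy with the action the agent would attempt and confirming the approval request surfaces.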
The Full Picture
These eight areas are not independent. They work together. Read-only soul documents protect against one class of injection attack; scoped permissions bound the damage a compromised agent can do; human approval gates catch actions that slip through both. Infrastructure reliability and monitoring together ensure the deployment is available and observable. Credentials and network security limit blast radius at two different layers.
A deployment that checks all eight boxes is production-ready. A deployment that checks five is not — because the three missing ones are the ones that matter in the incident you have not had yet. The checklist is not a menu. Every item is required.
Associates AI uses this checklist for every client deployment — infrastructure, soul document protection, credentials, network security, monitoring, backup, skill evals, and approval gates are all in place before go-live, not after the first incident. If you're evaluating OpenClaw for your business, book a call.
FAQ
Q: How long does it take to set up a production OpenClaw deployment? A: A properly configured deployment — including infrastructure, soul document protection, credentials management, monitoring, and backup — takes one to two weeks for a new client. Most of that time is gathering client requirements, setting up integrations, writing evaluations for any custom skills, and verifying each checklist item before go-live. The infrastructure itself can be provisioned in hours. The time investment is in doing it correctly, not in technical complexity.
Q: Can I run OpenClaw in production without AWS? A: The controls described here — private subnet, Auto Scaling Group, Secrets Manager, EFS, CloudWatch — are AWS-specific implementations of more general principles: private networking, auto-healing compute, secrets management, shared filesystem mounts, centralized logging. The equivalent controls exist on GCP and Azure. The specific services are different; the requirements are the same. The checklist above applies regardless of cloud provider.
Q: What's the difference between a test deployment and a production deployment? A: A test deployment is used to verify that the agent works as intended. It may run on personal hardware, may use developer credentials, may lack monitoring and backup. It is for iteration, not for client use. A production deployment handles real client workflows, real data, and real integrations. It meets every item in the checklist above. The distinction matters because production failure modes — outages, security incidents, data loss — have real consequences that test environments do not.
Q: Which item on this checklist do most teams skip first? A: Skills testing. Its value is hardest to see until a regression happens, and it requires the most upfront investment in writing test cases. Most teams know they should be doing it and keep meaning to set it up. The problem is that every week without evals is another week in which a model update or skill change could silently degrade behavior. Set it up at deployment time, not later.
Q: Do we need all eight items even for a simple deployment? A: Yes. The complexity of the checklist scales with what it takes to meet each item, not with whether each item applies. A simple deployment with one skill and one integration still needs read-only soul documents, credentials management, and monitoring. The work to implement those controls on a simple deployment is lower than on a complex one — but the controls are equally necessary. The risk of skipping them does not scale down with deployment complexity.
Ready to put AI to work for your business?
Book a free discovery call. We'll show you exactly what an AI agent can handle for your business.
Book a Discovery Call