Autopilot: Auto-Reply Safely at Scale
Written By Chad McGuire (Sparrow Intel)
Overview
Autopilot is the feature that lets Chirp AI send replies on its own, without an agent reviewing each one, but only within guardrails you control. For high-volume teams, it's the difference between two people clearing 200 messages a day and two people clearing 2,000.
This lesson is the longest in the curriculum because Autopilot is the highest-leverage feature in Conversations and also the one with the most consequence if you set it up wrong. We're going to be careful.
What Autopilot actually does
When Autopilot is enabled, here's what happens on a new guest message:
- Chirp AI generates a reply suggestion (same as it always does)
- It assesses confidence, risk, and quality signals
- It checks the assessment against your Autopilot configuration
- If the assessment passes every check, the reply is sent automatically
- If anything fails a check, the suggestion sits in the inbox waiting for human review
That last point is critical: Autopilot's default failure mode is "ask a human." It doesn't make best-effort sends; it sends only when it's confident enough by your standards.
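The decision flow above can be sketched as a single gate function. This is a minimal illustration, not Chirp AI's actual implementation: the `Assessment` and `AutopilotConfig` types, their field names, the Low/Medium/High risk scale, and the "confidence at or above threshold" rule are all assumptions made for the example. What it shows is the shape of the logic: every check must pass for a send, and any single failure falls back to human review.

```python
from dataclasses import dataclass
from enum import IntEnum


class Risk(IntEnum):
    # Hypothetical ordered risk scale (the article names Low and Medium).
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Assessment:
    # Hypothetical shape of Chirp AI's assessment of one reply suggestion.
    confidence: int                   # assumed 0-100 scale
    risk: Risk
    quality_signals: dict[str, bool]  # hypothetical named quality checks
    category: str
    property_group: str


@dataclass
class AutopilotConfig:
    # Hypothetical mirror of the Autopilot fields in Conversation Settings.
    enabled: bool
    property_groups: set[str]
    confidence_threshold: int
    max_risk: Risk
    require_all_quality_signals: bool
    allowed_categories: set[str]


def decide(a: Assessment, cfg: AutopilotConfig) -> str:
    """Return "send" only if every check passes; otherwise "hold" for a human."""
    checks = [
        cfg.enabled,
        a.property_group in cfg.property_groups,
        a.confidence >= cfg.confidence_threshold,
        a.risk <= cfg.max_risk,
        # Assumption: when the setting is off, quality signals are not enforced.
        (not cfg.require_all_quality_signals) or all(a.quality_signals.values()),
        a.category in cfg.allowed_categories,
    ]
    return "send" if all(checks) else "hold"
```

Because the checks are combined with `all()`, adding another guardrail to the list can only make the gate stricter, never looser, which matches the "ask a human" default.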
What you control
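Autopilot's behavior is controlled by these settings: Enabled, Property groups, Confidence threshold, Maximum risk level, Require all quality signals, Category filter mode, Selected categories, Auto-close after reply, and Auto-reply support threads.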
Autopilot is configured in Conversation Settings (alongside your message signature and conversation categories). Changes take effect on the next inbound message.
Sensible starting configuration
The configuration we recommend for a team that's never run Autopilot before:
- Enabled: yes
- Property groups: one small group of well-understood properties, not your entire portfolio
- Confidence threshold: 90
- Maximum risk level: Low
- Require all quality signals: on
- Category filter mode: Include selected
- Selected categories: Routine Inquiry, Compliment
- Auto-close after reply: on
- Auto-reply support threads: off
This is deliberately conservative. With these settings, Autopilot will only send on routine, low-risk, high-confidence messages: the safest 30–40% of an average inbox. Most teams expand from there over the first month.
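The starting configuration can be written down as a typed record, which makes it easy to diff against later, looser versions. This is a hypothetical sketch: the field names, the placeholder group name "Pilot Group", and the existence of an "Exclude selected" counterpart to "Include selected" are assumptions, not documented Chirp AI settings. The `category_in_scope` helper shows how the category filter mode is assumed to work.

```python
from dataclasses import dataclass, field
from enum import Enum


class FilterMode(Enum):
    INCLUDE_SELECTED = "include"  # only the selected categories may auto-send
    EXCLUDE_SELECTED = "exclude"  # assumed mode: everything except the selected ones


@dataclass
class StartingConfig:
    # Defaults mirror the recommended starting configuration above.
    enabled: bool = True
    property_groups: set[str] = field(default_factory=lambda: {"Pilot Group"})  # hypothetical name
    confidence_threshold: int = 90
    max_risk: str = "Low"
    require_all_quality_signals: bool = True
    category_filter_mode: FilterMode = FilterMode.INCLUDE_SELECTED
    selected_categories: set[str] = field(
        default_factory=lambda: {"Routine Inquiry", "Compliment"}
    )
    auto_close_after_reply: bool = True
    auto_reply_support_threads: bool = False


def category_in_scope(cfg: StartingConfig, category: str) -> bool:
    """Apply the category filter mode to a single message category."""
    if cfg.category_filter_mode is FilterMode.INCLUDE_SELECTED:
        return category in cfg.selected_categories
    return category not in cfg.selected_categories
```

With the default `StartingConfig()`, a Compliment is in scope and a Complaint is not. Include mode is the safer choice for a first rollout because any category you haven't explicitly selected stays with humans.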
Tuning Autopilot
After a couple of weeks running the conservative setup, you can start expanding. The dials, in roughly the order to consider relaxing them:
1. Add more property groups
If the first group went well, add the next. Watch for properties where Chirp AI suggestions consistently miss something; those signal that you need more knowledge snippets before adding the property to Autopilot.
2. Lower the confidence threshold
Lowering the threshold from 90 to 85 typically catches significantly more conversations without much added risk. We don't recommend going below 80 except in narrow, well-tested scopes.
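Before lowering the threshold, you can estimate how much extra volume it would let through by replaying recent confidence scores against the old and new values. This is a hypothetical sketch: it assumes you can export per-suggestion confidence scores on a 0-100 scale, and the 5-point step and the floor of 80 are taken from the guidance above rather than from any product setting.

```python
RECOMMENDED_FLOOR = 80  # guidance: avoid going below 80 outside narrow, well-tested scopes


def next_threshold(current: int, step: int = 5) -> int:
    """Suggest the next, lower threshold, stopping at the recommended floor."""
    if current <= RECOMMENDED_FLOOR:
        # Already at or below the floor: don't suggest going any lower.
        return current
    return max(current - step, RECOMMENDED_FLOOR)


def coverage_at(confidences: list[int], threshold: int) -> float:
    """Fraction of past suggestions whose confidence meets or exceeds the threshold."""
    if not confidences:
        return 0.0
    return sum(c >= threshold for c in confidences) / len(confidences)
```

Comparing `coverage_at(scores, 90)` with `coverage_at(scores, 85)` on a recent week's scores shows the volume gain before you change anything; the held-for-review sample then tells you whether that extra volume is safe.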
3. Add categories
Check-in Logistics is usually the next category to enable after Routine Inquiry and Compliment: it's high-volume, mostly factual, and well-served by AI. Booking Modification, Issue Report, and Complaint should stay off Autopilot for most teams.
4. Allow Medium risk
Only do this once you've spent at least a month on Low-only and reviewed a sample of what was held for human review. Many of those will be appropriate Medium-risk responses you're now comfortable trusting.
5. Always-leave-off list
These should generally never be on Autopilot, regardless of how confident you get:
- Emergency category
- Refund or other money-related requests
- Chargeback language ("I'll take this to my credit card company," etc.)
- Legal threats
Maintain a category exclusion or a keyword-based rule to keep these for humans.
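Chirp AI's own rule syntax isn't covered in this lesson, so the following is only a hypothetical Python sketch of the keyword idea. The category name and phrase patterns are illustrative assumptions you would tune to your own inbox; matching is case-insensitive and uses word boundaries so that, for example, "issue" doesn't trigger the "sue" pattern.

```python
import re

# Hypothetical always-human rules; adjust names and phrases to your inbox.
ALWAYS_HUMAN_CATEGORIES = {"Emergency"}
ALWAYS_HUMAN_PATTERNS = [
    r"\brefund",                                       # refund, refunds, refunded
    r"\bmoney back\b",
    r"\bcharge ?back\b",                               # chargeback / charge back
    r"\bcredit card company\b",
    r"\bdispute\b",
    r"\blawyer\b|\battorney\b|\blegal action\b|\bsue\b",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in ALWAYS_HUMAN_PATTERNS]


def must_stay_human(category: str, message: str) -> bool:
    """True if this message should never be auto-sent, however confident the AI is."""
    if category in ALWAYS_HUMAN_CATEGORIES:
        return True
    return any(p.search(message) for p in _COMPILED)
```

Keyword rules are blunt and will produce false positives, but every false positive lands with a human, which is the safe direction to err.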
Watching Autopilot in production
Autopilot's actions appear in the conversation thread the same way agent replies do: clearly attributable and fully auditable. You can:
- Filter the inbox to show only Autopilot-sent threads
- Read the full thread to confirm the reply was reasonable
- See the confidence assessment that justified the send
We strongly recommend a daily or weekly review for the first month: have a supervisor scan a sample of Autopilot-handled conversations and confirm they would have sent the same thing. Adjust thresholds and snippets based on what you see.
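If you export threads for the supervisor review, drawing a random sample keeps the review honest: it avoids reading only the borderline or most memorable conversations. This is a hypothetical sketch; the `Thread` record and its fields are assumptions about what an export might contain, not a Chirp AI data format.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thread:
    # Hypothetical export of one conversation thread.
    thread_id: str
    sent_by_autopilot: bool
    confidence: int


def review_sample(threads: list[Thread], size: int, seed: Optional[int] = None) -> list[Thread]:
    """Pick a random sample of Autopilot-sent threads for a supervisor to read."""
    autopilot_threads = [t for t in threads if t.sent_by_autopilot]
    rng = random.Random(seed)  # pass a seed to make a review batch reproducible
    return rng.sample(autopilot_threads, min(size, len(autopilot_threads)))
```

The sample size is capped at the number of Autopilot-sent threads, so a quiet day simply yields a smaller batch instead of an error.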
What to do when Autopilot makes a mistake
1. Send a correction. Reply in the thread so the guest gets a corrected message from a human.
2. Review the original Autopilot post and its confidence assessment. Figure out why the AI was confident enough to send.
3. If the pattern is repeatable, adjust: raise the threshold, narrow the categories, or add a knowledge snippet that addresses the root cause.
Track how often you need to correct Autopilot over time. If the correction rate climbs, tighten settings. If it stays low for weeks, you can carefully loosen. This loop is how teams arrive at confident, ambitious Autopilot configurations: not by setting them on day one, but by tuning until the system feels reliably calm.
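The tighten/loosen loop above can be made concrete with a weekly correction rate. This is a hypothetical sketch: the definition of "correction rate" (corrections divided by Autopilot sends) and the cut-offs for "climbing" and "low" are illustrative assumptions, not recommended product values.

```python
def correction_rate(autopilot_sends: int, corrections: int) -> float:
    """Share of Autopilot-sent replies that needed a human correction."""
    if autopilot_sends <= 0:
        return 0.0
    return corrections / autopilot_sends


def recommend(
    weekly_rates: list[float],
    low: float = 0.02,        # assumed "low" rate
    tolerance: float = 0.005,  # ignore week-over-week noise smaller than this
    stable_weeks: int = 3,     # weeks the rate must stay low before loosening
) -> str:
    """Suggest "tighten", "loosen", or "hold" from weekly rates, oldest first."""
    if len(weekly_rates) >= 2 and weekly_rates[-1] - weekly_rates[-2] > tolerance:
        return "tighten"  # the rate is climbing
    recent = weekly_rates[-stable_weeks:]
    if len(recent) == stable_weeks and all(r <= low for r in recent):
        return "loosen"   # low for several weeks: carefully expand
    return "hold"
```

Checking the climb before the "low for weeks" condition means a sudden jump always wins, even if the rate is still small in absolute terms.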
Approval workflows (between fully manual and fully automatic)
If full Autopilot feels like too big a leap, there's a middle ground: Chirp AI generates the draft, but instead of sending automatically, the draft sits in an "Awaiting Approval" state on the conversation. Agents review and click Send (or Edit), the same as the standard suggestion flow, just with the queue surfaced more explicitly.
This is the right setup for teams that want AI to do all the writing but aren't ready to give up the human send.
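The three ways a draft can be handled (manual suggestion, approval queue, Autopilot) can be summarized as a tiny state machine. This is a hypothetical sketch: the mode names, state names, and action names are illustrative labels, not Chirp AI identifiers, and only "Awaiting Approval", Send, and Edit come from the description above.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical names for the three ways a draft can be handled.
    MANUAL = "manual"        # suggestion only; an agent reviews and sends
    APPROVAL = "approval"    # draft parked in "Awaiting Approval"
    AUTOPILOT = "autopilot"  # auto-send when every Autopilot check passes


# Hypothetical agent actions on an approval-queue draft.
TRANSITIONS = {
    ("awaiting_approval", "send"): "sent",
    ("awaiting_approval", "edit"): "editing",
    ("editing", "send"): "sent",
}


def route_draft(mode: Mode, passes_all_checks: bool) -> str:
    """Where a freshly generated draft lands."""
    if mode is Mode.AUTOPILOT and passes_all_checks:
        return "sent"
    if mode is Mode.APPROVAL:
        return "awaiting_approval"
    # Manual mode, or an Autopilot draft that failed a check: wait for a human.
    return "suggestion"


def apply_action(state: str, action: str) -> str:
    """Advance an approval-queue draft; unknown actions leave it unchanged."""
    return TRANSITIONS.get((state, action), state)
```

The key property is that only one path reaches "sent" without an agent action: Autopilot mode with every check passing.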
A word on safety
Autopilot is powerful. A few principles to keep in mind as you configure it:
- Default to off, expand deliberately. It's easier to grow trust than to recover from a bad batch of auto-sent replies.
- Don't enable Autopilot for properties you don't yet know well. New property additions should sit out of Autopilot for at least a few weeks.
- Treat sensitive topics as off-limits. Refunds, legal issues, and emergencies belong in human hands regardless of how well the AI is doing.
Up next
Creating Tasks from Conversations: when a guest message reveals an operational issue, route it as a real task to your team or your ops platform.