Autopilot: Auto-Reply Safely at Scale

Written by Chad McGuire (Sparrow Intel)

Overview

Autopilot is the feature that lets Chirp AI send replies on its own, without an agent reviewing each one — within guardrails you control. For high-volume teams it's the difference between two people clearing 200 messages a day and two people clearing 2,000.

This lesson is the longest in the curriculum because Autopilot is the highest-leverage feature in Conversations and also the one with the biggest consequences if you set it up wrong. We're going to be careful.

What Autopilot actually does

When Autopilot is enabled, here's what happens on a new guest message:

Chirp AI generates a reply suggestion (same as it always does)
It assesses confidence, risk, and quality signals
It checks the assessment against your Autopilot configuration
If the assessment passes every check, the reply is sent automatically
If anything fails a check, the suggestion sits in the inbox waiting for human review

That last point is critical: Autopilot's default failure mode is "ask a human." It doesn't make best-effort sends; it sends only when it's confident enough by your standards.

What you control

Every Autopilot decision uses these settings:

Setting	What it does
Enable / Disable	Master switch
Property groups	Limit Autopilot to specific properties or groups (start narrow!)
Confidence threshold	Minimum confidence score (0–100) — drafts below this are not auto-sent
Maximum risk level	Highest risk level Autopilot is allowed to send (Low / Medium / High)
Require all quality signals	If on, all five quality signals must be green
Category filter mode	"Include selected" or "exclude selected" — pick which intent categories Autopilot is allowed to handle
Selected categories	Which categories are in scope per the filter mode above
Auto-close after reply	After Autopilot sends, mark the conversation closed
Auto-reply support threads	Allow Autopilot to send on Airbnb support thread conversations (off by default)

Autopilot is configured in Conversation Settings (alongside your message signature and conversation categories). Changes take effect on the next inbound post.
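To make it concrete how these settings combine, here is a minimal Python sketch of an Autopilot-style gate. The class names, field names, and the `should_auto_send` function are hypothetical and do not describe Chirp AI's internal implementation. The behaviors taken from this lesson are that a draft passes only when every check passes, that confidence at or above the threshold clears it, and that any failure leaves the draft for human review. Auto-close is left out because it acts after a send rather than deciding whether to send.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Risk(IntEnum):
    # Ordered so that a higher value means more risk.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class AutopilotConfig:
    # Hypothetical fields mirroring the settings table, not Chirp AI's real schema.
    enabled: bool = False
    property_groups: set[str] = field(default_factory=set)
    confidence_threshold: int = 90           # 0-100; drafts below this are held
    max_risk: Risk = Risk.LOW
    require_all_quality_signals: bool = True
    category_mode: str = "include"           # "include" or "exclude"
    categories: set[str] = field(default_factory=set)
    auto_reply_support_threads: bool = False


@dataclass
class Assessment:
    # Hypothetical shape of the assessment attached to one draft.
    property_group: str
    confidence: int
    risk: Risk
    quality_signals: list[bool]              # five signals, True means green
    category: str
    is_support_thread: bool = False


def should_auto_send(cfg: AutopilotConfig, a: Assessment) -> bool:
    """Return True only if every check passes; any failure means 'ask a human'."""
    if not cfg.enabled:
        return False
    if a.property_group not in cfg.property_groups:
        return False
    if a.confidence < cfg.confidence_threshold:
        return False
    if a.risk > cfg.max_risk:
        return False
    if cfg.require_all_quality_signals and not all(a.quality_signals):
        return False
    in_selected = a.category in cfg.categories
    if cfg.category_mode == "include" and not in_selected:
        return False
    if cfg.category_mode == "exclude" and in_selected:
        return False
    if a.is_support_thread and not cfg.auto_reply_support_threads:
        return False
    return True
```

Each check returns early, so a single failed check is enough to hold the draft. That mirrors the "ask a human" default rather than weighing signals against each other.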

Sensible starting configuration

The configuration we recommend for a team that's never run Autopilot before:

Enabled: yes
Property groups: one small group of well-understood properties — not your entire portfolio
Confidence threshold: 90
Maximum risk level: Low
Require all quality signals: on
Category filter mode: Include selected
Selected categories: Routine Inquiry, Compliment
Auto-close after reply: on
Auto-reply support threads: off

This is deliberately conservative. With these settings, Autopilot will only send on routine, low-risk, high-confidence messages — the safest 30–40% of an average inbox. Most teams expand from there over the first month.

Tuning Autopilot

After a couple of weeks running the conservative setup, you can start expanding. The dials, in roughly the order to consider relaxing them:

1. Add more property groups

If the first group went well, add the next. Watch for properties where Chirp AI suggestions consistently miss something — those signal that you need more knowledge snippets before adding the property to Autopilot.

2. Lower the confidence threshold

90 → 85 typically catches significantly more conversations without much added risk. We don't recommend going below 80 except in narrow, well-tested scopes.
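Before lowering the threshold, it can help to estimate how many more drafts would clear it. The sketch below assumes you have a list of confidence scores from recent drafts, however you collect them; the lesson doesn't describe an export, and the `sample_scores` values are made up for illustration. Because risk, category, and quality-signal checks still apply, the result is an upper bound on what Autopilot would actually send.

```python
def coverage_at(scores: list[int], threshold: int) -> float:
    """Fraction of drafts whose confidence is at or above the threshold."""
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores) / len(scores)


# Made-up scores for illustration only; substitute your own team's data.
sample_scores = [97, 93, 91, 90, 88, 86, 85, 84, 79, 72]

for t in (90, 85, 80):
    print(f"threshold {t}: {coverage_at(sample_scores, t):.0%} of drafts clear it")
```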

3. Add categories

Check-in Logistics is usually the next category to enable after Routine Inquiry and Compliment — it's high-volume, mostly factual, and well-served by AI. Booking Modification, Issue Report, and Complaint should stay off Autopilot for most teams.

4. Allow Medium risk

Only do this once you've spent at least a month on Low-only and reviewed a sample of what was held for human review. You'll likely find that many of those held drafts were appropriate Medium-risk responses you'd now be comfortable trusting.

5. Always-leave-off list

These should generally never be on Autopilot, regardless of how confident you get:

Emergency category
Refund or money-related messages
Chargeback language ("I'll take this to my credit card company," etc.)
Legal threats

Maintain a category exclusion or a keyword-based rule to keep these for humans.
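This lesson doesn't specify how Chirp AI's keyword rules are written, so the following is a tool-agnostic Python sketch of the logic only: hold a draft for a human if its category is excluded or its text matches any sensitive pattern. The rule names, word lists, and functions are hypothetical starting points, not a complete list and not the product's rule syntax.

```python
import re

# Illustrative patterns only; extend them with the phrasing your guests actually use.
HUMAN_ONLY_PATTERNS = {
    "refund_or_money": re.compile(r"\b(refund\w*|reimburs\w*|money back|compensat\w*)\b", re.I),
    "chargeback": re.compile(r"\b(chargeback\w*|dispute the charge|credit card company)\b", re.I),
    "legal": re.compile(r"\b(lawyer|attorney|lawsuit|sue|legal action)\b", re.I),
    "emergency": re.compile(r"\b(emergency|fire|flood|gas leak|injur\w*)\b", re.I),
}


def human_only_reasons(message: str) -> list[str]:
    """Return the names of every rule the message trips; an empty list means no match."""
    return [name for name, pattern in HUMAN_ONLY_PATTERNS.items() if pattern.search(message)]


def keep_for_human(message: str, category: str, excluded_categories: set[str]) -> bool:
    """Hold the draft if its category is excluded or any keyword rule matches."""
    return category in excluded_categories or bool(human_only_reasons(message))
```

Keyword rules like these will over-match (a guest mentioning a "fire pit" trips the emergency rule). That trade-off is usually acceptable here, because a false match only sends the draft to human review.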

Watching Autopilot in production

Autopilot's actions appear in the conversation thread the same way agent replies do — clearly attributable, fully auditable. You can:

Filter the inbox to show only Autopilot-sent threads
Read the full thread to confirm the reply was reasonable
See the confidence assessment that justified the send

We strongly recommend a daily or weekly review for the first month: have a supervisor scan a sample of Autopilot-handled conversations and confirm they would have sent the same thing. Adjust thresholds and snippets based on what you see.
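To keep that review representative, it can help to sample at random rather than reading whichever threads sit at the top of the filtered inbox. A minimal Python sketch, assuming you've gathered the IDs of Autopilot-sent conversations from the inbox filter; `conversation_ids` and `review_sample` are hypothetical, since the lesson doesn't describe an export or API.

```python
import random
from typing import Optional


def review_sample(conversation_ids: list[str], sample_size: int,
                  seed: Optional[int] = None) -> list[str]:
    """Pick up to sample_size Autopilot-handled conversations uniformly at random."""
    rng = random.Random(seed)
    k = min(sample_size, len(conversation_ids))
    return rng.sample(conversation_ids, k)
```

Passing a fixed `seed` reproduces the same sample, which is handy if two supervisors want to compare notes on the same set of threads.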

What to do when Autopilot makes a mistake

Send a correction yourself, so the guest receives an accurate follow-up from a human
Review the original Autopilot post and confidence assessment — figure out why the AI was confident enough to send
If the pattern is repeatable, adjust: raise the threshold, narrow the categories, or add a knowledge snippet that addresses the root cause

Track how often you need to correct Autopilot over time. If the correction rate climbs, tighten settings. If it stays low for weeks, you can carefully loosen. This loop is how teams arrive at confident, ambitious Autopilot configurations: not by setting them on day one, but by tuning until corrections are rare and the system behaves predictably.
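The tighten-or-loosen rule can be written as a small decision function. This is a hypothetical sketch: the lesson gives no numeric cut-offs, so the 1% "low" floor and the three-week window are placeholders to replace with your own team's tolerances, and both function names are illustrative.

```python
def correction_rate(autopilot_sends: int, corrections: int) -> float:
    """Share of Autopilot sends that needed a human correction."""
    if autopilot_sends == 0:
        return 0.0
    return corrections / autopilot_sends


def suggest_adjustment(weekly_rates: list[float], low_floor: float = 0.01,
                       stable_weeks: int = 3) -> str:
    """Map a history of weekly correction rates (oldest first) to an action."""
    # Climbing: the latest week is worse than the one before and not trivially low.
    if len(weekly_rates) >= 2 and weekly_rates[-1] > weekly_rates[-2] and weekly_rates[-1] > low_floor:
        return "tighten"
    # Quiet: every one of the last few weeks stayed at or below the floor.
    recent = weekly_rates[-stable_weeks:]
    if len(recent) == stable_weeks and all(r <= low_floor for r in recent):
        return "loosen carefully"
    return "hold"
```

For example, 3 corrections out of 200 sends gives a rate of 0.015. A single quiet week is not enough to loosen; the function waits for the whole window.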

Approval workflows (between fully manual and fully automatic)

If full Autopilot feels like too big a leap, there's a middle ground: Chirp AI generates the draft, but instead of sending automatically, the draft sits in an "Awaiting Approval" state on the conversation. Agents review and click Send (or Edit) — same as the standard suggestion flow, just with the queue surfaced more explicitly.
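The approval flow behaves like a small state machine with no path to "sent" that skips a human action. The sketch below illustrates that property; only "Awaiting Approval" comes from this lesson, while the `Editing` state, the action names, and `apply_action` are hypothetical.

```python
from enum import Enum


class DraftState(Enum):
    AWAITING_APPROVAL = "Awaiting Approval"
    EDITING = "Editing"          # hypothetical intermediate state
    SENT = "Sent"


# Allowed transitions: an agent either sends the draft as-is or edits it first.
TRANSITIONS = {
    (DraftState.AWAITING_APPROVAL, "send"): DraftState.SENT,
    (DraftState.AWAITING_APPROVAL, "edit"): DraftState.EDITING,
    (DraftState.EDITING, "send"): DraftState.SENT,
}


def apply_action(state: DraftState, action: str) -> DraftState:
    """Move a draft to its next state; reject anything the workflow doesn't allow."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} a draft in state {state.value!r}") from None
```

Every route into `SENT` starts from an agent's `send` action, which is the property that distinguishes this mode from full Autopilot.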

This is the right setup for teams that want AI to do all the writing but aren't ready to give up the human send.

A word on safety

Autopilot is powerful. A few principles to keep in mind as you configure it:

Default to off, expand deliberately. It's easier to grow trust than to recover from a bad batch of auto-sent replies.
Don't enable Autopilot for properties you don't yet know well. New property additions should sit out of Autopilot for at least a few weeks.
Treat sensitive topics as off-limits. Refunds, legal, emergencies — these belong in human hands regardless of how well the AI is doing.

Up next

Creating Tasks from Conversations — when a guest message reveals an operational issue, route it as a real task to your team or your ops platform.