Autopilot: Auto-Reply Safely at Scale

Written By Chad McGuire (Sparrow Intel)

Overview

Autopilot is the feature that lets Chirp AI send replies on its own, without an agent reviewing each one, within guardrails you control. For high-volume teams it's the difference between two people clearing 200 messages a day and two people clearing 2,000.

This lesson is the longest in the curriculum because Autopilot is the highest-leverage feature in Conversations and also the one with the most consequence if you set it up wrong. We're going to be careful.

What Autopilot actually does

When Autopilot is enabled, here's what happens on a new guest message:

  1. Chirp AI generates a reply suggestion (same as it always does)
  2. It assesses confidence, risk, and quality signals
  3. It checks the assessment against your Autopilot configuration
  4. If the assessment passes every check, the reply is sent automatically
  5. If anything fails a check, the suggestion sits in the inbox waiting for human review

That last point is critical: Autopilot's default failure mode is "ask a human." It doesn't make best-effort sends; it sends only when it's confident enough by your standards.
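The steps above can be sketched as a single all-or-nothing check. This is a minimal illustration, not Chirp AI's actual implementation: the names `Assessment`, `Config`, `Risk`, and `decide` are hypothetical, and the fields simply mirror the signals and settings described in this lesson.

```python
from dataclasses import dataclass
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Assessment:
    """Hypothetical shape of what Chirp AI assesses for one draft."""
    confidence: int              # 0-100
    risk: Risk
    quality_signals_green: int   # how many of the five quality signals are green
    category: str


@dataclass
class Config:
    """Hypothetical subset of the Autopilot settings."""
    enabled: bool = True
    confidence_threshold: int = 90
    max_risk: Risk = Risk.LOW
    require_all_quality_signals: bool = True
    included_categories: frozenset = frozenset({"Routine Inquiry", "Compliment"})


def decide(a: Assessment, cfg: Config) -> str:
    """Return "send" only if every check passes; otherwise "hold" for a human."""
    checks = [
        cfg.enabled,
        a.confidence >= cfg.confidence_threshold,
        a.risk <= cfg.max_risk,
        a.quality_signals_green == 5 or not cfg.require_all_quality_signals,
        a.category in cfg.included_categories,
    ]
    return "send" if all(checks) else "hold"
```

Note that there is no "mostly passes" branch: a single failed check is enough to hold the draft, which matches the "ask a human" default described above.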

What you control

Every Autopilot decision uses these settings:

Setting | What it does
Enable / Disable | Master switch
Property groups | Limit Autopilot to specific properties or groups (start narrow!)
Confidence threshold | Minimum confidence score (0–100); drafts below this are not auto-sent
Maximum risk level | Highest risk level Autopilot is allowed to send (Low / Medium / High)
Require all quality signals | If on, all five quality signals must be green
Category filter mode | "Include selected" or "exclude selected": pick which intent categories Autopilot is allowed to handle
Selected categories | Which categories are in scope per the filter mode above
Auto-close after reply | After Autopilot sends, mark the conversation closed
Auto-reply support threads | Allow Autopilot to send on Airbnb support thread conversations (off by default)

Autopilot is configured in Conversation Settings (alongside your message signature and conversation categories). Changes take effect on the next inbound message.

Sensible starting configuration

The configuration we recommend for a team that's never run Autopilot before:

  • Enabled: yes
  • Property groups: one small group of well-understood properties, not your entire portfolio
  • Confidence threshold: 90
  • Maximum risk level: Low
  • Require all quality signals: on
  • Category filter mode: Include selected
  • Selected categories: Routine Inquiry, Compliment
  • Auto-close after reply: on
  • Auto-reply support threads: off

This is deliberately conservative. With these settings, Autopilot will only send on routine, low-risk, high-confidence messages: the safest 30–40% of an average inbox. Most teams expand from there over the first month.
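For teams that like to keep a written record of their settings, the starting configuration can be captured as plain data. The sketch below is only an illustration: the key names and the `pilot-group` group name are assumptions, and the real settings live in Conversation Settings rather than in code. The helper shows how the two category filter modes differ.

```python
# Hypothetical record of the recommended starting configuration.
# Key names are illustrative, not an actual Chirp AI schema.
STARTING_CONFIG = {
    "enabled": True,
    "property_groups": ["pilot-group"],      # hypothetical group name
    "confidence_threshold": 90,
    "max_risk_level": "Low",
    "require_all_quality_signals": True,
    "category_filter_mode": "include",       # "include" or "exclude"
    "selected_categories": ["Routine Inquiry", "Compliment"],
    "auto_close_after_reply": True,
    "auto_reply_support_threads": False,
}


def category_in_scope(category: str, config: dict) -> bool:
    """Apply the filter mode: include-only, or everything except the selection."""
    selected = category in config["selected_categories"]
    if config["category_filter_mode"] == "include":
        return selected
    return not selected
```

With "include" mode, any category you have not explicitly selected stays with humans, which is why we suggest it as the starting mode.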

Tuning Autopilot

After a couple of weeks running the conservative setup, you can start expanding. The dials, in roughly the order to consider relaxing them:

1. Add more property groups

If the first group went well, add the next. Watch for properties where Chirp AI suggestions consistently miss something; those signal that you need more knowledge snippets before adding the property to Autopilot.

2. Lower the confidence threshold

90 → 85 typically catches significantly more conversations without much added risk. We don't recommend going below 80 except in narrow, well-tested scopes.
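Before lowering the threshold, it can help to estimate how many more drafts would clear it. The sketch below assumes you can collect confidence scores for recent drafts into a list; the function name is hypothetical and the scores are made up purely for illustration, so the resulting shares say nothing about a real inbox.

```python
def share_passing(scores, threshold):
    """Fraction of drafts whose confidence meets or exceeds the threshold."""
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores) / len(scores)


# Made-up confidence scores, for illustration only (not real data).
sample = [97, 93, 91, 88, 87, 86, 84, 79, 72, 60]

for threshold in (90, 85, 80):
    print(threshold, share_passing(sample, threshold))
```

Running the same comparison on your own held drafts shows what a lower threshold would actually release, so you can read those specific conversations before changing the setting.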

3. Add categories

Check-in Logistics is usually the next category to enable after Routine Inquiry and Compliment: it's high-volume, mostly factual, and well-served by AI. Booking Modification, Issue Report, and Complaint should stay off Autopilot for most teams.

4. Allow Medium risk

Only do this once you've spent at least a month on Low-only and reviewed a sample of what was held for human review. Many of those will be appropriate Medium-risk responses you're now comfortable trusting.

5. Always-leave-off list

These should generally never be on Autopilot, regardless of how confident you get:

  • Emergency category
  • Refund or money-related text
  • Chargeback language
  • Legal threats ("I'll take this to my credit card company," etc.)

Maintain a category exclusion or a keyword-based rule to keep these for humans.
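A keyword-based rule of this kind could look roughly like the sketch below. It is an assumption about how such a rule might be written, not a description of a built-in Chirp AI feature: the pattern list and the function name are hypothetical, and a real list should come from your own review of held conversations.

```python
import re

# Illustrative patterns only; extend these from your own held conversations.
SENSITIVE_PATTERNS = [
    r"\brefund",
    r"\bchargeback",
    r"\bcredit card company\b",
    r"\blawyer\b",
    r"\battorney\b",
    r"\bsue\b",
    r"\bemergency\b",
]
_SENSITIVE_RE = re.compile("|".join(SENSITIVE_PATTERNS), re.IGNORECASE)


def must_stay_with_human(message: str, category: str) -> bool:
    """True if the message should never be auto-sent, whatever its confidence."""
    if category == "Emergency":
        return True
    return _SENSITIVE_RE.search(message) is not None
```

Keyword matching is blunt and will hold some harmless messages ("Do you refund the cleaning deposit automatically?"). For this list that is the right trade: a false hold costs a few minutes, while a false send can cost much more.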

Watching Autopilot in production

Autopilot's actions appear in the conversation thread the same way agent replies do: clearly attributable, fully auditable. You can:

  • Filter the inbox to show only Autopilot-sent threads
  • Read the full thread to confirm the reply was reasonable
  • See the confidence assessment that justified the send

We strongly recommend a daily or weekly review for the first month: have a supervisor scan a sample of Autopilot-handled conversations and confirm they would have sent the same thing. Adjust thresholds and snippets based on what you see.
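If the supervisor's sample is picked by hand, it tends to skew toward whatever is at the top of the inbox. A random sample avoids that. The sketch below assumes you have a list of IDs for Autopilot-sent conversations (for example, copied from the filtered inbox view); the function name and ID format are hypothetical.

```python
import random


def pick_review_sample(conversation_ids, sample_size, seed=None):
    """Return up to sample_size distinct conversation IDs chosen at random."""
    rng = random.Random(seed)   # pass a seed to make the pick reproducible
    ids = list(conversation_ids)
    k = min(sample_size, len(ids))
    return rng.sample(ids, k)


# Hypothetical IDs for one day's Autopilot-sent conversations.
todays_ids = [f"conv-{n}" for n in range(1, 41)]
to_review = pick_review_sample(todays_ids, 10, seed=7)
```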

What to do when Autopilot makes a mistake

  1. Reply with a correction message, so the guest gets the right information from a human

  2. Review the original Autopilot reply and its confidence assessment to figure out why the AI was confident enough to send

  3. If the pattern is repeatable, adjust: raise the threshold, narrow the categories, or add a knowledge snippet that addresses the root cause

Track how often you need to correct Autopilot over time. If the correction rate climbs, tighten settings. If it stays low for weeks, you can carefully loosen. This loop is how teams arrive at confident, ambitious Autopilot configurations: not by setting them on day one, but by tuning until the system feels reliably calm.
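One simple way to run this loop is to log, per week, how many replies Autopilot sent and how many needed a human correction. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the cutoffs (a 1% "low" rate, three calm weeks, two consecutive increases counting as "climbing") are placeholders to replace with your own standards.

```python
def correction_rate(sent: int, corrected: int) -> float:
    """Corrections as a fraction of Autopilot sends (0.0 if nothing was sent)."""
    return corrected / sent if sent else 0.0


def recommend(weekly, low=0.01, calm_weeks=3):
    """Suggest "tighten", "loosen", or "hold".

    weekly: list of (sent, corrected) tuples, oldest week first.
    low and calm_weeks are assumed cutoffs, not product defaults.
    """
    rates = [correction_rate(sent, corrected) for sent, corrected in weekly]
    # Climbing: the rate rose in each of the last two week-over-week steps.
    if len(rates) >= 3 and rates[-3] < rates[-2] < rates[-1]:
        return "tighten"
    # Calm: the rate stayed at or below the low cutoff for enough weeks.
    if len(rates) >= calm_weeks and all(r <= low for r in rates[-calm_weeks:]):
        return "loosen"
    return "hold"
```

The climbing check runs first on purpose: a rate that is still low but rising every week is a reason to pause, not to loosen.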

Approval workflows (between fully manual and fully automatic)

If full Autopilot feels like too big a leap, there's a middle ground: Chirp AI generates the draft, but instead of sending automatically, the draft sits in an "Awaiting Approval" state on the conversation. Agents review and click Send (or Edit), same as the standard suggestion flow, just with the queue surfaced more explicitly.

This is the right setup for teams that want AI to do all the writing but aren't ready to give up the human send.

A word on safety

Autopilot is powerful. A few principles to keep in mind as you configure it:

  • Default to off, expand deliberately. Easier to grow trust than to recover from a bad batch of auto-sent replies.
  • Don't enable Autopilot for properties you don't yet know well. New property additions should sit out of Autopilot for at least a few weeks.
  • Treat sensitive topics as off-limits. Refunds, legal, emergencies: these belong in human hands regardless of how well the AI is doing.

Up next

Creating Tasks from Conversations: when a guest message reveals an operational issue, route it as a real task to your team or your ops platform.