Auto slave promotion without any DBA intervention using Orchestrator and ProxySQL. If you are looking for an HA solution without going for synchronous replication or AWS RDS, then Orchestrator with ProxySQL is a great choice.
sukan June 09, 2026
Most MySQL deployments rely on standard asynchronous replication — one primary, one or more replicas — with no automated path for handling a primary failure. When that primary goes down, someone gets paged, SSHes in, and manually promotes a replica. At 2 AM. Mafiree's MySQL team has seen this pattern across dozens of production environments, and in nearly every case the RTO was measured in tens of minutes rather than seconds.
There's a better approach that doesn't require abandoning asynchronous replication or absorbing the write latency overhead of Galera or InnoDB Cluster. Orchestrator — an open-source MySQL topology manager — paired with ProxySQL as an intelligent load balancer delivers automatic primary promotion, replica re-pointing, and transparent traffic rerouting, all without a single line of application-level change.
This guide covers the full architecture, how the three Orchestrator phases work, ProxySQL integration patterns, and the critical configuration details that determine whether failover takes 20 seconds or 3 minutes.
The instinct when building a highly available MySQL tier is to reach for a synchronous cluster. Galera Cluster and MySQL InnoDB Cluster provide multi-master or group replication with automatic failover baked in. For some workloads, that's the right call. But synchronous replication carries real costs that make it unsuitable for many production environments.
Orchestrator + ProxySQL is designed for exactly those environments: asynchronous replication topologies that need automated, reliable failover without sacrificing write performance.
A typical Mafiree-deployed HA stack using this pattern looks like this:
             Application Tier
                     |
              ┌─────────────┐
              │  ProxySQL   │  (x2, HA pair) - Listens on port 6033
              │ (Read/Write │  - Hostgroup 10: writer (primary only)
              │  Routing)   │  - Hostgroup 20: readers (replicas)
              └──────┬──────┘
                     |
              ┌──────┴──────────────────────┐
              |                             |
              ▼                             ▼
    MySQL Primary (3306)      Replica 1 / Replica 2 (3306)
              |
              ├── Replica 1
              └── Replica 2

Orchestrator Tier (x3, Raft consensus)
┌────────────────────────────┐
│  Orch-1   Orch-2   Orch-3  │  - Raft leader elected automatically
│           (Raft)           │  - Each polls all MySQL nodes
└────────────────────────────┘
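Three components carry all the weight:
- ProxySQL (two instances as an HA pair): listens on port 6033 and routes writes to hostgroup 10 and reads to hostgroup 20.
- MySQL with asynchronous replication: one primary and its replicas, all on port 3306.
- Orchestrator (three nodes with Raft consensus): polls every MySQL node and drives failover.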
Orchestrator's operation breaks into three distinct phases. Understanding each is essential to configuring it correctly and diagnosing failover behaviour in production.
Orchestrator continuously polls the MySQL topology, starting from seed nodes you define in its configuration. For each node it discovers, it reads SHOW SLAVE STATUS, SHOW MASTER STATUS, and a handful of performance-related queries to understand the complete topology graph.
Discovery captures replication positions (GTID executed sets, binary log file/position), read-only status, heartbeat lag, and the full upstream/downstream relationships between every node. This topology map is stored in Orchestrator's backend database (MySQL or SQLite) and updated on a configurable interval (default: 5 seconds).
The InstancePollSeconds setting controls how frequently Orchestrator polls each instance. Lower values detect failures faster but increase MySQL-side query load. Mafiree's standard configuration is 5 seconds for production and 10 seconds for non-critical environments.
Refactoring is Orchestrator's ability to restructure the replication topology without a failure event. This includes:
- Moving a replica beneath a different instance (relocate)
- Matching a replica below another instance (match)

During recovery, refactoring is what Orchestrator uses to re-attach surviving replicas under the newly promoted primary. With GTID enabled, this is straightforward: Orchestrator simply issues CHANGE MASTER TO with the new primary's address and lets GTID auto-positioning handle the rest. Without GTID, Orchestrator performs binary log file/position calculations to find the right resume point, which works but is more fragile under complex topologies.
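With GTID, the re-pointing step comes down to a handful of statements on each surviving replica. The sketch below is an illustration only, not Orchestrator's actual implementation: build_repoint_sql is a hypothetical helper that prints the statements without executing anything, so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Orchestrator): prints the statements that a
# GTID-based re-point boils down to. Nothing is executed against MySQL.
# Args: $1 = new primary host, $2 = new primary port (defaults to 3306)
build_repoint_sql() {
  local new_host="$1" new_port="${2:-3306}"
  printf 'STOP SLAVE;\n'
  printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%s, MASTER_AUTO_POSITION=1;\n" \
    "$new_host" "$new_port"
  printf 'START SLAVE;\n'
}

# Example: statements for re-pointing a replica at mysql-replica-1
build_repoint_sql mysql-replica-1 3306
```

MASTER_AUTO_POSITION=1 is what lets the replica work out its own resume point from its executed GTID set, which is why no binary log file or position appears in the statement.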
Recovery triggers when Orchestrator detects a primary failure — specifically when the primary becomes unreachable to Orchestrator and all replicas simultaneously report that replication is broken. The recovery sequence is:
1. Orchestrator evaluates failure conditions every RecoveryPollSeconds (default: 1s) and re-checks from multiple Raft members to eliminate false positives from transient network issues.
2. Orchestrator runs the OnFailureDetectionProcesses hook. This is where ProxySQL integration fires: the hook can call a script that updates ProxySQL's hostgroups to temporarily stop routing writes, preventing split-brain during the transition.
3. The selected candidate replica has read_only=OFF applied and is promoted to primary. Orchestrator disables read-only, sets super_read_only=OFF if applicable, and updates its internal topology map.
4. The surviving replicas receive CHANGE MASTER TO pointing at the new primary. GTID auto-position handles the catch-up automatically.
5. PostMasterFailoverProcesses fires. A ProxySQL hook here updates the writer hostgroup to point at the new primary, restoring write traffic routing.

[...] mysql_servers table. Always verify that your post-failover hooks are tested end-to-end before relying on them in production.
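The trigger condition for recovery (primary unreachable and every replica reporting broken replication) can be sketched as a small predicate. This is a simplified model of the rule as described here, not Orchestrator's code; the function name and the "ok"/"broken" labels are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the detection rule, NOT Orchestrator's implementation:
# treat the primary as dead only if Orchestrator cannot reach it AND every
# replica reports broken replication.
# Args: $1   = "reachable" | "unreachable" (Orchestrator's view of the primary)
#       $2.. = one word per replica: "ok" or "broken" (hypothetical labels)
primary_is_dead() {
  local primary_state="$1"
  shift
  [ "$primary_state" = "unreachable" ] || return 1
  [ "$#" -gt 0 ] || return 1   # no replica evidence at all: do not fail over
  local state
  for state in "$@"; do
    [ "$state" = "broken" ] || return 1
  done
  return 0
}

if primary_is_dead unreachable broken broken; then
  echo "dead primary: start recovery"
fi
if ! primary_is_dead unreachable broken ok; then
  echo "a replica still sees the primary: no failover"
fi
```

Requiring agreement from the replicas is what keeps a network problem between Orchestrator and the primary from being mistaken for a primary failure.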
ProxySQL's role in this architecture is to sit between the application and MySQL, routing connections to the correct backend without the application needing to know anything about the current topology.
The standard pattern uses two hostgroups:
| Hostgroup | Purpose | Members |
|---|---|---|
| HG 10 | Writer group: read/write connections | Primary only (weight=1000) |
| HG 20 | Reader group: read-only connections | All replicas, optionally primary |
Query rules map traffic based on patterns. A typical minimal ruleset:
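-- Locking reads go to the writer (HG 10); all other SELECTs go to the readers (HG 20)
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES
  (10, 1, '^SELECT.*FOR UPDATE', 10, 1),
  (20, 1, '^SELECT', 20, 1);
-- Default: everything else (writes, transactions) goes to the writer HG.
-- default_hostgroup is a per-user column in mysql_users, not a global variable.
-- app_user is the application account used later in this guide.
UPDATE mysql_users SET default_hostgroup=10 WHERE username='app_user';
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;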
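Rule order matters in the ruleset above: the FOR UPDATE rule has the lower rule_id, so it is evaluated before the generic SELECT rule. The following sketch simulates that ordering outside ProxySQL. It is an approximation for illustration only: route_query is a hypothetical function, and real ProxySQL matches match_digest against the query digest rather than the raw text.

```shell
#!/usr/bin/env bash
# Simulation of the two query rules plus the default hostgroup.
# Hypothetical helper for reasoning about rule order; ProxySQL does the real
# matching. Prints the hostgroup id a query would be sent to.
route_query() {
  local q
  # ProxySQL rules are case-insensitive by default, so normalise to upper case
  q="$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
  case "$q" in
    SELECT*"FOR UPDATE"*) echo 10 ;;  # rule 10: locking read goes to the writer
    SELECT*)              echo 20 ;;  # rule 20: plain read goes to the readers
    *)                    echo 10 ;;  # no rule matched: default hostgroup
  esac
}

route_query "SELECT * FROM orders WHERE id = 7 FOR UPDATE"
route_query "select * from orders"
route_query "UPDATE orders SET status = 'paid' WHERE id = 7"
```

If the two rules were swapped, the generic SELECT rule would match first and locking reads would land on a replica, which is the mistake this ordering avoids.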
ProxySQL runs internal health checks against every server in its mysql_servers table. If a backend fails mysql-monitor_connect_timeout or mysql-monitor_ping_timeout thresholds, it marks that server SHUNNED or OFFLINE_SOFT.
The native ProxySQL monitor uses mysql-monitor_read_only_interval (default: 1500ms) to poll @@read_only on every backend. When Orchestrator promotes a replica and flips its read_only=OFF, ProxySQL's monitor will detect this within 1–2 poll cycles and automatically move that server into the writer hostgroup — no hook required — if you've configured mysql_replication_hostgroups.
-- Tell ProxySQL which HGs are writer/reader pairs
INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, comment)
VALUES (10, 20, 'mysql-ha');
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
With mysql_replication_hostgroups configured, ProxySQL moves servers between HG 10 and HG 20 based on read_only state automatically. This means Orchestrator's promotion action — which sets read_only=OFF on the new primary — directly triggers ProxySQL to route writes there without needing a separate hook call.
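To confirm that ProxySQL really did move the promoted server, you can inspect its runtime_mysql_servers table. The sketch below assumes the table has been dumped as tab-separated rows (hostgroup_id, hostname, port, status); online_writers is a hypothetical helper, and the sample rows are made up for the demo.

```shell
#!/usr/bin/env bash
# Reads tab-separated rows (hostgroup_id, hostname, port, status) on stdin and
# prints host:port for every ONLINE server in the writer hostgroup.
# Arg: $1 = writer hostgroup id (defaults to 10, as used in this article)
online_writers() {
  local writer_hg="${1:-10}"
  awk -F'\t' -v hg="$writer_hg" '$1 == hg && $4 == "ONLINE" { print $2 ":" $3 }'
}

# Sample rows standing in for real ProxySQL output (hypothetical data)
printf '10\tmysql-replica-1\t3306\tONLINE\n20\tmysql-replica-2\t3306\tONLINE\n' \
  | online_writers
```

Against a live ProxySQL, the input could come from something like `mysql -h 127.0.0.1 -P 6032 -u admin -p -N -B -e "SELECT hostgroup_id, hostname, port, status FROM runtime_mysql_servers"`. After a clean failover you would expect exactly one line of output, naming the new primary.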
GTID is not mandatory but strongly recommended. Without it, Orchestrator must calculate binary log file/position coordinates to re-point replicas, which is more error-prone.
# my.cnf on all nodes
[mysqld]
gtid_mode = ON
enforce_gtid_consistency = ON
log_slave_updates = ON
binlog_format = ROW
CREATE USER 'orchestrator'@'%' IDENTIFIED BY 'strong_password';
GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD ON *.* TO 'orchestrator'@'%';
GRANT SELECT ON mysql.slave_master_info TO 'orchestrator'@'%';
Orchestrator needs SUPER to execute CHANGE MASTER TO and toggle read_only during recovery.
Deploy three Orchestrator nodes (one per Raft member). In Raft mode each node uses its own dedicated backend database, either MySQL or SQLite, and Raft keeps the nodes consistent; the backends are not shared between nodes. Key orchestrator.conf.json settings:
{
  "MySQLTopologyUser": "orchestrator",
  "MySQLTopologyPassword": "strong_password",
  "DiscoverByShowSlaveHosts": true,
  "InstancePollSeconds": 5,
  "RecoveryPollSeconds": 1,
  "RecoveryPeriodBlockSeconds": 300,
  "FailMasterPromotionIfSQLThreadNotUpToDate": true,
  "DetachLostReplicasAfterMasterFailover": true,
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "PreventCrossDataCenterMasterFailover": false,
  "RaftEnabled": true,
  "RaftDataDir": "/var/lib/orchestrator/raft",
  "RaftBind": "ORCH_NODE_IP",
  "RaftNodes": ["ORCH_1_IP:10008", "ORCH_2_IP:10008", "ORCH_3_IP:10008"],
  "PostMasterFailoverProcesses": [
    "/usr/local/bin/proxysql-failover.sh {failedHost} {failedPort} {successorHost} {successorPort}"
  ]
}
-- Add all MySQL nodes to ProxySQL
INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight, max_connections)
VALUES
(10, 'mysql-primary', 3306, 1000, 500),
(20, 'mysql-replica-1', 3306, 100, 500),
(20, 'mysql-replica-2', 3306, 100, 500);
-- Enable read_only-based automatic routing
INSERT INTO mysql_replication_hostgroups
(writer_hostgroup, reader_hostgroup, comment)
VALUES (10, 20, 'mysql-ha-pair');
-- ProxySQL monitoring user: run these two statements on the MySQL primary
CREATE USER 'monitor'@'%' IDENTIFIED BY 'monitor_password';
GRANT SELECT ON sys.* TO 'monitor'@'%';
-- Back in the ProxySQL admin interface: tell the monitor which account to use
UPDATE global_variables SET variable_value='monitor' WHERE variable_name='mysql-monitor_username';
UPDATE global_variables SET variable_value='monitor_password' WHERE variable_name='mysql-monitor_password';
LOAD MYSQL SERVERS TO RUNTIME;
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
SAVE MYSQL VARIABLES TO DISK;
Even with mysql_replication_hostgroups handling automatic routing, an explicit hook gives you control over logging, alerting, and edge cases. A minimal proxysql-failover.sh:
#!/bin/bash
FAILED_HOST=$1
FAILED_PORT=$2
NEW_PRIMARY=$3
NEW_PORT=$4
PROXYSQL_ADMIN="mysql -h 127.0.0.1 -P 6032 -u admin -padmin_password"
# Move failed primary offline in ProxySQL
$PROXYSQL_ADMIN -e "UPDATE mysql_servers SET status='OFFLINE_HARD'
WHERE hostname='$FAILED_HOST' AND port=$FAILED_PORT;"
# Ensure new primary is in writer HG
$PROXYSQL_ADMIN -e "UPDATE mysql_servers SET hostgroup_id=10, status='ONLINE'
WHERE hostname='$NEW_PRIMARY' AND port=$NEW_PORT;"
$PROXYSQL_ADMIN -e "LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;"
logger "Orchestrator failover: $FAILED_HOST -> $NEW_PRIMARY"
# Seed the first node — Orchestrator will follow replication links to find replicas
orchestrator-client -c discover -i mysql-primary:3306
# Verify topology is detected correctly
orchestrator-client -c topology -i mysql-primary:3306
The topology output should show the primary with all replicas nested beneath it. If any node is missing, check firewall rules between Orchestrator nodes and MySQL nodes.
Orchestrator evaluates candidates against a set of configurable promotion rules before selecting a new primary. Understanding these prevents scenarios where Orchestrator detects a failure but refuses to promote any candidate.
| Config Parameter | Effect | Recommendation |
|---|---|---|
| FailMasterPromotionIfSQLThreadNotUpToDate | Blocks promotion if SQL thread has unapplied relay logs | Set true for data safety; accept slightly longer failover time |
| DelayMasterPromotionIfSQLThreadNotUpToDate | Waits for SQL thread to catch up instead of blocking outright | Use if tolerable; capped by ReasonableReplicationLagSeconds |
| RecoveryPeriodBlockSeconds | Blocks a second recovery for N seconds after one completes | 300s prevents cascade promotions; too high extends downtime on back-to-back failures |
| PreventCrossDataCenterMasterFailover | Refuses to promote a replica in a different DC | Enable only if cross-DC write latency is unacceptable |
| DetachLostReplicasAfterMasterFailover | Stops replication on replicas that were unreachable during failover | Enable: prevents replicas from reattaching to a potentially outdated source |
A common cause of a stuck promotion is a candidate left with read_only=ON at the MySQL level with no mechanism for Orchestrator to clear it, combined with ApplyMySQLPromotionAfterMasterFailover: false. Verify this setting is true and that the Orchestrator user has SUPER privilege on all backends.
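A quick pre-flight check can catch this before a real failover does. The sketch below is a rough lint, assuming the config is a plain JSON file. check_promotion_setting is a hypothetical helper that greps the raw text rather than parsing JSON, so treat a warning as a prompt to look, not as proof of a problem.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: warns when an Orchestrator config file does not
# explicitly set ApplyMySQLPromotionAfterMasterFailover to true.
# Greps the raw JSON text, so this is a rough lint, not a JSON parser.
# Arg: $1 = path to the config file
check_promotion_setting() {
  local conf="$1"
  if grep -Eq '"ApplyMySQLPromotionAfterMasterFailover"[[:space:]]*:[[:space:]]*true' "$conf"; then
    echo "OK"
  else
    echo "WARN: ApplyMySQLPromotionAfterMasterFailover is not set to true in $conf"
    return 1
  fi
}

# Demo against a throwaway file
demo_conf="$(mktemp)"
printf '{ "ApplyMySQLPromotionAfterMasterFailover": true }\n' > "$demo_conf"
check_promotion_setting "$demo_conf"
rm -f "$demo_conf"
```

On a real node you would point it at the deployed file, for example `check_promotion_setting /etc/orchestrator.conf.json` (the path is an assumption; use wherever your config lives).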
A failover stack you've never tested is not a failover stack. These are the three tests Mafiree runs on every new Orchestrator + ProxySQL deployment before signing off:
# Simulate primary loss — stop MySQL on the primary node
systemctl stop mysql # on primary
# Watch Orchestrator detect and respond
orchestrator-client -c topology -i mysql-replica-1:3306
# Verify ProxySQL has routed writes to the new primary
mysql -h proxysql-vip -P 6033 -u app_user -e "SHOW VARIABLES LIKE 'hostname';"
Run a continuous write loop through ProxySQL and timestamp any connection errors. Total downtime from primary failure to first successful write through ProxySQL is your actual RTO. With Orchestrator's defaults, expect 15–45 seconds depending on InstancePollSeconds and SQL thread catch-up time.
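One way to build such a write loop is sketched below. It is a minimal probe under stated assumptions: measure_outage is a hypothetical helper, the write command is supplied by the caller, and timing uses whole seconds from date, so sub-second outages are rounded.

```shell
#!/usr/bin/env bash
# Sketch of an RTO probe: repeatedly runs a write command and reports how long
# writes kept failing. The command is passed in, so the loop itself makes no
# assumption about credentials or schema.
# Usage: measure_outage <attempts> <pause_seconds> <command> [args...]
measure_outage() {
  local attempts="$1" pause="$2"
  shift 2
  local i=1 failed=0 first_fail="" recovered=""
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      # First success after a failure marks the end of the outage
      if [ -n "$first_fail" ]; then
        recovered="$(date +%s)"
        break
      fi
    else
      failed=$((failed + 1))
      [ -n "$first_fail" ] || first_fail="$(date +%s)"
    fi
    i=$((i + 1))
    sleep "$pause"
  done
  if [ -z "$first_fail" ]; then
    echo "failed_attempts=0 outage_seconds=0"
  elif [ -z "$recovered" ]; then
    echo "failed_attempts=$failed outage_seconds=unrecovered"
  else
    echo "failed_attempts=$failed outage_seconds=$((recovered - first_fail))"
  fi
}

# Demo with a command that always succeeds (no outage observed)
measure_outage 3 0 true
```

In a real test the command would be a write through ProxySQL, for example `measure_outage 300 1 mysql -h proxysql-vip -P 6033 -u app_user -e "INSERT INTO test.heartbeat (ts) VALUES (NOW())"`, where test.heartbeat is a hypothetical table you create for the purpose. Start the probe, stop MySQL on the primary, and read the reported outage once writes resume.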
After failover, confirm all replicas are replicating from the new primary:
-- On each replica after failover
SHOW SLAVE STATUS\G
-- Master_Host should show the new primary's IP
-- Seconds_Behind_Master should be 0 or near 0
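Checking several replicas by eye gets tedious, so the same two fields can be verified with a small filter. This is a sketch under assumptions: check_replica is a hypothetical helper, it reads the vertical (\G) output on stdin, and the default lag threshold of 5 seconds is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Reads `SHOW SLAVE STATUS\G` text on stdin and checks two fields:
# Master_Host must equal the expected new primary, and Seconds_Behind_Master
# must be a number no greater than max_lag.
# Args: $1 = expected Master_Host, $2 = max lag in seconds (default 5, arbitrary)
check_replica() {
  local expected_host="$1" max_lag="${2:-5}"
  awk -F': ' -v host="$expected_host" -v max="$max_lag" '
    { gsub(/^[ \t]+/, "", $1) }                    # strip the \G left padding
    $1 == "Master_Host"           { master = $2 }
    $1 == "Seconds_Behind_Master" { lag = $2 }
    END {
      if (master != host)     { print "FAIL: Master_Host=" master; exit 1 }
      if (lag !~ /^[0-9]+$/)  { print "FAIL: lag=" lag; exit 1 }   # e.g. NULL
      if (lag + 0 > max + 0)  { print "FAIL: lag=" lag; exit 1 }
      print "OK: " master " lag=" lag
    }'
}

# Demo with two hand-written lines in \G layout (hypothetical values)
printf '%s\n' '                  Master_Host: mysql-replica-1' \
              '        Seconds_Behind_Master: 0' | check_replica mysql-replica-1
```

Against a live replica, pipe in the output of `mysql -h <replica> -e 'SHOW SLAVE STATUS\G'`. Master_Host shows whatever address was used when the replica was re-pointed, so pass the hostname or the IP accordingly. A NULL lag is reported as a failure because it means replication is not running.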
Orchestrator + ProxySQL gives MySQL deployments the automated failover they need without forcing a migration to synchronous replication. The combination handles the full recovery cycle — detection, candidate selection, promotion, replica re-pointing, and traffic rerouting — in under a minute on a well-configured stack.
The non-obvious details are what separate a functional setup from one that works reliably under production conditions: running Orchestrator in Raft mode rather than single-node, configuring mysql_replication_hostgroups in ProxySQL for automatic read-only-based routing, testing the full failover path before relying on it, and understanding exactly which promotion rules can silently block recovery.
Mafiree's team manages MySQL high availability environments for clients across financial services, logistics, and e-commerce — environments where a 20-minute manual failover is not acceptable. If your MySQL tier doesn't yet have automated failover, or if you've set up Orchestrator but haven't validated it end-to-end, our MySQL HA services can help you get there.