High Availability for MySQL using Orchestrator and ProxySQL

Automatic replica promotion without DBA intervention using Orchestrator and ProxySQL. If you need an HA solution without moving to synchronous replication or AWS RDS, Orchestrator with ProxySQL is a great choice.

sukan June 09, 2026

MySQL High Availability with Orchestrator and ProxySQL: Auto Failover Without Synchronous Replication

Most MySQL deployments rely on standard asynchronous replication — one primary, one or more replicas — with no automated path for handling a primary failure. When that primary goes down, someone gets paged, SSHes in, and manually promotes a replica. At 2 AM. Mafiree's MySQL team has seen this pattern across dozens of production environments, and in nearly every case the RTO was measured in tens of minutes rather than seconds.

There's a better approach that doesn't require abandoning asynchronous replication or absorbing the write latency overhead of Galera or InnoDB Cluster. Orchestrator — an open-source MySQL topology manager — paired with ProxySQL as an intelligent load balancer delivers automatic primary promotion, replica re-pointing, and transparent traffic rerouting, all without a single line of application-level change.

This guide covers the full architecture, how the three Orchestrator phases work, ProxySQL integration patterns, and the critical configuration details that determine whether failover takes 20 seconds or 3 minutes.

What You'll Learn
  • Why cluster-based HA (Galera, InnoDB Cluster) isn't always the right trade-off
  • How Orchestrator's Discovery, Refactoring, and Recovery phases work end-to-end
  • How ProxySQL routes read/write traffic and reacts to topology changes
  • Step-by-step Orchestrator + ProxySQL setup with real configuration snippets
  • Promotion rules, hooks, and the configurations that cause silent failover failures

Why Not Synchronous Replication?

The instinct when building a highly available MySQL tier is to reach for a synchronous cluster. Galera Cluster and MySQL InnoDB Cluster provide multi-master or group replication with automatic failover baked in. For some workloads, that's the right call. But synchronous replication carries real costs that make it unsuitable for many production environments.

  • Write latency overhead: Every commit must be acknowledged by a quorum before returning to the client. On a multi-AZ setup with 5–10ms cross-zone RTT, every write absorbs that penalty. Under contention, this compounds.
  • Certification conflicts and deadlocks: Galera's optimistic concurrency control generates certification failures when concurrent transactions touch overlapping rows across nodes. These surface as application-layer deadlock errors and require application-side retry logic.
  • Operational complexity: Cluster state management (node eviction, SST or State Snapshot Transfer for a full copy, IST or Incremental State Transfer for a partial catch-up) adds meaningful operational surface area. Rebuilding a desynchronised node via SST on a large dataset can block or heavily load the donor for the duration, depending on the SST method.
  • Not always necessary: If your workload has a single clear write path and your primary RPO tolerance is a few seconds of replication lag, asynchronous replication with automated failover achieves the same availability target at a fraction of the overhead.
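
To make the write-latency point concrete, here is a back-of-the-envelope sketch in Python. It assumes, as a simplification, that each synchronous commit costs one extra cross-zone round trip on top of the local commit time. The 1 ms local commit time and the function name are illustrative assumptions, not measurements.

```python
def max_serial_commits_per_second(local_commit_ms: float, rtt_ms: float = 0.0) -> float:
    """Upper bound on commits/second for ONE connection committing back to back."""
    return 1000.0 / (local_commit_ms + rtt_ms)


if __name__ == "__main__":
    # Assumed 1 ms local commit; 0 ms RTT stands in for asynchronous replication.
    for rtt in (0.0, 5.0, 10.0):
        rate = max_serial_commits_per_second(local_commit_ms=1.0, rtt_ms=rtt)
        print(f"rtt={rtt:>4} ms -> at most {rate:.0f} commits/s per connection")
```

Under these assumptions a single connection drops from roughly 1000 to under 200 serial commits per second once a 5 ms round trip is added, which is why hot rows and serial write paths feel the penalty most.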

Orchestrator + ProxySQL is designed exactly for this: asynchronous replication topologies that need automated, reliable failover without sacrificing write performance.

Architecture Overview

A typical Mafiree-deployed HA stack using this pattern looks like this:

Reference Architecture
  Application Tier
        |
  ┌─────────────┐
  │   ProxySQL  │  (x2, HA pair)  — Listens on port 6033
  │  (Read/Write│  — Hostgroup 10: writer (primary only)
  │   Routing)  │  — Hostgroup 20: readers (replicas)
  └──────┬──────┘
         |
  ┌──────┴──────────────────────┐
  |                             |
  ▼                             ▼
MySQL Primary (3306)     Replica 1 / Replica 2 (3306)
  |
  ├── Replica 1
  └── Replica 2

  Orchestrator Tier (x3, Raft consensus)
  ┌────────────────────────────┐
  │  Orch-1  Orch-2  Orch-3   │  — Raft leader elected automatically
  │          (Raft)            │  — Each polls all MySQL nodes
  └────────────────────────────┘
    

Three components carry all the weight:

  • MySQL replication topology: Standard asynchronous replication — one primary, one or more replicas. GTID-based is strongly recommended; Orchestrator supports binary log file+position but GTID makes replica re-pointing after failover far cleaner.
  • Orchestrator (Raft mode): Three Orchestrator nodes running in Raft consensus. The elected leader handles topology polling and recovery actions. This eliminates single points of failure in the HA manager itself — a critical detail that's often missed in single-Orchestrator setups.
  • ProxySQL: Deployed as an HA pair or behind a VIP. Maintains hostgroup definitions mapping writers and readers, executes health checks, and can be notified of topology changes via Orchestrator hooks to update its internal routing tables instantly.

How Orchestrator Works: Three Phases

Orchestrator's operation breaks into three distinct phases. Understanding each is essential to configuring it correctly and diagnosing failover behaviour in production.

Phase 1: Discovery

Orchestrator continuously polls the MySQL topology, starting from seed nodes you define in its configuration. For each node it discovers, it reads SHOW SLAVE STATUS, SHOW MASTER STATUS, and a handful of performance-related queries to understand the complete topology graph.

Discovery captures replication positions (GTID executed sets, binary log file/position), read-only status, heartbeat lag, and the full upstream/downstream relationships between every node. This topology map is stored in Orchestrator's backend database (MySQL or SQLite) and refreshed on a configurable interval (InstancePollSeconds, default: 5 seconds).

Configuration note: The InstancePollSeconds setting controls how frequently Orchestrator polls each instance. Lower values detect failures faster but increase MySQL-side query load. Mafiree's standard configuration is 5 seconds for production and 10 seconds for non-critical environments.
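
The topology map that discovery builds can be pictured as a tree keyed by each instance's replication source. The Python sketch below is only an illustration of that data structure: the function names and the rendered layout are hypothetical and are not Orchestrator's internal representation or its CLI output.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def build_topology(upstream_of: Dict[str, Optional[str]]) -> Dict[Optional[str], List[str]]:
    """Invert 'instance -> its replication source' into 'source -> sorted replicas'."""
    replicas_of = defaultdict(list)
    for instance, source in upstream_of.items():
        replicas_of[source].append(instance)
    return {source: sorted(children) for source, children in replicas_of.items()}


def render(replicas_of, root: Optional[str] = None, depth: int = 0) -> List[str]:
    """Return indented lines, one per instance, with replicas nested under their source."""
    lines = []
    for instance in replicas_of.get(root, []):
        lines.append("  " * depth + instance)
        lines.extend(render(replicas_of, instance, depth + 1))
    return lines


# What polling each node might report: a source of None marks the primary.
discovered = {
    "mysql-primary:3306": None,
    "mysql-replica-1:3306": "mysql-primary:3306",
    "mysql-replica-2:3306": "mysql-primary:3306",
}
topology_lines = render(build_topology(discovered))
print("\n".join(topology_lines))
```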

Phase 2: Refactoring

Refactoring is Orchestrator's ability to restructure the replication topology without a failure event. This includes:

  • Moving a replica from one primary to another (relocate)
  • Changing the replication position of a replica (match)
  • Re-pointing all replicas under a new primary after manual changes
  • Splitting or merging replica subtrees

During recovery, refactoring is what Orchestrator uses to re-attach surviving replicas under the newly promoted primary. With GTID enabled, this is straightforward — Orchestrator simply issues CHANGE MASTER TO with the new primary's address and lets GTID auto-positioning handle the rest. Without GTID, Orchestrator performs binary log file/position calculations to find the right resume point, which works but is more fragile under complex topologies.

Phase 3: Recovery

Recovery triggers when Orchestrator detects a primary failure — specifically when the primary becomes unreachable to Orchestrator and all replicas simultaneously report that replication is broken. The recovery sequence is:

  1. Failure confirmation: Orchestrator waits for RecoveryPollSeconds (default: 1s) and re-checks from multiple Raft members to eliminate false positives from transient network issues.
  2. Candidate selection: Orchestrator evaluates replicas against promotion rules (replica lag, binary logging enabled, data centre preference, and any explicitly defined priority tags). The replica with the most recent GTID executed set and no blocking conditions wins.
  3. Detection and pre-failover hooks: Orchestrator runs OnFailureDetectionProcesses as soon as the failure is detected and PreFailoverProcesses immediately before it acts. This is where ProxySQL integration can fire first: a hook script can update ProxySQL's hostgroups to temporarily stop routing writes, preventing split-brain during the transition.
  4. Promotion: The candidate replica is promoted to primary. With ApplyMySQLPromotionAfterMasterFailover enabled, Orchestrator clears the candidate's replication configuration and sets read_only=OFF (and super_read_only=OFF where applicable), then updates its internal topology map.
  5. Replica re-pointing: All surviving replicas are issued CHANGE MASTER TO pointing at the new primary. GTID auto-positioning handles the catch-up automatically.
  6. Post-failover hook: PostMasterFailoverProcesses fires. A ProxySQL hook here updates the writer hostgroup to point at the new primary, restoring write traffic routing.

Common failure mode: Orchestrator correctly promotes a replica but ProxySQL keeps routing writes to the old primary's address because neither a hook nor mysql_replication_hostgroups updates it. Writes fail until someone manually updates ProxySQL's mysql_servers table. Always test your post-failover hooks end-to-end before relying on them in production.
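
Step 2 of the sequence above (candidate selection) can be sketched in a few lines of Python. This is a deliberately simplified model, not Orchestrator's algorithm: it ranks replicas by the number of transactions in their executed GTID set, whereas a real comparison works on set containment and also weighs promotion rules and data centre preferences. The Replica class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


def gtid_count(gtid_set: str) -> int:
    """Count transactions in a GTID set such as 'uuid:1-100:105,uuid2:1-5'."""
    total = 0
    for per_source in filter(None, (part.strip() for part in gtid_set.split(","))):
        _uuid, *intervals = per_source.split(":")
        for interval in intervals:
            first, _, last = interval.partition("-")
            total += int(last or first) - int(first) + 1
    return total


@dataclass
class Replica:
    host: str
    gtid_executed: str
    log_bin: bool = True                # a replica without binary logs cannot feed others
    sql_thread_up_to_date: bool = True  # relay logs fully applied


def pick_candidate(replicas: List[Replica]) -> Optional[Replica]:
    """Pick the eligible replica that has executed the most transactions."""
    eligible = [r for r in replicas if r.log_bin and r.sql_thread_up_to_date]
    return max(eligible, key=lambda r: gtid_count(r.gtid_executed), default=None)
```

In this model a replica that is furthest ahead but has binary logging disabled is skipped, and an empty candidate list yields no promotion at all, which mirrors the "detected but not promoted" outcome discussed later in this guide.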

ProxySQL Integration: Traffic Routing and Hostgroups

ProxySQL's role in this architecture is to sit between the application and MySQL, routing connections to the correct backend without the application needing to know anything about the current topology.

Hostgroup Design

The standard pattern uses two hostgroups:

Hostgroup | Purpose | Members
HG 10 | Writer group: read/write connections | Primary only (weight=1000)
HG 20 | Reader group: read-only connections | All replicas, optionally primary

Query rules map traffic based on patterns. A typical minimal ruleset:

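-- Route locking reads to the writer (HG 10) and plain SELECTs to the readers (HG 20)
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES
  (10, 1, '^SELECT.*FOR UPDATE', 10, 1),
  (20, 1, '^SELECT',             20, 1);

-- Default: everything else (writes, DDL, transactions) goes to the writer HG.
-- The default hostgroup is a per-user setting in mysql_users, not a global variable.
-- Assumes app_user is already defined in mysql_users.
UPDATE mysql_users SET default_hostgroup=10 WHERE username='app_user';

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;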
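
To see how rule order decides the destination, the Python sketch below replays the two rules above against a few query digests. It is a simulation for reasoning about the ruleset, not ProxySQL code: the route function is hypothetical, it assumes rules are evaluated in ascending rule_id with case-insensitive matching, and it assumes unmatched queries fall through to a writer default hostgroup.

```python
import re
from typing import List, Tuple

# (rule_id, match_digest pattern, destination_hostgroup), mirroring the rules above
RULES: List[Tuple[int, str, int]] = [
    (10, r"^SELECT.*FOR UPDATE", 10),
    (20, r"^SELECT", 20),
]
DEFAULT_HOSTGROUP = 10  # assumption: unmatched queries go to the writer hostgroup


def route(query_digest: str) -> int:
    """Return the hostgroup chosen by the first matching rule, else the default."""
    for _rule_id, pattern, hostgroup in sorted(RULES):
        if re.search(pattern, query_digest, re.IGNORECASE):
            return hostgroup
    return DEFAULT_HOSTGROUP


for digest in (
    "SELECT * FROM orders WHERE id = ?",
    "SELECT * FROM orders WHERE id = ? FOR UPDATE",
    "UPDATE orders SET status = ? WHERE id = ?",
):
    print(route(digest), digest)
```

If the two rules were swapped, the broader '^SELECT' pattern would match first and locking reads would land on a replica, so the more specific rule needs the lower rule_id.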

Health Checks and Auto-Eviction

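ProxySQL's Monitor module runs health checks against every server in its mysql_servers table. Connection and ping checks are bounded by mysql-monitor_connect_timeout and mysql-monitor_ping_timeout; a backend that misses mysql-monitor_ping_max_failures consecutive pings is marked SHUNNED and stops receiving traffic. OFFLINE_SOFT, by contrast, is a status you set yourself to drain a server gracefully.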

The native ProxySQL monitor uses mysql-monitor_read_only_interval (default: 1500ms) to poll @@read_only on every backend. When Orchestrator promotes a replica and flips its read_only=OFF, ProxySQL's monitor will detect this within 1–2 poll cycles and automatically move that server into the writer hostgroup — no hook required — if you've configured mysql_replication_hostgroups.

-- Tell ProxySQL which HGs are writer/reader pairs
INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, comment)
VALUES (10, 20, 'mysql-ha');

LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

With mysql_replication_hostgroups configured, ProxySQL moves servers between HG 10 and HG 20 based on read_only state automatically. This means Orchestrator's promotion action — which sets read_only=OFF on the new primary — directly triggers ProxySQL to route writes there without needing a separate hook call.
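
As a rough mental model of that behaviour, the Python sketch below places servers into hostgroups purely from their read_only flag. It is an illustration, not ProxySQL's implementation: the function name is hypothetical, and it ignores details such as mysql-monitor_writer_is_also_reader and server weights.

```python
from typing import Dict, List

WRITER_HG, READER_HG = 10, 20


def place_servers(read_only_by_host: Dict[str, bool]) -> Dict[int, List[str]]:
    """Map each reachable server to the writer or reader hostgroup by its read_only flag."""
    placement: Dict[int, List[str]] = {WRITER_HG: [], READER_HG: []}
    for host, read_only in sorted(read_only_by_host.items()):
        placement[READER_HG if read_only else WRITER_HG].append(host)
    return placement


# Normal operation, after failover, and the old primary returning with read_only=OFF.
before = place_servers({"mysql-primary": False, "mysql-replica-1": True, "mysql-replica-2": True})
after = place_servers({"mysql-replica-1": False, "mysql-replica-2": True})
old_primary_back = place_servers(
    {"mysql-primary": False, "mysql-replica-1": False, "mysql-replica-2": True}
)
print(before, after, old_primary_back, sep="\n")
```

The third case is the one to watch: placement driven only by read_only puts two servers in the writer hostgroup if the old primary restarts with read_only=OFF, so it is safer for every node to start with read_only=ON and let the promotion step clear it.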

Setting Up Orchestrator + ProxySQL: Step-by-Step

  1. Enable GTID on all MySQL nodes

    GTID is not mandatory but strongly recommended. Without it, Orchestrator must calculate binary log file/position coordinates to re-point replicas, which is more error-prone.

    # my.cnf on all nodes
    [mysqld]
    gtid_mode                = ON
    enforce_gtid_consistency = ON
    log_slave_updates        = ON
    binlog_format            = ROW
  2. Create the Orchestrator monitoring user on MySQL
    CREATE USER 'orchestrator'@'%' IDENTIFIED BY 'strong_password';
    GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD ON *.* TO 'orchestrator'@'%';
    GRANT SELECT ON mysql.slave_master_info TO 'orchestrator'@'%';

    Orchestrator needs SUPER to execute CHANGE MASTER TO and toggle read_only during recovery.

  3. Install and configure Orchestrator

    Deploy three Orchestrator nodes (one per Raft member). In Raft mode each node uses its own dedicated backend database, either MySQL or SQLite; the nodes do not share a backend, and Raft keeps their state consistent. RecoverMasterClusterFilters must match your clusters ("*" matches all); when it is left empty, Orchestrator detects primary failures but does not recover them automatically. Key orchestrator.conf.json settings:

    {
      "MySQLTopologyUser":               "orchestrator",
      "MySQLTopologyPassword":           "strong_password",
      "DiscoverByShowSlaveHosts":        true,
      "InstancePollSeconds":             5,
      "RecoveryPollSeconds":             1,
      "RecoveryPeriodBlockSeconds":      300,
      "RecoverMasterClusterFilters":     ["*"],
      "FailMasterPromotionIfSQLThreadNotUpToDate": true,
      "DetachLostReplicasAfterMasterFailover": true,
      "ApplyMySQLPromotionAfterMasterFailover": true,
      "PreventCrossDataCenterMasterFailover": false,
      "RaftEnabled":                     true,
      "RaftDataDir":                     "/var/lib/orchestrator/raft",
      "RaftBind":                        "ORCH_NODE_IP",
      "RaftNodes":                       ["ORCH_1_IP:10008","ORCH_2_IP:10008","ORCH_3_IP:10008"],
      "PostMasterFailoverProcesses": [
        "/usr/local/bin/proxysql-failover.sh {failedHost} {failedPort} {successorHost} {successorPort}"
      ]
    }
  4. Configure ProxySQL backends
    -- Run on the MySQL primary (it replicates to the replicas): monitoring user for ProxySQL
    CREATE USER 'monitor'@'%' IDENTIFIED BY 'monitor_password';
    GRANT USAGE, REPLICATION CLIENT ON *.* TO 'monitor'@'%';
    
    -- Run everything below on the ProxySQL admin interface (port 6032)
    -- Add all MySQL nodes to ProxySQL
    INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight, max_connections)
    VALUES
      (10, 'mysql-primary',   3306, 1000, 500),
      (20, 'mysql-replica-1', 3306, 100,  500),
      (20, 'mysql-replica-2', 3306, 100,  500);
    
    -- Enable read_only-based automatic routing
    INSERT INTO mysql_replication_hostgroups
      (writer_hostgroup, reader_hostgroup, comment)
    VALUES (10, 20, 'mysql-ha-pair');
    
    -- Tell ProxySQL's monitor which credentials to use
    UPDATE global_variables SET variable_value='monitor' WHERE variable_name='mysql-monitor_username';
    UPDATE global_variables SET variable_value='monitor_password' WHERE variable_name='mysql-monitor_password';
    
    LOAD MYSQL SERVERS TO RUNTIME;
    LOAD MYSQL VARIABLES TO RUNTIME;
    SAVE MYSQL SERVERS TO DISK;
    SAVE MYSQL VARIABLES TO DISK;
  5. Write the post-failover hook script

    Even with mysql_replication_hostgroups handling automatic routing, an explicit hook gives you control over logging, alerting, and edge cases. A minimal proxysql-failover.sh:

    #!/bin/bash
    # Arguments are filled in by Orchestrator from the PostMasterFailoverProcesses placeholders
    set -euo pipefail
    
    FAILED_HOST="$1"
    FAILED_PORT="$2"
    NEW_PRIMARY="$3"
    NEW_PORT="$4"
    
    # ProxySQL admin interface; the script must run where this address is reachable
    PROXYSQL_ADMIN=(mysql -h 127.0.0.1 -P 6032 -u admin -padmin_password)
    
    # Move failed primary offline in ProxySQL
    "${PROXYSQL_ADMIN[@]}" -e "UPDATE mysql_servers SET status='OFFLINE_HARD'
      WHERE hostname='$FAILED_HOST' AND port=$FAILED_PORT;"
    
    # Ensure new primary is in writer HG
    "${PROXYSQL_ADMIN[@]}" -e "UPDATE mysql_servers SET hostgroup_id=10, status='ONLINE'
      WHERE hostname='$NEW_PRIMARY' AND port=$NEW_PORT;"
    
    "${PROXYSQL_ADMIN[@]}" -e "LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;"
    
    logger "Orchestrator failover: $FAILED_HOST -> $NEW_PRIMARY"
  6. Discover and verify the topology
    # Seed the first node — Orchestrator will follow replication links to find replicas
    orchestrator-client -c discover -i mysql-primary:3306
    
    # Verify topology is detected correctly
    orchestrator-client -c topology -i mysql-primary:3306

    The topology output should show the primary with all replicas nested beneath it. If any node is missing, check firewall rules between Orchestrator nodes and MySQL nodes.

Promotion Rules and What Breaks Failover

Orchestrator evaluates candidates against a set of configurable promotion rules before selecting a new primary. Understanding these prevents scenarios where Orchestrator detects a failure but refuses to promote any candidate.

Config Parameter | Effect | Recommendation
FailMasterPromotionIfSQLThreadNotUpToDate | Blocks promotion if SQL thread has unapplied relay logs | Set true for data safety; accept slightly longer failover time
DelayMasterPromotionIfSQLThreadNotUpToDate | Waits for SQL thread to catch up instead of blocking outright | Use if tolerable; capped by ReasonableReplicationLagSeconds
RecoveryPeriodBlockSeconds | Blocks a second recovery for N seconds after one completes | 300s prevents cascade promotions; too high extends downtime on back-to-back failures
PreventCrossDataCenterMasterFailover | Refuses to promote a replica in a different DC | Enable only if cross-DC write latency is unacceptable
DetachLostReplicasAfterMasterFailover | Stops replication on replicas that were unreachable during failover | Enable; prevents replicas from reattaching to a potentially outdated source

Silent failure pattern: Orchestrator logs a recovery attempt but no promotion occurs. The most common cause is every candidate replica running with read_only=ON while ApplyMySQLPromotionAfterMasterFailover is false, so Orchestrator has no mechanism to clear it. Verify that this setting is true and that the Orchestrator user has SUPER privilege on all backends. Also check RecoverMasterClusterFilters: if it is empty or does not match the cluster, Orchestrator detects the failure but never starts an automatic recovery.
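
A small pre-flight check can catch several of these conditions before an incident does. The Python sketch below reads an Orchestrator configuration as JSON and reports settings that are known to block automated failover, including RecoverMasterClusterFilters, the Orchestrator setting that decides which clusters are eligible for automatic recovery. The preflight function and its warning texts are hypothetical, and the "odd count of at least 3" Raft rule is our working assumption rather than a hard limit enforced by Orchestrator.

```python
import json
from typing import List


def preflight(config: dict) -> List[str]:
    """Return human-readable warnings for settings that can block automated failover."""
    warnings = []
    if config.get("ApplyMySQLPromotionAfterMasterFailover") is False:
        warnings.append("ApplyMySQLPromotionAfterMasterFailover is false: read_only stays ON after promotion")
    if not config.get("RecoverMasterClusterFilters"):
        warnings.append("RecoverMasterClusterFilters is empty: no automatic primary recovery")
    if config.get("RaftEnabled"):
        nodes = config.get("RaftNodes", [])
        if len(nodes) < 3 or len(nodes) % 2 == 0:
            warnings.append(f"RaftNodes has {len(nodes)} members: use an odd count of at least 3")
    if not config.get("PostMasterFailoverProcesses"):
        warnings.append("PostMasterFailoverProcesses is empty: no hook runs after promotion")
    return warnings


# A deliberately flawed example configuration (placeholder values).
sample = json.loads("""
{
  "ApplyMySQLPromotionAfterMasterFailover": false,
  "RaftEnabled": true,
  "RaftNodes": ["ORCH_1_IP:10008", "ORCH_2_IP:10008"],
  "PostMasterFailoverProcesses": ["/usr/local/bin/proxysql-failover.sh"]
}
""")
for warning in preflight(sample):
    print("WARN:", warning)
```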

Testing Your HA Setup Before It Matters

A failover stack you've never tested is not a failover stack. These are the three tests Mafiree runs on every new Orchestrator + ProxySQL deployment before signing off:

Test 1: Controlled Primary Failure

# Simulate primary loss — stop MySQL on the primary node
systemctl stop mysql   # on primary

# Watch Orchestrator detect and respond
orchestrator-client -c topology -i mysql-replica-1:3306

# Verify ProxySQL has routed writes to the new primary
mysql -h proxysql-vip -P 6033 -u app_user -e "SHOW VARIABLES LIKE 'hostname';"

Test 2: Measure Actual RTO

Run a continuous write loop through ProxySQL and timestamp any connection errors. Total downtime from primary failure to first successful write through ProxySQL is your actual RTO. With Orchestrator's defaults, expect 15–45 seconds depending on InstancePollSeconds and SQL thread catch-up time.
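
A minimal sketch of such a write loop in Python follows. The database call is injected as write_once, a hypothetical callable that you would implement with your own driver to run a single INSERT through ProxySQL on port 6033; the clock and sleep parameters are injectable only so the logic can be exercised without a live database.

```python
import time
from typing import Callable, List, Tuple

Sample = Tuple[float, bool]  # (timestamp, write succeeded?)


def probe_writes(write_once: Callable[[], None], attempts: int, interval_s: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> List[Sample]:
    """Attempt one write per interval and record when each attempt started and its outcome."""
    samples: List[Sample] = []
    for _ in range(attempts):
        started = clock()
        try:
            write_once()
            ok = True
        except Exception:
            ok = False
        samples.append((started, ok))
        sleep(interval_s)
    return samples


def longest_outage(samples: List[Sample]) -> float:
    """Seconds from the first failed write of the worst outage to the next successful one.

    An outage still in progress at the last sample is not counted.
    """
    worst = 0.0
    outage_start = None
    for ts, ok in samples:
        if not ok and outage_start is None:
            outage_start = ts
        elif ok and outage_start is not None:
            worst = max(worst, ts - outage_start)
            outage_start = None
    return worst
```

The measured value is only as precise as the probe interval, so a 0.5 to 1 second interval is a reasonable starting point when the expected RTO is tens of seconds.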

Test 3: Verify Replica Re-pointing

After failover, confirm all replicas are replicating from the new primary:

-- On each replica after failover
SHOW SLAVE STATUS\G
-- Master_Host should show the new primary's IP
-- Seconds_Behind_Master should be 0 or near 0
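
When there are more than a couple of replicas, this check is worth scripting. The Python sketch below parses the vertical (\G) output of SHOW SLAVE STATUS and reports anything that looks wrong. The function names and the 5-second lag threshold are assumptions for illustration, and how you capture the output (for example from the mysql client) is left to your tooling.

```python
from typing import Dict, List


def parse_vertical(output: str) -> Dict[str, str]:
    """Parse 'Key: value' lines from \\G-style output; row separator lines are skipped."""
    fields: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def replication_problems(fields: Dict[str, str], expected_primary: str,
                         max_lag_s: int = 5) -> List[str]:
    """Return a list of problems; an empty list means the replica looks healthy."""
    problems = []
    if fields.get("Master_Host") != expected_primary:
        problems.append(f"Master_Host is {fields.get('Master_Host')!r}, expected {expected_primary!r}")
    for thread in ("Slave_IO_Running", "Slave_SQL_Running"):
        if fields.get(thread) != "Yes":
            problems.append(f"{thread} is {fields.get(thread)!r}")
    lag = fields.get("Seconds_Behind_Master", "NULL")
    if not lag.isdigit() or int(lag) > max_lag_s:
        problems.append(f"Seconds_Behind_Master is {lag}")
    return problems


SAMPLE = """\
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: mysql-replica-1
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0
"""
print(replication_problems(parse_vertical(SAMPLE), expected_primary="mysql-replica-1"))
```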

Related Mafiree Resources

  • MySQL High Availability & Replication Services — Managed HA architecture, deployment, and 24x7 monitoring
  • MySQL Performance Issues: 7 Signs You Need Professional Tuning — Diagnosing slow queries, buffer pool misses, and replication lag
  • MySQL Architecture Explained: Performance Tuning & Troubleshooting Guide — InnoDB internals, buffer pool, and query execution
  • MySQL Schema Migration Without Downtime — gh-ost, pt-osc, and INSTANT DDL on large tables

Conclusion

Orchestrator + ProxySQL gives MySQL deployments the automated failover they need without forcing a migration to synchronous replication. The combination handles the full recovery cycle — detection, candidate selection, promotion, replica re-pointing, and traffic rerouting — in under a minute on a well-configured stack.

The non-obvious details are what separate a functional setup from one that works reliably under production conditions: running Orchestrator in Raft mode rather than single-node, configuring mysql_replication_hostgroups in ProxySQL for automatic read-only-based routing, testing the full failover path before relying on it, and understanding exactly which promotion rules can silently block recovery.

Mafiree's team manages MySQL high availability environments for clients across financial services, logistics, and e-commerce — environments where a 20-minute manual failover is not acceptable. If your MySQL tier doesn't yet have automated failover, or if you've set up Orchestrator but haven't validated it end-to-end, our MySQL HA services can help you get there.

FAQ

What is MySQL high availability with Orchestrator and ProxySQL?
It's an automated failover architecture for MySQL using asynchronous replication. Orchestrator monitors the MySQL topology and automatically promotes a replica to primary when the current primary fails. ProxySQL routes application traffic to the correct backend, updating its routing tables when the topology changes. Together, they deliver sub-minute RTO without synchronous replication overhead.

How does it compare with Galera Cluster?
They solve different problems. Galera provides synchronous, multi-master replication with no data loss on failover (RPO = 0) but adds write latency and certification conflict overhead. Orchestrator with asynchronous replication has a small RPO window (equal to replication lag at failure time) but adds no write latency and has no certification conflicts. For most OLTP workloads where a few seconds of potential data loss is acceptable, Orchestrator + ProxySQL is the better trade-off.

How long does failover take?
Detection time depends on InstancePollSeconds (default: 5s) and RecoveryPollSeconds (default: 1s). At Mafiree's standard 5-second poll interval, primary failure is typically confirmed within 10–15 seconds. Total failover time (detection + candidate selection + promotion + ProxySQL update) ranges from 20 to 60 seconds on a well-configured stack.

Author Bio

sukan

Sukan is Database Team Lead at Mafiree with over a decade of experience in database systems, architecture, and performance optimization. He specializes in MySQL, MongoDB, TiDB, and ClickHouse, developing architectural improvements that make data platforms faster, more efficient, and cost-effective. Sukan writes about practical database engineering topics, real-world performance tuning, data replication, and high-scale system design, drawing from extensive hands-on experience solving complex technical challenges.
