High Availability Installation (Basic Failover)
This guide covers installing Netsocs across two Linux nodes in an active-passive high availability configuration. If Node A (master) goes down, Node B (backup) automatically takes over and Netsocs keeps running — no manual intervention required.
Scope of this guide
This configuration covers failover of the application layer only (Docker stack + shared files + Virtual IP). Databases (MySQL, MongoDB, Redis) are assumed to run on external servers and are not part of this setup.
Requirements¶
Both nodes must have¶
- Ubuntu 20.04+ or Debian 11+ (or compatible)
- Docker Engine ≥ 24.x installed
- Docker Compose plugin installed
- The `netsocs-docker-compose` repository cloned at the same path on both nodes (recommended: `/opt/netsocs`)
- Network connectivity between both nodes (mutual ping)
Network ports to open between nodes¶
| Port | Protocol | Direction | Purpose |
|---|---|---|---|
| n/a (IP protocol 112) | VRRP | Both ways | Keepalived heartbeat |
| 2049 | TCP/UDP | Node B → Node A | NFS (shared volumes) |
| 111 | TCP/UDP | Node B → Node A | NFS portmapper |
Information you will need¶
| Variable | Description | Example |
|---|---|---|
| `NODE_A_IP` | Real IP of Node A (master) | `192.168.1.10` |
| `NODE_B_IP` | Real IP of Node B (backup) | `192.168.1.11` |
| `VIP_ADDRESS` | Floating Virtual IP; must not be in use | `192.168.1.100` |
| `NETWORK_IFACE` | Network interface name on both nodes | `eth0` / `ens3` |
| `AUTH_PASS` | VRRP shared password (max 8 characters) | `Ns3c2024` |
VIP must be free
The VIP_ADDRESS must be an IP that is not assigned to any device on your network. Both nodes must be on the same subnet as this IP.
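You can sanity-check the same-subnet requirement before running the scripts. The helper below is illustrative only (it is not part of the Netsocs scripts) and assumes IPv4 and that you know your netmask:

```shell
#!/usr/bin/env bash
# same_subnet IP1 IP2 NETMASK: succeeds when both IPs fall in the same subnet.
same_subnet() {
  local -a a b m
  IFS=. read -r -a a <<< "$1"
  IFS=. read -r -a b <<< "$2"
  IFS=. read -r -a m <<< "$3"
  local i
  for i in 0 1 2 3; do
    # Compare the network portion of each octet under the mask
    [ $(( a[i] & m[i] )) -eq $(( b[i] & m[i] )) ] || return 1
  done
}

# The VIP should share the nodes' subnet (here a /24):
same_subnet 192.168.1.10 192.168.1.100 255.255.255.0 && echo "VIP is on the nodes' subnet"
```

To confirm the VIP itself is unused, a quick `ping` to it from either node should get no reply before setup.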
Step 1 — Configure Node A (Master)¶
On Node A, navigate to the Netsocs project directory and run the master setup script as root:
cd /opt/netsocs
sudo bash failover/start-as-master.sh
The script will ask for the required values interactively. If you prefer to pre-set them, export the variables before running:
export NODE_A_IP="192.168.1.10"
export NODE_B_IP="192.168.1.11"
export VIP_ADDRESS="192.168.1.100"
export NETWORK_IFACE="eth0"
export AUTH_PASS="Ns3c2024"
sudo bash failover/start-as-master.sh
What the script does on Node A¶
- Installs and configures an NFS server to share Netsocs volumes with Node B
- Creates the `compose.override.yml` symlink to activate NFS-backed volumes
- Brings up the Docker stack (`docker compose up -d`)
- Installs Keepalived and generates its configuration (MASTER mode, priority 100)
- Starts Keepalived and assigns the Virtual IP to Node A
- Places the `notify-master.sh`, `notify-backup.sh`, and `health-check.sh` scripts in `/etc/keepalived/`
When the script finishes, Node A is live and serving traffic on the VIP.
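For reference, the Keepalived configuration generated on Node A should look roughly like the sketch below, using the example values from this guide. The instance name and `virtual_router_id` are placeholders; the exact file written by `start-as-master.sh` is authoritative.

```conf
# /etc/keepalived/keepalived.conf on Node A (illustrative sketch)
vrrp_script check_health {
    script "/etc/keepalived/health-check.sh"
    interval 5
}

vrrp_instance NETSOCS {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100            # Node B uses 50, so Node A preempts on recovery
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass Ns3c2024  # VRRP uses at most 8 characters
    }
    virtual_ipaddress {
        192.168.1.100
    }
    track_script {
        check_health
    }
    notify_master /etc/keepalived/notify-master.sh
    notify_backup /etc/keepalived/notify-backup.sh
}
```

Node B's file differs only in `state BACKUP` and `priority 50`.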
Step 2 — Configure Node B (Backup)¶
On Node B, navigate to the Netsocs project directory and run the backup setup script as root. Use the exact same values you used on Node A:
cd /opt/netsocs
sudo bash failover/start-as-backup.sh
What the script does on Node B¶
- Installs the NFS client and mounts the 8 shared volumes from Node A (persisted in `/etc/fstab`)
- Creates the `compose.override.yml` symlink (same as Node A)
- Installs Keepalived and generates its configuration (BACKUP mode, priority 50)
- Starts Keepalived in standby; the Docker stack is not started
- Places the same notify and health-check scripts in `/etc/keepalived/`
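Each NFS mount the script persists on Node B is an `/etc/fstab` line of roughly this shape. The export path shown is a placeholder; the real paths depend on the Netsocs volume layout written by the script:

```conf
# /etc/fstab on Node B (illustrative; one line per shared volume)
# _netdev delays mounting until the network is up
192.168.1.10:/opt/netsocs/volumes/example  /opt/netsocs/volumes/example  nfs  defaults,_netdev  0  0
```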
Docker stack on Node B
The Netsocs stack does not run on Node B during normal operation. Keepalived will start it automatically via notify-master.sh when Node B takes the VIP.
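The notify hook is what ties Keepalived to Docker. Below is a minimal sketch of what `notify-master.sh` does, written as a function for illustration; the script shipped in `netsocs-docker-compose` is authoritative and may differ in detail:

```shell
#!/usr/bin/env bash
# Illustrative sketch of the MASTER-transition hook. Keepalived invokes it
# when this node wins the VIP; it starts the stack and logs the event.
notify_master() {
  local compose_dir="${1:-/opt/netsocs}"
  local log="${2:-/var/log/keepalived-failover.log}"
  echo "$(date '+%F %T') became MASTER: starting Netsocs stack" >> "$log"
  ( cd "$compose_dir" && docker compose up -d ) >> "$log" 2>&1 \
    && echo "$(date '+%F %T') stack started" >> "$log" \
    || echo "$(date '+%F %T') ERROR: stack failed to start" >> "$log"
}
```

`notify-backup.sh` is the mirror image: it logs the demotion and runs `docker compose down`.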
Verification¶
After both scripts complete, run these checks.
Check Node A holds the VIP¶
On Node A:
ip addr show eth0 | grep 192.168.1.100
# The VIP should appear on the interface
Check Keepalived status¶
On either node:
systemctl status keepalived
journalctl -u keepalived -f
Check NFS mounts on Node B¶
On Node B:
df -h | grep netsocs-nfs
# Should show 8 NFS mounts
Access Netsocs¶
From any browser on your network, navigate to the VIP address:
http://192.168.1.100
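If you script your checks, a small retry loop is handy, since the stack can take a few seconds to come up after a failover. This helper is illustrative and not part of Netsocs:

```shell
#!/usr/bin/env bash
# wait_for_vip URL [TRIES]: poll the VIP until it answers HTTP, up to TRIES times.
wait_for_vip() {
  local url="$1" tries="${2:-10}" i
  for i in $(seq 1 "$tries"); do
    if curl -fsS -o /dev/null --max-time 2 "$url"; then
      echo "up after $i attempt(s)"
      return 0
    fi
    sleep 1
  done
  echo "still down after $tries attempts"
  return 1
}

# Usage: wait_for_vip http://192.168.1.100 15
```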
Failover Behavior¶
The system follows an active-passive model:
Normal state:
Node A → MASTER → holds VIP → stack running → serving traffic
Node B → BACKUP → standby → stack stopped
Node A fails:
Node B → takes VIP → notify-master.sh → docker compose up -d
Node B → MASTER → stack running → serving traffic
Node A recovers:
Node A → reclaims VIP (preempt) → notify-master.sh → docker compose up -d
Node B → releases VIP → notify-backup.sh → docker compose down
Back to normal state
Failover time is approximately 3–6 seconds.
Testing Failover Manually¶
Simulate Node A failure¶
On Node A:
systemctl stop keepalived
On Node B (within ~5 seconds):
# VIP should appear on Node B
ip addr show eth0 | grep 192.168.1.100
# Docker stack should be running
docker compose ps
# Check the failover event log
tail -20 /var/log/keepalived-failover.log
Restore Node A as master¶
On Node A:
systemctl start keepalived
# Node A reclaims the VIP automatically via preempt (~10 seconds)
On Node B:
# VIP should be gone from Node B
ip addr show | grep 192.168.1.100 # no output expected
# Stack should be stopped
docker compose ps # no containers expected
Useful Log Files¶
| Log source | Purpose |
|---|---|
| `/var/log/keepalived-failover.log` | Failover events (VIP taken / released, stack start/stop) |
| `/var/log/keepalived-health.log` | Periodic health check results |
| `journalctl -u keepalived -f` | Live Keepalived daemon log |
Common Issues¶
VIP not appearing on Node A after setup¶
Check that VRRP traffic is allowed between the nodes. VRRP is IP protocol 112, not a TCP or UDP port, so a plain `ufw allow` rule cannot match it; add iptables rules (or the equivalent in `/etc/ufw/before.rules`) instead:
# On Node A, allow VRRP from Node B
sudo iptables -I INPUT -p 112 -s 192.168.1.11 -j ACCEPT
# On Node B, allow VRRP from Node A
sudo iptables -I INPUT -p 112 -s 192.168.1.10 -j ACCEPT
NFS mounts failing on Node B¶
Verify the NFS server is running on Node A and the firewall allows ports 2049 and 111:
# On Node A
showmount -e localhost # should list 8 exports
systemctl status nfs-kernel-server
# On Node B
showmount -e 192.168.1.10 # must be reachable
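For comparison, each export on Node A is an `/etc/exports` entry of roughly the following shape. The volume path is a placeholder; `start-as-master.sh` writes the real entries:

```conf
# /etc/exports on Node A (illustrative; one line per shared volume)
/opt/netsocs/volumes/example  192.168.1.11(rw,sync,no_subtree_check,no_root_squash)
```

After editing exports by hand, apply them with `sudo exportfs -ra`.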
Node B takes the VIP immediately after setup¶
This is expected if Node A's Keepalived was not yet running when Node B started. Once both nodes are running, Node A will reclaim the VIP automatically within ~10 seconds due to the preempt setting.