Cluster Daemon - Creating an Automated Failover System [part 1]
Introducing clustd, a new project for failover systems. The goal is to provide a fully automated cluster service across multiple systems in case a system goes down for either maintenance or an unexpected event.
Background
When @steemdunk had an hour of downtime, I made this project a high priority so that downtime becomes a thing of the past. It will be used to keep the service alive with minimal impact when a server goes down.
This has been a work in progress alongside some other projects for quite a while, and it is time to publish the first part of this project. It has been rewritten a few times to get the design I wanted.
Inner workings
Connections are made through WebSockets. This simplifies connection management internally and avoids repeating the initial handshake for every client, which can get expensive on a server.
The system will maintain a persistent connection to each of its remote machines. There is active health checking for every machine. Each machine is responsible for health checking its peers. Pings are made every 1.5 seconds, allowing for a maximum of 3 seconds of downtime in a worst case scenario.
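As a rough illustration of the timing described above, here is a minimal sketch. It is not clustd's actual code: the `PeerHealth` class, its method names, and the rule that a peer counts as down once two ping intervals pass without a reply are assumptions chosen to match the 1.5 second interval and the 3 second worst case.

```typescript
// Hypothetical sketch of interval-based health checking; all names are illustrative.
const PING_INTERVAL_MS = 1500; // ping interval stated in the article
const MAX_SILENCE_MS = 2 * PING_INTERVAL_MS; // assumed: two missed intervals means down

class PeerHealth {
  // Timestamp (in ms) of the last reply received from each peer.
  private lastSeen = new Map<string, number>();

  // Record that a peer answered a ping at time `now`.
  recordPong(peer: string, now: number): void {
    this.lastSeen.set(peer, now);
  }

  // A peer is alive if it replied within the allowed silence window.
  isAlive(peer: string, now: number): boolean {
    const seen = this.lastSeen.get(peer);
    return seen !== undefined && now - seen <= MAX_SILENCE_MS;
  }

  // Peers that have been silent for longer than the window.
  downPeers(now: number): string[] {
    const down: string[] = [];
    this.lastSeen.forEach((seen, peer) => {
      if (now - seen > MAX_SILENCE_MS) down.push(peer);
    });
    return down;
  }
}
```

In a running daemon, a timer firing every `PING_INTERVAL_MS` would send the pings and then call `downPeers(Date.now())` to decide which connections to treat as failed.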
Running as both a server and a client, it can handle inbound and outbound connections for full connection duplexing. For efficiency and simplicity, each pair of machines keeps one connection open between them rather than two.
When the master gets disconnected, another machine automatically becomes the master based on a naive consensus algorithm. An incorrect configuration can cause an error when determining the next master for the cluster.
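The exact election rule is not spelled out here, so the following is only a sketch of what a naive, deterministic choice could look like. The `Machine` shape, the `electMaster` function, and the "lowest live id wins" rule are all assumptions for illustration, not clustd's real algorithm. It also shows how a bad configuration (duplicate ids) could surface as an error.

```typescript
// Hypothetical sketch: the real clustd election rule may differ.
interface Machine {
  id: number;     // assumed: a unique, operator-configured identifier
  alive: boolean; // result of the local health check
}

// Pick the next master deterministically, so every machine that sees the
// same set of live peers reaches the same answer without extra messages.
function electMaster(machines: Machine[]): Machine {
  const seen = new Set<number>();
  for (const m of machines) {
    if (seen.has(m.id)) {
      // Duplicate ids make the choice ambiguous, so fail loudly.
      throw new Error(`duplicate machine id ${m.id} in configuration`);
    }
    seen.add(m.id);
  }
  const candidates = machines.filter((m) => m.alive);
  if (candidates.length === 0) {
    throw new Error("no live machine available to become master");
  }
  // Assumed rule: the live machine with the lowest id wins.
  return candidates.reduce((best, m) => (m.id < best.id ? m : best));
}
```

A rule like this only agrees across the cluster when every machine has the same view of which peers are alive, which is one reason to call it naive.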
Security
This project does not use WSS/HTTPS; the approach is similar but not the same. The main reason is that WSS is still open to attack if you turn off certificate checking. Since the connections are usually made directly to an IP address instead of a domain, certificate checking would fail in most cases.
The encryption scheme is similar to that of SSL, with some differences. First, a handshake occurs in which the server assigns the client a random number, called a "ticket", which gets concatenated with the secret key. Second, there is a message counter that also gets concatenated with the secret key.
Together with the random IV, this provides replay protection and key rotation for each message. Attempting to replay a message results in a decryption error, and the machine's connection is deemed unreliable and closed.
The algorithm being used is AES-128-GCM. This takes care of the message MAC to ensure the message wasn't tampered with. A completely random IV is generated for each message on top of the additional key rotation protection.
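To make the scheme above more concrete, here is a minimal sketch using Node's built-in `crypto` module. It is not the project's actual code: the function names (`deriveKey`, `seal`, `unseal`), the use of SHA-256 to turn the concatenated secret, ticket, and counter into a 16-byte key, and the 12-byte IV length are all assumptions. Only AES-128-GCM, the ticket, the counter, and the random IV come from the description above.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "crypto";

// Assumed derivation: hash secret + ticket + counter and keep 16 bytes for AES-128.
function deriveKey(secret: string, ticket: number, counter: number): Buffer {
  return createHash("sha256")
    .update(`${secret}:${ticket}:${counter}`)
    .digest()
    .subarray(0, 16);
}

interface Sealed {
  iv: Buffer;   // random per-message IV
  tag: Buffer;  // GCM authentication tag (acts as the message MAC)
  data: Buffer; // ciphertext
}

// Encrypt one message with the key for this ticket and counter value.
function seal(secret: string, ticket: number, counter: number, plaintext: string): Sealed {
  const iv = randomBytes(12); // assumed length: 96 bits, the usual GCM choice
  const cipher = createCipheriv("aes-128-gcm", deriveKey(secret, ticket, counter), iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), data };
}

// Decrypt a message; throws if the key (and so the counter) or the tag is wrong.
function unseal(secret: string, ticket: number, counter: number, msg: Sealed): string {
  const decipher = createDecipheriv("aes-128-gcm", deriveKey(secret, ticket, counter), msg.iv);
  decipher.setAuthTag(msg.tag);
  return Buffer.concat([decipher.update(msg.data), decipher.final()]).toString("utf8");
}
```

Because the receiver advances its expected counter after every message, a replayed message is decrypted with a different key than the one it was sealed with, so `unseal` throws and the connection can be closed.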
Getting set up
Building
It is recommended to use Node v9. Knowledge of Node and npm is recommended but not required.
- Clone the repository: https://github.com/steemdunk/clustd.git
- Run: npm install
- Run: npx gulp build
Configuring
The configuration is noted in the README of the project:
https://github.com/steemdunk/clustd#configuration
It is possible to run the cluster on the same machine using different ports for local testing. This has no use in a production environment, however. ;)
There are some additional environment variables worth mentioning:
export DEBUG='clustd:*'
To enable debugging for the entire system.
export CLUSTD_CONFIG=./my-config.yml
To specify a configuration file. By default the configuration path is set to ./config.yml; this allows changing the path if necessary.
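The lookup order for the configuration path can be sketched as below. The `resolveConfigPath` function is a hypothetical helper, not part of clustd, and treating an empty `CLUSTD_CONFIG` value as unset is an assumption.

```typescript
// Hypothetical helper illustrating the documented default; not clustd's own code.
const DEFAULT_CONFIG_PATH = "./config.yml"; // default path stated in the article

// Use CLUSTD_CONFIG when it is set, otherwise fall back to the default path.
function resolveConfigPath(env: Record<string, string | undefined>): string {
  // Assumed: an empty value is treated the same as an unset variable.
  return env["CLUSTD_CONFIG"] || DEFAULT_CONFIG_PATH;
}
```

In a Node process this would be called as `resolveConfigPath(process.env)`.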
Running
Once everything is configured appropriately, starting it up is easy: node ./out/index.js
Sample screenshot of a 3-machine cluster, with the 3rd machine down and the 2nd machine acting as the master. Full debug is enabled and the activity is clearly visible.
Roadmap
Drivers are up next and will be covered in the next part. They will control a system (e.g. starting and stopping a service) when a server becomes the master or a secondary.
While the cluster itself is ready, it's not fully ready to be useful yet. This project is still in the alpha stages and ongoing improvements will be made as progress continues.
Check out the project: https://github.com/steemdunk/clustd
Posted on Utopian.io - Rewarding Open Source Contributors