Keeping Our Producing Nodes Safe

in #eos7 years ago (edited)


We are currently deep into simulating the rollout of the EOS mainnet. It marks a key point in the history of EOS, but for those of us in the BP community it also signifies the imminent transition from our test environments to shiny new production infrastructure.

Now that we have hit a milestone with the release of EOS v1.0.0, we have collated some of our learnings from the past few weeks regarding producer account creation and node management. It is imperative that node configuration doesn't compromise important account details, whilst still allowing a solid failover and upgrade plan that incurs little to no downtime.

NOTE: Throughout this guide we use realistic-looking public/private keys. We have done this because, when reading through some documentation, it can be confusing to work out what to substitute into placeholders. Don't worry, we haven't compromised ourselves here: every key in this article has been modified.

Creating the Producer Account

@bensig has done an awesome job of summarising good practice for account creation and registration. A few additional points to consider:

  1. A wallet does not, and should not, reside on your producing node(s).
  2. It's good practice to create separate keys: one for owner and one for active. You should put your owner key in cold storage and use your active key for day-to-day interaction with the account.
  3. Do not re-use the public key of the account used to create the producer account.
  4. If you have already created an account, you can update it. For example, if you wanted to update the active key for your blockmatrix9 producer account:
./cleos.sh set account permission blockmatrix9 active '{"threshold": 1, "keys": [{"key": "EOS7uxALstFENKDZW1pTdsc2rMoG1BUW3qxfcFMdRAjST4DyVexwu", "weight": 1}]}' owner
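Hand-typed authority JSON is easy to get wrong, since a single unbalanced bracket or stray field breaks the argument. A minimal sketch, assuming Python 3.8+ and hypothetical helper names (`build_authority`, `permission_command`), that builds the single-key authority with `json.dumps` and quotes it with `shlex.join` so the resulting command line is always well-formed:

```python
import json
import shlex


def build_authority(public_key: str, threshold: int = 1, weight: int = 1) -> str:
    """Serialise a single-key authority for `cleos set account permission`."""
    authority = {
        "threshold": threshold,
        "keys": [{"key": public_key, "weight": weight}],
    }
    return json.dumps(authority)


def permission_command(account: str, permission: str, public_key: str,
                       parent: str, cleos: str = "./cleos.sh") -> str:
    """Return the full command line, with the JSON argument safely single-quoted."""
    argv = [cleos, "set", "account", "permission", account, permission,
            build_authority(public_key), parent]
    return shlex.join(argv)


if __name__ == "__main__":
    print(permission_command(
        "blockmatrix9", "active",
        "EOS7uxALstFENKDZW1pTdsc2rMoG1BUW3qxfcFMdRAjST4DyVexwu", "owner"))
```

Generating the JSON in code rather than by hand means the brackets always balance, whatever key is substituted.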

You can now confirm that the owner and active keys differ by viewing the account:

./cleos.sh get account blockmatrix9
privileged: false
permissions:
     owner     1:    1 EOS84TXP4hkWeW7fLoqyYPe4sSTzwh12c1ELs1jhBmYX4iWDxBLX3
        active     1:    1 EOS7uxALstFENKDZW1pTdsc2rMoG1BUW3qxfcFMdRAjST4DyVexwu
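The same check can be scripted. A minimal sketch, assuming Python 3.9+ and the `cleos get account` text format shown above (other cleos versions may print it differently); `parse_permissions` and `owner_and_active_differ` are hypothetical names:

```python
import re

# Matches permission lines such as "     owner     1:    1 EOS84TX..."
# (permission name, threshold, then weight and public key).
PERMISSION_LINE = re.compile(
    r"^\s*(?P<name>[a-z0-9.]+)\s+(?P<threshold>\d+):\s+(?P<weight>\d+)\s+(?P<key>EOS\w+)\s*$"
)


def parse_permissions(output: str) -> dict[str, list[str]]:
    """Collect the public keys listed under each permission name."""
    perms: dict[str, list[str]] = {}
    for line in output.splitlines():
        m = PERMISSION_LINE.match(line)
        if m:
            perms.setdefault(m.group("name"), []).append(m.group("key"))
    return perms


def owner_and_active_differ(output: str) -> bool:
    """True only if both permissions exist and share no key."""
    perms = parse_permissions(output)
    owner = set(perms.get("owner", []))
    active = set(perms.get("active", []))
    return bool(owner) and bool(active) and owner.isdisjoint(active)
```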

Signing Keys

There has been some confusion over whether the BP signing key-pair referenced in config.ini has to be the same as the account's active key or, even worse, its owner key.

It does not: the signing key-pair is completely independent of the account key-pairs. This naturally raises some questions:

  1. How does the chain know who the producer is if the keys differ?
  2. How will the voting portals work?

These are very valid questions! Let's dive into them:

  1. This is managed through regproducer (more on this later!).
  2. They work via your producer account name - account names are unique and cannot be duplicated.
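To make the two answers concrete, here is a toy model (our illustration, not EOSIO source code), assuming Python 3.9+. Votes are keyed by account name, and regproducer only records which public key the chain should expect that account's blocks to be signed with:

```python
from dataclasses import dataclass, field


@dataclass
class ToyProducerRegistry:
    """Toy model: producer account name -> expected signing key, plus votes."""
    signing_keys: dict[str, str] = field(default_factory=dict)
    votes: dict[str, int] = field(default_factory=dict)

    def regproducer(self, account: str, signing_key: str) -> None:
        # Re-registering only replaces the expected signing key; votes are kept.
        self.signing_keys[account] = signing_key
        self.votes.setdefault(account, 0)

    def vote(self, account: str, weight: int) -> None:
        # Voters refer to the unique account name, never to a key.
        if account not in self.signing_keys:
            raise KeyError(f"{account} is not a registered producer")
        self.votes[account] += weight

    def expected_signer(self, account: str) -> str:
        return self.signing_keys[account]
```

In this model, changing the signing key leaves the vote tally attached to the account name untouched, which is why voting portals are unaffected.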

You may be wondering how to create a special signing key-pair. There is nothing special about it; you create one in the usual manner:

./cleos.sh create key

Failover and Maintenance

OK, so we now have a shiny new producer account with separate owner/active keys, plus a separate key-pair specifically for signing. The first thing to do is to add this key-pair to config.ini:

signature-provider = {{ public_key }}=KEY:{{ private_key }}

Substituting imaginary keys into the Ansible/Jinja2 template gives the following (notice that no quotes are used around the value):

signature-provider = EOS5c2MchDFM1bAnzW1q4BAn4DQnHUPGED76RvDZvQjUr2ybX75eN=KEY:5KahWm2PDGG3WTjv2QEo8bybDFpPkX8GSpPFq5Tb14feVk8Jfko
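A stray quote or a missing provider prefix is easy to miss when templating, so a quick sanity check on the rendered file can help. A minimal sketch of a hypothetical validator, assuming Python 3.9+ and the `<public-key>=<TYPE>:<data>` shape shown in the line above:

```python
def parse_signature_provider(line: str) -> tuple[str, str, str]:
    """Split `signature-provider = <public-key>=<TYPE>:<data>` into its parts."""
    option, sep, value = line.partition("=")
    if not sep or option.strip() != "signature-provider":
        raise ValueError("not a signature-provider line")
    value = value.strip()
    # The value is written without surrounding quotes.
    if value[:1] in ("'", '"'):
        raise ValueError("value must not be quoted")
    public_key, sep, spec = value.partition("=")
    if not sep:
        raise ValueError("expected <public-key>=<provider-spec>")
    provider_type, sep, data = spec.partition(":")
    if not sep or not public_key or not data:
        raise ValueError("expected <TYPE>:<data> after the public key")
    return public_key, provider_type, data
```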

Failover Without Downtime

The system dictates that only one producing node can be active at once: you can't run two producing nodes with the same producer name and signing key in parallel. However, there is nothing wrong with running two producing nodes with the same producer-name but different signature-provider values.

You might be thinking, "What's the point of that? We have a producer API to pause/resume now." It's a valid point; however, some features of this API are not yet implemented, which makes it unfit for node failover.

So what happens when you are an active producer and you have two nodes running with different signing keys? The chain is clever enough to handle this situation.

Primary Node: This is the node holding the signing key registered by the most recent regproducer call:

1278000ms thread-0   producer_plugin.cpp:1073      produce_block        ] Produced block 00016460e4e9240c... #91232 @ 2018-06-02T12:21:18.000 signed by blockmatrix9 [trxs: 2, lib: 90907, confirmed: 228]
1278500ms thread-0   producer_plugin.cpp:1073      produce_block        ] Produced block 000164618db16533... #91233 @ 2018-06-02T12:21:18.500 signed by blockmatrix9 [trxs: 0, lib: 90907, confirmed: 0]
1279001ms thread-0   producer_plugin.cpp:1073      produce_block        ] Produced block 000164627c7319d4... #91234 @ 2018-06-02T12:21:19.000 signed by blockmatrix9 [trxs: 1, lib: 90907, confirmed: 0]

Secondary Node: Simultaneously, on the node with the unregistered key:

1279509ms thread-0   producer_plugin.cpp:290       on_incoming_block    ] Received block 8c8c161497f00dc7... #91235 @ 2018-06-02T12:21:19.500 signed by blockmatrix9 [trxs: 1, lib: 90907, conf: 0, latency: 9 ms]
1279509ms thread-0   producer_plugin.cpp:751       start_block          ] Not producing block because I don't have the private key for EOS5c3MchDFM1bBnzW1h4BAn4CEnHUPGED76RvEZvPjUr4ybX75eN
1280000ms thread-0   producer_plugin.cpp:751       start_block          ] Not producing block because I don't have the private key for EOS5c3MchDFM1bBnzW1h4BAn4CEnHUPGED76RvEZvPjUr4ybX75eN

The chain handles this situation gracefully: there is no degradation in service and no penalty for running a secondary node this way.
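The behaviour in the two logs can be summarised with a simplified model (our illustration, not the producer_plugin implementation), assuming Python 3.9+: in each of the producer's slots, only a node holding the private key for the currently registered public key signs blocks. The key strings are placeholders:

```python
def producing_nodes(registered_key: str, node_keys: dict[str, set[str]]) -> list[str]:
    """Return the nodes that would sign blocks for the registered key."""
    return [node for node, keys in node_keys.items() if registered_key in keys]


def log_for(node: str, registered_key: str, node_keys: dict[str, set[str]]) -> str:
    """What a node would do in the producer's slot under this model."""
    if registered_key in node_keys[node]:
        return "produce block"
    # Same wording as the 'Not producing block' lines in the logs above.
    return f"Not producing block because I don't have the private key for {registered_key}"


# Placeholder keys: each node holds a different signing key.
NODES = {"primary": {"EOS_PRIMARY_KEY"}, "secondary": {"EOS_SECONDARY_KEY"}}

if __name__ == "__main__":
    # Before and after a regproducer call that switches the registered key.
    for registered in ("EOS_PRIMARY_KEY", "EOS_SECONDARY_KEY"):
        print(registered, "->", producing_nodes(registered, NODES))
```

Because the keys differ, exactly one node matches the registered key at any moment, so the two nodes never sign competing blocks.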

So Why Is This Useful?

Imagine a scenario where you need to take down your primary node. Whether it's for failover or maintenance, you want to promote your secondary node to primary and inform the chain of the change without risking collisions between the nodes, downtime, or missing your 6 seconds of production in the currently active round.
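For reference, the 6-second window follows from the production schedule: each active producer signs 12 consecutive blocks at the 500 ms block interval visible in the log timestamps above, and with 21 active producers a full round lasts:

```latex
t_{\text{slot}} = 12 \times 0.5\,\text{s} = 6\,\text{s}, \qquad
t_{\text{round}} = 21 \times t_{\text{slot}} = 126\,\text{s}
```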

Because the signing keys are not tied to your producer account's keys, you can do this without changing config.ini and without restarting your node.

Simply call regproducer for your producer account, referencing the signing key contained in the signature-provider config of your secondary:

./cleos.sh system regproducer blockmatrix9 EOS84TXP4hkWoW7fLoqyYPe4sWTzwh12c1EKs1jhBmYX4iWDxBLX3 "https://blockmatrix.network" -p blockmatrix9
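If you want to script the promotion, a minimal sketch follows. It assumes Python 3.8+, the `./cleos.sh` wrapper used throughout this article, and that it runs on a separate machine holding the unlocked wallet (never on a producing node). `regproducer_argv` and `promote` are hypothetical names, and `promote` defaults to a dry run that only returns the command:

```python
import shlex
import subprocess


def regproducer_argv(account: str, signing_key: str, url: str,
                     cleos: str = "./cleos.sh") -> list[str]:
    """Build the regproducer command shown above as an argument list."""
    return [cleos, "system", "regproducer", account, signing_key, url, "-p", account]


def promote(account: str, signing_key: str, url: str, dry_run: bool = True) -> str:
    """Register `signing_key` for `account`; returns the command for logging."""
    argv = regproducer_argv(account, signing_key, url)
    if not dry_run:
        # check=True raises CalledProcessError if cleos exits non-zero.
        subprocess.run(argv, check=True)
    return shlex.join(argv)
```

Passing an argument list to `subprocess.run` avoids shell quoting issues; the joined string is only for logging.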

Now we can check the logs to see what's happened.

Primary Node: On the old primary node, the logs now show the private key rejection messages:

1079547ms thread-0   producer_plugin.cpp:290       on_incoming_block    ] Received block 95e63e835e1df45c... #97985 @ 2018-06-02T13:17:59.500 signed by blockmatrix9 [trxs: 4, lib: 97650, conf: 0, latency: 47 ms]
1079547ms thread-0   producer_plugin.cpp:751       start_block          ] Not producing block because I don't have the private key for EOS5c2LchDFM1bSnzW1h5BAn4CQnHUPHED76RvDZvPjUr4ybX75eN
1080000ms thread-0   producer_plugin.cpp:751       start_block          ] Not producing block because I don't have the private key for EOS5c2LchDFM1bSnzW1h5BAn4CQnHUPHED76RvDZvPjUr4ybX75eN

Secondary Node: The old secondary is now the new primary, and happily signs the blocks:

1080500ms thread-0   producer_plugin.cpp:1073      produce_block        ] Produced block 00017ec3d86eef5d... #97987 @ 2018-06-02T13:18:00.500 signed by blockmatrix9 [trxs: 3, lib: 97662, confirmed: 0]
1081001ms thread-0   producer_plugin.cpp:1073      produce_block        ] Produced block 00017ec42ec57e08... #97988 @ 2018-06-02T13:18:01.000 signed by blockmatrix9 [trxs: 3, lib: 97662, confirmed: 0]
1081501ms thread-0   producer_plugin.cpp:1073      produce_block        ] Produced block 00017ec5e8824bcf... #97989 @ 2018-06-02T13:18:01.500 signed by blockmatrix9 [trxs: 5, lib: 97662, confirmed: 0]
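To confirm a switch from the logs, and to catch the case you most want to avoid (two nodes claiming the same block number), the log lines can be parsed. A minimal sketch, assuming Python 3.9+ and the v1.0.0 log format shown above; the function names are hypothetical:

```python
import re
from collections import defaultdict

# Patterns for the two producer_plugin messages shown above.
PRODUCED = re.compile(
    r"Produced block (?P<id>[0-9a-f]+)\.\.\. #(?P<num>\d+) @ \S+ signed by (?P<producer>[a-z0-9.]+)"
)
NO_KEY = re.compile(
    r"Not producing block because I don't have the private key for (?P<key>EOS\w+)"
)


def summarise(lines: list[str]) -> tuple[list[int], set[str]]:
    """Return (block numbers this node produced, keys it reported missing)."""
    produced = [int(m["num"]) for line in lines if (m := PRODUCED.search(line))]
    missing = {m["key"] for line in lines if (m := NO_KEY.search(line))}
    return produced, missing


def find_double_production(node_logs: dict[str, list[str]]) -> dict[int, list[str]]:
    """Map each block number claimed by more than one node to those nodes."""
    claims: defaultdict[int, list[str]] = defaultdict(list)
    for node, lines in node_logs.items():
        for num in summarise(lines)[0]:
            claims[num].append(node)
    return {num: nodes for num, nodes in claims.items() if len(nodes) > 1}
```

An empty result from `find_double_production` across both nodes' logs is what a clean switchover looks like.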

Testing

We have successfully tested this methodology across multiple versions of the EOS software, including the recent v1.0.0, on both private and public testnets without any issues.

What's Next?

This method opens up various opportunities. I would certainly recommend cycling the signing keys on a regular basis; this can be automated so that it requires no manual intervention.
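One way to read "cycling the signing keys" is to rotate the registered key between nodes on a schedule. A hedged sketch of the planning step only, assuming Python 3.9+ and hypothetical node names; each promotion would still be a regproducer call like the one shown earlier, and you would confirm the newly promoted node is producing before the next rotation:

```python
def rotation_plan(node_keys: dict[str, str], current_node: str,
                  steps: int) -> list[tuple[str, str]]:
    """Return the next `steps` (node, signing_key) promotions, round-robin.

    node_keys maps each node name to the public key in its
    signature-provider config; current_node is the node registered now.
    """
    if len(node_keys) < 2:
        raise ValueError("rotation needs at least two nodes")
    # Two nodes sharing a signing key is exactly the setup to avoid.
    if len(set(node_keys.values())) != len(node_keys):
        raise ValueError("every node must have its own signing key")
    names = list(node_keys)
    start = names.index(current_node) + 1
    plan = []
    for i in range(steps):
        node = names[(start + i) % len(names)]
        plan.append((node, node_keys[node]))
    return plan
```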

There is a new experimental feature for Out-of-process Block Signing Private Keys, which is a promising sign of things to come: it looks like we will eventually have support for Hardware Security Modules.

Thanks

We'd like to thank the BP community for all the collaboration, idea sharing and collective passion to keep pushing each other to raise our standards. It's an awesome project and it's amazing to be part of it!


Many thanks for sharing.
However, what about a network topology that also helps with "Keeping Our Producing Nodes Safe"? For example, not exposing the producing nodes directly. I actually wrote about network topology at https://eosio.stackexchange.com/questions/345/how-to-configure-block-producer-to-have-front-end-and-back-end-cluster-nodes, and I hope that you will review it and participate in the discussion.

Oh absolutely, this would be a whole new article in itself! Your topology is quite close to what we have deployed; as you say, it's very important to control access to the producers. I will follow up in the near future.

Thanks for sharing, BlockMatrix!

Great article, thank you for putting it together!

Useful information! Thanks for sharing :)
