Steem Blockchain Patch Issued
The Steem blockchain recently stalled at block 23847548 due to an invalid transaction that was allowed into a previous block. The Steemit development team quickly identified the cause and issued a patch, which was then deployed by a majority of the witnesses. At this time, the Steem blockchain has resumed normal operation. At no point during the event were user accounts or tokens at risk.
It is important to emphasize that what took place was actually the result of a protection mechanism built into the blockchain preventing the invalid transaction from doing any real harm. While it is unfortunate that operations were suspended during this time, it is these protection mechanisms that help ensure that accounts remain safe and secure even in the face of unforeseen events.
Cause
Seven days ago, an account (@nijeah) attempted to submit a transaction that would have resulted in a negative balance of STEEM being powered down from their account. The blockchain has safety rules that forbid such a transaction from occurring, but these rules did not forbid the invalid transaction from being submitted, even though its attempted execution--which would occur seven days later--would not be allowed to occur by the blockchain.
When the scheduled power down occurred, witness nodes were unable to process the transaction--and all subsequent transactions--due to the aforementioned rules. This is what we refer to as “halting” (as opposed to something like “forking”). A code change was needed in order to define how to properly handle this behavior.
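To make the gap concrete, here is a minimal sketch of the idea, not the actual steemd code. The names `Account`, `validate_power_down`, and `submit_power_down` are hypothetical, and the amounts are plain integers for illustration. It shows how checking the amount at submission time keeps an invalid power down out of a block entirely, so nothing is left to fail when the scheduled execution arrives.

```python
from dataclasses import dataclass, field


class InvalidOperation(Exception):
    """Raised when an operation fails validation at submission time."""


@dataclass
class Account:
    # Hypothetical model of an account; not the real chain data structure.
    name: str
    vesting_balance: int  # illustrative integer units, not real STEEM precision
    pending_power_downs: list = field(default_factory=list)


def validate_power_down(account: Account, amount: int) -> None:
    """Reject amounts that could never execute successfully later."""
    if amount < 0:
        # The kind of check that was missing at submission time.
        raise InvalidOperation("power down amount must not be negative")
    if amount > account.vesting_balance:
        raise InvalidOperation("cannot power down more than the account holds")


def submit_power_down(account: Account, amount: int) -> bool:
    """Schedule a power down only if it passes validation.

    Returns True when the operation is accepted and False when it is
    rejected before it can be scheduled.
    """
    try:
        validate_power_down(account, amount)
    except InvalidOperation:
        return False
    account.pending_power_downs.append(amount)
    return True
```

With this shape, a negative request is simply refused up front and the list of scheduled power downs only ever contains amounts that can be processed.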
Solution
The Steemit development team, with the assistance of several of the witnesses, was able to quickly identify the root cause of the problem. As soon as the cause was identified, a patch was issued and the rollout of the patch was coordinated with the top witnesses.
Within only a few hours of the issue occurring, the patch was applied by a majority of the witnesses, and the Steem blockchain resumed normal operation.
Instructions for node operators
This section contains instructions for node operators who still need to apply the patch.
All nodes running 0.19.3 should update to release version 0.19.5 to start receiving blocks again. The patch will not require a replay.
If you were running the AppBase release candidate (0.19.4), a new release candidate (0.19.10) will be made shortly. Alternatively, you can run the branch 20180702-fix-vesting-withdrawals-steemd to get the patch now.
Impact
Any transactions that were submitted during the time the blockchain was halted would have resulted in an error. Some pending transactions that were submitted just prior to the halt may not have been included in blocks, and would have expired. Affected transactions would need to be resubmitted, as they would not have been included in a block and are no longer valid.
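As a rough illustration only, the sketch below models the rule described above: a transaction that was not included in a block before its expiration time is no longer valid and has to be submitted again. The `PendingTransaction` type, its fields, and `needs_resubmission` are hypothetical names for this example and are not part of any Steem library.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PendingTransaction:
    # Hypothetical record of a transaction that a client has broadcast.
    tx_id: str
    expiration: datetime
    included_in_block: bool = False


def needs_resubmission(tx: PendingTransaction, now: datetime) -> bool:
    """Return True if the transaction expired without being included in a block."""
    return (not tx.included_in_block) and now >= tx.expiration
```

A client could run a check like this over its recently broadcast transactions after the chain resumes, and resubmit only the ones that expired without being included.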
Other than the period of time where no new transactions were allowed, there was no additional impact from the event. Everybody’s tokens remained safe, and accounts were not at risk of being hacked.
Conclusion
We want to thank everybody involved for their responsiveness during the event. It is a great testament to our amazing blockchain team and Steem witnesses that we were able to get the blockchain back to operational status in such a short period of time.
Great job to everyone involved! Steem on!
Team Steemit
A wild night, excellent teamwork, and a quick summary and explanation. While halting can be scary, it's a clear and effective way to prevent transactions that could have a huge impact on funds and security.
I'd like to extend a huge, huge thank you to everyone involved in both helping users understand to hold tight and that the chain remained uncompromised while working to have nodes ready to resume, but even more so...
hefty appreciation to those up all night who may not look for or be individually rewarded with personal recognition for the hours of intense coordination and professionalism required to go from full stop to back on track in so little time. Thanks, truly.
I didn't even know about this one. The devs and the witnesses involved acted so fast to implement and run this patch, which is definitely amazing!
What makes me curious is the fact that nobody tried to power down more SP than they had, at least not until now. This is one of the reasons why Steem is still in beta and actually we are the beta testers.
So, somehow, even though he has done a bad thing, I guess that we should congratulate @nijeah or whoever is behind that account for highlighting this vulnerability in the Steem code-base. It is definitely better now than later :D
Powering down more SP than you have was always checked and rejected immediately. In this case the missing check was for "negative power down" (which could also be described as attempting to use the power down command to power up). No one had been creative enough to try that yet!
Okay, I got it now, pretty intelligent, I must admit! So if I send to somebody -2 Steem, that person is actually sending me 2 Steem :))
That was a tricky one!
Damn... that was possible up until a few days ago?
Guess we have to thank @nijeah for "finding" this bug!!
Noow I get it :)
After giving it a bit of thought, I would guess that @nijeah delegated his/her steem power to another account at the same time they powered down their Steem Power, done from two different browser tabs.
One witness could have processed the Steem Power Delegation, while the next block processed by a different witness handled the Power Down before the previous block was confirmed.
I'm even more confident now that the Steem network can handle any possible "monkey wrench" that may be thrown into the mix. Great teamwork !!!
This is precisely why Proof-of-(mis)Stake is flawed.
Can you imagine any other system going down that processes monetary value for a few hours retaining its userbase?
Sure, they fixed it -- but it took a lot of manual intervention. Doesn't inspire confidence in sanity checks and consensus mechanisms.
I don’t know if it is accurate to say that this was an issue related to the DPOS algorithm. Also, the practice of stopping operations if an unexpected scenario is triggered is pretty standard - afaik all of the major cryptocurrency exchanges have similar mechanisms in place.
When the ledger and associated funds could be compromised by potentials like double spending or printing out of thin air, I do think one of the best and most reasonable responses is a temporary network stoppage that does not require the complex ethical consideration that undoing, forking out, or changing transactions would require on top of important code/patching work.
It's called lunch hour at my bank...
heaven help me, I chortled.
just a regular guy ..having a regular pizza ..in a regular pizza pouch
#DontJudgeme
This is now associated in my mind with bankers - thank you.
omg, i need this in my life
Can we resteem a comment? My 100% vote isn't enough to convey the lols you gave me.
Appreciate that...you can resteem any of my latest posts
:)
Actually yes, this happened twice this year at my bank. Operations were halted for 48 hours the first time and 6 hours the second time. The whole bloody bank stopped working for 2 days while IT people were scrambling to find and correct the problem. And still the bank retained its userbase, because people are lazy and the bank compensates people who can substantiate claims that they had losses (because they couldn't buy or sell a financial instrument or repay a debt that was due, etc.)
Really? Banks and credit card companies experience security vulnerabilities frequently. The difference is if you have your keys here then just relax and steem on.
u did good crimmy. thanks for keeping all of us in palnet up to date with what was happening.
@crimsonclad can this happen again?
It cannot. The patch has ended this exploit, and put a check in place to reject the transaction instead of freezing the chain! The fact that a patch was developed, tested, and applied, the chain restarted, and a rolling upgrade across the network begun, all in less than twelve hours, is pretty amazing. A lot of great people stayed up all night and worked hard behind the scenes to make sure this loophole was closed before it could harm anyone or chain function again.
Thanks for your answer!
Thank you for this update, and well done to everyone who took part in getting the blockchain back to normal operations.
With that said, I would like to submit for your consideration — because "stoppages" of one kind or another make people nervous — that you implement some kind of communication method for the general user base when things are out of sorts.
Most major multi-user sites have cloud-hosted "fallback sites" that operate completely independently of the main venue. This could be something like a separate "steemstatus.org" domain fully disconnected from Steemit and the blockchain. If something "goes awry," every request instead forwards to the contingency site (can be triggered automatically, or manually, depending on situation) where a live feed (blog style, message board style) provides anyone trying to access the main venue with a live news feed, or at least an "outage message."
eBay, for example, is really good about that. In "our" industry, Coinbase has it. It builds confidence in a system if users — rather than just finding darkpages and error messages — land on a page that simply has a message "A faulty transaction has caused a temporary stoppage of the blockchain. Our technicians are aware of the problem and are currently working on implementing a patch to address the problem. Check this site for updates."
It's a relatively minor thing; could even be run from a simple WordPress blog... but the communication would build a lot of confidence in the community that "someone's working on it."
Just a suggestion from a relatively small newbie (albeit with 40 years in the IT field); hope you'll consider it.
=^..^=
Great suggestion.
and this is why ure my fave cat on steemit... well, other than my kitty :D
Thanks for that idea!
Indeed, a really good idea.
Particularly important as the new account creation features under HF.20 might increase the growth of memberships.
Yes, this type of communication is needed. Otherwise rumors run wild, and that's never a good thing.
Funds are safu.
Thanks for the detailed information on this odd incident. I'm proud of and very grateful for the robust response of all the devs and witnesses to get this patched as soon as possible. I'm left with an even greater confidence in the integrity of the STEEM blockchain.
P.S. Off topic (and nitpicking), meet my good friend the em dash: (—) It looks more streamlined and professional than using two hyphens where you want long dashes.
A minor detail, maybe... but somebody had to be that guy ;-)
That's why I love Steemit. The Team is always ready for any problem. Keep it up Team!!
I was actually pretty impressed with how well most people handled it! Most were calm and waiting for more information.
Nice job by all of handling an issue and moving forward!
Agreed! I feel we're all growing together. We forget that 2 years ago this was all just a theoretical experiment spearheaded by a bunch of weirdos who thought we could reward social media and content in a totally new way. Who would have thought such a rag-tag group of contrarians could become such a unified and powerful community?!
Maybe they were unable to write about their concerns because the blockchain stalled.
There were some tweets from the @steemit account actually
Yes, but it would be better to have some sort of communication presented on the main site. I know it's not a priority and Steemit is not eBay and not even Coinbase. For now it could be a page directing to the @steemit Twitter feed for realtime details.
In perspective, it would be better if it would be an interface that any Dapp can point to when there's an issue on the blockchain.
It is a good idea. Non-trivial to implement though, even though it sounds “easy”.
I know... That's why I suggested a temporary solution limited to Steemit and a link to their Twitter feed for updates.
I'm an engineer and I worked in the aerospace industry. When you have a system that has a failure, there is always a "root cause and corrective action". I see that a root cause was identified as an unknown vulnerability that existed. The exploitation of that vulnerability didn't affect the accounts, but was effective at shutting down the blockchain.
The missing piece of the explanation is the "corrective action". I see that the fix was put into place, but that is not a corrective action. A corrective action would address why a vulnerability existed for so long and was discovered and exploited by a copy and paste scammer. I know that code has bugs and that it can be difficult to discover every possible vulnerability, but take it from an aerospace engineer, you can go a long way toward error-proofing software.
Besides the corrective action, is there a bounty for finding bugs?
Speaking of bounty. Binance has a bounty fund to pay for those who provide information which brings hackers to justice. Will Steemit Inc do something similar?
Thank you, you are truly amazing. My witness node is currently re-indexing.. but this takes hours. I notice the CPU utilization is not much (basically single core only). Can we parallelize the re-indexing process (at least partially) so it will speed up the process?
I believe there has been some work done on that with AppBase.
Thank you for the detailed information on this strange incident. I am proud of and very grateful for the robust response from all the developers and witnesses in getting this fix out as soon as possible. I am even more confident in the integrity of the STEEM blockchain.
Will there be a time where transactions such as this one will be stopped in their tracks before this 7 day period goes by?
Also, did the user do that intentionally or was there something else involved?
The change that was applied will stop these now before they enter a block.
Ah, the sound of progress.