Steem Status: 100% Operational
Hello Steemians, we apologize for the interruptions in service you experienced yesterday which are now resolved. Thanks to the blockchain’s built-in safety measures, at no time were funds at risk. The interruptions were due to an issue that arose out of the process of transitioning to Hardfork 20. In our last post we announced that we published our Hardfork 20 release and that witnesses and exchanges should begin running version 0.20.0 of steemd at their earliest convenience. When a witness runs the new code, it signals that they are voting in favor of those changes. If a super-majority of witnesses are running Hardfork 20 on September 25th, the blockchain will fork onto the new chain.
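The activation rule described above (witnesses signal by running the new version, and the fork only happens on the scheduled date if enough of them do) can be sketched roughly as follows. This is a minimal illustration, not steemd's actual logic: the `Witness` type, the `hardfork_should_activate` function, the threshold of 17 out of 21 and the example timestamps are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Witness:
    # Hypothetical record: a witness name and the steemd version it runs.
    name: str
    running_version: tuple  # e.g. (0, 20, 0)


def hardfork_should_activate(witnesses, new_version, now, hardfork_time, required):
    """Return True only if the scheduled time has passed AND enough
    witnesses are signalling support by running the new version."""
    signalling = sum(1 for w in witnesses if w.running_version >= new_version)
    return now >= hardfork_time and signalling >= required


# Example values only; the threshold and the dates are assumptions.
witnesses = [Witness(f"new{i}", (0, 20, 0)) for i in range(17)]
witnesses += [Witness(f"old{i}", (0, 19, 0)) for i in range(4)]
fork_time = datetime(2018, 9, 25, tzinfo=timezone.utc)

print(hardfork_should_activate(
    witnesses, (0, 20, 0), datetime(2018, 9, 24, tzinfo=timezone.utc), fork_time, 17))
print(hardfork_should_activate(
    witnesses, (0, 20, 0), fork_time, fork_time, 17))
```

Before the scheduled time the check fails no matter how many witnesses have upgraded, which is why both versions are expected to keep producing compatible blocks on the same chain until then.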
Blockchain Halting
Until the date of the Hardfork, both versions produce blocks on the same chain. This gives the witnesses time to test the code as well as cast their vote for the new changes by running the new version. Yesterday, that chain unexpectedly forked when the 0.20.0 nodes produced blocks that did not match the consensus of nodes running version 19 of steemd. This activated one of the many safeguards built into the blockchain which pause block production due to the presence of some unforeseen error. These safeguards guaranteed that no important information, including token balances, was at risk.
The Problem
The specific cause of the problem was a bug in the HF20 code. A constant was changed that should not have been, and this caused the initial fork. Your witnesses did a fantastic job of quickly spotting the suspicious behavior and began reverting to a stable build of the Steem blockchain. This exposed another bug in our fork database logic, one that had gone unnoticed because this logic is so rarely activated, and it caused the minority fork to halt as well.
Our blockchain developers worked extremely hard to identify the problems quickly and issue a patch that would enable the witnesses to resume block production. They assessed that the 19 fork (version 0.19.X of steemd) was not the problem and advised the witnesses to revert to that version and restart their nodes. Nineteen of the witnesses responded by restarting their nodes, returning stability to the network. Steemit Inc.’s production nodes are now fully functioning, as is steemit.com.
Next Steps
We have already performed a post-mortem on the event and have drafted an attack plan for developing the tools and resources which will ensure that events like this will not happen again. But before we can get to work on those, we’ll be spending the rest of today investigating the fork database code in order to ensure that it’s prepared for the hardfork date of September 25th.
The Cost of Innovation
When we launched the Steem blockchain it was with the goal of creating a protocol that could power real applications that real people use every day. That goal, along with the governance mechanism we chose, has enabled us‒as a community‒to understand what changes need to be made and act on that information by upgrading the software through hardforks. So far we have successfully hardforked 19 times in just 2 years.
This rapid rate of iteration is one of Steem’s uniquely valuable attributes and is virtually unheard of in the blockchain space, which itself is still in a highly experimental phase of development. The cost of this rapid innovation is incidents like the one that occurred yesterday. The fact that so many people were unable to use the apps they love is a testament to how successful Steem has been at fostering the development of functional applications that real people use every single day. Yesterday also demonstrated, once again, that our community has the ability to work together to handle these issues so that we can continue to push blockchain technology forward at a more rapid pace.
Thank You
We want to express our gratitude to Steem’s amazing community of witnesses, developers, and users. Blockchain-halting events are pretty much as bad as it gets for a blockchain, but everyone worked together to ensure service was returned as rapidly as possible. It’s a testament to the power of this community that such an event was handled so effectively. Many people outside of Steemit Inc.‒especially the witnesses‒shared vital information that enabled us to respond more rapidly, and return the platform to proper operation.
You don’t lose if you get knocked down; you lose if you stay down. - Muhammad Ali
Steemit Team
For details on all changes in HF20:
https://steemit.com/steem/@steemitblog/steem-velocity-hardfork-hardfork-20
Not all is fixed
As far as I can tell, the nodes that exchanges use are down. This means no deposits and withdrawals. One can send STEEM to the exchange, but the exchanges won't know and users' STEEM won't show up on the exchange.
Here is a list of Broken Exchanges
What is Steemit Inc doing about this problem?
https://market.rudex.org/ works well, with 0% trading fee for STEEM/SBD btw
In a course I took about creating product loyalty, we were taught that having a problem and handling it well often creates more loyalty than not having a problem at all.
Yesterday was a showcase in how many people are engaged and interested in the site, even those who voice frustration were concerned and interested in the outcome.
I know I could be over communicating this message, but I really do see improvement in the communication coming from SteemIt, Inc.
Thank you for the information.
As a point of improvement, I believe https://steemit.com/ should at least have displayed some information during such an outage.
All the more reason why top 20's should be developers, or have someone on their team who is! Nice work everyone!
Is that the same as someone who deliberately creates a problem, and then handles it well?
(thinking cap on)... in history, who do we know who does that? ::chuckle::
Actually, when you are breaking new ground and developing new industries, mistakes happen. You deal with them, learn, and move on.
That is not the same thing at all as purposely creating/pointing out problems and pretending to fix them. Which is also a strategy used by many. Especially politicians and companies that profit from fixing things. :) The impact is close though.
Politicians. Hmm... I was waiting for that to appear. Yes. I agree with you.
Thank you Steemit, for making my point with every blockchain seizure.
I can't wait for the other "edge cases" to manifest themselves as more "functionality" is bolted on to the shuddering PoS chain.
I know, I know - everyone loves Steemit and anything to the contrary is blasphemy.
I'll see myself out.
It isn't blasphemy at all. It is just really weird to hang around the place if you don't like it and don't trust it and aren't willing to assume the risks.
I am often confused. I don't like myspace, but I never, ever, log in to tell them what I don't like about it.
Seriously? I hang out at myspace every day and tell them what they are doing wrong. Hopefully, one day they will get the message!
I see someone is emotionally invested.
If you can't handle some criticism of Steemit, then what are you going to do when it seizes up like a rusty lawnmower next time?
I'm sure it will be instructive.
lol
haha
Why not enjoy the decline? Being proven right is its own reward, after all.
Also, why would you care? Isn't Steemit glorious and amazing - so much so that you don't have time to reply to me? What? It isn't? Well, whaddaya know...
I reply to most of my comments, not all. check it.
While we'd rather the incident hadn't happened, such events do show off the strength of the ecosystem. I think people use these incidents to gauge the degree of antifragility within a system. They are demonstrations of its ability to grow stronger from volatility. Good communications can highlight this aspect of the system.
The engagement and interest you mention is, IMO, the secret sauce of any such system. So many projects out there think that the solution to all of their problems is algorithmic-first, and that if they can just design the right algorithms, that will solve all of their problems, when in reality networks of people solve problems. The most elegantly designed "on-chain governance" will crumble if no one uses the software and likely will collapse when they do use it because then it will be forced to interact with unpredictable and irrational (not in a bad way) human actors. The only way to develop the right algorithms, build the right software, and establish functional governance mechanisms is to build a network of engaged and interested people who are constantly kicking the tires, providing feedback, and influencing change in a positive direction. We don't just have that; the Steem community is years ahead of the competition. People often make the mistake of thinking that Steem should be defined as what it has been in the past, when the real power of Steem is what we will work together to make it in the future.
Bitcoin has also had one major outage which needed manual intervention from the miners - https://github.com/bitcoin/bips/blob/master/bip-0050.mediawiki
Bitcoin used to have a centralized alert system, where urgent messages could be broadcast over the Bitcoin network by anyone having the private keys for the alert system. This alert system was later removed from the Bitcoin Core client. I think maybe it would make sense to set up some alert system where any of the top 20 active witnesses could send an alert. I don't believe false alerts would be a big problem, since a false alert would (hopefully) cause the witness to lose votes.
Thanks for the feedback. I don't think witness responsiveness has ever been, or was in this case, an issue but it certainly might be worth considering. It's important to remember that any time that is spent developing one solution is time not spent developing another. As Steve Jobs famously said, "People think focus means saying yes to the thing you've got to focus on. But that's not what it means at all. It means saying no to the hundred other good ideas that there are."
I wish the success of the platform! Hoping you release more youtube videos Andrew; you're a great spokesman.
Thanks! I'll work on it!
True. The alert system is probably not worth pursuing - but I do believe it's rather important that nodes stay up and serve read-only requests even if block production is halted.
I agree!!
It was nice having the Steemit Twitter page to stay up to date on the status!
Thanks for mentioning the twitter page, I was unaware
I was happy that the problem did arise. We are the only blockchain that encounters problems because a) there are so many users using the blockchain,
b) stumbling on bugs is inevitable at the rate of upgrades and forks we are experiencing.
It's something that we can in fact be proud of. As you said, it's better than not facing any problem at all.
Well said boss
It seems like all the nodes went offline and all interfaces to Steem (like steemit.com) got broken. Wouldn't it be a lot better if the nodes could continue serving content up until the "uncle block", and that the interfaces could issue a status message, something like "blockchain temporary frozen due to technical problems - posting and voting won't work"?
imho, it may be a lot more complicated than that, but yeah sure there could be more sophistication but that requires prioritization and surely more available quality resources and that will all happen in due time
It's not the first time the network has frozen, and I believe it won't be the last either. As I see it, this is a major trust issue - all of a sudden, busy, steemit, steempeak, eSteem, etc. all stop working - and people are even concerned that their steem funds may be lost.
I haven't studied the software, but it should not be a great problem to change the node behaviour from "shutdown" to "stop producing blocks and ignore all newly produced blocks".
Showing the end-users nice error messages explaining that the network has gone down into a "fail-safe read-only-mode while some technical problem is investigated" is of lesser importance, but that's a job done by busy.org, steemit.com, steempeak.com, eSteem, etc.
i'm with you, thanks, but your middle paragraph is imho not that simple or even sensible, because imho if a catastrophic error occurs (which it did) NO activity should be allowed, ALL user activity should cease hence shutdown
In my opinion, no matter how catastrophic the error is, anyone ought to be allowed to inspect the historical blockchain, that includes browsing old articles and replies on sites like steemit.com, as well as checking the wallet. This is important for user confidence, and also for search engines, etc.
As no new blocks are produced, it will of course be impossible with any actions that are to be recorded in the blockchain, that includes placing orders on the internal market, voting, writing replies or posts, etc. Ideally some alert should be shown even before the reader considers voting on anything, but a generic "an error has occurred" when trying to vote on something should eventually suffice. I believe the latter is roughly how steemit.com works today if an action cannot be recorded in the blockchain for whatever reason; one gets some error message in red text.
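A minimal sketch of the "fail-safe read-only mode" discussed in this thread: a halted node keeps serving historical blocks, ignores newly produced blocks, and rejects anything that would have to be recorded on chain. The `ReadOnlyNode` class, its methods and the `ChainHalted` error are all hypothetical names for illustration; nothing here is taken from steemd.

```python
class ChainHalted(Exception):
    """Raised when a write is attempted while block production is paused."""


class ReadOnlyNode:
    # Hypothetical in-memory node; blocks are numbered from 1.
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.halted = False

    def halt(self, last_good_block_num):
        # Keep history up to the last agreed block and enter read-only mode.
        self.blocks = self.blocks[:last_good_block_num]
        self.halted = True

    def head_block_num(self):
        return len(self.blocks)

    def get_block(self, num):
        # Reads keep working whether or not the node is halted.
        if not 1 <= num <= len(self.blocks):
            raise IndexError(f"no block {num}")
        return self.blocks[num - 1]

    def push_block(self, block):
        # While halted, newly produced blocks are ignored rather than applied.
        if self.halted:
            return False
        self.blocks.append(block)
        return True

    def broadcast(self, operation):
        # Votes, posts and market orders cannot be recorded while halted.
        if self.halted:
            raise ChainHalted("blockchain temporarily frozen - posting and voting won't work")
        return "accepted"


node = ReadOnlyNode(["block-1", "block-2", "block-3"])
node.halt(last_good_block_num=2)
print(node.get_block(2))
try:
    node.broadcast({"type": "vote"})
except ChainHalted as err:
    print(err)
```

A front end could catch the `ChainHalted` error and turn it into the status banner suggested earlier, while old posts and wallet balances stay browsable.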
agreed, it will possibly all happen when STEEMIT is declared out of beta and big investors allow for / deliver the needed quality resources - hopefully sooner than later!
I actually think this was a very good thing to have happen.
We have a lot of smart people around here and to read how all came together to offset a potentially devastating event provides a great deal of confidence.
Bugs are a matter of course in the programming world. Having them is not the problem, what is done once they are found is. It was reassuring to see the team get the chain going in a rather quick period of time. Down for a number of hours seems like a lot but think about how long Microsoft leaves bugs operating.
The fact that many, including myself, were lost without STEEM shows how committed we are as a community. This is something that might be unmatched in this arena at this point.
Until reading this post, and the comments, I didn't realize what a good thing it was for the site to crash. Now I know that crashes like that are healthy, and we should probably have them at least once a week.
Well observed, @vantocan! Probably you are a European. We analyze every word along its full meaning and also read the white space in between the lines of sentences. Not long ago, the months of delay of the SMT were also celebrated as a big victory by the heroes of Steemit Inc., NY. That's fault management as usual in PR, unfortunately not at a real intellectual eye level for Europeans. Every PR sentence from NY sounds like parents talking to their children. It's a matter of culture.
You don't have to wonder much that your intelligence feels a little bit insulted. It is nothing serious at all, just public relations!
I am so happy to be a part of this wonderful community! :)
I was waiting for some report to find out about the events of yesterday, excellent work.
It's good you got it fixed. So the hardfork date was only a target? You were basically trying to hardfork early?
No, the hardfork will be activated on September 25 when a majority of the witnesses are running the new version of the software, but there was a mistake in the part that was supposed to be compatible with the current version.
Installing the new software at the latest moment and activating it immediately would create more chaos.