RE: [COMPLETE] Maintenance of chainBB - indexes missing
The user data, posts and replies are all stored in Steem, but I maintain the forum indexes in an external database for raw speed and scalability.
Steem also doesn't offer APIs for retrieving content the way the forums need it. The forums pull posts by category (steemd pulls by tag) and then sort them by when the last response occurred. It would be possible to build a custom plugin for steemd that achieved this, but that's a lot harder than using more traditional web technologies.
There's also the small matter of a steemd full node requiring upwards of 40GB of RAM to run smoothly, whereas this forum index I'm running requires around 2GB. Adding a custom plugin to steemd could reduce the 40GB, but probably not by a lot, as most of the normal plugins would still be required to maintain the proper data.
It's just not possible to efficiently pull the data needed with steemd at this point, so the indexes are in a database :)
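To make the "pull by category, sort by last response" idea concrete, here is a minimal in-memory sketch in Python. It is only an illustration, not the actual chainBB code: the `Thread` and `ForumIndex` names and their fields are made up for this example, and a real index would live in a database rather than in a dict.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Thread:
    # Hypothetical record shape, chosen for this example only.
    author: str
    permlink: str
    category: str
    last_activity: datetime  # creation time until the first reply arrives


class ForumIndex:
    """Toy forum index: threads grouped by category, ordered by last activity."""

    def __init__(self):
        self._threads = {}                    # (author, permlink) -> Thread
        self._by_category = defaultdict(set)  # category -> set of thread keys

    def add_thread(self, author, permlink, category, created):
        key = (author, permlink)
        self._threads[key] = Thread(author, permlink, category, created)
        self._by_category[category].add(key)

    def add_reply(self, root_author, root_permlink, replied_at):
        # A reply bumps its root thread; replies to unknown threads are ignored.
        thread = self._threads.get((root_author, root_permlink))
        if thread is not None and replied_at > thread.last_activity:
            thread.last_activity = replied_at

    def list_category(self, category, limit=20):
        # Most recently active threads first.
        keys = self._by_category.get(category, ())
        threads = [self._threads[k] for k in keys]
        threads.sort(key=lambda t: t.last_activity, reverse=True)
        return threads[:limit]
```

Calling `add_thread` for each root post and `add_reply` for each comment while walking the blockchain, then `list_category("general")` at read time, gives the forum-style ordering without asking steemd for it.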
Hope that helps explain why I went the route I did!
I understand the problem and I think keeping the indexes in a database was a good choice. There are ways to implement it over the blockchain with a Node.js app as middleware, providing a different API via steem-js, but that also involves some hacks and caching, which is not necessarily easier.
However, since you are looking for a better database solution, I recommend taking a look at Gundb, as it could provide a decentralized solution with an API as good as Firebase's.
I'll have to check it out. Right now I think I'm at my capacity for "new technologies being learned" though.
I think what happened was that the Docker networking was configured to open the port to mongodb, and someone just connected and dropped the data. Like I said, I didn't work too hard on its first beta deployment :)
I have a large EC2 instance running steemd + mongodb + python chewing through the blockchain to rebuild those indexes right now. It's taking a little longer than I expected, but instead of deleting the EC2 instance (like I did last time), I'm just going to deactivate it. That will avoid a 2-3 hour full blockchain sync and then 11 million blocks of data processing. Next time (if there is one) I can just start it up and sync from where I left off.
If I can get this hosted in a real production environment (as opposed to the VPS it's on now), I'll make sure there's redundancy and backups to prevent all of this anyway :)
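The "sync from where I left off" part can be sketched roughly like this in Python. This is an assumption-heavy illustration rather than the real indexer: `fetch_block` and `handle_block` are hypothetical callables standing in for "get block N from steemd" and "update the indexes", and the checkpoint is a small JSON file instead of a database row.

```python
import json
import os


def load_checkpoint(path):
    """Return the last fully processed block number, or 0 if there is none."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["last_block"]


def save_checkpoint(path, block_num):
    """Write the checkpoint via a temp file so a crash can't leave it half-written."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"last_block": block_num}, f)
    os.replace(tmp, path)


def process_blocks(fetch_block, handle_block, head_block, path):
    """Process every block after the checkpoint, up to and including head_block.

    fetch_block(num) and handle_block(num, block) are hypothetical callables
    supplied by the caller. Returns the last processed block number.
    """
    last_done = load_checkpoint(path)
    for num in range(last_done + 1, head_block + 1):
        handle_block(num, fetch_block(num))
        save_checkpoint(path, num)  # record progress only after the block is handled
        last_done = num
    return last_done
```

Saving after every block keeps the sketch simple; with millions of blocks it would likely be cheaper to checkpoint every few thousand and make `handle_block` safe to repeat.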