Parsing the Steem Blockchain
Hello there!
They call me CryptoBills and this is my first post on SteemIt. I'm a software developer based in the Bay Area and have recently gotten interested in the blockchain and in building apps on it. That's when I came across Steem and saw that it was a perfect entry point for learning more about building crypto apps. Not only that, it looks like there is an awesome community here, and I am looking forward to being a part of it.
Before we get too deep
It's gonna get technical past this point, so read on if you dare. The tech involved: Docker, Python, Ubuntu, and MongoDB.
What we are building
Part of what I am thinking of doing requires ready access to the Steem data for posts and comments, so I figured why not build a Steem block parser as an experiment and a first step into learning more about Steem development. I got a parser set up and parsing without too much hassle, and now I'd like to share my findings with you guys.
Project goals
Basically, we want to accomplish a few simple objectives:
- Connect to the blockchain
- Grab the data from the chain
- Take said data and dump it into a database
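The three goals above boil down to a small fetch-and-store loop. Here is a minimal sketch of that shape, not the actual parser: `copy_blocks`, `fake_fetch`, and the in-memory `saved` list are hypothetical stand-ins for illustration, so the example runs without a Steem node or a database.

```python
from typing import Callable, Dict, List

Block = Dict[str, object]


def copy_blocks(
    fetch: Callable[[int, int], List[Block]],
    store: Callable[[List[Block]], None],
    first: int,
    last: int,
    step: int = 100,
) -> int:
    """Copy blocks first..last (inclusive) from fetch to store in batches.

    fetch(lo, hi) is assumed to return the blocks numbered lo..hi-1.
    Returns the number of blocks handed to store.
    """
    stored = 0
    for lo in range(first, last + 1, step):
        hi = min(lo + step, last + 1)  # clip the final batch
        batch = fetch(lo, hi)
        if batch:
            store(batch)
            stored += len(batch)
    return stored


# Hypothetical stand-in for the blockchain: fabricates blocks by number.
def fake_fetch(lo: int, hi: int) -> List[Block]:
    return [{"block_num": n} for n in range(lo, hi)]


saved: List[Block] = []  # hypothetical stand-in for the database
count = copy_blocks(fake_fetch, saved.extend, first=1, last=250, step=100)
print(count)  # prints 250
```

In the real thing, `fetch` would call the Steem node and `store` would write to the database; keeping them as plain callables just makes the loop easy to try out on its own.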
I'll spare you the details, but I went through a bunch of iterations before I found a system that I thought worked well enough.
I chose to use Docker as the server for this experiment. This gives us full control over our environment, which matters because I found that there are some specifics to working with the steem-python lib, mainly that it has to be used with Python 3.6. If you aren't familiar with Docker, there is another post on SteemIt that explains it pretty well.
In a nutshell, you want to go to the Docker site and download and install Docker for your OS. Then you want to run the Docker app, and then install Kitematic.
Once you install Docker, you want to get the latest Ubuntu. The easiest way I found to do this is to run the following in the terminal:
docker run -it ubuntu:rolling
This will create the Docker instance and automatically connect you to it.
Note: Should you disconnect and need to reconnect, you can do it in Kitematic by finding your Docker instance in the list. Take note of its name, because if you accidentally create another instance it will get a new name.
As you can see I have two instances, but the one I want is at the bottom.
Ok, now in your docker terminal (launch from Kitematic if you are not connected already), you will want to run the following commands.
apt-get update && apt-get -y upgrade
apt-get install -y python3 python3-pip build-essential libssl-dev libffi-dev python3-dev pandoc
apt-get install -y python3-venv
apt-get install -y tmux vim
cd /root
python3 -m venv steem
source steem/bin/activate
pip install wheel
pip install pytest
pip install steem
pip install pymongo
This installs all the dependencies we need to get started. Note that vim is what I use to edit code within a Linux environment, and tmux is what I use to see multiple windows inside of the one terminal.
(Also note I am assuming you know how to use Linux because you are still here :) )
So luckily, there are already libraries available that let us connect to the Steem chain without too much hassle. I chose to use Python and the steem-python library. (Be sure to thumb through the docs, as I won't go into too much detail on the code.)
from steem import Steem
from pymongo import MongoClient
from pymongo.errors import BulkWriteError
import math

# set up steem access
s = Steem()

# set up the db access (host and port are from my setup, adjust them for your MongoDB instance)
client = MongoClient('docker.for.mac.localhost', 32779)
db = client['steem']

last = 0  # start of the last batch we processed
percent_done = 0
start = 0
step = 100
end = s.last_irreversible_block_num
total = end - start
print('parsing %i blocks' % total)

for i in range(start, end, step):
    blocks = s.get_blocks_range(i, i + step)
    if not blocks:
        continue

    # update display
    new_percent_done = math.floor((i - start) / total * 100)
    if new_percent_done != percent_done:
        # keep track of completion rate
        print('%i percent done, %i to go' % (new_percent_done, end - i))
        percent_done = new_percent_done

    # iterate over the blocks and use the block id as the Mongo _id
    for b in blocks:
        if 'block_id' in b:
            b['_id'] = b['block_id']

    try:
        db.blocks.insert_many(blocks, ordered=False)
    except BulkWriteError:
        # some documents failed (for example, they are already stored), just skip them
        pass
    last = i
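The script above tracks `last` but always restarts from `start`, so an interrupted run begins again at zero. One way to resume is to work out the highest block number already stored. The sketch below assumes that the first eight hex characters of a Steem `block_id` encode the block number, which matches my understanding of the chain's id format but is worth verifying against your own data. `block_num_from_id`, `next_start`, and the sample ids are hypothetical names and values for illustration.

```python
from typing import Iterable


def block_num_from_id(block_id: str) -> int:
    """Read the block number from the first 8 hex chars of a block_id.

    Assumption: the id starts with the block number as 4 big-endian bytes.
    """
    return int(block_id[:8], 16)


def next_start(stored_ids: Iterable[str], default: int = 1) -> int:
    """Return the block number to resume from, given the ids already stored."""
    nums = [block_num_from_id(block_id) for block_id in stored_ids]
    return max(nums) + 1 if nums else default


# Made-up 40-character ids standing in for the _id values in the collection.
ids = ["00000003" + "0" * 32, "0000000a" + "0" * 32]
print(next_start(ids))  # prints 11
print(next_start([]))   # prints 1
```

Keep in mind that the highest stored block says nothing about gaps: batches that failed and were skipped would still need a separate pass.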
And that's that! I was surprised at how few lines of code it took to get the whole thing working on the Python side. The steem-python lib is really useful for parsing the blockchain. Hopefully someone found this useful, as it took me some time to figure out the specifics, especially the Docker setup for Python. Looking forward to getting deep into some Steem development in the near future!
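As a possible next step toward the posts and comments mentioned at the start, here is a rough sketch of pulling them out of a stored block. It assumes each block dict has a `transactions` list whose entries carry `operations` as `[type, payload]` pairs, and that a `comment` operation with an empty `parent_author` is a top-level post. That is how I understand the block format, so check it against a real block before relying on it. `extract_comments` and `sample_block` are hypothetical, hand-written for illustration.

```python
from typing import Dict, Iterator, Tuple


def extract_comments(block: Dict) -> Iterator[Tuple[str, Dict]]:
    """Yield ("post" | "comment", payload) for every comment op in a block."""
    for tx in block.get("transactions", []):
        for op_type, op in tx.get("operations", []):
            if op_type != "comment":
                continue
            # Assumption: top-level posts have an empty parent_author.
            kind = "post" if op.get("parent_author", "") == "" else "comment"
            yield kind, op


# Hand-written sample in the assumed shape, not real chain data.
sample_block = {
    "block_id": "0000000a" + "0" * 32,
    "transactions": [
        {
            "operations": [
                ["comment", {"parent_author": "", "author": "alice",
                             "permlink": "hello", "body": "First post"}],
                ["vote", {"voter": "bob", "author": "alice",
                          "permlink": "hello", "weight": 10000}],
                ["comment", {"parent_author": "alice", "author": "bob",
                             "permlink": "re-hello", "body": "Welcome"}],
            ]
        }
    ],
}

for kind, op in extract_comments(sample_block):
    print(kind, op["author"], op["permlink"])
# prints:
# post alice hello
# comment bob re-hello
```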
Totally accidentally posted this before it was done the first time, oops
Welcome
Thanks!
Welcome to steemit!
Thanks!
welcome to the #steemit community. i'm looking forward to following your journey through the blockchain hopin' you make some #crypto gain :).
Ha thanks! Glad to be here, looking forward to adding more content as I continue to learn