Steem Python - Speed Test

in #steemdev, 7 years ago

Steem-Python Streaming Blocks

In this post, I compare the speed of different ways to stream data from the Steem Blockchain using the steem-python library.

The idea for this post came after a discussion with @pibara after he had presented his asyncsteem (GitHub Link) code. He is currently building a new library utilizing asynchronous processing in an attempt to speed up interactions with the Blockchain.

These tests cover three different functions available in the steem-python library. The same tests are also executed against different RPC nodes to compare the performance of the various nodes.

In upcoming tests with asyncsteem I want to use this as a baseline to be able to compare execution speed.


Different options in steem-python:

  • stream_comments() - Steemd
    • Wrapper for stream_from()
  • stream_from() - Blockchain
    • API call - get_ops_in_block
  • get_blocks_range() - Steemd
    • API call - get_block

stream_comments()

Many people starting out with the steem-python library might try to use this when they want to stream posts from the Blockchain. I’ve seen it in several guides, and it is extremely simple to use, as it returns a Post instance for every post. But its drawback is the extra overhead: every Post instance costs an extra API call. Most people are not interested in each and every post, so making that call for all of them is a big waste of time and resources. If the RPC node is not fast enough, this can also easily lead to the script not being able to keep up with all blocks.

stream_from()

stream_comments(), described above, is a wrapper for this function. So even without considering the difference in functionality, stream_comments() will always be slower.

This function executes the API call get_ops_in_block, which returns all operations that are part of a specific block. So, compared with stream_comments(), it can be used for any kind of operation, not only for posts/comments. But the returned data is unprocessed, so if you want a Post instance, your code needs to make that call itself. Doing this on demand, only when needed, will certainly speed up your code.
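
To illustrate the "on demand" idea, here is a minimal sketch that skips uninteresting operations cheaply and only makes the expensive call for the few that matter. It does not use steem-python: the sample entries, the `interesting_posts()` helper and the `fake_load_post()` callback are all hypothetical, and the `"op": [type, data]` layout is my assumption about what each streamed entry looks like (only the `"block"` key appears in the test scripts in this post).

```python
from typing import Callable, Dict, Iterable, Iterator


def interesting_posts(ops: Iterable[Dict], author: str,
                      load_post: Callable[[Dict], object]) -> Iterator[object]:
    """Yield loaded posts only for comment operations written by `author`."""
    for entry in ops:
        op_type, op_data = entry["op"]  # assumed layout: [type, data]
        if op_type != "comment":
            continue  # votes, transfers etc. are skipped without extra calls
        if op_data.get("author") != author:
            continue
        yield load_post(op_data)  # the expensive call happens only here


# Stand-in data; real entries would come from blockchain.stream_from().
fake_ops = [
    {"block": 1, "op": ["vote", {"voter": "alice", "author": "bob"}]},
    {"block": 1, "op": ["comment", {"author": "bob", "permlink": "hello"}]},
    {"block": 2, "op": ["comment", {"author": "carol", "permlink": "hi"}]},
]

calls = []


def fake_load_post(op_data):
    """Hypothetical replacement for building a Post instance."""
    calls.append(op_data["permlink"])  # count the "expensive" lookups
    return "@{author}/{permlink}".format(**op_data)


posts = list(interesting_posts(fake_ops, "bob", fake_load_post))
print(posts, "expensive calls:", len(calls))
```

Of three operations, only one triggers the expensive lookup. In a real script, `load_post` would be whatever call turns the raw operation into a Post instance.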

get_blocks_range()

While the two functions above end up using the same API call, this one is different: it uses the get_block API call instead. Another difference is how the RPC call is executed. The two functions above use the call() function, as defined in http_client.py, while get_blocks_range() uses call_multi_with_futures(), a wrapper for call() that allows threaded execution with 10 workers by default.
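
The difference between one call at a time and a pool of 10 workers can be sketched with the standard library alone. This is not the code from http_client.py, just an illustration of the general technique using `concurrent.futures`; `fetch_block()` is a hypothetical stand-in that sleeps instead of making a real RPC call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_block(block_num):
    """Hypothetical stand-in for one blocking RPC round trip."""
    time.sleep(0.005)
    return {"block_num": block_num}


def fetch_sequential(block_nums):
    """One request after the other, like a plain loop over call()."""
    return [fetch_block(n) for n in block_nums]


def fetch_threaded(block_nums, max_workers=10):
    """Up to max_workers requests in flight at the same time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map() returns the results in the order of the input
        return list(pool.map(fetch_block, block_nums))


block_nums = list(range(19399400, 19399500))

start = time.perf_counter()
seq = fetch_sequential(block_nums)
seq_time = time.perf_counter() - start

start = time.perf_counter()
thr = fetch_threaded(block_nums)
thr_time = time.perf_counter() - start

print("sequential: %.2fs, threaded: %.2fs" % (seq_time, thr_time))
```

With a simulated delay, the threaded version spends most of its time waiting in parallel rather than in sequence. How much of that gain survives against a real node depends on how the node handles concurrent requests.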

Results

Below is a summary of execution times against different RPC nodes. api.steemit.com executes get_blocks_range() much faster than the other two functions, while for the other two RPC nodes the execution times are more even. This is true even for very old blocks, so I don't think the Jussi implementation on api.steemit.com plays a role in that difference. It seems that the threaded calls made by get_blocks_range() are handled differently by api.steemit.com, which would explain the performance difference.

get_blocks_range()

rot@tor:~$ python3 testblocks_range.py
* 100 blocks processed in 2.206186532974243 seconds api.steemit.com
* 100 blocks processed in 11.113192319869995 seconds rpc.buildteam.io
* 100 blocks processed in 5.037718057632446 seconds rpc.steemviz.com
rot@tor:~$ python3 testblocks_range.py
* 100 blocks processed in 2.258392810821533 seconds api.steemit.com
* 100 blocks processed in 10.185550689697266 seconds rpc.buildteam.io
* 100 blocks processed in 5.035927772521973 seconds rpc.steemviz.com

Executed multiple times against api.steemit.com

When the same query is executed multiple times, the first iteration is always slower. This shows the positive effect of Jussi on api.steemit.com. It was not visible for the other RPC nodes in the test.

* 100 blocks processed in 2.6615469455718994 seconds api.steemit.com
* 100 blocks processed in 1.9518346786499023 seconds api.steemit.com
* 100 blocks processed in 1.8236401081085205 seconds api.steemit.com
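
The "first run is slower" pattern is what you would expect from any cache sitting in front of a slow backend. The sketch below illustrates that general idea with `functools.lru_cache`. It says nothing about how Jussi itself is implemented, and `get_block_cached()` is a hypothetical function that sleeps instead of fetching a block.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=1024)
def get_block_cached(block_num):
    """Hypothetical slow lookup; repeated calls are served from the cache."""
    time.sleep(0.01)  # pretend backend lookup
    return ("block", block_num)


def timed_run(block_nums):
    """Request every block once and return the elapsed seconds."""
    start = time.perf_counter()
    for n in block_nums:
        get_block_cached(n)
    return time.perf_counter() - start


block_nums = range(19399400, 19399420)
first = timed_run(block_nums)   # every request misses the cache
second = timed_run(block_nums)  # every request hits the cache
info = get_block_cached.cache_info()
print("first: %.3fs, second: %.3fs, hits: %d, misses: %d"
      % (first, second, info.hits, info.misses))
```

In this toy example the second pass is almost free. In the measurements above the gain is much smaller, which would be consistent with each request still paying for the network round trip even when the answer is cached.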

stream_from()

rot@tor:~$ python3 teststream_from.py
* 100 blocks processed in 15.777471780776978 seconds api.steemit.com
* 100 blocks processed in 12.378310680389404 seconds rpc.buildteam.io
* 100 blocks processed in 5.519677400588989 seconds rpc.steemviz.com
rot@tor:~$ python3 teststream_from.py
* 100 blocks processed in 15.657495975494385 seconds api.steemit.com
* 100 blocks processed in 15.46108865737915 seconds rpc.buildteam.io
* 100 blocks processed in 5.46795654296875 seconds rpc.steemviz.com

Executed multiple times against api.steemit.com

Interestingly, there is no gain from executing the same query multiple times here. That improvement was only seen with the get_blocks_range() function.

* 100 blocks processed in 17.176149129867554 seconds api.steemit.com
* 100 blocks processed in 17.165062189102173 seconds api.steemit.com
* 100 blocks processed in 16.03311800956726 seconds api.steemit.com

stream_comments()

Please note that this script counts 100 posts, not 100 blocks. In the range I tested, fewer than 100 blocks were processed to get to 100 posts. As this is just a wrapper for stream_from() anyway, I didn't feel the need to align this test perfectly.

rot@tor:~$ python3 teststream_comment.py
* 100 posts processed in 17.911587715148926 seconds api.steemit.com
* 100 posts processed in 22.999014854431152 seconds rpc.buildteam.io
* 100 posts processed in 5.980460166931152 seconds rpc.steemviz.com
rot@tor:~$ python3 teststream_comment.py
* 100 posts processed in 17.890295267105103 seconds api.steemit.com
* 100 posts processed in 13.94810175895691 seconds rpc.buildteam.io
* 100 posts processed in 6.749229669570923 seconds rpc.steemviz.com
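
To make the numbers easier to compare, the following sketch averages the two runs per node for the two block-based tests. The values are copied from the output above and rounded to three decimals. stream_comments() is left out because it counts posts rather than blocks.

```python
from statistics import mean

# Seconds per run, copied from the results above (rounded to 3 decimals).
results = {
    "get_blocks_range": {
        "api.steemit.com": [2.206, 2.258],
        "rpc.buildteam.io": [11.113, 10.186],
        "rpc.steemviz.com": [5.038, 5.036],
    },
    "stream_from": {
        "api.steemit.com": [15.777, 15.657],
        "rpc.buildteam.io": [12.378, 15.461],
        "rpc.steemviz.com": [5.520, 5.468],
    },
}

# Average the runs for every function/node combination.
averages = {
    func: {node: mean(runs) for node, runs in per_node.items()}
    for func, per_node in results.items()
}

for func, per_node in averages.items():
    for node, avg in per_node.items():
        print("%-17s %-17s %6.2f s per 100 blocks" % (func, node, avg))

# How much faster is get_blocks_range() than stream_from() on api.steemit.com?
speedup = (averages["stream_from"]["api.steemit.com"]
           / averages["get_blocks_range"]["api.steemit.com"])
print("api.steemit.com speedup: %.1fx" % speedup)
```

With only two runs per combination, these averages are a rough indication rather than a statistically solid result.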

Example Code

These sample scripts are modified from @pibara's benchmark script, which is part of asyncsteem (GitHub Link).

get_blocks_range()

rot@tor:~$ more testblocks_range.py
#!/usr/bin/python3
import steem
import time
# RPC nodes to benchmark, in the order they are tested
nodes = ["https://api.steemit.com/",
         "https://rpc.buildteam.io/",
         "https://rpc.steemviz.com/"]
steemd = steem.steemd.Steemd(nodes)
for node in nodes:
    last_block = 19399400
    # Fetched before the timer starts; the value is not used below
    current_block = steemd.last_irreversible_block_num
    ltime = time.time()
    # The whole range is fetched here, before the loop starts
    blocks = steemd.get_blocks_range(last_block, last_block + 100)
    for entry in blocks:
        block_no = entry["block_num"]
        if block_no % 100 == 0:
            now = time.time()
            duration = now - ltime
            ltime = now
            print("* 100 blocks processed in", duration, "seconds", steemd.hostname)
    # Switch to the next RPC node in the list
    steemd.next_node()

stream_from()

rot@tor:~$ more teststream_from.py
#!/usr/bin/python3
import steem
import time
# RPC nodes to benchmark, in the order they are tested
nodes = ["https://api.steemit.com/",
         "https://rpc.buildteam.io/",
         "https://rpc.steemviz.com/"]
steemd = steem.steemd.Steemd(nodes)
blockchain = steem.blockchain.Blockchain(steemd)
for node in nodes:
    last_block = 19399400
    ltime = time.time()
    # Each entry is a single operation, tagged with its block number
    for entry in blockchain.stream_from(last_block):
        block_no = entry["block"]
        if block_no != last_block:
            # First operation of a new block
            last_block = block_no
            if last_block % 100 == 0:
                now = time.time()
                duration = now - ltime
                ltime = now
                print("* 100 blocks processed in", duration, "seconds", steemd.hostname)
                # Stop streaming once 100 blocks have passed
                break
    # Switch to the next RPC node in the list
    steemd.next_node()

stream_comments()

rot@tor:~$ more teststream_comment.py
#!/usr/bin/python3
import steem
import time
# RPC nodes to benchmark, in the order they are tested
nodes = ["https://api.steemit.com/",
         "https://rpc.buildteam.io/",
         "https://rpc.steemviz.com/"]
steemd = steem.steemd.Steemd(nodes)
for node in nodes:
    last_block = 19399400
    ltime = time.time()
    # Counts posts, not blocks: stop after the 100th post
    for index, entry in enumerate(steemd.stream_comments(last_block)):
        if index >= 99:
            now = time.time()
            duration = now - ltime
            ltime = now
            print("* 100 posts processed in", duration, "seconds", steemd.hostname)
            break
    # Switch to the next RPC node in the list
    steemd.next_node()
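
The three scripts above repeat the same timing logic, so one possible cleanup is to move it into a small helper. The sketch below is only a suggestion: `benchmark()` and `fake_workload()` are hypothetical names, the node names are placeholders, and the workload sleeps instead of talking to an RPC node. It uses `time.perf_counter()`, which is a monotonic clock meant for measuring intervals, instead of `time.time()`.

```python
import time


def benchmark(label, workload, node_names):
    """Run workload(node) once per node and return (node, seconds) pairs."""
    timings = []
    for node in node_names:
        start = time.perf_counter()
        count = workload(node)  # returns how many items it processed
        elapsed = time.perf_counter() - start
        print("* %d %s processed in %.3f seconds %s"
              % (count, label, elapsed, node))
        timings.append((node, elapsed))
    return timings


def fake_workload(node):
    """Hypothetical workload; a real one would stream blocks from `node`."""
    time.sleep(0.01)
    return 100


timings = benchmark("blocks", fake_workload,
                    ["node-a.example", "node-b.example"])
```

Each of the three tests would then only need its own workload function, and the output format would stay the same across all of them.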

Thank you for your time!!

MoH

wow, great work! Does this mean that blockchain.stream_from() using get_ops_in_block is slower than streaming the full blocks via steemd.get_blocks_range()? I would not have expected that...

Hmm, but the result from get_blocks_range() is not ordered, you may get last_block + 100 in the first iteration and your loop stops after actually having received only one block... But I may be wrong here?!
edit: I was wrong, the result is ordered and 100 blocks are received

At least with api.steemit.com it was much faster. The get_blocks_range will fetch all 100 blocks at once, so the loop will just iterate the result to confirm it got 100 blocks. For stream_from it will fetch a new block per iteration.
So depending on real life needs, fetching batches of 100 blocks might not be feasible.

talking about programming languages, I'm really confused. I do not understand at all.

No problem. :)
That's the moment when you take everything and just pipe it to /dev/null ;)

Oh, I understand about my incomprehension. steemdev is a very complicated thing for me to understand. even I do not know what it is.
lol

This is SteemDev in a nutshell :)

if Post is Good
    Do Upvote
else if Post is not so Good
    Do nothing
else
    Do flag

that will exhaust your VP to the ground due to "not good" posts. :D

that 'else if' should probably be 'elif'
just teasing ;-)

Just when you think you know everything...then you see this!
Thanks for the insight

You got a 11.46% upvote from @steemyoda courtesy of @steemyoda!

Thank you mysterious Yoda! 😀

Thanks for the information

Indeed you are one of a kind @danielsaori
you have done a great job here.
