DIY Steemit Statistics with Python: Part 8 - Daily Hit Parades
Today we will be reproducing @arcange's Daily Hit Parades. There won't be any visualization-related code, but we will learn about recursive common table expressions along the way.
Preparation
Before we start, we prepare the workspace as usual (see the previous posts in the series for additional context: 1, 2, 3, 4, 5, 6, 7):
%matplotlib inline
import sqlalchemy as sa, pandas as pd, seaborn as sns, matplotlib.pyplot as plt
sns.set_style()
e = sa.create_engine('mssql+pymssql://steemit:[email protected]/DBSteem')
def sql(query, index_col=None):
return pd.read_sql(query, e, index_col=index_col)
Top Posts by Upvotes
As we should remeber from the previous episode, SteemSQL helpfully tracks the current status of all posts and comments in the Comments
table. The number of upvotes is stored in the net_votes
field. Obtaining the list of top posts by upvotes at the current moment is therefore pretty straightforward:
sql("""
select top 5 author, permlink, net_votes
from Comments
where
cast(created as date) = '2017-09-01' -- Limit to the chosen day
and parent_author = '' -- Limit to posts only
order by net_votes desc
""")
author | permlink | net_votes | |
---|---|---|---|
0 | kingscrown | epic-litecoin-day-ath-and-predictions | 838 |
1 | roelandp | steemfest-updates-new-names-announced-bitfilm-... | 825 |
2 | kingscrown | btc-rallies-to-5000-usd-biggest-gainers-and-lo... | 728 |
3 | minnowbooster | minnowbooster-tutorial-4-leasing-delegated-ste... | 614 |
4 | jerrybanfield | 19-2-update-new-backup-server-steemj-progress-... | 611 |
To get a "beginner's hit-parade" we could limit the rating to users with a reputation below 50. Recall from part 5 of our series that we can get this value from the Accounts
table.
sql("""
select top 5 author, permlink, net_votes
from Comments c
join Accounts a on a.name = c.author
where
cast(c.created as date) = '2017-09-01' -- Limit to the chosen day
and c.parent_author = '' -- Limit to posts only
and log10(a.reputation)*9 - 56 < 50 -- Limit by reputation
order by net_votes desc
""")
author | permlink | net_votes | |
---|---|---|---|
0 | jessymg | in-may-romantic-ronda | 372 |
1 | logan6230 | oracion-del-viernes-01-de-septiembre-de-2017 | 367 |
2 | holadeivi | noche-de-fotografia-increible-retratos-a-una-c... | 362 |
3 | pedrolopez | with-god-everything-is-possible-dios-tiene-un-... | 360 |
4 | daneftaly | lakajat-tomar-llevar-recoger | 358 |
Top Posts by Pending Payouts
The situation is the same with pending payouts. This time we will simply sum the pending_payout_value
and total_pending_payout
value in the query. To add a bit more to the query, let us also construct the correct link to the post and extract its title:
sql("""
select top 5
concat('@', author) as Author,
concat('http://steemit.com/',
category, '/@', author,
'/', permlink) as Link,
title as Title,
pending_payout_value + total_pending_payout_value as Payout
from Comments c
where
cast(c.created as date) = '2017-09-01'
and parent_author = '' -- Limit to posts only
order by Payout desc
""")
Author | Link | Title | Payout | |
---|---|---|---|---|
0 | @roelandp | http://steemit.com /steemfest/@roelandp /steemfe... | SteemFest² Updates - New Names Announced - Bit... | 707.72 |
1 | @jeffberwick | http://steemit.com /conspiracy/@jeffberwick /lig... | Light on Conspiracies - Don't Fall For The Hoa... | 471.22 |
2 | @timsaid | http://steemit.com /photography/@timsaid /timtra... | TimTravels - Preveli and Kournas Lake - Day 3 | 387.10 |
3 | @kingscrown | http://steemit.com /litecoin/@kingscrown /epic-l... | Epic Litecoin Day - ATH and Predictions | 377.23 |
4 | @gavvet | http://steemit.com /war/@gavvet /from-the-stone-... | From the stone age to modern warfare in a coup... | 371.35 |
Getting the "beginner's" version is as simple as before, so this part is left as an exercise to the reader.
Top Posts by Comments
Counting total comments for the post is a bit more interesting. The Comments
table (as well as TxComments
) does not immediately tell us the total number of comments. Instead, it provides the fields parent_author
and parent_permlink
, which point to the parent of every comment.
In particular, as you must have noted above, we always use the condition parent_author = ''
to limit queries to posts only - those are essentially "comments without parents".
Here is, for example, the record corresponding to this post, note the empty parent_author
:
sql("""
select author, permlink,
parent_author, parent_permlink
from Comments
where id = 11243470
""")
author | permlink | parent_author | parent_permlink | |
---|---|---|---|---|
0 | konstantint | diy-steemit-statistics-with-python-part-7-payouts | python |
The first-level children of a post have their parent_author
and parent_permlink
set to the author
and permlink
of the corresponding record. For the example post above we could extract them as follows:
sql("""
select c1.author, c1.permlink,
c1.parent_author, c1.parent_permlink
from Comments c0
join Comments c1 on (c1.parent_author = c0.author and
c1.parent_permlink = c0.permlink)
where c0.id = 11243470
""")
author | permlink | parent_author | parent_permlink | |
---|---|---|---|---|
0 | neowne | re-konstantint-diy-steemit-statistics-with-pyt... | konstantint | diy-steemit-statistics-with-python-part-7-payouts |
1 | stickchumpion | re-konstantint-diy-steemit-statistics-with-pyt... | konstantint | diy-steemit-statistics-with-python-part-7-payouts |
2 | paulag | re-konstantint-diy-steemit-statistics-with-pyt... | konstantint | diy-steemit-statistics-with-python-part-7-payouts |
We could proceed in the same way to obtain second-level comments for the post:
sql("""
select c2.author, c2.permlink,
c2.parent_author, c2.parent_permlink
from Comments c0
join Comments c1 on (c1.parent_author = c0.author and
c1.parent_permlink = c0.permlink)
join Comments c2 on (c2.parent_author = c1.author and
c2.parent_permlink = c1.permlink)
where c0.id = 11243470
""")
author | permlink | parent_author | parent_permlink | |
---|---|---|---|---|
0 | konstantint | re-stickchumpion-re-konstantint-diy-steemit-st... | stickchumpion | re-konstantint-diy-steemit-statistics-with-pyt... |
1 | konstantint | re-paulag-re-konstantint-diy-steemit-statistic... | paulag | re-konstantint-diy-steemit-statistics-with-pyt... |
Now, if we need to count all comments for the post, we must somehow traverse all the levels of the hierarchy, collecting them together.
Recursive Common Table Expressions to the rescue! Read the query below carefully:
sql("""
with AllComments as (
-- Initialize "AllComments" with
-- a single post we want to start with:
select 0 as level, author, permlink
from Comments
where id = 11243470
-- Append children of whatever
-- we collected so far into the
-- "AllComments" table:
union all
select g.level+1 as level, c.author, c.permlink
from AllComments g
join Comments c on (g.author = c.parent_author
and g.permlink = c.parent_permlink)
)
select * from AllComments
""")
level | author | permlink | |
---|---|---|---|
0 | 0 | konstantint | diy-steemit-statistics-with-python-part-7-payouts |
1 | 1 | neowne | re-konstantint-diy-steemit-statistics-with-pyt... |
2 | 1 | paulag | re-konstantint-diy-steemit-statistics-with-pyt... |
3 | 1 | stickchumpion | re-konstantint-diy-steemit-statistics-with-pyt... |
4 | 2 | konstantint | re-stickchumpion-re-konstantint-diy-steemit-st... |
5 | 2 | konstantint | re-paulag-re-konstantint-diy-steemit-statistic... |
6 | 3 | paulag | re-konstantint-re-paulag-re-konstantint-diy-st... |
With this trick in our hat, counting all comments for each post on a given day and selecting the top-N is a matter of some additional aggregation:
sql("""
with AllComments as (
select id, author, permlink
from Comments
where
parent_author = ''
and cast(created as date) = '2017-09-01'
union all
select g.id, c.author, c.permlink
from AllComments g
join Comments c on (g.author = c.parent_author
and g.permlink = c.parent_permlink)
),
AllCommentCount as (
select id, count(*)-1 as NumComments
from AllComments
group by id
)
select top 5
c.author, c.permlink, NumComments
from AllCommentCount cc
join Comments c on cc.id = c.id
order by NumComments desc
""")
author | permlink | NumComments | |
---|---|---|---|
0 | rivalhw | 5jdadw-steemit | 217 |
1 | roelandp | steemfest-updates-new-names-announced-bitfilm-... | 183 |
2 | kaylinart | steemit-bought-me-a-car-holy-cow-thank-you-so-... | 180 |
3 | kus-knee | the-old-dog-asks-have-you-been-ripped-off-yet-... | 175 |
4 | timsaid | timtravels-preveli-and-kournas-lake-day-3 | 146 |
That's it for now with the hit-parades. Try practicing the queries to figure out what is the most commented or the most upvoted post of all time (we already know which is the post with the highest payout of all time from the previous episode of this series).
The source code of this post is also available as a Jupyter notebook.
@originalworks
@OriginalWorks Mention Bot activated by @paulag. The @OriginalWorks bot has determined this post by @konstantint to be original material and upvoted it!
To call @OriginalWorks, simply reply to any post with @originalworks or !originalworks in your message!
For more information, Click Here!
on steemit to the novices help? such as me?
I did not understand the question, but feel free to ask for specific advice here, for example.
I can see now that SteemSQL is going to be incredibly useful - amazing post, thank you! :)