DIY Steemit Statistics with Python: Part 7 - Payouts

konstantint (48)in #python • 8 years ago

The last 6 charts in the @arcange report are dedicated to payouts.

Preparation

Before we start, we prepare the workspace as usual (see the previous posts in the series for additional context: 1, 2, 3, 4, 5, 6):

%matplotlib inline
import sqlalchemy as sa, pandas as pd, seaborn as sns, matplotlib.pyplot as plt

sns.set_style()
e = sa.create_engine('mssql+pymssql://steemit:[email protected]/DBSteem')

def sql(query, index_col=None):
    return pd.read_sql(query, e, index_col=index_col)

Payouts

Payouts for a post depend on how many "reward shares" it receives from the voters. Computing the actual payout from the raw voting activity recorded in the blockchain and correctly accounting for all details could be a bit tricky.

Luckily for us, SteemSQL does the hard accounting job for us and is helpfully keeping track of pending and previous payouts for every post and comment. This data is kept in the Comments table (not to be confused with TxComments):

sql("""select top 5 
            author, 
            permlink,
            pending_payout_value,
            total_pending_payout_value,
            total_payout_value,
            curator_payout_value
        from Comments""")

	author	permlink	total_payout_value	curator_payout_value
0	steemit	firstpost	0.94	0.76
1	admin	firstpost	0.00	0.00
2	proskynneo	steemit-firstpost-1	1.06	1.06
3	red	steemit-firstpost-2	0.10	0.10
4	red	red-dailydecrypt-1	1.02	1.02

As we see, the table, somewhat confusingly, keeps track of multiple "payout" categories. For the purposes of our charts we will simply sum all payouts and pending payouts together.

Highest Payout

Let us start by looking at the highest payout per day.

max_daily_payout = sql("""
select 
    cast(created as date) as Date,
    max(pending_payout_value 
        + total_pending_payout_value
        + total_payout_value
        + curator_payout_value) as Payout
from Comments
group by cast(created as date)
order by Date
""", "Date")

Plotting is business as usual:

max_daily_payout.plot(title="Highest daily payout (SBD)")
max_daily_payout[-30:].plot(title="Highest daily payout (last 30 days - SBD)", 
                            ylim=(0,1500));

Interestingly enough, there seems to exist a post with a payout of more than $45K. Let us find it:

superpost = sql("""
    select * from Comments 
    where pending_payout_value 
        + total_pending_payout_value
        + total_payout_value
        + curator_payout_value > 45000""")

superpost[['author', 'category', 'permlink']]

	author	category	permlink
0	xeroc	piston	piston-web-first-open-source-steem-gui---searc...

Here is this post.

Total and Average Daily Payouts

The query to compute the total or average payouts as well as the number of posts per day is analogous. In fact, we can compute all such statistics in one shot. Let us compute the average daily payout for posts and comments separately.

avg_payouts = sql("""
with TotalPayouts as (
    select 
        cast(created as date) as [Date],
        iif(parent_author = '', 1, 0) as IsPost,
        pending_payout_value 
            + total_pending_payout_value
            + total_payout_value
            + curator_payout_value as Payout
    from Comments
)
select
    Date,
    IsPost,
    avg(Payout) as Payout,
    count(*)    as Number
from TotalPayouts
group by Date, IsPost
order by Date, IsPost
""", "Date")

Observe how we can display plots which use two different Y-axes on the same chart:

posts = avg_payouts[avg_payouts.IsPost == 1][-30:]
comments = avg_payouts[avg_payouts.IsPost == 0][-30:]

fig, ax = plt.subplots()

# Plot payouts using left y-axis
posts.Payout.plot(ax=ax, c='r', label='Average payout (Posts)')
comments.Payout.plot(ax=ax, c='r', ls=':', label='Average payout (Comments)')
ax.set_ylabel('Payout')
ax.legend(loc='center left')

# Plot post counts using right y-axis
ax2 = ax.twinx()
posts.Number.plot(ax=ax2, c='b', label='Count (Posts)')
comments.Number.plot(ax=ax2, c='b', ls=':', label='Count (Comments)', 
                     ylim=(0,90000))
ax2.set_ylabel('Count')
ax2.legend(loc='center right')
ax2.grid(ls='--', c='#9999bb')

Median Payout

SQL Server does not have a median aggregation function, which makes the query for computing the daily median post payout somewhat different:

median_payout = sql("""
with TotalPayouts as (
    select 
        cast(created as date) as [Date],
        pending_payout_value 
            + total_pending_payout_value
            + total_payout_value
            + curator_payout_value as Payout
    from Comments
    where parent_author = ''
)
select distinct
    Date,
    percentile_cont(0.5) 
        within group(order by Payout) 
        over(partition by Date) as [Median Payout]
from TotalPayouts
order by Date
""", "Date")

Just like @arcange, let us plot the median payout along with a 7-day rolling average.

df = median_payout[-30:]
df.plot(ylim=(0,0.1))
df['Median Payout'].rolling(7).mean().plot(
                   label='Median Payout (7-day avg)', ls=':', c='b')
plt.legend();

This completes the reproduction of the charts in the @arcange's "Steemit Statistics" posts.
In the next episode we will be reproducing @arcange's Daily Hit Parades.

The source code of this post is also available as a Jupyter Notebook.

#howto #steemit #bisteemit #payouts

8 years ago in #python by konstantint (48)

Sort:

neowne (48) 8 years ago

Very great and detailed reproduction!

$0.06

1 vote

[-]

paulag (73) 8 years ago

hi @konstantint, you post is awesome and under valued. Keep up the good work

$0.05

1 vote

[-]

konstantint (48) 8 years ago

Thanks for the positive feedback! Feel free to resteem if you believe the post could benefit from additional exposure :)

$0.00

[-]

paulag (73) 8 years ago

I though I did resteem, sorry Have done now :-)

$0.05

1 vote

[-]

stickchumpion (63) 8 years ago

Excuse ignorance but is this how you code Python , I was intrested in R myself but also heard alot about Python

$0.00

[-]

konstantint (48) 8 years ago

Yes, this is Python (take note of the main category of this post ;)

You may do similar kinds of things in R as well, however if you are just starting to get acquainted with the field and have the luxury of choosing your first language arbitrarily, I would recommend considering Python.

The Python community is larger and many of the state of the art machine learning methods (which I do plan to eventually reach with this series of posts, in particular) are somewhat easier to work with in Python, in my opinion.

$0.00

[-]

candelario (36) 7 years ago

Wow, great job. This is some high quality, well presented, and sadly undervalued, analysis. I guess there aren't a lot of data scientists on Steemit who appreciate this yet?

$0.00

STEEM 0.14

TRX 0.24

JST 0.033

BTC 87021.22

ETH 2136.52

USDT 1.00

SBD 0.64

DIY Steemit Statistics with Python: Part 7 - PayoutssteemCreated with Sketch.

Preparation

Payouts

Highest Payout

Total and Average Daily Payouts

Median Payout

Coin Marketplace

DIY Steemit Statistics with Python: Part 7 - Payouts