A quick analysis of Steemit tags data using Python

My first real post! You can learn more about me here

Whilst exploring Steemit and learning more about how everything works, I tried looking at things the way I enjoy most: through data!

This is a quick analysis I did of the tags summary data using Python (I would love to learn if there are better ways of formatting code in posts!). I tried to get a feel for posts, comments and payouts on Steemit based on the post tags.

I've left the code inline if you're interested!

Here is a link to the full Jupyter notebook as well (including the raw data): https://gist.github.com/michael-erasmus/4e7334537a445f3e5830cde9d7db644c

If you enjoyed this post and would like to see more stuff like this, let me know! (and upvote!)

%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import seaborn as sns

sns.set_style('whitegrid')

Read in data derived from trending topics list

This was collected on 2017-07-21 22:46:42 (UTC)

df = pd.read_csv('tags.csv')
df.head()
         Tag  Posts  Comments     Payouts
0       aceh   4399       805   55180.578
1  adventure   1474       494  147060.430
2     advice    554       210   21937.755
3    altcoin    627       105   19558.774
4    amazing    617        85    2204.555

A quick look at the summary stats

df.describe()
              Posts      Comments       Payouts
count    249.000000    249.000000  2.490000e+02
mean    1931.751004   1866.923695  1.743020e+05
std     4419.145897   5418.708141  4.639500e+05
min        1.000000     32.000000  0.000000e+00
25%      422.000000    206.000000  1.679843e+04
50%      667.000000    408.000000  4.425472e+04
75%     1387.000000   1074.000000  1.194865e+05
max    42928.000000  43884.000000  4.091146e+06

Let's see a quick distribution of Posts, Comments and Payouts


fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, figsize=(30,10))
sns.distplot(df.Posts, ax=ax1, color='b')
sns.distplot(df.Comments, ax=ax2, color='r')
sns.distplot(df.Payouts, ax=ax3, color='g')  # pass ax3 explicitly so Payouts gets its own panel

output_6_1.png
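
The summary stats above show means well above the medians for all three columns, so most of each distribution is squeezed against the left edge of its panel. One way to spread it out, sketched here as an optional extra and not something the original notebook does, is to plot on a log scale. The helper name `log10_plus_one` is made up for this example, and the five rows from `df.head()` stand in for `tags.csv`.

```python
import numpy as np
import pandas as pd


def log10_plus_one(values: pd.Series) -> pd.Series:
    """Return log10(values + 1); the +1 keeps zero payouts finite."""
    return np.log10(values + 1)


# The five rows printed by df.head(), used only as a stand-in for tags.csv.
sample = pd.DataFrame({
    'Tag': ['aceh', 'adventure', 'advice', 'altcoin', 'amazing'],
    'Posts': [4399, 1474, 554, 627, 617],
    'Payouts': [55180.578, 147060.430, 21937.755, 19558.774, 2204.555],
})

# Transformed columns can be passed to the plotting call in place of the raw ones.
log_posts = log10_plus_one(sample.Posts)
log_payouts = log10_plus_one(sample.Payouts)
print(log_posts.round(2).tolist())
```

With the full data loaded, the same helper could be dropped into the existing calls, for example `sns.distplot(log10_plus_one(df.Payouts), ax=ax3, color='g')`.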


Let's look at the top 20 tags in terms of Posts, Comments, and Payouts

top = lambda n, by: df.sort_values(by=by, ascending=False).head(n)
payout_formatter = FuncFormatter(lambda x,p: '%1.1fM' % (x*1e-6))

fig, (ax1,ax2,ax3) = plt.subplots(figsize=(20,15), ncols=3)

sns.barplot(y='Tag', x='Posts',
            palette=sns.light_palette("blue", n_colors=20, reverse=True),
            data=top(20, "Posts"), 
            ax=ax1)
ax1.set(title="Posts", xlabel='')

sns.barplot(y='Tag', x='Comments',
            palette=sns.light_palette("red", n_colors=20, reverse=True),
            data=top(20, "Comments"), 
            ax=ax2)
ax2.set(title="Comments", xlabel='', ylabel='')

sns.barplot(y='Tag', x='Payouts',
            palette=sns.light_palette("green", n_colors=20, reverse=True),
            data=top(20, "Payouts"), 
            ax=ax3)

ax3.set(title="Payouts in SBD", xlabel='', ylabel='')
ax3.xaxis.set_major_formatter(payout_formatter)

fig.suptitle('Top 20 Tags in terms of Posts, Comments and Payouts')

output_8_1.png


Let's look at the payouts in terms of posts and comments

df = df.sort_values(by="Payouts", ascending=False)
fig, (ax1,ax2) = plt.subplots(figsize=(20,10), ncols=2, sharey=True)

sns.regplot(x='Posts', 
            y='Payouts', 
            scatter_kws={'alpha':0.5}, 
            data=df, 
            ax=ax1)

# Annotate the top 20 in terms of payout (df is sorted by Payouts above)
for row in df.head(20).itertuples():
    ax1.annotate(row.Tag, (row.Posts, row.Payouts))
    
ax1.yaxis.set_major_formatter(payout_formatter)


sns.regplot(x='Comments', 
            y='Payouts', 
            scatter_kws={'alpha':0.5}, 
            data=df, 
            ax=ax2)

# Annotate the top 20 in terms of payout
for row in df.head(20).itertuples():
    ax2.annotate(row.Tag, (row.Comments, row.Payouts))
    
ax2.yaxis.set_major_formatter(payout_formatter)
ax2.set(ylabel='')

output_11_1.png
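
The regression lines give a visual impression of how payouts move with posts and comments. A correlation matrix puts a number on that. The sketch below is an addition, not part of the original notebook, and it runs on just the five rows printed by `df.head()`, so its values say nothing about the full dataset.

```python
import pandas as pd

# The five rows printed by df.head(), used only as a stand-in for tags.csv.
sample = pd.DataFrame({
    'Tag': ['aceh', 'adventure', 'advice', 'altcoin', 'amazing'],
    'Posts': [4399, 1474, 554, 627, 617],
    'Comments': [805, 494, 210, 105, 85],
    'Payouts': [55180.578, 147060.430, 21937.755, 19558.774, 2204.555],
})

# Pearson correlation between the numeric columns (the pandas default method).
corr = sample[['Posts', 'Comments', 'Payouts']].corr()
print(corr.round(2))
```

With the full data loaded, the same call on `df[['Posts', 'Comments', 'Payouts']]` would show whether posts or comments track payouts more closely.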


What's the payout per post/comment for tags?

It's important to note that this is not the same as the "Average Post/Comment Payout"!

Because we only have the total payout per tag, without knowing how it is split between posts and comments, the exact ratio is of limited use on its own, but it can still give us a rough idea of how the tags compare against each other.
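
To make the difference concrete, here is a tiny worked example with made-up numbers for a single tag. It assumes the total payout for a tag includes rewards earned by comments as well as by posts, which the summary data does not let us confirm.

```python
# Hypothetical rewards for one tag (not taken from tags.csv).
post_payouts = [10.0, 30.0]        # rewards earned by the tag's two posts
comment_payouts = [2.0, 3.0, 5.0]  # rewards earned by comments under that tag

# The summary data would only report this combined total.
total_payout = sum(post_payouts) + sum(comment_payouts)

# The ratio used in this post: total payout divided by number of posts.
payout_per_post = total_payout / len(post_payouts)

# The true average post payout, which needs the split we don't have.
average_post_payout = sum(post_payouts) / len(post_payouts)

print(payout_per_post)      # 25.0
print(average_post_payout)  # 20.0
```

Under that assumption the ratio overstates the average post payout by however much the comments earned, which is why it is only useful for comparing tags with each other.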

df['payout_per_post'] = df.Payouts / df.Posts
df[['Tag', 'Posts', 'payout_per_post']] \
    .sort_values(by='payout_per_post', ascending=False) \
    .head(20) \
    .reset_index(drop=True)
                  Tag  Posts  payout_per_post
0           bitshares    286      1105.515846
1           steemfest    125       925.923792
2    witness-category    113       844.448761
3              meetup    112       643.752375
4       beyondbitcoin    994       500.433907
5          steem-help    211       469.506502
6     steem-pocalypse     38       411.794526
7            curation    371       368.283876
8            security    530       353.183917
9           coinkorea    481       339.354857
10        crypto-news    984       326.677209
11           gridcoin     75       319.262480
12       charlesfuchs      8       317.167500
13  introduceyourself   4430       306.785769
14              stats    402       296.285604
15          marketing    560       274.113300
16          economics    470       264.304479
17              steem  10607       253.045986
18         steemvoter    114       252.183789
19                eos    362       251.570713

Let's do the same thing for comments

df['payout_per_comment'] = df.Payouts / df.Comments
df[['Tag', 'Comments', 'payout_per_comment']] \
    .sort_values(by='payout_per_comment', ascending=False) \
    .head(20) \
    .reset_index(drop=True)
              Tag  Comments  payout_per_comment
0      creativity       163         1481.952816
1    minnowsunite       170         1479.143265
2          future        35         1348.421343
3         finance        48         1218.871104
4       marketing       142         1081.010197
5          review        66         1028.849197
6        security       215          870.639423
7            work        87          633.268655
8           stats       190          626.877963
9        creative       143          610.995790
10  beyondbitcoin       816          609.597186
11          china        49          605.306388
12         recipe       161          587.820820
13     government       104          578.813019
14      anarchism       295          568.976651
15          girls       127          507.826693
16            fun       246          470.909626
17        recipes       256          466.744328
18       painting       237          466.703169
19       tutorial       132          452.375341
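
Some of the tags in these rankings have very few posts or comments (charlesfuchs has 8 posts, future has 35 comments), so a single large payout can push them up the list. One possible refinement, sketched here as a suggestion and not something the original notebook does, is to require a minimum count before ranking. The cutoff `MIN_POSTS = 100` is an arbitrary, hypothetical value, and the four rows below are copied from the payout-per-post table as a stand-in for the full `df`.

```python
import pandas as pd

# Four rows copied from the payout-per-post table, as a stand-in for the full df.
sample = pd.DataFrame({
    'Tag': ['bitshares', 'steem-pocalypse', 'charlesfuchs', 'steem'],
    'Posts': [286, 38, 8, 10607],
    'payout_per_post': [1105.515846, 411.794526, 317.167500, 253.045986],
})

MIN_POSTS = 100  # hypothetical cutoff, chosen only for illustration

# Keep tags with enough posts, then rank by the ratio as before.
ranked = (
    sample[sample.Posts >= MIN_POSTS]
    .sort_values(by='payout_per_post', ascending=False)
    .reset_index(drop=True)
)
print(ranked)
```

The same filter on `Comments` would apply to the payout-per-comment ranking. Where to set the cutoff is a judgment call that depends on how much noise you are willing to accept.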
Great stuff! I'm just taking my first baby steps in Python.
I will try this out. I'll see if I can do it in my IDE, since I don't have Jupyter installed.

Awesome, good luck. Happy to help out; if you have any questions, feel free to reach out!
