[Steem Blockchain] How Much Steem Power Does It Take to Do Anything You Like?

in #steem, 7 years ago

Moby Dick off the port side, cap'n!

Like many people who get more involved in understanding the steem blockchain than they really should, taking at least 2d10 SAN damage in the process, I have certain questions that I would like to know the answer to.

@paulag and the rest of the #BI community have been doing a great job at running the numbers and presenting them on a regular basis, and they should be commended at the highest levels for doing so. Judging by the votes their posts get, that's exactly what's happening. But they tend to get into the minutiae of time series, and I'm more interested in understanding exactly what the numbers mean at a functional level.

So I've dragged my old "I'm a Programmer!" hat out of the back of the dusty closet it'd been hanging in, knocked it against the wall a few times, put it on, and ignored studiously the cascade of rust flakes that fell off of it.

Tools of the Trade

There is no chance that I'm going to use the high-powered tools that the #BI guys use. It's not that I don't have the hardware to do it, but I just don't have access to the software – and I don't really have the experience to make use of them well. Instead, I'm going to use the most public domain and open source stuff that I can find, in part because that means that anybody that reads this can do the same thing, replicate my findings, dispute my findings – just generally do "that science thing."

That's how we get better.

Choice of language: Python 3.6.4.

Is it my favorite language? Maybe not. By and large I would prefer to use Erlang as my toolset, but it just so happens that there are some very useful libraries available for Python and I've written a ton of code for it in the past, albeit in the last major version.

(There have been some significant changes in the language since I last hacked at it. Good times!)

Choice of editor: Atom.

Because sometimes what you need is a ridiculously capable, highly modular editor which has support for turning into a full IDE with linter. That just means that the editor itself will tell me if I'm making strange, gross syntax errors and do its best to correct me along the way. If you have to write code, you want to do it in an editor that helps you rather than hinders you.

Yes, I know. I could've gone with Emacs, and there were times in my life where that would've been my primary choice. But not for the last decade. I love Emacs! I love eLisp! At a certain point you really want to spend more time writing content than you do tweaking the internals of how the editor you use works.

I have long passed that point.

(Interestingly, Atom is also the editor I use for writing my replies. The Steemit reply box is just too small to work with and the Markdown interpretation is entirely in the wrong place, so I just pull up a blank Markdown file to work in. I get a nice interpreted view of what my Markdown looks like and a very nice editing environment that just requires that I cut-and-paste the results back over into the tiny white box. It's vastly superior.)

Choice of database back-end: SteemData.

And now the moment I know you've all been waiting for.

After all, the SQL database has gone pay, and all reasonable people use SQL as a database query language. How could I possibly get access to the contents of the steem blockchain without going through that service?

Well, it's easy. I don't like SQL.

Heresy, I know! Frankly, it gives me COBOL flashbacks. All that text, all that annoying formatting, all that weird "trying to be English" construction – I just don't need that in my life.

MongoDB has a very different approach to dealing with database interfaces. The queries are much more programmatic and, frankly, it just feels better for me to use.

Also, it's still free to access and has a solid Python library to interface with it in code. Cheers to @furion on that one; it's solid tech.

Choice of DB spelunker: Robo 3T.

It's been a long, long time since I was anything like a database programmer. I might have been good – once. One of the things that I need to make me even barely competent is the right tool for exploring the database itself. I need a GUI that lets me poke at the fields and the collections, figure out what the format looks like and how to access any given piece. What the default values are. Lucky for me, the tool that used to be known as RoboMongo is still available from the guys who bought it out from the developer, and it's still a great tool.

Choice of Graphics Generator: Plotly For Python

Odds are good that at some point I'm going to want to draw some pretty pictures to keep the attention of part of the audience. You know who you are.

The fact that I've never used Plotly before is almost immaterial to this endeavor. Of course, it's full of bizarre syntax, confusing methodologies, and poorly written documentation – but when have we ever let that hinder us?

What's The Question?

I started this whole process just to answer one question.

If we take a list of "active accounts" and sort it in descending order by the amount of SP/vests that they have, at what point do the cumulative vests above the line exceed the cumulative vests below the line?

Or to put it in terms which are more procedural, what is the cut off point below which it simply doesn't matter what the rest of the active population of the steem blockchain wants because they don't have enough cumulative voting power to stand against the extant whales above them?

From a sociological and political perspective, this is an important question. Below this notional line, the members of the blockchain can do whatever they want and it simply won't matter because they can be almost trivially overruled, individually and en masse, by those at the top.

My hypothesis is that this line is really quite high and that it only takes a minimal number of the high-end population to override the will of anybody underneath. My initial estimation is somewhere short of 400, and perhaps radically short of 400.

We have a question. We have a hypothesis. Now – in order to do science – we must do an experiment.

Limiting the Terms

The basic idea is set.

I have proven that I can pull data from the database into the system. I've got a roadmap of all of the attributes for accounts, and I have considered what things I can actually work with.

A raw pull of every account accessible from the database nets me about 750,000 accounts. This looks reasonable, from a quick poke around the results. Unfortunately, right up front I can see that we are going to have some problems with the usability of this content.

That is the Robo 3T view of the first object returned from the naked query for all accounts. There are some obvious problems with this data.

It's not a problem with the data, per se. This is a real account, and it was created way back in the first days of the blockchain, I'm quite sure – and it really hasn't been touched or active since then.

That provides us something of a problem. Sure, I could play around with ancient accounts that don't really do much, or I could filter based on some qualification to try to get accounts which are actually somewhat more valid and interesting.

Something obvious here is that the dates which are stored by default in the steem blockchain have kind of a strange epoch: January 1, 1970 at midnight (the standard Unix epoch). This particular account only varies from those defaults in the last vote time for some reason, which was in April 2016.

This is some crap.

Poking around at the accounts which are returned and moving deeper into the stack, I believe that I have determined a field which I can use to somewhat cut back on a lot of the garbage accounts.

"last_account_update"

A by-hand survey suggests that this field is only updated on accounts which have been relatively active at some point. Because I'm lazy, and I really don't want to take too small a slice, I want my breakpoint time to be January 1, 2000. I know that the system itself wasn't in use that far back, so any dates earlier than that point are clearly untouched defaults.
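To make the breakpoint idea concrete, here is a minimal sketch of the same test done in plain Python rather than in the database. The helper name `looks_touched` and the two toy records are made up for illustration; only the January 1, 2000 cutoff comes from the text.

```python
from datetime import datetime

# Cutoff mirroring the January 1, 2000 breakpoint described above.
CUTOFF = datetime(2000, 1, 1)


def looks_touched(last_account_update, cutoff=CUTOFF):
    """Return True when the timestamp is at or after the cutoff.

    Untouched accounts carry the Unix epoch (1970-01-01 00:00:00),
    which always falls before the cutoff.
    """
    return last_account_update >= cutoff


# Toy records standing in for account documents (made-up names and dates).
accounts = [
    {'name': 'never-updated', 'last_account_update': datetime(1970, 1, 1)},
    {'name': 'updated-once', 'last_account_update': datetime(2017, 11, 5)},
]

kept = [a['name'] for a in accounts if looks_touched(a['last_account_update'])]
print(kept)
```

Any cutoff between 1970 and the launch of the blockchain would separate the two groups equally well; 2000 is just a round number.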

Applying this as a filter to my query cuts down the number of replies to a mere 250,000 or so. A little more than that, actually.

This feels like a number that I can deal with. (Whether it's a number that tools can deal with remains to be seen.)

Building the Books

So what does that MongoDB request in Python end up looking like?

from steemdata import SteemData
from datetime import datetime

db = SteemData()

breakTime = datetime(2000, 1, 1)

query = db.Accounts.find({'last_account_update':
                          {'$gte': breakTime}},
                         projection={'name': 1,
                                     'vesting_shares.amount': 1,
                                     '_id': 0},
                         )

We import the modules necessary to generate our query. We need the datetime module in order to generate the proper offset object for the comparison.

The query itself is relatively straightforward. We just want to match on everything whose last account update time is more recent than 1 January 2000. Really quite straightforward.

The projection tells the system what fields we actually want to get back. We don't care about all the fields in every account. That's just too much data. We only want the name of the account, the floating-point number of vesting shares, and we in particular don't want the big hash ID which does us no good, anyway.
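If the projection is hard to picture, this rough sketch imitates its effect on a single document in plain Python. It is only an imitation of the result, not how MongoDB does it internally; the `project` helper and the extra `asset` and `post_count` fields in the toy document are assumptions for the sake of the example.

```python
def project(doc, paths):
    """Keep only the listed dotted paths, rebuilding the nested shape."""
    out = {}
    for path in paths:
        keys = path.split('.')
        value = doc
        try:
            for key in keys:
                value = value[key]
        except (KeyError, TypeError):
            continue  # path missing from this document: skip it
        target = out
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = value
    return out


# Toy document; 'asset' and 'post_count' are made-up extra fields.
full_doc = {
    '_id': 'abc123',
    'name': 'a-00',
    'vesting_shares': {'amount': 12422.16369, 'asset': 'VESTS'},
    'post_count': 0,
}

slim_doc = project(full_doc, ['name', 'vesting_shares.amount'])
print(slim_doc)
```

The point is just that everything not named in the projection, including the ID, never makes the trip back.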

We end up getting back an object which is uninstantiated. That is, the actual interaction with the database hasn't occurred yet; this literally represents only a query which will be sent once something is done with this data.
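A Python generator behaves in much the same deferred way, so it makes a handy stand-in for seeing what that means. This is an analogy only, not the actual cursor object; `fake_cursor` and its log are invented for the demonstration.

```python
def fake_cursor(records, log):
    """Generator standing in for a database cursor (illustration only)."""
    log.append('query sent')  # runs only once iteration actually starts
    for record in records:
        yield record


log = []
cursor = fake_cursor([{'name': 'a-00'}, {'name': 'a-11'}], log)
before = list(log)        # snapshot: creating the cursor did no work
results = list(cursor)    # consuming it is what triggers the "query"
print(before, log, len(results))
```

Nothing is logged until the generator is consumed, which is the same reason the query object has to be walked or converted before any data shows up.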

So let's turn it into a list. We can work with lists. They're iterable and dynamic, so if we need to walk the list to do something to the content, we can.

queryList = list(query)

I know. My naming scheme for variables is almost incomprehensible.

What does this data look like?

>>> from pprint import pprint

>>> pprint(queryList[:10])

[{'name': 'a-00', 'vesting_shares': {'amount': 12422.16369}},
 {'name': 'a-11', 'vesting_shares': {'amount': 12107.547996}},
 {'name': 'a-2', 'vesting_shares': {'amount': 8097.823354}},
 {'name': 'a-3', 'vesting_shares': {'amount': 3554.801833}},
 {'name': 'a-4', 'vesting_shares': {'amount': 19749.370041}},
 {'name': 'a-5', 'vesting_shares': {'amount': 2490.81233}},
 {'name': 'a-6', 'vesting_shares': {'amount': 13497.429734}},
 {'name': 'a-7', 'vesting_shares': {'amount': 321268.833422}},
 {'name': 'a-8', 'vesting_shares': {'amount': 11928.556824}},
 {'name': 'a-a-0', 'vesting_shares': {'amount': 16582.118011}}]

That's pretty interesting!

These are the first 10 results from the search to the database. I find it more than a little curious that very obviously testing accounts are still showing up after our filter. Some content dropped out, because the filtered list is about 1/3 as long as the original. But these accounts pointedly did not.

Also note that inside the list, each account came back as a Python dictionary, the second entry of which is itself a dictionary nested inside it. This is kind of a pain in the ass, so inevitably I'm going to write a function which, when given one of these elements, pulls out and returns the vesting shares value as a plain number. Anything else would be insane.

This is still not quite as useful as it needs to be, because what we really want is a sorted list of these accounts in descending order of vesting shares. Luckily, we have more than enough horsepower to generate a new list in short order.

def vestingAmount(queryEntry):
    return queryEntry['vesting_shares']['amount']


def sortQuery(queryList):
    return sorted(queryList, key=vestingAmount, reverse=True)

>>> sQuery = sortQuery(queryList)

>>> pprint(sQuery[:10])

[{'name': 'steemit', 'vesting_shares': {'amount': 101294832035.79428}},
 {'name': 'misterdelegation', 'vesting_shares': {'amount': 33854469950.665653}},
 {'name': 'steem', 'vesting_shares': {'amount': 21249773925.079193}},
 {'name': 'freedom', 'vesting_shares': {'amount': 15507987396.01915}},
 {'name': 'blocktrades', 'vesting_shares': {'amount': 9494007774.078524}},
 {'name': 'ned', 'vesting_shares': {'amount': 7344140982.676874}},
 {'name': 'databass', 'vesting_shares': {'amount': 3500010180.297931}},
 {'name': 'hendrikdegrote', 'vesting_shares': {'amount': 3298001762.871842}},
 {'name': 'jamesc', 'vesting_shares': {'amount': 3199868835.022211}},
 {'name': 'val-a', 'vesting_shares': {'amount': 3132003554.29581}}]

We have to be careful not to forget the reverse Boolean on the sort (like I did the first two times) or else you get the lowest value accounts at the top. There are a surprising number of accounts with zero SP which made it through my initial filter.
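Here is a small sketch of that pitfall on made-up records in the same shape as the query results. The names `vesting_amount`, `sample`, and `zero_count` are illustrative only.

```python
def vesting_amount(entry):
    """Pull the vesting shares out of one account record."""
    return entry['vesting_shares']['amount']


# Made-up sample records in the same shape as the query results.
sample = [
    {'name': 'minnow', 'vesting_shares': {'amount': 2490.81233}},
    {'name': 'empty', 'vesting_shares': {'amount': 0.0}},
    {'name': 'whale', 'vesting_shares': {'amount': 321268.833422}},
]

ascending = sorted(sample, key=vesting_amount)                 # default order
descending = sorted(sample, key=vesting_amount, reverse=True)  # what we want
zero_count = sum(1 for entry in sample if vesting_amount(entry) == 0)

print([e['name'] for e in ascending])
print([e['name'] for e in descending])
print(zero_count)
```

Without `reverse=True` the zero-balance account floats to the top, which is exactly the surprise described above.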

I'm sure we'll talk about that a little bit later.

Instead, let's look at what we did get.

Unsurprisingly, the first three accounts can be directly traced back to Steemit corporate, holding a vast amount of resources in reserve. As other people have reported, there have been some recent movements from the Steemit account to Mr. Delegation, theoretically to help fund some further delegation to applications which are being developed on the blockchain.

That's actually all well and good, but the sheer vastness of the numbers is really going to screw with any kind of analysis that we can do here. I am tempted to remove the first three accounts from consideration when it comes to asking the question I've already proposed. After all, it really doesn't matter what the rest of the platform wants if corporate doesn't want something to happen.

Maybe we'll just set them aside.

Going further, @freedom is an interesting case.

Let's take a look over at another view of the blockchain for what freedom is up to.

Freedom is just a big old transfer bin. It appears to simply exist to transfer money in and out. There is no kind of activity of curation or posting, it only exists as an arbitrage point.

(Yes, I know – most of the people interested in reading this sort of thing already understand what kind of insanity is going on at the top of the most valuable accounts on the blockchain. But someone might not. They might be reading this. Maybe. What I find interesting is the sheer number of beg-messages hitting freedom as memo transfers. I could probably make a tidy sum just off of the accumulated bits and pieces being spent to send freedom messages.)

Being an exchange is clearly a profitable occupation, because @blocktrades follows freedom in the list. As well it probably should, since both of them need a lot of liquid resources in order to function as an exchange in the first place. That's not completely surprising.

After that comes @Ned, who, as an implementer of the blockchain itself, I would expect to have quite a comfortable nest egg tucked away, just in case – and, in fact, that's exactly what we see.

The numbers are starting to fall off really fast, now. Even this high on the food chain, there are pretty big gaps between major players.

And then there's @databass.

Almost no activity. In fact, this account was created by Steemit originally with a massive vest-dump – and it has done nothing since.

Literally, nothing.

Though it does stand as a pretty good example of remembering that nothing is private on the blockchain. Nothing is concealed. I almost feel sorry for the Earth Nation bot over there…

But then everything changed when the Fire Nation attacked.

The other thing that is brought to mind by this account and other accounts like it is that it is going to be impossible to truly filter out all of the accounts which are or have been architecturally intended to be repositories for Steemit vests. This is a particularly large one, with a particularly funny name. How many more accounts that even passed my very lax filter are really just resource dumps?

It would require a lot more research and effort than I am both willing to do and capable of doing to even start making a dent in that problem. I can imagine some sort of immense directed acyclic graph which would depict the relationships as far as we can tell between accounts, but that's no small accomplishment.

Maybe someone else will take up that project one day.

Just for grins, let's look a little further down the database.

>>> pprint(sQuery[10000:10010])

[{'name': 'webdesign29', 'vesting_shares': {'amount': 426642.409568}},
 {'name': 'misha', 'vesting_shares': {'amount': 426576.330304}},
 {'name': 'spencec6', 'vesting_shares': {'amount': 426533.168183}},
 {'name': 'joao-cacador', 'vesting_shares': {'amount': 426521.652178}},
 {'name': 'arthur.grafo', 'vesting_shares': {'amount': 426507.687568}},
 {'name': 'ainiaziz', 'vesting_shares': {'amount': 426476.41644}},
 {'name': 'kode', 'vesting_shares': {'amount': 426282.95957}},
 {'name': 'dannyleenders', 'vesting_shares': {'amount': 426239.559514}},
 {'name': 'creditceo', 'vesting_shares': {'amount': 426189.040678}},
 {'name': 'thecryptodavid', 'vesting_shares': {'amount': 426188.271218}}]

The vests here have really fallen off, which is quite interesting. These all appear to be relatively sensible accounts (though the lack of creativity from @webdesign29 should be a case for a little bit of snickering).

Let's pick one at random and take a look at what their activity profile looks like.

This all looks pretty reasonable, though I'm more than a little jealous of @arthur.grafo's ability to earn some fat cash for writing poetry.

I went into the wrong line of work, I think.

A mere 10,000 steps down the list, however, and we have dropped into the realms of only 200 SP accounts. That's barely enough vests for the system to allow you to decide for yourself how large your votes should be. This is firmly down into the territory of "regular people can be here."

Remember, this list is over 250,000 lines long. We are less than 1/25 of the way into it, and we are already looking at account levels of SP which are not even really considered to be in the "minnow" range. These are tasty plankton.

That bodes well for proving my hypothesis, but it probably bodes poorly for the ecology of the steem blockchain.

Answers Forged

Well – that was depressing.

Let's get back to answering the question that we posed originally. This should be pretty straightforward, because we have all the data in lists and it's easy to build some accumulations, figure out the numbers, and find out what's going on.

I've flattened the list a bit to just make it easier to deal with.

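def query2tup(idx, queryEntry):
    # Assumed helper: its definition isn't shown in this post, so the
    # shape is inferred from the output below (rank, name, vesting shares).
    return (idx, queryEntry['name'], queryEntry['vesting_shares']['amount'])


def queryList2queryTupList(queryList):
    idx = 0
    outList = []
    for e in queryList:
        outList.append(query2tup(idx, e))
        idx += 1
    return outList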

>>> fQuery = queryList2queryTupList(sQuery)

>>> len(fQuery)
287464

>>> pprint(fQuery[:30])
[(0, 'steemit', 101294832035.79428),
 (1, 'misterdelegation', 33854469950.665653),
 (2, 'steem', 21249773925.079193),
 (3, 'freedom', 15507987396.01915),
 (4, 'blocktrades', 9494098772.554981),
 (5, 'ned', 7344140982.676874),
 (6, 'databass', 3500010180.297931),
 (7, 'hendrikdegrote', 3298001762.871842),
 (8, 'jamesc', 3199868835.022211),
 (9, 'val-a', 3132003554.29581),
 (10, 'michael-b', 3084198458.874888),
 (11, 'val-b', 3058661749.06894),
 (12, 'proskynneo', 2991480332.628863),
 (13, 'thejohalfiles', 2633775823.387832),
 (14, 'minority-report', 2230797417.585554),
 (15, 'xeldal', 2044948082.411315),
 (16, 'roadscape', 1959600356.285733),
 (17, 'jamesc1', 1644781210.539062),
 (18, 'arhag', 1605237829.023531),
 (19, 'fyrstikken', 1546686313.784641),
 (20, 'adm', 1524486923.142378),
 (21, 'safari', 1500015009.426551),
 (22, 'riverhead', 1493568252.839648),
 (23, 'adsactly', 1334221109.867592),
 (24, 'trafalgar', 1326882955.849771),
 (25, 'created', 1247305568.029396),
 (26, 'tombstone', 1210362781.831355),
 (27, 'wackou', 1113334187.722864),
 (28, 'glitterfart', 1062137422.056735),
 (29, 'steemed', 1051204436.079952)]

A list of tuples is pretty easy to deal with.

So -- what's the ridiculously high sum of total vests?

>>> totVests = 0
>>> for e in fQuery:
    totVests += e[2]
    
>>> totVests
348533694697.9651

At the moment, Steemd shows steem_per_mvests at 489.056, making this roughly 170,452,494.594 STEEM/SP in aggregate value.
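The conversion is simple enough to sketch, assuming the rate really is quoted as STEEM per million vests, which is what the name suggests. The function name `vests_to_sp` is made up; the two vest figures and the 489.056 rate are taken from the text.

```python
def vests_to_sp(vests, steem_per_mvests):
    """Convert vests to STEEM Power; the rate is quoted per million vests."""
    return vests / 1e6 * steem_per_mvests


RATE = 489.056  # steem_per_mvests figure quoted above; it drifts over time

total_sp = vests_to_sp(348533694697.9651, RATE)   # every account combined
plankton_sp = vests_to_sp(426642.409568, RATE)    # the account at rank 10,000
print(round(total_sp, 3), round(plankton_sp, 1))
```

The second figure lands a little over 200 SP, which is where the "200 SP accounts" remark earlier comes from.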

So we just need to count down the list of accounts adding up vests until we exceed 50% (or 174,266,847,348.98254 vests), and wherever that falls, that's the break point.

Easy!

>>> halfVests = totVests / 2
>>> accumVests = 0
>>> for e in fQuery:
    accumVests += e[2]
    if accumVests > halfVests:
        print (e[0], accumVests, halfVests)
        break

    
4 181401162080.11328 174266847348.98254
>>> fQuery[4]
(4, 'blocktrades', 9494098772.554981)

Okay, that's a problem. The top five accounts alone hold more than 50% of the vests in the entire system of reasonably active accounts, and the @steemit corporate account holds about 29% all by itself.

Let's just skip over the first four accounts altogether, shall we? In fact, make it the first five.

Anyone figured out what the problem is?

Ayup. Once we cut out the top 5, we really needed to recalculate what half the vests are.

Let's just write a function to do this. It's getting messy to do it in the REPL.

def computeMidbreak(tupQueryList):
    totVests = sum([e[2] for e in tupQueryList])
    halfVests = totVests / 2
    print('Total vests: {}, half vests: {}\n'.format(totVests,
                                                     halfVests))

    accumVests = 0

    for e in tupQueryList:
        accumVests += e[2]
        if accumVests > halfVests:
            print('Rank {} - {} reaches {} of {}!\n'.format(e[0],
                                                            e[1],
                                                            accumVests,
                                                            halfVests))
            break
        else:
            if (e[0] % 500) == 0:
                print('Accum rank {} - {}, {} of {}\n'.format(e[0],
                                                              e[1],
                                                              accumVests,
                                                              halfVests))

>>> computeMidbreak(fQuery[5:])
      
Total vests: 167132532617.84103, half vests: 83566266308.92052

Rank 73 - gtg reaches 83675498293.07193 of 83566266308.92052!

This is some blunt force code. It's not even brute force – it goes well beyond that.

Effectively we take whatever tuple query list we're handed, go ahead and calculate the total vests by simple summation, calculate the half vests off of that, and then step through the list, accumulating vests as we go until we either reach the half-point or run out of list.

You'll notice that I didn't even code for the possibility that we run out of list. I'm both lazy and know that at some point we have to have more accumulated vests than half the value in the pile.
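For anyone who wants the less lazy version, here is a sketch of the same walk that returns its answer instead of printing it and copes with running out of list. It is a variant under stated assumptions, not the code that produced the numbers in this post; `find_midbreak` and the toy tuples are made up.

```python
from itertools import accumulate


def find_midbreak(tuples):
    """Return (rank, name, running_total, half) for the first entry whose
    running total exceeds half of all vests, or None if no entry does."""
    amounts = [t[2] for t in tuples]
    half = sum(amounts) / 2
    for entry, running in zip(tuples, accumulate(amounts)):
        if running > half:
            return entry[0], entry[1], running, half
    return None  # empty list, or every amount is zero


# Toy data in the same (rank, name, vests) shape, sorted descending.
toy = [(0, 'whale', 60.0), (1, 'dolphin', 25.0),
       (2, 'minnow', 10.0), (3, 'plankton', 5.0)]

print(find_midbreak(toy))
print(find_midbreak(toy[1:]))
print(find_midbreak([]))
```

Note that the rank that comes back is whatever rank was stored in the tuple, so a slice keeps the numbering of the full list rather than starting again from zero.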

I didn't actually expect that we would reach the line at position 73 of over 250,000. I knew the distribution was bad, but I didn't realize how bad.

The 69 holders of vests at ranks 5 through 73 on the steem blockchain, even allowing for decapitating the top five, which are corporate and exchange accounts, and the top one of which holds more SP in one place than half the rest of the blockchain, represent more voting and influence power than all of the accounts below them combined.

Or to put it a different way, no matter who you are, no matter what you do, if those 69 accounts (74, counting the top five, since the printed rank starts from zero at @steemit) decide something should happen – that's what happens. No amount of campaigning, no amount of persuasion, no amount of subterfuge can change that fact.

Out of curiosity, let's see what happens if we decapitate the top 10,000 accounts on this list. We will go ahead and move the top of the bar down 10,000 spots, which only brings the number of accounts covered down to about 277,000, and see how far you have to go before 50% of the vests are owned.

>>> computeMidbreak(fQuery[10000:])
      
Total vests: 4094385384.9629292, half vests: 2047192692.4814646

Accum rank 10000 - readingdanvers, 427126.578891 of 2047192692.4814646

Accum rank 10500 - christianytony, 204839141.37938106 of 2047192692.4814646

Accum rank 11000 - timoshey, 389534075.1581148 of 2047192692.4814646

Accum rank 11500 - catchup, 557556731.7472327 of 2047192692.4814646

Accum rank 12000 - hammockhouse, 711657143.056614 of 2047192692.4814646

Accum rank 12500 - ask-not-please, 853782777.8299729 of 2047192692.4814646

Accum rank 13000 - thesimplelife, 985274542.4688984 of 2047192692.4814646

Accum rank 13500 - roundoar03, 1107113991.4568622 of 2047192692.4814646

Accum rank 14000 - brainisthekey, 1220770239.1390197 of 2047192692.4814646

Accum rank 14500 - uncle-blade, 1327720546.3466616 of 2047192692.4814646

Accum rank 15000 - samiksa1982, 1428844721.4849968 of 2047192692.4814646

Accum rank 15500 - chivacoa, 1522977099.2646227 of 2047192692.4814646

Accum rank 16000 - elibemusic.com, 1610075085.9102407 of 2047192692.4814646

Accum rank 16500 - lokkie, 1690947907.6110935 of 2047192692.4814646

Accum rank 17000 - catonwheels, 1766333288.3911705 of 2047192692.4814646

Accum rank 17500 - thegame68, 1836934473.2640998 of 2047192692.4814646

Accum rank 18000 - mikefromak, 1903144829.8388672 of 2047192692.4814646

Accum rank 18500 - augistune, 1965502901.859729 of 2047192692.4814646

Accum rank 19000 - elisambre, 2025283613.9336421 of 2047192692.4814646

Rank 19189 - andyblack reaches 2047242786.7394085 of 2047192692.4814646!

Allowing for the fact that we started at rank 10,000, it took only about another 9,200 accounts until the owned vests exceeded 50% of the total vests.

That is a ridiculously sharp dropping curve right there.
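Another way to put a number on a curve like this is to ask what fraction of the pile the top k accounts hold. The sketch below does that on toy amounts; `top_share` is a made-up helper and the figures are not real account data.

```python
def top_share(amounts, k):
    """Fraction of the total held by the k largest amounts."""
    ordered = sorted(amounts, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0.0  # nothing to share out
    return sum(ordered[:k]) / total


toy_amounts = [60.0, 25.0, 10.0, 5.0]  # illustrative values only
share_top1 = top_share(toy_amounts, 1)
share_top2 = top_share(toy_amounts, 2)
print(share_top1, share_top2)
```

Run against the third element of each tuple in the flattened list, the same function would give the share for any cut you care to try.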

Let's take this one step further. Let's go that one step beyond.

Let's decapitate the top 100,000 accounts on this list. Now, I know that because the differentiation between account values at that point on the list is really small, it is going to require a lot of accounts to accumulate before the 50% mark gets hit. I'm going to change the code so that we only get an accumulated rank update every 5000.

def computeMidbreak(tupQueryList):
    totVests = sum([e[2] for e in tupQueryList])
    halfVests = totVests / 2
    print('Total vests: {}, half vests: {}\n'.format(totVests,
                                                     halfVests))

    accumVests = 0

    for e in tupQueryList:
        accumVests += e[2]
        if accumVests > halfVests:
            print('Rank {} - {} reaches {} of {}!\n'.format(e[0],
                                                            e[1],
                                                            accumVests,
                                                            halfVests))
            break
        else:
            if (e[0] % 5000) == 0:
                print('Accum rank {} - {}, {} of {}\n'.format(e[0],
                                                              e[1],
                                                              accumVests,
                                                              halfVests))

>>> computeMidbreak(fQuery[100000:])
      
Total vests: 185774094.90749836, half vests: 92887047.45374918

Accum rank 100000 - avicena41, 1611.526929 of 92887047.45374918

Accum rank 105000 - stewiegriffin, 7597302.133254998 of 92887047.45374918

Accum rank 110000 - bioherby, 14380629.430285048 of 92887047.45374918

Accum rank 115000 - jameseaton, 20525595.943487044 of 92887047.45374918

Accum rank 120000 - whitelotus, 26220640.330002043 of 92887047.45374918

Accum rank 125000 - legrandgm, 31624152.514669873 of 92887047.45374918

Accum rank 130000 - aaqibsohail, 36872727.66174664 of 92887047.45374918

Accum rank 135000 - unknownplayer, 42054003.03950547 of 92887047.45374918

Accum rank 140000 - lena-mikado, 47229579.20794142 of 92887047.45374918

Accum rank 145000 - charleneishere, 52402406.15960627 of 92887047.45374918

Accum rank 150000 - mocle, 57571997.65572927 of 92887047.45374918

Accum rank 155000 - bangbang, 62738220.82528121 of 92887047.45374918

Accum rank 160000 - alansmithee, 67900191.79367426 of 92887047.45374918

Accum rank 165000 - darmidayitrizi, 73058103.28484616 of 92887047.45374918

Accum rank 170000 - potcurator, 78212514.4902361 of 92887047.45374918

Accum rank 175000 - farimani, 83362800.59005027 of 92887047.45374918

Accum rank 180000 - johntheviper, 88508593.28006594 of 92887047.45374918

Rank 184258 - hecqubus reaches 92887116.07530342 of 92887047.45374918!

It only took 84,000 accounts once you offset by 100,000 to control half of the remaining SP pool.

Out of curiosity, let's check that guy out.

Well, he seems all right. He's got a couple of posts, he's got a couple of uploads and…

He has 0.503 SP plus his initial account creation investment of about 15.

That's how far down the account list we've come by jumping to the 100,000 point.

Accounts of this level, carrying this much SP, make up the vast bulk of accounts on the blockchain.

Let's look at that.

What's That Look Like?

Lines in the Sky

This is what just a naked calculation with linear scaling of rank versus vests looks like.

Yes, the curve really is that steep. Steemit is holding so much value compared to the rest of the blockchain that this is the distribution curve without any scaling. Now, it would look a little bit different if we strip off the top five accounts again, but just contemplate this.

Think about it.

Maybe this isn't the right way to look at the data. I have a little bit of experience with data mining. Whenever I see this sort of distribution, my first thing to reach for is a logarithmic curve. If we plot the vests as they stand on a log curve, surely it can't possibly be so bad.

It can really be that bad.

Notice our distribution here. The horizontal line in this plot is at 100 vests, or roughly .25 SP at the current rate. There is a huge population on this platform which is almost indistinguishable in the amount of vests that they have hanging out in their pockets.

And there is a very tiny number, comparatively, who have quite a lot of vests in their pocket. So much so that when plotted on a logarithmic curve they still evidence an exponential J hook.
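One practical wrinkle with a log view is that the zero-SP accounts mentioned earlier have no logarithm at all, so they need to be dropped or handled before plotting. Here is a minimal sketch of preparing the points, assuming a descending list of amounts; `log_points` is a made-up name and the sample values are illustrative.

```python
import math


def log_points(amounts):
    """Return (rank, log10(amount)) pairs for a descending list,
    skipping non-positive amounts, which have no logarithm."""
    return [(rank, math.log10(v)) for rank, v in enumerate(amounts) if v > 0]


toy_amounts = [1e11, 1e9, 1000.0, 0.0]  # illustrative values only
points = log_points(toy_amounts)
print(points)
```

The resulting pairs can be handed to whatever plotting library is in use; setting a logarithmic y-axis on the raw values is the other common route.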

Let's see if we can cut off the top five accounts and replot this.

Oh look! When plotted on a linear scale, you can just about see a very tiny inflection after we trim off the top five accounts. Of course, that also just changes the top from somewhere over 10 billion to a mere 3.5 billion.

Let's go back to log.

No real difference, except the slope may be very slightly less at the under 20 K rank. Very slightly. We know that the breakpoint for owning more than 50% of the vests is at position 73 on this graph.

This is kind of brutal.

Even changing the y-axis floor to zero doesn't really help matters. It just helps hide how harsh that drop really is.

On the positive side, if you ever wondered where you stood as regards the whole population of the steem blockchain in terms of how much SP you retain – here you go. Odds are good that you are somewhere between the rank 100,000 and 260,000, in that vast, highly populated plateau.

Epilogue

So what does that actually mean?

The distribution of voting power on the steem blockchain is worse than the distribution of wealth on most of the planet. The major difference is that on the blockchain, we can actually see it directly rather than merely observe what that wealth can bring.

Frankly, I prefer the real world. I can at least aspire to be rich, work to be rich, and labor under (perhaps) the delusion that my efforts will make a difference. Here… Well, there are a lot of people who like to talk about the amazing technology of social networking and the blockchain to give people a say in the governments of their communities.

That's just silly.

Perhaps the saving grace of the blockchain, just as it is in the real world, is the inability of people to get along – particularly of people with equal opportunity and power. While conspiracies are the theories that keep giving, in reality we know that competition between equals is some of the most vicious and cooperation is opportunistic. If anything, we may be saved by the fact that if you laid every whale end-to-end, like economists they would all point in different directions.

This data could definitely use more mining. Some sort of directed acyclic graph as I mentioned before would be awesome for building a level of comprehension of what accounts are related to others. There's an entire field of visualization just waiting to be turned.

I don't promise to be any kind of business intelligence guy. Don't consider this investment advice. In no way take anything that I've said as an endorsement or denigration of anything except human nature, because humans suck. I'll go on record as saying that.

Do take this as food for consideration and as an invitation to go and explore the data available yourself. I suspect you will learn things that you never expected.

  • Music to hack database code and wrestle with graphing code to:


This is an awesome post, well analysed and so well written. You are a coder, I don't code, you rock.

I did a post in the past that touched on 'controlling' power on steemit. this is an issue and distribution in the initial stages has created a very uneven system. To me this is one of the bigger problems steemit has to face moving forward.
So excited to see a data post like this, tagging #blockchianbi

Coming from you, this is high praise indeed! Thank you very much! I mean that.

To be fair, I haven't been a coder for many, many years. Though like a fish with a bicycle, you never forget the sweet, sweet sensation of finally getting something to work.

At this point I'm trying to decide whether this is an artifact of the reward mechanics in the blockchain white paper as implemented, or an inevitable result of any kind of reward system which has few outlets, so that most of the ecological energy must be turned inward to generate more of itself. That is, largely, what we have in steem. There are few ways to take the actual currency out of the system profitably and many ways to use the currency within the system to simply get more currency, whether it be self-voting, engaging with bots, or leasing delegation power – all of which can lead to just making more steem and thus more vests.

A lot of these really large accounts have existed since the primordial days, as far as I can tell.

It might also be interesting to poke at the Operations collection and simply get a list of accounts which have made any sort of operation in the last year, for instance, build a vests table out of that, and see what the distribution looks like. That would be one more lens to bring to bear on looking at how money is moving on the blockchain.
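A rough sketch of that filtering step, in plain Python rather than against the real database: the field names `account` and `timestamp` are assumptions for illustration, since the actual Operations documents may be shaped quite differently.

```python
from datetime import datetime, timedelta


def active_vests(operations, vests_by_account, now, days=365):
    """Keep only the vests of accounts with at least one operation
    in the last `days` days. Field names are hypothetical."""
    cutoff = now - timedelta(days=days)
    active = {op["account"] for op in operations if op["timestamp"] >= cutoff}
    return {name: vests for name, vests in vests_by_account.items()
            if name in active}


# Made-up records, not real chain data.
now = datetime(2018, 6, 1)
operations = [
    {"account": "alice", "timestamp": datetime(2018, 5, 1)},
    {"account": "bob", "timestamp": datetime(2016, 1, 1)},
]
vests_by_account = {"alice": 100.0, "bob": 5000.0, "carol": 42.0}
print(active_vests(operations, vests_by_account, now))
```

The resulting table could then be sorted and plotted the same way as the full one.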

Like I said, I am no kind of business analyst. Never have been. But I do have some interesting questions from time to time.

Again, thank you for the praise. It's super effective!

if I had these coding skills I would be using them to analyse different blockchains. Might also be something that you could look at? DM me on discord if you are interested and I will tell you a little more

If it has a MongoDB back-end interface, I might could flail around at some things. If it's SQL -- well, that might be technically possible, but I've mentioned how much I dislike SQL, right? [grin]

oh this is not sql stuff :-) and not a DB either...how are you with API's?

A lot of stuff I cant get at cos I dont code, many blockchains have apis

check out this post I did a while back
https://steemit.com/utopian-io/@paulag/exploring-the-smartcash-api-with-steemit-blockchain-transfer-comparison

Huh. That just looks like an ugly but usable JSON file, which reduces to a fairly straightforward parsing problem. I'm pretty sure there are even easy-use Python libs for dealing with JSON on the regular.

Doesn't look too bad at first glance. Everything depends on what you want to do with it, of course.

like I said I dont do code, but these types of posts and analysis are in need. that post was my first attempt and from discussions with others, people want to see more of this. however with power BI I am too limited

I can probably do some grunt-work transformations of data from JSON into something more tabular, if that's the sort of thing you're looking for.
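As a minimal sketch of what that grunt work might look like, using only the standard `json` and `csv` modules: the column names and the sample records below are invented for the example and are not taken from any real API response.

```python
import csv
import io
import json


def json_to_csv(json_text, columns):
    """Turn a JSON array of flat objects into CSV text.
    Keys missing from a record become empty cells."""
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({col: record.get(col, "") for col in columns})
    return buffer.getvalue()


# Invented sample, standing in for whatever the API actually returns.
raw = '[{"from": "a", "to": "b", "amount": 1.5}, {"from": "c", "amount": 2}]'
print(json_to_csv(raw, ["from", "to", "amount"]))
```

Nested JSON would need flattening first, which is where the real decisions about what you want out of the data come in.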

I like to start with the question I'm trying to answer, mind you, because -- er, well, even when I do, I can sometimes get obsessed with solving a problem that I don't need to and waste a good three hours wrestling and swearing with something I don't even need to touch.

Ah, not to say that happened or anything. [grin]

The code isn't even the hard part on things like this, truthfully. It's looking at the data you have and trying to figure out what you can do with it.

This is awesome! I'm going to save this to play around with later for sure.

I have a question that is of a slightly different nature. We don't know what half of those SP dumps are doing, but we do have analysis for active flag behaviors and analysis of abuse. If we focus on these, do they have the SP to do whatever they want or not? My impression was that it wouldn't be hard to have a giant group of people band together against these actors, but that graph you showed is a bit discouraging. However, we might want to throw out much more from that top X list.

And I assume that the whales aren't all banding together to assert their rule either. Or we really would be screwed. If that becomes the conclusion, we could be in deep trouble :). If even some whales fight against this rule there is hope I suppose.

This is awesome! I'm going to save this to play around with later for sure.

That's why I tried to include all of the bits of code that I was leveraging and methodologies. I've heard rumors that such behavior is called "science," but I couldn't testify to that fact for sure.

I have a question that is of a slightly different nature. We don't know what half of those SP dumps are doing, but we do have analysis for active flag behaviors and analysis of abuse. If we focus on these, do they have the SP to do whatever they want or not? My impression was that it wouldn't be hard to have a giant group of people band together against these actors, but that graph you showed is a bit discouraging. However, we might want to throw out much more from that top X list.

First of all we would have to reliably be able to define "abuse." I'm not sure that we have a good enough definition or assessment of abusive behavior to be able to programmatically determine whether a given act or operation on the blockchain is abuse.

I recognize that giving voice to such things is pretty heretical at core because the base assumption is that abuse is obvious, omnipresent, and easily denoted. In practice – well, it generally comes down to "someone is doing something I don't like or disagree with so they should stop." For obvious reasons that's not a position I can either support or programmatically ferret out, despite my inclinations.

However – your question is really not about abuse at all. It's about concentration of force, and by force in this context I literally mean vests.

Examine the data that I displayed for various decapitation levels of the list. The curve is self similar even as you take off more and more of the top end. The slope becomes less in absolute terms but remains largely the same in relative terms.

Which bodes ill for the idea that a giant group of people could band together against those actors because it would take way more people as you move down the curve – and as we all know, trying to get people on Steemit to agree to act in the same way at the same time is worse than herding cats. I know cats like food. Users on Steemit can't be reliably expected to choose the acts that reward them in ways I want over the ones that reward them in ways I don't.
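To put a number on "way more people," here is a sketch of the counting involved. It takes the best case, where the largest remaining accounts all join in first; the function name and sample values are made up for illustration.

```python
def accounts_needed_to_match(vests, top_n):
    """How many accounts, taken in rank order starting just below the top_n,
    are needed before their combined vests reach the combined vests of the
    top_n. Returns None if everyone below still falls short."""
    ranked = sorted(vests, reverse=True)
    target = sum(ranked[:top_n])
    running = 0
    for count, held in enumerate(ranked[top_n:], start=1):
        running += held
        if running >= target:
            return count
    return None


# Made-up numbers, not real chain data.
sample = [400, 200, 150, 100, 80, 40, 30]
print(accounts_needed_to_match(sample, 1))
print(accounts_needed_to_match(sample, 2))
```

In the toy list it takes three accounts to match the top one, and nobody below can match the top two combined, which is the shape of the problem in miniature.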

Coordinating those kinds of actions is inherently difficult, maybe impossible. Looking at the reports from even relatively well organized groups like @steemcleaners, even at their most powerful they have only affected around 6% of the reward pool. And that not consistently.

To put it gently, 6% ain't shit. I think I require that phrasing to really emphasize how nothing that is in the greater scheme.

And I assume that the whales aren't all banding together to assert their rule either. Or we really would be screwed. If that becomes the conclusion, we could be in deep trouble :). If even some whales fight against this rule there is hope I suppose.

As I've said elsewhere, the best thing for the community is that if you took every whale and laid them end to end, they'd all point in different directions.

In a real sense, the only person who can oppose a whale is a whale of the same relative magnitude – and even the whales are vastly different in power even in the top 78. The best thing we can hope for is that they continue to pursue their individual best interests and find the idea of coordination in a long-term sense to be contraindicated. And it would be, frankly.

About getting people to act together, I'm not so sure. I find all these organized groups to be pretty promising already, even if we're not at the level of being able to chip at high power folks

But you're right, it's not likely to happen. And you did mention the pointing directions thing, and for some reason I didn't realize that was what you were talking about.

Insightful as usual. Cheers.

About getting people to act together, I'm not so sure. I find all these organized groups to be pretty promising already, even if we're not at the level of being able to chip at high power folks

I'm less sanguine, if only because I tend to immediately recognize that any weapon that can be forged by me can be used against me.

Organized groups of people running around engaged in mass vigilante justice? How can that possibly go wrong?

We don't really need to ask that question since we are currently living through the beginnings of the Age of Warlords on Steemit. Bernie Sanders, H-dude, they are representative figures around which mobs can condense and go to war against one another.

We can see how that turns out. Or rather, we will see. Right now, from ground level, it's kind of a mess and doesn't really have any promise of getting better.

Since I am a cynic, my level of belief in the essential goodness of human nature is pretty taxed in general. The things that give me the most hope are those which seem to disturb others: that very few people care about what I do and what other people do, that there is still some social friction which keeps mass movements from spontaneously forming and lynching me, and that the human resistance to being organized externally, along with the tendency of equal powers to innately oppose one another, helps protect me against them.

You have to go for what happiness you can find in life.

Dang, I'm not in the upper half? But I had a feeling it was that bad. 50-100 was my mental estimate.

It's amazing how far down the list you can end up.

Like I said at the end of the article, this is an area ripe for more exploration by people who are probably better at this than I am.

The curve is ugly. There is just no way around it. I'm also not sure that there is any way to do anything about it, or whether something should be "done about it" at all.

This will require further rumination.

Always better to know than to guess, however.


The question I ask myself now is whether the trend over time leads to more centralization or to decentralization.

Well, consider this.

We know that all activities on the steem blockchain are scaled in relation to the amount of SP that you have. In particular, the effect of your votes in terms of how much reward they provide is scaled according to your vests.

Before hardfork 19, that effect was semi-exponential. Afterwards (now) it is much closer to linear. However, the scaling effect is still pretty pronounced. Go digging in the Steem World views for some of the top members of the blockchain and compare the power of their 100% votes to your 100% vote.

Now consider that delegating SP to an account that has been purchased for a whopping 6 steem can fairly consistently create the appearance of decentralization without actually being decentralized as long as the voting is managed by some kind of automation. The linear effect of scaling means that you don't really lose anything by spreading your SP across multiple accounts.

As a result, given the very strong mechanical pressure which literally translates to "the rich get richer" and the ability to spread SP across multiple accounts, it's fairly obvious that the trend over time is probably going to be to more centralization of that power rather than decentralization.
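The point about splitting SP can be shown with a toy model. This is not the actual reward formula: the exponent of 2 simply stands in for any superlinear curve, the exponent of 1 for a linear one, and the stake numbers are invented.

```python
def combined_weight(stakes, exponent):
    """Toy model: each account's vote weight is stake ** exponent,
    and the weights of all accounts are simply added together."""
    return sum(stake ** exponent for stake in stakes)


whole = [1000]                 # one account holding everything
split = [250, 250, 250, 250]   # the same stake spread over four accounts

# Linear curve: splitting costs nothing.
print(combined_weight(whole, 1), combined_weight(split, 1))

# Superlinear curve (a square, purely as an illustration): splitting is punished.
print(combined_weight(whole, 2), combined_weight(split, 2))
```

Under the linear curve the two totals are identical, so there is no mechanical penalty for looking like four small accounts instead of one big one.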

The overall retention rate of the Steemit platform at something less than 10% according to the BI folks just makes that pressure worse.

The changes coming down the pipe for the next hardfork don't appear to be designed or intended to change the nature of that situation. From some perspectives, you might even think that the design is intended to make it easier to have centralized power with a decentralized appearance. Things like being able to mine for new accounts are definitely going to advantage those who already have servers and significant investment over those who might be new to the platform.

But this sounds a little bit like guessing and not a data analysis and you assume that the majority of whales don’t spread the love they are capable of. There are definitely some whales that do but the question is are the egoistic whales the majority.

More accurately, it's game design analysis. We know how people respond to systems and we can observe how they have responded to the system.

That is, after all, how you make predictions. Unless you own a time machine, you can't analyze data that hasn't happened yet.

I don't need to assume that the majority of whales don't spread the love that they are capable of. We can observe our experience of the blockchain. We can read all of the content that the BI group has generated regarding historical behavior of various tags and content.

So, yes, it's obvious that egoistic whales are the majority. It's extremely clear that most of the SP on the platform is not, in fact, going to voting. At all. The vast bulk of it is sitting in accounts which have neither inputs nor outputs, just acting as repositories of value.

There are some extremely active accounts at the top of the chain – which are exchanges. They don't vote for much of anyone. They don't get curation value. They don't get author rewards. They exist to exchange tokens and that's all.

That is the brutal truth, whether we like it or not.

Excellent! Kudos and thanks for the info!

Too bad I can't resteem it anymore :(
