Charts and interpretation of @calamus056's comment self-voting user list since HF19 - PART 3 (potential comment abuse)

in #steem-coop, 7 years ago (edited)

@calamus056 recently made a report, part 3 in his series on the self-voting user list since HF19. It was basically a data dump without much interpretation of the data gathered. It of course comes from his position against self voting, which is evident from other posts and the comments, but the data itself was presented as is. You can see it here.

I thought it would be interesting to try and interpret it and graph it. Statistics asks questions of data and attempts to make sense of it. See this from a stats research guide on the subject:

In regular conversation, both words [data and statistics] are often used interchangeably. In the world of libraries, academia and research there is an important distinction between data and statistics. Data is the raw information from which statistics are created. Put in the reverse, statistics provide an interpretation and summary of data.

[...] A statistic will answer “how much” or “how many”. [...]

Notes

These data and interpretation are for comments only, not root posts.

@calamus056 has asked me to re-relay his disclaimer:

DISCLAIMER: The information in this article shouldn't be perceived as 100% accurate. When you spot significant errors, please leave a comment. Also keep in mind that the full list below is a raw data dump. In no way is it implied that all cases are considered problematic. It's for you to decide what you think about it and what to do with the information. The reason for names being included is that this is public information and others will release (and some already have released) the information independently.

But also note that I did not run this interpretation by him and it is not endorsed by him unless he chooses to do so in the comments 🙂

Foreword on data

The data is from users whose self vote ratio on comments is 50% or above.
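As a rough illustration of that filter, here is a minimal Python sketch. The field names and sample figures are made up for the example and are not taken from the real list. The ratio is assumed to be self-vote SBD divided by the total SBD a user's votes gave to comments, which agrees with the percentages quoted later in this post.

```python
def self_vote_ratio(own_sbd, others_sbd):
    """Share of a user's comment-vote rewards that went to their own comments."""
    total = own_sbd + others_sbd
    return own_sbd / total if total > 0 else 0.0

# Hypothetical sample rows, not taken from the real list.
users = [
    {"name": "alice", "own_sbd": 50.0, "others_sbd": 20.33},
    {"name": "bob", "own_sbd": 10.0, "others_sbd": 30.0},
    {"name": "carol", "own_sbd": 5.0, "others_sbd": 5.0},
]

# Keep only users at or above the 50% threshold used for the data set.
in_data_set = [
    u["name"]
    for u in users
    if self_vote_ratio(u["own_sbd"], u["others_sbd"]) >= 0.5
]
print(in_data_set)
```

Note that a user exactly at 50% is included, since the cut-off is "50% or above".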

Is there a consistent relationship between SBD rewarded from voting on others' comments compared with SBD rewarded from voting on own comment?

Answer: YES

The first graph here shows the SBD on own comments plotted against SBD on others' comments for users in the data set, as a scatter plot.

See that most points are clustered in the bottom left, though there does seem to be some linear relationship. This is very common in real world data which is often distributed logarithmically. When making an interpretation we would always adjust for this, but don't forget that we are adjusting.

stats3-own-sbd-vs-others-sbd.png
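To show what adjusting for a logarithmic distribution does to skewed values, here is a small Python sketch. The reward values are invented for illustration, and base 10 is an assumption; the charting software used for this post may use a different base.

```python
import math

# Hypothetical reward values in SBD, heavily skewed toward small amounts.
rewards = [0.5, 1.0, 2.0, 5.0, 10.0, 100.0, 1000.0]

# Taking log10 compresses the large values and spreads out the small ones,
# so points that pile up in the bottom-left corner become distinguishable.
log_rewards = [math.log10(r) for r in rewards]

for raw, logged in zip(rewards, log_rewards):
    print(f"{raw:>8.1f} SBD -> log10 = {logged:.2f}")
```

The ordering of the points is unchanged by the transform; only the spacing between them changes.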

Here data on both axes are adjusted for their logarithmic distribution and we can see a clear linear relationship which is denoted by a barely visible line (sorry about that, it's the best the software I'm using could provide).

This is called linear regression. Note the best formula we can fit to the data is y = 0.557x - 7.5226. Using this we can take x SBD on own comments and get a prediction of y SBD on others' comments. You might notice from the graph that the line does not go all the way through: the predicted y falls to zero at about x = $13.506, so this approximate equation has a lower limit there.

E.g. $50 on own comments would predict about $20.33 rewards for others. In this case the self vote reward ratio is 71.09%
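The arithmetic behind that example can be checked with a short Python sketch. The coefficients are the ones quoted above; the function names are just for illustration. The predicted value is rounded to cents before the ratio is taken, which is an assumption made to match the figures in the example.

```python
SLOPE = 0.557
INTERCEPT = -7.5226

def predict_others_sbd(own_sbd):
    """Predicted SBD rewarded to others' comments for a given SBD on own comments."""
    return SLOPE * own_sbd + INTERCEPT

def self_vote_ratio(own_sbd, others_sbd):
    """Share of total comment-vote rewards that went to own comments."""
    return own_sbd / (own_sbd + others_sbd)

own = 50.0
# Rounded to cents, as in the example above (an assumption about how it was computed).
others = round(predict_others_sbd(own), 2)
ratio = self_vote_ratio(own, others)

# The line crosses y = 0 here, so predictions below this x are not meaningful.
lower_limit = -INTERCEPT / SLOPE

print(f"predicted rewards for others: {others:.2f} SBD")
print(f"self vote reward ratio: {ratio:.2%}")
print(f"lower limit of x: {lower_limit:.3f} SBD")
```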

stats3-own-sbd-log-vs-others-sbd-log.png

What do we get from this? Well, it should be no surprise that, for a group made up only of users who reward themselves at least as much as they reward others, the predictor gives us an equation predicting greater rewards for self than for others. There is some variation, but only on the downward side of the line, which has to be true by definition of our data set, i.e. 50% self vote rewards or above.

This is a point about not being misled into a false conclusion.

What might be more interesting than something we already know is that there is something of a pattern in rewards for self and others among these users. The majority seem to be clustered around the main line, so we can say that (among this user set) generally as self rewards increase so do rewards for others.
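For readers curious how a line like the one above can be found, here is a minimal ordinary least squares sketch in Python. The data points are invented, and the fit is done on raw SBD values; whether the charting software used for this post fitted raw or log-adjusted values is not something I can confirm, so treat that as an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance sums (the common 1/n factor cancels out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical (SBD on own comments, SBD on others' comments) pairs.
own = [20.0, 40.0, 60.0, 80.0, 100.0]
others = [5.0, 12.0, 28.0, 35.0, 50.0]

slope, intercept = fit_line(own, others)
print(f"others = {slope:.3f} * own + {intercept:.3f}")
```

A positive slope is what "as self rewards increase so do rewards for others" looks like in numbers.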

Another graph on this: ratio plot of SBD rewarded from voting on others' comments compared with SBD rewarded from voting on own comment

stats3-own-sdb-and-others-sbd-2.png

This shows us the distribution of the 216 users that vote more than or equal to 50% for themselves (in terms of SBD rewards).

We can see that close to 100% self voting is relatively rare (there is only one at 100%, and the next is at 98.34%) and that it falls away quickly among the population.

The median self vote percentage here is 63.85%, so half of these users have self vote rewards between 50% and 63.85%
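A quick Python sketch of what the median tells us here. The percentages below are invented stand-ins, not the real 216 values, so the resulting median differs from the 63.85% above.

```python
import statistics

# Hypothetical self-vote percentages for a handful of users (all 50% or above).
ratios = [50.5, 52.0, 58.1, 61.0, 66.7, 74.2, 81.0, 98.34, 100.0]

median_ratio = statistics.median(ratios)

# By definition, about half of the users sit at or below the median.
at_or_below = [r for r in ratios if r <= median_ratio]

print(f"median self vote percentage: {median_ratio}")
print(f"{len(at_or_below)} of {len(ratios)} users are between 50% and the median")
```

Because every value is at least 50%, "at or below the median" is the same as "between 50% and the median".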

This is consistent with our scatter plot above.

Is there a consistent relationship between number of comments and self vote reward ratio?

Answer: NO

This is a textbook example of no trend whatsoever: the data points are very well distributed. They bunch a bit more toward the 50% side of self vote reward ratio, but we expect that because we know most users in the group skew towards 50% rather than 100%.

What we don't see is any relationship between the number of comments made and self vote reward ratio. This suggests that we cannot say, for example, that if a user posts a lot of comments they would probably self vote a lot, or that if the user self votes a lot they must spam a lot of comments.
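I judged this by eye, but one rough way to put a number on "no trend" is the Pearson correlation coefficient, where values near 0 suggest no linear relationship. The Python sketch below uses invented data, not the real list, and is only meant to show the calculation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical users: number of comments and self vote reward ratio (%).
comments = [10, 50, 120, 300, 800, 1500]
ratio_pct = [72.0, 55.0, 90.0, 51.0, 88.0, 60.0]

r = pearson(comments, ratio_pct)
print(f"correlation between comment count and self vote ratio: {r:.2f}")
```

A coefficient close to 0, as with this made-up sample, is consistent with a scatter plot that shows no trend.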

stats-pc-own-sdb-vs-self-votes-log-2.png

Is there a consistent relationship between number of comments and SBD self vote reward?

Answer: NO

At first glance there seems to be a vague relationship between number of comments and SBD from self vote, right? We might be tempted to conclude that those who get more SBD from self voting are leaving more comments and so are spammers!

stats3-own-sdb-log-vs-self-votes-log.png

But something comes to mind - wouldn't more comments usually tend towards more rewards? 🤔

So let's look at the number of comments compared with the overall SBD rewards (including both self rewards and those for others).

stats3-own-sdb-log-vs-total-sbd-log.png

EDIT: The legend name was wrong. I have updated. The data is still the same.

It's almost the exact same graph!

So this is a kind of a red herring, and not useful to us. It just shows us what is intuitively true - those who comment more generally will get more rewards, both for themselves and for others. In other words, this is not interesting at all.

Summary of all aspects

This chart is not really that useful but kind of asks the same question as above. I just wanted to do it mainly for the visual effect.

stats3-own-sbd-log-vs-others-sbd-log-vs-num-self-votes.png

This is the same as the first graph but with added circle size proportional to number of comments. We can see that as SBD increases the number of comments generally increases too (as we saw above), but also that it is not strictly so. The very top earners are not making as many comments as some a little lower than them.

Overall conclusion

We have seen some relevant stats based on the data @calamus056 published recently.

Though the data is limited, and @calamus056 himself cautions on the accuracy, the overall impression is that the number of users self voting comments for more than 50% of rewards since HF 19 is comparatively small (only 216 users are on the list), and those that self vote their comments more than half generally still vote a lot on others. This was claimed anecdotally by some, and the data seems to reflect it.

I would like to see this same data but from the same number of days before HF 19. Then we could compare to see if there has been a significant rise. We should also get this data from the same number of days after this snapshot so we can see if the situation is stable or getting "worse" (more self voting).

Currently all I can say for sure is that it is less than I thought it was in the general population. This does not mean that I no longer disagree with self voting, or that I don't think it's open to abuse, or that the very top self voters are not abusing the system.

Caution on conclusion

This is based on what has already happened and may not be a predictor of things to come. I think it is still very important to safeguard the platform against abuse. Even if abuse is not happening at quite the level we thought it might, we should still look at the example of the bad apples and wonder if this is an acceptable level of abuse (given of course that you agree that "too much" self voting is abuse - you may not).

Effect on #project-smackdown

I have not had a lot of time to reflect on these findings, but I will. My initial reaction, paired with recent conversations in the #steem-coop, is that a more targeted approach to dealing with comment abuse is required. I am moving to the position that top self voters by ratio are really the ones abusing the self vote loophole at the moment, instead of those with just high rewards.

^^^ Please let me know what you think of this. ^^^

I defend (and have defended) the publishing of information like this. At a basic level it is free speech, but in any case there is a need to understand issues and statistics can help with that, as long as the data is solid and the questions are appropriate for that.

Thanks for reading 😸


Self voting is inefficient in the long run as people will eventually realize that it is destructive. I still get why people do it, I even do so myself on occasion. Maybe the dollar amounts should not be shown on comments, rather the amounts of likes only.

There are probably a lot of clever ways to disincentivize this behavior, at least at the extreme levels (50%+).

nice research, i just hope people will get that by helping others they gain followers who will vote too, so it's like you vote 1 time for someone and you gain his trust for a long time, it's a good investment to like others :)

You are the smart one here. I have been claiming this from day one. I self vote my posts, my comments, and other posts and other comments. We are all happy. The only ones complaining and making analyses are the guys who were riding on voting bots before HF19. I threw them all off. Now they put me on some stupid black list, and the author of this article told me a couple of days ago that there is no witch hunt. Now he tries to sugarcoat his statements and to make the graphs above not relevant. But real steemians know exactly how it is. You too. You're welcome to visit my posts anytime. You'll get an upvote, don't worry.

You're not on any black list due to us because we do not have one at present.

That's even better. Steem on.

There are probably a lot of people writing posts and publishing stats that have nothing to do with us, but if you feel picked on by the Steem Coop, that's really your issue more than it is ours. We're pointing out flaws in the system design and actively doing something to negate actions that we (1) don't consider necessary or (2) consider potentially harmful.

The methods employed will necessarily evolve over time and the point is not to attack users, no matter your feelings about it, but to rally consideration for working changes to the blockchain.

You're free to do whatever you think is good or necessary. I agree. And I'm free to express my thoughts and opinions. We're all happy.

I think there is nothing wrong with self voting. Just an opinion.

In the same way that there's nothing wrong with smoking.

The real question is whether the self voters pay enough to cover the cost they put on the blockchain, with the amount they invested or will invest. There's a high risk that they don't and that bots will risk a collapse of the system in the long run.

looks like a big ol dick of self voting, which has never been bad

people need to get down with rational self interest

this is proof of stake, u hold steem power you get paid

whats geigh is when one stake holders stops the staking of another stakeholder

bitcoin and other proof of stake coins don't do that, IMAGINE if bitcoin miners, if they were big enough, could just cancel out the rewards of other smaller mining pools, imagine that shit, how lame would that be? That's what we have on steemit PRE communities

COMMUNITIES will allow people to finally post in peace and not be harassed by flagging whales, it's a broken system, it's NOT proof of stake in its current incarnation, the delegated in delegated proof of stake

I really appreciate your research into this.

One thing I would like to see is a chart that shows how those self voting are RECEIVING votes on their comments from others. I predict that self voters are depriving themselves of the votes of others because they are self voting, and a)people notice and don't like it, and b)recipients of votes like to reciprocate, which self voting fails to encourage.

While there would need to be a control group included of those that do not self vote their comments at all to compare with self voters, and this would complicate the research a bit, I think it would be a priceless tool (if I'm right) that would quickly change the ways of self voters.

Also, since Steemit is mostly folks with little SP, there is a tipping point beyond which self voting is likely more profitable than receiving votes from the community alone.

This would be very interesting information too.

I tried really hard, but I just can't not say '...lies, damn lies, and statistics.'

Sorry.

You make several good points, most of which have been raised internally in the Steem Coop, if not yet also in posts by some of its members. =)

yes damn statistics lol... personz and calamus made all our lives harder by providing us with more facts about the world ;)

Thanks a lot. Great ideas and as @the-ego-is-you said, there's been discussion on this.

It's very easy to mislead with statistics, you see this every single day in newspapers. There are a few things to look out for but even then it's hard to catch when it's not a true interpretation. So it certainly is easier to err on the side of caution and consider it all lies 😉

Nice work @personz, keep it up!

The work you have done here on data & statistics, as well as the loose conclusions you have drawn, is very interesting to look at.

What we seem to have here is an attempt to divide SteemIt users into groups (divide-and-conquer), lay down some kinds of "rules" or guidelines regarding "good vs. bad," and then one group trying to take control over another group's behavior - in regards to property/resources. To me, that sounds a whole lot like "government" or "mind control."

Here's the simple fact of the matter: each person on the blockchain is the owner of their own, various forms of Steem tokens. My Steem, Steem Power, and SBDs are private property. Yours belong to you. What I do with mine is my own business. What you do with yours is your own business. But, here is the real problem - here is the REAL question that needs to be asked:

"Is a person using their Steem tokens in a way that is hurtful or harmful to other people?"

IMHO, that is a very important question to ask. If an individual chooses to give away 100% of their upvotes - that's great! It isn't hurting anyone. If another person chooses to use all of their upvotes for their own benefit - that's great! It isn't hurting anyone. Upvoting does not matter - whether you are upvoting yourself or somebody else, the percentages of your distribution towards self or others makes no moral or ethical difference. It merely provides some kind of insight into a person's level of self-interest, at any given point in time. This level of self-interest will fluctuate with different circumstances in each individual's life.

Today, "Joe," may have great abundance and feel more generous towards others, while, "Jane," may have great abundance and be completely focused on increasing her own wealth. A month from now, "Joe," may start calling in favors from people, trying to control other people's behavior by reminding them how generous he has been to them in the past - and that his future generosity will depend upon others' conforming to his wishes. In turn, "Jane," may then begin making large donations to charitable organizations. So, what at first appeared to be one way, turns out to be a different way, given time.

Quit worrying about other people's tools and resources - and start evaluating the CONTENT of what they are promoting. This is not a "data set" or a "statistic." This is a judgment call. Some people follow good judgment, and other people do not.

What we seem to have here is an attempt to divide SteemIt users into groups (divide-and-conquer), lay down some kinds of "rules" or guidelines regarding "good vs. bad," and then one group trying to take control over another group's behavior - in regards to property/resources. To me, that sounds a whole lot like "government" or "mind control."

That's a bit of an extreme reading. No one is trying to take control of anyone else, but we are trying to influence people, just as you are trying to influence me with your comment. I am making an argument based on facts and statistics to show my research, and that of my friend, and to continue the conversation on self voting. I am not the only one talking about this. When we disagree with someone in the strongest possible way (down voting / flagging) this is as much our prerogative as self voting is for some others. No one's property is being infringed on.

I am looking honestly to see whether there is any case to be made for whether some behavior can be considered abuse on the platform. Content is only one part of it. Votes determine what is popular, often what is seen, and what is rewarded and to whom. Just because no one is hurt does not make an issue unimportant. Violence is not the only problem possible. And sometimes effects which are harmful take study to uncover.

You mention self-interest, and this is a key point. The system has to be balanced so that self interested actions are mostly directed outward to other people. What you see in this post is the extreme other side, the people whose actions are mostly directed inward. This may not be a problem but you have to actually look to see if it is or not. And this is looking.

You are right that today's behavior does not necessarily predict tomorrow's and that people change their habits. However aggregating public information and looking at it critically is one way to understand larger trends in the population.

Thanks for answering. Personally, I try to hit my upvotes at about a 50/50 split. I honestly have no idea how successful I am at keeping it on that target. I would guess it's within 10 - 15%, probably leaning more heavily towards self-upvoting. It seems like a lot of people in your set of statistics would also fall into that category - somewhere between 51 - 65% self upvoting. If somebody decides to give their money away, 1 time out of every 3 or 4, that seems generous enough to me. I see it as them giving away 35 - 49% of their potential rewards. People who give away that much of their money to charity actually tend to get audited more by the IRS to see if they are committing tax fraud, because very few people are actually that generous. I find it alarming that a significant number of people on SteemIt seem to see it as a "problem" that people aren't giving away more than 1/2 of their money. It actually seems a little bit INSANE to me. Why do people feel like they are so entitled to be rewarded that they are actually now openly questioning and criticizing people who they feel are not generous enough?!?!

This platform is for sharing ideas and others comment. Upvoting is an added benefit not a right. So why should you claim a right to upvote oneself. Someone's comment may shape your direction. Upvoting oneself is just simple although allowed. Let community judge you.

great research

Please help me with how to upgrade my skill in steem. @personz
