How-to solve SPAM and Democratize Steem: Introducing UserAuthority

in #utopian-io, 7 years ago (edited)

I have developed and contributed to quite a few search engine algorithms, so I know from experience how hard it can be to invent and implement proper ranking mechanisms, especially in environments that are new and/or paradigm-shifting, as Utopian and Steem are. On the Utopian side, @stoodkev has already contributed a tremendous amount of work on the Utopian Bot.

Having said that, I hereby propose a number of improvements to the Utopian Voting Bot. I hope these will be implemented not only on Utopian, but across the entire Steem ecosystem. At the bottom of the post, I propose two hard forks (HF) based on UserAuthority.

UserAuthority (UA) seen as a Probability Distribution

Inspired by the inner workings of search engine Google, I hereby propose the metric UserAuthority corresponding to the principal eigenvector of the normalized follower matrix within the Steem ecosystem.

Let:
UA(A) = (1-d) + d[UA(1)/C(1) + ... + UA(n)/C(n)]
where
UA(A) = UserAuthority of user A
UA(x) = UserAuthority of user x, for each of the n users (x = 1..n) following user A
C(x) = the total number of users that user x is following
d = damping factor (set to 0.85)

In layman's terms, this probability distribution can function as an authoritative perceived-quality metric for every "post", "upvote" and "downvote" a user transacts, based on the entire Steem follower graph. A user's UserAuthority is the probability of that user being followed (liked) by a random user. This mechanism also deals effectively with fake and/or bot accounts, making it nearly impossible to deliberately mislead the ecosystem in order to get higher author rewards. The principle is intuitively justified: a user gains authority either when many users follow them or when just a few users with a high UserAuthority do. Consequently, if spam accounts or bot accounts mainly follow each other recursively, each will have a low UserAuthority within the entire Steem follower graph.
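To make the formula concrete, here is a minimal Python sketch that evaluates it repeatedly on a small made-up graph. The account names and the graph are hypothetical (this is not the graph from the figure in this post), and the function name `user_authority` is my own. In this example the two spam accounts follow each other and a couple of genuine accounts, while no genuine account follows them back.

```python
def user_authority(follows, d=0.85, iterations=50):
    """Repeatedly apply UA(A) = (1-d) + d * sum(UA(x)/C(x)) over all followers x of A."""
    users = set(follows)
    for targets in follows.values():
        users.update(targets)
    ua = {u: 1.0 for u in users}  # iteration 0: every account weighed equally
    for _ in range(iterations):
        new_ua = {u: 1.0 - d for u in users}
        for follower, targets in follows.items():
            if not targets:
                continue  # an account that follows nobody passes on nothing
            share = d * ua[follower] / len(targets)  # d * UA(x) / C(x)
            for target in targets:
                new_ua[target] += share
        ua = new_ua
    return ua


# Hypothetical follower graph: key = account, value = accounts it follows.
follows = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["alice"],
    "dave": ["alice", "bob"],
    "spam1": ["spam2", "alice", "bob"],
    "spam2": ["spam1", "alice", "bob"],
}

ua = user_authority(follows)
for name in sorted(ua, key=ua.get, reverse=True):
    print(f"{name:6s} {ua[name]:.3f}")
```

In this sketch the genuine, well-followed accounts end up with the highest values, and the spam pair stays far below them because the only authority it receives comes from within the pair.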

Nota bene: it is possible to manually exclude known spammer / bot accounts from the cumulative follower graph (which might be needed because a large number of people follow, for example, the heavily debated booster bots in order not to miss their notifications).

Mathematical Proof by Example

Please regard the following simplified example-follower graph. The unidirectional arrows represent a user following another user.

follower-graph.png

Update 1: I hereby add the link matrix following from the above total example-follower graph:

link-graph-per-user.png

Update 4: Note that it is enough to save, for every account, which accounts that account follows (read the matrix left-to-right). As a consequence, reading the matrix top-to-bottom shows which accounts follow that account.
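A small Python sketch of that row/column reading, using a made-up three-account matrix (the accounts and values are hypothetical, not those in the image above):

```python
# follows[i][j] == 1 means account i follows account j.
accounts = ["a", "b", "c"]
follows = [
    [0, 1, 1],  # a follows b and c
    [1, 0, 0],  # b follows a
    [0, 1, 0],  # c follows b
]


def following(name):
    """Read one row left-to-right: whom does `name` follow?"""
    i = accounts.index(name)
    return [accounts[j] for j, flag in enumerate(follows[i]) if flag]


def followers(name):
    """Read one column top-to-bottom: who follows `name`?"""
    j = accounts.index(name)
    return [accounts[i] for i, row in enumerate(follows) if row[j]]


print(following("a"))  # row of a
print(followers("b"))  # column of b
```

Only the rows have to be stored; the columns come for free from the same data.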

Please also consider the iteration data: in just a couple of iterations, the UserAuthority mechanism successfully identifies spammers / bots via the total follower graph.

Update 2: Instead of Gaussian elimination (row reduction via Gauss-Jordan) I have used an iterative approach to reduce computational cost for re-calculating the UserAuthority binary index daily.

Update 3: Using Gaussian row reduction would be very RAM-heavy for a large follower graph. By instead solving many simple follower equations per user, every iteration better approximates each user's UserAuthority. The cost of one iteration grows linearly with the number of follow relationships, whereas dense Gaussian elimination grows cubically with the number of users. The "1"s at iteration 0 are simply an initial approximation in which every account is weighed equally. If you chose random numbers at iteration 0 as the estimate for UserAuthority instead of all "1"s, the iterations would eventually end up at the same equilibrium state (it would only take some more iteration steps to get there).
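The claim about the starting values can be checked with a short Python sketch. The graph, the tolerance and the helper name `iterate` are assumptions made for illustration; the sketch runs the same update from an all-ones start and from a random start and stops once no value changes noticeably.

```python
import random

D = 0.85  # damping factor used in this post


def iterate(follows, ua, tolerance=1e-10, max_iterations=1000):
    """Repeat the per-user update until no value changes by more than `tolerance`."""
    followers = {u: [] for u in follows}
    for follower, targets in follows.items():
        for target in targets:
            followers[target].append(follower)
    for iteration in range(1, max_iterations + 1):
        new_ua = {
            u: (1 - D) + D * sum(ua[f] / len(follows[f]) for f in followers[u])
            for u in follows
        }
        change = max(abs(new_ua[u] - ua[u]) for u in follows)
        ua = new_ua
        if change < tolerance:
            return ua, iteration
    return ua, max_iterations


# Hypothetical graph; every account must appear as a key.
follows = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a", "c"]}

uniform_start = {u: 1.0 for u in follows}
rng = random.Random(42)
random_start = {u: rng.uniform(0.0, 10.0) for u in follows}

ua_uniform, steps_uniform = iterate(follows, uniform_start)
ua_random, steps_random = iterate(follows, random_start)

for u in follows:
    print(u, round(ua_uniform[u], 6), round(ua_random[u], 6))
print("iterations:", steps_uniform, steps_random)
```

Both runs settle on the same values; only the number of iterations needed to get there may differ.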

Screen Shot 2017-11-17 at 18.03.33.png

Technical implementation

The Steem blockchain stores all data in chronological order, yet in order to use UserAuthority for algorithmic curation purposes, a pre-calculated reversed binary index holding all the results shown in the Excel example is needed. The underlying data can be retrieved as BSON via SteemData's MongoDB interface.

Update 5: Only a subset of the SteemData's MongoDB interface is needed to construct the follower matrix:
{uid: [follows], total_follows: total_follows}
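As a rough Python sketch of that structure: the function below builds one record per account from plain (follower, followed) pairs. The input format, the function name `build_follow_records` and the exact record layout are assumptions for illustration; how the pairs are read from SteemData is deliberately left out.

```python
from collections import defaultdict


def build_follow_records(edges):
    """edges: iterable of (follower, followed) account-name pairs."""
    follows = defaultdict(set)
    for follower, followed in edges:
        if follower != followed:  # ignore self-follows
            follows[follower].add(followed)
    return {
        uid: {"follows": sorted(targets), "total_follows": len(targets)}
        for uid, targets in follows.items()
    }


# Hypothetical input; the duplicate pair is counted only once.
records = build_follow_records(
    [("a", "b"), ("a", "c"), ("a", "b"), ("b", "a"), ("c", "c")]
)
print(records)
```

Here `total_follows` is the C(x) of the formula, so nothing else is needed to run the iterations.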

Further implementations

It is possible to expand on the UserAuthority mechanism by adding weights to manually assigned trusted "witnesses" in the form of content moderators. In a Utopian-IO context, a contribution post is only allowed to be voted upon after a topical moderator has manually assessed it.

Update 6:
UserAuthority (UA) not only allows for algorithmic curation mechanisms, but it can also democratize the entire Steem ecosystem if adopted widely.

For example:

Proposal for HF21: a minimum amount of UA is needed to downvote
downvotable(user): bool = UA(user) >= UA_threshold
=> that prevents "flag wars"
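A minimal Python sketch of such a gate. The threshold value, the index contents and the rule that accounts missing from the index may not downvote are all assumptions made for illustration.

```python
def downvotable(account, ua_index, threshold):
    """True if the account's UA reaches the threshold; unknown accounts may not downvote."""
    return ua_index.get(account, 0.0) >= threshold


# Hypothetical UA values and threshold.
ua_index = {"alice": 2.17, "carol": 1.08, "spam1": 0.21}
UA_THRESHOLD = 0.5

for account in ("alice", "spam1", "brand_new"):
    print(account, downvotable(account, ua_index, UA_THRESHOLD))
```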

Proposal for HF22: implement UA to curate monetary rewards (author / curation):
upvote_reward = UA * SP
=> that effectively combats SPAM-rewards / self-upvoting / delegating SP to multiple self-owned bots for self-upvoting.
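To illustrate why UA * SP would discourage spreading SP over self-owned bots, here is a small Python sketch. All numbers are made up: one account with a UA of 2.0 and 1000 SP, compared with the same 1000 SP delegated in equal parts to ten bots that nobody follows (UA of 0.15 each, the minimum with d = 0.85).

```python
def upvote_reward(ua, sp):
    """Vote weight as proposed for HF22: UserAuthority times Steem Power."""
    return ua * sp


# Hypothetical figures.
single_account = upvote_reward(ua=2.0, sp=1000)
ten_bots = sum(upvote_reward(ua=0.15, sp=100) for _ in range(10))

print("voting from own account:", single_account)
print("voting via ten bots:    ", ten_bots)
```

The SP is the same in both cases; only the authority of the voting accounts differs.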

Nota bene: the only action needed for any user who disagrees with the actions of some other user is to simply unfollow that user.



Posted on Utopian.io - Rewarding Open Source Contributors

I've published a follow-up article:

UserAuthority (UA): explanations, applications and implications


While I'm largely undecided on the details of how this could be implemented (if it should, and specifically into what), I think there's a discussion worth having here.

Using the quality of a user (based off followers) to gauge their rewards/impact is a relatively new concept that I'm seeing used more and more in some of the 3rd party services. Based on how they're performing - it might be good to apply the concept to the overall Steem ecosystem or perhaps SMTs.

spot on mate.. speaking volume here.. upVoted so as not missed! @cnts :]

Hi Jesta, via DMs, 2 people suggested to me that I should forward this via email to somebody called "sneak". Do you know if that's the same person as account @sneak ? Do you know this person "sneak", and could you perhaps ask him/her to have a look? Or do you think I should send the email?

Yes, that'll be the one.

Yup, @sneak is one of the guys on the Steem team.

Have you considered applying the SVD calculation to speed up computation and basing your calculation on the top 90% of the eigenvalues?

No, not yet, and I don't think it would matter that much. Comparing users to pages: Steemit has only a very small number of users (let's say 500k) compared to the number of pages on the web (a lot more). Computationally it doesn't currently seem to be a problem, so optimizing it would bring little added value. Also, the total follower graph doesn't change drastically from day to day: a lot of new relationships may be formed and some (a lot fewer) are terminated, yet all in all that wouldn't make a big difference to the overall end results. So re-calculating the UA binary index daily (containing about 8 bytes per user, or more if more information is stored in it) is fine.
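For a feel of the size involved, here is a Python sketch of one possible layout for such an index: one 8-byte float per account, stored in the order of the sorted account names. The layout and the names `build_index` and `lookup` are assumptions, not the actual index format.

```python
from array import array
from bisect import bisect_left


def build_index(ua_by_account):
    """Return (sorted account names, packed float64 UA values in the same order)."""
    names = sorted(ua_by_account)
    values = array("d", (ua_by_account[name] for name in names))
    return names, values


def lookup(names, values, account):
    """Binary search on the sorted names; None if the account is not indexed."""
    position = bisect_left(names, account)
    if position < len(names) and names[position] == account:
        return values[position]
    return None


# Hypothetical UA values.
names, values = build_index({"carol": 1.08, "alice": 2.17, "dave": 0.15})
index_bytes = values.tobytes()  # what would be written to disk
estimated_size = 500_000 * values.itemsize

print(len(index_bytes), "bytes for", len(names), "accounts")
print("500k accounts:", estimated_size / 1_000_000, "MB")
```

At 8 bytes per account, 500k accounts come to about 4 MB, which is small enough to rebuild daily.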

This is great. I've seen so many 0.001 spam comments, not to my account (only recently ascended to a level where they notice me) but to others. Seems like it would be a good way to ensure value to Steem, limit the garbage in the chain and add trust. Great explanation of the concept.

I might write-up another post on how to combine Keybase.io as an encrypted private chat layer built-into the Steem blockchain! No more 0.001 SBD public memo notifications then! ;-)

That would be great. That would take the blockchain much closer to the traditional Social Media platforms or at least give it the ability to extend apps towards that goal. That would definitely add another dimension.

The Steem blockchain already allows for encrypted messages, yet a de facto private chat layer has not been implemented into Condenser yet (= the technical term for the Steemit front-end).

Thank you for this great article. I wish you success because the way to write a good article will get to the best. Thank you again

ok now I'm thoroughly confused, lol! but thanks for taking the time to write this blog.

Still confused? You can ask questions if you want, I'm happy to respond!

woow owwo and woow ... definitely perfect !

Very interesting. The SMT white paper talks about oracles for controlling who can access an SMT rewards pool and I wonder if algorithms like this might be more effective and/or help aid oracles in making decisions about accounts and the value they bring (or take away) from the network.

I'd love to see an open market place for various spam prevention algorithms that users could implement to improve their own experience. Maybe it could involve shared mute lists or something similar along with regularly published reports of abusers taking from the reward pool but adding no value so that certain accounts like @steamcleaners (or something similar) could go through and downvote them and other busy whales could delegate some SP to help.

May I add a simple suggestion for a bot. First of all I'm not a programmer. Just a smart guy who loves steemit.

You could use a bot to check for repeated comments of a user. Lots of spammers copy/paste their spam. There are also words like upvote, plz, follow used repeatedly. If a bot could be made to find users with a high amount of copy/paste content and common words used in upvote begging, it could generate a list for whales to review. Then those who are clearly spamming could be picked and put on a public list.
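A rough Python sketch of what such a check could look like. The thresholds, the function names and the sample comments are made up for illustration; the begging words are the three mentioned above. It only produces a list for manual review, it does not decide anything by itself.

```python
import re
from collections import Counter, defaultdict

BEGGING_WORDS = {"upvote", "plz", "follow"}


def normalize(text):
    """Lowercase and strip punctuation so near-identical copy/pastes compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def review_candidates(comments, min_repeats=3, min_begging_share=0.5):
    """comments: iterable of (author, text). Returns authors worth a manual review."""
    by_author = defaultdict(list)
    for author, text in comments:
        by_author[author].append(normalize(text))
    flagged = set()
    for author, texts in by_author.items():
        if len(texts) < min_repeats:
            continue  # too few comments to say anything
        most_repeated = Counter(texts).most_common(1)[0][1]
        begging = sum(1 for t in texts if BEGGING_WORDS & set(t.split()))
        if most_repeated >= min_repeats or begging / len(texts) >= min_begging_share:
            flagged.add(author)
    return sorted(flagged)


# Hypothetical comments.
comments = [
    ("spammer", "Nice post! upvote plz"),
    ("spammer", "nice post upvote plz"),
    ("spammer", "NICE POST!! Upvote plz"),
    ("human", "Great analysis of the follower graph"),
    ("human", "I will follow this project"),
]
print(review_candidates(comments))
```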

I'm very happy to see many people fighting the good fight. You guys give me hope regarding the future of steemit. My simple suggestion wouldn't go too far. Spammers will adapt. I actually came across serial spammers with reputation above 50. It's just nuts: https://steemit.com/steemit/@vimukthi/serial-spaming-your-way-into-a-reputation-of-56-how-did-this-happen

Wish you guys best of luck and hope my suggestion helped :-)
@vimukthi

Hi @vimukthi , your suggestion on how to algorithmically detect spam / repeated comments (= content analysis), could be another extension of my UserAuthority (UA) spam identification capabilities (= user authority / popularity analysis).

Your suggestion could also be implemented stand-alone via bi-directional hashing encryption, by calculating "character proximity". Lots of difficult words here ;-), but your solution on its own is not really needed. Compare this principle to copy-pasta webpages: it is impossible to stop authors publishing such webpages, but it's Google's only task to prevent those pages getting to the top of the search rankings (SERPs).

Seen from a Steem ecosystem perspective, it's merely important to stop those comments from receiving high-value author rewards. Hence the need for my UA algo to be implemented Steem-wide (See my HF22 proposal UA * SP at the bottom of my article.)

Mathematically, it looks like your UA algo will work just fine :)

Some folks may not like it, but as with all algos there's always room for improvement!

BTW: here's a link to an article on "bot tells"
https://steemit.com/steemit/@torquewrench1969/tips-i-use-to-identify-bot-accounts

Spot on! ;-)

Another post coming out about this project shortly!

An equivalent of this algorithm was used, in another form, as the only relevant algorithm when Google launched in 1998. In this form, it addresses the same problem Google had in 1998: which page should be at the top of the results, or here, which user is to be regarded as authoritative.

This is awesome. We need some sort of change like this. And what's even better is you explained it with my language, Excel! Steem on, I have upvoted and resteemed

Haha, well, I like Excel for explaining stuff and quickly setting up a POC-app-ish kind of thing....
And I like your work, as well as yourself, as well, Paula :-). Steem on!

I love when I make new friends, steemit is awesome for that anyway!

I feel the same! :-)

This is fantastic! A tremendous idea and post, genuine and heartfelt feedback, from a diverse community, conversation and dialogue leading to meaningful changes! I have no idea what you're talking about, but I love every word!!!

This is also a cool comment! :-)
It's perfectly okay if you don't understand the math. But let me try explaining as simply as I can:

  • UA is a derived metric (all the data is in the blockchain already).
  • The math behind my UserAuthority (UA) algorithm calculates how "popular" an account is, by taking into account who follows that account. And that's done automatically for every user in the entire Steem system.
  • not all follows are weighed equally
  • picture someone randomly clicking through follower lists (who follows whom): an account's UserAuthority is the probability that such a random click gets to that account via its followers.

Okidoki? Better? Better...
(Otherwise, eat a Snickers! ;-) )

Much better I get it! I love snickers by the way, eat 'em all the time. Very gracious of you to make this effort to explain, appreciate that as much as the explanation itself. Thank you.

Everybody needs to understand in order to be capable of "judging" what I propose!
I gradually updated my article with more explanations day by day. At first, people seemed afraid to comment (maybe afraid of looking like an idiot?). But the smartest people around are the ones who ask questions by first admitting they didn't understand.

I get confused with cooking myself! Although my baked eggs are nice ... :-)

New users, myself included, who have little/no userauthority, will be equated to the bots - is my understanding of your proposal correct?

Good question! Short answer: no, you're not seen as a bot, at all!
Longer answer: the proposed mechanism is extendable with all sorts of metrics, that combined identify bots / spammers extremely accurately, distinguished from new users, like you, and more prominently: me! (my own account is brand new!).

Examples:

  • very low UA, very high amount of posts (that's a sign, right? lots of messages but nobody follows..)
  • very low UA, very high rewards (that's even stranger!)
  • very low UA, yet it has lots of followers (spooky....!)
  • very high UA, zero posts (hmmm, that seems like a passive whale account...)
    etc.
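The combinations above could be sketched in Python roughly like this. Every threshold below is a made-up placeholder, and the function name `account_signals` is hypothetical; the point is only that a brand-new account with a low UA and little activity triggers none of the signals.

```python
def account_signals(ua, posts, rewards, followers,
                    low_ua=0.2, high_ua=5.0,
                    many_posts=100, high_rewards=1000.0, many_followers=500):
    """Return the suspicious combinations that apply to one account."""
    signals = []
    if ua <= low_ua and posts >= many_posts:
        signals.append("low UA, many posts")
    if ua <= low_ua and rewards >= high_rewards:
        signals.append("low UA, high rewards")
    if ua <= low_ua and followers >= many_followers:
        signals.append("low UA, many followers")
    if ua >= high_ua and posts == 0:
        signals.append("high UA, zero posts")
    return signals


# Hypothetical accounts.
print(account_signals(ua=0.15, posts=5, rewards=0.0, followers=0))     # new user
print(account_signals(ua=0.15, posts=400, rewards=2.0, followers=3))   # heavy poster nobody follows
print(account_signals(ua=12.0, posts=0, rewards=0.0, followers=900))   # passive whale
```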

PS: in case an account has zero followers (new accounts), they are not found on the total link graph. Yet they are easily includable via a "NewUsers bot", detecting accounts with zero followers, and following them for inclusion within the follower graph.

Thank you for explaining this! I know I am not a bot, let's hope Steemit will also know it : D

Steem then first needs to implement my code via HF21 ;-) But Utopian-IO probably will do so before that!

It's actually fairly simple and really elegant. Not sure if it was your explanation or the concept itself, but I like it. Thanx for the hard work, Steem and Steemit need it!

Thx! When first publishing the article I just wanted to help Utopian improve its bot algorithm (then v1, now v2, hoping to implement UA in its V3 the coming days). Yet only after publishing it, a few hours later, I realized the very same algo can "sanitize" Steem.

I encourage everybody looking at this page to openly express their doubts, questions, etc.

Couldn't this result in smaller users being penalized unjustly?

Good question! Steemulator asked the same question, so hereby the same answer :-)

Short answer: no, they're not seen as a bot, at all!
Longer answer: the proposed mechanism is extendable with all sorts of metrics, that combined identify bots / spammers extremely accurately, distinguished from new users, like me (my own account is brand new!).

Examples:

  • very low UA, very high amount of posts (that's a sign, right? lots of messages but nobody follows..)
  • very low UA, very high rewards (that's even stranger!)
  • very low UA, yet it has lots of followers (spooky....!)
  • very high UA, zero posts (hmmm, that seems like a passive whale account...)
    etc.

PS: in case an account has zero followers (new accounts), they are not found on the total link graph. Yet they are easily includable via a "NewUsers bot", detecting accounts with zero followers, and following them for inclusion within the follower graph.
