Re-Thinking Curation

in #curation, 7 years ago (edited)

Post curation is broken. I think a lot of new users can agree with me on that. A single upvote by a whale can make your post more visible than if it were to receive 1,000 upvotes by active community members at a penny each. Votes can be bought and sold, and the visibility of a post has very little to do with the quality of the content in the post itself, and a lot to do with the poster's connections to powerful people and their ability to use bots effectively.

The biggest problem with the current curation model is that it's not based on viewers. It doesn't matter how many viewers actually enjoyed the content. The only thing that matters is the influence of the accounts that voted on the content, which might have very little to do with how interesting the content actually is.

Ideally, only four numbers should matter when it comes to content curation (and none of those numbers are the SP of the person casting the vote). In fact, the four numbers that matter the most are right in front of us on every post:

  • How many times has a post been seen?
  • How many upvotes has it gained?
  • How many downvotes has it gained?
  • How many comments does it have?

Using these four numbers, it should be possible to evaluate the quality of a post. If something is getting hundreds of views but has a very low vote and comment rate, odds are that it's low-quality content. If something has very few views but almost every viewer has commented on it and upvoted it, odds are that it's quality content.

Let's come up with a better way to calculate what posts deserve to be visible on our SteemIt feeds:

V = Post views
U = Upvotes
D = Downvotes
C = Number of users who have commented
A = Age of post

Q1, Q2, Q3, etc = Constants: Some numbers that we will have to choose - More on these later

My initial thought for fair curation is that it should be based on a very simple model: The more upvotes that a post has relative to its views, the better it is.

(U-D)/V

This is nice and simple. Let's take two example posts, A and B. Post A has 200 views, 24 upvotes and 3 downvotes. Post B has 20 views, 3 upvotes and no downvotes. Under the current system, post A is likely to be more popular because those 24 upvotes likely had more influence than the 3 upvotes on post B.

Under my proposed formula, instead we get this:

Post A Popularity: (24-3)/200 = 0.105
Post B Popularity: (3-0)/20 = 0.15
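
As a rough sketch of this first formula (the function name `popularity` and the choice to score an unviewed post as zero are my own assumptions, not anything SteemIt does), the two examples can be checked like this:

```python
def popularity(upvotes: int, downvotes: int, views: int) -> float:
    """Net votes per view: (U - D) / V."""
    if views <= 0:
        # Assumption: a post nobody has seen gets a neutral score of zero.
        return 0.0
    return (upvotes - downvotes) / views


post_a = popularity(upvotes=24, downvotes=3, views=200)
post_b = popularity(upvotes=3, downvotes=0, views=20)

print(f"Post A: {post_a:.3f}")
print(f"Post B: {post_b:.3f}")
```

Post B comes out ahead even though it has far fewer raw votes, because the score only looks at votes relative to views.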

Given that a larger percentage of post B's viewers enjoyed it, we should be giving post B more attention on the platform than post A.

Of course this is not ideal. What if post A is a controversial subject? Maybe it inspired a lot of back and forth debate and things got heated, leading to some down-voting.

We should probably build comments into our formula.

(1+C)(U-D)/V

That's a bit better. Every post's popularity is now multiplied by the number of users who have commented on it. Unfortunately this unfairly penalizes new posts with very few comments, while giving a massive boost to posts with heavy comments - Even if they weren't rated very favorably.

A better approach would be to use some constant, Q1, to determine how much we should weight comments in comparison to how much we should weight the value of the initial post. Q1's value is a bit arbitrary, and would have to be played with. Q1=10 might be a good starting value.

(Q1+C)(U-D)/V

Back to my example with posts A and B, let's assume A has 15 comments and B has 1 comment.

Post A Popularity: (10+15)(24-3)/200 = 2.625
Post B Popularity: (10+1)(3-0)/20 = 1.65
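
Here is the same calculation as a small sketch, using the comment counts that appear in the two figures above (15 commenters on A, 1 on B). The function and parameter names are assumptions for illustration, and Q1=10 is only the starting guess mentioned earlier:

```python
Q1 = 10  # starting guess for the comment-weighting constant; would need tuning


def popularity(upvotes, downvotes, views, commenters, q1=Q1):
    """Comment-weighted score: (Q1 + C) * (U - D) / V."""
    if views <= 0:
        # Assumption: a post with no views scores zero.
        return 0.0
    return (q1 + commenters) * (upvotes - downvotes) / views


post_a = popularity(upvotes=24, downvotes=3, views=200, commenters=15)
post_b = popularity(upvotes=3, downvotes=0, views=20, commenters=1)

print(f"Post A: {post_a:.3f}")
print(f"Post B: {post_b:.3f}")
```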

That looks a bit better. Post B is doing well. It has been rated overwhelmingly well by people who looked at it and definitely deserves attention. Post A is controversial, but has inspired a lot of talk by a lot of different people and definitely deserves some more attention based on the fact that good discussions are happening.

It could be argued that this system isn't weighing downvotes heavily enough. There's an easy fix for that. Add another constant, Q2, to weigh the downvotes more heavily as needed.

(Q1+C)(U-D*Q2)/V
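
To get a feel for what Q2 does, here is a sketch of the full formula. The value Q2=2 is purely an illustrative assumption (I haven't settled on a number), and the names are made up for the example:

```python
def popularity(upvotes, downvotes, views, commenters, q1=10, q2=1):
    """Full score: (Q1 + C) * (U - D * Q2) / V."""
    if views <= 0:
        # Assumption: a post with no views scores zero.
        return 0.0
    return (q1 + commenters) * (upvotes - downvotes * q2) / views


# Post A from the example: 24 upvotes, 3 downvotes, 200 views, 15 commenters.
neutral = popularity(24, 3, 200, 15, q2=1)  # downvotes count the same as upvotes
harsher = popularity(24, 3, 200, 15, q2=2)  # each downvote counts double

print(f"Q2=1: {neutral:.3f}")
print(f"Q2=2: {harsher:.3f}")
```

With Q2=1 the formula reduces to the previous one. Raising Q2 only affects posts that actually have downvotes, so post B's score would not change.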

Great!

The nice thing about this formula is that it's self-correcting. Let's say some stupid post somehow gets a lot of discussion and upvotes early on. It rockets to the top of the charts. Now people can downvote it, and it will quickly disappear. Or even if they don't bother to downvote it, just looking at the post, deciding it's not interesting and choosing not to interact will increase the value of V, causing the post to lose some rating every time it's viewed and not interacted with. V will have less of an impact on posts that already have a large V value, so failing to interact with a post that already has 1000 views will be far less impactful than failing to interact with a post that only has 10 views. This means that good quality posts with heavy interactions and upvotes will tend to stay at the top.
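
The self-correcting effect can be sketched numerically. Because the score is proportional to 1/V, one extra view with no interaction lowers it by a fraction of 1/(V+1). The helper below and the example posts are hypothetical, and it assumes the post has at least one view and a non-zero score:

```python
def popularity(upvotes, downvotes, views, commenters, q1=10, q2=1):
    """Full score: (Q1 + C) * (U - D * Q2) / V. Assumes views > 0."""
    return (q1 + commenters) * (upvotes - downvotes * q2) / views


def relative_drop_after_ignored_view(upvotes, downvotes, views, commenters):
    """Fraction of the score lost when one more person views without interacting."""
    before = popularity(upvotes, downvotes, views, commenters)
    after = popularity(upvotes, downvotes, views + 1, commenters)
    return (before - after) / before


small_post = relative_drop_after_ignored_view(3, 0, 10, 1)
large_post = relative_drop_after_ignored_view(300, 0, 1000, 100)

print(f"10-view post loses {small_post:.2%} of its score")
print(f"1000-view post loses {large_post:.2%} of its score")
```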

The last factor to consider is the age of the post, A. This really isn't a big issue, but there are some minor problems that could occur such as a post being pushed to the very top of the popularity charts because it has 3 views, 3 upvotes and a comment. While it would quickly self-correct within a few views, this would lead to unnecessary churn in the top posts with a lot of posts cycling in and out very quickly.

As such, it would be good to somehow incorporate the age of the post, A, into the formula. I'm not exactly sure how this part should go, but I suspect that we would want posts to have some minimum age (say, 1 hour) and some maximum age (say, 7 days) to prevent new content from churning the feed 24/7, and to prevent old content from never disappearing.
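
One possible way to do this, and only a guess on my part, is a simple eligibility window rather than an age term inside the formula itself. The names `MIN_AGE`, `MAX_AGE` and `is_rankable` are hypothetical, and the 1 hour and 7 day limits are just the example values from above:

```python
from datetime import timedelta

MIN_AGE = timedelta(hours=1)  # example minimum age; an assumption to be tuned
MAX_AGE = timedelta(days=7)   # example maximum age; an assumption to be tuned


def is_rankable(age: timedelta) -> bool:
    """Return True if a post of this age may appear in the popularity ranking."""
    return MIN_AGE <= age <= MAX_AGE


for age in (timedelta(minutes=30), timedelta(hours=2), timedelta(days=8)):
    print(age, is_rankable(age))
```

A hard cutoff like this is the bluntest option. A smoother alternative would be to scale the score down gradually as A grows, but that needs yet another constant to choose.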

Would this work?

No.

This is an idealistic formula. It assumes that all voters are well-intentioned. More importantly, it has a utopian vision wherein I assume that all actors are acting in the best interests of the platform. In a world where there aren't bots, this is a beautiful way of curating content. Content with a high percentage of upvotes becomes more visible. Content with a high percentage of downvotes becomes less visible. Content with a lot of views but very few votes becomes less visible. And so on.

Sadly, the real world of SteemIt has bots. So many bots. The instant this formula is implemented, bots would be gaming the system. We'd see services with hundreds of 0SP bots upvoting and downvoting posts in no time at all. Until the bot problems are solved and the community finds a solution to all of the automated voting that's going on, it's very difficult to solve the curation problems.

I present this article not as a solution for present-day problems, but as a future consideration for how curation should work in an ideal environment.

If you enjoy my posts, don't forget to upvote and

Thanks for reading,
-Matt


Hey @weaselhouse, your contribution has been picked by my prototype Machine Learning bot, TrufflePig, as an undervalued post that deserves a higher reward than you were actually given. According to the bot this post is worth 64 SBD. You can find the details about this evaluation here!

Thanks for the shout-out! I'm glad to see that more people are getting involved in addressing some of the curation problems.

If you're not already familiar, check out the @thing-2 bot as well as @gentlebot. They have a very similar goal to that of your bot (and they also found some of my content!)

Lol, my favorite quotes from your post:

"Will this work? no"

and

"Sadly, the real world of Steemit has bots. So many bots!"

Well thought out.

The longer I'm on here, the more I realize Steemit should be thought of as more of a game.

Also the longer I'm on here, the quality and length of my posts decrease dramatically!

Right now SteemIt is a platform that talks about SteemIt. That feels weird to me. "Here's a post on SteemIt talking about how to make money on SteemIt."

Great, but what is the value of SteemIt other than being a self-serving money making platform?

I'd love to see SteemIt grow as a social media platform. I'd love to see more photography, philosophy, pets, cooking, news, etc popping up in the popular posts feed - And I think that can happen, but it will require a major change in how curation is handled.

Yeah that'd be nice, then again I'm pretty tired of photography and pets as well.

How it shakes out, only time will tell I suppose.

I'm faster than the bot. YES it is original ;) PJ-Approved

Also speaking of visibility, you should start adding images to your posts.
It was only by chance that I saw this at the top of my feed.

I clearly need more bar graphs and speculative currency talk.

Seems to work well for @haejin

So I clicked on haejin and holy shit! this guy is posting every two to three hours!

ha, I guess I would too if I was making a hundred dollars a post

He's super controversial because he earns something like 4% of the Steemit reward pool. His weekly payout is around $90k USD, which puts his annual USD income from SteemIt alone around $4.6 million.

A number of people feel that he's killing the platform, because he makes very short posts that don't take much effort and reaps massive rewards from them (Thereby diluting the reward pool for everyone else).

Not sure that I agree that he's the cause of the problem, but it's definitely indicative of a problem with the platform when one user is hauling out 100k a week while others make pennies for similar content.

Oh wow, I wasn't familiar. Now I'm even more jaded!

You've clearly missed some of the drama. Another user, BernieSanders, led a campaign to downvote Haejin. It's still going on, with tons of Haejin's posts being hidden. Meanwhile BernieSanders was downvoted by others and he dropped from somewhere in the high 60s down to level -18.

GrumpyCat is campaigning to prevent bots from voting for posts that are more than 3.5 days old. He's been downvoting tons of bot-upvoted posts for weeks.

There's a lot of people abusing the system. There's also a lot of people trying to fix it. I haven't lost hope, rather I'm trying to contribute, to raise awareness, and to help where I can.

Oh man, Steemit drama. You should post that saga. Entertaining stuff.

Hard to find the content mainly because of the spam. I would never have found this post if I wasn't following you. lol But the spam won't go away when the driving force of steem is to make money.

This post has received a 1.09 % upvote from @booster thanks to: @weaselhouse.

thank you for not down-voting I my self enjoy reading good content. what this is a reflection of the journey down the cloud mining rabbit hole and to offer those Clues to offer those who find themselves here in wonderland.
