Overview of a Field Study on Feedback and Reputation Guilds
With SteemKURE, I had the idea to provide a network hub for people to create their own curation guilds or groups for evaluating specific types of content.
To explain how this is useful, I had the idea of an analogy to a workplace to demonstrate how Steemit would work in a more real life situation that people could relate to. I went looking to see if anyone had used guilds, or something like this idea I had, in the real world.
A few weeks ago I found a paper that talked about the quality of work among guilds, which struck me as very salient considering that quality is my focus for bringing about success for Steemit. The title said more: Crowd Guilds: Worker-led Reputation and Feedback on Crowdsourcing Platforms. It's a field study conducted by researchers from the Stanford Human-Computer Interaction group.
I will present an overview of this paper, as it helps to provide some ideas for how things can develop on Steemit.
You can read the original if you want. It's worded in more academic terminology, and is almost twice as long as this post. Sorry for the length nonetheless. I was originally not going to do a post, and was just going to use this paper to learn, highlight sections, take notes and write ideas. But I think this is of value for everyone, so I decided to make it an easier read and highlight the important things. I found validation for my understanding of where I was going with SteemKURE (Kindred United to Reward Excellence).
There is a popular crowdsourcing platform right now called Amazon Mechanical Turk. This is a decentralized workforce where each member distributes their own independent work as they see fit to get rewarded by those seeking to have a specific task accomplished. Those doing the rewarding are evaluating the work done by others before they issue the reward.
The aim of decentralization is to encourage accuracy in many areas through independent judgment, rather than through a centralized body that judges everything. However, decentralization makes communication and coordination more difficult. This disempowers workers, who are then forced to form worker collectives off the platform to better network and organize themselves.
I'll interject here to mention the similarity with Steemit that you might already recognize. A lot of people are frustrated because of the lack of ability to interconnect with others such as having a messaging platform built into the site, or groups that people can form in order to communicate more efficiently such as Facebook has.
So the result of workers having to form collectives off the platform is disenfranchisement and an unfavorable workplace environment, since they can't effectively give each other feedback within that environment.
Why do these crowdsourcing platforms form?
To encourage accuracy and independent judgment, which should produce higher-quality work: you try to do better than the others so that you become known as someone who does things well, and are sought out for that reputation.
Decentralization is thus motivated by high-quality work.
You want to rely on someone's abilities to be able to do something. Reputations help to provide a measure of that ability. In a decentralized blockchain, those who host nodes to process blocks in that chain are depended upon to provide a certain quality of service, such as uptime, to ensure network stability.
While this is true about positives for decentralization, it also has the effect of undercutting behaviors and institutions that require high-quality work for their success. In a traditional model with centralized worker coordination, centralization tends to improve work quality, skill development, knowledge management and performance ratings.
When decentralization occurs, reputation tends to arise as a measure of quality.
Normally centralized performance reviews determine the reputation of the worker.
Decentralized feedback from independent requesters can be used to generate reputation scores, such as is done by Uber and Upwork. The research paper refers to these types of reputation scores as "notoriously inflated and noisy making it difficult for requesters to find high-quality workers and difficult for workers to be compensated for their quality".
Guilds were created as a strategy to coordinate a decentralized workforce.
The Middle Ages saw the development of trade guilds with respect to a trade's distribution across a large area. This helped to train apprentices, set prices, and engage in collective action.
Trade and craft guilds measured and certified the quality of the work their members engaged in, to ensure that the products created within the guild met a sufficient standard. Guilds lost influence by exercising excessive control over trade, but they still survive today in professional organizations, such as those for engineering, acting and medicine.
Since bias, favoritism and selective evaluation influence how someone's work is judged, a more honest evaluation method is required to determine the actual quality of the work itself.
To judge the work for the work, not who it comes from.
The paper presents the idea of crowd guilds, which are crowd worker collectives that work together to certify their own members and provide feedback for improvement.
The above figure lays out the method for crowd guilds to engage in continuous double-blind peer assessment. Random samples of the work submitted on the platform are rated for quality, and critiques are provided for further improvement.
The peer assessments serve as a reputation and qualification signal within the platform itself.
As workers gain more positive assessments from senior guild members, they rise within the guild. This translates into higher wages through recognition of the reputation and qualification measurement from peers on the platform.
Crowd guilds can address challenges of collective action to agree or reject proposals, and provide formal mentorship of feedback and training and social support through forums or groups.
Microwork, or microtasking, means splitting up and breaking down a large-volume task into smaller tasks that can be done independently. This is also known as ubiquitous human computing or human-based computation, because the requirements of the task are too complex for distributed computing. Another key point of microwork and microtasking is that they require human judgment.
Visible and invisible expertise is required to perform these so-called simple tasks. The need for expert high quality micro task work has led to private qualification groups of experts.
A two-week field experiment to evaluate 300 workers from Amazon Mechanical Turk was conducted. There was one control group, and one crowd guild group of 104 workers randomly assigned. The crowd guild group and the control group were both given tasks, with a forum and automatically generated peer assessment tasks. The control group, however, never returned the peer assessments to the worker to be evaluated.
The peer assessed ratings from the guilds better represented the actual accuracy of the worker than the acceptance rate of a task on Amazon Mechanical Turk. Members of the guilds also provided one another with **more actionable feedback** and advice compared to the control group.
Improving work quality
The following methods have been used to gauge and enhance the quality of crowd work:
- voting by peers
- establishing agreements between workers
Task-specific feedback helps workers improve their behavior and performance.
Work feedback can be collected by:
- requesters
- workers
- self-assessment
- expert evaluations
Tutorials can help people reflect on their work and learn skills to enhance the quality of work.
Peer assessment driving quality
Workers can bootstrap their own work reputation metrics if they can effectively assess each other's work.
This scales better than external assessment, which can take more time and become costly when the volume of work is large. Peer reviews are highly scalable and reliable among workers of the same expertise.
To effectively assess each other's contributions, assessors should be selected to evaluate other work based on their previous performance and their estimated quality of work. You wouldn't want someone who doesn't know how to evaluate others' work engaging in an evaluation and getting it wrong. That would defeat the purpose of the reputation and quality promotion within guilds.
Since the quality of work varies, a hierarchy of trusted workers can develop. As each person is evaluated by others, these peer assessment results can determine who is qualified to evaluate which type of work that they are familiar with. Those who review can become better at crafting work themselves through the advice and feedback provided by the peer assessment process.
It becomes like a distributed mentorship, where feedback from those with more expertise results in an improvement in quality overall. The quicker the feedback, the quicker someone is able to improve and increase their mastery of their work.
A crowd guild can introduce a review framework where reciprocal assessment of the work provides constructive feedback and quality-based reputation categories.
By members evaluating each other's work accuracy, it creates a more accurate rating for individual pieces of work, and also a more stable and accurate reputation system.
Guilds
Guilds historically represent a group of workers with shared interests and goals.
Their organization as a collective enables evaluations of those shared interests and goals to develop a reputation system.
Originally, medieval guilds were associations of artisans and merchants that controlled the practice of their craft through apprenticeships and training in skills only they possessed. Survivability depended on the knowledge you had. If you could keep your knowledge secret, where only certain people were allowed to learn it, then your knowledge gave you power to do things other people couldn't. This is how secret societies developed through crafts and craftsmen. But secret societies and guilds have developed for thousands of years in other areas of information.
Internal quality evaluation processes are used to progress members through a system of titles, such as apprentice, journeyman and master. Craftsmen and their guilds prided themselves on the collective reputation afforded to their craft and the high-quality work it represented. This allowed them to demand premium prices because of this renowned reputation and quality. Some guilds would fine members who deviated from quality standards to uphold that reputation.
These traditional work models are not adequate for the modern location-independent employment patterns. Freelancing and contract work is popular, but they don't provide the common work benefits like economic security, career development, training, mentoring and the default social interaction that happens when you're around others. Much of this support can be obtained by freelancers through professional organizations, except some of the value of guilds is missing.
The modern era has seen a resurgence in guilds through Massively Multiplayer Online games (MMOs).
Through the cooperation of guilds, players are assisted in leveling up, coordinating strategies to overcome a challenge, and also develop a reputation system amongst themselves.
Crowd guilds are a way to establish a feedback system that promotes advancement of skills and quality, and can operate at different scales with a distributed membership. Training is possible, along with collective action in response to issues.
Collective action
Despite being located in various locations, independent workers can collaborate and share information through forums. They can help identify high-quality work and reliable requesters that offer employment opportunities. But since the forum is hosted outside the marketplace, obtaining reputation information and bringing about larger scale collective action becomes harder.
Decentralizing from a main information hub fractures the interconnectivity of the larger marketplace of attention it provides. This is why one-stop locations, such as Facebook that allow people to do many things in one place and interconnect easily, are very popular.
Some requesters can have odd selective behavior as to what work they accept and reward, leading to workers becoming frustrated for not being rewarded for work they do. Workers can also become frustrated due to having a limited say in how the platform policies are cultivated and adopted. Unfair rejections or demands from the reviewers can be publicly discussed, but workers tend to have limited opportunities to impact the platform operation resulting in less accommodation of emerging needs from the worker base.
Crowd guilds seek to solve this through a peer assessed reputation and provide signals for who can be more trusted.
Crowd Guilds
A collective evaluation of members, through a peer feedback reputation system, allows workers to coordinate and manage their own reputation.
Crowd guilds can be built to focus on specific types of tasks and cultivate certain level of expertise and quality standards for the work done related to particular categories.
The peer assessment framework allows workers to review each other's work, provide feedback through critical evaluation of the work, and establish visible guild worker levels that demonstrate their qualifications in a specific category.
Standard reputation information on crowdsourcing platforms, such as acceptance rates and five-star ratings for freelancers, tends to be highly inflated.
Historically, guilds have stepped in to guarantee the quality of the members within their platform, and this has carried over to professional societies that ensure quality standards as well. Crowd guilds will form around a community that can effectively assess the skills of the members within it. Management of reputation is assigned through guild levels. Samples of a member's work can be anonymously selected for review by more senior members, who evaluate the work and determine the member's reputation level from it.
Peer review
Online peer assessment is used to regularly filter out low-quality work in crowdsourcing platforms. Peers and professionals are generally accurate in assessments of each other, but those with more experience and authority garner a greater degree of trust and reliability. This means that a better evaluation should be conducted by guild members that are ranked one level above, such as level 2 evaluating level 1, level 3 evaluating level 2, etc.
Reviews are generated by sampling a percentage of the worker's submissions, which are rewrapped as an evaluation task and posted back on the platform as a paid assessment task for evaluation. In these crowdsourcing platforms, the reviews are double-blind where the worker doesn't know who reviewed them or which tasks were reviewed. Similarly the reviewer doesn't know which worker they are evaluating. Since reviews are conducted by those with one level of difference in their quality of work, the expectations for quality from the reviewers are reasonable.
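To make the sampling step concrete, here is a minimal sketch of how a platform might turn a fraction of submissions into anonymous review tasks. The names `Submission`, `ReviewTask` and `build_review_tasks`, and the 10% `sample_rate` default, are my own assumptions for illustration; the paper's actual implementation may differ.

```python
import random
from dataclasses import dataclass


@dataclass
class Submission:
    task_id: str
    worker_id: str
    worker_level: int
    content: str


@dataclass
class ReviewTask:
    review_id: int
    content: str         # the work itself, with no worker identity attached
    reviewer_level: int  # only members of this level may pick the task up


def build_review_tasks(submissions, sample_rate=0.1, seed=None):
    """Sample a fraction of submissions and wrap them as anonymous review tasks."""
    rng = random.Random(seed)
    k = max(1, round(len(submissions) * sample_rate)) if submissions else 0
    sampled = rng.sample(submissions, k)
    tasks = []
    key = {}  # review_id -> (task_id, worker_id), kept private by the platform
    for review_id, sub in enumerate(sampled):
        # Reviewers sit one level above the worker, as described above.
        tasks.append(ReviewTask(review_id, sub.content, sub.worker_level + 1))
        key[review_id] = (sub.task_id, sub.worker_id)
    return tasks, key
```

The private `key` mapping is what keeps the review double-blind in this sketch: the reviewer only ever sees a `ReviewTask`, and the worker is never told which of their tasks were sampled.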
- Four-point scale for evaluating quality:
  - subpar, more appropriate for level N - 1
  - at par for current level N
  - above par, appropriate for level N + 1
  - far above par, appropriate for level N + 2
- Will the requester likely accept the work? (Not whether "you" would, but whether they would.)
- Suggestions for the worker to improve, with critiques framed through the statements "I Like, I Wish, What If". This framing encourages higher-quality responses that are more actionable for the workers to apply as feedback to their work.
Quality standards to progress to the next level are higher, but they aren't explicitly set for any level. They are interpreted by those of that level, as they are the reviewers who evaluate and decide who gets to join that respective level.
To prevent senior members from abusing their power, reviewers are also reviewed themselves as a meta-review. These reviews are not performed by members of a higher quality rank alone, but are done by all members of the guild. This allows unfair reviews to be recognized more readily and the bad reviewer punished for their behavior.
Review tasks are paid. This payment can come from the rewards being issued, from all the workers themselves, or the platform collectively, by collecting fees or through donations.
Platforms themselves will benefit from crowd guilds, since the workers who create the content on a platform determine its reputation for quality, which attracts more people to it. Workers benefit the most, through an increase in earnings as they level up. Exacting a small cost from each completed work task provides funds to pay for the reviewing of tasks. Reviewers are to conduct reviews after 10 of their own tasks are completed, and they are allocated 10% per task that they do review.
As with all information, your skill set determines your ability to get paid, just as was done in medieval times with secrecy of information, and many people do not want their work being viewed by other workers. They want to protect their information. They can opt out for specific tasks submitted to not be used in the review processes. This will unfortunately limit their work from being assessed and provided with feedback.
Levels
Promotion to a new level is determined by peer-reviewed feedback.
The four-point measure from the review is converted into a scale of 1 to 4, and a moving average is calculated across 10 reviews. When the moving average reaches a threshold, the worker's reputation level is updated.
After a new level is reached, the moving average is reset. The moving average is convenient to allow people to learn from their mistakes and continue to improve, and not be held back by one mistake that turns into a permanent mark on their reputation. Through the reputation system, workers are incentivized to continue doing good work or else they will suffer a leveling down of the reputation if they consistently produce low-quality work.
In order to level down, the threshold requires 90% of reviews to only get a 1 out of 4 rating. This should be tuned to deal with larger numbers of workers and larger numbers of tasks performed at those larger scales.
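As a rough sketch of the leveling rules described above, the following keeps a window of the last 10 ratings per member. The promotion threshold `PROMOTE_AT` is a hypothetical value, since no number is given here. Waiting for a full window before changing levels, and resetting the window on a level down as well as a level up, are also my assumptions.

```python
from collections import deque

WINDOW = 10        # reviews in the moving average (from the text)
PROMOTE_AT = 3.0   # hypothetical threshold; the text gives no number
DEMOTE_SHARE = 0.9 # share of 1-out-of-4 ratings that triggers a level down


class GuildMember:
    def __init__(self, level=1):
        self.level = level
        self.ratings = deque(maxlen=WINDOW)

    def add_review(self, rating):
        """Record a 1-4 rating and update the level once the window is full."""
        if rating not in (1, 2, 3, 4):
            raise ValueError("rating must be 1, 2, 3 or 4")
        self.ratings.append(rating)
        if len(self.ratings) < WINDOW:
            return self.level  # assumption: no level change on a partial window
        average = sum(self.ratings) / WINDOW
        ones = sum(1 for r in self.ratings if r == 1)
        if average >= PROMOTE_AT:
            self.level += 1
            self.ratings.clear()  # the moving average resets on a level change
        elif ones / WINDOW >= DEMOTE_SHARE and self.level > 1:
            self.level -= 1
            self.ratings.clear()
        return self.level
```

Because only the most recent 10 ratings count, a single poor review drops out of the window over time instead of becoming a permanent mark.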
Exploring social effects
The focus of the paper was on the attention to reputation, but crowd guilds afford other engagement to support their trade and the welfare of their members. There is informal social engagement through their collective spaces, group determination of wage levels, and collective rejection of inappropriate work.
The hourly wage for work performed at each respective level is calculated through an average aggregation of all members wages at that level. This allows for more flexibility in the payment rewards offered for tasks that are submitted by requesters.
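The wage aggregation can be sketched in a few lines. This assumes a plain arithmetic mean and a hypothetical input of `(level, hourly_wage)` pairs; the text only says the level wage is an average over members' wages.

```python
from collections import defaultdict
from statistics import mean


def level_wages(members):
    """members: iterable of (level, hourly_wage) pairs. Returns {level: mean wage}."""
    by_level = defaultdict(list)
    for level, wage in members:
        by_level[level].append(wage)
    return {level: mean(wages) for level, wages in sorted(by_level.items())}
```

For example, two level 1 members earning 4.0 and 6.0 per hour would give a level 1 wage of 5.0.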
The guild can collectively reject requested tasks that disregard ethical wage standards. If only one worker rejects a task, a notification is sent to the requester, but if many workers of the same level reject it, the task is removed from everybody's feed in the guild. If 3% of workers at a level reject a task, it is removed from the platform.
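The rejection rule can be sketched as a simple threshold check. This is my reading of the description: any rejection notifies the requester, and reaching 3% of a level's workers removes the task. The function name and return values are hypothetical.

```python
REMOVAL_PERCENT = 3  # 3% of a level's workers, from the text


def rejection_outcome(rejecting_workers, level_size):
    """Decide what happens to a task given who at one level has rejected it."""
    if level_size <= 0:
        raise ValueError("level_size must be positive")
    rejections = len(set(rejecting_workers))  # count each worker once
    if rejections == 0:
        return "keep"
    # Integer comparison avoids floating-point rounding at the threshold.
    if rejections * 100 >= REMOVAL_PERCENT * level_size:
        return "remove"
    return "notify_requester"
```

With 100 workers at a level, one rejection only notifies the requester, while three rejections are enough to remove the task.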
Results
The control group completed 13,427 tasks, with 113 tasks on average per worker.
The guild group completed 15,176 tasks, averaging 87 tasks per worker.
The reputation signals are also more representative of ground-truth accuracy. The average peer assessment ratings in the guild group match the ground truth about twice as closely as those in the control group. This results in less inflated ratings and a more accurate reputation system.
A lower score is preferable as it indicates reviewers are more discerning and apply higher ratings sparingly because it reflects upon themselves collectively as a guild or platform.
Assessing crowd guilds’ qualitative impacts
The study has so far demonstrated more accurate reputational information from guilds, but it also looks at the effect on the community. The forums and feedback also produce collective identity and ideas for improvement within the guild or platform.
Crowd guilds pragmatize feedback
Both crowd guilds and control group had people sharing know-how or awareness of platform features, while moderators tended to focus on bugs and bringing up issues. Guilds are often more focused on pragmatic feedback to develop strategies for improvement in support, while the control group workers were more likely to offer emotional support rather than informational support.
Control group response to an issue being raised:
I’m sorry. hugs I hope your day gets better. Maybe there is something the people running the study can do to fix the problem.
Guild response when the issue of reviewing not being available to everyone was raised:
I have been able to do review tasks since being upped to level 2 /shrug
Reviewers in the guild group also used fewer characters compared to those in the control group, while also being generally more focused and pragmatic in their responses.
Crowd guilds improve the accuracy and effectiveness of peer reviews by professionalizing them, in contrast to the control group, whose members tend to personalize their reviews. This can explain why the reputation levels are more grounded in the ground truth, as guild members aren't letting their personal emotions cloud the evaluations of other people's work.
Many independent workers preferred a fully decentralized platform: they didn't like the review and leveling process that removed independence from the worker. Those who found success in the current crowdsourcing platform, such as Mechanical Turk, see an increased interdependence as a threat to their success that carries out to their livelihood, stability and freedom.
The shift from a mere acceptance-rate of tasks being rewarded by a requester, towards a social process of peer assessment, can result in disagreements and concerns between community members. The "us vs. them" dynamic of a worker vs. a requester is replaced with a "us vs. us" dynamic of one level of worker being reviewed by another level of workers. It does enable high-quality reputation metrics and community feedback, but can also cause strife in disagreements. To counteract this possibility, the guild should develop clear measures, norms and accountability processes to overcome these issues.
The crowd guild experiment was carried out with 104 workers, but large-scale crowd guild dynamics might operate differently. The most effective route is likely to allow workers to create their own guilds (SteemKURE) and join various different guilds that are tailored to specific task qualities and accomplishments. Creating large guilds that are hard to define, and that lump together types of tasks or content that don't match very well, is something to avoid.
The workers were also paid to participate in the crowd guild experiment, so whether people would prefer this in real life is not determined.
For a crowd guild to succeed, high-quality workers and requesters must remain within that social system for long periods of time and have a stake in making it useful. The image and reputation of quality that the guild can create or evaluate depends on the quality of its members.
The ramifications of crowd guilds
The forum for the crowdsourcing community was used as a water cooler, but the guilds shifted the forum towards a work environment.
Guilds are good overall, but there are also problems that can occur in long-term unfair ratings which can breed distrust in the community. This can lead to an oligarchical guild which creates further resentment and distrust internally. The meta-review process, of reviewers being reviewed, or the ability to split off and create your own guild with others, might be significant pressures to keep corruption in check. This again depends on the collective pressure of individual members taking action to fix issues.
The first step of crowd guilds is usually to begin with a narrow goal and pragmatic design which is aimed to improve the methods to establish an accurate reputation system and fairer pay on the platform. This secondly translates into a centralization of worker collective action to solve problems.
The researchers are looking to analyze variations in guild governance policies to better understand self-governance, long-term engagement and satisfaction. Questions about leadership, policy control, making decisions and the allocation of income, are more complex as the scale grows. As there are more workers and more diversity, this will require different guilds to form around specific communities or groups, and types of tasks or content.
Brief Analysis
One thing I wanted to emphasize at this point from the research, is about decentralization where I wrote:
Decentralizing from a main information hub fractures the interconnectivity of the larger marketplace of attention it provides. This is why one-stop locations, such as Facebook that allow people to do many things in one place and interconnect easily, are very popular.
Steemit lacks interconnectivity features, which limits interactivity on the platform. I opened a GitHub request for a PM system soon after I joined, as I saw this as a major issue. It is one underlying root cause of much of the behavior and subsequent loss of users, who become dissatisfied when the platform doesn't let them connect with others as they start out.
Whether it's a new sub chain, or a centralized messaging service, it doesn't matter to most users, it's not blockchain content so make it centralized. But the point is to get messaging functionality within Steemit based on the existing Steem users so networking can happen in a more convenient and interactive way.
I also opened a GitHub request to "watch" or follow certain activity, such as post activity, so that we can be advised of new comments on a post we like. Currently, we have to keep checking back on a post to see if someone has commented on it.
Getting these two things in will greatly improve user interactivity and retention. This is an underlying problem for the success of the site that needs to be resolved. Or maybe the idea is to have another site, like Busy, or something else, build a full feature social media package that appeals to the modern user. I don't know.
Thank you for your time and attention! I appreciate the knowledge reaching more people. Take care. Peace.
@krnel
2017-01-14, 5:19pm
Hello @krnel,
Congratulations! Your post has been chosen by the communities of SteemTrail as one of our top picks today.
Also, as a selection for being a top pick today, you have been awarded a TRAIL token for your participation on our innovative platform...STEEM.
Please visit SteemTrail to get instructions on how to claim your TRAIL token today.
If you wish to learn more about receiving additional TRAIL tokens and SteemTrail, stop by and chat with us.
Happy TRAIL!
:D Thanks again.
Wow, what a great level of understanding of the situation and writing quality.
Thanks a bunch for sharing this rather important information with us all. It is indeed a crucial aspect of this site that many have a hard time grasping in its entirety, if it is even possible, but you have done an amazing job at circling the matter and shedding light on many of its subtle aspects.
If we were to build a different site we would lose a lot of people again I think. So, as much as it might be task intensive, the build-in of a more interpersonal and interactive interface would not only deter people from leaving the site to use other services somewhere else and by the same token lose the potential of hosting such services while having automatic feedback from our communities, but it would also propel the popularity and effectiveness of our platform already revolutionary in itself. There's been work going in that direction and, as an example, I have enjoyed the potential offered by a site like AutoSteem for instance, where the degree of effectiveness also offers extra connectivity outside the site and within it as well. Maybe it could be integrated in some form or another.
Anyway, we could go for a long time on this one, but I hope my two cents might be helping in one form or another. Thanks again for an amazing piece of information so crucial to the development and success of our platform. All for one and one for all! Namaste :)
Thanks for the feedback, missed it previously.
Nice meaty stuff! - have you read any of my past writes, generally about coming up with a account-tagging gig stuff, and the future of work?
I have read some of your work, but can't be certain about account tagging "gig" stuff, is that based on the "gig economy"? Tag accounts? Tag you're it, next in line for the gig?
Nope, a consensus tagging mechanism for establishing the skillset of accounts, then link it up to an on-demand gig fulfilment system using machine-learning. But of course, the size of the market needs to be large enough, although I think it's a good time to start. @sirlunchthehost has been doing a very 'ghetto' version of that
I am all for a shift towards objective recognition of quality work people do to give value to the platform. SteemKURE is how I planned to get people to network and do that on their own, and eventually something like a reputation for quality forms around certain groups. Ratings of skill is a good thing to add like this post does. It would work per group that curates specific types of content. Not sure how to apply a metric to value the group yet though hehe. Thanks for the feedback, appreciated.
Very good read @krnel!
"Guilds are good overall, but there are also problems that can occur in long-term unfair ratings which can breed distrust in the community. This can lead to an oligarchical guild which creates further resentment and distrust internally."
This is an issue I identified early as a possibility with SteemTrail as the project progresses to community run websites. As the keys are turned over to the community, there will need to be some experimenting with self governance.
Definitely some fun times ahead.
Hehe yup ;) thanks for the feedback.
I'll have to admit that I didn't read the entire thing, but upvoting to support all similar work.
That also sounds a lot like how a corporation functions.