ファブリス
Administrator
From: Belgium
Registered: 2006-06-14
Posts: 4021
Website
The new Study page will use an up/down vote similar to Amazon's "Did you find this story helpful?" (Yes) (No).
The formula needs to weigh the number of votes: a story with 9 out of 10 helpful votes should rank higher than one with 3 out of 3, simply because more votes make the result more credible (both up and down).
Amazon uses something like a "Bayesian" rating:
http://www.thebroth.com/blog/118/bayesian-rating
But to keep things simple I want to avoid having to rate 1-5 stars ON TOP of voting the helpfulness of a story.
Thus, I'm using a simple formula:
$rating = ($yesvotes / $numvotes) * ($yesvotes / $avgnumvotes);
yesvotes = number of "helpfulness" votes answered YES
numvotes = total number of "helpfulness" votes (YES and NO)
avgnumvotes = total number of "helpfulness" votes (YES and NO) for all stories on one page, divided by the number of stories, i.e. the average number of "helpfulness" votes per story
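For anyone who wants to play with the numbers, here is a minimal Python sketch of the formula above. The function name, the page average of 5 votes and the zero-vote fallback are my own assumptions for illustration, not part of the site's code.

```python
def simple_rating(yes_votes, num_votes, avg_num_votes):
    """Share of YES votes, scaled by YES votes relative to the page's average vote count."""
    if num_votes == 0 or avg_num_votes == 0:
        return 0.0  # assumption: a story nobody voted on scores zero
    return (yes_votes / num_votes) * (yes_votes / avg_num_votes)


# Hypothetical page where stories average 5 votes each.
print(simple_rating(9, 10, 5.0))  # story with 9 helpful votes out of 10
print(simple_rating(3, 3, 5.0))   # story with 3 helpful votes out of 3
```

With these inputs the 9/10 story scores higher than the 3/3 story, which is the ordering described above.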
I have no idea what my formula means or what the graph looks like; I just know it works. Anyway, if you have suggestions for a better formula, fire away!
The trick is to do something like Bayesian, but in the absence of stars the "rating" here is the number of "helpfulness" votes. This may be wrong, but I've run a few tests and the formula does a decent job of weighing the votes, much better than plain sorting on number of votes.
Edit: yes yes Wikipedia.. edited out my pointless self-derision.
DeadNight
Member
From: Israel
Registered: 2008-06-25
Posts: 12
The Bayesian rating system assumes that the `real rating` of an item should lie between the average rating of all items and the actual rating that the item has, depending on the number of votes it has.
The more votes it has, the closer its rating should be to the actual rating. The fewer votes, the closer to the average.
So, the first step would be to find the average of the actual ratings:
average_rating = Σ(actual_rating) / number_of_items
Where Σ means the sum over all items.
Now, we have to decide where to put the rating of the item between this average and its actual rating:
|-----------------|
average actual
We can assume that the item which has the most votes is the most accurate, and as such will have the actual rating, and can be used as a measuring point.
So we need a modifier in the range of 0 to 1, where 1 represents the item with the most votes:
votes_modifier = 1 - ( ( max(number_of_votes) - number_of_votes ) / max(number_of_votes) )
Now we have all that we need, and all that is left is to calculate the real rating.
We start with the average rating as the base value, then add the difference between the average and the actual rating, modified by the votes modifier:
real_rating = average_rating + ( ( actual_rating - average_rating ) * votes_modifier )
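The two steps can be sketched in Python roughly as follows. This assumes the modifier is meant to be the item's vote count as a fraction of the highest vote count on the page, so that it runs from 0 to 1. The function names mirror the formulas; the example values (average 0.6, maximum 100 votes) and the zero-vote fallback are made up for illustration.

```python
def votes_modifier(number_of_votes, max_votes):
    """Fraction of the page's highest vote count that this item has (0 to 1)."""
    if max_votes == 0:
        return 0.0  # assumption: with no votes anywhere, stay at the average
    # Algebraically the same as number_of_votes / max_votes.
    return 1 - (max_votes - number_of_votes) / max_votes


def real_rating(actual_rating, average_rating, modifier):
    """Move from the page average toward the item's own rating by `modifier`."""
    return average_rating + (actual_rating - average_rating) * modifier


# Hypothetical page: average rating 0.6, most-voted item has 100 votes.
modifier = votes_modifier(25, 100)        # an item with 25 votes
print(real_rating(0.8, 0.6, modifier))    # its own rating is 0.8
```

A modifier of 0 leaves the item at the page average, and a modifier of 1 gives it exactly its own rating.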
Hope this helps.
DeadNight
You're welcome, I'm happy to contribute my share to the community.
Actually, an item with 10/90 shouldn't be on top of 3/3.
The item with 3/3 has an actual rating of 100%, while the item with 10/90 has an actual rating of approximately 11%.
Let's assume the average actual rating is 70%, and the maximum number of votes for a single item on that particular page is 200.
As for the 3/3 item:
votes_modifier = 1 - ( ( 200 - 3 ) / 200 ) = 0.015
As for the 10/90 item:
votes_modifier = 1 - ( ( 200 - 90 ) / 200 ) = 0.45
As expected, the 10/90 item has a higher modifier, so its Bayesian rating will be closer to the actual rating.
Now, because the 10/90 item has a rating lower than the average, the adjustment added to the average will be negative for it.
As for the 3/3 item:
real_rating = 0.7 + ( ( 1 - 0.7 ) * 0.015 ) = 0.7 + ( 0.3 * 0.015 ) = 0.7 + 0.0045 = 0.7045
As for the 10/90 item:
real_rating = 0.7 + ( ( 0.11 - 0.7 ) * 0.45 ) = 0.7 + ( -0.59 * 0.45 ) = 0.7 - 0.2655 = 0.4345
So we've got the following ratings:
3/3: 70.45%
10/90: 43.45%
ファブリス
That's correct! I was using the number of +votes for the rating instead of the ratio of +votes to total votes. Thanks!
The problem I want to solve is that 1/1 shouldn't come on top of 45/50, because the 45/50 votes are more credible. 1/1 is better than no votes, but it would need at least a couple more votes to rank higher than 45/50, because that single vote could simply be an error. Likewise the other way: if a story received a down vote by error, 49/50 shouldn't drop below 3/3 or 1/1.
But the formula from the article uses a separate rating (e.g. Amazon uses 1-5 stars). Instead I want to use the rating derived from the number of up/down votes, so that users don't have to rate stories twice (helpful? yes/no + another star rating).
I don't know if it's possible to use their formula without a separate rating, though, since I'm also using the number of votes for the rating.
DeadNight
I agree that using a 1-5 star system is awkward (though I don't like it personally, so I'm biased).
I wrote the above formula assuming your rating system of a yes/no vote, using my own interpretation of the Bayesian rating system from the article (the article doesn't use the maximum number of votes for a single item, for example, but then no item will ever get its actual rating).
Play with some scenarios using my formula, and you'll see that it handles those situations well. 45/50 will be above 1/1 or 3/3, while lower than 49/50. That's because 45/50 will be closer to 0.9 while 1/1 will be really close to the average for that page, because of its low votes modifier.
I also assumed that items with 0 votes will have exactly the average (and hence will be below 1/1 but above 0/1).
So all in all, a separate rating is not required.
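If you want to try the scenarios yourself, here is a rough Python sketch that ranks a whole page from nothing but the yes/no counts. It takes the modifier to be votes divided by the highest vote count on the page. The function name, the sample page and the choice to average only over stories that have at least one vote are my assumptions, not something fixed by the formula.

```python
def rank_page(stories):
    """Rank stories given as {label: (yes_votes, num_votes)}, best first."""
    # Actual rating = share of YES votes; undefined for stories without votes.
    actual = {name: yes / total
              for name, (yes, total) in stories.items() if total > 0}
    if not actual:
        return [(name, 0.0) for name in stories]  # nothing to rank yet
    # Assumption: the page average is taken over voted stories only.
    average_rating = sum(actual.values()) / len(actual)
    max_votes = max(total for _, total in stories.values())

    ranked = []
    for name, (yes, total) in stories.items():
        modifier = total / max_votes  # 1.0 for the most-voted story
        own = actual.get(name, average_rating)  # unvoted stories sit at the average
        ranked.append((name, average_rating + (own - average_rating) * modifier))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical page containing the cases discussed in this thread.
page = {"45/50": (45, 50), "1/1": (1, 1), "3/3": (3, 3),
        "49/50": (49, 50), "0/0": (0, 0), "0/1": (0, 1)}
for name, rating in rank_page(page):
    print(name, round(rating, 4))
```

On this sample page the order comes out as 49/50, 45/50, 3/3, 1/1, 0/0, 0/1, so the heavily voted stories stay on top and a single vote only nudges a story slightly away from the page average.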