Maths! (Ranking system)

Reply #1 - 2009 April 14, 2:56 pm
ファブリス Administrator
From: Belgium Registered: 2006-06-14 Posts: 4021 Website

The new Study page will use an up/down vote similar to Amazon's "Did you find this story helpful?"  (Yes)  (No)

The formula needs to weigh the number of votes: a story with 9 out of 10 helpful votes should rank higher than one with 3 out of 3, simply because more votes make the result more credible (both up and down).

Amazon uses something like a "bayesian" rating:
http://www.thebroth.com/blog/118/bayesian-rating

But to keep things simple I want to avoid having to rate 1-5 stars ON TOP of voting the helpfulness of a story.

Thus, I'm using a simple formula:

$rating = ($yesvotes / $numvotes) * ($yesvotes / $avgnumvotes);

yesvotes = number of "helpfulness" votes answered YES
numvotes = total number of "helpfulness" votes (YES and NO)
avgnumvotes = total number of "helpfulness" votes (YES and NO) for all stories on one page, divided by the number of stories, i.e. the average number of "helpfulness" votes per story
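
A minimal Python sketch of the formula above, for anyone who wants to try it on sample numbers. The function name, the two-story page, and the choice to return 0 for a story without votes are assumptions made for illustration; they are not part of the site's code.

```python
def simple_rating(yes_votes, num_votes, avg_num_votes):
    """(yes / total) * (yes / average number of votes per story)."""
    if num_votes == 0 or avg_num_votes == 0:
        # Assumption: a story nobody has voted on gets the lowest score.
        return 0.0
    return (yes_votes / num_votes) * (yes_votes / avg_num_votes)


# Hypothetical page with two stories, each given as (yes_votes, num_votes).
stories = [(9, 10), (3, 3)]

# Average number of votes per story on this page: (10 + 3) / 2 = 6.5
avg_num_votes = sum(total for _, total in stories) / len(stories)

for yes, total in stories:
    print(f"{yes}/{total}: {simple_rating(yes, total, avg_num_votes):.3f}")
# 9/10: 1.246
# 3/3: 0.462
```

On this sample page the 9/10 story ranks above the 3/3 story, which is the behaviour asked for at the top of the post. Note that the score is not bounded by 1, since the second factor grows with the number of YES votes.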

I have no idea what my formula means or what its graph looks like; I just know it works. Anyway, if you have suggestions for a better formula, fire away!

The trick is to do something like the Bayesian rating, but in the absence of stars the "rating" here is the number of "helpfulness" votes. This may be wrong, but I've run a few tests and the formula does a decent job of weighing the votes, much better than plain sorting on the number of votes.

Edit: yes yes Wikipedia.. edited out my pointless self-derision.

Reply #2 - 2009 April 14, 3:27 pm
Jarvik7 Member
From: 名古屋 Registered: 2007-03-05 Posts: 3946

$rating = $rand

Reply #3 - 2009 April 14, 5:21 pm
xaarg Member
From: Neverland Registered: 2007-07-13 Posts: 160

ファブリス wrote:

called "bayesian", whatever that means ;)

http://en.wikipedia.org/wiki/Thomas_Bayes

Reply #4 - 2009 April 14, 5:25 pm
DeadNight Member
From: Israel Registered: 2008-06-25 Posts: 12

The Bayesian rating system assumes that the "real rating" of an item lies somewhere between the average rating of all items and the actual rating of that item, depending on the number of votes it has.
The more votes it has, the closer its rating should be to the actual rating. The fewer votes, the closer to the average.

So, the first step would be to find the average of the actual ratings:
average_rating = Σ(actual_rating) / number_of_items
Where Σ means the sum over all items.

Now, we have to decide where to put the rating of the item between this average and its actual rating:
    |-----------------|
average              actual

We can assume that the item with the most votes is the most accurate, so it keeps its actual rating and serves as the reference point.
So we need a modifier in the range 0 to 1, where 1 represents the item with the most votes:
votes_modifier = 1 - ( ( max(number_of_votes) - number_of_votes ) / max(number_of_votes) )
This simplifies to number_of_votes / max(number_of_votes).

Now we have everything we need, and all that is left is to calculate the real rating.
We start with the average rating as the base value, then add the difference between the average and the actual rating, modified by the votes modifier:
real_rating = average_rating + ( ( actual_rating - average_rating ) * votes_modifier )
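
The steps above can be sketched in Python as follows. This is only an illustration: the modifier is written as number_of_votes / max_votes, which lies in the 0 to 1 range described above, and the handling of items without votes (they simply get the average) is an assumption. The sample values 0.7 and 200 are hypothetical.

```python
def votes_modifier(number_of_votes, max_votes):
    """Weight between 0 and 1; exactly 1 for the item with the most votes."""
    if max_votes == 0:
        # Assumption: nobody has voted on this page yet, so give the
        # actual rating no weight at all.
        return 0.0
    return number_of_votes / max_votes


def real_rating(yes_votes, number_of_votes, average_rating, max_votes):
    """Move from the page average towards the item's actual rating."""
    if number_of_votes == 0:
        # Assumption: an item without votes has no actual rating,
        # so it stays at the average.
        return average_rating
    actual_rating = yes_votes / number_of_votes
    weight = votes_modifier(number_of_votes, max_votes)
    return average_rating + (actual_rating - average_rating) * weight


# Hypothetical page: average rating 0.7, most-voted item has 200 votes.
print(f"{real_rating(45, 50, 0.7, 200):.4f}")  # 0.7500
print(f"{real_rating(1, 1, 0.7, 200):.4f}")    # 0.7015
```

With these sample values, the 45/50 item moves a quarter of the way from the average (0.7) to its actual rating (0.9), while the single-vote item barely leaves the average.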

Hope this helps :)

Reply #5 - 2009 April 15, 3:17 am
ファブリス Administrator
From: Belgium Registered: 2006-06-14 Posts: 4021 Website

Thanks DeadNight, that was very clear.

It seems the main difference between the formula I was using and the one you suggested is that mine put a 3/3 item above a 10/90 item, whereas the Bayesian formula gives more weight to the number of votes, so 10/90 comes out above 3/3.

I'm not sure which one I'll use yet. Perhaps for the Study area, where many stories below the top voted ones have only a few votes, the number of votes shouldn't weigh too much in the equation.

Reply #6 - 2009 April 15, 3:59 am
DeadNight Member
From: Israel Registered: 2008-06-25 Posts: 12

You're welcome, I'm happy to contribute my share to the community :)

Actually, an item with 10/90 shouldn't be on top of 3/3.
The item with 3/3 has an actual rating of 100%, while the item with 10/90 has an actual rating of approximately 11%.

Let's assume the average actual rating is 70%, and the maximum number of votes for a single item on that particular page is 200.
As for the 3/3 item:
votes_modifier = 1 - ( ( 200 - 3 ) / 200 ) = 0.015
As for the 10/90 item:
votes_modifier = 1 - ( ( 200 - 90 ) / 200 ) = 0.45

As expected, the 10/90 item has a higher modifier, so its Bayesian rating will be closer to the actual rating.

Now, because the 10/90 item has an actual rating lower than the average, the adjustment added to the average is negative.

As for the 3/3 item:
real_rating = 0.7 + ( ( 1 - 0.7 ) * 0.015 ) = 0.7 + ( 0.3 * 0.015 ) = 0.7 + 0.0045 = 0.7045
As for the 10/90 item:
real_rating = 0.7 + ( ( 0.11 - 0.7 ) * 0.45 ) = 0.7 + ( -0.59 * 0.45 ) = 0.7 - 0.2655 = 0.4345

So we've got the following ratings:
3/3: 70.45%
10/90: 43.45%

Reply #7 - 2009 April 15, 4:41 am
ファブリス Administrator
From: Belgium Registered: 2006-06-14 Posts: 4021 Website

That's correct! I was using the number of +votes for the rating instead of the ratio of +votes to total votes. Thanks!

The problem I want to solve is that 1/1 shouldn't come on top of 45/50, because the 45/50 votes are more credible. 1/1 is better than no votes, but it would need at least another couple of votes to rank higher than 45/50, because that single vote could simply be an error. Likewise the other way: if a story received a down vote by error, 49/50 shouldn't drop below 3/3 or 1/1.

But the formula from the article uses a separate rating (e.g. Amazon uses 1-5 stars). Instead I want to use the rating derived from the number of up/down votes, so that users don't have to rate stories twice (helpful? yes/no + another star rating).

I don't know if it's possible to use their formula without a separate rating, though, since I'm also using the number of votes for the rating.

Reply #8 - 2009 April 15, 5:07 am
DeadNight Member
From: Israel Registered: 2008-06-25 Posts: 12

I agree that using a 1-5 star system is awkward (though I don't like it personally, so I'm biased)

I wrote the formula above for your yes/no voting system, based on my own interpretation of the Bayesian rating system described in the article. (The article's version doesn't use the maximum number of votes for a single item, for example, but then no item will ever get its actual rating.)

Play with some scenarios using my formula, and you'll see that it handles those situations well. 45/50 will be above 1/1 or 3/3, while lower than 49/50. That's because 45/50 will be closer to 0.9 while 1/1 will be really close to the average for that page, because of its low votes modifier.

I also assumed that items with 0 votes will have exactly the average (and hence will be below 1/1 but above 0/1)

So all in all, a separate rating is not required.
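
To play with such scenarios, here is a rough Python sketch that ranks a whole page of stories. The sample page, the function name, and the decision to leave stories without votes out of the page average are all assumptions for illustration; the modifier is taken as number_of_votes / max votes on the page.

```python
def rank_stories(stories):
    """Sort (yes_votes, number_of_votes) pairs by adjusted rating, best first."""
    voted = [(yes, n) for yes, n in stories if n > 0]
    if not voted:
        return list(stories)  # nothing to rank by yet

    # Assumption: stories without votes are left out of the page average.
    average_rating = sum(yes / n for yes, n in voted) / len(voted)
    max_votes = max(n for _, n in voted)

    def adjusted(story):
        yes, n = story
        if n == 0:
            return average_rating  # no votes: exactly the page average
        weight = n / max_votes     # 1.0 for the most-voted story
        return average_rating + (yes / n - average_rating) * weight

    return sorted(stories, key=adjusted, reverse=True)


# Hypothetical page of stories as (yes_votes, number_of_votes).
page = [(1, 1), (45, 50), (0, 0), (49, 50), (3, 3), (0, 1), (180, 200)]

for yes, n in rank_stories(page):
    print(f"{yes}/{n}")
```

On this sample page the order comes out as 180/200, 49/50, 45/50, 3/3, 1/1, 0/0, 0/1: the well-voted stories sit above the 3/3 and 1/1 ones, and the story without votes lands between 1/1 and 0/1. Other pages can behave differently, since both the average and the maximum depend on the stories present.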
