I was curious about how Reddit ranks front-page posts in the “hot” section, so I dug into it and found a few interesting things.
Reddit ranks the posts on the front (hot) page using three factors:
- Up Votes
- Down Votes
- Posted Date
This is Reddit’s algorithm for hot posts (explanation):
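A minimal sketch of the hot ranking, following the version that appeared in Reddit's open-source code. The two constants and the names `REDDIT_EPOCH`, `DECAY_SECONDS`, and `hot` are not stated in this post, so treat them as assumptions; the post time is assumed to be a Unix timestamp in seconds.

```python
from math import log10

# Constants as they appeared in Reddit's open-source ranking code.
# They are not given in this post, so treat them as assumptions.
REDDIT_EPOCH = 1134028003  # Unix time (seconds) subtracted from the post time
DECAY_SECONDS = 45000      # 12.5 hours of age is worth one factor of 10 in score


def hot(ups, downs, created_utc):
    """Return the hot score for a post from its votes and posted time."""
    score = ups - downs
    # Votes count logarithmically: 10 net votes -> 1, 100 -> 2, 1000 -> 3.
    order = log10(max(abs(score), 1))
    # +1 for a positive net score, -1 for a negative one, 0 for a tie.
    sign = (score > 0) - (score < 0)
    # Newer posts get a larger time term, so they start higher.
    seconds = created_utc - REDDIT_EPOCH
    return round(sign * order + seconds / DECAY_SECONDS, 7)
```

Under these constants, a tenfold increase in net votes is worth the same as being posted 12.5 hours later, which is why fresh posts can outrank older posts that have far more votes.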
This looked exciting, so I decided to use the Reddit search API to fetch a day’s worth of posts as JSON, run the algorithm on that data, and see whether I could reproduce Reddit’s front page. A full day of data for all of Reddit would be huge, so I limited myself to a single subreddit and chose /r/technology. I fetched the JSON data with the Reddit search API and took a screenshot of /r/technology to compare the results against.
Now I have three things:
- Reddit ranking algorithm
- Data of /r/technology for a day (sorted by posted date)
- Screenshot of /r/technology to compare with generated results
I wrote a Python script to do the job:
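A minimal sketch of such a script, not the exact one used here. It assumes the day's data was saved in Reddit's listing JSON shape (`data.children[].data` with `title`, `ups`, `downs`, and `created_utc`); the function names `load_listing` and `rank_posts` are hypothetical, and the constants in `hot` come from Reddit's open-source ranking code rather than from this post.

```python
import json
from math import log10


def hot(ups, downs, created_utc):
    """Hot score; both constants are assumed from Reddit's open-source code."""
    score = ups - downs
    order = log10(max(abs(score), 1))
    sign = (score > 0) - (score < 0)
    seconds = created_utc - 1134028003
    return round(sign * order + seconds / 45000, 7)


def load_listing(path):
    """Load a saved Reddit listing (JSON) from disk."""
    with open(path) as f:
        return json.load(f)


def rank_posts(listing, limit=25):
    """Return the titles of the top `limit` posts, ordered by hot score."""
    posts = [child["data"] for child in listing["data"]["children"]]
    ranked = sorted(
        posts,
        # Downvotes are often unavailable, so a missing count is treated as 0.
        key=lambda p: hot(p["ups"], p.get("downs", 0), p["created_utc"]),
        reverse=True,
    )
    return [p["title"] for p in ranked[:limit]]
```

With the data saved to a file, say `technology.json` (a placeholder name), `rank_posts(load_listing("technology.json"))` gives the 25 titles to compare against the screenshot.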
But there is one big challenge: Reddit does not reveal the number of downvotes, either on the website or through the API, so the generated results match the screenshot closely but not exactly.
Now I have 25 hot posts generated by the algorithm from the input data. Of these 25, 22 also appear in the screenshot, though not always at the same position; this is due to the missing downvote counts.
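The comparison itself can be sketched in a few lines. This assumes both rankings are available as lists of post titles, with the screenshot titles typed in by hand; `compare_rankings` is a hypothetical helper, not part of the original script.

```python
def compare_rankings(generated, observed):
    """Count the posts present in both rankings and how far each one moved.

    Returns (matches, shifts), where shifts maps each shared title to
    observed position minus generated position (0 means same position).
    """
    observed_pos = {title: i for i, title in enumerate(observed)}
    shifts = {
        title: observed_pos[title] - i
        for i, title in enumerate(generated)
        if title in observed_pos
    }
    return len(shifts), shifts
```

A shift of 0 means the post sits at the same position in both lists; a nonzero shift shows how far the missing downvote counts moved it.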