Each YouTube video is associated with several other YouTube videos - shown as "Recommended". We can regard the videos as nodes on a network and the associations between them as connections on that network.
I'm curious about how many hops along the network is reasonable to establish that two nodes aren't related.
The YouTube API offers a way to obtain a listing of the related videos.
If we start from each end and step out to each related video then we could either get to a ridiculously large number of checks, exhaust the available memory or reach the edge of the network before finding a path between the end nodes.
If each video is related to 20 other videos, then the first video will involve checking another 20 related videos. It is reasonable to assume that some of the related videos on each hop will be related to some of the same videos as the visited videos, so the number of queries shouldn't be multiplying by 20 for each video that is visited.
I've assembled a basic experiment and found that there are 6 levels of recommendations separating Pseudo Echo's cover of Funky Town and I See Red by Split Enz.
Of the 37,797 videos checked 15,979 were duplicates.
The next level of complexity will be to build up a record of what the videos in between are.
No comments:
Post a Comment