In an earlier post I mentioned that I have been playing around with the YouTube API to find out how videos might be connected.
I have been able to find music videos related to each other within 5 or 6 levels of suggested videos. Strangely, my own videos on my YouTube channel don't show up as being related.
The first simple, obvious optimisation that I included in the YouTube network path search was to not look up suggestions for videos that had already been checked. Out of interest I kept a record of these duplicate suggestions and noticed that the percentage seemed quite high. Now I am a little curious about how common duplicates are in other types of networks.
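A rough sketch of that optimisation might look like the following. It is only an illustration: `get_related` is a hypothetical stand-in for whatever call returns the suggestions for a video, and the toy network is made up so the example can run without touching any API.

```python
def crawl(start, get_related, max_levels):
    """Step outwards level by level, skipping videos that were already checked.

    get_related is a hypothetical callable: video id -> list of suggested ids.
    Returns the set of videos seen, the total number of suggestions
    encountered, and how many of those suggestions were duplicates.
    """
    visited = {start}
    frontier = [start]
    suggestions = 0
    duplicates = 0
    for _ in range(max_levels):
        next_frontier = []
        for video in frontier:
            for related in get_related(video):
                suggestions += 1
                if related in visited:
                    # Already checked (or queued), so don't look it up again.
                    duplicates += 1
                    continue
                visited.add(related)
                next_frontier.append(related)
        frontier = next_frontier
    return visited, suggestions, duplicates


# Made-up network purely for illustration.
TOY_SUGGESTIONS = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "d"],
    "d": [],
}

seen, total, dupes = crawl("a", lambda v: TOY_SUGGESTIONS.get(v, []), max_levels=2)
print(f"{dupes} of {total} suggestions were duplicates")
```

Keeping the duplicate counter next to the visited set is what makes it cheap to report the percentage afterwards.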
Twitter is the obvious candidate for exploring social networks, as most things are public and there is a simple RESTful API to navigate.
I may need to set up some rules to recognise someone as a celebrity or a promotional / marketing account if they have a very high number of followers. Likewise, I could treat someone as a spammer if they follow an exorbitant number of accounts.
If I'm feeling particularly motivated to apply what I've been learning about recently, I could even import the data into a MongoDB database, set up some indexes and run some queries. (The prevalence of JSON formatting in these platforms makes this easier than you might think.)
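Those rules could start out as simple thresholds. The sketch below is only a guess at what that might look like; both cut-off values are placeholders I have made up rather than anything measured, and the function and label names are hypothetical.

```python
# Hypothetical cut-offs, chosen only for illustration.
CELEBRITY_MIN_FOLLOWERS = 100_000
SPAMMER_MIN_FOLLOWING = 5_000


def classify_account(followers, following):
    """Label an account from its follower and following counts."""
    if followers >= CELEBRITY_MIN_FOLLOWERS:
        return "celebrity_or_marketing"
    if following >= SPAMMER_MIN_FOLLOWING:
        return "possible_spammer"
    return "ordinary"


print(classify_account(followers=2_500_000, following=120))
print(classify_account(followers=40, following=30_000))
print(classify_account(followers=300, following=280))
```

The follower check runs first, so an account that crosses both thresholds is labelled as a celebrity or marketing account rather than a spammer. That ordering is an arbitrary choice.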
Each YouTube video is associated with several other YouTube videos - shown as "Recommended". We can regard the videos as nodes on a network and the associations between them as connections on that network.
I'm curious about how many hops along the network it is reasonable to check before concluding that two nodes aren't related.
The YouTube API offers a way to obtain a listing of the related videos.
If we start from each end and step out to each related video, we could end up making a ridiculously large number of checks, exhausting the available memory, or reaching the edge of the network before finding a path between the end nodes.
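One way to keep a two-ended search under control is to give it explicit limits. The sketch below is an illustration rather than the code I actually ran: it assumes relatedness can be treated as symmetric (if A suggests B then B suggests A), and `get_related`, `max_hops` and `max_checks` are all hypothetical names.

```python
def hops_between(start, goal, get_related, max_hops=6, max_checks=10_000):
    """Search outwards from both ends; return the hop count or None.

    None means a limit was hit or one side ran out of videos to expand
    (the edge of the network) before the two searches met.
    """
    if start == goal:
        return 0
    seen_near, seen_far = {start: 0}, {goal: 0}
    frontier_near, frontier_far = [start], [goal]
    checks = 0
    for _ in range(max_hops):
        if not frontier_near or not frontier_far:
            return None  # reached the edge of the network
        # Expand whichever side currently has fewer videos waiting.
        if len(frontier_near) > len(frontier_far):
            frontier_near, frontier_far = frontier_far, frontier_near
            seen_near, seen_far = seen_far, seen_near
        next_frontier = []
        best = None
        for video in frontier_near:
            checks += 1
            if checks > max_checks:
                return None  # too many lookups, give up
            for related in get_related(video):
                if related in seen_far:
                    # The two searches have met; add up both halves.
                    total = seen_near[video] + 1 + seen_far[related]
                    best = total if best is None else min(best, total)
                elif related not in seen_near:
                    seen_near[related] = seen_near[video] + 1
                    next_frontier.append(related)
        if best is not None:
            return best
        frontier_near = next_frontier
    return None


# Made-up symmetric chain a - b - c - d - e, plus an isolated video x.
TOY_NEIGHBOURS = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
    "d": ["c", "e"], "e": ["d"], "x": [],
}
lookup = lambda v: TOY_NEIGHBOURS.get(v, [])
print(hops_between("a", "e", lookup))
print(hops_between("a", "x", lookup))
```

The `max_checks` limit guards against the ridiculously large number of lookups, and the two `seen` dictionaries are what would eat the memory, so in practice they might need a cap as well.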
If each video is related to 20 other videos, then visiting the first video means checking another 20 related videos, and without any overlap the number of checks would multiply by 20 at every hop. It is reasonable to assume, though, that some of the videos suggested on each hop will already have been visited, so the number of queries shouldn't actually grow that fast.
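To put a number on the worst case, here is a small calculation of how many videos could be reached if there were no duplicates at all. The figure of 20 related videos per video is the assumption from the paragraph above, and the function name is just for illustration.

```python
def max_videos_within(levels, branching=20):
    """Upper bound on videos reached within `levels` hops with no duplicates."""
    return sum(branching ** level for level in range(1, levels + 1))


for levels in range(1, 7):
    print(levels, max_videos_within(levels))
```

With 20 suggestions per video, six levels would mean up to 67,368,420 videos, so any overlap between suggestions makes a big difference to how practical the search is.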
I've assembled a basic experiment and found that there are 6 levels of recommendations separating Pseudo Echo's cover of Funky Town and I See Red by Split Enz.
Of the 37,797 videos checked, 15,979 (about 42%) were duplicates.
The next level of complexity will be to build up a record of what the videos in between are.
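One possible way to do that is to remember, for each video, which video it was first reached from, and then walk those links backwards once the target turns up. This is a sketch of the idea rather than something I have built yet; `get_related` is again a hypothetical function returning the suggestions for a video, and the toy network is invented.

```python
from collections import deque


def find_path(start, goal, get_related, max_levels=6):
    """Return the list of videos from start to goal, or None if not found."""
    parents = {start: None}  # video -> the video it was first reached from
    frontier = deque([(start, 0)])
    while frontier:
        video, level = frontier.popleft()
        if video == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while video is not None:
                path.append(video)
                video = parents[video]
            return path[::-1]
        if level == max_levels:
            continue  # don't step out any further from this video
        for related in get_related(video):
            if related not in parents:
                parents[related] = video
                frontier.append((related, level + 1))
    return None


# Made-up suggestions purely for illustration.
TOY_RELATED = {
    "a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": [],
}
print(find_path("a", "e", lambda v: TOY_RELATED.get(v, [])))
```

The `parents` dictionary doubles as the record of visited videos, so this costs no more lookups than the duplicate check that is already in place.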