Force-directed tag clouds

I’d been making graphs of Spotify’s “Related Artists” network, but was finding that pieces of the graph often remained disconnected.

To connect these disparate parts of the network, I queried last.fm for the top tags that had been attached to each artist, and added those to the graph.
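A minimal sketch of the idea in Python, not the code used for these graphs: the artist names, the tag lists and the `build_graph` / `count_components` helpers are all placeholders made up for illustration. It shows how two separate "related artists" clusters merge into one component once a shared tag node is added.

```python
from collections import defaultdict


def build_graph(related_pairs, artist_tags):
    """Combine artist-artist and artist-tag links into one node/edge set."""
    nodes = {}   # node id -> "artist" or "tag"
    edges = []   # pairs of node ids
    for a, b in related_pairs:
        nodes["artist:" + a] = "artist"
        nodes["artist:" + b] = "artist"
        edges.append(("artist:" + a, "artist:" + b))
    for artist, tags in artist_tags.items():
        nodes["artist:" + artist] = "artist"
        for tag in tags:
            tag_id = "tag:" + tag.lower()  # fold case so "Electronic" == "electronic"
            nodes[tag_id] = "tag"
            edges.append(("artist:" + artist, tag_id))
    return nodes, edges


def count_components(nodes, edges):
    """Count connected components with a simple depth-first search."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(neighbours[node] - seen)
    return components


# Placeholder data: two pairs of related artists with no link between them.
related = [("Artist A", "Artist B"), ("Artist C", "Artist D")]
tags = {"Artist A": ["electronic"], "Artist C": ["Electronic", "ambient"]}

nodes_before, edges_before = build_graph(related, {})
nodes_after, edges_after = build_graph(related, tags)
print(count_components(nodes_before, edges_before))
print(count_components(nodes_after, edges_after))
```

Prefixing node ids with `artist:` or `tag:` keeps an artist and a tag with the same name from collapsing into a single node.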

This brought the network together nicely, so I applied it to a larger data set: all the unique artists that had ever been played on a particular BBC 6 Music radio show.

Dark matter

The full graph of artists and their tags was interesting, but to get a clearer overview of the show’s musical themes, I hid the artist nodes after the graph had been laid out (using Gephi's "ForceAtlas 2" algorithm).

This left just the tags, laid out in two dimensions, where the most similar tags are closest together and the most frequently used are largest.
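The step of hiding the artists was done in Gephi, but the same filtering can be sketched in Python. Everything here is an assumption for illustration: the `tags_only` helper, the node names, the coordinates, and the choice to make label size grow linearly with the number of artists attached to a tag.

```python
from collections import Counter


def tags_only(positions, kinds, edges, min_size=10.0, scale=4.0):
    """Keep only tag nodes from a laid-out graph, sized by how often they are used.

    positions: node id -> (x, y) from a layout that included the artists
    kinds:     node id -> "artist" or "tag"
    edges:     pairs of node ids
    """
    usage = Counter()
    for a, b in edges:
        for node in (a, b):
            if kinds.get(node) == "tag":
                usage[node] += 1

    cloud = []
    for node, (x, y) in positions.items():
        if kinds[node] != "tag":
            continue  # artists shaped the layout but are not drawn
        cloud.append({
            "label": node,
            "x": x,
            "y": y,
            "size": min_size + scale * usage[node],
        })
    return cloud


# Made-up layout output: two artists and two tags.
positions = {"a1": (0.0, 0.0), "a2": (1.0, 1.0), "rock": (0.5, 0.5), "jazz": (2.0, 2.0)}
kinds = {"a1": "artist", "a2": "artist", "rock": "tag", "jazz": "tag"}
edges = [("a1", "rock"), ("a2", "rock"), ("a2", "jazz")]

cloud = tags_only(positions, kinds, edges)
```

The tag positions are left exactly where the layout put them, so the distances between tags still reflect the artists they share.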

As some of the labels were overlapping, I used Gephi’s "Label Adjust" layout algorithm to shift them just enough to remove most of the overlaps.

Here are some examples - I think they summarise the shows' content rather well:

Stuart Maconie’s Freakier Zone

A force-directed tag cloud of artists played on Stuart Maconie’s 6 Music radio show

Gilles Peterson

A force-directed tag cloud of artists played on Gilles Peterson’s 6 Music radio show

Marc Riley

A force-directed tag cloud of artists played on Marc Riley’s 6 Music radio show

Unique identifiers

One problem was that when several artists shared the same name, tags belonging to one of them could be attached to another. To avoid this, I included only the artists that had been given MusicBrainz IDs (MBIDs) in the BBC data, and used those MBIDs to query last.fm for tags.
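A rough sketch of the MBID approach, assuming last.fm's `artist.gettoptags` method, which accepts an `mbid` parameter in place of an artist name. The shape of the play records (dicts with `artist` and `mbid` keys), the helper names and the placeholder IDs and API key are hypothetical; the real BBC data may look different.

```python
from urllib.parse import urlencode

API_ROOT = "https://ws.audioscrobbler.com/2.0/"


def artists_with_mbid(plays):
    """Return {mbid: artist name}, skipping plays that have no MusicBrainz ID."""
    artists = {}
    for play in plays:
        mbid = play.get("mbid")
        if mbid:
            artists.setdefault(mbid, play["artist"])  # keep the first name seen
    return artists


def top_tags_url(mbid, api_key):
    """Build a last.fm request URL that looks an artist up by MBID, not by name."""
    params = {
        "method": "artist.gettoptags",
        "mbid": mbid,
        "api_key": api_key,
        "format": "json",
    }
    return API_ROOT + "?" + urlencode(params)


# Hypothetical play records; the IDs are placeholders, not real MBIDs.
plays = [
    {"artist": "Artist A", "mbid": "mbid-0001"},
    {"artist": "Artist A", "mbid": None},   # ambiguous name, no ID: dropped
    {"artist": "Artist B"},                 # no ID at all: dropped
    {"artist": "Artist A", "mbid": "mbid-0001"},
]

artists = artists_with_mbid(plays)
urls = [top_tags_url(mbid, "YOUR_API_KEY") for mbid in artists]
```

Keying the dictionary on the MBID also removes duplicate plays, so each artist is only queried once.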

Discussion

In a sense, the artists are the “dark matter” of the graph: they pull the tags together and organise their macroscopic structure, but do not appear in the final map.

It may be that a highly concentrated cluster of artists (as well as one or two very loosely connected artists) pushed some tags further apart than they deserved to be.

These word clouds were generated with Gephi, as it handles thousands of nodes easily. I'd like to be able to do the same thing in D3, as Gephi is quite awkward to use, and it cropped the node labels when exporting the images above (the export seems to take only the nodes into account when cropping, and not their labels).

Here's the (working, but unoptimised) code for building the artists + tags graph data.