Networks! They are all around us. The universe is filled with systems and structures that can be organized as networks. Recently, we have seen them used to convict criminals, visualize friendships, and even describe cereal ingredient combinations. Manuel Lima's wonderful talk on organized complexity shows how well networks can describe our complex world. Now let's learn how to create our own.
In this tutorial, we will focus on creating an interactive network visualization that will allow us to get details about the nodes in the network, rearrange the network into different layouts, and sort, filter, and search through our data.
In this example, each node is a song. The nodes are sized based on popularity, and colored by artist. Links indicate two songs are similar to one another.
Try out the visualization on different song networks to see how the layouts and filters look on different graphs.
Technology
This visualization is a JavaScript-based web application written using the powerful D3 visualization library. jQuery is also used for some DOM element manipulation. Both libraries are included in the js/libs directory of the source code.
The code itself is actually written in CoffeeScript, a little language that is easy to learn and compiles down to regular JavaScript. (If you hate CoffeeScript, you can always compile the code to JavaScript and start there.) Why use CoffeeScript? I find that the reduced syntax makes it easier to read and understand what the code is doing. Learning a whole new 'language' may seem a bit intimidating, but there are just a few things you need to know about CoffeeScript to be a pro.
Quick CoffeeScript Notes
Functions
First and foremost, this is what a function looks like:
functionName = (input) ->
  results = input * 2
  results
So the input parameters are inside the parentheses. The -> indicates the start of the implementation. If a function's implementation is super simple, this can all go on one line.
cube = (x) -> x * x * x
A function returns the value of the last expression it evaluates, so you typically don't need a return statement, but you can use one if you like.
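For comparison, here are the two CoffeeScript functions above written by hand in plain JavaScript. This is a hand-written equivalent, not the compiler's exact output. The main difference is the explicit return statement that CoffeeScript adds for you.

```javascript
// Hand-written JavaScript equivalents of the two CoffeeScript functions above.
// CoffeeScript returns the last expression automatically; here it is explicit.
const functionName = function (input) {
  const results = input * 2;
  return results;
};

const cube = function (x) {
  return x * x * x;
};

console.log(functionName(4)); // 8
console.log(cube(3)); // 27
```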
Indentation matters
The other main syntactical surprise is that, similar to Python, indentation is significant and used to denote code hierarchy and scope. We can see an example of this in the function above: The implementation is indented.
In practice, this isn't too big of an issue: just hit the Tab key instead of using curly braces ({ and }), and you are set.
Semicolons and Parentheses
Taking a page from Ruby, CoffeeScript does not need semicolons, and they should be avoided.
Also, parentheses are optional in many places. This can get confusing, so I typically use parentheses unless they would wrap a multi-line function that is an input argument to another function. If that doesn't make sense, don't worry: the code below should still be easy to follow.
For other interesting details, spend a few minutes with the CoffeeScript documentation. You won't be disappointed.
In-browser CoffeeScript
Using CoffeeScript is simpler because our CoffeeScript code can be compiled into JavaScript right in the browser. Our index.html includes the CoffeeScript compiler, which in turn compiles to JavaScript any script whose type is text/coffeescript:
<script src="js/libs/coffee-script.js"></script>
<script type="text/coffeescript" src="coffee/vis.coffee"></script>
It's just that simple. When vis.coffee loads, it will be parsed and compiled into JavaScript before running. For production, we would want to do this compilation beforehand, but this lets us get started right away.
Setting up the Network
Ok, let's get started with coding this visualization. We are going to use a simplified version of Mike Bostock's (the creator of D3) reusable chart recommendations to package our implementation. What this means for us is that our main network code will be encapsulated in a function, with getters and setters to allow interaction with the code from outside. (You don't have to organize your code like this if it seems like too much work, but the encapsulation makes it easier to change your input data later.) Here is the general framework:
Network = () ->
  width = 960
  height = 800
  # ...
  network = (selection, data) ->
    # main implementation

  update = () ->
    # private function

  network.toggleLayout = (newLayout) ->
    # public function

  return network
So the Network function defines a closure which scopes all the variables used in the visualization, like width and height. The network function is where the main body of the code goes, and it is returned by Network at the end of the implementation.
Functions defined on network, like network.toggleLayout(), can be called externally. Functions like update are 'private' and can only be called by other functions inside Network. You can think of it as the same abstraction that classes provide us in object-oriented programming. Here is how we create a new network:
$ ->
  myNetwork = Network()
  # ...
  d3.json "data/songs.json", (json) ->
    myNetwork("#vis", json)
So here, myNetwork is the value Network() returns, namely the function called network. Then we call this network function, passing in the id of the div where the visualization will live, and the data to visualize.
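Here is the same closure pattern in plain JavaScript, with D3 removed so it can run on its own. It is a sketch, not the tutorial's code. The selection argument is simply ignored, and layout() and updateCount() are hypothetical getters added only so we can observe the closure's private state.

```javascript
// Closure-based "reusable chart" pattern, sketched in plain JavaScript.
// No D3 or DOM here; layout() and updateCount() are hypothetical getters
// added only so the hidden state can be inspected.
function Network() {
  // Variables scoped to the closure, like width/height in the CoffeeScript.
  const width = 960;
  const height = 800;
  let layout = "force";
  let allData = null;
  let updateCount = 0;

  // The function handed back to callers: network(selection, data).
  const network = function (selection, data) {
    allData = data;
    update();
    return network;
  };

  // 'Private': nothing outside Network can reach this.
  const update = function () {
    updateCount += 1;
  };

  // 'Public': attached to the returned function object.
  network.toggleLayout = function (newLayout) {
    layout = newLayout;
    update();
  };

  // Hypothetical getters (not in the tutorial).
  network.layout = () => layout;
  network.updateCount = () => updateCount;

  return network;
}

const myNetwork = Network();
myNetwork("#vis", { nodes: [], links: [] });
myNetwork.toggleLayout("radial");
console.log(myNetwork.layout()); // radial
console.log(typeof myNetwork.update); // undefined: update stays private
```

The public functions are properties of the returned function, and the private ones are just local variables. That is why the closure plays the role a class would in other languages.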
Network Data Format
From above, we see we are passing in the data from songs.json to be visualized. How is this data organized?
Since a network defines connections between nodes as well as the data contained in the nodes themselves, it would be difficult to define as a simple 'spreadsheet' or table of values. Instead, we use JSON to capture this structure with as little overhead as possible.
The input data for this visualization is expected to follow this basic structure:
{
  "nodes": [
    {
      "name": "node 1",
      "artist": "artist name",
      "id": "unique_id_1",
      "playcount": 123
    },
    {
      "name": "node 2",
      # ...
    }
  ],
  "links": [
    {
      "source": "unique_id_1",
      "target": "unique_id_2"
    },
    {
      # ...
    }
  ]
}
This is a JSON object (just like a JavaScript object). If you haven't looked at JSON before, take a look now! The format is pretty straightforward, since it is essentially JavaScript object syntax.
This object requires two keys, nodes and links. Both of these store arrays of other objects. nodes is an array of nodes. Each node object needs some fields used in the visualization, as well as an id which uniquely identifies that node.
The objects in the links array just need source and target. Both of these point to the ids of nodes.
The traditional/default method in D3 of defining a link's source and target is to use their position in the nodes array as the value. Since we are going to be filtering and rearranging these nodes, I thought it would be best to use values independent of where the nodes are stored. We will see how to use these ids in a bit.
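Because links refer to node ids rather than array positions, a typo in an id is easy to miss until something fails at render time. A small validation pass can flag such links early. findDanglingLinks below is a hypothetical helper, not part of the tutorial's code, sketched in plain JavaScript against the format above.

```javascript
// Hypothetical helper (not part of the tutorial): list links whose
// source or target id does not match any node id.
function findDanglingLinks(data) {
  const ids = new Set(data.nodes.map((n) => n.id));
  return data.links.filter((l) => !ids.has(l.source) || !ids.has(l.target));
}

const sample = {
  nodes: [
    { name: "node 1", artist: "artist name", id: "unique_id_1", playcount: 123 },
    { name: "node 2", artist: "artist name", id: "unique_id_2", playcount: 45 },
  ],
  links: [
    { source: "unique_id_1", target: "unique_id_2" },
    { source: "unique_id_1", target: "missing_id" },
  ],
};

console.log(findDanglingLinks(sample).length); // 1
```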
Moved by the Force
We will start with the default force-directed layout that is built into D3. Force-based network layouts are essentially little physics simulations. Each node has a force associated with it (hence the name), which can repel (or attract) other nodes. Links between nodes act like springs to draw them back together. These pushing and pulling forces work on the network over a number of iterations, and eventually the system finds an equilibrium.
Force-directed layouts usually result in pretty good looking network visualizations, which is why they are so popular. D3's implementation does a lot of work to make the physics simulation efficient, so it stays fast in the browser.
Example of force-directed layout from our song network demo
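To make the "little physics simulation" idea concrete, here is one toy simulation step in plain JavaScript. Every pair of nodes repels with a strength that falls off with distance, and each link pulls its endpoints toward a rest length like a spring. forceStep and its options are hypothetical, and the constants are made up. This is a simplified illustration, not D3's implementation, which adds refinements such as a Barnes-Hut approximation for the repulsion and a cooling parameter, alpha.

```javascript
// Toy force-directed step with made-up constants; NOT D3's algorithm.
// nodes: [{x, y}], links: [{source, target}] referencing node objects.
function forceStep(nodes, links, opts) {
  const fx = nodes.map(() => 0);
  const fy = nodes.map(() => 0);

  // Pairwise repulsion whose strength falls off with squared distance.
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const dx = nodes[j].x - nodes[i].x;
      const dy = nodes[j].y - nodes[i].y;
      const dist2 = Math.max(dx * dx + dy * dy, 0.01); // avoid divide-by-zero
      const dist = Math.sqrt(dist2);
      const f = opts.charge / dist2;
      fx[i] -= (f * dx) / dist; fy[i] -= (f * dy) / dist; // push i away from j
      fx[j] += (f * dx) / dist; fy[j] += (f * dy) / dist; // push j away from i
    }
  }

  // Springs: each link pulls (or pushes) its ends toward restLength.
  const index = new Map(nodes.map((n, k) => [n, k]));
  for (const l of links) {
    const s = index.get(l.source);
    const t = index.get(l.target);
    const dx = l.target.x - l.source.x;
    const dy = l.target.y - l.source.y;
    const dist = Math.max(Math.hypot(dx, dy), 0.01);
    const f = opts.springK * (dist - opts.restLength);
    fx[s] += (f * dx) / dist; fy[s] += (f * dy) / dist;
    fx[t] -= (f * dx) / dist; fy[t] -= (f * dy) / dist;
  }

  // Move every node by its accumulated displacement.
  nodes.forEach((n, k) => {
    n.x += fx[k] * opts.dt;
    n.y += fy[k] * opts.dt;
  });
}

const a = { x: 0, y: 0 };
const b = { x: 10, y: 0 };
forceStep([a, b], [], { charge: 100, restLength: 50, springK: 0.1, dt: 1 });
console.log(a.x, b.x); // -1 11: unlinked nodes push apart
```

Running forceStep repeatedly lets the pushes and pulls balance out, which is the equilibrium described above.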
To start, as is typical with most D3 visualizations, we need to create an svg element in our page to render to. Let's look at the network() function, which performs this action.
First we declare a bunch of variables that will be available to us inside the Network closure. They are 'global' and available anywhere in Network. Note that our D3 force-directed layout is one such global variable, called force.
Inside our network function, we start by tweaking the input data. Then we use D3 to append an svg element to the input selection element. linksG and nodesG are group elements that will contain the individual lines and circles used to create the links and nodes. Grouping related elements is a pretty common strategy when using D3. Here, we create linksG before nodesG because SVG draws later elements on top of earlier ones, and we want the nodes to sit on top of the links.
The update function is where most of the action happens, so let's look at it now.
# The update() function performs the bulk of the
# work to set up our visualization based on the
# current layout/sort/filter.
#
# update() is called every time a parameter changes
# and the network needs to be reset.
update = () ->
  # filter data to show based on current filter settings.
  curNodesData = filterNodes(allData.nodes)
  curLinksData = filterLinks(allData.links, curNodesData)

  # sort nodes based on current sort and update centers for
  # radial layout
  if layout == "radial"
    artists = sortedArtists(curNodesData, curLinksData)
    updateCenters(artists)

  # reset nodes in force layout
  force.nodes(curNodesData)

  # enter / exit for nodes
  updateNodes()

  # always show links in force layout
  if layout == "force"
    force.links(curLinksData)
    updateLinks()
  else
    # reset links so they do not interfere with
    # other layouts. updateLinks() will be called when
    # force is done animating.
    force.links([])
    # if present, remove them from svg
    if link
      link.data([]).exit().remove()
      link = null

  # start me up!
  force.start()
The final version of this visualization will have filtering and sorting capabilities, so update starts by filtering the nodes and links of the total dataset, then sorts if necessary. We will come back to these functions later. For the basic force-directed layout, without all the bells and whistles to come, all we really care about is:
force.nodes(curNodesData)
updateNodes()
force.links(curLinksData)
updateLinks()
The force's nodes array is set to our currently displayed nodes, erasing any previous nodes in the simulation. Then we update the visual display of the nodes in the visualization. This same pattern is then followed for the links.
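update also calls filterLinks, whose body is not listed in this walk-through. After setupData, links point at node objects, so one plausible version keeps only the links whose two endpoints both survived the node filter. The function below is a hypothetical plain JavaScript sketch, not the tutorial's code.

```javascript
// Hypothetical sketch of filterLinks: keep links whose source and
// target node objects are both among the currently displayed nodes.
function filterLinks(allLinks, curNodes) {
  const visible = new Set(curNodes);
  return allLinks.filter((l) => visible.has(l.source) && visible.has(l.target));
}

const n1 = { id: "a" };
const n2 = { id: "b" };
const n3 = { id: "c" };
const links = [{ source: n1, target: n2 }, { source: n2, target: n3 }];
console.log(filterLinks(links, [n1, n2]).length); // 1
```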
It is important to realize that in D3, the nodes and links in the force layout don't automatically get visualized in any way. That is, there is a separation between the force-directed physics simulation and any visual mapped to that simulation. (Remember: the force layout doesn't add circles and lines for you. It just tells you where to put them.)
To create a visual representation of this force-directed simulation, we need to bind to the same data it uses, which we do in updateNodes and updateLinks:
# enter/exit display for nodes
updateNodes = () ->
  node = nodesG.selectAll("circle.node")
    .data(curNodesData, (d) -> d.id)

  node.enter().append("circle")
    .attr("class", "node")
    .attr("cx", (d) -> d.x)
    .attr("cy", (d) -> d.y)
    .attr("r", (d) -> d.radius)
    .style("fill", (d) -> nodeColors(d.artist))
    .style("stroke", (d) -> strokeFor(d))
    .style("stroke-width", 1.0)

  node.on("mouseover", showDetails)
    .on("mouseout", hideDetails)

  node.exit().remove()

# enter/exit display for links
updateLinks = () ->
  link = linksG.selectAll("line.link")
    .data(curLinksData, (d) -> "#{d.source.id}_#{d.target.id}")

  link.enter().append("line")
    .attr("class", "link")
    .attr("stroke", "#ddd")
    .attr("stroke-opacity", 0.8)
    .attr("x1", (d) -> d.source.x)
    .attr("y1", (d) -> d.source.y)
    .attr("x2", (d) -> d.target.x)
    .attr("y2", (d) -> d.target.y)

  link.exit().remove()
Finally, some D3 visualization code! Looking at updateNodes, we select all circle.node elements in our nodesG group (which, at the very start of the execution of this code, will be empty). Then we bind our filtered node data to this selection, using the data function and indicating that data should be identified by its id value.
The enter() function provides an access point to every element in our data array that does not have a circle associated with it. When append is called on this selection, it creates a new circle element for each of these representation-less data points. The attr and style functions set values for each one of these newly formed circles. When a function is used as the second parameter, like:
.attr("r", (d) -> d.radius)
The d is the data associated with the visual element, which is passed in automatically by D3. So with just a few lines of code we create and style all the circles we need.
Because we will be filtering our data to add and remove nodes, there will be times when a circle element exists on screen but has no data behind it. This is where the exit() function comes into play. exit() provides a selection of elements which are no longer associated with data. Here we simply remove them using the remove function.
If the concepts of enter() and exit() are still not clicking, check out the Thinking With Joins and Three Little Circles tutorials. These selections are a big part of D3, so it is worth getting a feel for what they do.
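The bookkeeping behind a keyed data join can be sketched without D3. We start from the keys of the elements already on screen, the new data, and a key function (like (d) -> d.id above). From these we can split things into enter, update, and exit groups. keyedJoin is a hypothetical plain JavaScript illustration of the idea. D3's selection.data does the real work on DOM elements.

```javascript
// Illustrative keyed join (plain JS, no DOM): decide which data items need
// a new element (enter), which keep theirs (update), and which existing
// elements have lost their data (exit).
function keyedJoin(existingKeys, data, key) {
  const existing = new Set(existingKeys);
  const incoming = new Set(data.map(key));
  return {
    enter: data.filter((d) => !existing.has(key(d))),
    update: data.filter((d) => existing.has(key(d))),
    exit: existingKeys.filter((k) => !incoming.has(k)),
  };
}

const onScreen = ["song1", "song2", "song3"];
const nextData = [{ id: "song2" }, { id: "song3" }, { id: "song4" }];
const joined = keyedJoin(onScreen, nextData, (d) => d.id);
console.log(joined.enter.map((d) => d.id)); // [ 'song4' ]
console.log(joined.exit); // [ 'song1' ]
```

This is also why the key function matters. Without it, D3 matches data to elements by index, so filtering out one song would shift every later song onto a different circle.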
Configuring the Force
In order to get the force-directed graph working the way we want, we need to configure the force layout a bit more. This happens in the setLayout function. (There have to be a ton of Star Wars jokes I should be making here, but I can't think of any.) For the force-directed layout, our force configuration is pretty simple:
force.on("tick", forceTick)
  .charge(-200)
  .linkDistance(50)
Here, charge sets how strongly nodes push on one another (a negative value makes them repel), and linkDistance is the target length of each link, which the simulation pulls links toward. These values allow the nodes to spread out a bit.
The forceTick function will be called on each iteration (aka 'tick') of the simulation. This is where we move the visual representations of the nodes and links to their positions in the simulation after this tick. Here is forceTick:
# tick function for force directed layout
forceTick = (e) ->
  node
    .attr("cx", (d) -> d.x)
    .attr("cy", (d) -> d.y)

  link
    .attr("x1", (d) -> d.source.x)
    .attr("y1", (d) -> d.source.y)
    .attr("x2", (d) -> d.target.x)
    .attr("y2", (d) -> d.target.y)
Pretty straightforward. The D3 simulation modifies the x and y values of each node during the simulation. Thus, for each tick, we simply need to move the circles representing our nodes to where x and y are. The links can be moved based on where their source and target nodes are.
Setting Up Data
Speaking of source and target, we need to go back and see how to deal with our initial data, where we used the id of a node in place of the node's index in the nodes array. Here is setupData, which is the very first thing executed in our network code:
# called once to clean up raw data and switch links to
# point to node instances
# Returns modified data
setupData = (data) ->
  # initialize circle radius scale
  countExtent = d3.extent(data.nodes, (d) -> d.playcount)
  circleRadius = d3.scale.sqrt().range([3, 12]).domain(countExtent)

  data.nodes.forEach (n) ->
    # set initial x/y to values within the width/height
    # of the visualization
    n.x = Math.floor(Math.random() * width)
    n.y = Math.floor(Math.random() * height)
    # add radius to the node so we can use it later
    n.radius = circleRadius(n.playcount)

  # ids -> node objects
  nodesMap = mapNodes(data.nodes)

  # switch links to point to node objects instead of ids
  data.links.forEach (l) ->
    l.source = nodesMap.get(l.source)
    l.target = nodesMap.get(l.target)

    # linkedByIndex is used for link sorting
    linkedByIndex["#{l.source.id},#{l.target.id}"] = 1

  data
setupData is doing a few things for us, so let's go through it all. First, we use a d3.scale to specify the possible values that the circle radii can take, based on the extent of the playcount values. Then we iterate through all the nodes, setting their radius values, as well as setting their x and y values to be within the current visualization size. Importantly, nodes are not automatically sized by any particular data value they contain. We are just adding radius to our data so we can pull it out in updateNodes. The x and y initialization is just to reduce the time it takes for the force-directed layout to settle down.
Finally, we map node ids to node objects and then replace the source and target in our links with the node objects themselves, instead of the ids that were in the raw data. This allows D3's force layout to work correctly, and makes it possible to add/remove nodes without worrying about getting our nodes array and links array out of order.
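Here is a plain JavaScript sketch of the data preparation setupData performs. It builds a square-root scale from the playcount extent to radii of 3 to 12 pixels, then swaps link ids for node objects. sqrtScale is a hypothetical stand-in for d3.scale.sqrt(). A Map stands in for mapNodes, which the tutorial does not show. width and height are passed as parameters instead of coming from the closure, and the linkedByIndex bookkeeping is left out.

```javascript
// Stand-in for d3.scale.sqrt().domain(domain).range(range): interpolate
// linearly between the square roots of the domain endpoints.
function sqrtScale(domain, range) {
  const s0 = Math.sqrt(domain[0]);
  const s1 = Math.sqrt(domain[1]);
  return (v) => {
    const t = s1 === s0 ? 0 : (Math.sqrt(v) - s0) / (s1 - s0);
    return range[0] + t * (range[1] - range[0]);
  };
}

// Plain-JS version of setupData; width/height are parameters here
// rather than closure variables.
function setupData(data, width, height) {
  const counts = data.nodes.map((n) => n.playcount);
  const radius = sqrtScale([Math.min(...counts), Math.max(...counts)], [3, 12]);

  for (const n of data.nodes) {
    // random start position inside the drawing area
    n.x = Math.floor(Math.random() * width);
    n.y = Math.floor(Math.random() * height);
    n.radius = radius(n.playcount);
  }

  // id -> node object (a Map stands in for mapNodes)
  const nodesMap = new Map(data.nodes.map((n) => [n.id, n]));

  // swap id strings for the node objects themselves
  for (const l of data.links) {
    l.source = nodesMap.get(l.source);
    l.target = nodesMap.get(l.target);
  }
  return data;
}

const prepared = setupData(
  {
    nodes: [
      { id: "a", playcount: 100 },
      { id: "b", playcount: 400 },
      { id: "c", playcount: 2500 },
    ],
    links: [{ source: "a", target: "c" }],
  },
  960,
  800
);
console.log(prepared.nodes.map((n) => n.radius)); // [ 3, 5.25, 12 ]
```

The square root means circle area, rather than radius, grows roughly in proportion to playcount, which keeps very popular songs from dwarfing everything else.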
Radializing the Force
Now we have everything needed to display our network in a nice looking, fast, force-directed layout. It might have been a lot of explanation, but we did it in a pretty small amount of code.
The force-directed layout is a great start, but also a bit limiting. Sometimes you want to see your network in a different layout, to find patterns or trends that aren't readily apparent in a force-directed one. In fact, it would be really cool if we could toggle between different layouts easily and let our users see the data in a number of different formats. So, let's do that!
Here's a basic idea: we are going to hijack D3's force-directed layout and tell it where we want the nodes to end up. This way, D3 will still take care of all the physics and animations behind the scenes to make the transitions between layouts look good without too much work. But we will get to influence where the nodes go, so their movements will no longer be purely based on the underlying simulation.
Radial layout example, grouping song nodes by artist
To aid in our force-jacking (always force-jack with extreme caution), I've created a separate entity called RadialPlacement to help position our nodes in a circular fashion. It's not really a full-on layout; it just tries to encapsulate the complexity of placing groups of nodes. Essentially, we will provide it with an array of keys, and it will calculate radial locations for each of these keys. Then we can use these locations to position our nodes in a circular fashion (assuming we can match up our nodes with one of the input keys).
RadialPlacement is a little clumsy looking, but gets the job done. The bulk of the work occurs in setKeys and radialLocation:
# Help with the placement of nodes
RadialPlacement = () ->
  # stores the key -> location values
  values = d3.map()
  # how much to separate each location by
  increment = 20
  # how large to make the layout
  radius = 200
  # where the center of the layout should be
  center = {"x":0, "y":0}
  # what angle to start at
  start = -120
  current = start

  # Given a set of keys, perform some
  # magic to create a two ringed radial layout.
  # Expects radius, increment, and center to be set.
  # If there are a small number of keys, just make
  # one circle.
  setKeys = (keys) ->
    # start with an empty values
    values = d3.map()

    # number of keys to go in first circle
    firstCircleCount = 360 / increment

    # if we don't have enough keys, modify increment
    # so that they all fit in one circle
    if keys.length < firstCircleCount
      increment = 360 / keys.length

    # set locations for inner circle
    firstCircleKeys = keys.slice(0,firstCircleCount)
    firstCircleKeys.forEach (k) -> place(k)

    # set locations for outer circle
    secondCircleKeys = keys.slice(firstCircleCount)

    # setup outer circle
    radius = radius + radius / 1.8
    increment = 360 / secondCircleKeys.length

    secondCircleKeys.forEach (k) -> place(k)

  # Gets a new location for input key
  place = (key) ->
    value = radialLocation(center, current, radius)
    values.set(key,value)
    current += increment
    value

  # Given a center point, angle, and radius length,
  # return a radial position for that angle
  radialLocation = (center, angle, radius) ->
    x = (center.x + radius * Math.cos(angle * Math.PI / 180))
    y = (center.y + radius * Math.sin(angle * Math.PI / 180))
    {"x":x,"y":y}
Hopefully the comments help walk you through the code. In setKeys, our goal is to break up the total set of keys into an inner circle and an outer circle. We use slice to pull apart the array, after we figure out how many locations can fit in the inner circle.
radialLocation does the actual polar-to-Cartesian conversion to get a radial location. It is called from place, which is in turn called from setKeys.
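To check the geometry, here is a plain JavaScript port of the setKeys, place, and radialLocation logic above. radialPlacement(keys, opts) is a hypothetical wrapper that returns a Map. The tutorial's version exposes chainable setters (center, radius, increment, keys) instead.

```javascript
// Port of the RadialPlacement logic above (angles in degrees, two rings).
// radialPlacement(keys, opts) is a hypothetical wrapper returning a Map;
// the tutorial's version exposes chainable setters instead.
function radialPlacement(keys, opts) {
  const center = opts.center;
  let radius = opts.radius;
  let increment = opts.increment;
  let current = opts.start;
  const values = new Map();

  // polar (angle, radius) -> cartesian, around `center`
  const radialLocation = (angle, r) => ({
    x: center.x + r * Math.cos((angle * Math.PI) / 180),
    y: center.y + r * Math.sin((angle * Math.PI) / 180),
  });

  const place = (key) => {
    values.set(key, radialLocation(current, radius));
    current += increment;
  };

  // how many keys fit on the inner ring at the requested spacing
  const firstCircleCount = 360 / increment;
  // too few keys: spread them evenly around one ring instead
  if (keys.length < firstCircleCount) increment = 360 / keys.length;
  keys.slice(0, firstCircleCount).forEach((k) => place(k));

  // the rest go on a larger outer ring
  const secondCircleKeys = keys.slice(firstCircleCount);
  radius = radius + radius / 1.8;
  increment = 360 / secondCircleKeys.length;
  secondCircleKeys.forEach((k) => place(k));

  return values;
}

const pos = radialPlacement(["a", "b", "c"], {
  center: { x: 0, y: 0 },
  radius: 200,
  increment: 20,
  start: -120,
});
console.log(pos.get("b")); // { x: 200, y: 0 }
```

With only three keys, everything fits on the inner ring, so the spacing becomes 120 degrees: "a" sits at -120, "b" at 0, and "c" at 120.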
Toggling Between Layouts
Lets the user explore different layouts interactively
With RadialPlacement in tow, we can now create a toggle between our force-directed layout and a new radial layout. The radial layout will use the song's artist field as keys, so the nodes will be grouped by artist.
In the update function described above, we saw a mention of the radial layout:
if layout == "radial"
  artists = sortedArtists(curNodesData, curLinksData)
  updateCenters(artists)
Here, sortedArtists provides an array of artist values sorted by either the number of songs each artist has or the number of links. Let's focus on updateCenters, which deals with our radial layout:
updateCenters = (artists) ->
  if layout == "radial"
    groupCenters = RadialPlacement().center({"x":width/2, "y":height / 2 - 100})
      .radius(300).increment(18).keys(artists)
We can see that we just pass our artists array to RadialPlacement (through its keys setter). It calculates locations for all keys and stores them until we want to position our nodes.
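sortedArtists itself is not listed in the tutorial. Here is a minimal sketch of the song-count variant: count the songs per artist and sort by that count, largest first. sortedArtistsBySongCount is a hypothetical plain JavaScript helper, and the alphabetical tie-break is an assumption added to make the order predictable.

```javascript
// Hypothetical sketch of the song-count variant of sortedArtists:
// artists ordered by number of songs (most first), ties alphabetical.
function sortedArtistsBySongCount(nodes) {
  const counts = new Map();
  for (const n of nodes) counts.set(n.artist, (counts.get(n.artist) || 0) + 1);
  return [...counts.keys()].sort(
    (a, b) => counts.get(b) - counts.get(a) || a.localeCompare(b)
  );
}

const artists = sortedArtistsBySongCount([
  { artist: "B" }, { artist: "A" }, { artist: "B" },
  { artist: "C" }, { artist: "A" }, { artist: "B" },
]);
console.log(artists); // [ 'B', 'A', 'C' ]
```

RadialPlacement fills the inner ring first, so with this ordering the artists with the most songs land on the inner ring.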
Now we just need to position the nodes, moving each one toward its artist's location. To do this, we change the tick function for the D3 force instance to radialTick when our radial layout is selected:
# tick function for radial layout
radialTick = (e) ->
  node.each(moveToRadialLayout(e.alpha))

  node
    .attr("cx", (d) -> d.x)
    .attr("cy", (d) -> d.y)

  if e.alpha < 0.03
    force.stop()
    updateLinks()

# Adjusts x/y for each node to
# push them towards appropriate location.
# Uses alpha to dampen effect over time.
moveToRadialLayout = (alpha) ->
  k = alpha * 0.1
  (d) ->
    centerNode = groupCenters(d.artist)
    d.x += (centerNode.x - d.x) * k
    d.y += (centerNode.y - d.y) * k
We can see that radialTick calls moveToRadialLayout, which simply looks up the node's artist location in the previously computed groupCenters. It then moves the node toward this center.
This movement is dampened by the alpha parameter of the force layout. alpha represents the cooling of the physics simulation as it reaches equilibrium, so it gets smaller as the animation continues. This dampening lets the nodes' repelling forces have more influence on their positions as the simulation nears its stop. As a result, the nodes can push away from each other and produce a nice-looking clustering effect without node overlap.
We also use the alpha value inside radialTick to stop the simulation once it has cooled enough, which also gives us an easy opportunity to redisplay the links.
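The damping in moveToRadialLayout can be seen in isolation. nudgeToward is a hypothetical plain JavaScript function that applies the same update (k = alpha * 0.1) once, toward a fixed center, at a few alpha values. There is no repulsion here, so it only shows how the step size shrinks as alpha cools.

```javascript
// Same update as moveToRadialLayout, pulled out as a plain function:
// move a fraction k = alpha * 0.1 of the remaining distance to the center.
function nudgeToward(d, center, alpha) {
  const k = alpha * 0.1;
  d.x += (center.x - d.x) * k;
  d.y += (center.y - d.y) * k;
}

const target = { x: 100, y: 0 };
for (const alpha of [0.1, 0.05, 0.01]) {
  const d = { x: 0, y: 0 };
  nudgeToward(d, target, alpha);
  console.log(alpha, d.x.toFixed(2)); // step shrinks: 1.00, 0.50, 0.10
}
```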
Because the nodes are different sizes, we want them to repel each other with different strengths. Luckily, the force's charge function can itself take a function, which receives the current node's data and calculates the charge from it. This means we can base the charge on the node's radius, as we've stored it in the data:
charge = (node) -> -Math.pow(node.radius, 2.0) / 2
The specific ratio is just based on experimentation and tweaking. You are welcome to play around with what other effects you can come up with for charge.
Filter and Sort
Force-directed and Radial layouts after filtering for obscure songs
The filter and sort functionality works how you would expect: we check the network's current settings and perform operations on the nodes and links based on them. Let's look at the filter functionality that deals with popular and obscure songs, as it uses a bit of D3 array functionality:
# Removes nodes from input array
# based on current filter setting.
# Returns array of nodes
filterNodes = (allNodes) ->
  filteredNodes = allNodes
  if filter == "popular" or filter == "obscure"
    playcounts = allNodes.map((d) -> d.playcount).sort(d3.ascending)
    # get median value
    cutoff = d3.quantile(playcounts, 0.5)
    filteredNodes = allNodes.filter (n) ->
      if filter == "popular"
        n.playcount > cutoff
      else if filter == "obscure"
        n.playcount <= cutoff

  filteredNodes
filterNodes defaults to returning the entire node array. If popular or obscure is selected, it uses D3's quantile function on the sorted playcounts to get the median value. Then it filters the node array based on this cutoff. I'm not sure the median playcount is a good indicator of the difference between 'popular' and 'obscure', but it gives us an excuse to use some of the nice data wrangling built into D3.
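The median cutoff can be reproduced without D3. For an ascending array, d3.quantile(values, p) interpolates linearly between the elements around index (n - 1) * p. The quantile helper below is a hypothetical plain JavaScript stand-in for that rule, not D3 itself. filterNodes takes the filter setting as a parameter here instead of reading it from the closure.

```javascript
// Stand-in for d3.quantile on an ascending array: linear interpolation
// between the elements around index (n - 1) * p.
function quantile(sorted, p) {
  const h = (sorted.length - 1) * p;
  const i = Math.floor(h);
  const v = sorted[i];
  return i + 1 < sorted.length ? v + (h - i) * (sorted[i + 1] - v) : v;
}

// Mirrors filterNodes: split songs at the median playcount.
function filterNodes(allNodes, filter) {
  if (filter !== "popular" && filter !== "obscure") return allNodes;
  const playcounts = allNodes.map((d) => d.playcount).sort((a, b) => a - b);
  const cutoff = quantile(playcounts, 0.5);
  return allNodes.filter((n) =>
    filter === "popular" ? n.playcount > cutoff : n.playcount <= cutoff
  );
}

const songs = [{ playcount: 40 }, { playcount: 10 }, { playcount: 30 }, { playcount: 20 }];
console.log(filterNodes(songs, "popular").map((n) => n.playcount)); // [ 40, 30 ]
console.log(filterNodes(songs, "obscure").map((n) => n.playcount)); // [ 10, 20 ]
```

Because obscure uses <= cutoff, a song sitting exactly on the median (possible when the count is odd) ends up in the obscure group. The two filters together always cover every song exactly once.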
Bonus: Search
Search bar - Simple searching made easy
Search is a feature that is often needed in networks and other visualizations, but frequently missing. Given a search term, one way to make a basic search that highlights the matched nodes would be:
# Public function to update highlighted nodes
# from search
network.updateSearch = (searchTerm) ->
  searchRegEx = new RegExp(searchTerm.toLowerCase())
  node.each (d) ->
    element = d3.select(this)
    match = d.name.toLowerCase().search(searchRegEx)
    if searchTerm.length > 0 and match >= 0
      element.style("fill", "#F38630")
        .style("stroke-width", 2.0)
        .style("stroke", "#555")
      d.searched = true
    else
      d.searched = false
      element.style("fill", (d) -> nodeColors(d.artist))
        .style("stroke-width", 1.0)
We just create a regular expression out of the search term, then compare it to the value in the nodes that we want to search on. If there is a match, we highlight the node. Nothing spectacular, but it's a start on a must-have feature in network visualizations.
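One caveat: new RegExp(searchTerm) treats whatever the user types as a pattern, so typing an unbalanced ( makes the RegExp constructor throw a SyntaxError. A common remedy is to escape regular-expression metacharacters first. The sketch below is plain JavaScript and shows a possible variant, not the tutorial's code. escapeRegExp and matchesSearch are hypothetical helpers.

```javascript
// Escape characters that have special meaning in regular expressions,
// so the search term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Case-insensitive literal match, like the d.name search above.
function matchesSearch(name, searchTerm) {
  if (searchTerm.length === 0) return false;
  const re = new RegExp(escapeRegExp(searchTerm.toLowerCase()));
  return name.toLowerCase().search(re) >= 0;
}

console.log(matchesSearch("Song (Live)", "(live")); // true
console.log(matchesSearch("Another Song", "(live")); // false
```

If literal matching is all that is needed, name.toLowerCase().includes(term) avoids regular expressions entirely. Escaping keeps the original structure if you later want regex features back.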
We can see that updateSearch is a public function, so how do we connect it to the UI of our network visualization?
Wiring it Up
There are a lot of ways we could connect our buttons and other UI to the network functionality. I've tried to keep things simple here and just have a separate section for each button group. Here is the layout toggling code (the other button groups use very similar code):
d3.selectAll("#layouts a").on "click", (d) ->
  newLayout = d3.select(this).attr("id")
  activate("layouts", newLayout)
  myNetwork.toggleLayout(newLayout)
So we simply activate the clicked button and then call into the network closure to switch layouts. The activate function just adds the active class to the right button.
Our search is pretty similar:
$("#search").keyup () ->
  searchTerm = $(this).val()
  myNetwork.updateSearch(searchTerm)
It just uses jQuery to watch for a key-up event, then re-runs the updateSearch function.
Thanks and Goodnight
Hopefully we have hit all the highlights of this visualization. Interactive networks are a powerful visual, and this code should serve as a jumping off point for your own amazing network visualizations.
Other placement functions could be easily developed for more interesting layouts, like spirals, sunflowers, or grids. Filtering and sorting could be extended in any number of ways to get more insight into your particular dataset. Finally, labelling could be added to each node to see what is present without mousing over.
I hope you enjoyed this walk-through, and I can't wait to see your own networks!
In partnership with social analytics service Topsy, Twitter launched a Political Index that measures sentiment towards Barack Obama and Mitt Romney.
Each day, the Index evaluates and weighs the sentiment of Tweets mentioning Obama or Romney relative to the more than 400 million Tweets sent on all other topics. For example, a score of 73 for a candidate indicates that Tweets containing their name or account name are on average more positive than 73 percent of all Tweets.
The key is the comparison against all tweets for a sense of scale. As seen from the chart below, the index fluctuates closely with Gallup estimates.
Some consider Nigel Holmes, whose work tends to be more illustrative, the opposite of Edward Tufte, who preaches the data ink ratio. Column Five Media asked Holmes about how he works and what got him interested in the genre.
As a young child in England, I loved the weekly comics "The Beano" and "The Dandy." They were not like American comic books; they were never called "books," for a start. These English comics from the late 1940s and early '50s had recurring one-page (usually funny) stories featuring a cast of regular characters. They had names like Biffo the Bear, Lord Snooty, and Desperate Dan. The comics were printed on poor-quality newsprint, which seemed to go yellow as you were reading it, but there was something very attractive about them.
I like the small dig at Tufte around the middle, while citing the paper that happened to find that Holmes' graphics were more memorable than basic charts.
My own work at first was a little too illustrative, and Edward Tufte made a big fuss about what he thought was the trivialization of data. Recent academic studies have proved many of his theses wrong.
It seems the arguments haven't changed much over the decades.
The Wall Street Journal visualized major political contributions, according to the Federal Election Commission, in a piece they call Political Moneyball.
Based on the money sent between the players (and other characteristics like party and home state), our presentation pulls players toward similar players and pushes apart those that have nothing in common. The players who are most interconnected (like industry PACs who try to make alliances with everyone) end up close to the center. Those who are less connected (like a donor who only gives money to Ron Paul) are pushed away from the center.
Analysis was powered by CartoDB, and the network by Tulip.
The challenge with these network graphs that have lots of nodes and edges is narrowing down what's useful. With yesterday's Internet map it's easy to relate, because you just search for the sites of interest, and the large ones such as Facebook and Twitter provide context.
However, with Political Moneyball it's tougher, because there are so many entities you've never heard of. My suggestion: Start with the examples section (such as who the National Rifle Association supports) in the sidebar, and go from there. It'll be much easier to get into it.
Peter Nitsch created Ascii Street View, converting Google Street View to colored letters. Search for a location and experience the retro goodness. [via Waxy]
Using a new kind of MRI scanner, scientists at the National Institutes of Health mapped the connections in the human brain, revealing an intricate, grid-like structure.
"Before, we had just driving directions. Now, we have a map showing how all the highways and byways are interconnected," says Van Wedeen, a member of the Human Connectome Project. "Brain wiring is not like the wiring in your basement, where it just needs to connect the right endpoints. Rather, the grid is the language of the brain and wiring and re-wiring work by modifying it."
[via Matt Mullenweg]
Ruslan Enikeev created a searchable Internet map of links and bubbles, showing over 350,000 sites and two million links from 196 countries. Similar sites are closer together.
As one might have expected, the largest clusters are formed by national websites, i.e. sites belonging to one country. For the sake of convenience, all websites relative to a certain country carry the same color. For instance, the red zone at the top corresponds to Russian segment of the net, the yellow one on the left stands for the Chinese segment, the purple one on the right is Japanese, the large light-blue central one is the American segment, etc.
Importantly, clusters on the map are semantically charged, i.e. they join websites together according to their content. For example, a vast porno cluster can be seen between Brazil and Japan as well as a host of minor clusters uniting websites of the same field or similar purposes.
Why is porn nestled between Brazil and Japan? I dunno.
I do like how a search for FlowingData shoots you over to a bubble that fills the screen, but then as you zoom out, you see this giant bubble on the top left appear.
Completely dwarfed by Twitter, just a tiny little bit in the big digital sky. Nice.
From the Guardian US, a simple site that tells you if a record was broken today, and if so, what records. It was pieced together with Google Docs and github, and uses the New York Times Olympics API. [via]
No doubt there is going to be a lot of tweeting about the Olympics during the next couple of weeks, but sometimes it's hard to get a sense of what people are talking about because of the high volume. Emoto, a team effort by Drew Hemment, Moritz Stefaner, and Studio NAND, is a Twitter tracker that aggregates sentiment around topics.
We search twitter for general terms related to the Olympics. Within these search results, we look for individual topics, like athletes, disciplines, venues, or also general topics like "traffic" or "suprise". Then, the text of each tweet is analysed for emotional words using the lexalytics software, resulting in a sentiment score ranging from -6 (very negative) over 0 (neutral) to 6 (very positive). As this project is about mapping emotions, our visualisation uses only tweets which can be assigned a sentiment value.
In line with the London 2012 logo, Emoto pieces together triangles to show this sentiment. The top yellow triangle indicates a very positive tweet, whereas a dark blue triangle far to the bottom indicates a very negative tweet. The triangles in between indicate sentiment scores in the middle, and the triangles are sized by the number of tweets that day.
There's a view that lets you passively watch tweets as they come in live, which will be fun to see when something significant happens, but it'll probably get most interesting when you can compare over time.
I'm sure there's some skepticism about the triangles over everyone's favorite bar graphs, but there's so much fuzziness in tweet sentiment analysis that I don't think it matters that much.
A lot of Olympic events are over and done with in a few minutes (or seconds), so the difference between winning and losing can be something really tiny. As the games in London get started, The New York Times put together a great series on the tiny details that athletes try to hone in on as they jump over hurdles, twist over the vault, and hand off the baton.
The feature was surprisingly sort of buried in a lot of other Olympic coverage, but hopefully they put together more of them. The combination of graphics and insight from athletes is uber interesting.
Update: The butterfly was just added, and cycling is up next.