TEDx UCLA: Can Twitter Save Lives?

Back on June 18th, 2011, I had the good fortune to be invited to speak at the inaugural TEDx event at UCLA. I took the opportunity to present on the post-disaster situation in Japan and to speak about the potential that social media and locational technologies hold for future crisis management and awareness.


Japan Earthquake: Emotions at a glance

What was the country “feeling” after the earthquake? Was it engulfed in sorrow? Anger? Fear? What effect did the hundreds of aftershocks have on the populace? In an attempt to answer these questions, a social media analysis can provide a window into the sentiment that was prevalent at each phase of recovery by visualizing how each emotion group played out over time. The charts below stack each emotion group, one on top of another. They were generated using the Protovis JavaScript API. Clicking into any of the emotion charts allows interaction with their values over time:
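The charts themselves live in the interactive post, but the idea behind them is easy to sketch. Below is a rough matplotlib stand-in for the same stacked layout (this is not the original Protovis code, and every number in it is invented purely for illustration):

```python
# Rough stand-in for the stacked emotion chart (the real charts were built
# with the Protovis JavaScript API). All values below are made up.
import matplotlib.pyplot as plt

days = list(range(1, 8))                 # first week after March 11
fear = [9, 6, 5, 4, 3, 3, 2]             # emotion-word matches per 10,000 tweets
relief = [1, 2, 2, 3, 3, 4, 4]
sadness = [4, 4, 3, 3, 2, 2, 2]

plt.stackplot(days, fear, relief, sadness, labels=["fear", "relief", "sadness"])
plt.xlabel("days after March 11, 2011")
plt.ylabel("matches per 10,000 tweets")
plt.legend(loc="upper right")
plt.show()
```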


Consistent with previous analyses, the most noticeable observation comes in the “fear” emotion on April 7th. Looking at the “Earthquake magnitude” chart, one can see that the second largest aftershock occurred on that day, supporting the notion that “fear” was a predominant emotional reaction in an already stressed nation at the time. One can also see that while the April 7th 7.1-magnitude earthquake was the largest aftershock since the big one on March 11th, the country was consistently rocked throughout, averaging more than 10 earthquakes a day. However, as the earthquake chart reveals, the number of quakes had tapered off considerably over time, perhaps lending even more shock value to the “big” 7.1 quake, which struck at a time when people were starting to feel a level of normality.

Japan Earthquake: Locating the tweets (Part 4)

“Location” has become an essential component of many social media technologies. Not only is it important to convey “what happened?”, but also to reveal “where” it happened. Check-in technologies have been popularized by companies like Foursquare, Gowalla and Yelp, further blurring the lines between the content and the geo-coordinates. Twitter has traditionally not put an emphasis on the notion of “place”, but in late 2009 it announced its own geolocation feature. Users can now enable geolocation on their settings page to reveal the exact location from which they are tweeting. While Twitter itself does not have an interface for mapping tweet locations, the data is available through its Geotagging API, which allows third-party applications to access this information and map it accordingly.
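As a small illustration of what this looks like on the data side, here is a minimal sketch of pulling the opt-in point coordinates out of a tweet object in the classic v1.1-era JSON payload (the input file name is hypothetical, and the field names assume that payload format):

```python
# Minimal sketch: extract the opt-in point coordinates from a tweet object
# as returned in the classic (v1.1-era) Twitter JSON payload.
import json

def tweet_location(tweet):
    """Return (lon, lat) if the user enabled geolocation, else None."""
    coords = tweet.get("coordinates")        # GeoJSON point, or None
    if coords and coords.get("type") == "Point":
        lon, lat = coords["coordinates"]     # GeoJSON order is [lon, lat]
        return lon, lat
    return None

# hypothetical dump: one raw tweet JSON object per line
with open("tweets.json", encoding="utf-8") as f:
    for line in f:
        loc = tweet_location(json.loads(line))
        if loc:
            print(loc)
```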

 

Creating a Twitter Infographic Using Gephi

This infographic was created through a painstaking process that utilized almost 10 different applications to generate the final result. The main application used to create the word cluster graphic was Gephi, an open-source platform that lets you visualize complex networked data in a visually compelling and interactive environment. However, arriving at this particular end result was complicated by various factors, one of which was the difficulty of handling Japanese characters throughout the analysis.

The Workflow

Step 1

The first step in this Japan Twitter project was to actually collect and archive the Twitter data coming out of Japan after the earthquake. For this, a cron job was written as a PHP script by David Shepard, a member of the UCLA Digital Humanities Collaborative. The script used the Twitter search API to find and filter tweets based on relevant hashtags, and dump them into our own MySQL database. The cron job ran every 3 minutes for 30 days, collecting over 650,000 tweets during this time period.
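The collector itself was David Shepard’s PHP script; purely as an illustration of the idea, a Python analogue might look like the sketch below. The 2011-era search endpoint shown here has long since been retired, and the hashtag, credentials, and table layout are placeholders rather than the real ones:

```python
# Illustrative Python analogue of the PHP cron collector: query the (now
# retired) 2011-era Twitter search API and store results in MySQL.
import requests
import pymysql

HASHTAG = "#jishin"   # placeholder; the real script filtered several hashtags

db = pymysql.connect(host="localhost", user="twitter", password="...",
                     database="japan_tweets", charset="utf8mb4")

def collect_once():
    resp = requests.get("https://search.twitter.com/search.json",
                        params={"q": HASHTAG, "rpp": 100})
    with db.cursor() as cur:
        for t in resp.json().get("results", []):
            cur.execute(
                "INSERT IGNORE INTO tweets (id, created_at, text) VALUES (%s, %s, %s)",
                (t["id_str"], t["created_at"], t["text"]))
    db.commit()

if __name__ == "__main__":
    collect_once()   # the cron entry ran a script like this every 3 minutes
```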

Once the Twitter data was safely in our MySQL database, I queried it to generate 30 separate text files, one for each day following the earthquake. Each “day” file consisted of just the tweet text from the thousands of tweets that belonged to that day (on average there were about 20,000 tweets per day).
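A sketch of that per-day export, assuming a simple tweets table (the column and file names here are illustrative, not necessarily what we used):

```python
# Illustrative per-day export: one UTF-8 text file of tweet text per day.
from datetime import date, timedelta
import pymysql

db = pymysql.connect(host="localhost", user="twitter", password="...",
                     database="japan_tweets", charset="utf8mb4")

start = date(2011, 3, 11)
for i in range(30):
    day = start + timedelta(days=i)
    with db.cursor() as cur:
        cur.execute("SELECT text FROM tweets WHERE DATE(created_at) = %s", (day,))
        rows = cur.fetchall()
    with open(f"day_{day.isoformat()}.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(r[0] for r in rows))
```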

Here, you can see the number of tweets collected on an hourly basis:

Step 2

In order to capture the range of emotions through the different phases of recovery following the disaster, I followed a methodology employed by Eiji Aramaki from Tokyo University, who took the words from an Emotion Dictionary to extract emotion patterns in a set of text files.  Dr. Aramaki provided me with about 2000 of the most commonly used “emotion” words in the Japanese language, sub-divided into 10 different categories. A separate CSV file for each emotion was generated.
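As a rough illustration, the per-emotion word lists could be loaded along these lines (the folder, file, and category names are assumptions for this sketch, not Dr. Aramaki’s originals):

```python
# Illustrative loader for the ten per-emotion CSV word lists.
import csv
from pathlib import Path

CATEGORIES = ["happiness", "anger", "sad", "fear", "shame",
              "like", "unpleasant", "nervous", "relief", "surprise"]

def load_emotion_words(folder="emotion_dictionary"):
    words = {}
    for category in CATEGORIES:
        with open(Path(folder) / f"{category}.csv", encoding="utf-8") as f:
            words[category] = [row[0] for row in csv.reader(f) if row]
    return words

if __name__ == "__main__":
    dictionary = load_emotion_words()
    print({cat: len(ws) for cat, ws in dictionary.items()})  # roughly 2,000 words total
```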

I then used WordSmith, an application that allows you to extract word patterns, to find occurrences of every emotion word in each “day” file. Through WordSmith’s concordance tool, I was able to run a batch process that matched each of my 10 “emotion” files against each of my 30 “day” files.
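The matching itself was done in WordSmith rather than in code, but the counting it performed can be approximated in plain Python along these lines (the two example word lists below borrow a few words from the top-20 table later in this post):

```python
# Plain-Python approximation of the WordSmith concordance counting step.
from pathlib import Path

emotion_words = {
    "sad":  ["悲しみ", "傷付く"],
    "fear": ["恐怖感", "不安がる"],
}

counts_by_day = {}
for day_file in sorted(Path(".").glob("day_*.txt")):   # the 30 "day" files
    text = day_file.read_text(encoding="utf-8")
    counts_by_day[day_file.name] = {
        emotion: sum(text.count(word) for word in words)
        for emotion, words in emotion_words.items()
    }
print(counts_by_day)
```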

Here is a screenshot of WordSmith’s concordance function:

Step 3

The data generated from WordSmith was exported as a series of spreadsheets. These spreadsheets were combined, merged, analyzed, and recalculated to produce a single matrix of emotion words by day. While I was able to do most of the work in Excel, character-encoding problems forced me to use Google Spreadsheets, mostly to generate the CSV file format that Gephi requires as an input source (Excel lost the Japanese text on CSV export, while Google did not).

In order to create an emotion “measure” for each day, the spreadsheet generated columns that counted the number of times each keyword was found on each of the 30 days, normalized per 10,000 tweets. For example, the word 悲しみ (sadness) was found 0.5 times for every 10,000 tweets on March 11th, 3.1 times on March 12th, 325 times on March 13th, and so on.
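A minimal sketch of that normalization and the UTF-8 CSV export (every count below is invented for illustration; writing with an explicit UTF-8 encoding is what keeps the Japanese words intact):

```python
# Illustrative word-by-day matrix, normalized per 10,000 tweets, as UTF-8 CSV.
import csv

raw_counts = {                     # word -> {day: raw occurrence count} (made up)
    "悲しみ": {"2011-03-11": 1, "2011-03-12": 7},
    "恐怖感": {"2011-03-11": 40, "2011-03-12": 33},
}
tweets_per_day = {"2011-03-11": 21000, "2011-03-12": 19500}   # made up

days = sorted(tweets_per_day)
with open("emotion_matrix.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["word"] + days)
    for word, by_day in raw_counts.items():
        writer.writerow([word] + [round(by_day.get(d, 0) / tweets_per_day[d] * 10000, 2)
                                  for d in days])
```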

Step 4

The heart of the word cluster analysis was conducted in Gephi.  Gephi requires you to define your data in two basic elements:  Nodes and Edges.  For this analysis, I chose to define these as follows:

Nodes:  Every emotion word and every day was defined as a Gephi node

Edges:  Every connection between a “word” and a “day” was defined as an edge, weighted by how many times that word was found per 10,000 tweets on that day (a minimal sketch of these node and edge tables follows below).
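Here is a minimal sketch of generating those node and edge tables as CSV files, using the Id/Label and Source/Target/Weight columns that Gephi’s spreadsheet import accepts (all values are illustrative):

```python
# Illustrative node and edge CSVs for Gephi's spreadsheet import.
import csv

words = {"悲しみ": "sad", "恐怖感": "fear"}      # word -> emotion category
days = ["2011-03-11", "2011-03-12"]
weights = {                                     # occurrences per 10,000 tweets (made up)
    ("悲しみ", "2011-03-11"): 0.5,
    ("悲しみ", "2011-03-12"): 3.1,
    ("恐怖感", "2011-03-11"): 1.7,
}

with open("nodes.csv", "w", encoding="utf-8", newline="") as f:
    out = csv.writer(f)
    out.writerow(["Id", "Label", "Category"])
    for word, category in words.items():
        out.writerow([word, word, category])
    for day in days:
        out.writerow([day, day, "day"])

with open("edges.csv", "w", encoding="utf-8", newline="") as f:
    out = csv.writer(f)
    out.writerow(["Source", "Target", "Weight"])
    for (word, day), weight in weights.items():
        out.writerow([word, day, weight])
```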

Here is a screen shot of Gephi’s data view:

Once the data elements were defined, Gephi was ready to visualize (i.e., the fun part!). Gephi comes with many layout templates that you can choose from. Each layout has its own built-in algorithm that takes the nodes and edges from your data to generate a network diagram. I chose to use a layout called “Parallel Force Atlas” (it sure sounds good). You can choose to size and/or color each node by different data attributes, and do the same for the edges, which serve as the connectors between the nodes. You then press a button, configure a few parameters (such as “gravity”), and voilà, you are presented with a beautiful infographic.

Step 5

What I thought would be an easy step, exporting the graphic and creating a web viewer (for panning and zooming the huge image), turned out to be a much bigger task than I anticipated. First of all, the Gephi exporters failed to preserve the Japanese characters… with one exception: SVG. For some reason, SVG was the only export format in which the Japanese characters survived. Since I wanted to provide a web interface that allows for zooming and panning the graphic, I ended up building the viewer with the OpenLayers JavaScript API, which is predominantly used for geo-spatial data visualizations but can also be used with plain images. In order to get the image ready for OpenLayers, I used MapTiler, an application that generates the different image “tiles” that are needed for the different zoom levels. You can see a full screen version of the final infographic here.

Japan Earthquake: What are they tweeting about?


One key feature of social media is that it provides a snapshot of a moment’s mood, reflected by the content of what people are tweeting about in real time. In order to analyze the emotional and psychological state of the nation in the days after the disaster, I have taken the tweet text in the UCLA archive and divided it into 30 text files, one for each day following the earthquake, starting on March 11, 2011. To measure day-to-day fluctuations of emotions, I will use a methodology similar to the one employed by Eiji Aramaki, PhD (Tokyo University), which takes words from an “Emotion Dictionary” (感情表現辞典) and matches them against the tweet content. The dictionary classifies different emotions into 10 groups:

  1. 喜び – Happiness
  2. 怒る – Anger
  3. 哀しい – Sad
  4. 怖い – Fear
  5. 恥 – Shame
  6. 好き – Like
  7. 厭 – Unpleasant
  8. 昻 – Nervous
  9. 安 – Relief
  10. 驚く – Surprise

In order to visualize the relationship between the various emotion keywords and the different days following the earthquake, a visualization was generated using Gephi. The words are color-coded by emotion type, and the line thickness of the connectors represents the strength of the connection between a word and a day.


Top 20 emotion words:

Rank  Word      Emotion Category  Per 10,000 Tweets
1               like              3,242.58
2               relief            1,151.22
3               nervous           324.83
4     嬉し泣き  happy             322.78
5               sad               322.78
6     誇る      happy             292.07
7     心痛      fear              228.80
8     享楽      happy             121.40
9               relief            121.40
10              like              120.92
11    不安がる  fear              104.83
12    傷付く    sad               74.57
13    恐怖感    fear              53.37
14    悲しみ    sad               51.65
15    愛情      like              47.32
16    難苦      unpleasant        45.29
17    怯れ      fear              44.97
18              like              43.35
19    深謝      happy             38.77
20    驚愕      surprise          34.45

Emotions by Day

The following animated chart (press the play button to start it) shows the changes for each emotion category over the 30 days.
