On June 18th, 2011, I had the good fortune to be invited to speak at the inaugural TEDx event at UCLA. I took the opportunity to present on the post-disaster situation in Japan and spoke of the potential that social media and location-based technologies hold for future crisis management and awareness.
In line with previous analyses, the most noticeable observation is in the “fear” emotion on April 7th. Looking at the “Earthquake magnitude” chart, one can see that the second largest aftershock occurred on that day, which supports the notion that “fear” was a predominant emotional reaction in an already stressed nation at the time. One can also see that while the April 7th magnitude 7.1 earthquake was the largest aftershock since the big one on March 11th, the country was rocked consistently throughout, averaging more than 10 earthquakes a day. However, as the earthquake chart reveals, the number of quakes had tapered off considerably over time, which perhaps added even more shock value to the “big” 7.1 quake, coming at a time when people were starting to feel a level of normality.
“Location” has become an essential component of many social media technologies. Not only is it important to convey “what happened?”, but also to reveal “where” it happened. Check-in technologies have been popularized by companies like Foursquare, Gowalla and Yelp, further blurring the lines between content and geo-coordinates. Twitter has traditionally not put an emphasis on the notion of “place”, but in late 2009 it did announce its own geolocation feature. Users could now enable geolocation on their settings page to reveal the exact location they are tweeting from. While Twitter itself does not have an interface for mapping tweet locations, making the data available through its Geotagging API lets third-party applications access this information and map it accordingly.
This infographic was created through a painstaking process that used almost 10 different applications to generate the final result. The main application used to create the word cluster graphic was Gephi, an open source platform that lets you visualize complex networked data in a visually compelling and interactive environment. However, arriving at this particular end result was complicated by various factors, one of which was the complexity that arose from using Japanese characters in the analysis.
The first step in this Japan Twitter project was to actually collect and archive the Twitter data coming out of Japan after the earthquake. For this, a cron job was written as a PHP script by David Shepard, a member of the UCLA Digital Humanities Collaborative. The script used the Twitter search API to find and filter tweets based on relevant hashtags, and dumped them into our own MySQL database. The cron job ran every 3 minutes for 30 days, collecting over 650,000 tweets during this time period.
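A collector that searches every few minutes will usually see some of the same tweets more than once, so the archive needs a way to skip duplicates. The sketch below is not David's PHP script. It is a minimal Python illustration that uses the standard-library `sqlite3` module as a stand-in for MySQL, and the table layout and the `open_archive` and `store_tweets` helpers are assumptions made up for this example.

```python
import sqlite3

def open_archive(path=":memory:"):
    """Open (or create) the archive. The schema is a hypothetical one."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        " id INTEGER PRIMARY KEY,"    # tweet id, used to skip duplicates
        " created_at TEXT NOT NULL,"  # timestamp as text
        " text TEXT NOT NULL)"
    )
    return conn

def store_tweets(conn, tweets):
    """Insert one batch of search results, ignoring ids already archived.

    Returns the number of rows that were actually new.
    """
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (id, created_at, text) VALUES (?, ?, ?)",
        [(t["id"], t["created_at"], t["text"]) for t in tweets],
    )
    conn.commit()
    return conn.total_changes - before
```

Each scheduled run would pass its search results to `store_tweets`, and overlapping results from consecutive runs are simply dropped by the primary key.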
Once the Twitter data was safely in our MySQL database, I queried it and generated 30 separate text files, one for each day following the earthquake. Each “day” file consisted of just the tweet text from the thousands of tweets that belonged to that day (on average there were about 20,000 tweets per day).
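For readers who want to reproduce the per-day split, here is a minimal Python sketch. It assumes the rows have already been fetched from the database as `(created_at, text)` pairs with timestamps like `2011-03-11 14:46:00`. The `write_day_files` name and the `YYYY-MM-DD.txt` file naming are choices made for this example, not what I actually used.

```python
from collections import defaultdict
from pathlib import Path

def write_day_files(rows, out_dir):
    """Group (created_at, text) rows by calendar day and write one file per day.

    created_at is assumed to be a string that starts with 'YYYY-MM-DD'.
    Returns a dict mapping each day to the number of tweets written.
    """
    by_day = defaultdict(list)
    for created_at, text in rows:
        day = created_at[:10]  # the 'YYYY-MM-DD' prefix
        # Collapse whitespace runs (including line breaks) so that
        # each tweet occupies exactly one line in the day file.
        by_day[day].append(" ".join(text.split()))

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for day, texts in by_day.items():
        # Writing as UTF-8 keeps the Japanese text intact.
        (out / f"{day}.txt").write_text("\n".join(texts) + "\n", encoding="utf-8")
    return {day: len(texts) for day, texts in by_day.items()}
```

The returned counts are handy later on, since the per-day tweet totals are needed to normalize the word counts.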
Here, you can see the number of tweets collected on an hourly basis:
In order to capture the range of emotions through the different phases of recovery following the disaster, I followed a methodology employed by Eiji Aramaki from Tokyo University, who took the words from an Emotion Dictionary to extract emotion patterns in a set of text files. Dr. Aramaki provided me with about 2000 of the most commonly used “emotion” words in the Japanese language, sub-divided into 10 different categories. A separate CSV file for each emotion was generated.
I then used WordSmith, an application that allows you to extract word patterns, to find occurrences of every emotion word in each “day” file. Through WordSmith’s concordance tool, I was able to run a batch process that matched each of my 10 “emotion” files against each of my 30 “day” files.
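The matching itself is conceptually simple. The following Python sketch is not how WordSmith works internally. It just counts plain substring matches, which is a reasonable first approximation for Japanese text because words are not separated by spaces, though it will over-count when an emotion word appears inside a longer, unrelated word. The function names are made up for this example.

```python
def count_emotion_words(day_text, emotion_words):
    """Count non-overlapping occurrences of each emotion word in one day's text."""
    return {word: day_text.count(word) for word in emotion_words}

def build_matrix(day_texts, emotion_words):
    """Return {day: {word: count}} for every day/word combination."""
    return {
        day: count_emotion_words(text, emotion_words)
        for day, text in day_texts.items()
    }
```

Here `day_texts` would map each day to the full contents of its “day” file, and `emotion_words` would be the words loaded from one of the emotion CSV files.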
Here is a screenshot of WordSmith’s concordance function:
The data generated from WordSmith was exported as a series of spreadsheets. These spreadsheets were combined, merged, analyzed, and recalculated to produce a single matrix of emotion words by day. While I was able to do most of the work in Excel, character encoding problems forced me to use Google Spreadsheets, mostly to generate the CSV file that Gephi requires as an input source (Excel lost the Japanese text on CSV export, while Google did not).
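Lost characters on export are typically an encoding problem, so another way around it is to write the CSV from a script with the encoding stated explicitly. This is a minimal Python sketch of that idea rather than the route I took. The `write_matrix_csv` name and the `word` header are assumptions for the example.

```python
import csv

def write_matrix_csv(path, days, matrix):
    """Write a word-by-day matrix to CSV, encoded as UTF-8.

    matrix maps each word to a list of values, one per entry in days.
    """
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["word"] + list(days))
        for word, values in matrix.items():
            writer.writerow([word] + list(values))
```

Whatever reads the file afterwards needs to be told it is UTF-8 as well, otherwise the Japanese text can still come out garbled on import.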
In order to create an emotion “measure” for each day, the spreadsheet generated columns that counted the number of times each keyword was found in each of the 30 days, expressed per 10,000 tweets so that days with different tweet volumes can be compared. For example, the word 悲しみ (sadness) was found 0.5 times for every 10,000 tweets on March 11th, 3.1 times on March 12th, 325 times on March 13th, and so on.
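The per-10,000 measure is a simple rate: the raw count for a word on a given day, scaled by that day's tweet total. A minimal Python sketch of the arithmetic follows. The `per_10k` name is made up for this example, and returning 0 for a day with no tweets is my own assumption.

```python
def per_10k(count, total_tweets):
    """Occurrences of a word per 10,000 tweets collected that day."""
    if total_tweets == 0:
        return 0.0  # assumption: a day with no tweets contributes nothing
    return count * 10_000 / total_tweets
```

With made-up numbers, a word found once in a day of 20,000 tweets gives `per_10k(1, 20000)`, which is 0.5.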
The heart of the word cluster analysis was conducted in Gephi. Gephi requires you to define your data in two basic elements: Nodes and Edges. For this analysis, I chose to define these as follows:
Nodes: Every emotion word and every day was defined as a Gephi node.
Edges: Every connection between a “word” and a “day” was defined as an edge, and weighted by how many times that word was found for every 10,000 tweets, for each day.
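The node and edge definitions above can be sketched as two CSV tables. This minimal Python example assumes the rates are held in a dict keyed by `(word, day)`, and uses the `Id`, `Label`, `Source`, `Target` and `Weight` column names that, as far as I know, Gephi's spreadsheet import recognizes. The `Kind` column and the `write_gephi_tables` name are hypothetical additions for this example.

```python
import csv

def write_gephi_tables(rates, nodes_path, edges_path):
    """Write node and edge tables from {(word, day): rate_per_10k}.

    Zero rates are skipped, so a word only connects to days it appeared in.
    """
    words = sorted({word for word, _ in rates})
    days = sorted({day for _, day in rates})

    with open(nodes_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        # "Kind" is an extra attribute so words and days can be styled apart.
        writer.writerow(["Id", "Label", "Kind"])
        for word in words:
            writer.writerow([word, word, "word"])
        for day in days:
            writer.writerow([day, day, "day"])

    with open(edges_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target", "Weight"])
        for (word, day), rate in sorted(rates.items()):
            if rate > 0:
                writer.writerow([word, day, rate])
```

Skipping zero-weight edges is a design choice for the sketch: it keeps the graph from becoming a fully connected grid in which every word touches every day.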
Here is a screenshot of Gephi’s data view:
Once the data elements are defined, Gephi is ready to visualize (i.e., the fun part!). Gephi comes with many layout templates that you can choose from. Each layout has its own built-in algorithm that takes the nodes and edges from your data and generates a network diagram. I chose to use a layout called “Parallel Force Atlas” (it sure sounds good). You can choose to size and/or color each node by different data attributes, and do the same for the edges, which serve as the connectors between the nodes. You then configure a few parameters (such as “gravity”), press a button, and voila! You are presented with a beautiful infographic.
What are they tweeting about?
One key feature of social media is that it provides a snapshot of a moment’s mood, reflected in what people are tweeting about in real time. In order to analyze the emotional and psychological state of the nation in the days after the disaster, I have taken the tweet text in the UCLA archive and divided it into 30 text files, one for each day following the earthquake, starting on March 11, 2011. To measure day-to-day fluctuations in emotion, I use a methodology similar to the one employed by Eiji Aramaki, PhD (Tokyo University), which takes words from an “Emotion Dictionary” (感情表現辞典) and matches them against the tweet content. The dictionary classifies different emotions into 10 groups:
- 喜び – Happiness
- 怒る – Anger
- 哀しい – Sad
- 怖い – Fear
- 恥 – Shame
- 好き – Like
- 厭 – Unpleasant
- 昻 – Nervous
- 安 – Relief
- 驚く – Surprise
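To move from individual words to the 10 groups above, one option is to add up the per-10,000 rates of all the words that belong to each group. The Python sketch below assumes exactly that. Summing is my simplification for illustration, and the `rates_by_category` name and the word-to-category mapping passed in are hypothetical.

```python
def rates_by_category(word_rates, word_to_category):
    """Sum per-word rates into per-category totals for one day."""
    totals = {}
    for word, rate in word_rates.items():
        category = word_to_category.get(word)
        if category is None:
            continue  # word is not in the dictionary, so ignore it
        totals[category] = totals.get(category, 0.0) + rate
    return totals
```

Running this once per “day” file gives one value per emotion category per day, which is the shape of data a day-by-day chart needs.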
Top 20 emotion words:
|Word|Emotion Category|Per 10,000 Tweets|
Emotions by Day
The following animated chart (press the play button to start it) shows the changes for each emotion category over the 30 days.