Japan Earthquake: Locating the tweets (Part 4)

“Location” has become an essential component of many social media technologies. It is important to convey not only “what happened?” but also “where” it happened. Check-in services, popularized by companies like Foursquare, Gowalla and Yelp, have further blurred the lines between content and geo-coordinates. Twitter has traditionally not put an emphasis on the notion of “place”, but in late 2009 it announced its own geolocation feature, which lets users enable geolocation on their settings page to reveal the exact location they are tweeting from. While Twitter itself does not have an interface for mapping tweet locations, it makes the data available through its Geotagging API, which allows third-party applications to access this information and map it accordingly.
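As a rough illustration of what a third-party mapping application has to do, the sketch below collects geotagged tweets into a GeoJSON FeatureCollection that most web-mapping libraries can display. It assumes tweets arrive as dicts shaped like Twitter's v1.1 tweet JSON, where `coordinates` is either null or a GeoJSON Point whose coordinate list is ordered [longitude, latitude]; the function name and sample data are hypothetical.

```python
import json


def tweets_to_geojson(tweets):
    """Collect geotagged tweets into a GeoJSON FeatureCollection for mapping.

    Assumes each tweet is a dict shaped like Twitter's v1.1 tweet JSON, where
    "coordinates" is None or a GeoJSON Point with [longitude, latitude].
    """
    features = []
    for tweet in tweets:
        point = tweet.get("coordinates")
        # Geotagging is opt-in, so most tweets carry no point: skip them.
        if not point or point.get("type") != "Point":
            continue
        lon, lat = point["coordinates"]
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"id": tweet.get("id"), "text": tweet.get("text", "")},
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    sample = [  # made-up example tweets
        {"id": 1, "text": "揺れた", "coordinates": {"type": "Point", "coordinates": [140.87, 38.27]}},
        {"id": 2, "text": "no geotag", "coordinates": None},
    ]
    print(json.dumps(tweets_to_geojson(sample), ensure_ascii=False, indent=2))
```

Keeping the output as plain GeoJSON, rather than a format tied to one mapping library, leaves the choice of map viewer open.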


Creating a Twitter Infographic Using Gephi

This infographic was created through a painstaking process that used almost 10 different applications to generate the final result. The main application used to create the word cluster graphic was Gephi, an open-source platform for visualizing complex networked data in a visually compelling, interactive environment. However, arriving at this particular result was complicated by various factors, one of which was the complexity introduced by using Japanese characters in the analysis.

The Workflow

Step 1

The first step in this Japan Twitter project was to collect and archive the Twitter data coming out of Japan after the earthquake. For this, David Shepard, a member of the UCLA Digital Humanities Collaborative, wrote a PHP script that ran as a cron job. The script used the Twitter Search API to find and filter tweets based on relevant hashtags, and dumped them into our own MySQL database. The cron job ran every 3 minutes for 30 days, collecting over 650,000 tweets during this period.
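Because a search that runs every 3 minutes returns results that overlap with the previous run, the archive has to reject tweets it already holds. The sketch below shows one way to do that by keying the table on the tweet ID. It is not the original PHP script: SQLite (Python's built-in `sqlite3`) stands in for MySQL, the column names are assumptions, and fetching results from the Search API is left out.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) a tweet archive; SQLite stands in for MySQL here."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        " id INTEGER PRIMARY KEY,"  # tweet ID: a duplicate ID cannot be stored twice
        " created_at TEXT NOT NULL,"
        " screen_name TEXT NOT NULL,"
        " text TEXT NOT NULL)"
    )
    return conn


def store_batch(conn, tweets):
    """Insert one poll's results and return how many tweets were new.

    tweets: list of dicts with id, created_at, screen_name and text keys.
    INSERT OR IGNORE skips IDs already stored by an earlier, overlapping poll.
    """
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO tweets (id, created_at, screen_name, text) "
            "VALUES (:id, :created_at, :screen_name, :text)",
            tweets,
        )
    return conn.total_changes - before
```

The return value gives a cheap health check for a cron job: a run that keeps reporting zero new tweets may indicate that the search query or the API access has stopped working.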

Once the Twitter data was safely in our MySQL database, I queried it and generated 30 separate text files, one for each day following the earthquake. Each “day” file consisted of just the text of the tweets posted that day (on average, about 20,000 tweets per day).
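Splitting the archive into day files depends on where each day starts. The post does not say which time zone was used; the sketch below assumes days are cut at midnight Japan Standard Time (UTC+9), so that, for example, a tweet stamped 16:00 UTC on March 11 lands in the March 12 file. The function names and file naming scheme are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption: a "day" runs from midnight to midnight Japan Standard Time.
JST = timezone(timedelta(hours=9))


def split_by_day(rows):
    """Group tweet text by calendar day in JST.

    rows: iterable of (created_at, text), where created_at is an ISO 8601
    string with a UTC offset, e.g. "2011-03-11T05:46:00+00:00".
    Returns {"YYYY-MM-DD": [text, ...]}.
    """
    days = defaultdict(list)
    for created_at, text in rows:
        local = datetime.fromisoformat(created_at).astimezone(JST)
        days[local.date().isoformat()].append(text)
    return dict(days)


def write_day_files(days, prefix="day_"):
    """Write one UTF-8 text file per day with one tweet per line."""
    for day, texts in sorted(days.items()):
        with open(f"{prefix}{day}.txt", "w", encoding="utf-8") as f:
            # Flatten embedded newlines so every tweet stays on one line.
            f.write("\n".join(t.replace("\n", " ") for t in texts) + "\n")
```

Writing the files explicitly as UTF-8 matters here, since the later steps all depend on the Japanese text surviving intact.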

Here, you can see the number of tweets collected on an hourly basis:

Step 2

To capture the range of emotions through the different phases of recovery following the disaster, I followed a methodology employed by Eiji Aramaki of Tokyo University, who used the words in an Emotion Dictionary to extract emotion patterns from a set of text files. Dr. Aramaki provided me with about 2,000 of the most commonly used “emotion” words in the Japanese language, subdivided into 10 categories. I generated a separate CSV file for each emotion.

I then used WordSmith, an application that extracts word patterns from text, to find occurrences of every emotion word in each “day” file. Using WordSmith’s concordance tool, I ran a batch process that matched each of my 10 “emotion” files against each of my 30 “day” files.
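The batch match can be approximated in a few lines, which may help anyone reproducing the analysis without WordSmith. This is a simplified stand-in, not WordSmith's concordancer: it assumes the emotion lists and day files have already been loaded into dicts, and because Japanese is written without spaces it counts plain substrings rather than whitespace-separated tokens.

```python
def count_matches(emotion_words, day_texts):
    """Count how often each emotion word occurs in each day's text.

    emotion_words: {category: [word, ...]}  (one list per emotion file)
    day_texts:     {day: text}              (one string per day file)
    Returns {(category, word, day): count}.

    str.count gives non-overlapping substring matches. Caveat: a short
    word is also counted when it appears inside a longer word.
    """
    counts = {}
    for category, words in emotion_words.items():
        for word in words:
            for day, text in day_texts.items():
                counts[(category, word, day)] = text.count(word)
    return counts
```

With 10 emotion files and 30 day files this is the same "every list against every file" pairing described above, just expressed as nested loops.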

Here is a screenshot of WordSmith’s concordance function:

Step 3

The data generated from WordSmith was exported as a series of spreadsheets. These spreadsheets were combined, merged, analyzed, and recalculated to produce a single matrix of emotion words by day. While I was able to do most of the work in Excel, character-encoding problems with the Japanese text forced me to use Google Spreadsheets, mostly to generate the CSV file that Gephi requires as input (Excel lost the Japanese text on CSV export, while Google Spreadsheets did not).
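One way to sidestep the export problem is to write the CSV outside the spreadsheet application and control the encoding explicitly. The sketch below serializes a word-by-day matrix as UTF-8; the `utf-8-sig` codec prepends a byte-order mark, which Excel commonly uses to recognize a CSV as UTF-8 when opening it. The function name and the zero-fill for missing days are assumptions.

```python
import csv
import io


def matrix_to_csv_bytes(matrix, days):
    """Serialize a word-by-day matrix as UTF-8 CSV bytes.

    matrix: {word: {day: value}}; days missing from a row are written as 0.
    The "utf-8-sig" codec prepends a byte-order mark (BOM).
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["word", *days])
    for word, row in matrix.items():
        writer.writerow([word, *(row.get(day, 0) for day in days)])
    return buf.getvalue().encode("utf-8-sig")
```

The bytes can be saved with `open("emotion_matrix.csv", "wb").write(...)` (the filename is only an example), which avoids any platform default encoding getting in the way.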

To create an emotion “measure” for each day, the spreadsheet counted the number of times each keyword was found on each of the 30 days and expressed that count per 10,000 tweets. For example, the word 悲しみ (sadness) was found 0.5 times per 10,000 tweets on March 11th, 3.1 times on March 12th, 325 times on March 13th, and so on.
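The normalization is simply rate = count / tweets that day × 10,000, which makes days with very different tweet volumes comparable. For instance, a word found 62 times on a day with 20,000 tweets (hypothetical numbers) scores 31.0. A minimal sketch, with illustrative function names:

```python
def per_10k(count, total_tweets):
    """Occurrences of a word per 10,000 tweets on one day."""
    if total_tweets <= 0:
        raise ValueError("total_tweets must be positive")
    return count * 10_000 / total_tweets


def rate_matrix(counts, tweets_per_day):
    """Turn raw counts into a word-by-day matrix of rates.

    counts: {(word, day): raw count}; tweets_per_day: {day: tweet total}.
    Returns {word: {day: occurrences per 10,000 tweets}}.
    """
    matrix = {}
    for (word, day), n in counts.items():
        matrix.setdefault(word, {})[day] = per_10k(n, tweets_per_day[day])
    return matrix
```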

Step 4

The heart of the word cluster analysis was conducted in Gephi. Gephi requires you to define your data in terms of two basic elements: nodes and edges. For this analysis, I defined them as follows:

Nodes: Every emotion word and every day was defined as a Gephi node.

Edges: Every connection between a “word” and a “day” was defined as an edge, weighted by how many times that word was found per 10,000 tweets on that day.
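The node and edge definitions above map directly onto two tables. The sketch below builds them as CSV from a word-by-day matrix, using the column names Gephi's spreadsheet import recognizes, to our understanding (`Id` and `Label` for nodes; `Source`, `Target`, `Type` and `Weight` for edges; worth confirming in the import dialog). The extra `Kind` column is a hypothetical custom attribute that could be used to color word nodes and day nodes differently.

```python
import csv
import io


def gephi_tables(matrix):
    """Build Gephi node and edge tables from a word-by-day matrix.

    matrix: {word: {day: occurrences per 10,000 tweets}}
    Returns (nodes_csv, edges_csv) as strings.
    """
    nodes_buf, edges_buf = io.StringIO(), io.StringIO()
    nodes, edges = csv.writer(nodes_buf), csv.writer(edges_buf)
    nodes.writerow(["Id", "Label", "Kind"])  # Kind: custom attribute for coloring
    edges.writerow(["Source", "Target", "Type", "Weight"])
    days = sorted({day for row in matrix.values() for day in row})
    for word in matrix:
        nodes.writerow([word, word, "word"])
    for day in days:
        nodes.writerow([day, day, "day"])
    for word, row in matrix.items():
        for day, rate in row.items():
            if rate > 0:  # no edge when the word never appeared that day
                edges.writerow([word, day, "Undirected", rate])
    return nodes_buf.getvalue(), edges_buf.getvalue()
```

Each string can then be saved with `open(path, "w", encoding="utf-8", newline="")` so that the Japanese labels reach Gephi intact.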

Here is a screenshot of Gephi’s data view:

Once the data elements are defined, Gephi is ready to visualize (i.e., the fun part!). Gephi comes with many layout templates to choose from. Each layout has its own built-in algorithm that takes the nodes and edges from your data and generates a network diagram. I chose a layout called “Parallel Force Atlas” (it sure sounds good). You can size and/or color each node by different data attributes, and do the same for the edges, which serve as the connectors between the nodes. You then press a button, configure a few parameters (such as “gravity”), and voilà! You are presented with a beautiful infographic.
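To give a feel for what such a layout does under the hood, here is a generic spring-embedder in the Fruchterman-Reingold style. It is not Gephi's Force Atlas or Parallel Force Atlas, only a sketch of the shared idea: all nodes repel each other, edges pull their endpoints together in proportion to their weight, and a "gravity" term pulls everything toward the center so loosely connected nodes do not drift off. The parameter values are illustrative assumptions.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, gravity=0.05, seed=0):
    """Simple spring-embedder layout with a gravity term.

    nodes: list of node ids; edges: list of (source, target, weight).
    Returns {node: (x, y)}.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = 1.0 / math.sqrt(max(len(nodes), 1))  # "ideal" edge length
    step = 0.1  # maximum distance a node may move per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels, pushing unrelated nodes apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f
        # Edges attract their endpoints; heavier edges pull harder.
        for a, b, w in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = w * d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        # Gravity pulls every node toward the origin.
        for n in nodes:
            disp[n][0] -= gravity * pos[n][0]
            disp[n][1] -= gravity * pos[n][1]
        # Move each node along its net force, capped by the current step.
        for n in nodes:
            dx, dy = disp[n]
            d = math.hypot(dx, dy)
            if d > 0:
                move = min(d, step)
                pos[n][0] += dx / d * move
                pos[n][1] += dy / d * move
        step *= 0.98  # "cooling": smaller moves as the layout settles
    return {n: (x, y) for n, (x, y) in pos.items()}
```

The pairwise repulsion loop is quadratic in the number of nodes; production layouts use approximations (and, in the case of a "parallel" variant, multiple threads) to handle large graphs.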

Step 5

What I thought would be an easy step, exporting the graphic and creating a web viewer for panning and zooming the huge image, turned out to be a much bigger task than I anticipated. First of all, the Gephi exporters failed to export the Japanese characters… with one exception: SVG. For some reason, SVG was the only export format in which the Japanese characters survived. Since I wanted to provide a web interface for zooming and panning the graphic, I chose one built on the OpenLayers JavaScript API, which is predominantly used for geospatial data visualization but can also display ordinary images. To get the image ready for OpenLayers, I used MapTiler, an application that generates the image “tiles” needed for each zoom level. You can see a full-screen version of the final infographic here.
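The tiling step is easier to reason about with the arithmetic spelled out. A tile pyramid halves the image resolution at each lower zoom level until the whole image fits in a single tile, and each level is cut into fixed-size squares. The sketch below computes the tile grid per level; the 256-pixel tile size is a common web-mapping default and an assumption here, and MapTiler's own zoom numbering may differ.

```python
import math


def tile_pyramid(width, height, tile_size=256):
    """List (zoom, columns, rows) for an image tile pyramid.

    Zoom 0 is the smallest level, where the whole image fits in one tile;
    each higher level doubles the resolution up to the full-size image.
    """
    longest = max(width, height)
    max_zoom = math.ceil(math.log2(longest / tile_size)) if longest > tile_size else 0
    levels = []
    for zoom in range(max_zoom + 1):
        scale = 2.0 ** (zoom - max_zoom)  # 1.0 at full resolution
        w, h = math.ceil(width * scale), math.ceil(height * scale)
        levels.append((zoom, math.ceil(w / tile_size), math.ceil(h / tile_size)))
    return levels
```

For a hypothetical 8,000 × 6,000 pixel export, this gives six levels, from a single tile at zoom 0 to a 32 × 24 grid at full resolution, which is why a viewer like OpenLayers only needs to fetch the handful of tiles visible at the current zoom.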

Japan Earthquake: What are they tweeting about?


One key feature of social media is that it provides a snapshot of the mood of a moment, reflected in what people are tweeting about in real time. To analyze the emotional and psychological state of the nation in the days after the disaster, I took the tweet text in the UCLA archive and divided it into 30 text files, one for each day following the earthquake, starting on March 11, 2011. To measure day-to-day fluctuations in emotion, I use a methodology similar to the one employed by Eiji Aramaki, PhD (Tokyo University), which takes words from an “Emotion Dictionary” (感情表現辞典) and matches them against the tweet content. The dictionary classifies emotions into 10 groups:

  1. 喜び – Happiness
  2. 怒る – Anger
  3. 哀しい – Sad
  4. 怖い – Fear
  5. 恥 – Shame
  6. 好き – Like
  7. 厭 – Unpleasant
  8. 昻 – Nervous
  9. 安 – Relief
  10. 驚く – Surprise

To visualize the relationship between the emotion keywords and the days following the earthquake, a visualization was generated using Gephi. The words are color-coded by emotion type, and the thickness of each connector represents the strength of the connection between a word and a day.


Top 20 emotion words:

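Rank | Word | Emotion category | Per 10,000 tweets
1 |  | like | 3,242.58
2 |  | relief | 1,151.22
3 |  | nervous | 324.83
4 | 嬉し泣き (tears of joy) | happy | 322.78
5 |  | sad | 322.78
6 | 誇る (to be proud) | happy | 292.07
7 | 心痛 (heartache) | fear | 228.80
8 | 享楽 (enjoyment) | happy | 121.40
9 |  | relief | 121.40
10 |  | like | 120.92
11 | 不安がる (to feel anxious) | fear | 104.83
12 | 傷付く (to be hurt) | sad | 74.57
13 | 恐怖感 (sense of fear) | fear | 53.37
14 | 悲しみ (sadness) | sad | 51.65
15 | 愛情 (affection) | like | 47.32
16 | 難苦 (hardship) | unpleasant | 45.29
17 | 怯れ (fear) | fear | 44.97
18 |  | like | 43.35
19 | 深謝 (deep gratitude) | happy | 38.77
20 | 驚愕 (astonishment) | surprise | 34.45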

Emotions by Day

The following animated chart shows the changes in each emotion category over the 30 days.


Japan Earthquake: Collecting social media and Ushahidi data


To understand the impact that Twitter had on post-disaster relief efforts, I will look at two different data sources.

  1. UCLA’s Hypercities Japan Twitter Archive
    A team from UCLA’s Digital Humanities Group archived Twitter feeds for 30 days following the disaster, collecting more than 650,000 tweets. Using Twitter’s public Search API, tweets were selected based on the following criteria:

    1. User’s location is in Japan
    2. Included one of the following hashtags
      1. #earthquake
      2. #sendai
      3. #jishin
      4. #tsunami
      5. #eqjp
      6. #pray4japan
      7. #japan
      8. #j_j_helpme
      9. #hinan
      10. #anpi
      11. #daijyoubu
      12. #311care
  2. Sinsai.info database
    Courtesy of Makoto Inoue, administrator of the sinsai.info Ushahidi website, this database includes the official incident data of more than 20,000 reports curated and posted by hundreds of volunteers. More than 80% of the reports came from Twitter.
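The UCLA collection criteria can be expressed as a small predicate. The sketch below checks a tweet's hashtags against the tracked list case-insensitively and delegates the "user's location is in Japan" test to a caller-supplied function, because the post does not say how that check was performed. The names and the hashtag regex are assumptions; note that Python's `\w` also matches kana and kanji, so a hashtag written with no space before following Japanese text (e.g. `#jishin地震`) would not match.

```python
import re

# Hashtags from the collection criteria listed above (matched case-insensitively).
TRACKED = {
    "earthquake", "sendai", "jishin", "tsunami", "eqjp", "pray4japan",
    "japan", "j_j_helpme", "hinan", "anpi", "daijyoubu", "311care",
}
HASHTAG = re.compile(r"#(\w+)")  # \w is Unicode-aware for str patterns


def hashtags(text):
    """Return the lowercase hashtags found in a tweet's text."""
    return {tag.lower() for tag in HASHTAG.findall(text)}


def matches_criteria(text, profile_location, in_japan):
    """True if the tweet carries a tracked hashtag and its author's profile
    location passes the caller-supplied in_japan(location) check."""
    if not profile_location or not in_japan(profile_location):
        return False
    return bool(hashtags(text) & TRACKED)
```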

UCLA’s Twitter Archive

UCLA’s archive was collected over a 30-day period, from March 10 to April 11, via a cron job that queried Twitter’s Search API every 3 minutes for relevant tweets. The tweets were then saved on UCLA’s own database server. While the archive has more than 650,000 records, it is a small portion of the roughly 700 million tweets reported during the same period; nevertheless, it offers a meaningful sample of the sentiment on the social web at the time. Note that the tweets were filtered by the users’ profile locations, so the archive covers only users based in Japan.

Here’s a look at the raw numbers:

  • 666,552 Total number of tweets collected
  • 232,914 Distinct users
  • 558,040 Retweets (with the word “RT” in the text)
  • 186,697 Distinct tweets
These numbers reveal some interesting Twitter usage statistics:
  • 2.86 Average number of tweets per user during this 30 day period
  • 84% Percentage of tweets that were “retweets”
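The two ratios follow directly from the raw counts: 666,552 / 232,914 ≈ 2.86 tweets per user, and 558,040 / 666,552 ≈ 83.7%, which rounds to 84%. The sketch below computes the same figures from a list of (user, text) records and also groups retweets by their original message by stripping leading "RT @user:" markers. As in the count above, a retweet is detected by "RT" appearing in the text; the whole-word regex and the function names are assumptions.

```python
import re
from collections import Counter

RT_WORD = re.compile(r"\bRT\b")                # "RT" as a standalone word
RT_PREFIX = re.compile(r"^(?:RT @\w+:\s*)+")   # one or more leading "RT @user:" markers


def archive_stats(tweets):
    """tweets: list of (screen_name, text). Returns summary figures."""
    total = len(tweets)
    users = len({user for user, _ in tweets})
    retweets = sum(1 for _, text in tweets if RT_WORD.search(text))
    return {
        "tweets": total,
        "distinct_users": users,
        "tweets_per_user": total / users,
        "retweet_share": retweets / total,
    }


def most_retweeted(texts, n=3):
    """Count retweets per original message by stripping the RT prefixes."""
    originals = Counter(RT_PREFIX.sub("", t) for t in texts if RT_PREFIX.match(t))
    return originals.most_common(n)
```

Stripping repeated prefixes matters because retweets of retweets ("RT @a: RT @b: ...") would otherwise be counted as different messages.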

The following chart shows a temporal display of the number of tweets per hour:

It is interesting to note that the highest number of tweets per hour came about a month after the earthquake, on April 7th at 11:32pm. This is likely due to the magnitude 7.1 aftershock that struck at that time, the second-largest aftershock to hit Japan (the largest, a magnitude 7.9 quake, followed the main magnitude 9.0 earthquake by about 30 minutes on March 11th). At a time when the psychological, emotional and physical state of the nation was still frayed, tweets like these portray the lingering fear and distress of the population:

これ以上東北を苦しめないでくれ。胃が痛い。。 #sendai #jishin

Don’t make Northeast Japan suffer anymore. My stomach hurts.

これって、余震なのかな?新たな別の地震なのかな? #saigai #jishin

Is this an aftershock?  Or a new, different earthquake?
もうなんなの‥?何でこんなに、皆が怖くて辛い思いをしなくちゃいけないの‥?もう十分過ぎる程揺れたじゃん‥皆が何したってゆうの(´;ω;`)もう揺れるのやめてよ(´;ω;`) #jishin

What is going on? Why does everyone have to be so scared and suffer like this? Haven’t we been shaken more than enough already? What did any of us do to deserve this? Please stop shaking.

Where are the users from?

One of the data collection criteria was that the user’s profile include a location. Because of this, we are able to map the locations of the users in this sample during the 30-day period following the earthquake. Many users listed the same location in their profiles, so there were only 14,607 distinct locations (out of a total of 666,507 tweets). The following are the 10 most “popular” user profile locations. The location with the most users was in Shinjuku, Tokyo, with 24,169 users:

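Rank | Location | Count
1 | 東京都新宿区市谷本村町5-1 (5-1 Ichigaya-Honmuracho, Shinjuku, Tokyo) | 24,169
2 | 東京都千代田区大手町 (Otemachi, Chiyoda, Tokyo) | 16,346
3 | 東京都渋谷区神南2−2−1 (2-2-1 Jinnan, Shibuya, Tokyo) | 14,981
4 | 島根県松江市 (Matsue, Shimane Prefecture) | 14,857
5 | 東京都千代田区霞が関 中央合同庁舎5号館 (Central Government Building No. 5, Kasumigaseki, Chiyoda, Tokyo) | 13,913
6 | 渋谷区, 東京都 JP (Shibuya, Tokyo, JP) | 9,297
7 | 東京都新宿区(Tokyo Shinjuku) | 7,845
8 | 東京都千代田区霞が関 (Kasumigaseki, Chiyoda, Tokyo) | 7,563
9 | 仙台市, 宮城県 JP (Sendai, Miyagi, JP) | 7,450
10 | Tokyo ときどき Kyoto (Tokyo, sometimes Kyoto) | 7,311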

Of the top 10 locations, only 3 are outside Tokyo. At number 4 is, oddly, Shimane Prefecture. At number 9 is Sendai, Miyagi, the region most devastated by the tsunami. At number 10 is “Tokyo, sometimes Kyoto”.
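A ranking like the one above can be produced by counting exact profile-location strings. The sketch below counts each user once, under the last non-empty location seen for them; whether the original table counted users or tweets is not fully clear from the post, so counting users is an assumption here, as are the function name and record shape. Exact-string matching also explains why Shinjuku appears twice in the table: "東京都新宿区(Tokyo Shinjuku)" and "東京都新宿区市谷本村町5-1" are different strings, and merging them would require a geocoder.

```python
from collections import Counter


def top_locations(rows, n=10):
    """Count distinct users per profile location string.

    rows: iterable of (screen_name, profile_location). Each user is counted
    once, under the last non-empty location seen for them.
    Returns (number_of_distinct_locations, [(location, users), ...]).
    """
    latest = {}
    for user, location in rows:
        location = (location or "").strip()
        if location:
            latest[user] = location
    counts = Counter(latest.values())
    return len(counts), counts.most_common(n)
```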

Japan Earthquake: How was social media used?

This exploratory paper looks at how location technologies were not effectively utilized during the Japan earthquake, despite their availability through social media and mobile devices. It also looks at how geo-enabling might be used to monitor future disaster relief efforts.

Part 1:  How Twitter was used after the Earthquake

For many of us, the moments during and after March 11th, 2011 were both harrowing and unreal, as we watched the horrors of the Japan earthquake and tsunami unfold. Those of us who were not physically in Japan could only look upon the disaster in despair, helpless to provide any immediate assistance. What brought this disaster closer to us, in some ways even making it personal for a global audience, was the abundance of social media streams that allowed the world to feel the pain, see user-generated media, and listen to what was going on… in real time.

My uncle’s house is underwater because of the tsunami. He is stuck on the second floor. Please save him. Ishinomaki-shi 3-2-26 #j_j_helpme

On March 11th, 2011, the tweet shown above appeared on Twitter. It was a plea for help from a woman trying to save her uncle, who was trapped on the second floor of his house in an area of Ishinomaki flooded by the tsunami. She added the hashtag #j_j_helpme, which was designated for people seeking help in the aftermath of the earthquake. Her plea was retweeted over and over again. She even included an address, which allowed us to locate her uncle’s house. Looking at the location on a map, sure enough, we find that her uncle’s house was in one of the hardest-hit residential areas inundated by the tsunami.

Uncle’s house is in the tsunami flood zone

While it is unclear whether her tweet actually mobilized relief agencies to save her uncle, we can follow her thread by “following” her Twitter account, and find that just a few days later she posted the following message:

“My uncle was rescued! Thank you everybody! I pray that others will be rescued as well!!!”

The power of the social web

It was moments like these, following stories via the social web, that enabled many of us around the world to experience what was happening on the ground as if we were there. In some ways, spatial boundaries were bridged through the power of social media. The social fabric of the nation quickly came to revolve around Twitter as the primary mode of communication, used for requesting medical aid and assistance, seeking information about missing people, sending encouragement, and reporting damage and the status of transportation infrastructure. While Twitter was used predominantly to talk about entertainment and anime before the earthquake, it quickly morphed into something entirely different on the day of the disaster, when 72% of the topics were related to the earthquake and another 8% were about transportation.

Before and after Twitter topics

In some ways, Twitter became a virtual bulletin board for exchanging valuable information, disseminating it to the public, and using social networks to “spread the word” quickly and effectively. On March 11 alone, 33 million tweets were reported in Japan, almost double the average daily volume. Over the next 30 days, more than 700 million tweets were reported. For a total population of 128 million, that is a lot of tweets (about 5.5 per resident), even taking into account that most users tweet multiple times.


The power of the “re”tweet

Part of the intrigue and power of the social web lies in its ability to transmit data through a multitude of networks, reaching an audience that grows exponentially the more “popular” the information is. On Twitter, this is accomplished through retweeting: the simple act of sharing a tweet with the people in your network, who in turn retweet it to their own networks, until a single tweet reaches a massive audience, sometimes in a matter of hours.
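The spread described above is essentially a breadth-first traversal of the follower graph: everyone who follows the author sees the tweet, and every viewer who retweets it exposes their own followers in turn. The sketch below models that on a small hypothetical follower graph, with a fixed set of users assumed to retweet whenever they see the tweet. Real cascades are probabilistic and timeline-dependent, so this only illustrates why reach can grow hop by hop, multiplying with each layer of retweeters.

```python
from collections import deque


def cascade_reach(followers, author, retweeters):
    """Return everyone who sees a tweet as it spreads through retweets.

    followers:  {user: [users who follow that user]}
    author:     the user who posts the original tweet
    retweeters: users who retweet the tweet when they see it
    """
    seen = {author}
    queue = deque([author])
    while queue:
        user = queue.popleft()
        for follower in followers.get(user, []):
            if follower not in seen:
                seen.add(follower)  # the tweet reaches this user's timeline
                if follower in retweeters:
                    queue.append(follower)  # ...and they pass it on
    return seen
```

For example, if A is followed by B and C, B by D and E, C by F, and D by G, and only B and D retweet, then a tweet by A reaches B, C, D, E and G, but never F.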

In the case of the earthquake-related tweets, retweeting was used effectively to communicate infrastructure damage and missing-person notices, and even to announce relevant hashtags. Here are some examples of tweets that were retweeted more than a thousand times:

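Tweet from a hospital director in Miyagi reporting that more than 30 people are near starvation, urgently requesting food, medicine and fuel, and warning that more than 300,000 residents of southern Miyagi are cut off from supplies and ignored by the media.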

RT @tamtamhirai: 30人以上が餓死寸前です。食料、医薬品、燃料を至急お願いします。 宮城の県南地域には物資が全く来ません。 地域住民が30万人以上、孤立状態です。メディアに無視されてる地域です。助けてください 宮城県柴田町仙南中央病院病院長 鈴木健#311sppt #j_j_helpme

To parents: If you have crayons or the like, leave them where children can use them. After the Great Hanshin earthquake, too, children were left with emotional scars. Children draw the feelings they cannot put into words. Do not stop them even if they draw violent pictures, such as pictures of dead bodies. Drawing lets them release those feelings and helps them heal.

RT @syadoyama: 【保護者さまへ】クレヨン等があったら、子どもが使えるところへ置いてあげてください。阪神大震災の時も、子ども達の心に傷が残りました。表現できない感情をこどもは絵にします。死体の絵など暴力的な絵を描いても止めないでください。描くことで吐き出し、癒されていきます。#jishin

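People in Iwaki are going to starve to death. The media only covers places like Miyagi and Iwate, where it is safe for them to go. Since the order to shelter indoors was issued, no food has reached Iwaki and there is no water supply. There is no media coverage at all. There is no gasoline, so we cannot even leave Iwaki.

RT @lacrima_night: いわきの人、餓死してしまいます。マスコミは、宮城とか岩手とか、自分達が行っても安全な所しか報道していません。いわきは屋内退避になってから、食糧が届かない、給水もしてないという状況です。マスコミの報道も一切ありません。ガソリンもなくて、いわきから出ることもできません#jishin

This is very important, so I have summarized it to be easy to read. [To those in the disaster areas: distress signals] Food and water: “F”. Doctor: “―”. Medical supplies: “=”. Safe to land: “△”. Fuel: “L”. If you display these marks so that they can be seen from the air, they will reportedly be understood by foreign rescue teams as well.

RT @koollifekit: @naokoken77 非常に重要なことなので見やすくまとめます【被災地の方へ・救難信号】食料と水「F」医者「―」医療品「=」着陸可能「△」燃料「L」このマークを空からわかるように示しておけば海外の救助隊にも通じるそうです。 #j_j_helpme #save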

The single most retweeted tweet

Among all the tweets, there was a single tweet that got retweeted more than 20,000 times:

RT @NamicoAoto: 父が明日、福島原発の応援に派遣されます。半年後定年を迎える父が自ら志願したと聞き、涙が出そうになりました。「今の対応次第で原発の未来が変わる。使命感を持っていく。」家では頼りなく感じる父ですが、私は今日程誇りに思ったことはありません。無事の帰宅を祈ります。#jishin

“My father is being sent to help at the Fukushima nuclear plant tomorrow. When I heard that he had volunteered, even though he retires in just half a year, I almost cried. He said, ‘The future of nuclear power depends on how we respond now. I am going with a sense of mission.’ At home he doesn’t always seem the most reliable father… but I have never been as proud of him as I am today. I pray for his safe return.”

While most tweets were informational in nature, the most “popular” tweet in Japan following the earthquake was about courage and sacrifice. Because the effects of radiation typically take years to appear, it was the older generation that stepped up to the plate and went to the front lines, risking exposure but knowing that they had fewer years left to live than their younger counterparts. In many ways, this symbolized the spirit of the Japanese people during these trying times, even prompting the Prime Minister to tell these volunteers, “You are the only ones who can resolve a crisis. Retreat is unthinkable,” according to the Financial Times.


Just a day after the earthquake, Twitter announced a set of recommended hashtags to be used to categorize specific post-disaster situational needs:

General earthquake information: #jishin

Requests for rescue or other aid: #j_j_helpme

Evacuation information: #hinan

Confirmation of safety of individuals, places, etc.: #anpi

Medical information for victims: #311care
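Because each recommended hashtag stands for one kind of situational need, the list above works as a simple lookup table for sorting incoming tweets. The sketch below tags a tweet with every category whose hashtag it contains; the function name and the regex used to pull out hashtags are assumptions, not anything Twitter published.

```python
import re

# The five hashtags Twitter recommended the day after the earthquake.
OFFICIAL_TAGS = {
    "jishin": "General earthquake information",
    "j_j_helpme": "Requests for rescue or other aid",
    "hinan": "Evacuation information",
    "anpi": "Confirmation of safety of individuals, places, etc.",
    "311care": "Medical information for victims",
}
HASHTAG = re.compile(r"#(\w+)")


def categorize(text):
    """Return the sorted situational categories a tweet is tagged with."""
    tags = {tag.lower() for tag in HASHTAG.findall(text)}
    return sorted(OFFICIAL_TAGS[tag] for tag in tags if tag in OFFICIAL_TAGS)
```

A rescue request tagged with both #j_j_helpme and #jishin lands in two categories, so anyone monitoring "Requests for rescue or other aid" would see it even though it also carries the general tag.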

This information quickly went viral, and the public came to recognize these as the “official” hashtags, as reflected in the many retweets of the announcement:

Tweet announcing the “official” hashtags to be used:

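RT @Nya_Nya_JAM: twitter社より。統一のハッシュタグなどが発表になりました。情報の統合に協力しましょう。#jishin: 地震一般に関する情報#j_j_helpme :救助要請#hinan :避難#anpi :安否確認#311care: 医療系被災者支援情報

From Twitter: unified hashtags have been announced. Let’s cooperate to consolidate information. #jishin: general earthquake information; #j_j_helpme: rescue requests; #hinan: evacuation; #anpi: safety confirmation; #311care: medical support information for victims.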

Part 2 of this paper will look at how Twitter was used through these hashtags in the days after the disaster.
