Japan Earthquake: Collecting social media and Ushahidi data

To understand the impact that Twitter had on the post-disaster relief efforts, I will analyze two different data sources.

  1. UCLA’s Hypercities Japan Twitter Archive
    A team from UCLA’s Digital Humanities Group archived Twitter feeds for 30 days following the disaster, collecting more than 650,000 tweets. Using Twitter’s public search API, the team selected tweets based on the following criteria:

    1. User’s location is in Japan
    2. Tweet includes one of the following hashtags:
      1. #earthquake
      2. #sendai
      3. #jishin
      4. #tsunami
      5. #eqjp
      6. #pray4japan
      7. #japan
      8. #j_j_helpme
      9. #hinan
      10. #anpi
      11. #daijyoubu
      12. #311care
  2. Sinsai.info database
    Courtesy of Makoto Inoue, administrator of the sinsai.info Ushahidi website, this database contains the official incident data: more than 20,000 reports curated and posted by hundreds of volunteers. More than 80% of the reports came from Twitter.
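
The two selection criteria for the UCLA archive listed above can be sketched in a few lines of Python. This is only an illustration, not the UCLA team's actual collection script: the tweet is assumed to be a plain dict, and the keys `text` and `in_japan` are hypothetical names (how the archive decided that a user's location is in Japan is not described here, so that decision is taken as a precomputed flag).

```python
import re

# Hashtags from the selection criteria listed above (stored without '#').
HASHTAGS = {
    "earthquake", "sendai", "jishin", "tsunami", "eqjp", "pray4japan",
    "japan", "j_j_helpme", "hinan", "anpi", "daijyoubu", "311care",
}

# In Python 3, \w also matches Japanese characters, so a tag typed directly
# against following Japanese text (no space) would absorb that text.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the lowercased hashtags found in a tweet's text."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def matches_criteria(tweet):
    """Apply both criteria to a tweet given as a dict.

    Assumption: 'text' holds the tweet body and 'in_japan' is a flag that
    says whether the user's profile location was judged to be in Japan.
    """
    in_japan = bool(tweet.get("in_japan"))
    has_tag = bool(extract_hashtags(tweet.get("text", "")) & HASHTAGS)
    return in_japan and has_tag


if __name__ == "__main__":
    sample = {"text": "Is everyone safe? #anpi #jishin", "in_japan": True}
    print(matches_criteria(sample))
```

Matching is case-insensitive here, which is a design choice of this sketch rather than something the source specifies.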

UCLA’s Twitter Archive

UCLA’s archive was collected over a 30-day period, from March 10 to April 11, via a cron job that queried Twitter’s search API every 3 minutes to collect relevant tweets. The tweets were subsequently saved on UCLA’s own database server. Although the archive has more than 650,000 records, that is only a small portion of the supposed 700 million total tweets recorded during the same period. It nevertheless represents an accurate sampling of the sentiment expressed on the social web during this time. Note that the tweets were filtered by user location, so the archive covers only users based in Japan.

Here’s a look at the raw numbers:

  • 666,552 Total number of tweets collected
  • 232,914 Distinct users
  • 558,040 Retweets (with the word “RT” in the text)
  • 186,697 Distinct tweets

These numbers reveal some interesting Twitter usage statistics:
  • 2.86 Average number of tweets per user during this 30-day period
  • 84% Percentage of tweets that were “retweets”
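
The two derived statistics follow directly from the raw counts listed above. A short check of the arithmetic (the variable names are only illustrative):

```python
# Raw counts quoted in the list above.
total_tweets = 666_552
distinct_users = 232_914
retweets = 558_040

# Average tweets per user: total tweets divided by distinct users.
tweets_per_user = total_tweets / distinct_users

# Retweet share: retweets as a percentage of all collected tweets.
retweet_share = retweets / total_tweets * 100

print(f"{tweets_per_user:.2f} tweets per user")
print(f"{retweet_share:.0f}% retweets")
```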

The following chart shows the number of tweets per hour over time:

It is interesting to note that the highest number of tweets per hour comes nearly a month after the earthquake, on April 7th at 11:32pm. This is likely due to the second-largest aftershock, which shook Japan at magnitude 7.1 (the largest was a 7.9 earthquake that followed 30 minutes after the main 9.0 earthquake on March 11th). Coming at a time when the psychological, emotional and physical state of the nation was still frayed, the spike reflects the lingering fear and distress of the population, as seen in tweets like these:

これ以上東北を苦しめないでくれ。胃が痛い。。 #sendai #jishin

Don’t make Northeast Japan suffer anymore. My stomach hurts.

これって、余震なのかな?新たな別の地震なのかな? #saigai #jishin

Is this an aftershock?  Or a new, different earthquake?

もうなんなの‥?何でこんなに、皆が怖くて辛い思いをしなくちゃいけないの‥?もう十分過ぎる程揺れたじゃん‥皆が何したってゆうの(´;ω;`)もう揺れるのやめてよ(´;ω;`) #jishin

What’s going on?  Why are we made to suffer so much?  Haven’t you shaken us enough?  What have we done to deserve this?  Please stop shaking already.
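
Hourly counts like the ones behind the tweets-per-hour chart can be produced by truncating each tweet's timestamp to the hour and counting. The sketch below assumes the timestamps are already parsed into `datetime` objects. The function names and the sample timestamps are made up for illustration and are not taken from the archive.

```python
from collections import Counter
from datetime import datetime


def tweets_per_hour(timestamps):
    """Count tweets per clock hour by truncating each timestamp to the hour."""
    return Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )


def busiest_hour(timestamps):
    """Return (hour, count) for the hour with the most tweets, or None if empty."""
    counts = tweets_per_hour(timestamps)
    if not counts:
        return None
    return counts.most_common(1)[0]


if __name__ == "__main__":
    # Illustrative timestamps only, not real archive data.
    sample = [
        datetime(2011, 4, 7, 23, 32),
        datetime(2011, 4, 7, 23, 40),
        datetime(2011, 4, 8, 0, 5),
    ]
    print(busiest_hour(sample))
```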

Where are the users from?

One of the data collection criteria was that the user’s profile include a location. Because of this, we are able to map the locations of the users in this sample set during the 30-day period following the earthquake. Many users had the same location in their profiles, leaving a total of only 14,607 distinct locations (out of a total of 666,507 tweets). The following are the top 10 most “popular” user profile locations. The location with the most users was in Shinjuku, Tokyo, with 24,169 users:

Rank Location Count
1 東京都新宿区市谷本村町5-1 24169
2 東京都千代田区大手町 16346
3 東京都渋谷区神南2−2−1 14981
4 島根県松江市 14857
5 東京都千代田区霞が関 中央合同庁舎5号館 13913
6 渋谷区, 東京都 JP 9297
7 東京都新宿区(Tokyo Shinjuku) 7845
8 東京都千代田区霞が関 7563
9 仙台市, 宮城県 JP 7450
10 Tokyo ときどき Kyoto 7311

Of the top 10 locations, only 3 are outside Tokyo. Number 4, oddly, is Matsue in Shimane Prefecture. Number 9 is Sendai, Miyagi, in the region most devastated by the tsunami. Number 10 is “Tokyo, sometimes Kyoto”.
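
A ranking like the one above can be produced by counting identical profile location strings. The sketch below is a minimal illustration, not the query actually used for this post. The function name and the sample data are hypothetical. Strings are compared exactly, so two spellings of the same place (for example 東京都千代田区霞が関 with and without a building name) are counted as separate locations, as they are in the table.

```python
from collections import Counter


def top_locations(locations, n=10):
    """Return (number of distinct locations, the n most common with counts).

    Empty or whitespace-only locations are skipped; the rest are compared
    as exact strings after trimming surrounding whitespace.
    """
    counts = Counter(loc.strip() for loc in locations if loc and loc.strip())
    return len(counts), counts.most_common(n)


if __name__ == "__main__":
    # Illustrative profile locations only, not real archive data.
    sample = ["Tokyo", "Tokyo", "Sendai", "Tokyo ", "", "Sendai", "Kyoto"]
    distinct, ranking = top_locations(sample, n=2)
    print(distinct, ranking)
```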

