Importing GeoPlanet data into MySQL

The Yahoo! GeoPlanet data reappeared a few weeks ago and—as Twitter are now using WOEIDs for identifying locations—hopefully it's here to stay (apparently there was some work needed to untangle the source of certain pieces of data).

SQL schema for database tables:

CREATE TABLE `places` (
  `woeid` varchar(15) NOT NULL,
  `iso` varchar(6) NOT NULL,
  `name` text NOT NULL,
  `language` varchar(6) NOT NULL,
  `type` varchar(15) NOT NULL,
  `parent` varchar(15) NOT NULL,
  PRIMARY KEY (`woeid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `aliases` (
  `woeid` varchar(15) NOT NULL,
  `name` text NOT NULL,
  `name-type` varchar(6) NOT NULL,
  `language` varchar(6) DEFAULT NULL,
  KEY `woeid` (`woeid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Note: field sizes may not be optimal.

Fetching the latest data set:

wget 'http://developer.yahoo.com/geo/geoplanet/data/getLatest.php'
unzip geoplanet_data_*.zip
mv geoplanet_places*.tsv places.tsv
mv geoplanet_aliases*.tsv aliases.tsv

Importing the data into mysql:

DB=geoplanet # the name of the database to import into, containing the tables above
DB_USER=root # a user with write permission to the database above
mysqlimport --ignore-lines=1 --fields-optionally-enclosed-by='"' --fields-terminated-by="\t"  --user="$DB_USER" --password --local $DB places.tsv
mysqlimport --ignore-lines=1 --fields-optionally-enclosed-by='"' --fields-terminated-by="\t"  --user="$DB_USER" --password --local $DB aliases.tsv

It takes up about 400MB altogether.