Using the Geonames database with Django
April 2, 2008 at 15:07:43 CEST
I've been using the Geonames database with Django for a while, as part of the ffloat.it project (which will be launched soon, stay tuned ;)). I've done some preprocessing on the data, normalizing some bits and denormalizing others, always aiming for the best speed. My main needs were:
- Searching locations by name
- Finding the parent for a given location
- Finding children for a given location
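As a rough illustration of those three lookups, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the real database. The table name, column names, index names and sample rows are all hypothetical and are not the actual models shipped in the package; the only assumption is that each location stores a reference to its parent and that both the name and the parent columns are indexed.

```python
import sqlite3

# Hypothetical schema for illustration only: the real django-geonames
# models are not documented here, so every name below is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        parent_id INTEGER REFERENCES location(id)
    );
    -- One index per lookup pattern: by name and by parent.
    CREATE INDEX location_name_idx ON location (name);
    CREATE INDEX location_parent_idx ON location (parent_id);
""")

# Made-up sample rows, just enough to exercise the queries.
conn.executemany(
    "INSERT INTO location (id, name, parent_id) VALUES (?, ?, ?)",
    [
        (1, "Spain", None),
        (2, "Madrid", 1),
        (3, "Barcelona", 1),
    ],
)


def search_by_name(conn, name):
    """Return (id, name) for every location with exactly this name."""
    return conn.execute(
        "SELECT id, name FROM location WHERE name = ? ORDER BY id",
        (name,),
    ).fetchall()


def find_parent(conn, location_id):
    """Return (id, name) of the parent, or None for a top-level location."""
    return conn.execute(
        "SELECT p.id, p.name FROM location AS c "
        "JOIN location AS p ON p.id = c.parent_id WHERE c.id = ?",
        (location_id,),
    ).fetchone()


def find_children(conn, location_id):
    """Return (id, name) for every direct child of the given location."""
    return conn.execute(
        "SELECT id, name FROM location WHERE parent_id = ? ORDER BY id",
        (location_id,),
    ).fetchall()
```

With a layout like this, each of the three needs maps to a single indexed query, which is the kind of access pattern the preprocessing aims to keep fast.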
I don't need to find locations by coordinates, so this first release is not optimized for that. In addition, this release has been tested with MySQL only, but I don't think I use any MySQL-specific features, so you may be able to use it with PostgreSQL by changing the database driver in the geoimport.py script.
The next release, planned for sometime in the next two weeks, will add support for incremental updates (currently, you need to import the full database at once) as well as PostgreSQL support.
Keep in mind that this database is really huge, containing more than 6,000,000 (yep, six million) locations and more than 2,000,000 internationalized names. Importing it takes more than an hour on my laptop and it needs about 1.5 GiB of disk space (I've indexed it heavily).
Install instructions
Just unzip django-geonames-0.1.zip and drop the geo application inside your project. Add it to your INSTALLED_APPS, run syncdb, then go to the geo directory and run the geoimport.py script (you'll need to specify the database credentials and database name; run geoimport.py -h for help). I haven't included any documentation for the models, but I think they should be more or less self-explanatory. Don't hesitate to ask any questions by posting a comment on this entry.
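For reference, the settings change might look something like the fragment below. Only the 'geo' entry comes from the package; the other entries are ordinary Django contrib apps shown as placeholders for whatever your project already lists.

```python
# settings.py (fragment)
# The contrib apps are placeholders; keep whatever your project already has.
INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'geo',  # the application unzipped from django-geonames-0.1.zip
)
```

After that, running python manage.py syncdb from the project directory creates the tables, and geoimport.py -h lists the options the import script accepts.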
I hope you find it useful.
Wow, thanks man. I was about to write this myself next week.
Thanks for sharing.
Just one more thing:
If you use MySQL with this (I don't have much experience with Postgres), http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html might be very helpful; I think you can get this down to minutes.
Hi Peritus,
Thanks for suggesting that document. I'll do some testing tomorrow to see if I can make importing faster.
I used the US Government data (it contains named unpopulated places, watering holes, etc.). It was about 6 GB; import and cleaning took about 8 minutes on my 1.5 GHz Celeron with 1.5 GB of memory, into Postgres. Once it was all indexed, the average search time by name for most things was 2 ms.