Lookup country code by IP address using redis and python.
The solution:
A quick Google search turned up several free sources of data. I used MaxMind's GeoLite: http://dev.maxmind.com/geoip/legacy/geolite/
I only care about the country, not the city, lat/lon, or timezone, so the "GeoLite Country" data will do. It is 95,000 lines of CSV data, updated once a month.
The format is "start IP","end IP", "start decimal", "end decimal", "country code", "country name". I'm using the csv module to parse it.
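As a rough sketch of the parsing step, the snippet below reads one row in that six-column layout with the csv module. The sample row and the helper name parse_rows are illustrative assumptions, not part of the module further down; the strip('" ') calls mirror the module and guard against stray quotes or spaces.

```python
import csv
import io

# Illustrative row in the six-column layout described above.
sample = '"1.0.0.0","1.0.0.255","16777216","16777471","AU","Australia"\n'

def parse_rows(fp):
    """Yield (start_dec, end_dec, country_code, country_name) per CSV row."""
    for row in csv.reader(fp):
        if len(row) != 6:
            continue  # skip blank or malformed lines
        start_ip, end_ip, start_dec, end_dec, cc, name = row
        yield (int(start_dec.strip('" ')), int(end_dec.strip('" ')),
               cc.strip('" '), name.strip('" '))

rows = list(parse_rows(io.StringIO(sample)))
print(rows)
```

Only the two decimal columns and the country code are needed for the lookup; the dotted-quad columns are read and ignored.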
A Redis sorted set takes a score and a unique member. The country code repeats throughout the data, so I made each member unique by appending the range's starting IP (as a decimal) to the country code.
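To see why the suffix matters, here is a toy model of a sorted set as a plain dict (member to score). It is only a stand-in for Redis, and the two ranges are illustrative values: adding the same member twice replaces its score instead of creating a second entry.

```python
def zadd(zset, member, score):
    """Toy stand-in for a sorted-set add: members are unique,
    so re-adding a member just replaces its score."""
    zset[member] = score

# Two illustrative ranges that share a country code.
ranges = [(16777216, "AU"), (16778240, "AU")]

plain = {}
for start, cc in ranges:
    zadd(plain, cc, start)  # the second add overwrites the first

suffixed = {}
for start, cc in ranges:
    zadd(suffixed, "{0}|{1}".format(cc, start), start)  # CC|start is unique

print(len(plain), len(suffixed))
```

With the bare country code only one range survives; with the CC|start member both do.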
Converting an IP address to its decimal value is easy with the netaddr module. I don't have that on my Mac, so I'm using struct.unpack("!L", socket.inet_aton(ip))[0] instead.
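A small sketch of that conversion, with a cross-check against the standard-library ipaddress module. ipaddress is my substitution here (it ships with Python 3.3+ and is not the netaddr module mentioned above), and the function names are made up for the comparison.

```python
import ipaddress
import socket
import struct

def ip_to_dec_struct(ip):
    """Pack the dotted quad into 4 bytes, then read them back as one
    unsigned 32-bit integer in network (big-endian) byte order."""
    return struct.unpack("!L", socket.inet_aton(ip))[0]

def ip_to_dec_ipaddress(ip):
    """Same conversion via the standard-library ipaddress module."""
    return int(ipaddress.IPv4Address(ip))

print(ip_to_dec_struct("1.0.0.0"))
print(ip_to_dec_ipaddress("1.0.0.0"))
```

Both calls should give 16777216, which is 1 * 256**3.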
I also detect missing ranges in the data and insert an "empty" entry for each one.
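The gap detection can be sketched on its own, without Redis. The helper name find_gaps and the tiny ranges are assumptions for illustration; the comparison is the same score-1 > lastEnd test the module uses, and it relies on the rows arriving sorted by start value.

```python
def find_gaps(ranges):
    """Yield (gap_start, gap_end) for every hole between consecutive ranges.

    ranges: iterable of (start_dec, end_dec) pairs, sorted by start_dec.
    """
    last_end = 0
    for start, end in ranges:
        if start - 1 > last_end:
            # nothing in the data covers last_end+1 .. start-1
            yield last_end + 1, start - 1
        last_end = end

ranges = [(10, 19), (20, 29), (40, 49)]
print(list(find_gaps(ranges)))
```

Each gap's first value is what gets stored as the score of an "empty" member, so a lookup that lands inside the hole finds "empty" rather than the preceding country.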
The Lookup code:
import redis
import GeoIP
red = redis.Redis()
key = "geoip"  # example name; use the key the data was loaded under
CC = GeoIP.ip_to_country(red, key, "xx.xx.xx.xx")
print(CC)
GeoIP.ip_to_country(myRedis, key, ip) returns the country code from the data, or "empty" for empty blocks such as 127.0.0.1. It returns "unknown" for addresses outside the range of the data or for malformed IP addresses.
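The three outcomes can be illustrated with an in-memory stand-in for the sorted set. This is only a sketch: the starts and labels lists are a made-up three-entry index, and bisect plays the role of the "highest score at or below the address" query.

```python
import bisect
import socket
import struct

# Toy index: sorted range starts and their labels (illustrative values only).
starts = [1, 16777216, 16777472]
labels = ["empty", "AU", "CN"]

def lookup(ip):
    """Return the label of the last range starting at or below ip,
    or "unknown" if there is none or the address is malformed."""
    try:
        dec = struct.unpack("!L", socket.inet_aton(ip))[0]
    except (socket.error, TypeError):
        return "unknown"  # malformed address
    i = bisect.bisect_right(starts, dec) - 1
    return labels[i] if i >= 0 else "unknown"

for ip in ("1.0.0.7", "0.0.0.5", "0.0.0.0", "not an ip"):
    print(ip, lookup(ip))
```

The lookup never checks where a range ends, which is why the "empty" markers are needed: without them an address in a hole would report the country of the range before it.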
The data is about 20MB.
The module:
# The GeoLite databases are distributed under
# the Creative Commons Attribution-ShareAlike 3.0 Unported License.
# The attribution requirement may be met by including the following
# in all advertising and documentation mentioning features of or use of this database:
#
# This product includes GeoLite data created by MaxMind, available from
# http://www.maxmind.com.
import redis
def data_to_redis( myRedis, key, filename ):
    import csv
    # the data is csv. "start","end","d_start","d_end","country code","country"
    # the redis data is a sorted set (zset). note the country code repeats
    # zadd below uses the ( key, member, score ) order of the redis.Redis class
    # in redis-py before 3.0; later versions expect zadd( key, {member: score} )
    count = 0
    empty = 0
    lastEnd = 0
    with open( filename, "r" ) as fp:
        csv_reader = csv.reader( fp )
        for line in csv_reader:
            #print line
            try:
                startIP,endIP,startDec,endDec,CC,country = line
            except ValueError:
                # not a six-column row: report it and skip it
                print(line)
                continue
            #print "{0} {1} {2} {3} {4}".format( startIP, endIP, startDec, endDec, CC )
            # use the startDec as the score
            score = int(startDec.strip('" '))
            endDec = int(endDec.strip('" '))
            if score-1 > lastEnd:
                # assume a missing block.
                #print "missing block: {0} to {1}".format( lastEnd+1, score-1 )
                myRedis.zadd( key, "empty|{0}".format(lastEnd), lastEnd+1 )
                empty += 1
            lastEnd = endDec
            # use CC|startDec as the unique member
            member = CC.strip('" ')+"|"+str(score)
            myRedis.zadd( key, member, score )
            #if count > 10:
            #    print "early exit for debug"
            #    return
            count += 1
    print("added {0} records to {1}. empty blocks {2}".format( count, key, empty ))
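def ip_to_country( myRedis, key, ip ):
    try:
        dec = ip_to_dec( ip )
    except Exception:
        # invalidly formatted IP address
        return "unknown"
    # the highest-scored member whose start is <= dec
    data = myRedis.zrevrangebyscore( key, dec, 0, start=0, num=1 )
    if len(data) > 0:
        member = data[0]
        if not isinstance( member, str ):
            member = member.decode( "ascii" )  # redis-py returns bytes on Python 3
        CC,start = member.split("|")
    else:
        CC = "unknown"
    return CC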
def ip_to_dec( ip ):
    "convert decimal dotted quad string to long integer"
    import struct
    import socket
    # '!' selects network (big-endian) byte order; 'L' is an unsigned 32-bit integer
    return struct.unpack('!L',socket.inet_aton(ip))[0]
Todo:
Add more error checking, and perhaps create a country-code-to-country-name hash in Redis.
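One possible shape for the country-name hash, as a sketch only. The helper names build_cc_names and store_cc_names, the key name, and the sample rows are all assumptions; the only Redis call used is hset(key, field, value), and r is assumed to be a redis-py client.

```python
import csv
import io

def build_cc_names(fp):
    """Map country code -> country name from rows in the six-column layout."""
    names = {}
    for row in csv.reader(fp):
        if len(row) == 6:
            names[row[4].strip('" ')] = row[5].strip('" ')
    return names

def store_cc_names(r, key, names):
    """Write the mapping into a Redis hash, one field per country code.
    r is assumed to be a redis-py client exposing hset(key, field, value)."""
    for cc, name in names.items():
        r.hset(key, cc, name)

# Illustrative rows only.
sample = ('"1.0.0.0","1.0.0.255","16777216","16777471","AU","Australia"\n'
          '"1.0.1.0","1.0.3.255","16777472","16778239","CN","China"\n')
names = build_cc_names(io.StringIO(sample))
print(names)
```

A lookup could then follow ip_to_country with an hget on the same key to turn the code into a name.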