Friday, August 29, 2014

Unique IP in a log

I have a log file with fields and an IP address. I got asked to find the unique IPs today.

It took some internet searching to piece this together. So recording for later:

tail -1000 error.log | grep from | awk '{print $14}' | sort -n | uniq

Tail -N    Gets the last N lines from the log file. 
Grep       Filter out the lines for the ones I wanted. It has many lines without IPs

Those could be replaced with grep from error.log.
Or "cat error*.log"

awk is used to access fields in a record. This says...  show field 14.
That's the space delimited field were the IP is in this particular log.

sort sorts. uses -n to sort numerically.
uniq only keeps the unique records.

Wednesday, July 2, 2014

Code challenge: IP to country with redis

The challenge: 
Lookup country code by IP address using redis and python.

The solution:
A quick google search found several free sources of data.

I only care about country and not city or lat/lon or timezone. The "GeoLite Country" data will work. It is 95,000 lines of csv data updated once a month.

The format is "start IP","end IP", "start decimal", "end decimal", "country code", "country name". I'm using the csv module to parse it.

The redis sorted set takes a score and a unique element. Note that country code gets repeated in the data and so I made the country code unique by appending the IP.

Converting an IP address to decimal value would work easily with netaddr module. I don't have that on my mac, so I'm using the struct.unpack("!L",socket.inet_aton(ip))[0] method.

I also detect missing ranges in the data and insert "empty" for those ranges into the data.

The Lookup code:

import redis
import GeoIP 
red = redis.Redis() 
CC = GeoIP.ip_to_country("xx.xx.xx.xx") 
print CC 

GeoIP.ip_to_country(ip) will return the country code from the data or "empty" for empty blocks - say It will return "unknown" for things outside the range or invalid formatted IP addresses

The data is about 20MB.

The module:

# The GeoLite databases are distributed under
# the Creative Commons Attribution-ShareAlike 3.0 Unported License.
# The attribution requirement may be met by including the following
# in all advertising and documentation mentioning features of or use of this database:
# This product includes GeoLite data created by MaxMind, available from

import redis

def data_to_redis( myRedis, key, filename ):
    import csv

    # the data is csv. "start","end","d_start","d_end","country code","country"
    # the redis data is a zlist. note the country code repeats

    fp = open( filename, "r" )
    count = 0
    empty = 0
    lastEnd = 0
    csv_reader = csv.reader( fp )
    for line in csv_reader:
        #print line
            startIP,endIP,startDec,endDec,CC,country = line
            print line

        #print "{0} {1} {2} {3} {4}".format( startIP, endIP, startDec, endDec, CC )
        # use the startDec as the score
        score = int(startDec.strip('" '))
        endDec = int(endDec.strip('" '))
        if score-1 > lastEnd:
            # assume a missing block.
            #print "missing block: {0} to {1}".format( lastEnd+1, score-1 )
            myRedis.zadd( key, "empty|{0}".format(lastEnd), lastEnd+1 )
            empty += 1

        lastEnd = endDec
        # use CC|startDec as the
        member = CC.strip('" ')+"|"+str(score)
        myRedis.zadd( key, member, score )
        #if count > 10:
        #    print "early exit for debug"
        #    return
        count += 1
    print "added {0} records to {1}. empty blocks {2}".format( count, key, empty )

def ip_to_country( myRedis, key, ip ):
    dec = ip_to_dec( ip )
    data = myRedis.zrevrangebyscore( key, dec,0, num=1, start=0 )
    if len(data) > 0:
        CC,start = data[0].split("|")
        CC = "unknown"
    return CC

def ip_to_dec( ip ):
    import struct
    import socket
    "convert decimal dotted quad string to long integer"
    # note the big vs little indian packing
    return struct.unpack('!L',socket.inet_aton(ip))[0]

add some more error checking and perhaps create a CC to country name hash in redis.

Saturday, August 4, 2012

Dear Amazon

Dear Amazon. I realize this is an issue that impacts less than 0.001% of your customers. But we live in rural California and do not get mail delivery to our house. So we can either get USPS sent to our post office box or FedEx/UPS to our house address. Is it possible to flag our account to indicate mail only? Or FedEx only? Or more generally, that a PO Box can only be USPS? I realize you try to ship the cheapest way possible. I appreciate that. And when stuff comes, it is brilliant. But the stress of a package coming via FedEx to our PO Box that won't get delivered is dreadful. I'm not sure what language you write your software in, but perhaps one additional conditional would be able to resolve this for all of us on The Sea Ranch and other rural locations without mail delivery. Thanks for your help. bob

Sunday, May 27, 2012

Speech to Text with timeline data

I'm working on a book project where I have audio output of a reader and the associated text. I want to sync the two. My tool has a simple method... press the mouse down for each word as the audio is playing. Record the time. But sometimes (often), more fine tuning is needed. I figured that someone must have solved this already. I did a search and played with some tools. None of them were magic. The best thing so far is Audacity. A free audio tool. It has an automated bit that gets close. But I find having to edit that is harder than just going at it with the label track. My process: - zoom in. - use left-mouse and shift-left-mouse to change the selection. - then ctrl-left-mouse to play. - adjust the selection if necessary - ctrl-b to create a new label (I don't enter the work-- potentially useful) Walk through the text on a page. Export the label data. It creates a text file with two columns, start and end time. Adjusting my code to take start & end. And writing a perl/python script to convert to the Flash array. Only real question remaining is the format desired. The data files can be imported for editing. I think the logical thing to do would be to outsource someone. $5/page? It takes me about 10 minutes or so per page. Current book has 26 pages. I suspect I'll get faster.

Thursday, February 24, 2011

AMFPHP treats all ints as uints

Negative ints across the wire are deserialized as unsigned ints. Really big unsigned ints. Not sure how old of a version we are using.

Solvable though.

Sunday, January 30, 2011 - for reals?
Free basic system.

Saturday, January 29, 2011

Game jam 2011

Game jam update #1: Downloaded Flixel (an actionscript game engine) and went through some tutorials. Outlook: No finished game from me this weekend. If I had 48 hours of actual coding (and sleep at regular intervals), I know I could have something. However, I'm too likely to get distracted by the Mochi APIs or Amazon Web Services (for the server side) or a walk along the bluffs to make much progress tomorrow.