Saturday, February 26, 2011

XKCD Fetcher

Problem:  You want every XKCD in existence on your hard drive.  Solution: Pirate this script!

Instead of scraping the XKCD website page by page, you can use the nice JSON document xkcd provides for every comic.  The script below can produce two outputs. One is a nice list of comics that includes number, title, and alt-text, and looks like this:



  • 405 - Journal 3 - Oh, and, uh, if the Russian government asks, that submarine was always there.
  • 406 - Venting - P.P.S. I can kill you with my brain.
  • 407 - Cheap GPS - In lieu of mapping software, I once wrote a Perl program which, given a USB GPS receiver and a destination, printed 'LEFT' 'RIGHT' OR 'STRAIGHT' based on my heading.
  • 408 - Overqualified - To anyone I've taken on a terrible date, this is retroactively my cover story.
  • 409 - Electric Skateboard (Double Comic) - Unsafe vehicles, hills, and philosophy go hand in hand.
  • 410 - Math Paper - That's nothing. I once lost my genetics, rocketry, and stripping licenses in a single incident.
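Each of those entries comes straight from one of those per-comic JSON documents. Here is a minimal sketch of pulling a single comic's metadata with the same urllib2 approach the full script uses; comic 614 is just an arbitrary example, and num, title, img, and alt are the fields the script below relies on:

import json
import urllib2

# Fetch the metadata for one comic (614 is only an example number).
info = json.load(urllib2.urlopen("http://xkcd.com/614/info.0.json"))
print(info['num'])    # the comic number
print(info['title'])  # the comic title
print(info['img'])    # direct URL to the comic image
print(info['alt'])    # the alt/hover text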

The other output just prints a list of image URLs to stdout, which you can write to a file or pipe to wget.  To use this mode, invoke the program with the -w option.
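If you would rather skip wget entirely, here is a rough sketch of saving each image directly from Python. The save_image helper and the xkcd/ output directory are my own assumptions for illustration, not part of the script below:

import os
import urllib2

def save_image(url, directory="xkcd"):
    # Write the image bytes to xkcd/<original filename>.
    if not os.path.isdir(directory):
        os.makedirs(directory)
    filename = os.path.join(directory, url.split("/")[-1])
    with open(filename, "wb") as out:
        out.write(urllib2.urlopen(url).read())

# e.g. a URL taken from a comic's 'img' field (this one is just an example)
save_image("http://imgs.xkcd.com/comics/woodpecker.png")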

One caveat: the script only fetches comics up to (but not including) the second number in the range() call, so bump it before running. This script is licensed under an Apache/GPL/MIT/other OSI-approved license; just be sure to attribute me, thanks :) Remember that XKCD comics are CC BY-NC, so you are free to copy them to your heart's content.

#!/usr/bin/env python
# Copyright 2011 Joseph Lewis <joehms22 gmail com>
# MIT License/GPL 2+ License/Apache License

import json
import urllib2
import time
import sys

# Print bare image URLs (for wget) if -w is given, otherwise an HTML list.
WGET = False
try:
    if sys.argv[1] == "-w":
        WGET = True
except IndexError:
    pass

for i in range(1, 850):
    if i == 404:  # Comic 404 doesn't exist; xkcd.com serves an error 404 page.
        continue

    url = "http://xkcd.com/%i/info.0.json" % i
    comic_json = urllib2.urlopen(url)
    time.sleep(.2)  # Be polite to the server.
    comic = json.loads(comic_json.read())

    number = comic['num']
    title = comic['title']
    image = comic['img']
    alt = comic['alt']

    if WGET:  # Just list image URLs for wget :)
        print(image)
    else:
        print("<li><a href='%s'>%s - %s</a> - %s</li>\n" % (image, number, title, alt))
