Wednesday, December 28, 2011

UNIX Utilities: Echo

This is the first post of what I hope to be many describing UNIX command line utilities, first up is the "echo" command:

/**
* A copy of the "echo" command on UNIX, missing a few options, but it
* does what is normally asked of it.
*
* Copyright 2011-12-23 Joseph Lewis <joehms22 gmail com>
*/
#include <iostream>

using namespace std;

int main(int nargs, char* vargs[])
{
for(int i = 0; i < nargs - 1; i++)
{
if(i != 0)
cout << " ";
cout << vargs[i + 1];
}
cout << "\n";
}

Monday, December 26, 2011

Speed up SSH and FTP logins

Scenario: You're working in the office, and a foot of snow comes down out side, killing your Internet connection, all of a sudden, your FTP server on the local network starts timing out, and SSHing in to the server is unbearably slow to get to the password prompt.

What happened is that FTP and SSH services do a reverse DNS query to ensure that hosts actually are who they say they are, when your connection went out, they were no longer able to do this, but keep trying anyway, slowing things down to the point of frustration or timeout.

The fix is simple, but makes your connections a little less secure, as this check won't be performed on all connections, not a big deal if your boxes only face internal networks.

First, (assuming you're using openssh) fix SSH:

echo "UseDNS no" >> /etc/ssh/sshd_config

Next (assuming your're using pure-ftpd) fix FTP:

echo 'yes' > /etc/pure-ftpd/conf/DontResolve

Restart both services through /etc/init.d or upstart and you're set!

Saturday, December 17, 2011

Watching A Directory For Changes

This is a very simple VALA script that watches a directory for changes (not including sub-directories) then prints out the changed files.

Usage: changewatch /path/to/folder/to/watch


/**
* Watches a directory for changes:
*
* compile: valac changeWatch.vala --pkg gio-2.0
* Copyright 2011-12-09 Joseph Lewis
* Apache License
*/

using GLib;

void on_change (File f) {
print(f.get_path() + "\n");
}

void main (string[] argv) {

if(argv[1] == null)
{
print("Usage: "+argv[0]+" /path/to/file/to/watch\n");
return;
}

GLib.File fp = File.new_for_path(argv[1]);

GLib.FileMonitor mon1;

try {
mon1 = fp.monitor_directory(
GLib.FileMonitorFlags.NONE
);
mon1.changed.connect(on_change);
print("Monitoring: "+fp.get_path()+"\n");
} catch (GLib.Error e) {
print("Error: "+e.message+"\n");
}
GLib.MainLoop loop = new GLib.MainLoop();
loop.run();
}

Friday, December 2, 2011

Choosing Your Next Webserver

Being an occasional web developer, I like to be able to debug on my local machine. My old friend Apache just wasn't doing it anymore however, because it made my computer boot incredibly slow, I didn't need the power, and the configuration was giving me a headache.

I went in to Synaptic Package Manager and pulled out three others (nginx which I had heard good things about, Cherokee which seemed to have a bunch of packages, and nanoweb a PHP based server that I had never heard of).

Apache
I used gnome-system-monitor for profiling each of the programs as they were running, Apache had 3 threads/child processes running at 7Mb and 4 at 5Mb. That, along with a hefty and often confusing configuration is why I dropped it.

The amount of RAM to beat then was 41 Mb.


Nginx (Engine X)
I first tried a server that I had been dying to give a go for quite some time, Nginx, it runs popular sites such as http://dearblankpleaseblank.com and according to its online site excels at serving static pages.

The configuration didn't really seem like anything I had dealt with before, it was kind of like JSON, kind of like INI and took quite a bit to understand, the online manual wasn't much help and the examples in the file were minimal.

I wasn't able to get PHP running with it even after following a few tutorials. I'm sure you can, but I decided to move on.

The rest of the directory structure was kind of like Apache, with the same sites_available folders and such. However the default site was in /usr/ somewhere rather than /var/www, bizarre.

While it was running it had 5 threads/child processes running at about 2Mb a piece, 10Mb, not bad.


nanoweb
Next up I installed "nanoweb" a simple webserver written in PHP5. After rebooting I found that the default page provided some help on getting things configured, and there was some kind of configuration library that was available as a secondary package, although this would have also been available outside of localhost, and therefore exposed your server to threats.

I took a look at the configuration files and found three, much nicer than Apache. One seemed to be for CGI, I didn't have a reason to look at that, but I was able to see it had support for vhosts (in a very simple ini style file, it looked like each vhost would have taken something like five lines to set up).

I configured my site through the main file replacing all of the default "It Works" style site's values with mine. A reboot worked flawlessly and I was able to start working on my site right away. The file had lots of documentation and tips for configuration.

Using my profiling tool I saw that nanoweb used 3 threds/subprocesses with an average of 7Mb overhead each, undoubtedly due to PHP being a scripted language. 21Mb, not as good as nginx, but not as bad as Apache.

Cherokee
I didn't want to give up the nice nanoweb configuration I had, but I had read some things about it and I wanted to try it.

The install went really fast, unlike nanoweb that took about 20Mb of packages Cherokee must have taken more like four. Probably due to it's base requirements being no more than the C standard library.

I found that to configure you had to run a utility called "cherokee-admin" once run it gives you a username, temporary password, and opens a server on port 9090 until you close the admin interface.

Once logged in the GUI is amazingly nice, giving a nice profile of your machine, easy access to all virtual hosts, and wizards to set up most things. Enabling PHP on my site took four clicks, and it seems that most things are configured for security. Even applying the changes was nearly instantaneous as the web-server can do a restart from the interface.

While running the whole thing has two threads of 2Mb each, 4Mb.

Wednesday, November 23, 2011

Database Password Hashing

Okay, so you want to make sure your database is secure, no matter what your front-end is? Here is a simple script that does exactly that, hashes passwords in the tables as they go in with a salt and provides an easy way to check logins against the user table. If you want, you can even hash user-names too, that way even if your database is completely stolen it isn't insecure:


-- Users is a table, password hash, up to 100 chars, currently used is
-- sha1 because it is nativly supported in PHP.
CREATE TABLE `Users` (
`usr_id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
`username` VARCHAR(20) NOT NULL,
`usr_password` VARCHAR(100) NOT NULL
) ENGINE = INNODB;

-- Yes, secure things as they are added, using the username as a salt.
CREATE TRIGGER user_input BEFORE INSERT ON `Users` FOR EACH ROW SET NEW.usr_password = SHA1(CONCAT(NEW.usr_password, NEW.username));

CREATE TRIGGER user_update BEFORE UPDATE ON `Users` FOR EACH ROW SET NEW.usr_password = SHA1(CONCAT(NEW.usr_password, NEW.username));

-- Checks if a user with the given username and password exists in the
-- db, returns null if no and the ID if true.
delimiter |
CREATE FUNCTION CHECK_USER (usrname VARCHAR(20), password VARCHAR(1000))
RETURNS INT DETERMINISTIC
BEGIN
DECLARE tmp INT;
SELECT usr_id INTO tmp FROM Users WHERE username = usrname AND usr_password = SHA1(CONCAT(password,usrname));
RETURN tmp;
END|
delimiter ;

A small sample of the program in action:

    mysql> select * from Users;  
Empty set (0.00 sec)

mysql> INSERT INTO Users (username, usr_password) VALUES ("Cookie", "Monster"), ("Big", "Bird");
Query OK, 2 rows affected (0.07 sec)
Records: 2 Duplicates: 0 Warnings: 0

mysql> select * from Users;
+--------+----------+------------------------------------------+
| usr_id | username | usr_password |
+--------+----------+------------------------------------------+
| 3 | Cookie | 00c66ad8335364be46f67da3699be142189b2aa9 |
| 4 | Big | 4e49d9c043ee8733f6019cc777d65421721333c7 |
+--------+----------+------------------------------------------+
2 rows in set (0.00 sec)

mysql> SELECT CHECK_USER("Big", "Bird");
+---------------------------+
| CHECK_USER("Big", "Bird") |
+---------------------------+
| 4 |
+---------------------------+
1 row in set (0.01 sec)

mysql> SELECT CHECK_USER("Oscar", "Grouch");
+-------------------------------+
| CHECK_USER("Oscar", "Grouch") |
+-------------------------------+
| NULL |
+-------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> SELECT CHECK_USER("Big", "Baby");
+---------------------------+
| CHECK_USER("Big", "Baby") |
+---------------------------+
| NULL |
+---------------------------+
1 row in set, 1 warning (0.00 sec)

mysql>

Note that while all hashing may be done in the DB, it is still recommended you do it server side/client side too, as passwords may end up in SQL log files if not hashed beforehand. Everything will still work even if a password is hashed multiple times.

Saturday, October 29, 2011

.exe file String extractor.

I recently had the need to quickly and dirtily find some strings from a Windows .exe file on Linux.

Going through with HexEditor was a pain in the arse. There seemed to be a few programs for this on windows, which would work messily under Wine. It was easier to write my own.

The Rules:
  • Strings must be ASCII.
  • Strings must be greater than 4 characters.
  • Strings must be at least 50% different characters. (i.e. www.com.com fails, while www.google.com succeeds)
This still leaves you with a lot of junk, but at least you can easily see what is junk and not.

#!/usr/bin/env python
'''A program that extracts strings from binary files.

Usage: reader.py path/to/file
'''

import sys

numstrs = 0
with open(sys.argv[1]) as j:
j = j.read()
mystr = ""
for i in range(0,len(j)):
if 31 < ord(j[i]) < 127:
mystr += j[i]
else:
# If the string isn't that long discard it.
if len(mystr) > 4:
uniqs = set()
for char in mystr:
uniqs.add(char)

# If duplicate chars are less than half the string
if len(uniqs) > .5 * len(mystr):
print mystr
numstrs += 1

mystr = ""

print "==========\n\nOutput: %s strings" % numstrs

Some of you may be thinking, why do you care? Well, this is a great way to quickly see the human interactions going on within a binary: what libraries are being loaded, what websites are being accessed, what programs are being called, or even give you cheats by allowing you to look at some dialogs not normally shown.

Sample: SimAnt
Extended sections of junk have been replaced by "...".


...
Copyright (C)1990, Daniel Goldman
...
96|yr
y, <Xw
<st'<ntQ<pt`<Et
...
eov0001:
Cannot find overlay file "
eov0002:
eov0003:
eov0007:
Commit error - section
eov0009:
eov0004:
-- use RELOAD to increase size
...
Expanded Memory
Extended Memory
Conventional Memory
eca0001:
.RTLink CACHE -
Handling
Function code
, Error code
RTVMEXP
RTVMEXT
RTVMCONV
RTOVEXP
RTOVEXT
RTOVCONV
eov0010:
evm0019:
Environment syntax error --
(last char is erroneous)
Fatal Error $
Press any key to terminate.$PQWV
...
pError!!
pNOTE: SimAnt
runs radically
p faster in 16-color mode!
pPlease refer to the
pSimAnt
manual.
pSave Game As:
pThe name 'SimAnt
' is reserved
pfor this application.
...
"'*'%"
pSimAnt
Saved Game
how to edit/create one.
HIGHLIGHTED
CHITIN
MAXILLAE
OCELLI
INFRABUCCAL
MIDGUT
TROPHALLAXIS
CASTE
LARVAE
PUPATE
CASTES
PUPAE
ECLOSE
CALLOWS
PHEROMONES
BROOD
HONEYDEW
APHIDS
BREEDERS
RECRUIT
FORAGE
FORAGING
QUEENCHAMBER
NURSERIES
MS Run-Time Library - Copyright (c) 1990, Microsoft Corp
...
DACsample
FREELIST > 38
winheaders
Sound Driver reports driver: %d
%d DAC channels
%d FM channels
%d TANDY PSG channels
%d PS1 PSG channels
%d COVOX PSG channels
and %d MIDI channels
MUSIC INIT!
UPDATEEVERYTHING!
WAIT YES
result=%d
Yardmode=%d, newMode=%d
i=%d, j=%x
= %x
EMS version: %x
total pages: %d
free pages: %d
TILESXXX
emstiles
Tile >= 256 in PutLifeAndTile
Tile >= 256 in PutLifeAndTile
PutTile grndTile > 0x100
Y>MAXAY
SimAnt
Vis message %s at %d,%d
balbuf
Cannot GetResource (HEX, %d), ResErr %d
MemDeath
Warning: Can't load strings: %d
strptrs
NewPtr argument too large
%-ld%%
BAD SWITCH
USAGE: SimAnt [/d{EeTHMm2V}] [/s{NABCI}]
SimAnt configuration file missing
simant.cfg
install.exe
DOS Error %d: %s
sound
SimAnt cancelled.
BAD font in MakeBaloon
balloon
@tdyballoon
clrbln
Balloon Size=%d, %d
ePtr->mouse_flags=%x, MRPRESSED=%x
Map window image
Generated map window
ITEM=%x, startTool=%x
tileNum=%d
ANTMENU
L: item=%x, openSub=%x
E=%x, openSub=%x
Last sub state = %x
X: item=%x, openSub=%x
~zwsolhea^[
<@DHLPTX\`dhlptx|
[^aehloswz~
|xtplhd`\XTPLHD@<
TPLHD@<84.,
,.48<@EINRV
<@DIII]c
]IID@
CHEAT %d
Load Game
game not loaded
CITYMCRP
Save Game
lastFileName==%s
OVERWRITE
TOTALLEN=%u
COPYING DATA INTO BUFFER!
WRITING
Saved correctly
File list
Cannot read drive %c
Can't read disk
Aborting
Cannot read drive %c
<PATH=%s>
%s*.*
*.ant
%-12s
0No Files
%d:%s
-> %s
Event=%x,%x
DEFAULT:Event=%x,%x
%c:%s
CHDIR(%s)
Filename:%s
PathName=%s, iniPath=%s
:;,.=+-_\/*
Unpause
Pause
FREE+SOFT AT INITMAP=%ld
Generated buffer window
Map window image
Generated map window
Draw map - yard
YMAXCOUNT =%d!!!!!
GOT Scenario %x
StrnID=%d
Pict=%d
line %d:%s
PictStrnDialog: %d
MePLane=%d
Malloc
()*+,-./0
1234567
EVENT=%x
FLUSH
User Request
Keypress=%x
COMMAND KEY:%c
%ld bytes @ startup
%ld bytes free now
%ld bytes discardable stuff
%ld AVAILABLE
KEYEVENT=%x, %x
ePtr->mouse_flags=%x
MapPlane=%d
len = %d
%d(%d)(%d): [%s]
ERROR: Cannot file text rez(%s)
DisplayCard(cardRez)(%d)
Card Resource not found.
INFO TEXT REC: %d
Unknown Resource.
AnimYellowInsane
MemPunt @%s after %s
RALLOCXX
DISCARDED
SYSTEM
Free illegal type
ralloc.dmp
Ralloc dump at %s, %s
By handles:
By location:
%p->%p:
size=%x, %ld, type=%c%s
age=%5ld name=%s
DiscardEntry
ALL MEMORY HANDLES USED
OLDESTH=%p
<SPLIT>
<WHOLE>
Cannot find a big enough space! (%ld bytes)
<FO2>
ALLOC 0 BYTES
NOEMS - memPtr=%p
RL5: Invalid handle index
LOCKED FREED %s
RL5: Invalid handle index
REALLOC 0 BYTES
{DIS}
Lock depth exceeded 100
Locked discarded!!
RU: Bad handle in unlock
RL5: Invalid handle index
malloc
_ffree: Could not find memory ptr
_frealloc: Could not find memory ptr
EMS Error
%s.ndx
Index file missing
WINLABEL
BITMAP
CSRMASK
CSRPIC
ANIMDLT
SCREEN
PALETTE
CARDTITLE
STYLE
U%s,%d
%d:%d
DBRecall Unpack error!!! - object=%d, type=%d
Attempt to ADD during a READ-ONLY run
Attempt to DELETE during a READ-ONLY run
Attempt to PACK during a READ-ONLY run
DBEMSXXX
DBEMSLIST
Handle mismatch
Database error
Out of handles.
%s.dat
Cannot create data file.
for more information.
Dos error: %d: %s
%s.dat
Cannot open database %s
Purge attempt with database closed
Release %d not in cache!
Unhook attempt with database closed
cachetable
no memory over 0 in age!
MouseHide < 0
HPKMGOIQLRS9;
HotBox overflow
Continue
Continue
%s: Are you sure?
Cancel
Retry
Cancel
Unknown unit
Drive not ready
Unknown command
Data error (bad CRC)
Bad request structure length
seek error
unknown media type
sector not found
printer out of paper
write fault
read fault
general failure
abort request
GSaveRect
Color picture in mono file
PutPackedBuf
GPutPacked
PutPackedBuf
WARNING: i>=MAXWINDOWS-1 in KillWin(%d)
SetWin: Window open but clip is NULL
ERROR MEMNULL winClipHandle
tmprects
clipout
winClipList
C097: Clip overflow %d
C098: Clip overflow %d
winClipList
C099: Clip overflow %d
subinclude
CL074:Temp clip overflow in SubInclude
Sub include NULL rect!!
Clip out of memory!!!
subinclude
CL074:Temp clip overflow in SubInclude
Sub include NULL rect!!
Clip out of memory!!!
subexclude
CL074:Temp clip overflow in SubExclude
Sub exclude NULL rect!!
Clip out of memory!!!
include
CL174:Temp clip overflow in SubExclude
include NULL rect!!
clip_Push
Cannot allocate memory in clip_Push
Error CL98463: Pushed it too far
Clip_Pop w/ nothing on the stack: E9073
Could not find fonts
Could not find fonts!
Font loaded!
Could not find fonts
Could not find fonts!
FATAL: %s
%%-%ds
%%c%%-%ds
Menu data too long
MSMOUSE
language.dat
language
shared
lrshare
Cannot load menu
bmcmBad 'Display Mode' in configuration file
Hires EGA
Tandy
Hercules
Lores EGA
Mono EGA
MCGA/VGA Color
MCGA/VGA mono
VGA Color
hcega
tdyga
lcega
hcega
MDA system detected. Cannot run graphics.
Couldn't load resource %x,%x
CANNOT LOAD WINDOW %03x
Cannot load resource
please try another
Could not load purge list
!!W:%x
QQW:%x
##W:%x
**W:%x
w:%x, i:%x
xxw:%x, i:%x
zzw:%x, i:%x
ppw:%x, i:%x
%%w:%x, i:%x
!!i:%x
@@i:%x
formatStr
formatStr
Illegal win num %x at lock
win_Lock > 10 levels deep!!
Attemp to unlock when discarded win %x
Bad object type in win_DrawElevator
eleTop= %d
PrevListLine punt
SS=%d, line=%d
Bad object type in win_ProcSliderEvent
fontBuf
fontBuf
fontBuf
font1
font2
font3
font4
WINDOW %x NOT LOCKED DURING CALL TO win_WinRectAddr!!
WINDOW %x NOT LOCKED DURING CALL TO win_ObjAddr!!
GGetPic - trans @ %d, %d (%d, %d), bytes=%u
GPutPic size AA %d, %d, tag=%x
GPutPic size %d, %d, tag=%x
Couldn't find resize icoN!!
FontHeader
FONTIMAGE
locTable
owTable
Printset @ %s %d objects
Buffer in set too small!!
animbufs
anim_Remove objects not found
anim_hide object not found
anim_hide object not found
anim_Remove objects not found
animHandle
animobjs
to flush
Bad pic type in ANIM.C
...
music failure
Seg=%x, buf=%p, handle=%p
_C_FILE_INFO=
0PX
000WP
(null)
USunMonTueWedThuFriSat
JanFebMarAprMayJunJulAugSepOctNovDec
Error 0
No such file or directory
Arg list too long
Exec format error
Bad file number
Not enough core
Permission denied
File exists
Cross-device link
Invalid argument
Too many open files
No space left on device
Math argument
Result too large
Unknown error
...
P<<NMSG>>
R6000
- stack overflow
R6003
- integer divide by 0
R6009
- not enough space for environment
run-time error
R6002
R6001
- null pointer assignment
...
06/13/90
==========

Output: 828 strings

It is interesting looking at some of the common problems that plagued early computer programs.

Wednesday, October 26, 2011

Sleep Sort

A kind of neat sorting algorithm if, for some reason, you had a machine that had unlimited interrupts, and unlimited running time, but had almost no CPU power to do other sorts, perhaps biological computing?


 1
2
3
4
5
6
7
8
9
10
11
#!/bin/bash
function f() {
sleep "$1"
echo "$1"
}
while [ -n "$1" ]
do
f "$1" &
shift
done
wait

Wednesday, October 19, 2011

Real Time Header Editing with Paros

Sometimes it is advantageous to modify headers as they are sent by a website, let's say you're looking for SQL injection vulnerabilities on your server or want to see what happens when non validated information escapes your javascript validation.

But what to use to easily view/modify/drop headers and incoming connections?

Introducing Paros, a web proxy written entirely in Java so it is cross platform, just set your browser or operating system to use 127.0.0.1:8080 as a proxy after firing it up. Basic functionality shown after the break.





Paros has many advanced features, I'll just discuss trapping requests and responses and using the history.


When you begin using Paros, any websites you visit will be left in the sites pane on the left hand side, from there you can click on each requested item for a site, and view the reqest and response for that item in the appropriate tabs.

If you want to modify the request or response, head over to the Trap tab, and check the appropriate boxes. If there is POST information sent along with the HTTP request, then it will be displayed below; I changed the view to "tabular" to more easily see/edit the data.

Aside: In this instance my browser is pointed at a site called conjuguemos.com teachers us it to test their students' ability to conjugate verbs, when students are done, unencrypted javascript headers are sent containing the session information, not very bright in my book, as any student could quickly and easily modify their scores to have 10,000 words right in thirty-eight seconds.

Once a header/response is trapped you may edit it by hand in the panel given, or drop it (say you wanted to test a timeout for your web widget, like a Twitter reader if Twitter were down).

Sessions can also be saved and restored through the File menu. Overall the tool is easy to use, fast to learn, and very effective.

Sunday, October 16, 2011

Ubuntu 11.10

Whenever a new OS comes out, there are always improvements, but lots of compromises. After trying out Ubuntu 11.10 I'm, quite frankly, disappointed.

The Good
The upgrade was painless, the boot time is much improved, and the lightDM looks beautiful.

The Bad
Grub has failed on me four times in two days, I have to hit the power button and start again to get a non-blinky cursor.

In two days of use, my mouse cursor has frozen on screen twice, I can't move it afterwards.

The new software center only allows you to install one piece of software before searching for another.

Unity (still) doesn't allow me to drag and drop shortcuts to my folders to the dock.

The Ugly
Nautilus (if you still are nautilus beneath your new exterior) doesn't show breadcrumbs unless in full screen.

The menu in the upper right of the screen is excessivly large for people like me that never use the person switcher, a desktop email client, or social networks outside our browsers.

You can't uninstall zeitgeist without completely removing the ability to launch applications.

After installing compiz-config-settings manager to try and resize the dock to be smaller, and to show up immediately when I mouse over to it (so it doesn't destroy my workflow) I couldn't, compiz also disabled by alt+tab, and my Aero snap.

With the newest gnome desktop build (ubuntu-classic) the desktop looks like crap, lots of vertical lines, and no configuration for how the layout is done (i.e. I want to remove the bottom panel and install docky without going to gconf, that would be fine).

How to fix it
Install xfce or KDE (yes, even I am willing to switch to KDE after

Friday, September 30, 2011

Conway's Game of Life

Conway's Game of Life is a classic computer game invented to simulate a self-replicating machine on a very simple grid. I decided to write a simple implementation, it is after the break.

Just as in life, simple structures form to generate complex behaviours in these simple systems that can be generated very quickly on modern computers.

The unofficial open source icon is even inspired from the game:

The "Glider" a structure that "flies" around in the game.



The program in action.




This is a simple implementation of the Game of Life:

#!/usr/bin/env python
''' Conway's game of life in python.

Copyright 2011 Joseph Lewis MIT License

1. Any live cell with fewer than two live neighbours dies.
2. Any live cell with two or three live neighbours lives.
3. Any live cell with more than three live neighbours dies.
4. Any dead cell with exactly three live neighbours becomes alive.
-- Wikipedia.
'''

import random
import time


ROWSCOLS = None # Number of rows and columns in the game.
grid = None # A grid with 0s and 1s, 0 is no cell, 1 is a cell.

def printgrid():
'''Prints out the given grid after blanking the screen.'''

print 10 * "\n"
print "+"+ "-" * ROWSCOLS + "+" # Border top

for r in grid:
line = "|" # Border Left
for c in r:
if c:
line += "0" # Cell
else:
line += " "
print line + "|" # Border Right

print "+"+ "-" * ROWSCOLS + "+" # Border Bottom


def do_life():
'''calculates the next grid'''

grid_len = range(0,ROWSCOLS)
grid2 = [[0 for i in grid_len] for i in grid_len]

for r in grid_len:
for c in grid_len:
if grid[r][c]:
increment(grid2, r, c)

for r in grid_len:
for c in grid_len:
g = grid2[r][c]
if g < 2 or g > 3:
grid[r][c] = 0
elif g == 3:
grid[r][c] = 1


def increment(grid, row, col):
'''Increments the life count for all cells around this one.'''
for r,c in [(row, col+1), (row, col-1),
(row+1, col), (row-1, col),
(row+1, col+1), (row-1, col-1),
(row+1, col-1), (row-1, col+1)]:
grid[r % ROWSCOLS][c % ROWSCOLS] += 1


def start_life():
'''Randomly fills the game.'''
for i in range(0, ROWSCOLS**2 / 4):
r = random.randint(0, ROWSCOLS-1)
c = random.randint(0, ROWSCOLS-1)
grid[r][c] = 1


if __name__ == "__main__":
try:
print("=Conways Game of Life=")
k = int(input("How many turns should I run before I reset (200 is nice)? "))
ROWSCOLS = int(input("How many rows and columns of life should there be (35 is nice)? "))
t = float(input("How many frames per secould should I run (10 is nice)? "))

except Exception:
print("Hey! Play nice, enter a number!")
exit(1)

grid = [[0 for i in range(0, ROWSCOLS)] for i in range(0, ROWSCOLS)]

j = 0
while True:
if not j:
start_life()
j = k # Number of turns to run before restart.
j -= 1
do_life()
printgrid()
time.sleep(1/t)

Tuesday, September 27, 2011

Automatic Summary of a Document

It is the primary dogma of computer science that computer scientists are lazy. That is why they work with tiny stupid slaves all day, forcing them to do their bidding. Another strange and seemingly unrelated fact is that reading documents and providing summaries is mind numbingly boring.

Some years ago the folks at Microsoft made a little addition to their word processor that would allow it to automatically generate summaries. This feature is little used and little needed. However it is still fun, and could have the potential to provide some nice summaries to sites like Manybooks.net which would definitely benefit from some documents of theirs (especially lesser known books) being summarized.

Thinking about having your computer generate nice summaries and actually having it do them are two entirely different things it turns out. However in more structured documents, such as non-fiction some simple math can work out the problem of which sentences are important and which are not.

Before we can worry about that, let's first find the important words in a document. One of the better ways, is simply counting every occurrence of every word, then removing fairly common words; I removed the top 100 in the example code given by setting their scores to -1, although different values can be tried to attempt to prove a sentence better or not. For example the sentence, "Call me Ishmael." would likely score rather low, though it is one of the most known sentences of all time. If the word "me" were set to a higher value (as it is in the top 100) the sentence may be included in a summary. 

Sentences can then be split up at punctuation marks (. ! ?) and a total for the sentence can be generated by adding all of the word scores and dividing by the number of words in the sentence, thus creating an average word score for the sentence.

Once all sentences have been scored, the top ones are taken out, and put in order again, thus preserving some of the structural integrity of the original document, then can be returned.

In a few of my tests, this was not enough however. I also added a multiplier that took in to account the average length of words in the sentence; as either main points are generally made well, or supporting evidence doesn't get its credit.
Talk is cheap, show me the code!

#!/usr/bin/env python
# -*- coding: utf-8 -*-
'''Provides An Automated Summary of a document.

Copyright 2011 Joseph Lewis

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
* Neither the name of the nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
'''

import sys
import re
import argparse

__author__ = "Joseph Lewis"
__copyright__ = "Copyright 2011, Joseph Lewis"
__license__ = "BSD"

TOP_HUNDRED_SCORE = 0
EN_TOP = '''the be to of and a in that have i it for not on with he as you do at
this but his by from they we say her she or an will my one all would
there their what so up out if about who get which go me when make can
like time no just him know take person into year your good some could
them see other than then now look only come its over think also back
after use two how our work first well way even new want because any
these give day most us'''


text = """Fourscore and seven years ago our fathers brought forth upon this continent a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battlefield of that war. We have come to dedicate a portion of that field as a final resting-place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this. But in a larger sense we cannot dedicate, we cannot consecrate, we cannot hallow this ground. The brave men, living and dead, who struggled here, have consecrated it far above our poor power to add or detract. The world will little note, nor long remember, what we say here. But it can never forget what they did here. It is for us, the living, rather to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us, that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion; that we here highly resolve that these dead shall not have died in vain; that this nation, under God, shall have a new birth of freedom, and that government of the people, by the people, and for the people, shall not perish from the earth. """


def chars_per_word(sentence):
'''Returns the average characters per word.'''
return float(len(sentence)) / float(len(sentence.split(' ')))

def clean_text(text,leavepunct=False):
# Strip to a-z A-Z (TODO: I18N).
text = text.replace("\n", " ")
if leavepunct:
return re.sub("[^a-z.!\? ]", " ", text.strip().lower())
return re.sub("[^a-z ]", " ", text.strip().lower())

def word_frequency(text):
'''Counts the frequenc of words in the piece of text, and returns
a dict for each word (a-z) lowercased where the key represents the
word and the number of times the word is found the value.

'''
words = {}


tmp = clean_text(text)

# Cut to words.
for word in tmp.split():
if word in words:
words[word] += 1
else:
words[word] = 1

return words

def set_words_to_value(word_dict, top=EN_TOP, value=TOP_HUNDRED_SCORE):
'''Sets the given words to the given value in the given word_dict.
The default is to set the top hundred words in english to the
TOP_HUNDRED_SCORE.

'''

j = word_frequency(top).keys() # get the words in the top hundred text.
# remove the top 100 words
for w in j:
words[w] = value

def sentences_in(text):
'''Returns a list of sentences in the text.

'''
text = text.replace("\n", " ")
return re.split('[\?.!]', text)


def score_sentence(sentence, words):
'''The scoring function, given a dictoinary of word:value pairs,
creates a score for each sentence.

'''
# Score value based upon words and frequencies of those words.
tmp = clean_text(sentence)
total = 0

for word in tmp.split():
if word in words:
total += words[word]

# Make the total in to a percentage.
try:
total /= float(len(tmp.split()))

# Secret ingredient, higher characters per word generally means
# more important sentence.
total *= chars_per_word(tmp)

return total
except ZeroDivisionError:
return -100


def top_sentences(text, word_freq_dict):
'''Returns a sorted list with the top rated sentences first, that
contains the tuples (score, sentence_location, sentence_text)

For example, the sentence "Call me Ishmael" would come back:
(1.8304283, 0, "Call me Ishmael.")

0 would mean it was the first sentence in the text, and it had a
score of 1.83...

'''
sentences = [] # array of tuples (total score, sentence num, sentence text)
known_sentences = set()
currs = 0
for s in sentences_in(text):
currs += 1 # Increment the current sentence.
total = score_sentence(s, words) # Don't add duplicate sentences.

s = s.strip()
if s not in known_sentences:
sentences.append((total, currs, s+"."))
known_sentences.add(s)

# Sort highest rated sentences to lowest.
sentences = sorted(sentences)
sentences.reverse()

return sentences


def __combine_summary(summary):
'''Combines a summary in to a meaningful paragraph. A summary is an
array of tuples with values of (position of sentence, text). This
creates better summary paragraphs by ordering important sentences
in the order they were originally.

'''
summary = sorted(summary)

paragraph = ""
for l, s in summary:
paragraph += s + " "

return paragraph

def percentage_top(percentage, sorted_sentences):
'''Returns the top rated percentage of the given sentences, in
paragraph form.

i.e. to get the top 25 percent of a text, call with:
percentage_top(25, )

'''

percentage = percentage / 100.0

num_sentences = int(len(sorted_sentences)*percentage) + 1

if num_sentences >= len(sorted_sentences):
num_sentences = len(sorted_sentences)

# Create summary (top x sentences)
summary = []
for j in range(num_sentences):
t,l,s = sorted_sentences[j]
summary.append((l,s))


return __combine_summary(summary)



def top_words(num_words, sorted_sentences):
'''Returns x number of words from the top sentences.'''

summary = []
words = 0
try:
for t,l,s in sorted_sentences:
if words >= num_words:
break

words += len(s.split())
summary.append((l,s))


except IndexError:
pass

return __combine_summary(summary)


if __name__ == "__main__":

parser = argparse.ArgumentParser(description='Creates summaries from documents by magic.')
parser.add_argument('-p','--percent', type=int, default=None,
help='the percent of the original document to give as a summary')
parser.add_argument('-w', '--words', type=int, default=None,
help='the number of words to output as a summary')

parser.add_argument('PATH', help='the path of the textfile to read from')

args = parser.parse_args()

try:
if args.PATH:
text = open(args.PATH).read()
words = word_frequency(text)

set_words_to_value(words)
sentences = top_sentences(text, words)

if args.words:
print top_words(args.words, sentences)
if args.percent:
if args.words:
print "\n\n"
print percentage_top(args.percent, sentences)

except IOError:
print "File given can't be found."

Monday, September 19, 2011

XLSX Parsing in Python

A while ago I had to throw together a program that read from .xslx files. Knowing essentially nothing about them, other than the fact that they were xml files in a zip, I was able to hack together something that worked kind of like the default CSV reader in Python. For every row in the file, it returns an array of cell contents.

In this case only Text and Numeric cells will render properly, formulas and such won't work. Also, the only sheet read is sheet1.

Okay, a little about the file structure of xlsx: Sheet info is stored in xl/worksheets/sheetX.xml, which looks essentially like this:
<sheetdata>
<row collapsed="false" customformat="false" customheight="false" hidden="false" ht="12.8" outlinelevel="0" r="1">
<c r="A1" s="0" t="s"><v>0</v></c>
<c r="B1" s="0" t="n"><v>123</v></c>
</row>
</sheetdata>

Cells (c tags) have several types (denoted by the t attribute) "s" is a string, and "n" is a number. The string's attribute is an index for a string in the xl/sharedStrings.xml file, which removes duplicate strings from the saved data.


''' Opens .xlsx files as if they were .csv files. '''
import zipfile
import xml.dom.minidom
import re

class XLSXReader:
rows=[]

def _nodeText(self, node):
return "".join(t.nodeValue for t in node.childNodes if t.nodeType == t.TEXT_NODE)

def _get_col_num(self, col):

strpart = col.attributes['r'].value
colnum = re.sub('[^A-Z]', '', strpart.upper().strip())

c = 0
for char in colnum:
c += ord(char)

c -= (65) # ASCII to number

print("Colnum for '%s' is %s" % (strpart, c))

return c


def __init__(self, filename):
shared_strings = []
self.rows = []
myFile = zipfile.ZipFile(filename)

# Read the shared strings file.
share = xml.dom.minidom.parseString(myFile.read('xl/sharedStrings.xml'))
j = share.getElementsByTagName("t")

for node in j:
shared_strings.append(self._nodeText(node))

sheet = xml.dom.minidom.parseString(myFile.read('xl/worksheets/sheet1.xml'))
sheetrows = sheet.getElementsByTagName("row")
for row in sheetrows:
cols = row.getElementsByTagName("c")

largest_col_num = 0
for col in cols:
colnum = self._get_col_num(col)
if colnum > largest_col_num:
largest_col_num = colnum

thiscol = ['']*(largest_col_num + 1)

for col in cols:
value = ""
try:
value = self._nodeText(col.getElementsByTagName('v')[0])
except IndexError:
continue

#Get col number (A=0, B=1, etc. up to AA)

colnum = self._get_col_num(col) # ASCII to number

try:
if col.attributes['t'].value == 's':
thiscol[colnum] = shared_strings[int(value)]
else:
thiscol[colnum] = value
except KeyError:
thiscol[colnum] = value
self.rows.append(thiscol)

myFile.close()

def __getitem__(self, i):
return self.rows[i]

To use the code, call the class with a file name, then use a for - each loop to iterate over the rows.

Update 2012-09-14: Mati noted that in some situations the xlsx format didn't have the proper attribute to indicate a number, and sent in a patch. Thanks!

Sunday, September 18, 2011

Fast Ubuntu Tweaks

Over the past few years I went from absolutely loving Ubuntu to liking it, to almost disliking it now. Yes, it is a great distribution; but the teams that make it seem to not be taking UNIX design philosophy seriously.


Ubuntu seems to be building up a pillar of their own custom software. I was disappointed when they removed the option to change the GDM greeter a few years ago. Since then they seem to be obsessed with cohesion by removing options and doing half-ass jobs on building interfaces. (Have you noticed that when you right the indicator applet panel, the gradient is duplicated all the way down the menu?)

Unity is an excellent example of this; sure the desktop needed an overhaul, but don't push untested beta version software to users that has essentially no built-in way to change its options. And let's not even start with the problems it gives me on with my Nvidia!

But talk is cheap, so I'll show you the code. Three lines and a few config options should fix things up:

sudo apt-get purge ubuntuone*
sudo apt-get remove indicator-me indicator-messages
sudo apt-get purge zeitgeist*

Those three lines remove the ubuntuone client (do you use it? is it worth the program running in the background and trying to phone home?) Removes the indicator applet (that thing with the mail button, and your social media accounts), and removes zeitgeist (who uses it, really?)

Next: edit your "Startup Applications", what is this, Windows? Few people want VNC to be started when their computer does, if they do, they'll enable it manually.

With all this hate out, I'll say that my next computer system will either be Elementary OS (cohesive, yet follows UNIX design) or Windows 8.