Friday, September 30, 2011

Conway's Game of Life

Conway's Game of Life is a classic computer game invented to simulate a self-replicating machine on a very simple grid. I decided to write a simple implementation, it is after the break.

Just as in life, simple structures form to generate complex behaviours in these simple systems that can be generated very quickly on modern computers.

The unofficial open source icon is even inspired from the game:

The "Glider" a structure that "flies" around in the game.



The program in action.




This is a simple implementation of the Game of Life:

#!/usr/bin/env python
''' Conway's game of life in python.

Copyright 2011 Joseph Lewis MIT License

1. Any live cell with fewer than two live neighbours dies.
2. Any live cell with two or three live neighbours lives.
3. Any live cell with more than three live neighbours dies.
4. Any dead cell with exactly three live neighbours becomes alive.
-- Wikipedia.
'''

import random
import time


ROWSCOLS = None # Number of rows and columns in the game.
grid = None # A grid with 0s and 1s, 0 is no cell, 1 is a cell.

def printgrid():
'''Prints out the given grid after blanking the screen.'''

print 10 * "\n"
print "+"+ "-" * ROWSCOLS + "+" # Border top

for r in grid:
line = "|" # Border Left
for c in r:
if c:
line += "0" # Cell
else:
line += " "
print line + "|" # Border Right

print "+"+ "-" * ROWSCOLS + "+" # Border Bottom


def do_life():
'''calculates the next grid'''

grid_len = range(0,ROWSCOLS)
grid2 = [[0 for i in grid_len] for i in grid_len]

for r in grid_len:
for c in grid_len:
if grid[r][c]:
increment(grid2, r, c)

for r in grid_len:
for c in grid_len:
g = grid2[r][c]
if g < 2 or g > 3:
grid[r][c] = 0
elif g == 3:
grid[r][c] = 1


def increment(grid, row, col):
'''Increments the life count for all cells around this one.'''
for r,c in [(row, col+1), (row, col-1),
(row+1, col), (row-1, col),
(row+1, col+1), (row-1, col-1),
(row+1, col-1), (row-1, col+1)]:
grid[r % ROWSCOLS][c % ROWSCOLS] += 1


def start_life():
'''Randomly fills the game.'''
for i in range(0, ROWSCOLS**2 / 4):
r = random.randint(0, ROWSCOLS-1)
c = random.randint(0, ROWSCOLS-1)
grid[r][c] = 1


if __name__ == "__main__":
try:
print("=Conways Game of Life=")
k = int(input("How many turns should I run before I reset (200 is nice)? "))
ROWSCOLS = int(input("How many rows and columns of life should there be (35 is nice)? "))
t = float(input("How many frames per secould should I run (10 is nice)? "))

except Exception:
print("Hey! Play nice, enter a number!")
exit(1)

grid = [[0 for i in range(0, ROWSCOLS)] for i in range(0, ROWSCOLS)]

j = 0
while True:
if not j:
start_life()
j = k # Number of turns to run before restart.
j -= 1
do_life()
printgrid()
time.sleep(1/t)

Tuesday, September 27, 2011

Automatic Summary of a Document

It is the primary dogma of computer science that computer scientists are lazy. That is why they work with tiny stupid slaves all day, forcing them to do their bidding. Another strange and seemingly unrelated fact is that reading documents and providing summaries is mind numbingly boring.

Some years ago the folks at Microsoft made a little addition to their word processor that would allow it to automatically generate summaries. This feature is little used and little needed. However it is still fun, and could have the potential to provide some nice summaries to sites like Manybooks.net which would definitely benefit from some documents of theirs (especially lesser known books) being summarized.

Thinking about having your computer generate nice summaries and actually having it do them are two entirely different things it turns out. However in more structured documents, such as non-fiction some simple math can work out the problem of which sentences are important and which are not.

Before we can worry about that, let's first find the important words in a document. One of the better ways, is simply counting every occurrence of every word, then removing fairly common words; I removed the top 100 in the example code given by setting their scores to -1, although different values can be tried to attempt to prove a sentence better or not. For example the sentence, "Call me Ishmael." would likely score rather low, though it is one of the most known sentences of all time. If the word "me" were set to a higher value (as it is in the top 100) the sentence may be included in a summary. 

Sentences can then be split up at punctuation marks (. ! ?) and a total for the sentence can be generated by adding all of the word scores and dividing by the number of words in the sentence, thus creating an average word score for the sentence.

Once all sentences have been scored, the top ones are taken out, and put in order again, thus preserving some of the structural integrity of the original document, then can be returned.

In a few of my tests, this was not enough however. I also added a multiplier that took in to account the average length of words in the sentence; as either main points are generally made well, or supporting evidence doesn't get its credit.
Talk is cheap, show me the code!

#!/usr/bin/env python
# -*- coding: utf-8 -*-
'''Provides An Automated Summary of a document.

Copyright 2011 Joseph Lewis

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
* Neither the name of the nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
'''

import sys
import re
import argparse

__author__ = "Joseph Lewis"
__copyright__ = "Copyright 2011, Joseph Lewis"
__license__ = "BSD"

TOP_HUNDRED_SCORE = 0
EN_TOP = '''the be to of and a in that have i it for not on with he as you do at
this but his by from they we say her she or an will my one all would
there their what so up out if about who get which go me when make can
like time no just him know take person into year your good some could
them see other than then now look only come its over think also back
after use two how our work first well way even new want because any
these give day most us'''


text = """Fourscore and seven years ago our fathers brought forth upon this continent a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battlefield of that war. We have come to dedicate a portion of that field as a final resting-place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this. But in a larger sense we cannot dedicate, we cannot consecrate, we cannot hallow this ground. The brave men, living and dead, who struggled here, have consecrated it far above our poor power to add or detract. The world will little note, nor long remember, what we say here. But it can never forget what they did here. It is for us, the living, rather to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us, that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion; that we here highly resolve that these dead shall not have died in vain; that this nation, under God, shall have a new birth of freedom, and that government of the people, by the people, and for the people, shall not perish from the earth. """


def chars_per_word(sentence):
'''Returns the average characters per word.'''
return float(len(sentence)) / float(len(sentence.split(' ')))

def clean_text(text,leavepunct=False):
# Strip to a-z A-Z (TODO: I18N).
text = text.replace("\n", " ")
if leavepunct:
return re.sub("[^a-z.!\? ]", " ", text.strip().lower())
return re.sub("[^a-z ]", " ", text.strip().lower())

def word_frequency(text):
'''Counts the frequenc of words in the piece of text, and returns
a dict for each word (a-z) lowercased where the key represents the
word and the number of times the word is found the value.

'''
words = {}


tmp = clean_text(text)

# Cut to words.
for word in tmp.split():
if word in words:
words[word] += 1
else:
words[word] = 1

return words

def set_words_to_value(word_dict, top=EN_TOP, value=TOP_HUNDRED_SCORE):
'''Sets the given words to the given value in the given word_dict.
The default is to set the top hundred words in english to the
TOP_HUNDRED_SCORE.

'''

j = word_frequency(top).keys() # get the words in the top hundred text.
# remove the top 100 words
for w in j:
words[w] = value

def sentences_in(text):
'''Returns a list of sentences in the text.

'''
text = text.replace("\n", " ")
return re.split('[\?.!]', text)


def score_sentence(sentence, words):
'''The scoring function, given a dictoinary of word:value pairs,
creates a score for each sentence.

'''
# Score value based upon words and frequencies of those words.
tmp = clean_text(sentence)
total = 0

for word in tmp.split():
if word in words:
total += words[word]

# Make the total in to a percentage.
try:
total /= float(len(tmp.split()))

# Secret ingredient, higher characters per word generally means
# more important sentence.
total *= chars_per_word(tmp)

return total
except ZeroDivisionError:
return -100


def top_sentences(text, word_freq_dict):
'''Returns a sorted list with the top rated sentences first, that
contains the tuples (score, sentence_location, sentence_text)

For example, the sentence "Call me Ishmael" would come back:
(1.8304283, 0, "Call me Ishmael.")

0 would mean it was the first sentence in the text, and it had a
score of 1.83...

'''
sentences = [] # array of tuples (total score, sentence num, sentence text)
known_sentences = set()
currs = 0
for s in sentences_in(text):
currs += 1 # Increment the current sentence.
total = score_sentence(s, words) # Don't add duplicate sentences.

s = s.strip()
if s not in known_sentences:
sentences.append((total, currs, s+"."))
known_sentences.add(s)

# Sort highest rated sentences to lowest.
sentences = sorted(sentences)
sentences.reverse()

return sentences


def __combine_summary(summary):
'''Combines a summary in to a meaningful paragraph. A summary is an
array of tuples with values of (position of sentence, text). This
creates better summary paragraphs by ordering important sentences
in the order they were originally.

'''
summary = sorted(summary)

paragraph = ""
for l, s in summary:
paragraph += s + " "

return paragraph

def percentage_top(percentage, sorted_sentences):
'''Returns the top rated percentage of the given sentences, in
paragraph form.

i.e. to get the top 25 percent of a text, call with:
percentage_top(25, )

'''

percentage = percentage / 100.0

num_sentences = int(len(sorted_sentences)*percentage) + 1

if num_sentences >= len(sorted_sentences):
num_sentences = len(sorted_sentences)

# Create summary (top x sentences)
summary = []
for j in range(num_sentences):
t,l,s = sorted_sentences[j]
summary.append((l,s))


return __combine_summary(summary)



def top_words(num_words, sorted_sentences):
'''Returns x number of words from the top sentences.'''

summary = []
words = 0
try:
for t,l,s in sorted_sentences:
if words >= num_words:
break

words += len(s.split())
summary.append((l,s))


except IndexError:
pass

return __combine_summary(summary)


if __name__ == "__main__":

parser = argparse.ArgumentParser(description='Creates summaries from documents by magic.')
parser.add_argument('-p','--percent', type=int, default=None,
help='the percent of the original document to give as a summary')
parser.add_argument('-w', '--words', type=int, default=None,
help='the number of words to output as a summary')

parser.add_argument('PATH', help='the path of the textfile to read from')

args = parser.parse_args()

try:
if args.PATH:
text = open(args.PATH).read()
words = word_frequency(text)

set_words_to_value(words)
sentences = top_sentences(text, words)

if args.words:
print top_words(args.words, sentences)
if args.percent:
if args.words:
print "\n\n"
print percentage_top(args.percent, sentences)

except IOError:
print "File given can't be found."

Monday, September 19, 2011

XLSX Parsing in Python

A while ago I had to throw together a program that read from .xslx files. Knowing essentially nothing about them, other than the fact that they were xml files in a zip, I was able to hack together something that worked kind of like the default CSV reader in Python. For every row in the file, it returns an array of cell contents.

In this case only Text and Numeric cells will render properly, formulas and such won't work. Also, the only sheet read is sheet1.

Okay, a little about the file structure of xlsx: Sheet info is stored in xl/worksheets/sheetX.xml, which looks essentially like this:
<sheetdata>
<row collapsed="false" customformat="false" customheight="false" hidden="false" ht="12.8" outlinelevel="0" r="1">
<c r="A1" s="0" t="s"><v>0</v></c>
<c r="B1" s="0" t="n"><v>123</v></c>
</row>
</sheetdata>

Cells (c tags) have several types (denoted by the t attribute) "s" is a string, and "n" is a number. The string's attribute is an index for a string in the xl/sharedStrings.xml file, which removes duplicate strings from the saved data.


''' Opens .xlsx files as if they were .csv files. '''
import zipfile
import xml.dom.minidom
import re

class XLSXReader:
rows=[]

def _nodeText(self, node):
return "".join(t.nodeValue for t in node.childNodes if t.nodeType == t.TEXT_NODE)

def _get_col_num(self, col):

strpart = col.attributes['r'].value
colnum = re.sub('[^A-Z]', '', strpart.upper().strip())

c = 0
for char in colnum:
c += ord(char)

c -= (65) # ASCII to number

print("Colnum for '%s' is %s" % (strpart, c))

return c


def __init__(self, filename):
shared_strings = []
self.rows = []
myFile = zipfile.ZipFile(filename)

# Read the shared strings file.
share = xml.dom.minidom.parseString(myFile.read('xl/sharedStrings.xml'))
j = share.getElementsByTagName("t")

for node in j:
shared_strings.append(self._nodeText(node))

sheet = xml.dom.minidom.parseString(myFile.read('xl/worksheets/sheet1.xml'))
sheetrows = sheet.getElementsByTagName("row")
for row in sheetrows:
cols = row.getElementsByTagName("c")

largest_col_num = 0
for col in cols:
colnum = self._get_col_num(col)
if colnum > largest_col_num:
largest_col_num = colnum

thiscol = ['']*(largest_col_num + 1)

for col in cols:
value = ""
try:
value = self._nodeText(col.getElementsByTagName('v')[0])
except IndexError:
continue

#Get col number (A=0, B=1, etc. up to AA)

colnum = self._get_col_num(col) # ASCII to number

try:
if col.attributes['t'].value == 's':
thiscol[colnum] = shared_strings[int(value)]
else:
thiscol[colnum] = value
except KeyError:
thiscol[colnum] = value
self.rows.append(thiscol)

myFile.close()

def __getitem__(self, i):
return self.rows[i]

To use the code, call the class with a file name, then use a for - each loop to iterate over the rows.

Update 2012-09-14: Mati noted that in some situations the xlsx format didn't have the proper attribute to indicate a number, and sent in a patch. Thanks!

Sunday, September 18, 2011

Fast Ubuntu Tweaks

Over the past few years I went from absolutely loving Ubuntu to liking it, to almost disliking it now. Yes, it is a great distribution; but the teams that make it seem to not be taking UNIX design philosophy seriously.


Ubuntu seems to be building up a pillar of their own custom software. I was disappointed when they removed the option to change the GDM greeter a few years ago. Since then they seem to be obsessed with cohesion by removing options and doing half-ass jobs on building interfaces. (Have you noticed that when you right the indicator applet panel, the gradient is duplicated all the way down the menu?)

Unity is an excellent example of this; sure the desktop needed an overhaul, but don't push untested beta version software to users that has essentially no built-in way to change its options. And let's not even start with the problems it gives me on with my Nvidia!

But talk is cheap, so I'll show you the code. Three lines and a few config options should fix things up:

sudo apt-get purge ubuntuone*
sudo apt-get remove indicator-me indicator-messages
sudo apt-get purge zeitgeist*

Those three lines remove the ubuntuone client (do you use it? is it worth the program running in the background and trying to phone home?) Removes the indicator applet (that thing with the mail button, and your social media accounts), and removes zeitgeist (who uses it, really?)

Next: edit your "Startup Applications", what is this, Windows? Few people want VNC to be started when their computer does, if they do, they'll enable it manually.

With all this hate out, I'll say that my next computer system will either be Elementary OS (cohesive, yet follows UNIX design) or Windows 8.

Friday, September 16, 2011

GSoC 2011

I was pretty much missing over the summer, due mostly to my working for GSoC 2011. I was notified yesterday that the bulk of the code I had written was merged in to the master branch.

No, it isn't a hack, but yes, it is cool :)

Thursday, September 8, 2011

Purely Pedantic Password Affirmation

PBKaC (Problem between keyboard and chair). Yes, people are the source of all problems in Computer Science, a computer does exactly what it is told to. But sometimes they can be the solution to problems too.

What if, when you entered a password, three things were sent back to the server?
  1. A password hash.
  2. A list of the times taken between keypress events for the password, hashed or something.
  3. Some identifying information for the computer, plugins, whatever.
The server could check the headers sent back to see if this is a common computer used by the user, and if so check the password and move on with its day.

The server could also check the times between keypresses in the password for relatively spaced times in the way the user normally enters the password, providing a fingerprint for a particular user.

Suppose it normally takes me 30ms to reach from "F" to "T", and fifteen to get from "O" to "."; if the password was entered differently the user should be redirected to a secondary question page; either they broke a hand, or someone else entered their password.

Think of it as non-random password screening. It would stop bots in their tracks, and would create only a small problem for users.

Tuesday, September 6, 2011

Page Speed Testing

While it is true that the average Internet user is no longer buzzing along at a healthy 56Kbps, that doesn't mean page load speed isn't important. Perhaps your viewers are on a spotty wifi connection, are using a phone to access your site, or really are using a modem.

 You might wan to try out Google's Page Speed lab. It checks for common problems with website page loading speed and gives examples of what can be fixed.

Speedy websites are fun to make, easy to maintain, and helpful to users. If you want to be proactive (before the site is built), here are some tips on how to build site speed in to the development cycle:


Lesson 1: Fun to make
1. Make a small bonfire out of your Dreamweaver, Rapidweaver, Publisher, and and (God forbid) Frontpage discs.
2. Run away before the smoke from burning discs gives you cancer.

3. Now, pull out your favorite text editor and start coding! I like Geany, because it auto-completes HTML tags, highlights CSS, JavaScript, and PHP embedded in HTML and a whole slew of other things, while being lightweight. As a side note, one of my favorite features is Ctrl + Shift + O, which opens the file that your cursor is over, say you had embedded a CSS file, placing the cursor over it and hitting the combo would open it up for editing.

4. Code like the wind!
5. Test.
6. Repeat 4 and 5 until you have the same page you did originally. I usually get the size down by three or four times by hand coding.
7. Run your site through the w3c validators and put a fancy badge on your page.

Where is the fun in this? Well, first of all, you know what every page does and have gained new knowledge in the fields of JS/CSS/HTML, soon enough, you'll be able to re-write entire sites in a few hours for great speed enhancement.

Lesson 2: Easy to maintain
Being your site is now W3C compliant and small, changes are really easy! Want all the text on your site to be purple and flashing? Change it through CSS in thirty seconds. Holiday themes are easy to make now!

What else can you do though? How about separating out the common stuff in each page to a separate php file? And the navigation to a nav generator? Now all of the pages on your site are updated at once when you upload a new one.

This isn't the best practice, but in my opinion, it works for small sites. Why not separate out variables, like phone numbers to a separate php file? Then you can just echo the variable wherever it is needed. Result: Want to change xxx-xxx-yyyy to xxx-xxx-yyyx? You can in a very short amount of time; and you don't have to worry about missing pages!

Lesson 3: Helpful to users
Users that have screen readers installed will thank you, along with users that are on spotty/slow connections. Oh, and your server admins will be happy too, imagine using half the bandwidth through smarter pages?

Sunday, September 4, 2011

The Entertainer

I received an Arduino in the mail a few months ago, and was playing around with some scripts on it to try things out. It isn't much, but it works; here is the intro to "The Entertainer" that will play on a small speaker:


"Entertainer.pde"
/*
Plays "The Entertainer"

circuit:
* 8-ohm speaker on digital pin 8

Based Upon

http://arduino.cc/en/Tutorial/Tone

*/
#include "pitches.h"

// notes in the melody:
int melody[] = {
NOTE_D3, NOTE_DS3,NOTE_E3, NOTE_C4, NOTE_E3, NOTE_C4, NOTE_E3, NOTE_C4, NOTE_C4,NOTE_D4,NOTE_DS4,NOTE_E4,NOTE_C4,NOTE_D4,NOTE_E4,NOTE_C4, NOTE_D4, NOTE_C4};

// note durations: 4 = quarter note, 8 = eighth note, etc.:
int noteDurations[] = {
4, 4, 4, 2,4,2,4,1, 4,4,4,4,4,4,4,2,4,2,1};

void setup() {
// iterate over the notes of the melody:
for (int thisNote = 0; thisNote < 18; thisNote++) {

// to calculate the note duration, take one second
// divided by the note type.
//e.g. quarter note = 1000 / 4, eighth note = 1000/8, etc.
int noteDuration = 1000/noteDurations[thisNote];
tone(8, melody[thisNote],noteDuration);

// to distinguish the notes, set a minimum time between them.
// the note's duration + 30% seems to work well:
int pauseBetweenNotes = noteDuration * 1.30;
delay(pauseBetweenNotes);
// stop the tone playing:
noTone(8);
}
}

void loop() {
// no need to repeat the melody.
}
"Pitches.h" 
/*************************************************
* Public Constants
*************************************************/

#define NOTE_B0 31
#define NOTE_C1 33
#define NOTE_CS1 35
#define NOTE_D1 37
#define NOTE_DS1 39
#define NOTE_E1 41
#define NOTE_F1 44
#define NOTE_FS1 46
#define NOTE_G1 49
#define NOTE_GS1 52
#define NOTE_A1 55
#define NOTE_AS1 58
#define NOTE_B1 62
#define NOTE_C2 65
#define NOTE_CS2 69
#define NOTE_D2 73
#define NOTE_DS2 78
#define NOTE_E2 82
#define NOTE_F2 87
#define NOTE_FS2 93
#define NOTE_G2 98
#define NOTE_GS2 104
#define NOTE_A2 110
#define NOTE_AS2 117
#define NOTE_B2 123
#define NOTE_C3 131
#define NOTE_CS3 139
#define NOTE_D3 147
#define NOTE_DS3 156
#define NOTE_E3 165
#define NOTE_F3 175
#define NOTE_FS3 185
#define NOTE_G3 196
#define NOTE_GS3 208
#define NOTE_A3 220
#define NOTE_AS3 233
#define NOTE_B3 247
#define NOTE_C4 262
#define NOTE_CS4 277
#define NOTE_D4 294
#define NOTE_DS4 311
#define NOTE_E4 330
#define NOTE_F4 349
#define NOTE_FS4 370
#define NOTE_G4 392
#define NOTE_GS4 415
#define NOTE_A4 440
#define NOTE_AS4 466
#define NOTE_B4 494
#define NOTE_C5 523
#define NOTE_CS5 554
#define NOTE_D5 587
#define NOTE_DS5 622
#define NOTE_E5 659
#define NOTE_F5 698
#define NOTE_FS5 740
#define NOTE_G5 784
#define NOTE_GS5 831
#define NOTE_A5 880
#define NOTE_AS5 932
#define NOTE_B5 988
#define NOTE_C6 1047
#define NOTE_CS6 1109
#define NOTE_D6 1175
#define NOTE_DS6 1245
#define NOTE_E6 1319
#define NOTE_F6 1397
#define NOTE_FS6 1480
#define NOTE_G6 1568
#define NOTE_GS6 1661
#define NOTE_A6 1760
#define NOTE_AS6 1865
#define NOTE_B6 1976
#define NOTE_C7 2093
#define NOTE_CS7 2217
#define NOTE_D7 2349
#define NOTE_DS7 2489
#define NOTE_E7 2637
#define NOTE_F7 2794
#define NOTE_FS7 2960
#define NOTE_G7 3136
#define NOTE_GS7 3322
#define NOTE_A7 3520
#define NOTE_AS7 3729
#define NOTE_B7 3951
#define NOTE_C8 4186
#define NOTE_CS8 4435
#define NOTE_D8 4699
#define NOTE_DS8 4978

Friday, September 2, 2011

The Uselessness of Passwords

The Public

Most people that put passwords on their computers assume too much about the security of the system they operate on. Rule of thumb: unless you have specified an encrypted hard drive, you don't have one. The password you put on your machine will keep me (or anyone else worth their salt) out of your files for all of ten seconds.

For Mac OSX a simple Command + S while booting up will do the trick (on old macs you can delete the first run file so the mac goes in to setup mode and requests a new username and password, that will then allow access to your files)

For Windows, how about an F9 to safe mode?

For linux, what about changing the boot options in your boot manager to go in to safe mode, or simply choosing the safe mode section?

For all operating systems, if your stuff isn't encrypted we can still pop in a linux live cd and copy anything we want over.

So, how do you keep someone from doing this? Well, it requires hard-disk encryption (which is dangerous, because if you forget the password, you're SOL, and if you write it down, there is no point). A locked boot manager, that boots directly to the hard drive (with a good password, different than the BIOS). A good, strong password, and requiring a user name upon login will help too. The computer must have a lock on it, as if it is stolen, most of your defenses fall, except the hard disk encryption if you have any.

Network/Computer Admins

So, what if you have a whole slew of computers?

"Aha!" the school computer admin exclaims, "I know, I'll disable users from booting in to safe mode, by setting an Administrator password, lock the BIOS so they can't boot Linux, disable running third party executables on the desktops, and, um, disable listing of the C:\ directory by students!"

This is a very real example taken from the school district I was educated at :) Hmm. Students have nearly unlimited physical access to the computers. I would just pop out a clock battery, and hit a pin, resetting the BIOS; maybe I did this, and maybe not. Maybe when I did it, the clock was reset to the year 1400 (before the epoch?) and software freaked out before I fixed it.

Oh, and by the way, if you type: file:/// in to Firefox it shows the listing of the root directory, easy enough to grab a password file from a trusted app, and then run it through your password cracker later (in Windows). Also look out for applications like AutoCad that allow command line access through their interface.

A note to the administrators of schools and businesses across the world:
  • Your subjects are motivated to do what they want.
  • They outnumber you
  • They probably outnumber you enough to brute-force a problem
  • You have other obligations
You will not win, assume every connection to your network is hostile, and client machines are always needing a re-image.