Wednesday, January 30, 2013

HOWTO Debug Crashes in C/C++ Applications on Ubuntu

In this howto we'll cover:
  • Compiling C/C++ code for debugging
  • Allowing debugging
  • Viewing errors
  • Fixing common gdb issues
Your computer is happily humming along and your program is progressing fine when suddenly disaster strikes! "Segmentation fault (core dumped)," your computer yells, before your program plunges into darkness and your friendly shell prompt reappears.

Debugging from the command line on Ubuntu is not nearly as nice as popping up an IDE, but when done right it can be much faster.

Our Problematic Code

For demonstration, we'll use this bit of code, which is written to crash.
int main()
{
    int *p = (int *)0x0000007b; // The cause of many a Windows XP BSOD
    int j;

    for (j = 0; j < 10000000; j++)
    {
        p++;     // walk further into memory we don't own
        *p = j;  // writing to that address triggers the segfault
    }

    return 1;
}
To get the best debugging results, you'll have to compile your code with the -ggdb flag. In this instance, with the code saved as breakme.c (the file name that shows up in the gdb output below): gcc -ggdb breakme.c.
If you run the resulting program (./a.out), it will die almost immediately with the error: Segmentation fault (core dumped).

The -ggdb flag instructs the compiler to include lots of debugging information. However, you won't want this in production executables because it takes up a lot of space: this example program was 9.6K with symbols and 6.2K without!

Enabling the Dump

By default, Ubuntu 12.10 won't write out crash information (a core dump) for programs you build yourself. To fix this, you'll need to run the command:
ulimit -c unlimited
This setting only applies to the current shell session, so you'll need to run it again every time you log back in to your system (or open a new terminal).
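If you'd rather not depend on the shell, a process can raise its own core-file limit. Here is a minimal Python 3 sketch (Linux/Unix only) using the standard resource module; the function name enable_core_dumps is made up for illustration. Without special privileges, a process can only raise its soft limit up to the existing hard limit, and any program it launches afterwards inherits the new limit.

```python
import resource

def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit.

    Similar in spirit to `ulimit -c unlimited`, but an unprivileged process
    can only raise the soft limit as far as the current hard limit.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    # Child processes started from here on inherit this limit
    return resource.getrlimit(resource.RLIMIT_CORE)

soft, hard = enable_core_dumps()
unlimited = soft == resource.RLIM_INFINITY
print("core limit:", "unlimited" if unlimited else soft)
```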

Debugging

Once core dumps are enabled and your program crashes, a file called core should appear in the directory from which you ran the program. Use the gdb command to see what information it contains about the crash.
gdb a.out core
Here a.out is the name of the program that crashed and created the core file.

Analyzing the Debug Output

GNU gdb (GDB) 7.5-ubuntu
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/joseph/Desktop/a.out...done.

[New LWP 13663]

warning: Can't read pathname for load map: Input/output error.
Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004004ed in main () at breakme.c:9
9 *p = j;
This is the output you'll get from the gdb command. All you need are the last three lines; from the top, they tell you:
  1. What happened.
  2. Where the error happened (in main in file breakme.c on line 9)
  3. What the line was.
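If you end up reading many core dumps, it can help to pull the crash location out automatically. Below is a small Python 3 sketch, assuming the frame line looks like the #0 line above; the exact format can vary between gdb versions, and frames without debugging information lack the "at file:line" part. parse_frame is a hypothetical helper name.

```python
import re

# Matches gdb stack-frame lines such as
#   #0 0x00000000004004ed in main () at breakme.c:9
# The "0x... in" address part is optional because gdb omits it for some frames.
FRAME_RE = re.compile(
    r"#(?P<num>\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?"
    r"(?P<func>\S+)\s*\(.*?\)\s+at\s+(?P<file>[^:\s]+):(?P<line>\d+)"
)

def parse_frame(line):
    """Return (frame number, function, file, line number), or None."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    return (int(m.group("num")), m.group("func"),
            m.group("file"), int(m.group("line")))

print(parse_frame("#0 0x00000000004004ed in main () at breakme.c:9"))
# (0, 'main', 'breakme.c', 9)
```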

Common gdb Errors

  • warning: exec file is newer than core file. means the executable has changed (for example, been recompiled) since the core file was created, so the core file probably came from an older build; delete it and run your program again to get a fresh one.
  • warning: core file may not match specified executable file. means the core file you're using probably wasn't made by the program you specified, so gdb will probably report wrong results.
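The first warning boils down to timestamps. As a rough pre-check before loading a core, here's a Python 3 sketch that compares modification times; core_older_than_exe is a hypothetical helper, and it only approximates the situation gdb warns about rather than reproducing gdb's own logic.

```python
import os
import tempfile

def core_older_than_exe(exe_path, core_path):
    """True if the executable was modified after the core file was written.

    Only modification times are compared, so this is a rough check, not a
    guarantee that the core came from a different build.
    """
    return os.path.getmtime(exe_path) > os.path.getmtime(core_path)

# Demo with two empty scratch files and made-up timestamps
with tempfile.TemporaryDirectory() as d:
    exe = os.path.join(d, "a.out")
    core = os.path.join(d, "core")
    for p in (exe, core):
        open(p, "w").close()
    os.utime(core, (1000, 1000))  # the crash wrote the core first...
    os.utime(exe, (2000, 2000))   # ...then the program was recompiled
    stale = core_older_than_exe(exe, core)

print(stale)  # True
```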

Tuesday, January 29, 2013

Extended Euclidean Algorithm in Python

The Extended Euclidean Algorithm is a simple extension to the Euclidean Algorithm that, in addition to returning the greatest common divisor, also gives numbers s and t such that gcd = s * a + t * b.
In particular, when a and b are coprime you get 1 = sa + tb, which is very useful in cryptography: it means s is the inverse of a modulo b (since sa ≡ 1 mod b).

The Code

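def extended_euclidean(a,b):
    origa = a
    origb = b

    # Invariant at the top of each loop:
    #   a == lastx * origa + lasty * origb
    #   b == x * origa + y * origb
    x = 0
    lastx = 1
    y = 1
    lasty = 0

    while b != 0:
        q = a // b
        a, b = b, a % b

        # Show how the new remainder b is built from the previous two
        full = "%d = (%d * %d + %d * %d) - %d * (%d * %d + %d * %d) " % ( b, origa, lastx, origb, lasty, q, origa, x, origb, y )
        short = "%d = %d * %d + %d * %d" % ( b, origa, lastx - x * q, origb, lasty - y * q)
        print("%s\t%s\t%s\t%s" % (a, b, full, short))

        x, lastx = (lastx - q * x, x)
        y, lasty = (lasty - q * y, y)

    # a now holds gcd(origa, origb) == lastx * origa + lasty * origb
    return (lastx, lasty)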

The Output

Take the numbers 120 and 23:
>>> extended_euclidean(120,23)
23 5 5 = (120 * 1 + 23 * 0) - 5 * (120 * 0 + 23 * 1) 5 = 120 * 1 + 23 * -5
5 3 3 = (120 * 0 + 23 * 1) - 4 * (120 * 1 + 23 * -5) 3 = 120 * -4 + 23 * 21
3 2 2 = (120 * 1 + 23 * -5) - 1 * (120 * -4 + 23 * 21) 2 = 120 * 5 + 23 * -26
2 1 1 = (120 * -4 + 23 * 21) - 1 * (120 * 5 + 23 * -26) 1 = 120 * -9 + 23 * 47
1 0 0 = (120 * 5 + 23 * -26) - 2 * (120 * -9 + 23 * 47) 0 = 120 * 23 + 23 * -120
(-9, 47)
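Since the coefficients are what make this useful for cryptography, here is a compact recursive Python 3 sketch (a variation, not the code above) that also returns the gcd, plus a hypothetical modinv helper that uses s to compute a modular inverse. For 120 and 23 it gives the same coefficients as the table above.

```python
def egcd(a, b):
    """Return (g, s, t) with g == gcd(a, b) == s*a + t*b."""
    if b == 0:
        return (a, 1, 0)
    g, s, t = egcd(b, a % b)
    # g = s*b + t*(a % b) = s*b + t*(a - (a // b)*b) = t*a + (s - (a // b)*t)*b
    return (g, t, s - (a // b) * t)

def modinv(a, m):
    """Inverse of a modulo m; raises ValueError unless gcd(a, m) == 1."""
    g, s, _ = egcd(a, m)
    if g != 1:
        raise ValueError("a has no inverse modulo m")
    return s % m

print(egcd(120, 23))    # (1, -9, 47)
print(modinv(23, 120))  # 47, because 23 * 47 = 1081 = 9 * 120 + 1
```

On Python 3.8 and later, the built-in pow(23, -1, 120) computes the same inverse.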

Thursday, January 24, 2013

HOWTO Set Up Automatic Code Highlighting for Websites (and Blogger)

This is actually the method that this blog uses! The scripts below quickly find all code on a page, auto-detect what language it is, and highlight it accordingly.

Setting Up On Blogger

To set up on Blogger all you need to do is the following:
  1. Open the "Template" menu on the left side of your blog's dashboard.
  2. Press the "Edit HTML" button below your template.
  3. Just copy and paste the following code right before the </html> tag.

<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js" type="text/javascript"></script>
<script src="http://balupton.github.com/jquery-syntaxhighlighter/scripts/jquery.syntaxhighlighter.min.js" type="text/javascript"></script>
<script type="text/javascript">
$('pre').each(function() {
$(this).addClass( "highlight" );
});
$.SyntaxHighlighter.init();
</script>

On Your Website

Alternatively, you can add this to your own website for automatic code finding and highlighting! Just make sure to add it at the very bottom of your page, or wrap the contents of the third <script> block in a function that runs once the page has loaded, such as jQuery's $(document).ready().

If all of your code is in <code> blocks, just change 'pre' to 'code' on line four of the snippet.

Side Note: You probably want to host your own copies of jQuery and jQuery Syntax Highlighter so you can update them yourself.

Sunday, January 20, 2013

Extract Email Attachments With Python

If you archive your email messages like I do, you may want to pull out all of their attachments so your desktop search can index them better, or so you can quickly search through them.

This is a simple script that recursively walks a directory of .eml messages and pulls out all of the base64-encoded attachments.

For those of you wondering what base64 is: it's an encoding that uses only sixty-four different printable characters to represent arbitrary data. Email uses it to send documents around so that the protocol, which was designed for plain text, didn't have to be reworked to handle data that isn't text.
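To make that concrete, here's a small Python 3 sketch using the standard base64 module: every 3 bytes of input become 4 characters from the 64-character alphabet (with = used for padding), which is why base64 attachments are roughly a third larger than the original file.

```python
import base64

# Every 3 input bytes (24 bits) become four 6-bit groups, i.e. 4 characters
data = b"Man"
encoded = base64.b64encode(data)
print(encoded)  # b'TWFu'
assert base64.b64decode(encoded) == data

# Works for any bytes at all, including ones that aren't printable text
blob = bytes(range(256))
enc = base64.b64encode(blob)
print(len(blob), len(enc))  # 256 344
```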

Code

#!/usr/bin/env python

import email.parser
import os
import sys
import base64


fileList = []
rootdir = "/path/to/.eml/messages/"
# Collect every file below rootdir
for root, subFolders, files in os.walk(rootdir):
    for file in files:
        fileList.append(os.path.join(root,file))

for path in fileList:
    if not path.endswith(".eml"):
        continue

    fp = email.parser.FeedParser()
    with open(path) as f:
        fp.feed(f.read())

    message = fp.close()

    # Walk every MIME part; only parts with a filename are attachments
    for part in message.walk():
        fn = part.get_filename()
        if fn is None:
            continue

        try:
            # Assume the attachment is base64 encoded
            with open(fn, 'wb') as out:
                out.write(base64.b64decode(part.get_payload()))
        except TypeError:
            # Not valid base64 (Python 2 raises TypeError): write the raw payload
            with open(fn, 'wb') as out:
                out.write(part.get_payload())

Extensions

  • This script isn't very efficient, since it does all of the decoding in Python.
  • It would be nice to take the directory as a command-line argument using sys.argv.

Update 2013-09-04: Python 3


#!/usr/bin/env python3

import email.parser
import os
import sys
import binascii


def extract(rootdir):
    fileList = []

    for root, subFolders, files in os.walk(rootdir):
        for file in files:
            fileList.append(os.path.join(root, file))

    for path in fileList:
        if not path.endswith(".eml"):
            continue

        fp = email.parser.BytesFeedParser()
        with open(path, "rb") as f:
            fp.feed(f.read())

        message = fp.close()

        print("Checking {}".format(path))

        # Walk every MIME part; only parts with a filename are attachments
        for part in message.walk():
            fn = part.get_filename()
            if fn is None:
                continue
            try:
                try:
                    # decode=True undoes the base64/quoted-printable transfer encoding
                    with open(fn, 'wb') as out:
                        out.write(part.get_payload(decode=True))
                except (TypeError, binascii.Error):
                    # Fall back to the raw payload, encoded with the part's charset
                    charset = part.get_content_charset() or "utf-8"
                    with open(fn, 'wb') as out:
                        out.write(part.get_payload().encode(charset))
            except Exception:
                print("Error extracting item from {}".format(path))

if __name__ == "__main__":
    if len(sys.argv) == 1:
        print("usage: {} path/to/.eml/files".format(sys.argv[0]))
        sys.exit(1)
    extract(sys.argv[1])
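For comparison, Python 3.6 and later also offer a higher-level email API. The sketch below is a variation rather than the script above: it parses raw message bytes with email.policy.default and yields each attachment's filename and decoded bytes. attachments_from_bytes is a hypothetical helper name, and the demo builds a message in memory instead of reading an .eml file.

```python
import email
import email.policy
from email.message import EmailMessage

def attachments_from_bytes(raw):
    """Yield (filename, decoded bytes) for every attachment in a raw message."""
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    for part in msg.walk():
        fn = part.get_filename()
        if fn is None:
            continue  # container or body part, not an attachment
        yield fn, part.get_payload(decode=True)

# Build a small test message in memory rather than reading from disk
demo = EmailMessage()
demo["Subject"] = "test"
demo.set_content("body text")
demo.add_attachment(b"\x00\x01binary", maintype="application",
                    subtype="octet-stream", filename="data.bin")

found = list(attachments_from_bytes(demo.as_bytes()))
print(found)
```

Writing each result to disk is then a matter of opening the filename in 'wb' mode, as the scripts above do.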

Wednesday, January 16, 2013

HOWTO Redirect A Page Using Javascript

Occasionally you need to move a page on a website from one location to another, but can't update all of the links that point to the old location and don't have access to the server software to set up a redirect there.

All you need is a page with a little Javascript to solve the problem.

Example

<html>
<head>
<title>Redirecting...</title>
<script type='text/javascript'>
window.location.replace("http://your.site.here");
</script>
</head>
<body>
<p>Redirecting...if you are not redirected click
<a href='http://your.site.here'>here</a>.</p>
</body>
</html>

Notes

  • Include a plain link as well as the Javascript redirect, for users who have JS turned off.
  • Use window.location.replace instead of window.location.href so the user doesn't get stuck in a loop when they press the back button.
  • Use window.location.replace instead of document.location
  • Insert if(top != self) top.location.replace("http://your.site.here"); at the top of the <script> tag to escape from frames.

Monday, January 14, 2013

Ubuntu and the Art of Deception

I am, and have been for almost six years, an Ubuntu user; over the past four, I've given up Windows entirely (skipping over Vista and 7, save for occasionally rescuing family members' PCs). However, the more Ubuntu grows, the more I notice discrepancies between the Linux that I originally loved and its current incarnation.

The Deception

In the most recent release, the biggest change was the addition of a web API that interfaces with the desktop. Don't get me wrong, this is great, but it's a very underwhelming thing to take SIX MONTHS to develop. Had I wished to, I could have perfected the same thing (because the back-end is already totally there with DBus) in about forty hours of work. So, if Canonical had one developer working on it for one week, it would have been done. What happened the rest of the time?

When Shuttleworth gets up and lists all of the great things that came to us in 12.10, the major ones are these:

  • Webapps - as discussed, about 40 hours of work
  • Online accounts - already integrated in Gnome, a simple port was all that was needed
  • Dash Previews - essentially a few more plugins to the dash, which should be about a week of programming, especially if you use the utilities already available, like the "file" command
  • Easy full disk encryption - Essentially a single change to the installer
  • Ditching Unity 2D - this should have freed up a lot of developer time!
  • And the hotly contested Amazon Search Results - advertisements on your PC that I'm sure Amazon would have happily implemented themselves, given all the free advertising

The News

So, what did the news sites report on for six months? Theme changes, new button gradients, new backgrounds, etc. I could have done all of the real work quicker, and without the ads.

When all of this came out, webapps were touted as a great fusion of desktop and web, when in reality they only work when the pages are already open, saving a few clicks at best and opening your system to security flaws at worst. Why not support W3C apps? That would be a huge boon to the software overall, and a push to help Tizen. What about supporting Firefox apps? Quick to develop, quick and open to deploy.

According to the EFF, the Amazon advertisements are a huge privacy concern rather than some magical new way to shop.

The Future

So, what does this all mean? Ubuntu has reached the feature-creep stage because its founder won't take risks. Maybe he's outpaced himself with a six-month development cycle instead of continuous integration. Either way, not enough is changing other than seeming regressions that users have to fight with in each subsequent "release", yet a great number of users refuse to admit this, probably because they don't know how quickly these changes could be made to a well-designed system.

Perhaps Shuttleworth has spread himself too thin; rather than shoring up upstream apps and using them (e.g. Docky could easily have been used instead of Unity), he has created his own, and rather than creating a unified set of configuration files, we have a hack that works with each individual application.

It would be nice to see a desktop, particularly the most powerful one, contribute far more to the small projects that make it up rather than fork them. Until that happens, Ubuntu will become more and more of an impenetrable island that will eventually grind to a halt, with a finite set of resources being used to do everything.

Wednesday, January 9, 2013

If You Think Your Users Are Stupid, You're Probably Doing it Wrong

I was recently in an office where a programmer and a designer were complaining about their users being too stupid.

The users weren't using the program they were tasked with developing "properly". If you develop an application that the users cannot figure out, you have failed to do your job, and are asking the users to do it for you.

As an application developer, it is your job to be invisible, and to make the software invisible. The user, novice or expert, should be able to get into your system, figure out how it works immediately, do what they need to, and leave easily.

One of the most valuable things I've found out as a developer is that users rarely think in the same models as you. As programmers, we're taught to divide the world into neat little problems and objects that flow through pipes and get transformed.

The user doesn't understand that the box that helps them compose email is a text pane wrapped in a scroll pane, with buttons that apply styles to the data inside and pass it to a renderer, plus a side process that runs a spelling check on the document whenever a whitespace key is pressed. They probably don't imagine that the field for entering names, which looks them up in their contacts database, is a separate piece of software altogether. To the user, the object they see is a form.

This leads me to a few simple rules for usability design that I think everyone could benefit from:

Anticipate the User's Needs

If you are writing an email app, auto-suggest contacts to use as recipients and offer printer-friendly versions of everything (preferably ones that auto-format for printers).

Consistency

Just because you're designing a website for My Little Pony watchers doesn't mean purple links are a good idea (forgive me if I've gotten this entirely wrong); blue links are standard, and they are what users look for.
  • Buttons are not links, and links are not buttons (links navigate and buttons change the current page)
  • If two things look the same, they should do the same thing.
  • If something is dangerous
Or, "where the hell am I?" Generally, you don't need folders in folders in folders; if you do, you're doing something horribly wrong. Even desktop applications benefit from clear location awareness.

If you have a confusing area in your software, contextual popups (shown while the user is editing a form) or help pages can ferry the user through it.

Speed

This one is probably the most useful. Find out what 90% of your users are doing (it is probably either not what you intended, or they are doing what you wanted in a long and laborious way), then try the following:
  • Remove all items in a form that aren't required.
  • Allow sign-in from existing accounts, like Google/Twitter/Facebook if applicable.
  • Show the user the status of their most used items right away, using colors if possible.
  • Use consistent icons.
  • Use consistent navigation.
  • When users are working around something you've made, fix your software to facilitate their way (ideally you'd make whatever way you had in mind the easiest, so it would be quickly discovered and used).

Sunday, January 6, 2013

End of Blogging For A Month

Over the past month I made it a goal to write a post every day. It was tough to find something to write about every day, even if it seemed easy to begin with.

The hardest part was trying to write well, rather than just posting links (and yes, I cheated on Fridays with a link list, but so does everyone else; I regret nothing!).

Overall, the experience has been good: I now have nearly a hundred posts and roughly fifty page views a day. My most popular articles have exploded while others haven't; I'm not entirely sure why yet, but I'm determined to find out.

In future months, I plan to cut back on the schedule, probably to three times a week, now that I know it isn't as bad as I imagined.

What's up next? Hmm, I'm not sure yet; I may write a book, exercise, or something else entirely. I'll think about it for a day.

Here is the video that originally made me decide to try something new for thirty days every thirty days:




My last endeavor was to take a picture every day for a month, and it was great!

Saturday, January 5, 2013

Three New Interesting Operating Systems

AROS Icaros

Not an OS per se, but a desktop environment that lets you run old Amiga software; the latest version provides full emulation for Amiga software without the need for ROMs.
Icaros Desktop

Whonix

This is an interesting concept for an OS: the user works inside a virtual machine where all internet traffic is automatically routed through Tor. Somehow I doubt it will get as much use as it deserves, simply because running a VM is very costly, especially under VirtualBox (which this uses); maybe it would do better running on Xen. The project was only just released, so there is still time for improvement!

FreeBSD 9.1

The latest FreeBSD has been released (FreeBSD is what Apple based much of OS X's Unix layer on). New in this release are:
  • Xen ethernet driver
  • IPv6 improvements
  • C++11 stack
  • New Intel Drivers

Thursday, January 3, 2013

Visualizing REGEXs

The excellent REGEXPER generates beautiful representations of Javascript regexes you send its way.

Example (IP Address)

\d\d?\d?\.\d\d?\d?\.\d\d?\d?\.\d\d?\d?
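Note that this pattern only checks the shape of an address: it happily accepts 999.999.999.999. A quick Python 3 sketch comparing it against the standard ipaddress module (for plain ASCII input, Python's re treats these escapes the same way Javascript does); looks_like_ip and is_valid_ipv4 are hypothetical names.

```python
import ipaddress
import re

# The pattern from above: four groups of one to three digits separated by dots
IP_SHAPE = re.compile(r"\d\d?\d?\.\d\d?\d?\.\d\d?\d?\.\d\d?\d?")

def looks_like_ip(s):
    """True if s has the dotted-quad shape (octet ranges are NOT checked)."""
    return IP_SHAPE.fullmatch(s) is not None

def is_valid_ipv4(s):
    """Stricter check: let the standard library parse it (octets 0..255)."""
    try:
        ipaddress.IPv4Address(s)
    except ValueError:
        return False
    return True

for s in ["192.168.0.1", "999.999.999.999", "1.2.3"]:
    print(s, looks_like_ip(s), is_valid_ipv4(s))
```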


Tuesday, January 1, 2013

ODT Text Mining in Python

The ODT format is slowly gaining popularity as governments worldwide begin either to use FOSS or to transition to formats that are open enough to still be usable in at least a hundred years.

The Format

Every .odt file (which is just a zip file) contains an XML file named content.xml that holds all of the document's text. All of the characters between the XML tags are the document's text, so by cutting out all of the tags, all that remains is the text (which is what the script below does).
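To see that structure without a real document, here's a self-contained Python 3 sketch that builds a tiny zip in memory containing only a stripped-down, hypothetical content.xml (a real .odt typically also holds mimetype, styles.xml, meta.xml and a manifest), then strips the tags. It uses ElementTree's itertext(), which also picks up text that follows a nested tag.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def odt_text(fp):
    """Join every non-blank text fragment of content.xml inside a zip file."""
    with zipfile.ZipFile(fp) as z:
        root = ET.fromstring(z.read("content.xml"))
    return " ".join(t.strip() for t in root.itertext() if t.strip())

# Hypothetical minimal content.xml; real files carry many more namespaces,
# styles and metadata.
CONTENT = (
    '<office:document-content '
    'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
    'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<office:body><office:text>'
    '<text:p>Hello <text:span>bold</text:span> world</text:p>'
    '</office:text></office:body></office:document-content>'
)

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("content.xml", CONTENT)
buf.seek(0)
print(odt_text(buf))  # Hello bold world
```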

The Code (Python)

#!/usr/bin/env python3
'''

Copyright 2012 Joseph Lewis <joehms22@gmail.com> | <joseph@josephlewis.net>

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
* Neither the name of the nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

This software receives one or more .odt file paths on the command line and
prints the text of each.

'''

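import zipfile
import xml.etree.ElementTree
import sys


def extract_odt_text(fp):
    # An .odt file is a zip archive; the document body lives in content.xml
    with zipfile.ZipFile(fp) as myFile:
        share = xml.etree.ElementTree.fromstring(myFile.read('content.xml'))

    # itertext() yields every text fragment in document order, including text
    # that follows a nested tag (which elt.text alone would miss)
    text_nodes = [t.strip() for t in share.itertext() if t.strip()]

    return " ".join(text_nodes)

if __name__ == "__main__":
    if len(sys.argv) == 1:
        print("usage: odt.py FILENAME [FILENAME...]")
    else:
        # sys.argv[0] is the script itself, so skip it
        for arg in sys.argv[1:]:
            try:
                print(extract_odt_text(arg))
            except zipfile.BadZipFile:
                # Not a zip archive, so not an .odt; skip it
                pass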