David Brown's Blog http://blog.davidb.org
Most recent posts at David Brown's Blog posterous.com

Fri, 22 Jul 2011 16:49:53 -0700 Email clients suck http://blog.davidb.org/email-clients-suck

It’s been way too long since I’ve posted anything here, so I’m just going to rant a bit.

My most commonly used email client is Mutt. Mutt’s motto is “All email clients suck. This one just sucks less.” I seem to be frequently reminded of this, and every year or so, I get dissatisfied with Mutt and hope there is something better.

Usually, I’ll try some new-fangled client, Kmail, Thunderbird, etc. After getting over the shiny, I generally realize quickly that most of them are more clumsy to use, and significantly lacking in features. Even the ability to invoke an external editor (mandatory for fixing up quotes on replies) is often broken or missing.

A while back, I decided to give Gnus a try. Gnus is primarily a usenet newsreader that has been adapted as an email client. It is very powerful, and I found it to have a lot of useful features. Both early on, and later, I discovered some significant problems:

  • The config language is in emacs lisp. Although this is powerful, often it ends up with very complex configuration, requiring large blocks of code to configure various things.
  • Plugins are kind of chaotic and don’t always get along with each other.
  • It runs in emacs. Really, the subject of another post, but I’m tired of emacs. My wrists are tired of emacs. Vimpulse/viper mode almost works, but is still frequently annoying, and often hits minor modes that it doesn’t get along with.
  • It’s fairly slow. It helps that it is good about not fetching too many articles. But, I have lists with tens or even hundreds of thousands of messages, and Gnus just runs out of memory trying to read these (they work fine in Mutt).

There is one feature I really liked about Gnus: cross-post detection. When you mark articles as read, Gnus remembers the message ids, and upon encountering these in other folders, marks them as read there. For things like the various linux mailing lists, where cross-posting is common, this eliminates a lot of redundant messages.

For now, I’m back with mutt. I’m already discovering that my mail reading time has shortened back again. I’m also figuring out that I missed a lot of email with Gnus. It showed it to me, but the interface was sluggish enough that I tended to skim quickly and mark all as read more.

I may take some time and write a Python program to scan my mailboxes (imap) and propagate the “read” flags in my messages. This would give me this functionality with any email client I happened to choose.
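A rough sketch of what the core of such a program might look like, assuming the mailbox scan has already produced, for each folder, a list of (Message-ID, seen) pairs. The function name and the data layout are hypothetical; a real version would fill this structure from IMAP (for example with Python's imaplib) and then set the \Seen flag on the messages it returns.

```python
from typing import Dict, List, Set, Tuple

# One folder is a list of (message_id, seen) pairs.
# This layout is an assumption; it stands in for the result of an IMAP scan.
Folder = List[Tuple[str, bool]]


def messages_to_mark_read(folders: Dict[str, Folder]) -> Dict[str, List[str]]:
    """Return, per folder, the unread message ids already read in some folder."""
    # Collect every Message-ID that is marked as read anywhere.
    read_ids: Set[str] = {
        message_id
        for messages in folders.values()
        for message_id, seen in messages
        if seen
    }

    # Any unread copy of one of those ids is a cross-post to mark as read.
    todo: Dict[str, List[str]] = {}
    for name, messages in folders.items():
        stale = [mid for mid, seen in messages if not seen and mid in read_ids]
        if stale:
            todo[name] = stale
    return todo


if __name__ == "__main__":
    example = {
        "linux-kernel": [("<a@example.org>", True), ("<b@example.org>", False)],
        "netdev": [("<a@example.org>", False), ("<c@example.org>", False)],
    }
    print(messages_to_mark_read(example))
```

Keeping the decision separate from the IMAP traffic means the matching logic can be tried out on plain data before it is pointed at a real mailbox.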

http://files.posterous.com/user_profile_pics/540485/d3z-face.jpg http://posterous.com/users/5AB4Zr5HiaSB David Brown davidb-org David Brown
Sun, 10 Oct 2010 08:06:00 -0700 Intelligent Parking Assist http://blog.davidb.org/intelligent-parking-assist

Photo
2010 Prius

Yesterday (Oct 9, 2010), I took delivery of a new 2010 Toyota Prius V, with the advanced technology package. This is my first post in what I hope to be a series about some of the gizmos in this amazing car.

Finding the button

The console of the new Prius has quite a few more buttons and controls than my 2004 Prius did, and it took me a while to find the button for the Intelligent Parking Assist. Having not read the manual, I found the on-screen help fairly useful. Each press of the button cycles between parallel park, back-in park, and nothing.

The basic mode appears to be to press the button before reaching the space to select the mode. You drive slowly past the space until it beeps, following the instructions. For parallel parking, you shift to reverse, confirm the location on the LCD camera view and then start backing up. You control the brake pedal (speed) and the car takes care of the steering.

For back-in parking, the display prompts you to turn the wheel ½ to 1 turn to the left at the first beep, then center it on the second beep, and shift into reverse. After this, it is similar to the parallel parking mode.

Beep

The first thing to stand out is that the infernal beeping is back. I have vague memories of this on the 2004 Prius, and found on the web how to disable it. With the 2010 Prius, the backup beep appears to only be settable through a computer interface by a dealer. This will be something I get done at the first service.

Both parking modes seem to work fairly well. The prompt consists of a view from the backup camera, with a red or green rectangle superimposed where it thinks the car should be placed. The rectangle remains in place during the parking operation, and is a good confirmation that the car and the driver have the same idea about what is happening.

In back-in parking, the manual indicates it can park between two cars, or next to a car on either side. The prompting mode only works in one of the side configurations, but it can still be used. Parallel parking seems to require a vehicle in front of the desired space.

When the car is put in reverse, the image from the backup camera has a green or red rectangle overlaid on it. Red indicates that the IPA is unwilling to park the car. Sometimes with the green, there are also one or more warning flags (usually on the corner of one of the cars). It seems willing to park, but is warning that it is unsure of clearance on that corner.

Trying it out

I’ve probably parked 5 or 6 times so far in back-in mode. It seems to work fairly well. One lot didn’t give me the second beep and there were cars on the other side of the lot, so I stopped, backed up and restarted. It seemed to work the second time.

There isn’t a lot of parallel parking near my house. Yesterday evening I drove somewhere with parallel parking and tried it. Several of the spaces were red, and it wouldn’t assist me. When it did, it worked fairly well. The IPA definitely has a better idea of where the front of the car is than I do. At one point, I stopped, and asked my passenger to lean out the window to make sure I wasn’t going to hit the rear bumper of the car in front of me. It still had about a foot of clearance at that time.

Overall, the system seems fairly useful. I will have a better impression of it after using it for realistic scenarios rather than just looking for places to park.

Sun, 03 Oct 2010 00:59:00 -0700 Piano videos from Sept 18, 2010 http://blog.davidb.org/29560299

I finally got the video and audio from my last jazz piano recital edited and uploaded. Go to the YouTube pages for full details.

 

 

Wed, 28 Jul 2010 11:42:27 -0700 Re-learning C++ http://blog.davidb.org/re-learning-c

I first learned C++ somewhere around 1990, as a student, in a programming course. The main thing I remember from this time was that CFront at the time used a non-C++-knowing preprocessor, which occasionally caused weird problems with syntax errors in comments.

About 6 years ago, I tried using C++ again to build a unit test framework for a filesystem I was writing. It was probably a bad idea to use an important project as a place to learn C++, and I underestimated how much the language had changed. I ended up writing the test in OCaml, which helped with the type safety. I did spend a lot of time writing bindings to the C code I was testing, and it turned out to be a disadvantage to other members of the team, who hadn’t ever worked with ML languages, let alone functional languages.

About a month ago, I decided to actually learn a modern version of C++.

Changes

The changes I mainly had to learn were:

  • Namespaces
  • Multiple inheritance
  • Exceptions
  • Templates
  • The STL

Of these, the STL probably took the most time to learn. The other mechanisms are largely used in similar ways in other languages. Although the specifics of multiple inheritance in C++ differ from other languages I’ve used, they seem to be coherent.

Misconceptions

I had some fairly serious misconceptions of C++ that I am glad to have been able to get past.

Strong typing

C++ is a fairly strongly-typed language, much more so than C. Although it allows C-style casting, it offers plenty of mechanisms to not require it, and the compiler enforces fairly strict type coherency.

What had confused me is that the template mechanism does not enforce type constraints. Languages such as Ada, or Scala have strongly typed generics (Java is less so), which requires a fairly rich type system, and also tends to force type relationships when they aren’t completely necessary (if a generic wants to use a feature of the type parameter, that parameter must be restricted to a type that supports that feature). Templates, on the other hand, are resolved at each instantiation. This is more flexible, still just as strongly typed, but tends to produce amazingly poor error messages.

Garbage collection

C++ does pretty much shun garbage collection. Although Boehm can be bolted on to a C++ app, it isn’t a style of programming that is commonly used in C++.

However, C++ makes up for this by providing full control over construction and destruction of objects. This allows for full memory management. The disadvantage is that it is harder to manage sharing of objects, which usually requires some type of smart pointer that does reference counting. There does tend to be a bit more copying of objects with the C++ style, and I’m not completely sure of the tradeoffs between the extra copying, and the extra work of a garbage collector. It likely depends on the particular application.

Binding to C

Interfacing to C code and libraries is clearly where C++ wins over pretty much any other higher-level language. I suspect this is the major reason for the success of C++. The kind of programming I usually end up doing (systems type, such as backups) requires me to bind to system-level calls for I/O and such. In C++, I am able to call these functions directly, without having to worry about structure formats and calling conventions.

Conclusion

I’m not certain how much I’ll be using C++ for programming projects. I’ve started rewriting some small parts of JPool in C++. It is a good exercise in learning the language, but I still have quite a bit of learning to do before I can determine if it would be an efficient language for writing code. I still like Scala quite a bit, and will definitely keep my Scala version of the backup software alive. Perhaps I will create a restore utility written in C++ to make it easier to restore from a rescue disk.

There still seems to be a lot of crappy C++ code out there. This is probably mostly because of the popularity of the language, not any inherent feature of it. I shouldn’t let the poor code I have seen deter me from what benefits the language might offer.

Wed, 28 Jul 2010 10:58:39 -0700 Markdown support in posterous http://blog.davidb.org/markdown-support-in-posterous

It’s been a while since I’ve written anything up in my blog, mostly because I’ve been busy with work and other real-world things. But, now that Posterous supports Markdown as a formatting for blog postings, most of my barriers to writing posts should be gone. Part of the purpose of this post is to make sure this actually works.

Sun, 02 May 2010 22:56:00 -0700 A brief review of the new MacBook Pro http://blog.davidb.org/a-brief-review-of-the-new-macbook-pro

I got my new MacBook Pro 15-inch on Friday, several days earlier than I was expecting. Overall, I have to say that I'm quite happy with it, although Friday evening was a bit on the frustrating side. I learned the hard way that this new machine is not compatible with my old Netgear WGR614 router. It connects, but the connection is unreliable, drops a lot of packets, and makes for an otherwise unpleasant experience. I went to the Apple store Saturday morning, and picked up an AirPort Extreme, which was easy to set up, and works quite well.

I special ordered the machine with the higher-resolution screen (non-glossy, I'll get to that in a moment), and a 256GB SSD. I haven't really decided if the SSD is really worth the $600, but it certainly is nice. It boots almost instantly, and operations such as installs are much faster. It also makes for a very quiet machine, and one that I don't have to worry about moving around. The higher resolution screen is very nice. I can get a lot more on the screen, but it also just makes images look nicer.

As far as glossy/non-glossy goes, this is a complete scam. The non-glossy screen is beautiful, hardly distinguishable from one of the glossy screens, except for the fact that I don't have to try to see past my own reflection in order to read the screen. I guess I do miss out on being able to see the aliens sneaking by behind me while I'm working, though. I really think this whole glossy screen business is a way to make the displays cheaper, and a bunch of marketing spin to convince people that the inferior screen is actually an advantage.

The keyboard on the machine has a fairly nice feel. I was a little concerned about the caps lock key (which I configured to be a control key), since my BT flat keyboard has problems missing that key. The keyboard on the MBP has a much smoother feel than the BT keyboard, and I really haven't had any problems with it. It's definitely better than the keyboard on my 2008 (with Santa Rosa). The keyboard is the one thing I miss from the ThinkPad, though.

Setup

Once I got networking, I did some significant setup. I booted the install DVD, and shrank the OSX partition by about 32GB to make room for a native Linux partition. I used parted to make a swap and root filesystem for Linux, and then booted into rEFIt and used its partition fixer to fix the legacy partition so that the legacy bootloader could easily boot Linux. I installed Arch Linux without any real hitches, using the wired ethernet. The built-in wireless ethernet seems to be a very new device, and even the Broadcom driver didn't seem to work (it talks to it, but won't authenticate to the base station). I'm sure this will get working at some point, and it probably won't be long before I can use the b43 driver in the kernel.

I installed Virtualbox, and also put Arch Linux in that. It's a bit slower, but does integrate nicely with the OSX environment. That will probably be the normal environment I use Linux from on the machine.

As far as OSX software goes, I installed MacPorts to be able to easily build packages. I found a version of mplayer-enhanced which seems to be mostly as good as the Linux version. It's the only program I can find on the machine that will play 10Mb/s H.264 streams without glitching.

All in all, I'm quite happy with the new machine.

Fri, 30 Apr 2010 02:04:00 -0700 Jpool updated to Scala 2.8 http://blog.davidb.org/jpool-updated-to-scala-28-tags-jpool-scala

I have finished updating jpool to Scala 2.8 (currently 2.8.0.RC1). I will leave this development on the 'try-2.8' branch until 2.8.0 is released. However, now that this seems to be working fully, I will not be backporting changes to the old branch.

I ran into some interesting problems with the conversion. The most tedious to fix was that Scala 2.8 no longer automatically makes parent packages visible from a dotted package statement. Once I figured out the correct way of handling this, things got a lot better. Basically:

    package org.davidb.jpool.tools

can be changed to

    package org.davidb.jpool
    package tools

and this will bring the contents of both 'org.davidb.jpool' and the tools package into scope.

The other main effort was because of the conversion of the containers. Since jpool creates several of its own containers, these had to be updated to use the new naming system. Stacks have been fixed to actually be implemented as stacks, which ended up simplifying some of the code that used them.

Previously, I was using streams to iterate the directory trees in the filesystem. Moving to 2.8 provoked some space leaks in my code. However, the new collection classes make Iterator as convenient to use as streams, while nicely maintaining the overwriting behaviour of the iterator. I converted all of the Streams into Iterators and eliminated the space leaks. This seems similar to Clojure: it is rather hard to use streams on the JVM without space leaks, since the JVM doesn't seem very good at determining the lifetime of locals.

Beyond this, I've now implemented a 'clone' tool. This tool allows individual snapshots to be migrated from one pool to another. This can be used as a kind of poor-man's garbage collection. I've been using this to make weekly pool snapshots, which helps keep this weekly pool smaller, since it has fewer snapshots in it.

Sun, 18 Oct 2009 04:25:00 -0700 Asure 1.00 released http://blog.davidb.org/asure-100-released

I have released version 1.00 of my Asure file integrity program. This is a small Python program that captures file hashes and permissions over a directory tree, and can be used to either look for changes (similar to tripwire), or verify that the files are properly restored when testing backups. It is primarily intended for the latter. It has an important command, update, which only rehashes files that have been touched since the last run, which allows the database to be kept up to date fairly quickly.

This release mostly fixes a warning with newer versions of Python, and has a proper setup.py to make it easier to package. The page above also has a link to an Arch Linux User Repository (AUR) package that allows it to easily be installed on Arch Linux.

Mon, 05 Oct 2009 05:56:00 -0700 Arch Linux http://blog.davidb.org/arch-linux-30

This weekend, I did a couple of installs of Arch Linux. So far, it seems like it is going to be a pretty good fit for what I want to do.

I had several frustrations with any of the Debian-based distributions: mostly that they have definitive releases, and getting recent versions of packages takes a long time. Even Gentoo is starting to get slow about releases, but the overlay system helps for cutting edge things. I've also found that both of these make it somewhat awkward and difficult to package up my own things.

This is where Arch really shines. The Arch package manager is a binary package manager, similar to dpkg, but a bit simpler. It manages dependencies and upgrades, and tracks files, although it just drops config files into /etc/filename.pacnew and lets the user manage the updates. Gentoo used to be similar, but several tools now help manage these, and I suspect could be adapted/written for Arch.

But, building Arch packages is really easy. The abs tool will synchronize all of the package descriptors for all of the arch packages that are “trusted”. These can easily be copied somewhere and the package rebuilt, similar to BSD ports. However, the result is a binary package that can be managed by pacman. The Arch User Repository holds packages uploaded by arbitrary users. It allows commentary, and voting, and well-done packages can be promoted into the regular Arch distribution.

I built my first package, of Aegis, which has its very own package page.

The other thing that is nice about Arch is that the config files (in /etc) are much simpler than most distributions, more like BSD scripts than a typical Linux machine. Most stuff is a bunch of shell variables set in /etc/rc.conf, with a handful of other things in other files. The installer just puts you in an editor with these files, and it is fairly easy to figure out.

What will be really interesting to see is how well it handles upgrades as time goes on, since this is the difficulty of any distro that does incremental upgrades.

Sat, 22 Aug 2009 15:03:00 -0700 NX and MacOS http://blog.davidb.org/nx-and-macos

My primary home machine is a Mac Pro desktop. Although the machine dual boots between MacOS X and Linux, it is rather inconvenient to do so. I have been running Linux in a VM, but find that the performance isn't all that great.

I decided to give No Machine's NX system a try. I went with FreeNX, mostly because it is GPL, and the source is available. It was fairly easy to install on Gentoo, just:

sudo emerge nxserver-freenx
and wait a short while.

I downloaded the MacOS NX client from No Machine, since there doesn't appear to be a free version. This client is ppc only, and feels very much like a non-native Mac App. Fortunately, once you are past login, the application comes across as just a single large window.

Unfortunately, it isn't a native MacOS app, but an X11 app. The first thing I discovered is that the keyboard layout is abysmal. After playing around with 'xev' and 'xmodmap', I came up with the following xmodmap config to make the keys better:

keycode 66 = Alt_L
keycode 63 = Super_L
keycode 71 = Super_R
keycode 69 = Alt_R
clear Mod1
clear Mod4
add Mod1 = Alt_L Alt_R
add Mod4 = Super_L Super_R

NX seems to update the keymap on the remote X server to match the current client, so it doesn't seem to be a problem switching between clients.

Although I tell the NX client to make the window the largest size, I still seem to have to click on the green “maximize” button to make it fill the screen. There's still the Mac menu bar at the top, and a window border below that, and it'd be really nice to get full screen to work, but this is quite usable now.

I tried using the connection over a wired LAN, WiFi, as well as an EvDO modem. All of these configurations are quite usable. With the local networks, videos play, although I don't have any audio (the server doesn't have speakers, and I suspect it is going that route).

The last thing I did was to update the keypair used to authenticate the NX ssh login. NX doesn't listen on a port, but uses ssh to connect to the server, always logging in as the 'nx' user. It ships with a keypair that allows the client to connect without any configuration. However, this then relies on the NX password authentication. Fortunately, simply running

sudo nxkeygen
generated a new keypair. I then looked at the file '/var/lib/nxserver/home/.ssh/client.id_dsa.key' and pasted the contents into the keypair in the NX client configuration. This matches my security model better, since I normally don't allow password logins on my machines.

I'll give this setup a try for a while; hopefully it will mean I don't have to 'unison' synchronize my data quite as much between so many different machines.

Update: Making clipboard sync work

Getting the clipboard to sync between MacOS and NX was challenging. NX had no problem with the sync, but Apple's X11 doesn't enable it by default.

To set this, completely exit the X11 program, and using a Terminal window, cd to ~/Library/Preferences and 'open org.x.X11.plist'. Using the editor, change:


Field                            Value
enable_key_equivalents           false
sync_clipboard_to_pasteboard     true
sync_pasteboard                  true
sync_pasteboard_to_clipboard     true
sync_pasteboard_to_primary       true
sync_primary_on_select           true

It's probably possible to use other settings, but I was able to make this combination work. None of it seems to work if key equivalents are not disabled, which means you can't use the Apple key shortcuts.

Thu, 23 Jul 2009 09:24:00 -0700 Piano recital http://blog.davidb.org/piano-recital-13

Thanks to some help from my nephew (thanks Dustin), I was able to get video and audio recordings of my recent piano recital. The audio was recorded with two Shure KSM-32 microphones, one for the piano and one for the drums, and another mic of an unknown type in front of the bass player's amp. I used a Motu 828mkII as an audio input device (post mixer), and recorded the audio with Boom Recorder.

Audio editing was done with Logic Pro and the videos with Final Cut Pro.

First is I'll Remember April:

And second is In Your Own Sweet Way:

Fri, 10 Jul 2009 06:44:00 -0700 ICFP, well I guess not. http://blog.davidb.org/icfp-well-i-guess-not

A few weekends ago, I got myself prepared to participate in the ICFP programming contest. I did it a few years back, and although it was a lot of work, it was fun. This year, I just couldn't motivate myself to even start. The only part of the problem that seemed interesting to me was the virtual machine itself, and even then, I couldn't motivate myself to do something so transitory.

So, instead, I started working on the Project Euler problems, in Haskell. I've been pushing my solutions to Github, but I don't recommend looking if you are considering working on the problems yourself.

This has renewed my interest/fascination with Haskell, and I've since dug up my Haskell implementation of the backup software “harchive”. The code has suffered some bitrot in the last few years and doesn't build any more, mostly a consequence of changes in libraries I depend upon.

I'm currently working on implementing the new HashMap I came up with for Jpool. Ideally, I will have more than one implementation of this software that uses a compatible storage format. I enjoy programming in Haskell, but find that it also stretches my thinking a lot.

What's also been taking up a good bit of my time is practicing for a piano recital coming up on the 18th. I'll be doing two jazz songs with a trio, and we hope to get videos up on YouTube afterwards.

Sat, 13 Jun 2009 05:34:00 -0700 Verifying Backups http://blog.davidb.org/verifying-backups

How do I know if my backups are working? Ideally, you should test a full restore of a system to make sure that every part of the process works. I do this, but I don't think very many people do. Even just running a test restore onto a temporary directory or partition can tell you quite a bit. But still, how do I know if what I've restored is the same as what I backed up?

One way to do this is to compare the trees. A simple way to do this is:

tar -cf - -C /dir1 . | tar -df - -C /dir2
which will compare the contents of dir1 and dir2. The biggest problem with this approach is that the older the backup you are testing is, the more likely it is that the filesystem has changed since running the backup.

Another approach is to compute hashes of the files, and check them:

find . -type f ! -name SHA1SUMS -print0 | xargs -0 sha1sum > SHA1SUMS
and back the SHA1SUMS file up with the backup. Upon restore you can use the '-c' option to the sha1sum program to test the hashes. This works a lot better, but really only verifies the integrity of the contents, not of metadata.
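The same idea can be sketched in Python, with the paths stored relative to the tree so that a restore into a temporary directory can be checked against the original manifest. This is a minimal illustration, not how any particular tool does it; the function names are made up for the example, and like the sha1sum approach it checks only file contents, not metadata.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file's path, relative to root, to its SHA-1."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files, much like 'find -type f'.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            manifest[os.path.relpath(full, root)] = sha1_of_file(full)
    return manifest


def compare_manifests(expected, actual):
    """Return (missing, extra, changed) lists of relative paths."""
    missing = sorted(set(expected) - set(actual))
    extra = sorted(set(actual) - set(expected))
    changed = sorted(
        path for path in set(expected) & set(actual)
        if expected[path] != actual[path]
    )
    return missing, extra, changed
```

Usage would be to call build_manifest on the source tree at backup time, store the result alongside the backup, and later compare it with build_manifest of the directory the restore was written to.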

There are programs, most famously Tripwire, that do a better job of managing this integrity. As far as I can tell, nobody uses these to verify backups. All of them seem to want absolute paths, and have no real obvious way of verifying the integrity of a backup restored into a temporary directory.

A deeper problem, however, is that all of these scans involve computing the hashes of all of the files in the filesystem, for each backup. With good incremental or snapshot backups, this integrity scan can easily take longer than the whole backup itself.

Since I couldn't find anything that did what I wanted, I wrote my own program, Asure. It's a fairly small Python program that manages integrity snapshots of trees of files. Its update command has the useful feature that if a file's timestamps haven't changed since the last scan, the hash will not be recomputed. This does miss modifications to the file caused by underlying hardware problems, or subversions of the operating system, but tends to make the scan fast enough that it remains useful.
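The timestamp shortcut can be illustrated with a short sketch. This is not Asure's actual code or file format: the function names and the snapshot layout are invented for the example, and it assumes that an unchanged modification time and size are enough to reuse the previous hash.

```python
import hashlib
import os
import stat


def file_sha1(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def update_snapshot(root, previous):
    """Build a new snapshot of root, rehashing only files that look changed.

    previous maps a relative path to {"mtime_ns", "size", "sha1"} from the
    last scan (an empty dict on the first run). Returns the new snapshot and
    the sorted list of paths whose hash had to be recomputed.
    """
    snapshot = {}
    rehashed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.lstat(full)
            if not stat.S_ISREG(info.st_mode):
                continue  # only regular files are hashed in this sketch
            rel = os.path.relpath(full, root)
            old = previous.get(rel)
            unchanged = (
                old is not None
                and old["mtime_ns"] == info.st_mtime_ns
                and old["size"] == info.st_size
            )
            if unchanged:
                sha1 = old["sha1"]  # trust the earlier hash
            else:
                sha1 = file_sha1(full)
                rehashed.append(rel)
            snapshot[rel] = {
                "mtime_ns": info.st_mtime_ns,
                "size": info.st_size,
                "sha1": sha1,
            }
    return snapshot, sorted(rehashed)
```

On an unchanged tree the second call only has to stat each file, which is where the speedup comes from; the cost, as noted above, is that corruption that leaves the timestamps alone goes unnoticed.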

I do feel it is important that this scan utility be a completely separate codebase from the backup software being used. It's tempting to use the file scanning libraries from jpool, since they do largely the same thing. However, much of what I'm trying to detect here is bugs in jpool. Hopefully bugs in the Python code and bugs in the Scala code are less likely to happen in the same place.

Sat, 06 Jun 2009 05:41:00 -0700 Eee PC 900 Review http://blog.davidb.org/eee-pc-900-review

Last week, I received my Woot special Asus Eee PC 900. Since I'm a fairly non-standard computer user, I wanted to give myself some time to get things set up before writing a review. There are still more things to do, but I have a reasonably usable machine now. What I got has:

  • 512MB DDR2 533 RAM
  • 4GB SSD
  • 1024x600 9" color LCD
  • 3 high-speed USB ports
  • 900 MHz Celeron-M ULV 353 CPU
  • 2 speakers, microphone, headphone, and mic-jack
  • trackpad
  • a very tiny power adaptor, 12V 3A
  • a bunch of paper and cardboard
The Woot version is refurbished, and different “deals” seem to come in different configurations. They are all much smaller than standard versions that you buy, which essentially means that the Woot deals aren't really all that good of a deal.

The native ASUS-customized Xandros barely fits on the 4GB drive. To be honest, I didn't play with it very much. It asked for a password, which I apparently mistyped twice the same way, and was unable to ever log in. I plugged in a USB harddrive, and used it to make a basic Gentoo install. After that worked, I migrated it to the SSD and now run natively.

I discovered that I had a 1GB DDR2 667 SO-DIMM sitting around from another upgrade, which worked fine. I've ordered a 32GB SSD from Crucial for about US$90. Until I get the new SSD, I have /usr/portage and /usr/src/linux symlinked to directories on an external thumb drive. This allows me to mount the drive when I want to emerge or update the kernel, but otherwise use the machine without it.

The keyboard took some getting used to. The pitch is smaller, and the “edge” keys are significantly smaller. At first, it was hard to go back and forth with a regular keyboard, but I'm fine with that now. It's definitely usable, but not exactly natural or pleasant. I haven't spent much time coding to see how hard the punctuation is to use.

But, importantly, the switches in the keys are good. I tried another netbook at Best Buy (not sure which brand it was), and the keys themselves were unusable. It wasn't just a wear issue, because all of the switches had the same inconsistent feel.

I used it today as my work laptop, and it was fine for email checking and basic web browsing during meetings. The small form factor makes it much easier to carry around.

The Ratpoison window manager is a good fit with the tiny screen. With a larger screen, I tend to use xmonad since it is better at tiling and handling multiple desktops, but the single window at a time works nicely with such a small screen. Xfce's Terminal program, using DejaVu Sans Mono 9 with sub-pixel anti-aliasing, is nice and readable. Gentoo is nice enough to let me build just the terminal emulator without bringing in the rest of Xfce, or any other desktop environment.

The main thing I'm disappointed with so far is that the ACPI battery monitor doesn't provide all that much information. It reports a design capacity of 5200 mAh, but a last full capacity of 100 mAh. This appears to be so that the remaining capacity gets reported as a percentage. There isn't a lot of precision to it; the numbers are always multiples of 10. There also doesn't appear to be any kind of current metering, which makes estimating remaining time difficult.

I had no particular difficulty getting my Verizon EvDO USB card working, which gives me net access in most places. The WiFi also seems to work fine. I'm still setting things like mplayer and Skype up, so I don't have enough information to review the multi-media capabilities of the machine.

All in all, I'm reasonably happy with the machine. I'll know more after I try actually travelling with it. I'm looking forward to having a computer small enough to actually open on modern airplane seating.

Tue, 02 Jun 2009 04:17:00 -0700 Incremental Backups Considered Harmful http://blog.davidb.org/incremental-backups-considered-harmful

Well, OK, maybe not harmful, but really hard.

It seems like it should be fairly easy. Most archive utilities have some kind of option to back up only files that are newer than a certain date. With some clever scripting you can arrange to do a "full" backup initially, and then only dump files that have been modified since a certain date.

This has some real problems, mostly having to do with the ability to rename directories. You see, when you rename a directory, the files inside the directory aren't themselves modified. At best, the incremental backup will completely miss that the directory has been moved (or will notice the new directory, but not its contents). Files modified later under the new directory name will then restore to the new place, even though the restored image still has the rest of the directory in the old place. Generally, the older the backup, the less the resulting restore will actually look like the tree you backed up.
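
The rename problem is easy to reproduce. The following is only a sketch of the naive "newer than a date" selection, not the logic of any particular archive utility; `files_newer_than` and `demo_rename` are made-up names, and the timestamps are set by hand so the result doesn't depend on timing.

```python
import os
import tempfile
import time


def files_newer_than(root, cutoff):
    """Return relative paths of regular files under root with mtime > cutoff."""
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.stat(full).st_mtime > cutoff:
                selected.append(os.path.relpath(full, root))
    return sorted(selected)


def demo_rename():
    """Rename a directory after the 'full' backup and see what gets picked up."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "old"))
        path = os.path.join(root, "old", "data.txt")
        with open(path, "w") as f:
            f.write("unchanged contents\n")
        # Pretend the file was last written an hour ago, and that the full
        # backup ran half an hour ago.
        now = time.time()
        os.utime(path, (now - 3600, now - 3600))
        cutoff = now - 1800
        # The rename happens after the full backup.
        os.rename(os.path.join(root, "old"), os.path.join(root, "new"))
        return files_newer_than(root, cutoff)
```

Calling `demo_rename()` returns an empty list: the file now lives under `new/`, but its modification time predates the cutoff, so a date-based incremental never writes it out under its new name.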

The other problem is that it is important to remove files that have been removed since the last backup as well. Otherwise, at best, the tree will be cluttered with old files that were intended to be deleted. At worst, there won't be enough room in the restore volume to hold all of the data.

Several programs try to do better at this, using a couple of different approaches. Any of them can theoretically work, but minor bugs in the code tend to cause significant data loss.

The first approach is that used by the 'dump' program. This program only copies files that have changed since the last backup. Instead of worrying about figuring out where things have moved when dumping, dump just writes out the contents of directories (names and the inode number of each file) that have been modified. This places the burden of work on the restore utility to figure out how to put the filesystem back during the incremental restore. Restore maintains a database of every file restored, and tries to coordinate a set of file moves as part of the restore to do any directory rearrangement necessary to get the filesystem back into the state it was in. This usually works. The main advantage here is that it makes the backup process quite simple (and fairly fast). However, the bulk of the complexity is in the restore utility, which is run infrequently. Moving the program's complexity to a part that is rarely run tends to keep difficult bugs from being found.
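
To give a feel for the bookkeeping this pushes onto the restore side, here is a much simplified sketch. It is not how restore is actually implemented; `plan_moves` is a hypothetical helper that compares two name-to-inode tables for a single directory, and it assumes each inode appears under only one name (no hard links).

```python
def plan_moves(old_names, new_names):
    """Compare two {name: inode} tables for one directory.

    Returns (moves, deletes): entries whose inode is still present under a
    different name are treated as renames, and entries whose inode has
    disappeared are treated as deletions.
    """
    old_by_inode = {inode: name for name, inode in old_names.items()}
    new_inodes = set(new_names.values())

    moves = []
    for name, inode in new_names.items():
        old_name = old_by_inode.get(inode)
        if old_name is not None and old_name != name:
            moves.append((old_name, name))

    deletes = [name for name, inode in old_names.items()
               if inode not in new_inodes]
    return sorted(moves), sorted(deletes)


# Directory listing as of the previous dump, and as of the incremental.
before = {"photos": 10, "notes.txt": 11, "tmp": 12}
after = {"pictures": 10, "notes.txt": 11}
moves, deletes = plan_moves(before, after)
```

Here `moves` comes out as `[("photos", "pictures")]` and `deletes` as `["tmp"]`. A real restore also has to order the moves safely and cope with hard links and with names that are reused, which is where the rarely exercised complexity lives.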

Another approach is that used by GNU tar, as well as by my own Adump backup suite. This approach stores a small (or not so small) database of what the directories looked like at each backup. When performing an incremental, it compares against this list to determine what to back up. In general, when a directory is renamed, these programs will write the entire contents of the directory to the backup again, and write something to indicate that the old directory should be deleted. More of the complexity is moved to the backup portion, which is good. The main disadvantage here is that they tend to copy a lot more data than is really needed.
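
A rough sketch of the comparison step, under the assumption that the database is just a mapping from path to some metadata tuple such as (size, mtime). This is not the on-disk format of GNU tar or Adump; `diff_snapshots` is an illustrative name.

```python
def diff_snapshots(previous, current):
    """Decide what an incremental has to record.

    previous, current: {path: metadata} as seen at the last backup and now.
    Returns (to_write, to_delete): paths that are new or whose metadata
    changed, and paths that existed before but are gone now.
    """
    to_write = sorted(path for path, meta in current.items()
                      if previous.get(path) != meta)
    to_delete = sorted(path for path in previous if path not in current)
    return to_write, to_delete


previous = {"old/a": (100, 1), "old/b": (200, 1), "keep": (5, 1)}
current = {"new/a": (100, 1), "new/b": (200, 1), "keep": (5, 1)}
to_write, to_delete = diff_snapshots(previous, current)
```

After renaming `old` to `new`, both files are written again under the new name and both old paths are marked for deletion, even though no file contents changed. That is the extra copying described above.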

To finish, I'd like to discuss the difference between incremental and differential backups. A clearer way to describe all of these types of backups is with the notion of backup levels. A level '0' backup is a full dump of everything in the filesystem. A level 'n' backup is a dump of all files that have been changed since the last level 'n-1' backup. Using this terminology, a differential backup would follow the pattern 0, 1, 1, 1, 1… and incremental backups would follow the pattern 0, 1, 2, 3… The differential backup has the advantage of only needing two dumps to restore. However, the dumps tend to grow larger and larger, so eventually a new level 0 dump is needed. The incremental backups tend to be small, but the entire chain of backups has to be restored, again encouraging more frequent level 0 backups.
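
The restore cost of a given pattern can be worked out mechanically. The sketch below takes the base of each dump to be the most recent earlier dump with a lower level, which for the patterns above is always the last level 'n-1' dump; `restore_chain` is a made-up name.

```python
def restore_chain(levels):
    """Return the indices of the dumps needed to restore the latest state.

    levels: backup levels in the order the dumps were taken, starting with
    a level 0. Walks backwards from the last dump, each time stepping to
    the most recent earlier dump with a strictly lower level.
    """
    if not levels:
        return []
    chain = [len(levels) - 1]
    current = levels[-1]
    for i in range(len(levels) - 2, -1, -1):
        if levels[i] < current:
            chain.append(i)
            current = levels[i]
    return chain[::-1]


differential = restore_chain([0, 1, 1, 1, 1])  # [0, 4]
incremental = restore_chain([0, 1, 2, 3])      # [0, 1, 2, 3]
```

The differential pattern needs only the full dump and the latest level 1, while the incremental pattern needs every dump in the chain.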

A good solution is a compromise between these. A simple algorithm, based on the Tower of Hanoi problem, is to count in binary from zero. For each backup, use a backup level equal to the number of set bits in the binary number. This results in the series 0, 1, 1, 2, 1, 2, 2, 3, 1… It makes a good compromise, periodically performing larger, lower-level backups so that the incremental chain is shorter (log n, instead of n).
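
The schedule is a one-liner to generate. This is a small sketch with made-up function names; the only assumption is that backups are numbered from zero.

```python
def hanoi_level(backup_number):
    """Backup level for the given backup: the number of set bits in its index."""
    return bin(backup_number).count("1")


def hanoi_levels(count):
    """Levels for the first `count` backups, numbered from zero."""
    return [hanoi_level(i) for i in range(count)]


schedule = hanoi_levels(9)  # [0, 1, 1, 2, 1, 2, 2, 3, 1]
```

Since a level 'n' dump sits on top of one dump at each lower level, restoring backup number i takes one more dump than the number of set bits in i, which is where the log n bound comes from.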

Coming up, something much better: content addressable storage.

Mon, 01 Jun 2009 05:08:00 -0700 Why do we do backups? http://blog.davidb.org/why-do-we-do-backups

A couple of years ago I awoke to the unpleasant smell of gently burning electronics. The power supply in my main mail-server had decided that, as far as it was concerned, 50 volts was as good as 12. Not much was happening with the machine. I don't remember the exact times, but I knew that after a few days, my email would start bouncing. A trip to the computer store and a few hours later, I had a nice shiny new computer all ready to go, with a completely blank hard drive.

I popped a rescue disc into the machine, booted it up, set up the partitions and filesystems, connected to the network, and restored the machine's backup from the night before. The only thing I had to change was to build a new kernel to support some of the new devices on the new machine.

To me, this is the main reason I do backups. RAID would not have helped me in this instance, other than perhaps increasing the unpleasant smell a bit, with two drives to go bad. For this particular scenario, syncing the data to a remote machine (say, with rsync) would have been sufficient. However, there are other failures where this isn't good enough. Let's say I discover a file I overwrote with garbage last week. Rsync would have diligently copied the corrupt data to the mirror, leaving me with two copies of the corrupt file.

From this point of view, the ideal backup would be to make a complete copy of everything on my computer, at least once a day. Each copy should be to new media, which would allow me to restore any version of any file from the past, at least to within a single day. Unfortunately, for most of us, this isn't a very practical method. I'm not really set up to buy a new hard drive every day for the rest of my life. In addition, my computer is going to be spending a lot of its time making copies, over and over again, of the same data.

So, is there any way to get the best of both worlds? It turns out there is; several, in fact. The most common technique used is known as incremental backups (or sometimes differential backups). More about this in a post titled Incremental Backups Considered Harmful. Something even better, which is what jpool uses, is known as content addressable storage. More on this once I get incremental backups out of the way.

Sun, 31 May 2009 17:00:00 -0700 Evolution of Song http://blog.davidb.org/evolution-of-song

Ok, this isn't exactly software, or jazz specifically, but my friend Leonard has made this excellent video response to "The Evolution of Dance".

Sun, 31 May 2009 08:06:00 -0700 Jpool up on github http://blog.davidb.org/jpool-up-on-github

I figure with wonderful tools like git, there's not really that much reason for me to avoid publishing the code. I'll be pushing the source to github's jpool page as I work on it.

It's not exactly ready for release, but if anyone wants to play with it, feel free. You'll need Apache Ant and Ivy installed in order to build it, as well as version 2.7.4 of the Scala compiler. Ivy will download the rest of the needed dependencies. Running ant test should run the unit tests.

There are only a few commands so far. Everything needs a storage pool reference, which is a URI of the form jpool:file://path. Note that since the first two slashes are part of the URI, if the path is absolute (and it should be), you'll have three slashes in the URI.
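
As a small illustration of the three-slash rule, here is a sketch that builds a pool reference from an absolute path. `pool_uri` is a hypothetical helper, not part of jpool, and `/srv/pool` is just an example path.

```python
def pool_uri(path):
    """Build a jpool pool reference from an absolute filesystem path.

    The "jpool:file://" prefix supplies two slashes; the leading slash of
    the absolute path supplies the third.
    """
    if not path.startswith("/"):
        raise ValueError("pool path should be absolute: %r" % path)
    return "jpool:file://" + path


uri = pool_uri("/srv/pool")  # "jpool:file:///srv/pool"
```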

Things you can do are:

  • Store a tarball into the pool: tar -cf - ... | jpool save jpool:file:///path key=value key=value
  • Make a snapshot of a directory: jpool dump jpool:file:///path /path/to/dump key=value key=value
  • List the entries: jpool list jpool:file:///path
  • Extract a tarball: jpool restore jpool:file:///path hash --tar | tar -xf - ...
  • Extract a snapshot: jpool restore jpool:file:///path hash /path/to/restore

The tarballs are intended more for archiving than for backup. You should not compress the data yourself: jpool partially parses the tar headers, and the contents will be compressed and de-duped when stored in the pool (meaning it will take little space to store similar tarballs).

The snapshots are intended for backup. Again the data is deduped, and jpool will remember file contents so that subsequent backups should be fast. However, these are not incremental backups. Each snapshot is a complete snapshot of the tree, but will share data with previous snapshots.
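
To show how complete snapshots can still share data, here is a toy content-addressed store. It is only a sketch of the general idea and says nothing about jpool's actual storage format; `TinyPool` is a made-up class that keeps whole file contents in memory, keyed by their SHA-1 hash (the choice of hashing whole files is an assumption made for brevity).

```python
import hashlib


class TinyPool:
    """Toy in-memory pool: each distinct blob is stored once, under its hash."""

    def __init__(self):
        self.blobs = {}

    def put(self, data):
        """Store data if it is not already present; return its key."""
        key = hashlib.sha1(data).hexdigest()
        self.blobs.setdefault(key, data)
        return key

    def snapshot(self, files):
        """Record a complete snapshot: {path: bytes} becomes {path: key}."""
        return {path: self.put(data) for path, data in files.items()}


pool = TinyPool()
monday = pool.snapshot({"a.txt": b"hello", "b.txt": b"world"})
tuesday = pool.snapshot({"a.txt": b"hello", "b.txt": b"world!", "c.txt": b"hello"})
stored = len(pool.blobs)  # 3 distinct blobs for 5 file entries
```

Each snapshot lists every file, so either one can be restored on its own, but unchanged or duplicated contents cost no additional space in the pool.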

Associated with each backup is a set of key=value pairs. You can use whatever you want; I typically use host=hostname fs=root, and stuff like that. Using something informative is important, because as far as jpool is concerned, the only meaningful handle is the SHA-1 hash of the backup itself.
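
If the dumps are driven from a script, the key=value pairs can be assembled programmatically. The sketch below only builds the argument list in the `jpool dump` form shown in the list above; `dump_command` is a hypothetical helper, the pool path and host name are placeholders, and sorting the pairs is an arbitrary choice (whether jpool cares about their order isn't stated here).

```python
def dump_command(pool, directory, **attrs):
    """Build the argv for 'jpool dump POOL DIRECTORY key=value ...'."""
    pairs = ["%s=%s" % (key, value) for key, value in sorted(attrs.items())]
    return ["jpool", "dump", pool, directory] + pairs


cmd = dump_command("jpool:file:///srv/pool", "/", host="myhost", fs="root")
# ['jpool', 'dump', 'jpool:file:///srv/pool', '/', 'fs=root', 'host=myhost']
```

The resulting list could be handed to `subprocess.run` on a machine where jpool is installed.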

Sun, 31 May 2009 07:21:00 -0700 Backing up is hard to do http://blog.davidb.org/backing-up-is-hard-to-do

I like learning programming languages. More on this in a minute.

Many years ago I wrote some backup software. It ran under Minix, and it wrote to floppies; lots of them. I've rewritten, enhanced, and rewritten this same thing again probably at least a dozen times. I've learned two things from it: first, backups are really hard to do correctly, and second, this is a really good problem for learning a new programming language.

The current version lives in Scala, and it's probably going to stay that way for a while. Scala, by far, is the best combination of language features I've run across.

I hope to use this blog to talk about both programming languages and backup software, and possibly throw a little jazz in here and there.
