HallmarcDotNet

Marc Elliot Hall's Blog

Headlines

Christmas card is up Check out the Flash...

Blog-o-licious We've got blogs...

Site Redesigned HallmarcDotNet has a new look...

 

Welcome to My Weblog

— also known as my vanity gripe page

I'm currently available for your project! Please view my résumé and Open Letter to Recruiters if you are looking for an experienced, senior technical manager, project manager, business analyst, team lead, software engineer, web application developer, webmaster, system administrator, technical writer, or technical editor.


Tue, 10 Jun 2008


Bone-headed Maneuvering

Server Outage

While the wife and kiddies are out of town, I thought I'd catch up on a few things with the website.

Among my aspirations was to update the operating system kernel to a more current version with various patches and improvements.

Debian GNU/Linux makes this process relatively simple with the marvel of their dpkg suite of tools, including such amazing functionality as apt-get update, apt-get upgrade, and apt-get dist-upgrade. Some prefer to use aptitude instead of apt-get; and some (who might prefer a GUI), use synaptic. But all of these commands do the same thing: grab the most reasonably current version of the software already installed on the system from a centrally-maintained repository, and upgrade it in-place on the local computer, for free. Normally, this is absolutely painless.

Linux is renowned for never needing a reboot; and to a large extent, this is true. However, when upgrading the kernel the very core of the system is being replaced. Because Linux keeps running programs in memory even when they've changed on disk, this requires that the system be rebooted -- because the kernel is a running program. Most other programs can be individually stopped and restarted; but the kernel controls all other programs -- if it's stopped everything is stopped. Thus the reboot requirement.

Unfortunately, I moronically chose to upgrade my kernel with one for the wrong CPU architecture, leaving me with a system that would not restart. Of course, initially, I didn't know that, and had to figure it out.

Being the experienced and skilled technical professional that I am, I had a reasonably current backup of the system. But just to be completely sure, I booted the system using Knoppix to verify that the filesystem on the server was still intact.

To my surprise and joy, it was.

My next step was to take another backup of the system so that I could be absolutely sure that I had all of my current data. This took roughly six hours to complete, as the system has 500 GB of disk and all of that data was passing through a USB 1.1 connection to an identical external hard drive. Normally, an incremental backup is sufficient; but I wanted the previous full backup to remain available if my new full backup was deficient in some way.

Six hours is a long time, and I still have a job: so I let it run overnight, and then went to work the next morning. Sleeping and working prevented me from determining the kernel incompatiblity until last night.

The a-Ha! moment came after multiple attempts to fix the bootloader (I use lilo) failed. lilo would run when I updated the configuration to point to my new kernel; but when I rebooted the system it would hang with a cryptic:

lilo
0x01 "Illegal command"
lilo
0x01 "Illegal command"
lilo
0x01 "Illegal command"
...etc...

Google results led me to this page, which claims, and I quote, "This shouldn't happen". Yeesh.

After making various changes to the lilo.conf file, each with similar unwelcome results, I was extremely frustrated. By this time, my Web, email, and media server had been out of commission for more than 24 hours. Had it belonged to anybody else, I would have recommended starting fresh and buying new hardware. However, the Wife Acceptance Factor of that decision would have been strongly negative for me.

Not to be deterred from having a functional system, I then elected to re-install the OS and restore my data from my redundant backup. After the eighth failure on the "Install the Base System" step in the Debian installer:

Jun 10 04:23:25 base-installer: error: exiting on error base-installer/kernel/failed-install
Jun 10 04:23:53 main-menu[1323]: WARNING **: Configuring 'base-installer' failed with error code 1
Jun 10 04:23:53 main-menu[1323]: WARNING **: Menu item 'base-installer' failed.

(and similar variations), I then tried installing a different kernel.

Wow!

By trying to use 2.6.18-4-686 instead of 2.6.18-4-486, I had hosed my system.

The lesson: Intel's Pentium III processor is not compatible with the full 686 architecture Linux kernel.

Next steps: Install all this stuff on the dual Xeon I bought two months ago, and retain the Pentium III as a redundant backup system, instead.

posted at: 10:59 |


Marc Elliot Hall St. Peters, Missouri 

Page created: 21 January 2002
Page modified: 14 November 2006

spacer About Us | Site Map | Privacy Policy | Contact Us | ©1999 - 2006 Marc Elliot Hall