SysAdmin
2008-01-17
Tracking down instability and recovering FreeBSD systems
Gambit, our main server, is finally stable again after several days of instability that followed a memory upgrade to 2GB. The nature of the problems makes me strongly suspect a hardware issue in the storage system (possibly the chipset's memory or disk controllers). This server was bought on a tight budget a few years back. It is extremely rare for a FreeBSD point release to show random instability when it isn't under heavy load, unless you're doing something silly like compiling the kernel with experimental features and CFLAGS of -O88 -f14m1337.
At the data centre we found that gmirror causes a kernel trap 9 on reboot once the server has started having stability issues and random processes dump core. I'm not sure why yet at this stage. On a full disk mirror setup, the gmirror module is loaded from /boot/loader.conf, so the machine panics before you can log in and disable it. You will need to go into fixit mode from the CD instead.
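Once you are in the fixit shell, it helps to confirm whether the mirror module really is being loaded at boot. Here is a minimal sketch, assuming the server's root file system has been mounted under /mnt. The mount point and the function name gmirror_loaded_at_boot are my own choices for illustration; geom_mirror_load="YES" is the usual loader.conf entry for gmirror.

```shell
# Report whether a loader.conf enables the gmirror kernel module at boot.
# $1 = path to loader.conf (assumed default: /mnt/boot/loader.conf)
gmirror_loaded_at_boot() {
    conf="${1:-/mnt/boot/loader.conf}"
    # Succeeds only if an uncommented geom_mirror_load="YES" line exists;
    # a missing or unreadable file counts as "not loaded".
    grep -Eiq '^[[:space:]]*geom_mirror_load="?YES"?' "$conf" 2>/dev/null
}
```

For example: gmirror_loaded_at_boot /mnt/boot/loader.conf && echo "gmirror is loaded at boot"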
For those not familiar with rescuing FreeBSD systems, the first install disk has a fixit live file system which you can access from the sysinstall installation menu. It gives you a variety of recovery tools and network access, so you can dig around, mount file systems (including external drives) and back up vital data before you attempt a recovery.
Disabling gmirror leads to a reboot loop, even with a correct fstab. If you want to get the system back to a basic install, including the GENERIC kernel, choose upgrade and map your mount points. If you haven't already backed up /etc, the upgrade will copy it to /var/tmp/etc. This got the machine booting normally again. A quick recompile of the kernel for the firewall options and the server is back up, without gmirror.
So far it looks like it's running fine, with no issues such as random process crashes.
Puzzling instability
It has been most puzzling why upgrading to 2GB of RAM (from 1GB) would suddenly cause instability. The key suspect would of course be the RAM, but overnight testing with memtest86 revealed no errors. Everything has been set up as before, and in that configuration the server never crashed except because of the USB drive (which has since been removed).
The kernel panics point to a possible issue with gmirror, but testing outside the data centre, including multiple reboots and resets, did not result in any unrecoverable errors or kernel panics. Removing gmirror did solve the problems, but that isn't a scientific explanation.
I did further testing at home on an even more heavily loaded development server. This server has 2x80GB and 2x250GB gmirrored drives and 5 md mounted image files for jails. I ran a stress test of the file system, multiple reads/writes through port updates, a gnome build, locate updatedb, a file search, make buildworld -j4, and normal use (file server for music etc.) simultaneously. No problems. That is to be expected, as probably thousands of people are using gmirror on production systems.
The only similar thing I've seen was caused by a faulty network card (hardware). At this stage, I suspect the same kind of hardware fault in the SiS 760MG chipset motherboard, which doesn't even support ECC memory.
As long as it stays stable, I'm going to hold off on getting a new server for now, with the second drive holding full backups. The server has very little load even when serving multiple Zope/Plone sites and virtual servers. The plan is to move to a proper quad core Opteron with 4-8GB of RAM from Dell, HP or Sun, coinciding with the release of FreeBSD 7.1 later this year.
2007-12-28
Folding Shell Style Comments with Vim
I often find that people new to Unix, and even people who have been using it for a while, tend to brush off the vi/emacs editors quickly and then use simpler editors like pico, nano, gedit and the like.
They then never see why a lot of old-time Unix system administrators and developers use these tools. vi, for example, was designed to edit files over a very slow connection (300bps). It's still very useful now, as lag still exists. It's very painful for me to watch young developers use the mouse to scroll up and down when, without lifting their hands from the keyboard, they could jump to different functions, mark locations and jump back.
If you need to comment out some lines over a laggy ssh connection, you can simply type :4,12s/^/#/ instead of typing # and using the cursor keys over the next 8 lines.
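The same range-plus-substitute idea also works outside the editor, since sed understands the same address syntax. Here is a minimal sketch; the function name comment_lines and its argument order are my own, not something from vi.

```shell
# Prefix every line in an inclusive line range with '#', printing to stdout.
# $1 = first line, $2 = last line, $3 = file to read (left unmodified)
comment_lines() {
    sed "${1},${2}s/^/#/" "$3"
}
```

For example, comment_lines 4 12 squid.conf prints squid.conf with lines 4 to 12 commented out, which you can review before redirecting it to a new file.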
Speaking of comments, here's a common scenario. Sometimes configuration files are huge, because they're well commented with examples. You want to edit it, but sometimes you just want to see your modifications and only comments for the section you're looking at and not scroll through everything. You can hide the # comments with vim's folding feature.
So let's define a function to do this in our ~/.vimrc:
function! FoldShellComments()
  " A line gets fold level 1 if it starts with '#', otherwise 0
  let &foldexpr = 'getline(v:lnum)[0]=="#"'
  " Have vim compute the folds from foldexpr
  set foldmethod=expr
endfunction
We then want to call this up easily so we define a user command:
command! -nargs=0 FoldShellComments :call FoldShellComments()
With this, whenever we want to fold commented lines, we simply type :FoldShellComments. This will tab complete, by the way; you just need to type :Fo<tab>
That huge squid.conf file now looks like this:

Efficient use of Resources
This afternoon, I finished replacing Apache httpd with Cherokee on Inigo's main server. Now Squid is handling access to all http services first, with Pound handling the management of the backend servers.
There really wasn't much need for Apache httpd, as we don't use most of its features, and without tuning it takes up a lot of memory just to do rewrites, http proxying for the backends and logging.
Squid uses much less memory, and is very fast. Kagesenshi is working on tighter integration with Cachefu for the Plone sites. Even without that, you will find that sites like http://foss.org.my are now snappy (1.02 seconds total according to Firebug). It has also improved the speed of http://mirror.inigo-tech.com, which sits on a slow external USB drive.
The performance bottleneck now is actually memory. While FreeBSD's virtual memory does an awesome job (we're using 1201MB of swap at the time of writing), we are now running 4 separate virtual servers, each running its own self-contained services. There is not much more we can optimize now. Long-idle processes, of course, take several seconds to swap back in. So adding another 1GB of memory (2GB total) will give quite a bit of breathing space and get rid of that lag.
For those who are curious, all of this, including http://www.apdip.net which used to sit on its own server, is running on RM2.5K worth of hardware.
2007-08-07
heartbeat
simple trick to hear status
Got this from the Time Management for System Administrators book
ping hostname | tr : ^G
To get the ^G (or system beep), you will need to press CTRL-V and then CTRL-G.
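If typing a literal control character is awkward (for example when pasting the command into a script), tr also accepts an octal escape for the bell character. Here is a minimal sketch; the function name beep_on_reply is my own, and \007 is the ASCII BEL character that ^G produces.

```shell
# Turn every ':' on stdin into the BEL character (octal 007).
# Each ping reply line contains a ':', so each reply makes the terminal beep.
beep_on_reply() {
    tr ':' '\007'
}
```

It is used the same way as the original one-liner: ping hostname | beep_on_reply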
My server was down due to the USB drive issues, so having this line running allowed me to work on other things while waiting for the data center to restart it. It is also useful when doing cabling.
Sound, when used appropriately, can be a very user-friendly way of being notified about the status of your systems.
