I’m back from Dalian (China), where I presented the Calibre project at a one-day joint workshop organised by colleagues in France and China, mainly people from ObjectWeb and OrientWare (two consortia developing middleware), during the CISIS IT fair. Here’s a link to the program.
Here is a link to my slides: “Some results on the Calibre project”

I’ve experienced random file-system crashes on a Dell PowerEdge 2650 server with a Perc 3/Di SCSI controller, running Debian testing with the standard 2.6.8 Debian kernel (i686+smp). They happen mainly during disk-intensive operations (for instance, I suspect one such crash happened when Amanda backup tasks were launched on the machine).
There have been numerous discussions on the linux-poweredge mailing list, and many fixes have been proposed for this issue (search Google for details).
The symptoms look like this:
Jun 9 20:52:58 myhost kernel: aacraid: Host adapter reset request. SCSI hang ?
Jun 9 20:52:58 myhost kernel: aacraid: Host adapter reset request. SCSI hang ?
Jun 9 20:52:58 myhost kernel: aacraid: SCSI bus appears hung
Jun 9 20:52:58 myhost kernel: aacraid: SCSI bus appears hung
Jun 9 20:52:58 myhost syslogd: /var/log/messages: Read-only file system
Jun 9 20:52:58 myhost kernel: scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
Jun 9 20:52:58 myhost kernel: SCSI error : <0 0 0 0> return code = 0x6000000
Jun 9 20:52:58 myhost kernel: end_request: I/O error, dev sda, sector 401836233
Jun 9 20:52:58 myhost kernel: scsi0 (0:0): rejecting I/O to offline device
Jun 9 20:52:58 myhost kernel: scsi0 (0:0): rejecting I/O to offline device
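To check whether a machine has hit this problem, one can count these messages in the system logs. Here is a minimal shell sketch. It assumes kernel messages end up in /var/log/kern.log, which is usual on Debian (adjust the path otherwise). The function name aacraid_hang_events is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the aacraid hang symptoms shown above
# in a syslog file (default path is an assumption, see lead-in).
aacraid_hang_events() {
    log_file=${1:-/var/log/kern.log}
    # -c prints the number of matching lines, -E enables alternation.
    # grep exits with 1 when nothing matches but still prints 0,
    # so "|| true" keeps the function from failing on a clean log.
    grep -c -E 'aacraid: (Host adapter reset request|SCSI bus appears hung)|rejecting I/O to offline device' "$log_file" || true
}
```

For example, `aacraid_hang_events /var/log/kern.log` prints 0 on a log that contains none of these messages.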
I think I have come closer than ever to a solution by applying the following steps:
- upgrading the firmware of the Perc 3/Di controller (look on the Dell site for the right version);
- disabling the cache with afacli:
# afacli
open AFA0
AFA0 container set cache /read_cache_enable=FALSE /write_cache_enable=FALSE 0
AFA0 container show cache 0
Executing: container show cache 0
Global Container Read Cache Size : 0
Global Container Write Cache Size : 118259712
Read Cache Setting : DISABLE
Write Cache Setting : DISABLE
Write Cache Status : Inactive, cache disabled
- patching the 2.6.8 aacraid driver’s code with the following patch: aac-remove-handle-aif.patch, to avoid taking the controller offline in some circumstances (see the explanation in this post: http://marc.theaimsgroup.com/?l=linux-scsi&m=110252243627410&w=2).
- get the kernel-source-2.6.8 package from stable
- unpack it and apply the patch
- get the running (uname -r) kernel’s .config from /boot and copy it to /usr/src/kernel-source-2.6.8/
- make-kpkg clean
- make oldconfig
- make-kpkg --append_to_version=patchaacremovehandleaif --initrd kernel_image
- install the resulting kernel package and reboot
- pray
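The patching and rebuild steps can be collected into a small script. This is a sketch under several assumptions, not a tested procedure. The patch location ($PATCH), the source tarball name /usr/src/kernel-source-2.6.8.tar.bz2 and the name of the resulting kernel-image .deb are guesses to check on your system. The kernel-package tools (make-kpkg) must also be installed. The script only prints the commands by default and executes them when RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical helper reproducing the rebuild steps above on Debian.
# Dry run by default: each command is printed, not executed.
# Set RUN=1 to execute them for real (as root).
# Assumed: the patch applies with -p1 from the top of the source tree,
# and make-kpkg writes the .deb into the parent directory (/usr/src).

SRC_VERSION=2.6.8
SUFFIX=patchaacremovehandleaif
PATCH=${PATCH:-/root/aac-remove-handle-aif.patch}
SRC_DIR=/usr/src/kernel-source-$SRC_VERSION

# Print the command, or execute it (stopping on failure) when RUN=1.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@" || exit 1
    else
        echo "$*"
    fi
}

build_patched_kernel() {
    run apt-get install -t stable kernel-source-$SRC_VERSION
    run cd /usr/src
    run tar xjf kernel-source-$SRC_VERSION.tar.bz2
    run cd "$SRC_DIR"
    run patch -p1 -i "$PATCH"
    # Reuse the configuration of the kernel that is currently running.
    run cp "/boot/config-$(uname -r)" "$SRC_DIR/.config"
    run make-kpkg clean
    run make oldconfig
    run make-kpkg --append_to_version=$SUFFIX --initrd kernel_image
    # Assumed package name: kernel-image-<version><suffix>_*.deb
    run dpkg -i "/usr/src/kernel-image-$SRC_VERSION${SUFFIX}_"*.deb
    echo "Now reboot into the new kernel (and pray)."
}
```

Calling `build_patched_kernel` without RUN=1 only prints the command list. This is a cheap way to review it first. Note that `make oldconfig` may still ask questions if the running kernel’s configuration lacks options known to the 2.6.8 tree.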
The machine had been working almost fine since it was running Debian’s 2.6.8 kernel with the cache disabled and the firmware upgraded, but it eventually crashed again…
I hope that the patch against the aacraid driver will solve the issue.
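Before drawing conclusions from the next crash (or the absence of one), it is worth checking that the machine really runs the patched kernel with the cache disabled. Here is a hypothetical shell sketch. It matches the suffix passed to make-kpkg above. It also assumes the output of afacli’s `container show cache 0` was saved to a text file (for instance by copy-pasting it), because it does not assume afacli can be driven non-interactively. Both function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical post-reboot checks (names are illustrative only).

# Succeed if the given kernel release (default: the running one)
# carries the suffix passed to make-kpkg --append_to_version.
running_patched_kernel() {
    case "${1:-$(uname -r)}" in
        *patchaacremovehandleaif*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed if a saved "container show cache" output reports both
# the read and the write cache settings as DISABLE.
cache_disabled() {
    grep -q '^ *Read Cache Setting *: *DISABLE' "$1" &&
        grep -q '^ *Write Cache Setting *: *DISABLE' "$1"
}
```

For example: `running_patched_kernel && cache_disabled /root/afacli-cache.txt && echo OK`, where the file name is arbitrary.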