Field Update Failures - Linux Embedded systems

Just as change is as ever-present as death and taxes, there is another universal constant: screw-ups. The sections on this topic include a fair amount of code and practice to ensure that when an update occurs, the data written to the device is good in a formal sense; that is, it may not perform to specification, but it won’t crash the device. The chances of a failure are thus greatly reduced, but there is always the possibility that the upgrade process will fail. What do you do in such a case? You can take several approaches.

Report Failure, Stop

This seems like a non-solution, but telling the user that the device is in a compromised mode and requires service is much better than nothing at all. For a very minimal machine, an LED that is turned on by the boot loader and then turned off by the kernel during the boot process gives the user some idea that the device isn’t functioning normally. A device with a screen offers much more bandwidth for communicating the failure and how the user should get help.
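As a rough illustration of the LED approach, the following shell sketch turns a fault LED off once the system is up, using the Linux LED class interface in sysfs. It does this from a startup script rather than from inside the kernel, which is an assumption made to keep the example short. The LED name fault and the function name clear_fault_led are hypothetical; real boards name their LEDs differently.

```sh
#!/bin/sh
# Sketch: clear a "fault" LED that the boot loader switched on.
# LED_DIR is a hypothetical path; check /sys/class/leds on the target board.
LED_DIR="${LED_DIR:-/sys/class/leds/fault}"

clear_fault_led() {
    # Writing 0 to the LED class "brightness" attribute turns the LED off.
    if [ -w "$LED_DIR/brightness" ]; then
        echo 0 > "$LED_DIR/brightness"
    else
        echo "cannot write $LED_DIR/brightness" >&2
        return 1
    fi
}
```

If the boot never reaches the script that calls clear_fault_led, the LED stays lit and the user can see that something went wrong.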

Failsafe Root File System

This is a great use for the initramfs file system. The initial file system has the primary job of ensuring that the last boot or system update did not end in failure. Failure can be detected by the file system not mounting or a file being present on the device indicating that the boot process didn’t complete. The job of the failsafe root file system is to determine what steps are necessary to get the device back in working order, typically by downloading and installing on the board a failsafe root file system that can then run through the process again.
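The marker-file check described above could be sketched in shell as follows. This is only an outline under stated assumptions: the marker path /persist/boot_in_progress and the three function names are hypothetical, and the marker must live on storage that survives a reboot.

```sh
#!/bin/sh
# Sketch of a boot-health check that an initramfs init script might use.
# MARKER is a hypothetical path on storage that persists across reboots.
MARKER="${MARKER:-/persist/boot_in_progress}"

# Succeeds (returns 0) if the previous boot or update never finished.
last_boot_failed() {
    [ -e "$MARKER" ]
}

# Call early in the boot: leave a marker until the boot is confirmed.
begin_boot() {
    : > "$MARKER"
}

# Call once the system is fully up: remove the marker.
boot_succeeded() {
    rm -f "$MARKER"
}
```

An init script would test last_boot_failed first and branch into recovery if it succeeds; otherwise it would call begin_boot and continue, leaving boot_succeeded to the last startup script of the main root file system.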

The nice feature of a failsafe root file system is that you have a great degree of control over how the failure is handled. In some cases, the device can retry the update process; in others, the device can present a message telling the user the nature of the problem and directing them to technical support.

Failsafe Kernel

In this case, the board contains a kernel (with a small root file system) that is never changed. This failsafe kernel is the first thing to boot and is responsible for checking that the system booted properly on the last power cycle. This communication is accomplished by writing data to an agreed-upon register or memory location, after the failsafe kernel has determined that the system has booted improperly.

The GRUB boot loader has fail-over support as a feature. Consider the following GRUB configuration file:
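A minimal sketch of such a file, in GRUB Legacy menu.lst syntax, is shown here. The Fallback Kernel title, the fallback 1 statement, and the (hd1,0) root come from the discussion in this section; the kernel paths, Linux device names, timeout, and panic value are assumptions chosen for illustration.

```
# GRUB Legacy menu.lst sketch; paths and device names are assumptions.
default saved
timeout 3
fallback 1

# Entry 0: the primary kernel. panic=5 reboots 5 seconds after a panic;
# savedefault fallback makes the fallback entry the default for the next boot.
title Primary Kernel
root (hd0,0)
kernel /boot/vmlinuz root=/dev/sda1 panic=5
savedefault fallback

# Entry 1: the fallback kernel with its own root file system and initramfs.
title Fallback Kernel
root (hd1,0)
kernel /boot/vmlinuz-fallback root=/dev/sdb1
initrd /boot/initramfs-fallback.img
```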

The fallback kernel is the entry titled Fallback Kernel, as indicated by the fallback 1 statement; that kernel is booted if the first kernel fails to boot or panics. Recall that GRUB numbers boot entries starting at zero, from the top of the file to the bottom.

To make the system as survivable as possible, the fallback kernel has its own root file system, located at (hd1,0). The root file system used by the fallback kernel should never be updated in the field, so that it can’t be corrupted and thereby become unavailable to boot the system. As an extra measure of safety, the fallback kernel should also include a small initramfs so it can perform minimal tasks in case the fallback root file system can’t be mounted.

After the primary kernel boots, another command needs to be issued so that the boot loader knows to use the primary kernel for the next boot cycle. This command must be run each time the primary kernel boots successfully:
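With GRUB Legacy, the utility for this is grub-set-default. The sketch below wraps it in a small function, assuming the primary kernel is entry 0 in the configuration file. The function name confirm_boot and the GRUB_SET_DEFAULT override, which exists only to allow a dry run, are hypothetical.

```sh
#!/bin/sh
# Sketch: tell GRUB Legacy that the primary kernel booted successfully.
# GRUB_SET_DEFAULT is a hypothetical override, useful for dry runs.
confirm_boot() {
    # Entry 0 is assumed to be the primary kernel; make it the saved default.
    "${GRUB_SET_DEFAULT:-grub-set-default}" 0
}
```

Calling confirm_boot from the last startup script means it runs only after the rest of the system has come up.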

If you don’t issue this command, GRUB doesn’t know whether the boot was successful, and the next time the system restarts, GRUB will try to use the fallback kernel.

All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd
