Firmware glitch affects Trimble NetRS Receivers on 31 October, 2010
Trimble Engineering has traced the bug to an uptime counter that mistakenly referenced the Linux system time; the receiver crashes when this counter reaches a critical value. The crash recurs approximately every 6.8 years: the next occurrence will be on 20 August 2017 at 14:14:15 UTC.
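As a check on the stated recurrence interval, the span between the two dates quoted in this advisory can be computed directly. The following is a minimal Python sketch. The comparison with a 32-bit counter at the end is an illustrative assumption only. Trimble has not said how the counter is stored or how long each tick is.

```python
from datetime import datetime, timezone

# Event times quoted in the advisory (UTC).
first_crash = datetime(2010, 10, 31, 1, 56, tzinfo=timezone.utc)
next_crash = datetime(2017, 8, 20, 14, 14, 15, tzinfo=timezone.utc)

interval = next_crash - first_crash
interval_s = interval.total_seconds()
# Express the interval in Julian years (365.25 days) for comparison with "6.8 years".
interval_years = interval_s / (365.25 * 86400)
print(f"{interval.days} days + {interval.seconds} s = {interval_s:.0f} s "
      f"(about {interval_years:.2f} years)")

# ASSUMPTION, for illustration only: a signed 32-bit counter incremented
# every 0.1 s wraps after 2**31 ticks. Trimble has not stated this.
hypothetical_wrap_s = 2**31 * 0.1
print(f"2**31 ticks of 0.1 s = {hypothetical_wrap_s:.1f} s "
      f"({hypothetical_wrap_s - interval_s:+.1f} s from the quoted interval)")
```

The two dates are 2485 days and about 12.3 hours apart, or roughly 6.80 years. A signed 32-bit count of tenths of a second would wrap after a period within about 70 seconds of that. This agreement is consistent with such a counter but does not confirm it.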
Firmware version 1.3-1 was released in January, 2011 to fix the problem.
At 01:56 UTC on 31 October 2010, a bug affected every Trimble NetRS GPS receiver worldwide running firmware version 1.2-0 or later. Receivers running firmware version 1.1-5 or earlier did not appear to be affected.
All NetRS operators are encouraged to check that their continuously operating receivers have been functioning normally since then, as a small percentage may require restarts to restore full function; in most cases this can be done remotely through the web interface.
Any remotely deployed receivers that do not have communications installed should be visited as soon as possible to confirm that they are logging data.
We observed five different ways in which the receivers responded:
1. Graceful on-the-fly recovery:
Error messages were written to the system logs, and some processes may have crashed but were restarted. Satellite tracking, data logging and system operation continued uninterrupted. Data flow reports for these stations will show 100% for 10/31 and later. These receivers should be fine with no action required. About 25% of the UNAVCO-operated NetRS population exhibited this behavior.
2. System crash and restart without data loss:
Error messages appear in the system logs, and processes fail with fatal system errors that trigger a restart. The data files that were active at the time of the restart were preserved and resumed afterward, so the only data outage occurred while the receiver was restarting, between 01:56 and 01:59 UTC. Data reports for these stations will show 99% for 10/31 and 100% for 11/1 and 11/2. These receivers should be fine with no action required. About 40% of the UNAVCO-operated NetRS population exhibited this behavior.
3. System crash and restart with data loss:
Same behavior as (2) above, except that the data files being written at the time of the crash were deleted and new files were created after the restart. Daily files from these receivers will begin at 01:59 UTC, and hourly files that began at 01:00 will either be missing or contain only a few epochs. Hourly files from 00:00 were preserved. Data reports for these stations will show 91% for 10/31 (for daily downloads) or 96% (for hourly downloads), and 100% for 11/1 and 11/2. These receivers should be fine with no action required. About 30% of the UNAVCO-operated NetRS population exhibited this behavior.
4. System stops tracking SVs and logging data but does not restart:
Error messages appear in the system logs and many processes crash, but a restart is NOT triggered. The receiver is left in a state in which its web server is accessible but it is neither tracking nor logging, and data sessions revert to "Pending" instead of active (see the screenshots below for examples). In most cases, the data files that were active at the time continue to be shown as active. Data reports for these receivers will show either 8% or zero for 10/31 and zero for 11/1 and 11/2. THESE STATIONS REQUIRE REBOOTS, either via the web interface or by cycling power, to resume normal operation. After the receiver is restarted, the data files from 10/31 are preserved. About 5% of the UNAVCO-operated NetRS population exhibited this behavior.
If you encounter a receiver in this state, go to "Receiver Configuration -> System Reset" in the web interface and perform a simple restart.
5. System becomes unreachable through Ethernet port:
The receiver has lost its network functionality and cannot be contacted through an otherwise functional communications device such as a cellular modem (the modem can be contacted, but the receiver cannot). A receiver in this state has most likely also stopped tracking and logging data.
These receivers need to be visited on site to check the state of the receiver and to determine whether disconnecting and reapplying power restores normal function. We documented only one such total failure on 10/31; well under 1% of the UNAVCO-operated NetRS population exhibited this behavior.
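The data-report percentages for 10/31 follow from the length of each outage window. The following is a minimal Python sketch. It assumes a 24-hour UTC day and treats each outage as a whole number of minutes, read from the times quoted for behaviors 2 to 4. The actual reports may round differently or count epochs rather than minutes.

```python
MINUTES_PER_DAY = 24 * 60


def completeness(missing_minutes: float) -> float:
    """Percentage of a 24-hour UTC day covered by data, given the minutes lost."""
    return 100.0 * (MINUTES_PER_DAY - missing_minutes) / MINUTES_PER_DAY


# Minutes lost on 10/31 for each behavior (assumed from the times in the text).
cases = {
    # (2) restart without data loss: nothing recorded 01:56-01:59
    "restart, files kept": 3,
    # (3) daily file restarts at 01:59, so 00:00-01:59 is lost
    "restart, daily file deleted": 119,
    # (3) hourly files: only the file that began at 01:00 is lost
    "restart, hourly file deleted": 60,
    # (4) logging stops at 01:56 and does not resume that day
    "stopped, no restart": MINUTES_PER_DAY - 116,
}

for label, missing in cases.items():
    print(f"{label:30s} {completeness(missing):5.1f}%")
```

This gives about 99.8%, 91.7%, 95.8% and 8.1%, in line with the 99%, 91%, 96% and 8% figures quoted above. The rounding used by the data reports is not stated.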
There were no obvious patterns as to which receivers exhibited the different responses. The effects were seen in all NetRS receivers with firmware 1.2-0 and newer, regardless of configuration, location, internet connectivity or lack thereof. Even receivers that had not been tracking satellites at the time were affected. Receivers with firmware versions 1.1-5 and earlier do NOT appear to have been affected in any way.
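Because a receiver's response could not be predicted from its configuration, each receiver has to be checked individually. The checks described for behaviors 1 to 5 can be summarized as a small decision function. This is a hypothetical Python sketch. The `ReceiverStatus` fields and the `recommended_action` name are illustrative. The symptoms are assumed to be observed by hand, for example by opening the receiver's web interface. The code does not query a NetRS receiver.

```python
from dataclasses import dataclass


@dataclass
class ReceiverStatus:
    """Symptoms an operator can observe remotely (hypothetical field names)."""
    modem_reachable: bool        # the communications device answers
    receiver_reachable: bool     # the NetRS web interface answers
    tracking_and_logging: bool   # tracking SVs, sessions active rather than "Pending"


def recommended_action(s: ReceiverStatus) -> str:
    """Map observed symptoms to the actions described for behaviors 1-5."""
    if s.receiver_reachable and s.tracking_and_logging:
        # Behaviors 1-3: the receiver recovered on its own.
        return "no action required"
    if s.receiver_reachable:
        # Behavior 4: web server up, but not tracking or logging.
        return "restart via Receiver Configuration -> System Reset or cycle power"
    if s.modem_reachable:
        # Behavior 5: the modem answers but the receiver does not.
        return "visit on site and cycle power"
    # No communications at all: the receiver's state cannot be checked remotely.
    return "visit on site to check the receiver and communications"


print(recommended_action(ReceiverStatus(True, True, False)))
```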