Embedded Systems and the Year 2000 Problem
Draft of 10 September 1999
Copyright 1998, 1999

Abstract: There is
another Year-2000 risk. It is distinct from the more widely reported
risks concerning impending failures of computers and software that
represent dates using two digits for the year. This risk involves Real
Time Clocks and their interactions with associated embedded processors
and logic arrays, dedicated electronic control and monitoring logic
incorporated into larger systems. These are essential to the operation
of a vast portfolio of infrastructures, from medical equipment, to
buildings (phone, security, heating, plumbing and lighting), to
transportation, to financial networks, to just-in-time delivery
systems, and so on. According to a recent study, the firmware
(permanently loaded instructions) that enables these systems to run is
date sensitive and not Year-2000-compliant in less than 1 percent of
the fifty billion microprocessors and microcontrollers used in embedded
systems installed worldwide by the end of the twentieth century. This
small fraction will fail, causing the systems they control to begin
failing around 1 January 2000 and for the first few years of the next
century. These failures are coupled with significant factors impeding
their diagnosis and repair. These include concerns over legal
liability, the absence of standards and of reliable documentation of
Year-2000-compliance of date sensitive systems produced over the past
few decades. This poses formidable assessment issues. A
pessimistic, illustrative scenario is presented. It may be regarded as
below the "worst credible case" and as having some suitability for risk
management purposes, though it is not appropriate for making predictions. It
describes disruption of essential infrastructure from electric power,
to food and fuel distribution, to communications, to financial
networks. Insufficient resources and time are available to completely
prevent and test against failures in critical infrastructures. It is
time to shift emphasis from repair to triage and contingency planning
and to make appropriate preparations for risk management against
massive loss of infrastructure.

Introduction: Embedded
microprocessors and other time sensitive logic are silicon integrated
circuits, generally with permanently coded instructions (firmware -
where these serve as an operating system they may be called a
microkernel) that are not designed to be easily changed. These monitor,
regulate or control the operation of devices, systems, networks or
plants. These are generally in the form of silicon microelectronic
chips, such as microprocessors, microcontrollers, timers, sequencers
and controllers built into machinery, from small devices such as wrist
watches and consumer electronics, to dedicated processors controlling
large industrial plants. The term "embedded" refers to the instructions
that are permanently loaded in one of the (ROM) chips comprising the
system. The IEE give a broader definition that includes dedicated,
code-driven systems (IEE, 1997). "Embedded" can also denote that the
microprocessor and other hardware are installed within the device at
hand at such a depth that they may not be obvious to the user (and possibly
experts) without disassembly. Typically, an
embedded system will comprise a microprocessor, Read Only Memory
(ROM), input/output circuitry (for monitoring and control, e.g. Digital
to Analog Converters), Random Access Memory (RAM), communications
circuitry (e.g. a link with a central computer), a system clock and
possibly a Real Time Clock (RTC). Several of these elements may be
integrated onto a single chip (or multi-chip-module) which may be
called a microcontroller. A typical embedded system contains
approximately ten individual chips. This number varies greatly
depending on the age of the design, the technologies used, the desired
functionality and, finally, cost. Generally, chip counts tend to
decrease with design date for the same level of functionality. A
treatment of the basic technical elements of digital electronics may be
found in (Horowitz, 1989). See note [1]. A treatment
of the distinguishing characteristics between Year-2000 failures [2] in
Information Technology (IT - computers and software) and embedded
systems (ES - dedicated processors, logic and firmware) may be found in
(Smith, 1998). GartnerGroup [3] estimates that there will be fifty
billion microprocessors and microcontrollers used in embedded systems
worldwide and that under 1 percent of these devices will have Year-2000
(Y2K) related failures leading to shutdowns, erroneous results or
chaotic behavior. Of these, a fraction are in mission critical systems,
leaving on the order of 25 million microprocessors and microcontrollers
(deployed in systems containing these and other chips) which must be
repaired worldwide. Their failure, in turn, causes the devices in which they
are incorporated to fail or behave unpredictably. The implications for
society are widespread. A pessimistic scenario (Schwartz, 1996),
(Williamson, 1997) and note [4] will be presented for risk management
purposes; thus proactive and reactive responses will be described in a
section on recommended actions rather than as part of the scenario
itself. This scenario is not intended as a prediction.

Discussion: The problem
exists in a surprisingly large number of systems, particularly in
systems with no design requirement to keep a date. Why is this? The
reasons are largely economic. It is very expensive to engineer a custom
integrated circuit. These non-recurring engineering costs can exceed
$100,000 including the salary of an ASIC (Application Specific
Integrated Circuit) designer and several engineering runs at a
commercial chip foundry. When manufacturers of embedded systems need to
incorporate any form of
real timing capability (seconds, minutes, etc. - as opposed to system
clock cycles) into a system, they face a "build or buy" decision. In
the case of time sensitive designs, they will generally purchase an
off-the-shelf, general purpose timing chip (or the rights to its
design). This costs about one dollar. The same
economics drive the manufacturers of these timing chips to develop a
"one chip fits all" solution for their customers, the Original
Equipment Manufacturers, or OEMs. In this case the OEMs utilize a
general-purpose real time chip that is more versatile than they
require. They access only those capabilities that they actually
need and employ these in their products. There is no gain in
reinventing this very common wheel. This results in capability
significantly beyond the OEMs' design requirements being embedded,
including date keeping, as indicated below. What are the
capabilities of a general-purpose timer chip? One of these is the capacity
to keep "absolute time", that is, time with respect to a specific time
and date in the past. This is due to the large number of applications
that are manifestly date-dependent. These include devices from FAX
machines that automatically keep track of the difference between
standard and Daylight Saving Time, to load management systems at
electric utilities. (In manifestly date dependent systems, the date may
be set, as well as read, by an agent external to the chip.) Devices
that do not require dates (or absolute time) are conveniently built
using chips that keep absolute time. This is because relative time (for
example the time since an automobile ignition switch was engaged) may
be synthesized from differences in absolute times. This arrangement
works because the dates that are subtracted from one another are both
"wrong" by the same amount of time and this amount drops out when the
subtraction is performed. There is no concern whether the date is
properly set in this arrangement. Thus, both absolute time and relative
time applications can be served with the same absolute time capable
chip, making production of purely relative time capable chips somewhat
redundant. Please see note [5]. Notes in [6] describe two
manufacturers' Real Time Clock (RTC) hardware, firmware and date and
time field addressing in both Year-2000-compliant and non-compliant
variations. Thus, even
when only relative time is required by the OEMs, this may often be
derived from chips that keep absolute time internally. Those chips that
represent absolute time using two-digit years are subject to Year-2000
failures, just like the computers and software that have been more widely
reported. The logic "It does not need to keep dates, therefore it does not keep
dates" has no basis in the internal operation of the chip. This has
resulted in a number of systems being declared Year-2000-compliant when
in fact their firmware has not been tested. The question is not "Does
it need a date?"; the question is "Does it use time in any way?" Please
see note [7]. Examples of systems containing unassessed code include
remote control load management switches installed at consumer sites by
electric utilities, automobile power train transmission control modules
and major household appliances. Please see note [8]. In the case
where no external agent sets the date, the system defaults to its
"epoch" [9] date. This could be the design date, the date of
manufacture, or some other, arbitrary, date. Non-compliant systems are
subject to failure when the internal date reaches 1/1/2000, 1/28/2000
or other critical dates, which in general will not be in step with
actual time. There is no means, or need, to input actual time at the
turn-on point. In general, such a system will reach 1/1/2000 internally
after 1/1/2000 actually occurs. This is due to natural delays introduced by the
production life cycle, shelf life and possibly the duty cycle (the
fraction of the time the system is "powered up"). For
non-Year-2000-compliant architectures, these delays increase the
likelihood that most of their failures will occur after 1 January 2000. One manufacturer has
released documentation to its customers stating that some of its systems will
not fail until 2006 [10]. Internally
date sensitive systems observed to be functioning normally after
1/1/2000 are not guaranteed to be compliant. The inability to access
and therefore to roll forward the dates of these internally date
sensitive chips is an impediment to testing for compliance. Indeed,
many chip manufacturers have not documented whether their systems are
"Year-2000-compliant." This is in part because
Year-2000-compliance has only recently entered the "specification" and
in part because of the potential legal exposure this represents to the
manufacturer (Guida, 1998). Depending on the OEM's purchasing
practices, integrated circuits may be obtained from the spot, gray or
black markets with absent or unreliable documentation and without
traceability. One independent attempt to circumvent these barriers
between chipmakers and OEMs by establishing an anonymous clearinghouse
for the compliance status of embedded systems, "Project Damocles,"
closed due to the threat of lawsuits. Please see reference (Melymuka,
1998). Manufacturers must devise a strategy to assess their own chips
and to release this information to customers with the smallest legal
exposure. The Federal Year 2000 Information and Readiness Disclosure
Act [11] is intended to remove antitrust impediments to information
sharing, as well as to reduce the threat of opportunistic litigation.

Assessment: This lack of
documentation makes it difficult or impossible for the chip
manufacturers' customers, the OEMs, to evaluate the
Year-2000-compliance of their products that depend upon these chips.
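The Discussion's point that a device needing only relative time can still depend on a two-digit year can be made concrete with a short sketch. It is illustrative only: the two-digit year field and the helper name `elapsed_years` are assumptions chosen for the example, not taken from any particular chip's documentation.

```python
def elapsed_years(start_yy: int, now_yy: int) -> int:
    """Relative time synthesized as the difference of two absolute readings.

    Both arguments are hypothetical two-digit year fields (0-99), as a
    non-compliant clock might store them.
    """
    return now_yy - start_yy


# A clock that is "wrong" by a constant amount still gives the right interval,
# because the error drops out of the subtraction.
true_interval = elapsed_years(90, 95)
offset = 3  # both readings are off by the same three years
assert elapsed_years(90 + offset, 95 + offset) == true_interval == 5

# When the two-digit field wraps from 99 to 00, the same subtraction goes
# negative even though only two years have passed.
across_rollover = elapsed_years(98, 0)
print(across_rollover)  # -98
```

A controller that compares such an interval against a maintenance or timeout threshold would then act on a nonsensical value. Without documentation, an OEM cannot tell from the outside whether a given chip computes intervals this way.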
This places a significant assessment, remediation and testing burden on
organizations with a large investment in embedded systems within their
mission critical infrastructure. The electric power utilities are among
the organizations with the greatest exposure. Utilities that have
completed a thorough assessment program have generally elected to test
all embedded systems, including those with existing documentation, due
to significant variations between observed performance and
documentation. There are as many as ten levels affecting the overall
Year-2000-compliance of an individual embedded system in a particular
application that may be affected by the chip maker, the OEM, the
end-user or various combinations (Strem, 1997). Beginning at the "black
box" or "device" level it is appropriate to examine the individual
embedded system from as many as ten technological viewpoints. These are:
• Chips and microcode (with either manifest or internal date functions)
In short, the
manufacture and configuration of the embedded system and its
application contain factors affecting overall Year-2000-compliance. See
note [12] for a complementary perspective. While Real Time Clocks are
generally the "source" of the Y2k issue in embedded systems, these
other factors can render a device completely compliant for its
application or they can exacerbate the Y2k issue. Estimates
place the assessment, repair and testing phases of a Year-2000
conversion effort for a medium sized non-nuclear plant at 21 months. It
will cost approximately fifty percent [13] of the budget for the Y2K
program for the utility's Information Technology assets. Remediation
and testing of the embedded systems in capital equipment costs on the
order of $50,000 for a functional unit such as a turbine [14].
Technicians must identify, repair and test the five hundred or more
non-compliant embedded systems out of the tens of thousands in a
typical plant. Please see reference [15]. However, embedded systems are
not confined to large intricate industrial applications; individual
systems will require much less time for assessment, remediation and the
(generally time consuming and expensive) testing and the impact of
their failures may be more localized. Examples of embedded systems that
are essential to the operation of many devices that function in the
background of everyday life include the following:
• Aerospace, Aviation, Avionics [16]
A more complete list may be found in reference [32]. (A partial listing of
vendor supplied Year-2000 compliance status has been compiled at [33].
A number of embedded systems solution providers may be found in
reference [34].) Each of these areas poses unique assessment,
remediation and testing challenges. Experience
has shown that one cannot assume that identically marked chips exhibit
identical Year-2000-compliance. The underlying cause of this fact is,
again, that for decades "Year-2000-compliance" was never part of the
"spec" (detailed list of specifications) for time sensitive chips.
Thus, Year-2000 compliance exists outside of any quality control
protocols of both the chipmaker and the original equipment
manufacturer. Please see reference [35] for more details. This adds
enormously to the assessment and testing phases. For example, suppose a
facility owns 22 copies of a system using embedded chips. This could be
a volumetric valve, where 20 copies are installed and operating, with
two spares in the warehouse, all purchased from the same
OEM on the same purchase order, with consecutive serial numbers,
and all built on the same day in the same factory. It is not sufficient to test one of the spares in the
warehouse. Each of the twenty-two valves, including the twenty that are
in active service, must be tested, repaired if necessary, and tested
again for compatibility with other repairs. In many cases this means
that the facility must interrupt normal operations or shut down for
some period while on the order of forty tests are performed on each
embedded system [36]. Shell estimates that a typical offshore oil rig
uses approximately ten thousand embedded chips [37] of which
approximately 12% are not Year-2000-compliant. Some fraction of these
chips are installed below the surface or in other regions of limited
accessibility. These issues
demand new levels of testing of complete systems, throughout the
Year-2000-conversion process. This is known as "regression testing."

Extrapolation: The magnitude
of the risk exposure increases when one considers the interactions
among systems. As a simple example, consider a single OEM that
manufactures a single product using several manifestly date sensitive
chips made by a variety of companies. OEMs that have made the effort to
identify and replace all of the affected chips with their
Year-2000-compliant counterparts have
encountered system failures in spite of the fact that each element of
the system has been replaced with a pin-compatible (a direct, plug- or
solder-in, replacement), Year-2000-compliant chip. This is because
each chipmaker solved its Year-2000 problem in its own way, and the
fixes are not
compatible with one another. There are no worldwide Year-2000-compliant
standards in force (e.g. ISO 8601 [38]) for the representation of a date. By similar
logic, failures in one system can induce failures in other systems over
networks. Embedded systems can introduce corrupt data into computers
and networks and induce Year-2000 failures in computer systems (a topic
outside the scope of this paper). Likewise, software non-compliance can
cause both Year-2000-compliant and non-compliant embedded systems to
fail. Examples include embedded sensors that time stamp their
measurements.

External Factors: Looking
beyond the level of systems, to entire organizations, the magnitude of
the risk further increases. For example, consider an organization that
has completely addressed its Year-2000 risks. It has paid for
hundreds if not thousands of programmer person-hours, using the
most expert tools and systems, to have its software and computer
hardware ready for the rollover on 1/1/2000. Suppose that this
remediation allows their mission critical systems to retain the ability
to function. Such an
organization remains at risk. It is at risk because the electric
utility may go off-line. It is at risk because the building lighting
control system may not work. It is at risk for a failure of the water
supply. (The valves, which measure the rate (volume per unit time) of
flow, conclude that an infinite amount of fluid has passed and could
shut down.) The elevators could return to the basement and shut down -
they might now "think" that it's been one hundred years since their
last regular maintenance. The heating system could shut down (and there
is no manual override for the building supervisor, with flashlight and
walkie-talkie in hand, to flip the heat on or off when someone upstairs
complains that it's too hot or too cold). The private branch exchange
(PBX) could fail, leaving the building without telecommunication. Some
fraction of the employees may not get to work because their cars (each
of which contains about fifty embedded systems on average) will not work
or because they cannot buy gasoline; the pumps won't work and their
credit cards are not recognized. Another fraction remain at home
because they are resolving issues there, for example preparing food
with an electronically controlled stove (that has no manual override),
keeping their homes warm, or attending to children because the schools
and day care facilities are closed. Year-2000 compliance and
non-compliance listings for building related embedded systems are
available in reference [39].

Timing: When will
these failures occur? For embedded systems that are explicitly date
dependent, the minority of systems that are non-compliant will
experience a peak of failures at the rollover point, midnight on 1
January 2000. For example, many sensors used by electric utilities "date
stamp" every "event" (allowing synchronization, exact frequency control
and off-line analysis of process control data). A spectrum of other
"critical" or "spike" dates is provided in (Jones, 1998) and in note
[40]. Non-compliant systems that are not explicitly date-dependent will
fail at other times, perhaps years into the new century. As examples
of systems that may keep absolute time (or dates) internally, and have
no way to "know" the actual date from an external source, consider the
power train control system in an automobile. Consider the controller in
a stove or microwave or load management switches (that allow the
electric company to reduce peak demands by temporarily shutting down
some of their customers' water heaters or air conditioners). These systems
"wake up" (power on) with some predetermined, epoch date. Depending on
the application, the explicit form of the non-compliance and the
difference between the default time and actual time, these systems will
fail at some other time determined by the interval between the epoch
date and 1/1/2000.

Impact: Will these
failures be soft or hard? How closely in time these failures
will cluster, as mentioned above, is a critical and undocumented element [41].
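The spread of failure dates follows from the Timing discussion: a clock that wakes up at its epoch date and advances only while powered reaches 1/1/2000 internally some time after the calendar does. The sketch below is a simplified model under stated assumptions (no battery backup, a constant duty cycle); the function name and its parameters are hypothetical and chosen only for illustration.

```python
from datetime import date, timedelta
from typing import Optional

ROLLOVER = date(2000, 1, 1)


def wall_clock_failure_date(epoch: date, first_power_on: date,
                            duty_cycle: float) -> Optional[date]:
    """Estimate the calendar date on which the internal clock reads ROLLOVER.

    Assumptions (not from any vendor documentation): the internal clock
    starts at `epoch` when first powered on and advances only while the
    system is powered, i.e. `duty_cycle` internal days per calendar day.
    Returns None if the clock never advances.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    internal_days = (ROLLOVER - epoch).days
    if internal_days <= 0:
        return first_power_on  # the epoch is already at or past the rollover
    if duty_cycle == 0.0:
        return None
    return first_power_on + timedelta(days=internal_days / duty_cycle)


# Always-on system with a 1996 epoch, installed a year later:
print(wall_clock_failure_date(date(1996, 1, 1), date(1997, 1, 1), 1.0))  # 2001-01-01
```

With a low duty cycle the computed date recedes by decades or centuries (very small duty cycles can exceed the range of Python's `date` type and raise `OverflowError`). This is one reason a population of such systems would be expected to fail over a span of years rather than on a single day.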
Should these begin in 1998 and end in 2006 (to pick the earliest and
latest dates known personally to the author) with a gradual onset and
without sharp peaks in the number of systems that fail in a given day,
this would be much easier to handle than otherwise. Perhaps the only
certainty that can be taken from this is to note that systems observed
to be functioning normally after 1 January 2000 are not guaranteed to be Year-2000-compliant. They
may fail years in the future, depending on when their internal clocks
were set (their epoch dates). In addition
to the timing of the failures, there is the question of their severity.
Consider an internally non-compliant automotive system with an epoch
date of 1/1/1980. If the system is powered up in step with the
ignition, it will never reach its failure point on 1/1/2000. That would
require that the car be left running for twenty years! At the other
extreme, consider the systems that monitor and control the frequency
and absolute phase of the alternating current supplied by a small,
rural, electric cooperative that has not completed its remediation
project. In tests, the failure modes of some controllers cause the
plant to go off-frequency. In this case, the plant's current is not in
phase with that of the electric grid to which it is connected. This has
the potential to take down the entire grid. (For this reason, larger
electric utilities may be forced to fund remediation efforts in smaller
providers [42].) For a list of documented failures during date rollover
tests, including frequency instability, see reference [43]. Status
reports on the electric grids from the North American Electric
Reliability Council to the US Department of Energy are available at
[44]. The electric
utilities face a risk condition known as "system black". System black
is a state where every generating plant that belongs to a grid is shut
down, including its designated "hot spare". This spare is a detached,
operating plant held in reserve to supply power to other electric
plants as they recover from a blackout. This recovery is normally
accomplished following a bootstrap procedure using power supplied by
the spare plant. However, this spare plant faces the same Year-2000
risks as any other, hence the risk of the system black condition. One
immediate consequence of this failure is that the time required for the
bootstrap procedures will be lengthened. This is due to the requirement
to use alternative power sources (e.g. generators to run turbine
startup motors) to bring up a plant that will serve the role of the
spare in the bootstrap procedures of the other plants. Due to the
months required to remediate the embedded systems in an electric
generation plant, a consequence of this fault is that utilities may be
forced to complete their Year-2000 remediation after 1/1/2000, under
degraded conditions. This is a strong incentive for electric utilities
to complete Year-2000 assessment, remediation and testing for several
hot spares within each grid system (even at the expense of the overall
conversion effort) so that post 1/1/2000 Year-2000 remediation efforts
are not completed under system black conditions. Please see reference
[45]. While it is
preferable to repair and certify as many embedded systems as possible,
it is not necessary to restore all of them before the generating plants
and grids can operate reliably. This is because while the number of
root causes for failure may be large, the number of operational outcomes of these
failures is relatively small: a given component will work or fail.
Utility operators have a number of options for partially or completely
restoring capacity. Please see reference [46]. For details on
restarting an electric grid after a blackout and on generating stations
that are already "black start capable," please see reference [47]. If one
returns to the larger picture of the relationships between
organizations, such as systems that allow for the just-in-time delivery
of food and other services one finds that these relationships are
threatened. There will be failures. What is the impact on the
effectiveness of a Year-2000-compliant organization in a largely
non-compliant world?

Industry: The
management of embedded systems remediation has pitted some of the
brightest and most capable minds against relatively small parts of an
enormous problem. After large investments, major semiconductor
manufacturers [48] have concluded that it is cheaper to let some of
their manufacturing facilities fail, rather than to complete the
analysis, much less the repair, much less the testing (the longest and
most expensive component of the problem) required to address the
problem. These facilities are among the most technologically intensive on the
planet, where the value output for each "fab" is measured in millions
of dollars per hour. It may be more efficient to fold in
Year-2000-compliance with the large scale upgrading that accompanies
each new set of design rules (for example going from 0.35 to 0.25
micron technology) than to mount a separate Y2K effort. Some companies
find that the Y2K conversion process is best managed within their
overall quality programs, just as environmental safety and health
concerns are now seen as components of the quality equation. The
electric utilities do not have the option of shutting down all of their
plants, as they must operate continuously. Similarly, hydro, food
distribution and transportation systems do not have this option.

Human Factors: The enormity
and complexity of the problem are personally taxing to those on the
front line. For some who ponder the implications, a sense of
resignation is present, that of impending loss. Perhaps a description
using Kübler-Ross' (Kübler-Ross, 1969) five stages of grief
is more appropriate: denial, anger, bargaining, depression and
acceptance. This description may be applied to organizations as well as
individuals. The Wyoming Legislature voted in March 1998 to spend no
dollars to assess their Year-2000 issue. Which stage best describes
this action? While the basis of the risks may be technical, clearly
addressing the human element will be as important as, if not more
important than, technical actions. The extraordinary capabilities of
human beings in crisis or emergency situations have the potential to
dramatically lessen their severity, yet these actions cannot be
guaranteed or accurately modeled. To harness these effectively calls
for proactive leadership, before the technological failures, before the
social response.

Action: It is time to
prepare, fund and implement contingency plans and to institute triage.
Fix systems that are both critical and repairable, ignore what is not
essential, retire the rest. [49] Encourage organizations to relinquish
the view that they must either solve all the problems or go out of
existence. Instead, undertake preparations for foreseeable and
unforeseen failures. Pay attention to business processes, not merely
individual IT and embedded systems. This includes external partnerships
and dependencies with external organizations, infrastructure and
communities. Develop excess capacity to function in the absence of key
systems and infrastructures. Not only a safety margin, excess
capability can be effectively applied outside of the organization. It
may be granted not only on a "good neighbor" (or "good vendor" or "good
supplier") basis but as a new business opportunity. Embrace the human
elements, including grief, where present. The more stages through which
each individual and organization can pass, and the more acceptance that is
generated before the failures begin, the greater the availability of critical
resources to face the risks beginning in 1999 and extending into the next decade.
According to one analyst (Guida, 1998) as the awareness of the
Year-2000 risks increases, resources will be reallocated from other
high technology projects to Year-2000 remediation; 30% in 1998, 60-80%
in 1999 and 100% in 2000 until mission critical systems are
operational. Contingency
preparations (GAO, 1998a) not simply a massively parallel "fix on
failure", effort can be made in time: ·
Inform
the general public and people at all levels of the organization.
Emphasize that this is a shared, global issue and that organizations
that might otherwise provide assistance may operate at reduced capacity
until their own Year-2000 issues are mediated. ·
Assess
your organization's readiness. See (GAO, 1997a). ·
Have
paper backups of mission critical information. ·
Organize
maintenance of compliant systems with 100% compliant replacement
components, hardware, firmware and software. ·
Maintain
spreadsheets and other printed documentation of execution of best
practices for internal purposes as well as legal evidence of "due
diligence" in any lawsuit. See: ·
Retain
obsolete, yet Year-2000-compliant, systems. Keep these for back-up
purposes; for example all versions of the Apple Macintosh computer
hardware and Mac OS operating system are Year-2000-compliant. http://www.apple.com/macos/info/2000.html
·
Install
manual overrides for critical embedded systems. ·
Have
"deprivation weeks": A week without heat. A week without elevators. A
week without computers. Allow stimuli to creatively invent new ways of
functioning. Use these as a call to accept the new reality before we
lose many of the resources that, wisely used now, can ease the
transition to degraded infrastructures in the new century. ·
Reinstitute
selected decades old practices. Increased manual labor. Paper and
pencil spreadsheet accounting. Larger crew complements. Paper file
systems. Paper Rolodexes. How was the business of business, government
and household conducted before the microprocessor era? This is clearly
inappropriate for large corporate and governmental functions, however
it may allow smaller functional units to bridge some of the
interruptions that may occur. This will allow only a small fraction of
the present, machine based, processing capability however this is
significantly different from zero and may be appropriate for highly
critical services. Please see reference (GAO, 1997a). ·
Ask
every organization you depend on about their Year-2000 contingency
plans as well as their risk management and remediation efforts. ·
Ask
yourself what would you do if that organization were to disappear for
one week, for one month, for one year? Choose an appropriate duration
and make preparations. ·
Emulate
successful organizations. One example of an organization that has
invented its own largely "in house" approach is Cargill (http://www.cargill.com/) See the
Year2000.com Announcement list mailing (http://mysite.verizon.net/frautsch/cargill.txt).
·
Read
about embedded systems: ·
Example
of one vendor's systematic release of embedded Year-2000-compliance
data: http://www.ragts.com/webstuff/y2k.nsf/Pages/Brands-Allen-Bradley?OpenDocument
·
General
(Information Technology as well as embedded systems) Year-2000
books: ·
(Yardeni,
1999) ·
http://www.amazon.com/exec/obidos/subst/categories/computer-programming/year-2000-article
·
http://www.yourdon.com/books/coolbooks/coolbooks.html
·
Broad social implications, Douglass Carmichael's discussion of four
scenarios: http://www.tmn.com/y2k/y2kwho.htm
·
(Petersen, 1998) ·
(GAO, 1998b) ·
Chronological list of general Y2K articles: ·
Author's Year-2000 bookmarks:

Summary: A pessimistic, illustrative scenario has been
presented using a series of anecdotes. It is intended to illustrate the
extent and possible consequences of the risks stemming from embedded
systems failures and to serve as a general call to manage them. It is not
intended as a prediction. In
fact, the author has considerable optimism for events extending through
the new century. Perhaps the greatest reason for this is the capacity
for individuals and groups to respond proactively to challenges. (This
capacity is difficult to measure and model or predict and has been
excluded from this article.) The risks are of disastrous proportions;
however, we know when they will begin. One of the positive factors is
that the failures will not all occur at once. The experience of the
early failures may be applied to preparations for subsequent ones.

Acknowledgements: Trey Cundall
read the initial draft. Teresa Bennett has provided critical feedback
and support. Professors Barry J. Blumenfeld, Bruce A. Barnett, Aihud
Pevsner and Chih-Yung Chien of the Department of Physics and Astronomy
at Johns Hopkins University provided readings and commentary on the
first drafts. Charles Danforth also provided useful commentary.
Christopher Pankratz of Hopkins gave editorial and formatting input.
Professor Andreas Andreou of the Electrical and Computer Engineering
Department at Hopkins also provided useful commentary on several drafts
of the article. Juliana Whitmore of Fermi National Accelerator
Laboratory and Peter Wilson of the Department of Physics at The
University of Chicago read early versions of this paper. Timothy Thomas
of the Department of Physics and Astronomy at The University of New
Mexico and Richard Breedon of the Department of Physics at The
University of California, Davis provided useful commentary. Douglass
Carmichael first alerted me to the global and systemic nature of the
Year-2000 problem and had conversations that allowed for possibilities
beyond Armageddon or apocalypse-not. He informed me of the Year-2000
issues in embedded systems; this was the source of my desire to
understand in detail, which I structured in the present document. He
suggested making the document available on the web. With this
transition, a larger community contributed, including Harlan Smith,
Roleigh Martin, Dick Mills, Critt Jarvis, Steve Davis, Paula Gordon,
Susumu Adachi, Pete Holzmann, Richard Collins and many others. I
appreciate the feedback that many have provided, including the sites
that now link to this article. I wish in particular to thank my
critics, for I continue to learn the most from them. I am responsible
for all errors and omissions that remain in this article.

About the Author: A
biographical sketch, resumes and associated information may be found at
http://mysite.verizon.net/frautsch/.

Notice: The following
notes and references and those elsewhere in this article, including
world wide web links, are provided for informational purposes only, and
should not be construed as endorsements of services, offerings or
viewpoints that these may present. Likewise, references to this article
by other entities should not be assumed to have been made with the
author's knowledge or to imply similar endorsement by or of these
entities.

Notes: ·
For a directory of microcontroller manufacturers please see http://www.microcontroller.com/main/microcontrollers.htm. ·
This is the so-called "two digit year" or "Y2K" problem. In one form, the
year "2000" is represented as "00" and may be confused with "1900".
When two dates are subtracted (e.g. to determine client ages or loan
maturities), the result depends on whether four digits or two are used
to represent the years. For example, when 1957 is subtracted from 2001,
four digit arithmetic yields 2001 - 1957 = 44; however, two
digit arithmetic yields 01 - 57 = -56. This can cause negative times,
zero times, and other forms of corrupted data. In the above example,
the addition of the negative sign may cause the number of bytes
necessary to represent the answer to increase. Thus, the answer needs a field of three
characters instead of two ([-56] instead of [44]), which may result in
truncation ([-5] or [56]). This may cause corruption of data in
neighboring fields. This in turn may lead to further corruption in
systems that exchange data with non-compliant systems. Several failures
and instances of non-compliance are tabulated at the following sites: ·
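The two-digit subtraction described in the preceding note can be sketched in a few lines of Python. This is an illustration only, not code from any actual system: the function names `age_four_digit`, `age_two_digit` and `store_in_field`, and the two-character field width, are assumptions chosen to mirror the 2001 - 1957 example.

```python
from typing import Tuple

# Illustrative sketch only: the names and the two-character field width
# are assumptions chosen to mirror the 2001 - 1957 example in the note.
FIELD_WIDTH = 2  # hypothetical fixed-width field sized for a two-digit result


def age_four_digit(birth_year: int, current_year: int) -> int:
    """Subtract full four-digit years."""
    return current_year - birth_year


def age_two_digit(birth_year: int, current_year: int) -> int:
    """Subtract using only the last two digits of each year."""
    return (current_year % 100) - (birth_year % 100)


def store_in_field(value: int, width: int = FIELD_WIDTH) -> Tuple[str, str]:
    """Show two ways an over-long value can be cut down to `width` characters:
    keeping the leftmost characters, or keeping the rightmost characters."""
    text = str(value)
    return text[:width], text[-width:]


if __name__ == "__main__":
    print(age_four_digit(1957, 2001))  # 44
    print(age_two_digit(1957, 2001))   # -56
    print(store_in_field(-56))         # ('-5', '56')
```

Which of the two truncations occurs, or whether the extra character instead spills into a neighboring field, depends on how a particular system stores the result; the sketch does not model that.
·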
http://www.gartner.com In a
December 1997 report, GartnerGroup gave a worldwide failure rate of
between 1 and 3 percent. This was substantially updated in October 1998,
when the figure was reduced to 0.001 percent for "freestanding
microcontroller chips": ·
For additional resources on scenario planning, please see: ·
There is a significant exception for systems that do not represent time in
minutes and seconds. Instead, they keep time using the fundamental unit
of system clock cycles. The desired amount of relative time is
calculated in terms of clock cycles. Such a system is a relative time
device and will not be the source of Year-2000 failures. In general
this application is limited to timing functions involving only fixed
time delays. An example of a compliant device that avoids the use of a
timing chip is the Scientific Atlanta Digital Control Unit DCU M1180.
It keeps time by counting clock cycles of its Intel 40C49
microprocessor: http://www.sciatl.com/productinformation/utility/dcu%2Dm1180.html.
If the clock frequency is 32.768 kHz (the digital watch crystal
frequency), then a counter that divides this frequency by 2^15 (2 raised
to the power 15, or 32,768) will count seconds and may be used as the basis for a
timing chip. This frequency, in hertz, corresponds to the smallest power of 2
beyond the range of human hearing, which may account for its popularity. ·
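The divide-by-2^15 arithmetic in the preceding note can be checked with a short Python sketch. It simulates, under simplifying assumptions, a 15-bit binary divider clocked by a 32.768 kHz crystal. The class name `SecondsCounter` and its method are hypothetical and do not describe any particular device.

```python
CRYSTAL_HZ = 32_768   # digital watch crystal frequency, equal to 2**15
DIVIDER_BITS = 15     # a 15-stage binary divider divides by 2**15


class SecondsCounter:
    """Counts crystal cycles and accumulates one second per 2**15 cycles."""

    def __init__(self) -> None:
        self.cycles = 0    # contents of the 15-bit divider
        self.seconds = 0   # elapsed (relative) seconds; no calendar date is kept

    def clock(self, n_cycles: int = 1) -> None:
        """Advance the divider by n_cycles crystal cycles."""
        total = self.cycles + n_cycles
        # Each carry out of the 15-bit divider is one elapsed second.
        self.seconds += total >> DIVIDER_BITS
        # The remainder stays in the divider.
        self.cycles = total & ((1 << DIVIDER_BITS) - 1)


if __name__ == "__main__":
    counter = SecondsCounter()
    counter.clock(CRYSTAL_HZ * 60)          # one minute of crystal cycles
    print(counter.seconds)                  # 60
    counter.clock(CRYSTAL_HZ // 2)          # half a second more
    print(counter.seconds, counter.cycles)  # 60 16384
```

The sketch holds only an elapsed count of seconds. It has no notion of a year, which is the sense in which the note calls such a system a relative time device.
·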
For a concise review of compliant and non-compliant representation of dates
in real time clocks and a listing of one manufacturer's product status,
see: http://dbserv.maxim-ic.com/appnotes.cfm?appnote_number=562,
specifically, the Dallas Semiconductor DS1287. ·
In March 1999 the author conducted an informal survey of several
manufacturers of Real Time Clocks. The intent of the survey was to
establish whether Real Time Clocks were commonly constructed without date-keeping capacity. Four manufacturers’
web pages were surveyed. A representative confirmed the Dallas
Semiconductor findings. Of the devices surveyed, ignoring
sub-variations of the basic designs, the following results were found
in terms of (date-keeping : non-date-keeping) designs: ·
Private communications with the author's electric utility, and the
manufacturers of his personal automobiles and domestic appliances in
March 1998. See also: ·
http://www.its.bldrdoc.gov/fs-1037/dir-014/_2044.htm ·
Confidential conversation with a vendor sales manager, 25 February 1998. There are
some cases of failure after 1 January 2000 that are induced by register
overflows rather than by y2k logic errors. These are mentioned
generically in http://www.fema.gov/rams/cshib/csb3.ram. ·
The Year-2000 Information Readiness and Information Disclosure Act: ·
In a contrasting analysis, TAVA Technologies sees ten classes of equipment
affecting plant operations: ·
http://www.tavabeck.com/challenr.htm#Solving ·
Private conversation with the Year-2000 project manager of an electric utility
in the Midwestern United States, 22 January 1999. ·
http://www.y2ktimebomb.com/Computech/Issues/mrtn9809ii.htm ·
For a treatment of the testing for y2k issues in Boeing and
McDonnell Douglas aircraft, see: ·
U.S. Chemical Safety and Hazard Investigation Board: http://www.cshib.gov/y2k/. ·
http://www.zdnet.com/pcweek/y2k/0798/06case.html ·
U.S. Federal Energy Regulatory Commission: http://www.ferc.fed.us/fercy2k/y2k.htm ·
The North American Electric Reliability Council: http://www.nerc.com/y2k/ ·
http://tycho.usno.navy.mil/gps_week.html
and ·
British Institute of Grocery Distributors http://www.igd.org.uk/it2000.html ·
The Building Owners and Managers Association: http://www.boma.org/year2000/ ·
A pacemaker manufacturer has certified that all of its pacemakers are
Year-2000-compliant: ·
http://www.fda.gov/cdrh/yr2000/year2000.html ·
http://www.utexas.edu/y2k/equip.html ·
http://www.y2ktimebomb.com/Special/Opinion/Readers/asumu9824.htm ·
http://www.ship2000.com/ and ·
http://www.amwa-water.org/y2k/ ·
http://www.basicint.org/y2krept.htm ·
http://www.compinfo.co.uk/y2k/examples.htm#embedded ·
EDS’ Y2k compliance web site: http://www.vendor2000.com/ ·
http://ourworld.compuserve.com/homepages/roleigh_martin/y2k_com.htm ·
http://www.computerweekly.co.uk/news/8_5_97/08598503239/H1.html ·
For example, see the General Motors System Checklists and supporting
spreadsheets: ·
http://www.shell.co.uk/news/speech/spe_beatbug.htm ·
The International Organization for Standardization: http://www.iso.ch/. ·
The Year-2000-compliance of building related embedded systems is being
tabulated at: ·
For "critical" or "spike" date references and lists see (Jones, 1998) and: ·
Special Value - flag date, other variations with all 9’s ·
This documentation, where it exists, could be viewed as evidence of criminal
negligence in a lawsuit; therefore there are strong incentives to hold
this information as confidential. However, as the amount of Year-2000
litigation increases, organizations may find that it is in their best
interests to develop and disseminate this information as an expression of
what is legally termed "due diligence" against an allegation of
negligence. According to one estimate, legal costs and damages stemming
from Year-2000 failures are expected to be ten to twenty times the
remediation costs for the same failures. See (Guida, 1998). ·
For statements from electric utilities concerning the risks that Year-2000
failures in one power plant can disrupt other plants sharing an
interconnected electric grid, see communication from the Idaho Power
Company: ·
http://www.euy2k.com/reallife.htm
and references therein. ·
http://www.garynorth.com/y2k/detail_.cfm/1721 ·
Dick Mills gives an illustrative example in http://www.y2ktimebomb.com/PP/RC/rc9828.htm ·
http://www.y2ktimebomb.com/PP/RC/dm9832.htm ·
http://www.techweb.com/investor/story/INV19980717S0005
·
http://www.year2000.ca.gov/Correspondence/Embedded.pdf

References:
Davis, J. (1997) Nuclear Utilities Year 2000 Readiness, NEI/NUSMG 97-07
Davis, J. (1998) Nuclear Utilities Year 2000 Readiness Contingency Planning, NEI/NUSMG 98-07
General Accounting Office (1997a) "The Year-2000 Computing Crisis, an Assessment Guide": GAO/AIMD-10.1.14 http://www.gao.gov/special.pubs/y2kguide.pdf
General Accounting Office (1998a) Exposure Draft: "Business Continuity and Contingency Planning": GAO/AIMD-10.1.19 http://www.gao.gov/special.pubs/bcpguide.pdf
General Accounting Office (1998b) "Year 2000 Computing Crisis: Potential for Widespread Disruption Calls for Strong Leadership and Partnerships": GAO/AIMD-98-85
Guida, A. (1998) 17 March 1998 CNN/fn interview with Larry McArthur, CEO Ascent Logic Corp.: http://www.alc.com/Y2kVideo.html
Horowitz, P. and Hill, W. (1989) The Art of Electronics, 2nd Edition, Cambridge University Press
IEE, The (U.K.) Institution of Electrical Engineers: http://www.iee.org.uk/2000risk. 1997 et seq.
Jones, C. (1998) Dangerous Dates for Software Applications
Kübler-Ross, E. (1969) On Death and Dying, Simon & Schuster, New York
Melymuka, K. (1998) "Year 2000 Whistleblower Derailed" Computerworld 18 May 1998 http://www2.computerworld.com/home/print.nsf/all/9805184CBE
Petersen, J.L. (1998) "The Year 2000: Social Chaos or Social Transformation"
Schwartz, P. (1996) The Art of the Long View: Planning for the Future in an Uncertain World, New York: Doubleday
Smith, H. (1998) http://www.y2knews.com/harlansmith.htm
Strem, R. (1997) A Suggested Process To Assist In Identifying Embedded Devices And Systems With A Year 2000 Compliance Problem (white paper), TransAlta Utilities, Calgary, Alberta, Canada. http://www.esofta.com/pdfs/Y2KEmb.pdf
Williamson, L. (1997) "How to Build Scenarios": Wired Magazine
Yardeni, E. (1999) "Year 2000 Recession?" Deutsche Bank Securities, New York

The author
frequently corrects and updates this article and includes the following
notice to reduce the circulation of out-of-date drafts.

Copyright Notice: This article
(including, but not limited to, text, content, photographs, video and
audio) is protected by copyright under US copyright and other laws. You may not
copy, reproduce, distribute, publish, display, perform, modify, create
derivative works, transmit, or in any way exploit any part of this
article, except that you may download this article for your own
personal, noncommercial use as follows:
You may create web links to this article's URL.
You may make excerpts from this article provided that you give the date of the draft
cited and a link to the URL of the current version.
You may distribute complete unedited and unmodified printed and/or machine
readable copies of this article, provided that the word "draft" and the
date of the draft appear with a link to the URL of the current
version.
Without limiting the generality of the foregoing, you may not distribute any
part of this article over any network, including a local area network,
nor sell nor offer it for sale. In addition, this file may not be used
to construct any kind of database. |
|
Year 2000 Readiness Disclaimer: Note: The
contents of this document and these web pages are the sole opinion of
the author and should not be taken as recommendations or advice. Please
consult with technical and legal experts before using this material in
your business, school or home. While the author makes a reasonable
effort to maintain the currency and accuracy of this site, he is not in
a position to guarantee this. The information is supplied "as is".
Links to other sites and listings of other services and products are
for informational purposes only and should not be taken as
recommendations or statements of suitability. This statement is a Year
2000 Readiness Disclosure under The United States Year-2000 Information
Readiness and Disclosure Act: |
</div>