Software Testing in Space Programs

Myron Hecht and Douglas Buettner

As space-system software grows in size and complexity, adequate testing becomes more difficult—and more critical.

Ariane 5 launch vehicle failure

The Ariane 5 launch vehicle failed on its maiden flight in June 1996. About 40 seconds after liftoff, a software bug in the flight controller made the rocket veer off course, leading to its destruction via ground command. Ariane 5 reused software from Ariane 4 without proper testing. Contributing to the mishap, run-time range checking had been turned off because of processor limitations. Also, the backup channel had failed milliseconds earlier because of the same coding defect. (Photo courtesy of European Space Agency; Ada code from http://www-aix.gsi.de/~giese/swr/ariane5.html)

Failures attributed to software defects are becoming increasingly visible in space systems. Recent newsworthy examples include the failure of the Mars rover Spirit to execute any task that requested memory from the flight computer; the unanticipated descent of the Mars Climate Orbiter into the Martian atmosphere, ultimately traced to a unit conversion defect in a navigation system; and the crash of the Mars Polar Lander onto the Martian surface after a premature shutdown of its descent engines. In 1996, the first launch of the Ariane 5 booster ended with a spectacular crash off the coast of French Guiana. The cause was traced to a variable overflow that affected software running in both channels of its dual redundant inertial reference system. Earlier this year, the European Space Agency's Huygens probe successfully beamed back only half of its image data. The other half was lost because of a single missing line of code.

In the period from 1998 to 2000, nearly half of all observed spacecraft anomalies were related to software. Anomalies, less severe than failures, have been occurring with increasing frequency on U.S. national security space vehicles. One reason is that space-system software has been growing more complex to meet greater functional demands. Another reason is that software quality is inherently difficult to determine. The challenge in developing the next generation of national security space vehicles will be to ensure reliability despite increasing software size and complexity. Software testing is an important factor in meeting this challenge.

Types of Software Testing

Software testing methods generally fall into two categories: "black box" and "white box" (while some authors also identify a third category, the "ticking box," which involves not doing any testing).

Black-box methods disregard the software's internal structure and implementation. The test data, completion criteria, and procedures are developed solely to test whether the system meets requirements, without consideration of how the software is coded. Black-box testing is used at all levels of testing and is particularly applicable at higher levels of integration, where the underlying components are no longer visible.

Anomaly trend

Trend in anomalies attributed to software, shown in five-year increments and drawn from the first three years of each spacecraft's operation, using available failure data from a wide range of satellite categories.

White-box testing, on the other hand, does account for the internal software structure in the formulation of test cases and completion criteria. The most common types of white-box testing include branch testing, which runs through every instruction in each conditional statement in a program, and path testing, which runs through every set of conditional statements or branches. White-box testing is typically conducted at the unit level (i.e., the smallest testable component of software) and at the unit integration level.
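The difference between branch testing and path testing can be made concrete with a small sketch. The function `classify` below, its thresholds, and its alarm names are hypothetical and chosen only for illustration. With two independent conditionals, two test cases are enough to take every branch both ways, while four are needed to cover every path.

```python
def classify(temp_c, pressure_kpa):
    """Hypothetical unit under test with two independent conditionals."""
    alarms = []
    if temp_c > 85.0:            # conditional 1
        alarms.append("OVERTEMP")
    if pressure_kpa < 20.0:      # conditional 2
        alarms.append("LOW_PRESSURE")
    return alarms

# Branch testing: each conditional is taken once as true and once as false.
branch_cases = [
    ((90.0, 10.0), ["OVERTEMP", "LOW_PRESSURE"]),  # both conditions true
    ((25.0, 101.0), []),                           # both conditions false
]

# Path testing: two sequential conditionals give 2 * 2 = 4 distinct paths,
# so the two mixed cases must be added.
path_cases = branch_cases + [
    ((90.0, 101.0), ["OVERTEMP"]),
    ((25.0, 10.0), ["LOW_PRESSURE"]),
]

def run(cases):
    """Run every case and return how many were executed."""
    for args, expected in cases:
        assert classify(*args) == expected
    return len(cases)
```

The number of paths doubles with each additional independent conditional, which is one reason path testing is usually confined to small units.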

Both methods would typically include some sort of nominal testing, in which test cases are designed to mimic normal operation, and negative testing, in which test cases are selected to try to "break" the program. For example, the software might be run using input values of the correct type and within the expected range to verify conformance with nominal requirements. It might also be run using input values and data rates beyond expected ranges to check failsafe and recovery capabilities (see table, Typical Black-Box and White-Box Test Methods).
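As a minimal sketch of nominal and negative testing, consider a hypothetical requirement that a throttle command of 0 to 100 percent be mapped to a fraction between 0.0 and 1.0, and that anything else be rejected. The function name, the range, and the choice of exceptions are assumptions made for this example, not part of any real flight software.

```python
def throttle_fraction(percent):
    """Hypothetical unit: map a 0-100 percent command to 0.0-1.0, rejecting bad input."""
    if isinstance(percent, bool) or not isinstance(percent, (int, float)):
        raise TypeError("throttle command must be a number")
    # NaN is the only value not equal to itself; it must not slip through.
    if percent != percent or not 0.0 <= percent <= 100.0:
        raise ValueError("throttle command out of range")
    return percent / 100.0

def rejects(value):
    """Return True if the unit refuses the value instead of producing output."""
    try:
        throttle_fraction(value)
    except (TypeError, ValueError):
        return True
    return False

# Nominal testing: correct type, within the expected range (including both limits).
nominal = {0: 0.0, 50: 0.5, 100: 1.0}
nominal_ok = all(throttle_fraction(k) == v for k, v in nominal.items())

# Negative testing: values chosen to try to break the unit.
negative = [-1, 100.5, float("nan"), float("inf"), "50", None]
negative_ok = all(rejects(v) for v in negative)
```

Both groups of cases are derived from the stated requirement alone, so the same cases would serve as black-box tests.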

The Testing Program

White-box and black-box testing are performed within the context of an overall software test program that starts during the requirements phase and continues through product release and maintenance. Software development standards provide a basis for defining the activities of the overall test program. Although the use of such standards declined in the 1990s, they are now increasingly recognized as an important way to help ensure software quality despite rising complexity.

For example, the National Reconnaissance Office (NRO) and the Air Force Space and Missile Systems Center (SMC) recently asked Aerospace to recommend a set of software development standards to be used as compliance documents on NRO and SMC contracts. Aerospace assisted with a detailed survey of existing life-cycle standards and recommended the use of MIL-STD-498 or its commercial equivalent, J-STD-016-1995. However, MIL-STD-498 was canceled in 1998, and J-STD-016 is no longer being maintained by the technical organizations that produced it. Therefore, SMC and NRO felt that a new software standard should be developed.

Aerospace helped analyze MIL-STD-498 in greater detail and identified ways to modernize J-STD-016. Based on this effort, Aerospace prepared a new standard, published as Aerospace Report No. TOR-2004(3909)-3537, "Software Development Standard for Space Systems." It uses MIL-STD-498 as a foundation, but incorporates additional requirements from J-STD-016. It also adds exit criteria for various levels of software testing and requirements that bring the standard up to date with modern terminology and best practices in software development.

Many software development standards, including MIL-STD-498 and the Aerospace revision, set forth requirements for three major activities of software testing: planning, definition, and execution (see table, Key Software Test Issues).

Software test planning addresses all levels of coding and integration, from the highest-level software package down to the lowest-level software units. The results are documented in a software test plan. Lower-level test plans are independently created if the software's size and complexity warrant it. The software test plan enables the program manager to assess the adequacy of test planning for each of the software items and for the software system qualification testing. In addition, the software test plan lists the issues that should be considered in the development of the software test definition.

In the test definition stage, the test preparations, test cases, and test procedures are all described and documented. This may involve a significant design and development effort—in some cases, equal to or exceeding that of the software itself. This is particularly true for software item qualification testing, in which individual software components are accepted for integration into the system. Software item qualification testing is critically dependent on the accuracy of the software test definition.

Once the test definition has been completed, it is possible to actually run the tests and record the results in the software test report. As part of this process, the test organization should emphasize findings and observations of anomalies. The software test report can also include suggestions for further testing based on the limitations of the test equipment or limitations arising from budget or time constraints. The software test report documents the test results and includes accumulated test analyses, results, summaries, deviations from dry runs, and metrics.

Limitations of Software Testing

Despite its obvious importance, software testing is only a partial solution to creating reliable software. In a sense, the purpose of testing is to show that a program has bugs. Thus, while it can provide a means to find and fix defects, it cannot by itself provide an assurance of failure-free operations. Software testing must be pursued in conjunction with other appropriate practices in systems engineering, requirements definition, and software development (such as inspection, the use of automated development aids, static source code and design analysis, and peer review).

A significant limitation is that software testing cannot occur until after the code is written—about halfway or more through project development. The cost of fixing errors rises dramatically as the project progresses because more deliverables are affected. For example, requirements errors cost 10 times more to fix in the code phase than in the requirements phase. Methods of software verification other than testing (under the broad categories of inspection, analysis, or demonstration) must be used to catch errors in the earlier phases of design.

A related limitation is that the effectiveness of a testing program is no better than the requirements on which it is based. Aerospace analysis has shown that the generation of software requirements is a major source of errors in system development. Specific challenges include poorly stated requirements, changing or "creeping" requirements, and nonfunctional requirements. A study of requirements-originated software failures showed that roughly half resulted from poorly written, ambiguous, unclear, and incorrect requirements. The rest came from requirements that were completely omitted. Most problems introduced into software can be traced directly to requirements flaws.

An additional limitation is the difficulty—and hence the time, cost, and effort—of software testing. Ideally, a software system could be exhaustively tested and thereby proven correct. However, this is impossible for all but the simplest systems. Many space-system software applications are so complex, and run in such an interdependent environment, that complete testing can never be achieved. Instead, program managers must prioritize their testing objectives and optimize their testing procedures to ensure that the most important tests are completed. Skill in risk analysis is therefore essential for establishing an appropriate test coverage objective—usually stated as a proportion of the requirements, input data, instructions, or program paths tested (e.g., testing is complete when the tests addressing 100 percent functional coverage of the system have all executed successfully).

Proper selection of input data can increase the testing efficiency by either increasing the error-detection effectiveness or reducing the number of test cases needed to achieve a given test coverage objective. For example, tests can be partitioned to exercise the same code using only one representative case. The number of test cases for each class of failure behavior can be limited. If software inspection is used in the development process, the distribution of defects (by category) detected by inspection can be used to drive the distribution of test data. The amount of coupling (intermodule referencing of variables or subroutines) can be used to focus test cases, particularly if a significant number of software changes have been made. Test cases can also be concentrated on areas exhibiting an abnormally high number of failures. Test case input data can also be selected using a "design of experiments" approach.
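The idea of partitioning tests so that one representative case stands in for all inputs that exercise the same code can be sketched as follows. The altitude classes and their boundaries are hypothetical. In practice the classes would come from the requirements or from the structure of the code.

```python
def partition_class(altitude_m):
    """Hypothetical equivalence classes for an altitude input, in meters."""
    if altitude_m < 0:
        return "invalid_negative"
    if altitude_m < 1_000:
        return "low"
    if altitude_m < 10_000:
        return "medium"
    return "high"

def one_per_class(candidates):
    """Keep the first candidate seen in each class and drop the rest."""
    chosen = {}
    for value in candidates:
        chosen.setdefault(partition_class(value), value)
    return chosen

# Ten candidate inputs collapse to four representative test cases.
candidates = [-5, 0, 10, 500, 999, 1000, 2500, 9999, 10000, 50000]
reduced = one_per_class(candidates)
```

A fuller test set would normally add the values on either side of each class boundary, since defects tend to cluster there. That refinement is left out to keep the sketch short.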

Error-detection effectiveness

This graph shows the benefit of error-detection effectiveness under the assumption that defect detection can be modeled as a nonhomogeneous Poisson process (NHPP). As the proportion of defects removed per test case or interval moves from 0.2 to 0.8, the number of test intervals needed to remove 80 percent of the defects goes from 8.03 down to 2.01.
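The relationship in the graph can be approximated with a short calculation. The sketch below assumes one common NHPP form, the exponential (Goel-Okumoto) mean value function m(t) = a(1 - exp(-bt)), in which the fraction of defects removed after t intervals is 1 - exp(-bt). The caption does not say which NHPP form was used, so this choice is an assumption.

```python
import math

def intervals_needed(detection_rate, fraction_removed):
    """Solve 1 - exp(-b * t) = f for t, given rate b per interval and target fraction f."""
    if detection_rate <= 0 or not 0 < fraction_removed < 1:
        raise ValueError("rate must be positive and fraction strictly between 0 and 1")
    return -math.log(1.0 - fraction_removed) / detection_rate

slow = intervals_needed(0.2, 0.8)   # about 8.05 intervals
fast = intervals_needed(0.8, 0.8)   # about 2.01 intervals
```

Under this assumption the results land close to the values quoted in the caption, and the number of intervals falls in inverse proportion to the detection rate.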

How Much Testing is Enough?

Considering that complete test coverage is generally not possible, project managers face a difficult question in deciding when to stop testing. In practice, this decision is often based not on specific and quantifiable goals but on deadlines, budgets, or completion of an arbitrary number of test runs.

For national security space systems, a better criterion would be the point at which the software reaches an acceptable level of reliability, as measured in time between failures. This method, often referred to as software reliability engineering, is a recommended practice of the American Institute of Aeronautics and Astronautics (AIAA).

The fundamental premise of software reliability engineering is that the rate at which software defects are found and removed can be described mathematically and therefore predicted. These discovery and removal rates can be constant or variable, depending on the models used. If the testing environment simulates the operational environment, then failure rates observed at any point in the test would be similar to the operational failure rates. A model fitted to the test data would therefore make it possible to predict the future failure rate as the testing program proceeds, and hence the software's future reliability.

Software reliability engineering originated in the 1970s and has been the subject of extensive research since that time. Tools have been developed to fit various models to test data to enable determination of the best fit and subsequent extrapolation to enable prediction. Software reliability engineering provides a cost-effective method to determine when to stop testing. Cost typically ranges from 0.1 to 3.0 percent of project development costs.
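Fitting a model to test data and extrapolating from it can be illustrated with a minimal sketch. It assumes the exponential NHPP form m(t) = a(1 - exp(-bt)) and uses a simple least-squares scan over b. Dedicated tools use their own models and estimation methods, so this is not a description of any particular tool. The weekly defect counts are synthetic, and the stopping threshold of 2.0 failures per week is hypothetical.

```python
import math

def fit_exponential_nhpp(times, cumulative_defects):
    """Least-squares fit of m(t) = a * (1 - exp(-b * t)) by scanning candidate b values."""
    best = None
    for step in range(1, 2001):
        b = step / 1000.0                                  # candidates 0.001 .. 2.000
        g = [1.0 - math.exp(-b * t) for t in times]
        # For a fixed b, the best a has a closed form.
        a = sum(y * gi for y, gi in zip(cumulative_defects, g)) / sum(gi * gi for gi in g)
        sse = sum((y - a * gi) ** 2 for y, gi in zip(cumulative_defects, g))
        if best is None or sse < best[0]:
            best = (sse, a, b)
    return best[1], best[2]

def expected_defects(a, b, t):
    """Cumulative defects expected by time t under the fitted model."""
    return a * (1.0 - math.exp(-b * t))

def failure_intensity(a, b, t):
    """Expected defects per unit time at time t (derivative of the mean value function)."""
    return a * b * math.exp(-b * t)

# Synthetic cumulative defect history, generated from a = 100, b = 0.3 and rounded.
weeks = list(range(1, 11))
observed = [round(100 * (1 - math.exp(-0.3 * t))) for t in weeks]

a_hat, b_hat = fit_exponential_nhpp(weeks, observed)
forecast = expected_defects(a_hat, b_hat, weeks[-1] + 2)      # two weeks ahead
ready = failure_intensity(a_hat, b_hat, weeks[-1]) < 2.0      # hypothetical stop rule
```

A stopping rule of this kind ties the end of testing to a predicted failure rate instead of a deadline or an arbitrary number of test runs.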

Software reliability modeling

This figure shows the output of a software reliability modeling tool called CASRE (Computer Aided Software Reliability Estimation), developed at Caltech/JPL. Two of the models, the nonhomogeneous Poisson process (NHPP) model and the Schneidewind model, closely fit the cumulative defect history curve from system testing for a flight software project. The blue part of the curve shows the end-of-data bar and the failure predictions for two weeks into the future.

To help improve the accuracy and value of these prediction models, Aerospace has been working to develop a database schema for software reliability data. The project, Space Systems Mission Assurance via Software Reliability Monitoring, will correlate software life-cycle engineering practices (including test) with the reliability measured from deployed space-systems software. An eventual goal is to provide a risk-assessment tool for program managers that will allow them to compare key software life-cycle metrics and test practices from their program to historical data from other programs. The database is being designed to support three types of analyses: exploratory, quantitative, and qualitative. Exploratory analysis would allow users to investigate relationships that could be used to predict software and system reliability based on project, structural, and test program attributes. Quantitative analysis would allow users to extract event data to predict software reliability. Qualitative analysis would allow users to address questions such as what are the major failure causes, effects, or developmental problems.

Safety-Critical Software

Although software reliability engineering can benefit many types of software, special considerations must be made for safety-critical software—the failure of which can lead to death, major injury, or extensive property damage. A good example is the software supporting the Global Positioning System (GPS). An undetected failure in the navigation signal from any of the GPS satellites might result in an aircraft receiving misleading information on its position or altitude, thereby exposing its occupants to a high risk of a crash landing. Thus, the software components involved in integrity monitoring, which would detect and announce a navigation signal failure, must receive special scrutiny.

Aerospace is supporting the GPS program office in producing high-integrity software for the next-generation GPS constellation. For safety-critical software, testing is part of a process of analysis, documentation, and traceability that starts at the beginning of the project and continues throughout the system lifetime. For example, when requirements are being formulated, a preliminary or functional hazard analysis is performed to identify major hazards and develop mitigation strategies. At the design phase, two more system-safety analyses are performed to determine the safety impact of the software components in their normal and failed states. For critical software components, verification, testing, and documentation must be performed intensively. For example, in aviation applications, the RTCA DO-178B standard provides for testing of all combinations of conditions in branches in such software.
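Testing every combination of conditions in a branch can be sketched for a single decision. The function `integrity_alarm`, its three conditions, and the truth table used as the oracle are all hypothetical. They illustrate the idea and are not drawn from GPS software or from the text of any standard.

```python
from itertools import product

def integrity_alarm(signal_lost, clock_drift, override):
    """Hypothetical decision built from three Boolean conditions."""
    return (signal_lost or clock_drift) and not override

# Oracle written out independently from the (hypothetical) requirement:
# alarm when either fault is present and the override is off.
SHOULD_ALARM = {
    (True, False, False),
    (False, True, False),
    (True, True, False),
}

def check_all_condition_combinations():
    """Exercise all 2**3 combinations and return how many were checked."""
    checked = 0
    for combo in product([False, True], repeat=3):
        assert integrity_alarm(*combo) == (combo in SHOULD_ALARM)
        checked += 1
    return checked
```

The number of combinations doubles with each added condition, which is why reduced criteria such as modified condition/decision coverage (the subject of the Hayhurst tutorial listed under Further Reading) are used for decisions with many conditions.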

Even intensive testing has the same limitation discussed earlier: it can only prove the presence of defects in software, not their absence. Thus, Aerospace and other organizations are researching methods that use mathematical techniques to prove the correctness of the specification, the verification test suite, and the automatic code generators that create the software. The goal is to use formal methods and testing together to significantly decrease development time while producing dependable software.

Conclusion

With the addition of progressively more software functionality in both space and ground segments, program managers will face tougher challenges in ensuring software reliability. Software testing efforts will require better analytical methods and oversight approaches to meet the greater demand without adversely affecting budgets and schedules.

By participating in software test planning and data analysis, reviewing software development standards and practices, and performing research on software reliability, Aerospace is helping to make the software testing process more efficient and effective. The results of this research should augment software-intensive system acquisition practices with tools to help program managers ensure mission success.

Further Reading

  • Aerospace Report No. TOR-2004(3909)-3537, "Software Development Standard for Space Systems." (The Aerospace Corporation, El Segundo, CA, 2004)
  • AIAA/ANSI R-013-1992, Recommended Practice: Software Reliability, American Institute of Aeronautics and Astronautics (Reston, VA).
  • P. Cheng, "Ground Software Errors Can Cause Satellites to Fail too—Lessons Learned," Ground Systems Architecture Workshop (Manhattan Beach, CA, March 4, 2003); (last visited April 29, 2005).
  • G. Durrieu, C. Seguin, V. Wiels, and O. Laurent, "Test Case Generation Guided by a Coverage Criterion on Formal Specification," IEEE International Symposium on Software Reliability Engineering (ISSRE, Nov. 2004).
  • J. T. Harding, "Using Inspection Data to Forecast Test Defects," Crosstalk (May 1998) (last visited January 19, 2005).
  • K. Hayhurst, et al., A Practical Tutorial on Modified Condition/Decision Coverage, NASA TM-2001-210876 (NASA Langley Research Center, May 2001) (last visited May 10, 2005).
  • M. Hecht and H. Hecht, "Digital System Software Requirements Guidelines," NUREG/CR-6734, Vol. I, Office of the Chief Information Officer, U.S. Nuclear Regulatory Commission (Washington, DC, 2001).
  • C. Kaner, An Introduction to Scenario-Based Testing (last visited January 22, 2005).
  • D. Leffingwell and D. Widrig, Managing Software Requirements (Addison Wesley Longman, Reading, MA, 1999).
  • S. McConnell, "Gauging Software Readiness with Defect Tracking," IEEE Software, Volume 14, Issue 3, p. 135 (May-June 1997).
  • J. Musa, Software Reliability Engineering (McGraw Hill, New York, 1998).
  • D. R. Wallace, Is Software Reliability Modeling a Practical Technique?, 2002 Software Technology Conference (last visited January 19, 2005).
  • M. C. K. Yang and A. Chao, "Reliability Estimation and Stopping Rules for Software Testing, Based on Repeated Appearances of Bugs," IEEE Transactions on Reliability, Vol. 44, No. 2, p. 315 (June 1995).
  • U.S. Department of Defense, Military Standard, Software Development and Documentation, December 1994; also available in a commercial variation as EIA/IEEE J-STD-016 from http://standards.ieee.org.
