Executive Summary, Space Shuttle Independent Assessment Team, Report to Associate Administrator, Office of Space Flight

By SpaceRef Editor
March 9, 2000

Section 1 Executive Summary

The Shuttle program is one of the most complex engineering activities undertaken anywhere in the world at the
present time. The Space Shuttle Independent Assessment Team (SIAT) was chartered in September 1999 by
NASA to provide an independent review of the Space Shuttle sub-systems and maintenance practices. During
the period from October through December 1999, the team led by Dr. McDonald and comprised of NASA,
contractor, and DOD experts reviewed NASA practices, Space Shuttle anomalies, as well as civilian and military
aerospace experience.

In performing the review, much of a very positive nature was observed by the SIAT, not the least of which was the
skill and dedication of the workforce. It is in the unfortunate nature of this type of review that the very positive
elements are either not mentioned or not dwelt upon. This very complex program has undergone a massive change
in structure in the last few years with the transition to a slimmed-down, contractor-run operation, the Space Flight
Operations Contract (SFOC). This has been accomplished with significant cost savings and without a major
incident. This report has identified significant problems that must be addressed to maintain an effective program.
These problems are described in each of the Issues, Findings or Observations summarized below, and unless
noted, appear to be systemic in nature and not confined to any one Shuttle sub-system or element. Specifics are
given in the body of the report, along with recommendations to improve the present systems.

Issue 1

NASA must support the Space Shuttle Program with the resources and staffing
necessary to prevent the erosion of flight-safety critical processes.

Human rated space transportation implies significant inherent risk. Over the course of the Shuttle Program,
now nearing its 20th year, processes, procedures and training have continuously been improved and
implemented to make the system safer. The SIAT has a major concern, reflected in nearly all of the
subsequent “Issues”, that this critical feature of the Shuttle Program is being eroded. Although the reasons for
this erosion are varied, it appears to the SIAT that a major common factor among them is the reduction in
allocated resources and appropriate staff that ensure these critical processes and procedures are being
rigorously implemented and continually improved.

The SIAT feels strongly that workforce augmentation must be realized principally with NASA personnel rather
than with contract personnel. The findings show that there are important technical areas that are staffed “one-deep”.
The SSP should assess not only the quantity of personnel needed to maintain and operate the Shuttle
at anticipated future flight rates, but also the quality of the workforce required in terms of experience and
special skills. In the recent fleet wiring investigation, work force skill shortages created the need to use Quality
Assurance personnel inexperienced in wiring issues to perform critical inspections. Note that increasing the
work force carries risk with it until the added work force acquires the necessary experience.

Issue 2

The past success of the Shuttle program does not preclude the existence of problems
in processes and procedures that could be significantly improved.

The SIAT believes that another factor in the erosion referred to in Issue 1 is success-engendered safety
optimism. The SIAT noted several examples of what could be termed an inappropriate level of comfort with
certain apparently successful “acceptance of risk” decisions made by the program. One example was the
number of flights with pinned liquid oxygen injectors flown without prior hot-fire testing that did not experience
pin ejection before the STS-93 pin ejection rupture incident. These successful flights created a false sense of
security that pinning an injector could be treated as a standard repair. There were 19 incidences of pin
ejection that did not result in nozzle rupture prior to STS-93 and this created an environment that led to the
acceptance of risk. Similarly, the wire damage that led to the short on STS-93 is suspected to have been
caused 4 to 5 years prior to the flight. The SSP must rigorously guard against the tendency to accept risk
solely because of prior success.

Issue 3

The SSP’s risk management strategy and methods must be commensurate with the
‘one strike and you are out’ environment of Shuttle operations.

While the Shuttle has a very extensive Risk Management process, the SIAT was very concerned with what it
perceived as Risk Management process erosion created by the desire to reduce costs. This is inappropriate
in an area that the SIAT believes should be under continuous examination for improvement in effectiveness
with cost reduction being secondary. Specific SIAT findings address concerns such as: moving from NASA
oversight to insight; increasing implementation of self-inspection; reducing Safety and Mission Assurance
functions and personnel; managing risk by relying on system redundancy and abort modes; and the use of
only rudimentary trending and qualitative risk assessment techniques. It seemed clear to the SIAT that
oversight processes of considerable value, including Safety and Mission Assurance, and Quality Assurance,
have been diluted or removed from the program. The SIAT feels strongly that NASA Safety and Mission
Assurance should be restored to the process in its previous role of an independent oversight body, and not
be simply a “safety auditor.” The SIAT also believes that the Aerospace Safety Advisory Panel membership
should turn over more frequently to ensure an independent perspective. Technologies of significant potential
use for enhancing Shuttle safety are rapidly advancing and require expert representation on the Aerospace
Safety Advisory Panel. While system redundancy is a very sound element of the program, it should not be
relied upon as a primary risk management strategy; more consideration should be given to risk
understanding, minimization and avoidance. It was noted by the SIAT that as a result of choices made during
the original design, system redundancy had been compromised in 76 regions of the Orbiter (300+ different
circuits, including 6 regions in which if wiring integrity was lost in the region, all three main engines would shut
down). These were design choices made based on the technology and risk acceptance at that time. Some
of these losses of redundancy may be unavoidable; others may not be. In either case, the program must
thoroughly understand how loss of system redundancy impacts vehicle safety.

Issue 4

SSP maintenance and operations must recognize that the Shuttle is not an ‘operational’
vehicle in the usual meaning of the term.

Most aircraft are described as being “operational” after a very extensive flight test program involving hundreds
of flights. The Space Shuttle fleet has only now achieved one hundred flights and clearly cannot be thought
of as being “operational” in the usual sense. Extensive maintenance, major amounts of “touch labor” and a
high degree of skill and expertise by significant numbers of technician and engineering staff will always be
required to support Shuttle operations. Touch labor always creates a potential for collateral and inadvertent
damage. In spite of the clear mandate from NASA that neither schedule nor cost should ever be allowed to
compromise safety, the workforce has received a conflicting message due to the emphasis on achieving cost
and staff reductions, and the pressures placed on increasing scheduled flights as a result of the Space
Station. Findings of concern to the SIAT include: the increase in standard repairs and fair wear and tear
allowances; the use of technician and engineering “pools” rather than specialties; a potential complacency in
problem reporting and investigation; and the move toward structural repair manuals as used in the airline
industry that allow technicians to decide and implement repairs without engineering oversight. The latter
practice has been implicated in a number of incidents that have occurred outside of NASA (Managing the
Risks of Organizational Accidents, Chapter 2, p. 21). When taken together these strategies have allowed a
significant reduction in the workforce directly involved in Shuttle maintenance. When viewed as an
experimental / developmental vehicle with a “one strike and you are out” philosophy, the actions above seem
ill advised.

Issue 5

The SSP should adhere to a ‘fly what you test / test what you fly’ methodology.

While the “fly what you test / test what you fly” methodology was adopted by the Shuttle Program as a
general operational philosophy, this issue arose specifically with the Space Shuttle Main Engine (SSME). For
the SSME, fleet leader and hot-fire (green-run) testing are used very effectively to manage risk. However, the
concept must be rigorously adhered to. Recent experience, for instance the pin ejection problem, has shown
a breakdown of the process. An excellent concept, the fleet leader is also applicable to other systems, but its
limitations must be clearly understood. In some cases (e.g., hydraulic testing, avionics, Auxiliary Power Unit)
the SIAT believes that the testing is not sufficiently realistic to estimate safe life.

Issue 6

The SSP should systematically evaluate and eliminate all potential human single point
failures.

In the past, the Shuttle Program had a very extensive Quality Assurance program. The reduction of the
quality assurance activity (“second set of eyes”) and of the Safety & Mission Assurance function
(“independent, selective third set of eyes”) increases the risk of human single point failures. The widespread
elimination of Government Mandatory Inspection Points, even though the reductions were made
predominantly when redundant inspections or tests existed, removed a layer of defense against maintenance
errors. Human errors in judgment and in complying with reporting requirements (e.g., in or out of family) and
procedures (e.g., identification of criticality level) can allow problems to go undetected, unreported or reported
without sufficient accuracy and emphasis, with obvious attendant risk. Procedures and processes that rely
predominantly on qualitative judgements should be redesigned to utilize quantitative measures wherever
possible. The SIAT believes that NASA staff (including engineering staff) should be restored into the system
for an independent assessment and correction of all potential single point failures (see also the concerns
concerning the Safety and Mission Assurance function in Issue 3).

Issue 7

The SSP should work to minimize the turbulence in the work environment and its
effects on the workforce.

Findings support the view that the significant number of changes experienced by the Shuttle Program in
recent years have adversely affected workforce morale or diverted workforce attention. These include the
change to the Space Flight Operations Contract, the reduction in staffing levels to meet Zero Based Review
requirements, attrition through retirement, and numerous re-organizations. Ongoing turbulence from cyclically
heavy workloads and continuous improvement initiatives (however beneficial) was also observed to stress
the workforce. While the high level workforce performance required by the Shuttle program has always
created some level of workforce stress, the workforce perception is that this has increased significantly in the
last few years. Specifically, the physical strain measured in the Marshall Space Flight Center workforce
significantly exceeded the national norm, whereas the job stress components (e.g., responsibility levels,
physical environment) were near normal levels. This typically indicates the workforce is internalizing chronic
instability in the workplace. Similarly, feedback from small focus groups at Kennedy Space Center indicates
unfavorable views of communication and other factors of the work environment. Clearly, from a health
perspective, one would seek to reduce employee stress factors as much as possible. From a vehicle health
perspective, stressed employees are more likely to make errors by being distracted while on the job, and to
be absent from the job (along with their experience) as a result of health problems.

The SIAT believes that the findings reported here in the area of work force issues parallel those that were
noted by the Aerospace Safety Advisory Panel. The SIAT is concerned that in spite of the Aerospace Safety
Advisory Panel findings and recommendations, supported by the present review, these problems remain.

Issue 8

The size and complexity of the Shuttle system and of the NASA/contractor relationships
place extreme importance on understanding, communication, and information handling.

In spite of NASA’s clear mandate on the priority of safety, the nature of the contractual relationship
promotes conflicting goals for the contractor (e.g., cost vs. safety). NASA must minimize such conflicts. To
adequately manage such conflicts, NASA must completely understand the risk assumptions being made by
the contractor workforce. Furthermore, the SIAT observed issues within the Program in the communication
from supervisors downward to workers regarding priorities and changing work environments.

Communication of problems and concerns upward to the SSP from the “floor” also appeared to leave room
for improvement. Information flow from outside the program (e.g., the Titan program, Federal Aviation
Administration, ATA) appeared to rely on individual initiative rather than formal process or program
requirements. Deficiencies in problem and waiver tracking systems, “paper” communication of work orders,
and FMEA/CIL revisions were also apparent. The program must revise, improve and institutionalize the
entire program communication process; current program culture is too insular in this respect.

Additionally, major programs and enterprises within NASA must rigorously develop and communicate
requirements and coordinate changes across organizations, particularly as one program relies upon another
(e.g., re-supplying and refueling of International Space Station by Space Shuttle). While there is a joint
Program Review Change Board (PRCB) to do this, for instance on Shuttle and Space Station, it was a
concern of the SIAT that this communication was ineffective in certain areas.

Issue 9

Due to the limitations in time and resources, the SIAT could not investigate some
Shuttle systems and/or processes in depth.

Follow-on efforts by some independent group may be required to examine these areas (e.g., other propulsion
elements, such as the Reusable Solid Rocket Motor, Solid Rocket Booster, External Tank, Orbital
Maneuvering System, and Reaction Control System, and other wiring elements besides those in the Orbiter).
This independent group should also review the SSP disposition of the SIAT findings and recommendations.

The Shuttle Upgrades program creates the opportunity to correct many of the observed deficiencies, e.g., the
76 areas of compromised redundancies (300+ circuits), and to incorporate design for maintainability and
continuous improvement. However, without careful systems integration and prioritization, some of the
deficiencies observed by the SIAT will be exacerbated, e.g., in wiring, hydraulics, software, and maintenance
areas. Additionally, the elements of maintenance must be rigorously analyzed, including training,
maintainability, spares support maintenance, and accessibility.

Return to Flight

The SIAT was asked by the SSP for its views on the return to flight of STS-103. The SIAT had earlier considered
this question and had concluded that a suitable criterion would be that STS-103 should possess less risk than, for
example, STS-93. In view of the extensive wiring investigation, repairs and inspections that had occurred, this
condition appeared to have been satisfied. Furthermore, none of the main engines scheduled to fly have pinned
Main Injector liquid oxygen posts. The SIAT did suggest that prior to the next flight the SSP make a quantitative
assessment of the success of the visual wiring inspection process. In addition, the SIAT recommended that the
SSP pay particular attention to inspecting the 76 areas of local loss of redundancy and carefully examine the
OV102 being overhauled at Palmdale for wiring damage in areas that were inaccessible on OV103. Finally, the
team suggested that the SSP review in detail the list of outstanding waivers and exceptions that have been
granted for OV103. The SSP is in the process of following these specific recommendations and so far has not
reported any findings that would cause the SIAT to change its views.

Shortly before completing this report, the SIAT was gratified to learn that a number of steps had been taken by
NASA to rectify a number of the adverse findings reported above. Of particular note was the strengthening of the
NASA Quality Assurance function for the Shuttle at Kennedy Space Center. Upon completion of STS-103, the
SIAT was pleased to learn that only two orbiter in-flight anomalies were experienced, a reduction from past trends
(see Appendix 11).
