Testimony of Arthur I. Zygielbaum at Senate Commerce, Science, and Transportation Committee Hearings: “International Space Station”
Given at a Science, Technology, and Space Hearing:
International Space Station
Wednesday, October 29, 2003 – 2:00 PM – SR-253
The Testimony of Mr. Arthur I. Zygielbaum, Director, National Center for Information Technology and Education, University of Nebraska
Mr. Chairman and Distinguished Members of the Subcommittee: I am honored to have been invited to testify with regard to the safety of the International Space Station. Although I am testifying as a private citizen, I am a member of the administrative faculty, an associate professor of computer science and engineering (a courtesy title) and head of a research center in educational technology at the University of Nebraska-Lincoln (UNL). My testimony does not reflect any position or opinion of the University of Nebraska-Lincoln. I joined UNL in January 1998 after spending nearly 30 years at the NASA/Caltech Jet Propulsion Laboratory. While at JPL I held positions in electronic and software engineering as well as in line and program management. In August 2001 I was appointed as a consultant to the NASA Aerospace Safety Advisory Panel (ASAP). Three days after the Columbia tragedy, the NASA Administrator appointed me as a full member of the Panel. As you are aware, I resigned that appointment about a month ago.
In presenting my view of space station safety, I will first address the International Space Station (ISS) program within the context of NASA safety. Second, I will address specific issues impacting ISS safety and some over-sensationalized headlines attributed to me. My major points will be to recommend the establishment of independent safety oversight for NASA and the creation of a centralized, but international, management structure for the International Space Station.
Is ISS safe? The answer cannot be “yes” or “no”. For an enterprise as complex as the space station, or the space program, or even driving to work, the answer is “probably.” We can only act to reduce the risk of an accident – a bad day. The actions proposed in this testimony are designed to reduce risk by providing a backstop function for safety and by reducing the temptation to cave in to the ever-present pressures of limited time and resources.
I. ISS Safety as a part of NASA Safety
The International Space Station program exists within the organization and culture of NASA. Its safety organization and assignment of safety responsibilities are similar to those in other NASA programs, including the Space Shuttle. The Challenger and Columbia disasters can be traced, at least in part, to allowing safety margins to erode in the face of budget and schedule pressure. The Aerospace Safety Advisory Panel has repeatedly called for independence of safety organizations and for clear and clean lines of safety responsibility, accountability and authority to provide the checks and balances that resist such erosion.
Independent Safety Oversight
The call to establish greater independence for NASA’s safety organization is not a new one. The 1999 Shuttle Independent Assessment Team stated “NASA’s safety and mission assurance organization was not sufficiently independent.” The Rogers Commission investigating the Challenger disaster called for independent oversight. The Columbia Accident Investigation Board (CAIB) report included the following, “NASA’s safety system lacked the resources, independence, personnel, and authority to successfully apply alternate perspectives to developing problems. Overlapping roles and responsibilities across multiple safety offices also undermined the possibility of a reliable system of checks and balances.”
The Aerospace Safety Advisory Panel could not provide the needed oversight. The Panel’s $500,000 annual budget only allowed panel members to spend 2-5 days per month in meetings or in the field. In 1978, Herbert Grier, ASAP Chairman, testified to this very Senate Subcommittee, “The Panel’s objective, and the limitation on the members’ time, indicate that we can be expected to review NASA operations only to the extent necessary to judge the adequacy of the NASA management system to identify risks and to cope with them in a safe, efficient manner.”
In the words of the CAIB, the Aerospace Safety Advisory Panel was “not very often influential.” Despite the fact that ASAP’s Annual Reports had for at least three decades identified technical problems and deficiencies in safety organization authority, accountability, responsibility, independence and funding, an attempt was made in a Senate Appropriations Committee report to hold ASAP accountable for not identifying the cultural problems found by the CAIB. ASAP was an advisory group – by definition to answer questions asked of it – to give advice. When my colleagues and I resigned from ASAP it was to facilitate the establishment of a safety oversight group with needed independence and authority. It was to establish an oversight group whose authority matched its responsibility.
An independent oversight board can provide effective checks and balances against the forces that erode safety – changing culture, budget, schedule, aging equipment, inadequate processes, etc. The Navy’s technical warrant process, the National Transportation Safety Board (NTSB), and the Nuclear Regulatory Commission are all examples of oversight organizations providing strong checks and balances to implementing organizations.
Unlike ASAP, the new body (for want of a name, the NASA Safety Board) should be full-time and include a small staff of researchers to aid in field work, reviews, and investigations. It should have sufficient funding to hire its own research personnel and to task NASA safety experts for specific studies. The Board must have the ability to communicate with all levels of NASA management in order to ask questions and examine safety-related processes and standards. While the Board could report to the NASA Administrator, it could be chartered under Congress, like the NTSB and the National Research Council, to achieve greater independence. It would act as a final authority in issues related to safety.
From our experience in ASAP, this Board must be constituted outside the Federal Advisory Committee Act (FACA). While FACA’s purpose in controlling committees is laudable, it has several provisions that would weaken an oversight group. In particular, FACA requires that a Designated Federal Officer accompany committee members in any fact-finding activities. The act also requires that all recommendations to the government be first aired in a public meeting. These restrictions impede investigation and effectively prohibit dealing with sensitive programmatic or personnel issues.
Waiver Authority
In response to a request by the NASA Administrator during our March 2003 annual meeting, ASAP began a study of NASA’s safety organization and culture. I headed the Safety Organization and Culture Team (SOCT) that was assigned that task. The Team’s initial findings and recommendations were presented publicly at Kennedy Space Center last September. The report, which was approved by ASAP as a whole, is appended to this testimony.
Although there were many initial findings, the Team reached one clear initial conclusion: isolate the obligation to meet safety critical requirements from the pressures to meet schedules and budgets. Although it was issued before the Columbia Accident Investigation Board Report, the Team’s single initial recommendation was nonetheless strongly in concert with that report. Quoting from the Team report:
“It is traditional in NASA for project and program managers to have the authority to authorize waivers to safety requirements. Safety critical waiver authority should reside with an independent safety organization using independent technical evaluation. Moving this authority would increase the management oversight of safety-related decisions and would strongly support the creation of a well-respected and highly-skilled safety organization.
Recommendation:
ASAP recommends that NASA institute a process change that requires that waiver requests to safety critical requirements be submitted by project and program managers to a safety organization independent of the program/project. That organization would have sole authority, excepting appeal outside the program/project potentially moving up to the level of the Administrator.”
In the present NASA organization, if safety personnel identify a safety critical problem, they report it to a project manager who has the authority to ignore or waive the requirement. The safety organization could appeal to the next level of project or program management to override the waiver.
ASAP proposes that safety is paramount. Under the proposed recommendation, once a safety critical problem is identified by safety personnel, the project manager would have to apply to the safety organization for a waiver. If it is not granted, he or she would appeal to the next higher level in the safety organization.
The project manager’s responsibility for setting and enforcing technical requirements would remain unchanged. The authority to issue waivers to safety critical requirements would move to a safety organization. The responsibility to meet safety critical requirements would thereby not be easily weakened in response to cost, schedule, or other influence.
This process is similar to the Technical Warrant process used in the US Navy Sea Systems Command. A technical authority is created who holds final authority for waivers and changes to technical requirements. The technical authority is an expert who is isolated from the project manager’s schedule and budget pressures. (I am now part of an Independent Review Team examining the state of this process for the Navy.)
Caveat
Nothing in the suggestions for an oversight board or independent waiver authority should be construed to remove responsibility for safety from projects and programs. Oversight boards or independent authorities cannot replace safety functions integral to the engineering, management, and operation of NASA’s projects and programs. Accountability for safety must remain with those who have implementing authority.
II. International Space Station: An Accident Waiting to Happen?
Several weeks ago headlines appeared world-wide stating that I, as an ex-NASA advisor, declared that the International Space Station (ISS) was in critical danger. In fact, what I stated, at a public ASAP meeting in September, was that incidents had occurred that might be a trend indicating problems with Space Station safety and operational processes.
The 2002 ASAP Annual Report included this statement, “Several events during the past year triggered the Panel’s concern. For example, shortly after the docking of STS-113 with ISS, there was loss of ISS attitude control due to lack of coordination of the system configuration. In another case, lithium thionyl chloride batteries were used on board ISS over the explicit objection of several partners. Although this occurred within appropriate existing agreements and without incident, the precedent is potentially hazardous. The Panel notes that differences exist in the safety philosophies among the partnering agencies. There is the potential for hazardous conditions to develop due to disagreements.”
In September a Russian controller sent commands to fire thrusters before American controllers disengaged the Control Moment Gyroscope system. The result was one attitude control system countering the actions of the other. Both attitude control incidents resulted in a relatively short loss of attitude control.
Although ISS was not seriously endangered by any of these incidents individually, the concern of the Panel was that miscommunication or misunderstandings about the system configurations could lead to extremely hazardous conditions. The Panel indicated that it would investigate this trend to understand if it was real and if actions were being taken to improve the situation.
The Russian and American organizations involved in ISS have cultural differences that impact safety. These differences are manifested in several ways. In a briefing by ISS managers, we were told that Russian safety organizations tend to fit hierarchically into their operational organizations. This differs from the American philosophy of parallel safety organizations that offer at least some level of independence. Of greater concern, however, is the sensitive nature of the interface between the American and Russian agencies. Clouded by issues of international protocol, national pride, security, and technology transfer, it was difficult for ASAP to obtain hard information about the Russian side of the command and control incidents.
ISS is a complicated spacecraft. It is a remarkable achievement. As an engineer I appreciate the difficulties that have been overcome in developing interfaces that function well across physical, electronic, and electrical connections. As a manager I am concerned about the highly decentralized management that operates space station.
Had I remained with ASAP I would have argued for a 2003 recommendation to investigate mechanisms to create a centralized international ISS management structure and an independent international safety oversight board. As ISS builds toward “core complete” and beyond, complexities will increase, coordination will become more critical, and the chance for accident will grow exponentially. A stronger management and safety structure is, in my opinion, the only means to salve this concern.
I am pleased to note that in a recent conversation with the Space Station Program Manager, William Gerstenmaier, he indicated that the Columbia tragedy had been a “wake-up call” to both the Russian and American teams. The result was improved communication and better exchange of technical information. Despite my concerns, I am amazed and in awe of how much has been accomplished by Bill, his people, and their Russian counterparts.
III. Other Issues
For the record, in its 2002 Annual Report and during meetings with NASA officials, ASAP expressed concerns and made specific recommendations that impact ISS. The recommendations included:
· Assure adequate funding for the development and maintenance of micrometeoroid/orbital debris (MMOD) software.
· Continue priority efforts to find a solution to the lack of a crew rescue vehicle in the period from 2006 to 2010, between the planned end of Soyuz production and the availability of the Orbital Space Plane.
· Review crew performance in light of apparent crew fatigue during EVA. This recommendation was sparked by a near miss collision between the ISS remote manipulator system and a docked space shuttle.
· Assure that American and Russian segment control computers can each operate safety critical functions in all segments to mitigate hazards caused by computer failure in any segment. (American computers cannot control the propulsion system in the Russian segment, for example.)
The Panel was concerned about the availability of Russian Soyuz spacecraft and Progress supply vehicles. ISS is still a developmental vehicle. As such, the reliability and interoperability of systems and components are still being learned. Sufficient “up” and “down” mass capability must be available to support hardware replacement and crew consumable resupply. While a crew can turn off the lights and come home in an emergency, that is not the best answer in terms of protecting the ISS investment, or lives and property on the ground if ISS makes an uncontrolled atmospheric reentry.
IV. Final Comments
The Aerospace Safety Advisory Panel effectively came to an end when all of its members and consultants resigned last month. I am very proud of my short tenure with ASAP. Over its 36-year history, ASAP was populated by individuals outstanding in their fields of expertise and in their commitment to space exploration. As a group they identified significant safety issues that ranged from organizational problems through major technical flaws. If we were really “often not very influential” it was not for lack of technical expertise or tenacity in attempting to get a point across.
We grieved with NASA and the world at the loss of Columbia and her gallant crew. We tried to understand our role with respect to the tragedy. At no time did we attempt to identify individuals who might be responsible. Rather we focused on processes that failed and on organizational structures that were faulty. We are convinced that no one within NASA wants to be unsafe or to unnecessarily endanger people or property. Given the enormity of the disaster it is easy to forget that NASA is fundamentally safe. There are thousands of potentially dangerous processes, such as moving heavy machinery and working with caustic chemicals, accomplished safely every day by NASA personnel and contractors.
Our single-minded purpose as a Panel was to assure the safety of ongoing and future NASA projects. It is up to those who follow to assure that safety remains the number one concern of the NASA family.
Appendix
Aerospace Safety Advisory Panel Safety Organization and Culture Team
Initial Findings and Recommendations
August 20, 2003
This paper documents initial findings of the Safety Organization and Culture Team. This paper also includes an initial recommendation worthy of consideration for immediate action. The Team will continue to develop these findings and issue recommendations through the Panel by benchmarking outside organizations, reviewing documents, interviewing individual NASA personnel, and discussing issues with NASA management and safety organizations.
For purposes of this study, the Team is organizing its investigation and review into three categories: Culture, Formalism of Safety, and Safety Organizations.
Initial Findings
1. Culture: Attitudes, Behavior, and Identity. The NASA “safety culture” includes safety attitudes and behavior evidenced by individuals and organizations. In addition, safety culture includes a sense of community and responsibility for that community among all individuals involved in NASA.
NASA is focused on safety throughout the agency. Notwithstanding the Columbia disaster, NASA personnel deal daily with hazardous materials, processes, and procedures. Accidents are infrequent, and safety is explicitly prized by the agency as a whole.
However, NASA’s “can do” attitude could motivate projects to continue despite resource and schedule constraints. ASAP is concerned that safety is treated as a “consumable” in the same sense as schedule and budget in the push to meet flight commitments and schedules. Work-arounds, “within family” rationale, acceptance of out-of-specification conditions, etc., have become standard practice. By contrast, the U.S. Navy submarine force and nuclear reactor programs, as shown in the Navy Benchmark Study, vest safety authority in independent organizations that oversee all programs and projects. There are no waivers to safety-critical requirements in any circumstances short of dire emergency.
The Panel also notes that in its review of the Orbital Space Plane (OSP) program, safety requirements did not appear at the upper levels of program requirements documents. The program made a conscious decision to leave the formulation of those requirements to the contractors. In the absence of high level safety requirements, there is little basis for a safety comparison among proposals. Without recording such requirements, there is risk that schedule and funding pressures may lead to degradation of safety. OSP acceleration could compound this problem.
As indicated in the ASAP 2002 Annual Report, many jobs in safety organizations are not held in high regard. There is a general belief that individuals in those positions are not useful in “getting the job done.”
2. Formalism of Safety. Safety formalism at NASA includes documentation of requirements and guidelines, defined processes, training and certification of personnel, and ongoing assessment and evaluation.
NASA has compiled large numbers of safety requirements and guidelines, which are published in a hierarchy of documents. The Panel is concerned that “requirements” and “guidelines” seem to be used interchangeably. While many NASA Standards and Guidelines are useful, they have been weakened over time to accommodate project constraints. Standards and guidelines must be kept vital in both senses of the word. They must be considered a necessary part of all development efforts. They must be kept updated, current, and appropriate to their intent.
Safety engineering at the systems level needs to be improved. System safety can best be achieved by eliminating and controlling hazards through specific design and operating approaches. It is compromised by inadequate systems engineering practices, and is characterized by bottom-up analysis and an over-emphasis on component engineering. While the Panel supports the use of Probabilistic Risk Assessment (PRA), the Panel cautions that the PRA is not a substitute for a rigorous system safety design process.
The NASA process for assuring compliance with safety requirements is weak. This derives from the ability to waive requirements at the program level. It is exacerbated by inadequate safety organization authority. Because safety compliance may degrade over time, strong trend analysis capability is needed. The Panel is concerned that there is insufficient authority, responsibility and accountability vested in safety organizations.
NASA needs to have stronger processes or structures in place to keep technical requirements current and validated. Similarly, the certification of systems against those requirements can diminish over time. In Shuttle, there are examples where components and procedures have changed without requisite recertification against safety and system level requirements.
3. Safety Organizations. The NASA safety organization includes implicit and explicit safety organizations spanning Headquarters, the Centers, and contractors. These organizations interrelate with each other, and with programs, projects, technical, and support organizations through lines of responsibility, authority, and accountability.
Safety organizations and related authority, responsibility, and accountability, vary from Center to Center, project to project, and program to program. The organizational architecture is constructed on an as-needed basis rather than through a defined and approved process. Standards on how to develop and operate safety organizations do not always exist or are not rigorously followed.
There is no single assignment of responsibility for compliance with safety requirements (technical and procedural). In most cases, this lies with the program/project manager. It is not likely that that manager has a strong background in safety analysis, standards, or methods. Because the manager has full authority, recommendations from safety officials can be easily over-ridden. In the Navy, for example, safety issues are under the full authority of the safety organization.
The Panel is concerned that the OSP program shows no clear ownership of system safety requirements. These requirements are caught up in a struggle between safety and systems engineering organizations. OSP safety is weakened by the lack of cooperation and clear authority and responsibility.
In some cases, safety organizations receive base funding independent of projects. In others, safety organizations depend solely on project funds. In all cases examined by the Panel, safety organizations do not have real authority in terms of control of funds spent by the project. At best, their approval is advisory to the project manager. There is, therefore, little independent assessment of safety and minimal impetus to attract top-level, highly-qualified, and well-respected system safety engineers.
Initial Recommendation
Comment:
It is traditional in NASA for project and program managers to have the authority to authorize waivers to safety requirements. Safety critical waiver authority should reside with an independent safety organization using independent technical evaluation. Moving this authority would increase the management oversight of safety-related decisions and would strongly support the creation of a well-respected and highly-skilled safety organization.
Recommendation:
ASAP recommends that NASA institute a process change that requires that waiver requests to safety critical requirements be submitted by project and program managers to a safety organization independent of the program/project. That organization would have sole authority, excepting appeal outside the program/project potentially moving up to the level of the Administrator.