Recommendations: Space Shuttle Independent Assessment team, Report to Associate Administrator, Office of Space Flight

By SpaceRef Editor
March 9, 2000

Section 5: Recommendations

Category 1: Immediate

Prior to Return to Flight

1. The reliability of the wire visual inspection process should be quantified (success rate in locating wiring
defects may be below 70% under ideal conditions).

2. Wiring on OV102 at Palmdale should be inspected for wiring damage in difficult-to-inspect regions. If any of
the wires checked are determined to be especially vulnerable, they should be re-routed, protected, or replaced.

3. The 76 CRIT 1 areas should be reviewed to determine the risk of failure and ability to separate systems when
considering wiring, connectors, electrical panels, and other electrical nexus points. Each area that violates
system redundancy should require a program waiver that outlines risk and an approach for eliminating the
condition. The analysis should assume arc propagation can occur and compromise the integrity of all
affected circuits. Another concern is that over 20% of this wiring cannot be inspected due to limited access;
these violation areas should, as a minimum, be inspected during heavy maintenance and ideally be corrected.

4. The SSP should review all waivers or deferred maintenance to verify that no compromise to safety or mission
assurance has occurred.

Category 2: Short Term

Prior to making more than four more flights

1. NASA should expand existing data exchange and teaming efforts with other governmental agencies,
especially concerning age effects.

2. A formal Aging and Surveillance Program should be instituted.

3. NASA and USA quality inspection and NASA engineers should review all CRIT 1 system repairs.

4. The failure of all CRIT 1 units should be fully investigated and corrected without waivers.

5. All testing of units must be minimized and documented as part of their total useful life. Similarly, maintenance
operations must be fully documented.

6. The SIAT recommends comprehensive re-examination of maintenance and repair actions for adequate
verification requirements (e.g., visual, proof test, or green run).

7. Human error management and the development of safety metrics, such as the work of the Kennedy Space Center Shuttle Processing Human Factors team, should be supported aggressively and implemented program-wide.

8. Communications between the rank and file work force, supervisors, engineers and management should be

9. NASA should expand on the Human Factors research initially accomplished by the SIAT and the Air Force Safety Center. This work should be accomplished through a cooperative effort including both NASA and AFSC. The data should be controlled to protect the privacy of those taking the questionnaires and participating in interviews. Since major failures are infrequent occurrences, NASA needs to include escapes
and diving catches (see Appendix 3) in its human factors assessments.

10. Maintenance practices should be reviewed to identify and correct those that may lead to collateral damage.

11. Shuttle actuator soft goods should be adequately wetted to prevent downtime seepage.

12. Tank time and cycle data must be carefully logged to ensure safe life criteria are not exceeded.

13. Critical operations, especially those involving Self-Contained Atmospheric Protective Ensembles, must be staffed with technicians who are specifically experienced in, and properly trained for, those operations.

14. Fleet Leader testing must be carefully scrutinized to ensure adequate simulation of operating conditions, applicability to multiple sub-systems, and complete documentation of results.

15. Vendor supplied training should be evaluated for all critical flight hardware.

16. The true mission impact of a second main engine pin failure (internal engine foreign object debris) during
flight, similar to that which took place last July, should be determined.

17. The SSP should consider more frequent lot sample hot fire testing of the Solid Rocket Booster motor
segments at full-scale size to improve reliability and safety and verify continued grain quality.

18. An independent review process, utilizing NASA and external domain experts, should be institutionalized.

19. NASA, USA, and the SSP element contractors should develop a Risk Management Plan and guidance for communicating risk as an integrated effort. This would flow SSP expectations for risk management down to
working level engineers and technicians, and provide insight and references to activities conducted to
manage risk.

20. Risk assessment matrix and Failure Modes and Effects Analysis should be updated based on flight failure
experience, aging and maintenance history, and new information (e.g., wiring, hydraulics, etc.).

21. The SSP should revise the risk matrix for probable and infrequent likelihood for critical 1R** and 1R* severity
to require a greater level of checkout and validation.

22. NASA Safety and Mission Assurance surveillance should be restored to the Shuttle Program as soon as possible.

23. The Safety & Mission Assurance role should include: mandatory participation on Prevention/Resolution
Teams and in problem categorization, investigation of escapes and diving catches (see Appendix 3), and
dissemination of lessons learned.

24. The SIAT believes that software systems (flight, ground, and test) deserve a thorough follow-on evaluation

25. Due to time constraints, the SIAT only examined Orbiter wiring; many other systems associated with the
Shuttle also have critical wiring. The findings and recommendations in this report are applicable to all Shuttle
systems, but unique conditions may exist that require additional actions.

26. During the inspection of wiring, several connector issues were also apparent. Loose connector backshells
and wire strain relief that can potentially chafe wiring were noted. Under certain conditions loose backshells
can compromise electrical bonding between shielding and structure. Movement of the backshell can also
cause chafing between the wiring and strain relief. In either case, these are unacceptable conditions and
should be eliminated by periodic inspection and connector design.

27. Arc track susceptibility of aged wiring and circuit protection devices that are sensitive to arcing should be

28. The need to examine wiring in areas that are protected or where damage may be induced by physical wiring
inspection should be evaluated. Wiring should be continuously evaluated by conducting extensive electrical
verifications on systems. When wiring damage is found in an area previously not examined, the remaining
Orbiters should also be inspected.

29. Wire aging characteristics should be evaluated, including hydrolysis damage, loss of mechanical properties,
insulation notch propagation, and electrical degradation. Testing should be performed by an independent

30. A database that continually evaluates wiring system redundancy for the current design, modifications, repairs,
and upgrades should be maintained. System safety should evaluate the overall risk created by wiring failures

31. NASA engineering should specifically participate in industry and government technology development groups
related to wiring. The SAE AE-8 committees (specifically A and D) are excellent forums for identifying wiring

32. Wiring subjected to hypergolic contamination should be replaced since high pH fluids are known to degrade
polyimide type wire insulation.

33. The current quality assurance program should be augmented with additional experienced NASA personnel.

34. Technician/inspector certification should be conducted by specially trained instructors, with the appropriate
domain expertise.

35. The SIAT recommends an evaluation of depot repair documentation be performed to determine if the
transition process attained a necessary and sufficient set of vendors for each Line Replaceable Unit, Shop
Replaceable Unit, and special test equipment.

36. Teamwork and team support should be enhanced to mitigate some of the negative effects of downsizing and
transition to Shuttle Flight Operations Contract. Most immediately needed is the provision of relief from
deficits in core competencies, with appropriate attention to the need for experience along with skill
certification. Further development of the use of cross-training and other innovative approaches to providing
on-the-job training in a timely way should be investigated.

37. Work teams should be supported through improved employee awareness of stresses and their effect on
health and work. Workload and “overtime” pressures should be mitigated by more realistic planning and
scheduling; a serious effort to preserve “quality of life” conditions should be made.

Category 3I: Intermediate Term

Prior to January 1, 2001

1. Standard repairs on CRIT 1 components should be completely documented and entered in the Problem
Resolution and Corrective Action system.

2. The criteria for and the tracking of standard repairs, fair wear and tear issues, and their respective
FMEA/CIL’s should be re-examined.

3. The SIAT recommends comprehensive re-examination of maintenance and repair actions for adequate
verification requirements (e.g., visual, proof test, or green run).

4. The avionics repair facility should be brought up to industry standards.

5. Selected areas of staffing need to be increased (e.g., the Aerospace Safety Advisory Panel advised that 15
critical functional areas are currently staffed one deep).

6. The SIAT recommends that the SSP implement the Aerospace Safety Advisory Panel recommendations.
Particular attention should be paid to recurring items.

7. The SIAT believes that Aerospace Safety Advisory Panel membership should turn over more frequently to
ensure an independent perspective.

8. The root cause(s) for the decline in the number of problems being reported to the Problem Resolution and
Corrective Action system should be determined, and corrective action should be taken if the decline is not

9. The root cause(s) for the missing problem reports from the Problem Resolution and Corrective Action system
concerning Main Injector liquid oxygen Pin ejection, and for inconsistencies of the data contained within the
existing problem reports should be determined. Appropriate corrective action necessary to prevent
recurrence should be taken.

10. A rigorous statistical analysis of the reliability of the problem reporting and tracking system should be

11. Reporting requirements and processing and reporting procedures should be reviewed for ambiguities,
conflicts, and omissions, and the audit or review of system implementation should be increased.

12. The SSP should revise the Problem Resolution and Corrective Action database to include integrated analysis
capability and improved problem classification and coding. The revision should also improve system automation
in data entry, trending, flagging of problem recurrence, and identifying similar problems across systems and
sub-systems.

13. All critical databases (e.g., waivers) need to be modernized, updated, and made more user-friendly.

14. There are a number of cryogenic fluid mechanical joints and hot-gas mechanical joints that represent
potential risks and should therefore be examined in detail.

15. All internal Foreign Object Debris (e.g., pins) occurrences during the program should be listed, with pertinent
data on date of occurrence, material, and mass. The internal Foreign Object Debris FMEA/CIL’s and history
should be reviewed and the hazard categorized based on the worst possible consequence.

16. Any type of engine repair that involves hardware modification — no matter how minor (such as liquid oxygen
post pin deactivation) — should be briefed as a technical issue to the program management team at each
Flight Readiness Review. The criticality of a standard repair should not be less than basic design criticality,
based on worst case consequences, and all failures of standard repairs should be documented and brought
to the attention of the Material Review Board.

17. The design and the post Solid Rocket Booster recovery inspection and re-certification for flight should be
looked at and analyzed in careful detail by follow-on independent reviews.

18. The inspection and proof-test logic to screen for flaws or cracks in the Super-Light-Weight Tank should be
reviewed in light of the reversal in fracture-stress-against-flaw-size between room and cryogenic temperatures.

19. The SSP should explore the potential of adopting risk-based analyses and concepts for its critical
manufacturing, assembly, and maintenance processes, and statistical and probabilistic analysis tools as part
of the program plans and activities. Examples of these analyses and concepts are Process FMEA/CIL,
Assembly Hazard Analysis, Reliability Centered Maintenance, and On Condition Maintenance.

20. Failure analysis and incident investigation should identify root cause and not be artificially limited to a sub-set
of possible causes.

21. Software requirements generated by Shuttle system upgrades must be addressed

22. Enhanced software tools should be considered for potential improvements in reliability and maintainability as
systems are upgraded.

23. An assessment of using lower fatigue-crack-growth thresholds and their impact on fracture critical parts or
components needs to be reviewed to establish life and verify the inspection intervals. Retardation and
acceleration model(s) should be used to assess the type of crack-growth history under the Orbiter spectra.

24. Assessments of the impact of any new Orbiter flight loads on structural life should continue as responsibility
for the Orbiter structure is transferred to the contractor.

25. The Orbiter Corrosion Control Review Board should consider incorporating the framework suggested by the
Federal Aviation Administration for Corrosion Prevention and Control Plans of commercial airplane operators
into their corrosion database to provide focus to the more serious occurrences of corrosion.

26. Hidden corrosion problems require a proactive inspection program with practical and reliable non-destructive
evaluation techniques; at this point, this inspection is done on a randomized basis. An assessment of the
impact of hidden (or inaccessible) corrosion and the repairs of identified corrosion on the integrity of the
Orbiter structure should be made.

27. Current wire inspection and repair techniques should be evaluated to ensure that wire integrity is maintained
over the life of the Shuttle vehicles. Several new inspection techniques are available that use optical,
infrared, or electrical properties to locate insulation and conductor damage, and should be explored for use
on the Shuttle.

28. All CRIT 2 circuits should be reviewed to determine to what extent redundancy has been compromised in
wiring, connectors, electrical panels and other electrical nexus points. The primary concern is that single
point failure sources may exist in the original design or have been created by system upgrades or

29. The Shuttle program should form a standing wiring team that can monitor wire integrity and take program-wide
corrective actions. The team should include technicians, inspectors, and engineers, with both
contractor and government members. The chair of the team should have direct accountability for the integrity
of the Shuttle wiring. One area that should be evaluated is the use of techniques that can detect an exposed
conductor that has not yet developed into an electrical short.

30. The long term use of primarily polyimide wiring should be minimized, and wire insulation constructions that
have improved properties should be evaluated and compared to the current wire insulation used on the
Shuttle program. Alternate wire constructions should be considered for modifications/repairs/upgrades.
There are several aerospace wire insulation constructions that can provide more balanced properties.

Category 3L: Long Term

Prior to January 1, 2005

1. Where redundancy is used to mitigate risk, it should be fully and carefully implemented and verified. If it
cannot be fully implemented due to design constraints, other methods of risk mitigation must be utilized.

2. Serious consideration should be given to replacing the hydrazine power unit with a safer and easier to
maintain advanced electric auxiliary power unit for the Thrust Vector Control hydraulic unit.

3. Due to obsolescence, Shuttle Reaction Control System propellant valves and propellant flight-half couplings
should be replaced with ones that are more tolerant of the oxidizer environment.

4. The Problem Resolution and Corrective Action system should be revised using state-of-the-art database
design and information management techniques.

5. Inspection technique(s) for locating corrosion under the tiles and in inaccessible areas should be developed.

6. Consideration should be given to modifying the Shuttle internal hydraulic line routing to the mold line to permit efficient facility hydraulic hose connections.

7. Non-intrusive methods of reliably detecting wiring damage should be developed, including those areas not
accessible to visual inspection.

8. Quantitative methods of risk assessment (likelihood of failure) should be developed.

9. Quantitative measures of safety (likelihood of error), including assessment surveying techniques, should be
developed, e.g., Occupational Stress Inventory and MEDA.

10. Quantitative methods of risk assessment and safety (see above) need to be integrated to develop the ability
to perform trade-off studies on the effect of new technology, aging, upgrades, process changes, etc., upon
vehicle risk.
