Internal Report Paints Bleak Picture of Human Life Science Research at NASA (part 1)
You can download the entire 100-page report (with appendices, charts, etc.) here (5.6 MB PDF).
The first portion of the document describes how the Space and Life Sciences Division at NASA JSC is supposed to conduct business. The second part of this report (excerpts below) opens by saying “Despite the apparent order of the process described above, the reality of the current program tells a more chaotic story.”
The third section of this report “Recommendations” ends with “The issue is clear. Voodoo science is not worth the cost. The limb of the fault tree Life Sciences is perched upon is perilously close to breaking.”
The last portion of this report contains a detailed statistical analysis of JSC life science research.
None of the problems described in this document arose overnight. Indeed, they are the result of decades of bad decisions – both at JSC and at NASA HQ. These problems are also the result of a failure on the part of advisory committees – both those sponsored by NASA and those chartered external to the agency.
Having been deeply involved myself in the advisory, peer review, and payload integration aspects of NASA’s life sciences programs in the 1980s and 1990s, I saw much of this with my own eyes. It hasn’t gotten any better.
NASA may soon be handed a new mandate for humans to do new things in space. Unless NASA gets its life sciences research house in order, NASA will not be able to respond to that mandate.
Editor’s note: The following was sent by email on 21 Oct 2003 by L.H. Kuznetz, the author of this white paper, in response to its posting online: “The Whitepaper entitled, Human Life Sciences Research Aboard the International Space Station and the Space Shuttle: A White Paper, dated June 18, 2003, was a first draft document intended by the author to solicit management feedback with regard to life science projects aboard the ISS and Shuttle. This document was for internal review only, and marked NOT FOR DISTRIBUTION for that reason. It was the first in a series of drafts intended to elicit constructive feedback from key program individuals and as such, was intended only for their review. The opinions expressed in the leaked copy were solely those of the author prior to editing and feedback. Based on the feedback received in the intervening period since the first draft, several of the major concerns were addressed in subsequent drafts and others are in the process of being addressed. The author regrets the inadvertent leak of this document and disassociates himself from any interpretations related to this draft. The goal of this work was to take the devil’s advocate position so crucial to the program following the STS-107 accident in order to improve life science research as a whole. It is unfair to judge the program prior to due process. Such a rush to judgment is a disservice to Life Sciences and NASA as well.”
Human Life Sciences Research Aboard the International Space Station and the Space Shuttle: A White Paper
— LH Kuznetz
06/18/03
CONFIDENTIAL … NOT FOR DISTRIBUTION
Page 3
Acknowledgment
This manuscript was compiled from the input of many people in the Space and Life Sciences Division and represents 21 months of effort. Special thanks go to Al Feiveson, who provided statistical guidance for the Analytical Hierarchy Pairwise method and created the Excel program for the Moving Target Approach. Others whose data formed the foundation of this work include the Experiment Science Managers (ESMs); Internal and External Principal Investigators; Increment Scientists; DSO Flight Experiment Managers and Medical Operations Personnel. Without the input of these passionate and dedicated people, the key conclusions reached in this white paper would not have been possible.
Page 17
2. STATE OF THE CURRENT PROGRAM
Despite the apparent order of the process described above, the reality of the current program tells a more chaotic story. Metrics for the program appear in Figure 7, which reveals that of the 45 experiments either in or about to enter the flight queue, only 13 are designated Red 1, the highest priority, compared to 27 that are Yellow or lower priority. More disturbing is the fact that 18 of these 27 are Yellow 2, the next-to-lowest tier of importance. (The Red, Yellow, Green designation was established by the REMAP commission for a balanced program; see Appendix 1.) Setting aside for the moment what this says about the science value, only 6 of the 45 are countermeasures (15%), while 85% constitute mechanistic or fundamental studies with no clear path to a countermeasure. This directly contradicts the dictate established by the Young Commission that the development of countermeasures to prevent the deleterious effects of microgravity is the primary mission of the ISS.
Another foreboding statistic relates to the number of studies that have overlapping objectives (as noted by the Critical Path Risks and Questions) and measure similar parameters, yet are manifested and treated as if they were unrelated. This lack of commonality stretches the queue far beyond what it needs to be, accumulating cost and waste in return. This is especially true for the Cardiovascular and Neurovestibular disciplines, with 12 experiments between them. To put it in perspective, at an average duration of 4 years per experiment, they would take 8 years to complete if 6 could be run in parallel and manifested at the same time, as opposed to 48 years if they were run in series. While the latter is a play on extremes, it makes the point that failing to combine resources for related experiments is terribly wasteful. Throw in the fact that 6 of the 12 experiments in cardio and neuro are Yellow 2s that clog the queue with second-tier objectives and block new Category Reds from entering, and the stated BR&C goal of obtaining and implementing a countermeasure after 3 Category Red experiments per discipline is pie in the sky.
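The queue arithmetic above can be restated as a back-of-the-envelope calculation. The experiment counts and durations come from the white paper; the function itself is purely illustrative and not part of the original analysis:

```python
import math

def queue_duration_years(n_experiments: int,
                         years_per_experiment: float,
                         parallel_slots: int) -> float:
    """Elapsed time if experiments run in batches of `parallel_slots`."""
    batches = math.ceil(n_experiments / parallel_slots)
    return batches * years_per_experiment

# 12 cardio/neuro experiments, 4 years each, 6 manifested at the same
# time -> 2 batches of 4 years = 8 years total
print(queue_duration_years(12, 4, parallel_slots=6))   # 8

# Fully serial -> 12 batches of 4 years = 48 years total
print(queue_duration_years(12, 4, parallel_slots=1))   # 48
```

As the paper concedes, fully serial execution is "a play on extremes"; the real queue would fall somewhere between these bounds, but the sixfold gap illustrates the cost of not pooling related experiments.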
Another concern is the quality of the science itself. Few if any of the experiments have valid controls in the usual sense of the word. They appear to, by using pre- and postflight data on the same individual, but subject-to-subject variation, restrictions imposed by Medical Operations, data sharing, and other constraints (addressed below) conspire to confound the results. Unfortunately, a “we have to live with it, that’s the nature of the beast” mentality has become the mantra, and controlled or cross-disciplinary studies aimed at “de-variable-izing” the mix are few and far between. Together with the small N typical of most flight experiments, the line between real and wishful science is continually being blurred.
Page 19
NRAs.
The problems above start and end with the solicitation process. The flight NRAs seek research proposals that will “lead to the development of effective countermeasures or operational techniques for problems associated with one of the 12 disciplines covered by the Critical Path Roadmap.” However, of the 21 proposals that made the final cut in the 2001 solicitation, only 1 was a bona fide countermeasure (Rubin), and it will in all likelihood be downgraded to a ground study prior to flying because it “lacks sufficient ground data pedigree in humans.” The pattern is not restricted to flight NRAs. The 2002 Ground NRA solicitation drew over 100 proposals and only 5 countermeasures (2 of which were non-funded international studies). Such a track record is discouraging, especially in light of the fact that the current flight program is already bloated with 84% mechanistic studies. The problem does not end there, however; the underpinnings of the entire NRA process are questionable. Specifically:
- The process seeks to be all-inclusive with NASA’s international partners, but the largest partner, Russia, is excluded and has its own separate process
- The peer review process uses 6 factors to rank proposals (see ILSRA-2001), but the winning entries are usually based on their science score
- Peer reviewers are typically unfamiliar with the realities of the ISS and Shuttle as a research platform
- The peer reviewers are used to the NIH model, but NASA is not the NIH. The subject count N is much lower; the science platforms (ISS and Shuttle) are far more complicated and unforgiving; and the cost, schedule, and implementation restrictions are many
- While there is a “panel of technical experts from NASA and other cooperating space agencies” assigned to evaluate the feasibility of carrying out flight experiments according to the “relevance to NASA’s programmatic needs and goals,” they do their job after the solicitation process has chosen winners.
- The process seeks to be uniform and fair, stating that “all proposals will be evaluated for scientific and technical merit by independent peer-review panels,” but these panels use different standards in each discipline. A winning score for a Behavior and Performance proposal might be 75, for example, while that for a Cardiovascular team could be 95
- The process seeks to be fair, but an old-boys network is still in place, with many of the same investigators having been around for decades. By favoring prior experience, the process places a hurdle in front of new investigators while “feeding the gravy train” of the old ones. One experiment in Pharmacokinetics, for example, received a poor science score and failed to get into a NASA NRA, but snuck in the back door the next year, under a changed name, via the NSBRI.
The above issues are only half the problem. The other half is the confusion arising from the fact that there is not one NRA process but many, and they confuse not only the principal investigators but also the peer reviewers and NASA implementers overseeing them. The pyramid of Figure 6 is a gross simplification. In effect, there are 9 separate solicitation mechanisms that lead to overlapping and conflicting experiments: NASA Flight NRAs; NSBRI Flight NRAs; CEVP Flight NRAs; NASA Ground NRAs; NSBRI Ground NRAs; CEVP Ground NRAs; SMOs; grants; and unsolicited proposals. In theory, these processes are somehow woven together; in fact, they are anything but.
Page 20
CPR.
The Critical Path Roadmap drives the direction and quality of the NRAs, but it is flawed. In principle, the document attempts to meet the goals of the Young Commission by means of risk reduction, mitigation, and management. In practice, its reach far exceeds its grasp. Like a computer model of a complex system, its fidelity rests on the strength of its underlying assumptions and inputs. By carving the human body into 12 distinct disciplines, treated as if they were not interconnected, the CPR attempts to do too much. Since the body is a complex system in which every subsystem depends on every other, the CPR ought to follow the same philosophy. It does not. The CPR fails because it assumes that the PIs answering the fundamental questions in a particular discipline will be able to cull out the cross-disciplinary effects from other disciplines. In fact, data-sharing hurdles thrown in its path by astronaut privacy restrictions and other obstacles negate this assumption. The CPR is supposed to be a “living document,” but there are not enough resources, human or otherwise, to change it fast enough to reflect program changes. The conundrum of the CPR is best appreciated through the following anecdote. A PI team proposing a cardiovascular experiment was told they would have to downscale their study because cardiomyopathy addressed Critical Questions 3.06 and 3.18 under Critical Risks 13 and 14 of the CPR. These were of lesser importance (Yellow and Green) than the arrhythmia portion of their experiment, which addressed Critical Question 3.01 of Critical Risk 13 (Red). Their response: “We are cardiologists who’ve been doing this for decades, and we believe arrhythmias are caused by cardiomyopathy. The CPR is wrong.”
A second problem with the CPR is that the assignment of critical questions to particular experiments is, in many cases, a judgment call. For example, does the Merfeld Sensory Integration experiment, Critical Risk 33, address Critical Question 9.09, 9.25, or both? Does the Alendronate SMO countermeasure (Critical Risk 9) also address Critical Risk 10 and Critical Questions 2.19, 2.98, and 2.06, or all of them? Does Bungo and Levine’s CARDIO experiment address Critical Risks 13, 14, and 15 and Critical Questions 3.01, 3.06, and 3.18, or just the highest-priority items (Red 1)? Figure 2 lists these overlapping critical questions and risks, and the consequences of misjudgment can be serious. Take Oman’s VOILA experiment, which addresses Critical Risk 20 in Behavior and Performance and is labeled a Green 1 (lowest priority). It also appears to address Critical Risk 33 under Neurovestibular, however, which would elevate it to a Red 1, the highest priority. The difference between these labels could be the difference between flying and deselection. Pierson’s SWAB experiment is another example, labeled as both an Immune discipline experiment (Critical Risk 22) and a Food/Nutrition discipline experiment (Critical Risk 8); so is Schneider’s TREADMILL, which cuts across 4 disciplines, addressing Critical Risks 49, 19, 17, and 30 (at least it is labeled a cross-disciplinary experiment). These are just some of the issues that arise when trying to use the CPR as a tool to categorize and prioritize flight research experiments. Another problem with the CPR is its apparent disconnect from evidence-based space medicine, i.e., observations gleaned from actual flight experience. The documented history of physiological problems lists behavioral problems and kidney stones as the most frequent and serious, yet the CPR pays more attention to cardio and neuro, while kidney stone experiments are ranked as Yellow 2 (Renal Stone).
And while behavior experiments abound in the flight queue (there are 5), there is not a proposed countermeasure in the lot, and 3 of them have redundant objectives. In summary, the CPR is a tool that attempts to extrapolate programmatic priorities using a cookbook approach to the human body, one that is prone to misinterpretation and confusion.
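The labeling problem can be made concrete with a small sketch: each experiment maps to one or more Critical Risks, each carrying a priority tier, and an experiment flown under a lower tier than the highest one it plausibly addresses has a misleading queue position. The VOILA example (Risk 20 = Green 1, Risk 33 = Red 1) comes from the report; the tier ordering and the data structure are assumptions made for illustration only:

```python
# Assumed tier ordering, lowest to highest priority, inferred from the
# report's language ("Green 1 (lowest priority)", "Yellow 2, the next to
# lowest tier", "Red 1, the highest priority").
TIER_ORDER = ["Green 1", "Yellow 2", "Yellow 1", "Red 2", "Red 1"]

# Hypothetical mapping: experiment -> {critical risk: tier under that risk}.
# VOILA's entries come from the report; SWAB's tiers are placeholders.
experiments = {
    "VOILA": {20: "Green 1", 33: "Red 1"},
    "SWAB":  {22: "Yellow 1", 8: "Yellow 1"},
}

def effective_tier(risk_map: dict[int, str]) -> str:
    """Highest-priority tier across all risks the experiment addresses."""
    return max(risk_map.values(), key=TIER_ORDER.index)

for name, risks in experiments.items():
    print(name, "->", effective_tier(risks))
```

Under this scheme VOILA resolves to Red 1 rather than the Green 1 it carries in the queue, which is exactly the "flying versus deselection" discrepancy the paper describes.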
Page 21
Confounding variables
The program has justified the use of a small subject count, N, through studies such as Evans and Ildstad’s Small Clinical Trials, which in the case of human life science flight experiments has subjects serving as their own controls through preflight, in-flight, and postflight data collection. The editors of that work, however, probably never envisioned the number of confounding variables that would negate a small N under spaceflight conditions. Figures 8 and 9 are cases in point, showing the dispersion in two of the most important parameters in human microgravity studies, aerobic capacity and bone loss. In the words of one of the PIs: “The numbers are % change per month with SD in (). If you take ±3 SD as the range (contains 97% of the data), then you see the variability is huge. For example, for total femur trabecular BMD, the percent change is 2.5 ± 0.9, which means you have a range of 0 to about 5% loss per month. At the end of 6 months, the extreme range is 30% or so of lost total femur trabecular BMD. These data clearly document large variabilities.”
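The PI's arithmetic can be checked directly. The mean and standard deviation are the values quoted above for total femur trabecular BMD; the code merely restates the calculation:

```python
# Values from the quoted PI email: % change per month, with SD
mean_loss, sd = 2.5, 0.9

low  = mean_loss - 3 * sd   # lower end of the ±3 SD range
high = mean_loss + 3 * sd   # upper end of the ±3 SD range

print(f"monthly range: {low:.1f}% to {high:.1f}%")   # -0.2% to 5.2%
print(f"6-month extreme: {6 * high:.0f}%")           # ~31%
```

The lower bound is slightly negative, i.e., effectively no loss, and the upper bound compounds over 6 months to roughly the "30% or so" extreme the PI cites, which is the point: a subject serving as his or her own control can land anywhere in that span.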
How is one to determine the root cause of the data scatter in such studies when multiple parameters vary concurrently, parameters that cannot be culled out owing to data-sharing issues (addressed later) and to different techniques or instruments measuring the same thing (Biopsy (US) vs. Myon (Russian); Profilaktika vs. CEVIS; different types of ultrasound, DEXAs, MRIs, etc.)? Throw the small N into the mix and conclusions of worth become rare indeed. How, for example, can one justify an N of 3 for the Foot experiment, 4 for H-reflex, and 5 for Spatial Cues under such circumstances? The case of exercise countermeasures is especially notable, since they are supposed to be beneficial on multiple fronts, from muscle strength to bone loss. There are three exercise countermeasures on ISS: the TVIS (treadmill), CEVIS (bicycle ergometer), and IRED (resistive force device). The exercise prescriptions used for all 3 went from research protocol to countermeasure application before they were fully mature. The effect of exercise on the various organ systems is poorly understood as a consequence. Many assume that exercise is beneficial for bone loss, for example, but more than one principal investigator has looked at Figures 8 and 9 and wondered how anyone could reach such a conclusion. And while the in-flight exercise countermeasure is mandatory, the exercise prescriptions vary greatly, oftentimes left up to the astronauts to do “their own thing.” To compound matters, there are too many fingers in the exercise prescription pie: exercise physiologists, ASCRs, flight surgeons, etc. It is also interesting to note that the ASCRs are under the auspices of the flight surgeons, while the Exercise Physiology lab is under the Human Adaptation and Countermeasure Office, i.e., flight research. The lab is also beholden to Med Ops, since it is responsible for a number of medical requirements (MRIDs).
The concluding example relates to drugs as a confounding variable and the lack of well-thought-out exclusion controls. The soon-to-be-implemented Alendronate study to reduce bone loss would have used the same subjects eligible for other bone loss studies, such as VIBE (using vibration) and Renal Stone (using potassium citrate), had the potential interactions not been detected, almost by accident. In short, a systematic, rigorous means of preventing multiple variables from confounding flight research data is sorely lacking. The program is full of self-destructive inconsistencies, many of them owing to poor management (see below).
