From: Ames Research Center
Posted: Tuesday, May 3, 2016
Engineers and their scientist colleagues who saved NASA’s Kepler spacecraft – twice – will answer questions on Wednesday, May 4 at 2 p.m. EDT during a Reddit.com "Ask Me Anything," or AMA, about what it took to recover Kepler and get it back on the job of searching for exoplanets and a menagerie of astrophysical phenomena.
The engineers and scientists at NASA's Ames Research Center in California's Silicon Valley, Ball Aerospace and the Laboratory for Atmospheric and Space Physics (LASP) at the University of Colorado, both located in Boulder, had saved Kepler once before in 2013, using the subtle pressure from our sun as balance after wheels keeping the spacecraft steady failed soon after it completed an additional year in an extended mission. This save gave the spacecraft a new job called the K2 mission. K2 continues the legacy of planet hunting but has presented new opportunities to study supernovae, star clusters and galaxies far, far away. On April 8, right before it was slated to embark on K2's Campaign 9, a monumental scientific expedition to search for far out worlds, engineers found the spacecraft in a fuel-intensive “coma.” On April 22, the spacecraft was recovered to science mode and began making observations for the K2 mission once again.
Charlie Sobeck, mission manager for NASA's Kepler and K2 missions, who will also be participating in Wednesday's AMA, sat down to talk with us about what happens once a spacecraft goes into emergency mode.
MJ: So let's start at the beginning. What happened to the Kepler spacecraft on April 8?
CS: Well, the first thing to remember was that we weren’t expecting anything like this. We had talked with the spacecraft four days earlier and everything was ready. It was scheduled to make the turn to its observing attitude (MJ: where the spacecraft points the telescope to make observations) in the blind – that is, on its own without supervision from the ground. We’d completed eight previous campaigns, and although this one was going to be the first that looked in the forward velocity vector (MJ: instead of looking towards where it’s been, the spacecraft will look in the direction of where it’s going), there wasn’t much point in tying up an antenna at NASA's Deep Space Network (DSN) just to receive confirmation it was turning. Instead, we’d scheduled our next DSN contact for when the spacecraft should have thermally stabilized and been collecting data for a few hours. We expected to find it happily humming away. Instead, on April 8 at 1:05 a.m. PDT, we found it in Emergency Mode. Not Safe Mode, mind you, where it’s gone in the past due to anomalies, but Emergency Mode, just a step from being altogether lost. This was the first time the spacecraft had ever been so desperate. Even when the reaction wheels failed, we never went into Emergency Mode.
So immediately, people started to gather. The Ball Aerospace engineers who manage the spacecraft operations on a day-to-day basis were already on-station (MJ: the Ball team was in place and in contact with the DSN station) for the contact, and as we had prearranged, they didn’t wait for the rest of us to get in before they started the recovery process. The first thing we knew was that the spacecraft had been in Emergency Mode for about 30 hours before we began our contact. That told us that whatever had happened, it happened before the spacecraft ever began the turn to the forward velocity vector. That eliminated the possibility that we had planned the turn wrong, or that the reaction wheels were a part of the problem, since they don’t start to spin until we get to our observing attitude.
We also knew that the fault which sent us to Emergency Mode was a Sun Avoidance fault – a pointing response, rather than, say, an under-voltage or over-voltage condition. Beyond that, we were pretty blind. Telemetry is limited in Emergency Mode.
The first order of business was to bring the spacecraft back from the edge, so to speak, to a more amenable Safe Mode, where we could gather some more data and lower the rate of fuel burn.
Those first data indicated a multi-system problem – thrusters, communication hardware, wheels, etc. Since it is unlikely that many things would fail at once, this suggested that the problem was more likely in how the systems were reporting their status than in the systems themselves. You can see how the pieces of the puzzle start coming into focus, one piece at a time.
Once we had established a stable Safe Mode, we still needed to bring it back one step farther before we could begin the investigation in earnest. In both Safe Mode and Emergency Mode, the spacecraft points the solar panels towards the sun and goes into a slow spin to ensure that the transmitting antenna will sweep past Earth and give us a link. But this meant we could only gather limited data for 20 minutes every couple of hours when the antenna was pointed toward Earth during each rotation. To really dig into the problem we had to stop the spin while the antenna was pointed towards the Earth. When we did this, the recovery was able to really pick up speed.
MJ: What state is Kepler in now? Is it back to normal operations?
CS: Yes, Kepler is back to normal operations and has begun the K2 mission's Campaign 9, two weeks late. We still don’t know exactly what started all the problems, but once we completed the recovery all systems tested normal, and it made no sense to keep it from its job while we dug into all the data that we collected and talked to the experts about what might have occurred. We’ll continue the investigation while Kepler goes about its observations, though we’ll check on the spacecraft a bit more often until we gain confidence that it is truly healthy and not just feeling OK.
But unless something new pops up, all the signs are that it should have no ill effects from its spree.
MJ: What is emergency mode and what does it mean to declare a spacecraft emergency?
CS: Emergency Mode is the spacecraft's last-ditch effort to save itself if all other actions fail to work. As such, it assumes that none of the regular tools in the toolbox are working properly (or it wouldn't have gotten to this state), and it reverts to only the most basic set of tools.
The most important distinction between Emergency Mode and any other mode the spacecraft works in is the set of computers used to control the spacecraft. Kepler has two main computers, a prime and a secondary. It also has two back-up computers, again a prime and a secondary. Emergency Mode assumes that neither of the main computers is working and shuts them down, defaulting to the back-up pair. The back-up computers are more robust, but less capable than the main computers, and they also aren't trying to do as much.
Emergency Mode also turns off all “non-essential” equipment. So the photometer and data recorder are turned off. So are the reaction wheels and star trackers, along with the main computers and some other subsystems. The critical systems for Emergency Mode to keep on are the backup computers, the solar panels, a minimum set of thrusters and the communications systems to allow contact with the ground.
Data is limited in Emergency Mode, and is not stored, but simply transmitted in real time.
The spacecraft is pointed with the solar panels toward the sun to maximize the available power, and with the non-essential systems powered off, the power needs are minimized. The spacecraft is put into a slow spin about the sun-line, at about one full turn every two hours, or 20 seconds to move one degree. With the wheels off, thrusters must be used to establish the orientation, begin the spin and keep the solar panels toward the sun. This means a significantly higher rate of fuel burn, hence the need to respond quickly.
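As a rough check on the figures quoted here and earlier in the interview, the arithmetic can be sketched in a few lines of Python. This is an illustration only: the constants are the rounded numbers from the interview, not flight parameters, the function names are hypothetical, and it assumes that the "20 minutes every couple of hours" contact window corresponds to one two-hour rotation.

```python
# Illustrative arithmetic only. The constants are the rounded figures quoted
# in the interview, not actual Kepler flight parameters.
SPIN_PERIOD_S = 2 * 60 * 60    # "about one full turn every two hours"
CONTACT_WINDOW_S = 20 * 60     # "20 minutes" of antenna visibility (assumed: per rotation)


def seconds_per_degree(period_s: float) -> float:
    """Time needed to rotate through one degree at a constant spin rate."""
    return period_s / 360.0


def window_arc_degrees(window_s: float, period_s: float) -> float:
    """Arc of the rotation swept while the antenna can reach Earth."""
    return 360.0 * window_s / period_s


def contact_fraction(window_s: float, period_s: float) -> float:
    """Fraction of each rotation during which a link is available."""
    return window_s / period_s


if __name__ == "__main__":
    print("seconds per degree:", seconds_per_degree(SPIN_PERIOD_S))
    print("contact arc (deg):", window_arc_degrees(CONTACT_WINDOW_S, SPIN_PERIOD_S))
    print("contact fraction:", round(contact_fraction(CONTACT_WINDOW_S, SPIN_PERIOD_S), 3))
```

Under these assumptions the two-hour period reproduces the quoted 20 seconds per degree, and a 20-minute window works out to roughly a 60-degree arc, or about one sixth of each rotation.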
Declaring a spacecraft emergency establishes priority access to the DSN antennas. Typically, the DSN works with missions to allocate antenna access weeks to months in advance. When something unusual occurs this coordination can be shortened considerably, with the DSN facilitating negotiations between the various missions that use the antennas. But when a spacecraft is at serious and substantial risk of being lost, the project manager is authorized to declare a “Spacecraft Emergency,” and negotiations are bypassed entirely, with all the necessary resources made available to support the mission at risk. Because other missions are doing unique and important work, disrupting them with an unplanned emergency is not an action that is taken lightly. We do not declare a spacecraft emergency when the spacecraft merely goes into Safe Mode, or if we simply don’t know what is going on. We use the spacecraft emergency card only when we truly believe the loss of the spacecraft is imminent without it.
This was the first, and hopefully last, use of a spacecraft emergency by the Kepler/K2 team.
MJ: Take us back to the days immediately following the spacecraft emergency declaration. What steps did the team take to recover the spacecraft from emergency? Who was involved? How did the team respond to the high-stakes nature of the situation?
CS: I described many of the steps we took in the answer to the first question. The very first steps taken were to wake up the team members who were not already on duty. Normal operations are conducted with a staff that consists of a flight director and flight operators working at the University of Colorado’s Laboratory for Atmospheric and Space Physics (LASP) in Boulder, Colorado, and a mission operations manager and flight engineers at Ball Aerospace, also in Boulder. The staff at LASP are the folks who talk directly with the spacecraft, receiving the data and issuing commands through the DSN. The folks at Ball Aerospace have the responsibility to oversee that work and, in addition, to calculate and write the commands and determine what commands should be sent, in what sequence and with what timing. In our parlance, LASP is the mission operations center, and Ball operates the flight planning center. These are both professional and experienced organizations.
When the spacecraft was found to be in Emergency Mode, a network of phone calls went out to bring in additional staff and expertise. In particular, the mission director, the project systems engineer and the project manager from Ames were called in, as well as the Ball program manager. These additions would provide real-time, authoritative decisions, such as the declaration of a spacecraft emergency, and the ability to bring on specific resources as required, such as the people who designed and built the spacecraft in the first place.
As I recall, I received a call from the mission director at Ames, Marcie Smith, at 1:25 a.m. Friday. Knowing that there was a planned spacecraft contact, I expected that she would tell me that the spacecraft pointing was just a bit off, and we’d have to give it a nudge. Instead I heard, “We’re in Emergency Mode.” Within two minutes we confirmed what steps should be taken, and what resources needed to be immediately brought in, and that the flight team in Boulder had already begun the recovery actions. I headed into the office.
When I got to the office, Marcie was already at her desk with an open phone line that included both the Boulder groups as well as Ames, and the project systems engineer at Ames, Stephen Walker, joined us soon thereafter.
We pretty much lived in that environment for the next three days as we recovered the spacecraft to a manageable state and were able to end the spacecraft emergency declaration.
Throughout the process the team was focused and professional. I was impressed with the commitment that everyone on the team demonstrated, and the cool, thoughtful approach that was taken. As part of my role, I alerted Ames and NASA management of the problem and kept them informed with regular status updates. Again, I was impressed with everyone’s ability to help when they could, and to stay out of the way when they couldn’t.
MJ: [Operating in emergency mode is fuel-intensive.] Has the fuel-intensive emergency mode impacted remaining plans for the K2 mission? Will fuel conservation measures be needed or will plans be altered?
CS: It is too early to adjust any plans based on the fuel status. It’s clear that this emergency consumed fuel at an accelerated rate, but it’s not clear how much was consumed, or why. It appears to me as though we lost more fuel than I had hoped, but less than I had feared. With the fuel loss, there has been a noticeable drop in the fuel tank pressure, but the pressure drop is not linear, so it isn’t immediately obvious what this means. I suspect it will take a few months of normal usage to recalibrate our fuel estimates. Generally we do this annually, and it seems that each year our estimate of our fuel efficiency is better than the year before.
The K2 mission has always been fundamentally limited by fuel, so to perform the maximum amount of science observations, conserving fuel is an ongoing job. As we gain experience in operating the spacecraft in its two-wheel mode, we learn ways to improve our efficiency. Several steps have already been taken, which have doubled our initial mission duration estimates, but we’ve probably already made most of the gains that can be expected, so I don’t expect a lot more.
Measuring the quantity of a liquid in space is a difficult business, so how much fuel we have left is uncertain. It has been our plan to continue operating the K2 mission until the fuel runs out, meaning that at some point we will begin a campaign and will never hear back from the spacecraft.
MJ: It was reported that the cause is likely a transient event. What is a transient event and when will you know the root cause of the spacecraft anomaly?
CS: By a “transient event,” I mean something that existed for a relatively short period of time, and then went away, either on its own or because of the emergency mode and its recovery. Transient events might result from high-energy cosmic rays that can randomly hit a sensitive piece in the electronics. They can also result from power surges or dropouts that cause the electronics to perform atypically for a period of time, or from a race condition that arises from a timing conflict between two contradictory signals.
Whatever the cause, what distinguishes a transient event is the fact that it is reversible, and the systems can be restored. Often when the system is restored, the nature of the transient remains unknown, and this may be true in this case as well. This is as opposed to a “hard failure,” such as a fuse blowing out or a hard disk physically crashing. These things are not reversible.
We are all used to such unexplained transient events in our daily lives: our cell phones drop out, our computer hangs up and the lights dim. Sometimes these are explainable (the lights dim when the refrigerator compressor comes on), but often they are not. We learn to live with them as a normal part of life. We call back and we reboot the computer.
Spacecraft are designed and built to be more reliable than many of our everyday appliances, but it doesn’t mean they are totally immune from these failures. The spacecraft today looks to be operating just as it did before the event. So whatever happened, it appears not only to have been reversible, but to have now reverted to its previous state.
MJ: Kepler has had mechanical problems in the past. Is this recent event connected to previous issues, and does this signal end-of-life for the spacecraft?
CS: The Emergency Mode doesn’t appear to be related to any of the previous problems; the main one that comes to mind is the reaction wheels. The wheels were not spinning and not being used when the Emergency Mode occurred.
We have seen other surprises during the course of the mission: counters that rolled over to zero, optical reflections of bright objects. But this event doesn’t seem to resemble these… at least, so far. We don’t yet know what spawned the problem, and we may never know, but the first effects that we’ve found were a sudden series of alarms that caused the onboard fault protection to react. Although the fault protection seems to have responded appropriately to each of the alarms, the alarms themselves seem to be erroneous: That is, they were false alarms that didn’t accurately reflect what was going on. As a result, the spacecraft’s response didn’t address the real situation, only the situation that was reported. In such conditions the resulting actions can be, and in this case were, detrimental rather than helpful.
We have seen erroneous alarms before, but not like this.
The good news is that everything seems to have returned to normal, and while this still may be a sign of the aging of the systems, it could have also been a random occurrence.
MJ: Had Kepler been unrecoverable, what were some of the planned scientific targets that we may have missed out on?
CS: If the spacecraft had been truly unrecoverable, then no further science would have been gathered and the K2 mission would have ended. We would have completed eight of the expected 18 or so campaigns. The fields of view of the remaining planned campaigns can be found at the Kepler Science Center site. The K2 targets are entirely selected through a competitive process, with proposals considered for two to three campaigns at a time. Information on the observed and planned targets can also be found at the Kepler Science Center.
MJ: How often are the status and health of Kepler checked, typically, and how closely is it being monitored now?
CS: Typically the spacecraft is contacted at least twice a week to verify that it remains in its expected state of health. During the initial recovery, it was monitored as continuously as possible, with occasional gaps of three to four hours in order to allow the ground antennas to check on other NASA spacecraft. These gaps occurred overnight, while the ground team got some sleep. Once the spacecraft was out of immediate danger and we released the declaration of a spacecraft emergency, it was monitored as much as possible, given the constraints of also operating other missions, but at least several hours each day.
Now that the spacecraft is back in normal operations we will generally contact it daily for a couple of weeks while we build confidence that there is no persistent problem. Eventually I expect that we will return to our normal practice of checking on it twice a week.
MJ: The Kepler mission, and the follow-on mission called K2, is one of NASA's most visible missions. How did it feel to manage the team through the crisis as many watched with great interest and anticipation? Did you have your doubts that the spacecraft would return to make new scientific observations?
CS: I think there are many people who face this kind of situation daily: first responders, emergency rooms, etc. There is a sense of satisfaction in doing a job well and doing it under pressure. This was our emergency, and our opportunity to respond.
I was fully aware that the situation was serious and needed focused attention. But I also knew that we had a good team with a lot of experience. There was no panic. Rather there was a focused determination. The team worked professionally, dealing with the problem at hand, prioritizing actions and implementing solutions.
For the most part, we weren’t occupied with worrying about the future, but focusing on the present before us. When there were periods where there was time to reflect, most of the discussions were speculations on the potential causes, what those causes might mean in the near term, and what actions could be taken to mitigate them. I don’t believe anyone had more than a momentary thought that the mission had ended.
MJ: Charlie, thank you for your candor and for walking us through an incredible experience, once again demonstrating that, when faced with adversity, a calm and collected response prevails. In that vein, what advice would you give to the next generation of engineers and scientists interested in pursuing the type of work you do at NASA?
CS: For my part, I feel that NASA does important work, and it’s work that I wanted to be a part of. I’ve enjoyed my job and am grateful to have had the opportunities I’ve had. My advice for someone interested in pursuing a job at NASA is much the same as I would give to anyone else: Do what you enjoy. Do what you’re good at. Do something you feel is important. And whatever it is you do, try to do it well. Be open to opportunities. Be helpful.
There are many opportunities. Not all of us are astronauts, but we can all be helpful and productive as we continue to explore the space around us and far, far away.
Kepler and K2 mission manager
NASA's Ames Research Center