The Space Review in association with SpaceNews

Huygens on Titan
Huygens’ successful landing on Titan would have been for naught had some dedicated engineers not caught a critical flaw in the probe’s communications system. (credit: ESA)

How Huygens avoided disaster

Titan was full of surprises when the Huygens probe descended into its atmosphere Friday morning, returning images and other data from that mysterious moon. But at least one very unpleasant surprise—a fatal design flaw in its communications system—had already been revealed and circumvented. Otherwise, the entire Huygens mission would have been in jeopardy.

The combination of Cassini’s tremendous speed and the sharp air-drag deceleration of the Huygens probe creates a significant Doppler shift in the probe’s signals as seen from Cassini (the expected Huygens-Cassini velocity shift would have been up to 5.5 km/sec). Engineers at Alenia Spazio, the Italian company that built the radio link, properly anticipated the need for adequate receiver bandwidth to accommodate the frequency shift.

Early in the design process, the engineers also understood one other consequence of the Doppler effect. But as the design was developed, this feature somehow dropped from sight. All design reviews of the Probe Data Relay Subsystem (PDRS), including those conducted with NASA participation, also failed to notice this oversight.

Richard Horttor, a NASA interplanetary probe project manager and chief of telecommunications engineering for JPL’s Telecommunications Science and Engineering Division, discussed the problem with me last year. “The receiver must synchronize its demodulator with the bit stream,” he explained. That is, it must know how to “punctuate” the continuous stream of zeroes and ones, and break them down into individual measurements, group by group. Each measurement begins with a special pattern, a “synchronization pulse”, that is recognized by the receiver.

Here was the crucial flaw. The Doppler shift not only changed the frequency of the incoming signal, it also squeezed the data stream into a slightly shorter time period, so the bits arrived slightly faster than the receiver expected. As a result, Cassini’s receiver would have been unable to recognize the timing pulse in its expected location, and thus the incoming data stream would become unreadable.
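To give a sense of scale, here is a rough back-of-the-envelope sketch in Python. It uses the 5.5 km/sec figure quoted above and the first-order approximation that the fractional shift is v/c. The bit rate is an assumed, illustrative value that does not come from this article, and the function names are hypothetical.

```python
# Rough sketch: how far a Doppler shift moves the bit clock of a data stream.
# The bit rate is an ASSUMED illustrative value, not a figure from the article.

C_KM_S = 299_792.458  # speed of light in km/s


def doppler_fraction(v_los_km_s):
    """First-order fractional Doppler shift (v/c) for a line-of-sight speed."""
    return v_los_km_s / C_KM_S


def bit_clock_offset_hz(bit_rate_hz, v_los_km_s):
    """Offset of the received bit rate from the nominal rate, in Hz."""
    return bit_rate_hz * doppler_fraction(v_los_km_s)


V_MAX_KM_S = 5.5              # maximum relative velocity quoted in the article
ASSUMED_BIT_RATE_HZ = 8192.0  # hypothetical bit rate, for illustration only

fraction = doppler_fraction(V_MAX_KM_S)
offset = bit_clock_offset_hz(ASSUMED_BIT_RATE_HZ, V_MAX_KM_S)

print(f"fractional shift: {fraction:.3e}")
print(f"bit clock offset at the assumed rate: {offset:.3f} Hz")
```

The same fraction applies to the carrier, the subcarrier, and the bit clock alike, so a loop that tracks the bit timing within a very narrow band can lose lock even when the carrier itself is received without trouble.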

In that case, the incoming data stream from Huygens “would have been sporadic”, Horttor said. There would have been periods of clear reception interspersed with frequent loss of lock. “We would have lost a substantial amount of data,” Robert Mitchell, Program Manager for the Cassini Mission at NASA’s JPL, told me. “We had to fix the problem.”

“They made the receiver bandwidth extremely narrow,” Horttor continued, “with a scheme that actually reduced the bandwidth step by step in response to increasing signal strength.” Horttor never got an explanation of why it was built this way (“It is a design feature of another application in Earth orbit and they just reused it,” he told me, adding “I don’t know why anyone would ever want to build it that way.”), when “all that was really needed was a fixed bandwidth”.

The scheme was implemented by firmware loaded in the receiver. A simple change to some operating parameters would have fixed the problem, but that code was not designed to be changeable after launch.

The NASA reviewers had never been given the specs of the receiver. As explained to me by Mitchell, “Alenia considered JPL to be a competitor and treated the radio design as proprietary data.” Horttor elaborated that NASA probably could have insisted on seeing the design if it agreed to sign standard non-disclosure agreements, but didn’t consider the effort worthwhile. “We had never thought that Doppler would be a problem because we knew how to design for it”, based on decades of American experience with interplanetary probes.

“If we had seen a schematic with the multiple gain states that were signal-level dependent, we’d have seen the problem immediately”, Horttor continued. Mitchell agreed. “We have a technical term for what went wrong here,” British scientist John Zarnecki of the Planetary Science and Space Research Institute of Britain’s Open University told reporters. “It’s called a cock-up.”

Once the equipment had been fabricated, it was extensively tested on the ground. This included feeding simulated Doppler-displaced signals through the radio equipment. However, a “full-up” high-fidelity test would have required physical disassembly of some of the communications components, a proposal that was rejected. Even so, NASA testers had found an unrelated polarization error in the equipment, which was quickly fixed.

“Budget was a key part” of this decision, Mitchell explained. “Such a test would have been very difficult to set up and the cost was not considered warranted relative to the expected risks.” The reassembled vehicle would then have had to undergo exhaustive and expensive recertification. In hindsight, these testing failures were embarrassing. “We had three safety nets set up to catch things like this,” said John Credland, head of ESA’s science projects, “and it now appears that we fell through all three.”

But even after launching the fatally flawed spacecraft, engineers had a chance to save it. Because of the years-long cruise out to Saturn, there was ample time and opportunity for in-flight testing of Cassini and Huygens. In February 2000, one particular communications test was being discussed. “This test had been proposed, and rejected once,” Mitchell explained to me. “Then it was proposed again, in an easier mode—and it was accepted.”

Boris Smeds, a Swedish engineer at the ESA operations center in Darmstadt, Germany, was the chief advocate of the test. Once it had been approved, he proposed modifying it to perform a full-up simulation of the probe relay signal processing.

“He had to argue with those who didn’t think it was necessary,” Mitchell recalled. The approved plan was merely to send carrier tones, but Smeds developed a test signal pattern on his office computer and persisted in championing it. In the end, his plan was accepted because it was easy to do, even though nobody but him seemed to think it was worth doing. The simple carrier-tone test, Mitchell added, would never have uncovered the problem.

NASA performed the test at its Deep Space Network facility in Goldstone, California. “We shipped off the data to Darmstadt,” he continued, and they spent several months digesting it. Mitchell recalled getting a note that “there’s something curious here”, and JPL began examining the data more closely itself. “They realized they just didn’t have the data they expected to get,” he explained. There was initial skepticism in Europe that there was any problem, since the receiver had been flown in a previous European satellite (which was never identified to the NASA side), but by September 2000 Smeds had persuaded his side that something might be very, very wrong.

“Boris was the principal in all of this,” Mitchell stated unequivocally. “Without him we wouldn’t have known we had a problem.” Added John Zarnecki, “The guys who pushed the original test through are heroes.”

The next step was to perform additional testing to verify the failure mode that was suspected. This occurred over a five-day period in February 2001. The bad news was that the flaw was found to be real, but the good news was that it looked just like what was suspected, and that there were four more years to figure out what to do.

ESA immediately convened an inquiry board, with two NASA observers. The board was astonished to discover that Doppler requirements and design specifications had never been documented. “This error of omission was perpetuated throughout the life of the project before launch with not a single recorded question raised on the subject in any ESA, NASA, or independent review…” the report stated, identifying this error as the “root cause” of the flaw. “The tolerance on frequency stability of the subcarrier is wide enough to cover the effects of frequency drift and Doppler shift but unfortunately the specified tolerance on the data stream clock rate is more limited and not wide enough to cover the Doppler frequency shift.” It added that “an increase of less than 1 Hz in the loop bandwidth of the bit detector would have been more than adequate”.

A number of steps were listed that, if fully implemented, could have avoided the anomaly. Project requirements should have been traceable. An end-to-end test should have been performed. Flexibility in allowing ground commanding and software reloading should have been greater. And all issues of access to proprietary data should have been resolved at the beginning. These principles apply to other spacecraft projects as well.

Richard Horttor, who was one of the NASA observers on the inquiry board, recalled: “We worked our way out by being totally candid from top to bottom, once we detected the problem. There was no hesitancy or lack of resources.” Nor was there any “nation-to-nation finger-pointing”. Moreover, NASA management, burned by its blindness to signs of trouble on the two doomed NASA Mars probes in 1999, was much more receptive. Mitchell was asked how much persuasion he took to recognize he had a real problem. “None,” he told me.

From a variety of get-well strategies, the Cassini team crafted a response plan that centered on reducing the Doppler shift enough that the timing pulses would remain within the recognition range of the receiver. This was accomplished by raising the altitude at which the Cassini orbiter flew past Titan while the Huygens probe was entering the atmosphere. As a result of this geometrical rearrangement, the probe’s major deceleration component was normal to the Huygens-Cassini line of sight rather than mostly along it.
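The geometry can be sketched numerically. Only the component of relative velocity along the line of sight contributes to the first-order Doppler shift, so turning the velocity vector away from that line shrinks the shift even if the speed itself is unchanged. The short Python example below illustrates this with the article's 5.5 km/sec figure. The angles are hypothetical values chosen for illustration, not mission data, and the function name is invented for this sketch.

```python
import math

# Sketch: Doppler depends on the line-of-sight component of relative velocity.
# The angles are HYPOTHETICAL illustration values, not actual mission geometry.

C_KM_S = 299_792.458  # speed of light in km/s


def line_of_sight_speed(relative_speed_km_s, angle_deg):
    """Velocity component along the line of sight.

    angle_deg is the angle between the relative velocity vector and the
    line of sight: 0 means directly along it, 90 means normal to it.
    """
    return relative_speed_km_s * math.cos(math.radians(angle_deg))


RELATIVE_SPEED_KM_S = 5.5  # figure quoted earlier in the article

for angle in (0, 30, 60, 80, 90):
    v_los = line_of_sight_speed(RELATIVE_SPEED_KM_S, angle)
    print(f"{angle:2d} deg: v_los = {v_los:5.2f} km/s, "
          f"fractional shift = {v_los / C_KM_S:.2e}")
```

At 90 degrees the line-of-sight component, and with it the first-order shift, drops to essentially zero, which is the effect the revised flyby geometry was aiming for.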

By the time the recovery plan was developed, interplanetary navigation experts had already laboriously developed Cassini’s multi-year flight plan for visiting Saturn’s moons. There were 44 close fly-by passes of Titan, 8 more of smaller moons, and between 50 and 100 more distant passes of these other moons. Reconstructing this celestial ballet from scratch would have been prohibitively expensive.

So the navigators designed a trajectory in which Cassini entered a lower and faster orbit around Saturn, dropped off the probe at the desired distance, and then hit a specific point in space that coincided with a point on the previously planned path. There it would fire its rocket engine again to get back on the original course. It would make three orbits of Saturn during this altered period instead of the original two. The extra rocket fuel for these changes was available because Cassini’s navigation had been so precise up to then that much of the fuel allocated to course corrections had not been used.

Instead of landing on Titan in November 2004, Huygens was deployed in December for a landing on January 14, 2005. The stunning images of Titan’s surface returned by Huygens are due in large part to the efforts of some persistent, insightful engineers, who circumvented the one surprise scientists didn’t want to encounter on Friday.