The Space Review, in association with SpaceNews

Shuttle mission control
An acceptance of problems with no concerns of potentially dangerous consequences was at the root of the “cultural” problems that led to two shuttle tragedies. (credit: NASA)

What does a sick “space safety culture” smell like?

In the months following the Columbia shuttle disaster two years ago, the independent Columbia Accident Investigation Board (CAIB) sought both the immediate cause of the accident and the cultural context that had allowed it to happen. They pinpointed a “flawed safety culture”, and admitted that 90% of their critique could have been discovered and written before the astronauts had been killed—but NASA officials hadn’t noticed.

The challenge to NASA workers in the future is to learn to recognize this condition and react to it, not to “go along” to be team players who don’t rock the boat. NASA has supposedly spent the last two years training its work force to “know better” in the future, and this is the greatest challenge it has had to face. It’s harder than the engineering problems, harder than the budget problems, harder than the political problems—and in fact might just be too hard.

From personal experience, perhaps I can offer a case study to help in this would-be cultural revolution.


I remember what a flawed safety culture smelled like—I was there once. It was mid-1985, and the space shuttle program was a headlong juggernaut with the distinct sense among the “working troops” that things were coming apart. My job was at Mission Control, earlier as a specialist in formation flying and then as a technical analyst of how the flight design and flight control teams interacted.

Very deliberately, I’ve tried to ensure that this memory wasn’t an edited version, with impressions added after the loss of Challenger and its crew the following January. No, I recall the hallway conversations and the wide-eyed anxiety of my fellow workers at the Johnson Space Center in Houston. Something didn’t smell right, and it frightened us even at the time, but we felt helpless, because we knew we had no control over the course of the program.

In June there had been a particularly embarrassing screw-up at Mission Control. On STS 51-G, the shuttle was supposed to turn itself so that a special UV-transparent window faced a test laser in Maui, allowing atmospheric transmission characteristics to be measured for a US Air Force experiment.

Instead, the shuttle rolled the window to face high into space, ruining the experiment. When it happened, some people in the control room actually laughed, but the flight director—a veteran of the Apollo program—sternly lectured them on their easy acceptance of a major human error. Privately, many of the younger workers later laughed at him some more.

The error was caused by smugness, poor communication, assumptions of goodness, and no fear of the consequences of error. All of these traits were almost immediately obvious. Nothing changed afterwards, until seven people died. Then, for a while, things did change, only to drift tragically back until another seven lives were lost.

The following description of the event ventures into technical areas and terminology, but I’m doing my best to keep it “real world” because the process was so analogous to the more serious errors that would, at other times, kill people. It was a portent—one of many—that NASA’s leadership failed to heed.

The plan had been to use a feature of the shuttle’s computerized autopilot that could point any desired “body vector” (a line coming out of the shuttle’s midpoint) toward any of a variety of targets in space. You could select a celestial object, the center of the Earth, or even another orbiting satellite. Or, you could select a point on Earth’s surface.

That point would be specified by latitude, longitude, and elevation. The units for the first two parameters were degrees, of course, but for some odd reason—pilot-astronaut preference, apparently—the elevation value was in nautical miles.

This was no problem at first, when only two digits were allowed on the computer screen for the value. Clearly the maximum altitude wasn’t 99 feet, so operators who were puzzled could look up the display in a special on-board dictionary and see what was really required.

Then, as with the ancestry of many, many engineering errors, somebody had a good idea to improve the system.

Because the pan-tilt pointing system of the shuttle’s high-gain dish antenna was considered unreliable, NASA approved a backup plan for orienting the antenna directly towards a relay satellite. The antenna would be manually locked into a “straight-up” position, and the shuttle would use the pointing autopilot to aim that body axis at an earth-centered point: the “mountaintop” 22,000 miles above the equator where the relay satellite was in stationary orbit.

It was a clever application of one software package to an unanticipated purpose. All that was required was to increase the allowable input for altitude (in nautical miles) from two digits to five. It seemed simple and safe, as long as every operator read the user’s manual.

“If it can go wrong in space, it will”

The backup control plan was never needed, since the antenna pointing motors proved perfectly reliable. In addition, the ground-site pointing option was rarely used, so Mission Control got rusty in its quirks.

Then came the Air Force request to point a shuttle window at a real mountaintop. Simple enough, it seemed, and the responsible operator developed the appropriate numbers and tested them at his desktop computer, then entered them in the mission’s flight plan.

The altitude of the Air Force site was 9,994 feet. That’s 1.65 nautical miles—but that number never showed up in the flight plan.

Instead, because the pointing experts used a desktop program they had written that required feet be entered (they weren’t pilots, after all), they had tested and verified the shuttle’s performance when the number “9994” was entered. So that’s what they submitted for the crew’s checklist.
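In code terms, the mix-up was a pure unit-conversion slip. A minimal sketch (the conversion factor is the standard 6,076.12 feet per international nautical mile; the variable names are mine, not NASA’s):

```python
# Unit mix-up behind the 51-G pointing error: the site elevation was
# worked out in feet, but the autopilot read the field as nautical miles.
FEET_PER_NAUTICAL_MILE = 6076.12  # international nautical mile

site_elevation_feet = 9994.0      # Air Force laser site on Maui

# What the checklist should have carried: elevation in nautical miles
correct_entry_nmi = site_elevation_feet / FEET_PER_NAUTICAL_MILE

# What was actually submitted: the raw figure in feet, which the
# autopilot then interpreted as nautical miles
entered_value_nmi = site_elevation_feet

# The aim point ended up thousands of times too high
error_factor = entered_value_nmi / correct_entry_nmi
print(f"correct: {correct_entry_nmi:.1f} nmi, entered: {entered_value_nmi:.0f} nmi")
```

A single sanity check on the field’s plausible range (no ground site on Earth is 9,994 nautical miles high) would have flagged the entry before it ever reached the crew’s checklist.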

As the hour approached for the test, one clue showed up at Mission Control that something was amiss. The pointing experts had used longitude units as degrees east, ranging from 0 to 360, and had entered “203.74” for the longitude. Aboard the shuttle, the autopilot rejected that number as “out of range”.

A quick check of the user’s manual showed that the autopilot expected longitude in degrees ranging from –180 to +180. The correct figure, “–156.26”, was quickly computed and entered, with an “oops” and a shoulder shrug from the pointing officer. He did not ask himself, and nobody else asked him, whether, since one parameter had used improper units and range, it was worth the 30 seconds it would take to verify the other parameters as well. No, it was assumed that since the other values were “accepted” by the autopilot, they must be correct.
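Reconciling the two longitude conventions takes one line of arithmetic. A hedged sketch (the function name is mine; the autopilot’s actual input handling is not documented at this level of detail here):

```python
def to_signed_longitude(deg_east: float) -> float:
    """Map a 0-360 degrees-east longitude onto the -180..+180
    convention the shuttle autopilot expected."""
    lon = deg_east % 360.0
    return lon - 360.0 if lon > 180.0 else lon

# The rejected ground-site entry, converted:
print(round(to_signed_longitude(203.74), 2))  # -156.26
```

The point of the story is not the conversion itself but what the rejection should have triggered: one parameter in the wrong convention is strong evidence that the others deserve a second look.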

So as ordered, when the time came, the shuttle obediently turned its instrument window to face a point in space 9,994 nautical miles directly over Hawaii. The astronauts in space and the flight controllers on Earth were at first alarmed by the apparent malfunction that ruined the experiment. But then came the explanation, which most thought funny. After all, nobody had been hurt. The alarm subsided.

The breadth of the stink

A young engineer from a contract team that supported the pointing experts later showed me the memo he had written, months earlier, correctly identifying the errors in the two parameters that had been written down in the crew checklist. They were inconsistent with the user’s manual, he had pointed out, and wouldn’t work—and he also showed the computer simulation program that verified it. The memo was never answered, and the engineer’s manager didn’t want to pester the pointing experts further because his group was up for contract renewal and didn’t want any black marks for making trouble.

Nor was the space press all that interested in drawing alarming conclusions from this and other “straws in the space wind” that were becoming widely known. NASA had announced its program for sending a journalist into space. The classic penchant of big bureaucracies to adore press praise and resent press criticism was well known, and NASA wasn’t immune to this urge, as space reporters well knew. So it was safer for their own chances to fly in space if they just passed over this negative angle.

Other friends of mine in other disciplines (in the robot arm, in electrical power budgeting, in life science experiments) confided in me their growing desperation at encountering an ever sloppier approach to spaceflight, as repeated successes suggested that “routine” was becoming real and that carelessness carried no negative consequences. People all around them, they lamented, had lost their fear of failure, and had lost respect for the strict discipline that forbade convenient, comfortable “assumptions of goodness” unless they were backed up by solid testing and analysis.

It was precisely this sort of thinking that led to the management decision flaws that would lose Challenger (that specific flaw was at Cape Canaveral, but it reflected a NASA-wide cultural malaise), and a generation later, lose Columbia (the flaws then were squarely in the Houston space team and at the Alabama center that built the fuel tank whose falling insulation mortally wounded the spaceship’s wing).

It is that sort of thinking that space workers, and workers in any activity where misjudgment can have grievous consequences, must vigorously learn to smell out. This time, too, they must know that they must act, not “go along”, or else it is only a matter of time before the real world finds another technical path that leads to a new disaster.