The Castor 600 rocket motor’s nozzle disintegrated during its inaugural test in May 2019, setting off an intense investigation. (credit: Northrop Grumman)

CSI: Rocket Science

by Jeffrey L. Smith
Monday, July 13, 2020

In the failure review process, engineers and technicians work together to perform two separate but equally important tasks: the Investigation to determine the accident’s Root Cause, and the Recovery to implement the Corrective Action.

These are their stories.

DUN DUN

Every rocket project in history has run into problems. How an engineering team responds to failures, and even plans for them, is almost more important than the failures themselves. Over decades of experience, engineers have devised and refined ways to anticipate failures, systematically analyze them, and get their rocket projects back on track. These efforts take on the character of crime investigations, complete with crime scene photography, forensic lab analysis, and sifting through suspects to find the real culprit. Northrop Grumman’s OmegA team found themselves in just such a situation on May 29, 2019, and they would work long hours to try to solve the case.


The OmegA rocket is Northrop Grumman’s entry in the US Air Force’s National Security Space Launch competition. This summer, the Air Force intends to pick two rockets to launch its communication, GPS and intelligence satellites that are critical to national security. The OmegA team had precious little time to investigate and resolve the issue; otherwise their rocket project would be derailed and they’d be unable to meet the Air Force’s brisk timeline.

The scene of the crime

May 29, 2019 started out like any other day, or at least any other day where a brand-new rocket motor is being tested for the first time. The Castor 600 is the first stage that will power Northrop Grumman’s new OmegA rocket. It’s also the first in a new family of solid rocket motors, along with its siblings, the smaller Castor 300 and larger Castor 1200. These solid rocket motors represent the newest components of the OmegA rocket, and hence the riskiest parts. The engineering team wanted to understand those risks as well as possible, so they pushed the motor to its limits to test as many individual features as possible.

Castor rocket motor cases, with attached metal joint rings: the black carbon fiber contrasts with the tan EPDM insulation inside. The Castor 600 firing would test both the case’s design and its manufacturing processes. (credit: Northrop Grumman)

As the first member of the newest family of large solid rocket motors in a generation, the Castor 600 would incorporate a mix of proven and updated technologies, but at a larger scale. The carbon composite case, 3.65 meters (12 feet) in diameter, is the largest of its kind and scales up the technology found on Northrop Grumman’s Orion and GEM lines of rocket motor cases. But these cases are so large (like the Space Shuttle’s SRBs, from which these technologies are derived) that they have to be made in sections and joined together to create a complete rocket motor. To demonstrate that the motor case would work on hot summer days, the Castor 600 was heated above its maximum temperature rating of 32 degrees Celsius (90 degrees Fahrenheit). This would show that the carbon composite case and its metal joint rings could properly contain the highest possible internal pressures and temperatures.

Another update to Space Shuttle technology is the Castor’s Thrust Vector Control (TVC) system, provided by Moog. During the test, the TVC would swivel the nozzle (up, down, left, and right) just as it would during a normal flight to steer the rocket. The nozzle is based on Northrop Grumman’s Orion series of motors; its composite construction reduces weight while maximizing overall performance.

The Castor 600’s nozzle disintegrated during the final moments of the test. (credit: Northrop Grumman)


At the appointed time of the test, everything worked spectacularly. The Castor 600’s redundant dual-igniter system lit the motor on command and the motor roared to life, producing some 8.9 million newtons (2 million pounds) of thrust. The motor burned for the intended two minutes, and as it reached the end of the firing it began to extinguish itself, as expected. But in the closing seconds of the test, the aft cone of the nozzle disintegrated and blew apart.
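As a quick arithmetic check, the rounded imperial figures quoted so far line up with their metric counterparts. The minimal Python sketch below uses the exact definitional conversion factors (1 lbf = 4.4482216152605 N, 1 ft = 0.3048 m); the helper function names are illustrative only.

```python
# Exact conversion factors, fixed by definition.
N_PER_LBF = 4.4482216152605  # newtons per pound-force
M_PER_FT = 0.3048            # meters per foot


def lbf_to_newtons(lbf: float) -> float:
    """Convert pound-force to newtons."""
    return lbf * N_PER_LBF


def meters_to_feet(m: float) -> float:
    """Convert meters to feet."""
    return m / M_PER_FT


def celsius_to_fahrenheit(c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9.0 / 5.0 + 32.0


print(f"Thrust:   {lbf_to_newtons(2_000_000) / 1e6:.2f} MN")  # Thrust:   8.90 MN
print(f"Diameter: {meters_to_feet(3.65):.2f} ft")            # Diameter: 11.98 ft
print(f"Max temp: {celsius_to_fahrenheit(32):.1f} F")        # Max temp: 89.6 F
```

So 2 million pounds of thrust is about 8.9 million newtons, a 3.65-meter case is just under 12 feet across, and 32 degrees Celsius rounds to 90 degrees Fahrenheit.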

In the immediate aftermath, the test was deemed an overall “success” because it had collected reams of the intended data. But even with all the data safely recorded, neither the customer nor anyone on the team likes it when parts that are attached to a rocket become unattached. Program management acknowledged that they had “observed… something strange.” That phrase belongs with the most dreaded words in rocketry (“observation”, “anomaly”, “incident”), all of which signify that something didn’t go the way it was intended. After every test, there are always standard activities to do: hardware has to be removed and inspected, and data needs to be reviewed and compared to expected results. But when something doesn’t go right, you have to do something much more thorough: an investigation.

The investigation: just the facts, ma’am

Over decades of experience, the space industry has developed a system to investigate rocket failures and determine what happened. Each organization will have slightly different names for this: Mishap Investigation Board (MIB), Failure Investigation Team (FIT), or Accident Investigation Board (AIB). However, the goal is always the same: to find the root cause of the event, and determine a corrective action to ensure it never happens again.

Before the first piece of evidence is handled or a hypothesis offered, any good crime story needs a lead detective. In rocketry, this is the Failure Investigation Team Lead Engineer. The lead engineer is an experienced engineer from within the organization, who’s familiar with the technologies involved and is a veteran of many past failure investigations: just as grizzled as any pulp noir detective, but with a pocket protector instead of a badge. The investigation will proceed according to the lead engineer’s experience (hunches) and the data (evidence) the team gathers. From this point on, all decisions for the case, from how to analyze evidence to determining if a clue needs further investigation, will go through the lead engineer.

The lead engineer’s first responsibility is to set the tone for the rest of the investigation team: to make sure the team keeps an open mind and is led by the evidence. After any incident, engineers will use intentionally vague language (“anomaly”, not “explosion”; “observation”, not “blew up”). While this can make engineers seem oblivious to a spectacular failure, it puts the entire investigation team in the right mindset and keeps them open to all possibilities. At the earliest stages, it’s impossible to distinguish causes from effects, and all the grainy cellphone video in the world doesn’t help. (Did the fire cause the explosion, or did the explosion cause the fire?) If you say “explosion” enough times, eventually everyone will ignore all other evidence to find some source of an explosion, even if none occurred. The problem has to be investigated by sticking strictly to the physics and the data.

The lead engineer’s role has taken on even greater importance in the Air Force’s launch competition. The Air Force has intentionally made each company responsible for investigating and resolving any issues that might occur with its rocket. In generations past, the Air Force would have taken over the investigation to find the root cause of the issue. Today, however, the military wants to buy a launch service and leave the design and ongoing rocket upgrades to the particular company. The Air Force and its technical advisors at The Aerospace Corporation still participate side by side in any investigation, but the investigation and its results rest squarely on the engineering team. The ability to impartially investigate one’s own problems, and dispassionately implement fixes, is a key discriminator in this rocket competition. The lead engineer and their team have to follow a hard-nosed approach and quickly get to the bottom of the case.


A criminal investigation normally starts with the detective showing up to the scene of the crime, passing through police barricade tape and getting a sense of what happened. The scene is a flurry of activity with photographers capturing every detail, officers questioning witnesses, and investigators documenting and preserving evidence for any clue that might crack the case.

Rocket failure investigations have their own versions of this, adapted to the special considerations of a rocket test. Before anything else, the site has to be cleared of any hazardous materials so engineers and technicians can conduct their investigation. In the case of the Castor 600 failure, solid propellant is fairly benign, and only the nozzle came apart. For other rockets and spacecraft that use storable propellants, workers in hazmat suits have to clean up the area first. Each piece of debris is photographed and catalogued, and only then is it carefully moved, so as not to damage the evidence.

The moment of the crime: engineers preposition cameras to capture every detail—protected by a wall of concrete, of course. (credit: Northrop Grumman)

Engineers have one advantage the police can only dream of: they know where the scene of the crime will be before it happens. Detectives have witnesses with faulty memories, or who may have only glimpsed what happened. Engineers, on the other hand, have the perfect witness: hard data. Cameras are set up to capture every angle at any desired wavelength of light, recording hundreds or even thousands of frames per second. Thermocouples are attached all over the rocket motor to measure every temperature variation. Accelerometers and pressure transducers record vibrations at rates of hundreds of thousands of samples per second. Each frame of video and each data point is time-stamped so they can be combined to precisely recreate the event down to the millisecond. With this overwhelming amount of data, it’s easy to understand why an organization like NASA would insist on a ground test for its own SLS rocket prior to its first flight.
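Combining time-stamped data from many instruments amounts to merging several streams, each already in time order, into a single timeline. The minimal Python sketch below does this with the standard library's `heapq.merge`, a lazy k-way merge. It assumes every instrument stamps its samples against one shared test clock; the `Sample` record, the channel names, and the sample rates are hypothetical illustrations, not the actual test-stand data format.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Sample(NamedTuple):
    t_us: int      # time stamp in microseconds on a shared test clock (assumed)
    channel: str   # hypothetical channel name
    value: float   # placeholder reading


def merge_timeline(*streams: Iterable[Sample]) -> List[Sample]:
    """Interleave streams that are each already in time order into one timeline.

    heapq.merge performs a lazy k-way merge, so no stream has to be loaded
    whole; the result is materialized here only to keep the example short.
    """
    return list(heapq.merge(*streams, key=lambda s: s.t_us))


def window(timeline: List[Sample], t_center: int, half_width: int) -> List[Sample]:
    """Every sample, from every instrument, within half_width microseconds of t_center."""
    return [s for s in timeline if abs(s.t_us - t_center) <= half_width]


# Hypothetical channels sampled at different rates over one millisecond.
pressure = [Sample(t, "chamber_pressure", 0.0) for t in range(0, 1000, 10)]   # 100 kHz
thermo = [Sample(t, "aft_cone_temp", 0.0) for t in range(0, 1000, 250)]       # 4 kHz
video = [Sample(t, "camera_3_frame", float(i))
         for i, t in enumerate(range(0, 1000, 500))]                           # 2,000 fps

timeline = merge_timeline(pressure, thermo, video)
for s in window(timeline, t_center=500, half_width=10):
    print(s.t_us, s.channel)
```

Once everything sits on one clock, asking "what was every sensor doing at this instant?" becomes a simple window query over the merged timeline.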

Each noted cable goes to one of over 700 sensors used to record every detail of the test (original photo credit: Derek Richardson / SpaceFlight Insider)

Finding the root cause: the usual suspects

Once the evidence is collected, the investigators have to organize it so they can determine what was responsible for the failure. Just as with a police lineup, engineers have their own list of usual suspects who might be an accessory to the crime. In rocket failures, the usual suspects are things like the hardware design, correct test execution, proper manufacturing, and the environment on the day of the test. Each category is further broken down into ever more specific subcategories, and a junior detective (junior engineer) is assigned to track down each and every lead.

The engineering version of the police link board is the fault tree (credit: Beacon Pictures/ABC Studios).
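A fault tree can be pictured as a simple tree data structure: the usual suspects form the top-level branches, each broken into ever more specific leads, and each item is rated Highly Contributing (red), Contributing (yellow), or Not Contributing (green). The minimal Python sketch below assumes a "worst child wins" roll-up rule, which is an illustrative assumption rather than a documented Northrop Grumman practice. The leaf names and ratings are invented for illustration and are not the team's actual fault tree.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class Rating(IntEnum):
    NOT_CONTRIBUTING = 0     # green
    CONTRIBUTING = 1         # yellow
    HIGHLY_CONTRIBUTING = 2  # red


COLOR = {
    Rating.NOT_CONTRIBUTING: "green",
    Rating.CONTRIBUTING: "yellow",
    Rating.HIGHLY_CONTRIBUTING: "red",
}


@dataclass
class Node:
    name: str
    rating: Optional[Rating] = None                 # set on a leaf once its lead is run down
    children: List[Node] = field(default_factory=list)

    def rollup(self) -> Rating:
        """A branch is rated as badly as its worst child (assumed roll-up rule)."""
        if not self.children:
            if self.rating is None:
                raise ValueError(f"open lead: {self.name!r} has not been dispositioned")
            return self.rating
        return max(child.rollup() for child in self.children)

    def report(self, depth: int = 0) -> List[str]:
        """Indented, color-tagged outline of the tree."""
        lines = [f"{'  ' * depth}[{COLOR[self.rollup()]}] {self.name}"]
        for child in self.children:
            lines.extend(child.report(depth + 1))
        return lines


# Hypothetical tree: branches follow the "usual suspects"; leaves and ratings are invented.
tree = Node("Nozzle aft cone anomaly", children=[
    Node("Hardware design", children=[
        Node("Aft cone stiffness vs. ground-test loads", Rating.HIGHLY_CONTRIBUTING),
        Node("Throat material", Rating.NOT_CONTRIBUTING),
    ]),
    Node("Test execution", children=[
        Node("End-of-burn pressure vs. prediction", Rating.CONTRIBUTING),
    ]),
    Node("Manufacturing", children=[Node("Build records", Rating.NOT_CONTRIBUTING)]),
    Node("Environment", children=[Node("Motor conditioning temperature", Rating.NOT_CONTRIBUTING)]),
])
print("\n".join(tree.report()))
```

Raising an error on an unrated leaf mirrors the discipline of the process: the tree cannot be closed out while any lead is still open.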

While suspects in a court of law are innocent until proven guilty, rocket hardware is afforded no such benefit. In the failure review process, all components are guilty until proven innocent. This is done to ensure each part of the rocket is systematically reviewed and to remove any personal bias about one’s own contribution. Engineers have to sift through data from before, during, and after the test to make sure that, each step of the way, each individual component looked and behaved exactly the way it was supposed to. Build records, x-rays, and close-out photos from before the test are examined. Temperature levels, vibration data, and thrust measurements during the test are scrutinized down to the millisecond and beyond. The rocket is carefully disassembled after the test to preserve any clues, and each portion is inspected to make sure the wear patterns on the components are exactly where they’re supposed to be, and absent where they shouldn’t be. At any step of the process, any deviation from the expected value is cause for further investigation. A smudge on an x-ray can indicate the rocket motor would burn too quickly, a spike in temperature can indicate insulation that is too thin, and black soot can indicate the failure of an O-ring.
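The "guilty until proven innocent" rule can be expressed as a simple filter: each component stays on the suspect list until it has evidence and every check falls within tolerance, and any deviation keeps it there for further investigation. The minimal Python sketch below assumes each piece of evidence reduces to a measured value, an expected value, and an allowed deviation. The components, measurements, and tolerances are hypothetical and are not Castor 600 data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Check:
    """One piece of evidence: a measurement compared against its expected value."""
    description: str
    expected: float
    tolerance: float   # allowed absolute deviation (hypothetical)
    measured: float

    def deviates(self) -> bool:
        return abs(self.measured - self.expected) > self.tolerance


def still_suspects(evidence: dict[str, list[Check]]) -> list[str]:
    """Components not yet cleared.

    Guilty until proven innocent: a component is cleared only if it has at
    least one piece of evidence and every check is within tolerance. A
    component with no evidence at all stays on the list.
    """
    return [
        component
        for component, checks in evidence.items()
        if not checks or any(check.deviates() for check in checks)
    ]


# Hypothetical components, measurements, and tolerances, for illustration only.
evidence = {
    "igniter": [Check("ignition delay (ms)", 100.0, 20.0, 95.0)],
    "case insulation": [Check("peak outer-wall temperature (C)", 60.0, 10.0, 58.0)],
    "nozzle aft cone": [Check("aft cone intact after test (1 = yes)", 1.0, 0.0, 0.0)],
    "thrust vector control": [],  # data not yet reviewed
}
print(still_suspects(evidence))  # ['nozzle aft cone', 'thrust vector control']
```

Note that the thrust vector control entry stays on the list even though nothing is known to be wrong with it; only evidence can clear a suspect.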


Inspection of the hardware showed that only the aft cone of the nozzle had failed. The team had lots of experience with these types of nozzles on smaller rocket motors, and ground testing had always worked just fine; it was considered a low-risk item for this test. The build and inspection reports showed that everything had been built exactly as intended. The surviving nozzle parts, including the throat, didn’t show unusual wear, and the throat, to which the disintegrated aft cone had been directly attached, was still intact. How could this be? Just as covering a garden hose with your thumb makes the water spray faster, the narrow throat is where the hot combustion gases are accelerated: they reach the speed of sound at the throat and then expand to supersonic velocities in the cone downstream. The throat experiences high pressures and the most extreme heating; if there were a defect in the design or manufacturing, this is the place that should fail first.

It was only upon combining the knowledge of the hardware with the test data that the picture became clear. At the larger size of the Castor 600, the motor experienced a greater pressure drop than expected at the end of the burn. During a normal flight, the outside air pressure naturally decreases as the rocket climbs higher and higher; by the end of the burn, it would have dropped so much that it never could have damaged the nozzle. But the Castor 600 wasn’t tested on a normal flight. Instead, it was tested on the ground, where the air pressure doesn’t change.

Fault trees are colored to show how each item contributed to the incident: Highly Contributing (red), Contributing (yellow) and Not Contributing (green).

The failure was like standing on a soda can. The internal pressure of an unopened soda can easily supports the weight of a full-grown adult. But after it's opened, the internal pressure is released, and the weight of that same adult will crush the can flat. As the Castor 600 reached the end of its burn, the thrust (and the internal pressure within the motor) began to drop, but by more than expected. Something that didn’t change, though, was the pressure of the surrounding air. The pressure of the fast-moving gases inside the nozzle dropped dangerously below that of the surrounding atmosphere, and the outside air crushed the nozzle in an instant, just like a soda can. The nozzle wasn’t made wrong; it was tested wrong. The team had its suspect.
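The soda-can picture can be made roughly quantitative with textbook one-dimensional isentropic nozzle relations. For a fixed nozzle shape, the ratio of exit pressure to chamber pressure is set by the expansion (area) ratio alone, so as chamber pressure falls during tail-off, the exit pressure falls in proportion, while the sea-level air outside stays put. The minimal Python sketch below assumes ideal, fully attached flow, a ratio of specific heats of 1.2, a hypothetical expansion ratio of 10, and hypothetical chamber pressures of 10 MPa (steady burn) and 3 MPa (tail-off). None of these are Castor 600 values, and the model ignores flow separation, which real over-expanded nozzles experience.

```python
GAMMA = 1.2             # assumed ratio of specific heats for the exhaust
P_AMBIENT = 101_325.0   # Pa, standard sea-level atmospheric pressure


def area_ratio(mach: float, gamma: float = GAMMA) -> float:
    """Isentropic area ratio A/A* (local area over throat area) at a given Mach number."""
    term = (2.0 / (gamma + 1.0)) * (1.0 + 0.5 * (gamma - 1.0) * mach ** 2)
    return term ** ((gamma + 1.0) / (2.0 * (gamma - 1.0))) / mach


def exit_mach(expansion_ratio: float, gamma: float = GAMMA) -> float:
    """Supersonic Mach number with A/A* equal to expansion_ratio, found by bisection.

    A/A* rises monotonically with Mach number above Mach 1, so bisection
    between Mach 1 and Mach 50 converges on the supersonic root.
    """
    lo, hi = 1.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if area_ratio(mid, gamma) < expansion_ratio:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exit_pressure(chamber_pressure: float, expansion_ratio: float,
                  gamma: float = GAMMA) -> float:
    """Exit static pressure, treating chamber pressure as the stagnation pressure."""
    m = exit_mach(expansion_ratio, gamma)
    return chamber_pressure * (1.0 + 0.5 * (gamma - 1.0) * m ** 2) ** (-gamma / (gamma - 1.0))


EXPANSION_RATIO = 10.0               # hypothetical nozzle geometry
for p_c in (10e6, 3e6):              # hypothetical steady-burn and tail-off pressures, Pa
    p_e = exit_pressure(p_c, EXPANSION_RATIO)
    side = "above" if p_e > P_AMBIENT else "below"
    print(f"p_c = {p_c / 1e6:4.1f} MPa -> p_e = {p_e / 1e3:6.1f} kPa ({side} sea-level ambient)")
```

With these assumed numbers, the exit pressure sits above sea-level ambient during the steady burn but falls well below it once the chamber pressure drops to 3 MPa. In flight, the outside pressure would be far lower by then; on a ground test stand, it is not.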

Recovery

Of course, the engineering team knew about these effects and had already planned for them on the next test, of the Castor 300, where the effects would be even more severe. Nozzles on upper stages are commonly made shorter for ground testing to reduce the external force on them. At the time of the Castor 600 test, the nozzle for the Castor 300 was nearing completion in the factory and had in fact been made 1.7 meters (5.5 feet) shorter for exactly this reason. But now, with the Castor 600 incident, all that work was out the window. Determining the root cause of the failure was only half the battle. The other half was formulating a corrective action and implementing it in time to demonstrate it to the customer.

While the US Air Force can be tough to convince, Mother Nature is the ultimate judge: she doesn’t care about fancy arguments, and her judgments are final. The only way to prove you truly understand a failure mode is to demonstrate the fix, and this was going to take time. The Castor 300 test had been scheduled for September 2019, a date the team was no longer going to meet. Not only did the team have to design a new nozzle, build it, and integrate it with the rest of the motor, but they still had to meet the rest of the test’s original objectives if they wanted the overall project to stay on track.


The Castor 300 test wouldn’t be just a retread of the Castor 600 test with a new nozzle; additional limits of the motor family had to be tested. Instead of heating the motor, this time the team would chill it below 4 degrees Celsius (40 degrees Fahrenheit). The cold temperature would simulate launching on a winter day and would put greater stress on the insulation rather than the case. The TVC would again swivel the nozzle, this time to show it functioned properly at low temperatures. Also, the motor would intentionally fire with only one of its two igniters, to prove the ignition system was truly redundant. The Castor 600 test had taken years to get ready, and the team didn’t have years to fix the issue and continue on to the Castor 300 test.

The OmegA team was again ready to put their hard work to the test on February 27, 2020. (credit: Northrop Grumman)

Thankfully, they weren’t starting from square one. A thorough failure investigation, like the one done for the Castor 600 test, verifies what worked just as much as it shows what didn’t. Without the reams of data from each step of the design, construction, and test process, the engineering team never would have known which components had to be improved and which should be left alone. The team could confidently continue forward with the rocket components that had worked successfully, without second-guessing every decision they’d made along the way. The other components of the Castor 300 were still on schedule and were installed on the test stand for the original September test date, and there they would sit waiting for a new nozzle.

Rockets are intended to work in space, but they are designed, built, and tested right here on the ground. The new nozzle would have to do a better job of taking this into account. Some changes would have to be more drastic than others. Since the nozzle had been crushed during the last test by the surrounding air pressure, the most important change would be to stiffen the nozzle’s aft cone. Stiffening the nozzle added a weight penalty, but it was kept small enough not to impact the overall structure’s ability to support the extra mass or the TVC’s ability to move the nozzle quickly while in flight. Based on the Castor 600 test data, engineers also took the opportunity to optimize the contour of the nozzle and the insulation thickness, respectively to improve performance and to better maintain the desired nozzle temperature in specific places. With the new nozzle design in place, the blueprints were handed over to the manufacturing team to build it. But all of this was taking time, the one resource the team couldn’t get more of.

The OmegA team had already made changes to speed up their development process and better compete for the Air Force’s launch contract. Both the design and manufacturing teams had implemented agile development techniques to decrease the time it took to engineer and build new rocket motor designs. Once the manufacturing team had built the new nozzle, the only remaining place to make up lost time was testing, and the test team would be expected to pick up the slack. Following the lead of the design and manufacturing groups, the test team jumped into action, co-locating representatives from each specialty in one place so any roadblocks could be identified, discussed and resolved more quickly. Even with the new nozzle in hand, the test team had a list of tasks to perform before the motor was ready for test: assemble the motor components, install the complete motor on the test stand, attach the hundreds of sensors all over it, and finally check out everything to make sure all the parts were talking to each other. The effort paid off. The test team shaved months off the recovery schedule and, by February 2020, they were ready for a second try.

Prior to the firing, the test team took one last opportunity to enjoy the fruit of their labor. Their planning notes would be offered up to the rocketry gods. (credit: Northrop Grumman)

On February 27, 2020, the day of the Castor 300 test arrived. This would be the OmegA team’s last chance to make a good impression on their Air Force customer. If this didn’t work, there wouldn't be enough time to implement a new fix. The future of the OmegA rocket and its place in launching national security payloads rested on this test. With all this on the line, the test team counted down the seconds and fired the rocket motor. The single igniter lit the Castor 300, and the chilled rocket motor roared to life, passing the first hurdle. Next, the TVC swiveled the nozzle, functioning as expected even in its cold state. As the test extended beyond the two-minute mark, the rocket motor consumed its remaining propellant and the fiery jet shrank until the motor was extinguished. The nozzle was still there.

The Castor 300 test was a complete success; this time the nozzle performed flawlessly. (credit: Northrop Grumman)

Amongst the handshakes, cheers and high-fives, more than one audible sigh of relief was heard: they had done it. The team had investigated the failure, found the culprit, and implemented the fix in time for the Air Force and everyone else to see. Because the team hadn’t cut corners and had stuck to a disciplined engineering approach for the Castor 600 test, they knew exactly what to focus on for this test. While the US government’s choice of rockets is still in the future, one thing was clear to everyone that day: the rocket would work, and this was the team to make it work.

Case closed.

Jeffrey L. Smith, P.E. is a propulsion engineer. In his heart, he secretly knows it’s only a matter of time until he gets dragged into another failure investigation. The ideas expressed here (however misguided) are his own. He can be reached at JLSmith322@gmail.com.
