As a space fan, I couldn’t resist snatching the recently released book Bringing Columbia Home by Michael Leinbach and Jonathan Ward from the new books display at the library. I read almost half of it one Sunday. That is unusual for me: I rarely have the opportunity to read on weekends, and I’m not usually the kind of person who can sit still long enough to read a lot in one sitting. The book’s in-depth look at Columbia’s destruction, recovery, and lessons learned drew me in. It also tickled my engineering brain: as a quality assurance engineer, I found it difficult not to read it through a QA lens.
A lot of factors contributed to the tragic loss of this space shuttle 15 years ago, ranging from engineering flaws to cultural shortcomings. A few aspects left me thinking about my role throughout my career as a quality assurance engineer, the QA team’s purpose and responsibility, how we fit into the company, and whether the company’s culture allows us to do the best job we can. They raised questions like these:
- How easy is it to broach concerns?
- What happens when we raise concerns?
- Are we mindful of all possible malfunction mechanisms, no matter how unlikely or absurd?
- Where are gaps in our testing and coverage?
- Is our culture such that we handle problems appropriately?
- How are we at assessing risk?
- What will users of our products experience?
- Do we do enough post-crisis evaluation of major bugs that make it to production?
It sounds like some NASA employees were generally mindful that a catastrophic accident and disintegration might someday be possible, yet it seems no one pressed for an emergency response plan covering that hypothetical. Instead, most recovery procedures assumed a nearly intact shuttle. I don’t remember a discussion in the book about assessing and prioritizing emergency preparedness scenarios. We’ve certainly had conversations like that where I’ve worked. We can’t prepare for every kind of breakage, nor can we remediate everything, so let’s focus on the most likely defects, perhaps coupled with one or two of the most severe, and address them.
During Columbia’s mission, NASA was aware of a certain set of problems from the launch. Engineers explored some of the possible risks and failure scenarios. It seems like people too easily dismissed some of them and closed themselves off to opposing opinions. Repair options are very limited while a shuttle is in orbit. To me as an outsider, the analysis in the book brought to mind some conversations I’ve had during my career where software developers have said things like “Oh, let’s not deal with that brokenness because the solution is too complex. We don’t have the time or resources for that.” It seems like NASA managed to convince themselves and others that the likelihood of something terrible happening during Columbia’s reentry was slim. Instead of acting and devising contingencies, it almost seems like they went into a state of denial.
Hindsight provides a lot of insights, of course. NASA took a lot of steps to learn everything they could from Columbia: from what happened to the materials composing the space shuttle to the processes in NASA as an organization that supported or hindered quality assurance, risk assessment, and disaster planning. Many changes have taken place. Astronauts will be safer. The organization is more open to differing perspectives now.
As a quality assurer, that last part particularly appeals to me. I’m often pointing out problems and risks folks might not have fully considered themselves. I don’t always like being the messenger with bad news. Being in an environment that doesn’t seem to welcome dissent can be tricky. And, yes, I have worked with a developer or two who didn’t like that I sometimes found problems (just like I have occasionally unnerved one or two when finding nothing wrong). We all work together, developers and QA, to improve everything here at PatientsLikeMe: our products, our processes, our teams. We’re doing a pretty good job. Of course, we don’t have to prepare for nearly impossible tasks like bringing home a crew of seven people orbiting the Earth in a spacecraft that sustained unknown damage during launch. What we work on seems so simple in comparison.