The Recursive Cycle for Ultra Reliability

By Hongbin Li

Source concept: Rational Accidents: Reckoning with Catastrophic Technologies by John Downer

Core Thesis

Ultra-high reliability cannot be engineered solely through pre-service design, redundancy, and testing. Due to the inherent unpredictability of complex systems, ultra-reliability is only achieved through a recursive cycle of operational experience: collecting real-world fault data, studying failures, and continuously refining designs.

The Reality of Complexity

Traditional reliability methods, including modeling, simulation, bench testing, hazard and operability studies (HAZOP), layer of protection analysis (LOPA), and safety integrity level (SIL) assessment, assume that reliability can be proven before deployment. Downer argues that complexity poses two challenges that pre-service analysis cannot overcome:

Unpredictable Behavior: As the number of interacting components increases, unforeseen interactions emerge. Adding redundancy creates new failure modes of its own (e.g., voting/mediation schemes can fail; fault diagnosis/reconfiguration can trigger incorrect actions).
Unknown Premises/Conditions: Designs rest on assumptions that may later prove incorrect or incomplete, causing unanticipated failures.

We cannot simply design a system to be free of failures. No matter how much redundancy and complexity are added, the inherent unpredictability of complex systems means that failures will still occur.
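
The point about redundancy can be illustrated with a toy calculation. The sketch below assumes a two-out-of-three voting arrangement with independent channel failures and a voter that can itself fail. The function names and all probabilities are hypothetical, chosen only for illustration, and are not taken from Downer.

```python
def majority_failure(p_channel):
    """Probability that at least 2 of 3 independent channels fail."""
    p = p_channel
    return 3 * p**2 * (1 - p) + p**3


def voted_system_failure(p_channel, p_voter):
    """Failure probability of a 2-out-of-3 system whose voter can also fail.

    The system works only if the voter works AND a majority of channels work.
    """
    return 1 - (1 - p_voter) * (1 - majority_failure(p_channel))


if __name__ == "__main__":
    p_channel = 1e-3  # hypothetical per-demand failure probability of one channel
    p_voter = 1e-5    # hypothetical per-demand failure probability of the voter
    print(f"single channel      : {p_channel:.2e}")
    print(f"2oo3, perfect voter : {voted_system_failure(p_channel, 0.0):.2e}")
    print(f"2oo3, fallible voter: {voted_system_failure(p_channel, p_voter):.2e}")
```

With these made-up numbers the voter, not the channels, dominates the result, and adding more channels cannot push the system below the voter's own failure probability. The model also ignores common-cause failures and the unforeseen interactions described above, so if anything it understates the problem.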

The Aviation Model: The Recursive Cycle

Downer argues that jetliners are the only systems that actually achieve ultra-high reliability. This success is driven by a reinforcing cycle of three interconnected factors:

Design Stability: Designs are refined incrementally, allowing manufacturers to gather extensive data over millions of service hours, identify failure patterns, and make targeted improvements.
Accumulated Service Hours: Thousands of aircraft operating simultaneously ensure deficiencies emerge quickly. Extensive data collection promotes ongoing refinement.
Effective Market Feedback: Safety issues immediately affect a manufacturer's or airline's reputation, revenue, and stock price. Market forces motivate companies to prioritize safety over innovation, especially because reliability problems surface within the career span of the decision-makers.
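
One way to see why accumulated service hours and design stability reinforce each other is a simple fix-on-first-failure model. This is an illustrative assumption, not Downer's formulation: a design is assumed to contain a fixed set of latent faults, each striking at its own constant rate per fleet hour, and each is corrected for good the first time it appears. The fault rates below are invented.

```python
import math


def expected_residual_rate(latent_rates, fleet_hours):
    """Expected failure rate (per hour) left in the design after `fleet_hours`.

    A fault with rate r is still unfixed with probability exp(-r * fleet_hours),
    so it contributes r * exp(-r * fleet_hours) to the expected rate.
    """
    return sum(r * math.exp(-r * fleet_hours) for r in latent_rates)


# Hypothetical latent faults: a few frequent ones and many rare ones.
LATENT_RATES = [1e-4] * 5 + [1e-6] * 50 + [1e-8] * 500

if __name__ == "__main__":
    for hours in (0, 1e4, 1e6, 1e8):
        rate = expected_residual_rate(LATENT_RATES, hours)
        print(f"{hours:>12,.0f} fleet hours -> {rate:.2e} failures/hour")
```

In this toy model the rate only falls as hours accumulate against the same design. A major redesign would introduce a fresh set of latent faults and reset the clock, which is the role design stability plays in the cycle.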

Nuclear Industry vs. Aviation

Unlike aviation, the nuclear industry struggles to achieve ultra-high reliability due to three key differences:

Fewer Operating Hours (Stark Data Disparity)

Nuclear: About 450 reactors worldwide, with roughly 17,000 reactor-years of cumulative operation (as of 2019).
Aviation: About 45 million service hours every year.
Result: Nuclear has far less real-world data; failures take longer to manifest, delaying vital feedback.
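
Because the two figures use different units (cumulative years versus hours per year), a quick conversion makes the gap easier to see. The sketch below uses only the numbers quoted above. The function names and the `availability` parameter are assumptions added for illustration, and treating every reactor as running continuously is a deliberate upper bound.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap days ignored


def reactor_years_to_hours(reactor_years, availability=1.0):
    """Convert reactor-years to operating hours.

    `availability` is the assumed fraction of the year a reactor runs;
    1.0 is an upper bound that ignores refuelling and outages.
    """
    return reactor_years * HOURS_PER_YEAR * availability


def years_for_aviation_to_match(reactor_years, aviation_hours_per_year):
    """Years of airline service needed to equal nuclear's cumulative hours."""
    return reactor_years_to_hours(reactor_years) / aviation_hours_per_year


if __name__ == "__main__":
    nuclear_hours = reactor_years_to_hours(17_000)        # figure from the text
    catch_up = years_for_aviation_to_match(17_000, 45e6)  # figures from the text
    print(f"nuclear, cumulative: {nuclear_hours:,.0f} hours")
    print(f"aviation matches that in about {catch_up:.1f} years")
```

On these figures, airliners log in a little over three years what the world's reactors have accumulated over their entire history, even under the generous assumption that every reactor ran continuously.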

Design Changes

Nuclear reactors undergo significant design changes between generations, disrupting the accumulation of meaningful service hours.

Regulatory vs. Market Feedback

Nuclear relies on static government regulation, whereas aviation relies on dynamic, rapid-acting market feedback.

Conclusion

Relying exclusively on pre-service design and testing is insufficient for ultra-high reliability. True reliability requires recursive learning: accumulating operational experience, collecting data on faults, studying those failures, revising our understanding, and continuously improving system design based on real-world performance.

Downer argues that jetliners are the only type of system that actually achieves ultra-high reliability, and that their success results from a powerful interplay between three factors: design stability, accumulated service hours, and market feedback.
