What is Design Failure Mode and Effects Analysis (DFMEA)?

DFMEA is a detailed, methodical approach to identifying potential failure points and their causes in a project. While it was initially developed for rocketry (where complexity makes failure likely and failures are usually catastrophic), it is now used in many industries to reduce avoidable failures.

DFMEA is used to identify these failure states during each design and redesign phase of a project. This takes the form of a five-step process:

1. Failure Modes and Severity

In this section you define the individual systems and subsystems of a project, along with the Failure Modes and Severity.

Failure modes:

Full Failure
Partial Failure
Inconsistent Failure
Degraded Failure
Unintended Failure

Severity:

Usually ranked from 1-10, with a 1 being an insignificant failure:

2-4: A minor annoyance. Things like a loud screech of a microphone when turning on, or an occasional visual “flutter” on a screen that doesn’t significantly impair function.

5-6: Degradation or complete loss of a minor or secondary function of a device, like the clock in a car or the sound card in your computer.

7-8: Degradation or complete loss of a primary function of a device, like the ignition of your car or a motherboard failure in your computer.

9-10: Catastrophic and dangerous implications that often violate regulations, such as your car’s brakes failing, an airbag failing to deploy, or your computer overheating to the point that it catches fire.
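As a rough illustration, the Severity bands above can be written as a small lookup. This is only a sketch: the function name `severity_band` and the short labels it returns are made up for this example, and your own project may draw the band boundaries differently.

```python
def severity_band(score: int) -> str:
    """Map a 1-10 Severity score to the bands listed above.

    The labels are shorthand invented for this sketch, not standard terms.
    """
    if not 1 <= score <= 10:
        raise ValueError("Severity must be between 1 and 10")
    if score == 1:
        return "insignificant"
    if score <= 4:
        return "minor annoyance"
    if score <= 6:
        return "loss of secondary function"
    if score <= 8:
        return "loss of primary function"
    return "catastrophic"


print(severity_band(3))   # a screen flutter would land here
print(severity_band(10))  # a brake failure would land here
```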

2. Causes and mechanisms of failure:

In this section you define the causes of failure; these vary by type. For example, a car’s brakes failing may be due to inferior construction materials in the brake fluid line causing it to degrade quickly and snap.

You then assign an Occurrence ranking of 1-10 to the likely failures based on your knowledge of the design (and then reassign Severity):

1: A failure prevented by current processes.

2: Design is similar enough to an existing design that failures are unlikely.

3-4: Isolated failures; failures that are so rare as to be hard to replicate (and therefore hard to fix).

5-6: Occasional failures have been experienced in testing or in the field with the current or a similar enough design.

7-9: New design with no data.

10: New design with no knowledge of technology (purely theoretical or experimental).

3. Current Design Controls Inspection: Actions taken to verify design safety.

You assign tests based on Severity, and carry out those tests if possible (i.e. if you have a prototype). You also define Detection rankings, again on a 1-10 scale that varies from project to project. In general, though, the scale runs from 1, meaning a failure is prevented by the design and standards themselves, to 10, meaning it’s impossible to evaluate.

4. Risk Priority Number (RPN): The product of Severity, Occurrence, and Detection.

Ranking by this number puts failures that are high Severity (high risk), Occur frequently, and are hard to Detect at the top.
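The Risk Priority Number is conventionally calculated by multiplying the three rankings together (Severity x Occurrence x Detection). Here is a minimal sketch of that calculation and the resulting ordering. The `FailureMode` class and the example scores are hypothetical, picked only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1-10
    occurrence: int  # 1-10
    detection: int   # 1-10, where 10 means impossible to evaluate

    @property
    def rpn(self) -> int:
        # Risk Priority Number: the product of the three rankings
        return self.severity * self.occurrence * self.detection


# Hypothetical scores, for illustration only
modes = [
    FailureMode("Dashboard clock stops", severity=5, occurrence=5, detection=2),
    FailureMode("Brake fluid line snaps", severity=10, occurrence=3, detection=6),
    FailureMode("Screen flutter", severity=3, occurrence=6, detection=4),
]

# Highest RPN first, so the riskiest failure modes are reviewed first
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for mode in ranked:
    print(mode.rpn, mode.name)
```

With these made-up scores the brake line comes out on top (10 x 3 x 6 = 180), followed by the screen flutter (72) and the clock (50). Note that the annoying-but-harmless flutter outranks the clock because it happens more often and is harder to detect.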

You then determine Recommended Actions:

Eliminate high severity Failure Modes
Lower Occurrence
Lower the Detection ranking (make failures easier to detect)

5. Repeat until the RPN is below the desired threshold, or until it is determined that this is impossible. Record the results.
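The repeat-until-acceptable loop might look something like the sketch below. It assumes the RPN is the product of the three rankings, and that each redesign round changes exactly one ranking. The function name `iterate_dfmea`, the list of planned actions, and the threshold of 50 are all hypothetical.

```python
def iterate_dfmea(scores, actions, threshold):
    """Apply planned actions one round at a time until the RPN is below threshold.

    scores:  dict with "severity", "occurrence" and "detection" (each 1-10)
    actions: list of (ranking_name, new_value) pairs, one per redesign round
    Returns (history, reached): the RPN after each round, and whether the
    threshold was met before the planned actions ran out.
    """
    current = dict(scores)  # copy so the caller's scores are left untouched
    rpn = current["severity"] * current["occurrence"] * current["detection"]
    history = [rpn]
    for ranking, new_value in actions:
        if rpn < threshold:
            break  # already acceptable, no further redesign needed
        current[ranking] = new_value
        rpn = current["severity"] * current["occurrence"] * current["detection"]
        history.append(rpn)  # record the result of this round
    return history, rpn < threshold


# Hypothetical brake-line example: better materials, then better inspection
history, reached = iterate_dfmea(
    {"severity": 10, "occurrence": 3, "detection": 6},
    [("occurrence", 2), ("detection", 3), ("detection", 2)],
    threshold=50,
)
print(history, reached)
```

If the actions run out while the RPN is still at or above the threshold, `reached` comes back as `False`, which corresponds to the "determined that this is impossible" case for the current set of ideas.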

Following these steps properly will result in fewer (preferably no) unexpected failures in the design and show how common failure points can be avoided.
