In manufacturing operations, it is said that 15% to 40% of total costs are maintenance-related. If your factory utilizes machinery and tools, improving your maintenance practices is something you cannot ignore.
Preventative maintenance is one of the 5 key success factors behind high-performing manufacturing operations. It has a direct impact on total cost, quality, and delivery in any factory that relies on machines and/or tooling.
Certainly not.
Let’s frame the question in another way. Have you ever visited a plant where all machines were available to work at the desired speed and the expected quality standard? Probably not, right?
There are many reasons for equipment downtime. They are usually categorized this way.
If some of your machinery is expensive, you probably want to collect these data in detail, to understand where you lose the opportunity to make good products. Here is an example.
In this example, one can draw a few conclusions:
In the table shown above, the data is summarized into three ratios:
And these three indicators are summarized into one ratio, the OEE (Overall Equipment Effectiveness).
Many people doubt the OEE should be tracked, though. Its three components are meaningful, but the OEE is too synthetic (it includes numbers the manufacturing managers are not responsible for), and it can be gamed easily.
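To make the arithmetic concrete, here is a minimal Python sketch. It assumes the conventional definition (OEE = availability × performance × quality); the function name and the shift figures are hypothetical and are not taken from the table discussed above.

```python
def oee(planned_minutes, downtime_minutes, ideal_cycle_seconds,
        total_count, good_count):
    """Return (availability, performance, quality, oee) as fractions."""
    run_minutes = planned_minutes - downtime_minutes
    # Availability: share of the planned time the machine actually ran
    availability = run_minutes / planned_minutes
    # Performance: actual output versus the ideal output while running
    performance = (ideal_cycle_seconds * total_count) / (run_minutes * 60)
    # Quality: share of the parts produced that were good
    quality = good_count / total_count
    return availability, performance, quality, availability * performance * quality


# Hypothetical shift: 480 planned minutes, 60 minutes of downtime,
# 30-second ideal cycle time, 756 parts made, 718 of them good.
a, p, q, overall = oee(480, 60, 30, 756, 718)
print(f"availability={a:.1%} performance={p:.1%} quality={q:.1%} OEE={overall:.1%}")
```

Printing the three components next to the combined figure keeps the more meaningful ratios visible, which matters given the criticism that the OEE on its own is too synthetic.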
If your operations use equipment, the response is very probably yes. The more complex and high-tech the equipment, the more you need good maintenance.
Here are typical ways this happens:
(You'll see some more examples in the sections below.)
Does it make sense to spend more on preventive activities?
To a certain extent, yes it does make sense, especially for highly-automated operations. This was the case of the Dodge Durango plant in Newark, Delaware (note: this facility was managed by CMC's Founding Partner David Collins).
At the time, the average automotive paint shop was down (typically because of robots, conveyors, air flow, or ovens) about 15% of the time during normal operating hours. That’s a lot, in an industry that prides itself on being THE most efficient.
In years 2, 3, and 4 of operations, the Dodge Durango plant had one of the highest preventive maintenance budgets of all plants in the Chrysler group and had zero downtime that impacted production for three years in a row. And what happens when things are not planned properly?
The Hyundai plant in Montgomery, Alabama, which opened in 2005, is among the most highly automated automotive plants in the world. It reportedly had 20% downtime throughout the entire factory for an extended period, which significantly affected production.
A worn-out tool or a machine functioning abnormally creates unacceptable products that have to be reworked or scrapped. Poor quality often accompanies high equipment downtime. They are both indicative of insufficient maintenance.
Let’s take a simple analogy. If the brake system on your car wears down and is not replaced in time (maintenance), what happens? The car is no longer able to stop as expected (quality).
There is actually considerable overlap between these two preventive activities:
How does it play out in different processes?
There are many ways a robot can create quality issues. It may lose some of its spatial integrity (ability to move precisely as expected) because a component got worn. The servo motor might not run as smoothly as it is supposed to. The tooling at the end of an arm might start to wear and might no longer be capable of completing its job properly. The list goes on.
A CNC machine moves a tool along 3 bars (one for each axis). The machine’s ability to do its job as per specifications is compromised if the bars are not properly lubricated, or if the bars wear out (and get a bit thinner, allowing the tool to move around).
A manufacturer believed their steel tool, which was cutting plastic, never had to be changed or sharpened simply because the plastic was so much softer than the tool. When we convinced them to change the tools, the plastic parts suddenly started to be within specification.
Poor maintenance of the mold in a die casting process can generate too much flash (which means extensive rework of the parts and much scrap) and often pitting.
SMT machines (in the electronics industry) need regular cleaning and adjustment. Otherwise, the throw rate (i.e. the proportion of scrapped components) is high, and it costs a lot of money.
All these quality issues come from malfunctioning equipment, which was not taken good care of. As you will see in the sections below, most of these issues can be avoided at a much lower total cost.
First, let’s quickly describe what the opposite of preventive maintenance is: reactive maintenance (also called breakdown maintenance, or corrective maintenance).
I call it maintenance 1.0: ‘wait until it breaks before fixing/replacing it’.
A reactive approach (waiting for something to break down before repairing/replacing it) is appropriate in some circumstances:
However, for most production equipment, a more preventative approach is advised. Let’s run through the four main benefits of preventative maintenance.
By letting the equipment run to failure less often, the organization saves a lot of money.
Here are typical numbers:
Repair of an unplanned breakdown: 4 days down, 80,000 RMB of parts, much idle labor as well as overtime for some technicians, and 20,000 RMB of scrapped material processed just before the machine was stopped.
Planned overhaul: The machine is stopped towards the end of a shift, 20,000 RMB of spare parts, no unplanned downtime and no bad surprises.
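The comparison above can be sketched in a few lines of Python. The parts and scrap figures come from the two scenarios above; the cost of one day of lost production (idle labor, overtime, missed output) is not given in the text, so `DOWNTIME_COST_PER_DAY` is a purely hypothetical value to be replaced with your own.

```python
def event_cost(days_down, parts, scrap, downtime_cost_per_day):
    """Total cost in RMB of one maintenance event, including lost production."""
    return parts + scrap + days_down * downtime_cost_per_day


# ASSUMPTION: cost of one day of lost production. The text gives no figure.
DOWNTIME_COST_PER_DAY = 30_000

# Unplanned breakdown: 4 days down, 80,000 RMB of parts, 20,000 RMB of scrap
unplanned = event_cost(days_down=4, parts=80_000, scrap=20_000,
                       downtime_cost_per_day=DOWNTIME_COST_PER_DAY)
# Planned overhaul: no unplanned downtime, 20,000 RMB of spare parts
planned = event_cost(days_down=0, parts=20_000, scrap=0,
                     downtime_cost_per_day=DOWNTIME_COST_PER_DAY)

print(f"unplanned: {unplanned:,} RMB")
print(f"planned:   {planned:,} RMB")
print(f"ratio:     {unplanned / planned:.1f}x")
```

Even with the downtime cost set to zero, the parts and scrap figures alone give 100,000 RMB for the breakdown against 20,000 RMB for the planned overhaul.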
Letting a machine work until it breaks down reduces its lifetime. In many cases, the time to get to the next failure becomes shorter and shorter. It can look like this:
Some unplanned breakdowns have a disproportionately high economic impact on the factory. Here are a few examples.
You might already have come across the concept of the “bathtub curve”:
Many machinery components fail (i.e. break down) with the following pattern:
As a component or tool gets to stage 3, should you wait until it breaks down? In many cases, it makes sense to order spare parts and do the replacement at a convenient time.
How to know what the right time is for replacements? Often, the equipment manufacturer can give an estimate for you to follow. (As you will see below, it might not be the best estimate in your situation).
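One common way to describe the bathtub curve numerically is the Weibull hazard rate, where a single shape parameter decides whether the failure rate falls, stays flat, or rises over time. The sketch below assumes that model for illustration only; the shape and scale values are hypothetical.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical parameters: a scale of 1,000 operating hours in all three cases.
# shape < 1: falling failure rate, shape = 1: flat, shape > 1: rising.
patterns = {"falling": 0.5, "flat": 1.0, "rising": 3.0}

for name, shape in patterns.items():
    rates = [weibull_hazard(t, shape, 1000.0) for t in (100, 500, 1500)]
    print(f"{name:8s}", [f"{r:.5f}" for r in rates])
```

A rising rate is what justifies replacing a part at a planned time; with a flat or falling rate, replacing on a fixed schedule does not reduce the chance of a breakdown.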
The higher-tech a machine is, the more important prevention is
An automated machine that includes high-tech components is more likely to break down unexpectedly. The reason is, there are more single points of failure. There are ways to increase these systems’ reliability (e.g. by adding redundant components), but it comes at an extra cost.
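The effect of single points of failure, and of redundancy, can be illustrated with basic reliability arithmetic. This is a minimal sketch that assumes components fail independently of each other; the component count and the 99% figure are hypothetical.

```python
from math import prod


def series_reliability(reliabilities):
    """Every component must work, so the system reliability is the product."""
    return prod(reliabilities)


def redundant_reliability(reliability, copies):
    """At least one of `copies` identical, independent components must work."""
    return 1 - (1 - reliability) ** copies


# Hypothetical machine: 10 components in series, each 99% reliable
# over a given period. Any single failure stops the machine.
simple = series_reliability([0.99] * 10)

# Same machine, with every component duplicated (at extra cost).
duplicated = series_reliability([redundant_reliability(0.99, 2)] * 10)

print(f"without redundancy: {simple:.4f}")
print(f"with redundancy:    {duplicated:.4f}")
```

The more components sit in series, the lower the product gets, which is why a more complex machine tends to stop more often even when each part is individually reliable.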
For instance, the SMT process in electronic factories is a mature process that has been in use for decades. And yet, we noticed that 20-25% downtime is typical in Chinese plants. These are high-precision robots that place components on PCBs, and they need regular maintenance efforts.
Looking into the future, as Chinese factories automate their processes and use higher-tech machinery, they will have to switch to a more preventive approach.
Just like your company probably has a zero-accident objective, and maybe also a zero-defect objective, it can aim for zero-breakdown. For that to happen, a more proactive approach is called for.
Basic preventative maintenance usually takes the form of a time-based maintenance schedule. It includes activities such as:
Here is an example of a PM plan for pumps:
Image credit: catpumps.co.uk
You can see another example, more structured, here. And you can download a very similar template (in Excel) here.
Sometimes it also takes into account the age (e.g. time in operation or number of cycles performed) of the equipment. One good example is that of the oil change in a traditional car – typically “every 3,000 miles”.
However, does it apply to all cars on the road? Not really. Some cars have to carry heavier loads, drive mountainous and dusty roads, have to put up with extremely hot conditions, and so forth.
This one-size-fits-all approach can be discarded if the equipment condition can be monitored regularly.
To keep the same example, what if you check your car’s oil level and color? If the color is murky, it is time to change it. When this approach is possible, it is much superior to time-based maintenance. Here is a flowchart to sum up the options nicely:
Chart Source: Maintenance, Replacement and Reliability: Theory & Applications, 2nd edition, by Jardine and Tsang, CRC Press
Let’s say you want your organization to start taking the preventive route for some of its equipment.
Is it difficult to achieve this switch?
In many factories, yes. There are two reasons for this.
The good news is, for a factory that is setting up good process controls to maintain consistent quality, this is nearly no extra work. As explained in section 4 above, one single plan can guide preventative maintenance and process controls efforts. They can be merged for little extra workload.
We see many manufacturers who have the machine supplier come in for a routine check every year. It sure doesn’t hurt. But in the vast majority of cases, it is NOT sufficient.
Jigish Vaidya (a senior manager of operational statistics at the Long Island Rail Road) gives solid advice in an article published in December 2017 in Quality Progress. Here are the first three steps in setting up a PdM (predictive maintenance) system:
1. Establish maintenance objectives
What are the priorities, to be implemented first? What equipment will impact your schedule and cost the most if it breaks down? Where do your most significant maintenance and spare part replacement costs come from?
How to make sure you get closer to your objectives? Vaidya also suggests tracking both lagging indicators (e.g. the history of mean time between failures) and leading indicators (e.g. mean time to the first failure).
2. Adopt a condition monitoring program
Think of how to collect data on the condition of the machines or tools you designated as priorities in step 1.
3. Install sensors and smart systems
Based on the equipment to monitor, get the right devices and sensors to collect the information you will need (heat, vibration, oil contents, etc.).
Also, think of where to store that information and how to analyze it. Don’t try to consume those “big data” right out of the firehose – you will need them to be presented in a certain way, or you will drown in information. Microsoft (Azure & Cortana), IBM (Operational Analytics), Amazon (AWS), and others offer solutions for this.
Sure. It is a good standard and will steer pretty much any manufacturing company in the right direction.
ISO 55001 is a set of requirements for a system related to ‘asset management’ (which includes equipment maintenance). It draws a list of maintenance good practices that more or less mirror ISO 9001’s quality good practices, among which feature:
As mentioned several times by now, there is considerable overlap between quality and maintenance. A company that has taken ISO 9001 very seriously is well on its way to being ISO 55001 compliant.
Since this standard is meant to apply to any organization that has ‘assets’ to manage, it does not prescribe one specific approach. However, two best-in-class approaches are very effective at cutting total production costs.
Two separate and complementary approaches have been developed to achieve excellent equipment maintenance:
First, a few remarks about the use of ‘TPM’.
There is a great deal of misuse and mix-up when it comes to this acronym.
The first pillar of TPM: autonomous maintenance
The idea is to involve everybody in the factory. In the words of Seiichi Nakajima, considered a father of TPM:
TPM is a company-wide program for improving equipment effectiveness—something maintenance alone could not do. When TPM came to America, we realized we probably made a mistake calling it Total Productive Maintenance. Probably should have been Total Productive Manufacturing.
More precisely, the idea is to have everybody look after the equipment, clean it, report anything weird, do small adjustments, and so on.
There are several benefits, among which:
Again, in Seiichi Nakajima’s words:
The word ‘Total’ in TPM has these meanings: total effectiveness—pursuit of economic efficiency or profitability; total PM—maintenance prevention and activity to improve maintainability as well as preventive maintenance; and total participation—autonomous maintenance by operators and small group activities in every department at every level.
Marc-Antoine Talva from Mobility Works suggests following these autonomous maintenance steps.
5S is a systematic process through which employees make space, set the tools, materials, and equipment in order (‘a place for everything and everything in its place’), and regularly clean & inspect the tools and equipment. We regularly guide manufacturing organizations through 2 or 3 cycles of 5S, for great results in terms of quality, cost, and safety.
CLAIR stands for “clean, lubricate, adjust, inspect, minor repair”. In an organization implementing TPM, these are typically all handled by operators.
There is considerable overlap between these two approaches. A good plan for 5S will include CLAIR in a factory that relies on machinery and tooling.
5S is a great basis for a TPM implementation. It helps give a purpose to the 5S efforts, which often meet great resistance and are always at risk of being abandoned.
You might wonder: if the operators handle most of the maintenance work, is there still a need for specialized staff in that department? First, remember that involving the operators in the cleaning and inspection brings superior results, because they are on site 8-10 hours a day and are therefore much more likely to detect an anomaly.
Second, major repairs/overhauls still need to be handled by specialized technicians. There might be a need for deep expertise, not only to do the job well but also for safety reasons. These specialists can also work on setting up reliability-centered maintenance (see section 8).
Yes, there are eight other pillars:
However, the other pillars are seldom fully implemented. Factories that have not yet implemented TPM should start with pillar 1. It will take them time to do it well, and it will probably help them reap most of the benefits they can get from TPM.
Based on our experience, it has been very difficult to find vocational help for handling maintenance and other technical jobs in Chinese factories.
One challenge will be hiring, training, and retaining the right types of specialists (in electrical, control, and/or mechanical systems) in a maintenance role.
Why is that?
Chinese people tend to be split into 2 categories:
Another key issue we noticed is the managers’ reluctance to give training to the operators. It makes the implementation of TPM very difficult.
Finally, and this is true in any country, specialists don’t want to train others to do what only they know how to do. It means they lose some of their ‘power’ and are no longer irreplaceable. A high-trust, low-threat environment makes the transition much easier.
RCM is a very deep topic. Think of it as a central pillar of predictive maintenance (just like autonomous maintenance is the core pillar of most implementations of TPM). Its whole purpose is collecting data for making informed decisions.
RCM typically involves heavy statistics, and there are software packages to help you with getting the most out of it. In this section, we will only touch on high-level concepts that managers of manufacturing operations need to understand.
(We are planning to go deeper into the tools and insights of RCM in another document, to be available at a later time.)
I already mentioned this type of data collection in section 1 above.
Data also needs to be collected and recorded for each individual piece of equipment.
Let’s say you have 3 similar machines, but they are not loaded the same way (just like the car example of section 6 above). Their failure pattern might look like this:
(A cross indicates a breakdown event.)
As you can see, merging all these data together won’t make much sense because the patterns are so different. Each machine has to be considered individually.
As mentioned in section 5 above, as a component or tool gets to stage 3 in the bathtub curve, it often makes sense to order spare parts and do the replacement without waiting for a breakdown.
However, can we assume that the risk of breakdown truly increases past a certain time, or a certain number of cycles?
In fact, we can’t. Most components follow patterns like these:
If this applies to some of your equipment, is a “basic” preventive maintenance policy appropriate? Clearly not. Time in operation, number of cycles, and other time- or age-based measurements do not help predict a breakdown.
This is what United Airlines found out in the 1960s, as they tried to understand what drives the failure of some aircraft parts. A jet engine does not have a specific “lifetime” beyond which it becomes unreliable.
First, if you can add redundancies (so that one component’s failure does not cause the whole system to fail) at a modest extra cost and weight, do it at the design stage.
Second, if you can monitor the condition of your equipment in a way that alerts you with sufficient notice before a breakdown, go down that path. Doing so has gotten increasingly easy thanks to:
Third, if you can record historical data for each piece of equipment, you can also record its mean time to failure as well as other useful statistics.
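As a minimal sketch of what such a record can look like, the Python snippet below keeps one breakdown log per machine and derives the mean time between failures (MTBF) and the mean time to repair (MTTR). The machine names, the log format, and all the figures are hypothetical.

```python
from statistics import mean

# Hypothetical log: machine id -> list of (failure_hour, back_in_service_hour),
# both measured in clock hours since the start of the observation period.
log = {
    "press-01": [(400, 404), (950, 958), (1300, 1303)],
    "press-02": [(1200, 1202)],
}


def mtbf_and_mttr(events, observed_hours):
    """Return (uptime per failure, average repair duration) for one machine."""
    repair_times = [end - start for start, end in events]
    uptime = observed_hours - sum(repair_times)
    return uptime / len(events), mean(repair_times)


# Each machine is analyzed on its own, never merged with the others.
for machine, events in log.items():
    mtbf, mttr = mtbf_and_mttr(events, observed_hours=1500)
    print(f"{machine}: MTBF={mtbf:.0f} h, MTTR={mttr:.1f} h")
```

Keeping the statistics per machine follows the point made earlier: similar machines that are loaded differently can show very different failure patterns.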
The idea behind condition monitoring is getting an early warning and reacting to a deviation from a standard before it leads to a breakdown.
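A very simple version of this idea is to compare each new reading with a band derived from a period when the machine was known to be healthy. The sketch below is one possible approach under that assumption, not a prescribed method; the vibration readings and the three-standard-deviation limit are hypothetical.

```python
from statistics import mean, stdev


def find_deviations(baseline, readings, k=3.0):
    """Return (index, value) of readings outside baseline mean +/- k * stdev."""
    center, spread = mean(baseline), stdev(baseline)
    low, high = center - k * spread, center + k * spread
    return [(i, x) for i, x in enumerate(readings) if not low <= x <= high]


# Hypothetical vibration readings (mm/s) from a known-good period...
baseline = [2.1, 2.0, 2.2, 2.1, 1.9, 2.0, 2.1, 2.2, 2.0, 2.1]
# ...and from the current week, where the level starts to drift upward.
current = [2.1, 2.2, 2.3, 2.6, 2.9, 3.4]

for index, value in find_deviations(baseline, current):
    print(f"reading {index}: {value} mm/s is outside the normal band")
```

The value of the early warning depends on how much notice it gives: the limit has to be tight enough to flag the drift while there is still time to plan the intervention.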
Not all your injection presses, or stamping machines (such as for mechanical products), are in the same condition, even if you purchased them all together. Maintenance technicians often check machines for:
Those technicians’ job is to:
You need to analyze the causes of past breakdowns. Hence the need to analyze those events on the spot and to record that analysis.
For example, for mechanical components such as bearings, the most common causes of failure are typically:
Note that you don’t need to have a full history of each breakdown. Even if you only have past failure incidents data on 5 or 6 of these 20 machines, you can still rank them and run a Weibull analysis.
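To give an idea of what a Weibull analysis involves, here is a minimal sketch using median rank regression with Bernard's approximation, written in plain Python. It treats the failure times as a complete sample, which is a simplification: when most machines have not failed yet, a real analysis should account for them as suspended (censored) units, typically with dedicated software. The failure times are hypothetical.

```python
from math import exp, log


def fit_weibull(failure_times):
    """Estimate Weibull (shape, scale) by median rank regression."""
    times = sorted(failure_times)
    n = len(times)
    xs = [log(t) for t in times]
    # Bernard's approximation of the median rank of the i-th failure
    ys = [log(-log(1 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # Least-squares slope of y on x is the shape parameter
    shape = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    # The intercept equals -shape * ln(scale)
    scale = exp(x_bar - y_bar / shape)
    return shape, scale


# Hypothetical hours in operation at failure, for 6 machines
hours_to_failure = [1100, 1650, 2100, 2500, 3000, 3800]
shape, scale = fit_weibull(hours_to_failure)
print(f"shape={shape:.2f}, scale={scale:.0f} h")
```

A shape well above 1 suggests wear-out, where planned replacement makes sense; a shape close to 1 or below suggests that age alone does not predict the failure.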
One often overlooked approach is to redesign a piece of equipment in order to improve:
Let’s take an example. The maintainability of an iPhone is not very good. One has to bring it to a specialized store / service center for operations as simple as changing the battery. On the other hand, its reliability is very high and compensates (in the eyes of most users) for low maintainability.
Both reliability & maintainability are strong cost drivers. These are the levers to act on, to cut total maintenance costs.
In this context, a redesign is often guided by:
Without maintenance, there can be no ‘Industry 4.0’. It is that simple.
There are several reasons for that:
Companies such as IBM, Bosch, Microsoft, and Amazon are all developing analytical tools that aim at making sense of these data and presenting actionable insights. What some auto plants put in place in the 1990s is now available to hundreds of thousands of other factories.
However, the reality in China is quite different. Only 2-3% of factories do a good job of this. The vast majority are busy ‘putting out fires’ and haven’t set up a preventive plan.
An engineer described the situation in his US factory in the 1980s, and we often see this nowadays with Chinese manufacturers that invest in high-tech equipment without upgrading maintenance efforts:
All these wonderful machines performed their intended functions, on test, but when they were put into operation in our plants, with our people, they were out of business so much of the time for this and that kind of failure that our overall costs, instead of going down, went up. No one had evaluated the overall probable failure rate and maintenance. As a result, we were continually caught with stoppages and with not enough spare parts, or with none at all; and no provision for alternate production lines.
China’s industrial sector is in motion. They are upgrading their processes and learning how to work with high-tech automation. They are going very fast. They are also switching to a preventive maintenance approach, but more slowly. We are not sure it will be sufficient to keep up with all the changes.
Some people talk about “Maintenance 4.0”. In our understanding, it is composed of the basic preventive measures + autonomous maintenance (including good 5S) + reliability-centered maintenance (based on extensive and real-time condition monitoring).
Naturally, all this only makes sense for some pieces of equipment – those supporting a claim of reaching “Industry 4.0”. Here at CMC we have seen this in only a few Chinese companies, mainly tier-1 suppliers of auto parts. Let’s see how fast it can spread in other verticals.