How a Preventive Maintenance System Cuts Costs in Chinese Factories

1. Is your equipment working all the time?

Certainly not.

Let’s frame the question in another way. Have you ever visited a plant where all machines were available to work at the desired speed and the expected quality standard? Probably not, right?

There are many reasons for equipment downtime. They are usually categorized this way.

The Six Major Issues

If some of your machinery is expensive, you probably want to collect these data in detail, to understand where you lose the opportunity to make good products. Here is an example.

In this example, one can draw a few conclusions:

There was no breakdown this week.
Tool changes take time. There might be ways to address this with the SMED approach.
Cleaning and inspection take a bit of time. See if it can be done just before a shift starts.
Short stoppages at exit cause regular time loss. Someone should probably look into this.
There are many products with holes! Someone should start a problem resolution initiative.

In the table shown above, the data is summarized into three ratios:

Availability
Performance
Quality

And these three indicators are summarized into one ratio, the OEE (Overall Equipment Effectiveness).
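As a minimal sketch of how the three ratios combine into the OEE, assuming the commonly used definitions (availability = run time / planned time, performance = ideal cycle time × parts made / run time, quality = good parts / parts made). The function name and the shift figures are hypothetical, for illustration only:

```python
def oee(planned_time_min, downtime_min, ideal_cycle_time_min, total_count, good_count):
    """Return (availability, performance, quality, oee) as fractions between 0 and 1."""
    run_time = planned_time_min - downtime_min
    availability = run_time / planned_time_min
    performance = (ideal_cycle_time_min * total_count) / run_time
    quality = good_count / total_count
    return availability, performance, quality, availability * performance * quality


# Hypothetical shift: 480 planned minutes, 48 minutes of stoppages,
# ideal cycle time of 1 minute per part, 360 parts made, 342 of them good.
a, p, q, overall = oee(480, 48, 1.0, 360, 342)
print(f"Availability {a:.1%} | Performance {p:.1%} | Quality {q:.1%} | OEE {overall:.1%}")
```

Keeping the three components visible next to the product makes it easier to see which loss category is driving the result.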

Many people doubt the OEE should be tracked, though. Its three components are meaningful, but the OEE is too synthetic (it includes numbers the manufacturing managers are not responsible for), and it can be gamed easily.

2. Is Poor Maintenance Costing Your Company Money?

If your operations use equipment, the answer is very probably yes. The more complex and high-tech the equipment, the more you need good maintenance.

Here are typical ways this happens:

Unplanned downtime disrupts the production plan and shipping schedules – triggering expediting costs and customer penalties
Poorly maintained equipment does not function as expected (short-circuits, explosions, etc.) and can result in accidents
An operator sometimes needs to be paid to “supervise” 1 or 2 machines and stop them in case something goes wrong
You can also save electricity – it is easier for a fan to pull in air through a filter if that filter is kept clean
When equipment breaks down, and the corresponding process becomes a bottleneck, it places a limit on the capacity of your factory. The rent, the equipment, etc. cost you the same amount, but a lower uptime means you can get fewer products out
In the worst case, an entire factory is down because a critical machine is not functioning. The Information Technology Intelligence Consulting Research sends out a survey to estimate that cost. 98% of organizations report that a single hour of downtime costs over 100,000 USD, and 33% estimate that cost at over 1 million USD

(You'll see some more examples in the sections below.)

Does it make sense to spend more on preventive activities?

To a certain extent, yes it does make sense, especially for highly-automated operations. This was the case of the Dodge Durango plant in Newark, Delaware (note: this facility was managed by CMC's Founding Partner David Collins).

At the time, the average automotive paint shop was down (typically because of robots, conveyors, air flow, or ovens) about 15% of the time during normal operating hours. That’s a lot, in an industry that prides itself on being THE most efficient.

In years 2, 3, and 4 of operations, the Dodge Durango plant had one of the highest preventive maintenance budgets of all plants in the Chrysler group and had zero downtime that impacted production for three years in a row. And what happens when things are not planned properly?

The Hyundai plant of Montgomery, Alabama, which opened in 2005, is among the most highly automated automotive plants in the world. They reportedly had 20% downtime throughout the entire factory for an extended period, which significantly affected the production.

3. Is Poor Maintenance Causing Quality Issues?

A worn-out tool or a machine functioning abnormally creates unacceptable products that have to be reworked or scrapped. Poor quality often accompanies high equipment downtime. They are both indicative of insufficient maintenance.

Let’s take a simple analogy. If the brake system on your car wears down and is not replaced in time (maintenance), what happens? The car is no longer able to stop as expected (quality).

There is actually considerable overlap between these two preventive activities:

What maintenance staff call cleaning, inspecting, adjusting/repairing, and replacing
What quality staff call process control

How does it play out in different processes?

There are many ways a robot can create quality issues. It may lose some of its spatial integrity (ability to move precisely as expected) because a component got worn. The servo motor might not run as smoothly as it is supposed to. The tooling at the end of an arm might start to wear and might no longer be capable of completing its job properly. The list goes on.

A CNC machine moves a tool along 3 bars (one for each axis). The machine’s ability to do its job as per specifications is compromised if the bars are not properly lubricated, or if the bars wear out (and get a bit thinner, allowing the tool to move around).

A manufacturer believed their steel tool, which was cutting plastic, never had to be changed or sharpened simply because the plastic was so much softer than the tool. When we convinced them to change the tools, the plastic parts suddenly started to be within specification.

Poor maintenance of the mold in a die casting process can generate too much flash (which means extensive rework of the parts and much scrap) and often pitting.

SMT machines (in the electronics industry) need regular cleaning and adjustment. Otherwise, the throw rate (i.e. the proportion of scrapped components) is high, and it costs a lot of money.

All these quality issues come from malfunctioning equipment, which was not taken good care of. As you will see in the sections below, most of these issues can be avoided at a much lower total cost.

4. What Are the Benefits of Preventive Maintenance?

First, let’s quickly describe what the opposite of preventive maintenance is: Reactive maintenance (also called breakdown maintenance, or corrective maintenance).

I call it maintenance 1.0: ‘wait until it breaks before fixing/replacing it’.

Maintenance 1.0

A reactive approach (waiting for something to break down before repairing/replacing it) is appropriate in some circumstances:

A pen is thrown away when out of ink and replaced by a new pen.
A laser printer is replaced once it no longer works properly.

However, for most production equipment, a more preventative approach is advised. Let’s run through the four main benefits of preventative maintenance.

1) Saving money by minimizing bad surprises

By letting the equipment run to failure less often, the organization saves a lot of money.

Here are typical numbers:

Repair of an unplanned breakdown: 4 days down, 80,000 RMB of parts, much idle labor as well as overtime for some technicians, and 20,000 RMB of scrapped material processed just before the machine was stopped.

Planned overhaul: The machine is stopped towards the end of a shift, 20,000 RMB of spare parts, no unplanned downtime and no bad surprises.
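To make the comparison concrete, here is a rough sketch that adds up the two scenarios. The parts, scrap, and downtime figures come from the numbers above. The cost of one day of lost production is a hypothetical placeholder to replace with your own estimate, and idle labor and overtime are left out:

```python
def event_cost(parts_rmb, scrap_rmb, downtime_days, downtime_cost_per_day_rmb):
    """Total cost of one maintenance event, in RMB (labor and overtime not included)."""
    return parts_rmb + scrap_rmb + downtime_days * downtime_cost_per_day_rmb


# Hypothetical assumption: one day of unplanned downtime costs 30,000 RMB in lost output.
DOWNTIME_COST_PER_DAY_RMB = 30_000

breakdown = event_cost(80_000, 20_000, 4, DOWNTIME_COST_PER_DAY_RMB)
overhaul = event_cost(20_000, 0, 0, DOWNTIME_COST_PER_DAY_RMB)

print(f"Unplanned breakdown: {breakdown:,} RMB")
print(f"Planned overhaul:    {overhaul:,} RMB")
print(f"Difference:          {breakdown - overhaul:,} RMB")
```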

2) Extending a machine’s lifetime

Letting a machine work until it breaks down reduces its lifetime. In many cases, the time to get to the next failure becomes shorter and shorter. It can look like this:

3) Avoiding catastrophic failures

Some unplanned breakdowns have a disproportionately high economic impact on the factory. Here are a few examples.

A “monument” (a very costly piece of equipment with a special component such as a motor for which there is no spare part in stock) is down and prevents the whole factory from operating.
In a pressure vessel, the locking mechanism or a valve breaks down, and a chemical reaction happens that causes very heavy losses.
If the motor that keeps circulating the paint in an electrocoat tank shuts down for a certain time, the whole content of the tank has to be removed at considerable expense, and all the material is lost.
If a bearing shatters in a high-speed turbine system, an entire shaft might get bent.

4) Taking the lifecycle into account

You might already have come across the concept of the “bathtub curve”:

Many machinery components fail (i.e. break down) with the following pattern:

In the first days/weeks: they might have been poorly installed or poorly set, and operators poorly trained. This results in a higher failure rate.
After that initial period: the failure rate is lower and appears somewhat random (e.g. a power surge damages an electrical system).
After a particular time: some components are worn out and start to fail at an increasing rate.
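One common way to model these three stages (an assumption on our part, not the only possible model) is the Weibull hazard function, where the shape parameter determines whether the failure rate falls, stays flat, or rises over time. The sketch below uses hypothetical shape and scale values:

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) = (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# shape < 1: falling rate (early failures); shape = 1: constant rate (random failures);
# shape > 1: rising rate (wear-out). All values below are hypothetical.
for label, shape in [("early failures", 0.5), ("random failures", 1.0), ("wear-out", 3.0)]:
    rates = [weibull_hazard(t, shape, scale=1000.0) for t in (100.0, 500.0, 900.0)]
    print(f"{label:16s}", [f"{r:.5f}" for r in rates])
```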

As a component or tool gets to stage 3, should you wait until it breaks down? In many cases, it makes sense to order spare parts and do the replacement at a convenient time.

How to know what the right time is for replacements? Often, the equipment manufacturer can give an estimate for you to follow. (As you will see below, it might not be the best estimate in your situation).

The higher-tech a machine is, the more important prevention is

An automated machine that includes high-tech components is more likely to break down unexpectedly. The reason is that there are more single points of failure. There are ways to increase these systems’ reliability (e.g. by adding redundant components), but it comes at an extra cost.
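The effect of single points of failure, and of redundancy, can be sketched with basic reliability arithmetic. This assumes components fail independently of each other, which is a simplification, and the reliability figures are hypothetical:

```python
from math import prod


def series_reliability(reliabilities):
    """All components must work: every one of them is a single point of failure."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """Redundant components: the group fails only if every component fails."""
    return 1 - prod(1 - r for r in reliabilities)


# Hypothetical machine with 10 critical components, each 99% reliable over a given period.
print(f"10 components in series: {series_reliability([0.99] * 10):.3f}")

# Same machine, but each component is doubled with a redundant twin.
doubled = [parallel_reliability([0.99, 0.99])] * 10
print(f"With redundant pairs:    {series_reliability(doubled):.3f}")
```

The more components sit in series, the faster overall reliability drops, which is why complex equipment needs more prevention.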

For instance, the SMT process in electronic factories is a mature process that has been in use for decades. And yet, we noticed that 20-25% downtime is typical in Chinese plants. These are high-precision robots that place components on PCBs, and they need regular maintenance efforts.

Looking into the future, as Chinese factories automate their processes and use higher-tech machinery, they will have to switch to a more preventive approach.

5. Plan for Preventive Actions

Maintenance 2.0

Just like your company probably has a zero-accident objective, and maybe also a zero-defect objective, it can aim for zero-breakdown. For that to happen, a more proactive approach is called for.

A time-based plan

Basic preventative maintenance usually takes the form of a time-based maintenance schedule. It includes activities such as:

Lubricate every two days
Change the filter every three weeks
Change this tool every eight months
Clean and paint every year to avoid rust and notice leakages easily
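A time-based plan like this one is simple enough to track in a spreadsheet or a few lines of code. The sketch below uses the intervals from the list above ("eight months" is approximated as 240 days). The task names, dates, and the due_tasks helper are hypothetical:

```python
from datetime import date, timedelta

# Intervals from the list above; "eight months" is approximated as 240 days.
PM_INTERVALS = {
    "lubricate": timedelta(days=2),
    "change filter": timedelta(weeks=3),
    "change tool": timedelta(days=240),
    "clean and paint": timedelta(days=365),
}


def due_tasks(last_done, today):
    """Return the tasks whose interval has fully elapsed since they were last done."""
    return sorted(
        task
        for task, interval in PM_INTERVALS.items()
        if today - last_done[task] >= interval
    )


# Hypothetical record of when each task was last carried out on one machine.
last_done = {
    "lubricate": date(2024, 3, 1),
    "change filter": date(2024, 2, 20),
    "change tool": date(2023, 9, 1),
    "clean and paint": date(2023, 6, 1),
}

print(due_tasks(last_done, today=date(2024, 3, 4)))
```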

Here is an example of a PM plan for pumps:

Image credit: catpumps.co.uk

You can see another example, more structured, here. And you can download a very similar template (in Excel) here.

An age-based plan

Sometimes it also takes into account the age (e.g. time in operation or number of cycles performed) of the equipment. One good example is that of the oil change in a traditional car – typically “every 3,000 miles”.

However, does it apply to all cars on the road? Not really. Some cars have to carry heavier loads, drive mountainous and dusty roads, have to put up with extremely hot conditions, and so forth.

This one-size-fits-all approach can be discarded if the equipment condition can be monitored regularly.

Condition monitoring

To keep the same example, what if you check your car’s oil level and color? If the color is murky, it is time to change it. When this approach is possible, it is much superior to time-based maintenance. Here is a flowchart to sum up the options nicely:

Condition Monitoring Flowchart - China Manufacturing Consultants

Chart Source: Maintenance, Replacement and Reliability: Theory & Applications, 2nd edition, by Jardine and Tsang, CRC Press

6. Reactive to Preventive Maintenance

Let’s say you want your organization to start taking the preventive route for some of its equipment.

Is it difficult to achieve this switch?

In many factories, yes. There are two reasons for this.

It takes discipline for the production manager to stick to the plan, rather than “make production now and worry about those risks later”.
It also takes discipline to systematically document the actions, to show they are done on time as per the plan. Fortunately, with some of today’s inexpensive IT tools, this can be done nearly effortlessly.

The good news is, for a factory that is setting up good process controls to maintain consistent quality, this is nearly no extra work. As explained in section 3 above, one single plan can guide both preventative maintenance and process control efforts. They can be merged for little extra workload.

We see many manufacturers who have the machine supplier come in for a routine check every year. It certainly doesn’t hurt, but in the vast majority of cases it is NOT sufficient.

How to put in place a predictive maintenance (PdM) system?

Jigish Vaidya (a senior manager of operational statistics at the Long Island Rail Road) gives solid advice in an article published in December 2017 in Quality Progress. Here are the first three steps in setting up a PdM system:

1. Establish maintenance objectives

What are the priorities, to be implemented first? What equipment will impact your schedule and cost the most if it breaks down? Where do your most significant maintenance and spare part replacement costs come from?

How to make sure you get closer to your objectives? Vaidya also suggests tracking both lagging indicators (e.g. the history of mean time between failures) and leading indicators (e.g. mean time to the first failure).

2. Adopt a condition monitoring program

Think of how to collect data on the condition of the machines or tools you designated as priorities in step 1.

3. Install sensors and smart systems

Based on the equipment to monitor, get the right devices and sensors to collect the information you will need (heat, vibration, oil contents, etc.).

Also, think of where to store that information and how to analyze it. Don’t try to consume those “big data” right out of the firehose – you will need them to be presented in a certain way, or you will drown in information. Microsoft (Azure & Cortana), IBM (Operational Analytics), Amazon (AWS), and others offer solutions for this.

Is ISO 55001 a well-thought-out standard?

Sure. It is a good standard and will steer pretty much any manufacturing company in the right direction.

ISO 55001 is a set of requirements for a system related to ‘asset management’ (which includes equipment maintenance). It lists good maintenance practices that more or less mirror ISO 9001’s good quality practices, including:

Thinking in terms of processes and system – no isolated silos
Planning based on the major perceived risks
Data collection and analysis
Regular management reviews based on data, driving continual improvement
Documented procedures, work instructions, and other standards
Records of certain actions
Engagement of the workforce

As mentioned several times by now, there is considerable overlap between quality and maintenance. A company that has taken ISO 9001 very seriously is well on its way to being ISO 55001 compliant.

Since this standard is meant to apply to any organization that has ‘assets’ to manage, it does not prescribe one specific approach. However, two best-in-class approaches are very effective at cutting total production costs.

What are best-in-class approaches to maintenance?

Two separate and complementary approaches have been developed to achieve excellent equipment maintenance:

TPM (which many people have heard of), which is centered on people;
RCM (very powerful and yet relatively unknown), which is centered on the equipment.

7. Involving All Employees (Total Productive Maintenance, or TPM)

First, a few remarks about the use of ‘TPM’.

There is a great deal of misuse and mix-up when it comes to this acronym.

TPM originated in Japan, just like TQM (Total Quality Management) and TPS (Toyota Production System). They share many of the same principles, but they are not the same thing!
Many people say “TPM” when they mean “a good maintenance system”. However, TPM is only a certain approach to maintenance. The next section covers 'Reliability-Centered Maintenance,' a complementary approach with different tools.

The first pillar of TPM: autonomous maintenance

The idea is to involve everybody in the factory. In the words of Seiichi Nakajima, considered a father of TPM:

TPM is a company-wide program for improving equipment effectiveness—something maintenance alone could not do. When TPM came to America, we realized we probably made a mistake calling it Total Productive Maintenance. Probably should have been Total Productive Manufacturing.

More precisely, the idea is to have everybody look after the equipment, clean it, report anything weird, do small adjustments, and so on.

There are several benefits, among which:

The operators and local leaders are more likely to see a leakage or to detect an abnormal noise, since they are constantly on site, so issues are found out much earlier;
Specialized maintenance staff can be freed to carry out more value-added tasks;
Much of the operators’ maintenance work is part of the 3rd S of 5S, and a good implementation of a 5S program also helps with quality and safety.

Again, in Seiichi Nakajima’s words:

The word ‘Total’ in TPM has these meanings: total effectiveness—pursuit of economic efficiency or profitability; total PM—maintenance prevention and activity to improve maintainability as well as preventive maintenance; and total participation—autonomous maintenance by operators and small group activities in every department at every level.

What are the steps to put autonomous maintenance in place?

Marc-Antoine Talva from Mobility Works suggests following these autonomous maintenance steps.

Provide training to production operators, local leaders, and engineers.
Initial cleaning & inspection by all involved parties – this can take a long time and should lead to identification of many signs of deterioration; the purpose is to restore the machine’s performance.
Eliminating contamination and inaccessible areas – making sure deterioration can’t take place again by removing sources of dirt, etc., as well as making it easy to access the parts of the machine that need regular cleaning and inspection.
Develop standards for cleaning, lubrication and inspection – to make sure the good work can be kept up over time.
Inspection and monitoring – looking out for issues on an ongoing basis and making small adjustments.
Finalize standards and document the whole process

TPM, 5S, and CLAIR

5S is a systematic process through which employees make space, set the tools, materials, and equipment in order (‘a place for everything and everything in its place’), and regularly clean & inspect the tools and equipment. We regularly guide manufacturing organizations through 2 or 3 cycles of 5S, for great results in terms of quality, cost, and safety.

CLAIR stands for “clean, lubricate, adjust, inspect, minor repair”. In an organization implementing TPM, these are typically all handled by operators.

There is considerable overlap between these two approaches. A good plan for 5S will include CLAIR in a factory that relies on machinery and tooling.

5S is a great basis for a TPM implementation. It helps give a purpose to the 5S efforts, which often meet great resistance and are always at risk of being abandoned.

What about the specialized maintenance technicians?

You might wonder: if the operators handle most of the maintenance work, is there still a need for specialized staff in that department? First, remember that involving the operators in the cleaning and inspection brings superior results, because they are much more likely to detect an anomaly simply by being on site 8-10 hours a day.

Second, major repairs/overhauls still need to be handled by specialized technicians. There might be a need for deep expertise, not only to do the job well but also for safety reasons. These specialists can also work on setting up reliability-centered maintenance (see section 8).

Is there more than autonomous maintenance to TPM?

Yes, TPM has eight pillars in total, autonomous maintenance being the first:

Autonomous Maintenance
Focused Improvement
Planned Maintenance
Quality management
Early equipment management
Education and Training
Safety, Health, Environment
Administrative & office TPM

However, the other pillars are seldom fully implemented. Factories that have not yet implemented TPM should start with pillar 1. It will take them time to do it well, and it will probably help them reap most of the benefits they can get from TPM.

HR challenges in China

Based on our experience, it has been very difficult to find vocational help for handling maintenance and other technical jobs in Chinese factories.

One challenge will be hiring, training, and retaining the right types of specialists (in electrical, control, and/or mechanical systems) in a maintenance role.

Why is that?

Chinese people tend to be split into 2 categories:

Very educated engineers who don’t necessarily want to be “the greasy guy fixing the equipment in the factory”;
People with lower levels of education who haven’t been taught the technical skills necessary to work on production machinery and tooling.

Another key issue we noticed is the managers’ reluctance to give training to the operators. It makes the implementation of TPM very difficult.

Finally, and this is true in any country, specialists don’t want to train others to do what only they know how to do. It means they lose some of their ‘power’ and are no longer irreplaceable. A high-trust, low-threat environment makes the transition much easier.

8. Working Smarter, Based on Data (Reliability-Centered Maintenance, or RCM)

RCM is a very deep topic. Think of it as a central pillar of predictive maintenance (just like autonomous maintenance is the core pillar of most implementations of TPM). Its whole purpose is collecting data for making informed decisions.

RCM typically involves heavy statistics, and there are software packages to help you with getting the most out of it. In this section, we will only touch on high-level concepts that managers of manufacturing operations need to understand.

(We are planning to go deeper into the tools and insights of RCM in another document, to be available at a later time.)

Data analysis per machine/tool

I already mentioned this type of data collection in section 1 above.

Data also needs to be collected and recorded for each individual piece of equipment.

Let’s say you have 3 similar machines, but they are not loaded the same way (just like the car example of section 5 above). Their failure pattern might look like this:

(A cross indicates a breakdown event.)

As you can see, merging all these data together won’t make much sense because the patterns are so different. Each machine has to be considered individually.

Data analysis per component type

As mentioned in section 4 above, as a component or tool gets to stage 3 in the bathtub curve, it often makes sense to order spare parts and do the replacement without waiting for a breakdown.

However, can we assume that the risk of breakdown truly increases past a certain time, or a certain number of cycles?

In fact, we can’t. Most components follow patterns like these:

If this applies to some of your equipment, is a “basic” preventive maintenance policy appropriate? Clearly not. Time in operation, number of cycles, and other time- or age-based measurements do not help predict a breakdown.

This is what United Airlines found out in the 1960s, as they tried to understand what drives the failure of some aircraft parts. A jet engine does not have a specific “lifetime” beyond which it becomes unreliable.

What to do if “basic” preventative maintenance is not sufficient?

First, if you can add redundancies (so that one component’s failure does not cause the whole system to fail) at a modest extra cost and weight, do it at the design stage.

Second, if you can monitor the condition of your equipment in a way that alerts you with sufficient notice before a breakdown, go down that path. Doing so has gotten increasingly easy thanks to:

The multiplication of inexpensive sensors that can be placed permanently
The availability of relatively cheap testing equipment that can be used regularly

Third, if you can record historical data for each piece of equipment, you can also record its mean time to failure as well as other useful statistics.
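As a minimal sketch of such per-machine statistics, the snippet below computes the mean time between failures from a log of breakdowns. It assumes each breakdown is recorded as the cumulative operating hours at which it occurred. The machine names and figures are hypothetical:

```python
from statistics import mean


def mean_time_between_failures(failure_hours):
    """Average gap between consecutive failures, or None with fewer than two failures.

    failure_hours: cumulative operating hours at which each failure occurred.
    """
    if len(failure_hours) < 2:
        return None
    ordered = sorted(failure_hours)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return mean(gaps)


# Hypothetical history, kept separately for each machine.
history = {
    "press_A": [400, 900, 1300, 1600],
    "press_B": [1200, 2500],
    "press_C": [700],
}

for machine, failures in history.items():
    print(machine, mean_time_between_failures(failures))
```

Keeping the list of gaps, and not only their average, also shows whether the time to the next failure is getting shorter.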

How to monitor the condition of your equipment?

The idea behind condition monitoring is getting an early warning and reacting to a deviation from a standard before it leads to a breakdown.

Not all your injection presses, or stamping machines (such as for mechanical products), are in the same condition, even if you purchased them all together. Maintenance technicians often check machines for:

Vibration
Heat
Noise (can be by ultrasonic analysis if necessary)
Oil analysis

Those technicians’ job is to:

Record and analyze data
Take action above a certain threshold
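A simple version of "take action above a certain threshold" can be sketched as follows. The rule used here (baseline mean plus three standard deviations) is only one possible choice and is our assumption. In practice the limit often comes from the equipment manufacturer or from a standard. The vibration values are hypothetical:

```python
from statistics import mean, stdev


def readings_above_threshold(baseline, readings, n_sigma=3.0):
    """Return the alert limit and the (index, value) pairs of readings that exceed it.

    baseline: readings taken while the machine was known to be in good condition.
    """
    limit = mean(baseline) + n_sigma * stdev(baseline)
    alerts = [(i, value) for i, value in enumerate(readings) if value > limit]
    return limit, alerts


# Hypothetical vibration readings (mm/s) on one bearing.
baseline = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1]
this_week = [2.1, 2.3, 2.6, 3.1]

limit, alerts = readings_above_threshold(baseline, this_week)
print(f"Alert limit: {limit:.2f} mm/s")
print("Readings to investigate:", alerts)
```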

How to know what to look out for, when monitoring your equipment?

You need to analyze the causes of past breakdowns. Hence the need to analyze those events on the spot and to record that analysis.

For example, for mechanical components such as bearings, the most common causes of failure are typically:

Poor lubrication
Excessive load
Contamination
Poor assembly/setup

Note that you don’t need to have a full history of each breakdown. Even if you only have past failure data on 5 or 6 machines out of, say, 20, you can still rank those failures and run a Weibull analysis.
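For readers curious about what a Weibull analysis involves, here is a simplified sketch using median rank regression with Bernard's approximation. It only handles machines that have actually failed. A full analysis would also account for the machines that are still running (treated as censored data), which this sketch ignores. The failure times are hypothetical:

```python
import math


def weibull_fit(failure_times):
    """Estimate Weibull shape and scale by median rank regression (failed units only)."""
    times = sorted(failure_times)
    n = len(times)
    xs, ys = [], []
    for rank, t in enumerate(times, start=1):
        median_rank = (rank - 0.3) / (n + 0.4)  # Bernard's approximation
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - median_rank)))
    # Least-squares line y = slope * x + intercept, where slope is the shape parameter
    # and intercept = -shape * ln(scale).
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, math.exp(-intercept / slope)


# Hypothetical operating hours at failure for 6 machines.
shape, scale = weibull_fit([410, 620, 790, 950, 1150, 1400])
print(f"shape = {shape:.2f}, scale = {scale:.0f} hours")
```

A shape above 1 suggests wear-out (replacement before failure can pay off), while a shape close to 1 suggests random failures, for which time-based replacement does not help.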

Is there a way equipment design can be modified to improve reliability & maintainability?

One often overlooked approach is to redesign a piece of equipment in order to improve:

Reliability (so that it is less likely to go down), and/or
Maintainability (the ease of maintaining that piece of equipment over time)

Let’s take an example. The maintainability of an iPhone is not very good. One has to bring it to a specialized store / service center for operations as simple as changing the battery. On the other hand, its reliability is very high and compensates (in the eyes of most users) for low maintainability.

Both reliability & maintainability are strong cost drivers. These are the levers to act on, to cut total maintenance costs.

In this context, a redesign is often guided by:

Reliability theory (e.g. adding components in parallel for redundancy, adding a backup, etc.).
Analysis of past data (most common failure modes and their root causes).
Comments from users (e.g. accessing the filter is much work, there is a risk of electrical shock, etc.).
Stress testing of a few prototypes (e.g. working constantly on high voltage, with high temperature and high humidity), finding their failure modes, and driving design iterations accordingly.

9. What Is the Role of Maintenance in ‘Industry 4.0’?

Without maintenance, there can be no ‘Industry 4.0’. It is that simple.

There are several reasons for that:

Higher-tech, more complex equipment has a higher number of failure modes. Without appropriate maintenance, it will be down often.
At the same time, the number of inexpensive sensors all feeding a centralized database represents an opportunity for better decision making.

Companies such as IBM, Bosch, Microsoft, and Amazon are all developing analytical tools that aim at making sense of these data and presenting actionable insights. What some auto plants put in place in the 1990s is now available to hundreds of thousands of other factories.

However, the reality in China is quite different. Only 2-3% of factories do a good job of this. The vast majority are busy ‘putting out fires’ and haven’t set up a preventive plan.

An engineer described the situation in his US factory in the 1980s, and we often see this nowadays with Chinese manufacturers that invest in high-tech equipment without upgrading maintenance efforts:

All these wonderful machines performed their intended functions, on test, but when they were put into operation in our plants, with our people, they were out of business so much of the time for this and that kind of failure that our overall costs, instead of going down, went up. No one had evaluated the overall probable failure rate and maintenance. As a result, we were continually caught with stoppages and with not enough spare parts, or with none at all; and no provision for alternate production lines.

Source: Out of the Crisis, W. Edwards Deming

Conclusion

China’s industrial sector is in motion. They are upgrading their processes and learning how to work with high-tech automation. They are going very fast. They are also switching to a preventive maintenance approach, but more slowly. We are not sure it will be sufficient to keep up with all the changes.

Some people talk about “Maintenance 4.0”. In our understanding, it is composed of the basic preventive measures + autonomous maintenance (including good 5S) + reliability-centered maintenance (based on extensive and real-time condition monitoring).

Naturally, all this only makes sense for some pieces of equipment – those supporting a claim of reaching “Industry 4.0”. Here at CMC we have seen this in only a few Chinese companies, mainly tier-1 suppliers of auto parts. Let’s see how fast it can spread in other verticals.
