Mean Time Between Failures and Mean Time To Repair
MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair) are two important indicators of an application's availability. Despite their importance to process performance, most managers do not make full use of these key performance indicators (KPIs) in their control activities. The following sections explain the differences between the two metrics and how they can be used to improve the efficiency of your company's processes.
What are MTBF and MTTR
MTBF, or Mean Time Between Failures, is the average time that elapses between one failure and the next. It is calculated from the operating time and the number of failures observed in a period.
MTTR, or Mean Time To Repair, is the average time it takes to carry out a repair after a failure occurs. In other words, it is the time spent on the intervention that restores a given process.
How MTBF and MTTR differ from MTTF
Remember that we are dealing with systems, facilities, equipment or processes that can be repaired. For something that cannot be repaired, the correct KPI would be MTTF (Mean Time To Failure). Telling these concepts apart is essential for businesses in all sectors, especially those working with high-availability environments, where failures can cause large losses through lost sales or through loss of confidence in service delivery.
The two formulas
Different concepts call for different formulas. Here is how to calculate MTBF and MTTR:
MTBF
MTBF = total time of correct operation in a period / number of failures
For example: a system should operate correctly for 9 hours. During this period, 4 failures occurred. Adding up the downtime of all failures, we get 60 minutes (1 hour). The MTBF is therefore:
MTBF = (9 - 1) / 4 = 2 hours
This index shows that, on average, a failure occurs after every 2 hours of correct operation, leaving the system unavailable and generating losses for the company. Knowing this value allows you to plan strategies to make failures less frequent.
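The MTBF calculation above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `mtbf_hours` and its parameters are assumptions made for this example, not part of any monitoring product.

```python
def mtbf_hours(scheduled_hours: float, downtime_hours: float, failures: int) -> float:
    """Mean Time Between Failures: correct operating time / number of failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    # Operating time is the scheduled period minus the total downtime.
    operating_hours = scheduled_hours - downtime_hours
    return operating_hours / failures


# Example from the text: 9 scheduled hours, 1 hour of downtime, 4 failures.
print(mtbf_hours(9, 1, 4))  # 2.0
```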
MTTR
MTTR = total downtime caused by system failures / number of failures
Using the same example, the MTTR is:
MTTR = 60 min / 4 failures = 15 minutes
This is the average duration of each outage. The company therefore knows that, on average, the system becomes unavailable for 15 minutes after every 2 hours of operation. Being aware of our limitations is the first step toward eliminating them.
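The same example can be expressed as a short Python sketch for MTTR. The function name `mttr_minutes` and its parameters are hypothetical, chosen only for this illustration.

```python
def mttr_minutes(downtime_minutes: float, failures: int) -> float:
    """Mean Time To Repair: total downtime / number of failures."""
    if failures <= 0:
        raise ValueError("MTTR is undefined without at least one failure")
    return downtime_minutes / failures


# Example from the text: 60 minutes of downtime spread over 4 failures.
print(mttr_minutes(60, 4))  # 15.0
```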
Uptime calculation
The uptime (availability) of a system can be calculated from these two KPIs, using the following formula:
uptime = MTBF/(MTBF + MTTR)
To make this clearer, consider a practical example. Imagine the following situation:
A. How long the system should work: 36 hours
B. How long the system was not working: 24 hours
C. How long the system has been available: 12 hours
D. A total of 4 failures occurred.
uptime = ((A-B)/D) / [((A-B)/D) + (B/D)] = ((36-24)/4) / [((36-24)/4) + (24/4)] = 3 / (3 + 6) = 3 / 9 ≈ 33%
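As a quick check of the arithmetic, the uptime formula can be sketched in Python. The function name `uptime` and its parameters are assumptions made for this example; the inputs are items A, B and D from the scenario above.

```python
def uptime(scheduled_hours: float, downtime_hours: float, failures: int) -> float:
    """Availability as MTBF / (MTBF + MTTR), returned as a fraction of 1."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to compute MTBF and MTTR")
    mtbf = (scheduled_hours - downtime_hours) / failures  # (A - B) / D
    mttr = downtime_hours / failures                      # B / D
    return mtbf / (mtbf + mttr)


# Scenario from the text: A = 36 hours, B = 24 hours, D = 4 failures.
print(f"{uptime(36, 24, 4):.1%}")  # 33.3%
```

Note that the number of failures cancels out of the formula: the result is the same as dividing the available time (C = 12 hours) by the scheduled time (A = 36 hours).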
Benefits in the use of these performance indicators
MTTR and MTBF have been used for more than 60 years as reference points for decision-making. MTBF is a basic measure of a system's reliability, while MTTR indicates how efficient the corrective action on a process is.
If the MTBF increases after preventive maintenance, this indicates a clear improvement in the quality of your processes and, probably, of your final product, which brings greater credibility to your brand and trust in your products. A rising MTBF shows that your maintenance or verification methods are being run well, which makes it a useful guide for support teams.
In the case of MTTR, the goal is the opposite: to reduce it as much as possible and so avoid the productivity lost to system unavailability. A lower mean time to repair indicates that your company responds quickly to problems in its processes, which demonstrates a high degree of efficiency.
As you can see, MTTR and MTBF are two powerful performance indicators that should be used to expand the company's knowledge of its processes and to reduce losses in productivity or in the quality of the products offered.
Have you got any questions on these two indicators? Continue browsing our blog to learn more about technology issues and don’t forget to share this article with your co-workers. To learn more about the availability calculation please read our article about the costs of a downtime.
Software for monitoring MTTR (Mean Time To Repair) and MTBF (Mean Time Between Failures)
To monitor both MTTR and MTBF, you need some kind of infrastructure monitoring solution. From the availability data of the managed environment, it is possible to measure the average time between failures and the average time to repair. All outages raise alerts on the platform, and reports can be generated to measure MTTR and MTBF.
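To give an idea of how such a report could be derived from recorded outages, here is a minimal Python sketch. It does not describe how any particular monitoring product works: the function `summarize_outages`, the record format (start and end timestamps per outage) and the sample data are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def summarize_outages(window_start, window_end, outages):
    """Return (MTBF, MTTR) as timedeltas for one observation window.

    `outages` is a list of (start, end) datetime pairs, assumed to be
    non-overlapping and fully inside the window.
    """
    if not outages:
        raise ValueError("at least one outage is needed")
    downtime = sum((end - start for start, end in outages), timedelta())
    operating = (window_end - window_start) - downtime
    count = len(outages)
    return operating / count, downtime / count


# Hypothetical outage log: a 9-hour window with four 15-minute outages.
window_start = datetime(2024, 1, 1, 8, 0)
window_end = datetime(2024, 1, 1, 17, 0)
outages = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 15)),
    (datetime(2024, 1, 1, 11, 0), datetime(2024, 1, 1, 11, 15)),
    (datetime(2024, 1, 1, 13, 0), datetime(2024, 1, 1, 13, 15)),
    (datetime(2024, 1, 1, 15, 0), datetime(2024, 1, 1, 15, 15)),
]

mtbf, mttr = summarize_outages(window_start, window_end, outages)
print(mtbf, mttr)  # 2:00:00 0:15:00
```

The sample data reproduces the earlier example: 9 scheduled hours, 1 hour of total downtime and 4 failures give an MTBF of 2 hours and an MTTR of 15 minutes.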
As the developers of OpMon, a solution for monitoring IT infrastructure and business processes, we recommend it to customers who want to measure this type of indicator across their entire IT environment. If you are interested, click the button below:
GET TO LEARN ABOUT OPMON AND MONITOR YOUR IT INFRASTRUCTURE