Strategies for Managing High Resource Consumption Risks

What is CPU overload?

It’s not unusual for security solutions to hit a critical CPU threshold. You can push a car past the redline on the RPM gauge, but there is a limit to the number of revolutions per minute an engine can safely endure without harming itself. Unless it is built to handle higher loads, the pistons will eventually find the shortest way to self-destruct.

CPU overload occurs when cyber defense systems are not optimized for the job of protecting the organization from cyber threats, or simply cannot handle the load of continuous traffic analysis. The direct implication can be a severe impact on business applications and operations, and a corresponding drop in cybersecurity efficiency. Overload may also trigger a fail-open mechanism that tries to preserve the remaining system resources. A fail-open approach favors business uptime, allowing all traffic to continue flowing through the network uninspected. No matter how strict your security policy is, opportunity makes the thief when you leave the internal network unguarded: you wouldn’t want to let cybercriminals indirectly breach your internal network or exfiltrate data.
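To make the trade-off concrete, here is a minimal, vendor-neutral sketch of a fail-open decision. The threshold value, function names, and verdict strings are all invented for illustration; real security solutions implement this far more elaborately.

```python
# Illustrative sketch only (not any vendor's actual logic): a simplified
# dispatcher that fails open once CPU load crosses a critical threshold.

CRITICAL_CPU_THRESHOLD = 0.7  # hypothetical value for this example

def handle_packet(packet, cpu_load, inspect):
    """Return a verdict for the packet, skipping inspection under overload."""
    if cpu_load >= CRITICAL_CPU_THRESHOLD:
        # Fail-open: preserve business uptime by letting traffic
        # pass through uninspected.
        return "pass-uninspected"
    return inspect(packet)  # normal path: full inspection

# Under overload, even a packet an inspector would block sails through.
verdict = handle_packet("payload", cpu_load=0.85, inspect=lambda p: "blocked")
```

The point of the sketch: the security verdict depends not only on the traffic, but on the health of the inspection engine itself.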

The butterfly effect 

A benign business operation, such as a weekly backup or a one-time download of business data, can have a non-linear impact on the CPU consumption of the network firewall. This behavior can, in turn, shut down all inspection processes or even cause the security solution to freeze. It can take weeks of investigation to discover that the reason VoIP stopped working is not a sporadic router malfunction but high CPU consumption on the network firewall. The root cause might be an unrelated business operation, a security misconfiguration, or a particular connection that needs to be excluded from vulnerability protection inspection.

Events like these are all too common, and they are disruptive to the organization’s cybersecurity efficiency.

This graph shows the average CPU usage of selected security solutions over a period of 30 days. There are many CPU spikes close to, and well past, the 0.7 mark, which is defined as the critical threshold for triggering a fail-open or fail-close mechanism.
The CPU usage rFFT plot shows high-intensity CPU usage recurring in certain patterns, pointing to potential performance issues. It may indicate that a particular process or application is consuming a significant amount of CPU resources.
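As a rough illustration of the periodicity analysis behind such an rFFT plot, the sketch below computes discrete-Fourier magnitudes of a CPU-usage series in pure Python. The sample data, series length, and the 24-sample cycle are all synthetic, invented for this example:

```python
import math

def dft_magnitudes(samples):
    """Magnitudes of the real DFT of a CPU-usage series (pure-Python sketch)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        mags.append(math.hypot(re, im))
    return mags

# Synthetic series: baseline load plus a load cycle repeating every 24 samples.
samples = [0.4 + 0.3 * math.sin(2 * math.pi * i / 24) for i in range(240)]
mags = dft_magnitudes(samples)

# Skip the DC component (k = 0); the strongest remaining bin reveals the period.
peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
period = len(samples) / peak_bin  # 24.0 samples between peaks
```

A dominant spike in one frequency bin corresponds to a recurring workload, say, a scheduled backup. On real telemetry you would reach for `numpy.fft.rfft` rather than this O(n²) loop.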

Ending the uncertainty

System changes are almost never spontaneous. Rules seldom change because some of the system’s users decide that new ones would be preferable; changes reflect the business’s acute need to respond to erratic behavior or to a constraint. In this case, that means frequent high-resource-consumption incidents that can severely affect business operations and user experience. To end the uncertainty about the potential outcome of this state, you need to put the following in place:

  1. Set up real-time alerts and an anomaly detection mechanism for every security solution’s resource consumption. 
  2. Gather all relevant data, including logs, telemetry, and security configurations, from the relevant security solutions. 
  3. Automate the analysis and correlation of data from all of the sources above to pinpoint the accountable misconfiguration in seconds. 
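The first step above can be sketched very simply. Below is a rolling z-score anomaly detector for CPU samples; the window size, the z-score threshold, and the small floor on the standard deviation (which keeps a spike after a perfectly flat history from being missed) are all assumptions chosen for the example, not recommendations:

```python
from collections import deque

def make_cpu_anomaly_detector(window=30, z_threshold=3.0, std_floor=0.01):
    """Sketch of step 1: flag CPU samples that deviate sharply from recent history."""
    history = deque(maxlen=window)

    def check(sample):
        alert = False
        if len(history) == window:
            mean = sum(history) / window
            var = sum((s - mean) ** 2 for s in history) / window
            std = max(var ** 0.5, std_floor)  # floor avoids division blow-up
            alert = abs(sample - mean) / std > z_threshold
        history.append(sample)
        return alert

    return check

check = make_cpu_anomaly_detector()
readings = [0.30] * 30 + [0.95]        # steady load, then a sudden spike
alerts = [check(r) for r in readings]  # only the final spike raises an alert
```

Steps 2 and 3, collecting and correlating logs, telemetry, and configurations across solutions, are where the real engineering effort lies; this detector only supplies the trigger.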

By scaling up the daily routine of infrastructure security teams, you end the uncertainty and reduce the chance of business disruption or of exposing your organization to cyber risk.

Does a ‘block’ action consume fewer resources than an ‘alert’ action?

When in alert (detect) mode, all traffic is inspected and matched against every enabled vulnerability protection to identify potential threats. Suppose a particular session is malicious. Instead of initiating a proper response and freeing up the relevant resources, the security solution will wastefully continue to consume system resources, completing the matching process for a session already found to be malicious against the rest of the protections (sequentially or in parallel, depending on the vendor) until the very end of the list.
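The cost difference can be illustrated with a toy, vendor-neutral matching loop. The protection list, names, and counts below are fabricated; the only point is the early exit that ‘block’ mode permits:

```python
# Sketch of why 'alert' mode can cost more than 'block': in block mode the
# engine can stop at the first matching protection and respond, while in
# alert mode it keeps matching the same session to the end of the list.

def inspect_session(session, protections, mode):
    """Return matched protection names and how many checks actually ran."""
    matched, checks = [], 0
    for protection in protections:
        checks += 1
        if protection["matches"](session):
            matched.append(protection["name"])
            if mode == "block":
                break  # respond immediately and free the resources
    return matched, checks

# 100 hypothetical protections; only the third one matches this session.
protections = [{"name": f"sig-{i}", "matches": (lambda s, i=i: i == 2)}
               for i in range(100)]
```

Here `inspect_session("evil", protections, "block")` performs 3 checks, while the same call in `"alert"` mode performs all 100, which is the behavior the paragraph above describes.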

Striking a balance between security and business uptime 

Taking a restrictive action like switching vulnerability protections to ‘block’ can have grave consequences for both business uptime and resource consumption. This is one of the main reasons organizations tend to use their security solutions only partially. To improve this, implement a methodology that balances security with business considerations when enabling vulnerability protections: verify zero business impact for every protection in ‘alert’ before you set it to ‘block’. Do this by first ensuring that no security logs are generated for a particular period; no security logs means the protection never matched live traffic, so there should be no business impact once you switch it to ‘block’. Since this is far from a scalable procedure, and it should run every day, every organization needs to automate this tedious routine to maximize its security posture without the risk of causing high CPU usage.
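The promotion check described above can be sketched in a few lines. The log format (timestamp, protection-name pairs), the 30-day quiet period, and the function name are all assumptions made for the example:

```python
from datetime import datetime, timedelta

def safe_to_block(protection_name, security_logs, quiet_days=30, now=None):
    """Sketch of the promotion check: a protection in 'alert' may move to
    'block' only if it generated no security logs for `quiet_days`.
    `security_logs` is assumed to be a list of (timestamp, name) pairs."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=quiet_days)
    return not any(name == protection_name and ts >= cutoff
                   for ts, name in security_logs)

now = datetime(2024, 6, 1)
logs = [(datetime(2024, 5, 20), "sig-A")]  # sig-A fired recently

safe_to_block("sig-A", logs, now=now)  # False: keep it in 'alert'
safe_to_block("sig-B", logs, now=now)  # True: candidate for 'block'
```

Automating exactly this kind of check, across every protection, every day, is what turns the methodology from a tedious manual routine into a scalable one.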

  
