KubeAIOps enables more reliable and proactive operations.

Product 1

Deep learning-based failure proactive processing


Data collected from monitoring or systems is converted into trainable data through Data Receiver & Process. The prepared data is used to create an anomaly model through unsupervised deep learning learning, and anomalies are detected through periodic evaluation.


Infrastructure failures are often caused by a combination of factors, making it difficult to analyze the root cause of a failure with a single metric. Feature engineering enables you to identify and correlate root causes from a combination of factors.


Reads metric data and compares anomaly data to existing ruleset thresholds when anomaly data is found through learning. Analyze if the threshold settings are appropriate and provide guidance for adjustments to the current ruleset thresholds based on the results of the analysis.


Anomalies detected by complex machine learning models can be difficult to explain. To make them easier to understand, we provide Explainable AI. It uses the RIPPER algorithm to extract explainable rulesets for anomalies. The extracted rulesets can be proposed as new rulesets and added to existing monitoring tools to prevent recurring failures in the future.

Product 2

Rule-based failover automation


Alert Hub is an integrator for consolidating alerts from various multi-clusters of a customer and analyzing alert-based anomalies. Alerts generated by each customer’s cluster can be viewed by ruleset or individual resource (Nodes, Pods, PVCs, etc.) by the time they were generated and resolved, enabling correlation analysis with metrics and log data.


The KubeAIOps anomaly detection system uses machine learning (Bayesian Belief Network) and Prometheus alert rules to detect anomalies for selected monitored objects (Nodes, Pods, PVCs, etc.). The anomaly detection engine calculates anomaly probability and automatically generates incident tickets for detected anomalies for anomaly or failure management.


Anomaly detection automatically creates a new incident ticket with basic information and attaches the alert causing the anomaly to the incident. Resolution tasks that address the anomaly are attached to the incident and run automatically or with admin approval in Fault Tolerance Advisor.


When Anomaly Detector creates an incident ticket, Resolution Advisor presents predefined remediation actions on an alert-by-alert basis per monitored target. These remediation actions are registered as attachments to the ticket. You can also pre-configure autorun actions that will run as soon as they are attached to the ticket.


Phone: +1 310 844 7260
400 Continental Blvd 6F El Segundo, CA 90245
4F, 125 Wangsimni-ro, Seongdong-gu, Seoul 04766