Technology platforms that promote machine learning (ML), automation, and artificial intelligence (AI) capabilities are easy to find. As DevOps became mainstream, similarly named processes, technologies, and IT cultures appeared. Examples are Cloud Ops, Data Ops, Sys Ops, and AI Ops.
Some may be skeptical of the idea of applying machine learning to IT operations to deliver business and IT value. Skepticism is good, but be prepared. I see a lot of opportunity here, and I am confident that AI Ops is one of the DevOps capabilities emerging in 2021.
The IT environment has grown more complex over the past decade. It is a mix of auto-scaling public and private clouds, edge computing infrastructure supporting the Internet of Things (IoT), machine learning experiments using very large databases, new integrations, frequent application deployments, essential legacy systems, and microservices used everywhere. There are also many variables outside of IT's control, such as security incidents, heterogeneous end-user computing configurations, and volatile application usage patterns.
This is a difficult environment for anyone responsible for responding to incidents, solving application problems, performing root cause analysis, diagnosing complex user problems, assessing operational risks, identifying security weaknesses, or estimating computing costs.
AI Ops can help in this area. In my last article, I wrote about how AI Ops can help IT and SRE teams improve application monitoring and address incidents. But I wanted to learn more about how different solutions enable data cleansing, analytics, machine learning, and automation to simplify IT and deliver business benefits.
Six AI Ops solution providers answered my questions. Their answers give you a big picture of what AIOps solves for business and IT, what types of machine learning algorithms are used in their solutions, and how their products support automation.
Devo Offers Real-Time Ops and Security Visibility
Devo's head of IT operations and search Pako Huerta said AIOps helps IT departments stay one step ahead of end-user challenges. "It gives you the insights you need so that operators can pinpoint the exact cause of the problem before end users are affected."
The IT department faces constant pressure. Devo helps you cut through the noise, quickly find the root cause of a problem, and assess the risk. Devo uses various open-source and proprietary ML algorithms, such as time series anomaly detection, and provides an ML workbench to develop and distribute models. Devo's model is stream-based, so it learns continuously and adapts quickly.
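To make the idea of stream-based time series anomaly detection concrete, here is a minimal sketch. It is not Devo's algorithm: it assumes a simple z-score test over a running mean and variance (Welford's online method), and the class name, threshold, and sample latencies are all hypothetical.

```python
import math


class StreamingAnomalyDetector:
    """Flags values that deviate strongly from a running mean.

    Illustrative assumption: a z-score test over running statistics.
    This is not any vendor's actual algorithm.
    """

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # z-score above which a point is flagged
        self.warmup = warmup        # points to observe before flagging anything
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, value):
        """Score the new value against history, then learn from it."""
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford update: the statistics adapt with every point seen,
        # including flagged ones (a real system might exclude those).
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return is_anomaly


detector = StreamingAnomalyDetector()
latencies_ms = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98, 101, 450, 100]
flags = [detector.update(v) for v in latencies_ms]
print([v for v, f in zip(latencies_ms, flags) if f])  # [450]
```

Because the detector keeps only a count, a mean, and a sum of squares, it can process an unbounded stream in constant memory, which is the property that lets a stream-based model keep learning as new data arrives.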
Micro Focus focuses on identifying and solving IT operational problems
Michael Prokopio, product marketing manager for AI Ops at Micro Focus, said that full-stack AI Ops helps IT departments look through data sets too vast for humans to handle in order to find and fix problems. Machine learning can reduce hundreds of alerts or millions of log files to a handful of suspicious items that humans can easily handle. Data reduction makes finding problems faster, and automation is key to faster problem resolution. "When we can connect the two to provide a seek-and-solve solution that requires little human intervention, it is called full-stack AI Ops."
Micro Focus's AIOps solution includes Operations Bridge. Operations Bridge collects all events, metrics, and logs, including system patch level and compliance data, from over 200 third-party tools and technologies. The service map, topology, and dependency data are then correlated to build an accurate business service model.
The platform uses unsupervised ML, including clustering, regression, inferential statistics, custom logic, and seasonality algorithms. Operator feedback also improves the system's accuracy and guides future actions.
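The data reduction described here can be illustrated with a small sketch. It assumes a very simple technique, collapsing the variable parts of each message into placeholders and counting the resulting templates, which is far cruder than what a commercial platform would do. The function names and sample messages are hypothetical.

```python
import re
from collections import Counter


def template_of(message):
    """Collapse variable parts of a log line into placeholders."""
    message = re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", "<ip>", message)  # IPv4 addresses
    message = re.sub(r"\b\d+\b", "<n>", message)                       # standalone numbers
    return message


def reduce_alerts(messages, top=3):
    """Return the most frequent message templates with their counts."""
    return Counter(template_of(m) for m in messages).most_common(top)


messages = [
    "Connection timeout to 10.0.0.5 after 30 s",
    "Connection timeout to 10.0.0.9 after 30 s",
    "Disk usage at 91 percent on volume 2",
    "Connection timeout to 10.0.0.7 after 45 s",
    "Disk usage at 95 percent on volume 2",
]
for template, count in reduce_alerts(messages):
    print(count, template)
# 3 Connection timeout to <ip> after <n> s
# 2 Disk usage at <n> percent on volume <n>
```

Five raw messages become two items for a human to review. The same grouping step scales to millions of lines because it needs only one pass and one counter.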
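Seasonality matters because the same reading can be normal at one time of day and alarming at another. The sketch below is an assumption for illustration, not Micro Focus's method: it builds a median baseline per hour of day and flags values that stray too far from it. The function names, the 50 percent tolerance, and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import median


def hourly_baseline(samples):
    """Build a median baseline per hour of day from (hour, value) samples."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {hour: median(values) for hour, values in by_hour.items()}


def is_unusual(hour, value, baseline, tolerance=0.5):
    """True if value differs from that hour's baseline by more than
    `tolerance`, expressed as a fraction of the baseline."""
    expected = baseline[hour]
    return abs(value - expected) > tolerance * expected


# Requests per minute observed at 03:00 and 14:00 on previous days.
history = [(3, 20), (3, 22), (3, 18), (14, 200), (14, 210), (14, 190)]
baseline = hourly_baseline(history)

print(is_unusual(14, 210, baseline))  # False: typical afternoon load
print(is_unusual(3, 210, baseline))   # True: the same load at 3 a.m. stands out
```

A fixed global threshold would either miss the 3 a.m. spike or fire every afternoon; a per-period baseline avoids both.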
Moogsoft strengthens cognitive capabilities of IT operations
Moogsoft CTO Will Capelli emphasized that AI is necessary for IT operations to keep pace with the rapid change driven by DevOps. “Modern IT systems are complex to operate, and with CI/CD (continuous integration and continuous delivery), their components and connection topology are constantly changing. AI is needed to predict problems and disruptions using self-descriptive data such as logs, event records, and metrics generated by modern IT systems, and to respond to the problems revealed in the signals that AI interprets,” he said.
Moogsoft's AI performs several functions in sequence. First, it selects the high-information data from the noisy data aggregated from log files and other operational systems. Then, it finds correlation patterns in that high-information data and determines which correlations are causal relationships. Finally, it assists in executing automated responses.
Moogsoft claims that AI Ops can have a direct impact on revenue and brand reputation. When intelligent responses are automated, the mean time to recovery (MTTR) of incidents that affect customers and employees is shortened.
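For readers who want to track MTTR themselves, the calculation is the average time between detecting an incident and resolving it. A minimal sketch follows, in which the function name and the incident timestamps are made up for illustration:

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average of (resolved - detected) across incidents, as a timedelta."""
    durations = [resolved - detected for detected, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


# Hypothetical (detected, resolved) pairs.
incidents = [
    (datetime(2021, 3, 1, 9, 0), datetime(2021, 3, 1, 9, 45)),     # 45 min
    (datetime(2021, 3, 4, 14, 10), datetime(2021, 3, 4, 14, 25)),  # 15 min
    (datetime(2021, 3, 9, 22, 0), datetime(2021, 3, 9, 23, 0)),    # 60 min
]
print(mean_time_to_recovery(incidents))  # 0:40:00
```

Comparing this figure before and after introducing automated responses is one straightforward way to check a vendor's claim against your own incident history.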
OpsRamp helps achieve IT service level goals
Neil Pearson, senior product manager for event management and automation at OpsRamp, said AI Ops automation helps IT departments improve their capabilities, which is a business benefit. “AI Ops is the application of a variety of AI technologies, including ML, deep learning, and robotic process automation (RPA), to automate complex and manual repetitive tasks,” Pearson said. “In general, it ingests large amounts of data in different formats from different sources for this purpose. OpsRamp focuses on predicting and preventing repetitive alerts and incidents, from anomaly detection and early detection through to resolution. The key is to measurably improve people's ability to work and help companies improve their business.”
OpsRamp finds the root cause of a problem by ingesting large data sets from multiple sources such as metrics, logs, network packets, and traces. It is like finding a needle in a haystack. Deep learning and natural language processing algorithms are used to remove noise, recommend solutions to problems, and support operations by preventing recurrence. With OpsRamp, IT departments can design automated response policies that reduce manual intervention and prioritize issues based on business impact.
Resolve promotes agile autonomous IT operations
Resolve CEO Vijay Kerkal believes “self-healing IT” can be a reality, bridging the gap between problem and solution using AI and automation. “AI Ops tools quickly identify existing or potential performance problems, detect anomalies, find the root cause of problems, predict future problems, and develop proactive solutions before the business is impacted,” Kerkal said. “Combining the insights gained from AI with automation allows organizations to maximize the value and potential of these technologies and seamlessly connect discovery, analysis, detection, prediction, and automation to bring them closer to self-healing IT.”
Resolve can also automatically discover applications and infrastructure, generate detailed topology maps, and identify dependencies between critical business applications and the underlying infrastructure. Understanding these relationships simplifies both problem-solving and overall IT management through a single pane of glass that provides a clear view of a complex, multi-domain environment. By pushing this data to a CMDB (configuration management database) in near real time, you can maintain accurate inventory information and build a solid ITSM foundation.
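To see why dependency data makes problem-solving easier, consider a minimal sketch that walks a dependency map to find everything affected by one failing component. This is not Resolve's implementation. The topology, component names, and function name are hypothetical, and a real discovery tool would build the map automatically.

```python
from collections import deque

# Hypothetical topology: each key depends on the components in its list.
DEPENDS_ON = {
    "checkout-app": ["payments-api", "web-lb"],
    "payments-api": ["orders-db"],
    "reporting-app": ["orders-db"],
    "intranet-wiki": ["wiki-vm"],
}


def impacted_by(failed, depends_on):
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a component to its dependents.
    dependents = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(component)

    # Breadth-first walk upward from the failed component.
    impacted, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted


print(sorted(impacted_by("orders-db", DEPENDS_ON)))
# ['checkout-app', 'payments-api', 'reporting-app']
```

With the map in place, a database alert can be translated immediately into the list of business applications at risk, which is the link between infrastructure events and business impact.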
Resolve Insights utilizes a number of ML algorithms, including anomaly detection, event pattern identification, and prediction algorithms. The goal is to improve the overall customer and employee experience by improving the performance of core apps and infrastructure, maximizing uptime, and providing insights to use for optimization work.
Splunk Helps IT Department Manage Complex Operating Environments
Andy Mann, Splunk's chief technical advisor, is a DevOps leader and author of books on innovation and IT operations. Mann advised that IT departments should move from a legacy operating model designed to support monolithic applications to an operating model that embraces data-driven automation and focuses on how to deliver services.
“As modern approaches accelerate technology adoption and interaction in the global 24/7 electronic marketplace, the complexity of modern systems has become too high for humans to manage effectively, and 'traditional' IT operations techniques designed for legacy monoliths cannot keep pace with this trend. A data-driven approach with AIOps, combining advanced algorithmic processing, machine learning, artificial intelligence, response automation, and workflow orchestration, allows service delivery teams to cope with new levels of complexity. Splunk addresses these challenges with AI Ops, providing a data-driven approach to IT Ops, observability, and security to ensure the performance, availability, functionality, stability, and impact that business and customers demand.”
Splunk uses a “white box” approach to machine learning. It comes pre-loaded with 30 algorithms for anomaly detection, classification, clustering, cross-validation, feature extraction, preprocessing, regression, and time series analysis. It also includes more than 300 open source Python algorithms from the scikit-learn, pandas, statsmodels, NumPy, and SciPy libraries.
AI Ops can be of great help to any IT team
I recall the old days of working with IT operations teams to maintain the high availability and performance of web applications. We deployed system and application monitors when customers and employees reported problems. We created playbooks and standard operating procedures for resolving recurring incident types. Where possible, I wrote scripts to restart the web server, clean up database tablespaces, and find and archive old files on the primary storage system.
However, given today's scale, complexity, and service expectations, IT must speed up this process. This is where AI Ops solutions come in. An AI Ops platform provides a framework for centralizing and organizing operational data and using machine learning to find problems and automate their resolution. The end goal is to provide a better experience, reduce toil, free up time for your IT department, and enable it to pursue projects and innovations for the business.