Best Practices for Enriching Network Telemetry to Support Network Observability

Phil Gervasi, Director of Tech Evangelism

February 1, 2023

Table of Contents

Identifying Key Performance Indicators (KPIs) in networking
Identifying and collecting data from auxiliary data sources
Logs and events
Endpoint telemetry
Application-level telemetry
Leveraging AI and ML to gain insights
Key considerations for managing and storing enriched telemetry data
Conclusion
Network Telemetry FAQs
What is network telemetry?
What is meant by streaming network telemetry?
How does network telemetry improve network security?
What are some common network telemetry data sources?

Summary

Collecting and enriching network telemetry data with DevOps observability data is key to ensuring organizational success. Read on to learn how to identify the right KPIs, collect vital data, and achieve critical goals.

Network observability is critical. You need the ability to answer any question about your network—across clouds, on-prem, edge locations, and user devices—quickly and easily.

But network observability is not always easy. To be successful, you need to collect network telemetry, and that telemetry needs to be extensive and diverse. Once you have that raw telemetry data, you need to interpret it. Even then, key questions, such as "Am I using my network resources effectively?", are not always easy to answer.

To answer the business-level questions that can move the needle, you need to enrich your network telemetry. This post will provide concrete guidance on how to do just that. We’ll look at how, by combining DevOps observability data with network telemetry, you can get strong, network-focused observability. Let’s begin with a discussion of KPIs.

Identifying Key Performance Indicators (KPIs) in networking

The first step toward comprehensive network observability is identifying your key performance indicators. Here are some examples of network-related KPIs:

Network latency
Packet loss
Throughput
Connections per second
Bandwidth utilization

Note that these KPIs can be aggregated at different levels of the hierarchy—individual endpoints or instances, multi-instance services, entire data centers, across regions, and globally.
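To make the idea of hierarchy levels concrete, here is a minimal Python sketch that rolls one KPI (latency) up from individual instances to data centers, regions, and a global figure. The sample records, the level names, and the aggregate_latency helper are all hypothetical, and a real platform would likely use percentiles rather than a plain mean.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical latency samples: (region, data_center, instance, latency_ms)
SAMPLES = [
    ("us-east", "dc1", "web-1", 12.0),
    ("us-east", "dc1", "web-2", 18.0),
    ("us-east", "dc2", "web-3", 30.0),
    ("eu-west", "dc3", "web-4", 40.0),
]

# How many leading fields of the path identify a group at each level.
LEVELS = {"global": 0, "region": 1, "data_center": 2, "instance": 3}


def aggregate_latency(samples, level):
    """Return the mean latency of the raw samples, grouped at the given level."""
    depth = LEVELS[level]
    groups = defaultdict(list)
    for *path, latency_ms in samples:
        groups[tuple(path[:depth])].append(latency_ms)
    return {key: mean(values) for key, values in groups.items()}


for level in LEVELS:
    print(level, aggregate_latency(SAMPLES, level))
```

Note that each level is computed from the raw samples rather than by averaging the averages of the level below, which would weight small groups too heavily.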

After identifying and categorizing the relevant KPIs, you need to gather data about these KPIs. Network monitoring tools use various techniques for data gathering, including polling, collecting metrics from network devices, and scraping traffic logs.

In the cloud, you can ingest network telemetry data from cloud providers into your network observability platform. In your own data centers, you will need to choose, install, and configure network monitoring tools.

Identifying and collecting data from auxiliary data sources

The next step is to collect the auxiliary data that will be used to enrich the network telemetry data. Let’s cover the different major types of auxiliary data.

Logs and events

Log files from network devices, servers, and applications can contain information relevant to your network observability KPIs. The basic process looks like this:

Extract relevant information.
Correlate that information with network telemetry using timestamps, shared tags, and geographical locations.
Ingest the auxiliary data into your network observability platform.
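The first two steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the log line format, the field names, and the 30-second correlation window are all hypothetical, and the final ingestion step is left out because it depends entirely on your observability platform.

```python
import re
from datetime import datetime, timedelta

# Hypothetical syslog-style format: "<ISO timestamp> <device> <SEVERITY> <message>".
# Real devices vary widely, so this pattern is an assumption for illustration.
LOG_PATTERN = re.compile(
    r"^(?P<ts>\S+) (?P<device>\S+) (?P<severity>[A-Z]+) (?P<message>.*)$"
)


def extract(line):
    """Step 1: pull timestamp, device, severity, and message from a raw log line."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["ts"] = datetime.fromisoformat(event["ts"])
    return event


def correlate(record, events, window=timedelta(seconds=30)):
    """Step 2: attach events that share the device tag and fall within the time window."""
    related = [
        e for e in events
        if e["device"] == record["device"] and abs(e["ts"] - record["ts"]) <= window
    ]
    return {**record, "log_events": related}


lines = [
    "2023-02-01T10:00:05 edge-rtr-1 WARN interface ge-0/0/1 flapped",
    "2023-02-01T10:05:00 edge-rtr-1 INFO config saved",
    "2023-02-01T10:00:10 core-sw-2 ERROR fan failure",
]
events = [e for e in map(extract, lines) if e is not None]

# A hypothetical network telemetry record showing elevated latency.
record = {"ts": datetime(2023, 2, 1, 10, 0, 20), "device": "edge-rtr-1", "latency_ms": 180}
enriched = correlate(record, events)
print(enriched["log_events"])
```

Here only the interface flap is attached to the latency record: the other two events either fall outside the time window or come from a different device.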

Events, such as alerts generated by network devices, can also be ingested into the observability platform, potentially triggering a higher-level alert.

Endpoint telemetry

Endpoint telemetry refers to data collected from devices that are connected to the network, such as laptops, tablets, and smartphones. This data may include performance metrics and resource usage of the devices, as well as the applications and services running on them. This endpoint telemetry data, too, can be used to enrich network telemetry.

For example, if you see a spike in CPU usage on endpoint devices, this might indicate an issue on the network, causing the devices to work harder than usual.

As another example, let’s assume you see an increase in network latency. As part of your investigation into the issue, you can use endpoint telemetry data to see if there are changes in network access patterns on endpoint devices.

Application-level telemetry

Application-level telemetry refers to data collected from the applications and services running on the network, such as web servers, databases, and custom business applications. This data includes performance, errors, and resource usage by these applications and services.

Imagine that your monitoring of application-level telemetry shows a spike in response times. This might indicate an issue on the network that is causing the application to wait longer for network responses. Application-level telemetry can help you determine if your network is having problems. When properly correlated with network telemetry, it can even help you with root cause analysis.
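One simple way to check whether application response times move together with a network metric is a Pearson correlation over time-aligned samples. The sketch below assumes both series were already sampled at the same timestamps, and the values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical, time-aligned samples (one value per minute).
app_response_ms = [100, 110, 105, 300, 320, 115]
network_rtt_ms = [10, 12, 11, 90, 95, 12]

r = pearson(app_response_ms, network_rtt_ms)
print(f"correlation between response time and RTT: {r:.2f}")
```

A coefficient near 1 suggests the network is worth investigating first, but correlation alone does not establish root cause. It narrows the search, which traces and device-level telemetry can then confirm.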

When considering observability at the application level, take advantage of distributed tracing, making sure to use it comprehensively. This can be especially helpful for enriching network telemetry if your system is based on a microservice architecture.

Leveraging AI and ML to gain insights

Your network observability platform should have dashboards and visualizations for humans to understand overall network health and performance. However, at scale, humans alone can’t detect and respond to issues fast enough.

Machine learning (when implemented and trained properly) excels at digesting high-dimensionality data like enriched network telemetry. It can identify trends, predict future outcomes, and discover anomalies. These are network observability insights that even keen-eyed human operators would be unable to spot.
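As a rough intuition for what anomaly detection does, consider the sketch below. It is a deliberately simple statistical stand-in (a trailing-window z-score), not a trained ML model, and the window size, threshold, and latency values are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value is more than `threshold` standard
    deviations away from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: z-score is undefined, so skip this point
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical latency samples in milliseconds, with one obvious spike.
latency_ms = [20, 21, 19, 20, 22, 21, 20, 95, 21, 20]
print(find_anomalies(latency_ms))
```

Production systems typically go well beyond this, for example by accounting for seasonality and by looking at many dimensions of the enriched data at once, which is where ML-based approaches earn their keep.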

In addition, AI/ML-backed tools can be used to summarize and consolidate complex data to make it digestible by humans. As they help human operators understand the state of the network, these tools can also recommend courses of action during incidents.

Key considerations for managing and storing enriched telemetry data

Now, let’s look at a few key considerations for managing and storing all this enriched telemetry data.

First, when collecting the data, accounting for user privacy is imperative. You need to be aware of the types of data you feed into your network observability platform and ensure you comply with all relevant laws and regulations.

Next, observability doesn’t come cheap. It is easy to collect a lot of data, but you must consider the cost of collection, storage, and analysis and weigh that against the value that you derive from your data. For example, do you need to capture and analyze every network packet, or is it sufficient to analyze only 10% of the packets? Do you need to store your flow logs forever, or can you purge them after two years?
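To reason about these trade-offs, it can help to put rough numbers on them. The following sketch shows two back-of-the-envelope calculations: estimating total traffic volume from a 10% random sample of packets, and estimating storage for a given retention period. Every figure here (packet sizes, record counts, bytes per record) is hypothetical.

```python
import random


def estimate_total_bytes(packet_sizes, sample_fraction, rng):
    """Estimate total bytes by sampling a fraction of packets at random
    and scaling the sampled byte count back up to the full population."""
    sample_count = max(1, int(len(packet_sizes) * sample_fraction))
    sampled = rng.sample(packet_sizes, sample_count)
    return sum(sampled) * len(packet_sizes) / sample_count


def storage_gb(records_per_day, bytes_per_record, retention_days):
    """Rough storage footprint in gigabytes for a retention period."""
    return records_per_day * bytes_per_record * retention_days / 1e9


rng = random.Random(42)  # fixed seed so the illustration is repeatable
packet_sizes = [rng.choice([64, 576, 1500]) for _ in range(10_000)]

actual = sum(packet_sizes)
estimate = estimate_total_bytes(packet_sizes, 0.10, rng)
relative_error = abs(estimate - actual) / actual
print(f"actual={actual} estimate={estimate:.0f} error={relative_error:.1%}")

# Two years of retention at a hypothetical 1M flow records/day, 100 bytes each.
print(storage_gb(1_000_000, 100, 730), "GB")
```

For aggregate questions such as total volume, a sample is often close enough. For questions about rare events, such as a single malicious flow, sampling can miss exactly what you are looking for, so the right rate depends on which questions you need to answer.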

Finally, the value of enriching network telemetry is clear. However, an organization must manage and store all that data appropriately in order to reap the benefits. This is where a network observability platform like Kentik comes in. You need a solid platform that follows industry best practices, integrates with all the standard tools and network providers, and offers a turnkey (yet customizable) solution.

Conclusion

Let’s recap. Network telemetry is fundamental to network observability, but it can be much more useful if you enrich it with data from auxiliary sources such as logs, events, endpoint device telemetry, and application-level telemetry. Once you identify your network observability KPIs, you can collect all the relevant data and feed it into your network observability platform.

Meanwhile, you should leverage AI/ML-backed tools to understand your network, detect problems early, and provide predictive analysis.

Ongoing analysis of network telemetry data is crucial for maintaining network health and performance because your network, and the traffic attempting to pass through it, is dynamic and constantly changing. Real-time analysis of the current state and behavior of the network can help network administrators (and automation safeguards) identify issues and take proactive measures to resolve them. Enriching that telemetry can make your network observability significantly more effective.

To take advantage of enriched network telemetry and realize the goal of true network observability, you need a robust network observability platform like Kentik.

Network Telemetry FAQs

What is network telemetry?

Network telemetry, a fundamental concept for network observability, is the process of collecting, processing, and interpreting data from various network devices and components to monitor their performance, behavior, and status in real time.

In the broadest sense, network telemetry encompasses a vast array of measurements that cover everything from latency and bandwidth utilization to packet loss and connections per second. This data is usually obtained by active monitoring methods like synthetic tests or passive monitoring methods such as packet capture or flow data.

Once this data is collected, it is transmitted from the network’s edge to a central location for storage, analysis, and visualization. This telemetry data provides network administrators with valuable insights into the health and performance of the network, enabling them to detect and troubleshoot issues before they impact end-users or escalate into larger problems.

What sets network telemetry apart from network performance metrics is its focus on depth, detail, and dynamism. It’s not just about capturing basic metrics or sporadic snapshots of network activity. Instead, network telemetry strives to collect comprehensive, granular, and real-time data across all areas of the network, providing a more accurate and timely picture of network behavior and performance.

In an era where networks are increasingly complex, distributed, and vital to business operations, network telemetry has become an indispensable tool for maintaining network reliability, efficiency, and security. By enriching network telemetry with auxiliary data sources such as logs, events, endpoint device telemetry, and application-level telemetry, network operators can enhance their network observability and manage their networks more effectively.

What is meant by streaming network telemetry?

Streaming network telemetry refers to the continuous, real-time transmission of network data from network devices to a centralized system or platform for analysis and visualization. Unlike traditional methods, which involve periodic polling or snapshotting of network status, streaming telemetry gives network administrators a more dynamic, up-to-the-minute view of the network’s health and performance.
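The difference between polling and streaming can be illustrated with a toy simulation. The Python sketch below is not a real telemetry protocol. The device_readings generator, the 60-second polling interval, and the utilization values are all hypothetical, chosen to show how a short spike can fall between two polls.

```python
def device_readings():
    """Hypothetical per-second interface utilization (%) with a 5-second spike."""
    for second in range(120):
        yield second, (95.0 if 70 <= second < 75 else 30.0)


def poll(readings, interval):
    """Periodic polling: only the value at each polling instant is observed."""
    return [(t, value) for t, value in readings if t % interval == 0]


def stream(readings, on_update):
    """Streaming: every reading is pushed to the subscriber as it is produced."""
    for t, value in readings:
        on_update(t, value)


polled = poll(device_readings(), interval=60)

spike_seconds = []


def record_spike(t, value):
    # Subscriber callback: remember every second where utilization exceeds 90%.
    if value > 90:
        spike_seconds.append(t)


stream(device_readings(), record_spike)

print("polled view:  ", polled)
print("streamed view:", spike_seconds)
```

In this simulation the polled view never sees utilization above 30%, while the streamed view captures all five seconds of the spike.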

In the context of network observability, streaming telemetry provides more granular insights into network behaviors, helping to identify patterns, predict potential issues, and initiate quick troubleshooting. With streaming network telemetry, network anomalies can be detected and addressed as they happen, reducing the risk of network downtime or performance degradation.

The increased visibility and real-time nature of streaming network telemetry make it a crucial component of modern, robust network observability strategies, especially in complex, distributed network environments. By leveraging this technology, network administrators are better equipped to ensure network reliability, performance, and security.

How does network telemetry improve network security?

Network telemetry data can provide in-depth visibility into network traffic, which is crucial for detecting security threats. Detailed telemetry data can help identify patterns or anomalies that might indicate a security breach or cyberattack. By using telemetry data, security teams can respond quickly to threats, isolate affected systems, and prevent further damage.

What are some common network telemetry data sources?

Network telemetry data sources span a wide array of network types and elements. Common sources include:

Cloud infrastructure: Service meshes, transit gateways, and ingress gateways specific to cloud environments.
Data center: Leaf and spine switches, top-of-rack switches, and API gateways for digital services.
Internet and broadband infrastructure: Includes access and transit networks, edge and exchange points, and Content Delivery Networks (CDNs).
4G, 5G networks: Components like evolved packet core (v)EPC, Multi-access edge computing (MEC), optical transport switches (ONT/OLT), and Radio Access Network (RAN).
IoT: IoT endpoints, gateways, and industrial switches.
Campus network: Ethernet switches, layer 2 and 3 switches, hubs, network extenders, wireless access points and controllers.
Traditional WAN: WAN access switches, integrated services routers, and cloud access routers.
SD-WAN: Access gateways, uCPE, vCPE, and composed SD-WAN services including their cloud overlays.
Service provider backbone: Edge and core routers, transport switches, optical switches, and Data Center Interconnects.
MSO (Multiple System Operators): Cable Access Platforms (CAP), CMTS, Optical Distribution Network (ODN), and Broadband Network Gateway (BNG).

Additionally, there are observation points that could generate telemetry data:

Network devices: Physical and virtual routers, switches, wireless access points, application delivery controllers, and other network elements.
Endpoints: Client and server/service endpoints, including physical, virtual, and overlay/tunnel interfaces.
Controllers: Software-defined network controllers, orchestrators, and path computation applications.
Network TAPs, SPAN ports, and Network Packet Brokers (NPBs): These provide access to network traffic for monitoring and analysis.
L4-7 network elements: These include web appliances, content delivery networks, and application delivery controllers.
Firewalls and security appliances/services: These act as gateways, enforce policies, and generate telemetry data.
Application layer: Elements like Application Delivery Controllers (ADCs), load balancers, and service meshes.

In terms of telemetry protocols, these devices may use standardized formats such as NetFlow, IPFIX, or VPC Flow Logs, amongst others.

In modern networks, visibility across all these data sources is crucial for comprehensive network observability. However, due to the diversity of devices from multiple vendors, achieving unified visibility can be a challenge. With a well-designed data platform, it’s possible to iteratively work towards complete coverage, starting with key areas and expanding over time.
