Choosing the right distribution is crucial for using system resources efficiently and building a robust OpenTelemetry pipeline. In this post, we will explore in detail what an OpenTelemetry Collector distribution is, how it relates to the collector's architecture, and which principles to consider when selecting a distribution for your use case. We will also look at the pros and cons of building a custom distribution.

Before looking at how distributions work, let’s revisit the architecture of the OpenTelemetry Collector. Feel free to skip this section if you are already familiar with it.

Architecture of the OpenTelemetry Collector

The primary purpose of the collector is to receive, process, and export telemetry data, such as traces, metrics, and logs (also referred to as signals). The collector achieves this by combining components for:

Receiving data

A collector can receive data from various sources by using the corresponding receiver components. Typical sources include another OpenTelemetry Collector, OpenTelemetry instrumentation in applications, and a Kafka queue.

Processing data

Between receiving and exporting data, a collector can process it using processors. For example, it can enrich data with the attributesprocessor, sample it with the tailsamplingprocessor, and group it into batches with the batchprocessor to optimize performance.
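The following minimal Python sketch makes these processing steps more concrete with three processor-like functions. It is a simplified model, not the code of attributesprocessor, probabilisticsamplerprocessor, or batchprocessor. Spans are plain dicts. The sampler hashes the trace ID with SHA-256 (an illustrative choice, not the real processor's algorithm), so all spans of one trace share the same keep or drop decision. The batcher only flushes by size, whereas the real batchprocessor also flushes on a timeout.

```python
import hashlib
from typing import Dict, Iterable, Iterator, List

# Simplified span model for illustration; the collector uses the OTLP data model.
Span = dict


def enrich(spans: Iterable[Span], attributes: Dict[str, str]) -> Iterator[Span]:
    """Yield copies of the spans with extra attributes merged in."""
    for span in spans:
        merged = {**span.get("attributes", {}), **attributes}
        yield {**span, "attributes": merged}


def sample(spans: Iterable[Span], percentage: float) -> Iterator[Span]:
    """Keep roughly `percentage` percent of traces.

    The decision depends only on the trace ID, so every span of a trace
    is either kept or dropped together.
    """
    for span in spans:
        digest = hashlib.sha256(str(span["trace_id"]).encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
        if bucket < percentage * 100:
            yield span


def batch(items: Iterable[Span], size: int) -> Iterator[List[Span]]:
    """Group items into lists of at most `size` elements (size-based only)."""
    current: List[Span] = []
    for item in items:
        current.append(item)
        if len(current) == size:
            yield current
            current = []
    if current:
        yield current


if __name__ == "__main__":
    # Ten spans spread over five traces (two spans per trace).
    spans = [{"trace_id": f"trace-{i // 2}", "name": f"op-{i}"} for i in range(10)]
    enriched = enrich(spans, {"k8s.namespace.name": "demo"})  # hypothetical value
    for group in batch(sample(enriched, percentage=50.0), size=3):
        print([span["name"] for span in group])
```

Because the functions are generators, they can be chained in any order, which mirrors how processors are listed in sequence inside a pipeline.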

Exporting data

Just as with receiving, an OpenTelemetry Collector can use various exporter components to send data to multiple destinations, such as another collector, an observability backend, or Kafka.
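Receiving, processing, and exporting together form a chain. The Python sketch below models that chain with plain functions. The run_pipeline helper, the record format, and the in-memory destinations are hypothetical stand-ins for illustration, not collector APIs. Like a real collector pipeline, the sketch accepts several receivers (fan-in) and sends the processed data to several exporters (fan-out).

```python
from typing import Callable, Iterable, List

Record = dict
Receiver = Callable[[], Iterable[Record]]
Processor = Callable[[List[Record]], List[Record]]
Exporter = Callable[[List[Record]], None]


def run_pipeline(receivers: List[Receiver], processors: List[Processor],
                 exporters: List[Exporter]) -> None:
    """Collect from all receivers, apply processors in order, send to every exporter."""
    records: List[Record] = [record for receive in receivers for record in receive()]
    for process in processors:
        records = process(records)
    for export in exporters:
        export(list(records))  # each exporter gets its own copy of the list


def add_environment(records: List[Record]) -> List[Record]:
    """Hypothetical enrichment step: tag every record with an environment name."""
    return [{**record, "env": "staging"} for record in records]


if __name__ == "__main__":
    backend: List[Record] = []  # in-memory stand-in for an observability backend
    debug: List[Record] = []    # second destination to show fan-out
    run_pipeline(
        receivers=[lambda: [{"signal": "trace", "name": "GET /cart"}]],
        processors=[add_environment],
        exporters=[backend.extend, debug.extend],
    )
    print(backend)
    # [{'signal': 'trace', 'name': 'GET /cart', 'env': 'staging'}]
```

Each stage only depends on the shape of the data, not on the other stages. This is why the same processor can be reused in several pipelines, as long as its component is part of the build.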

Additional purposes

A collector can also use two other kinds of component:

- connectors, which join multiple pipelines;
- extensions, which provide features not directly related to data processing, such as health checks.

We will not discuss them in this post, but they are included in a distribution in the same way as other components.

You can configure pipelines in the OpenTelemetry Collector by combining multiple receivers, processors, and exporters, as shown in the diagram above. Within a single collector, you can use any number of components to build multiple pipelines, as long as these components are part of the collector build.

For example, you can create a pipeline inside the OpenTelemetry Collector that:

Receives traces from applications over the OTLP protocol using otlpreceiver

Enriches them with attributes from the Kubernetes cluster using k8sattributesprocessor

Samples them based on probability using probabilisticsamplerprocessor

Combines them into batches using batchprocessor

Exports the results to an Elastic backend using elasticsearchexporter

In the same collector, you can create another pipeline that:

Reads Kubernetes logs using k8slogreceiver

Enriches them with attributes from the Kubernetes cluster using k8sattributesprocessor

Samples them based on probability using probabilisticsamplerprocessor

Combines them into batches using batchprocessor

Exports the results to an Elastic backend using elasticsearchexporter

In this case, the collector build must include otlpreceiver, k8slogreceiver, k8sattributesprocessor, probabilisticsamplerprocessor, batchprocessor, and elasticsearchexporter. To keep overhead minimal, it should include as few other components as possible.
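The following sketch derives the required component set for the two example pipelines above. It is written as a small Python model, not as a real collector configuration. Components shared by both pipelines are counted only once. Names follow the module names used in the text, with the sampler spelled probabilisticsamplerprocessor as in the contrib repository. A real collector YAML config refers to components by shorter type names instead.

```python
from typing import Dict, List, Set

# The two example pipelines, keyed by pipeline name (illustrative model only).
PIPELINES: Dict[str, Dict[str, List[str]]] = {
    "traces": {
        "receivers": ["otlpreceiver"],
        "processors": ["k8sattributesprocessor", "probabilisticsamplerprocessor", "batchprocessor"],
        "exporters": ["elasticsearchexporter"],
    },
    "logs": {
        "receivers": ["k8slogreceiver"],
        "processors": ["k8sattributesprocessor", "probabilisticsamplerprocessor", "batchprocessor"],
        "exporters": ["elasticsearchexporter"],
    },
}


def required_components(pipelines: Dict[str, Dict[str, List[str]]]) -> Set[str]:
    """Union of every component referenced by any pipeline."""
    return {
        component
        for pipeline in pipelines.values()
        for kind in ("receivers", "processors", "exporters")
        for component in pipeline.get(kind, [])
    }


if __name__ == "__main__":
    print(sorted(required_components(PIPELINES)))
    # ['batchprocessor', 'elasticsearchexporter', 'k8sattributesprocessor',
    #  'k8slogreceiver', 'otlpreceiver', 'probabilisticsamplerprocessor']
```

Ten component references across the two pipelines collapse into six distinct components, and those six are what the build has to contain.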

The OpenTelemetry community and third-party vendors maintain a wide range of components. A particular set of them, assembled into a collector build, forms a distribution.

What is a distribution?

The OpenTelemetry community defines a distribution as:

A distribution is a customized version of an OpenTelemetry component. A distribution is a wrapper around an upstream OpenTelemetry repository with some customizations.

Put simply, a distribution is a customized version of the OpenTelemetry Collector, which may include:

a custom set of components,
custom default settings,
additional tests,
performance tunings,
and a few other specifics.

The central part of a distribution is its set of included components. Common components are typically located in the opentelemetry-collector-contrib and opentelemetry-collector repositories, while vendor-specific ones may reside in third-party repositories. A distribution pulls the components it needs from any of these repositories, as demonstrated in the diagram below.

More components in a distribution means more available features. However, including unneeded components also increases the binary size, which leads to higher resource consumption and increased security risks. Therefore, an ideal distribution includes as few unneeded components as possible.
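This selection principle can be sketched in Python: among the candidate distributions that contain every required component, prefer the one with the fewest extras. The distribution names and contents below are hypothetical. The component count is also only a rough proxy for binary size and attack surface, because components differ widely in size.

```python
from typing import Dict, Optional, Set


def extra_components(distribution: Set[str], required: Set[str]) -> Set[str]:
    """Components a distribution ships that the pipelines never use."""
    return distribution - required


def leanest_fit(candidates: Dict[str, Set[str]], required: Set[str]) -> Optional[str]:
    """Name of the candidate that covers `required` with the fewest extras, or None."""
    fitting = [name for name, components in candidates.items() if required <= components]
    if not fitting:
        return None
    return min(fitting, key=lambda name: len(extra_components(candidates[name], required)))


if __name__ == "__main__":
    required = {"otlpreceiver", "batchprocessor", "otlpexporter"}
    candidates = {
        # Hypothetical distributions, for illustration only.
        "distro-everything": required | {f"extra-component-{i}" for i in range(50)},
        "distro-lean": required | {"debugexporter"},
        "distro-too-small": {"otlpreceiver", "otlpexporter"},
    }
    print(leanest_fit(candidates, required))  # distro-lean
    print(len(extra_components(candidates["distro-everything"], required)))  # 50
```

A None result means that none of the candidates fits, which is the situation where a custom build becomes the natural option.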

When deciding on the OpenTelemetry distribution for your organization, you have a few options:

Use one of the pre-built OpenTelemetry community distributions
Use a third-party distribution maintained by a vendor
Build a custom collector

Let’s examine each option in detail.

OpenTelemetry community distributions

The OpenTelemetry community maintains a few pre-built distributions. This means that the community owns the build pipeline and the version releases.

OpenTelemetry collector contrib distro

This distribution contains all the components from both the opentelemetry-collector-contrib and opentelemetry-collector repositories. It provides a convenient way to explore the full range of features with minimal effort, but it may be too resource-intensive for production. The OpenTelemetry community generally does not recommend the contrib distribution for production use.

OpenTelemetry collector core distribution

This distribution is the ‘classic’ distribution. It includes a compact set of components for working with the OTLP protocol, Kafka, Zipkin, Prometheus, Jaeger, and a few other technologies.

Example of a collector pipeline for which the core distribution may be a fit:

The collector receives metrics and traces via OTLP, enriches them with additional attributes, applies probabilistic sampling, and exports them to Kafka. Additionally, it may export metrics to Prometheus-based metrics storage and traces to Jaeger.

If the collector also needs to collect events from the Kubernetes API, the core distribution won’t fit, because it doesn’t include a receiver for Kubernetes events.
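A quick way to reason about such a fit is to compare the pipeline's components with the distribution's component list and report whatever is missing. The Python sketch below does this. CORE_SUBSET is a partial, illustrative subset based on the description above, not the real manifest, so check the distribution's manifest for the authoritative list.

```python
from typing import Set

# Partial, illustrative subset of the core distribution's components,
# based on the description above. Not the full manifest.
CORE_SUBSET: Set[str] = {
    "otlpreceiver", "kafkareceiver", "zipkinreceiver", "jaegerreceiver", "prometheusreceiver",
    "attributesprocessor", "probabilisticsamplerprocessor", "batchprocessor",
    "otlpexporter", "kafkaexporter", "zipkinexporter", "prometheusremotewriteexporter",
}


def fit_report(distribution: Set[str], required: Set[str]) -> str:
    """Describe whether a distribution contains every required component."""
    missing = sorted(required - distribution)
    return "fits" if not missing else "missing: " + ", ".join(missing)


if __name__ == "__main__":
    pipeline = {"otlpreceiver", "attributesprocessor", "probabilisticsamplerprocessor", "kafkaexporter"}
    print(fit_report(CORE_SUBSET, pipeline))  # fits
    print(fit_report(CORE_SUBSET, pipeline | {"k8seventsreceiver"}))  # missing: k8seventsreceiver
```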

OpenTelemetry Collector eBPF profiling distribution

The goal of this distribution is to collect profiling data about processes running on the system. It contains a limited set of components for collecting Kubernetes metadata and eBPF data, and for exporting them to a file or via OTLP.

Example of a collector pipeline for which the eBPF profiling distribution may be a fit:

The collector runs on a Kubernetes node, collects eBPF data, enriches it with Kubernetes attributes, and sends it via OTLP to another OpenTelemetry Collector. However, you can’t add components for collecting metrics or traces via OTLP to this distribution.

OpenTelemetry Collector Kubernetes distro

This distribution is optimized for Kubernetes. It contains components for collecting data from journald, Kubernetes events, Fluentd, OTLP, and other sources. It also includes a few basic exporters, such as otlpexporter, fileexporter, and loadbalancingexporter. The full list of components is available in the manifest.

Example of a collector pipeline for which the Kubernetes distribution may be a fit:

The collector receives metrics, traces, and logs via OTLP, enriches them with Kubernetes attributes, and applies probabilistic sampling. It also collects Kubernetes events and logs from Fluentd. The collector exports the processed data over OTLP to another OpenTelemetry Collector.

The distribution won’t fit if the collector exports data to Kafka instead of OTLP, or exports to a Zipkin backend.

OpenTelemetry Collector OTLP distro

This is the most minimal distribution. It contains components for receiving and exporting data over the OTLP protocol, which makes it a good option for use as a proxy, for protocol translation, and for a few other scenarios.

Example of a collector pipeline for which the OTLP distribution may be a fit:

The collector runs as a sidecar next to a Kubernetes application, receives data from this application over OTLP, and sends it to another OpenTelemetry Collector.

Vendor distributions

Some vendors maintain and distribute their own versions of the OpenTelemetry Collector. Vendor distributions often offer additional features, such as easier integration with the vendor’s software, simpler configuration, and optimized performance.

For example, the Elastic Distribution of OpenTelemetry (EDOT) Collector includes components from the OpenTelemetry Collector core distribution together with Elastic-specific components. It offers additional features that can be useful for Elastic users.

Other examples include the Datadog distribution and the AWS Distro for OpenTelemetry (ADOT).

Custom build distribution

For the highest level of control and flexibility, consider building a custom collector. With your own build, you have full control over several aspects:

- which components to include;
- which default configuration to use;
- when to upgrade;
- whether to include custom components.

The downside is that you have to build, release, and upgrade the collector yourself.
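A common way to produce a custom build is the OpenTelemetry Collector Builder (ocb), which compiles a collector from a YAML manifest listing Go modules. The Python sketch below renders such a manifest for the traces pipeline described earlier in this post. The dist, name, description, output_path, and gomod keys follow the ocb manifest format. However, check the builder documentation for the keys your ocb version supports. The version string "v0.NNN.0" is a placeholder, and the module paths should be verified against the repositories for the release you target.

```python
from typing import Dict, List, Tuple

CORE = "go.opentelemetry.io/collector"
CONTRIB = "github.com/open-telemetry/opentelemetry-collector-contrib"

# Component name -> (manifest section, Go module path).
MODULES: Dict[str, Tuple[str, str]] = {
    "otlpreceiver": ("receivers", f"{CORE}/receiver/otlpreceiver"),
    "k8sattributesprocessor": ("processors", f"{CONTRIB}/processor/k8sattributesprocessor"),
    "probabilisticsamplerprocessor": ("processors", f"{CONTRIB}/processor/probabilisticsamplerprocessor"),
    "batchprocessor": ("processors", f"{CORE}/processor/batchprocessor"),
    "elasticsearchexporter": ("exporters", f"{CONTRIB}/exporter/elasticsearchexporter"),
}


def builder_manifest(name: str, components: List[str], version: str) -> str:
    """Render an ocb-style manifest listing only the requested components."""
    sections: Dict[str, List[str]] = {"receivers": [], "processors": [], "exporters": []}
    for component in components:
        section, module = MODULES[component]  # raises KeyError for unknown components
        sections[section].append(f"  - gomod: {module} {version}")
    lines = [
        "dist:",
        f"  name: {name}",
        "  description: Custom OpenTelemetry Collector build",
        f"  output_path: ./{name}",
    ]
    for section, entries in sections.items():
        if entries:  # omit empty sections
            lines.append(f"{section}:")
            lines.extend(entries)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "v0.NNN.0" is a placeholder: use the collector release you actually target.
    print(builder_manifest("otelcol-custom", list(MODULES), "v0.NNN.0"))
```

You would save the output as, for example, builder-config.yaml and pass it to the builder with `ocb --config builder-config.yaml`. Keeping the manifest under version control makes upgrades explicit: bump the version, rebuild, and release on your own schedule.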

Summary

The table below summarizes the benefits and drawbacks of each option:

Distribution	Benefits	Drawbacks
OpenTelemetry community distributions	✓ Ready-made distributions for several common use cases	✗ The set of components cannot be changed
Third-party (vendor) distributions	✓ Extended support and optimizations for the vendor’s software	✗ Upgrade availability depends on the vendor’s release schedule
Custom distribution	✓ Full control over the component set	✗ You must set up and maintain the build and release pipeline

Conclusion

The community and third-party vendors maintain different OpenTelemetry Collector distributions for various use cases. However, you don’t have to pick a single distribution for every observability pipeline task. Your infrastructure can run multiple distributions simultaneously that together form a solid pipeline. For example, you might use:

- the Kubernetes distribution for collecting Kubernetes-specific information;
- the eBPF profiling distribution for eBPF data;
- the Elastic EDOT distribution for processing data bound for Elastic backends.

If none of the pre-built distributions fit your needs, or if you want maximum flexibility and control, you can build a custom OpenTelemetry Collector distribution.

By understanding these pros and cons, you can select the distribution that best aligns with your infrastructure's needs.
