Kubernetes networking is, without question, the most complex aspect of running production clusters. Unlike compute or storage, which map relatively cleanly to familiar concepts, Kubernetes networking introduces a layered model of abstractions — from pod-level connectivity through service discovery, ingress routing, and inter-cluster communication — that can overwhelm even experienced engineers. In our consulting engagements, networking misconfigurations remain the single most common root cause of production incidents.
The 2025-2026 period has brought significant shifts. Cilium has cemented its position as the dominant CNI, eBPF has moved from experimental curiosity to production standard, the Ingress NGINX controller is officially retiring, and the service mesh landscape has matured with Istio’s Ambient mode eliminating sidecar fatigue. This guide covers everything you need to understand and implement production-grade Kubernetes networking in 2026.
The Kubernetes Networking Model
Before diving into specific tools and implementations, it is essential to understand the foundational networking model that Kubernetes enforces. Every Kubernetes cluster must satisfy four fundamental requirements:
- Pod-to-Pod communication — Every pod receives its own unique IP address, and all pods can communicate with every other pod without NAT. This flat network model is one of the most distinctive design decisions in Kubernetes.
- Pod-to-Service communication — Services provide stable virtual IPs (ClusterIPs) that load-balance traffic across a set of pods, surviving pod restarts and rescheduling.
- External-to-Service communication — Traffic from outside the cluster must reach internal services through NodePort, LoadBalancer, or Ingress/Gateway resources.
- Container-to-Container communication — Containers within the same pod share a network namespace and communicate over localhost.
The critical design principle is IP-per-Pod: each pod gets a routable IP address, visible to every other pod in the cluster. There is no network address translation between pods. This simplifies application design — services do not need to negotiate port assignments or handle NAT traversal — but it places significant demands on the underlying network implementation.
Kubernetes itself does not implement pod networking. Instead, it delegates the implementation to third-party plugins conforming to the Container Network Interface (CNI) specification. The choice of CNI plugin fundamentally shapes your cluster's performance, security capabilities, and operational complexity.
CNI Plugins Deep Dive: Cilium vs Calico vs Flannel
The CNI plugin you choose is arguably the most consequential infrastructure decision you will make for your Kubernetes cluster. In 2026, four plugins dominate production deployments, each with distinct strengths and trade-offs.
Cilium
Cilium has become the clear market leader, holding over 50% of CNI deployment share according to the 2025 Isovalent State of Kubernetes Networking Report. When including managed services powered by Cilium (Azure CNI Powered by Cilium, GKE Dataplane V2), coverage exceeds 60%. Cilium graduated from the CNCF in October 2023 and has seen nearly 10,000 pull requests contributed in 2025 alone.
Cilium is built on eBPF, executing networking logic directly in the Linux kernel without the overhead of iptables chains. It provides L3/L4 and L7 network policy enforcement, transparent encryption via WireGuard or IPsec, built-in observability through Hubble, and can replace kube-proxy entirely.
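As a concrete starting point, the sketch below shows illustrative Helm values for installing Cilium with kube-proxy replacement, WireGuard encryption, and Hubble enabled; the API server host is a placeholder, and the keys should be verified against the chart version you deploy.

```yaml
# Illustrative values.yaml for the cilium/cilium Helm chart (a sketch,
# not a definitive configuration; verify keys against your chart version)
kubeProxyReplacement: true        # serve ClusterIPs via eBPF, no kube-proxy
k8sServiceHost: API_SERVER_HOST   # placeholder: your API server endpoint
k8sServicePort: 6443
encryption:
  enabled: true
  type: wireguard                 # transparent pod-to-pod encryption
hubble:
  enabled: true
  relay:
    enabled: true                 # cluster-wide flow aggregation
  ui:
    enabled: true
```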
Calico
Calico remains a strong second choice, particularly for organisations that prefer a more traditional approach or need support for non-Kubernetes workloads. Calico supports both eBPF and iptables/nftables data planes, giving teams flexibility to adopt eBPF incrementally. Its BGP-based routing avoids overlay encapsulation overhead, making it well-suited for on-premises deployments with existing network infrastructure.
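That incremental path is visible in Calico's configuration: switching the data plane to eBPF is, at its core, a single field on the cluster-wide FelixConfiguration. A minimal sketch, assuming Calico is already installed and Felix has been configured to reach the API server directly:

```yaml
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: default
spec:
  bpfEnabled: true                          # swap iptables for the eBPF data plane
  bpfKubeProxyIptablesCleanupEnabled: true  # remove kube-proxy's leftover iptables rules
```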
Flannel
Flannel remains popular in development environments and smaller clusters due to its simplicity. It uses VXLAN overlay networking and requires minimal configuration. However, Flannel does not support network policies natively — a critical limitation for any production deployment with security requirements. Organisations starting with Flannel often migrate to Cilium or Calico as their clusters grow.
AWS VPC CNI
For teams running on Amazon EKS, the AWS VPC CNI assigns real VPC IP addresses to pods, enabling native integration with AWS security groups, VPC flow logs, and other AWS networking features. From version 1.14.0, it also supports network policy enforcement. The trade-off is IP address consumption: each pod consumes a VPC IP, which can exhaust subnet ranges in large clusters. Many organisations now run EKS with Cilium as a hybrid approach, gaining eBPF capabilities whilst retaining VPC integration.
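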
Weave Net: Deprecation Warning
If your clusters still use Weave Net, migration should be a priority. Weaveworks closed in early 2024, and the project has been archived with no releases since January 2021. Kubespray, Rancher, and other distributions have deprecated or removed Weave Net support. We strongly recommend migrating to Cilium or Calico before your clusters reach a Kubernetes version that drops Weave compatibility entirely.
CNI Performance Comparison
The following benchmarks reflect typical production measurements across recent evaluations:
| CNI Plugin | Throughput | Latency | Network Policy | eBPF Support | Best For |
|---|---|---|---|---|---|
| Cilium | ~9.2 Gbps | 0.20 ms | L3/L4/L7 | Native | Production clusters, security-focused |
| Calico (BGP) | ~8.5 Gbps | 0.25 ms | L3/L4 | Optional | On-premises, hybrid environments |
| Flannel (VXLAN) | ~6.5 Gbps | 0.40 ms | None | No | Development, simple clusters |
| AWS VPC CNI | Native VPC | Native VPC | L3/L4 | No | AWS-native EKS deployments |
In pod-to-service throughput tests, Cilium achieves approximately 28.5 Gbps compared to Calico’s 22.1 Gbps, translating to roughly 25% better response times for high-throughput applications. On a 100-node cluster, Cilium’s additional memory overhead compared to Flannel amounts to approximately 12-17 GB across the fleet, but the performance gains typically allow organisations to reduce node counts by 10-15%, more than offsetting the cost.
kube-proxy and the eBPF Revolution
Every Kubernetes cluster runs kube-proxy on each node to implement service load balancing. Traditionally, kube-proxy used iptables rules to translate ClusterIP addresses to pod endpoints. This approach has a well-documented scaling problem: iptables rules are evaluated linearly. With 20,000 services, a single rule replacement can take up to 5 hours. Every new service adds more rules to traverse, creating measurable latency increases at scale.
iptables vs IPVS vs eBPF
The Kubernetes community has pursued three successive approaches to solving the kube-proxy bottleneck:
iptables mode is the original implementation. It creates NAT rules in the kernel’s netfilter framework. Below approximately 1,000 services, performance is acceptable. Beyond that threshold, rule chain evaluation becomes a measurable bottleneck, consuming CPU and adding latency to every connection.
IPVS mode was introduced as an improvement, using hash tables instead of linear rule chains for service routing. IPVS handles 10,000+ services more gracefully than iptables. However, as of Kubernetes v1.35, IPVS mode has been officially deprecated, with removal planned in future releases. The community has concluded that eBPF provides a more comprehensive solution.
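For clusters still running kube-proxy, the proxier is selected in the KubeProxyConfiguration file passed to the component. A minimal sketch for illustration:

```yaml
# kube-proxy configuration (passed via --config); shown only to
# illustrate the mode switch discussed above
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "iptables"   # or "ipvs" (deprecated); eBPF replacements drop kube-proxy entirely
```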
eBPF-based kube-proxy replacement eliminates both iptables and IPVS by implementing service load balancing directly in the kernel using eBPF programs. Rather than traversing rule chains, eBPF uses hash table lookups with O(1) complexity, scaling to over 1 million entries without degradation. Connection tracking bypasses the kernel's conntrack subsystem entirely, reducing CPU usage by 25-40% in typical deployments.
Both Cilium and Calico (in eBPF mode) support full kube-proxy replacement. In our production deployments, switching from iptables-based kube-proxy to Cilium’s eBPF replacement consistently delivers 20-30% throughput improvement, with the gap widening as cluster size increases. For any cluster running more than 500 services, we consider eBPF-based kube-proxy replacement a baseline recommendation.
Network Policies and Zero Trust
Kubernetes clusters ship with a default allow-all network posture. Every pod can communicate with every other pod across all namespaces. This is arguably the single most dangerous default in the entire platform. In our Kubernetes security best practices guide, we identify unrestricted pod-to-pod communication as the top misconfiguration in new cluster deployments.
The Default Deny Pattern
The foundation of Kubernetes network security is a default deny policy applied to every namespace:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```
This policy blocks all inbound and outbound traffic for every pod in the namespace. You then explicitly allow only the communication paths your application requires. This approach aligns with zero trust principles — no implicit trust exists between any workloads.
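One consequence catches almost every team the first time: with egress denied, pods can no longer reach the cluster DNS service, so name resolution fails before any application traffic is attempted. A companion policy restoring DNS is therefore a standard part of the baseline. A sketch, assuming cluster DNS runs in kube-system on port 53:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: production
spec:
  podSelector: {}        # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```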
Namespace Isolation
After applying default deny, define policies that permit traffic within and between namespaces based on labels:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend-api
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              tier: frontend
          podSelector:
            matchLabels:
              app: web
      ports:
        - protocol: TCP
          port: 8080
```
L7 Policies and DNS Allowlisting with Cilium
Standard Kubernetes NetworkPolicy operates at L3/L4 only — you can restrict traffic by IP, namespace, and port, but not by HTTP method, path, or DNS name. Cilium extends this with CiliumNetworkPolicy resources that support L7 filtering:
```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: restrict-external-dns
spec:
  endpointSelector:
    matchLabels:
      app: payment-service
  egress:
    # DNS visibility rule: toFQDNs only works when Cilium's DNS proxy
    # can observe lookups, so allow DNS to kube-dns with inspection
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
    - toFQDNs:
        - matchName: "api.stripe.com"
        - matchName: "api.paypal.com"
    - toEndpoints:
        - matchLabels:
            app: postgres
      toPorts:
        - ports:
            - port: "5432"
              protocol: TCP
```
This policy restricts the payment service to communicating only with the named external payment APIs (by DNS name) and the internal database; the kube-dns rule exists so that Cilium's DNS proxy can observe lookups and map the allowed FQDNs to IP addresses. Any other egress traffic is silently dropped.
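The same CRD also supports HTTP-level rules, covering the method and path filtering mentioned above. A hypothetical sketch (the labels, port, and path are illustrative):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-http-allowlist
spec:
  endpointSelector:
    matchLabels:
      app: backend-api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: web
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/api/v1/.*"   # path is a regular expression
```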
Critical Warning: CNI Support Required
Network policies are silently ignored if your CNI plugin does not support them. Flannel, for instance, does not enforce NetworkPolicy resources at all. Pods continue communicating freely despite policies being applied. This is one of the most dangerous gaps in Kubernetes — the API accepts the resource without error, but enforcement never occurs. Always verify that your CNI supports the policy types you rely upon, and include network policy enforcement in your cluster monitoring and observability stack.
Ingress, Gateway API, and the NGINX Retirement
The Kubernetes ingress landscape is undergoing its most significant transition in years. The Ingress NGINX controller, which has served as the de facto standard for HTTP routing into Kubernetes clusters, is officially retiring. Best-effort maintenance continues until March 2026, after which there will be no further releases, bug fixes, or security vulnerability patches.
Why the Shift
The original Ingress resource was deliberately minimal — it supported basic host and path-based routing but lacked standardised mechanisms for TLS configuration, traffic splitting, header manipulation, or rate limiting. Controller-specific annotations filled the gaps, creating a fragmented ecosystem where Ingress resources were effectively non-portable between controllers.
Gateway API: The Modern Replacement
The Gateway API reached v1.0 in late 2023 and has matured rapidly since. It introduces a role-oriented resource model:
- GatewayClass — Defines the controller implementation (analogous to IngressClass)
- Gateway — Represents a load balancer or proxy instance, owned by infrastructure teams
- HTTPRoute / GRPCRoute / TLSRoute — Define routing rules, owned by application teams
This separation allows platform teams to manage infrastructure whilst application teams define their own routing rules without requiring cluster-level permissions. Features that previously required annotations — traffic splitting, header modification, request mirroring, URL rewrites — are now first-class API fields.
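The role separation is easiest to see in the resources themselves. The sketch below pairs an infrastructure-owned Gateway with an application-owned HTTPRoute that performs a weighted canary split; the GatewayClass name, hostnames, and services are illustrative.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: public-gateway
  namespace: infra                  # owned by the platform team
spec:
  gatewayClassName: envoy-gateway   # illustrative implementation
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: example-com-tls
      allowedRoutes:
        namespaces:
          from: All
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: web-route
  namespace: production             # owned by the application team
spec:
  parentRefs:
    - name: public-gateway
      namespace: infra
  hostnames:
    - "app.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:                  # weighted split, no annotations required
        - name: web
          port: 8080
          weight: 90
        - name: web-canary
          port: 8080
          weight: 10
```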
Migration Guidance
SIG Network recommends that all Ingress NGINX users begin migration immediately. The ingress2gateway tool automates conversion of existing Ingress resources to Gateway and HTTPRoute resources. We have published a detailed walkthrough covering the migration process, including handling of custom annotations, TLS certificates, and canary deployments in our Ingress NGINX to Envoy Gateway migration guide.
Service Mesh Landscape in 2026
Service meshes operate at Layer 7, providing capabilities that sit above the CNI layer: mutual TLS between services, advanced traffic routing, distributed tracing, and fine-grained access control. The landscape has consolidated significantly, with three primary options remaining in active production use.
Istio: Sidecar vs Ambient Mode
Istio has been the most feature-rich service mesh for years, but its sidecar-based architecture imposed substantial overhead. Every pod received an Envoy proxy sidecar, consuming CPU, memory, and adding latency to every request. This “sidecar fatigue” drove many organisations to avoid service meshes entirely.
Istio’s Ambient mode (now production-ready) fundamentally changes this equation. Instead of per-pod sidecars, Ambient mode uses per-node ztunnel proxies for L4 mTLS and optional waypoint proxies for L7 processing. A single ztunnel consumes approximately 0.06 vCPU and 12 MB of memory, compared to 0.20 vCPU and 60 MB for a sidecar.
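Enrolment is deliberately lightweight: no restarts and no sidecar injection. Namespaces opt in with a single label, as in this sketch (assuming Istio is installed with the ambient profile):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    istio.io/dataplane-mode: ambient   # L4 mTLS via ztunnel; deploy a waypoint for L7
```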
Linkerd
Linkerd has always prioritised simplicity and performance. Its Rust-based micro-proxies are significantly lighter than Envoy, and the project maintains a focused feature set rather than pursuing feature parity with Istio. For organisations that need mTLS, observability, and basic traffic management without the complexity of Istio, Linkerd remains an excellent choice.
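That simplicity extends to onboarding: meshing a namespace is a single annotation, picked up as pods are next rolled out. A sketch:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  annotations:
    linkerd.io/inject: enabled   # inject the Rust micro-proxy into new pods
```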
Cilium Service Mesh
Cilium has expanded beyond CNI into service mesh territory, leveraging eBPF to provide mTLS, L7 traffic management, and observability without any sidecar proxies at all. For organisations already running Cilium as their CNI, adding service mesh capabilities requires no additional infrastructure: the same eBPF programs that handle networking also enforce L7 policies and encrypt traffic.
mTLS Latency Overhead Comparison
Research published in late 2025, including a detailed performance comparison of service mesh frameworks, measured mTLS latency overhead across different implementations:
| Service Mesh | mTLS Latency Overhead | CPU Overhead | Architecture |
|---|---|---|---|
| Istio (Sidecar) | +166% | +24.3% | Per-pod Envoy proxy |
| Istio (Ambient) | +8% | +4.8% | Per-node ztunnel |
| Linkerd | +33% | ~10% | Per-pod Rust micro-proxy |
| Cilium | +99% | ~8% | eBPF in-kernel |
The results are striking. Istio's sidecar mode nearly triples request latency, and at a target of 12,800 requests per second the load generator could not reach the intended throughput at all because of proxy overhead. Ambient mode reduces the penalty to just 8%, a transformative improvement that makes Istio viable for latency-sensitive workloads. Linkerd sits in the middle at 33%, reflecting its lighter proxy architecture.
For most organisations in 2026, we recommend evaluating Istio Ambient mode first. If you are already running Cilium and do not require the full breadth of Istio’s traffic management features, Cilium’s built-in service mesh capabilities offer the simplest operational model.
Network Observability
You cannot secure or optimise what you cannot see. Network observability has evolved from basic flow logs to rich, context-aware visibility into every packet traversing your cluster.
Hubble
Hubble is Cilium’s built-in observability layer, providing real-time visibility into network flows, DNS queries, HTTP requests, and policy enforcement decisions. Hubble’s CLI and web UI allow operators to inspect traffic between any two workloads, trace dropped packets back to the specific network policy that blocked them, and identify services communicating unexpectedly.
Hubble integrates natively with Prometheus and Grafana, exporting metrics that can be incorporated into your existing cloud-native monitoring stack. For clusters already running Cilium, Hubble requires minimal additional configuration.
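The metrics Hubble exports are selected in the same Helm values that enable it. A sketch enabling a common set of flow metrics for Prometheus scraping (verify key names against your Cilium chart version):

```yaml
hubble:
  enabled: true
  metrics:
    enabled:
      - dns
      - drop      # includes policy-drop verdicts
      - tcp
      - flow
      - http
    serviceMonitor:
      enabled: true   # assumes the Prometheus Operator CRDs are installed
  relay:
    enabled: true
```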
Microsoft Retina
Microsoft Retina reached version 1.0 in 2025, providing CNI-agnostic network observability powered by eBPF. Retina’s distinctive feature is its integration with Hubble’s control plane regardless of the underlying CNI. For organisations running Calico, Flannel, or other non-Cilium CNIs on AKS or elsewhere, Retina surfaces deep network insights through the familiar Hubble interface.
This approach is particularly valuable for multi-cloud organisations that run Cilium on some clusters and different CNIs on others, allowing a unified observability experience across the entire fleet.
Calico Observability
Tigera’s commercial Calico offering (Calico Cloud and Calico Enterprise) provides network flow visualisation, service graphs, and anomaly detection. For organisations running Calico as their CNI, the integrated observability stack avoids the need for separate tooling, though the advanced features require a commercial licence.
Multi-Cluster Networking
As organisations mature their Kubernetes deployments, multi-cluster architectures become increasingly common — for geographic distribution, blast radius reduction, regulatory compliance, or simply to isolate development and production workloads. Multi-cluster networking enables pods and services in separate clusters to discover and communicate with each other.
Cilium ClusterMesh
Cilium ClusterMesh connects multiple Kubernetes clusters into a unified network, enabling pod-to-pod communication and service discovery across cluster boundaries. Unlike gateway-based approaches, ClusterMesh provides direct pod connectivity using BGP or tunnelling. Each cluster runs a clustermesh-apiserver that exposes cluster state to peers, and Cilium agents establish secure tunnels between clusters.
ClusterMesh supports global services (a single service name resolves across clusters), shared network policies, and transparent failover. The primary constraint is that all participating clusters must run Cilium as their CNI.
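Global services are opt-in, per Service, via annotation. A sketch with illustrative names:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: checkout
  namespace: production
  annotations:
    service.cilium.io/global: "true"   # merge endpoints from every meshed cluster
spec:
  selector:
    app: checkout
  ports:
    - port: 80
      targetPort: 8080
```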
Submariner
Submariner is a CNCF sandbox project that takes a CNI-agnostic approach to multi-cluster connectivity. It establishes encrypted tunnels between clusters using IPsec or WireGuard, enabling direct pod-to-pod communication regardless of the underlying CNI. This flexibility makes Submariner the natural choice for organisations running heterogeneous cluster configurations — for example, connecting an EKS cluster running the VPC CNI with an on-premises cluster running Calico.
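Submariner implements the Kubernetes Multi-Cluster Services (MCS) API, so sharing a service across clusters is a small declarative step. A sketch with illustrative names:

```yaml
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: checkout          # exports the Service of the same name
  namespace: production
# Importing clusters then resolve it as
# checkout.production.svc.clusterset.local
```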
Skupper
Skupper operates at Layer 7, creating a virtual application network using AMQP-based messaging within mTLS tunnels. Unlike Submariner and ClusterMesh, Skupper requires no cluster-level administrative privileges and is entirely independent of the underlying CNI and network infrastructure. This makes it exceptionally easy to adopt in environments with strict security controls or where modifying cluster networking is not feasible.
Research presented at FOSDEM 2026 highlighted Skupper’s ease of configuration and balanced performance profile, making it particularly suited for connecting services across organisational boundaries or hybrid cloud/on-premises environments.
Choosing a Multi-Cluster Solution
| Solution | Layer | CNI Requirement | Admin Privileges | Best For |
|---|---|---|---|---|
| Cilium ClusterMesh | L3/L4 | Cilium only | Yes | Homogeneous Cilium clusters |
| Submariner | L3 | Any | Yes | Heterogeneous CNI environments |
| Skupper | L7 | Any | No | Cross-organisation, hybrid cloud |
Putting It All Together: A Production Networking Stack
Based on our experience across hundreds of production Kubernetes deployments, here is the networking stack we recommend for most organisations in 2026:
- CNI: Cilium with eBPF kube-proxy replacement. The performance benefits, integrated security features, and observability make it the default choice for new clusters.
- Network Policies: Default deny-all in every namespace, with explicit allow rules. Use CiliumNetworkPolicy for L7 and DNS-based filtering where needed.
- Ingress/Gateway: Migrate to Gateway API with an implementation matching your environment (Envoy Gateway, Istio Gateway, or Cilium Gateway).
- Service Mesh: Evaluate Istio Ambient mode for organisations needing advanced traffic management. For Cilium users, start with Cilium’s built-in mesh capabilities before adding Istio.
- Observability: Hubble for network flow visibility, integrated with Prometheus and Grafana for alerting and dashboards.
- Multi-Cluster: Cilium ClusterMesh if all clusters run Cilium; Submariner or Skupper for heterogeneous environments.
The most important principle is to treat networking configuration as code. Store network policies, Gateway resources, and CNI configurations in Git, enforce them through CI/CD pipelines, and audit changes through pull request reviews. This GitOps approach to networking ensures consistency, traceability, and the ability to roll back misconfigurations quickly. For a broader view of cloud-native practices, see our cloud-native DevOps with Kubernetes guide.
Simplify Your Kubernetes Networking with Expert Guidance
Kubernetes networking spans CNI selection, eBPF migration, network policy design, Gateway API adoption, service mesh evaluation, and multi-cluster connectivity. Getting it right requires deep expertise across each layer — and getting it wrong results in security vulnerabilities, performance bottlenecks, and operational complexity that compound over time.
Our team provides comprehensive Kubernetes consulting services to help you:
- Design and implement production networking architectures with Cilium, eBPF-based kube-proxy replacement, and zero-trust network policies tailored to your workloads
- Migrate from Ingress NGINX to Gateway API before the March 2026 end-of-maintenance deadline, with zero-downtime cutover strategies
- Evaluate and deploy service mesh solutions including Istio Ambient mode and Cilium service mesh, with performance benchmarking against your specific latency and throughput requirements
We have helped organisations across regulated industries build Kubernetes networking stacks that scale to thousands of services whilst maintaining strict security and compliance standards.