
Internal Developer Platform: We Built IDPs for 50+ Teams (Guide)

author="Engineering Team" date="2026-02-16"

Internal Developer Platforms have moved from conference-talk curiosity to boardroom priority. Gartner predicted that 80% of large software engineering organisations would establish platform teams by 2026. Yet the gap between announcing a platform initiative and delivering one that developers actually use remains vast.

We have built, rescued, and scaled Internal Developer Platforms for over 50 engineering teams ranging from 30-person startups to 2,000-engineer enterprises. The patterns that succeed and the mistakes that derail projects have become remarkably consistent. This guide distils those lessons into a practical framework you can follow regardless of your organisation’s size or cloud provider.

If you are still clarifying the boundaries between DevOps, SRE, and platform engineering roles, our post on the differences between DevOps, SRE, and platform engineering provides a useful foundation before diving in here.


What Is an Internal Developer Platform?

An Internal Developer Platform (IDP) is a self-service layer that sits between developers and the underlying infrastructure. It standardises how teams provision environments, deploy applications, manage secrets, and observe production workloads. The goal is to reduce cognitive load on developers while maintaining the guardrails that operations and security teams require.

An IDP is not a single product you purchase. It is an opinionated composition of tools, workflows, and abstractions tailored to your organisation. The CNCF Platform Engineering Maturity Model describes it as a curated set of capabilities presented as a coherent product to internal users.

What an IDP provides:

Capability | Developer Experience | Operations Benefit
Self-service environments | Spin up a staging environment in minutes | Standardised, reproducible infrastructure
Golden paths | Pre-approved templates for common workloads | Reduced configuration drift and audit burden
Deployment automation | One-click or git-push deploys | Consistent rollout and rollback procedures
Secrets management | Inject secrets without knowing vault details | Centralised rotation and access control
Observability | Pre-configured dashboards per service | Uniform telemetry across all teams
Policy enforcement | Guardrails that prevent misconfigurations | Automated compliance without ticket queues

The critical distinction is between a platform and a portal. A portal gives developers a user interface to view information. A platform changes how work gets done. We have seen too many organisations invest heavily in a portal only to discover that developers still bypass it for kubectl and ad-hoc scripts. The platform must embed itself into the developer’s actual workflow.


The Five-Plane Architecture Model

Every successful IDP we have built follows a layered architecture. We use a five-plane model that separates concerns cleanly and allows teams to evolve each layer independently.

1. Developer Control Plane

This is the surface area developers interact with daily. It includes the developer portal (built with tools like Backstage, Port, or Cortex), CLI tools, and API interfaces. The developer control plane abstracts away infrastructure complexity and presents capabilities through service catalogues, scaffolding templates, and self-service actions.

Key design principles:

  • Meet developers where they work (IDE plugins, CLI, pull request comments)
  • Minimise context switching between tools
  • Provide clear feedback loops for every action

2. Integration and Delivery Plane

This plane handles CI/CD pipelines, artifact management, and deployment orchestration. It is the engine that turns developer intent into running software. Tools like ArgoCD, Flux, GitHub Actions, and Tekton operate here.

For teams already invested in GitOps workflows, our guide on GitOps with Helm and ArgoCD covers the delivery patterns that integrate well with an IDP.

3. Resource Plane

The resource plane manages infrastructure provisioning and configuration. This is where Terraform, Crossplane, Pulumi, and cloud-provider APIs come into play. The resource plane translates high-level developer requests (such as “I need a PostgreSQL database”) into properly configured, policy-compliant infrastructure.
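
To illustrate the translation step, here is a minimal Python sketch of how a short developer request might be expanded into a fully specified, guardrailed resource. The `DatabaseRequest` type, the size presets, and every field name are hypothetical and chosen for illustration; a real resource plane would hand the result to Terraform, Crossplane, or a cloud API.

```python
from dataclasses import dataclass

# Hypothetical size presets; real values would come from the platform team.
SIZE_PRESETS = {
    "small": {"cpu": 1, "memory_gib": 2, "storage_gib": 20},
    "medium": {"cpu": 2, "memory_gib": 8, "storage_gib": 100},
}


@dataclass(frozen=True)
class DatabaseRequest:
    """What the developer actually types: three fields, no cloud details."""
    team: str
    environment: str  # e.g. "staging" or "production"
    size: str = "small"


def to_resource_spec(request: DatabaseRequest) -> dict:
    """Expand a short request into a complete spec with guardrails applied."""
    if request.size not in SIZE_PRESETS:
        raise ValueError(f"unknown size: {request.size}")
    is_prod = request.environment == "production"
    return {
        "engine": "postgresql",
        "name": f"{request.team}-{request.environment}-db",
        **SIZE_PRESETS[request.size],
        # Guardrails set by the platform, not by the requester.
        "encrypted": True,
        "publicly_accessible": False,
        "multi_az": is_prod,
        "backup_retention_days": 30 if is_prod else 7,
        "tags": {"team": request.team, "environment": request.environment},
    }
```

The point of the design is that the developer never sees or sets the encryption, network exposure, or backup fields; those are decided once, in one place.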

Crossplane deserves special attention here. It brings infrastructure management into Kubernetes, allowing platform teams to define custom resource definitions (CRDs) that map to cloud resources. Developers provision infrastructure using the same kubectl and GitOps workflows they already know.

If your organisation runs on AWS and uses Terraform for infrastructure, our Terraform EKS module guide demonstrates patterns for standardised cluster provisioning that feed directly into an IDP.

4. Monitoring and Observability Plane

This plane provides unified visibility across all services and infrastructure. It includes metrics collection (Prometheus, Datadog), log aggregation (Grafana Loki, Elasticsearch), distributed tracing (Jaeger, Tempo), and pre-built dashboards. The observability plane ensures that every service deployed through the platform is automatically instrumented and monitored.

5. Security and Governance Plane

Security is not a bolt-on. The security plane enforces policies at every layer, from image scanning in CI to runtime policy enforcement in Kubernetes. Tools like Open Policy Agent (OPA), Kyverno, and cloud-native security services operate here.

Policy categories to implement from day one:

  • Network policies restricting pod-to-pod communication
  • Resource quotas preventing runaway costs
  • Image provenance verification (signed container images)
  • Secret access controls with audit logging
  • Compliance-as-code for regulatory requirements (SOC 2, ISO 27001, HIPAA)
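
As a rough illustration of what such policies check, the Python sketch below validates a proposed workload against two of the categories above: image source and resource quota. It is not how OPA or Kyverno are configured; the registry name, the dictionary shape, and the `check_workload` function are assumptions made for the example, and pinning by digest is only a stand-in for real signature verification.

```python
# Hypothetical approved registry; replace with your organisation's own.
ALLOWED_REGISTRIES = ("registry.example.com/",)


def check_workload(containers: list[dict], cpu_quota: float) -> list[str]:
    """Return human-readable policy violations for a proposed workload."""
    violations = []
    for container in containers:
        image = container.get("image", "")
        if not image.startswith(ALLOWED_REGISTRIES):
            violations.append(f"{container['name']}: image not from an approved registry")
        if "@sha256:" not in image:
            violations.append(f"{container['name']}: image is not pinned by digest")
    # Sum the CPU requests and compare against the namespace quota.
    requested = sum(float(c.get("cpu", 0)) for c in containers)
    if requested > cpu_quota:
        violations.append(f"cpu request {requested} exceeds quota {cpu_quota}")
    return violations
```

Returning a list of messages rather than a boolean matters for developer experience: a rejected deployment should say exactly which rule failed and why.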

Build vs Buy vs Assemble: The Decision Framework

One of the first strategic decisions is whether to build your IDP from scratch, buy a commercial platform, or assemble one from open-source components. After advising over 50 teams on this decision, we have developed a clear framework.

Build from Scratch

Best for: Organisations with 500+ engineers, unique workflow requirements, and a dedicated platform team of 5 or more engineers.

Advantages:

  • Complete control over abstractions and user experience
  • Deep integration with proprietary systems
  • No vendor lock-in on the platform layer

Risks:

  • High initial investment (12-18 months to meaningful value)
  • Ongoing maintenance burden absorbs platform team capacity
  • Risk of building a “snowflake” that only your organisation understands

Buy a Commercial Platform

Best for: Organisations that want rapid time-to-value and have budget for commercial tooling. Products like Humanitec, Cortex, and OpsLevel offer this path.

Advantages:

  • Faster time-to-value (weeks instead of months)
  • Vendor handles upgrades, security patches, and new features
  • Pre-built integrations with common tools

Risks:

  • Vendor lock-in on the platform abstraction layer
  • Limited customisation for unique workflows
  • Ongoing licensing costs scale with organisation size

Assemble from Open-Source Components

Best for: Mid-size organisations (50-200 engineers) that need flexibility without the overhead of building everything from scratch.

This is the approach we recommend most frequently. You select best-of-breed open-source tools, integrate them with lightweight glue code, and present a cohesive experience through a developer portal.

A typical assembled stack:

Layer | Tool | Purpose
Portal | Backstage or Port | Service catalogue, scaffolding, self-service
CI/CD | GitHub Actions + ArgoCD | Build, test, and deploy
Infrastructure | Terraform + Crossplane | Cloud resource provisioning
Policy | OPA/Kyverno | Governance and compliance
Observability | Prometheus + Grafana + Loki | Metrics, dashboards, logs
Secrets | HashiCorp Vault or AWS Secrets Manager | Secret lifecycle management
Service mesh | Istio or Linkerd | Traffic management, mTLS

The key advantage is that each component can be replaced independently as requirements evolve. You are not locked into a single vendor’s roadmap.
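
One way to keep components replaceable is to have the glue code depend on a small interface rather than on a specific tool. The Python sketch below shows the idea for the secrets layer. The `SecretStore` protocol and both backends are hypothetical stand-ins; neither talks to Vault or AWS Secrets Manager.

```python
from typing import Protocol


class SecretStore(Protocol):
    """The only contract the rest of the platform depends on."""

    def get(self, key: str) -> str: ...


class InMemorySecretStore:
    """Stand-in backend, useful for local development and tests."""

    def __init__(self, secrets: dict[str, str]) -> None:
        self._secrets = dict(secrets)

    def get(self, key: str) -> str:
        return self._secrets[key]


class EnvSecretStore:
    """Alternative backend reading from an environment-style mapping."""

    def __init__(self, environ: dict[str, str], prefix: str = "SECRET_") -> None:
        self._environ = environ
        self._prefix = prefix

    def get(self, key: str) -> str:
        return self._environ[self._prefix + key.upper()]


def build_env(store: SecretStore, keys: list[str]) -> dict[str, str]:
    """Glue code: works with any backend that satisfies SecretStore."""
    return {key.upper(): store.get(key) for key in keys}
```

Because `build_env` only knows about `SecretStore`, swapping one backend for another does not touch the code that consumes secrets.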


Phased Implementation Roadmap

Attempting to build an IDP in one monolithic effort is the most common cause of failure. We use a phased approach that delivers incremental value while building organisational buy-in.

Phase 1: Discovery and Alignment (2-4 Weeks)

Before writing any code, invest in understanding the actual developer experience today.

Activities:

  • Shadow 5-10 developers through their daily workflow
  • Map the current deployment pipeline end-to-end (commit to production)
  • Identify the top 3-5 pain points by frequency and severity
  • Audit existing tooling and identify integration points
  • Define success metrics aligned with leadership priorities

Deliverables:

  • Developer journey map documenting current friction points
  • Platform vision document with prioritised capabilities
  • Stakeholder alignment on MVP scope

The discovery phase often reveals that the biggest productivity killers are not where leadership assumes. In one engagement, a fintech team believed they needed a sophisticated service mesh. Shadowing revealed that developers spent 40 minutes per day waiting for staging environments to provision. We fixed that first.

Phase 2: MVP Build (6-8 Weeks)

Build the minimum viable platform that addresses the top pain point from discovery. Resist the urge to build a comprehensive solution in this phase.

Typical MVP scope:

  • Service catalogue with 2-3 golden path templates
  • Automated environment provisioning (dev and staging)
  • Basic CI/CD pipeline template integrated with GitOps
  • Single-click deployment to a non-production environment
  • Pre-configured observability for deployed services

Technical decisions to make:

  • Choose your developer portal (Backstage is the most common open-source choice)
  • Define your golden path template format (Cookiecutter, Yeoman, or custom)
  • Establish the GitOps repository structure
  • Set up the Kubernetes namespace and RBAC model
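
For the namespace and RBAC decision, a minimal sketch of one possible model follows: one namespace per team and environment, with the team's group bound to a built-in Kubernetes ClusterRole. The naming scheme, the group name, and the choice of `edit` outside production and `view` in production are assumptions for illustration, not a recommendation for every organisation.

```python
import json


def tenant_manifests(team: str, environment: str) -> list[dict]:
    """Build a Namespace and a RoleBinding for one team in one environment."""
    namespace = f"{team}-{environment}"
    # Assumption: developers may change non-production, but only read production.
    role = "view" if environment == "production" else "edit"
    return [
        {
            "apiVersion": "v1",
            "kind": "Namespace",
            "metadata": {
                "name": namespace,
                "labels": {"team": team, "environment": environment},
            },
        },
        {
            "apiVersion": "rbac.authorization.k8s.io/v1",
            "kind": "RoleBinding",
            "metadata": {"name": f"{team}-developers", "namespace": namespace},
            "subjects": [
                {
                    "kind": "Group",
                    "name": f"{team}-developers",  # hypothetical identity-provider group
                    "apiGroup": "rbac.authorization.k8s.io",
                }
            ],
            "roleRef": {
                "kind": "ClusterRole",
                "name": role,
                "apiGroup": "rbac.authorization.k8s.io",
            },
        },
    ]


if __name__ == "__main__":
    # Kubernetes accepts JSON manifests, so the output can be committed to a GitOps repository.
    print(json.dumps(tenant_manifests("payments", "staging"), indent=2))
```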

Anti-pattern to avoid: Do not build an “enterprise-grade” platform in the MVP. We have seen teams spend six months building a platform with multi-tenancy, RBAC, audit logging, and cost allocation before any developer used it. Ship something useful in six weeks.

For teams adopting Kubernetes-based platforms, our complete guide to DevOps automation in 2026 covers the CI/CD and infrastructure automation patterns that underpin a successful IDP.

Phase 3: Production Readiness (6-8 Weeks)

With the MVP validated by early adopters, harden the platform for production use.

Activities:

  • Implement RBAC and multi-tenancy
  • Add production deployment workflows with approval gates
  • Integrate security scanning into golden path pipelines
  • Configure cost allocation and chargeback reporting
  • Build runbooks for platform incidents
  • Establish SLOs for platform services (portal uptime, deployment success rate, provisioning time)
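
To make a platform SLO such as deployment success rate actionable, it helps to express it as an error budget. The sketch below is a minimal Python version of that arithmetic; the function name and the 99% target in the usage note are illustrative assumptions.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Fraction of the error budget still unspent (negative means overspent).

    slo_target is the required success ratio, for example 0.99.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    if total == 0:
        return 1.0  # no events yet, so nothing has been spent
    allowed_failures = (1 - slo_target) * total
    return 1 - failed / allowed_failures
```

With a 99% target and 1,000 deployments, 10 failures are allowed, so 4 failed deployments leave roughly 60% of the budget. A negative result is a signal to pause feature work on the platform and focus on reliability.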

Security hardening checklist:

  • Enable audit logging for all platform actions
  • Implement network policies for platform components
  • Configure secret rotation automation
  • Set up vulnerability scanning for platform container images
  • Establish break-glass procedures for emergency access

Phase 4: Scaling and Adoption (Ongoing)

With a production-ready platform, focus shifts to adoption across the organisation and feature expansion driven by developer feedback.

Adoption strategies that work:

  • Champion programme: Identify 2-3 enthusiastic developers per team as platform advocates
  • Migration sprints: Dedicate two-week sprints where platform engineers pair with product teams to migrate services
  • Friction logging: Create a simple mechanism for developers to report pain points
  • Show-and-tell sessions: Weekly demos of new platform capabilities

Feature expansion priorities:

  • Database provisioning through self-service
  • Feature flag integration
  • Cost visibility per team and service
  • Automated canary deployments
  • Developer environment parity (local dev matches staging)

Golden Paths: The Heart of a Successful IDP

Golden paths are opinionated, pre-built templates that encode your organisation’s best practices for common tasks. They are not restrictions. They are the fastest, most supported way to accomplish something.

Designing Effective Golden Paths

A golden path should cover the full lifecycle of a common workload type:

  1. Scaffold: Generate a new service with standard project structure, CI/CD configuration, Dockerfile, Kubernetes manifests, and observability setup
  2. Build: Automated pipeline that compiles, tests, scans, and produces a deployable artifact
  3. Deploy: GitOps-driven deployment to development, staging, and production environments
  4. Observe: Pre-configured dashboards, alerts, and SLOs
  5. Operate: Runbooks, scaling policies, and incident response procedures
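
The scaffold step can be sketched in a few lines of Python using only the standard library. The file names and template contents below are placeholders invented for the example; a real golden path would ship working pipeline, Dockerfile, and manifest templates, typically through a tool such as Backstage's scaffolder or Cookiecutter.

```python
import re
from string import Template

# Placeholder templates: path -> contents with $service and $team variables.
TEMPLATES = {
    "README.md": "# $service\n\nOwned by team $team.\n",
    "Dockerfile": "# placeholder Dockerfile for $service\n",
    "ci/pipeline.yaml": "# placeholder build, test, and scan pipeline for $service\n",
    "deploy/values.yaml": "service: $service\nteam: $team\nreplicas: 2\n",
    "observability/dashboard.json": '{"title": "$service overview"}\n',
}

# Service names double as Kubernetes resource names, so keep them DNS-label safe.
NAME_PATTERN = re.compile(r"[a-z]([a-z0-9-]*[a-z0-9])?")


def render_scaffold(service: str, team: str) -> dict[str, str]:
    """Return the files of a new service as a mapping of path to contents."""
    if len(service) > 63 or not NAME_PATTERN.fullmatch(service):
        raise ValueError(f"invalid service name: {service!r}")
    values = {"service": service, "team": team}
    return {
        path: Template(body).substitute(values)
        for path, body in TEMPLATES.items()
    }
```

Validating the name at scaffold time is a small example of a guardrail: the mistake is caught in seconds rather than at the first failed deployment.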

Example golden paths by workload type:

Workload | Template Includes
REST API (Node.js) | Express scaffold, OpenAPI spec, Helm chart, Prometheus metrics, health checks
Event consumer (Python) | Kafka/SQS consumer, dead-letter handling, retry policies, tracing
Scheduled job (Go) | CronJob manifest, idempotency patterns, monitoring, alerting
Frontend SPA (React) | CDN deployment, feature flags, error tracking, performance monitoring

Golden Path Anti-Patterns

  • Too rigid: If the golden path cannot accommodate 80% of use cases without modification, it is too opinionated
  • Too many choices: Offering five golden paths for REST APIs creates confusion rather than simplicity
  • Unmaintained: A golden path that falls behind current tool versions erodes trust quickly
  • Undocumented: If developers need tribal knowledge to use the template, it is not a golden path

Common Anti-Patterns We Have Seen (and How to Avoid Them)

The Platform-as-Project Trap

Treating the IDP as a project with a fixed end date is a guaranteed path to failure. Platforms are products. They require continuous investment, a product owner, a roadmap, and regular user feedback. When the “project” ends and the team disbands, the platform decays within months.

Fix: Staff the platform team permanently. Assign a product manager. Maintain a public roadmap. Treat developers as customers.

The Portal Trap

The portal trap is investing heavily in a beautiful developer portal while neglecting the underlying automation. A portal that displays service information but cannot actually deploy, provision, or configure anything is a dashboard, not a platform. We have encountered organisations that spent an entire year building a Backstage portal without connecting it to any meaningful self-service actions.

Fix: Start with the automation layer. Build the portal as a thin interface on top of capabilities that already work via CLI or API. The portal should be the last mile, not the first.

Ivory Tower Development

Ivory tower development means building the platform in isolation, then unveiling it to developers and expecting adoption. Platform engineers who do not regularly pair with product developers build platforms that solve imagined problems.

Fix: Embed platform engineers in product teams during discovery. Run fortnightly feedback sessions. Track adoption metrics obsessively. If developers are not using a feature, find out why before building the next one.

The Abstraction Overreach

Abstraction overreach means creating abstractions so thick that developers cannot debug issues when something goes wrong. If a developer encounters a deployment failure and the platform hides all the Kubernetes details, they are stuck waiting for the platform team to investigate.

Fix: Provide progressive disclosure. Show the simple view by default, but let developers drill into the underlying Kubernetes resources, logs, and events when they need to. The platform should accelerate common tasks without blocking uncommon ones.


Measuring IDP Success: Metrics That Matter

You cannot justify continued investment in a platform without quantifiable results. We recommend measuring across three dimensions.

DORA Metrics

The DORA research programme provides the industry standard for software delivery performance. Track these four metrics before and after platform adoption:

  • Deployment frequency: How often your teams deploy to production
  • Lead time for changes: Time from code commit to running in production
  • Change failure rate: Percentage of deployments causing incidents
  • Mean time to recovery (MTTR): How quickly you restore service after an incident

Typical improvements we see after IDP adoption: deployment frequency increases 3-5x, lead time drops from days to hours, and change failure rate decreases by 30-50%.
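
The four metrics can be computed from a simple list of deployment records. The Python sketch below assumes a hypothetical `Deployment` record with commit, deploy, and restore timestamps; in practice these would be exported from your CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    caused_incident: bool = False
    restored_at: Optional[datetime] = None  # set once service is restored


def dora_summary(deployments: list[Deployment], period_days: int) -> dict:
    """Compute the four DORA metrics over one reporting period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.caused_incident]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        # None when no incident in the period has been resolved yet.
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }
```

Running the same function over a period before platform adoption and a period after gives the before-and-after comparison. The median is used for lead time so that one unusually slow change does not dominate the figure.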

SPACE Framework

The SPACE framework, developed by researchers at GitHub, Microsoft Research, and the University of Victoria, provides a more holistic view of developer productivity:

  • Satisfaction and well-being: Developer survey scores
  • Performance: System throughput and reliability metrics
  • Activity: Deployment counts, PR merge rates, environment provisioning frequency
  • Communication and collaboration: Cross-team contributions, documentation quality
  • Efficiency and flow: Time in flow state, interruption frequency

Developer NPS and Platform Adoption

Track these platform-specific metrics monthly:

  • Developer Net Promoter Score (NPS): “How likely are you to recommend the platform to a colleague?”
  • Adoption rate: Percentage of services using golden paths versus custom configurations
  • Self-service ratio: Percentage of infrastructure requests fulfilled through self-service versus tickets
  • Time to first deploy: How long it takes a new developer to deploy their first change

A healthy IDP should achieve a Developer NPS above 30 within six months. If it is below zero, the platform is creating more friction than it removes.
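
For reference, NPS uses the standard 0-10 scale: scores of 9-10 count as promoters, 0-6 as detractors, and the score is the percentage of promoters minus the percentage of detractors. A minimal Python sketch follows, with a generic percentage helper that can serve for adoption rate and self-service ratio; the function names are illustrative.

```python
def net_promoter_score(scores: list[int]) -> float:
    """NPS on the standard 0-10 scale, ranging from -100 to +100."""
    if not scores:
        raise ValueError("no survey responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


def percentage(part: int, whole: int) -> float:
    """Share of `whole` made up by `part`, e.g. services on golden paths."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100 * part / whole
```

For example, ten responses with five promoters and three detractors give an NPS of 20, and 45 of 60 services on golden paths is a 75% adoption rate.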


Security-First IDP Design

Security must be woven into every layer of the platform, not bolted on after development is complete. A security-first approach means that the most secure path is also the easiest path for developers.

Supply Chain Security

  • Sign all container images using Sigstore or Notary
  • Verify image provenance before deployment using admission controllers
  • Pin dependencies in golden path templates and automate dependency updates
  • Generate SBOMs (Software Bill of Materials) for every artifact
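
As one small example of enforcing pinned dependencies, the Python sketch below flags unpinned entries in a pip-style requirements file. The requirements format is an assumption; golden paths for other ecosystems would check their own lockfiles instead. The parsing is deliberately simplified and ignores URL-based requirements.

```python
def unpinned_requirements(lines: list[str]) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned
```

Run in CI, a non-empty result can fail the build, while an automated update tool raises pull requests to move the pins forward.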

Runtime Security

  • Enforce pod security standards using Kyverno or OPA Gatekeeper
  • Restrict container capabilities (no privileged containers, read-only root filesystem)
  • Implement network policies that default-deny and explicitly allow required communication
  • Enable runtime threat detection using Falco or cloud-native equivalents
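
The default-deny pattern for network policies can be sketched as two Kubernetes NetworkPolicy manifests, generated here as Python dictionaries. The namespace, application labels, and port in the usage are hypothetical; the manifest fields follow the `networking.k8s.io/v1` API.

```python
def default_deny(namespace: str) -> dict:
    """Deny all ingress and egress for every pod in the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        # An empty podSelector matches every pod in the namespace.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def allow_ingress(namespace: str, target_app: str, source_app: str, port: int) -> dict:
    """Explicitly allow one application to call another on a single TCP port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": f"allow-{source_app}-to-{target_app}",
            "namespace": namespace,
        },
        "spec": {
            "podSelector": {"matchLabels": {"app": target_app}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": {"app": source_app}}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }
```

Because the default-deny policy also blocks egress, a real rollout needs matching egress rules, including one for DNS, before it is applied to a namespace with running workloads.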

Access Control

  • Implement least-privilege RBAC at every layer (Kubernetes, cloud provider, platform portal)
  • Use short-lived credentials rather than long-lived API keys
  • Enforce MFA for any action that touches production infrastructure
  • Audit all platform actions with tamper-evident logging
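
One common technique for audit logs is a hash chain: each entry stores the hash of the previous entry, so altering any past record invalidates everything after it. This does not prevent edits, but it makes them detectable. The Python sketch below is a minimal in-memory illustration with hypothetical field names; a production system would also anchor the latest hash somewhere the platform itself cannot rewrite.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(actor: str, action: str, prev_hash: str) -> str:
    body = {"actor": actor, "action": action, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: list[dict], actor: str, action: str) -> None:
    """Append an audit entry linked to the hash of the previous one."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({
        "actor": actor,
        "action": action,
        "prev_hash": prev_hash,
        "hash": _digest(actor, action, prev_hash),
    })


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        expected = _digest(entry["actor"], entry["action"], entry["prev_hash"])
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```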

For a broader perspective on cloud native security tooling that integrates with IDPs, see our guide on cloud native DevOps with Kubernetes.


Tools Landscape in 2026

The IDP tooling ecosystem has matured significantly. Here is our assessment of the leading tools across each layer.

Developer Portals

Tool | Strengths | Considerations
Backstage (CNCF) | Largest ecosystem, highly extensible, strong community | Steep learning curve, requires dedicated maintenance
Port | Intuitive UI, fast setup, strong self-service actions | Commercial product, less customisable than Backstage
Cortex | Scorecards for service maturity, strong service catalogue | Commercial product, focused on service ownership
OpsLevel | Service ownership, maturity rubrics, integrations | Commercial product, focused on larger organisations

Infrastructure as Code

Tool | Best For
Terraform | Multi-cloud provisioning, mature ecosystem
Crossplane | Kubernetes-native infrastructure, self-service resource provisioning
Pulumi | Teams that prefer general-purpose languages over HCL
AWS CDK | AWS-only environments using TypeScript/Python

GitOps and Deployment

Tool | Best For
ArgoCD | Kubernetes-native GitOps, multi-cluster deployments
Flux | Lightweight GitOps, strong Helm support
Spinnaker | Multi-cloud deployment pipelines, advanced deployment strategies

Policy Engines

Tool | Best For
OPA/Gatekeeper | General-purpose policy across the stack
Kyverno | Kubernetes-native policies without learning Rego
Checkov | IaC scanning in CI pipelines

Real-World Implementation: Mid-Size Company Example

To make this concrete, here is a simplified architecture for a mid-size organisation (100 engineers, 15 product teams, running on AWS with EKS).

Platform team: 4 engineers (2 senior, 2 mid-level) plus a product manager

Stack:

  • Portal: Backstage with custom plugins for environment provisioning and deployment
  • CI/CD: GitHub Actions for build and test, ArgoCD for Kubernetes deployment
  • Infrastructure: Terraform modules for shared infrastructure, Crossplane for team-provisioned resources (databases, queues, caches)
  • Observability: Prometheus, Grafana, Loki, Tempo (via Grafana Cloud)
  • Policy: Kyverno for Kubernetes policies, Checkov for Terraform scanning
  • Secrets: AWS Secrets Manager with External Secrets Operator

Golden paths delivered:

  1. REST API (Node.js/TypeScript) with Express, deployed to EKS
  2. Event-driven service (Python) consuming from SQS, deployed to EKS
  3. Scheduled data pipeline (Python) running as Kubernetes CronJobs

Timeline:

  • Weeks 1-3: Discovery, developer shadowing, pain point mapping
  • Weeks 4-10: MVP with service scaffolding, automated environment provisioning, basic CI/CD
  • Weeks 11-18: Production deployment workflows, RBAC, security scanning, observability templates
  • Weeks 19+: Ongoing adoption support, additional golden paths, database self-service

Results after 6 months:

  • Environment provisioning dropped from 3 days (ticket-based) to 12 minutes (self-service)
  • Deployment frequency increased from weekly to multiple times per day
  • New developer time-to-first-deploy decreased from 2 weeks to 2 hours
  • Developer NPS improved from -15 to +42

Getting Started: Your First Two Weeks

If you are beginning your IDP journey, here is what we recommend for the first two weeks:

Week 1:

  • Interview 8-10 developers across different teams about their daily friction points
  • Map the current deployment pipeline with timestamps at each stage
  • Identify the single biggest time sink that affects the most developers
  • Review your existing tooling for gaps and integration potential
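
Mapping the pipeline with timestamps lends itself to a quick calculation. The Python sketch below takes an ordered list of stage start times for one change and reports how long each stage took and which one is the biggest time sink. The stage names and the timeline format are assumptions for illustration.

```python
from datetime import datetime


def stage_durations(timeline: list[tuple[str, datetime]]) -> dict[str, float]:
    """Minutes spent in each stage.

    `timeline` holds (stage, started_at) pairs in chronological order;
    the final pair only marks the end of the pipeline.
    """
    durations = {}
    for (stage, start), (_, end) in zip(timeline, timeline[1:]):
        durations[stage] = (end - start).total_seconds() / 60
    return durations


def biggest_time_sink(timeline: list[tuple[str, datetime]]) -> str:
    """Name of the stage that consumed the most time."""
    durations = stage_durations(timeline)
    return max(durations, key=durations.get)
```

Collecting a handful of these timelines across teams usually makes the first MVP target obvious, because the same stage tends to dominate each one.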

Week 2:

  • Draft a platform vision document (one page maximum)
  • Define 3 measurable success criteria for the MVP
  • Set up a Backstage instance locally and explore its plugin ecosystem
  • Identify 2-3 early adopter teams willing to pilot the platform
  • Present the proposal to engineering leadership with a 90-day plan

The most successful platform initiatives start small, deliver value quickly, and expand based on evidence. Avoid the temptation to design the perfect architecture upfront. Ship something useful, measure the impact, and iterate.


Build Your Internal Developer Platform with Expert Guidance

Building an Internal Developer Platform is one of the highest-leverage investments an engineering organisation can make, but the path from concept to production-grade platform is filled with decisions that compound over time.

Our team provides comprehensive platform engineering services to help you:

  • Assess your current developer experience and identify the highest-impact improvements
  • Design IDP architecture tailored to your organisation’s size, tools, and cloud environment
  • Implement golden paths that encode your best practices and accelerate onboarding
  • Integrate security and policy enforcement without creating developer friction
  • Establish platform team practices including product management, feedback loops, and adoption metrics

We have guided over 50 teams through IDP implementations, from discovery through production scaling. Whether you are starting from scratch or rescuing a stalled platform initiative, we bring the patterns and experience to accelerate your journey.

