
Go microservices at Paddle: Consistency, without killing innovation

Learn how Paddle transitioned from microservice chaos to engineering consistency by making it easier to build the right thing. We built toolkits that teams actually want to use, freeing up engineers to focus on solving customer problems.

This blog post is based on a talk that George gave at GopherCon 2024. You can watch the full talk on YouTube.

About eighteen months ago, we rolled out OpenTelemetry across all of our services. It took a couple of weeks.

Four years ago, that would have taken months, if it were even possible at all.

Back then, every team at Paddle had complete freedom. Engineers could choose their own deployment platform, pick their own database, and build services however they wanted. We believed maximum independence would unlock maximum productivity.

Within 18 months, unlimited choice had become unlimited complexity. We had a multitude of deployment platforms, services only one person understood, and APIs so fragmented that customers saw different data depending on which endpoint they hit. Worse: it was costing us new business.

This is the story of how we found a middle ground between independence and control by building a flexible framework that includes everything you need to build a service, but also gives you the freedom to choose your own tools and patterns when needed.

Today, engineers can roll out changes across all services quickly, move between teams and contribute across services, and ultimately focus on solving interesting problems for our customers.

Maximum independence

Paddle was initially built as a monolith. Over time it grew complex and became a bottleneck for every team, so we’d started moving towards a microservices architecture.

This transition coincided with the shift to remote work in 2020, so we embraced maximum team independence. No more waiting for central approval, no more architectural committees slowing things down. Every team could choose their own tools, deployment methods, and architectural patterns.

And it worked, at first. Teams moved fast, adopted new technologies without red tape, and shipped features quickly. The lack of central constraints felt liberating.

But the problems started emerging as we scaled. Within 18 months, we had:

  • Code organized around teams, rather than problem areas
    Services were structured around team ownership rather than domain, leading to duplicated functionality and gaps where no team owned critical capabilities.
  • A tech stack explosion
    ECS, Fargate, Lambda, Elastic Beanstalk, Heroku — you name it, we used it. All with completely different build, test, deployment, and monitoring workflows.
  • Inconsistent Go implementations
    Some teams used REST, others gRPC; some teams used PostgreSQL, others MySQL. Each service had completely different internal layouts.
  • Massive knowledge silos
    Most services were understood by only one or two people, creating a “lottery factor” and making collaboration across teams all but impossible.
  • Endless bikeshedding
    Teams spent more time debating which library to use than solving customer problems.
  • Security concerns
    We had fragmented approaches to authentication and authorization.
  • Complexity for on-call engineers
    Inconsistent logging and service design made it hard for engineers outside the owning team to help during incidents.

Most importantly, our external APIs were drifting apart. For example, our invoicing tool had a different customer database than our subscription tool. This meant that you couldn’t create an invoice for a customer that had a subscription — you’d need to create a (duplicate) customer record using another endpoint first.

While Paddle solved core business problems, our fragmented systems and poor developer experience were costing us deals. This wasn’t just an engineering problem, but a business problem too.

The cultural shift

We wanted teams to spend their energy on solving novel business problems, and not reinventing technical solutions. This meant standardizing our approach, not just to how we design public APIs but also how we manage our internal architecture.

Beyond the technical decisions, this was a cultural shift. The idea of having a central team settle on a standard way of doing things was alien to our culture. We were scared of upsetting engineers. There was real conflict: some people wanted central standards, others absolutely didn’t.

Early attempts at style guides and service frameworks didn’t work. We learned that while it’s easy to push something from the top down, you need to back it up with real support — tooling, training, and ongoing help to get buy-in from teams.

We knew that mandating consistency would breed resentment, so instead we decided to try a “golden path” approach. We were going to make it easier to do the right thing rather than the wrong thing.

In 2021, we formed the AppEx team, a platform group focused on application excellence. AppEx’s mission wasn’t to tell teams how to build services, but to provide building blocks so good that teams would choose to use them.

The ultimate goal for AppEx was to create a set of shared libraries for common technical use cases. But that was a long way off. Instead, the team started piecemeal by working with teams to understand the biggest problems they were facing, then solving those problems centrally.

A toolkit, not a framework

We started small, focusing on problems every team was solving independently.

Authentication and authorization

  • go-scope
    Central repository of scopes in use. Defines a consistent format for access token scopes.

  • go-auth
    Middleware to validate and parse access tokens. Includes methods for implementing permission checks based on entities and actions allowed via the requester’s scopes.

Example

This code shows how to use the go-auth library to implement authentication and authorization in a service.

First, it defines a User entity that implements the Authorisable interface, and provides a method to tell the auth library what type of entity this is.

user.go
// Define your entity that implements the Authorisable interface
type User struct {
    ID        string `json:"id"`
    FirstName string `json:"first_name"`
    LastName  string `json:"last_name"`
    Password  string `json:"password"`
    Email     string `json:"email"`
}

// GetEntityType tells the auth library what type of entity this is
func (u *User) GetEntityType() string {
    return "user"
}

Once an entity is defined, it can be used with the go-auth library in HTTP handlers to check permissions.

handlers.go
// Usage example - checking permissions in your handlers
router.Use(auth.Middleware)

func GetUser(ctx context.Context, req *GetUserRequest) (*User, error) {
    user := &User{ID: req.UserID}

    // Check if the current requester can read this user
    if !authoriser.For(ctx, user).Can(authoriser.ActionRead) {
        return nil, apierr.Forbidden("insufficient permissions")
    }

    // Check permissions for sensitive fields like password
    if authoriser.For(ctx, user).IncludeFields("password").Can(authoriser.ActionRead) {
        // Include password in response
    }

    return user, nil
}

Pagination

  • go-paginator
    Client and server library to help implement API and SQL pagination.

Example (server)

This code shows an example of a server-side handler function that fetches items from a database and returns them in a paginated response.

handlers.go
// GetItems is an example function, returning a set of `Item`.
func (r *repo) GetItems(ctx context.Context, req *GetItemsRequest) (*paginator.CursorResponse[Item], int, bool, error) {
    p, err := paginator.New[Item](
        r.db,
        sqlmin.PostgresBuilder().Select("*").From("item"),
        req.OrderBy,
        paginator.WithPerPage(req.PerPage),
    )
    if err != nil {
        return nil, 0, false, err
    }

    items, estimatedCount, hasMore, err := p.GetResultAndCount(ctx, req.After)
    return items, estimatedCount, hasMore, err
}
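go-paginator is an internal library, so the example above is all we show of its API, but the technique it wraps is standard cursor (keyset) pagination. Purely as an illustration of what it automates, here is a hand-rolled version using only database/sql; the Item type, table, and column names are invented for this sketch.

import (
    "context"
    "database/sql"
)

// Item is a stand-in type for this sketch.
type Item struct {
    ID   string
    Name string
}

// getItemsAfter fetches one page of items whose IDs sort after the cursor.
// Requesting perPage+1 rows tells us whether another page exists without a
// separate COUNT query.
func getItemsAfter(ctx context.Context, db *sql.DB, after string, perPage int) ([]Item, bool, error) {
    rows, err := db.QueryContext(ctx,
        `SELECT id, name FROM item WHERE id > $1 ORDER BY id ASC LIMIT $2`,
        after, perPage+1,
    )
    if err != nil {
        return nil, false, err
    }
    defer rows.Close()

    var items []Item
    for rows.Next() {
        var it Item
        if err := rows.Scan(&it.ID, &it.Name); err != nil {
            return nil, false, err
        }
        items = append(items, it)
    }
    if err := rows.Err(); err != nil {
        return nil, false, err
    }

    // The extra row is only a signal that more results exist; drop it.
    hasMore := len(items) > perPage
    if hasMore {
        items = items[:perPage]
    }
    return items, hasMore, nil
}

Because the cursor is a sortable ID rather than an offset, pages stay stable even as new rows are inserted.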

API conformance

  • go-handler
    Boilerplate for writing spec-compliant HTTP handlers as simple Go functions.

  • go-caller
    Extract and validate caller-identity from requests.

  • go-paddleid
    Generate and validate Paddle’s k-sortable, resource-prefixed identifiers (a generic sketch of the format follows the go-handler example below).

  • go-apierr
    Generates spec-compliant error responses + defined API error values.

Example

This code demonstrates how go-handler creates a type-safe HTTP handler that uses struct tags to automatically decode request data from the URL path, query string, and JSON body.

handlers.go
// The Request struct defines all expected inputs for the handler.
// go-handler automatically populates this struct from the incoming HTTP request.
type Request struct {
    // The `in` tag decodes data from URL path variables or query parameters.
    // This field will be populated from a path param named "path-id" OR a query param named "id".
    ID paddleid.ID `in:"path-id,query=id"`

    // The standard `json` tag is used to decode data from the JSON request body.
    Name string `json:"name"`
}

// The Response struct defines the shape of the JSON response body.
type Response struct {
    Output string `json:"output"`
}

// Echo is a type-safe handler. Instead of `http.Request` and `http.ResponseWriter`,
// it uses our typed `Request` and `Response` structs.
func Echo(ctx context.Context, req *Request) (*Response, error) {
    // By the time this function executes, `req` has already been populated and validated.
    // There is no need for manual JSON unmarshaling or reading query params.

    // You can use other packages to set response details, like the status code.
    status.Created(ctx) // Sets the HTTP response status to 201 Created.

    // Simply return the populated response struct.
    // It will be marshaled to JSON and sent to the client automatically.
    return &Response{
        Output: req.Name,
    }, nil
}

// `standard.Wrap` converts our type-safe `Echo` handler into a standard http.Handler,
// making it compatible with any standard Go router like chi, http.Mux, etc.
mux.Get("/echo", standard.Wrap(stack, Echo))
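go-paddleid is internal, so the sketch below is not its API. It is a generic illustration of the identifier format it produces: a resource prefix (such as paymtd_ for payment methods) followed by a k-sortable ID, approximated here with the open-source github.com/oklog/ulid package.

import (
    "fmt"
    "strings"

    "github.com/oklog/ulid/v2"
)

// NewID returns an identifier like "paymtd_01gjnkc58ypmhh6f1y97jr8spr":
// a resource prefix so humans and tooling can tell entity types apart at a
// glance, plus a ULID so IDs sort roughly by creation time (k-sortable).
func NewID(prefix string) string {
    return fmt.Sprintf("%s_%s", prefix, strings.ToLower(ulid.Make().String()))
}

// ParseID checks the resource prefix and validates the sortable part.
func ParseID(id, wantPrefix string) (ulid.ULID, error) {
    raw, ok := strings.CutPrefix(id, wantPrefix+"_")
    if !ok {
        return ulid.ULID{}, fmt.Errorf("expected %q prefix in %q", wantPrefix, id)
    }
    return ulid.Parse(strings.ToUpper(raw))
}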

Service communication

  • go-apiclient
    Boilerplate for creating service client libraries with versioning, error propagation, auth and telemetry hooks.

  • go-event
    Transactionally-safe event publishing, with schema versioning and validation.

  • queue-relay
    Sidecar application which delivers queue messages from SQS to HTTP endpoints.

  • outboxer-relay
    Sidecar application which publishes messages from an outbox table to EventBridge (the outbox pattern behind go-event and outboxer-relay is sketched below).
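go-event and outboxer-relay together give teams the transactional outbox pattern: the service writes the event to an outbox table in the same database transaction as the business change, and a relay ships it to EventBridge afterwards. The sketch below is not go-event’s actual API; it is a minimal hand-rolled version of the write side, with the Subscription type and table names invented for the example.

import (
    "context"
    "database/sql"
    "encoding/json"
)

// Subscription is a stand-in type for this sketch.
type Subscription struct {
    ID         string `json:"id"`
    CustomerID string `json:"customer_id"`
}

// createSubscription commits the business row and the event row atomically,
// so we never publish an event for a change that didn't happen (or vice versa).
func createSubscription(ctx context.Context, db *sql.DB, sub Subscription) error {
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer tx.Rollback() // safe no-op after a successful Commit

    if _, err := tx.ExecContext(ctx,
        `INSERT INTO subscription (id, customer_id) VALUES ($1, $2)`,
        sub.ID, sub.CustomerID,
    ); err != nil {
        return err
    }

    payload, err := json.Marshal(sub)
    if err != nil {
        return err
    }

    // A relay (outboxer-relay, in Paddle's case) reads unpublished rows from
    // this table, forwards them to the event bus, and marks them as sent.
    if _, err := tx.ExecContext(ctx,
        `INSERT INTO outbox (event_type, payload) VALUES ($1, $2)`,
        "subscription.created", payload,
    ); err != nil {
        return err
    }

    return tx.Commit()
}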

Generic service boilerplate

  • go-settingstore
    Load service configuration (env vars, secrets) to a Go struct.

  • go-validator
    Boilerplate to validate JSON request payloads against JSONSchema.

  • go-sqlmin
    Postgres connection management and transaction orchestration (uses sqlx).

  • go-bdd
    Streamlines implementing service-level BDD tests, using godog.

Example (feature file)

We use BDD-style tests to make sure our APIs are consistent. This lets product owners and engineers see how each service’s public interface looks at a glance.

This code is a BDD feature file written in Gherkin syntax, defining a test scenario for handling an invalid ID in an API request.

payments.feature
Scenario: return a bad_request error when customer_id is invalid
    Given I am authenticated with scopes seller.customer-payment-method.read and seller 123
    When I make a GET call to /2024-01-25/customers/invalid/payment-methods/paymtd_01gjnkc58ypmhh6f1y97jr8spr
    Then I should receive a 400 JSON response
    And the response body should be
    """json
    {
      "error": {
        "type": "request_error",
        "code": "bad_request",
        "detail": "Invalid request",
        "errors": [
          {
            "field": "customer_id",
            "message": "invalid input"
          }
        ]
      },
      "meta": {
        "request_id": "{{ CORRELATION_ID }}"
      }
    }
    """

Example (test file)

This Go code sets up the BDD test runner, initializing mock services, database connections, and other extensions needed to execute the feature files against the application.

bdd_test.go
// 1. Configure the test runner
tr := bdd.NewTestRunner(
    // The first argument is a function that starts the actual application server
    // in-process. It's configured with the test settings and mock services
    // instead of real ones.
    func(ctx context.Context, serverSettings *bdd.ServerSettings) error {
        setting := getSettings(serverSettings, dbSettings)

        // This call starts the server, which the test runner will make HTTP requests against.
        serve.Serve(ctx, setting)
        return nil
    },
    // The second argument registers all the extensions defined above. The runner
    // uses these to manage state, fixtures, databases, and mocks.
    bdd.WithExtensions(sfExt, dbExt, mockExt, ext),
)

// 2. Execute the tests
// Finally, this line kicks off the process, running all `.feature` files against
// the in-process server and reporting the results.
tr.RunTests(t)

Other niceties

  • go-featureflag
    Consistent Paddle-specific targeting contexts for LaunchDarkly.

  • go-relish
    Step definitions for godog (BDD) using code comments.

  • go-testament
    Write Go tests against embedded Postgres.

  • go-telemetry
    go-handler and go-apiclient hooks for OpenTelemetry.

From libraries to service templates

Individual libraries were great, but services were still very different. We realized we weren’t really solving the main problem. We were giving teams better building blocks, but they still had to figure out how to put them together.

By this point, the appetite for consistency was there. People saw the value in the libraries we’d created, and we could see adoption growing. So we took things one step further by creating a GitHub repo template called example-service. It’s a complete production-ready HTTP service that teams can clone and modify.

This is how the repo looks:

  • cmd/
    • migrations/
      • migrator.go
    • serve/
      • serve.go
    • main.go
  • database/
    • migrations/
  • deployment/
    • api/
    • relay/
  • features/
    • create_user.feature
  • internal/
    • handler/
    • repository/
    • service/
    • settings/
  • pkg/
    • client/
      • mocks/
      • client.go
      • client_test.go
  • schema/
    • main.go
  • .dockerignore
  • .editorconfig
  • .gitignore
  • .mockery.yaml
  • Dockerfile
  • README.md
  • bdd_test.go
  • build-config.yaml
  • catalog-info.yaml
  • file-sync.yml
  • go.mod
  • go.sum
  • mkdocs.yml
  • tools.go
  • user.go
  • user_test.go

It’s not just a skeleton, but a fully working service you can run right away. It includes:

  • Working endpoints you can hit immediately.
  • Example BDD tests.
  • All the libraries wired together correctly.
  • Standardized project structure.
  • Client package exports.
  • Shared types for other services to import.
  • Built-in database migration patterns (a generic sketch of the idea follows this list).
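The template’s migration wiring (cmd/migrations/migrator.go) is internal to Paddle, but the idea is the standard run-migrations-on-deploy pattern. Purely as an illustration, here is roughly what that looks like with the open-source golang-migrate library, applying the SQL files under database/migrations before the service starts serving traffic.

import (
    "errors"

    "github.com/golang-migrate/migrate/v4"
    _ "github.com/golang-migrate/migrate/v4/database/postgres" // registers the postgres driver
    _ "github.com/golang-migrate/migrate/v4/source/file"       // registers the file:// source
)

// migrateUp applies any pending SQL migrations on startup, so the schema and
// the code are always deployed together.
func migrateUp(databaseURL string) error {
    m, err := migrate.New("file://database/migrations", databaseURL)
    if err != nil {
        return err
    }
    defer m.Close()

    // ErrNoChange just means the schema is already up to date.
    if err := m.Up(); err != nil && !errors.Is(err, migrate.ErrNoChange) {
        return err
    }
    return nil
}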

We used this as a vehicle for “shoehorning” more best practices into services beyond just the libraries. It was our way to drive consistency at the service level, not just the component level.

Case study: currency service

Here’s where it gets personal: my team was the first to use the new template, and I was nervous. For context, I head up payments, and historically we’ve been deep in gRPC-land.

We needed to build a new REST service called currency-service for the launch of Paddle Billing. We faced a choice: spend time rebuilding our own patterns (not our core competency), or trust this new golden path approach completely.

We decided to go all-in on the new way of working. “Give us everything,” we told the AppEx team. We used the service template, adopted as many libraries as possible, and treated it as a test: how much could we offload to this new way of working?

Can we migrate everything to this?

I wasn’t sure how my engineers would react since they were really bought into gRPC, but the feedback was overwhelmingly positive. In fact, engineers started asking, “can we migrate everything to this?”

While gRPC does a lot of things well, we’d invested comparatively little in our gRPC toolkit and workflow. Having a cohesive, well thought-out, and actively maintained template made us more productive than we’d been before.

It proved that consistency is more valuable than the specific technology choices. The cost of maintaining and investing across a multitude of approaches is high. Consistency lets our teams focus on delivering more value overall.

Towards a flexible toolkit

The “lots of small libraries” approach was working well. We had convergence in how people were building services, and the feedback was largely positive.

With momentum growing, we consolidated common patterns into go-httpserver: a standardized way to build HTTP services that still gave teams flexibility for their unique needs.

Rather than a full-on framework, we built a toolkit that encourages standardization with escape hatches, not rigid constraints. Teams are still free to choose their own components when needed. If your service needs OpenSearch or a vector database, you’re free to swap out Postgres, but you’d still get to keep the standard approach for HTTP handling, authentication, and testing.

Example

This code shows the main entry point for an application, where an HTTP server is configured with optional settings, a route is registered, and the server is started to listen for requests.

main.go
// Serve configures and starts the HTTP server for the application.
func Serve(ctx context.Context) {
    // Initialize a new server using the functional options pattern.
    // Each `With...` function configures a specific aspect of the server.
    server := httpserver.New(
        httpserver.WithLogger(logger),

        // Each of these is optional:
        httpserver.WithTelemetry(telem),
        httpserver.WithLegacyRoutingEnabled(),

        // WithShutdownFunc registers a function for cleanup during a graceful shutdown.
        // This is the ideal place to close database connections or other resources.
        httpserver.WithShutdownFunc(func() {
            // db.Close()
        }),
    )

    // Register a POST route for the /example endpoint.
    server.Post("2024-01-12", "/example", wrap.Handler(Example))

    // Start runs the server. This is a blocking call that listens for requests
    // until the provided context is canceled.
    server.Start(ctx)
}

Measuring success

Case study: OpenTelemetry

In 2024, we rolled out OpenTelemetry across all services. Previously, this might have taken months as we coordinated with each team to get them to adopt the new standard.

With the new toolkit, it took a couple of weeks.

This would have been inconceivable if all the services had wildly different approaches. Go’s standard library makes HTTP middleware easy to write, but it was the consistency across services that made this central investment so straightforward.
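To make that concrete (this is an illustration, not Paddle’s go-telemetry code): when every service builds its HTTP stack through the shared toolkit, instrumentation is a single change in one place, for example wrapping the router with OpenTelemetry’s otelhttp middleware and letting every service pick it up on its next dependency bump.

import (
    "net/http"

    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

// wrapWithTelemetry is the kind of one-line change a shared toolkit enables:
// ship it once in the toolkit, and every service that upgrades gets request
// tracing with no per-team work.
func wrapWithTelemetry(mux http.Handler, serviceName string) http.Handler {
    return otelhttp.NewHandler(mux, serviceName)
}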

[Chart: OpenTelemetry adoption across microservices at Paddle]
We introduced OpenTelemetry in weeks, reaching 100% adoption in about three months.

What the numbers don’t show

But the biggest shifts can’t be measured on any dashboard:

  • Engineering debates shifted from “which library?” to “how can we solve this problem for our customers?”
  • New hires onboard faster.
  • Engineers can move between teams, work on other services, and hit the ground running.
  • Teams can contribute to each other’s services.
  • We can roll out changes across all systems fairly easily.

Costs and challenges

Going all-in on consistency isn’t free. Here are some of the challenges we faced:

  • Central decision overhead
    Decisions about how services work across all teams take longer to make, but they save countless hours of per-team debate later.
  • Dependency management hell
    In the early days, we had to chase people to make sure they were using libraries and applying updates.
  • Magnified impact of bugs
    When a shared library has a bug, it affects multiple services at once.
  • Breaking changes
    We try to avoid breaking changes, but when they happen, our AppEx team often has to upgrade the affected services itself.
  • Investment required
    It’s going to cost you more at first, as you begin the journey towards a “golden path.” We have 2-4 people on platform engineering out of about 90 total in engineering. It makes sense for us at our scale, but it might not for you.
  • The standards paradox
    To make everything consistent, we ended up introducing yet another standard and way of working that teams have to learn.

The real question isn’t whether there are costs, because there definitely are. It’s whether the benefits outweigh them at your scale.

For us at Paddle, as a rapidly growing scale-up, the benefits outweigh the costs. It helps us ship faster and build a solid engineering culture.

For startups, open-source or commercial toolkits might be a good way to get started with consistency.

Takeaways

After going through this journey — the chaos, the cultural changes, and eventually the wins — here are some hard-won insights that I’d share with other teams thinking about following us down this path:

  • Make service creation boring
    Focus engineering creativity on problems unique to your product, not on reinventing HTTP handlers.
  • Work top-down and bottom-up
    You need standards if you want a great developer experience, but you also have to make it easy for people to follow them.
  • Listen and iterate
    Platform teams must be true enablers, not gatekeepers. Accept feedback and adjust as needed.
  • Start by looking at the decisions teams are making
    Focus on common problems and standardize where services talk to each other. That’s where you’ll see the biggest impact.
  • Quality is essential
    Teams won’t adopt mediocre libraries. Better to have fewer, excellent building blocks than many poor ones. Lean on open-source here.
  • People > practice
    No toolkit is going to solve your engineering problems. Great teams of great people are always more valuable than a process.

Result: an engineering culture that scales

Today, building new services at Paddle is boring, so that teams can get on with the interesting stuff. Teams spend their time solving customer problems, not debating deployment strategies. And that consistency directly improves our external developer experience.

Not only are we able to ship high-quality new services faster, we’re able to maintain our competitive edge by pivoting quickly to respond to industry trends and evolving customer needs.

A real test for our new standards came when we reshuffled our billing teams recently. What could have been a chaotic process was incredibly smooth. Because every service followed the same patterns, engineers could switch teams and start contributing immediately. It proved that consistency is a direct catalyst for organizational agility.

Daniel Fosbery, Senior Director of Engineering at Paddle

The “golden path” works because we made standardization optional but irresistible. Instead of constraining creativity, we give our engineers the tools to build at scale.

The result is measurably faster delivery cycles, lower maintenance overhead, and the agility to solve problems for our customers that don’t even exist yet.


About George Wilson

George is Head of Engineering for the Payments Group at Paddle, building and supporting teams that own checkout, payments, and settlement systems.
