Lipa na M-Pesa Integration

Safaricom dictates the contract. You build around it. Event-driven by design.

The integration problem

Most payment providers adapt to your API. M-Pesa C2B doesn't. Safaricom pushes two callbacks: a validation request before funds move, then a confirmation once the money is collected. That second call is essentially a notification: the funds are already gone. There's no retry, no second chance.

That's why the integration needed its own dedicated gateway, built for resilience from the ground up. Not bolted onto the billing system, but sitting in front of it, absorbing Safaricom's contract and translating it into something the rest of the platform could handle reliably.

This integration is modelled on the real-world M-Pesa C2B Paybill API pattern.

Architecture: design first, then build

A full technical assessment was produced before a line of code was written. TOGAF-inspired in structure, it covers baseline architecture, target topology, gap analysis, ADRs, NFRs, security model, resilience strategy, and testing approach. The document went through 7 versions across 6 weeks, including a major architectural pivot from Spring Events to RabbitMQ and the Outbox Pattern.

The document was also written to serve as context for AI-assisted development. Without a sufficient specification, AI tools fill gaps freely rather than following the intended design decisions, so the level of detail here was deliberate. Development then took only 3 weeks, with AI-assisted coding guided by the specification and the Utility Account System serving as a proven architectural template.

Load and soak tests validated the architecture in production, and a post-implementation assessment captures deviations from the original design and lessons learned.

2.8M Confirmation events processed
0 Failures or lost events
725/s Peak ingest throughput
121ms p95 end-to-end via Cloudflare

System topology

M-Pesa C2B deployment topology

The integration adds two new microservices, a message broker, and a dedicated event store (MongoDB) in front of the existing Utility Account Service, each with a clear ownership boundary:

M-Pesa Gateway

Dedicated microservice that terminates the Safaricom C2B integration. Receives validation and confirmation callbacks, persists events and outbox entries to a dedicated MongoDB instance, and publishes provisioning requests to RabbitMQ. Returns HTTP 200 to Safaricom immediately after durable persistence, before downstream processing begins. Does not perform billing logic.

Provisioning Service

Stateless microservice that consumes provisioning requests from RabbitMQ, submits payment deposits to the Utility Account Service via its existing API key auth, and publishes results back. Built for resilience: circuit breaker, retry with backoff, and timeout protection on all UA Service calls. Fully reusable across future payment gateway integrations.
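
The stack lists Resilience4j, which presumably backs this circuit breaker. To show the behaviour rather than the library API, here is a hand-rolled, count-based breaker in plain Java: it opens after a run of consecutive failures, rejects calls while open, and lets a trial call through once the wait period has passed. The class name, thresholds, and injected clock are illustrative assumptions, not the project's configuration.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Minimal count-based circuit breaker: a hand-rolled stand-in for what
// Resilience4j provides. Thresholds and names are illustrative only.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long to reject calls once open
    private final LongSupplier clock;   // injectable time source for testing
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() {
        // An open breaker becomes half-open once the wait period has elapsed.
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    <T> T call(Supplier<T> action) {
        synchronized (this) {
            if (state() == State.OPEN) {
                // Fail fast instead of piling more load onto a struggling UA Service.
                throw new IllegalStateException("circuit open: call rejected");
            }
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            throw e;
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        // A failed trial call in HALF_OPEN reopens the breaker immediately.
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }
}
```

Resilience4j's `CircuitBreaker` typically evaluates a failure rate over a sliding window rather than a consecutive-failure count, but the state transitions are the same idea.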

Utility Account Service

The existing billing system. Receives payment deposits from the Provisioning Service exactly as it would from any other provider. Idempotency via TransID (Safaricom's unique transaction identifier) prevents duplicate postings on retry.
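
TransID idempotency boils down to "first write wins, keyed on TransID". The in-memory ledger below sketches that idea; it is not the UA Service's implementation. The class, the receipt format, and the use of a map (where a real service would more likely rely on a unique database constraint on TransID) are all assumptions.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// In-memory sketch of TransID-keyed idempotency. Names and receipt format are hypothetical.
final class IdempotentDepositLedger {
    record Receipt(String receiptId, String transId, String account, BigDecimal amount) {}

    private final Map<String, Receipt> byTransId = new ConcurrentHashMap<>();
    private final AtomicInteger sequence = new AtomicInteger();

    /** Posts the deposit once per TransID; any replay returns the original receipt. */
    Receipt deposit(String transId, String account, BigDecimal amount) {
        // computeIfAbsent is atomic per key on ConcurrentHashMap, so concurrent
        // retries for the same TransID cannot create two postings.
        return byTransId.computeIfAbsent(transId,
                id -> new Receipt("RCPT-" + sequence.incrementAndGet(), id, account, amount));
    }

    int postingCount() {
        return byTransId.size();
    }
}
```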

RabbitMQ

Decouples ingest from processing. Two queues with dedicated DLQs: provisioning.queue for payment execution requests and results.queue for processing outcomes. A dead-letter exchange routes messages that exhaust their retries to the DLQs, so failures stay visible to operators instead of disappearing silently.
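
The retry-then-dead-letter flow can be simulated in a few lines. The `x-dead-letter-exchange` and `x-dead-letter-routing-key` keys are real RabbitMQ queue arguments; the exchange name `dlx`, the DLQ routing key, and the attempt limit are hypothetical. In Spring AMQP the equivalent is commonly a retry interceptor that rejects without requeue once attempts run out, which hands the message to the dead-letter exchange.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Simulation of "retry, then dead-letter" routing. Exchange/DLQ names are hypothetical.
final class DeadLetterSketch {
    // Arguments a queue like provisioning.queue could be declared with.
    static final Map<String, Object> PROVISIONING_QUEUE_ARGS = Map.of(
            "x-dead-letter-exchange", "dlx",
            "x-dead-letter-routing-key", "provisioning.queue.dlq");

    record Message(String transId, int attempts) {}

    final Deque<Message> queue = new ArrayDeque<>();
    final List<Message> deadLetters = new ArrayList<>();
    private final int maxAttempts;

    DeadLetterSketch(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Drains the queue; a failing message is requeued until maxAttempts, then dead-lettered. */
    void drain(Predicate<Message> handler) {
        while (!queue.isEmpty()) {
            Message m = queue.poll();
            Message attempted = new Message(m.transId(), m.attempts() + 1);
            if (handler.test(attempted)) {
                continue; // acknowledged: processed successfully
            }
            if (attempted.attempts() >= maxAttempts) {
                deadLetters.add(attempted); // rejected without requeue: DLX routes it to the DLQ
            } else {
                queue.add(attempted);       // requeued for another attempt
            }
        }
    }
}
```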

135 tests across both services covering unit, integration, end-to-end pipeline, HTTP client behaviour, retry count, circuit breaker, and manual acknowledgement scenarios.

Stack

Java 21 Spring Boot 3.5 Spring AMQP Spring RestClient Apache HttpClient 5 RabbitMQ MongoDB MapStruct Resilience4j SpringDoc OpenAPI Testcontainers WireMock JUnit 5 Mockito Docker TOGAF 9

Infrastructure

Deployed on the same Dell Latitude E7250 home lab as the Utility Account Service, running Ubuntu 24. GitHub Actions builds Docker images, publishes to GHCR, and deploys via Docker Compose on a self-hosted runner. Cloudflare Tunnel exposes the M-Pesa Gateway publicly at mpesa.oualidg.dev. Grafana, Prometheus, and Loki cover observability across all services.

GitHub Actions GHCR Docker Compose Nginx Cloudflare Tunnel Grafana Prometheus Loki

Observability

All services ship metrics and logs to Grafana Alloy, which forwards metrics to Prometheus and logs to Loki, while Alertmanager handles alert routing. Grafana provides unified dashboards, log search, and alert visibility across the entire platform.

Observability stack diagram

Key design decisions

Outbox Pattern for reliable event publishing

The classic dual-write problem: writing to MongoDB and publishing to RabbitMQ as two separate operations means either can fail independently, leaving the system in an inconsistent state. The Outbox Pattern solves this by writing the MpesaEvent and the OutboxEntry atomically in a single MongoDB transaction. A separate OutboxProcessor polls the outbox and publishes to RabbitMQ using a lease-based state machine (PENDING, PROCESSING, SENT, FAILED), so concurrent processors do not claim the same entry and entries stranded by a JVM crash are picked up again once their lease expires. Publishing is therefore at-least-once rather than exactly-once (a crash between the publish and the SENT update causes a republish), and duplicates are absorbed downstream by TransID idempotency. If RabbitMQ is unavailable at callback time, the event is captured durably and published on recovery.
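
The lease-based state machine can be sketched in memory. Entries start PENDING; a processor claims them by moving them to PROCESSING with a lease expiry, so a crashed processor's entries become claimable again when the lease runs out. A confirmed publish marks the entry SENT, and repeated failures end in FAILED. The list stands in for the MongoDB outbox collection, and the entry fields, lease length, and attempt limit are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// In-memory sketch of a lease-based outbox processor. Field names, lease length,
// and maxAttempts are hypothetical; the real entries live in MongoDB.
final class OutboxSketch {
    enum Status { PENDING, PROCESSING, SENT, FAILED }

    static final class Entry {
        final String transId;
        Status status = Status.PENDING;
        Instant leaseUntil = Instant.EPOCH;
        int attempts = 0;

        Entry(String transId) {
            this.transId = transId;
        }
    }

    final List<Entry> entries = new ArrayList<>();
    private final Duration lease;
    private final int maxAttempts;

    OutboxSketch(Duration lease, int maxAttempts) {
        this.lease = lease;
        this.maxAttempts = maxAttempts;
    }

    /**
     * Claims PENDING entries plus PROCESSING entries whose lease has expired (for
     * example because the previous processor's JVM crashed). Against MongoDB this
     * would be an atomic findAndModify per entry so two processors cannot claim it.
     */
    synchronized List<Entry> claim(Instant now) {
        List<Entry> claimed = new ArrayList<>();
        for (Entry e : entries) {
            boolean leaseExpired = e.status == Status.PROCESSING && now.isAfter(e.leaseUntil);
            if (e.status == Status.PENDING || leaseExpired) {
                e.status = Status.PROCESSING;
                e.leaseUntil = now.plus(lease);
                claimed.add(e);
            }
        }
        return claimed;
    }

    /** Publishes claimed entries; the publisher returns true when the broker confirms. */
    void processOnce(Instant now, Predicate<Entry> publisher) {
        for (Entry e : claim(now)) {
            e.attempts++;
            if (publisher.test(e)) {
                // A crash right here (after publish, before SENT) means the entry is
                // republished once its lease expires: delivery is at-least-once.
                e.status = Status.SENT;
            } else {
                // Failed publishes go back to PENDING until the attempt limit is hit.
                e.status = e.attempts >= maxAttempts ? Status.FAILED : Status.PENDING;
            }
        }
    }
}
```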

Post-check before retry

Before retrying a failed payment call, the Provisioning Service checks whether the previous attempt actually posted by querying the UA Service's confirmation endpoint. If a receipt exists, it's returned immediately without re-posting, which avoids a redundant retry when the UA Service already processed the request but the response was lost, for example to a timeout.
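
A minimal sketch of the post-check combined with exponential backoff, assuming a hypothetical `UaClient` interface in place of the Provisioning Service's real HTTP client. The sleeper is injected so the backoff can be observed without actually waiting.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Consumer;

// Sketch of "post-check before retry" with exponential backoff. UaClient and its
// methods are hypothetical stand-ins for the Provisioning Service's HTTP client.
final class PostCheckRetry {
    interface UaClient {
        String deposit(String transId);               // posts a payment; throws on timeout or 5xx
        Optional<String> findReceipt(String transId); // looks up an existing receipt
    }

    static String depositWithPostCheck(UaClient ua, String transId, int maxAttempts,
                                       Duration initialBackoff, Consumer<Duration> sleeper) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Duration backoff = initialBackoff;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (attempt > 1) {
                sleeper.accept(backoff);
                backoff = backoff.multipliedBy(2); // exponential backoff between attempts
                // The failed call may still have posted (the response timed out after
                // the UA Service committed), so check before re-posting.
                Optional<String> existing = ua.findReceipt(transId);
                if (existing.isPresent()) {
                    return existing.get();
                }
            }
            try {
                return ua.deposit(transId);
            } catch (RuntimeException e) {
                last = e;
            }
        }
        throw last;
    }
}
```

In production the sleeper would wrap `Thread.sleep`, and TransID idempotency in the UA Service still backs this up if a duplicate post slips through.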

Two-call flow: validate then confirm

Validation arrives before funds move and requires a response within 8 seconds, so this path is handled synchronously with circuit breaker protection. Confirmation arrives once funds are collected and is acknowledged immediately, with processing completed through the async pipeline. Both are callbacks that Safaricom initiates on its own schedule.
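
The synchronous validation path has to answer inside Safaricom's window no matter how the account lookup behaves. One way to sketch that is `CompletableFuture.completeOnTimeout`. ResultCode "0" accepts; treating "C2B00012" as Daraja's "invalid account number" code is an assumption to verify against the Daraja documentation, and the time budget and reject-on-timeout policy are hypothetical choices, not necessarily the project's.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Sketch of answering the validation callback inside its response window.
// Result codes, budget, and the reject-on-timeout policy are assumptions.
final class ValidationSketch {
    record ValidationResponse(String resultCode, String resultDesc) {}

    static final ValidationResponse ACCEPT = new ValidationResponse("0", "Accepted");
    static final ValidationResponse REJECT = new ValidationResponse("C2B00012", "Rejected");

    static ValidationResponse validate(String billRefNumber, Predicate<String> accountExists,
                                       long budgetMillis, Executor executor) {
        return CompletableFuture
                .supplyAsync(() -> accountExists.test(billRefNumber) ? ACCEPT : REJECT, executor)
                // Always answer before the deadline, even if the account lookup hangs.
                .completeOnTimeout(REJECT, budgetMillis, TimeUnit.MILLISECONDS)
                // A failed lookup (e.g. an open circuit breaker) also becomes a rejection.
                .exceptionally(ex -> REJECT)
                .join();
    }
}
```

The budget would sit well below 8 seconds to leave headroom for the Cloudflare and Nginx hops, and on the wire the fields would be serialised as `ResultCode` and `ResultDesc`.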

End-to-end MDC tracing

Every request carries a correlationId and the Safaricom TransID, which the Java services place in the SLF4J MDC. A single payment can be traced end-to-end from Nginx through the M-Pesa Gateway, RabbitMQ, and the Provisioning Service to the UA Service logs without any additional tooling.
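
SLF4J's MDC is a per-thread map (`MDC.put`, `MDC.get`, `MDC.clear`) that log patterns read through `%X{key}`. Because it is thread-local, the context has to be copied into message headers when a payment crosses RabbitMQ and restored on the consumer side. The sketch below uses a plain `ThreadLocal` stand-in so it runs without SLF4J. The `x-trans-id` header name is hypothetical, and whether the project propagates context exactly this way is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// ThreadLocal stand-in for SLF4J's MDC, showing how correlationId and TransID
// can ride along a message between services. x-trans-id is a hypothetical header.
final class TraceContext {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CONTEXT.get().put(key, value); }
    static String get(String key) { return CONTEXT.get().get(key); }
    static void clear() { CONTEXT.remove(); }

    /** Producer side: copy the current context into outgoing message headers. */
    static Map<String, String> toHeaders() {
        Map<String, String> headers = new HashMap<>();
        headers.put("x-correlation-id", get("correlationId"));
        headers.put("x-trans-id", get("transId"));
        return headers;
    }

    /** Consumer side: restore the context before handling, clear it afterwards. */
    static void runWithHeaders(Map<String, String> headers, Runnable handler) {
        put("correlationId", headers.get("x-correlation-id"));
        put("transId", headers.get("x-trans-id"));
        try {
            handler.run();
        } finally {
            clear(); // pooled listener threads must not leak context between messages
        }
    }

    /** Roughly what a log pattern like [%X{correlationId}][%X{transId}] %msg renders. */
    static String format(String message) {
        return "[" + get("correlationId") + "][" + get("transId") + "] " + message;
    }
}
```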

Try the API

The M-Pesa Gateway Swagger UI is live. You can simulate a full C2B callback flow directly from the browser.

1. Open Swagger UI

No login required. The callback token is pre-filled on all endpoints.

2. Simulate a validation callback

Call POST /api/v1/validation with the sample payload below, setting BillRefNumber to a valid account number or customer ID. You can grab one from the Utility Account UI (log in with admin / admin).

{
  "TransactionType": "Pay Bill",
  "TransID": "NLJ7RT61SV001",
  "TransTime": "20250607120000",
  "TransAmount": 1500,
  "BusinessShortCode": "600123",
  "BillRefNumber": "YOUR_ACCOUNT_OR_CUSTOMER_NUMBER",
  "InvoiceNumber": "",
  "OrgAccountBalance": 0,
  "ThirdPartyTransID": "",
  "MSISDN": "254712345678",
  "FirstName": "John",
  "MiddleName": "",
  "LastName": "Doe"
}
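
To script the same call outside Swagger, the payload can be wrapped in a Java `HttpRequest`. This sketch only builds the request. It assumes the API is served from `https://mpesa.oualidg.dev` and leaves out the callback token, whose header name isn't documented here.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds (but does not send) the sample validation callback. BASE_URL is an
// assumption; the callback token header is omitted because its name is unknown.
final class ValidationRequestSketch {
    static final String BASE_URL = "https://mpesa.oualidg.dev";

    static HttpRequest validationRequest(String billRefNumber) {
        // Sample payload from above, with BillRefNumber substituted. No JSON escaping
        // is done, so pass a plain account or customer number only.
        String body = """
                {
                  "TransactionType": "Pay Bill",
                  "TransID": "NLJ7RT61SV001",
                  "TransTime": "20250607120000",
                  "TransAmount": 1500,
                  "BusinessShortCode": "600123",
                  "BillRefNumber": "%s",
                  "InvoiceNumber": "",
                  "OrgAccountBalance": 0,
                  "ThirdPartyTransID": "",
                  "MSISDN": "254712345678",
                  "FirstName": "John",
                  "MiddleName": "",
                  "LastName": "Doe"
                }
                """.formatted(billRefNumber);
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/api/v1/validation"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

Once the token header is added, the request can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.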

3. Simulate a confirmation callback

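Call POST /api/v1/confirmation with the same BillRefNumber and BusinessShortCode, and a TransID that hasn't been used before (the UA Service deduplicates on TransID, so a reused one won't post a new payment). The gateway persists the event to MongoDB and publishes it to the provisioning pipeline.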

4. Verify the payment

Open Utility Account Swagger and select Admin API from the Select a definition dropdown (top right). Under the Authentication section, call POST /api/auth/login with admin / admin (the X-Auth-Mode: bearer header is pre-filled). Copy the accessToken from the response, then click Authorize (lock icon, top right) and paste it as a Bearer token. Under the Reports section, call Search payments with your customer ID. The most recent entry should show your TransID as paymentReference and MPESA as the provider.

5. Trace the payment in Grafana

Copy the x-correlation-id from the confirmation response headers in Swagger. Open Grafana (log in with admin / admin), go to Explore, select Loki as the datasource, and run:

{service=~"mpesa-gateway|provisioning-service|utility-account"} |= "YOUR_CORRELATION_ID"

Enable Unique labels to see which service each log line came from. Set Display results to Oldest first to follow the payment journey in sequence: Gateway ingestion, RabbitMQ publish, Provisioning processing, UA Service deposit, and final state transition.