Zero-Downtime Staged Rollout: Headless Front on Bitrix
Author: WebGoodPeople
One of the most common objections to a headless migration: "we can't take the site down for a week to ship a release." That's a fair objection if the release really requires downtime. The problem is in the framing. A good staged rollout doesn't require downtime at all. Not for a minute.
This article walks through our real process: 6 steps from "old front on Bitrix" to "new Next.js front on 100% of traffic," with no recorded downtime. Concrete technical decisions and a rollback point at every step.
Why "release = downtime" is an outdated model
The classic migration goes like this: ship the new thing, confirm it works, switch over. If something breaks, roll back to the old one.
This model has two large classes of problems:
- You only test on production traffic. Dev and staging don't reproduce real behavior. Users find the bugs, not testers.
- A rollback is just another release. The switch itself takes 10–30 minutes, and during that window some users hit errors.
A staged rollout is a model where the old and new fronts run side by side and you shift traffic between them by percentage. At any moment you can "roll back" by lowering the percentage going to the new front. That takes seconds and involves no downtime.
Architecture: a parallel front over a shared backend
The core idea is to separate the front from the back through an API layer. The backend (Bitrix, 1C, legacy API) stays the master system. The front exists in two copies:
- Old (Bitrix PHP templates) takes part of the traffic
- New (Next.js) takes the other part
Both fronts call the same API layer. Content and data stay in sync automatically, because both fronts read from a single source.
You switch between them at three levels:
- DNS / load balancer: the percentage of traffic to each front
- Feature flag: specific pages or modules on the new front
- User segment: a test segment gets the new front 100% of the time
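How the three levels combine can be sketched as a single routing decision. This is a minimal illustration, not our production balancer config: the names `chooseFront`, `RoutingConfig`, and `VisitorInfo` are hypothetical, and the precedence (feature flag first, then segment, then percentage) is an assumption you may want to order differently.

```typescript
// Hypothetical types for illustration; they do not come from any library.
type FrontName = "old" | "new";

interface RoutingConfig {
  newFrontPercent: number;      // balancer level: 0..100
  newFrontPaths: string[];      // feature-flag level: path prefixes enabled on the new front
  testSegmentUserIds: string[]; // segment level: users pinned to the new front
}

interface VisitorInfo {
  userId: string | null;
  path: string;
  bucket: number; // stable number in 0..99, e.g. derived from a hash of a visitor id
}

function chooseFront(visitor: VisitorInfo, config: RoutingConfig): FrontName {
  // Feature flag: pages not yet enabled on the new front always stay on the old one.
  const pathEnabled = config.newFrontPaths.some((prefix) => visitor.path.startsWith(prefix));
  if (!pathEnabled) return "old";

  // User segment: the test segment gets the new front 100% of the time.
  if (visitor.userId !== null && config.testSegmentUserIds.includes(visitor.userId)) {
    return "new";
  }

  // Traffic percentage: everyone else is split by their stable bucket.
  return visitor.bucket < config.newFrontPercent ? "new" : "old";
}
```

With this shape, rolling back is a matter of setting `newFrontPercent` to 0; the test segment keeps seeing the new front, which is handy for debugging after a rollback.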
6 steps of a staged rollout
Step 1. An API layer between Bitrix and the front
We put an HTTP layer between Bitrix and any client. This layer is the master contract. Both fronts, old and new, call the same endpoints.
Technically: Next.js API routes or a standalone Node.js service that proxies requests to Bitrix over REST and caches the results.
[Browser] → [Next.js API / Node service] → [Bitrix REST] → [MySQL]
Time: 1–2 weeks for a first version that covers 80% of the scenarios (catalog, product page, cart, search).
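The "caches the results" part of this layer can be as small as a keyed cache with a time-to-live. The sketch below is one possible shape, assuming an in-memory cache is acceptable for a single Node process; `TtlMemo` is a hypothetical helper, not something Next.js or Bitrix provides.

```typescript
// Hypothetical in-memory TTL cache; a shared store would be needed for multiple instances.
class TtlMemo<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so that expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  getOrLoad(key: string, load: () => V): V {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value; // still fresh: skip the call to the backend
    }
    const value = load();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

If `V` is a `Promise` (for example, the pending result of an HTTP call to Bitrix), concurrent requests for the same key share one in-flight call. The trade-off is that a rejected promise stays cached until its TTL expires, so a real service would likely evict failed entries.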
Step 2. A fully working Next.js front on staging
We build a full clone of the current site on Next.js. Every page. Every route. The whole cart. It runs on a staging domain.
Definition of done: you can complete the full user journey (category → product page → cart → checkout) without knowing you're on a new stack.
Time: 4–8 weeks. At this stage there's still no production traffic.
Step 3. Canary: 5% of traffic to the new front
We configure the load balancer (Nginx, HAProxy, or Cloudflare) so that 5% of users land on the new front. The other 95% stay on the old one.
How we pick "that 5%":
- Random sample by IP
- Not VIP users (they get annoyed and make noise)
- Not all the traffic from one country or region (if it fails, that region suffers)
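A random sample by IP should still be stable, so the same visitor does not bounce between fronts. One common way to get that is to hash the IP into a bucket from 0 to 99. The sketch below uses FNV-1a purely as an example hash; the function names and the `salt` parameter are assumptions, and the VIP flag is expected to come from your own user data.

```typescript
// 32-bit FNV-1a over UTF-16 code units (identical to the byte version for ASCII input such as IPs).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // convert to an unsigned 32-bit value
}

// Maps an IP to a stable bucket in 0..99. Changing the salt reshuffles who lands in the canary.
function canaryBucketFor(ip: string, salt: string): number {
  return fnv1a32(salt + ":" + ip) % 100;
}

function isInCanary(ip: string, isVip: boolean, percent: number, salt: string): boolean {
  if (isVip) return false; // VIP users are never part of the canary
  return canaryBucketFor(ip, salt) < percent;
}
```

Because the hash ignores geography, the sample is not expected to concentrate in one country or region, though this sketch does not enforce that explicitly.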
The metrics we watch in real time:
- Latency p95 for each key endpoint
- Error rate (5xx and front-end JS errors via Sentry)
- Checkout conversion on the new front vs the old one
If any of these metrics degrades by more than 10% from baseline, there's an automatic rollback to 0% within 30 seconds.
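The rollback rule can be expressed as a comparison against a baseline snapshot. This is a sketch under two assumptions: "10%" is read as a relative change, and the three metrics arrive as plain numbers from your monitoring. `CanarySnapshot`, `degradedMetrics`, and `nextCanaryPercent` are hypothetical names.

```typescript
interface CanarySnapshot {
  latencyP95Ms: number;       // p95 latency in milliseconds
  errorRate: number;          // fraction of failed requests, 0..1
  checkoutConversion: number; // fraction of sessions that check out, 0..1
}

// Returns the names of metrics that degraded by more than `tolerance` relative to baseline.
function degradedMetrics(
  baseline: CanarySnapshot,
  current: CanarySnapshot,
  tolerance: number = 0.1,
): string[] {
  const degraded: string[] = [];
  // For latency and errors, higher is worse.
  if (current.latencyP95Ms > baseline.latencyP95Ms * (1 + tolerance)) degraded.push("latencyP95Ms");
  if (current.errorRate > baseline.errorRate * (1 + tolerance)) degraded.push("errorRate");
  // For conversion, lower is worse.
  if (current.checkoutConversion < baseline.checkoutConversion * (1 - tolerance)) {
    degraded.push("checkoutConversion");
  }
  return degraded;
}

// Any degraded metric sends the canary back to 0%; otherwise the percentage is kept.
function nextCanaryPercent(
  currentPercent: number,
  baseline: CanarySnapshot,
  current: CanarySnapshot,
): number {
  return degradedMetrics(baseline, current).length > 0 ? 0 : currentPercent;
}
```

Note that a baseline error rate of exactly 0 makes any error trip the rule, so a small absolute floor is a reasonable addition in practice.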
Canary duration: 3–5 days. We also watch the weekend, where traffic behaves differently.
Step 4. Ramp-up: 5% → 20% → 50%
If the Canary stays clean, we raise the percentage. The sequence we recommend, counting days from the end of the Canary:
- Day 1: 5% (Canary) → 20%
- Day 3: 20% → 50%
- Day 7: 50% → 100%
After each step we leave at least one day for analysis before the next increase. We check:
- Statistical significance of the conversion difference (usually the new front is faster, so conversion is higher)
- Whether there are any "late" degradations that only show up at higher volume
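For the significance check, a two-proportion z-test is one standard option; the article does not prescribe a specific test, so treat this as an example rather than our exact method. The function names are hypothetical, and 1.96 is the usual critical value for a two-sided test at the 95% level.

```typescript
// z-score for the difference between two conversion rates, using the pooled proportion.
function conversionZScore(
  oldOrders: number,
  oldSessions: number,
  newOrders: number,
  newSessions: number,
): number {
  const pOld = oldOrders / oldSessions;
  const pNew = newOrders / newSessions;
  const pooled = (oldOrders + newOrders) / (oldSessions + newSessions);
  const stdErr = Math.sqrt(pooled * (1 - pooled) * (1 / oldSessions + 1 / newSessions));
  // With no variance at all (0% or 100% everywhere) there is nothing to compare.
  return stdErr === 0 ? 0 : (pNew - pOld) / stdErr;
}

function isSignificantDifference(z: number, critical: number = 1.96): boolean {
  return Math.abs(z) > critical;
}
```

As an illustration with made-up numbers: 300 orders in 10,000 sessions on the old front against 400 in 10,000 on the new one gives a z-score of about 3.85, which is significant. The same rates on 100 sessions each give about 0.38, which is not, and that is the reason to wait for volume before drawing conclusions.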
Step 5. 100% of traffic, old front on standby
100% of traffic goes to the new front. The old front stays alive for another 2–4 weeks. Keeping it running costs pennies, and the value of a fast rollback if you need one is huge.
Rollback procedure: change the balancer config, and 100% of traffic returns to the old front. Time to execute: 2 minutes. We test this on staging every week until the old front is removed.
Step 6. Removing the old front
Once the standby period is over (up to 4 weeks of stable operation at 100%), we remove the old front and physically shut down its servers. The code stays in git and in the rollback container image for another 6 months.
Rollback points at every step
The key property of a staged rollout is that every step has a rollback with no downtime:
- Canary 5% → roll back to 0% (all traffic on the old front), 30 sec
- Ramp-up 20% → 5% (back to Canary), 30 sec
- Ramp-up 50% → 20%, 30 sec
- 100% → 50% or 0%, 2 min
- After the old front is removed → rebuild from the image in Kubernetes, 15 min
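If rollbacks are automated, the list above can live in code as a small lookup table. The percentages and timings below are copied from the list; the structure and the names `RollbackRung`, `rollbackRungs`, and `rollbackRungFor` are assumptions for illustration. The final case (rebuilding the old front from its image) is left out because it is a rebuild, not a traffic shift.

```typescript
interface RollbackRung {
  fromPercent: number; // share of traffic on the new front before the rollback
  toPercent: number;   // share of traffic on the new front after the rollback
  seconds: number;     // expected time to execute
}

const rollbackRungs: RollbackRung[] = [
  { fromPercent: 5, toPercent: 0, seconds: 30 },
  { fromPercent: 20, toPercent: 5, seconds: 30 },
  { fromPercent: 50, toPercent: 20, seconds: 30 },
  { fromPercent: 100, toPercent: 50, seconds: 120 }, // going straight to 0% is the other option here
];

// Returns the rollback step for the current stage, or undefined for an unknown percentage.
function rollbackRungFor(currentPercent: number): RollbackRung | undefined {
  return rollbackRungs.find((rung) => rung.fromPercent === currentPercent);
}
```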
Technical details people usually forget
Data version in headers. Every API response carries an X-Data-Version header. If the data structure changes during the rollout, the old front may not be able to read the new format. The header lets the fronts agree on what they're handling.
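On the consuming side, a front can check this header before rendering. The sketch assumes the header carries a plain non-negative integer, which the article does not specify; `readDataVersion` and `canFrontRead` are hypothetical helpers.

```typescript
// Reads X-Data-Version case-insensitively. Returns null if it is missing or not an integer.
function readDataVersion(headers: Record<string, string | undefined>): number | null {
  for (const headerName of Object.keys(headers)) {
    if (headerName.toLowerCase() !== "x-data-version") continue;
    const raw = headers[headerName];
    if (raw === undefined || !/^\d+$/.test(raw.trim())) return null;
    return Number(raw.trim());
  }
  return null;
}

// A front declares which data versions it understands and refuses anything else.
function canFrontRead(
  headers: Record<string, string | undefined>,
  supportedVersions: number[],
): boolean {
  const version = readDataVersion(headers);
  return version !== null && supportedVersions.includes(version);
}
```

When `canFrontRead` returns false, a front could fall back to a cached response or an error page instead of rendering data it may misinterpret.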
Session preservation. A user shouldn't lose their cart in the middle of a rollout. Session storage lives in Redis and isn't tied to a front. Both fronts read from the same Redis.
CSRF and cookie domains. If the old front sets a cookie for example.com and the new one sets it for www.example.com, sessions get lost. Check this before the Canary.
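One way to avoid the mismatch is for both fronts to set the session cookie with the same explicit `Domain` attribute. The sketch below builds such a `Set-Cookie` value and includes a helper that mirrors the standard cookie domain-matching rule. The cookie name `sid`, the chosen attributes, and both function names are assumptions.

```typescript
// Builds a Set-Cookie value scoped to the parent domain, so it is sent to
// example.com and to every subdomain such as www.example.com.
function sharedSessionCookie(sessionId: string, parentDomain: string): string {
  return [
    `sid=${encodeURIComponent(sessionId)}`,
    `Domain=${parentDomain}`,
    "Path=/",
    "HttpOnly",
    "Secure",
    "SameSite=Lax",
  ].join("; ");
}

// True if a cookie set with Domain=cookieDomain would be sent to requestHost.
function cookieDomainMatches(requestHost: string, cookieDomain: string): boolean {
  const host = requestHost.toLowerCase();
  const domain = cookieDomain.toLowerCase().replace(/^\./, "");
  return host === domain || host.endsWith("." + domain);
}
```

A quick pre-Canary check is to confirm that `cookieDomainMatches` holds for every hostname either front serves.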
SSO and auth. If you use SSO, JWT, or session cookies, the same token has to work on both fronts. Usually this needs a small adapter.
SEO: canonical tags. While the two fronts run in parallel, Google may index both versions. We recommend that both fronts return the same rel=canonical pointing to the main domain. The temporary staging domain, including any sitemap it serves, should be noindex.
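To keep the canonical identical across fronts, both can derive it from the same rule. This is a minimal sketch with hypothetical function names. Dropping the query string and fragment is an assumption that suits tracking parameters but not pages where a parameter defines the content, such as pagination. The output is not HTML-escaped, so it assumes ordinary URL paths.

```typescript
// Builds the canonical URL from the main origin and the requested path,
// ignoring whichever host or front actually served the request.
function canonicalHref(mainOrigin: string, requestPath: string): string {
  const base = mainOrigin.replace(/\/+$/, "");           // strip trailing slashes
  const pathOnly = requestPath.split(/[?#]/)[0] ?? "";   // drop query string and fragment
  const path = pathOnly.startsWith("/") ? pathOnly : "/" + pathOnly;
  return base + path;
}

function canonicalLinkTag(mainOrigin: string, requestPath: string): string {
  return `<link rel="canonical" href="${canonicalHref(mainOrigin, requestPath)}">`;
}
```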
What this gives the business
- Zero downtime. Not a single minute of it.
- Controlled risk. You can roll back in seconds at any point.
- A statistically meaningful comparison. Old and new run side by side, so you can honestly measure the difference in conversion, speed, and errors.
- Pace you control. Want it faster? Speed up the ramp-up. Want it cautious? Hold on Canary longer.
Our average project, from "start the API layer" to "100% on the new front with the old one gone," takes 12–16 weeks. Of that, only 3 weeks need active engineering work on the front. The rest goes to Canary, ramp-up, and stabilization.
What to do right now
If you're planning a migration and you hear "this has to be done in one big release," show them this article. It doesn't. A staged rollout is harder on the infrastructure side, but for zero downtime it's worth it.
If you're not sure whether your stack is ready, book a 48-hour audit. We look at whether there's an API layer, how the session is built, and whether a balancer can sit in front. In 2 days we return a list: "ready" / "needs prep" / "blocker."