Problem statement
A platform team owns a large commerce monorepo with:
- 140 backend services
- 30 shared libraries
- 4 frontend applications
- 2 API gateway layers
- a mixture of REST, gRPC, and asynchronous events
They want to modernize the platform by:
- standardizing auth middleware
- replacing a legacy event envelope
- migrating shared DTOs to a versioned contract package
- decomposing one high-risk billing package into smaller modules
The codebase is too large for humans to reason about quickly, but the migration is too risky to “YOLO” with an unreviewed, AI-generated change list.
That combination makes it a good case study for Gemini CLI.
Requirements
Functional requirements
- identify every service still using the legacy event envelope
- find all call paths that depend on the old auth middleware
- detect shared DTO usage that will break if the contract package changes
- produce a phased modernization plan
Non-functional requirements
- no breaking API changes during the first rollout wave
- rollback must be possible within one deploy window
- platform team must distinguish confirmed findings from hypotheses
- every recommendation must link back to file-level evidence
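The last two non-functional requirements can be enforced mechanically before any finding reaches a reviewer. The following is a minimal sketch, not the team's actual tooling: the `Finding` record, its `status` values, and the `triage` function are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """One model-produced claim (hypothetical record shape)."""
    claim: str
    status: str  # assumed values: "confirmed" or "hypothesis"
    evidence: list = field(default_factory=list)  # repo-relative file paths


def triage(findings):
    """Split findings into confirmed, hypotheses, and claims lacking evidence.

    A claim marked "confirmed" without any file-level evidence is not
    accepted as confirmed; it is returned separately for follow-up.
    """
    confirmed, hypotheses, needs_evidence = [], [], []
    for finding in findings:
        if finding.status == "hypothesis":
            hypotheses.append(finding)
        elif finding.evidence:
            confirmed.append(finding)
        else:
            needs_evidence.append(finding)
    return confirmed, hypotheses, needs_evidence
```

Keeping the third bucket separate means an unsupported "confirmed" claim is never silently dropped or silently trusted.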
APIs
The migration touches three interface categories:
- Gateway APIs using the old auth middleware
- Inter-service contracts defined in generated types and protobufs
- Event envelopes published to Kafka topics consumed by downstream workers
The tricky part is that the migration is not limited to one protocol. The same logical entity appears in HTTP handlers, protobuf definitions, event payloads, and frontend clients.
That cross-cutting reality is exactly where long-context reasoning helps.
High-level design
The team sets up the workflow in three passes.
Pass 1: repository context pack
Gemini ingests:
- the service map
- auth middleware packages
- event envelope types
- shared DTO definitions
- rollout runbooks
Pass 2: audit blueprints
The team runs named audits:
- auth consistency audit
- event envelope migration audit
- contract drift audit
- rollback readiness audit
Pass 3: human-gated execution
Gemini proposes:
- dependency tables
- risk-ranked services
- rollout waves
- rollback constraints
Humans still decide the actual implementation sequence.
Diagram
flowchart TD
A["Monorepo Context Pack"] --> B["Gemini CLI"]
C["Audit Blueprints"] --> B
D["Migration Goal"] --> B
B --> E["Dependency Map"]
B --> F["Risk-Ranked Rollout Waves"]
B --> G["Rollback Constraints"]
E --> H["Human Review"]
F --> H
G --> H
H --> I["Phased Execution"]
Low-level design
At the implementation layer, the team structures Gemini outputs around precise tables:
Table 1: service dependency map
- service name
- legacy auth usage
- legacy event usage
- shared DTO imports
- deployment criticality
Table 2: migration wave assignment
- wave number
- candidate services
- dependency blockers
- rollout owner
- rollback owner
Table 3: unsafe assumptions
- assumption
- evidence present or missing
- whether to confirm by test, log, or runtime trace
This low-level output design matters. Without it, the model generates prose. With it, the model generates work products.
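One way to hold the model to a table shape is to define the row schema in code and reject anything that does not fit. The sketch below mirrors the columns of Table 1; the field types and the criticality scale are assumptions, since the case study only names the columns.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ServiceDependencyRow:
    """One row of Table 1 (column types are assumed, not from the source)."""
    service_name: str
    legacy_auth_usage: bool
    legacy_event_usage: bool
    shared_dto_imports: int
    deployment_criticality: str  # assumed scale: "high", "medium", "low"


def to_csv(rows):
    """Render rows as CSV with a fixed header taken from the dataclass."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(ServiceDependencyRow)],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()
```

Tables 2 and 3 could follow the same pattern, each with its own dataclass.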
The Gemini prompts that worked
The team starts with:
Load the shared contracts, auth middleware, event envelope definitions,
and the billing-adjacent services.
Identify all services that still depend on the legacy event envelope or the
old auth middleware. Return:
- service
- exact files
- migration blocker
- suspected blast radius
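The file lists returned for a prompt like this can be spot-checked with a plain text scan. This is a rough sketch for cross-checking, not a replacement for the audit: the marker names `LegacyEventEnvelope` and `legacy_auth` are hypothetical identifiers, and the real ones would come from the repository.

```python
import re

# Hypothetical identifiers; substitute the names used in the actual codebase.
LEGACY_MARKERS = {
    "legacy_event_envelope": re.compile(r"\bLegacyEventEnvelope\b"),
    "old_auth_middleware": re.compile(r"\blegacy_auth\b"),
}


def scan_sources(sources):
    """Map each marker to the sorted paths whose text mentions it.

    `sources` is a dict of {path: file text}, so the function can be
    exercised without touching the filesystem.
    """
    hits = {name: [] for name in LEGACY_MARKERS}
    for path, text in sources.items():
        for name, pattern in LEGACY_MARKERS.items():
            if pattern.search(text):
                hits[name].append(path)
    return {name: sorted(paths) for name, paths in hits.items()}
```

A path that the scan finds but the model omitted, or the reverse, is a cheap signal that the finding deserves a second look.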
Then they follow with:
Now assume rollback must be possible within one deploy window and that we
cannot dual-write forever.
Re-group the services into rollout waves and identify which wave creates the
highest operational coupling.
That second prompt is what turns a static dependency audit into a rollout plan.
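A proposed wave grouping can be sanity-checked against the dependency map with ordinary topological layering. This sketch uses Python's standard `graphlib` and is an independent check, not a description of how Gemini derives its answer. The service names and the `depends_on` direction (a service lists what must migrate before it) are assumptions.

```python
from graphlib import TopologicalSorter


def rollout_waves(depends_on):
    """Group services into waves so each wave only depends on earlier ones.

    `depends_on` maps a service to the set of services that must be
    migrated first. Raises graphlib.CycleError if the map has a cycle.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical example: consumers must accept the new envelope before
# the producer starts publishing it.
example = {
    "billing-api": {"invoice-worker", "ledger-worker"},
    "ledger-worker": {"invoice-worker"},
    "invoice-worker": set(),
}
waves = rollout_waves(example)
# waves == [["invoice-worker"], ["ledger-worker"], ["billing-api"]]
```

A cycle error here is itself a finding: it means no wave ordering exists without a temporary compatibility layer such as dual-writing.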
Scaling challenges
1. False confidence from code proximity
A service may import a shared DTO but not exercise the risky field path in production. Gemini can identify likely coupling, but runtime traces still matter.
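The gap between static coupling and runtime behavior can be made explicit with a set comparison. In this sketch the two inputs are assumed to be available already: one set of services from the static audit and one from production traces. The function name is hypothetical.

```python
def classify_coupling(static_importers, runtime_users):
    """Compare services that import a DTO with services seen using it at runtime."""
    return {
        # Imported and exercised in production: highest-confidence coupling.
        "confirmed": sorted(static_importers & runtime_users),
        # Imported but never observed: a hypothesis to verify, not to ignore.
        "static_only": sorted(static_importers - runtime_users),
        # Observed but not found statically: the audit likely missed a path.
        "runtime_only": sorted(runtime_users - static_importers),
    }
```

The `runtime_only` bucket is the one that should worry the team most, since it points at coupling the code audit did not see.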
2. Generated code noise
Generated clients and generated schemas can dominate the context. The team had to prioritize source definitions and treat generated output as supporting evidence, not as the source of truth.
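Separating generated files from hand-written sources before building the context can be done with path patterns. The patterns below are assumptions based on common generator conventions (for example `.pb.go` and `_pb2.py` from protobuf tooling); the repository's real conventions may differ.

```python
from fnmatch import fnmatchcase

# Assumed naming conventions for generated output; adjust per repository.
GENERATED_PATTERNS = ("*.pb.go", "*_pb2.py", "*/generated/*", "*.gen.ts")


def split_context(paths):
    """Return (source_paths, generated_paths), preserving input order."""
    source, generated = [], []
    for path in paths:
        is_generated = any(fnmatchcase(path, pat) for pat in GENERATED_PATTERNS)
        (generated if is_generated else source).append(path)
    return source, generated
```

The generated list is kept rather than discarded so it can still be consulted as evidence when a source definition is ambiguous.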
3. Monorepo boundary ambiguity
Some packages looked shared but were effectively dead. Without human review, Gemini would overstate their importance.
4. Rollout ownership
The technical graph is only half the story. Real migration sequencing also depends on which teams own which services and how fast they can validate changes.
Trade-offs
Trade-off 1: speed vs certainty
Gemini produced the dependency map much faster than manual review would have, but humans still needed to verify the highest-risk edges before rollout.
Trade-off 2: breadth vs signal
Loading more services increased coverage, but too much low-value generated code made the answers noisier. The winning move was a curated context pack.
Trade-off 3: prose vs operational artifacts
Narrative summaries were easier to read, but structured tables were easier to execute. The team chose operational artifacts.
Trade-off 4: centralized analysis vs local expertise
Gemini could see the whole graph, but only local service owners knew which “theoretical dependency” was actually business critical. The best workflow combined both.
Failure scenarios
Failure scenario 1: an auth middleware edge is missed
Mitigation:
- audit gateway routes and service middleware separately
- require evidence links for every “safe” claim
- verify a sample of negative findings manually
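Manual verification of negative findings is easier to audit when the sample is reproducible. A minimal sketch, assuming the "safe" services are available as a collection of names; the function name and the seed default are illustrative.

```python
import random


def sample_negatives(safe_services, k, seed=0):
    """Pick up to k services flagged as safe for manual re-checking.

    Sorting first and using a seeded generator makes the sample
    repeatable, so reviewers can confirm which services were checked.
    """
    pool = sorted(safe_services)
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))
```

If any sampled service turns out to use the old middleware after all, the whole set of negative findings should be treated as suspect.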
Failure scenario 2: a rollback path depends on an untracked consumer
Mitigation:
- run the rollback-readiness audit as a separate pass
- compare code findings against topic consumer inventories
- insist on shadow validation for high-risk waves
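The comparison against topic consumer inventories reduces to a per-topic set difference. This sketch assumes both sides are already available as mappings from topic name to consumer names; the topic and consumer names in any real run would come from the audit output and the inventory system.

```python
def untracked_consumers(code_consumers, inventory_consumers):
    """List consumers in the inventory that the code audit did not find.

    Both arguments map a topic name to a set of consumer names.
    Only topics with at least one gap appear in the result.
    """
    gaps = {}
    for topic, inventoried in inventory_consumers.items():
        missing = inventoried - code_consumers.get(topic, set())
        if missing:
            gaps[topic] = sorted(missing)
    return gaps
```

Any non-empty result names a consumer that a rollback would affect without the rollout plan accounting for it.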
Failure scenario 3: the model over-prioritizes dead packages
Mitigation:
- include ownership metadata
- include recent deployment history
- include service health and traffic relevance when grouping rollout waves
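These metadata signals can be combined into a simple pre-filter that flags probably-dead packages before wave grouping. The sketch is deliberately conservative and every part of it is an assumption: the `PackageMeta` fields, the 180-day staleness threshold, and the rule that all three signals must agree.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class PackageMeta:
    """Hypothetical metadata record attached to each package."""
    name: str
    owner: Optional[str]         # owning team, if any
    last_deploy: Optional[date]  # most recent deployment that included it
    daily_requests: int          # traffic relevance


def likely_dead(pkg, today, stale_after_days=180):
    """Flag a package only when it has no owner, no recent deploy, and no traffic."""
    stale = (
        pkg.last_deploy is None
        or (today - pkg.last_deploy) > timedelta(days=stale_after_days)
    )
    return pkg.owner is None and stale and pkg.daily_requests == 0
```

Flagged packages still go to a human for confirmation; the filter only keeps them from inflating a wave's apparent risk.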
Outcome
Gemini did not “perform the migration.” That would be the wrong goal.
What it did do:
- cut the dependency-mapping phase from several days to a few hours
- surface hidden cross-service couplings early
- produce a risk-ranked rollout plan faster than manual review alone
- improve the quality of human design discussions by grounding them in repo-wide evidence
That is the premium use case: better engineering judgment, faster.
Key takeaways
- Long context is strongest when the migration crosses protocols, packages, and teams.
- The best outputs were tables and phased rollout plans, not free-form summaries.
- Gemini accelerated the architecture reasoning layer, but rollout safety still depended on explicit human review.