Q19 of 22 · Scenarios
How would you test a feature that integrates a third-party service that's often down?
Scenarios · Senior · Tags: scenario, third-party, integration, resilience, circuit-breaker, senior
Short answer
Clarify the expected behavior when the third party is unavailable, whether a sandbox or mock exists, and what observability is in place. Then test the happy path, the failure modes (5xx, timeout, malformed response), circuit-breaker behavior, fallback, and recovery.
Detail
Clarify first
- What is the expected behavior when the third party is unavailable: fail fast, fall back to cached data, or queue and retry?
- Is there a sandbox or mock endpoint provided by the third party?
- Does the integration have a circuit breaker or timeout configured, and what are the values?
- What alerting or observability is in place for third-party failures?
Functional (when healthy)
- End-to-end integration works correctly when the third party responds normally
- Authentication to the third party works, including token rotation and key refresh
- Data flows correctly in both directions; field mapping is accurate
- Webhooks or callbacks from the third party are received and processed correctly
Failure modes
- Third party returns 5xx → does the application degrade gracefully or crash? Is the user shown a meaningful message?
- Third party times out → is there a configured timeout, or does the request hang indefinitely?
- Circuit breaker: after N consecutive failures, does the circuit open and stop hammering the third party?
- Third party returns an unexpected error code or malformed response → does the application handle it safely?
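The failure modes above can be reproduced on demand with a local stub rather than by waiting for a real outage. The sketch below uses only the Python standard library as a stand-in for a tool such as WireMock. `FlakyStub`, `start_stub`, `fetch_rate`, the URL paths, and the `rate` field are hypothetical names chosen for illustration, and the 0.5 s timeout is an assumed value, not one taken from any real integration.

```python
import json
import socket
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class FlakyStub(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the third party; the path selects the failure mode."""

    def do_GET(self):
        if self.path == "/slow":
            time.sleep(2.0)  # longer than the client timeout, to force a timeout
        status = 503 if self.path == "/down" else 200
        body = b"<<not json>>" if self.path == "/garbled" else b'{"rate": 1.25}'
        try:
            self.send_response(status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except OSError:
            pass  # the client has already given up (timeout case)

    def log_message(self, *args):
        pass  # keep test output quiet


def start_stub():
    """Start the stub on a free local port; returns (server, base_url)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), FlakyStub)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}"


def fetch_rate(base_url, path, timeout=0.5):
    """Hypothetical client under test: returns (rate, status) and never raises."""
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url + path, timeout=timeout) as resp:
            return json.loads(resp.read())["rate"], "ok"
    except urllib.error.HTTPError as exc:  # must precede URLError (subclass)
        return None, f"upstream_{exc.code}"
    except (socket.timeout, TimeoutError):
        return None, "timeout"
    except urllib.error.URLError as exc:
        timed_out = isinstance(exc.reason, socket.timeout)
        return None, "timeout" if timed_out else "unreachable"
    except (ValueError, KeyError):  # invalid JSON or missing field
        return None, "malformed"
```

A test would start the stub, call `fetch_rate` against `/ok`, `/down`, `/slow`, and `/garbled`, and assert that each returns a distinct status instead of raising or hanging.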
Fallback & queue
- If a fallback exists (cached data, queued retry), does it activate when the third party fails?
- Is the user given a meaningful message instead of a generic error?
- Are queued messages processed in the correct order when the third party recovers?
- No double-processing: if a queued event is retried, is it idempotent?
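The ordering and idempotency checks above can be exercised against a small in-memory model before testing the real queue. This is a minimal sketch under stated assumptions: `RetryQueue` and `FakeThirdParty` are hypothetical, delivery failures are assumed to surface as `ConnectionError`, and deduplication is done on the sender side by an idempotency key. A real system may instead rely on the receiver to deduplicate.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable


class FakeThirdParty:
    """Test double: rejects every call while `up` is False."""

    def __init__(self):
        self.up = False
        self.received = []

    def send(self, key, payload):
        if not self.up:
            raise ConnectionError("third party is down")
        self.received.append((key, payload))


@dataclass
class RetryQueue:
    """Hypothetical FIFO retry queue with sender-side deduplication."""

    send: Callable[[str, Any], None]
    pending: deque = field(default_factory=deque)
    delivered_keys: set = field(default_factory=set)

    def submit(self, key, payload):
        self.pending.append((key, payload))
        self.drain()

    def drain(self):
        """Deliver from the head; stop at the first failure so order is kept."""
        while self.pending:
            key, payload = self.pending[0]
            if key not in self.delivered_keys:
                try:
                    self.send(key, payload)
                except ConnectionError:
                    return  # leave the event at the head for the next drain
                self.delivered_keys.add(key)
            self.pending.popleft()  # delivered, or a duplicate to discard
```

The test shape is: submit events while the fake is down, flip it to up, resubmit a duplicate key, then assert that each event arrived exactly once and in submission order.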
Recovery
- When the third party becomes available again, does the integration resume automatically?
- Does the circuit breaker close after the third party stabilises (half-open probe requests)?
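To make the open, half-open, and closed transitions concrete, here is a minimal circuit breaker sketch with an injectable clock, so a test can advance time without sleeping. `CircuitBreaker`, `CircuitOpenError`, the threshold of 3, and the 30 s reset timeout are illustrative assumptions, not the behavior of any particular library. As a simplification, this version does not limit the number of concurrent half-open probes.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without contacting the third party."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # next call is allowed through as a probe
        return "open"

    def call(self, func):
        if self.state == "open":
            raise CircuitOpenError("short-circuited")
        try:
            result = func()
        except Exception:
            self.failures += 1
            # A failed probe, or reaching the threshold, (re)opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

With a fake clock, a test can assert three things: the fourth call after three failures never reaches the third party, a successful probe after the reset timeout closes the circuit, and a failed probe reopens it.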
Observability
- Are third-party errors distinguishable from first-party errors in logs and APM?
- Does a third-party outage trigger an alert before users are affected?
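The first check, distinguishing third-party errors in logs, can be partly automated by asserting on structured log fields. The sketch below assumes the application tags such failures with `error_source` and `dependency` fields. Those field names, the `checkout` logger, the `payments-api` label, and the `charge` function are all hypothetical. Alert routing still needs the manual verification described in the close.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so a test can inspect them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("checkout")


def charge(call_provider):
    """Hypothetical code under test: tags third-party failures in the log."""
    try:
        return call_provider()
    except ConnectionError as exc:
        logger.error(
            "third-party call failed: %s",
            exc,
            # `extra` keys become attributes on the LogRecord.
            extra={"error_source": "third_party", "dependency": "payments-api"},
        )
        return None
```

A test attaches `ListHandler` to the logger, forces a failure, and asserts that the captured record carries `error_source == "third_party"`, which is the property a log query or APM filter would depend on.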
Close: automate the 5xx, timeout, and malformed-response scenarios with WireMock or a similar stub server. Keep the full observability check manual: during a simulated outage, verify that alerts fire and dashboards show the right signals.
// WHAT INTERVIEWERS LOOK FOR
Circuit breaker testing, idempotency in the retry queue, and observability verification. Using a mock or WireMock to simulate outages (not waiting for the real third party to go down) is the practical insight.
// COMMON PITFALL
Only testing the happy path because the third party is not down at test time. The answer should be: simulate failure with a mock; never depend on the real service being down to test error handling.