News and updates from Maple

Case Study: What happens when a critical system goes down mid-trading day?

A London-based financial services firm approached Maple after a near miss on an otherwise normal trading morning, when a core internal system became unavailable without warning. There was no cyber incident or external trigger, just unexpected downtime in a platform the business relied on heavily.

The immediate impact

Even though the outage was short-lived, it exposed how fragile the operational response was when something actually broke.

Key issues during the incident included:

  • No clear fallback process for staff to follow in real time
  • Uncertainty about who decisions should be escalated to
  • Critical workflows being paused while teams waited for updates
  • Time lost as users attempted to manually work around the issue

The system was restored relatively quickly, but the disruption highlighted a bigger underlying problem: there was no tested recovery approach aligned to how the business actually worked day to day.

What we found

When we reviewed the firm's setup in more detail, we found that the problem was not a lack of documentation but a lack of practical readiness and clarity about how plans would be executed.

Key findings included:

  • Recovery plans existed but had never been tested in realistic conditions
  • There was no defined prioritisation of systems during an outage
  • A small number of core platforms supported multiple critical workflows without clear separation or fallback options
  • Incident ownership and decision-making responsibilities were not clearly defined, which slowed response times during disruption

Individually, these gaps might not seem significant, but together they created uncertainty at the exact moment clarity mattered most.

What we changed

We worked with the firm to rebuild its resilience approach so that it reflects how the business actually operates, rather than how it was originally documented.

Key improvements included:

  • Defining critical systems and establishing clear recovery priorities based on business impact
  • Creating clear incident ownership so every stage of an outage had a named decision-maker
  • Mapping system dependencies to understand what truly needed restoring first
  • Introducing practical recovery scenarios that could be tested and refined over time
  • Aligning IT recovery processes with trading and operational workflows rather than treating them separately

The focus was on making recovery something that could be executed under pressure, not just referenced in documentation.

The result

The most important outcome was not technical performance, but operational confidence. The firm now has a clear, structured approach to handling disruption that reduces hesitation and improves decision-making when incidents occur.

Key outcomes included:

  • A defined understanding of which systems are most critical and in what order they should be restored
  • Clear roles and responsibilities during incidents, reducing confusion under pressure
  • A tested recovery process that reflects real-world operational conditions
  • Faster, more coordinated responses when issues arise, with less time lost interpreting procedures during an outage

Where Maple fits in

At Maple, we help financial services firms design IT environments that are not only stable in normal conditions but also resilient when things go wrong. That means focusing on how systems behave under pressure, not just how they look on paper.

We typically support firms with:

  • Practical disaster recovery planning that reflects real operational use
  • Business continuity design aligned to trading and critical workflows
  • Infrastructure mapping to identify dependencies and single points of failure
  • Incident readiness testing to ensure plans work in practice, not just in theory

The aim is to remove uncertainty from disruption scenarios so teams can focus on getting back to normal operations quickly and confidently.

Most firms believe they are covered for downtime until they experience it. The difference between having a plan and having a working plan only becomes clear when systems actually fail and decisions need to be made quickly.