Crowdstrike was really a situational awareness issue

3 August 2024 / Bart van Leeuwen

Two weeks ago, the Crowdstrike issue brought large infrastructures and companies to their knees. Since then I have been thinking about the situational awareness aspect of this incident. Although it was reported as a technical issue, there is a lot to talk about from a human factors point of view.

But first, when I mention situational awareness, I use this definition:

‘The ability to perceive and understand what is happening around you in relation to how time is passing, and then be able to accurately predict future events in time to avoid bad outcomes.’

There is a lot to unpack from this incident, but there are three main angles I’ve been thinking about. Let’s look at them in reverse order of the incident timeline.

  1. The situational awareness of the IT teams.
    The stories range from ‘we got it solved in a few hours’ to ‘we were disrupted for days’. How could they be so far apart?

    Was it caused by a delay in actually noticing the problem (perception)? Or did it take a while before the teams fully grasped what was going on (understanding)? Or was the solution complicated, and were the teams not equipped to solve the problem because they lacked options (prediction)?

    Dealing with incidents like this requires excellent situational awareness, not only to recognize the problem, but also to have enough options to actually solve it.

    A story published in The Register is a brilliant example of having options.

  2. The situational awareness of product owners.
    That is, the situational awareness of the teams responsible for deploying the product, and thereby creating a new attack surface.

    The decision to deploy it was certainly made with the best intentions: easier regulatory compliance, ease of use and mass deployment, or previous experience with security threats that a product like this could prevent. Do you notice how these best intentions are all fed by potential positive outcomes?

    Situational awareness should also be driven by recognizing the potential negative outcomes. If, somewhere in the decision path, “this could cause us a 500 million USD loss” had been raised as an issue, other decisions probably would have been made. A statement I’ve read a lot in the past weeks, ‘nobody gets fired for buying Crowdstrike’, by itself screams a lack of situational awareness: it’s complacency.

  3. The situational awareness of the software developers.
    It is clear that they were unable to predict this outcome. That means something went horribly wrong in their process of perception and understanding. Were they not aware of the complexity of the product (perception)? Had they never seen something like this happen before (understanding)?

    Either of these two simple gaps would make it impossible for them to predict the outcome. And yes, I’ve read a lot about arrogance on their side, but I still believe decisions were made with the best intentions. Yet they were also made with a total lack of situational awareness.

My conclusion?

Even with a highly technical problem like the Crowdstrike outage, there is a lot to discuss about situational awareness and the humans involved.

I would love to explain how I can help you and your team really understand the barriers to situational awareness. This can help you prevent disasters like this in the future or, at least, limit them quickly. Situational awareness matters in more situations than we are aware of.