How do software development teams deal with high levels of pressure and emergencies in F1? The Answer

How do software development teams deal with high levels of pressure and emergencies in F1?

It is not easy to articulate the answer to this question, because much of it is about actions and behaviors learned through practice, and about a network of collaboration, information exchange, trust and competence. Here I try to describe it based on my personal experience.

In short, here is a list from my experience:

- A minimal set of practices, checklists and skills required for an accurate and competent job
- Technical practices that help to avoid mistakes when working under pressure and during an emergency
- Built-in resilience in the technology used and developed
- A cooperation network and cooperation practices that give the capability to sense and respond to pressure and emergencies

Here are the details:

A minimal set of practices, checklists and skills required for an accurate and competent job

This minimal set of practices and checklists, defined for key operations and for the different software systems, has been grown by the teams from lessons learned, experience and previous mistakes. It contains only items that have proved useful and effective in practice.
It really is the minimal necessary (but not sufficient) set for an accurate and competent job; ignoring it is considered negligence.
Making mistakes is accepted when pushing to the limits, but not when they are the consequence of negligence.

This set of practices and checklists is a safe baseline that gives confidence to those who suffer more from stress and pressure, and helps the "adrenaline freaks" stay disciplined :)

For example, the set of practices could include the minimal validations and user acceptance tests that a specific application needs to pass before it can be released as a new official version for use during the race.
A checklist can define the steps required to release the application, e.g. the smoke tests, the configuration of authorizations, and the required emails and communications with the various departments.

Exceptions are possible when appropriate, but they need to be evaluated, discussed and accepted case by case.
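To make the idea of a release checklist more concrete, here is a minimal Python sketch of a checklist-driven release gate. Everything in it is hypothetical (the `ReleaseGate` class, the validation names, the checklist steps); it only illustrates the rule described above: a version becomes official when every mandatory validation passes and every checklist step is done, and a skipped item is allowed only through an explicitly recorded, case-by-case exception.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class ReleaseGate:
    """Hypothetical release gate for one application version.

    A version may become the official one used during the race only when every
    mandatory validation passes and every checklist step has been completed,
    unless a recorded exception explicitly covers the missing item."""

    validations: Dict[str, Callable[[], bool]]  # name -> check returning True on success
    checklist: List[str]                        # release steps, e.g. smoke tests
    completed_steps: Set[str] = field(default_factory=set)
    exceptions: Dict[str, str] = field(default_factory=dict)  # item -> approval note

    def complete(self, step: str) -> None:
        """Tick off a checklist step; unknown steps are rejected to catch typos."""
        if step not in self.checklist:
            raise ValueError(f"unknown checklist step: {step!r}")
        self.completed_steps.add(step)

    def grant_exception(self, item: str, approval: str) -> None:
        """Record a case-by-case exception together with who approved it and why."""
        self.exceptions[item] = approval

    def blockers(self) -> List[str]:
        """Failed validations and missing steps that no exception covers."""
        failed = [name for name, check in self.validations.items() if not check()]
        missing = [step for step in self.checklist if step not in self.completed_steps]
        return [item for item in failed + missing if item not in self.exceptions]

    def can_release(self) -> bool:
        return not self.blockers()


if __name__ == "__main__":
    gate = ReleaseGate(
        validations={"user acceptance tests": lambda: True},
        checklist=["smoke tests", "configure authorizations", "notify departments"],
    )
    gate.complete("smoke tests")
    gate.complete("configure authorizations")
    print(gate.blockers())  # ['notify departments']
    print(gate.can_release())  # False
```

Keeping exceptions as explicit records, rather than letting someone quietly skip a step, means that after the race the team can see which items were waived and discuss whether the checklist itself needs to change.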
Technical practices that help to avoid mistakes when working under pressure and during an emergency
Here are some examples:
- automation (e.g. of the build, the creation of the setup kit, the tests and the delivery)
- pair programming (essential when increasing the pace and when dealing with emergencies)
- simple design, low coupling, consistency and the principle of least surprise in the code base
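The automation point can be illustrated with a small Python sketch: the build, setup-kit creation, tests and delivery are encoded once as an ordered pipeline that stops at the first failing step, so nobody has to remember the order or the commands while under pressure. The step names and the `run_pipeline` helper are hypothetical; in practice each step would call the team's real build, packaging, test and delivery tools.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_pipeline(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run the steps in a fixed order and stop at the first failure.

    Returns whether the whole pipeline succeeded and the names of the
    steps that were actually executed."""
    executed: List[str] = []
    for name, action in steps:
        executed.append(name)
        try:
            ok = action()
        except Exception as exc:  # a step that crashes counts as a failed step
            print(f"[pipeline] {name}: raised {exc!r}")
            ok = False
        if not ok:
            print(f"[pipeline] {name}: FAILED, later steps skipped")
            return False, executed
        print(f"[pipeline] {name}: ok")
    return True, executed


if __name__ == "__main__":
    # Hypothetical steps: replace the lambdas with real build/packaging/test/delivery calls.
    steps: List[Step] = [
        ("build", lambda: True),
        ("create setup kit", lambda: True),
        ("run tests", lambda: False),  # simulate a failing test run
        ("deliver", lambda: True),
    ]
    ok, executed = run_pipeline(steps)
    print(ok, executed)  # False ['build', 'create setup kit', 'run tests']
```

Stopping at the first failure is the design choice that matters here: a broken build or a failing test can never silently produce a setup kit that gets delivered to the track.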
Built-in resilience in the technology used and developed
Here are some examples:
- avoiding single points of failure in the infrastructure
- simplified system configurations, and monitoring for early detection of problems before they can cause harm
- crash-only server-side applications: applications that restart safely and continue to work correctly after a failure
- application logs that help to quickly identify the cause of an IT problem at the track, so it can be removed
- version rollback: the ability to roll back to the previous stable version of a software system with one click when the new version has a show-stopper bug
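The crash-only idea can be sketched as follows: the application never relies on a clean shutdown, and its normal start-up path is also its recovery path. In this minimal Python sketch (an illustration under stated assumptions, not how any F1 system actually works) the work is a list of items, progress is persisted atomically in a JSON checkpoint file, and a "crash" is simulated by an exception inside one process; in a real deployment the restart would come from an external process supervisor. All function names and the checkpoint format are hypothetical.

```python
import json
import os
import tempfile
from typing import Any, Callable, Sequence


def load_checkpoint(path: str) -> int:
    """Start-up is recovery: read the index of the next item to process."""
    try:
        with open(path) as f:
            return json.load(f)["next_index"]
    except FileNotFoundError:
        return 0  # first run, nothing processed yet


def save_checkpoint(path: str, next_index: int) -> None:
    """Write to a temp file, then atomically replace the checkpoint, so a crash
    can never leave a half-written checkpoint behind."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp_path, path)


def process(items: Sequence[Any], handle: Callable[[Any], None], path: str) -> None:
    """Resume from the last checkpoint. If a crash happens between handle() and
    save_checkpoint(), that item is handled again on restart, so handle()
    should be idempotent."""
    for i in range(load_checkpoint(path), len(items)):
        handle(items[i])
        save_checkpoint(path, i + 1)


def supervise(items: Sequence[Any], handle: Callable[[Any], None], path: str,
              max_restarts: int = 5) -> int:
    """Simulated supervisor: on any crash just restart, with no separate clean-up
    path. Returns the number of restarts that were needed."""
    for restarts in range(max_restarts + 1):
        try:
            process(items, handle, path)
            return restarts
        except Exception as exc:
            print(f"[supervisor] crashed with {exc!r}, restarting")
    raise RuntimeError(f"gave up after {max_restarts} restarts")


if __name__ == "__main__":
    failed_once = set()

    def flaky_handle(item: str) -> None:
        # Hypothetical handler that fails once on "lap-3", as if the process crashed.
        if item == "lap-3" and item not in failed_once:
            failed_once.add(item)
            raise RuntimeError("simulated crash")
        print(f"handled {item}")

    checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    print(supervise(["lap-1", "lap-2", "lap-3", "lap-4"], flaky_handle, checkpoint))  # 1
```

Because the checkpoint is only ever replaced atomically, the restarted run resumes at the item that failed instead of starting over or skipping it.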
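One-click version rollback mostly depends on keeping every deployed version around and knowing which ones were confirmed stable, so that going back is a pointer switch rather than a rebuild under pressure. The following minimal Python model sketches that bookkeeping; the `Deployment` class and version strings are hypothetical, and in a real system the "switch" would be an assumed mechanism such as repointing a symbolic link or re-running the previous setup kit.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Deployment:
    """Hypothetical model of one-click rollback: every deployed version is kept,
    so going back is only a switch of the 'current' pointer, not a rebuild."""

    history: List[str] = field(default_factory=list)  # deployed versions, oldest first
    stable: Set[str] = field(default_factory=set)     # versions confirmed to be stable
    current_index: int = -1                           # -1 means nothing deployed yet

    @property
    def current(self) -> Optional[str]:
        return self.history[self.current_index] if self.current_index >= 0 else None

    def deploy(self, version: str) -> None:
        self.history.append(version)
        self.current_index = len(self.history) - 1

    def mark_stable(self, version: str) -> None:
        if version not in self.history:
            raise ValueError(f"{version!r} was never deployed")
        self.stable.add(version)

    def rollback(self) -> str:
        """The 'one click': switch to the newest stable version older than the current one."""
        for i in range(self.current_index - 1, -1, -1):
            if self.history[i] in self.stable:
                self.current_index = i
                return self.history[i]
        raise RuntimeError("no earlier stable version to roll back to")


if __name__ == "__main__":
    d = Deployment()
    d.deploy("2.3")
    d.mark_stable("2.3")
    d.deploy("2.4")  # new version with a show-stopper bug
    print(d.rollback(), d.current)  # 2.3 2.3
```

Only versions explicitly marked stable are rollback targets, so a second broken release is never chosen as the "safe" version.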
A cooperation network and cooperation practices that give the capability to sense and respond to pressure and emergencies

Well, this is the one I find most difficult to articulate, because it is about social relationships, learned behaviors and the ability to react.
It is about knowing the domain and the business, having the big picture, and being able to assess the severity of an emergency and the possible consequences and impact of an action, just like doing triage.
It is about knowing who needs to be informed, who can be affected, who can provide information, who has the relevant knowledge, experience and skills, and who can authorize a decision.
It is about knowing who copes better with stress and pressure and who copes less well.

And it is about acting accordingly in an environment that is more a network than a hierarchy: what counts is the personal competence of the people in the network more than their official job descriptions, problems involve many different competencies and specializations and affect different departments in different ways, and new and unexpected scenarios can pop up.

Here are some links:
- Robust vs. Resilient Plans
- Seven characteristics of resilience
- Ten definitions of resilience
- Triage

And here are some keywords for further googling:
- Adaptive authority patterns
- Dynamic re-organisation on the fly
- Inclusive communication and disintermediation
- Cross-functional networks of generalists and specialists and distributed cognition
- Rewarded reporting of errors and faults
- Redundant observers and thinkers
- Training and learning to improvise