How to Lead at 11 PM When Your Platform Is Down, Your VP Is Spiraling, and 2,200 Clients Are Waiting
It is 11 PM on a Tuesday. Your phone has not stopped vibrating for six hours. The platform went down at 5 PM, when a database migration that had passed every test in staging corrupted an index at production scale. 2,200 enterprise customers are affected. Customer support has logged 800 tickets and counting. Three enterprise clients have emailed threatening contract termination. Your VP of Sales is calling every fifteen minutes demanding a timeline he can give to clients.

And Kenji Nakamura, your VP of Engineering and the person you need most right now, just called you sounding like a man who has been staring at a terminal for six hours and is starting to crack. "We found the root cause," he says, his voice flat and scattered. "But the fix options are not great." He presents two paths: a full database rollback that takes four more hours and risks losing two hours of transaction data, or a targeted fix with a 60% success rate and a 90-minute timeline. His team has been working nonstop. People are making mistakes because they are exhausted. And somewhere underneath the technical update, you hear something else entirely: Kenji thinks this might be his last day.
Why This Conversation Goes Wrong
You panic alongside him. "How did this get through staging? Who approved the migration?" Blame during a crisis triggers a cascade. Kenji shuts down. His team hears through the grapevine that the CEO is assigning fault. The best engineers start drafting their resumes not because of the outage but because of how it was handled. The fix takes six hours instead of four because people are now afraid of making another mistake.
You micromanage the technical decision. "Let me see the rollback plan. Walk me through the exact SQL commands." Unless you are a database engineer, you are now adding a layer of decision-making overhead to a team that is already exhausted. Kenji has to explain production architecture to you at 11 PM instead of fixing it. Your involvement slows the resolution.
You demand certainty where none exists. "I need a guaranteed timeline for the board." There is no guaranteed timeline in a production crisis. Demanding one forces Kenji to either lie or collapse. The honest answer is a probability range, and a leader who cannot work with probabilities in a crisis is a leader who makes the crisis worse.
The Steady Hand
A crisis does not test your technical knowledge. It tests your emotional regulation, your ability to make decisions with incomplete information, and your instinct to protect the people doing the work. The Steady Hand framework is built on a single principle: in a crisis, the leader's primary job is to absorb chaos so the team can focus on the fix. Every question you ask, every decision you make, every word you say either adds order or adds noise. There is no neutral.
Regulate the room before making any decisions
"Kenji, take a breath. We are going to get through this. Before I ask anything else — how are you doing?" This is not a soft question. It is a strategic one. A VP of Engineering on the edge of panic makes worse technical decisions. A VP of Engineering who feels supported by his CEO makes clearer ones. The five seconds you spend acknowledging Kenji as a human being will save you an hour of cascading errors made by an exhausted team running on fear.
Ask three questions, then decide
"I need to understand three things: what is the worst case for the rollback, what is the worst case for the targeted fix, and what does your gut tell you?" Kenji presented the options but did not recommend one because he is afraid of being wrong. Your job is not to pick the technically optimal path — it is to give Kenji permission to trust his own judgment and then back him. When he says the targeted fix is risky but faster, ask: "If it fails, can we still do the rollback?" If yes, you have your answer. Take the faster path with the fallback.
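The arithmetic behind "take the faster path with the fallback" can be sketched in a few lines. This is a rough illustration, not a decision tool: it uses the numbers Kenji gave (60% success, 90 minutes, four hours for the rollback) and assumes, hypothetically, that a failed fix is only discovered at the 90-minute mark and that the rollback then still takes the full four hours. The function name and constants are made up for this example.

```python
def expected_hours(p_success: float, fix_hours: float, rollback_hours: float) -> float:
    """Expected time to recovery when the targeted fix is tried first and
    the rollback is used only if the fix fails (assumed fallback model)."""
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be between 0 and 1")
    # Success: we pay only for the fix attempt.
    # Failure: we pay for the failed attempt plus the full rollback.
    return p_success * fix_hours + (1.0 - p_success) * (fix_hours + rollback_hours)


# Numbers from Kenji's update; the fallback model itself is an assumption.
FIX_SUCCESS_RATE = 0.60
FIX_HOURS = 1.5
ROLLBACK_HOURS = 4.0

fix_first = expected_hours(FIX_SUCCESS_RATE, FIX_HOURS, ROLLBACK_HOURS)
worst_case = FIX_HOURS + ROLLBACK_HOURS

print(f"Rollback only: {ROLLBACK_HOURS:.1f} h")
print(f"Fix first, expected: {fix_first:.1f} h (worst case {worst_case:.1f} h)")
```

Under these assumptions, trying the fix first averages about 3.1 hours against 4 hours for the rollback alone, at the cost of a 5.5-hour worst case. The sketch ignores the two hours of transaction data the rollback puts at risk, which only strengthens the case for trying the fix first when the fallback remains available.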
Separate the fix from the communication
"Here is what I am going to do: I am handling Sales, customers, and the board. You handle the fix. Do not answer your phone unless it is me or your engineering leads." Sales keeps calling Kenji too, and he cannot deal with it. Every minute Kenji spends explaining the situation to a non-technical stakeholder is a minute not spent on resolution. You become the shield. The VP of Sales gets a timeline from you. Enterprise customers get a status email from you. The board gets a text from you. Kenji gets silence and focus.
Set the rhythm
"You call me every 45 minutes with a one-sentence update: on track, off track, or pivoting to the rollback. I do not need details unless the plan changes." Structure reduces cognitive load. A fatigued team that knows when the next check-in is can focus between check-ins. A team getting ad hoc calls from their CEO cannot. Forty-five minutes is long enough for meaningful progress and short enough to catch a problem before it compounds.
Name what matters most and what matters later
"We will do the post-mortem on Thursday. Right now, nobody is getting fired, nobody is getting blamed, and the only thing that matters is getting 2,200 customers back online. Go fix it." These few sentences do four things at once: they remove the fear of blame, they establish the priority stack, they give Kenji the psychological safety to take risks on the fix, and they set a specific future date for accountability so it is not forgotten. Kenji hangs up the phone and, for the first time in six hours, he is thinking about the solution instead of his career.
The moment that changes everything
Kenji does not need a better answer. He needs a calmer room.
Kenji Nakamura is one of the best engineering leaders in the industry. He has architected systems that serve millions of users. But right now, at 11 PM, six hours into the worst outage of his career, he cannot think straight. Not because the problem is beyond him — it is not — but because every input he is receiving is making it harder to focus. Sales is calling. His team is exhausted and making errors. His guilt about the migration is looping in the background. And underneath all of it, he is wondering if this is the day he gets fired. A crisis does not reveal who is technically competent. Everyone in engineering passed that test years ago. A crisis reveals who can think clearly when everything around them says panic. Kenji cannot find that clarity alone right now. He needs someone to take the non-technical noise off his plate, tell him they trust his judgment, and give him space to be the engineer he is. The CEO who does this is not being soft. They are being strategically precise about what their VP of Engineering needs in order to perform. Fix the human, and the human fixes the system.
What to Say (and What Not To)
Instead of: "How did this pass staging?"
Try this: "We will do the full post-mortem Thursday. Right now, walk me through the two options."

Instead of: "I need a guaranteed timeline."
Try this: "Give me a probability: what are the odds the targeted fix works in 90 minutes?"

Instead of: "The board is going to want to know what happened."
Try this: "I am handling Sales, customers, and the board. You handle the fix. Do not answer your phone unless it is me."

Instead of: "Somebody needs to take responsibility for this."
Try this: "Nobody is getting fired tonight. The only job right now is getting 2,200 customers back online."
The Bigger Picture
Google's Project Aristotle research — the most comprehensive study of team performance ever conducted inside a single company — found that psychological safety was the number one predictor of high-performing teams. During a crisis, psychological safety does not mean lowering standards. It means creating conditions where people can take risks, admit uncertainty, and make fast decisions without fear of punishment. Teams with high psychological safety resolve incidents 2.2x faster than teams operating under blame-first management.
PagerDuty's 2024 State of Incident Management report analyzed 14,000 production incidents and found that the single largest contributor to extended downtime was not technical complexity — it was organizational friction. Incidents where the incident commander had clear authority and communication was centralized resolved 47% faster than incidents managed by committee. The CEO's job during a crisis is not to join the war room. It is to remove every obstacle between the war room and the resolution.
Practice This Conversation
Reading about this is step one. Practicing it changes everything. Sonitura lets you rehearse this exact conversation with Kenji Nakamura, a realistic AI VP of Engineering (a 4-year veteran who manages 60 engineers) who reacts to your words in real time. It takes 10 minutes. On the next 11 PM phone call, your first instinct will be to steady the room, and the fix will follow.