Running the Watch

The Lookout role pays for itself, but only if you build the rotation that holds it. How to give slack a name, a deliverable, and a person — without it becoming a luxury or a burden.

Why this isn't a luxury

The first reaction most engineering leaders have to the Lookout role is that it sounds expensive. One engineer, one week at a time, doing work that's deliberately less ambitious than what they'd otherwise be shipping. For an eight-person team, that's one engineer out of eight: a twelve-and-a-half percent tax on output, every week, forever.

This math is wrong, but the kind of wrong it is matters.

The literature on team performance is fairly clear that teams running at full utilisation produce less than teams running with slack — they have no capacity to absorb the unexpected, and the unexpected is the entire substance of running production software. This argument has been made well, repeatedly, for decades. It almost never wins in a sprint planning meeting. The reason isn't that engineering leaders disagree with it; it's that "slack" reads as the absence of work, and the absence of work is the thing every CEO and founder is structurally allergic to. You can't budget for an absence. You can't measure it. You can't defend it when the next quarter's targets come down.

The Lookout role works around this by giving slack a name, a deliverable, and a person. The team isn't budgeting for "slack" — it's budgeting for one role with a defined surface: dashboards watched, queue triaged, small work cleared, handoff produced, bell rung when needed. That's countable. Incidents caught before the alert versus after. Time-to-detect trends. Flaky tests resolved. None of these metrics are perfect, but they exist, and they're the kind of thing a CTO can put in a quarterly review.

In practice, the Lookout and the on-call engineer are usually the same person. The on-call engineer is already not on sprint commitments. They're already reading dashboards intermittently, because catching something before the page fires is part of how good on-call engineers do their job. The Lookout role isn't a new function added on top of on-call — it's the deliberate version of what good on-callers already do informally. What the role contributes is the structure: naming the watching as work, scheduling it explicitly, protecting it from absorption into other commitments, producing a written handoff at the end. The on-call engineer who knows they are also the Lookout will use their week differently than the on-call engineer who is just waiting for a page.

This reframes the cost question. Most teams asking "can we afford a Lookout?" already have an on-call rotation. The Lookout role doesn't add a new headcount commitment; it reshapes how the on-call week is used. The engineer is already not shipping features that week. The question is whether they spend the week waiting for incidents and doing whatever else they can fit in, or watching for the conditions that produce incidents and reducing them deliberately. The first is reactive; the second is the Lookout. Same engineer, same week, different posture.

In practice, a Lookout week looks like this. The engineer's sprint commitments for that week are zero, or close to zero. They are not on a feature ticket anyone is waiting on. Their named work is the small interruptible queue: flaky tests, documentation, the issue queue, paired tickets. Their dashboards are open all day. They take meetings if asked, but aren't in any meeting where their absence would matter. At the end of the week, they produce a written handoff: what's been weird, what's still open, what the incoming Lookout should pay attention to. The tickets cleared are the secondary output. The watching is the primary output, measured through what didn't happen — the incidents caught early, the regressions that didn't reach customers, the patterns noticed before they became outages.
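The written handoff is the one concrete artifact the week produces, so it's worth making its shape explicit. A minimal sketch of what that might look like as a structured record, assuming a team wants to keep handoffs consistent week to week; the class name, field names, and section headings here are illustrative, not a standard from the article:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class LookoutHandoff:
    """End-of-week handoff from the outgoing Lookout to the incoming one.

    The three sections mirror the article's framing: what's been weird,
    what's still open, and what the incoming Lookout should pay attention to.
    """
    week_ending: date
    engineer: str
    whats_been_weird: list = field(default_factory=list)   # anomalies noticed, resolved or not
    still_open: list = field(default_factory=list)         # threads the incoming Lookout inherits
    watch_closely: list = field(default_factory=list)      # dashboards, deploys, services needing an extra eye

    def render(self) -> str:
        """Render the handoff as a short plain-text document."""
        def section(title: str, items: list) -> str:
            body = "\n".join(f"- {item}" for item in items) or "- nothing to report"
            return f"{title}\n{body}"
        header = f"Lookout handoff, week ending {self.week_ending.isoformat()} ({self.engineer})"
        return "\n\n".join([
            header,
            section("What's been weird", self.whats_been_weird),
            section("Still open", self.still_open),
            section("Watch closely", self.watch_closely),
        ])

# Example: a handoff with one anomaly noted and the other sections empty.
handoff = LookoutHandoff(
    week_ending=date(2024, 3, 8),
    engineer="outgoing-lookout",
    whats_been_weird=["p95 latency on checkout creeping up since Tuesday's deploy"],
)
print(handoff.render())
```

The value isn't the code; it's that an empty section still renders ("nothing to report" is itself information), which nudges the outgoing Lookout to actually consider each category rather than skip the document on a quiet week.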

The honest version of this is that very few teams run the role exactly as described. What most teams have, if they have anything at all, is slack hiding in named roles: the senior engineer "on platform this quarter," the on-call who isn't on sprint commitments, the tech lead whose calendar is structurally underbooked. The function is being held; it just doesn't have a name. The work of bringing the Lookout role to life is mostly the work of taking that camouflaged slack and making it explicit — naming it, scheduling it, writing it into the team's working agreement so it survives the next pressure cycle. Start with one engineer, one week, one team. Run it for a month. Document what got caught and what got cleaned up. Defend the role the first time it's inconvenient — that's the moment it either becomes structural or quietly disappears. The argument that wins isn't the abstract one about utilisation. It's the concrete one about the four incidents that didn't happen last month.

Rotation, not specialisation

The next question is who holds the role. The temptation, especially in larger teams, is to make it a specialty — find the engineer who's naturally good at watching, who reads the dashboards already, and make watching their permanent job.

This is a mistake.

A specialist Lookout is a single point of failure. They hold years of pattern recognition in their head — which dashboards are flaky on Mondays, which services have a slow morning ramp, which deploys historically need an extra eye. When they leave, or burn out, or go on holiday, the team loses something it didn't realise was load-bearing.

Watching is also a skill, and skills that aren't practised across a team don't become part of its vocabulary. If only one engineer reads the dashboards, only one engineer learns to read them. The rest of the team treats observability as someone else's domain. Rotation forces the team to develop a shared literacy. The runbooks get better because more people use them. The dashboards get clearer because more people read them.

The third reason is about the engineers themselves. A week on the Lookout is a week of seeing the system from outside the feature you're currently building — triaging issues from support, pairing on tickets across the team, noticing things in services you don't normally touch. Engineers come back from a Lookout week as better engineers, in ways that are hard to name and easy to recognise.

The honest counter-argument is the ramp-up cost. The first day of a new Lookout's week is always weaker than the last day. They don't yet know what the dashboards looked like yesterday, or which deploys earlier in the week needed extra attention. The mitigations are practical: a short written handoff at the end of each rotation, capturing what's been weird that week; a shadow half-day where the incoming Lookout pairs with the outgoing one; a team-wide dashboard review at the start of each week, which serves as both handoff and shared literacy. None of these eliminate the cost. They reduce it to something the team can absorb.
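The rotation-plus-shadow mitigation above can be sketched as a small round-robin scheduler: each week's incoming Lookout shadows the current one before taking over the following week. The function name and record shape are illustrative assumptions, not something the article prescribes:

```python
from itertools import cycle

def lookout_rotation(engineers: list, weeks: int) -> list:
    """Build a round-robin Lookout schedule.

    Each entry pairs the week's Lookout with the incoming engineer, who
    shadows for a half-day near the end of the week; the shadow then
    becomes the next week's Lookout, so every handoff is warm.
    """
    order = cycle(engineers)
    lookout = next(order)
    schedule = []
    for week in range(1, weeks + 1):
        shadow = next(order)  # incoming Lookout, shadowing this week
        schedule.append({"week": week, "lookout": lookout, "shadow": shadow})
        lookout = shadow      # the shadow takes the watch next week
    return schedule

# Example: three engineers rotating across four weeks.
for slot in lookout_rotation(["ana", "ben", "cho"], weeks=4):
    print(slot)
```

The invariant worth preserving in any real implementation is the chaining: this week's shadow is next week's Lookout, so nobody starts a watch cold.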


The Lookout role is one of the quieter answers to a loud problem. Most engineering teams know their incidents take longer than they should, that the same kinds of regressions keep slipping through, that the senior engineer who notices things is doing more than the org chart suggests. The role doesn't fix any of that by itself. It gives the watching a name, a place, and a person — and it makes the act of seeing into work the team chooses to do, rather than work that happens to them.

Part one — Your First Incident Will Tell You Everything introduced the team and the failure mode. Part two — The Role Nobody Assigned looked at the structure of the response itself. Part three — Your Incident Isn't Over When the Site Comes Back focused on what happens after the incident is over. Part four — The Lookout introduced the role itself.


Thomas Riboulet is a Fractional VP of Engineering working with European tech companies. He writes about engineering leadership, team structure, and sustainable delivery at insights.wa-systems.eu.