Local-First Is Reliability Engineering

When your cloud tool is the thing that fails, you need a path that doesn't.

Apr 01, 2026

The tools you can still reach when the network is out of reach.

You’re locked out of the system that tells you you’re locked out.

That’s not a hypothetical. The cloud tool is up; the status page confirms it. But sync is spinning and you can’t tell if your work saved. You’re working in green-indicator purgatory. The tool that was supposed to make you more reliable just became the thing you have to manage.

This is the failure mode nobody designs against. Not data loss. Not a breach. The mundane version: the tool you depend on becomes unreliable or inaccessible at exactly the moment you need to understand its reliability.

The last piece ended on a specific problem. When the network’s interpretation of your work can’t be trusted, keeping more of what you produce in your own hands stops being a preference. Local-first is what that looks like as an engineering decision.

WHAT LOCAL-FIRST ACTUALLY MEANS

Local-first isn’t anti-cloud. It’s a design tradeoff.

The core principle is simple: the authoritative copy of your data lives on your device. The cloud is a sync target, not the source of truth. That sounds like a minor distinction. It isn’t.

Cloud-first design optimizes for availability across devices, multi-party collaboration, and administrative simplicity. You get access from anywhere, on any device, without managing your own backup. The tradeoff is that your availability is coupled to theirs.

Local-first optimizes for a different failure mode. Can you still do the work if the network is down? If the provider has an outage? If your account is locked for 72 hours while their support queue clears?

In a cloud-first world: usually no. In a local-first world: usually yes.

Your working copy is on disk. The sync layer is a delivery mechanism, not the dependency. If the sync layer fails, you still have the file. You still have the state. You can still do the work.

That’s the engineering argument. Not ideology. A reliability requirement you set on your own behalf.
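The "disk first, sync second" ordering can be sketched in a few lines. This is a minimal illustration, not a description of how any particular tool works: `save_local_first` and the `push` callback are hypothetical names, and `push` stands in for whatever sync mechanism you use.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def save_local_first(path: Path, text: str, push: Callable[[Path], None]) -> bool:
    """Write to disk first, then attempt the sync push.

    Returns True if the push succeeded, False if it failed.
    The local file is written in both cases.
    """
    # Write to a temp file in the same directory, then rename it into place,
    # so a failed or interrupted write does not leave a half-written file at `path`.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    os.replace(tmp_name, path)

    # The push is a delivery step (hypothetical callback). Network-style
    # failures such as ConnectionError and TimeoutError are OSError subclasses.
    try:
        push(path)
    except OSError:
        return False
    return True
```

The return value only reports whether delivery happened. Whether the work was saved is never in question, because the write to disk does not depend on the push.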

THE FAILURE MODES CLOUD-FIRST CREATES

There are three failure categories worth naming explicitly.

Account lockouts. Provider security events, billing failures, two-factor authentication bugs, and policy decisions generate lockouts. Duration ranges from hours to weeks. Your data is usually fine. You just can’t reach it.

Service degradation. This is the one that gets underreported. The service is “up” according to the status page. Sync is intermittent. Conflict resolution is unclear. You’re not sure which device has the canonical version of something you edited this morning. You keep working and hope the sync layer sorts it out later.

Provider discontinuation. Companies get acquired. Products get shut down. SaaS businesses die quietly. The better-managed ones give you 30 to 90 days and an export path. The others don’t. Your data is often recoverable. Your workflow isn’t.

Each is a different shape of the same problem. You built your reliability model on top of someone else’s availability guarantee. The guarantee is implicit in the design, not explicit in the contract.

The Recovery-First piece talked about undo as a feature requirement. Undo requires state you can access. If your state lives only in the cloud, your recovery options are exactly as available as your internet connection. That’s a dependency chain most tools don’t surface until it breaks.

THE PATTERNS

Local-first doesn’t require running your own servers. It requires one design constraint: always have a path to the data that doesn’t need the internet.

The simplest version: write in plaintext or Markdown. Sync to a cloud provider if you want, but work from the local file. Your editor is your application. The file is your database. When the provider is down, open the file. When you switch providers, move the file.

A more structured version: tools like Obsidian or Bear store notes locally and optionally sync. The local file is the source of truth. The sync is convenience, not a dependency. If iCloud is degraded, the app still works. You lose sync. You don’t lose access.

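For operational tooling: SQLite is underused for personal and small-team workflows. A local database file that syncs via Dropbox or a similar mechanism handles a surprising amount of structured data, provided only one device writes to it at a time. File-level sync of a database that is open for writing elsewhere risks conflicted copies or corruption. No network required to read or write. Sync happens when connectivity is available.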
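As a rough sketch of how little is needed, the following uses Python's built-in `sqlite3` module to keep a small log in a local file. The table layout and the function names (`open_log`, `add_entry`, `list_entries`) are made up for illustration; nothing here touches the network.

```python
import sqlite3
from pathlib import Path


def open_log(db_path: Path) -> sqlite3.Connection:
    """Open (or create) a local SQLite file and ensure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        " id INTEGER PRIMARY KEY,"
        " created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,"
        " body TEXT NOT NULL)"
    )
    return conn


def add_entry(conn: sqlite3.Connection, body: str) -> int:
    """Insert one entry and return its row id."""
    with conn:  # commits on success, rolls back if the insert raises
        cur = conn.execute("INSERT INTO entries (body) VALUES (?)", (body,))
    return cur.lastrowid


def list_entries(conn: sqlite3.Connection) -> list[str]:
    """Return entry bodies in insertion order."""
    return [row[0] for row in conn.execute("SELECT body FROM entries ORDER BY id")]
```

Point `db_path` at a folder your sync client watches and the file travels with you. Reads and writes still go to the local disk first.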

The pattern across all three: you own the state. The cloud relationship is additive, not foundational.

Access is not ownership. One requires an internet connection. The other doesn't.

THE TRADEOFFS

Local-first shifts certain problems onto you. Name them before choosing the pattern.

Backup is now your responsibility. The cloud provider was handling it. If your local copy is on one machine and that machine fails, you’ve traded one availability risk for another. You need a backup discipline: Time Machine, a second sync target, periodic exports. Something redundant that you control.
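A backup discipline can start as small as a timestamped copy with a retention limit. The sketch below is one assumed approach, not a replacement for Time Machine or a second sync target; `snapshot`, `keep`, and the file-naming scheme are hypothetical choices.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def snapshot(source: Path, backup_dir: Path, keep: int = 7,
             now: Optional[datetime] = None) -> Path:
    """Copy `source` into `backup_dir` under a timestamped name.

    Only the newest `keep` snapshots of this file are retained.
    `now` exists so callers and tests can supply a fixed time.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    target = backup_dir / f"{source.stem}-{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves the modification time

    # Timestamped names sort chronologically, so the oldest copies come first.
    copies = sorted(backup_dir.glob(f"{source.stem}-*{source.suffix}"))
    for old in copies[:-keep]:
        old.unlink()
    return target
```

Put `backup_dir` on a different disk than the working copy, or the snapshot fails along with the machine. For a SQLite database that may be mid-write, the `backup` method on Python's `sqlite3.Connection` is a safer way to take the copy than a plain file copy.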

Multi-device sync becomes a coordination problem. When the authoritative copy is local, moving between devices requires more thought. You can’t just open a browser tab on another machine. You need the sync to have completed. You need to know which device has the newest version.
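One way to answer "which copy is newest" without trusting clocks is a three-way comparison of content hashes. This sketch assumes you record a hash at each successful sync (called `base` here); that bookkeeping, and the names `digest` and `sync_state`, are assumptions for illustration rather than something a particular sync tool provides.

```python
import hashlib
from pathlib import Path


def digest(path: Path) -> str:
    """SHA-256 of a file's contents, as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_state(local: str, remote: str, base: str) -> str:
    """Classify two copies relative to the hash recorded at the last sync.

    local  -- hash of the copy on this device
    remote -- hash of the copy on the other device or sync target
    base   -- hash both sides agreed on at the last successful sync
    """
    if local == remote:
        return "in-sync"
    if local == base:
        return "remote-ahead"  # only the other side changed
    if remote == base:
        return "local-ahead"   # only this side changed
    return "conflict"          # both changed since the last sync
```

Only the "conflict" result needs a human decision. The other three can be resolved mechanically by taking whichever side moved.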

Real-time collaboration doesn’t work well with local-first. If you need three people editing the same document simultaneously, you need a coordination layer. Local-first is a poor fit for that pattern. It’s a good fit for individual and small-team workflows where you’re not editing the same artifact at the same time.

The decision rule is direct. If you need offline access, account-lockout resilience, or control over your export path: local-first. If you need real-time multi-party state and can accept the provider’s availability as your own: cloud-first. Most individual and small-team workflows aren’t actually the second case, even when they’re built that way.

THE RELIABILITY ARGUMENT

Engineers don’t argue about resilience as a philosophy. They design for failure modes.

Local-first is what you build when the failure mode you can’t tolerate is “I can’t work because my provider is down.” That’s a real failure mode. It happens regularly. Building around it is engineering, not stubbornness.

The Cognitive Budget piece talked about what it costs when tools that are supposed to help start requiring maintenance attention themselves. Local-first tools tend to have a smaller failure surface. On the local path there is no API to break and no authentication token to expire. Sync conflicts can still happen, but they don’t stand between you and the file. That’s not nothing.

The Analog Executive piece opened with a simple observation: being unreachable is now a luxury. Local-first is the technical version of the same principle. Owning your state means you’re not subject to someone else’s availability schedule. That’s a form of independence that compounds over time.

Keep one path to safety that doesn’t require the internet. That’s not a retreat from modern infrastructure. That’s a reliability requirement you set on your own behalf.

Week 8 is about the logbook: what it means to keep records that don’t require permission to read.

Local First Software - Kleppmann, M., Wiggins, A., van Hardenberg, P., McGranaghan, M. “Local-First Software: You Own Your Data, in Spite of the Cloud.” Ink & Switch / ACM Onward! 2019. The foundational paper defining the seven ideals of local-first software design.
Obsidian.md - plaintext Markdown note-taking with local storage and optional sync.
SQLite.org - public domain, zero-dependency embedded database engine. Appropriate for local-first structured data at any scale below “multiple writers simultaneously.”
Automerge - CRDT (Conflict-free Replicated Data Type) library for building local-first sync without a central coordinator. The most accessible implementation of the theoretical foundation described in the Kleppmann et al. paper.

Morphic
