I remember the first time I added an is_deleted column to a table. It was a users table, and the product manager had asked what happens if someone accidentally deletes their account. I didn't have a good answer, so I did what felt safe: instead of DELETE FROM users WHERE id = ?, I wrote UPDATE users SET is_deleted = true WHERE id = ?. The row stays. The data is preserved. We can undo it if we need to. Problem solved.
That was the easiest technical decision I made that quarter. It was also the one that cost me the most time over the next eighteen months.
Soft deletes are one of those patterns that feel like insurance. You're not really deleting anything, you're just hiding it. If something goes wrong, you can bring it back. It sounds like the responsible choice. And sometimes it is. But most teams adopt soft deletes as a blanket policy across every table without understanding what they're signing up for. This post is about what that actually costs.
The Promise
Teams reach for soft deletes for a handful of reasons, and they're all reasonable on the surface.
Data recovery. A user deletes something by mistake, or an admin fat-fingers a bulk operation. With soft deletes, you flip a boolean and the data is back. No need to restore from a backup.
Audit trail. You can always see what was there. Deleted records are still queryable, so you have a built-in history of what existed and when it was removed.
Regulatory compliance. Some industries require you to retain data for a certain period, even after a user requests deletion. Soft deletes feel like they satisfy this requirement.
And then there's the gut-feel argument. DELETE is scary. UPDATE feels reversible. When you're staring at a production database at 2 AM, "mark as deleted" is a lot less terrifying than "remove permanently."
These are real needs. I'm not going to argue that they don't matter. But is_deleted is not the only way to address them, and it comes with costs that aren't obvious until you've lived with them for a while.
Cost #1: Every Query Gets a Tax
The moment you add is_deleted to a table, every query against that table needs a WHERE is_deleted = false clause. Every single one. Your repository methods, your joins, your aggregation queries, your reports, your admin dashboards, your export scripts. All of them.
Miss one, and you're showing deleted data to users. A deleted comment appears in a thread. A cancelled order shows up in the revenue report. A deactivated user's profile is still visible. These bugs are subtle because the data looks valid. It is valid. It's just supposed to be invisible.
Most ORMs offer a way to handle this automatically. Hibernate has @Where(clause = "is_deleted = false"). ActiveRecord has default_scope. Django has custom managers. These help, but they're leaky. The global filter works until it doesn't. A raw SQL query in a migration script doesn't go through the ORM. A reporting tool that connects directly to the database doesn't know about your application-level filter. A junior developer writing a one-off data fix runs SELECT * FROM users WHERE email = '[email protected]' and gets two rows back, one active and one deleted, and doesn't understand why.
The tax isn't just on the code you write today. It's on every query anyone will ever write against this table, forever. And it's a silent tax. Nothing fails when you forget it. The query runs fine. It just returns wrong results.
Cost #2: Unique Constraints Break
This one catches every team eventually. You have a users table with a unique constraint on email. A user with [email protected] deletes their account. You soft-delete the row. Now a new person tries to sign up with [email protected]. Unique constraint violation. The email is "taken" by a deleted account.
The workarounds are all ugly. You can remove the unique constraint and enforce uniqueness in application code, which means you're now racing against concurrent inserts with no database-level protection. You can add a composite unique constraint on (email, is_deleted), but that only lets you have one active and one deleted row with the same email. Two deletions and you're stuck again. You can use deleted_at instead of is_deleted and make the unique constraint on (email, deleted_at), using null for active rows, but now your schema encodes deletion semantics into your constraint definitions.
The least bad option in PostgreSQL is a partial unique index: CREATE UNIQUE INDEX idx_users_email_active ON users(email) WHERE is_deleted = false. This enforces uniqueness only among active rows. It works. But it's one more thing to remember, one more thing to document, and one more thing that a developer unfamiliar with partial indexes will stare at in confusion during a schema review.
Every table with both a unique constraint and soft deletes has this problem. Usernames, slugs, SKUs, phone numbers, anything that needs to be unique in the live data but might have ghost duplicates in the deleted rows.
Cost #3: Cascading Confusion
You soft-delete a user. What happens to their orders? Their comments? Their uploaded files? Their profile photo? Their membership in groups?
Option one: soft-delete the children too. The user's orders get is_deleted = true. Their comments get is_deleted = true. Their uploads get is_deleted = true. Now you have a cascade of soft deletes to manage. How deep does it go? If a user's comment is soft-deleted, do the replies to that comment get soft-deleted too? What about likes on those replies? You're implementing your own cascade logic in application code, because the database's ON DELETE CASCADE doesn't know about your boolean column.
Option two: leave the children alone. The user is soft-deleted, but their comments still exist. Their orders still reference them. Now you have foreign keys pointing to deleted rows. A query that joins orders to users returns a deleted user for an active order. You need to add AND users.is_deleted = false to the join. Every join. Against every table that references users.
Option three: some mix of the two. Soft-delete orders but keep comments. Archive uploads but leave group memberships. Congratulations, you now have a per-entity deletion policy that lives in your service layer, has no schema-level enforcement, and is documented in a Confluence page that nobody reads.
Real ON DELETE CASCADE handles this at the database level. It's declarative, consistent, and impossible to forget. Soft-delete cascades are imperative, inconsistent, and easy to get wrong. Every new relationship you add to a soft-deletable entity is another edge case to handle.
Cost #4: Data Growth and Index Bloat
Deleted rows never leave the table. That's the whole point. But the consequence is that your table grows without bound.
For a table that rarely has deletions, like a products table, this is fine. But for high-churn tables, the math gets ugly fast. Consider a sessions table, a notifications table, or a temporary_tokens table where records are created and "deleted" constantly. With soft deletes, every expired session and dismissed notification is still sitting in the table. Your indexes include these rows. Your SELECT queries scan past them. Your VACUUM can't reclaim the space because the rows aren't dead, they're just marked.
You can create partial indexes that only cover active rows, like CREATE INDEX idx_active_sessions ON sessions(user_id) WHERE is_deleted = false. But now you have two mental models: the full table and the "active" view. Your query planner needs to pick the right index. Your DBA needs to maintain both. And the table itself is still bloated.
Eventually, someone will propose a cleanup job: "Let's hard-delete soft-deleted rows that are older than 90 days." And now you're writing the DELETE statement you were trying to avoid in the first place, plus a scheduled job to run it, plus monitoring to make sure it doesn't lock the table for too long, plus logic to handle the fact that other tables might still reference those rows. The soft-delete has become a deferred hard-delete with extra steps.
And Then There's the Mental Model Pollution
This is the cost that's hardest to measure and easiest to underestimate. Soft deletes mean that your data model no longer represents reality. It represents reality plus ghosts.
Every developer on the team needs to hold this in their head. When they write a query, they need to remember that deleted rows exist. When they debug an issue, they need to consider whether they're seeing deleted data. When they write a test, they need to seed both active and deleted records to verify the filter works. When they review a pull request, they need to check that every new query includes the filter.
New team members are particularly vulnerable. They write a query, it works, the tests pass. Nobody catches the missing WHERE clause until a customer reports seeing deleted content in production. The code was correct. The data model was the trap.
Over time, the team develops a low-grade anxiety around any query that touches a soft-deletable table. "Did I remember the filter?" becomes a background process running in every developer's head. It's not catastrophic, but it's friction. And friction compounds.
The Alternatives
I'm not going to tell you to just hard-delete everything and hope for the best. The needs that soft deletes try to address are real. Here's how to address them without poisoning your data model.
Hard Delete + Audit Log
Delete the row. Actually delete it. But before you do, write a snapshot to an audit_events table: the entity type, the entity ID, a JSON blob of the data at the time of deletion, the timestamp, and who did it.
You get full history without any of the query pollution. The audit table is append-only, which means it's simple, fast, and never interferes with your live data. If you need to recover a deleted record, you reconstruct it from the audit log. This is rare enough to be a manual operation, not an automated feature. And that's fine. How often does someone actually need to un-delete something? In my experience, almost never. When they do, it's an incident, not a feature flow.
The audit log also gives you a better history than soft deletes do. Soft deletes tell you "this row exists and is marked deleted." An audit log tells you who deleted it, when, what the data looked like at the time, and what action triggered the deletion. It's strictly more information.
Status Columns With Business Meaning
Instead of a binary is_deleted, model the actual state the entity is in. A user account isn't "deleted" or "not deleted." It's active, suspended, archived, or closed. An order isn't deleted. It's pending, fulfilled, cancelled, or refunded.
These are real business states with real semantics. They're not a filter hack. A query that says WHERE status = 'active' is intentional and readable. A query that says WHERE status = 'cancelled' is a feature, not a workaround. And each status can have its own rules: suspended users can't log in but their content stays visible. Archived projects are read-only. Cancelled orders can be reactivated within 24 hours.
This approach forces you to think about what deletion actually means for each entity. And that's a good thing, because it's rarely the same thing across the board.
Temporal Tables
If your actual requirement is time-travel queries, "show me what this data looked like on March 15th," the database has a feature for that. PostgreSQL has temporal table extensions. SQL Server has system-versioned tables. These maintain a history of every row change in a system-managed shadow table, completely transparent to your application code.
Your live table stays clean. Your historical data is queryable. The database handles the bookkeeping. This is the right tool for compliance and audit requirements where you genuinely need to answer "what was the state of this record at time T?"
Event Sourcing (Rarely)
If the sequence of events is your data model, if you need to replay history, reconstruct state at any point in time, and the domain genuinely requires it, event sourcing solves the deletion problem because you never delete anything. Everything is an event. A "deletion" is a UserAccountClosed event.
But I've written about this before. Event sourcing adds enormous complexity, and most teams don't need it. Don't adopt event sourcing just because you want an undo button. That's using a sledgehammer to hang a picture frame.
When Soft Deletes Actually Make Sense
I'm not saying soft deletes are always wrong. They make sense in a narrow set of circumstances.
A specific table with a clear undo UX requirement. A "trash" or "recycle bin" feature where users expect to recover recently deleted items. Think email trash, file recycle bin, or a drafts folder. These are specific, bounded use cases.
Bounded retention. The soft-deleted rows are hard-deleted after a defined period. 30 days, 90 days, whatever. The point is that the growth is capped. You're not keeping ghosts forever.
Low-churn, small tables. A configuration table with 50 rows. A categories table. Something where the data volume is small enough that the costs don't compound.
The common thread: soft deletes as a specific, intentional choice for specific tables. Not as a blanket policy. Not as a default. Not as "add is_deleted to every table because what if we need it."
The Trap
The real trap isn't the is_deleted column. It's the word "just." Just add a boolean. Just filter in the WHERE clause. Just use a default scope. Every "just" hides a cost that compounds over time: every query needs the filter, every constraint needs a workaround, every relationship needs cascade logic, every new developer needs to learn the rule, every table grows without bound.
Soft deletes are seductive because they feel like insurance. But insurance has premiums. This one charges you on every query, every join, every schema change, and every onboarding. Evaluate each entity on its own terms. Think about what "deleted" actually means in your domain. And for most tables, the answer is simple: deleted means deleted.
Comments (0)