TRIM does not erase data, nor does it tell the SSD controller to erase the affected blocks in the background.
The TRIM command triggers a cascade of state updates within the SSD’s Flash Translation Layer (FTL) metadata.
Here is precisely how TRIM handles those page-level and block-level markings:
1. At the page level: marking as stale
When you delete a file, the Operating System sends a TRIM command containing a range of Logical Block Addresses (LBAs).
- The SSD controller looks up these LBAs in its mapping table.
- It disconnects (unmaps) the physical NAND pages from those logical addresses.
- It marks those specific physical pages as stale (invalid). The marking is immediate only in some configurations (see the "Retrim" section below).
2. At the block level: the dual outcome
Once the pages are marked stale, the controller instantly evaluates the status of the parent block that contains them. It will mark the block in one of two ways:
- Scenario A: a "block with stale pages" (partially invalid)
- Condition: The block now contains some stale pages, but still holds other pages with valid user data. It's a block that contains a mix of both valid (live) data and stale (dead) data.
- State: It cannot be erased yet because doing so would destroy the good data inside it.
- The problem: NAND flash can only be erased in entire blocks, not individual pages.
- Fate: It must wait for Garbage Collection. The controller will eventually copy the few valid pages to a new location, turning this into a fully stale block.
- Result: The controller flags this block internally as a candidate for future Garbage Collection. It notes the ratio of valid-to-stale pages so it can prioritize it later when free space runs low.
- Scenario B: a "stale block" (fully invalid)
- Condition: The TRIM command successfully invalidated the very last remaining valid page in that block, or the block was already filled with stale pages. It's a block where 100% of the pages are marked as stale or invalid.
- State: It contains absolutely zero useful data.
- The benefit: This is the ideal target for the SSD controller.
- Fate: It can be erased immediately without moving any data first. This makes the recycling process incredibly fast and efficient.
- Result: The controller flags the entire block as fully stale. Because it contains 0% valid data, the controller can bypass Garbage Collection entirely and move this block straight to the Erase Queue to be wiped during background maintenance.
If your SSD has to clear blocks with mixed pages, it must read the good data, write it elsewhere, and then erase the block. This extra work causes Write Amplification, which slows down your drive and wears out the flash faster. If the SSD can find fully stale blocks, it skips the moving step entirely, instantly boosting performance.
TRIM command received ➔ pages marked stale ➔ controller checks parent block ➔ flags the block as "Mixed" OR "Fully Stale".
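The flow above can be sketched as a toy model. This is only an illustration under simplifying assumptions, not how any real controller is implemented: the names ToyFTL, Block, PageState and classify are hypothetical, pages are written strictly in order, and there is no Garbage Collection, so the toy simply runs out of pages when full.

```python
from dataclasses import dataclass
from enum import Enum


class PageState(Enum):
    FREE = "free"
    VALID = "valid"
    STALE = "stale"


@dataclass
class Block:
    pages: list  # one PageState per physical page

    def classify(self) -> str:
        valid = sum(p is PageState.VALID for p in self.pages)
        stale = sum(p is PageState.STALE for p in self.pages)
        if stale == 0:
            return "clean"        # nothing to reclaim
        if valid == 0:
            return "fully_stale"  # Scenario B: erase without copying
        return "mixed"            # Scenario A: needs GC before erase


class ToyFTL:
    """Hypothetical, heavily simplified Flash Translation Layer."""

    def __init__(self, num_blocks: int, pages_per_block: int):
        self.pages_per_block = pages_per_block
        self.blocks = [Block([PageState.FREE] * pages_per_block)
                       for _ in range(num_blocks)]
        self.l2p = {}        # logical address -> (block index, page index)
        self._next_page = 0  # next free physical page, filled in order

    def _mark_stale(self, lba: int) -> int:
        # Unmap the logical address and mark its physical page stale.
        block_idx, page_idx = self.l2p.pop(lba)
        self.blocks[block_idx].pages[page_idx] = PageState.STALE
        return block_idx

    def write(self, lba: int) -> None:
        # Out-of-place write: an overwrite makes the old page stale.
        if lba in self.l2p:
            self._mark_stale(lba)
        block_idx, page_idx = divmod(self._next_page, self.pages_per_block)
        self._next_page += 1
        self.blocks[block_idx].pages[page_idx] = PageState.VALID
        self.l2p[lba] = (block_idx, page_idx)

    def trim(self, lbas) -> dict:
        # Mark pages stale, then re-evaluate each affected parent block.
        touched = {self._mark_stale(lba) for lba in lbas if lba in self.l2p}
        return {b: self.blocks[b].classify() for b in sorted(touched)}


ftl = ToyFTL(num_blocks=2, pages_per_block=4)
for lba in range(8):
    ftl.write(lba)            # LBAs 0-3 land in block 0, 4-7 in block 1

print(ftl.trim([0, 1]))       # {0: 'mixed'}
print(ftl.trim([2, 3]))       # {0: 'fully_stale'}
```

Note that nothing is erased anywhere in this sketch: trim only changes metadata, which is the point of the section above.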
Modern SSDs intentionally delay Garbage Collection (GC) until free blocks run low to protect the drive's lifespan and maintain peak performance. This strategy is divided into two distinct modes: Idle GC and Reactive GC.
1. Idle Garbage Collection (background)
When the SSD is sitting idle, it will occasionally clean up blocks, but it does so very conservatively.
- The goal: Prepare just enough free blocks for the next burst of write activity.
- Why it waits: If the controller aggressively moves valid data too early, and the host operating system deletes that same data a few minutes later, the SSD just wasted write cycles. Waiting gives data a chance to "die" naturally. Another reason why it waits is to improve Wear Leveling (more on that soon).
2. Reactive Garbage Collection (foreground)
This triggers only when the pool of free blocks drops below a critical threshold.
- The goal: Emergency space creation.
- The action: The controller aggressively targets blocks with stale pages, forces the movement of valid data to a new block, and erases the old block.
- The penalty: Because this happens while you are actively using the drive, it causes a noticeable drop in write speeds (often called the "GC cliff").
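The two modes can be pictured as a simple decision function. This is a minimal sketch, assuming the controller only looks at the free-block count and whether the host is busy; the function name gc_mode and both threshold values are made up for illustration and are not taken from any real firmware.

```python
def gc_mode(free_blocks: int, host_busy: bool,
            idle_target: int = 32, critical: int = 8) -> str:
    """Decide which kind of GC (if any) should run right now.

    idle_target and critical are hypothetical thresholds.
    """
    if free_blocks < critical:
        # Reactive GC: runs in the foreground even while the host is busy.
        return "reactive"
    if not host_busy and free_blocks < idle_target:
        # Idle GC: conservative top-up of the free pool.
        return "idle"
    # Otherwise wait and give more pages a chance to become stale.
    return "none"


print(gc_mode(free_blocks=4, host_busy=True))    # reactive
print(gc_mode(free_blocks=20, host_busy=False))  # idle
print(gc_mode(free_blocks=20, host_busy=True))   # none
```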
The strategy: why modern SSDs wait
SSD controllers use victim-selection policies such as Greedy Garbage Collection or Cost-Age-Based policies to decide which blocks to reclaim and when to move data. They delay the process for three major reasons:
- Minimizing Write Amplification: Moving valid data writes new data to the NAND. Doing this too often wears out the drive prematurely.
- Data Consolidation: By waiting, more pages within a block are likely to become stale. It is much more efficient to clear a block that is 95% stale than one that is only 20% stale.
- Host latency priority: Moving data takes processing power and bandwidth. Modern SSDs prioritize user read/write requests over background cleaning until they absolutely have no choice.
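To make the two policy families and the "95% stale vs. 20% stale" comparison concrete, here is a small sketch. It assumes the textbook form of each policy: greedy picks the block with the fewest valid pages, and the cost-benefit score is age * (1 - u) / (2 * u), where u is the fraction of valid pages. Real controllers use their own proprietary variants; BlockInfo and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockInfo:
    block_id: int
    valid_pages: int
    total_pages: int
    age: float  # time since the block was last written, arbitrary units

    @property
    def utilization(self) -> float:
        return self.valid_pages / self.total_pages


def greedy_victim(blocks):
    # Greedy: reclaim the block that needs the fewest page copies.
    return min(blocks, key=lambda b: b.valid_pages)


def cost_benefit_score(b: BlockInfo) -> float:
    u = b.utilization
    if u == 0:
        return float("inf")  # fully stale: free to reclaim
    return b.age * (1 - u) / (2 * u)


def cost_benefit_victim(blocks):
    # Cost-benefit: also favors blocks whose remaining data has settled.
    return max(blocks, key=cost_benefit_score)


def copies_per_freed_page(b: BlockInfo) -> float:
    # Valid pages that must be rewritten for each stale page reclaimed
    # (assumes the block has no never-written pages).
    stale = b.total_pages - b.valid_pages
    if stale == 0:
        raise ValueError("block has nothing to reclaim")
    return b.valid_pages / stale


blocks = [
    BlockInfo(0, valid_pages=5, total_pages=100, age=1.0),
    BlockInfo(1, valid_pages=80, total_pages=100, age=50.0),
    BlockInfo(2, valid_pages=10, total_pages=100, age=40.0),
]
print(greedy_victim(blocks).block_id)        # 0
print(cost_benefit_victim(blocks).block_id)  # 2
print(copies_per_freed_page(blocks[1]))      # 4.0
```

In this example, reclaiming the block that is only 20% stale costs four page copies per page freed, while the 95% stale block costs about 0.05. That gap is the extra write amplification the controller avoids by waiting.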
The intentional delay in Garbage Collection acts as a data stabilization window. It serves as a natural filter that provides the wear-leveling mechanism with richer, high-fidelity metadata over time. This synergy directly translates to improved lifespan and efficiency through several mechanisms:
1. Accurate Identification of "Hot" vs. "Cold" Data
- The concept: Data that changes frequently is "hot" (e.g., system logs, cache), while data that rarely changes is "cold" (e.g., operating system files, stored photos).
- How the delay helps: If the SSD triggers GC too quickly, it cannot tell the difference between hot and cold data because both look identical when newly written. By delaying GC, the FTL can observe data behaviors over a longer period. Hot data will be repeatedly overwritten and invalidate itself naturally, while cold data remains untouched.
- The Wear Leveling benefit: Wear leveling relies entirely on this distinction. It intentionally targets cold data to migrate into highly worn blocks (Static Wear Leveling). If it moves data prematurely, it might accidentally put high-turnover hot data into a worn-out block, destroying the block much faster.
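A rough sketch of the idea: count writes per logical address over an observation window, then route data by temperature. Everything here is an assumption for illustration (the HeatTracker class, the hot_threshold value, and the simple rule that cold data goes to the most worn free block and hot data to the least worn one); real firmware uses far more elaborate heuristics.

```python
from collections import Counter


class HeatTracker:
    """Counts writes per logical address over an observation window."""

    def __init__(self, hot_threshold: int = 3):
        self.writes = Counter()
        self.hot_threshold = hot_threshold  # hypothetical cutoff

    def record_write(self, lba: int) -> None:
        self.writes[lba] += 1

    def is_hot(self, lba: int) -> bool:
        return self.writes[lba] >= self.hot_threshold


def pick_destination(erase_counts: dict, hot: bool) -> int:
    """Choose a free block id given {block_id: erase_count}.

    Cold data is parked in the most worn block (it will rarely be
    rewritten); hot data goes to the least worn block.
    """
    chooser = min if hot else max
    return chooser(erase_counts, key=erase_counts.get)


tracker = HeatTracker()
for _ in range(5):
    tracker.record_write(7)   # e.g. a log file, rewritten often
tracker.record_write(9)       # e.g. a stored photo, written once

free_blocks = {10: 120, 11: 900, 12: 450}
print(pick_destination(free_blocks, hot=tracker.is_hot(7)))  # 10
print(pick_destination(free_blocks, hot=tracker.is_hot(9)))  # 11
```

With a very short window, both addresses would have a count of one and look identical, which is the problem described above.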
2. Reducing Metadata "Churn"
- The concept: Tracking the precise Erase Counts (P/E cycles) and tracking which specific blocks are aging requires the FTL to maintain internal tracking tables (metadata).
- How the delay helps: If GC and wear leveling were running aggressively in the background constantly, the SSD controller would spend a massive amount of its limited processing power and RAM constantly updating these tracking tables.
- The Wear Leveling benefit: Waiting allows the controller to perform a comprehensive, holistic calculation of the drive's health status at longer intervals. This eliminates metadata "churn"—saving precious internal controller bandwidth and avoiding writing unnecessary metadata back to the NAND itself.
3. Preventing Adaptive Over-Correction (Oscillation)
- The concept: Dynamic wear leveling algorithms use real-time mathematical thresholds to decide where to route the next incoming write.
- How the delay helps: In a short timeframe, an SSD might experience an unusual burst of heavy writing to one sector, making those blocks look abnormally worn out. If the controller reacted instantly, it would over-correct by shifting operations across the drive needlessly.
- The Wear Leveling benefit: Deferring these deep cleanups gives the workload time to average out. The wear leveling algorithm gains a statistically accurate profile of the drive's global wear state, leading to precise, highly effective block choices rather than knee-jerk adjustments.
The bottom line: co-dependency
Garbage Collection and Wear Leveling are often engineered as parts of the same loop. By letting blocks sit idle longer, the SSD successfully separates data by its "lifespan," which gives the wear-leveling engine the exact blueprint it needs to distribute physical stress flawlessly across the NAND array.
Without reverse-engineering the SSD, there is no way to tell if or when physical blocks will be erased in the NAND after you delete a file. Unless you're using enterprise SSDs with the NVMe Zoned Namespaces (ZNS) command set or the NVMe Key Value (KV) command set, you cannot ensure that a deleted file's contents become unrecoverable within the NAND short of permanently sanitizing the whole SSD.
Retrim
To clarify the statement "TRIM immediately marks those specific physical pages as stale": the immediate part happens only if the OS uses Inline TRIM (Continuous TRIM). However, most modern operating systems rely heavily on Periodic Re-TRIM (Scheduled TRIM) instead.
Here is exactly how these two different OS approaches change when those pages actually get marked as stale:
1. Periodic Re-TRIM (delayed marking)
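Many modern OS environments schedule this method (for example Windows "Optimize Drives", or Linux fstrim run from a timer) to preserve system performance. Windows also sends inline TRIM on NTFS by default, so its scheduled pass supplements inline TRIM rather than replacing it.
- How it works: In a purely periodic setup, when you delete a file, the OS removes it from its own file system but does not immediately tell the SSD. No separate log of deleted addresses is needed, because the file system's free-space map already records which ranges are unused.
- The "Re-TRIM" event: Once a week (or during low activity), the OS runs a batch job and sends TRIM commands covering those free ranges to the drive all at once.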
- The SSD state change: The SSD pages are not marked as stale at the moment of deletion. They remain marked as "valid" inside the SSD until that periodic Re-TRIM command finally arrives.
2. Inline TRIM (continuous: immediate marking)
Some systems (like macOS or certain Linux configurations using the discard mount option) choose to send the command immediately.
- How it works: As soon as the file system frees the blocks (for example, when you empty the trash), the OS sends the TRIM command down the SATA/NVMe bus.
- The SSD state change: In this specific mode, the SSD does immediately mark those pages as stale.
- The downside: Every deletion adds command overhead on the storage bus and can introduce micro-stutters during heavy file deletions, which is why many systems moved toward the periodic method.
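The difference between the two approaches, as seen from the drive, can be simulated in a few lines. This is a toy model with hypothetical names (ToyDrive, ToyOS); in particular, it tracks freed addresses in a plain set, which simplifies how a real OS works out what to send during a periodic pass.

```python
class ToyDrive:
    def __init__(self):
        self.valid = set()      # LBAs the drive still treats as valid
        self.trim_commands = 0  # TRIM commands received so far

    def write(self, lbas) -> None:
        self.valid.update(lbas)

    def trim(self, lbas) -> None:
        self.trim_commands += 1
        self.valid.difference_update(lbas)


class ToyOS:
    def __init__(self, drive: ToyDrive, inline: bool):
        self.drive = drive
        self.inline = inline
        self.freed = set()  # freed by the file system, drive not yet told

    def delete(self, lbas) -> None:
        if self.inline:
            self.drive.trim(lbas)    # one command per deletion
        else:
            self.freed.update(lbas)  # the drive learns nothing yet

    def retrim(self) -> None:
        # Periodic pass: one batched command for everything freed so far.
        if self.freed:
            self.drive.trim(self.freed)
            self.freed = set()


drive = ToyDrive()
drive.write(range(10))
periodic = ToyOS(drive, inline=False)
periodic.delete({1, 2})
periodic.delete({3})
print(sorted(drive.valid))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
periodic.retrim()
print(sorted(drive.valid))  # [0, 4, 5, 6, 7, 8, 9]
print(drive.trim_commands)  # 1
```

Until retrim runs, the drive still counts the deleted pages as valid, so Garbage Collection would keep copying them around.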
The SSD's secret catch: Command Queueing
Even when the OS sends a TRIM command (whether inline or periodic), the SSD controller itself might not process it instantly.
TRIM is an asynchronous, non-blocking command in modern NVMe drives. The controller places the TRIM request into a low-priority internal queue. If you are in the middle of a heavy gaming session or exporting a video, the SSD controller will intentionally delay processing the TRIM command—leaving the pages marked as "valid"—until the drive goes idle.
No TRIM commands are discarded from the SSD controller’s internal queue.
Once a TRIM command is sent by the operating system and accepted into the SSD's command queue, it is expected to be processed, since silently dropping an accepted command would leave the drive's mapping out of sync with what the host believes. What sometimes creates confusion around this topic, and why the operating system still performs a periodic "Re-TRIM", comes down to three specific architectural reasons:
1. The OS "Re-TRIM" is a safety net, not a resend of lost commands
When Windows or Linux runs a scheduled weekly Re-TRIM (via Defrag or fstrim), it isn't resending commands that the SSD threw away. Instead, it is doing two things:
- Catching missed blocks: During heavy system use, the OS file system driver might occasionally skip sending an inline TRIM command to avoid clogging the storage bus. The periodic sweep acts as a "catch-all" to find any deleted blocks it missed.
- Cleaning up freed metadata space: File systems constantly reuse metadata blocks. A periodic sweep ensures that space freed up by internal OS file system management is fully synchronized with the SSD.
2. Queue boundaries (hardware overflows)
While the SSD won't discard a command it has accepted, its internal queue does have a fixed physical size.
- If an OS floods a drive with too many inline TRIM commands at once, the SSD's command queue can fill up completely.
- When the queue is full, the SSD doesn't discard commands; it handles it via flow control—it simply stops accepting new commands and forces the OS to wait (block) until the queue clears. This is why the OS batch-processes TRIM during idle times to avoid freezing the system.
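The flow-control behavior described above (refuse new work when full, never drop accepted work) can be sketched with Python's standard bounded queue. The BoundedCommandQueue class and its method names are hypothetical, and a real host driver would wait or retry rather than just getting False back.

```python
import queue


class BoundedCommandQueue:
    """Fixed-depth queue that applies back-pressure instead of dropping."""

    def __init__(self, depth: int):
        self._q = queue.Queue(maxsize=depth)

    def submit(self, cmd) -> bool:
        try:
            self._q.put_nowait(cmd)
            return True
        except queue.Full:
            # Not accepted: the caller must wait and retry. Commands
            # already in the queue are untouched.
            return False

    def complete_one(self):
        # The controller finishes the oldest accepted command.
        return self._q.get_nowait()


q = BoundedCommandQueue(depth=2)
print(q.submit("trim A"))  # True
print(q.submit("trim B"))  # True
print(q.submit("trim C"))  # False
print(q.complete_one())    # trim A
print(q.submit("trim C"))  # True
```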
3. FTL Processing Order vs. Deletion Order
When the SSD controller processes the TRIM queue, it doesn't necessarily execute them in the exact order they arrived. The Flash Translation Layer (FTL) will often sort and merge TRIM commands to match the physical layout of the NAND flash.
A command might sit in the queue for a relatively long time while the SSD prioritizes user read/write traffic, but it remains safely in memory until the controller completes the unmapping process.
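The "sort and merge" step can be illustrated with a standard interval-merge routine. This is a sketch under one big simplification: it sorts and coalesces by logical address, while a real FTL would group work according to its own physical layout, which is not visible from outside the drive. The function name merge_trim_ranges is hypothetical.

```python
def merge_trim_ranges(ranges):
    """Coalesce (start_lba, length) ranges that overlap or touch.

    Returns a sorted list of (start_lba, length) tuples.
    """
    merged = []  # list of [start, end) pairs, end exclusive
    for start, length in sorted(ranges):
        end = start + length
        if merged and start <= merged[-1][1]:
            # Overlaps or is adjacent to the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e - s) for s, e in merged]


pending = [(100, 8), (0, 4), (4, 4), (104, 16), (50, 2)]
print(merge_trim_ranges(pending))  # [(0, 8), (50, 2), (100, 20)]
```

Five queued ranges collapse into three, and the arrival order no longer matters, which matches the point that processing order need not follow deletion order.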