From 70e00a426b4e3a428025552e570d8dca3555189c Mon Sep 17 00:00:00 2001
From: Himadri Bhattacharjee <107522312+lavafroth@users.noreply.github.com>
Date: Wed, 24 Sep 2025 10:46:09 +0530
Subject: [PATCH] feat: update jdupes article

---
 content/post/liberating-14GiB-of-space.md | 72 +++++++++++------------
 1 file changed, 33 insertions(+), 39 deletions(-)

diff --git a/content/post/liberating-14GiB-of-space.md b/content/post/liberating-14GiB-of-space.md
index 1d91b4dc..d969e60d 100644
--- a/content/post/liberating-14GiB-of-space.md
+++ b/content/post/liberating-14GiB-of-space.md
@@ -14,73 +14,67 @@ The idea is simple:
 
 Since the mileage for second step might vary from person to person, I'll elaborate on the first step.
 
-I chose [jdupes](https://github.com/jbruchon/jdupes) as my weapon of choice for finding and removing the duplicates.
-It's free and open-source and is cross platform.
+I chose [jdupes](https://codeberg.org/jbruchon/jdupes) for deleting the duplicates because it's open-source and is cross platform.
 For a given folder we would run the following to wipe the duplicates:
 ```powershell
-jdupes -rdNz .
+jdupes --recurse --delete --no-prompt --zero-match .
 ```
 
-Let me explain the flags:
-
-Flag|Explanation
--|-
-`r`|Find duplicates recursively
-`d`|Delete duplicates
-`N`|No-prompt: when used with the `d` flag, it keeps the first file and removes all the others in a collection of duplicates
-`z`|Consider zero length files to be duplicates
-
-The `.` here means the current directory.
-
-Please read the tool's help page for more granular control during the cleanup.
+This will recursively delete all the duplicates except the source file without prompting for a confirmation.
+It will also consider zero length files to be duplicates.
 
-The computer in question runs Microsoft Windows and there's a thing
-common in almost all Windows setups, *drives*.
+The target OS is Windows, which has the glaring problem of *drives*.
 
-This was a glaring issue. There could be files that are unique in a given drive but are actually duplicates
+There could be files unique in a given drive but are actually duplicates
 in the inter-drive space.
 
 There are two ways to combat this.
 
-First method:
-- Run jdupes on a drive to free some space
-- Move some data from other drives into the current drive to fill it up again
-- Repeat
+### Rinse, move, repeat
+ - Run jdupes on a drive to free some space
+ - Move some data from other drives into the current drive to fill it up again
+ - Repeat
 
 This, obviously, is a terrible idea beacause we have the overhead cost of moving the files
 after each run as well as the fact that we have to run jdupes exhaustively for many
 iterations.
 
-Second (and probably the more elegant) method:
-- From the space of drives to be cleaned, pick a random drive (parent)
-- Hardlink all the other drives from the space into the drive we picked previously (children)
-- Run jdupes
+### Single pass with hardlinks
+ - Pick a random drive as the parent node
+ - Hardlink all the other drives to the parent
+ - Run jdupes
 
-This method only requires us to run jdupes once.
+Consider the following scenario
 
-Assuming we have picked the `A` drive as the parent and the `E` drive is one of the children,
-we would run the following powershell command to hardlink `E` drive to a folder called `Edrive` in `A`.
+- Parent drive: A
+  - Child drive: E
+  - Child drive: B
+  - etc.
+
+To create the hardlink of `E` in `A`, we would run the following in powershell.
 ```powershell
 New-Item -ItemType HardLink -Path A:\Edrive -Value E:\
 ```
 
-We would repeat this for all the children drives, modifying the command
-ever so slightly to meet our needs.
+We repeat this for the rest of the drives like drive `B` where
+- `Edrive` becomes `Bdrive`
+- `E:` becomes `B:`
 
-This implies that when we run jdupes from the root of the `A` drive, it would
-traverse the hardlinks and find duplicates in the inter-drive space.
+When we run jdupes from the `A` drive with
 
-Next, we'd go to the root of `A` drive and run jdupes.
-```powershell
-A:
-jdupes -rdNz .
+```sh
+cd A:
+jdupes --recurse --delete --no-prompt --zero-match .
 ```
 
-Finally we remove the hardlinks:
+it traverses the hardlinks and removes duplicates in all the linked drives.
+
+Finally we can remove the hardlinks
+
 ```powershell
 rm A:\Edrive
 ```
 
-> Note: Do not run jdupes at `SYSTEMROOT` (`C:` drive for most people)
+> Note: Do not run jdupes at `SYSTEMROOT` (the root of `C:` drive for most people)
 as there are legitimate duplicates which, if deleted, can brick a system.
 I'd recommend running jdupes in individual directories like _Music_, _Documents_, etc.