mirror of
https://github.com/lavafroth/lavafroth.github.io.git
synced 2026-06-03 23:11:16 -03:00
feat: update jdupes article
This commit is contained in:
@@ -14,73 +14,67 @@ The idea is simple:
|
||||
|
||||
Since the mileage for second step might vary from person to person, I'll elaborate on the first step.
|
||||
|
||||
I chose [jdupes](https://github.com/jbruchon/jdupes) as my weapon of choice for finding and removing the duplicates.
|
||||
It's free and open-source and is cross platform.
|
||||
I chose [jdupes](https://codeberg.org/jbruchon/jdupes) for deleting the duplicates because it's open-source and is cross platform.
|
||||
|
||||
For a given folder we would run the following to wipe the duplicates:
|
||||
|
||||
```powershell
|
||||
jdupes -rdNz .
|
||||
jdupes --recurse --delete --no-prompt --zero-match .
|
||||
```
|
||||
|
||||
Let me explain the flags:
|
||||
|
||||
Flag|Explanation
|
||||
-|-
|
||||
`r`|Find duplicates recursively
|
||||
`d`|Delete duplicates
|
||||
`N`|No-prompt: when used with the `d` flag, it keeps the first file and removes all the others in a collection of duplicates
|
||||
`z`|Consider zero length files to be duplicates
|
||||
|
||||
The `.` here means the current directory.
|
||||
|
||||
Please read the tool's help page for more granular control during the cleanup.
|
||||
This will recursively delete all the duplicates except the source file without prompting for a confirmation.
|
||||
It will also consider zero length files to be duplicates.
|
||||
|
||||
The computer in question runs Microsoft Windows and there's a thing
|
||||
common in almost all Windows setups, *drives*.
|
||||
The target OS is Windows, which has the glaring problem of *drives*.
|
||||
|
||||
This was a glaring issue. There could be files that are unique in a given drive but are actually duplicates
|
||||
There could be files unique in a given drive but are actually duplicates
|
||||
in the inter-drive space. There are two ways to combat this.
|
||||
|
||||
First method:
|
||||
- Run jdupes on a drive to free some space
|
||||
- Move some data from other drives into the current drive to fill it up again
|
||||
- Repeat
|
||||
### Rinse, move, repeat
|
||||
- Run jdupes on a drive to free some space
|
||||
- Move some data from other drives into the current drive to fill it up again
|
||||
- Repeat
|
||||
|
||||
This, obviously, is a terrible idea beacause we have the overhead cost of moving the files after each run
|
||||
as well as the fact that we have to run jdupes exhaustively for many iterations.
|
||||
|
||||
Second (and probably the more elegant) method:
|
||||
- From the space of drives to be cleaned, pick a random drive (parent)
|
||||
- Hardlink all the other drives from the space into the drive we picked previously (children)
|
||||
- Run jdupes
|
||||
### Single pass with hardlinks
|
||||
- Pick a random drive as the parent node
|
||||
- Hardlink all the other drives to the parent
|
||||
- Run jdupes
|
||||
|
||||
This method only requires us to run jdupes once.
|
||||
Consider the following scenario
|
||||
|
||||
Assuming we have picked the `A` drive as the parent and the `E` drive is one of the children,
|
||||
we would run the following powershell command to hardlink `E` drive to a folder called `Edrive` in `A`.
|
||||
- Parent drive: A
|
||||
- Child drive: E
|
||||
- Child drive: B
|
||||
- etc.
|
||||
|
||||
To create the hardlink of `E` in `A`, we would run the following in powershell.
|
||||
|
||||
```powershell
|
||||
New-Item -ItemType HardLink -Path A:\Edrive -Value E:\
|
||||
```
|
||||
|
||||
We would repeat this for all the children drives, modifying the command
|
||||
ever so slightly to meet our needs.
|
||||
We repeat this for the rest of the drives like drive `B` where
|
||||
- `Edrive` becomes `Bdrive`
|
||||
- `E:` becomes `B:`
|
||||
|
||||
This implies that when we run jdupes from the root of the `A` drive, it would
|
||||
traverse the hardlinks and find duplicates in the inter-drive space.
|
||||
When we run jdupes from the `A` drive with
|
||||
|
||||
Next, we'd go to the root of `A` drive and run jdupes.
|
||||
```powershell
|
||||
A:
|
||||
jdupes -rdNz .
|
||||
```sh
|
||||
cd A:
|
||||
jdupes --recurse --delete --no-prompt --zero-match .
|
||||
```
|
||||
|
||||
Finally we remove the hardlinks:
|
||||
it traverses the hardlinks and removes duplicates in all the linked drives.
|
||||
|
||||
Finally we can remove the hardlinks
|
||||
|
||||
```powershell
|
||||
rm A:\Edrive
|
||||
```
|
||||
|
||||
> Note: Do not run jdupes at `SYSTEMROOT` (`C:` drive for most people)
|
||||
> Note: Do not run jdupes at `SYSTEMROOT` (the root of `C:` drive for most people)
|
||||
as there are legitimate duplicates which, if deleted, can brick a system. I'd recommend
|
||||
running jdupes in individual directories like _Music_, _Documents_, etc.
|
||||
|
||||
Reference in New Issue
Block a user