rmd

rmd : ReMove Duplicates, an rm implementation able to remove duplicate files


Project maintained by FilippoRanza

License: MIT

An improved rm implementation able to remove duplicate files

Description

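rmd is an rm reimplementation written in pure Rust. It removes files and directories as usual.

rmd is also able to remove duplicate files, remove files by last access time, remove files by size, and skip selected files or directories while doing so.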

Installation

This tool can be easily installed from source with cargo:

cargo install rmd

Compile from source

It is also possible to clone the repository and compile rmd from there. In this case it is recommended to run all tests before compiling rmd for production. A convenient way to do that is to use make:

make build

This runs all cargo tests (both unit and integration) and the CLI tests before compiling rmd for production.

Usage

rmd is almost fully compatible with the standard rm. To get the full help text, run:

rmd --help

Standard Features (Standard Mode)

The most common scenarios include:

Additional Features (Automatic Mode)

Remove Duplicates

Remove by Last Access

This feature removes files older or newer than a given time-specification.

Remove files older than time-spec

rmd --older <time-spec> [directory...]

Remove files newer than time-spec

rmd --newer <time-spec> [directory...]

rmd checks whether a file's last access time is before (so the file is older) or after (so the file is newer) the time described by the time-specification. A time-specification describes a relative amount of time (in seconds) in the past, measured from the moment the program is run.
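The comparison described above can be sketched in a few lines of Python. This is only an illustration of the rule, not rmd's actual (Rust) implementation: the function name is_selected and its parameters are hypothetical, and the inclusive comparison is an assumption based on the "equal or before" / "equal or after" wording used in the examples of this document.

```python
import time
from typing import Optional


def is_selected(last_access: float, spec_seconds: float, mode: str,
                now: Optional[float] = None) -> bool:
    """Tell whether a file would match an --older / --newer style check.

    last_access and now are Unix timestamps; spec_seconds is the
    time-specification already converted to seconds.
    """
    if now is None:
        now = time.time()
    # The point in the past named by the time-specification.
    threshold = now - spec_seconds
    if mode == "older":
        return last_access <= threshold  # accessed at or before the threshold
    if mode == "newer":
        return last_access >= threshold  # accessed at or after the threshold
    raise ValueError(f"unknown mode: {mode!r}")
```

For a real file, the last access time could be read with os.stat(path).st_atime.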

time-specification format

[N+T]+

Where N is a number and T is one of the time descriptors listed in the table below.

Time Descriptor Table

Short Format   Long Format   Meaning   Value
s              second        second    1 second
m              minute        minute    60 seconds
h              hour          hour      60 minutes
d              day           day       24 hours
w              week          week      7 days
M              month         month     30 days
y              year          year      365 days

Examples

rmd --older 2y4M5d

removes, in the current directory and recursively in all subdirectories, every file whose last access time is at or before 2 years, 4 months and 5 days in the past, measured from the time the program is run.

rmd --newer '4h+30m'

removes, in the current directory and recursively in all subdirectories, every file whose last access time is at or after 4 hours and 30 minutes in the past, measured from the time the program is run.

rmd --older '1M 15d' /home/user/temp-store

removes, in /home/user/temp-store and recursively in all subdirectories, every file whose last access time is at or before 1 month and 15 days in the past, measured from the time the program is run.

rmd --newer 30s /home/user/wrong-downloads

removes, in /home/user/wrong-downloads and recursively in all subdirectories, every file whose last access time is at or after 30 seconds in the past, measured from the time the program is run.
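To make the arithmetic behind these examples concrete, here is a minimal Python sketch that converts a time-specification into seconds using the values from the Time Descriptor Table. It is not rmd's parser: the function name time_spec_to_seconds is hypothetical, and the sketch assumes that each term is a number followed by a descriptor and that anything between terms (such as spaces or '+') is just a separator.

```python
import re

DAY = 24 * 60 * 60

# Seconds per descriptor, taken from the Time Descriptor Table.
# Matching is case sensitive: "m" is a minute, "M" is a month.
SECONDS = {
    "s": 1, "second": 1,
    "m": 60, "minute": 60,
    "h": 3600, "hour": 3600,
    "d": DAY, "day": DAY,
    "w": 7 * DAY, "week": 7 * DAY,
    "M": 30 * DAY, "month": 30 * DAY,
    "y": 365 * DAY, "year": 365 * DAY,
}

# Assumption: a term is digits followed by letters; separators are skipped.
TERM = re.compile(r"(\d+)\s*([A-Za-z]+)")


def time_spec_to_seconds(spec: str) -> int:
    """Sum every N+T term of the time-specification, in seconds."""
    terms = TERM.findall(spec)
    if not terms:
        raise ValueError(f"no terms found in {spec!r}")
    total = 0
    for number, descriptor in terms:
        if descriptor not in SECONDS:
            raise ValueError(f"unknown time descriptor: {descriptor!r}")
        total += int(number) * SECONDS[descriptor]
    return total


print(time_spec_to_seconds("2y4M5d"))  # 73872000 (855 days)
print(time_spec_to_seconds("4h+30m"))  # 16200
print(time_spec_to_seconds("1M 15d"))  # 3888000 (45 days)
print(time_spec_to_seconds("30s"))     # 30
```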

Remove by Size

This feature removes files smaller or larger than a given size-specification.

Remove files smaller than size-spec

rmd --smaller <size-spec> [directory...]

Remove files larger than size-spec

rmd --larger <size-spec> [directory...]

rmd compares each file's size, in bytes, against the size-specification. In larger mode rmd checks every file in the specified directory, and recursively in all subdirectories, and removes each file whose size is larger than or equal to the size described in size-spec. In smaller mode rmd removes each file whose size is smaller than or equal to the size described in size-spec.

size-specification format

[N+S]+

Where N is a number and S is one of the size descriptors listed in the tables below.

Decimal Size Descriptor Table

Short Format   Long Format   Meaning    Value
b              (none)        byte       1 byte
kb             kilo          kilobyte   1000 bytes
mb             mega          megabyte   1000 kilobytes
gb             giga          gigabyte   1000 megabytes
tb             tera          terabyte   1000 gigabytes
pb             peta          petabyte   1000 terabytes

Binary Size Descriptor Table

Short Format   Long Format   Meaning    Value
b              (none)        byte       1 byte
kib            kibi          kibibyte   1024 bytes
mib            mebi          mebibyte   1024 kibibytes
gib            gibi          gibibyte   1024 mebibytes
tib            tebi          tebibyte   1024 gibibytes
pib            pebi          pebibyte   1024 tebibytes

Decimal and binary size descriptors can be used together in the same size-specification.

Examples
rmd --smaller '2kb,56mib'

removes, in the current directory and recursively in all subdirectories, every file whose size is smaller than or equal to 56 mebibytes plus 2 kilobytes.

rmd --larger 4gb30mb

removes, in the current directory and recursively in all subdirectories, every file whose size is larger than or equal to 4 gigabytes plus 30 megabytes.

rmd --larger '1 mebi 15 kibi' /home/user/temp-store

removes, in /home/user/temp-store and recursively in all subdirectories, every file whose size is larger than or equal to 1 mebibyte plus 15 kibibytes.

rmd --smaller 30kb /home/user/useless-files

removes, in /home/user/useless-files and recursively in all subdirectories, every file whose size is smaller than or equal to 30 kilobytes.
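The byte counts behind these examples can be checked with a short Python sketch built from the two size descriptor tables. As before, this only illustrates the arithmetic and is not rmd's own code: size_spec_to_bytes and size_matches are hypothetical names, and the sketch assumes that descriptors are written exactly as in the tables and that anything between terms (spaces, commas) is a separator.

```python
import re

# Bytes per descriptor, taken from the decimal and binary tables.
BYTES = {"b": 1}
DECIMAL = [("kb", "kilo"), ("mb", "mega"), ("gb", "giga"),
           ("tb", "tera"), ("pb", "peta")]
BINARY = [("kib", "kibi"), ("mib", "mebi"), ("gib", "gibi"),
          ("tib", "tebi"), ("pib", "pebi")]
for power, (short, long_name) in enumerate(DECIMAL, start=1):
    BYTES[short] = BYTES[long_name] = 1000 ** power
for power, (short, long_name) in enumerate(BINARY, start=1):
    BYTES[short] = BYTES[long_name] = 1024 ** power

# Assumption: a term is digits followed by letters; separators are skipped.
TERM = re.compile(r"(\d+)\s*([A-Za-z]+)")


def size_spec_to_bytes(spec: str) -> int:
    """Sum every N+S term of the size-specification, in bytes."""
    terms = TERM.findall(spec)
    if not terms:
        raise ValueError(f"no terms found in {spec!r}")
    total = 0
    for number, descriptor in terms:
        if descriptor not in BYTES:
            raise ValueError(f"unknown size descriptor: {descriptor!r}")
        total += int(number) * BYTES[descriptor]
    return total


def size_matches(size_in_bytes: int, spec: str, mode: str) -> bool:
    """Inclusive comparison, as described for larger and smaller mode."""
    threshold = size_spec_to_bytes(spec)
    if mode == "larger":
        return size_in_bytes >= threshold
    if mode == "smaller":
        return size_in_bytes <= threshold
    raise ValueError(f"unknown mode: {mode!r}")


print(size_spec_to_bytes("2kb,56mib"))       # 58722256
print(size_spec_to_bytes("4gb30mb"))         # 4030000000
print(size_spec_to_bytes("1 mebi 15 kibi"))  # 1063936
print(size_spec_to_bytes("30kb"))            # 30000
```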

Skip Files

Sometimes you may need to exclude some files or directories from removal. For example, you may want to preserve every .bak file or to completely ignore directories like .git. For these cases rmd provides two useful options, --ignore-extensions and --ignore-directories:

rmd --ignore-extensions bak --duplicates

removes any duplicate file in the current directory, and recursively in all subdirectories, ignoring every file with the .bak extension. So two identical files, “file.rs.bak” and “copy-file.rs.bak”, will both be preserved. The original “file.bak” (if it is unique) will also be preserved, because .bak files are completely ignored.

rmd --ignore-extensions bak pdf mp3 --larger 40kb project

removes every file larger than or equal to 40 kilobytes in the project directory, and recursively in all subdirectories, except files with the .bak, .pdf and .mp3 extensions. So, for example, project/docs.pdf, a 4 MB file, will not be removed.

rmd --clean --ignore-directories xmas_photos --older 1y documents

removes any file older than one year in the documents directory, and recursively in all subdirectories, ignoring any directory named xmas_photos. If xmas_photos is empty it will not be removed: rmd simply never opens any directory named xmas_photos in the directory tree rooted at documents.

rmd --clean --ignore-directories important_files .git --duplicates /home/user

removes any duplicate file in the user's home directory, and recursively in all subdirectories, ignoring any directory named .git or important_files.

--ignore-directories and --ignore-extensions can be used together.

It is also possible to simply ignore hidden files and directories. --ignore-unix-hidden automatically ignores every file and directory whose name starts with ‘.’ (unix style hidden files). With --ignore-unix-hidden set, rmd skips hidden files and does not open hidden directories, so any non-hidden file inside a hidden directory is left untouched.

An example:

rmd -d --ignore-unix-hidden important_project

deduplicates important_project, ignoring hidden files and directories (such as .git).
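The traversal rule behind --ignore-unix-hidden (skip hidden files, never open hidden directories) can be illustrated with Python's os.walk. This is a sketch of the rule as described in this document, not of rmd's internals; visible_files is a hypothetical helper name.

```python
import os


def visible_files(root: str):
    """Yield file paths under root, skipping unix style hidden entries."""
    for current, dirs, files in os.walk(root):
        # Editing dirs in place stops os.walk from descending into hidden
        # directories, so files inside them are never visited at all.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        for name in files:
            if not name.startswith("."):
                yield os.path.join(current, name)
```

With this rule a file such as important_project/.git/config is never reported, even though its own name is not hidden, because the walk never enters .git.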

Note

Advice

It is very likely that you will end up using --ignore-extensions and/or --ignore-directories with the same arguments over and over. In this scenario it is a good idea to add an alias to your shell configuration file, like

alias rmdd='rmd --ignore-extensions bak --ignore-directories .git .hg'

or a shell function like

function rmd() {
    # "command" runs the rmd executable instead of this function, avoiding infinite recursion
    command rmd --ignore-extensions bak --ignore-directories .git .hg -- "$@"
}