Old 08-02-2016, 11:52 AM   #1
maheshs97
LQ Newbie
 
Registered: Aug 2016
Posts: 1

Rep: Reputation: Disabled
Shell script to find duplicate files and old files


Hi All,

I need to clean up our storage. For that, I want a shell script to find duplicate files and/or files with a modification date more than 2 years old.

Please help me with this.

Regards,
Mahesh.S
 
Old 08-02-2016, 11:59 AM   #2
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,757

Rep: Reputation: 7983
Quote:
Originally Posted by maheshs97 View Post
Hi All,
I need to clean up our storage. For that, I want a shell script to find duplicate files and/or files with a modification date more than 2 years old. Please help me with this.
We will be glad to HELP you with this. So start by posting what YOU have personally done/tried/written, and tell us where you're stuck. But don't ask for handouts...we will not write your scripts for you. Read the "Question Guidelines" link in my posting signature.

There are THOUSANDS of bash scripting tutorials you can find with a brief Google search...many do things similar to what you're asking. Start there.
 
2 members found this post helpful.
Old 08-02-2016, 12:05 PM   #3
jamison20000e
Senior Member
 
Registered: Nov 2005
Location: ...uncanny valley... infinity\1975; (randomly born:) Milwaukee, WI, US( + travel,) Earth&Mars (I wish,) END BORDER$!◣◢┌∩┐ Fe26-E,e...
Distribution: any GPL that work on freest-HW; has been KDE, CLI, Novena-SBC but open.. http://goo.gl/NqgqJx &c ;-)
Posts: 4,888
Blog Entries: 2

Rep: Reputation: 1567
Hi.

Try FSlint: http://www.pixelbeat.org/fslint/
The end of my signature also has a link with more specifics for the CLI &c...

have fun!

Last edited by jamison20000e; 08-02-2016 at 12:18 PM. Reason: added the &c ;) i.e: Etc. etcetera and so on... :D
 
Old 08-02-2016, 01:01 PM   #4
Sefyir
Member
 
Registered: Mar 2015
Distribution: Linux Mint
Posts: 634

Rep: Reputation: 316
Quote:
Originally Posted by maheshs97 View Post
files with a modification date more than 2 years old.
The find command is good for this:

Code:
       -mtime n
              File's  data was last modified n*24 hours ago.  See the comments
              for -atime to understand how rounding affects the interpretation
              of file modification times.
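For example, a minimal sketch (assuming GNU find and a hypothetical /path/to/storage tree) that lists regular files whose data has not been modified in roughly two years:
Code:
# Regular files last modified more than ~2 years (730 days) ago;
# -print is the default action but is spelled out here.
find /path/to/storage -type f -mtime +730 -print
Two years is approximated as 730 days here; if your find supports it, -newermt with a date string is another way to express the cutoff.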
 
Old 08-02-2016, 05:14 PM   #5
Habitual
LQ Veteran
 
Registered: Jan 2011
Location: Abingdon, VA
Distribution: Catalina
Posts: 9,374
Blog Entries: 37

Rep: Reputation: Disabled
Welcome to LQ!
Make friends, show us your code.
 
Old 08-05-2016, 03:01 AM   #6
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,369

Rep: Reputation: 2753
As per Sefyir, use 'find' for old files.
For 'duplicate', consider whether you mean the same name (that needs a script) or the same content (try a hash sum, e.g. md5sum).
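For the 'same name' case, a minimal sketch (assuming GNU find, which provides -printf) that prints basenames occurring more than once under the current directory:
Code:
# Emit each regular file's basename, one per line, then show only
# the names that appear more than once anywhere in the tree.
find . -type f -printf '%f\n' | sort | uniq -d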
 
Old 10-18-2016, 09:08 AM   #7
rnturn
Senior Member
 
Registered: Jan 2003
Location: Illinois (SW Chicago 'burbs)
Distribution: openSUSE, Raspbian, Slackware. Previous: MacOS, Red Hat, Coherent, Consensys SVR4.2, Tru64, Solaris
Posts: 2,818

Rep: Reputation: 550
Quote:
Originally Posted by maheshs97 View Post
Hi All,

I need to clean up our storage. For that, I want a shell script to find duplicate files and/or files with a modification date more than 2 years old.
I'm not going to write your script for you but I'll offer a few hints -- and maybe a few code snippets -- to get you started.

1.) The "find" command is going to be central to what you're doing. Try:
Code:
find <dir-tree-root> -type f -exec md5sum {} \; > checksum.lis
You'll have a list of checksums and filenames under "<dir-tree-root>" in "checksum.lis".
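A small aside: if your find supports terminating -exec with + (GNU find does), md5sum is invoked on batches of files rather than once per file, which is noticeably faster on large trees. The same step would then be:
Code:
# md5sum runs on many files per invocation instead of one at a time
find <dir-tree-root> -type f -exec md5sum {} + > checksum.lis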

2.) This alone won't tell you anything about duplicates. To find which checksums show up in checksum.lis multiple times, try:
Code:
cat checksum.lis | awk '{print $1}' | sort | uniq -c
This runs the checksum.lis file through "awk" to print the first value in each record (the MD5 checksum), sorts them, and runs them through "uniq" to get the occurrence count. You'll be interested in checksums that appear more than once. What I would do at this point is re-run the above string of commands but pipe it through grep to find these:
Code:
<cat checksum.lis | blah blah | uniq -c> | grep -v " 1 "
This'll get you a list of checksums preceded by the number of times they were found -- except those that were only found once. If you pipe this string of commands through awk, you can obtain just the checksums and save those in a file, say "duplicate_checksums.lis".
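Putting that together, one way the pipeline might look end to end (a sketch, assuming the checksum is the first field of every record, as md5sum produces):
Code:
# Count occurrences of each checksum, keep only counts above 1,
# and save just the checksum column for the next step.
awk '{print $1}' checksum.lis | sort | uniq -c | awk '$1 > 1 {print $2}' > duplicate_checksums.lis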

3.) Then use
Code:
grep -f duplicate_checksums.lis checksum.lis
which will display the records in the original checksum/file listing that contained the duplicate checksums found. From that list you can do whatever you need to do with the duplicates.

If you plan on having to do this frequently or on multiple systems, it shouldn't be too hard to combine all of the above into a shell script that automates the process.
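For what it's worth, here is a rough sketch of how those pieces might hang together in one script -- purely illustrative, with a hypothetical ROOT default and the file names used above; nothing in it deletes anything:
Code:
#!/bin/bash
# Report groups of files under $ROOT that share an md5 checksum.
ROOT="${1:-/data}"    # hypothetical default; pass your own tree as $1

find "$ROOT" -type f -exec md5sum {} + > checksum.lis

# Checksums that occur more than once
awk '{print $1}' checksum.lis | sort | uniq -c | \
    awk '$1 > 1 {print $2}' > duplicate_checksums.lis

# Full records for the duplicated checksums, grouped by checksum
grep -F -f duplicate_checksums.lis checksum.lis | sort > duplicates.lis

echo "Found $(wc -l < duplicate_checksums.lis) duplicated checksums;"
echo "see duplicates.lis for the file names."
From duplicates.lis you can decide (by hand or with more scripting) which copies to keep.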

Note 1: Don't forget that you probably want to keep one copy of the duplicate files found above.

Note 2: What I tend to do when I embark on a cleanup like this is to wrap the cleanup code with something like:
Code:
TESTING="Y"
.
.
.
cat files_to_remove.lis | while read FILE; do
if [ "$TESTING" == "Y" ]; then
    echo "Testing: Not removing file: $FILE"
else
    rm $FILE
fi
Run it this way, observe the output and, if it looks OK, change TESTING to "N", and re-run.

4.) Extending this to include only files that are 2 years old and older should be fairly simple after you read the find(1) manpage (hint: `-mtime'). Also, be careful not to remove just any files that are older than some arbitrary number of days. You might accidentally clobber application files -- or operating system files -- some of which might be rarely changed and have old datestamps on them.
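As a hedged illustration of that caution (the paths are hypothetical), you can keep find on a single filesystem and prune trees you never want touched while applying the age test:
Code:
# Stay on one filesystem (-xdev), skip the application tree entirely,
# and report only files not modified in the last ~2 years.
find /data -xdev -path /data/app -prune -o -type f -mtime +730 -print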

5.) You have done a backup of the filesystems you're planning on cleaning and know that these backups are correct/usable, right?
 
  


