I have a filesystem with 1.7 million files, dating back to Jan 2005. I need to gzip all files up to Dec 2008, on a yearly basis, i.e. one zip file for 2005, one zip file for 2006, and the same for the remaining years.
Can anyone help me with the commands for this task?
Your help is much appreciated.
Verify your selection by listing everything captured by find
When ready
find . -atime 360 -exec gzip {} \;
This is for one year. 720 for two years, etc.
I'd also suggest using 'tar' after you gzip else you'll run out of space fast. Real, real fast. In fact, having another dir to work with would be good.
that assumes the OP had the files already segregated into directories by year, which may or may not be the case.
"find . -atime 360 -exec gzip {} \;"
is probably closer to what the OP wants, but will result in one zip file for each original file found... which may be what they're after.
or you could take the above "find" and "mv" the file to a separate directory, then gzip each, then tar the results....or mv the file, tar the directory and gzip *that*.
ssheri needs to remember that there is no "create date" stored in unix filesystems. M. Steele is going after the "access time", which may be a good bet. See the "man" page for "find", in particular the "-atime", "-mtime" and "-ctime" options, to see which best fits.
Another option would be to create two reference files with appropriate dates, and use the "-newer" and "! -newer" tests to sort out what you want.
All of the above is why I originally asked if the date was somehow "buried" in the filename.
some additional information about the original data layout, and the desired results might help in providing more appropriate responses.
> Another option would be to create two
> reference files [...]
This seems like a better scheme than any of the "-<X>time" options. Especially if you're not running the job at 00:00 on 1 January. "-atime" would seem to be the least likely to get the desired result (unless no one ever looks at these files).
> or you could take the above "find" and
> "mv" the file [...]
I'd vote for moving them to year-specific directories that way, and then doing something like:
tar cf - year_2005_dir | gzip -c > year_2005_dir.tar.gz
Creating an actual "tar" archive file, and _then_ hitting it with gzip tends to require more disk space, at least temporarily.
> find . -atime 360 -exec gzip {} \;
>
> This is for one year. 720 for two years,
> etc.
Around here, years are longer than 360 days. Which calendar do you use? (And which does "find" use?)
This lists exactly the files from year 2007 (1 Jan through 31 Dec) and also descends into subdirectories. After that you could feed this file to gzip/tar or whatever you want...
lots of options presented.....still waiting for "ssheri" to shed some light on the original directory layout and the desired output.
from what was originally stated, it could well be that the OP wants a gzip file for a given year that contains all the files for that year (as opposed to zipping a tar of those files).
If so, I don't think that option has been covered yet, and it might be a pain to implement.
Hi All,
Thanks for your quick responses. Let me explain my requirement in detail.
=======================================
I have a filesystem which contains 1.7 million files. Files have been there from 2005 until today. My requirement is to tar and zip the files for each year separately, i.e. one tar/zip file each for 2005, 2006, 2007 and 2008. The files can be identified by their time stamp and there are no separate directories for each year. All files reside in a single directory.
======================================
"I have a filesystem which contains 1.7 million files. Files have been there from 2005 until today. My requirement is to tar and zip the files for each year separately, i.e. one tar/zip file each for 2005, 2006, 2007 and 2008. The files can be identified by their time stamp and there are no separate directories for each year. All files reside in a single directory."
Ok, this could get ugly. Making the assumption that the files will be removed after archiving, then something like the following can be modified to work:
First, you need to realize that UNIX doesn't have / track a file timestamp related to the "creation time". It knows the following:
atime (File Access Time) Access time shows the last time the data from a file was accessed - read by one of the Unix processes directly or through commands and scripts.
ctime (File Change Time) ctime changes when you change a file's ownership or access permissions. It is also updated whenever the file's contents are modified.
mtime (File Modify Time) Last modification time shows time of the last change to file's contents. It does not change with owner or permission changes, and is therefore used for tracking the actual changes to data of the file itself.
So...which one you look at depends on what you want. IF you can guarantee that the contents of the file, once written, were never modified, then the mtime option of find should be ok. Access time is useless for this if the file has ever been read after writing. Ctime *might* work.
If none of the above apply, then you're toast, as you've no way to locate files written in 2005.
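A quick way to see what each of the three timestamps holds for a given file is to ask ls for each one in turn. The sketch below is only an illustration: the scratch directory and sample.dat are made-up names, and it assumes mktemp -d is available on your system.

```shell
#!/bin/sh
# Illustration only: workdir and sample.dat are hypothetical names.
LC_ALL=C; export LC_ALL            # keep the ls date format predictable
workdir=$(mktemp -d)
f="$workdir/sample.dat"
echo data > "$f"

# Push only the modification time back to mid-2005.
touch -m -t 200506151200.00 "$f"

mtime_line=$(ls -l  "$f")   # plain -l shows the modification time
atime_line=$(ls -lu "$f")   # -u shows the last access time instead
ctime_line=$(ls -lc "$f")   # -c shows the last status change time instead

echo "mtime: $mtime_line"
echo "atime: $atime_line"
echo "ctime: $ctime_line"
```

Running touch is itself a status change, so ctime moves to the present and only the mtime line shows 2005. That is the behaviour to keep in mind before trusting ctime for files that have been chmod'ed or chown'ed since they arrived.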
Let us say that mtime is workable in your case, and you are going to find those files in year 2005. I'd create two reference files representing the upper and lower limits of the times you wish to locate:
touch -a -m -t 200501010000.00 $HOME/first.ref
touch -a -m -t 200512312359.59 $HOME/last.ref
should get you everything between 01/01/2005 at 00:00 and 12/31/2005 at 23:59 and 59 seconds.
Then use find to locate the relevant files and move them to a directory by themselves:
mkdir /yourname/2005
cd /where_files_are
find . -xdev -type f -newer $HOME/first.ref -a ! -newer $HOME/last.ref -exec mv {} /yourname/2005/. \+
at that point, you should be able to tar the newly created directory and pipe that to zip as noted in one of the posts above.
Note that the above has not been tested, you might want to substitute something harmless, like ls for the move until you get it sorted out.
repeat the above, after adjusting timestamps on the ref files, and creating the required directories.
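The repeat-per-year step can be sketched as a loop. This is untested against the real data and rests on assumptions: SRC, DEST and REF are hypothetical scratch directories standing in for /where_files_are, /yourname and $HOME, the demo files are invented, and mktemp -d is assumed to exist. It ends -exec with \; which runs one mv per file, so it is slow for 1.7 million files but portable. Because -newer is a strict comparison, the lower reference file is set to the last second of the previous year.

```shell
#!/bin/sh
# Sketch only: SRC, DEST, REF and the demo files are hypothetical stand-ins.
set -e
SRC=$(mktemp -d)      # stands in for /where_files_are
DEST=$(mktemp -d)     # stands in for /yourname
REF=$(mktemp -d)      # holds the two reference files

# Demo data: one file per year, dated by its modification time.
for y in 2005 2006 2007 2008; do
    echo "$y" > "$SRC/report_$y.dat"
    touch -m -t "${y}06151200.00" "$SRC/report_$y.dat"
done

cd "$SRC"
for y in 2005 2006 2007 2008; do
    prev=$((y - 1))
    # Lower bound: last second of the previous year (-newer is strict).
    touch -t "${prev}12312359.59" "$REF/first.ref"
    # Upper bound: last second of the year being collected.
    touch -t "${y}12312359.59" "$REF/last.ref"
    mkdir -p "$DEST/$y"
    find . -xdev -type f -newer "$REF/first.ref" ! -newer "$REF/last.ref" \
        -exec mv {} "$DEST/$y/" \;
done

ls -R "$DEST"
```

As with the commands above, swapping mv for ls on the first pass is a cheap way to check the selection before anything is moved.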
>I have a filesystem which has got 1.7 million files. ... All files are residing on a single directory.
I assume people have told you this is not a good idea?
>The files can be identified by their time stamp
Encoded in their name, or in the ll(1) output? I have a case where they are encoded in their name.
>there are no separate directories for each year.
If the names include the year, the first thing to do is to create a subdirectory and move all of a year into it.
If they don't include the year, you can make a simple script to do that:
last_year=""
ll -trog | while read F1 F2 F3 F4 F5 F6 F7; do
   case $F6 in
   200[5-8]) ;;
   *) continue;;
   esac
   if [ "$F6" != "$last_year" ]; then
      mkdir -p $F6
      last_year=$F6
   fi
   echo mv "$F7" $last_year
done
Once they are in a separate directory, you use the tar-gzip suggestions as Steven suggested.
If you don't want to include the directory name in the tarball, you can use -C:
tar cf - -C 2005 . | gzip > year_2005.tar.gz
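Before removing any originals, it may be worth checking that the compressed archive is readable and holds as many files as the directory does. The following is a minimal sketch of that check. The scratch directory and the two .dat files are hypothetical, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Illustration only: the scratch directory and file names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
mkdir 2005
echo one > 2005/a.dat
echo two > 2005/b.dat

# Archive the contents of 2005 without the leading directory name.
tar cf - -C 2005 . | gzip > year_2005.tar.gz

# Test the compressed stream, then list what the archive holds.
gzip -t year_2005.tar.gz
listing=$(gzip -dc year_2005.tar.gz | tar tf -)
echo "$listing"

# Compare the member count with the number of files on disk.
archived=$(echo "$listing" | grep -c '\.dat$')
on_disk=$(find 2005 -type f | wc -l)
echo "archived=$archived on_disk=$on_disk"
```

With real data the grep pattern would need to match whatever the files are actually called. The point is only that the two counts should agree before the originals go away.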
perhaps the biggest problem here is that questions to the OP get a restatement of the original question, without additional information, and specific questions go unanswered.
there are a variety of answers posted, some of which may be more appropriate than others, depending on the exact goal, which isn't clear here.
I will add that if ssheri has any control of the creation of these files, going forward I'd encode the creation date in the filename somehow, as anything relying on the "timestamps" is not going to be a reliable method for determining when a file was created.
Your suggestions match my requirement. The files are not modified after their arrival to the filesystem. These files are getting saved to the filesystem as a result of a scheduled job. Later these are not getting modified by any user.
I have tested using the options which "oldschool" provided.
=======================================
1. created reference files
2. ran
find . -xdev -type f -newer $HOME/first.ref -a ! -newer $HOME/last.ref -exec cp -p {} /yourname/2005/. \+
======================================
I have used cp instead of mv for testing. The test was only for one month of data, but I am getting an error when I execute it:
cp:./filename: not a directory
where filename is the last file which is supposed to be copied as per the reference file.
For example, if I use
touch -a -m -t 200510010000.00 $HOME/first.ref
touch -a -m -t 200511302359.59 $HOME/last.ref
the above error is coming up with last file dated 20051130. I tried changing the date for touch and I am getting the same error for the last file created on the date as per last.ref.
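One possible reading of this error, offered as a guess rather than a diagnosis: POSIX only defines the batching form of -exec when {} is the last argument, immediately before the +. With the destination placed after {}, an implementation may end up handing cp a list whose final argument is a file rather than the directory, which would fit a "not a directory" complaint about the last file. Two ways around that are to end the command with \; (one cp per file), or to let a small sh wrapper put the destination last. The sketch below shows the wrapper form. SRC, DEST, REF and the sample files are hypothetical, mktemp -d is assumed to exist, and none of this has been tried on the poster's system.

```shell
#!/bin/sh
# Sketch only: SRC, DEST, REF and the demo files are hypothetical.
set -e
SRC=$(mktemp -d)
DEST=$(mktemp -d)
REF=$(mktemp -d)

# Demo data: two files inside the Oct-Nov 2005 window, one outside it.
for stamp in 200510150900 200511301200 200512010000; do
    echo "$stamp" > "$SRC/file_$stamp"
    touch -m -t "$stamp.00" "$SRC/file_$stamp"
done

touch -t 200509302359.59 "$REF/first.ref"   # just before 1 Oct 2005
touch -t 200511302359.59 "$REF/last.ref"    # end of 30 Nov 2005

cd "$SRC"
# {} sits directly before +, so find may pass many names per invocation.
# Inside the wrapper, $1 is the destination and the rest are the files.
find . -xdev -type f -newer "$REF/first.ref" ! -newer "$REF/last.ref" \
    -exec sh -c 'dest=$1; shift; cp -p "$@" "$dest"' sh "$DEST" {} +

ls -l "$DEST"
```

The wrapper keeps the speed advantage of + (few cp invocations) while making sure the directory is always the final argument cp sees.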