[wplug] find question

Vanco, Don don.vanco at agilysys.com
Fri Oct 31 08:20:57 EST 2003


Stupid find Tricks

General Techniques
Avoiding NFS filesystems
Here's how to do a search from / while avoiding NFS filesystems (doing a
find across an NFS mount is REALLY slow): 
find / -fstype nfs -prune -o <<insert real search request here>>
The -prune option always evaluates true from find's perspective but tells
find not to descend into the directory it was applied to. It takes effect
whenever a directory on an NFS filesystem is encountered. Everywhere else,
the -fstype nfs test is false, in which case the right hand side of the -o
(or) option is processed. 
The end result is that find doesn't look inside NFS filesystems and the
right hand side of the -o is applied to everything that isn't on an NFS
filesystem. 
For example, this would print the names of all files larger than about 5M
(the -size units are 512-byte blocks, so +10000 means more than roughly
5 MB) which reside on the local machine: 
find / -fstype nfs -prune -o -size +10000 -print
In case it isn't obvious, I'm one of the old-timers who still tends to
specify the -print option even though it is rarely necessary anymore. 
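If you happen to have GNU find (the usual find on Linux boxes), it accepts
human-readable size suffixes, so the same search can be written as follows.
Note that the M suffix is a GNU extension, not portable find syntax: 
find / -fstype nfs -prune -o -size +5M -print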
Finding files which don't match a search request
It is sometimes easier to express what you don't want to find than it is to
express what you do want to find. The conventional way to do it is to use
the find command's ! option. For example, the following would find
everything in the current directory hierarchy which isn't a directory: 
find . ! -type d -print
Note that ! has higher precedence than the implied -a between the -type d
and the -print in the above example. 
Another way to achieve the same effect, which some people prefer because
some shells treat ! as a meta-character, is as follows: 
find . -type d -o -print
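If you'd rather keep the ! form, escaping it stops shells like csh from
treating it as a history reference: 
find . \! -type d -print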

Using search criteria that find doesn't support directly
Sometimes, you want to find files which match complex criteria which are
difficult or impossible to express directly with the find command. In this
case, write a shell script or program that performs the desired tests and
use the find command's -exec option. For example, assuming that our criteria
testing shell script is called doit, the following would do the trick: 
find . -exec ./doit {} \; -print
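The original post doesn't show what doit looks like, so here is a minimal
sketch of one. The test it performs (succeed only for files whose first line
starts with #!, i.e. scripts) is purely illustrative and not part of the
original example: 
#!/bin/sh
# doit -- exit 0 (true, i.e. a "match" as far as find is concerned)
# only if the file named by $1 begins with a #! interpreter line.
case `head -1 "$1"` in
'#!'*) exit 0 ;;
*) exit 1 ;;
esac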
A simple (and relatively common) variant of this is to use the grep command
to perform the test. For example, the following will print the names of all
.c files in the current directory which contain the word hello: 
find . -type f -name '*.c' -exec grep -q hello {} \; -print
If all you want is a list of the files that match the grep request in the
above example, there's a better way. The following will list the files with
names ending in .c and containing the word hello: 
find . -type f -name '*.c' -exec grep -l hello {} /dev/null \;
The /dev/null is provided because some versions of grep don't list the file
name unless at least two file names are specified. It isn't required by
AIX's grep (since at least AIX 4.3.2), but we still provide it because we
prefer techniques which are portable: the script may never be ported, but we
use lots of different kinds of Unix and can't keep track of these sorts of
subtle differences between them. 
Note that the structure of the first example is still valid for many
contexts. For example, the following will compile each of the .c files
containing the word hello (yes, rather contrived but . . .): 
find . -type f -name '*.c' -exec grep -q hello {} \; -exec cc -c {} \;
Also, keep in mind that invoking a command or especially a shell script
against each file in a directory hierarchy can be rather expensive. Use
conventional find options (like in the hello example above) to eliminate as
many candidates as possible before invoking the command or shell script with
the -exec option. 
An arguably better way of invoking grep on a whole pile of files is: 
find . -type f -print | xargs grep hello
This results in far fewer invocations of grep (i.e. it runs faster). 
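One caveat: file names containing spaces or newlines will confuse xargs. If
both your find and xargs are the GNU versions, their -print0 and -0 options
(GNU extensions, not portable) sidestep the problem; the /dev/null keeps
grep printing file names even when xargs hands it just one file: 
find . -type f -print0 | xargs -0 grep hello /dev/null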
Doing complicated things with the files that you find
find is great at finding files matching particular criteria. What may not be
obvious is that it is easy to then perform complex operations on the files
that find finds for you. Write yourself a shell script that does the complex
operations on a file which is passed as the first parameter and then invoke
the script against each file that find finds like this: 
find . -type f -size +10000 -exec ./doit {} \;
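As a purely illustrative sketch (the gzip choice and the log file name are
our own invention, not from the original post), a doit that compresses each
file it is handed might look like this: 
#!/bin/sh
# doit -- compress the file passed as $1 and record what was done.
gzip "$1" && echo "compressed: $1" >> /tmp/doit.log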
Using grep to print certain lines in the files you find 
One common operation performed on found files is to use grep to extract
certain lines from the files. At first glance, the following appears to
print the lines containing the word hello in each of the found files: 
find . -type f -size +10000 -exec grep hello {} \;
The problem is that since you've only specified a single file to the grep
command, grep doesn't prefix each line with the name of the file that the
line was found in. Since this is often quite important in this context,
trick grep into showing the file name by giving it two files to look in but
being sure that it won't find the pattern in the second file. Obviously, an
empty file works best for this so we use everyone's favourite empty file
/dev/null in our example: 
find . -type f -size +10000 -exec grep hello {} /dev/null \;
Grep is being given two file names so it will prefix lines found in the
first file with the name of the file (it will never find anything in
/dev/null so you won't get lines prefixed by /dev/null). 
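If you know you'll always have GNU grep available, its -H option forces the
file name prefix even with a single file, which makes the /dev/null trick
unnecessary (again, a GNU extension rather than portable usage): 
find . -type f -size +10000 -exec grep -H hello {} \;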
Applying a single invocation of a command to the list of files that are found
Here's one of our favourites: 
vi ` find . -type f -name '*.c' -print `
This invokes vi on all of the .c files in the current hierarchy. Once
invoked, you move on to the next file using the :n command from within vi. 
Be careful - you don't want to do this at the top of a directory hierarchy
containing a few thousand .c files since you'll get really tired of typing
the :n commands after the first hundred files! 
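One way to avoid that fate is to count the candidates before committing to
the vi session: 
find . -type f -name '*.c' -print | wc -l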
This trick is really handy when you're planning on deleting the files which
are found. First use the find command to print a list of the files which you
want to delete. Once you're sure that the find command is listing the
correct files, then use the shell's command line recall and editing
capabilities to wrap an rm command around the find like this: 
rm -f ` find . -type f -name '*.o' -print `
The idea here is to make sure that a typing mistake doesn't result in the
loss of a whole bunch of files which you'd rather keep. Another way of
getting to the same place is to use the find command's -exec option to
remove the files. Again, construct and test a find command which lists the
files and then use the shell's command line recall and editing capabilities
to append the -exec rm -f {} \; onto the command to get the following: 
find . -type f -name '*.o' -exec rm -f {} \;
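Invoking rm once per file can be slow in a big tree. The xargs technique
shown earlier batches the removals, and newer find implementations (POSIX
specifies it) accept -exec ... {} + to the same effect: 
find . -type f -name '*.o' -print | xargs rm -f
find . -type f -name '*.o' -exec rm -f {} +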

Solving Specific Problems
Why is this filesystem full?
Here's a simple invocation which presents you with a list of all the files
in the root filesystem with the largest files appearing first. 
find / -xdev -type f -ls | sort +6n -r | more
Replacing the / with the name of another filesystem will get you a list for
that filesystem. 
Dropping the -xdev will get you a system-wide list. If the box is an NFS
client then make sure you use the form 
find / -fstype nfs -prune -o -type f -ls | sort +6n -r | more
to avoid traversing the NFS-mounted filesystems (without the -prune, find
would still descend into them and merely decline to list what it found
there). 
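On systems whose sort doesn't accept the old +6n key syntax, the POSIX -k
form does the same job (the size is field 7 of the -ls output): 
find / -xdev -type f -ls | sort -k7,7n -r | more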
Be prepared for the possibility that the find command is unable to find the
large file(s) that are consuming all the space. This can happen if an
application has deleted the name of an already open file. The file remains
in existence until the application terminates but the find command can't
find it because the file has no name. Your best bet in this situation is
possibly to use the lsof utility which is available on the 'net. 
Note that the lsof utility is COMPLETELY UNSUPPORTED by IBM. The lsof
utility program requires root privileges to run. Before you decide to use
lsof on your system, carefully consider the possible consequences of running
a program that you found on the 'net.
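If you do install lsof, one invocation worth knowing about in this
situation is the following, which lists open files whose link count is less
than 1, in other words files that have been deleted but are still held open
(check your version's man page, since option support varies): 
lsof +L1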


