Globs
More often than not, the loops that you would want to write in the shell are intended to apply an operation to many files, and those files can be all in the same directory, or scattered throughout many directories.
Shell languages (like Bash) have a feature called “globbing” that can
help with quickly getting a list of files to operate on in something
like a loop in a more reliable (and arguably easier!) way than using
find
.
Why “glob”?
Apparently it’s short for “global”. I definitely don’t pronounce “glob” as “globe” (as in “global”), I pronounce it as in “glob” like a glob of jam.
Simple globs
The simplest glob in a shell is the *
character. This
glob “expands” to all files in the current directory that are not
hidden.
If you have the following files in your current directory:
ls -a
. .. .gitignore hello.md hello.docx script.sh
Then the *
glob would “expand” to
hello.md hello.docx script.sh
:
echo * # prints out the names of all
# files and folders in this directory
hello.md hello.docx script.sh
This actually means that the echo
program was run with
three arguments, not just one. Before the shell launches the
echo
program, it effectively replaces the *
in
the command line with hello.md hello.docx script.sh
.
Write your own shell script that prints out what its arguments are
using the numbered argument variables ($1 $2 $3
…) to see
how a simple glob expands in different directories.
Advanced globs
Simple globs are the ones you will use most often because you usually want to do something to everything in the current directory. But simple globs are not the only kind of globs. Let’s look at two other kinds of globs:
- Begins-with, ends-with, or contains globs
- Directory globs
Begins-with, ends-with, or contains globs
If you remember way back to fortnight 5 in
COMP 1002, you learned about a program named grep
that
could be used to filter lines from a file based on a pattern that you
provided. The patterns that you gave to grep
also
use the *
symbol to indicate “anything”, but we saw that
you could put the *
anywhere within a string.
We can do the same thing with globs!
We can match all file names that begin with hello
using
a glob hello*
:
# assuming the same directory as above
echo hello*
hello.md hello.docx
Or we can match all file names that end with .md
using a
glob *.md
:
pandoc *.md -o all.docx # run pandoc, passing *all* of
# the markdown files in the
# current directory, and convert
# them all into a single Word
# document
Or you can match file names that contain the letters ll
using a glob *ll*
:
cat *ll*
# contents of hello.docx get printed out (which is a bunch of garbage!)
# contents of hello.md get printed out
Directory globs
Globs are helpful for matching many files in a directory, but we don’t keep all of our files in one directory, right? (Right?!)
We can also use globs to help us find files that are in
subdirectories, but we have to change the way we’re writing out glob:
instead of using one *
, we use two
**
:
Download (or find) crazy-directories.tar (use
wget
!). We previously used find
to, uh, find
the Markdown files in this massive blob of directories. We can also use
globs to help us.
Extract crazy-directories.tar
(if you just downloaded
it), then change into the directory. We can use globs to find all
Markdown files in this directory:
ls **/*.md # find files with names ending with .md
# in any subdirectory
The technical Bash name for this feature is not “directory glob”, but is instead called “globstar”.
Further reading about globs
You can read more about globs in a few places:
- The manual pages: use
man 7 glob
(or manual pages online). - The Advanced
Bash-Scripting Guide has even more kinds of globbing (using curly
braces
{}
and question marks?
).- Chapter 37 of the Advanced Bash-Scripting Guide has more examples of how to use directory globs.