Introduction to Tools and Techniques in Computer Science

Globs

Franklin Bristow

More often than not, the loops that you want to write in the shell apply an operation to many files. Those files might all be in the same directory, or they might be scattered throughout many directories.

Shell languages (like Bash) have a feature called “globbing” that quickly gives you a list of files to operate on (in a loop, for example) in a more reliable (and arguably easier!) way than using find.

Files covered in globs of jam. (Prompt: “a desk drawer with many files, all covered in blueberry jam”)

Why “glob”?

Apparently it’s short for “global”. I definitely don’t pronounce “glob” as “globe” (as in “global”); I pronounce it like a glob of jam.

Simple globs

The simplest glob in a shell is the * character. This glob “expands” to the names of all files and folders in the current directory that are not hidden (that is, whose names don’t start with a .).

If you have the following files in your current directory:

ls -a
. .. .gitignore hello.docx hello.md script.sh

Then the * glob would “expand” to hello.docx hello.md script.sh (the shell sorts the matches alphabetically):

echo * # prints out the names of all
       # files and folders in this directory
hello.docx hello.md script.sh

This actually means that the echo program was run with three arguments, not just one. Before the shell launches the echo program, it effectively replaces the * in the command line with hello.docx hello.md script.sh.
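One way to check that the shell does the expanding, and not the program, is to count the arguments that arrive. The following is a minimal sketch, assuming Bash and a scratch directory made with mktemp; the function name count_args is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints how many arguments it received.
count_args() {
    echo "$#"
}

# Build a scratch directory holding the same files as the example above.
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
touch .gitignore hello.md hello.docx script.sh

unquoted=$(count_args *)    # the shell expands * before count_args runs
quoted=$(count_args "*")    # quoting stops expansion: one literal argument

echo "unquoted: $unquoted"  # 3 (the hidden .gitignore is skipped)
echo "quoted:   $quoted"    # 1
```

The quoted case shows that the program itself never sees a glob: it receives either the expanded names or, when the * is quoted, a single literal * character.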

Write your own shell script that prints out what its arguments are using the numbered argument variables ($1 $2 $3…) to see how a simple glob expands in different directories.

Advanced globs

Simple globs are the ones you will use most often because you usually want to do something to everything in the current directory. But simple globs are not the only kind of globs. Let’s look at two other kinds of globs:

  1. Begins-with, ends-with, or contains globs
  2. Directory globs

Begins-with, ends-with, or contains globs

If you remember way back to fortnight 5 in COMP 1002, you learned about a program named grep that could be used to filter lines from a file based on a pattern that you provided. Those patterns also used the * symbol, and we saw that you could put it anywhere within a pattern. (In grep, “anything” is written .*; in a glob, * on its own means “anything”.)

We can do the same thing with globs!

We can match all file names that begin with hello using a glob hello*:

# assuming the same directory as above
echo hello* 
hello.docx hello.md

Or we can match all file names that end with .md using a glob *.md:

pandoc *.md -o all.docx # run pandoc, passing *all* of
                        # the markdown files in the 
                        # current directory, and convert
                        # them all into a single Word
                        # document

Or you can match file names that contain the letters ll using a glob *ll*:

cat *ll*
# contents of hello.docx get printed out (which is a bunch of garbage!)
# contents of hello.md get printed out
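Because a glob expands to a list of names, it can feed a for loop directly, which is the use case this section opened with. Here is a minimal sketch, assuming Bash, a scratch directory, and made-up file names (one of them contains a space on purpose).

```shell
#!/usr/bin/env bash
# Build a scratch directory with a few illustrative files.
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
touch hello.md "my notes.md" hello.docx script.sh

matched=0
for file in *.md; do
    # Quote "$file" so that a name containing spaces stays one word.
    echo "Markdown file: $file"
    matched=$((matched + 1))
done

echo "matched $matched files"   # matched 2 files
```

Each name the glob produces stays intact, spaces included, as long as the variable is quoted when used. One thing to watch for: by default, if nothing matches, Bash leaves the pattern as literal text, so the loop body would run once with file set to *.md.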

Directory globs

Globs are helpful for matching many files in a directory, but we don’t keep all of our files in one directory, right? (Right?!)

We can also use globs to help us find files that are in subdirectories, but we have to change the way we write our glob: instead of using one *, we use two (**).

Download (or find) crazy-directories.tar (use wget!). We previously used find to, uh, find the Markdown files in this massive blob of directories. We can also use globs to help us.

Extract crazy-directories.tar (if you just downloaded it), then change into the directory. We can use globs to find all Markdown files in this directory:

shopt -s globstar # Bash only: turn on ** (it is off by default)
ls **/*.md # find files with names ending with .md
           # in this directory or any subdirectory

The technical name for this feature is not “directory glob”; Bash calls it “globstar”.
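To try ** without downloading anything, here is a minimal sketch that builds a small directory tree and loops over the Markdown files in it. It assumes Bash 4.0 or newer (where globstar exists) and uses made-up directory and file names.

```shell
#!/usr/bin/env bash
# globstar is a Bash option and is off by default, so turn it on first.
shopt -s globstar

# Build a small scratch tree with Markdown files at different depths.
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
mkdir -p a/b/c
touch top.md a/one.md a/b/c/deep.md a/b/c/skip.txt

count=0
# With globstar on, **/ matches zero or more directory levels.
for file in **/*.md; do
    echo "$file"
    count=$((count + 1))
done

echo "found $count Markdown files"   # found 3 Markdown files
```

Without the shopt line, Bash treats ** like a single *, so the same loop would find only a/one.md, the file exactly one directory down.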

Further reading about globs

You can read more about globs in a few places: