Introduction to Tools and Techniques in Computer Science

Globs

Franklin Bristow

More often than not, the loops that you write in the shell apply an operation to many files, and those files can all be in the same directory or scattered throughout many directories.

Shell languages (like Bash) have a feature called “globbing” that can help with quickly getting a list of files to operate on in something like a loop in a more reliable (and arguably easier!) way than using find.

Files covered in globs of jam. (Prompt: “a desk drawer with many files, all covered in blueberry jam”)

Why “glob”?

Apparently it’s short for “global”. I definitely don’t pronounce “glob” as “globe” (as in “global”); I pronounce it like a glob of jam.

Simple globs

The simplest glob in a shell is the * character. This glob “expands” to the names of all files and directories in the current directory that are not hidden (that is, whose names don’t start with a .).

If you have the following files in your current directory:

ls -a
. .. .gitignore hello.docx hello.md script.sh

Then the * glob would “expand” to hello.docx hello.md script.sh (the shell sorts the names alphabetically):

echo * # prints out the names of all
       # files and folders in this directory
hello.docx hello.md script.sh

This actually means that the echo program was run with three arguments, not just one. Before the shell launches the echo program, it effectively replaces the * in the command line with hello.docx hello.md script.sh.
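To see the expansion happen, here is a minimal sketch. The count_args function and the scratch directory made with mktemp are invented for illustration (they are not part of the course materials). The function only reports how many arguments it was given, so you can compare an unquoted * with a quoted one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many arguments it received.
count_args() {
    echo "$#"
}

# Build a scratch directory that matches the example listing above.
demo_dir=$(mktemp -d)
touch "$demo_dir/.gitignore" "$demo_dir/hello.md" "$demo_dir/hello.docx" "$demo_dir/script.sh"
cd "$demo_dir" || exit 1

unquoted=$(count_args *)    # the shell expands * before count_args runs
quoted=$(count_args "*")    # quoting stops the expansion: one literal argument

echo "unquoted: $unquoted"
echo "quoted: $quoted"
```

With the four files above, the unquoted glob should report 3 (the hidden .gitignore is skipped) and the quoted one should report 1.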

Write your own shell script that prints out what its arguments are using the numbered argument variables ($1 $2 $3…) to see how a simple glob expands in different directories.

Advanced globs

Simple globs are the ones you will use most often because you usually want to do something to everything in the current directory. But simple globs are not the only kind of globs. Let’s look at two other kinds of globs:

  1. Begins-with, ends-with, or contains globs
  2. Directory globs

Begins-with, ends-with, or contains globs

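If you remember way back to fortnight 5 in COMP 1002, you learned about a program named grep that could be used to filter lines from a file based on a pattern that you provided. The patterns that you gave to grep also use the * symbol, and we saw that you could put the * anywhere within a string. (One difference to keep in mind: in grep’s regular expressions, * means “zero or more of the previous character”, so “anything” is written .*; in a glob, * on its own means “anything”.)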

We can do the same thing with globs!

We can match all file names that begin with hello using a glob hello*:

# assuming the same directory as above
echo hello* 
hello.docx hello.md

Or we can match all file names that end with .md using a glob *.md:

pandoc *.md -o all.docx # run pandoc, passing *all* of
                        # the markdown files in the 
                        # current directory, and convert
                        # them all into a single Word
                        # document

Or you can match file names that contain the letters ll using a glob *ll*:

cat *ll*
# contents of hello.docx get printed out (which is a bunch of garbage!)
# contents of hello.md get printed out
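Since the whole point is usually to loop over the matching files, here is a rough sketch of a glob driving a for loop. The scratch directory and the md_count variable are assumptions made up for this example. The [ -e ... ] check is there because Bash, by default, leaves a pattern that matches nothing as literal text.

```shell
#!/usr/bin/env bash
# Build a scratch directory with the same visible files as the example above.
demo_dir=$(mktemp -d)
touch "$demo_dir/hello.md" "$demo_dir/hello.docx" "$demo_dir/script.sh"
cd "$demo_dir" || exit 1

md_count=0
for file in *.md; do
    # If nothing matches, the loop would see the literal text "*.md",
    # so skip anything that is not an existing file.
    [ -e "$file" ] || continue
    echo "Markdown file: $file"
    md_count=$((md_count + 1))
done

echo "Found $md_count Markdown file(s)"
```

Unlike looping over the output of ls or find, the loop variable here holds exactly one file name per iteration, even when a name contains spaces.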

Directory globs

Globs are helpful for matching many files in a directory, but we don’t keep all of our files in one directory, right? (Right?!)

We can also use globs to help us find files that are in subdirectories, but we have to change the way we’re writing our glob: instead of using one *, we use two (**).

Download (or find) crazy-directories.tar (use wget!). We previously used find to, uh, find the Markdown files in this massive blob of directories. We can also use globs to help us.

Extract crazy-directories.tar (if you just downloaded it), then change into the directory. We can use globs to find all Markdown files in this directory:

ls **/*.md # find files with names ending with .md
           # in this directory and all of its subdirectories

The technical Bash name for this feature is not “directory glob”; Bash calls it “globstar”.

Some systems (like aviary) may have different shell settings toggled for various reasons, and those settings can disable features like globstar (Bash itself leaves globstar off by default). You can check whether your current shell has globstar disabled by running the command below in a bash shell.

shopt | grep "globstar"
globstar        off

If you do find that “globstar” is toggled off, you will have to toggle it on using the command:

shopt -s globstar

If you check again, you should see that it is now toggled “on”.
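To make the difference concrete, here is a small sketch that counts matches for the same pattern with globstar off and then on. The directory layout and the without/with array names are invented for this example, and it assumes Bash 4.0 or newer (older versions of Bash do not have globstar).

```shell
#!/usr/bin/env bash
# Build a small scratch tree with Markdown files at several depths.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/a/b/c"
touch "$demo_dir/top.md" "$demo_dir/a/one.md" "$demo_dir/a/b/two.md" "$demo_dir/a/b/c/three.md"
cd "$demo_dir" || exit 1

shopt -u globstar
without=(**/*.md)   # ** behaves like a single *: exactly one directory level

shopt -s globstar
with=(**/*.md)      # ** now matches zero or more directory levels

echo "globstar off: ${#without[@]} match(es)"
echo "globstar on:  ${#with[@]} match(es)"
```

With globstar off, only a/one.md matches; with it on, all four files match, including top.md in the starting directory.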

This will only last for your current session, so you will need to set it every time you want to use globstar. If you want this option to be set automatically, you can edit a file called .bashrc, which Bash runs automatically whenever you open an interactive bash session. (Like an automatically executed shell script!) Shell scripts run with bash normally do not read .bashrc, so a script that uses ** should run shopt -s globstar itself, near the top.

The .bashrc file is found in your user’s home directory, and can be edited using vim by running vim ~/.bashrc. You can add the option by placing the above command in this file and saving. The change takes effect the next time you open a bash shell.
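A script that relies on ** can also turn the option on itself, which keeps it working no matter how the shell that launches it is configured. Here is a sketch of that idea; the count_markdown function and the scratch directory are hypothetical, and it again assumes Bash 4.0 or newer.

```shell
#!/usr/bin/env bash
# Turn globstar on inside the script so it does not depend on the caller's settings.
shopt -s globstar

# Hypothetical helper: count the Markdown files under the directory given as $1.
count_markdown() {
    local dir=$1 total=0 file
    for file in "$dir"/**/*.md; do
        [ -e "$file" ] || continue   # skip the literal pattern when nothing matches
        total=$((total + 1))
    done
    echo "$total"
}

# Try it on a small scratch tree.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/notes/week1"
touch "$demo_dir/readme.md" "$demo_dir/notes/week1/lecture.md" "$demo_dir/notes/todo.txt"

result=$(count_markdown "$demo_dir")
echo "Markdown files found: $result"
```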

If you use another shell, you will find a similar file for that shell, like .tcshrc or .cshrc on Aviary, .zshrc for Zsh, or config.fish for fish.

If you want to read more about .bashrc files, you can check out these resources:

Further reading about globs

You can read more about globs in a few places: