Globs
More often than not, the loops that you write in the shell are intended to apply an operation to many files, and those files can all be in the same directory or scattered throughout many directories.
Shell languages (like Bash) have a feature called “globbing” that can
help with quickly getting a list of files to operate on in something
like a loop in a more reliable (and arguably easier!) way than using
find.
Why “glob”?
Apparently it’s short for “global”. I definitely don’t pronounce “glob” as “globe” (as in “global”); I pronounce it as in “glob”, like a glob of jam.
Simple globs
The simplest glob in a shell is the * character. This
glob “expands” to all files in the current directory that are not
hidden.
If you have the following files in your current directory:
ls -a
. .. .gitignore hello.docx hello.md script.sh
Then the * glob would “expand” to
hello.docx hello.md script.sh:
echo * # prints out the names of all
       # files and folders in this directory
hello.docx hello.md script.sh
This actually means that the echo program was run with
three arguments, not just one. Before the shell launches the
echo program, it effectively replaces the * in
the command line with hello.docx hello.md script.sh (the shell
sorts the matching names alphabetically).
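Because each matched name becomes its own word, a glob can drive a loop directly. The block below is a minimal sketch, assuming Bash and a scratch directory (demo_dir is a made-up name) filled with the same file names as the listing above.

```bash
#!/usr/bin/env bash
# Build a scratch directory with the file names used in the example listing.
demo_dir="$(mktemp -d)"
touch "$demo_dir/.gitignore" "$demo_dir/hello.md" "$demo_dir/hello.docx" "$demo_dir/script.sh"
cd "$demo_dir" || exit 1

count=0
for name in *; do
    # If nothing matches, Bash leaves the pattern as a literal "*",
    # so skip names that do not exist.
    [ -e "$name" ] || continue
    count=$((count + 1))
    echo "file $count: $name"
done
echo "total: $count"   # .gitignore is hidden, so it is not counted
```

Quoting "$name" inside the loop keeps file names that contain spaces in one piece.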
Write your own shell script that prints out what its arguments are
using the numbered argument variables ($1 $2 $3…) to see
how a simple glob expands in different directories.
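One possible starting point for this exercise is sketched below, assuming Bash. The function name print_args is hypothetical; in a script file of your own, the same echo lines could sit at the top level instead of inside a function.

```bash
#!/usr/bin/env bash
# print_args is a hypothetical helper that reports the arguments it received.
print_args() {
    echo "number of arguments: $#"
    echo "first:  ${1:-<none>}"
    echo "second: ${2:-<none>}"
    echo "third:  ${3:-<none>}"
}

# Unquoted: the shell expands the glob before print_args runs.
print_args *

# Quoted: the glob is not expanded, so there is exactly one argument, a literal *.
print_args "*"
```

Running it in different directories shows how the argument count changes with the number of non-hidden files.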
Advanced globs
Simple globs are the ones you will use most often because you usually want to do something to everything in the current directory. But simple globs are not the only kind of globs. Let’s look at two other kinds of globs:
- Begins-with, ends-with, or contains globs
- Directory globs
Begins-with, ends-with, or contains globs
If you remember way back to fortnight 5 in
COMP 1002, you learned about a program named grep that
could be used to filter lines from a file based on a pattern that you
provided. The patterns that you gave to grep also
use the * symbol to indicate “anything”, but we saw that
you could put the * anywhere within a string.
We can do the same thing with globs!
We can match all file names that begin with hello using
a glob hello*:
# assuming the same directory as above
echo hello*
hello.docx hello.md
Or we can match all file names that end with .md using a
glob *.md:
pandoc *.md -o all.docx # run pandoc, passing *all* of
                        # the markdown files in the
                        # current directory, and convert
                        # them all into a single Word
                        # document
Or you can match file names that contain the letters ll
using a glob *ll*:
cat *ll*
# contents of hello.docx get printed out (which is a bunch of garbage!)
# contents of hello.md get printed out
Directory globs
Globs are helpful for matching many files in a directory, but we don’t keep all of our files in one directory, right? (Right?!)
We can also use globs to help us find files that are in
subdirectories, but we have to change the way we’re writing our glob:
instead of using one *, we use two (**).
Download (or find) crazy-directories.tar (use
wget!). We previously used find to, uh, find
the Markdown files in this massive blob of directories. We can also use
globs to help us.
Extract crazy-directories.tar (if you just downloaded
it), then change into the directory. We can use globs to find all
Markdown files in this directory:
ls **/*.md # find files with names ending with .md
           # in any subdirectory
The technical Bash name for this feature is not “directory glob”, but is instead called “globstar”.
Some systems (like Aviary) may have shell settings toggled differently
for various reasons, which can disable some features like globstar. You
can check whether your current shell has globstar disabled by running
the command below in a bash shell.
shopt | grep "globstar"
globstar off
If you do find that “globstar” is toggled off, you will have to toggle it on using the command
shopt -s globstar
Upon checking again, you should see that it is now toggled “on”.
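To see what the setting changes, the sketch below compares the same glob with globstar off and on. It assumes Bash 4.0 or newer (older versions do not have globstar) and uses a scratch tree with made-up file names, not the contents of crazy-directories.tar.

```bash
#!/usr/bin/env bash
# Scratch tree with hypothetical file names at three different depths.
tree_dir="$(mktemp -d)"
mkdir -p "$tree_dir/a/b/c"
touch "$tree_dir/top.md" "$tree_dir/a/one.md" "$tree_dir/a/b/c/deep.md"
cd "$tree_dir" || exit 1

shopt -u globstar          # off: ** behaves like a single *
without=( **/*.md )        # only files exactly one directory down
shopt -s globstar          # on: ** matches zero or more directory levels
with=( **/*.md )           # files at any depth, including this directory

echo "globstar off: ${#without[@]} match(es)"
echo "globstar on:  ${#with[@]} match(es)"
```

With globstar off the pattern still expands without an error, it just finds fewer files, so a disabled setting is easy to miss.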
This will only last for your current session, and you will need to set
it again in every new session where you want to use directory globs. If
you want to have this option set automatically, you can edit a file
called .bashrc, which is a file whose commands run automatically when
you open an interactive bash session. (Like an automatically executed
shell script!) Shell scripts run with bash do not normally read
.bashrc, so a script that relies on globstar should run
shopt -s globstar itself.
The .bashrc file is found in your user’s home directory,
and can be edited using vim by running vim ~/.bashrc. You
can add the option by placing shopt -s globstar on its own line in this
file and saving. The changes should be reflected the next time you open
a bash shell.
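Adding the line by hand in an editor works fine. As a minimal sketch of an alternative, the hypothetical helper below (add_globstar_to_rc is a made-up name) appends the shopt command to a startup file only if that exact line is not already there, so running it twice does not duplicate the setting.

```bash
#!/usr/bin/env bash
# add_globstar_to_rc is a hypothetical helper; pass it the startup file to edit.
add_globstar_to_rc() {
    local rc_file="$1"
    local line='shopt -s globstar'
    # -q: quiet, -x: match the whole line, -F: treat the text as a fixed string.
    if ! grep -qxF "$line" "$rc_file" 2>/dev/null; then
        echo "$line" >> "$rc_file"
    fi
}

# Example: add_globstar_to_rc "$HOME/.bashrc"
```

The example call is left as a comment so that nothing is changed until you decide to run it yourself.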
If you use another shell, you will find a similar file for that
shell, like .tcshrc or .cshrc on
Aviary, .zshrc with Zsh, or config.fish for
fish.
If you want to read more about .bashrc files you
can check out these resources:
- The manual pages: use man bash (or manual pages online).
- This .bashrc overview by Digital Ocean.
Further reading about globs
You can read more about globs in a few places:
- The manual pages: use man 7 glob (or manual pages online).
- The Advanced Bash-Scripting Guide has even more kinds of globbing (using curly braces {} and question marks ?).
- Chapter 37 of the Advanced Bash-Scripting Guide has more examples of how to use directory globs.
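As a small taste of the question mark and curly braces mentioned above, here is a sketch assuming Bash and a scratch directory with made-up file names. Strictly speaking, Bash treats curly braces as a separate step (brace expansion) that runs before globs are expanded.

```bash
#!/usr/bin/env bash
# Scratch directory with hypothetical file names.
scratch="$(mktemp -d)"
touch "$scratch/a1.md" "$scratch/a2.md" "$scratch/a10.md" "$scratch/b1.txt"
cd "$scratch" || exit 1

echo a?.md        # ? matches exactly one character: a1.md a2.md
echo *.{md,txt}   # braces expand first to *.md *.txt, then each glob expands
```

The first command skips a10.md because two characters sit between the a and the .md.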