Pub Talk Part II
'Waiter, get me a pint and don't worry about my lad over here, he's finally getting to meet a real operating system and he's got a lot to learn!'
'So my friend, could you get anything of what I've said so far?'
'Well, I can get what you mean, but I actually don't see what's the point of it.'
'Take it easy, pal! We're just beginning... What I've said so far is a taste of what lies ahead. As soon as we start developing structured programs, you'll see how useful those tools can be. After learning that, you'll see how easy it is to reach the top shellves. Now, tell me: how do you like the grep family?'
'Pardon me! I don't know any grep family.'
'Sure, sure... grep is an acronym for "global regular expression print", although legend has it that the name grep comes from ed (a text editor that is vim's grandpa), in which the search command was g/_regular expression_/p, or g/_re_/p.'
'Well, this grep command takes regular expressions and matches them against the lines of an "input". By the way, there is this guy - Aurélio Marinho Jargas - who maintains a webpage that can give you all the hints, clues and even tutorials you want about regular expressions (regexp). If you feel like learning to program in Shell, Perl, Python, etc., you'd better see what he's got!'
The great grep
'Waiter, this time I'll try a caipirinha - the Brazilian National Drink (see how to prepare it)!'
'So, I told you that
grep matches regular expressions to the lines of an "input". But what are those "inputs"? Well, there are different ways of defining those inputs. Let's see!'
Searching a file:
$ grep mary /etc/passwd
Searching more than one file:
$ grep grep *.sh
Searching the output of a command:
$ who | grep pelegrino
Considering the 1st example - which is the simplest one - I searched for occurrences of the word mary in any position of the file /etc/passwd. If I wanted to search for it as a login name - or, in other words, just at the beginning of the records of that file - I should execute:
$ grep '^mary' /etc/passwd
'Hold on, hold on... what's that caret (circumflex ^) and those apostrophes for?'
'The caret (^), as you'd know if you had read the articles on regular expressions I told you about, constrains the matches to the beginning of the lines, and the apostrophes (') tell the shell not to interpret the circumflex, so that it reaches grep untouched.'
The 2nd example will list all the lines of all the files with the extension .sh that contain the word grep. Since I use this extension for my Shell scripts, what I've done is to look for a good grep example in all my scripts.
And look!! grep accepts as input the output of another command, as long as the two are connected by a pipe symbol (|). This is very common in shell and it speeds up the execution of commands enormously, since one command reads the output of the other as if it were a file.
So, looking at the 3rd example, the command who lists the users who are logged in to the same machine as you are (remember: Linux is a multi-user system) and the command grep verifies whether the user pelegrino is logged in or not.
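The three kinds of input can be tried without touching real system files. The sketch below is only an illustration: the temporary directory, the file names and the sample lines are all invented, and printf stands in for the who command.

```shell
#!/bin/bash
# Illustration only: every file name and line below is made up.
demo=$(mktemp -d)

# A passwd-like file with two invented accounts.
printf '%s\n' 'mary:x:1001:1001::/home/mary:/bin/bash' \
              'rosemary:x:1002:1002::/home/rosemary:/bin/bash' > "$demo/passwd"

# 1. Searching a file: "mary" matches anywhere in the line (-c counts lines).
anywhere=$(grep -c mary "$demo/passwd")

# The same search anchored to the beginning of each line.
anchored=$(grep -c '^mary' "$demo/passwd")

# 2. Searching more than one file: -l lists the names of the files that match.
echo 'grep foo bar' > "$demo/a.sh"
echo 'echo hello'   > "$demo/b.sh"
matching=$(cd "$demo" && grep -l grep *.sh)

# 3. Searching the output of a command (printf stands in for who).
logged=$(printf '%s\n' 'pelegrino tty1' 'mary tty2' | grep pelegrino)

echo "anywhere=$anywhere anchored=$anchored files=$matching who=$logged"
rm -rf "$demo"
```

The unanchored search counts both mary and rosemary, while the anchored one counts only the line that starts with mary.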
The grep family
You know, the command grep is widely known, because it is frequently used, but what most people don't know is that there are three commands in the grep family. Their main features are:
- grep
Can (or cannot) use simple regular expressions; when no regular expression is needed, it is better to run fgrep (it is faster);
- egrep ('e' standing for extended)
Is a very powerful tool for regular expressions. It is usually the slowest brother of the grep family, so it is best kept for when you need a regular expression that grep does not accept;
- fgrep ('f' standing for fast, or file)
As its name points out, it is the fast brother of the family (it is about 30% faster than grep and 50% faster than egrep), but it does not allow the use of regular expressions.

The considerations above on speed are valid for the Unix grep family. On Linux, grep is always the faster one, because the other two (fgrep and egrep) are shell scripts that execute grep.
And I must say: I don't like that solution.
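On current systems the same three behaviours are also available as options of grep itself: POSIX specifies grep -E for extended regular expressions (the egrep behaviour) and grep -F for fixed strings (the fgrep behaviour). The sketch below, with sample lines invented for the purpose, shows how the same input is matched in each mode.

```shell
#!/bin/bash
# Sample lines invented for the demonstration.
sample=$(printf '%s\n' 'a.c' 'abc' 'Linux' 'linux')

# Basic regular expression: the dot matches any character.
basic=$(printf '%s\n' "$sample" | grep -c 'a.c')

# Fixed string (fgrep behaviour): the dot is just a dot.
fixed=$(printf '%s\n' "$sample" | grep -F -c 'a.c')

# Extended regular expression (egrep behaviour): alternation with |.
extended=$(printf '%s\n' "$sample" | grep -E -c 'Linux|linux')

echo "basic=$basic fixed=$fixed extended=$extended"
```

The basic search counts both a.c and abc, while the fixed-string search counts only the line that really contains a dot.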
'Now that you know the differences among the three, tell me: What do you think about the examples I gave before the explanation?'
'I thought
fgrep would solve your problem a lot faster than
grep.'
'Perfect!! I see you got what I said! Let's see some other examples to make their differences even clearer.'
I know that there is a text talking about Linux, but I'm not quite sure whether the word Linux is written with a capital L or with a small one. What should I do?
There are two options in that case:
$ egrep '(Linux|linux)' arquivo.txt
or
$ grep '[Ll]inux' arquivo.txt
In the first case, the complex regular expression (Linux|linux) uses the parentheses to group the options and the pipe (|) as a logical "or", which means that you are searching for Linux or linux. The apostrophes keep the shell from interpreting the parentheses and the pipe itself.
In the second case, on the other hand, the regular expression [Ll]inux means that you are searching for a word that starts with L or l followed by inux. Since this expression is simpler, grep itself can solve it, so I think it is the more advisable one (remember: egrep is slower).
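To check that the two forms really select the same lines, here is a small sketch. The content of arquivo.txt is invented, and grep -E is used as the option form of egrep; both patterns are quoted so that the shell passes them to grep untouched.

```shell
#!/bin/bash
# arquivo.txt is recreated here with invented content.
dir=$(mktemp -d)
printf '%s\n' 'Linux is a kernel' 'I use linux daily' \
              'LINUX in capitals' 'Unix came first' > "$dir/arquivo.txt"

# Alternation needs extended regular expressions; the quotes keep the
# shell from treating ( ) and | as its own syntax.
with_egrep=$(grep -E '(Linux|linux)' "$dir/arquivo.txt")

# A bracket expression works with plain grep; the quotes keep the shell
# from expanding [Ll]inux as a file name pattern.
with_grep=$(grep '[Ll]inux' "$dir/arquivo.txt")

[ "$with_egrep" = "$with_grep" ] && echo "same lines selected"
echo "$with_grep"
rm -rf "$dir"
```

Neither form matches the all-capitals LINUX, which is a case for the -i option.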
Another example. If you want to list the subdirectories of a directory, you should run:
$ ls -l | grep '^d'
drwxr-xr-x 3 root root 4096 Dec 18 2000 doc
drwxr-xr-x 11 root root 4096 Jul 13 18:58 freeciv
drwxr-xr-x 3 root root 4096 Oct 17 2000 gimp
drwxr-xr-x 3 root root 4096 Aug 8 2000 gnome
drwxr-xr-x 2 root root 4096 Aug 8 2000 idl
drwxrwxr-x 14 root root 4096 Jul 13 18:58 locale
drwxrwxr-x 12 root root 4096 Jan 14 2000 lyx
drwxrwxr-x 3 root root 4096 Jan 17 2000 pixmaps
drwxr-xr-x 3 root root 4096 Jul 2 20:30 scribus
drwxrwxr-x 3 root root 4096 Jan 17 2000 sounds
drwxr-xr-x 3 root root 4096 Dec 18 2000 xine
As you can see above, the circumflex (^) limits the search to the first position of each line of the long listing produced by the ls command. The apostrophes tell the shell not to 'understand' the circumflex (^).
Let's take another example. You know that the first four positions of the output of an ls -l command for an ordinary file (not a directory, nor a link, nor anything else...) should be:
| Position | 1st | 2nd | 3rd | 4th |
| Possible values | - | r | w | x |
| | | - | - | s (suid) |
Thus, in order to find out which files in a directory are executable, you should run:
$ ls -la | egrep '^-..(x|s)'
-rwxr-xr-x 1 root root 2875 Jun 18 19:38 rc
-rwxr-xr-x 1 root root 857 Aug 9 22:03 rc.local
-rwxr-xr-x 1 root root 18453 Jul 6 17:28 rc.sysinit
Once again the caret (^) limits the search to the beginning of each line, hence the listed occurrences are the ones that start with a -, followed by anything (the full stop - a dot - in a regular expression denotes any character), once again followed by any character, followed by an x or an s.
The same result would be found with the command:
$ ls -la | grep '^-..[xs]'
and the search would be faster.
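The same filter can be tried on a scratch directory. In this sketch the directory and the file names (notes.txt, run.sh, sub) are invented, and the permissions are set explicitly so that the result does not depend on your umask.

```shell
#!/bin/bash
# Directory and file names are invented for the demonstration.
dir=$(mktemp -d)
touch "$dir/notes.txt" "$dir/run.sh"
chmod 644 "$dir/notes.txt"    # rw-r--r-- : not executable
chmod 755 "$dir/run.sh"       # rwxr-xr-x : executable
mkdir "$dir/sub"              # a directory: its line starts with d

# Keep only ordinary files whose owner has execute (x) or setuid (s).
executables=$(ls -l "$dir" | grep '^-..[xs]')
echo "$executables"
rm -rf "$dir"
```

Only the line for run.sh survives the filter: notes.txt has a - in the fourth position, and the line for sub starts with d.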
Building a CD Library
'Let me use a nice and didactic example: the process of building a CD Library. Keep in mind that the same kind of software can organize data CDs just as well as audio CDs (including those you get when you buy magazines, those you burn for yourself, etc.).'
'Hold on a sec. Where am I taking the CD data from?'
'Firstly I'll show you how your software can obtain data from those who are using it, afterwards I'll show you how to get data from the screen or from a file.'
Informing the Parameters
'In our case, the layout of a music file will be:'
album name^artist1~song1:...:artistn~songn
As you can see above, a circumflex (^) separates the name of the album from the rest of the record (which contains information on each song and on its artist). Within each entry, a tilde (~) separates the artist from the name of the song, and a colon (:) separates one artist~song entry from the next.
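To make the role of each separator concrete, the sketch below takes one record such as album 1^Musician1~Music1:Musician2~Music2 apart. It is only an illustration: it relies on bash parameter expansion and arrays, which are not otherwise covered here, and the variable names are invented.

```shell
#!/bin/bash
# Sample record with invented content: album^artist~song:artist~song
record='album 1^Musician1~Music1:Musician2~Music2'

album=${record%%^*}      # everything before the first ^
tracks=${record#*^}      # everything after the first ^

echo "Album: $album"

# Split the track list on ":" and then each entry on "~".
IFS=':' read -r -a pairs <<< "$tracks"
for pair in "${pairs[@]}"; do
    artist=${pair%%"~"*}   # the ~ is quoted so that it is taken literally
    song=${pair#*"~"}
    echo "  $song by $artist"
done
```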
The program I intend to develop is called musinc, and it will add records to my music file. I will pass the content of each album as a parameter whenever I run the program, this way:
$ musinc "album^musician~music:musician~music:..."
That way, the program musinc will get the data of each album as if it were a variable. The only difference between a received parameter and a variable is that the first one gets numerical names (I know it sounds strange... what I meant was that they get one-character names), such as $1, $2, $3, ..., $9. Let's make a test:
$ cat teste
#!/bin/bash
# Program to test how to inform the parameters
echo "1o. parm -> $1"
echo "2o. parm -> $2"
echo "3o. parm -> $3"
Let's run it now:
$ teste informing parameters to test
bash: teste: cannot execute
OOPS, there is a detail I've forgotten: we have to make the file executable before running it:
$ chmod 755 teste
$ teste informing parameters to test
1o. parm -> informing
2o. parm -> parameters
3o. parm -> to
Interestingly, the last word
test was not considered by our program. That is because the program just considered the three first parameters. Let's execute it another way:
$ teste "informing parameters" to test
1o. parm -> informing parameters
2o. parm -> to
3o. parm -> test
With the inverted commas, Shell did not treat the blank space between the first two words as a separator, so it considered them a single parameter.
Parametric Hints
Since we are talking about parameters, let me give you some hints:
Meaning of the main variables
| Variable | Meaning |
| $0 | Name of the program |
| $# | Number of parameters passed |
| $* | Set of all parameters (similar to $@) |
Let's change the program teste in order to use the variables we have just seen. Let's do it this way:
$ cat teste
#!/bin/bash
# Program to test how to inform the parameters (2nd Version)
echo The program $0 received $# parameters
echo "1o. parm -> $1"
echo "2o. parm -> $2"
echo "3o. parm -> $3"
echo All in one \"shot\": $*
Note that I put a backslash before each of the inverted commas, in order to tell Shell not to interpret them. Let's run the program.
$ teste informing parameters to test
The program teste received 4 parameters
1o. parm -> informing
2o. parm -> parameters
3o. parm -> to
All in one "shot": informing parameters to test
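The table calls $* "similar to $@", and the difference only shows up once quoting enters the picture. The sketch below illustrates it with two hypothetical functions, count_args and demo, which are not part of the scripts above; inside a function, $*, $@ and $# refer to the function's own arguments.

```shell
#!/bin/bash
# count_args is a hypothetical helper that reports how many arguments it got.
count_args() { echo $#; }

# demo forwards its own parameters in three different ways.
demo() {
    local unquoted quoted_star quoted_at
    unquoted=$(count_args $*)        # re-split on blanks
    quoted_star=$(count_args "$*")   # everything joined into one string
    quoted_at=$(count_args "$@")     # original parameters preserved
    echo "received $#, \$* gives $unquoted, \"\$*\" gives $quoted_star, \"\$@\" gives $quoted_at"
}

demo "informing parameters" to test
# prints: received 3, $* gives 4, "$*" gives 1, "$@" gives 3
```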
As I've said before, the parameters are numbered from 1 to 9, but that does not mean that it is not possible to use more than 9 parameters. Let's test it:
$ cat teste
#!/bin/bash
# Program to test how to inform the parameters (3rd Version)
echo The program $0 received $# parameters
echo "11th parm -> $11"
shift
echo "2nd parm -> $1"
shift 2
echo "4th parm -> $1"
Let's run it now:
$ teste informing parameters to test
The program teste received 4 parameters
11th parm -> informing1
2nd parm -> parameters
4th parm -> test
There are two remarkable points about this script:
- In order to show that the parameters range from $1 to $9, I wrote an echo $11 and what happened? It was interpreted as a $1 followed by the character 1, and the result was informing1;
- The command shift, whose syntax is shift n (in which n can be any number, 1 being the default), discards the first n parameters, making the parameter numbered n+1 the new first one.
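Besides shift, bash can also reach parameters beyond the ninth directly when the number is wrapped in braces, as in ${11}. The sketch below uses a hypothetical function called demo, invoked with twelve one-letter parameters, to put the two side by side.

```shell
#!/bin/bash
# demo is a hypothetical function; it is called with twelve parameters.
demo() {
    echo "\$11   -> $11"      # $1 followed by the character 1
    echo "\${11} -> ${11}"    # the eleventh parameter
    shift 9                   # discard the first nine parameters
    echo "after shift 9: \$1 -> $1, \$# -> $#"
}

demo a b c d e f g h i j k l
# prints:
# $11   -> a1
# ${11} -> k
# after shift 9: $1 -> j, $# -> 3
```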
Well, now that you know a little bit more about passing parameters, let's return to our CD Library and create our script for adding CDs to the database called musics. It is a very simple script (as simple as everything else in Shell) and I'll list it so that you can see:
$ cat musinc
#!/bin/bash
# Register CDs (version 1)
#
echo $1 >> musics
Since it is a very simple, functional script, it merely appends the received parameter to the end of the file musics. Let's include 3 albums and see if it works (in order to simplify, I'll suppose each album contains just 2 songs):
$ musinc "album 3^Musician5~Music5:Musician6~Music6"
$ musinc "album 1^Musician1~Music1:Musician2~Music2"
$ musinc "album 2^Musician3~Music3:Musician4~Music4"
Listing the content of musics:
$ cat musics
album 3^Musician5~Music5:Musician6~Music6
album 1^Musician1~Music1:Musician2~Music2
album 2^Musician3~Music3:Musician4~Music4
It is not as functional as it was supposed to be... it could be a lot better. The albums are out of order, which makes searching harder. Let's change the script and test it again:
$ cat musinc
#!/bin/bash
# Register CDs (version 2)
#
echo $1 >> musics
sort musics -o musics
Including another one:
$ musinc "album 4^Musician7~Music7:Musician8~Music8"
Now let's see what happens to the musics file:
$ cat musics
album 1^Musician1~Music1:Musician2~Music2
album 2^Musician3~Music3:Musician4~Music4
album 3^Musician5~Music5:Musician6~Music6
album 4^Musician7~Music7:Musician8~Music8
I simply inserted a line that sorts the file musics after each album is appended, writing the output back to the same file (that's how the option -o works).
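The -o option matters because the more obvious spelling with a redirection loses the data. The sketch below compares the two on scratch files; the names safe and unsafe and their content are invented for the demonstration.

```shell
#!/bin/bash
# File names and content are invented for the demonstration.
dir=$(mktemp -d)
printf '%s\n' 'album 3' 'album 1' 'album 2' > "$dir/safe"
cp "$dir/safe" "$dir/unsafe"

# The shell empties the target of > before sort even starts, so sort
# reads an empty file and the data is lost.
sort "$dir/unsafe" > "$dir/unsafe"

# With -o, sort reads all of its input before writing the output file.
sort -o "$dir/safe" "$dir/safe"

if [ -s "$dir/unsafe" ]; then unsafe_state=kept; else unsafe_state=empty; fi
safe_first=$(head -n 1 "$dir/safe")
echo "unsafe is $unsafe_state; safe starts with: $safe_first"
rm -rf "$dir"
```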
WOW! Now it is nice and almost functional. But pay attention and don't panic: that is not the final version. The next version of the program will be a lot better and friendlier! We'll develop it as soon as we learn how to get data from the screen and how to format the input.
Listing with the cat command is not good enough, so let's make a program called muslist that lists the album whose name is given as a parameter:
$ cat muslist
#!/bin/bash
# Search for CDs (version 1)
#
grep $1 musics
Let's run it looking for album 2. As we have previously seen, when passing the string album 2, it is necessary to prevent Shell from splitting it (otherwise it would read two parameters). Let's try the following:
$ muslist "album 2"
grep: can't open 2
musics: album 1^Musician1~Music1:Musician2~Music2
musics: album 2^Musician3~Music3:Musician4~Music4
musics: album 3^Musician5~Music5:Musician6~Music6
musics: album 4^Musician7~Music7:Musician8~Music8
'What a mess!! Where is the mistake? I put the parameter between inverted commas so that shell would not split it into two...'
'Yep, but pay attention to how grep is running:
grep $1 musics
Even with album 2 between inverted commas on the command line, when Shell sees the unquoted $1 inside the script it splits it into two arguments. So, the final content of the line that grep has executed is:
grep album 2 musics
As the grep syntax is:
grep <string of characters> [file1, file2, ..., filen]
grep has understood that it was supposed to look for the string album in the files 2 and musics. But, since there is no file 2, an error has occurred. Moreover, since the word album was found in every record of musics, all records were listed.

Use inverted commas whenever there is a blank space or a <TAB> in the string that grep will search for. That prevents the words after the blank space or <TAB> from being interpreted as file names.
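The effect of the missing inverted commas can be reproduced on a scratch copy. In this sketch the musics file is recreated with invented content in a temporary directory, and the variable pattern plays the role of $1.

```shell
#!/bin/bash
# The musics file is recreated here with invented content.
dir=$(mktemp -d)
printf '%s\n' 'album 1^Musician1~Music1' 'album 2^Musician3~Music3' > "$dir/musics"
pattern='album 2'

# Unquoted: grep looks for "album" in the files "2" and "musics".
# The complaint about the missing file 2 is discarded, and || true
# absorbs the error status grep returns because of it.
unquoted=$(cd "$dir" && grep $pattern musics 2>/dev/null) || true

# Quoted: grep looks for "album 2" in the file "musics".
quoted=$(cd "$dir" && grep "$pattern" musics)

echo "unquoted matched $(printf '%s\n' "$unquoted" | grep -c .) lines"
echo "quoted matched:  $quoted"
rm -rf "$dir"
```

Without the quotes both records come back, because both contain the word album; with the quotes only the record for album 2 is listed.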
On the other hand, it is better to ignore the case of the letters when searching. The following program would solve two problems at the same time:
$ cat muslist
#!/bin/bash
# Search for CDs (version 2)
#
grep -i "$1" musics
In that case, the option
-i tells
grep not to consider the case of the letters. Another point is the parameter
$1 that was inserted between inverted commas so that
grep would understand the chain of characters as a single argument.
$ muslist "album 2"
album 2^Musician3~Music3:Musician4~Music4
Pay attention too to the fact that grep locates the string in any position of the record, so this way we can search for album, song, singer or even for pieces of information. As soon as we get started with conditional commands, we'll get a new version of muslist that asks us in which of the fields the search will be performed.'
'Hold on pal! That putting between inverted commas thing is not really a friendly way of doing that...'
'You are right! Let me show you another way, then:
$ cat muslist
#!/bin/bash
# Search for CDs (version 3)
#
grep -i "$*" musics
$ muslist album 2
album 2^Musician3~Music3:Musician4~Music4
The variable $* stands for all parameters, and in that program it will be replaced by the string album 2 (as in the previous example), and it will do what you wanted it to.
You should have realized by now that the question in Shell is not whether it can do something, but what the best way of doing it is (as you've seen, the range of options is huge!).'
'But what if I have to exclude a CD? Once I forgot a CD of mine under the sun and when I looked at it again... it was lost. What if that happened again?'
'Well, let's make another script called
musexc, in order to solve that kind of problem.'
Before developing it, I'd like to introduce you to a very useful option of the grep family. Meet the option -v. This option lists every input record except the ones that match the search. Let's see the example:
$ grep -v "album 2" musics
album 1^Musician1~Music1:Musician2~Music2
album 3^Musician5~Music5:Musician6~Music6
album 4^Musician7~Music7:Musician8~Music8
As I've mentioned, the grep in the example lists all the records except the ones that refer to album 2, because those are the ones that match the string given to the command. Now we are ready to develop the script that will remove the lost CD from your CD Library. It looks like this:
$ cat musexc
#!/bin/bash
# Delete CDs from Library (version 1)
#
grep -v "$1" musics > /tmp/mus$$
mv -f /tmp/mus$$ musics
The first line sends the content of the file musics to /tmp/mus$$, leaving out the records that match grep's search. Afterwards, it moves (or renames, if you prefer this word) /tmp/mus$$ to musics.
I used the file /tmp/mus$$ as a work copy because, as I've mentioned previously, $$ contains the PID (Process IDentification). Because of that, when other people edit the file musics at the same time, each one gets a different work copy, and that avoids overwriting each other's files.
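Here is a variation on musexc, sketched under a few assumptions rather than offered as the definitive version. It uses mktemp (available on Linux and the BSDs, though not required by POSIX) to pick an unused temporary file name instead of relying on $$. The hypothetical function remove_album takes the library file as a second argument, so that it can be tried on a scratch file instead of the real musics.

```shell
#!/bin/bash
# remove_album STRING FILE: hypothetical variant of musexc.
remove_album() {
    local tmp
    tmp=$(mktemp) || return 1
    grep -v "$1" "$2" > "$tmp"
    # Status 0 or 1 means grep ran normally; anything higher is an error,
    # and in that case the library file is left untouched.
    if [ $? -gt 1 ]; then
        rm -f "$tmp"
        return 1
    fi
    mv -f "$tmp" "$2"
}

# Try it on a scratch file with invented content.
library=$(mktemp)
printf '%s\n' 'album 1^Musician1~Music1' 'album 2^Musician3~Music3' > "$library"
remove_album "album 2" "$library"
remaining=$(cat "$library")
echo "$remaining"
rm -f "$library"
```

Checking grep's exit status before the mv is a design choice of this sketch: if grep cannot read the file, the work copy is thrown away rather than moved over the library.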
'And that's it?'
'Yeah, man! Well, those programs we've made are quite basic, because we still lack knowledge about some tools. But, while I have another pint, you can practice using the examples, and I promise you will develop a nice control system for your CDs.
Next time we meet, I'll show you how conditional commands work and we'll improve those scripts.'
'That's it for now... but before:
Waiter, another round for me and my pal, please!'