(07 Jan 2007,
JulioNeves
)
---+!! Pub Talk Part II
---
%TOC%
---

'Waiter, get me a pint and don't worry about my lad over here, he's finally getting to meet a real operating system and he's got a lot to learn!'

'So, my friend, did you get any of what I've said so far?'

'Well, I get what you mean, but I don't really see the point of it.'

'Take it easy, pal! We're just beginning... What I've said so far is a taste of what lies ahead. As soon as we start developing structured programs, you'll see how useful those tools can be. After learning that, you'll see how easy it is to reach the top shellves. Now, tell me: how do you like the =grep= family?'

'Pardon me! I don't know any =grep= family.'

'Sure, sure... =grep= is an acronym for "global regular expression print", although legend has it that the name =grep= comes from =ed= (a text editor that is vim's grandpa), in which the search command was =g/_regular expression_/p=, or =g/_re_/p=.'

'Well, the =grep= command takes a regular expression and matches it against the lines of an "input". By the way, there is this guy, Aurélio Marinho Jargas, who maintains a web page with all the hints, clues and even tutorials you could want about regular expressions (regexp). If you feel like learning to program in Shell, Perl, Python, etc., you'd better see what he's got!'

---++ The great grep

'Waiter, this time I'll try a caipirinha - the Brazilian National Drink ;) - <a href="http://www.google.com.br/search?num=100&hl=pt-BR&q=%22how+to+make+a+caipirinha%22+%22carnival+drinks%22+-sake&btnG=Pesquisar&meta=" target="_blank">(See how to prepare it)</a>!'

'So, I told you that =grep= matches regular expressions against the lines of an "input". But what are those "inputs"? Well, there are different ways of defining them. Let's see!'
Searching a file:

%TERMINAL_INI%
$ grep mary /etc/passwd
%TERMINAL_FIM%

Searching more than one file:

%TERMINAL_INI%
$ grep grep *.sh
%TERMINAL_FIM%

Searching the output of a command:

%TERMINAL_INI%
$ who | grep pelegrino
%TERMINAL_FIM%

Consider the 1st example, which is the simplest one: I searched for occurrences of the word =mary= anywhere in the file =/etc/passwd=. If I wanted to search for it as a login name (in other words, only at the beginning of the records of that file), I should execute:

%TERMINAL_INI%
$ grep '^mary' /etc/passwd
%TERMINAL_FIM%

'Hold on, hold on... what are that caret (circumflex =^=) and those apostrophes for?'

'The caret (=^=), as you'd know if you had read the articles on regular expressions I told you about, constrains the matches to the beginning of the lines, and the apostrophes (='=) tell the shell not to interpret the circumflex, so that it reaches =grep= untouched.'

The 2nd example lists all the lines containing the word =grep= in all the files with the extension =.sh=. Since I use this extension for my Shell scripts, what I did was look for a good =grep= example in all my scripts.

And look!! =grep= accepts as input the output of another command, as long as the two are connected by a pipe symbol (=|=). This is very common in shell and it speeds up the execution of commands enormously, since it takes the output of one command and reads it as if it were a file. So, in the 3rd example, the command =who= lists the users who are logged in to the same machine as you (remember: Linux is a multi-user system) and the command =grep= checks whether the user =pelegrino= is working or not.

---+++ The grep family

You know, the command =grep= is widely known because it is frequently used, but what most people don't know is that there are three commands in the grep family.
They are:
   * *grep*
   * *egrep*
   * *fgrep*

Their main features are:
   * *grep* %BR% May or may not use simple regular expressions; when none are needed, it is better to run =fgrep= (it is faster);
   * *egrep* (='e'= standing for extended) %BR% A very powerful tool for regular expressions. It is usually regarded as the slowest member of the grep family, so it is best kept for cases where you need a regular expression that =grep= does not accept;
   * *fgrep* (='f'= standing for fast, or file; many references read it as fixed, since it searches for fixed strings) %BR% As its name points out, it is the fast member of the family. It runs quickly (about 30% faster than =grep= and 50% faster than =egrep=), but it does not allow the use of regular expressions.

%ATTENTION_INI%
The considerations above on speed are valid for the Unix =grep= family. On Linux, =grep= is the fastest, because the other two (=fgrep= and =egrep=) are shell scripts that execute =grep=. And I must say: I don't like that solution.
%ATTENTION_FIM%

'Now that you know the differences among the three, tell me: what do you think about the examples I gave before the explanation?'

'I thought =fgrep= would solve your problem a lot faster than =grep=.'

'Perfect!! I see you got what I said! Let's see some other examples to make their differences even clearer.'

   * Examples

I know that there is a text talking about Linux, but I'm not sure whether the word Linux is written with a capital L or a small one. What should I do? There are two options in that case:

%TERMINAL_INI%
$ egrep '(Linux|linux)' arquivo.txt
%TERMINAL_FIM%

or

%TERMINAL_INI%
$ grep '[Ll]inux' arquivo.txt
%TERMINAL_FIM%

In the first case, the extended regular expression =(Linux|linux)= uses the parentheses to group the options and the pipe (=|=) as a logical "or", which means that you are searching for Linux or linux.
In the second case, on the other hand, the regular expression =[Ll]inux= means that you are searching for a word that starts with =L= or =l=, followed by =inux=. Since this expression is simpler, =grep= itself can handle it, so I think it is the more advisable one (remember: =egrep= is slower).

Another example. If you want to list the subdirectories of a directory, you should run:

%TERMINAL_INI%
$ ls -l | grep '^d'%OUT_INI%
drwxr-xr-x 3 root root 4096 Dec 18 2000 doc
drwxr-xr-x 11 root root 4096 Jul 13 18:58 freeciv
drwxr-xr-x 3 root root 4096 Oct 17 2000 gimp
drwxr-xr-x 3 root root 4096 Aug 8 2000 gnome
drwxr-xr-x 2 root root 4096 Aug 8 2000 idl
drwxrwxr-x 14 root root 4096 Jul 13 18:58 locale
drwxrwxr-x 12 root root 4096 Jan 14 2000 lyx
drwxrwxr-x 3 root root 4096 Jan 17 2000 pixmaps
drwxr-xr-x 3 root root 4096 Jul 2 20:30 scribus
drwxrwxr-x 3 root root 4096 Jan 17 2000 sounds
drwxr-xr-x 3 root root 4096 Dec 18 2000 xine%OUT_FIM%
%TERMINAL_FIM%

As you can see above, the circumflex (=^=) limits the search to the first position of each line of the long output of the =ls= command. The apostrophes tell the shell not to 'understand' the circumflex (=^=).

Let's take another example. You know what the first four positions of the output of =ls -l= for an ordinary file (not a directory, nor a link, nor anything else)
should be:

<center>
%TABLE{ databg="#ffffff" headerrows="1" }%
| Position | 1st | 2nd | 3rd | 4th |
| Possible values | - | r | w | x |
|^| | - | - | s (suid) |
|^| | | | - |
</center>

Thus, in order to find the executable files in a directory, you should run:

%TERMINAL_INI%
$ ls -la | egrep '^-..(x|s)'%OUT_INI%
-rwxr-xr-x 1 root root 2875 Jun 18 19:38 rc
-rwxr-xr-x 1 root root 857 Aug 9 22:03 rc.local
-rwxr-xr-x 1 root root 18453 Jul 6 17:28 rc.sysinit%OUT_FIM%
%TERMINAL_FIM%

Once again the caret (=^=) limits the search to the beginning of each line. Hence the listed lines are the ones that start with a =-=, followed by any character (the full stop, or dot, in a regular expression denotes any character), followed by any character again, followed by an =x= or an =s=.

The same result would be found with the command:

%TERMINAL_INI%
$ ls -la | grep '^-..[xs]'
%TERMINAL_FIM%

and the search would be faster.

---++ Building a CD Library

'Let me use a nice and didactic example: the process of building a CD Library. Keep in mind that the same approach works for organizing data CDs (including those you get when you buy magazines, those you burn for yourself, etc.) as well as audio CDs.'

'Hold on a sec. Where am I taking the CD data from?'

'First I'll show you how your program can receive data from whoever is running it; afterwards I'll show you how to get data from the screen or from a file.'

---+++ Informing the Parameters

'In our case, the layout of the music file will be:'

<verbatim>
name of the album^artist1~name of song1:..:artistN~name of songN
</verbatim>

As you can see above, a circumflex (=^=) separates the name of the album from the rest of the record, which is made of several groups, each holding one song and its artist. Within a group, the artist and the name of the song are separated by a tilde (=~=), and a colon (=:=) separates one group from the next.
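To make the layout concrete, here is a minimal sketch of how one record could be taken apart in bash, assuming the separators described above. The function name =show_record= is hypothetical; it is only an illustration and not one of the scripts built in this talk.

```shell
#!/bin/bash
# Hypothetical helper: prints the album and each song of one record.
# Assumed layout: album^artist~song:artist~song:...
show_record() {
    local record=$1
    local album tracks pair artist song
    local -a pairs

    # Split the record at the first ^ into album name and the rest
    IFS='^' read -r album tracks <<< "$record"
    echo "Album: $album"

    # Split the rest on colons, one artist~song group per element
    IFS=':' read -r -a pairs <<< "$tracks"
    for pair in "${pairs[@]}"; do
        # Split each group on the tilde
        IFS='~' read -r artist song <<< "$pair"
        echo "  $song ($artist)"
    done
}

show_record "album 1^Musician1~Music1:Musician2~Music2"
```

Run as is, it should print the album name on one line, followed by one indented line per song with the artist in parentheses.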
The program I intend to develop is called =musinc=, and it will add records to my music file. I will pass the content of each album as a parameter whenever I run the program, this way:

%TERMINAL_INI%
$ musinc "album^musician~music:musician~music:..."
%TERMINAL_FIM%

That way, the program =musinc= will receive the data of each album as if it were a variable. The only difference between a received parameter and a variable is that parameters get numerical names (I know it sounds strange... what I mean is that their names are a single digit), such as =$1, $2, $3, ..., $9=. Let's test it:

%TERMINAL_INI%
$ cat teste%OUT_INI%
#!/bin/bash
# Program to test how to inform the parameters
echo "1st parm -> $1"
echo "2nd parm -> $2"
echo "3rd parm -> $3"%OUT_FIM%
%TERMINAL_FIM%

Let's run it now:

%TERMINAL_INI%
$ teste informing parameters to test%OUT_INI%
bash: teste: cannot execute%OUT_FIM%
%TERMINAL_FIM%

OOPS, there is a detail I've forgotten: we have to make the file executable before running it:

%TERMINAL_INI%
$ chmod 755 teste
$ teste informing parameters to test%OUT_INI%
1st parm -> informing
2nd parm -> parameters
3rd parm -> to%OUT_FIM%
%TERMINAL_FIM%

Interestingly, the last word, =test=, was not shown by our program. That is because the program only uses the first three parameters. Let's execute it another way:

%TERMINAL_INI%
$ teste "informing parameters" to test%OUT_INI%
1st parm -> informing parameters
2nd parm -> to
3rd parm -> test%OUT_FIM%
%TERMINAL_FIM%

With the quotation marks, Shell did not treat the blank space between the first two words as a separator, so it took them as a single parameter.
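The effect of the quotation marks is easier to see when every parameter is printed between brackets. The following is a minimal sketch; the function name =show_parms= is hypothetical, and it relies on ="$@"=, which in bash expands to all the parameters while keeping each one as a separate word.

```shell
#!/bin/bash
# Hypothetical helper: prints each parameter it receives between brackets,
# one per line, so the boundaries between parameters are visible.
show_parms() {
    local parm
    for parm in "$@"; do
        echo "[$parm]"
    done
}

# Four separate words: four parameters
show_parms informing parameters to test

# The quoted words travel together: three parameters
show_parms "informing parameters" to test
```

The first call should print four bracketed lines, and the second only three, the first of them being =[informing parameters]=.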
---+++ Parametric Hints

Since we are talking about parameters, let me give you some hints:

<center>
%TABLE{ databg="#ffffff" headerrows="1" }%
| *Meaning of the main variables* ||
| *Variable* | *Meaning* |
| =$0= | Name of the program |
| =$#= | Number of parameters passed |
| =$*= | Set of all parameters (similar to =$@=) |
</center>

   * Examples

Let's change the program =teste= to use the variables we have just seen:

%TERMINAL_INI%
$ cat teste%OUT_INI%
#!/bin/bash
# Program to test how to inform the parameters (2nd Version)
echo The program $0 received $# parameters
echo "1st parm -> $1"
echo "2nd parm -> $2"
echo "3rd parm -> $3"
echo All in one \"shot\": $*%OUT_FIM%
%TERMINAL_FIM%

Note that I put a backslash before each quotation mark, in order to tell Shell not to interpret them. Let's run the program:

%TERMINAL_INI%
$ teste informing parameters to test%OUT_INI%
The program teste received 4 parameters
1st parm -> informing
2nd parm -> parameters
3rd parm -> to
All in one "shot": informing parameters to test%OUT_FIM%
%TERMINAL_FIM%

As I've said before, the parameters are numbered from 1 to 9, but that does not mean that it is not possible to use more than 9 parameters. Let's test it:

   * Example:

%TERMINAL_INI%
$ cat teste%OUT_INI%
#!/bin/bash
# Program to test how to inform the parameters (3rd Version)
echo The program $0 received $# parameters
echo "11th parm -> $11"
shift
echo "2nd parm -> $1"
shift 2
echo "4th parm -> $1"%OUT_FIM%
%TERMINAL_FIM%

Let's run it now:

%TERMINAL_INI%
$ teste informing parameters to test%OUT_INI%
The program teste received 4 parameters
11th parm -> informing1
2nd parm -> parameters
4th parm -> test%OUT_FIM%
%TERMINAL_FIM%

There are two remarkable points about this script:
   1. In order to show that the parameter names range from =$1= to =$9=, I wrote an =echo $11=, and what happened? It was interpreted as =$1= followed by the character =1=, and the result was =informing1=;
   2.
The command =shift=, whose syntax is =shift n= (in which =n= can be any number, the default being =1=), discards the first =n= parameters, so that the parameter numbered =n+1= becomes the first one.

Well, now that you know a little bit more about passing parameters, let's return to our CD Library and create the script that adds CDs to my database, called =musics=. It is a very simple script (as simple as everything else in Shell) and I'll list it so that you can see:

   * Examples:

%TERMINAL_INI%
$ cat musinc%OUT_INI%
#!/bin/bash
# Registers CDs (version 1)
#
echo $1 >> musics%OUT_FIM%
%TERMINAL_FIM%

Since it is a very functional script, it simply appends the received parameter to the end of the file =musics=. Let's include 3 albums and see if it works (to keep it simple, I'll suppose each album contains just 2 songs):

%TERMINAL_INI%
$ musinc "album 3^Musician5~Music5:Musician6~Music6"
$ musinc "album 1^Musician1~Music1:Musician2~Music2"
$ musinc "album 2^Musician3~Music3:Musician4~Music4"
%TERMINAL_FIM%

Listing the content of =musics=:

%TERMINAL_INI%
$ cat musics%OUT_INI%
album 3^Musician5~Music5:Musician6~Music6
album 1^Musician1~Music1:Musician2~Music2
album 2^Musician3~Music3:Musician4~Music4%OUT_FIM%
%TERMINAL_FIM%

It is not as functional as it was supposed to be... it could be a lot better. The albums are out of order, which complicates searching.
Let's change the script and test it again:

%TERMINAL_INI%
$ cat musinc%OUT_INI%
#!/bin/bash
# Registers CDs (version 2)
#
echo $1 >> musics
sort -o musics musics%OUT_FIM%
%TERMINAL_FIM%

Including another one:

%TERMINAL_INI%
$ musinc "album 4^Musician7~Music7:Musician8~Music8"
%TERMINAL_FIM%

Now let's see what happens to the music file:

%TERMINAL_INI%
$ cat musics%OUT_INI%
album 1^Musician1~Music1:Musician2~Music2
album 2^Musician3~Music3:Musician4~Music4
album 3^Musician5~Music5:Musician6~Music6
album 4^Musician7~Music7:Musician8~Music8%OUT_FIM%
%TERMINAL_FIM%

I simply inserted a line that sorts the file =musics= after each album is appended, sending the output to the same file (that's what the option =-o= is for).

WOW! Now it is nice and almost functional. But pay attention and don't panic! This is not the final version. The next version of the program will be a lot better and friendlier! We'll develop it as soon as we learn how to get data from the screen and how to format the input.

   * Examples

Listing with the =cat= command is totally out. Let's make a program called =muslist= that lists the album whose name is given as a parameter:

%TERMINAL_INI%
$ cat muslist%OUT_INI%
#!/bin/bash
# Search for CDs (version 1)
#
grep $1 musics%OUT_FIM%
%TERMINAL_FIM%

Let's run it looking for =album 2=. As we have previously seen, when passing the string =album 2= it is necessary to prevent Shell from splitting it (otherwise it would read two parameters). Let's try the following:

%TERMINAL_INI%
$ muslist "album 2"%OUT_INI%
grep: can't open 2
musics:album 1^Musician1~Music1:Musician2~Music2
musics:album 2^Musician3~Music3:Musician4~Music4
musics:album 3^Musician5~Music5:Musician6~Music6
musics:album 4^Musician7~Music7:Musician8~Music8%OUT_FIM%
%TERMINAL_FIM%

'What a mess!! Where is the mistake? I put the parameter between quotation marks so that Shell would not split it into two...'
'Yeap, but pay attention to how =grep= is being run:

<verbatim>
grep $1 musics
</verbatim>

Even though you put =album 2= between quotation marks on the command line, when Shell expands the unquoted =$1= inside the script it splits the value into two arguments. So the line that =grep= actually executes is:

<verbatim>
grep album 2 musics
</verbatim>

As the =grep= syntax is:

<verbatim>
grep <string> [file1, file2, ..., fileN]
</verbatim>

=grep= understood that it was supposed to look for the string =album= in the files =2= and =musics=. But, since there is no file =2=, an error occurred. Moreover, since the word =album= is found in every record of =musics=, all the records were listed.

%TIP_INI%
Use quotation marks whenever there is a blank space or a =<TAB>= in the string that =grep= will search for. That keeps the words after the blank space or =<TAB>= from being interpreted as file names.
%TIP_FIM%

On the other hand, it is better to ignore the case of the letters in the search. The following program solves both problems at the same time:

%TERMINAL_INI%
$ cat muslist%OUT_INI%
#!/bin/bash
# Search for CDs (version 2)
#
grep -i "$1" musics%OUT_FIM%
%TERMINAL_FIM%

In that case, the option =-i= tells =grep= to ignore the case of the letters. The other change is that the parameter =$1= was put between quotation marks, so that =grep= receives the string as a single argument.

%TERMINAL_INI%
$ muslist "album 2"%OUT_INI%
album 2^Musician3~Music3:Musician4~Music4%OUT_FIM%
%TERMINAL_FIM%

Note too that =grep= locates the string in any position of the record, so this way we can search by album, song, singer or even by fragments of any of them. As soon as we get started with conditional commands, we'll write a new version of =muslist= that asks in which of the fields the search should be performed.'

'Hold on, pal! That business of putting things between quotation marks is not really a friendly way of doing it...'

'You are right!
Let me show you another way, then:

%TERMINAL_INI%
$ cat muslist%OUT_INI%
#!/bin/bash
# Search for CDs (version 3)
#
grep -i "$*" musics
$ muslist album 2
album 2^Musician3~Music3:Musician4~Music4%OUT_FIM%
%TERMINAL_FIM%

The variable =$*= stands for all the parameters, and in this program it will be replaced by the string =album 2= (as in the previous example), so it will do what you wanted. You should have realized by now that the question with Shell is not whether it can do something, but what the best way of doing it is (as you've seen, the range of options is huge!).'

'But what if I have to exclude a CD? Once I forgot a CD of mine in the sun and when I looked at it again... it was ruined. What if that happened again?'

'Well, let's make another script called =musexc=, in order to solve that kind of problem.'

Before developing it, I'd like to introduce you to a very useful option of the =grep= family. Meet the option =-v=. This option lists every input record except the ones that match the expression. Let's see an example:

%TERMINAL_INI%
$ grep -v "album 2" musics%OUT_INI%
album 1^Musician1~Music1:Musician2~Music2
album 3^Musician5~Music5:Musician6~Music6
album 4^Musician7~Music7:Musician8~Music8%OUT_FIM%
%TERMINAL_FIM%

As I've mentioned, the =grep= in the example lists all the records except the ones that refer to =album 2=, because those are the ones that match the expression given to the command. Now we are ready to develop the script that will remove the lost CD from your CD Library. It looks like this:

%TERMINAL_INI%
$ cat musexc%OUT_INI%
#!/bin/bash
# Delete CDs from Library (version 1)
#
grep -v "$1" musics > /tmp/mus$$
mv -f /tmp/mus$$ musics%OUT_FIM%
%TERMINAL_FIM%

The first line copies the file =musics= to =/tmp/mus$$=, leaving out the records that match the =grep= expression. Afterwards, it moves (or renames, if you prefer this word) =/tmp/mus$$= to =musics=.
I used the file =/tmp/mus$$= as a work copy because, as I've mentioned previously, =$$= contains the =PID= (<kbd>P</kbd>rocess <kbd>ID</kbd>entification) of the running script. Because of that, when other people edit the file =musics= at the same time, each of them gets a different work copy, which avoids overwriting one another's files.

'And that's it?'

'Yeah, man! Well, those programs we've made are quite basic, because we still lack knowledge about some tools. But, while I have another pint, you can practice using the examples, and I promise you will develop a nice control system for your CDs. Next time we meet, I'll show you how conditional commands work and we'll improve those scripts.'

'That's it for now... but before we go: Waiter, another round for me and my pal, please!'
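For practice, the idea behind =musexc= can be tried safely on a scratch file. This is only a sketch under a few assumptions: the function name =remove_album= and the file =/tmp/musics_demo= are hypothetical, and the library file is passed as a second parameter so that the real =musics= is never touched.

```shell
#!/bin/bash
# Hypothetical variation of musexc, for practice on a scratch file.
# Usage: remove_album "text to match" library_file
remove_album() {
    local pattern=$1 library=$2
    local work="/tmp/mus$$"     # work copy named after this shell's PID

    # Copy every record that does NOT match the pattern to the work copy.
    # grep -v returns status 1 when nothing is left, so it is not chained with &&.
    grep -v -- "$pattern" "$library" > "$work"

    # Replace the library with the filtered work copy
    mv -f "$work" "$library"
}

# Build a small scratch library, remove one album and show what is left
printf '%s\n' "album 1^Musician1~Music1" "album 2^Musician3~Music3" > /tmp/musics_demo
remove_album "album 2" /tmp/musics_demo
cat /tmp/musics_demo
rm -f /tmp/musics_demo
```

After the call, only the record for =album 1= should remain in the scratch file.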
Topic revision: r10 - 07 Jan 2007 - 22:47:35 -
JulioNeves
TWikiBar.TWikiBarTalk002 moved from TWikiBar.TWikiBarPapo013 on 01 Sep 2006 - 16:47 by
JarbasJunior
Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.