Learning BASH: Text Processing - HEAD & TAIL

BASH continues to surprise me with its amazing collection of simple yet extremely useful commands. They can give you a huge boost in speed and control while working. No doubt Bash, together with an editor like Vim, is a favorite combination among developers. Gradually, you may start to feel that the mouse is unnecessary, since you can control pretty much everything with just your keyboard.

Today we continue with more commands that are related to Text Processing.

HEAD & TAIL command


These commands print the contents of a file starting from the top (head) or from the bottom (tail). Unlike the cat command, which displays the whole content of a file, these commands give you control over how much you want to see.

syntax: head filename   or   tail filename

Note: By default, head shows the first 10 lines of a file and tail shows the last 10.

Let's say I have a text file like this.

$ cat numbers.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

Let's run the head command with no arguments other than the filename.

$ head numbers.txt
1
2
3
4
5
6
7
8
9
10

Problem: Display the first 5 lines of the file provided, and then the last 5.

$ head -n 5 numbers.txt
1
2
3
4
5

$ tail -n 5 numbers.txt
16
17
18
19
20


The -n argument lets us specify the number of lines we want to see: head counts from the first line, and tail counts back from the last line.

Problem: Display the first 12 characters of a file or a line of text, and then the last 12.


$ echo "This is a test line with many characters" | head -c 12
This is a te

$ echo "This is a test line with many characters" | tail -c 12
 characters


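Here, the -c argument is a byte count, which equals the character count for plain ASCII text. Note that tail -c 12 shows only 11 visible characters, because the newline that echo adds at the end counts as one byte. As the head help says,

  -c, --bytes=[-]NUM       print the first NUM bytes of each file;
                             with the leading '-', print all but the
                             last NUM bytes of each file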


Note: The head and tail commands do not accept a range, so you can't directly ask for lines n1 to n2, say the 5th to the 10th line. You can still do it by combining the tail and head commands. We will see that later.

TIP:

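One of the most popular uses of the tail command is to monitor changes to a file, for example a log file that records each activity in a piece of software or on a website.

$ tail -f log.txt

The -f option stands for "follow": tail keeps the file open and prints new lines as they are appended, until you stop it with Ctrl+C.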

Problem: Display the lines from the given file between line 10 and 15.

Solution: head gives us lines from the first line, i.e. 1 to n, and tail gives us n lines counting from the bottom. The expected answer is 11, 12, 13, 14 because, as I said, I want the numbers IN BETWEEN 10 and 15.

In plain words, this might be our plan.


  • First we get all the numbers up to (15 - 1), i.e. 1 to 14.
  • Then we drop 1 to 10 from this by keeping only the last 14 - 10 = 4 lines, and the rest is our answer.

$ head -n 14 numbers.txt | tail -n 4
11
12
13
14
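
The head-then-tail recipe works for any inclusive range of lines. Below is a minimal sketch of that idea. The function name line_range is my own hypothetical choice, seq is assumed to be available to build the sample file, and the sed command at the end is a different tool that does the same job, not a feature of head or tail.

```shell
# line_range FILE START END: print lines START..END (inclusive) of FILE.
# The name line_range is hypothetical, chosen only for this illustration.
line_range() {
    local file=$1 start=$2 end=$3
    # Keep everything up to END, then keep only the last (END - START + 1) lines.
    head -n "$end" "$file" | tail -n "$((end - start + 1))"
}

# Build the same 20-line file used in this post (assumes seq is installed).
numbers=$(mktemp)
seq 1 20 > "$numbers"

between=$(line_range "$numbers" 11 14)
echo "$between"

# The same range using sed instead of head and tail.
with_sed=$(sed -n '11,14p' "$numbers")
echo "$with_sed"
```

Both commands should print the same four lines. The head and tail version reads naturally once you know the two commands, while the sed version states the range directly.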

Learning BASH: Text Processing - Cut Command


Text processing in Bash is a huge topic, so we will take it one command at a time.


CUT COMMAND


You might think cut means moving a file from location A to location B. But as the link here says, the cut command in Unix (or Linux) is used to select sections of text from each line of a file. You can use it to select fields or columns from a line by specifying a delimiter, or you can select a portion of text by specifying a range of characters. Basically, the cut command slices a line and extracts the text.

The help text of the cut command on Linux itself says:



Print selected parts of lines from each FILE to standard output.

I created a text file named foo.txt (I am on Windows running Cygwin) and added a few lines.

This is the first line
This is the second
And this is not the last line
Finally we end
Good Bye

The linux help says:


  N     N'th byte, character or field, counted from 1
  N-    from N'th byte, character or field, to end of line
  N-M   from N'th to M'th (included) byte, character or field
  -M    from first to M'th (included) byte, character or field


Problem: Give me the first letter of every line.

Solution:


$ cut -c1 foo.txt
T
T
A
F
G

Analysis: -c1 means character position 1 (column one). That is the "N'th byte, character or field" form from the help text, with N = 1.
Note: Column numbering starts from 1, NOT zero (0).

Problem: Show me the first three characters of each line.
Soln:


$ cut -c1-3 foo.txt
Thi
Thi
And
Fin
Goo

$ cut -c-3 foo.txt
Thi
Thi
And
Fin
Goo

There are two ways to do it. There may be more, but these are the easiest ways, I suppose.
You can specify a RANGE. The two examples above use the N-M form and the -M form respectively.
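
The help text also lists the open-ended N- form, and cut accepts a comma-separated list of positions and ranges. Here is a small sketch of both, using the first line of the sample file as input. The variable names are just placeholders.

```shell
line="This is the first line"

# N- : from the 6th character to the end of the line.
from_sixth=$(echo "$line" | cut -c6-)
echo "$from_sixth"

# A comma-separated list mixes single positions and ranges.
picked=$(echo "$line" | cut -c1,6-7)
echo "$picked"
```

The first command drops the leading "This " and the second keeps only characters 1, 6 and 7.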

Problem: Get the 3rd character of each line in a file. The file name is given as input by the user.

Solution:

read -r file
cut -c3 "$file"

Note that there are other ways to do this. If the text itself arrives on standard input, no file name is needed:

cut -c3

Note: cut reads from standard input if the file argument is "-" or absent.


Using Delimiters



The -d option of the cut command specifies the delimiter, and the -f option specifies which field positions to print.

$ cut -d$' ' -f-3 foo.txt
This is the
This is the
And this is
Finally we end
Good Bye

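Note: The -d option needs a delimiter to be specified, and -f selects the field positions. Here I have asked for the first to the third field.

In the above example, my delimiter is a single space, so cut prints everything up to (but not including) the 3rd occurrence of a space. Lines with fewer than three fields, such as "Good Bye", are printed whole.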

Another example:

$ cat foo.txt
Hi:I:Am:Groot

$ cut -d$':' -f1-3 foo.txt
Hi:I:Am

Problem: Given a sentence, identify and display its fourth word. Assume that the space (' ') is the only delimiter between words.

Solution:


cut -d$' ' -f4

The same works for the colon-delimited example:


$ cat foo.txt
Hi:I:Am:Groot

$ cut -d$':' -f4 foo.txt
Groot
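
Two details of -f are easy to miss: a comma-separated list can pick fields that are not next to each other, and a line that does not contain the delimiter at all is printed unchanged unless -s is given. A short sketch of both, where the second input string is made up for illustration:

```shell
record="Hi:I:Am:Groot"

# A comma-separated list picks non-adjacent fields; the delimiter is kept between them.
first_and_last=$(echo "$record" | cut -d: -f1,4)
echo "$first_and_last"

# A line without the delimiter is printed unchanged by default...
passed_through=$(echo "no colon here" | cut -d: -f2)
echo "$passed_through"

# ...unless -s is given, which suppresses such lines.
suppressed=$(echo "no colon here" | cut -d: -s -f2)
echo "[$suppressed]"
```

The last echo prints empty brackets, which shows that -s removed the line entirely.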


Problem: Given a tab-delimited file with several columns (TSV format), print the fields from the second field to the last field.

Solution:

cut -f2- 

Note: The default delimiter is tab, so you DON'T need to specify a delimiter at all when the input is tab-delimited.
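
To try this without a real TSV file, the sketch below builds a tiny one with printf. The column names and values are made up for illustration.

```shell
# Build a small tab-separated sample (the data is hypothetical).
tsv=$(mktemp)
printf 'id\tname\tcity\n1\tGroot\tKnowhere\n' > "$tsv"

# No -d needed: cut splits on tabs by default.
rest=$(cut -f2- "$tsv")
echo "$rest"
```

The first column is dropped from every line and the remaining columns stay separated by tabs.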

__________________________________________________________________________________________________

Reference: I took the problems from my favorite code competition site, HackerRank. Visit this link to practice more problems. Solve the first 9 problems, which are based on the cut command for Bash. Best of luck.

Learn BASH with me in 5 mins

I started learning Linux Bash today. My first impression is that it is a language with all the basic capabilities of a young high-level language. Maybe I am right, maybe I am wrong. Time will tell. We will keep going and keep discovering gradually. Let's start with the usual protocol of learning a language.

The HELLO WORLD program.


How do we print things in the shell? This is the first thing everyone wants to know while learning any language.

Anything that is not a variable is printed literally, and we print/echo it using the famous echo command.

$ echo hello world
hello world


Printing a number.



$ echo 1
1

Printing a string with double quotes

$ echo "my name is arindam"
my name is arindam

Printing a string with single quotes

$ echo 'my name is Arindam'
my name is Arindam

Printing a number with quotes

$ echo '1'
1
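
Single and double quotes look interchangeable in the examples above, but they differ once a variable is involved. Variables are introduced in the next section; in this small sketch, name is just a placeholder.

```shell
name=arindam

# Double quotes: variables inside are expanded.
double="my name is $name"
echo "$double"

# Single quotes: everything is taken literally, including the $ sign.
single='my name is $name'
echo "$single"
```

The first echo prints the name, and the second prints the text $name exactly as typed.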

Creating  Variables and recalling them.


So how can we store things? How do we recall a stored value, and how do we change it?

X=999

Note: There should be no spaces around the assignment operator (=). Also, the assignment statement prints nothing when it is executed.

$ X=999

$ echo $X
999

$ $X
bash: 999: command not found

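A simple = sign works great for assigning values, but the rule about spaces is important: with spaces around it (X = 999), bash treats X as a command name and you get an error.

To recall the value inside a variable, use the $ sign.
If you don't use the echo command and try to print the value with just the $ sign (people coming from languages like Python would understand why someone would try such a thing), bash expands the variable and tries to run its value as a command, which is why we got "command not found" above.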

Saving Strings in variable



$ X=arin

$ echo $X
arin

$ x="hi world"

$ x=hi world
bash: world: command not found

You can store a single word without spaces without using quotes. But if the value contains spaces, you need to use quotes. Otherwise bash reads "x=hi world" as a command named world, to be run with x=hi set in its environment. Obviously this doesn't work.
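
Quotes matter when reading the variable back as well. The sketch below stores a value with several spaces in it and shows what happens with and without quotes around $x. The value itself is made up.

```shell
x="hi    world"   # four spaces between the words

# Unquoted: bash splits the value into words, and echo rejoins them with one space.
unquoted=$(echo $x)
echo "$unquoted"

# Quoted: the value is passed through exactly as stored.
quoted=$(echo "$x")
echo "$quoted"
```

This is why writing "$x" with double quotes is the safer habit whenever the value may contain spaces.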

Dynamically changing value


$ X=99

$ echo $X
99

$ echo $((X+1))
100

What happened here? I wanted to use the variable X and get an incremented value of it.
You need to use double parentheses, $((...)), in these cases. Note that this does not change the value stored in X.
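
To actually change X, the result has to be assigned back. A minimal sketch, assuming X starts at 99 as in the example:

```shell
X=99

# $((...)) only computes a value; X is unchanged afterwards.
echo $((X + 1))
echo "$X"

# Assign the result back to actually change X.
X=$((X + 1))
echo "$X"

# Bash also accepts assignment operators inside ((...)).
((X += 1))
echo "$X"
```

The ((...)) form without the leading $ is a bash feature, so it may not work in other shells.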

Using Bash as a calculator



$ echo $((X*2))+$X
198+99

$ echo $(($((X*2))+$X))
297

This is probably overkill, but if you need to do it, this is how you can.

From the first command you can observe that bash evaluates each $((...)) section separately and substitutes its value in place; the + between them is just text. It is like string interpolation.


The second command does the job because we asked bash to evaluate the whole equation using $((equation)).
I think the nesting looks messy and risky but, for example's sake, it works. The inner $((...)) is not required: $((X*2+X)) gives the same 297.

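Another way is to use the expr command. Whatever follows it becomes the expression to be evaluated. Each operand and operator must be a separate argument, and * has to be escaped so that the shell does not expand it as a filename wildcard.

$ echo $(expr 5 + 5)
10

$ n=4
$ echo $(expr 4 \* $n)
16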
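
For comparison, here is the same multiplication with $((...)) instead of expr, plus a quick look at division. This is only a sketch, and the numbers 7 and 2 are arbitrary.

```shell
n=4

# Inside $((...)) the * needs no escaping and variables need no $ sign.
product=$((4 * n))
echo "$product"

# Arithmetic is integer-only: division truncates, % gives the remainder.
quotient=$((7 / 2))
remainder=$((7 % 2))
echo "$quotient remainder $remainder"
```

Because the shell works with integers only, anything that needs decimals has to go through an external tool.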


Iterations and LOOPS


Everybody loves loops. Iterations are a part of every language. Bash provides the omnipresent FOR loop and WHILE loop.


$ for num in 1 2 3
> do
> echo $num
> done
1
2
3

So here, we looped on a list of numbers.


X=1
while [ $X -le 99 ]
do
    echo $X
    X=$((X+2))
done

Here, I have printed the numbers from 1 to 99, but only the odd ones.


Accepting input


Another common feature of any language is accepting input from the user. We have the 'read' command for it.


read name
echo Welcome $name
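
read can also fill several variables at once, splitting the input line on whitespace. In the sketch below a here-document stands in for the keyboard so the example runs unattended. The variable names and the input are made up, and -r tells read not to treat backslashes specially.

```shell
# read normally takes input from the keyboard; here a here-document
# plays the part of the user. The input line is hypothetical.
read -r first last <<EOF
Arindam Example
EOF

greeting="Welcome $first ($last)"
echo "$greeting"
```

If the line has more words than variables, the last variable receives everything that is left.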

Ranges

My reference: http://www.cyberciti.biz/faq/bash-for-loop/

How do we deal with ranges? Bash has a syntax for that: {start..end}


for num in {1..50}
do
echo $num
done

Now, what if the end limit of your range is inside a variable? You might think, I'll just do {1..$N}. Sorry, that doesn't work, because bash performs brace expansion before it substitutes variables. There is a better way to do this. If you know C syntax, then you must be familiar with it.

n=4

$ for ((i=1;i<=n;i++)); do echo $i; done
1
2
3
4
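
To see what {1..$n} really produces, and to sketch one more option, the snippet below prints the failed expansion and then loops with seq. seq is an external command that I am assuming is installed; it is not part of bash itself.

```shell
n=4

# The braces are left as they are and only $n is replaced, so no range is generated.
literal=$(echo {1..$n})
echo "$literal"

# seq is an ordinary command, so it accepts variables.
from_seq=$(for i in $(seq 1 "$n"); do echo "$i"; done)
echo "$from_seq"
```

The C-style loop stays the better choice inside bash since it needs no external command.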


Ranges with Step


We want to add a step value. We can do it as {start..end..step} (this needs bash version 4 or newer).

$ for num in {1..10..2}; do echo $num; done
1
3
5
7
9
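
When the start, end or step lives in a variable, the C-style loop can carry the step too. A minimal sketch with hypothetical variable names and the same values as the example above:

```shell
start=1 end=10 step=2   # hypothetical names, same values as the brace example

# The C-style loop reads the step from a variable.
odds=$(for ((i = start; i <= end; i += step)); do echo "$i"; done)
echo "$odds"
```

This prints the same five odd numbers as the brace version.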

If condition with comparison operators



if [ "$A" -gt "$B" ]
then
    echo $((A - B))
else
    echo $((B - A))
fi

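There are many operators available. For integers, the test operators are -eq (equal), -ne (not equal), -gt (greater than), -ge (greater than or equal), -lt (less than) and -le (less than or equal).


For string comparisons, the operators are different: = (or == in bash) for equal, != for not equal, -z for an empty string and -n for a non-empty string.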


Example:


$ if [ 'Y' == 'Y' ]; then echo YES; else echo NO; fi
YES

$ if [ 'N' == 'Y' ]; then echo YES; else echo NO; fi
NO

Multiple conditions inside IF


There might be a situation where two or more possibilities should make the if or else part true.

read D
if [ "$D" == 'Y' -o "$D" == 'y' ]; then echo YES; else echo NO; fi

Here, -o stands for OR. || also works for the OR operation, but the syntax changes slightly. The quotes around $D keep the test valid even when the input is empty.

read D
if [ "$D" == 'Y' ] || [ "$D" == 'y' ]; then echo YES; else echo NO; fi

Also note that -a is used for the AND operation. && can be used as well.
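
When one value is compared against several fixed alternatives, a case statement is a common alternative to chaining tests with -o or ||. This is a different technique from the if above, sketched here with a hypothetical function name so it can be tried without typing input.

```shell
# yes_or_no ANSWER: print YES for Y or y, NO for anything else.
# The function name is hypothetical, used only for this sketch.
yes_or_no() {
    case "$1" in
        Y|y) echo YES ;;
        *)   echo NO ;;
    esac
}

yes_or_no Y
yes_or_no y
yes_or_no ""   # an empty answer is handled without an error
```

Each pattern before the ) can list several alternatives separated by |, and * catches everything else.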

Problem:

Find out if a triangle is scalene, equilateral or isosceles, given the sides of the triangle a, b, c.


Sol:


read a
read b
read c

if [ "$a" -eq "$b" -a "$a" -eq "$c" ]; then
    echo EQUILATERAL
elif [ "$a" -eq "$b" ] || [ "$a" -eq "$c" ] || [ "$b" -eq "$c" ]; then
    echo ISOSCELES
else
    echo SCALENE
fi
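
To check the logic without typing the sides by hand each time, the same comparisons can be wrapped in a function. This is only a sketch: the function name and the sample sides are hypothetical, and like the solution above it does not verify that the three sides can actually form a triangle.

```shell
# classify_triangle A B C: print the triangle type for three integer sides.
# The function name and the sample sides are hypothetical.
classify_triangle() {
    local a=$1 b=$2 c=$3
    if [ "$a" -eq "$b" ] && [ "$a" -eq "$c" ]; then
        echo EQUILATERAL
    elif [ "$a" -eq "$b" ] || [ "$a" -eq "$c" ] || [ "$b" -eq "$c" ]; then
        echo ISOSCELES
    else
        echo SCALENE
    fi
}

classify_triangle 3 3 3
classify_triangle 3 4 3
classify_triangle 3 4 5
```

The three calls cover one case for each branch of the if.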


Problem:


Find the average of a list of numbers, accurate to 3 decimal places.


Input Format
The first line contains an integer, N.
N lines follow, each containing a single integer.
Output Format
Display the average of the integers, rounded off to three decimal places.

Soln:



read N
s=0
for ((i=1; i<=N; i++))
do
    read temp
    s=$((s + temp))
done
printf "%.3f" "$(echo "$s / $N" | bc -l)"
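
If bc is not installed, awk can do both the summing and the decimal formatting. The sketch below is a swapped-in alternative, not the solution above: the function name average is hypothetical, and the input is fed through printf so the example runs without typing.

```shell
# average: read N on the first line, then N integers, and print their mean
# to three decimal places. The function name is hypothetical; awk replaces
# both the read loop and bc here.
average() {
    awk 'NR == 1 { n = $1; next }
         NR <= n + 1 { sum += $1 }
         END { printf "%.3f\n", sum / n }'
}

result=$(printf '4\n1\n2\n9\n8\n' | average)
echo "$result"
```

The sample input has N = 4 and the numbers 1, 2, 9 and 8, whose sum is 20.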
