Process a file line by line in a Linux Bash script



It's pretty easy to read the contents of a Linux text file line by line in a shell script, as long as you deal with a few subtle pitfalls. Here's how to do it safely.

Files, text and idioms

Every programming language has a set of idioms. These are the standard, no-frills ways to accomplish a set of common tasks. They are the elementary or default way to use one of the features of the language the programmer is working with. They become part of a programmer's toolkit of mental blueprints.

Actions such as reading data from files, working with loops, and swapping the values of two variables are good examples. The programmer will know at least one way to achieve each of these in a generic or vanilla fashion. Perhaps that is sufficient for the requirement at hand. Or maybe they embellish the code to make it more efficient or better suited to the specific solution they are developing. But having the building-block idiom at your fingertips is a great starting point.

Knowing and understanding idioms in one language also makes it easier to pick up a new programming language. Knowing how things are constructed in one language and looking for the equivalent, or the nearest thing, in another is a good way to appreciate the similarities and differences between the languages you already know and the one you are learning.

Reading Lines From A File: The One-Liner

In Bash, you can use a while loop on the command line to read each line of text from a file and do something with it. Our text file is called "data.txt." It contains a list of the months of the year.

January
February
March
.
.
October
November
December

Our simple one-liner is:

while read line; do echo $line; done < data.txt

The while loop reads a line from the file, and the execution flow of the little program passes to the body of the loop. The echo command writes the line of text to the terminal window. When there are no more lines to read, the read attempt fails and the loop ends.
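
The way the loop ends can be observed directly by checking the exit status of read. In this sketch, which uses a one-line here-string rather than a file and variable names of our own choosing, the first read succeeds and the second one reaches the end of the input.

```shell
#!/bin/bash

# Read from a one-line input twice; the second read hits end of input.
{
    read -r first;  first_status=$?
    read -r second; second_status=$?
} <<< 'January'

# A status of zero means success; anything else means the read failed.
echo "first read: status $first_status, second read: status $second_status"
```

A while loop keeps going only while its condition returns a status of zero, so the non-zero status from the second read is what would stop the loop.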

One neat trick is the ability to redirect a file into a loop. In other programming languages, you would need to open the file, read from it, and close it again when you had finished. With Bash, you can simply use file redirection and let the shell handle all of that low-level work for you.

Of course, this one-liner isn't terribly useful. Linux already provides the cat command, which does exactly that for us. We've created a long-winded way to replace a three-letter command. But it does visibly demonstrate the principles of reading from a file.

That works well enough, up to a point. Suppose we have another text file that contains the names of the months. In this file, the escape sequence for a newline character, "\n", has been appended to each line. We'll call it "data2.txt."

January\n
February\n
March\n
.
.
October\n
November\n
December\n

Let's use our one-liner for our new file.

while read line; do echo $line; done < data2.txt

The backslash escape character "\" is discarded. As a result, an "n" is left at the end of each line. Bash is interpreting the backslash as the start of an escape sequence. Often, we don't want Bash to interpret what it is reading. It can be more convenient to read a line in its entirety, backslash escape sequences and all, and choose what to parse out or replace yourself, within your own code.
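
To see the effect in isolation, here is a minimal sketch that reads the same text twice, once with plain read and once with the -r option that our script uses further down. The sample string and the variable names line_plain and line_raw are ours for illustration and don't appear in the scripts in this article.

```shell
#!/bin/bash

# Sample input: the word January followed by a literal backslash and an n.
sample='January\n'

# Without -r, read treats the backslash as an escape character and drops it.
read line_plain <<< "$sample"

# With -r, the backslash is kept as an ordinary character.
read -r line_raw <<< "$sample"

printf '%s\n' "without -r: $line_plain"
printf '%s\n' "with -r:    $line_raw"
```

Run as written, the first line of output should end in "Januaryn" and the second in "January\n".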

If we want to perform any meaningful processing or parsing of the text lines, we need a script.

Read lines from a file with a script

Here's our script. It's called "script1.sh."

#!/bin/bash

Counter=0

while IFS='' read -r LinefromFile || [[ -n "${LinefromFile}" ]]; do

    ((Counter++))
    echo "Accessing line $Counter: ${LinefromFile}"

done < "$1"

We set a variable called Counter to zero, then we define our while loop.

The first statement on the while line is IFS='' . IFS stands for internal field separator. It holds the characters that Bash uses to identify word boundaries. By default, the read command strips off leading and trailing whitespace. If we want to read the lines from the file exactly as they are, we need to set IFS to be an empty string.
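
The effect of IFS on leading and trailing whitespace can be checked with a short sketch like this one. The padded sample string and the variable names trimmed and exact are assumptions made for the example.

```shell
#!/bin/bash

# Sample line with two leading and two trailing spaces.
padded='  March  '

# Default IFS: read strips the leading and trailing whitespace.
read -r trimmed <<< "$padded"

# Empty IFS for this one read: the line is kept exactly as written.
IFS='' read -r exact <<< "$padded"

# The square brackets make any surrounding spaces visible.
printf '[%s]\n' "$trimmed" "$exact"
```

The first bracketed value should show the word on its own, and the second should still have its surrounding spaces.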

We could set this once outside of the loop, just like we set the value of Counter . But with more complex scripts, especially those with many user-defined functions in them, it is possible that IFS could be set to different values elsewhere in the script. Setting IFS each time the while loop iterates guarantees that we know what its behavior will be.
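
One point worth knowing here: when the assignment is written as a prefix on the same line as read, Bash applies it to that read command only, so whatever value IFS had elsewhere in the script is left alone. The sketch below illustrates this. The colon value for IFS is a hypothetical stand-in for a setting made somewhere else in a larger script.

```shell
#!/bin/bash

# Hypothetical value set elsewhere in a larger script.
IFS=':'

# The assignment is a prefix to read, so it applies to that command only.
IFS='' read -r first_line <<< '  a:b  '

# IFS still holds the earlier value once read has finished.
printf 'line=[%s] IFS=[%s]\n' "$first_line" "$IFS"
```

The line should come through untouched, and IFS should still be the colon that was set at the top.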

We're going to read a line of text into a variable called LinefromFile . We use the -r (read backslash as a normal character) option so that backslashes are not interpreted. They are treated just like any other character and don't receive any special treatment.

There are two conditions that will satisfy the while loop and allow the text to be processed by the body of the loop:

  • read -r LinefromFile : When a line of text is successfully read from the file, the read command sends a success signal to the while , and the while loop passes the execution flow to the body of the loop. Note that the read command needs to see a newline character at the end of the line of text to consider it a successful read. If the file is not a POSIX-compliant text file, the last line may not include a newline character. If the read command sees the end-of-file marker (EOF) before the line is terminated by a newline, it will not treat it as a successful read. If that happens, the last line of text will not be passed to the body of the loop and will not be processed.
  • [[ -n "${LinefromFile}" ]] : We need to do some extra work to handle non-POSIX-compliant files. This comparison checks the text that has been read from the file. If that text is not empty, the comparison returns success to the while loop even though the line was not terminated with a newline character. This ensures that any trailing fragment on the last line is processed by the body of the loop.

These two clauses are separated by the logical OR operator, "||", so that if either clause returns success, the retrieved text is processed by the body of the loop, whether there is a newline character or not.
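
The difference the second clause makes can be seen with a small experiment. This is only a sketch: it builds a throwaway two-line file with mktemp whose last line has no newline, then counts how many lines each form of the loop processes. The file contents and the variable names count_plain and count_safe are ours.

```shell
#!/bin/bash

# Build a two-line sample file whose last line has no trailing newline.
# The temporary file and its contents are illustrative only.
sample_file=$(mktemp)
printf 'November\nDecember' > "$sample_file"

# read alone: the unterminated last line is dropped.
count_plain=0
while IFS='' read -r line; do
    count_plain=$((count_plain + 1))
done < "$sample_file"

# read plus the -n test: the unterminated last line is still processed.
count_safe=0
while IFS='' read -r line || [[ -n "$line" ]]; do
    count_safe=$((count_safe + 1))
done < "$sample_file"

echo "read only: $count_plain lines, with -n test: $count_safe lines"

# Tidy up the temporary file.
rm -f "$sample_file"
```

With this sample file, the first loop should count one line and the second should count two.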

In the body of our loop we increase the Counter variable by one and use echo to send some output to the terminal window. The line number and text of each line are displayed.

We can still use our redirection trick to redirect a file into a loop. In this case, we're redirecting $1, a variable that holds the first command line parameter passed to the script. Using this trick, we can easily pass in the name of the data file we want the script to work on.

Copy and paste the script into an editor and save it with the file name 'script1.sh'. Use the chmod command to make it executable.

chmod +x script1.sh

Let's see what our script makes of the data2.txt text file and the backslashes in it.

./script1.sh data2.txt

Every character on the line is represented verbatim. The backslashes are not interpreted as escape characters. They are printed as regular characters.

Pass the line to a function

We're still just echoing the text to the screen. In a real-world programming scenario, we'd likely be about to do something more interesting with the line of text. In most cases, it is good programming practice to handle the further processing of the line in a separate function.

Here's how we could do it. This is "script2.sh."

#!/bin/bash

Counter=0

function process_line() {

    echo "Processing line $Counter: $1"

}

while IFS='' read -r LinefromFile || [[ -n "${LinefromFile}" ]]; do

    ((Counter++))
    process_line "$LinefromFile"

done < "$1"

We define our Counter variable as before, and then we define a function called process_line() . The definition of a function should appear before the function is first called in the script.

Our function is passed the newly read line of text in each iteration of the while loop. We can access that value within the function by using the $1 variable. If two variables were passed to the function, we could access those values with $1 and $2 , and so on for more variables.

The while loop is largely the same. There is only one change in the body of the loop. The echo line has been replaced by a call to the process_line() function. Note that you don't need to use the "()" parentheses in the name of the function when you call it.

The name of the variable holding the line of text, LinefromFile , is wrapped in quotation marks when it is passed to the function. This caters for lines that have spaces in them. Without the quotation marks, the first word is treated as $1 by the function, the second word is considered to be $2 , and so on. Using quotation marks ensures that the entire line of text is treated, altogether, as $1. Note that this is not the same $1 that holds the name of the data file passed to the script.
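
The effect of the quotation marks is easy to check with a sketch like the one below. The count_args function is a hypothetical helper written for this example. It simply reports how many arguments it was given, and the sample text is ours.

```shell
#!/bin/bash

# Hypothetical helper that reports how many arguments it received.
function count_args() {
    echo "$#"
}

LinefromFile='More text at the end'

# Quoted: the whole line arrives as a single argument, $1.
quoted_count=$(count_args "$LinefromFile")

# Unquoted: the shell splits the line into words, one argument each.
unquoted_count=$(count_args $LinefromFile)

echo "quoted: $quoted_count argument(s), unquoted: $unquoted_count argument(s)"
```

With this five-word sample, the quoted call should report one argument and the unquoted call should report five.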

Because Counter has been declared in the main body of the script and not inside a function, it can be referenced inside the process_line() function.
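
As a rough illustration of that scoping, the sketch below uses a hypothetical bump_counter function that changes the script-level Counter while keeping a variable of its own private with the local keyword. The function and the note variable are assumptions made for the example and are not part of script2.sh.

```shell
#!/bin/bash

Counter=0

# Hypothetical function: it reads and changes the script-level Counter,
# and keeps its own private variable with the local keyword.
function bump_counter() {
    local note="inside function"
    Counter=$((Counter + 1))
}

bump_counter
bump_counter

# Counter was changed by the function; note does not exist out here.
echo "Counter=$Counter note=[${note:-}]"
```

After the two calls, Counter should be 2 and the brackets should be empty, because note only existed while the function was running.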

Copy or type the above script into an editor and save it with the file name 'script2.sh'. Make it executable with chmod :

chmod +x script2.sh

Now we can run it and pass in a new data file, "data3.txt." This has a list of the months in it, and one line with many words on it.

January
February
March
.
.
October
November \nMore text "at the end of the line"
December

Our command is:

./script2.sh data3.txt

The lines are read from the file and passed one by one to the process_line() function. All the lines are displayed correctly, including the odd one with the backslash, quotation marks, and multiple words in it.

Building blocks are useful

There is a school of thought that says an idiom must contain something unique to that language. That's not a belief I subscribe to. What's important is that it makes good use of the language, is easy to remember, and provides a reliable and robust way to implement certain functionality in your code.



