Why while read Sometimes Eats Your Variables

by Anton Van Assche - 12 min read

If you have ever written a shell script that loops over lines of input, you may have stumbled on a surprising behavior. Consider the following code:

count=0
echo -e "a\nb\nc" | while read line; do
  ((count++))
done
echo "Count: $count"

You might expect it to print Count: 3. Instead, it prints:

Count: 0

The loop clearly executed three times, so why did the variable never change? The answer lies in how Bash implements pipelines and subshells.

Pipelines and Subshells

A pipeline in Bash, like producer | consumer, is not a single process. Bash must set up a pipe (a unidirectional data channel in the kernel) and then run each command in the pipeline in a separate process. That way, data flows between processes without blocking the parent shell.

You can picture it like this:

Parent shell (bash)
   |
   |-- fork() -> producer (echo)
   |        writes "a\nb\nc" to pipe
   |
   `-- fork() -> consumer (while read loop)
            reads from pipe
            increments $count (in child only)

Variables in Bash are scoped to the process. The count you incremented lives in the subshell created for the consumer. When the subshell exits, its memory is destroyed. The parent shell's count was never touched, which is why the final echo shows 0.
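This per-process scoping is easy to see even without a pipeline. An explicit subshell, written with parentheses, gets its own copy of every variable (a minimal sketch):

```shell
#!/usr/bin/env bash

count=0

# The parentheses force a subshell: the child process gets a copy of
# count, increments that copy, and discards it when it exits.
( ((count++)); echo "Inside subshell: $count" )

echo "In parent: $count"
# -> Inside subshell: 1
# -> In parent: 0
```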

We can visualize what happens in memory:

Parent Shell Memory
-------------------
count = 0
ENV_VAR = "original"

Pipeline forks -> Subshell (while loop)
-------------------
count = 0 -> 3   <- incremented independently (the subshell's own copy)
ENV_VAR = "original"

After subshell exits:
-------------------
Parent shell memory unchanged:
count = 0
ENV_VAR = "original"

Together, the two views show both the process-level separation (the PID diagram) and the variable-level effect (the memory diagram), which makes it much easier to see why the loop never changes the parent shell's variables.

Why Bash Does This

You might wonder why Bash doesn't just run the last command of a pipeline in the current shell. The reason is consistency. The POSIX specification allows the shell to decide whether each part of a pipeline runs in a subshell or not, but historically most shells fork each stage to avoid tricky edge cases where builtins would otherwise block the pipeline.

"Each command in a multi-command pipeline, where pipes are created, is executed in a subshell, which is a separate process." (The Bash Manual)

The side effect is that anything you do in that last stage - setting variables, changing directories, or modifying shell options - disappears when the subshell exits.

Here is a simplified ASCII view of what happens:

Without pipeline:
bash (pid 1000)
   |
   `-- runs while-loop directly
       updates $count in pid 1000

With pipeline:
bash (pid 1000)
   |
   |-- fork() -> pid 1001 (echo)
   |
   `-- fork() -> pid 1002 (while loop)
                 updates $count in pid 1002
                 exits, state lost
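The same is true for any per-process state, including shell options. For example (using nullglob, chosen arbitrarily for the demonstration), an option enabled inside a subshell never reaches the parent:

```shell
#!/usr/bin/env bash

shopt -u nullglob            # start with the option disabled

# Enable it inside an explicit subshell...
( shopt -s nullglob )

# ...and the parent shell never sees the change.
shopt -q nullglob && echo "nullglob: on" || echo "nullglob: off"
# -> nullglob: off
```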

Proving the Subshell

We can actually visualize this behavior by printing the process ID of each stage:

echo "Parent PID: $BASHPID"

echo -e "a\nb\nc" | while read line; do
    echo "Loop PID: $BASHPID"
done

Here we use the BASHPID variable, which holds the PID of the current Bash process. The output will look something like this:

Parent PID: 1219125
Loop PID: 1225380
Loop PID: 1225380
Loop PID: 1225380

Notice that the Loop PID is different from the Parent PID, confirming that the loop runs in a separate subshell process. Each iteration of the loop shows the same PID because it is the same subshell instance handling all reads from the pipe.

In the example above we used $BASHPID instead of the older $$ because $$ always holds the PID of the original (parent) shell, not the current subshell. Had we used $$, the output would simply have been 1219125 four times, which is misleading.

Another way to see the subshell effect is to modify the environment, e.g. by changing a variable or changing directories, and observe that the change does not persist after the loop.

count=0
echo -e "a\nb\nc" | while read line; do
    ((count++))
done
echo "Count: $count"        # prints 0, variable change lost

echo "Initial directory: $(pwd)"
echo -e "/tmp\n/home" | while read dir; do
    cd "$dir"
    echo "Inside loop: $(pwd)"
done
echo "After loop: $(pwd)"   # parent shell directory unchanged

When we execute this, we see that the count remains 0, and the directory after the loop is the same as before, confirming that changes inside the loop do not affect the parent shell.

Resulting in output like:

Count: 0
Initial directory: /home/anton
Inside loop: /tmp
Inside loop: /home
After loop: /home/anton

Both examples clearly demonstrate that the loop runs in a subshell, and each subshell has its own separate memory and environment.

Using {...} vs. (...) Groups

Bash supports two types of command grouping: curly braces and parentheses. Each has different implications for variable scope and subshell behavior:

  • Curly braces { …; } will cause the commands inside the grouping to run in the current shell. Any variable changes or environment modifications will persist after the group completes.
  • Parentheses ( … ) will run the commands inside a subshell. Variable changes and environment modifications will not persist after the subshell exits.
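Outside a pipeline, the difference between the two groupings is easy to observe:

```shell
#!/usr/bin/env bash

count=0

# Braces: the group runs in the current shell, so the change persists.
{ count=1; }
echo "After braces: $count"
# -> After braces: 1

# Parentheses: the group runs in a subshell, so the change is discarded.
( count=2 )
echo "After parens: $count"
# -> After parens: 1
```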

An important caveat when using curly braces inside a pipeline is that they do not keep that stage of the pipeline out of a subshell. The grouped commands still run in a subshell; the braces only control how the commands are grouped within it.

For example:

count=0
echo -e "a\nb\nc" | { while read line; do ((count++)); done; }
echo "Count: $count"  # still prints 0, because the grouped loop still runs in a subshell

Using parentheses inside a pipeline just nests a subshell within a subshell:

count=0
echo -e "a\nb\nc" | ( while read line; do ((count++)); done )
echo "Count: $count"  # still prints 0, the loop just runs in a nested subshell

This makes it clear that grouping alone does not overcome the subshell behavior of pipelines.

Workarounds

There are a few ways to avoid this behavior, depending on your Bash version, coding style, and requirements.

Redirect input into the loop

Instead of piping into the loop, feed it input via process substitution. This way the loop runs in the parent shell:

count=0
while read line; do
  ((count++))
done < <(echo -e "a\nb\nc")

echo "Count: $count"
# -> Count: 3

Here, <(...) is process substitution: Bash runs the command and exposes its output under a file name (such as /dev/fd/63), and the leading < simply redirects that file into the loop. Because the loop's input is an ordinary redirection rather than a pipeline, the while runs in the parent shell.
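A related option, when the input already lives in a variable or a literal string, is a here-string (<<<), which also feeds the loop through a redirection rather than a pipe, so the loop stays in the current shell:

```shell
#!/usr/bin/env bash

count=0

# The here-string becomes the loop's stdin; no pipeline, no fork,
# so count is incremented in the current shell.
while read -r line; do
  ((count++))
done <<< $'a\nb\nc'

echo "Count: $count"
# -> Count: 3
```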

Rethink the design

Sometimes, you don't need to maintain state in a loop at all. For example, if you only need to count lines, a utility like wc can do it directly:

count=$(echo -e "a\nb\nc" | wc -l)
echo "Count: $count"
# -> Count: 3

Another modern approach (since Bash 4.0) is to use mapfile (or readarray) to read all lines into an array, which runs in the current shell and avoids subshell issues:

mapfile -t lines < <(echo -e "a\nb\nc")
count=${#lines[@]}
echo "Count: $count"
# -> Count: 3

Both wc -l and mapfile avoid the need for a subshell entirely. They are often faster and express the intent more clearly.
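When you need more than a count, the array filled by mapfile can then be processed with a plain for loop, which also runs in the current shell, so any state you build up survives:

```shell
#!/usr/bin/env bash

mapfile -t lines < <(echo -e "a\nb\nc")

joined=""
for line in "${lines[@]}"; do
  joined+="$line,"           # state accumulated in the current shell
done

echo "Joined: ${joined%,}"
# -> Joined: a,b,c
```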

Use Bash's lastpipe option

Since Bash 4.2, you can enable the lastpipe shell option, which tells Bash to run the last command of a pipeline in the current shell rather than in a subshell. Note that it only takes effect when job control is off, which is the default in scripts but not in interactive shells.

shopt -s lastpipe

count=0
echo -e "a\nb\nc" | while read line; do
  ((count++))
done

echo "Count: $count"
# -> Count: 3

With lastpipe enabled, Bash executes the loop directly in the parent process. This is closer to what many people expect, but it is not enabled by default because it can subtly break portability.

While it looks convenient, be cautious about using it: it can introduce hard-to-debug bugs, with a variable changing where you least expect it, and other parts of your script might rely on the traditional behavior of pipelines creating subshells. Most of the time a redesign, like the ones above, leads to more robust and maintainable code.
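One more practical note: lastpipe only takes effect when job control is off, which is already the case in scripts. To experiment with it in an interactive shell you would have to disable job control first; a sketch:

```shell
#!/usr/bin/env bash

set +m               # turn off job control (the default in scripts)
shopt -s lastpipe

count=0
echo -e "a\nb\nc" | while read -r line; do
  ((count++))
done

echo "Count: $count"
# -> Count: 3
```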

The Bottom Line

Pipelines in Bash fork processes, and a while read loop fed by a pipeline runs in a subshell. Any variables modified inside that loop are lost once the subshell exits. To preserve state, you can redirect input instead of piping, enable lastpipe in modern Bash, or restructure your code to avoid the issue entirely.

Once you understand that every pipeline stage is its own process, the mystery of the disappearing variable becomes much less magical - and much easier to avoid in your scripts.

References & Further Reading

  • man bash - Bash Manual
  • man 2 fork - fork() System Call Manual
  • man 2 pipe - pipe() System Call Manual
  • man 2 wait - wait() System Call Manual
  • help shopt - Bash shell options (see lastpipe)