sed is short for “stream editor”. It is a UNIX tool that dates way back to the early days. This article will introduce sed and give you an understanding of what it can do for you.
In the UNIX way of working, sed takes an input stream and modifies its contents. Examples of where you are using streams and might not realise it:
- “cat myfile.txt | less”
- “ls -la | grep find_this_file.txt”
- “cat afile.txt | tee a_duplicate.txt”
Notice they all have the “|” pipe symbol, which says: take the output of the first command and pipe it into the second as its input.
Let us start off with a very ridiculous problem and see how sed can help. You have a file (poem.txt) with the following contents:
Well that’s not quite how you remember the poem now is it? We are now going to use sed to start fixing this poem, returning it back to original glory.
I’m going to assume that you are here because you have a UNIX-based system (Linux, OSX, Solaris, whatever, or god forbid you are forced into using cygwin), and you know the basics of using a shell. Otherwise I suggest you read up elsewhere first.
So, first obvious mistake: it’s not Johnny, is it? It is Mary! To fix that you need to type the following at a shell prompt (you can download the example file from here or copy & paste it from above or even type it out if you prefer; go on, you know you need the exercise). I’ll start off with some solutions without telling you all the details of how sed works; let’s just see the magic first, and answer the obvious questions later. So you have your poem.txt file; perform the following on it:
You should see the following output:
The sed utility has a lot of commands, each of which is indicated by a single character before the first separator “/”. The simple “s” command, substitute, will be your most cherished friend, and if you only learn this one command you will still be able to get a lot out of sed.
So the above sed command says: replace an occurrence of Johnny with Mary. sed is case-sensitive, so it wouldn’t have worked if you had typed “johnny” instead of “Johnny”, and typing “mary” instead of “Mary” would have given you a lower-case mary. You didn’t do that, did you? Oops if you did!
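To see that on its own, here is a minimal sketch that feeds sed a single line on standard input rather than the poem.txt file. The sample line is my own stand-in, not necessarily a line from the real file.

```shell
# A hypothetical one-line stand-in for the poem, piped into sed.
echo 'Johnny had a little lamb' | sed 's/Johnny/Mary/'
# prints: Mary had a little lamb

# sed is case-sensitive: "johnny" does not match "Johnny", so nothing changes.
echo 'Johnny had a little lamb' | sed 's/johnny/Mary/'
# prints: Johnny had a little lamb
```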
Now try the following:
What!? The second Johnny has not been replaced! Our command only replaces the first occurrence on each line.
If you really want to change the second Johnny too, you need to extend the command with an additional piece of notation:
That’s better. That “g” on the end is a flag. It indicates global substitution, meaning it will replace every match it finds as it works its way along the line.
Without the “g”, the command stops at the first match on each line. This obviously could be useful if Jack really did name his child Johnny Junior after Grandad (or maybe the Milk Man).
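The difference is easiest to see side by side. A small sketch, assuming a made-up line that contains the name twice:

```shell
line='Everywhere that Johnny went, Johnny went'   # hypothetical sample line

# Without g: only the first match on the line is replaced.
echo "$line" | sed 's/Johnny/Mary/'
# prints: Everywhere that Mary went, Johnny went

# With g: every match on the line is replaced.
echo "$line" | sed 's/Johnny/Mary/g'
# prints: Everywhere that Mary went, Mary went
```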
So we have been through some examples of how useful this tool can be in editing streams. Let’s take a closer look at the syntax:
With the substitute command it boils down to:
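As a rough map of the pieces, written as shell comments with one runnable line at the end (the sample input is only an illustration):

```shell
# General shape of an invocation:
#   sed [options] 'command' [file ...]
#
# Shape of the substitute command:
#   s/regexp/replacement/flags
#   |   |        |         |
#   |   |        |         +-- optional flags, such as g
#   |   |        +------------ text to put in place of each match
#   |   +--------------------- pattern to search for
#   +------------------------- the command letter: s for substitute
echo 'Johnny Johnny' | sed 's/Johnny/Mary/g'
# prints: Mary Mary
```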
Now there is a huge heap of commands that we could make use of. We have to accept that we are only going to touch the surface of what sed can do. I will stick with the substitute command since this one has a lot more to give. Perhaps in a future article I will expand into other useful commands.
Regular expressions. Brrr, enough to make anyone judder. Yes, they can be very useful, but I’m not going to talk about them here. We’ll stay with basic strings, and you can mix in proper regular expressions when you are familiar with the basics.
Let’s try something new: the -i option, which updates the poem.txt file permanently by editing it in place. Go ahead, run the following:
The contents of poem.txt will now be:
The .bak suffix attached to -i tells sed to keep a backup of the original as poem.txt.bak, which is very useful considering the ease with which these commands can be misused to mess your files up.
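If you would rather try this somewhere harmless first, here is a sketch that works on a throwaway copy in a scratch directory. The file content is a stand-in of my own, and note that -i is a widely supported extension rather than part of the POSIX standard.

```shell
dir=$(mktemp -d)   # scratch directory so nothing real is touched
printf 'Johnny had a little lamb\n' > "$dir/poem.txt"   # stand-in content

# Edit the file in place, keeping the original as poem.txt.bak.
sed -i.bak 's/Johnny/Mary/' "$dir/poem.txt"

cat "$dir/poem.txt"       # prints: Mary had a little lamb
cat "$dir/poem.txt.bak"   # prints: Johnny had a little lamb
```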
What about the flags? These can be one or more of:
- g - Replace all occurrences of the regular expression on the line
- i - Match the regular expression case-insensitively (a GNU sed extension, also written I)
- p - If a substitution was made, print the resulting line
- n - Where n is a number. Replace only the nth match on the line
- w - Followed by a filename. Write the lines on which a substitution was made to the given file
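The number and case-insensitive flags are quick to try out. A small sketch on a made-up line; the I flag here assumes GNU sed (or another sed that supports it).

```shell
line='johnny, Johnny, Johnny'   # hypothetical sample line

# A number replaces only that match on the line. The lower-case "johnny"
# does not match, so the second match is the third word.
echo "$line" | sed 's/Johnny/Mary/2'
# prints: johnny, Johnny, Mary

# I ignores case (GNU sed extension); combined with g it changes all three.
echo "$line" | sed 's/johnny/Mary/Ig'
# prints: Mary, Mary, Mary
```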
We’ve used “g”, and I’m sure you can imagine what “i” will do. What about “p”?
And now this:
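The -n option suppresses sed’s automatic printing of every line, while the p flag prints each line on which a substitution was made. Together they show only the changed lines, kind of like grep (that tool is a story for another day).
Without the -n, each matching line appears twice, once from p and once from the automatic print, thus: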
Which could be useful if you wanted to duplicate a certain line that matches your string.
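Here is a small sketch of both behaviours, using a two-line stand-in for the poem (my own sample text, not the real file):

```shell
# Two-line stand-in for the poem.
poem=$(printf 'Johnny had a little lamb\nIts fleece was white as snow\n')

# p without -n: the changed line is printed twice, other lines once.
printf '%s\n' "$poem" | sed 's/Johnny/Mary/p'
# Mary had a little lamb
# Mary had a little lamb
# Its fleece was white as snow

# p with -n: only the changed line is printed.
printf '%s\n' "$poem" | sed -n 's/Johnny/Mary/p'
# Mary had a little lamb
```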
Finally the w flag. This is much like taking the output from the command and sending it to a file via redirection, except that w only writes the lines on which a substitution was made. Provided every line contains a match, the following create the same output.txt file:
The other difference is that with “w” the file is written while the normal output still goes to the terminal, so you can save the result of the sed command and pipe the output onwards to perform further manipulation:
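A sketch of that idea in a scratch directory. The file names and the sample text are my own assumptions, chosen so that only one of the two lines matches.

```shell
dir=$(mktemp -d)   # scratch directory for the example files
printf 'Johnny had a little lamb\nIts fleece was white as snow\n' > "$dir/poem.txt"

# w saves the changed lines to changed.txt; the full output still flows
# down the pipe to the second sed.
sed "s/Johnny/Mary/w $dir/changed.txt" "$dir/poem.txt" | sed 's/lamb/goat/'
# Mary had a little goat
# Its fleece was white as snow

cat "$dir/changed.txt"
# Mary had a little lamb
```

Notice that changed.txt holds only the line that was substituted, which is where w differs from plain redirection.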
In this last command we piped the output of one sed command into another. There are three ways to chain commands like this, and all of them have identical outcomes:
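My reading of the three ways is a pipe, one -e option per command, and a semicolon between commands. A sketch on a made-up line:

```shell
line='Johnny had a little goat'   # hypothetical sample line

# 1. Pipe one sed into another.
echo "$line" | sed 's/Johnny/Mary/' | sed 's/goat/lamb/'

# 2. Give each command its own -e option.
echo "$line" | sed -e 's/Johnny/Mary/' -e 's/goat/lamb/'

# 3. Separate the commands with a semicolon.
echo "$line" | sed 's/Johnny/Mary/; s/goat/lamb/'

# Each of the three prints: Mary had a little lamb
```

The second and third forms run a single sed process, which is tidier when the chain gets long.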
Back to our poem. Armed with our new-found knowledge, let’s put this all together and snap the poem back into shape:
Extremely verbose and contrived but quite effective.
So now for some useful (and not so useful) examples of sed in action:
Remove leading and trailing spaces & tabs from a file:
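One way to write this, as a sketch: two substitutions, one anchored at the start of the line and one at the end. It does lean on a tiny bit of regular expression, namely the [[:blank:]] class, which matches a space or a tab. The input line is made up for the demonstration.

```shell
# Leading and trailing spaces and tabs around a sample line.
printf '  \t Mary had a little lamb \t \n' |
  sed 's/^[[:blank:]]*//; s/[[:blank:]]*$//'
# prints: Mary had a little lamb
```

To clean a real file, give sed the file name instead of piping into it.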
Don’t have grep? Use sed instead to find poem.txt:
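A sketch of the idea, using -n with an address and the p command so that only matching lines are printed. I feed it a fixed list of names here so the result is predictable; in practice you would pipe in the output of ls.

```shell
# Print only the lines that contain "poem.txt", much as grep would.
printf 'notes.txt\npoem.txt\nsong.txt\n' | sed -n '/poem\.txt/p'
# prints: poem.txt
```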
Convert forward-slash to back-slash (‘/’ to ‘\’):
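A sketch with a made-up path. Because “/” is also the separator, it has to be escaped in the pattern, and a literal back-slash in the replacement is written as two. Picking a different separator, such as “|”, avoids the first problem.

```shell
# Escaped version: \/ is a literal slash, \\ is a literal back-slash.
echo 'usr/local/bin' | sed 's/\//\\/g'
# prints: usr\local\bin

# Same thing with | as the separator, which is easier on the eyes.
echo 'usr/local/bin' | sed 's|/|\\|g'
# prints: usr\local\bin
```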
Convert back-slash to forward-slash (‘\’ to ‘/’):
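And the reverse, again as a sketch on a made-up path. I use printf rather than echo to produce the input because some shells’ echo interprets back-slashes itself.

```shell
# \\ matches a literal back-slash, \/ inserts a literal slash.
printf '%s\n' 'C:\temp\poem.txt' | sed 's/\\/\//g'
# prints: C:/temp/poem.txt

# Same thing with | as the separator.
printf '%s\n' 'C:\temp\poem.txt' | sed 's|\\|/|g'
# prints: C:/temp/poem.txt
```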