Key Knowledge Areas
- Send text files and output streams through text utility filters to modify the output using standard UNIX commands found in the GNU textutils package.
Terms and Utilities
- cat
- cut
- expand
- fmt
- head
- od
- join
- nl
- paste
- pr
- sed
- sort
- split
- tail
- tr
- unexpand
- uniq
- wc
Join
The GNU join (/usr/bin/join) command is a little like a database join but with text files rather than databases. For a simple join we can use join file1 file2, but for more complex joins we specify the fields to join with join -1 3 -2 1 file1 file. It does seem a little of a mess but it does make sense when we get into it. Where possible though the data would best exist within databases.
Starting with a simple join would seem like the best idea. We have two text files: city and country:
1 London 2 Cardiff 3 Edinburgh 4 Belfast
1 England 2 Wales 3 Scotland 4 NI
The two files have matching first columns and they are in sort order, either numerical or alphabetical will suffice. If no additional options are provided to join statement then it will join the two files on column 1.
join city country
The output then show the city joined with the country, the column that is joined on shows just once.
In a more complex example we may want to join the employees and managers file:
1 fred 2 bill 1 3 sally 1 4 jim 2 5 jane 2 6 sue 3 7 meg 3
The first column shows the employee ID, then the name and then the manager ID. The manager ID maps to an employees ID, as of course, the manager is also an employee.
A list of managers show in the managers file:
1 fred 2 bill 3 sally
So now we must join column 3 from the employees table to column 1 in the managers table. This is specified in the join command as:
join -1 3 -2 1 employees managers
Fred does not show as he is not managed, if we want to show non-matching rows we can use the -v1 option with the join command to list non matches from file 1:
join -v1 -1 3 -2 1 employees managers
This will just list Fred: