It is 2019 and we are already nearly at the end of January. If learning Linux was your New Year’s resolution, how far have you got? Have your wheels stopped turning, and are you becoming bogged down and not knowing where to go? At the start of any project it is often like this, and perhaps the meaning of life, the universe and everything is not 42 but simply not being bogged down. Let’s give you a much-needed push towards your goals and let the snatch rope of your existence be learning PERL. PERL is everywhere and for that reason it makes a great learning subject. On most Linux distributions you do not need to add anything to run PERL.
PERL is not new
It has been around for a long time, since 1987
As PERL is shipped with most Linux distributions and used in many standard applications, you usually do not need to install anything to get started. It is a scripting language, meaning that you do not need to compile the code yourself before running it. PERL scripts are plain text files that carry out the tasks that you create and dictate. To get started, though, we do not even need to create scripts. We can execute PERL directly from the command line.
PERL can execute code directly from the CLI
Rather than creating scripts, short code snippets can be run directly from the command line using the -e option
$ perl -e 'print("Hello World\n");'
Hello World
Current Linux distributions should include PERL version 5. Apple’s OS X operating system also includes PERL as standard. To check the version of PERL that you have installed, run perl -v, which prints the version number.
The PERL version varies between operating systems:
Ubuntu 18.04: 5.26
CentOS 7: 5.16
OS X: 5.18
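The version can also be checked from inside PERL itself. The following is a small sketch, not something the steps above require; it relies only on the built-in variables $^V and $].

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# $^V holds the version of the running interpreter as a version object.
# The %vd format prints it in the familiar dotted form, such as 5.26.1.
printf("Running PERL %vd\n", $^V);

# $] holds the same information as a plain number, handy for comparisons.
print("Numeric form: $]\n");
```

The exact numbers printed depend on the system, so expect them to differ from the table above on newer releases.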
PERL may not seem the most modern of languages to learn, so why do I suggest it? We have already discussed the fact that PERL is installed by default, whereas Python and other languages may need to be added. PERL, being a traditional language, is not as “tidy” as newer languages; however, learning PERL will help with all languages. Newer languages target the ease and speed of development over more traditional coding best practices. Learning to do anything properly is a great discipline for understanding and getting the process correct. Being a mature language also means that there are many PERL libraries already created to do the heavy lifting in your programs.
PERL is not the easiest or most cool language to learn
Learning to code properly will be a tool that you can use with any language
To create a basic PERL script we can welcome the world to our intentions by creating the following text file, which we will call hello.pl. I know we don’t want to say Hello World, but we are learning in little steps. Before you know it you will be selecting 42 words randomly from the Linux dictionary and printing those to the screen.
#!/usr/bin/env perl
print("Hello World\n");
Line 1: We should know this, the shebang that tells the system which interpreter to use for the script. The command env is used to search for the perl program, and perl is then used to execute the script.
Line 2: The PERL function print is used to display text to the screen, and the \n character is used to add a new line or return after the text. If you want, practice the script with and without the \n added. Nothing will break, but your command line prompt will begin on a new line only where the return is added to the script. Each statement ends in the semi-colon or ; The shebang line begins with #, which PERL treats as a comment, and as such it does not end in the semi-colon. It is easy to forget the ; so you will become used to adding it when your script fails.
When you have created the file, we will need to make it executable. Text files in Linux do not have the execute permission set by default. This is something that we need to add using the command chmod.
$ chmod +x hello.pl
We can then execute the script using the following syntax:
$ ./hello.pl
We should see Hello World printed to the screen. The leading ./ is just used to locate the script in our current directory. Normally only those directories included in the PATH variable are searched for programs. Now that we have created a basic script we can move forward with a practical PERL example and select 42 words randomly from the dictionary.
Selecting 42 Random Words
The project that we shall work with will retrieve 42 random words from the dictionary. The dictionary file is added by distribution-specific packages: the words package in CentOS and the wamerican package in Ubuntu. The file /usr/share/dict/words is a text file with one word per line, often with hundreds of thousands of words in the file. This is a great example file to retrieve random words from. The challenge that we face in choosing random words from the file is to ensure that we select unique words. It is possible, even if unlikely, that the 42 selected words from the dictionary are all the same. To avoid this, we will make sure that as a selection is made it is removed from the selection pool. This means that each word can only be selected once.
Opening the file in PERL
To start we will open and close the file:
#!/usr/bin/env perl
use strict;
use warnings;
my $dictionary = "/usr/share/dict/words";
my $index;
open(DATA, $dictionary) or die "Failed to open the file: $dictionary";
close(DATA);
Again you will need to create a text file with this content, and the file will need to be made executable as before using the chmod command. Running the script should give you no output, as all we do is open the file and then close it. Changing the name of the dictionary file to something that does not exist will trigger the error message. This is a good way of testing that the file and your error checking are working as designed. Try replacing the line: my $dictionary = "/usr/share/dict/words"; with my $dictionary = "/usr/share/dict/wordz"; The script should now fail. Don’t forget to set it back again to the correct file name.
Line 1: Shebang
Line 2: Forces us to declare variables, which is great practice and will help in the long run
Line 3: Enables warnings for such things as the use of uninitialized values, which can help detect errors and oversights in our code. The 2 use lines are commonly found in PERL scripts and it makes great sense to include them in most PERL scripts.
Line 4: The word my is used to declare variables. The variable name starts with a $ symbol to indicate the variable is a scalar or holds a single value. We also set the value of the variable in the same line that we declare the variable. Sometimes the declaration and the setting occur in different lines of code:
my $dictionary;
$dictionary = "/usr/share/dict/words";
The variable is used so that we only have to type the path once. It is really just a shortcut and certainly makes sense to use.
Line 5: Is used to declare a scalar variable $index that will be used later
Line 6: The open function is used to open a file handle. We use the name DATA in this case as the file handle name. PERL also reserves DATA for its built-in __DATA__ file handle, so in larger scripts a different name is a safer choice. The variable $dictionary tells open which file to open. If the open function fails, the die function stops the script and prints the message.
Line 7: I think that you can guess this. Where we open the file handle we should also close the file handle.
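As an alternative to the DATA file handle used above, many PERL scripts use the three-argument form of open with a file handle stored in a variable. The sketch below is one way that could look; the helper name open_for_reading is made up for this illustration, and the special variable $! is added so that the error message says why the open failed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the article's script): open a file for
# reading and return the file handle.
sub open_for_reading {
    my ($path) = @_;
    # Three-argument open: the mode '<' (read) is kept separate from the name.
    # $! holds the operating system's reason for any failure.
    open(my $fh, '<', $path) or die "Failed to open the file: $path: $!\n";
    return $fh;
}

my $dictionary = "/usr/share/dict/words";
if (-e $dictionary) {
    my $fh = open_for_reading($dictionary);
    close($fh);
    print("Opened and closed $dictionary\n");
} else {
    print("$dictionary is not installed on this system\n");
}
```

The rest of this walkthrough keeps the DATA handle so that the code matches the line-by-line notes.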
Read each line into an array
Now that we have the file open we need to read it. We will read the file into an array. An array is a multi-valued variable and is denoted in the declaration using the @ symbol. The angle brackets, often called the diamond operator, are used to surround the file handle name. The function chomp is used to remove the line returns that will exist at the end of each line of the file. This keeps our data clean and reliable. Remember that the dictionary file should have just one word per line.
#!/usr/bin/env perl
use strict;
use warnings;
my $dictionary = "/usr/share/dict/words";
my $index;
open(DATA, $dictionary) or die "Failed to open the file: $dictionary";
chomp(my @words = <DATA>);
close(DATA);
This new line will still not add any noticeable functionality to the script. Where possible, it is a great idea to run the script after edits are made to ensure that errors are picked up quickly and easily fixed.
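To see that the read has worked, it can help to try the same steps on a tiny stand-in for the dictionary. The sketch below is illustrative only: the three sample words are invented, and it reads from a string in memory through a file handle held in the variable $fh rather than from the real file.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Stand-in for the dictionary: three words, one per line (sample data only).
my $sample = "apple\nbanana\ncherry\n";

# Opening a reference to a string gives a file handle that reads from memory.
open(my $fh, '<', \$sample) or die "Failed to open the in-memory file: $!";
chomp(my @words = <$fh>);
close($fh);

# In scalar context an array gives the number of entries it holds.
my $count = scalar(@words);
print("Read $count words, the last is '$words[-1]'\n");
```

Swapping the in-memory string for the DATA handle from the script and printing scalar(@words) would show how many words your own dictionary file contains.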
Selecting 42 words from the file
Our project requires that we select 42 random words from the dictionary. We can use a for loop to effect this:
#!/usr/bin/env perl
use strict;
use warnings;
my $dictionary = "/usr/share/dict/words";
my $index;
open(DATA, $dictionary) or die "Failed to open the file: $dictionary";
chomp(my @words = <DATA>);
for (1..42) {
    $index = int(rand(@words));
    print("$words[$index]\n");
}
close(DATA);
Looking at this newly added code piece by piece we start with the for loop itself:
for (1..42) {
}
The keyword for starts the loop. To loop 42 times we use a range, (1..42), or 1 to 42. The brace brackets, { }, create the code block that will execute for each of the 42 iterations of the loop.
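A quick way to convince yourself that the range really does run 42 times is to count the iterations. This is only a sketch; the variable names $iterations, $total and $number are chosen for the example and are not part of the main script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $iterations = 0;
my $total      = 0;

# Naming the loop variable lets us use each value from the range in turn.
for my $number (1..42) {
    $iterations++;
    $total += $number;
}

print("Looped $iterations times, total $total\n");
```

This prints "Looped 42 times, total 903", since the numbers 1 to 42 add up to 903.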
The code inside the loop is the workhorse. The first line within the for loop sets the $index variable which was declared earlier. The value that we set is an integer, a whole number, that is randomly selected from 0 up to one less than the number of entries in the @words array. Array entries are numbered from 0, so if we have 500,000 words then the random number selected will be between 0 and 499,999. There are two functions used here. The function rand returns a random fractional number from 0 up to, but not including, the value specified in the argument. The int function drops the fractional part so that a whole number is used. Passing @words as the argument uses the array length, or the number of entries in the words array.
For example:
int(rand(10));
This would supply a random whole number between 0 and 9. If we omit the int function:
rand(10);
We would return a random fractional number from 0 up to, but not including, 10.
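The range that int and rand produce together is easy to check by running the pair many times and recording which values turn up. The sketch below does that; the helper name random_index is made up for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return a random whole number from 0 to $size - 1.
sub random_index {
    my ($size) = @_;
    return int(rand($size));
}

# Record every value seen over 1000 attempts.
my %seen;
$seen{ random_index(10) }++ for 1..1000;

# Print the distinct values in numeric order.
print(join(" ", sort { $a <=> $b } keys %seen), "\n");
```

Every value printed will be a whole number from 0 to 9, and 10 itself never appears, which is why the result can be used directly as an array position.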
The final line in the loop uses the print function. We saw this earlier when we visited the “Hello World” script. We use it now to print a scalar variable: a single entry from the words array, so we change from the @ symbol to the $ symbol. A single entry in an array is denoted using the following syntax:
$arrayname[entrynumber];
As an example:
print("$words[0]\n");
This would print the first entry of the words array or the first line of the dictionary file. On my CentOS 7 system it is the word 1080. The value that we use in the working example is the $index value randomly selected before. We will then print a random word from the dictionary on each of the 42 iterations. You should see something similar to this, obviously with your own random words, when the full code is executed:
$ ./randomwords.pl
equivaluer
non-member
readvocate
umping
Remmer
trimargarin
potlucks
pct.
humidors
verbifies
worser
interparenthetic
postorder
sighted
ornitholitic
Helleborus
spilly
folios
bulimic
perturbing
ascomycetes
fole
gipsyhead
rattle-head
bathymetrically
trytophan
Pan-orthodox
self-election
exust
frictions
wiretaps
sinnet
considerableness
mentery
lodgings
Polyidus
FOAC
Bartlemy
Jennette
Utta
millocratism
antisacerdotalist
Possible duplicates
As we have a wide range of numbers to randomly select from, it is unlikely that we will display the same word twice within the 42 word list. It is possible, though, as the random selection is made each time the loop iterates. If the list to select from was much smaller, this code would regularly show duplicates being selected. To avoid this we can remove an entry from the array once it has been selected. The entry is removed from the array but not from the dictionary, so the integrity of the file is maintained.
#!/usr/bin/env perl
use strict;
use warnings;
my $dictionary = "/usr/share/dict/words";
my $index;
open(DATA, $dictionary) or die "Failed to open the file: $dictionary";
chomp( my @words = <DATA> );
for (1..42) {
    $index = int(rand(@words));
    print("$words[$index]\n");
    splice(@words, $index, 1);
}
close(DATA);
It is the use of the splice function within the loop that removes the entry. The arguments are the array, the position to start at and the number of entries to remove, so we remove just the single entry that was selected.
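The effect of splice is easiest to see with a list small enough that duplicates would otherwise be common. The sketch below wraps the same idea in a helper; the name pick_unique and the list of colours are invented for the illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: pick $count unique entries at random from a list.
# It works on its own copy of the list, so the caller's array is untouched.
sub pick_unique {
    my ($count, @pool) = @_;
    my @picked;
    while (@picked < $count && @pool) {
        my $index = int(rand(@pool));
        # splice returns the removed entry, so it cannot be chosen again.
        push(@picked, splice(@pool, $index, 1));
    }
    return @picked;
}

my @colours = qw(red green blue yellow black);
my @chosen  = pick_unique(3, @colours);
print("@chosen\n");
```

Asking for all five colours always returns each one exactly once, in a random order, which is the behaviour we want for the 42 words.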
The meaning of life, the universe and everything to Douglas Adams was 42. If you are a developer or budding developer, it is free and open source software: sharing what you love and have a passion for with others. If you are learning Linux or learning programming then practice this code and share your 42 words with @theurbanpenguin on Twitter or Facebook.