Perl <- StackOverflow top 100

1: Find size of an array in Perl (score 559516 in 2015)

Question

I seem to have come across several different ways to find the size of an array. What is the difference between these three methods?

my @arr = (2);
print scalar @arr; # First way to print array size

print $#arr; # Second way to print array size

my $arrSize = @arr;
print $arrSize; # Third way to print array size

Answer accepted (score 228)

The first and third ways are the same: they evaluate an array in scalar context. I would consider this to be the standard way to get an array’s size.

The second way actually returns the last index of the array, which is not (usually) the same as the array size.
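Here is a minimal sketch contrasting the two, using only core Perl. The helper name describe_array is hypothetical. For an empty array, $#arr is -1, which is where confusing the two is most likely to bite:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (element count, last index) for an array ref.
sub describe_array {
    my ($aref) = @_;
    my $count = scalar @$aref;   # array in scalar context: number of elements
    my $last  = $#$aref;         # index of the last element (-1 when empty)
    return ($count, $last);
}

my @arr = (2);
my ($count, $last) = describe_array(\@arr);
print "count=$count last_index=$last\n";   # count=1 last_index=0

my @empty;
($count, $last) = describe_array(\@empty);
print "count=$count last_index=$last\n";   # count=0 last_index=-1
```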

Answer 2 (score 40)

First, the second is not equivalent to the other two. $#array returns the last index of the array, which is one less than the size of the array.

The other two are virtually the same. You are simply using two different means to create scalar context. It comes down to a question of readability.

I personally prefer the following:

say 0+@array;          # Represent @array as a number

I find it clearer than

say scalar(@array);    # Represent @array as a scalar

and

my $size = @array;
say $size;

The latter looks quite clear alone like this, but I find that the extra line takes away from clarity when part of other code. It’s useful for teaching what @array does in scalar context, and maybe if you want to use $size more than once.

Answer 3 (score 27)

This gets the size by forcing the array into a scalar context, in which it is evaluated as its size:

print scalar @arr;

This is another way of forcing the array into a scalar context, since it’s being assigned to a scalar variable:

my $arrSize = @arr;

This gets the index of the last element in the array, so it’s actually the size minus 1 (assuming indexes start at 0; very old Perls let you change the base index via $[, but doing so was always a bad idea, and modern Perls no longer allow it):

print $#arr;

This last one isn’t really good to use for getting the array size. It would be useful if you just want to get the last element of the array:

my $lastElement = $arr[$#arr];

Also, this construct isn’t handled correctly by many syntax highlighters, which can mistake the # in $#arr for the start of a comment.

2: How do I break out of a loop in Perl? (score 547733 in 2014)

Question

I’m trying to use a break statement in a for loop, but since I’m also using strict subs in my Perl code, I’m getting an error saying:

Bareword "break" not allowed while "strict subs" in use at ./final.pl line 154.

Is there a workaround for this (besides disabling strict subs)?

My code is formatted as follows:

for my $entry (@array){
    if ($string eq "text"){
         break;
    }
}

Answer accepted (score 425)

Oh, I found it. You use last instead of break:

for my $entry (@array){
    if ($string eq "text"){
         last;
    }
}

Answer 2 (score 174)

Additional data (in case you have more questions):

FOO: {
       for my $i ( @listone ){
          for my $j ( @listtwo ){
                 if ( cond( $i,$j ) ){

                    last FOO;  # --->
                                   # |
                 }                 # |
          }                        # |
       }                           # |
 } # <-------------------------------

Answer 3 (score 17)

Simply last would work here:

for my $entry (@array){
    if ($string eq "text"){
         last;
    }
}

If you have nested loops, then last exits only the innermost loop. Use a label in this case:

LBL_SCORE: {
       for my $entry1 ( @array1 ){
          for my $entry2 ( @array2 ){
                 if ( $entry1 eq $entry2 ){   # or any condition
                    last LBL_SCORE;
                 }
          }
       }
 }

The last statement above exits both loops at once. The same works for any depth of nesting, and a label can be attached to any loop or block.
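As a sketch of the same idea, you can put the label directly on the outer for loop instead of on a bare block, and next accepts a label in the same way. The helper first_common and its inputs are illustrative only:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first value present in both lists,
# or undef if there is none. "last OUTER" leaves both loops at once.
sub first_common {
    my ($list1, $list2) = @_;
    my $found;
    OUTER: for my $x (@$list1) {
        for my $y (@$list2) {
            if ($x eq $y) {
                $found = $x;
                last OUTER;    # exits the outer loop, not just the inner one
            }
        }
    }
    return $found;
}

# next skips to the following iteration; here it skips empty strings.
my @words = ('a', '', 'b');
for my $w (@words) {
    next if $w eq '';
    print "$w\n";
}

print first_common([qw(x y z)], [qw(q z y)]), "\n";   # y
```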

3: How to fix a locale setting warning from Perl? (score 488557 in 2016)

Question

When I run perl, I get the warning:

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

How do I fix it?

Answer accepted (score 419)

Your OS doesn’t know about en_US.UTF-8.

You didn’t mention a specific platform, but I can reproduce your problem:

% uname -a
OSF1 hunter2 V5.1 2650 alpha
% perl -e exit
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LC_ALL = (unset),
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

My guess is you used ssh to connect to this older host from a newer desktop machine. It’s common for /etc/ssh/sshd_config to contain

AcceptEnv LANG LC_*

which allows clients to propagate the values of those environment variables into new sessions.

The warning gives you a hint about how to squelch it if you don’t require the full-up locale:

% env LANG=C perl -e exit
%

or with bash:

$ LANG=C perl -e exit
$ 

For a permanent fix, choose one of

  1. On the older host, set the LANG environment variable in your shell’s initialization file.
  2. Modify your environment on the client side, e.g., rather than ssh hunter2, use the command LANG=C ssh hunter2.
  3. If you have admin rights, stop ssh from sending the environment variables by commenting out the SendEnv LANG LC_* line in the local /etc/ssh/ssh_config file. (See OpenSSH Bug 1285 for more.)

Answer 2 (score 454)

Here is how to solve it on Mac OS Lion (10.7) or Cygwin (Windows 10):

Add the following lines to your bashrc or bash_profile on the host machine:

# Setting for the new UTF-8 terminal support in Lion
export LC_CTYPE=en_US.UTF-8
export LC_ALL=en_US.UTF-8

If you are using zsh, add the same lines to your .zshrc:

# Setting for the new UTF-8 terminal support in Lion
export LC_CTYPE=en_US.UTF-8
export LC_ALL=en_US.UTF-8

Answer 3 (score 194)

If you are creating a rootfs using debootstrap you will need to generate the locales. You can do this by running:

# (optional) enable missing locales
sudo nano /etc/locale.gen

# then regenerate
sudo locale-gen

This tip comes from https://help.ubuntu.com/community/Xen

4: How can I check if a Perl array contains a particular value? (score 389390 in 2010)

Question

I am trying to figure out a way of checking for the existence of a value in an array without iterating through the array.

I am reading a file for a parameter. I have a long list of parameters I do not want to deal with. I placed these unwanted parameters in an array @badparams.

I want to read a new parameter and if it does not exist in @badparams, process it. If it does exist in @badparams, go to the next read.

Answer accepted (score 181)

Simply turn the array into a hash:

my %params = map { $_ => 1 } @badparams;

if(exists($params{$someparam})) { ... }

You can also add more (unique) params to the list:

$params{$newparam} = 1;

And later get a list of (unique) params back:

@badparams = keys %params;

Answer 2 (score 211)

Best general purpose: especially for short arrays (1,000 items or fewer) and for coders who are unsure which optimizations best suit their needs.

# compare with eq so that regex metacharacters in $value are matched literally
if ( grep { $_ eq $value } @array ) {
  print "found it";
}

It has been mentioned that grep passes through all values even if the first value in the array matches. This is true, however grep is still extremely fast for most cases. If you’re talking about short arrays (less than 1000 items) then most algorithms are going to be pretty fast anyway. If you’re talking about very long arrays (1,000,000 items) grep is acceptably quick regardless of whether the item is the first or the middle or last in the array.

Optimization Cases for longer arrays:

If your array is sorted, use a “binary search”.

If the same array is repeatedly searched many times, copy it into a hash first and then check the hash. If memory is a concern, then move each item from the array into the hash. More memory efficient but destroys the original array.

If the same values are searched for repeatedly within the array, lazily build a cache: as each item is searched, first check whether the result was stored in a persistent hash; if it wasn’t, search the array and store the result in that hash so the next lookup skips the search.

Note: these optimizations will only be faster when dealing with long arrays. Don’t over-optimize.
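Here is a sketch of both approaches applied to the question’s read-and-skip loop. The parameter names are made up, and any() assumes List::Util 1.33 or later (bundled with Perl 5.20 and newer):

```perl
use strict;
use warnings;
use List::Util qw(any);

my @badparams = qw(debug verbose trace);   # illustrative values

# One-off check: any() stops at the first match.
sub is_bad_any {
    my ($param) = @_;
    return any { $_ eq $param } @badparams;
}

# Repeated checks: build the lookup hash once, then each test is a hash lookup.
my %is_bad = map { $_ => 1 } @badparams;

for my $param (qw(color debug size)) {     # stand-in for values read from the file
    next if $is_bad{$param};               # skip unwanted parameters
    print "processing $param\n";
}
```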

Answer 3 (score 117)

You can use the smartmatch feature in Perl 5.10 as follows.

For a literal value lookup:

if ( "value" ~~ @array ) 

For a lookup of a value held in a scalar, the same works:

if ($val ~~ @array)

For an inline array:

if ( $var ~~ ['bar', 'value', 'foo'] ) 

In Perl 5.18 smartmatch was flagged as experimental, so you need to silence the resulting warnings by enabling the experimental pragma in your script/module (smartmatch was also deprecated in Perl 5.38, so it is best avoided in new code):

use experimental 'smartmatch';

Alternatively, if you want to avoid smartmatch, then as Aaron said, use grep (comparing with eq so that $value is not treated as a regex):

if ( grep { $_ eq $value } @array ) {
  #TODO:
}

5: How do I compare two strings in Perl? (score 361902 in 2009)

Question

How do I compare two strings in Perl?

I am learning Perl. I had this basic question, looked it up here on StackOverflow, and found no good answer, so I thought I would ask.

Answer accepted (score 172)

See perldoc perlop. Use lt, gt, eq, ne, and cmp as appropriate for string comparisons:

Binary eq returns true if the left argument is stringwise equal to the right argument.

Binary ne returns true if the left argument is stringwise not equal to the right argument.

Binary cmp returns -1, 0, or 1 depending on whether the left argument is stringwise less than, equal to, or greater than the right argument.

Binary ~~ does a smartmatch between its arguments. …

lt, le, ge, gt and cmp use the collation (sort) order specified by the current locale if a legacy use locale (but not use locale ':not_characters') is in effect. See perllocale. Do not mix these with Unicode, only with legacy binary encodings. The standard Unicode::Collate and Unicode::Collate::Locale modules offer much more powerful solutions to collation issues.
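Because cmp returns -1, 0 or 1, it plugs directly into sort. Here is a small sketch. The word list is illustrative. Without use locale, the ordering is by code point, so uppercase letters sort before lowercase:

```perl
use strict;
use warnings;

my @names = qw(pear apple Banana);

# cmp returns -1, 0 or 1, which is exactly what sort's comparator expects.
my @sorted = sort { $a cmp $b } @names;             # code-point order: uppercase first
my @ci     = sort { lc($a) cmp lc($b) } @names;     # case-insensitive order

print "@sorted\n";   # Banana apple pear
print "@ci\n";       # apple Banana pear

# eq/ne give booleans for equality tests.
print "same\n"      if 'abc' eq 'abc';
print "different\n" if 'abc' ne 'ABC';
```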

Answer 2 (score 132)

  • cmp Compare

    'a' cmp 'b' # -1
    'b' cmp 'a' #  1
    'a' cmp 'a' #  0

  • eq Equal to

    'a' eq  'b' #  0
    'b' eq  'a' #  0
    'a' eq  'a' #  1

  • ne Not-Equal to

    'a' ne  'b' #  1
    'b' ne  'a' #  1
    'a' ne  'a' #  0

  • lt Less than

    'a' lt  'b' #  1
    'b' lt  'a' #  0
    'a' lt  'a' #  0

  • le Less than or equal to

    'a' le  'b' #  1
    'b' le  'a' #  0
    'a' le  'a' #  1

  • gt Greater than

    'a' gt  'b' #  0
    'b' gt  'a' #  1
    'a' gt  'a' #  0

  • ge Greater than or equal to

    'a' ge  'b' #  0
    'b' ge  'a' #  1
    'a' ge  'a' #  1

See perldoc perlop for more information.

(I’m simplifying this a little bit: all but cmp return a value that is both an empty string and a numerically zero value instead of 0, and a value that is both the string '1' and the numeric value 1. These are the same values you will always get from boolean operators in Perl. You should really only be using the return values for boolean or numeric operations, in which case the difference doesn’t really matter.)
    
Answer 3 (score 17)

In addition to Sinan Ünür’s comprehensive listing of string comparison operators, Perl 5.10 adds the smart match operator.

The smart match operator compares two items based on their type. See the chart below for the 5.10 behavior (I believe this behavior is changing slightly in 5.10.1):

perldoc perlsyn, "Smart matching in detail":

The behaviour of a smart match depends on what type of thing its arguments are. It is always commutative, i.e. $a ~~ $b behaves the same as $b ~~ $a. The behaviour is determined by the following table: the first row that applies, in either order, determines the match behaviour.

  $a      $b        Type of Match Implied    Matching Code
  ======  =====     =====================    =============
  (overloading trumps everything)

  Code[+] Code[+]   referential equality     $a == $b
  Any     Code[+]   scalar sub truth         $b->($a)

  Hash    Hash      hash keys identical      [sort keys %$a]~~[sort keys %$b]
  Hash    Array     hash slice existence     grep {exists $a->{$_}} @$b
  Hash    Regex     hash key grep            grep /$b/, keys %$a
  Hash    Any       hash entry existence     exists $a->{$b}

  Array   Array     arrays are identical[*]
  Array   Regex     array grep               grep /$b/, @$a
  Array   Num       array contains number    grep $_ == $b, @$a
  Array   Any       array contains string    grep $_ eq $b, @$a

  Any     undef     undefined                !defined $a
  Any     Regex     pattern match            $a =~ /$b/
  Code()  Code()    results are equal        $a->() eq $b->()
  Any     Code()    simple closure truth     $b->() # ignoring $a
  Num     numish[!] numeric equality         $a == $b
  Any     Str       string equality          $a eq $b
  Any     Num       numeric equality         $a == $b

  Any     Any       string equality          $a eq $b

+ - this must be a code reference whose prototype (if present) is not ""
(subs with a "" prototype are dealt with by the 'Code()' entry lower down)
* - that is, each element matches the element of same index in the other
array. If a circular reference is found, we fall back to referential
equality.
! - either a real number, or a string that looks like a number

The "matching code" doesn’t represent the real matching code, of course: it’s just there to explain the intended meaning. Unlike grep, the smart match operator will short-circuit whenever it can.

Custom matching via overloading

You can change the way that an object is matched by overloading the ~~ operator. This trumps the usual smart match semantics. See overload.
    
6: How can I pass command-line arguments to a Perl program? (score 338119 in 2012)

Question

I’m working on a Perl script. How can I pass command line parameters to it?

Example:

script.pl "string1" "string2"

Answer accepted (score 188)

Depends on what you want to do. If you want to use the two arguments as input files, you can just pass them in and then use <> to read their contents.

If they have a different meaning, you can use Getopt::Std and Getopt::Long to process them easily. Getopt::Std supports only single-character switches and Getopt::Long is much more flexible. From the Getopt::Long documentation:

use strict;
use warnings;
use Getopt::Long;

my $data   = "file.dat";
my $length = 24;
my $verbose;
GetOptions("length=i" => \$length,    # numeric
           "file=s"   => \$data,      # string
           "verbose"  => \$verbose)   # flag
    or die "Error in command line arguments\n";

Alternatively, @ARGV is a special variable that contains all the command line arguments. $ARGV[0] is the first (i.e., "string1" in your case) and $ARGV[1] is the second argument. You don’t need a special module to access @ARGV.
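To try the GetOptions call above without a real command line, here is a sketch using GetOptionsFromArray. Getopt::Long exports it on request, and it parses an array you pass in rather than @ARGV. The parse_args wrapper is hypothetical:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse options from an explicit array so the example does not depend on
# the real @ARGV; GetOptions(...) does the same thing with @ARGV.
sub parse_args {
    my @args = @_;
    my %opt = (length => 24, file => 'file.dat', verbose => 0);  # defaults
    GetOptionsFromArray(
        \@args,
        'length=i' => \$opt{length},    # integer
        'file=s'   => \$opt{file},      # string
        'verbose'  => \$opt{verbose},   # flag
    ) or die "Error in command line arguments\n";
    # Whatever was not recognised as an option is left in @args.
    return (\%opt, \@args);
}

my ($opt, $rest) = parse_args('--length', '10', '--verbose', 'string1', 'string2');
print "length=$opt->{length} verbose=$opt->{verbose} rest=@$rest\n";
# length=10 verbose=1 rest=string1 string2
```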

Answer 2 (score 60)

You pass them in just like you’re thinking, and in your script, you get them from the array @ARGV. Like so:

my $numArgs = $#ARGV + 1;
print "thanks, you gave me $numArgs command-line arguments.\n";

foreach my $argnum (0 .. $#ARGV) {

   print "$ARGV[$argnum]\n";

}


Answer 3 (score 29)

foreach my $arg (@ARGV) {
    print $arg, "\n";
}

will print each argument.

7: How do you round a floating point number in Perl? (score 322891 in 2008)

Question

How can I round a decimal number (floating point) to the nearest integer?

e.g. 

1.2 = 1
1.7 = 2

Answer accepted (score 192)

Output of perldoc -q round

Does Perl have a round() function? What about ceil() and floor()? Trig functions?

Remember that int() merely truncates toward 0. For rounding to a certain number of digits, sprintf() or printf() is usually the easiest route.

    printf("%.3f", 3.1415926535);       # prints 3.142

The POSIX module (part of the standard Perl distribution) implements ceil(), floor(), and a number of other mathematical and trigonometric functions.

    use POSIX;
    $ceil   = ceil(3.5);                        # 4
    $floor  = floor(3.5);                       # 3

In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex module. With 5.004, the Math::Trig module (part of the standard Perl distribution) implements the trigonometric functions. Internally it uses the Math::Complex module and some functions can break out from the real axis into the complex plane, for example the inverse sine of 2.

Rounding in financial applications can have serious implications, and the rounding method used should be specified precisely. In these cases, it probably pays not to trust whichever system rounding is being used by Perl, but to instead implement the rounding function you need yourself.

To see why, notice how you’ll still have an issue on half-way-point alternation:

    for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}

    0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
    0.8 0.8 0.9 0.9 1.0 1.0

Don’t blame Perl. It’s the same as in C. IEEE says we have to do this. Perl numbers whose absolute values are integers under 2**31 (on 32 bit machines) will work pretty much like mathematical integers. Other numbers are not guaranteed.

Answer 2 (score 127)

Whilst not disagreeing with the complex answers about half-way marks and so on, for the more common (and possibly trivial) use-case:

my $rounded = int($float + 0.5);

UPDATE

If it’s possible for your $float to be negative, the following variation will produce the correct result:

my $rounded = int($float + $float/abs($float*2 || 1));

With this calculation -1.4 is rounded to -1, and -1.6 to -2, and zero won’t explode.

Answer 3 (score 71)

You can either use a module like Math::Round:

use Math::Round;
my $rounded = round( $float );

Or you can do it the crude way:

my $rounded = sprintf "%.0f", $float;
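sprintf "%.0f" typically rounds exact halves to even, so 2.5 becomes 2. That may not be what you want. If you need round-half-away-from-zero, here is a sketch using the POSIX floor and ceil functions. The helper name round_half_away is hypothetical:

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

# Hypothetical helper: round half away from zero (1.5 -> 2, -1.5 -> -2).
# Caveat: the +/- 0.5 step is itself a floating-point addition, so inputs
# a hair below .5 (e.g. 0.49999999999999994) can still round the wrong way.
sub round_half_away {
    my ($x) = @_;
    return $x < 0 ? ceil($x - 0.5) : floor($x + 0.5);
}

printf "%s\n", join ' ', map { round_half_away($_) } (1.2, 1.7, 2.5, -1.4, -1.6);
# 1 2 3 -1 -2
```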

8: How can I print the contents of a hash in Perl? (score 306215 in 2016)

Question

I keep printing my hash as # of buckets / # allocated. How do I print the contents of my hash?

I’d prefer not to use a while loop; a one-liner would be best.

Answer 2 (score 240)

Data::Dumper is your friend.

use Data::Dumper;
my %hash = ('abc' => 123, 'def' => [4,5,6]);
print Dumper(\%hash);

will output something like this (the order of hash keys varies from run to run):

$VAR1 = {
          'def' => [
                     4,
                     5,
                     6
                   ],
          'abc' => 123
        };

Answer 3 (score 60)

Easy:

print "$_ $h{$_}\n" for (keys %h);

Elegant, but actually 30% slower (!):

while (my ($k,$v)=each %h){print "$k $v\n"}
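Because hash order is not defined, sorting the keys gives stable output. Data::Dumper can do the same through its $Data::Dumper::Sortkeys setting. A short sketch with an illustrative hash:

```perl
use strict;
use warnings;
use Data::Dumper;

my %h = (pear => 3, apple => 1, fig => 2);

# Sort the keys for stable, readable output.
my $text = join '', map { "$_ $h{$_}\n" } sort keys %h;
print $text;
# apple 1
# fig 2
# pear 3

# Data::Dumper can sort keys too, which makes its output reproducible.
local $Data::Dumper::Sortkeys = 1;
print Dumper(\%h);
```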

9: Check whether a string contains a substring (score 302935 in 2017)

Question

How can I check whether a given string contains a certain substring, using Perl?

More specifically, I want to see whether s1.domain.com is present in the given string variable.

Answer accepted (score 219)

To find out if a string contains substring you can use the index function:

if (index($str, $substr) != -1) {
    print "$str contains $substr\n";
} 

It will return the position of the first occurrence of $substr in $str, or -1 if the substring is not found.

Answer 2 (score 54)

Another possibility is to use regular expressions which is what Perl is famous for:

if ($mystring =~ /s1\.domain\.com/) {
   print qq("$mystring" contains "s1.domain.com"\n);
}

The backslashes are needed because an unescaped . matches any character. You can get around this by using the \Q and \E escapes:

my $substring = "s1.domain.com";
if ($mystring =~ /\Q$substring\E/) {
   print qq("$mystring" contains "$substring"\n);
}

Or, you can do as eugene y stated and use the index function. Just a word of warning: index returns -1 when it can’t find a match, not undef or 0.

Thus, this is an error:

my $substring = "s1.domain.com";
if (not index($mystring, $substring)) {   # WRONG: see below
    print qq("$mystring" doesn't contain "$substring"\n);
}

This is wrong because not index(...) is true only when index returns 0, that is, when s1.domain.com is at the very beginning of your string. A missing substring returns -1, which is true, so nothing is printed. I’ve personally been burned on this more than once.

Answer 3 (score 14)

Case Insensitive Substring Example

This is an extension of Eugene’s answer, which converts the strings to lower case before checking for the substring:

if (index(lc($str), lc($substr)) != -1) {
    print "$str contains $substr\n";
} 
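Here is a sketch wrapping both techniques as helpers. The names contains_index and contains_regex are hypothetical. Without \Q...\E, the pattern s1.domain.com would also match a string such as s1xdomainxcom, because each . matches any character:

```perl
use strict;
use warnings;

# Hypothetical helpers; both treat $needle literally (no regex metacharacters).
sub contains_index {
    my ($haystack, $needle) = @_;
    return index($haystack, $needle) != -1;   # 0 is a valid position, so test against -1
}

sub contains_regex {
    my ($haystack, $needle) = @_;
    return $haystack =~ /\Q$needle\E/;        # \Q...\E escapes the dots in s1.domain.com
}

my $str = 's1.domain.com/path';
print "found\n" if contains_index($str, 's1.domain.com');
```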

10: String compare in Perl with “eq” vs “==” (score 301040 in 2016)

Question

I am (a complete Perl newbie) doing string compare in an if statement:

If I do the following:

if ($str1 == "taste" && $str2 == "waste") { }

I see the correct result (i.e. if the condition matches, it evaluates the “then” block). But I see these warnings:

Argument "taste" isn't numeric in numeric eq (==) at line number x.
Argument "waste" isn't numeric in numeric eq (==) at line number x.

But if I do:

if ($str1 eq "taste" && $str2 eq "waste") { }

Even if the if condition is satisfied, it doesn’t evaluate the “then” block.

Here, $str1 is taste and $str2 is waste.

How should I fix this?

Answer accepted (score 104)

First, eq is for comparing strings; == is for comparing numbers.

Even if the “if” condition is satisfied, it doesn’t evaluate the “then” block.

I think your problem is that your variables don’t contain what you think they do. I think your $str1 or $str2 contains something like "taste\n" (with a trailing newline). Check them by printing before your if: print "str1='$str1'\n";

The trailing newline can be removed with the chomp($str1); function.

Answer 2 (score 28)

== does a numeric comparison: it converts both arguments to a number and then compares them. As long as $str1 and $str2 both evaluate to 0 as numbers, the condition will be satisfied.

eq does a string comparison: the two arguments must match lexically (case-sensitive) for the condition to be satisfied.

"foo" == "bar";   # True, both strings evaluate to 0.
"foo" eq "bar";   # False, the strings are not equivalent.
"Foo" eq "foo";   # False, the F characters are different cases.
"foo" eq "foo";   # True, both strings match exactly.

Answer 3 (score 5)

Did you try to chomp the $str1 and $str2?

I found a similar issue with (another) $str1 eq 'Y', and it only went away when I first did:

chomp($str1);
if ($str1 eq 'Y') {
....
}

It works after that.

Hope that helps.
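To see the chomp fix in isolation, here is a sketch that simulates lines read from input. The values mirror the question. In real code they would come from <STDIN> or a filehandle:

```perl
use strict;
use warnings;

# Simulate input read from a file or STDIN: each line keeps its "\n".
my @lines = ("taste\n", "waste\n");
my ($str1, $str2) = @lines;

print "no match before chomp\n" unless $str1 eq 'taste';

chomp($str1, $str2);      # chomp accepts a list and strips the trailing newline from each
print "match\n" if $str1 eq 'taste' && $str2 eq 'waste';
```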

11: Perl read line by line (score 273520 in 2012)

Question

I have a simple Perl script to read a file line by line. Code is below. I want to display two lines and break the loop. But it doesn’t work. Where is the bug?

$file='SnPmaster.txt';
open(INFO, $file) or die("Could not open  file.");

$count = 0; 
foreach $line (<INFO>)  {   
    print $line;    
    if ($++counter == 2){
      last;
    }
}
close(INFO);

Answer accepted (score 118)

If you had use strict turned on, you would have found out that $++foo doesn’t make any sense.

Here’s how to do it:

use strict;
use warnings;

my $file = 'SnPmaster.txt';
open my $info, '<', $file or die "Could not open $file: $!";

while( my $line = <$info>)  {   
    print $line;    
    last if $. == 2;
}

close $info;

This takes advantage of the special variable $. which keeps track of the line number in the current file. (See perlvar)

If you want to use a counter instead, use

my $count = 0;
while( my $line = <$info>)  {   
    print $line;    
    last if ++$count == 2;
}
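Here is a variant of the counter approach wrapped in a reusable helper. The name first_lines is hypothetical. The sketch reads from an in-memory filehandle, opened on a reference to a string, so it runs without SnPmaster.txt:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first $n lines from an open filehandle.
sub first_lines {
    my ($fh, $n) = @_;
    my @out;
    while (my $line = <$fh>) {
        push @out, $line;
        last if @out == $n;   # stop once we have enough
    }
    return @out;
}

# An in-memory filehandle stands in for SnPmaster.txt here.
my $data = "one\ntwo\nthree\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print first_lines($fh, 2);   # prints "one" and "two"
close $fh;
```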

Answer 2 (score 12)

With these types of complex programs, it’s better to let Perl generate the Perl code for you:

$ perl -MO=Deparse -pe'exit if $.>2'

Which will gladly tell you the answer,

LINE: while (defined($_ = <ARGV>)) {
    exit if $. > 2;
}
continue {
    die "-p destination: $!\n" unless print $_;
}

Alternatively, you can simply run it as such from the command line,

$ perl -pe'exit if$.>2' file.txt

Answer 3 (score 5)

You need to use ++$counter, not $++counter; that’s why it isn’t working. (Note also that the code initializes $count but increments $counter.)

12: How can I check if a file exists in Perl? (score 269401 in 2013)

Question

I have a relative path

   $base_path = "input/myMock.TGZ";

myMock.TGZ is the file name located in input folder. The filename can change. But the path is always stored in $base_path.

I need to check if the file exists in $base_path.

Answer accepted (score 170)

Test whether something exists at given path using the -e file-test operator.

print "$base_path exists!\n" if -e $base_path;

However, this test is probably broader than you intend. The code above will generate output if a plain file exists at that path, but it will also fire for a directory, a named pipe, a symlink, or a more exotic possibility. See the documentation for details.

Given the extension of .TGZ in your question, it seems that you expect a plain file rather than the alternatives. The -f file-test operator asks whether a path leads to a plain file.

print "$base_path is a plain file!\n" if -f $base_path;

The perlfunc documentation lists Perl’s file-test operators, which cover many situations you will encounter in practice:

  • -r
    File is readable by effective uid/gid.
  • -w
    File is writable by effective uid/gid.
  • -x
    File is executable by effective uid/gid.
  • -o
    File is owned by effective uid.
  • -R
    File is readable by real uid/gid.
  • -W
    File is writable by real uid/gid.
  • -X
    File is executable by real uid/gid.
  • -O
    File is owned by real uid.
  • -e
    File exists.
  • -z
    File has zero size (is empty).
  • -s
    File has nonzero size (returns size in bytes).
  • -f
    File is a plain file.
  • -d
    File is a directory.
  • -l
    File is a symbolic link (false if symlinks aren’t supported by the file system).
  • -p
    File is a named pipe (FIFO), or Filehandle is a pipe.
  • -S
    File is a socket.
  • -b
    File is a block special file.
  • -c
    File is a character special file.
  • -t
    Filehandle is opened to a tty.
  • -u
    File has setuid bit set.
  • -g
    File has setgid bit set.
  • -k
    File has sticky bit set.
  • -T
    File is an ASCII or UTF-8 text file (heuristic guess).
  • -B
    File is a “binary” file (opposite of -T).
  • -M
    Script start time minus file modification time, in days.
  • -A
    Same for access time.
  • -C
    Same for inode change time (Unix, may differ for other platforms)
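Here is a sketch that combines a few of these tests. It uses File::Temp to create throwaway files so it runs anywhere. The helper describe_path is hypothetical. The special filehandle _ reuses the result of the previous file test instead of calling stat again:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Build a throwaway directory with one empty and one non-empty file.
my $dir   = tempdir(CLEANUP => 1);
my $empty = File::Spec->catfile($dir, 'empty.txt');
my $full  = File::Spec->catfile($dir, 'myMock.TGZ');   # name borrowed from the question

open my $fh, '>', $empty or die "open $empty: $!";
close $fh;
open $fh, '>', $full or die "open $full: $!";
print {$fh} "data";
close $fh;

sub describe_path {
    my ($path) = @_;
    return 'missing'    unless -e $path;
    return 'directory'  if -d _;           # "_" reuses the stat from the -e test
    return 'empty file' if -f _ && -z _;
    return 'plain file (' . (-s _) . ' bytes)' if -f _;
    return 'other';
}

print describe_path($_), "\n" for $dir, $empty, $full, File::Spec->catfile($dir, 'nope');
```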

Answer 2 (score 29)

You might want a variant of exists … perldoc -f "-f"

      -X FILEHANDLE
       -X EXPR
       -X DIRHANDLE
       -X      A file test, where X is one of the letters listed below.  This unary operator takes one argument,
               either a filename, a filehandle, or a dirhandle, and tests the associated file to see if something is
               true about it.  If the argument is omitted, tests $_, except for "-t", which tests STDIN.  Unless
               otherwise documented, it returns 1 for true and '' for false, or the undefined value if the file
               doesn’t exist.  Despite the funny names, precedence is the same as any other named unary operator.
               The operator may be any of:

                   -r  File is readable by effective uid/gid.
                   -w  File is writable by effective uid/gid.
                   -x  File is executable by effective uid/gid.
                   -o  File is owned by effective uid.

                   -R  File is readable by real uid/gid.
                   -W  File is writable by real uid/gid.
                   -X  File is executable by real uid/gid.
                   -O  File is owned by real uid.

                   -e  File exists.
                   -z  File has zero size (is empty).
                   -s  File has nonzero size (returns size in bytes).

                   -f  File is a plain file.
                   -d  File is a directory.
                   -l  File is a symbolic link.
                   -p  File is a named pipe (FIFO), or Filehandle is a pipe.
                   -S  File is a socket.
                   -b  File is a block special file.
                   -c  File is a character special file.
                   -t  Filehandle is opened to a tty.

                   -u  File has setuid bit set.
                   -g  File has setgid bit set.
                   -k  File has sticky bit set.

                   -T  File is an ASCII text file (heuristic guess).
                   -B  File is a "binary" file (opposite of -T).

                   -M  Script start time minus file modification time, in days.

Answer 3 (score 15)

if (-e $base_path)
{ 
 # code
}

-e is the ‘existence’ operator in Perl.

You can check permissions and other attributes with the other file-test operators (see perldoc -f -X).

13: How can I convert a string to a number in Perl? (score 264988 in 2012)

Question

How would I convert a string holding a number into its numeric value in Perl?

Answer accepted (score 89)

You don’t need to convert it at all:

% perl -e 'print "5.45" + 0.1;'
5.55

Answer 2 (score 69)

This is a simple solution:

Example 1

my $var1 = "123abc";
print $var1 + 0;

Result

123

Example 2

my $var2 = "abc123";
print $var2 + 0;

Result

0
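When the input might not be numeric, Scalar::Util’s looks_like_number lets you check first. Otherwise "123abc" silently becomes 123, with only a warning under use warnings. Here is a sketch with a hypothetical to_number helper that returns undef for non-numbers:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: numify only strings that look like numbers.
sub to_number {
    my ($s) = @_;
    return looks_like_number($s) ? $s + 0 : undef;
}

for my $s ('5.45', '1e3', '123abc', 'abc123') {
    my $n = to_number($s);
    printf "%-8s => %s\n", "'$s'", defined $n ? $n : 'not a number';
}
```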

Answer 3 (score 40)

Perl is a context-based language. It doesn’t do its work according to the data you give it. Instead, it figures out how to treat the data based on the operators you use and the context in which you use them. If you do numbers sorts of things, you get numbers:

# numeric addition with strings:
my $sum = '5.45' + '0.01'; # 5.46

If you do strings sorts of things, you get strings:

# string replication with numbers:
my $string = ( 45/2 ) x 4; # "22.522.522.522.5"

Perl mostly figures out what to do and it’s mostly right. Another way of saying the same thing is that Perl cares more about the verbs than it does the nouns.

Are you trying to do something and it isn’t working?

14: How do I enter a multi-line comment in Perl? (score 250429 in 2017)

Question

Possible Duplicate:
What are the common workarounds for multi-line comments in Perl?

How do I add a multi-line comment to Perl source code?

Answer accepted (score 136)

POD is the official way to do multi-line comments in Perl.

From faq.perl.org (perlfaq7):

The quick-and-dirty way to comment out more than one line of Perl is to surround those lines with Pod directives. You have to put these directives at the beginning of the line and somewhere where Perl expects a new statement (so not in the middle of statements like the # comments). You end the comment with =cut, ending the Pod section:

=pod

my $object = NotGonnaHappen->new();

ignored_sub();

$wont_be_assigned = 37;

=cut

The quick-and-dirty method only works well when you don’t plan to leave the commented code in the source. If a Pod parser comes along, your multiline comment is going to show up in the Pod translation. A better way hides it from Pod parsers as well.

The =begin directive can mark a section for a particular purpose. If the Pod parser doesn’t want to handle it, it just ignores it. Label the comments with comment. End the comment using =end with the same label. You still need the =cut to go back to Perl code from the Pod comment:

=begin comment

my $object = NotGonnaHappen->new();

ignored_sub();

$wont_be_assigned = 37;

=end comment

=cut
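
A quick way to convince yourself that perl really skips such a block is a small script in which the commented-out code would otherwise change a variable. The variable name is just for the example:

```perl
use strict;
use warnings;

my $value = 1;

=begin comment

$value = 2;   # never compiled: perl skips everything up to =cut

=end comment

=cut

print "value is $value\n";   # prints "value is 1"
```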

Answer 2 (score 24)

I found it. Perl has multi-line comments:

#!/usr/bin/perl

use strict;

use warnings;

=for comment

Example of multiline comment.

Example of multiline comment.

=cut

print "Multi Line Comment Example \n";

15: How do I use boolean variables in Perl? (score 243622 in 2009)

Question

I have tried:

$var = false;
$var = FALSE;
$var = False;

None of these work. I get the error message

Bareword "false" not allowed while "strict subs" is in use.

Answer accepted (score 280)

In Perl, the following evaluate to false in conditionals:

0
'0'
undef
''  # Empty scalar
()  # Empty list
('')

The rest are true. There are no barewords for true or false.
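
A small script makes the list above concrete. It also shows that strings which look "false" to a human are still true, and that double negation (!!) gives Perl's own canonical true and false values (1 and the empty string). The sample values are arbitrary:

```perl
use strict;
use warnings;

# Each value is tested in boolean context with the ?: operator
my @false_values  = ( 0, '0', undef, '' );
my @false_results = map { $_ ? 'true' : 'false' } @false_values;

# An empty array in boolean (scalar) context gives its length, 0, so it is false
my @empty = ();
my $empty_array_result = @empty ? 'true' : 'false';

# Strings that look "false" to humans are still true to Perl
my @true_values  = ( '0.0', '00', ' ', 'false' );
my @true_results = map { $_ ? 'true' : 'false' } @true_values;

# Double negation gives Perl's own canonical booleans: 1 and ''
my $canonical_true  = !!1;
my $canonical_false = !!0;

print "@false_results | $empty_array_result | @true_results\n";
```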

Answer 2 (score 69)

The most complete, concise definition of false I’ve come across is:

Anything that stringifies to the empty string or the string 0 is false. Everything else is true.

Therefore, the following values are false:

  • The empty string
  • Numerical value zero
  • An undefined value
  • An object with an overloaded boolean operator that evaluates to one of the above.
  • A magical variable that evaluates to one of the above on fetch.

Keep in mind that an empty list literal evaluates to an undefined value in scalar context, so it evaluates to something false.


A note on “true zeroes”

While numbers that stringify to 0 are false, strings that numify to zero aren’t necessarily. The only false strings are 0 and the empty string. Any other string, even if it numifies to zero, is true.

The following are strings that are true as a boolean and zero as a number.

  • Without a warning:
    • "0.0"
    • "0E0"
    • "00"
    • "+0"
    • "-0"
    • " 0"
    • "0\n"
    • ".0"
    • "0."
    • "0 but true"
    • "\t00"
    • "\n0e1"
    • "+0.e-9"
  • With a warning:
    • Any string for which Scalar::Util::looks_like_number returns false. (e.g. "abc")
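
A short check of a few entries from that list: each string is true as a boolean but numifies to zero without a warning, while looks_like_number (from the core Scalar::Util module) reports that "abc" is not numeric, which is why "abc" + 0 warns under use warnings:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my %report;
for my $s ( '0.0', '0E0', '00', '0 but true' ) {
    $report{$s} = {
        boolean => ( $s ? 1 : 0 ),    # true: the string is neither '' nor '0'
        number  => $s + 0,            # numifies to zero, without a warning
    };
}

# "abc" also numifies to 0, but looks_like_number() is false for it
my $abc_looks_numeric = looks_like_number('abc') ? 1 : 0;    # 0

printf "%-14s boolean=%d number=%s\n", "'$_'", $report{$_}{boolean}, $report{$_}{number}
    for sort keys %report;
```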

Answer 3 (score 57)

Perl doesn’t have a native boolean type, but you can use comparison of integers or strings in order to get the same behavior. Alan’s example is a nice way of doing that using comparison of integers. Here’s an example:

my $boolean = 0;
if ( $boolean ) {
    print "$boolean evaluates to true\n";
} else {
    print "$boolean evaluates to false\n";
}

One thing that I’ve done in some of my programs is add the same behavior using constants:

#!/usr/bin/perl

use strict;
use warnings;

use constant false => 0;
use constant true  => 1;

my $val1 = true;
my $val2 = false;

print $val1, " && ", $val2;
if ( $val1 && $val2 ) {
    print " evaluates to true.\n";
} else {
    print " evaluates to false.\n";
}

print $val1, " || ", $val2;
if ( $val1 || $val2 ) {
    print " evaluates to true.\n";
} else {
    print " evaluates to false.\n";
}

The two use constant lines define a constant named true that always evaluates to 1 and a constant named false that always evaluates to 0. Because of the way constants are defined in Perl, the following lines of code fail to compile:

true = 0;
true = false;

The error message should say something like “Can’t modify constant in scalar assignment.”

I saw that in one of the comments you asked about comparing strings. You should know that because Perl combines strings and numeric types in scalar variables, you have different syntax for comparing strings and numbers:

my $var1 = "5.0";
my $var2 = "5";

print "using operator eq\n";
if ( $var1 eq $var2 ) {
    print "$var1 and $var2 are equal!\n";
} else {
    print "$var1 and $var2 are not equal!\n";
}

print "using operator ==\n";
if ( $var1 == $var2 ) {
    print "$var1 and $var2 are equal!\n";
} else {
    print "$var1 and $var2 are not equal!\n";
}

The difference between these operators is a very common source of confusion in Perl.

16: Easy way to print Perl array? (with a little formatting) (score 242206 in 2017)

Question

Is there an easy way to print out a Perl array with commas in between each element?

Writing a for loop to do it is pretty easy but not quite elegant, if that makes sense.

Answer accepted (score 143)

Just use join():

# assuming @array is your array:
print join(", ", @array);

Answer 2 (score 29)

You can use Data::Dump:

use Data::Dump qw(dump);
my @a = (1, [2, 3], {4 => 5});
dump(@a);

Produces (called in void context, dump prints to STDERR):

(1, [2, 3], { 4 => 5 })

Answer 3 (score 18)

If you’re coding for the kind of clarity that would be understood by someone who is just starting out with Perl, this traditional construct says what it means, with a high degree of clarity and legibility:

my $string = join ', ', @array;
print "$string\n";

This construct is documented in perldoc -f join.

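However, I’ve always liked how simple $, makes it. The special variable $" is the separator placed between elements when an array is interpolated into a double-quoted string, and the special variable $, is the separator print (and say) places between the items of a list. Combine either one with local, which limits the change to the enclosing dynamic scope, to avoid having ripple effects throughout the script: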

use 5.012_002;
use strict;
use warnings;

my @array = qw/ 1 2 3 4 5 /;

{
    local $" = ', ';
    print "@array\n"; # Interpolation.
}

OR with $,:

use feature q(say);
use strict;
use warnings;

my @array = qw/ 1 2 3 4 5 /;
{
    local $, = ', ';
    say @array; # List
}

The special variables $, and $" are documented in perlvar. The local keyword, and how it can be used to constrain the effects of altering a global punctuation variable’s value is probably best described in perlsub.
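
The scoping that local provides can be checked directly. This sketch captures the results instead of printing them straight away, using an in-memory filehandle for the $, case; the variable names are only illustrative:

```perl
use strict;
use warnings;

my @array = qw(1 2 3);

# $" is inserted between elements when an array is interpolated into a string
my $inside;
{
    local $" = ', ';
    $inside = "@array";              # "1, 2, 3"
}
my $outside = "@array";              # "1 2 3": $" is back to its default, a single space

# $, is inserted between the items that print receives; here print writes
# into an in-memory filehandle so the result can be inspected
open my $fh, '>', \my $printed or die "cannot open in-memory filehandle: $!";
{
    local $, = '-';
    print {$fh} @array;              # writes "1-2-3"
}
close $fh;

print "$inside\n$outside\n$printed\n";
```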

Enjoy!

17: What’s the easiest way to install a missing Perl module? (score 241111 in 2015)

Question

I get this error:

Can't locate Foo.pm in @INC

Is there an easier way to install it than downloading, untarring, making, etc?

Answer accepted (score 211)

On Unix:

Usually you start cpan in your shell:

# cpan

and type

install Chocolate::Belgian

or in short form:

cpan Chocolate::Belgian

On Windows:

If you’re using ActivePerl on Windows, the PPM (Perl Package Manager) has much of the same functionality as CPAN.pm.

Example:

# ppm
ppm> search net-smtp
ppm> install Net-SMTP-Multipart

See How do I install Perl modules? in the CPAN FAQ.

Many Linux distributions ship a lot of Perl modules as packages:

  • Debian/Ubuntu: apt-cache search 'perl$'
  • Arch Linux: pacman -Ss '^perl-'
  • Gentoo: category dev-perl

You should always prefer them, as you benefit from automatic (security) updates and easy removal; removing modules can be pretty tricky with the cpan tool itself.

For Gentoo there’s a nice tool called g-cpan which builds/installs the module from CPAN and creates a Gentoo package (ebuild) for you.

Answer 2 (score 61)

Try App::cpanminus:

# cpanm Chocolate::Belgian

It’s great for just getting stuff installed. It provides none of the more complex functionality of CPAN or CPANPLUS, so it’s easy to use, provided you know which module you want to install. If you haven’t already got cpanminus, just type:

# cpan App::cpanminus

to install it.

It is also possible to install it without using cpan at all. The basic bootstrap procedure is:

curl -L http://cpanmin.us | perl - --sudo App::cpanminus

For more information go to the App::cpanminus page and look at the section on installation.

Answer 3 (score 29)

I note some folks suggesting one run cpan under sudo. That used to be necessary to install into the system directory, but modern versions of the CPAN shell allow you to configure it to use sudo just for installing. This is much safer, since it means that tests don’t run as root.

If you have an old CPAN shell, simply install the new cpan (“install CPAN”) and when you reload the shell, it should prompt you to configure these new directives.

Nowadays, when I’m on a system with an old CPAN, the first thing I do is update the shell and set it up to do this so I can do most of my cpan work as a normal user.

Also, I’d strongly suggest that Windows users investigate Strawberry Perl. This is a version of Perl that comes packaged with a pre-configured CPAN shell as well as a compiler. It also includes some hard-to-compile Perl modules with their external C library dependencies, notably XML::Parser. This means that you can do the same thing as every other Perl user when it comes to installing modules, and things tend to “just work” a lot more often.

18: Regex to match any character including new lines (score 222128 in )

Question

Is there a regex to match “all characters including newlines”?

For example, in the regex below, there is no output from $2 because (.+?) doesn’t include new lines when matching.

$string = "START Curabitur mollis, dolor ut rutrum consequat, arcu nisl ultrices diam, adipiscing aliquam ipsum metus id velit. Aenean vestibulum gravida felis, quis bibendum nisl euismod ut. 

Nunc at orci sed quam pharetra congue. Nulla a justo vitae diam eleifend dictum. Maecenas egestas ipsum elementum dui sollicitudin tempus. Donec bibendum cursus nisi, vitae convallis ante ornare a. Curabitur libero lorem, semper sit amet cursus at, cursus id purus. Cras varius metus eu diam vulputate vel elementum mauris tempor. 

Morbi tristique interdum libero, eu pulvinar elit fringilla vel. Curabitur fringilla bibendum urna, ullamcorper placerat quam fermentum id. Nunc aliquam, nunc sit amet bibendum lacinia, magna massa auctor enim, nec dictum sapien eros in arcu. 

Pellentesque viverra ullamcorper lectus, a facilisis ipsum tempus et. Nulla mi enim, interdum at imperdiet eget, bibendum nec END";

$string =~ /(START)(.+?)(END)/;

print $2;

Answer accepted (score 170)

Add the s modifier to your regex to cause . to match newlines:

$string =~ /(START)(.+?)(END)/s;
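
A minimal check of the difference, using a short two-line stand-in for the question's $string and a single capture group instead of three:

```perl
use strict;
use warnings;

my $string = "START first line\nsecond line END";

# Without /s, . stops at "\n", so no match can span the two lines
my ($without_s) = $string =~ /START(.+?)END/;

# With /s, . matches "\n" too, and the lazy .+? stops at the first END
my ($with_s) = $string =~ /START(.+?)END/s;

print defined $without_s ? "matched\n" : "no match\n";   # no match
print "[$with_s]\n";
```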

Answer 2 (score 287)

If you don’t want to add the /s regex modifier (perhaps you still want . to retain its original meaning elsewhere in the regex), you may also use a character class. One possibility:

[\S\s]

a character that is either whitespace or not whitespace; in other words, any character.

You can also change modifiers locally in a small part of the regex, like so:

(?s:.)
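
Both alternatives can be tried side by side. In this sketch, the (?s:...) group changes the meaning of . only inside the group, so a plain . elsewhere still refuses to match a newline:

```perl
use strict;
use warnings;

my $text = "START a\nb END";

# A character class that covers everything: whitespace or non-whitespace
my ($by_class) = $text =~ /START([\s\S]+?)END/;

# A locally scoped modifier: only the . inside (?s:...) matches newlines
my ($by_inline) = $text =~ /START((?s:.)+?)END/;

# Outside such a group, . keeps its usual meaning
my $plain_dot = "a\nb" =~ /a.b/ ? 1 : 0;   # 0: . does not match "\n"
```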

Answer 3 (score 8)

Yep, you just need to make . match newlines:

$string =~ /(START)(.+?)(END)/s;

19: Best way to iterate through a Perl array (score 214532 in 2013)

Question

Which is the best implementation (in terms of speed and memory usage) for iterating through a Perl array? Is there any better way? (@Array need not be retained.)

Implementation 1
foreach (@Array)
{
      SubRoutine($_);
}
Implementation 2
while($Element=shift(@Array))
{
      SubRoutine($Element);
}
Implementation 3
while(scalar(@Array) !=0)
{
      $Element=shift(@Array);
      SubRoutine($Element);
}
Implementation 4
for my $i (0 .. $#Array)
{
      SubRoutine($Array[$i]);
}
Implementation 5
map { SubRoutine($_) } @Array ;

Answer accepted (score 75)

  • In terms of speed: #1 and #4, but not by much in most instances.

    You could write a benchmark to confirm, but I suspect you’ll find #1 and #4 to be slightly faster because the iteration work is done in C instead of Perl, and no needless copying of the array elements occurs. ($_ is aliased to the element in #1, but #2 and #3 actually copy the scalars from the array.)

    #5 might be similar.
  • In terms of memory usage: They’re all the same except for #5.

    for (@a) is special-cased to avoid flattening the array. The loop iterates over the indexes of the array.
  • In terms of readability: #1.

  • In terms of flexibility: #1/#4 and #5.

    #2 does not support elements that are false. #2 and #3 are destructive.
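
The aliasing and false-element points above are easy to observe. A small sketch with arbitrary sample data: modifying $_ in a foreach writes back into the array, and the while/shift loop of #2 stops at the first false element, while testing defined() instead only stops at undef:

```perl
use strict;
use warnings;

# foreach aliases $_ to each element, so changes write back to the array
my @numbers = ( 1, 2, 3 );
$_ *= 10 for @numbers;                  # @numbers is now (10, 20, 30)

# Implementation #2 stops at the first false element
my @with_zero = ( 5, 0, 7 );
my @seen_by_shift;
while ( my $element = shift @with_zero ) {
    push @seen_by_shift, $element;
}
# @seen_by_shift is (5); the 0 was shifted off and 7 is still in @with_zero

# Testing defined() handles 0 and '' (an undef element would still end the loop)
my @with_zero_again = ( 5, 0, 7 );
my @seen_by_defined;
while ( defined( my $element = shift @with_zero_again ) ) {
    push @seen_by_defined, $element;
}
```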

Answer 2 (score 24)

If you only care about the elements of @Array, use:

for my $el (@Array) {
# ...
}

If the indices matter, use:

for my $i (0 .. $#Array) {
# ...
}

Or, as of perl 5.12, you can use:

while (my ($i, $el) = each @Array) {
# ...
}

If you need both the element and its index in the body of the loop, I would expect using each to be the fastest, but then you’ll be giving up compatibility with pre-5.12 perls.
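
A runnable version of the each loop, with a made-up three-element array standing in for @Array:

```perl
use strict;
use warnings;
use 5.012;    # each on arrays needs perl 5.12 or later

my @letters = qw(a b c);
my @pairs;

# each returns (index, element) pairs until the array is exhausted
while ( my ( $i, $el ) = each @letters ) {
    push @pairs, "$i=$el";
}
print "@pairs\n";    # 0=a 1=b 2=c
```

Note that each keeps one iterator per array; if you leave such a loop early with last, calling keys @letters resets the iterator before the next loop.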

Some other pattern than these might be appropriate under certain circumstances.

Answer 3 (score 3)

IMO, implementation #1 is typical, and being short and idiomatic for Perl trumps the others for that alone. A benchmark of the choices might offer you insight into speed, at least.
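
Such a benchmark might look like the sketch below, which uses the core Benchmark module. The entry names, the 1,000-element array and the iteration count are arbitrary choices, and the relative numbers will vary with perl version and hardware:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @data = ( 1 .. 1_000 );
my $sum;    # each entry sums @data, so all loops do the same observable work

# Run each sub 1,000 times and print a table of rates and relative speed
my $table = cmpthese( 1_000, {
    foreach_alias => sub { $sum = 0; $sum += $_ for @data },
    index_loop    => sub { $sum = 0; $sum += $data[$_] for 0 .. $#data },
    shift_copy    => sub {
        $sum = 0;
        my @copy = @data;    # #2/#3 are destructive, so this entry also pays for a copy
        while ( defined( my $e = shift @copy ) ) { $sum += $e }
    },
} );
```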

20: What’s the difference between Perl’s backticks, system, and exec? (score 212862 in 2010)

Question

Can someone please help me? In Perl, what is the difference between:

exec "command";

and

system("command");

and

print `command`;

Are there other ways to run shell commands too?

Answer accepted (score 257)

exec

executes a command and never returns. It’s like a return statement in a function.

If the command is not found exec returns false. It never returns true, because if the command is found it never returns at all. There is also no point in returning STDOUT, STDERR or exit status of the command. You can find documentation about it in perlfunc, because it is a function.

system

executes a command, and your Perl script continues after the command has finished.

The return value is the wait status of the command (the same value that is stored in $?); the command’s exit code is $? >> 8. You can find documentation about it in perlfunc.

backticks

Like system, backticks execute a command, and your Perl script continues after the command has finished.

Unlike system, the return value is the command’s STDOUT. qx// is equivalent to backticks. You can find documentation about it in perlop, because unlike system and exec it is an operator.
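
The difference in return values can be seen with a tiny sketch. Here $^X (the path of the running perl) stands in for "command" so the example behaves the same everywhere; it assumes that path contains no spaces, since backticks go through the shell:

```perl
use strict;
use warnings;

# system: wait for the command; the return value is the wait status, also in $?
my $status    = system( $^X, '-e', 'exit 3' );
my $exit_code = $? >> 8;                          # the command's exit code, 3

# backticks: wait for the command and return what it printed on STDOUT
my $output = `$^X -e "print 42"`;                 # "42"

# exec would replace this process entirely, so nothing after it would run:
# exec( $^X, '-e', 'print "bye\n"' ) or die "exec failed: $!";

print "status=$status exit_code=$exit_code output=$output\n";
```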


Other ways

What is missing from the above is a way to execute a command asynchronously. That means your perl script and your command run simultaneously. This can be accomplished with open. It allows you to read STDOUT/STDERR and write to STDIN of your command. It is platform dependent though.

There are also several modules which can ease these tasks: IPC::Open2, IPC::Open3 and IPC::Run, as well as Win32::Process::Create if you are on Windows.

Answer 2 (score 165)

In general I use system, open, IPC::Open2, or IPC::Open3 depending on what I want to do. The qx// operator, while simple, is too constraining in its functionality to be very useful outside of quick hacks. I find open to be much handier.

system: run a command and wait for it to return

Use system when you want to run a command, don’t care about its output, and don’t want the Perl script to do anything until the command finishes.

#doesn't spawn a shell, arguments are passed as they are
system("command", "arg1", "arg2", "arg3");

or

#spawns a shell, arguments are interpreted by the shell, use only if you
#want the shell to do globbing (e.g. *.txt) for you or you want to redirect
#output
system("command arg1 arg2 arg3");

qx// or ``: run a command and capture its STDOUT

Use qx// when you want to run a command, capture what it writes to STDOUT, and don’t want the Perl script to do anything until the command finishes.

#arguments are always processed by the shell

#in list context it returns the output as a list of lines
my @lines = qx/command arg1 arg2 arg3/;

#in scalar context it returns the output as one string
my $output = qx/command arg1 arg2 arg3/;

exec: replace the current process with another process.

Use exec along with fork when you want to run a command, don’t care about its output, and don’t want to wait for it to return. system is really just:

sub my_system {
    die "could not fork\n" unless defined(my $pid = fork);
    return waitpid $pid, 0 if $pid; #parent waits for child
    exec @_; #replace child with new process
}

You may also want to read the waitpid and perlipc manuals.

open: run a process and create a pipe to its STDIN or STDOUT

Use open when you want to write data to a process’s STDIN or read data from a process’s STDOUT (but not both at the same time).

#read from a gzip file as if it were a normal file
#(-c makes gzip write the decompressed data to STDOUT instead of replacing the file)
open my $read_fh, "-|", "gzip", "-dc", $filename
    or die "could not open $filename: $!";

#write to a gzip compressed file as if it were a normal file
#(the shell does the redirection here, so $filename must be shell-safe)
open my $write_fh, "|-", "gzip > $filename"
    or die "could not open $filename: $!";

IPC::Open2: run a process and create a pipe to both STDIN and STDOUT

Use IPC::Open2 when you need to read from and write to a process’s STDIN and STDOUT.

use IPC::Open2;

open2 my $out, my $in, "/usr/bin/bc"
    or die "could not run bc";

print $in "5+6\n";

my $answer = <$out>;

IPC::Open3: run a process and create a pipe to STDIN, STDOUT, and STDERR

Use IPC::Open3 when you need to capture all three standard file handles of the process. I would write an example, but it works mostly the same way IPC::Open2 does, with a slightly different order to the arguments and a third file handle.
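
For completeness, a minimal IPC::Open3 sketch. Note the argument order: open3 takes the child's STDIN handle first, whereas open2 takes its STDOUT handle first. The child here is a one-liner run with $^X purely for illustration; the error handle comes from gensym because a false value would merge the child's STDERR into STDOUT:

```perl
use strict;
use warnings;
use IPC::Open3;
use Symbol qw(gensym);

# The child writes one line to each stream and exits with code 2
my $pid = open3( my $chld_in, my $chld_out, my $chld_err = gensym,
                 $^X, '-e', 'print STDOUT "out\n"; print STDERR "err\n"; exit 2' );

close $chld_in;    # nothing to send on the child's STDIN

# Reading one handle to EOF before the other is fine for small outputs, but a
# child that fills the STDERR pipe while we block on STDOUT would deadlock;
# IO::Select or IPC::Run avoid that.
my @out = <$chld_out>;
my @err = <$chld_err>;

waitpid( $pid, 0 );
my $exit_code = $? >> 8;    # 2
```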

Answer 3 (score 17)

Let me quote the manuals first:

perldoc exec():

The exec function executes a system command and never returns– use system instead of exec if you want it to return

perldoc system():

Does exactly the same thing as exec LIST , except that a fork is done first, and the parent process waits for the child process to complete.

In contrast to exec and system, backticks don’t give you the return value but the collected STDOUT.

perldoc `String`:

A string which is (possibly) interpolated and then executed as a system command with /bin/sh or its equivalent. Shell wildcards, pipes, and redirections will be honored. The collected standard output of the command is returned; standard error is unaffected.

Alternatives:

In more complex scenarios, where you want to fetch STDOUT, STDERR or the return code, you can use well known standard modules like IPC::Open2 and IPC::Open3.

Example:

use IPC::Open2;
my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'some', 'cmd', 'and', 'args');
waitpid( $pid, 0 );
my $child_exit_status = $? >> 8;

Finally, IPC::Run from the CPAN is also worth looking at…

21: Show a PDF files in users browser via PHP/Perl (score 211415 in 2012)

Question

I want to show PDF files to my users. The reason I use CGI to show the PDF is that I want to track clicks on it and cloak the real location of the saved PDF.

I’ve been searching the Internet and have only found how to show a save dialog to users and how to create a PDF, not how to show the files to users.

What I want is to show users my PDF files, not to create them or force a download. Here is what I got from the official PHP documentation:

<?php
header('Content-type: application/pdf');
readfile('the.pdf');
?>

Also my google-search-result perl code:

open(PDF, "the.pdf") or die "could not open PDF [$!]";
binmode PDF;
my $output = do { local $/; <PDF> };
close (PDF);

print "Content-Type: application/pdf\n";
print "Content-Length: " .length($output) . "\n\n";
print $output

If you would do it in Ruby, please tell me, but I’m not sure if my server supports Rails.

Sorry if my code is far from the right way to show the PDF; I don’t know anything about PDF processing or how to solve this problem.

Let’s assume that the users have the Adobe Reader plug-in. So, how do I fix my problem?

edit: I want to show a plain PDF file. My primary purpose: track my PDF files and use some fancy URLs.

edit : Here’s my main php code:

<?php
$file='/files/the.pdf';
header('Content-type: application/pdf');
header('Content-Disposition: inline; filename="the.pdf"');
@readfile($file);
?>

edit: Now the code is working, but the loading progress bar (in the Adobe Reader X plugin) doesn’t show up. Why? Can anyone help me? Here’s my main code:

<?php
$file='./files/the.pdf';
header('Content-type: application/pdf');
header('Content-Disposition: inline; filename="the.pdf"');
header('Content-Transfer-Encoding: binary');
header('Content-Length: ' . filesize($file));
@readfile($file);
?>

edit : All my problems solved. Here’s the final code:

<?php
$file = './path/to/the.pdf';
$filename = 'Custom file name for the.pdf'; /* Note: Always use .pdf at the end. */

header('Content-type: application/pdf');
header('Content-Disposition: inline; filename="' . $filename . '"');
header('Content-Transfer-Encoding: binary');
header('Content-Length: ' . filesize($file));
header('Accept-Ranges: bytes');

@readfile($file);
?>

Thanks! :)

Answer 2 (score 49)

I assume you want the PDF to display in the browser, rather than forcing a download. If that is the case, try setting the Content-Disposition header with a value of inline.

Also remember that this will also be affected by browser settings - some browsers may be configured to always download PDF files or open them in a different application (e.g. Adobe Reader)
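
Since the question also shows a Perl CGI attempt, here is a rough Perl counterpart of the final PHP code: it sends Content-Type, Content-Disposition: inline and Content-Length, then streams the file in chunks rather than slurping it. This is only a sketch; the sub name send_pdf_inline and its arguments are made up for the example, it assumes the display name contains no double quotes, and error handling is minimal:

```perl
use strict;
use warnings;

# Hypothetical helper: write CGI headers plus the PDF body to $out
sub send_pdf_inline {
    my ( $path, $display_name, $out ) = @_;

    open my $in, '<:raw', $path or die "could not open $path: $!";
    my $size = ( -s $in ) || 0;

    binmode $out;
    print {$out} "Content-Type: application/pdf\r\n",
                 "Content-Disposition: inline; filename=\"$display_name\"\r\n",
                 "Content-Length: $size\r\n",
                 "\r\n";

    local $/ = \65536;    # read fixed 64 KiB records instead of lines
    while ( defined( my $chunk = <$in> ) ) {
        print {$out} $chunk;
    }
    close $in;
}
```

In a CGI script you would call it as send_pdf_inline('./path/to/the.pdf', 'the.pdf', \*STDOUT); and, as the answer above notes, whether the PDF then opens inline still depends on the browser's settings.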

Answer 3 (score 14)

    $url ="https://yourFile.pdf";
    $content = file_get_contents($url);

    header('Content-Type: application/pdf');
    header('Content-Length: ' . strlen($content));
    header('Content-Disposition: inline; filename="YourFileName.pdf"');
    header('Cache-Control: private, max-age=0, must-revalidate');
    header('Pragma: public');
    ini_set('zlib.output_compression','0');

    die($content);

Tested and works fine. If you want the file to download instead, replace

    Content-Disposition: inline

with

    Content-Disposition: attachment

22: How do I get a list of installed CPAN modules? (score 207546 in )

Question

Aside from trying

perldoc <module name>

individually for any CPAN module that takes my fancy or going through the file system and looking at the directories I have no idea what modules we have installed.

What’s the easiest way to just get a big list of every CPAN module installed? From the command line or otherwise.

Answer accepted (score 62)

This is answered in the Perl FAQ, the answer which can be quickly found with perldoc -q installed. In short, it comes down to using ExtUtils::Installed or using File::Find, variants of both of which have been covered previously in this thread.

You can also find the FAQ entry “How do I find which modules are installed on my system?” in perlfaq3. You can see a list of all FAQ answers by looking in perlfaq.
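
A minimal ExtUtils::Installed sketch along the lines of that FAQ answer. Keep in mind that it works from the .packlist files written when modules are installed, so modules installed by other means (for example some OS packages) may not show up:

```perl
use strict;
use warnings;
use ExtUtils::Installed;

my $installed = ExtUtils::Installed->new;
my @modules   = $installed->modules;

for my $module (@modules) {
    # version() returns the version recorded for that distribution, if any
    my $version = $installed->version($module) // 'unknown';
    print "$module $version\n";
}
```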

Answer 2 (score 34)

perldoc perllocal

Edit: There’s a (little) more info about it in the CPAN FAQ

Answer 3 (score 24)

perldoc -q installed

claims that cpan -l will do the trick; however, it’s not working for me. The other option:

cpan -a

does spit out a nice list of installed packages and has the nice side effect of writing them to a file.

23: How do I remove duplicate items from an array in Perl? (score 206565 in 2014)

Question

I have an array in Perl:

my @my_array = ("one","two","three","two","three");

How do I remove the duplicates from the array?

Answer accepted (score 159)

You can do something like this as demonstrated in perlfaq4:

sub uniq {
    my %seen;
    grep !$seen{$_}++, @_;
}

my @array = qw(one two three two three);
my @filtered = uniq(@array);

print "@filtered\n";

Outputs:

one two three

If you want to use a module, try the uniq function from List::MoreUtils.
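
Because %seen only records what has already been seen, the grep keeps the first occurrence of each value in its original position. A self-contained check of that, plus the scalar-context behaviour (grep then returns a count). As far as I know, newer versions of the core List::Util module (1.45 and later) also export a uniq function:

```perl
use strict;
use warnings;

# Same technique as above: !$seen{$_}++ is true only the first time a value appears
sub uniq {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @my_array = ( "one", "two", "three", "two", "three" );
my @unique   = uniq(@my_array);    # ("one", "two", "three"), first-seen order kept

# In scalar context the grep returns how many elements passed
my $how_many = uniq(@my_array);    # 3
```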

Answer 2 (score 119)

The Perl documentation comes with a nice collection of FAQs. Your question is frequently asked:

% perldoc -q duplicate

The answer, copy and pasted from the output of the command above, appears below:

Found in /usr/local/lib/perl5/5.10.0/pods/perlfaq4.pod
 How can I remove duplicate elements from a list or array?
   (contributed by brian d foy)

   Use a hash. When you think the words "unique" or "duplicated", think
   "hash keys".

   If you don't care about the order of the elements, you could just
   create the hash then extract the keys. It's not important how you
   create that hash: just that you use "keys" to get the unique elements.

       my %hash   = map { $_, 1 } @array;
       # or a hash slice: @hash{ @array } = ();
       # or a foreach: $hash{$_} = 1 foreach ( @array );

       my @unique = keys %hash;

   If you want to use a module, try the "uniq" function from
   "List::MoreUtils". In list context it returns the unique elements,
   preserving their order in the list. In scalar context, it returns the
   number of unique elements.

       use List::MoreUtils qw(uniq);

       my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
       my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7

   You can also go through each element and skip the ones you've seen
   before. Use a hash to keep track. The first time the loop sees an
   element, that element has no key in %seen. The "next" statement creates
   the key and immediately uses its value, which is "undef", so the loop
   continues to the "push" and increments the value for that key. The next
   time the loop sees that same element, its key exists in the hash and
   the value for that key is true (since it's not 0 or "undef"), so the
   next skips that iteration and the loop goes to the next element.

       my @unique = ();
       my %seen   = ();

       foreach my $elem ( @array )
       {
         next if $seen{ $elem }++;
         push @unique, $elem;
       }

   You can write this more briefly using a grep, which does the same
   thing.

       my %seen = ();
       my @unique = grep { ! $seen{ $_ }++ } @array;

Answer 3 (score 67)

Install List::MoreUtils from CPAN.

Then in your code:

use strict;
use warnings;
use List::MoreUtils qw(uniq);

my @dup_list = qw(1 1 1 2 3 4 4);

my @uniq_list = uniq(@dup_list);

24: Perl - If string contains text? (score 204940 in 2015)

Question

I want to use curl to view the source of a page, and if that source contains a word that matches the string, execute a print. How would I write an “if $string contains” check?

In VB it would be like this:

dim string1 as string = "1"
If string1.contains("1") Then
Code here...
End If

Something similar to that but in Perl.

Answer 2 (score 90)

If you just need to search for one string within another, use the index function (or rindex if you want to start scanning from the end of the string):

if (index($string, $substring) != -1) {
   print "'$string' contains '$substring'\n";
}

To search a string for a pattern match, use the match operator m//:

if ($string =~ m/pattern/) {
    print "'$string' matches the pattern\n";       
}
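
One practical difference between the two: index treats the substring literally, while a regex reads characters such as $, . and ( as metacharacters. When the text to find comes from a variable, \Q...\E (quotemeta) makes the pattern literal. The sample strings are made up:

```perl
use strict;
use warnings;

my $string    = 'Total cost: $5.00 (approx.)';
my $substring = '$5.00 (';

# index returns the 0-based position of the first match, or -1 if there is none
my $pos     = index( $string, $substring );    # 12
my $missing = index( $string, 'xyz' );         # -1

# Interpolated as-is, $substring would be parsed as a pattern (the lone "("
# makes it invalid); \Q...\E escapes it so it matches literally
my $found_by_regex = $string =~ /\Q$substring\E/ ? 1 : 0;
```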

Answer 3 (score 29)

if ($string =~ m/something/) {
   # Do work
}

Where something is a regular expression.

25: How to extract string following a pattern with grep, regex or perl (score 193684 in 2018)

Question

I have a file that looks something like this:

<table name="content_analyzer" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
  <type="global" />
</table>

I need to extract anything within the quotes that follow name=, i.e., content_analyzer, content_analyzer2 and content_analyzer_items.

I am doing this on a Linux box, so a solution using sed, perl, grep or bash is fine.

Answer accepted (score 143)

Since you need to match content without including it in the result (must match name=" but it’s not part of the desired result) some form of zero-width matching or group capturing is required. This can be done easily with the following tools:

Perl

With Perl you could use the -n option to loop line by line and print the content of a capturing group if it matches:

perl -ne 'print "$1\n" if /name="(.*?)"/' filename
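
Inside a script, or when a line may hold more than one name attribute, a global match collects every capture at once. A sketch using part of the sample above as a stand-in for the file contents:

```perl
use strict;
use warnings;

# Stand-in for the file contents; in practice read them from the file
my $text = <<'END';
<table name="content_analyzer" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
  <type="global" />
</table>
END

# With /g in list context, the match returns the capture group of every match
my @names = $text =~ /name="([^"]*)"/g;

print "$_\n" for @names;
```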

GNU grep

If you have an improved version of grep, such as GNU grep, you may have the -P option available. This option enables Perl-like regexes, allowing you to use \K, which acts as a shorthand lookbehind: it resets the start of the reported match, so anything matched before it is left out of the result.

grep -Po 'name="\K.*?(?=")' filename

The -o option makes grep print only the matched text, instead of the whole line.

Vim - Text Editor

Another way is to use a text editor directly. With Vim, one of the various ways of accomplishing this would be to delete lines without name= and then extract the content from the resulting lines:

:v/.*name="\v([^"]+).*/d|%s//\1

Standard grep

If you don’t have access to these tools for some reason, something similar can be achieved with standard grep. However, without the lookaround it will require some cleanup afterwards:

grep -o 'name="[^"]*"' filename

A note about saving results

In all of the commands above the results will be sent to stdout. It’s important to remember that you can always save them by redirecting the output to a file, appending:

> result

to the end of the command.

Answer 2 (score 5)

If you’re using Perl, install a module to parse the XML: XML::Simple, XML::Twig, or XML::LibXML. Don’t re-invent the wheel.

26: How can I call a shell command in my Perl script? (score 190411 in 2019)

Question

What would be an example of how I can call a shell command, say ‘ls -a’ in a Perl script and the way to retrieve the output of the command as well?

Answer accepted (score 69)

How to run a shell script from a Perl program

  1. Using system: system($command, @arguments);

For example:

system("sh", "script.sh", "--help" );

system("sh script.sh --help");

system will execute $command with @arguments and return to your script when it finishes. Check $? for the command’s exit status (and $! if the command could not be started at all). Read the documentation for system for the nuances of how various invocations are slightly different.

  2. Using exec

This is very similar to system, but exec replaces your script’s process with the command, so your script does not continue afterwards (exec only returns if the command could not be started). Again, read the documentation for exec for more.

  3. Using backticks or qx//

my $output = `script.sh --option`;

my $output = qx/script.sh --option/;

The backtick operator and its equivalent qx// execute the command and options inside the operator and, when the command finishes, return what it wrote to STDOUT.

There are also ways to run external applications through creative use of open, but this is advanced use; read the documentation for more.
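
For the ls -a case from the question, a short sketch that captures the output line by line and checks whether the command succeeded:

```perl
use strict;
use warnings;

# In list context, backticks return the output as a list of lines
my @entries = `ls -a`;

# $? is -1 if the command could not be started, otherwise its wait status
die "could not run ls: $!\n" if $? == -1;
die "ls exited with code " . ( $? >> 8 ) . "\n" if $? != 0;

chomp @entries;    # remove the trailing newline from each line
print "found: $_\n" for @entries;
```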

Answer 2 (score 15)

From Perl HowTo, the most common ways to execute external commands from Perl are:

  • my $files = `ls -la`;  (captures the output of the command in $files)
  • system "touch ~/foo";  (if you don't want to capture the command's output)
  • exec "vim ~/foo";  (if you don't want to return to the script after executing the command)
  • open(my $file, '|-', "grep foo"); print $file "foo";  (if you want to pipe input into the command)

Answer 3 (score 12)

Examples:
  1. `ls -l`;
  2. system("ls -l");
  3. exec("ls -l");

27: The program can’t start because api-ms-win-crt-runtime-l1-1-0.dll is missing while starting Apache server on my computer (score 184586 in 2017)

Question

In order to quickly set up my PHP/Perl development environment, I installed xampp. Specifically, I attempted to install using: xampp-win32-7.0.1-0-VC14-installer.

When I started the program, I ran into an error related to a dll file. I also tried the suggestions in this question that sounded like fixes to the same problem.

The problem is still not fixed after trying the above.

screenshot of error

Answer accepted (score 10)

I was facing the same issue. After many tries, the solution below worked for me.

Before installing VC++, install your Windows updates:

  1. Go to Start > Control Panel > Windows Update.
  2. Check for updates.
  3. Install all updates.
  4. Restart your system.

After that you can follow the steps below.

@ABHI KUMAR

Download the Visual C++ Redistributable 2015

Visual C++ Redistributable for Visual Studio 2015 (64-bit)

Visual C++ Redistributable for Visual Studio 2015 (32-bit)

(Reinstall if already installed), then restart your computer, or use Windows Update to download it automatically.

Download link: https://www.microsoft.com/de-de/download/details.aspx?id=48145.

Answer 2 (score 9)

Download the Visual C++ Redistributable 2015

Updated links to VC++ file:

For Windows 64-bit

For Windows 32-bit

28: “inappropriate ioctl for device” (score 183560 in 2016)

Question

I have a Perl script running in an AIX box.

The script tries to open a file from a certain directory and fails to read it because the file has no read permission, but I get a different error saying inappropriate ioctl for device.

Shouldn’t it say something like no read permissions for file or something similar?

What does this inappropriate ioctl for device message mean?

How can I fix it?

EDIT: This is what I found when I ran strace:

open("/local/logs/xxx/xxxxServer.log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 4
_llseek(4, 0, [77146], SEEK_END) = 0
ioctl(4, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbffc14f8) = -1 ENOTTY (Inappropriate ioctl for device)

Answer accepted (score 37)

Most likely it means that the open didn’t fail.

When Perl opens a file, it checks whether or not the file is a TTY (so that it can answer the -t $fh filetest operator) by issuing the TCGETS ioctl against it. If the file is a regular file and not a tty, the ioctl fails and sets errno to ENOTTY (string value: “Inappropriate ioctl for device”). As ysth says, the most common reason for seeing an unexpected value in $! is checking it when it’s not valid, that is, anywhere other than immediately after a syscall failed, so testing the result codes of your operations is critically important.
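A minimal sketch of that pattern: test open's return value first and read $! only on the failure branch, copying it right away. The helper name try_open and the path are hypothetical; the path is assumed not to exist, so the open fails with ENOENT.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);

# Hypothetical helper: open $path for reading. On success return the handle
# and 0; on failure return undef and errno copied right after the failed open.
sub try_open {
    my ($path) = @_;
    if (open(my $fh, '<', $path)) {
        return ($fh, 0);
    }
    my $errno = $! + 0;    # copy now: any later system call may overwrite $!
    return (undef, $errno);
}

# Hypothetical path, assumed not to exist.
my ($fh, $errno) = try_open('/no/such/dir/xxxxServer.log');

if ($fh) {
    print "opened OK; \$! is not meaningful here and may hold a stale value\n";
}
else {
    local $! = $errno;     # restore the saved errno to get its message text
    print "open failed: $!\n";
}
```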

If open actually did return false for you, and you found ENOTTY in $! then I would consider this a small bug (giving a useless value of $!) but I would also be very curious as to how it happened. Code and/or truss output would be nifty.

Answer 2 (score 21)

Odd errors like “inappropriate ioctl for device” are usually a result of checking $! at some point other than just after a system call failed. If you’d show your code, I bet someone would rapidly point out your error.

Answer 3 (score 5)

“Files” in *nix-type systems are very much an abstract concept.

They can be areas on disk organized by a file system, but they could equally well be a network connection, a bit of shared memory, the buffered output from another process, a screen or a keyboard.

To be really useful, Perl mirrors this model very closely, and does not treat files by emulating a magnetic tape as many 4GLs do.

So it tried an ioctl operation (“open for write”) on a filehandle that does not allow write operations, which is an inappropriate ioctl operation for that device/file.

The easiest thing to do is to add an or die "Cannot open $myfile: $!" clause at the end of your open, so you can choose your own meaningful message.

29: What is the proper way to check if a string is empty in Perl? (score 182537 in 2010)

Question

I’ve just been using this code to check if a string is empty:

if ($str == "")
{
  # ...
}

And also the same with the not equals operator…

if ($str != "")
{
  # ...
}

This seems to work (I think), but I’m not sure it’s the correct way, or if there are any unforeseen drawbacks. Something just doesn’t feel right about it.

Answer accepted (score 123)

For string comparisons in Perl, use eq or ne:

if ($str eq "")
{
  # ...
}

The == and != operators are numeric comparison operators. They will attempt to convert both operands to numbers before comparing them, so strings that don't look like numbers, such as "abc" and "", both become 0 and compare as equal.

See the perlop man page for more information.
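A small demonstration of why == gives the wrong answer here: a non-empty, non-numeric string compares equal to "" numerically, while eq compares the actual characters. The numeric warnings are silenced only to keep the demo quiet; with warnings on, Perl would flag the mistake.

```perl
use strict;
use warnings;

my $str = "abc";

my $numeric_empty;
{
    # "abc" and "" both convert to the number 0, so == reports them equal.
    no warnings 'numeric';
    $numeric_empty = ($str == "") ? 1 : 0;
}

# eq compares the strings themselves, which is what we want.
my $string_empty = ($str eq "") ? 1 : 0;

print "== says empty: $numeric_empty\n";   # 1 (wrong)
print "eq says empty: $string_empty\n";    # 0 (right)
```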

Answer 2 (score 88)

  1. Due to the way that strings are stored in Perl, getting the length of a string is optimized.
    if (length $str) is a good way of checking that a string is non-empty.

  2. If you’re in a situation where you haven’t already guarded against undef, then the catch-all for “non-empty” that won’t warn is if (defined $str and length $str).

Answer 3 (score 9)

As already mentioned by several people, eq is the right operator here.

If you use warnings; in your script, you’ll get warnings about this (and many other useful things); I’d recommend use strict; as well.

30: How do I perform a Perl substitution on a string while keeping the original? (score 181742 in 2013)

Question

In Perl, what is a good way to perform a replacement on a string using a regular expression and store the value in a different variable, without changing the original?

I usually just copy the string to a new variable and then bind it to the s/// regex that does the replacement on the new string, but I was wondering if there is a better way to do this.

$newstring = $oldstring;
$newstring =~ s/foo/bar/g;

Answer accepted (score 239)

This is the idiom I’ve always used to get a modified copy of a string without changing the original:

(my $newstring = $oldstring) =~ s/foo/bar/g;

In perl 5.14.0 or later, you can use the new /r non-destructive substitution modifier:

my $newstring = $oldstring =~ s/foo/bar/gr; 

Note: The above solutions work without g too. They also work with any other modifiers.
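A short side-by-side sketch of both idioms on a made-up string, confirming that the original is left unchanged; the map line shows where /r is especially handy, since it never modifies $_.

```perl
use 5.014;   # needed for the /r modifier
use strict;
use warnings;

my $oldstring = "foo food foo";

# Copy-then-substitute idiom: works on any Perl 5.
(my $copy = $oldstring) =~ s/foo/bar/g;

# /r returns the modified copy and leaves $oldstring untouched.
my $newstring = $oldstring =~ s/foo/bar/gr;

# /r inside map: the list elements themselves are not modified.
my @renamed = map { s/foo/bar/r } qw(foo1 foo2);

say $oldstring;   # foo food foo
say $copy;        # bar bard bar
say $newstring;   # bar bard bar
say "@renamed";   # bar1 bar2
```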

Answer 2 (score 42)

The statement:

(my $newstring = $oldstring) =~ s/foo/bar/g;

is equivalent to:

my $newstring = $oldstring;
$newstring =~ s/foo/bar/g;

Alternatively, as of Perl 5.13.2 you can use /r to do a non-destructive substitution:

use 5.013;
#...
my $newstring = $oldstring =~ s/foo/bar/gr;

Answer 3 (score 21)

Under use strict, say:

(my $new = $original) =~ s/foo/bar/;

instead.

31: In Perl, how can I concisely check if a $variable is defined and contains a non zero length string? (score 177567 in 2009)

Question

I currently use the following Perl to check if a variable is defined and contains text. I have to check defined first to avoid an ‘uninitialized value’ warning:

if (defined $name && length $name > 0) {
    # do something with $name
}

Is there a better (presumably more concise) way to write this?

Answer accepted (score 72)

You often see the check for definedness so you don’t have to deal with the warning for using an undef value (and in Perl 5.10 it tells you the offending variable):

 Use of uninitialized value $name in ...

So, to get around this warning, people come up with all sorts of code, and that code starts to look like an important part of the solution rather than the bubble gum and duct tape that it is. Sometimes, it’s better to show what you are doing by explicitly turning off the warning that you are trying to avoid:

 {
 no warnings 'uninitialized';

 if( length $name ) {
      ...
      }
 }

In other cases, use some sort of null value instead of the data. With Perl 5.10’s defined-or operator, you can give length an explicit empty string (defined, and give back zero length) instead of the variable that will trigger the warning:

 use 5.010;

 if( length( $name // '' ) ) {
      ...
      }

In Perl 5.12, it’s a bit easier because length on an undefined value also returns undefined. That might seem like a bit of silliness, but that pleases the mathematician I might have wanted to be. That doesn’t issue a warning, which is the reason this question exists.

use 5.012;
use warnings;

my $name;

if( length $name ) { # no warning
    ...
    }
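A minimal sketch checking the 5.12 behaviour described above: length on undef returns undef without a warning, so a plain length test is enough. The helper name has_text is hypothetical; a __WARN__ handler records any warnings so you can see that none are emitted.

```perl
use 5.012;
use warnings;

# Record any warnings so we can confirm that none are produced.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Hypothetical helper: true only for a defined, non-empty string.
# On 5.12+, length(undef) returns undef (false) without warning.
sub has_text {
    my ($s) = @_;
    return length($s) ? 1 : 0;
}

my @results = map { has_text($_) } (undef, '', '0', 'abc');
say "@results";                          # 0 0 1 1
say scalar(@warnings), ' warning(s)';    # 0 warning(s)
```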

Answer 2 (score 23)

As mobrule indicates, you could use the following instead for a small savings:

if (defined $name && $name ne '') {
    # do something with $name
}

You could ditch the defined check and get something even shorter, e.g.:

if ($name ne '') {
    # do something with $name
}

But in the case where $name is not defined, although the logic flow will work just as intended, if you are using warnings (and you should be), then you’ll get the following admonishment:

Use of uninitialized value in string ne

So, if there’s a chance that $name might not be defined, you really do need to check for definedness first and foremost in order to avoid that warning. As Sinan Ünür points out, you can use Scalar::MoreUtils to get code that does exactly that (checks for definedness, then checks for zero length) out of the box, via the empty() function:

use Scalar::MoreUtils qw(empty);
if(not empty($name)) {
    # do something with $name 
}

Answer 3 (score 14)

First, since length always returns a non-negative number,

if ( length $name )

and

if ( length $name > 0 )

are equivalent.

If you are OK with replacing an undefined value with an empty string, you can use Perl 5.10’s //= operator which assigns the RHS to the LHS unless the LHS is defined:

#!/usr/bin/perl

use feature qw( say );
use strict; use warnings;

my $name;

say 'nonempty' if length($name //= '');
say "'$name'";

Note the absence of warnings about an uninitialized variable as $name is assigned the empty string if it is undefined.

However, if you do not want to depend on 5.10 being installed, use the functions provided by Scalar::MoreUtils. For example, the above can be written as:

#!/usr/bin/perl

use strict; use warnings;

use Scalar::MoreUtils qw( define );

my $name;

print "nonempty\n" if length($name = define $name);
print "'$name'\n";

If you don’t want to clobber $name, use default.

32: How do I search a Perl array for a matching string? (score 176905 in 2010)

Question

What is the smartest way of searching through an array of strings for a matching string in Perl?

One caveat: I would like the search to be case-insensitive, so "aAa" would be in ("aaa","bbb").

Answer accepted (score 29)

I guess

my @foo = ("aAa", "bbb");
my @bar = grep(/^aaa$/i, @foo);
print join(",", @bar), "\n";

would do the trick.
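A slightly stricter variant of the grep above, as a sketch: anchoring both ends means "aaab" would not count, and \Q...\E keeps regex metacharacters in the search term from being interpreted. The any() line assumes List::Util 1.33 or newer, which recent Perls bundle.

```perl
use strict;
use warnings;
use List::Util qw(any);   # any() needs List::Util 1.33 or later

my @array  = qw(aaa bbb);
my $wanted = 'aAa';

# \A...\z anchors the whole string, \Q...\E treats $wanted literally,
# and /i makes the comparison case-insensitive.
my $re = qr/\A\Q$wanted\E\z/i;

my @matches = grep { $_ =~ $re } @array;

# Without a regex: compare lowercased strings directly.
my $found = any { lc($_) eq lc($wanted) } @array;

print "matches: @matches\n";                      # matches: aaa
print 'found: ', ($found ? 'yes' : 'no'), "\n";   # found: yes
```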

Answer 2 (score 143)

It depends on what you want the search to do:

  • if you want to find all matches, use the built-in grep:

    my @matches = grep { /pattern/ } @list_of_strings;

  • if you want to find the first match, use first in List::Util:

    use List::Util 'first';
    my $match = first { /pattern/ } @list_of_strings;

  • if you want to find the count of all matches, use true in List::MoreUtils:

    use List::MoreUtils 'true';
    my $count = true { /pattern/ } @list_of_strings;

  • if you want to know the index of the first match, use first_index in List::MoreUtils:

    use List::MoreUtils 'first_index';
    my $index = first_index { /pattern/ } @list_of_strings;

  • if you want to simply know if there was a match, but you don't care which element it was or its value, use any in List::Util (version 1.33 or later):

    use List::Util 1.33 'any';
    my $match_found = any { /pattern/ } @list_of_strings;

All these examples do similar things at their core, but their implementations have been heavily optimized to be fast, and will be faster than any pure-Perl implementation that you might write yourself with grep, map or a for loop.

Note that the algorithm for doing the looping is a separate issue from performing the individual matches. To match a string case-insensitively, you can simply use the i flag in the pattern: /pattern/i. You should definitely read through perldoc perlre if you have not previously done so.

Answer 3 (score 29)

Perl 5.10+ contains the 'smart-match' operator ~~, which returns true if a certain element is contained in an array or hash, and false if it isn't (see perlfaq4). Note that smartmatch has been experimental since Perl 5.18 and was marked deprecated in Perl 5.38.

The nice thing is that it also supports regexes, meaning that your case-insensitive requirement can easily be taken care of:

use strict;
use warnings;
use 5.010;

my @array  = qw/aaa bbb/;
my $wanted = 'aAa';

say "'$wanted' matches!" if /$wanted/i ~~ @array;   # Prints "'aAa' matches!"

33: How do I get the full path to a Perl script that is executing? (score 175554 in 2015)

Question

I have a Perl script and need to determine the full path and filename of the script during execution. I discovered that, depending on how you call the script, $0 varies and sometimes contains the full path plus filename and sometimes just the filename. Because the working directory can vary as well, I can’t think of a way to reliably get the full path and filename of the script.

Anyone got a solution?

Answer accepted (score 237)

There are a few ways:

  • $0 is the currently executing script as provided by POSIX, relative to the current working directory if the script is at or below the CWD
  • Additionally, cwd(), getcwd() and abs_path() are provided by the Cwd module and tell you where the script is being run from
  • The module FindBin provides the $Bin & $RealBin variables that usually are the path to the executing script; this module also provides $Script & $RealScript that are the name of the script
  • __FILE__ is the actual file that the Perl interpreter deals with during compilation, including its full path.

I’ve seen the first three ($0, the Cwd module and the FindBin module) fail under mod_perl spectacularly, producing worthless output such as '.' or an empty string. In such environments, I use __FILE__ and get the path from that using the File::Basename module:

use File::Basename;
my $dirname = dirname(__FILE__);
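A minimal sketch combining the modules mentioned above (Cwd, File::Basename, FindBin), assuming the code is saved in a file and run as a script rather than via perl -e; the printed paths depend on where the file lives.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Basename qw(dirname basename);
use FindBin qw($RealBin $RealScript);

# Resolve __FILE__ to an absolute path. A relative __FILE__ is resolved
# against the current directory, so do this before any chdir().
my $script_path = abs_path(__FILE__);
my $script_dir  = dirname($script_path);
my $script_name = basename($script_path);

print "path: $script_path\n";
print "dir:  $script_dir\n";
print "name: $script_name\n";

# FindBin's answer, for comparison (computed once, at startup).
print "FindBin: $RealBin/$RealScript\n";
```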

Answer 2 (score 144)

$0 is typically the name of your program, so how about this?

use Cwd 'abs_path';
print abs_path($0);

Seems to me that this should work as abs_path knows if you are using a relative or absolute path.

Update: for anyone reading this years later, you should read Drew’s answer. It’s much better than mine.

Answer 3 (score 35)

use File::Spec;
print File::Spec->rel2abs(__FILE__), "\n";

http://perldoc.perl.org/File/Spec/Unix.html

34: Difference between \w and \b regular expression meta characters (score 174237 in 2019)

Question

Can anyone explain the difference between \b and \w regular expression metacharacters? It is my understanding that both these metacharacters are used for word boundaries. Apart from this, which meta character is efficient for multilingual content?

Answer accepted (score 242)

The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a “word boundary”. This match is zero-length.

There are three different positions that qualify as word boundaries:

  • Before the first character in the string, if the first character is a word character.
  • After the last character in the string, if the last character is a word character.
  • Between two characters in the string, where one is a word character and the other is not a word character.

Simply put: \b allows you to perform a “whole words only” search using a regular expression in the form of \bword\b. A “word character” is a character that can be used to form words. All characters that are not “word characters” are “non-word characters”.

In all flavors, the characters [a-zA-Z0-9_] are word characters. These are also matched by the short-hand character class \w. Flavors showing “ascii” for word boundaries in the flavor comparison recognize only these as word characters.

\w stands for “word character”, usually [A-Za-z0-9_]. Notice the inclusion of the underscore and digits.

\B is the negated version of \b. \B matches at every position where \b does not. Effectively, \B matches at any position between two word characters as well as at any position between two non-word characters.

\W is short for [^\w], the negated version of \w.
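On the multilingual part of the question, a sketch of Perl's behaviour (assuming Perl 5.14+ for the /a modifier): with Unicode rules in effect, \w matches word characters beyond ASCII, such as é, and since \b is defined in terms of \w it follows the same rules. The /a modifier restricts them to ASCII.

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # apply Unicode rules to all strings

my $word = "caf\x{e9}";          # "café"; \x{e9} is LATIN SMALL LETTER E WITH ACUTE

# Under Unicode rules, \w matches any Unicode word character, including é.
my $unicode_count = () = $word =~ /\w/g;

# The /a modifier restricts \w, \d, \s and \b to ASCII.
my $ascii_count = () = $word =~ /\w/ag;

print "unicode \\w matches: $unicode_count\n";   # 4
print "ASCII \\w matches:   $ascii_count\n";     # 3
```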

Answer 2 (score 20)

\w matches a word character. \b is a zero-width match that matches a position character that has a word character on one side, and something that’s not a word character on the other. (Examples of things that aren’t word characters include whitespace, beginning and end of the string, etc.)

\w matches a, b, c, d, e, and f in "abc def"
\b matches the (zero-width) position before a, after c, before d, and after f in "abc def"

See: http://www.regular-expressions.info/reference.html/
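To make the "abc def" example concrete, a small sketch: \w returns the characters it consumes, while the zero-width \b can only be observed through pos(), which reports the offset at which each boundary was found.

```perl
use strict;
use warnings;

my $text = "abc def";

# \w consumes one word character per match.
my @word_chars = $text =~ /\w/g;

# \b is zero-width: record the offset (pos) at which each boundary is found.
my @boundaries;
while ($text =~ /\b/g) {
    push @boundaries, pos($text);
}

print "\\w matches: @word_chars\n";   # a b c d e f
print "\\b offsets: @boundaries\n";   # 0 3 4 7
```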

Answer 3 (score 8)

@Mahender, you probably meant the difference between \W (instead of \w) and \b. If not, then I would agree with @BoltClock and @jwismar above. Otherwise continue reading.

\W would match any non-word character, so it’s easy to try to use it to match word boundaries. The problem is that it will not match the start or end of a line. \b is better suited for matching word boundaries, as it will also match at the start or end of a line. Roughly speaking (more experienced users can correct me here), \b can be thought of as (\W|^|$). [Edit: as @Ωmega mentions below, \b is a zero-length match, so (\W|^|$) is not strictly correct, but hopefully it helps explain the difference.]

Quick example: for the string Hello World, .+\W would match "Hello " (with the trailing space) but will not match World. .+\b would match both Hello and World.

35: In Perl, how can I read an entire file into a string? (score 171157 in 2014)

Question

I’m trying to open an .html file as one big long string. This is what I’ve got:

open(FILE, 'index.html') or die "Can't read file 'filename' [$!]\n";  
$document = <FILE>; 
close (FILE);  
print $document;

which results in:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"

However, I want the result to look like:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

This way I can search the entire document more easily.

Answer accepted (score 76)

Add:

 local $/;

before reading from the file handle. See How can I read in an entire file all at once?, or

$ perldoc -q "entire file"

See Variables related to filehandles in perldoc perlvar and perldoc -f local.

Incidentally, if you can put your script on the server, you can have all the modules you want. See How do I keep my own module/library directory?.

In addition, Path::Class::File allows you to slurp and spew.

Path::Tiny gives even more convenience methods such as slurp, slurp_raw, slurp_utf8 as well as their spew counterparts.

Answer 2 (score 93)

I would do it like this:

my $file = "index.html";
my $document = do {
    local $/ = undef;
    open my $fh, "<", $file
        or die "could not open $file: $!";
    <$fh>;
};

Note the use of the three-argument version of open. It is much safer than the old two- (or one-) argument versions. Also note the use of a lexical filehandle. Lexical filehandles are nicer than the old bareword variants, for many reasons. We are taking advantage of one of them here: they close when they go out of scope.
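A self-contained run of the same do { local $/; ... } technique, as a sketch: it uses the core File::Temp module to create a stand-in for index.html (the HTML content is made up) so the example runs anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small made-up HTML file to read back (a stand-in for index.html).
my ($tmp_fh, $file) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$tmp_fh} "<html>\n<head>\n</head>\n</html>\n";
close $tmp_fh or die "could not close $file: $!";

# With $/ undefined inside the block, one read of <$fh> returns the whole file.
my $document = do {
    local $/;
    open my $fh, '<', $file or die "could not open $file: $!";
    <$fh>;
};

my $newlines = () = $document =~ /\n/g;
print "read ", length($document), " characters spanning $newlines lines\n";
```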

Answer 3 (score 75)

With File::Slurp:

use File::Slurp;
my $text = read_file('index.html');

Yes, even you can use CPAN.

36: How to match any non white space character except a particular one? (score 168481 in 2014)

Question

In Perl \S matches any non-whitespace character.

How can I match any non-whitespace character except a backslash \?

Answer accepted (score 136)

You can use a character class:

/[^\s\\]/

matches anything that is neither a whitespace character nor a backslash (\). Here’s another example:

[abc] means “match a, b or c”; [^abc] means “match any character except a, b or c”.

Answer 2 (score 12)

You can use a lookahead:

/(?=\S)[^\\]/
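A quick check, on a made-up input string, that the character-class answer and the lookahead answer select the same characters:

```perl
use strict;
use warnings;

my $input = "a b\\c";   # characters: a, space, b, backslash, c

# Negated character class: neither whitespace nor a backslash.
my @by_class = $input =~ /([^\s\\])/g;

# Lookahead: assert a non-whitespace character, then match anything but a backslash.
my @by_lookahead = $input =~ /((?=\S)[^\\])/g;

print "@by_class\n";       # a b c
print "@by_lookahead\n";   # a b c
```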

Answer 3 (score 2)

This worked for me using sed [Edit: a comment below points out that sed doesn’t support \s]

[^ ]

while

[^\s] 

didn’t

# Delete everything except space and 'g'
echo "ghai ghai" | sed "s/[^\sg]//g"
gg

echo "ghai ghai" | sed "s/[^ g]//g"
g g

37: no pg_hba.conf entry for host (score 161302 in 2016)

Question

I get the following error when I try to connect using DBI:

DBI connect('database=chaosLRdb;host=192.168.0.1;port=5433','postgres',...) 
failed: FATAL:  no pg_hba.conf entry for host "192.168.0.1", user "postgres", database "chaosLRdb", SSL off

Here is my pg_hba.conf file:

# "local" is for Unix domain socket connections only
local   all         all                               md5
# IPv4 local connections:
host    all         all         127.0.0.1/32          md5
# IPv6 local connections:
host    all         all         ::1/128               md5

host    all         postgres    127.0.0.1/32          trust

host    all        postgres     192.168.0.1/32        trust

host    all        all         192.168.0.1/32        trust

host    all        all         192.168.0.1/128        trust

host    all        all         192.168.0.1/32        md5

host    chaosLRdb    postgres         192.168.0.1/32      md5
local    all        all         192.168.0.1/32        trust

My perl code is

#!/usr/bin/perl -w
use DBI;
use FileHandle;

print "Start connecting to the DB...\n";

@ary = DBI->available_drivers(true);
%drivers = DBI->installed_drivers();
my $dbh = DBI->connect("DBI:PgPP:database=chaosLRdb;host=192.168.0.1;port=5433", "postgres", "chaos123");

What am I missing here?

Answer 2 (score 34)

In your pg_hba.conf file, I see some incorrect and confusing lines:

# fine, this allows all dbs, all users, to be trusted from 192.168.0.1/32
# not recommended because of the lax permissions
host    all        all         192.168.0.1/32        trust

# wrong, /128 is an invalid netmask for ipv4, this line should be removed
host    all        all         192.168.0.1/128       trust

# this conflicts with the first line
# it says that the password should be md5 and not plaintext
# I think the first line should be removed
host    all        all         192.168.0.1/32        md5

# this is fine except it is unnecessary because of the previous line
# which allows any user and any database to connect with md5 password
host    chaosLRdb  postgres    192.168.0.1/32        md5

# wrong, on local lines, an IP cannot be specified
# remove the 4th column
local   all        all         192.168.0.1/32        trust

I suspect that if you md5’d the password, this might work if you trim the lines. To get the md5 you can use perl or the following shell script:

 echo -n 'chaos123' | md5sum
 > d6766c33ba6cf0bb249b37151b068f10  -
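Since the answer mentions Perl as an alternative, here is a minimal equivalent of the echo -n ... | md5sum pipeline using the core Digest::MD5 module. It only reproduces the hashing step and is not checked against the digest shown above.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);   # core module

# Same digest as: echo -n 'chaos123' | md5sum
my $password = 'chaos123';
my $digest   = md5_hex($password);
print "$digest\n";
```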

So then your connect line would look something like:

my $dbh = DBI->connect("DBI:PgPP:database=chaosLRdb;host=192.168.0.1;port=5433",
    "chaosuser", "d6766c33ba6cf0bb249b37151b068f10");

For more information, here’s the documentation of postgres 8.X’s pg_hba.conf file.

Answer 3 (score 18)

If you change this line:

host    all        all         192.168.0.1/32        md5

to this:

host    all        all         all                   md5

you can see whether this solves the problem.

But another consideration is that your PostgreSQL port (5432) is very exposed to password attacks (attackers may be able to brute-force the password). You can change the PostgreSQL port from 5432 to ‘33333’ or another value, so attackers can’t easily guess this configuration.

38: Array initialization in Perl (score 161143 in 2015)

Question

How do I initialize an array to 0?

I have tried this.

my @arr = ();

But it always throws me a warning, “Use of uninitialized value”. I do not know the size of the array beforehand. I fill it dynamically. I thought the above piece of code was supposed to initialize it to 0.

How do I do this?

Answer accepted (score 51)

If I understand you, perhaps you don’t need an array of zeroes; rather, you need a hash. The hash keys will be the values in the other array and the hash values will be the number of times the value exists in the other array:

use strict;
use warnings;

my @other_array = (0,0,0,1,2,2,3,3,3,4);
my %tallies;
$tallies{$_} ++ for @other_array;

print "$_ => $tallies{$_}\n" for sort {$a <=> $b} keys %tallies;    

Output:

0 => 3
1 => 1
2 => 2
3 => 3
4 => 1

To answer your specific question more directly, to create an array populated with a bunch of zeroes, you can use the technique in these two examples:

my @zeroes = (0) x 5;            # (0,0,0,0,0)

my @zeroes = (0) x @other_array; # A zero for each item in @other_array.
                                 # This works because in scalar context
                                 # an array evaluates to its size.

Answer 2 (score 18)

What do you mean by “initialize an array to zero”? Arrays don’t contain “zero” – they can contain “zero elements”, which is the same as “an empty list”. Or, you could have an array with one element, where that element is a zero: my @array = (0);

my @array = (); should work just fine – it allocates a new array called @array, and then assigns it the empty list, (). Note that this is identical to simply saying my @array;, since the initial value of a new array is the empty list anyway.

Are you sure you are getting an error from this line, and not somewhere else in your code? Ensure you have use strict; use warnings; in your module or script, and check the line number of the error you get. (Posting some contextual code here might help, too.)
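A minimal sketch supporting the point that the declaration is not the culprit: the warning typically comes from later reading an element that was never assigned. The variable names are made up for illustration; a __WARN__ handler counts the warnings.

```perl
use strict;
use warnings;

# Declaring an empty array produces no warning.
my @counts = ();

# Collect warnings so we can count them.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Reading an element that was never assigned gives undef; interpolating it
# into a string is what triggers "Use of uninitialized value".
my $msg = "count is $counts[2]";

# ++ treats a missing (undef) element as 0 and does not warn.
$counts[2]++;

# The defined-or operator (Perl 5.10+) supplies an explicit default.
my $value = $counts[5] // 0;

print scalar(@warnings), " warning(s); counts[2] = $counts[2]; value = $value\n";
```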

Answer 3 (score 3)

To produce the output in your comment to your post, this will do it:

use strict;
use warnings;

my @other_array = (0,0,0,1,2,2,3,3,3,4);
my @array;
my %uniqs;

$uniqs{$_}++ for @other_array;

foreach (keys %uniqs) { $array[$_]=$uniqs{$_} }

print "array[$_] = $array[$_]\n" for (0..$#array);

Output:

   array[0] = 3
   array[1] = 1
   array[2] = 2
   array[3] = 3
   array[4] = 1

This is different from your stated algorithm of producing a parallel array with zero values, but it is a more Perly way of doing it…

If you must have a parallel array that is the same size as your first array with the elements initialized to 0, this statement will dynamically do it: @array=(0) x scalar(@other_array); but really, you don’t need to do that.

39: How do I configure Apache 2 to run Perl CGI scripts? (score 157915 in 2016)

Question

I would like to configure Apache 2 running on Kubuntu to execute Perl CGI scripts. I’ve tried some steps that I came across by googling, but nothing seems to work.

What is the right way of achieving this?

Answer 2 (score 24)

This post is intended to rescue people who are struggling to set up Apache2 properly for Perl on Ubuntu. (The system configurations specific to your Linux machine will be mentioned within square brackets, like [this].)

Possible outcome of an improperly setup Apache 2:

  1. Browser trying to download the .pl file instead of executing and giving out the result.
  2. Forbidden.
  3. Internal server error.

If you follow the steps described below carefully, you should get past the errors mentioned above.

Before starting, open the /etc/hosts file and add an IP address / domain name pair, for example:

127.0.0.1 www.BECK.com

Step 1: Install apache2.

Step 2: Install mod_perl.

Step 3: Configure apache2.

Open sites-available/default and add the following:

<Files ~ "\.(pl|cgi)$">
    SetHandler perl-script
    PerlResponseHandler ModPerl::PerlRun
    Options +ExecCGI
    PerlSendHeader On
</Files>

<Directory />
    Options FollowSymLinks
    AllowOverride None
</Directory>
<Directory [path-to-store-your-website-files-like-.html-(perl-scripts-should-be-stored-in-cgi-bin] >
####(The Perl/CGI scripts can be stored out of the cgi-bin directory, but that's a story for another day. Let's concentrate on washing out the issue at hand)
####
    Options Indexes FollowSymLinks MultiViews
    AllowOverride None
    Order allow,deny
    allow from all
</Directory>

ScriptAlias /cgi-bin/ [path-where-you-want-your-.pl-and-.cgi-files]

<Directory [path-where-you-want-your-.pl-and-.cgi-files]>
    AllowOverride None
    Options ExecCGI -MultiViews +SymLinksIfOwnerMatch
    AddHandler cgi-script .pl
    Order allow,deny
    allow from all
</Directory>
Step 4:

Add the following lines to your /etc/apache2/apache2.conf file.

AddHandler cgi-script .cgi .pl
<Files ~ "\.pl$">
Options +ExecCGI
</Files>
<Files ~ "\.cgi$">
Options +ExecCGI
</Files>

<IfModule mod_perl.c>
<IfModule mod_alias.c>
Alias /perl/ /home/sly/host/perl/
</IfModule>
<Location /perl>
SetHandler perl-script
PerlHandler Apache::Registry
Options +ExecCGI
</Location>
</IfModule>

<Files ~ "\.pl$">
Options +ExecCGI
</Files>
Step 5:

This step is very important, or at least I think so: only after doing it did I get things to work.

AddHandler cgi-script .cgi .pl

<Files ~ "\.pl$">
Options +ExecCGI
</Files>
<Files ~ "\.cgi$">
Options +ExecCGI
</Files>

<IfModule mod_perl.c>
<IfModule mod_alias.c>
Alias /perl/ /home/sly/host/perl/
</IfModule>
<Location /perl>
SetHandler perl-script
PerlHandler Apache::Registry
Options +ExecCGI
</Location>
</IfModule>

<Files ~ "\.pl$">
Options +ExecCGI
</Files>
Step 6:

This step is very important, or at least I think so: only after doing it did I get things to work.

Add the following to your /etc/apache2/sites-enabled/000-default file:

<Files ~ "\.(pl|cgi)$">
SetHandler perl-script
PerlResponseHandler ModPerl::PerlRun
Options +ExecCGI
PerlSendHeader On
</Files>
Step 7:

Now add your Perl script as test.pl in the location you specified in Step 3 as [path-where-you-want-your-.pl-and-.cgi-files].

Make the .pl file executable with chmod (for example, chmod +x test.pl), then type webaddress/cgi-bin/test.pl in the browser's address bar. There you go, you’ve got it.

(Some of the things in this post are probably redundant. Kindly ignore them.)

Answer 3 (score 19)

You’ll need to take a look at your Apache error log to see what the “internal server error” is. In my experience, the four most likely causes are:

  1. The CGI program is in a directory which does not have CGI execution enabled. Solution: Add the ExecCGI option to that directory via either httpd.conf or a .htaccess file.

  2. Apache is only configured to run CGIs from a dedicated cgi-bin directory. Solution: Move the CGI program there or add an AddHandler cgi-script .cgi statement to httpd.conf.

  3. The CGI program is not set as executable. Solution (assuming a *nix-type operating system): chmod +x my_prog.cgi

  4. The CGI program is exiting without sending headers. Solution: Run the program from the command line and verify that a) it actually runs rather than dying with a compile-time error and b) it generates the correct output, which should include, at the very minimum, a Content-Type header and a blank line following the last of its headers.
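For point 4, a minimal sketch of a CGI script (hypothetically saved as test.pl; the helper name cgi_response and the body text are made up) that emits a Content-Type header, a blank line, and then the body. Running it from the command line with perl test.pl is a quick way to confirm it compiles and prints the header before Apache is involved.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a minimal CGI response, consisting of a
# Content-Type header, a blank line that ends the headers, then the body.
sub cgi_response {
    my ($body) = @_;
    return "Content-Type: text/plain\n\n" . $body;
}

print cgi_response("Hello from test.pl\n");
```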

40: How can I check if a Perl module is installed on my system from the command line? (score 157845 in 2009)

Question

I tried to check if XML::Simple is installed in my system or not.

perl -e 'while (<@INC>) { while (<$_/*.pm>) { print "$_\n"; } }'

I used the one-liner above to list all modules installed on my system. However, it does not list the XML modules.

However, the following executes fine.

perl -e "use XML::Simple "

What might be the issue?

Answer accepted (score 93)

You can check for a module’s installation path by:

perldoc -l XML::Simple

The problem with your one-liner is that it does not recurse into subdirectories. Hence, you only get top-level module names (such as pragmas like strict and warnings) as output.
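A sketch that sidesteps the recursion problem: map the module name to its relative file path (XML::Simple becomes XML/Simple.pm) and test for that file in each @INC directory, without loading the module. The helper names are hypothetical; File::Find is used only because it ships with Perl, and XML::Simple may or may not be installed on your system.

```perl
use strict;
use warnings;

# Hypothetical helper: XML::Simple -> XML/Simple.pm
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

# Hypothetical helper: search each directory in @INC for the module file,
# without loading it. Returns the full path of the first match, or undef.
sub find_module {
    my ($module) = @_;
    my $rel = module_to_path($module);
    for my $dir (grep { !ref } @INC) {    # skip code-ref hooks in @INC
        my $full = "$dir/$rel";
        return $full if -f $full;
    }
    return;
}

for my $module (qw(File::Find XML::Simple)) {
    my $where = find_module($module);
    print $module, ': ', (defined $where ? $where : 'not installed'), "\n";
}
```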

Answer 2 (score 39)

Quick and dirty:

$ perl -MXML::Simple -e 1

Answer 3 (score 17)

$ perl -MXML::Simple -le 'print $INC{"XML/Simple.pm"}'

From the perlvar entry on %INC:

  • %INC

The hash %INC contains entries for each filename included via the do, require, or use operators. The key is the filename you specified (with module names converted to pathnames), and the value is the location of the file found. The require operator uses this hash to determine whether a particular file has already been included.

If the file was loaded via a hook (e.g. a subroutine reference, see require for a description of these hooks), this hook is by default inserted into %INC in place of a filename. Note, however, that the hook may have set the %INC entry by itself to provide some more specific info.
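Combining the -M check with %INC, a script can test for a module and report its location. A sketch, assuming a hypothetical helper named module_location; note that require actually compiles and runs the module, so one that is installed but fails to load is reported as missing:

```perl
use strict;
use warnings;

# Hypothetical helper: try to load a module by name and report where it came from
sub module_location {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;     # XML::Simple -> XML/Simple.pm
    return unless eval { require $file; 1 };    # require dies if the module is missing
    return $INC{$file};                         # %INC records the path it was loaded from
}

for my $module (qw(File::Spec XML::Simple)) {
    my $where = module_location($module);
    print defined $where ? "$module is installed at $where\n" : "$module is not installed\n";
}
```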

41: Escaping a forward slash in a regular expression (score 156283 in 2011)

Question

My question is a simple one, and it is about regular expression escaping. Do you have to escape a forward slash / in a regular expression? And how would you go about doing it?

Answer accepted (score 79)

What context/language? Some languages use / as the pattern delimiter, so yes, depending on the language/context, you need to escape it. You escape it by putting a backslash in front of it: \/. Some languages (like PHP) let you use other characters as the delimiter, and then you don’t need to escape it. But AFAIK, in all languages the only special significance / has is that it may be the designated pattern delimiter.

Answer 2 (score 35)

Here are a few options:

  • In Perl, you can choose alternate delimiters. You’re not confined to m//. You could choose another, such as m{}. Then escaping isn’t necessary. As a matter of fact, Damian Conway in “Perl Best Practices” asserts that m{} is the only alternate delimiter that ought to be used, and this is reinforced by Perl::Critic (on CPAN). While you can get away with using a variety of alternate delimiter characters, // and {} seem to be the clearest to decipher later on. However, if either of those choices results in too much escaping, choose whichever one lends itself best to legibility. Common examples are m(...), m[...], and m!...!.

  • In cases where you either cannot or prefer not to use alternate delimiters, you can escape the forward slashes with a backslash: m/\/[^\/]+$/ for example (with an alternate delimiter, that could become m{/[^/]+$}, which may read more clearly). Note that the slash inside the character class must be escaped too, because Perl ends the pattern at the first unescaped delimiter. Escaping slashes with backslashes is common enough to have earned a name and a Wikipedia page: Leaning Toothpick Syndrome. In regular expressions where there’s just a single instance, escaping a slash might not rise to the level of being considered a hindrance to legibility, but if it starts to get out of hand, and if your language permits alternate delimiters as Perl does, that would be the preferred solution.
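A small sketch comparing the two styles on a made-up path (the path is illustrative only). Both patterns capture the final path component; only the delimiter and the escaping differ:

```perl
use strict;
use warnings;

my $path = '/usr/local/lib/perl5/Foo.pm';

my ($escaped)   = $path =~ m/\/([^\/]+)$/;   # slashes escaped as \/
my ($delimited) = $path =~ m{/([^/]+)$};     # braces as delimiter: no escaping needed

print "$escaped $delimited\n";
```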

Answer 3 (score 9)

Use the backslash \ or choose a different delimiter, e.g. m#.\d# instead of /.\d/. “In Perl, you can change the / regular expression delimiter to almost any other special character if you precede it with the letter m (for match);”

42: How to print variables in Perl (score 148908 in 2017)

Question

I have some code that looks like

my ($ids,$nIds);
while (<myFile>){
    chomp;
    $ids.= $_ . " ";
    $nIds++;
}

This should concatenate every line in my myFile, and nIds should be my number of lines. How do I print out my $ids and $nIds?

I tried simply print $ids, but Perl complains.

my ($ids, $nIds)

is a list, right? With two elements?

Answer accepted (score 33)

print "Number of lines: $nIds\n";
print "Content: $ids\n";

How did Perl complain? print $ids should work, though you probably want a newline at the end, either explicitly with print as above or implicitly by using say or -l/$\.

If you want to interpolate a variable in a string and have something immediately after it that would look like part of the variable name but isn’t, enclose the variable name in {}:

print "foo${ids}bar";
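A sketch of the two implicit-newline options mentioned above. It assumes sample values for $ids and $nIds, and writes to an in-memory buffer only so the output can be inspected afterwards:

```perl
use strict;
use warnings;
use feature 'say';

my $ids  = 'a b c ';
my $nIds = 3;

# In-memory buffer standing in for STDOUT
open my $out, '>', \my $buffer or die "Cannot open in-memory buffer: $!";

say {$out} "Number of lines: $nIds";    # say appends "\n" itself
{
    local $\ = "\n";                     # output record separator, which -l also sets
    print {$out} "Content: $ids";        # print now appends $\
}
close $out;

print $buffer;
```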

Answer 2 (score 11)

You should always include all relevant code when asking a question. Here, the print statement at the center of your question is the most crucial piece of information, and the second most crucial is the error message, which you also did not include. Next time, include both.

print $ids should be a fairly hard statement to mess up, but it is possible. Possible reasons:

  1. $ids is undefined. Gives the warning Use of uninitialized value $ids in print.
  2. $ids is out of scope. With use strict, this is a fatal compile-time error, Global symbol "$ids" requires explicit package name; without strict, you get the undefined-value warning from above.
  3. You forgot a semicolon at the end of the line.
  4. You tried to do print $ids $nIds, in which case perl thinks that $ids is supposed to be a filehandle, and you get an error such as print() on unopened filehandle.

Explanations

1: Should not happen. It might happen if you do something like this (assuming you are not using strict):

my $var;
while (<>) {
    $Var .= $_;
}
print $var;

Gives the warning for undefined value, because $Var and $var are two different variables.

2: Might happen, if you do something like this:

if ($something) {
    my $var = "something happened!";
}
print $var;

my declares the variable inside the current block. Outside the block, it is out of scope.

3: Simple enough, common mistake, easily fixed. Easier to spot with use warnings.

4: Also a common mistake. There are a number of ways to correctly print two variables in the same print statement:

print "$var1 $var2";  # interpolation inside a double-quoted string
print $var1 . $var2;  # concatenation
print $var1, $var2;   # supplying print with a list of args

Lastly, some perl magic tips for you:

use strict;
use warnings;

# open with explicit direction '<', check the return value
# to make sure open succeeded. Using a lexical filehandle.
open my $fh, '<', 'file.txt' or die $!;

# read the whole file into an array and
# chomp all the lines at once
chomp(my @file = <$fh>);
close $fh;

my $ids  = join(' ', @file);
my $nIds = scalar @file;
print "Number of lines: $nIds\n";
print "Text:\n$ids\n";

Reading the whole file into an array is suitable for small files only, otherwise it uses a lot of memory. Usually, line-by-line is preferred.

Variations:

  • print "@file" is equivalent to $ids = join(' ',@file); print $ids;
  • $#file will return the last index in @file. Since arrays usually start at 0, $#file + 1 is equivalent to scalar @file.

You can also do:

my $ids;
do {
    local $/;
    $ids = <$fh>;
};

By temporarily “turning off” $/, the input record separator (a newline by default), you make <$fh> return the entire file. What <$fh> really does is read until it finds $/, then return that string. Note that this preserves the newlines in $ids.

Line-by-line solution:

open my $fh, '<', 'file.txt' or die $!; # btw, $! contains the most recent error
my $ids;
while (<$fh>) {
    chomp;
    $ids .= "$_ "; # concatenate with string
}
my $nIds = $.; # $. is Current line number for the last filehandle accessed.
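The line-by-line approach can be tried without a real file.txt. This sketch assumes an in-memory string in place of the file, and copies $. before closing the handle, since close resets it:

```perl
use strict;
use warnings;

# An in-memory string stands in for file.txt so the example runs anywhere
my $data = "first\nsecond\nthird\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $ids = '';
while (<$fh>) {
    chomp;
    $ids .= "$_ ";
}
my $nIds = $.;    # $. still holds the last line number read from $fh
close $fh;

print "Number of lines: $nIds\n";
print "Content: $ids\n";
```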

Answer 3 (score 9)

How do I print out my $ids and $nIds?
print "$ids\n";
print "$nIds\n";
I tried simply print $ids, but Perl complains.

Complains about what? Uninitialised value? Perhaps your loop was never entered due to an error opening the file. Be sure to check if open returned an error, and make sure you are using use strict; use warnings;.

my ($ids, $nIds) is a list, right? With two elements?

It’s a (very special) function call. $ids,$nIds is a list with two elements.

43: What is the best way to delete a value from an array in Perl? (score 144303 in 2014)

Question

The array has lots of data and I need to delete two elements.

Below is the code snippet I am using,

my @array = (1,2,3,4,5,5,6,5,4,9);
my $element_omitted = 5;
@array = grep { $_ != $element_omitted } @array;

Answer accepted (score 85)

Use splice if you already know the index of the element you want to delete.

Grep works if you are searching.

If you need to do a lot of these, you will get much better performance if you keep your array in sorted order, since you can then do binary search to find the necessary index.

If it makes sense in your context, you may want to consider using a “magic value” for deleted records, rather than deleting them, to save on data movement – set deleted elements to undef, for example. Naturally, this has its own issues (if you need to know the number of “live” elements, you need to keep track of it separately, etc.), but it may be worth the trouble depending on your application.

Edit: Actually, now that I take a second look – don’t use the grep code above. It would be more efficient to find the index of the element you want to delete, then use splice to delete it (the code you have accumulates all the non-matching results).

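my $index = 0;
# Stop at the first match, or just past the end if 'foo' is absent
$index++ until $index > $#arr or $arr[$index] eq 'foo';
splice(@arr, $index, 1) if $index <= $#arr;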

That will delete the first occurrence. Deleting all occurrences is very similar, except you will want to get all indexes in one pass:

my @del_indexes = grep { $arr[$_] eq 'foo' } 0..$#arr;

The rest is left as an exercise for the reader – remember that the array changes as you splice it!
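One possible solution to that exercise, sketched on sample data: collect the indexes in one pass, then splice from the highest index down so the remaining indexes are not shifted by earlier removals:

```perl
use strict;
use warnings;

my @arr = qw(foo bar foo baz foo);

# Find every index holding 'foo' in one pass
my @del_indexes = grep { $arr[$_] eq 'foo' } 0 .. $#arr;

# Splice from the highest index down so earlier indexes stay valid
splice(@arr, $_, 1) for reverse @del_indexes;

print "@arr\n";
```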

Edit2 John Siracusa correctly pointed out I had a bug in my example.. fixed, sorry about that.

Answer 2 (score 13)

splice will remove array element(s) by index. Use grep, as in your example, to search and remove.

Answer 3 (score 8)

Is this something you are going to be doing a lot? If so, you may want to consider a different data structure. Grep is going to search the entire array every time and for a large array could be quite costly. If speed is an issue then you may want to consider using a Hash instead.

In your example, the key would be the number and the value would be the count of elements of that number.
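A sketch of that hash-based layout using the question’s data. Deleting a value becomes a single delete; the trade-off is that the original element order is lost, so this rebuilds the list in sorted order:

```perl
use strict;
use warnings;

my @array = (1,2,3,4,5,5,6,5,4,9);

# Count how many times each value occurs
my %count;
$count{$_}++ for @array;

# Removing every 5 is now one hash operation, regardless of array size
delete $count{5};

# Rebuild a list (in numeric order), repeating each value by its count
my @remaining = map { ($_) x $count{$_} } sort { $a <=> $b } keys %count;
print "@remaining\n";
```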

44: How is Perl’s @INC constructed? (aka What are all the ways of affecting where Perl modules are searched for?) (score 141297 in 2017)

Question

What are all the ways of affecting where Perl modules are searched for? or, How is Perl’s @INC constructed?

As we know, Perl uses @INC array containing directory names to determine where to search for Perl module files.

There does not seem to be a comprehensive “@INC” FAQ-type post on StackOverflow, so this question is intended as one.

Answer accepted (score 252)

We will look at how the contents of this array are constructed and can be manipulated to affect where the Perl interpreter will find the module files.

  1. Default @INC

    The Perl interpreter is compiled with a specific default @INC value. To find out this value, run the env -i perl -V command (env -i ignores the PERL5LIB environment variable; see #2), and in the output you will see something like this:

    $ env -i perl -V
    ...
    @INC:
     /usr/lib/perl5/site_perl/5.18.0/x86_64-linux-thread-multi-ld
     /usr/lib/perl5/site_perl/5.18.0
     /usr/lib/perl5/5.18.0/x86_64-linux-thread-multi-ld
     /usr/lib/perl5/5.18.0
     .

Note the . at the end; this is the current directory (which is not necessarily the same as the script’s directory). It is missing in Perl 5.26+, and when Perl runs with -T (taint checks enabled).

To change the default path when configuring Perl binary compilation, set the configuration option otherlibdirs:

Configure -Dotherlibdirs=/usr/lib/perl5/site_perl/5.16.3
  2. Environment variable PERL5LIB (or PERLLIB)

    Perl prepends to @INC the directories (colon-separated) listed in the PERL5LIB environment variable of your shell (if PERL5LIB is not defined, PERLLIB is used). To see the contents of @INC after PERL5LIB and PERLLIB have taken effect, run perl -V.

    $ perl -V
    ...
    %ENV:
      PERL5LIB="/home/myuser/test"
    @INC:
     /home/myuser/test
     /usr/lib/perl5/site_perl/5.18.0/x86_64-linux-thread-multi-ld
     /usr/lib/perl5/site_perl/5.18.0
     /usr/lib/perl5/5.18.0/x86_64-linux-thread-multi-ld
     /usr/lib/perl5/5.18.0
     .
  3. -I command-line option

    Perl prepends to @INC the directory passed as the value of each -I command-line option (repeat -I to add several directories). This can be done in three ways, as usual with Perl options:

    • Pass it on the command line:

      perl -I /my/moduledir your_script.pl

    • Pass it via the first line (shebang) of your Perl script:

      #!/usr/local/bin/perl -w -I /my/moduledir

    • Pass it as part of the PERL5OPT environment variable (see chapter 19.02 in Programming Perl).

  4. Pass it via the lib pragma

    Perl prepends to @INC the list of directories passed to it via use lib.

    In a program:

      use lib ("/dir1", "/dir2");

    On the command line:

      perl -Mlib=/dir1,/dir2

    You can also remove directories from @INC via no lib.

  5. You can directly manipulate @INC as a regular Perl array.

      Note: Since @INC is used during the compilation phase, this must be done inside of a BEGIN {} block, which precedes the use MyModule statement.

      • Add directories to the beginning via unshift @INC, $dir.

      • Add directories to the end via push @INC, $dir.

      • Do anything else you can do with a Perl array.

Note: The directories are unshifted onto @INC in the order listed in this answer, i.e. default @INC is last in the list, preceded by PERL5LIB, preceded by -I, preceded by use lib and direct @INC manipulation, the latter two mixed in whichever order they are in Perl code.
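A sketch of that last point, assuming two hypothetical directory names (they need not exist). use lib and a BEGIN-block unshift both run at compile time in source order, and each puts its entry at the front, so the later one ends up first:

```perl
use strict;
use warnings;

# Hypothetical directories, for illustration only
use lib '/tmp/hypothetical_a';
BEGIN { unshift @INC, '/tmp/hypothetical_b' }

# The BEGIN block ran after use lib, so its directory is now first
print "$_\n" for @INC[0, 1];
```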


When to use each approach?
  • If the modules in a directory need to be used by many/all scripts on your site, especially run by multiple users, that directory should be included in the default @INC compiled into the Perl binary.

  • If the modules in the directory will be used exclusively by a specific user for all the scripts that user runs (or if recompiling Perl is not an option to change default @INC in the previous use case), set the user’s PERL5LIB, usually during user login.

    Note: Please be aware of the usual Unix environment variable pitfalls - e.g. in certain cases running the scripts as a particular user does not guarantee running them with that user’s environment set up, e.g. via su.
  • If the modules in the directory need to be used only in specific circumstances (e.g. when the script(s) are executed in development/debug mode), you can either set PERL5LIB manually, or pass the -I option to perl.

  • If the modules need to be used only for specific scripts, by all users using them, use the use lib/no lib pragmas in the program itself. This approach should also be used when the directory to be searched needs to be determined dynamically at runtime, e.g. from the script’s command line parameters or the script’s path (see the FindBin module for a very nice use case).

  • If the directories in @INC need to be manipulated according to some complicated logic, either impossible or too unwieldy to implement by a combination of use lib/no lib pragmas, then use direct @INC manipulation inside a BEGIN {} block or inside a special-purpose library designated for @INC manipulation, which must be used by your script(s) before any other modules are used.

    An example of this is automatically switching between libraries in prod/uat/dev directories, with waterfall library pickup in prod if it’s missing from dev and/or UAT (the last condition makes the standard “use lib + FindBin” solution fairly complicated). A detailed illustration of this scenario is in How do I use beta Perl modules from beta Perl scripts?.

  • An additional use case for directly manipulating @INC is to be able to add subroutine references or object references (yes, Virginia, @INC can contain custom Perl code and not just directory names, as explained in When is a subroutine reference in @INC called?).
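A minimal sketch of such a hook, assuming a hypothetical in-memory module named My::Virtual. During require, a code ref in @INC is called with itself and the relative filename. Returning a filehandle supplies the source; returning nothing lets Perl move on to the next @INC entry:

```perl
use strict;
use warnings;

# Hypothetical module source, keyed by the filename require looks up
my %virtual = (
    'My/Virtual.pm' => "package My::Virtual;\nsub hello { 'hello from a hook' }\n1;\n",
);

unshift @INC, sub {
    my (undef, $filename) = @_;                 # first argument is the hook itself
    return unless exists $virtual{$filename};   # not ours: try later @INC entries
    open my $fh, '<', \$virtual{$filename} or die "Cannot open in-memory source: $!";
    return $fh;                                 # require reads the module from this handle
};

require My::Virtual;
print My::Virtual::hello(), "\n";
```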

Answer 2 (score 17)

In addition to the locations listed above, the OS X version of Perl also has two more ways:

  1. The /Library/Perl/x.xx/AppendToPath file. Paths listed in this file are appended to @INC at runtime.

  2. The /Library/Perl/x.xx/PrependToPath file. Paths listed in this file are prepended to @INC at runtime.

Answer 3 (score 6)

As already said, @INC is an array, and you’re free to add anything you want to it.

My CGI REST script looks like:

#!/usr/bin/perl
use strict;
use warnings;
BEGIN {
    push @INC, 'fully_qualified_path_to_directory_containing_Modules';  # the directory holding Modules/Rest.pm, not the .pm file itself
}
use Modules::Rest;
gone(@_);

Subroutine gone is exported by Rest.pm.

45: What is the meaning of @_ in Perl? (score 140281 in 2010)

Question

What is the meaning of @_ in Perl?

Answer accepted (score 114)

perldoc perlvar is the first place to check for any special-named Perl variable info.

Quoting:

@_: Within a subroutine the array @_ contains the parameters passed to that subroutine.

More details can be found in perldoc perlsub (Perl subroutines) linked from the perlvar:

Any arguments passed in show up in the array @_ .

Therefore, if you called a function with two arguments, those would be stored in $_[0] and $_[1].

The array @_ is a local array, but its elements are aliases for the actual scalar parameters. In particular, if an element $_[0] is updated, the corresponding argument is updated (or an error occurs if it is not updatable).

If an argument is an array or hash element which did not exist when the function was called, that element is created only when (and if) it is modified or a reference to it is taken. (Some earlier versions of Perl created the element whether or not the element was assigned to.) Assigning to the whole array @_ removes that aliasing, and does not update any arguments.
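A short sketch of the aliasing rule quoted above (the sub names are made up for illustration). Assigning to $_[0] changes the caller’s variable, while copying out of @_ first does not:

```perl
use strict;
use warnings;

sub double_in_place {
    # $_[0] is an alias for the caller's variable, so this modifies it
    $_[0] *= 2;
}

sub double_copy {
    # Copying out of @_ first breaks the alias; the caller is unaffected
    my ($n) = @_;
    $n *= 2;
    return $n;
}

my $x = 5;
double_in_place($x);
print "$x\n";       # $x is now 10

my $y = 5;
my $z = double_copy($y);
print "$y $z\n";    # $y is still 5, $z is 10
```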

Answer 2 (score 25)

Usually, you expand the parameters passed to a sub using the @_ variable:

sub test{
  my ($first, $second, $third) = @_;   # avoid $a/$b, which sort uses
  ...
}

# call the test sub with the parameters
test('alice', 'bob', 'charlie');

That’s the way claimed to be correct by perlcritic.

Answer 3 (score 10)

The question was what @_ means in Perl. The answer to that question is that, insofar as $_ means it in Perl, @_ similarly means they.

No one seems to have mentioned this critical aspect of its meaning — as well as theirs.

They’re consequently both used as pronouns, or sometimes as topicalizers.

They typically have nominal antecedents, although not always.

46: Submit form and stay on same page? (score 139638 in )

Question

I have a form that looks like this

<form action="receiver.pl" method="post">
  <input name="signed" type="checkbox">
  <input value="Save" type="submit">
</form>

and I would like to stay on the same page, when Submit is clicked, but still have receiver.pl executed.

How should that be done?

Answer accepted (score 63)

The easiest answer: jQuery. Do something like this:

$(document).ready(function(){
   var $form = $('form');
   $form.submit(function(){
      $.post($(this).attr('action'), $(this).serialize(), function(response){
            // do something here on success
      },'json');
      return false;
   });
});

If you want to add content dynamically and still need it to work, and also with more than one form, you can do this:

   // .live() was removed in jQuery 1.9; a delegated .on() handles forms added later
   $(document).on('submit', 'form', function(){
      $.post($(this).attr('action'), $(this).serialize(), function(response){
            // do something here on success
      },'json');
      return false;
   });

Answer 2 (score 63)

99% of the time I would use XMLHttpRequest or fetch for something like this. However, there’s an alternative solution which doesn’t require javascript…

You could include a hidden iframe on your page and set the target attribute of your form to point to that iframe.

<style>
  .hide { position:absolute; top:-1px; left:-1px; width:1px; height:1px; }
</style>

<iframe name="hiddenFrame" class="hide"></iframe>

<form action="receiver.pl" method="post" target="hiddenFrame">
  <input name="signed" type="checkbox">
  <input value="Save" type="submit">
</form>

There are very few scenarios where I would choose this route. Generally handling it with javascript is better because, with javascript you can…

  • gracefully handle errors (e.g. retry)
  • provide UI indicators (e.g. loading, processing, success, failure)
  • run logic before the request is sent, or run logic after the response is received.

Answer 3 (score 35)

The HTTP/CGI way to do this would be for your program to return an HTTP status code of 204 (No Content).
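A sketch of what receiver.pl might print in that case, assuming the form handling itself is done elsewhere in the script. A CGI program sets the HTTP status with a Status header:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# (Form processing for receiver.pl would go here; omitted in this sketch.)

# 204 No Content tells the browser there is no new page to show,
# so it stays on the current page
my $headers = "Status: 204 No Content\n\n";
print $headers;
```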

47: Today’s Date in Perl in MM/DD/YYYY format (score 138594 in 2012)

Question

I’m working on a Perl program at work and stuck on (what I think is) a trivial problem. I simply need to build a string in the format ‘06/13/2012’ (always 10 characters, so 0’s for numbers less than 10).

Here’s what I have so far:

use Time::localtime;
my $tm = localtime;
my ($day,$month,$year)=($tm->mday,$tm->mon,$tm->year);

Answer accepted (score 56)

You can do it quickly with just one POSIX function. If you have a bunch of tasks involving dates, see the DateTime module.

use POSIX qw(strftime);

my $date = strftime "%m/%d/%Y", localtime;
print $date;
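For comparison, here is a sketch of the manual route the question started on, built from localtime’s list form. It takes one snapshot of the time so both results describe the same moment:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

my @t = localtime;    # one snapshot of the current local time

# Month is 0-based and year counts from 1900, hence the adjustments
my $by_hand = sprintf '%02d/%02d/%04d', $t[4] + 1, $t[3], $t[5] + 1900;

my $by_strftime = strftime '%m/%d/%Y', @t;
print "$by_hand\n$by_strftime\n";
```

strftime handles both offsets and the zero padding for you, which is why it is usually the simpler choice.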

Answer 2 (score 54)

You can use Time::Piece, which shouldn’t need installing as it is a core module and has been distributed with Perl 5 since version 10.

use Time::Piece;

my $date = localtime->strftime('%m/%d/%Y');
print $date;

output

06/13/2012

Update

You may prefer to use the mdy method, which takes a single parameter (the separator to be used between the fields of the result) and avoids having to specify a full date/time format

my $date = localtime->mdy('/');

This produces an identical result to that of my original solution

Answer 3 (score 12)

use DateTime qw();
DateTime->now(time_zone => 'local')->strftime('%m/%d/%Y')

expression returns 06/13/2012

48: How do I include a Perl module that’s in a different directory? (score 138552 in 2016)

Question

How do I include a Perl module that’s in a different directory? It needs to be a relative path from the module that’s including it.

I’ve tried

push ( @INC,"directory_path/more_path");

also

push ( @INC,"directory_path\\more_path");

Answer accepted (score 68)

EDIT: Putting the right solution first, originally from this question. It’s the only one that searches relative to the script’s own directory (located via FindBin) rather than the current directory:

use FindBin;                 # locate this script
use lib "$FindBin::Bin/..";  # use the parent directory
use yourlib;

There are many other ways that search for libraries relative to the current directory. You can invoke perl with the -I argument, passing the directory of the other module:

perl -I.. yourscript.pl

You can include a line near the top of your perl script:

use lib '..';

You can modify the environment variable PERL5LIB before you run the script:

export PERL5LIB=$PERL5LIB:..

The push(@INC) strategy can also work, but it has to be wrapped in BEGIN{} to make sure that the push is run before the module search:

BEGIN {push @INC, '..'}
use yourlib;
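The BEGIN requirement applies to use, which runs at compile time. require runs at run time, so an ordinary unshift is early enough for it. A self-contained sketch, assuming a throwaway temporary directory and a hypothetical module named HypotheticalLib:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Create a throwaway directory containing the hypothetical HypotheticalLib.pm
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'HypotheticalLib.pm');
open my $out, '>', $path or die "Cannot write $path: $!";
print {$out} "package HypotheticalLib;\nsub greet { 'found via \@INC' }\n1;\n";
close $out or die "Cannot close $path: $!";

# require runs at run time, so a plain unshift is early enough here;
# a compile-time "use HypotheticalLib;" would need a BEGIN block instead
unshift @INC, $dir;
require HypotheticalLib;

print HypotheticalLib::greet(), "\n";
```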

Answer 2 (score 14)

Most likely the reason your push did not work is order of execution.

use is a compile-time directive. Your push is done at execution time:

push ( @INC,"directory_path/more_path");
use Foo;  # In directory path/more_path

You can use a BEGIN block to get around this problem:

BEGIN {
    push ( @INC,"directory_path/more_path");
}
use Foo;  # In directory path/more_path

IMO, it’s clearest, and therefore best to use lib:

use lib "directory_path/more_path";
use Foo;  # In directory path/more_path

See perlmod for information about BEGIN and other special blocks and when they execute.
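A tiny sketch of that ordering. The BEGIN block appears later in the file, yet it runs first, because it executes during compilation:

```perl
use strict;
use warnings;

our @order;

# A run-time statement, even though it appears first in the file
push @order, 'runtime statement';

# BEGIN blocks (and every use, which is an implicit BEGIN) run while the
# file is still being compiled, before any run-time statement
BEGIN { push @order, 'BEGIN block' }

print join(' -> ', @order), "\n";    # prints "BEGIN block -> runtime statement"
```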

Edit

For loading code relative to your script/library, I strongly endorse File::FindLib

You can say use File::FindLib 'my/test/libs'; to look for a library directory anywhere above your script in the path.

Say your work is structured like this:

   /home/me/projects/
    |- shared/
    |   |- bin/
    |   `- lib/
    `- ossum-thing/
       `- scripts 
           |- bin/
           `- lib/

Inside a script in ossum-thing/scripts/bin:

use File::FindLib 'lib/';
use File::FindLib 'shared/lib/';

Will find your library directories and add them to your @INC.

It’s also useful to create a module that contains all the environment set-up needed to run your suite of tools and just load it in all the executables in your project.

use File::FindLib 'lib/MyEnvironment.pm';

Answer 3 (score 10)

‘use lib’ can also take a single string value…

#!/usr/bin/perl
use lib '<relative-path>';
use <your lib>;

49: Find everything between two XML tags with RegEx (score 137341 in 2018)

Question

In RegEx, I want to find the tag and everything between two XML tags, like the following:

<primaryAddress>
    <addressLine>280 Flinders Mall</addressLine>
    <geoCodeGranularity>PROPERTY</geoCodeGranularity>
    <latitude>-19.261365</latitude>
    <longitude>146.815585</longitude>
    <postcode>4810</postcode>
    <state>QLD</state>
    <suburb>Townsville</suburb>
    <type>PHYSICAL</type>
</primaryAddress>

I want to find the tag and everything between primaryAddress, and erase that.

Everything between the primaryAddress tag is a variable, but I want to remove the entire tag and sub-tags whenever I get primaryAddress.

Anyone have any idea how to do that?

Answer accepted (score 109)

It is not a good idea to use regex for HTML/XML parsing…

However, if you want to do it anyway, search for regex pattern

<primaryAddress>[\s\S]*?<\/primaryAddress>

and replace it with empty string…
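In Perl, that search-and-replace might look like the sketch below. The sample XML is abbreviated from the question, and the same caveats about regex-based XML handling apply:

```perl
use strict;
use warnings;

my $xml = <<'XML';
<root>
<primaryAddress>
    <postcode>4810</postcode>
</primaryAddress>
<other>keep me</other>
</root>
XML

# [\s\S]*? matches any character, newlines included, as few times as possible
(my $cleaned = $xml) =~ s{<primaryAddress>[\s\S]*?</primaryAddress>}{}g;
print $cleaned;
```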

Answer 2 (score 15)

You should be able to match it with: /<primaryAddress>(.+?)<\/primaryAddress>/s (the s modifier lets . match newlines, which a multi-line block like this one needs).

The content between the tags will be in the matched group.

Answer 3 (score 6)

It is not good to use this method but if you really want to split it with regex

<primaryAddress.*>((.|\n)*?)<\/primaryAddress>

The accepted answer’s match includes the tags, but this one captures just the value between them.

50: Programmatically read from STDIN or input file in Perl (score 135677 in )

Question

What is the slickest way to programmatically read from stdin or an input file (if provided) in Perl?

Answer accepted (score 81)

while (<>) {
print;
}

will read either from a file specified on the command line or from stdin if no file is given

If you need this loop construction on the command line, you can use the -n option:

$ perl -ne 'print;'

Here you just put the code from inside the {} of the first example between the '' of the second.
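The file-or-stdin behaviour comes from <> reading the files named in @ARGV. A sketch that sets @ARGV by hand to a temporary file, which stands in for a real command-line argument:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small input file so the example is self-contained
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "alpha\nbeta\n";
close $tmp_fh;

my @lines;
{
    # <> reads the files named in @ARGV, or STDIN when @ARGV is empty
    local @ARGV = ($tmp_name);
    while (my $line = <>) {
        chomp $line;
        push @lines, $line;
    }
}
print "Read: @lines\n";
```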

Answer 2 (score 44)

This provides a named variable to work with:

foreach my $line ( <STDIN> ) {
    chomp( $line );
    print "$line\n";
}

To read a file, pipe it in like this:

program.pl < inputfile

Answer 3 (score 14)

The “slickest” way in certain situations is to take advantage of the -n switch. It implicitly wraps your code with a while (<>) loop and handles the input flexibly.

In slickestWay.pl:

#!/usr/bin/perl -n

BEGIN {
  # do something once here
}

# implement logic for a single line of input
print $result;

At the command line:

chmod +x slickestWay.pl

Now, depending on your input, do one of the following:

  1. Wait for user input

    ./slickestWay.pl

  2. Read from file(s) named in arguments (no redirection required)

    ./slickestWay.pl input.txt
    ./slickestWay.pl input.txt moreInput.txt

  3. Use a pipe

    someOtherScript | ./slickestWay.pl

The BEGIN block is necessary if you need to initialize some kind of object-oriented interface, such as Text::CSV or some such, which you can add to the shebang with -M.

-l and -p are also your friends.
    
51: How can I quickly sum all numbers in a file? (score 134412 in 2010)

Question

I have a file which contains several thousand numbers, each on its own line:

34
42
11
6
2
99
...

I’m looking to write a script which will print the sum of all numbers in the file. I’ve got a solution, but it’s not very efficient. (It takes several minutes to run.) I’m looking for a more efficient solution. Any suggestions?

Answer accepted (score 105)

For a Perl one-liner, it’s basically the same thing as the awk solution in Ayman Hourieh’s answer:

 % perl -nle '$sum += $_ } END { print $sum'

If you’re curious what Perl one-liners do, you can deparse them:

 %  perl -MO=Deparse -nle '$sum += $_ } END { print $sum'

The result is a more verbose version of the program, in a form that no one would ever write on their own:

BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    $sum += $_;
}
sub END {
    print $sum;
}
-e syntax OK
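The deparsed loop amounts to a short script. A sketch that streams the numbers line by line (so memory use stays flat however long the file is), with an in-memory string standing in for the input file:

```perl
use strict;
use warnings;

# An in-memory string stands in for the input file
my $data = "34\n42\n11\n6\n2\n99\n";
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

# Keep a running total instead of loading every number into memory
my $sum = 0;
while (my $line = <$fh>) {
    chomp $line;
    $sum += $line;
}
close $fh;

print "$sum\n";
```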

Just for giggles, I tried this with a file containing 1,000,000 numbers (in the range 0 - 9,999). On my Mac Pro, it returns virtually instantaneously. That’s too bad, because I was hoping using mmap would be really fast, but it’s just the same time:

use 5.010;
use File::Map qw(map_file);

map_file my $map, $ARGV[0];

$sum += $1 while $map =~ m/(\d+)/g;

say $sum;

Answer 2 (score 348)

You can use awk:

awk '{ sum += $1 } END { print sum }' file

Answer 3 (score 91)

None of the solutions thus far use paste. Here’s one:

paste -sd+ filename | bc

As an example, calculate Σn where 1<=n<=100000:

$ seq 100000 | paste -sd+ | bc -l
5000050000

(For the curious, seq n would print a sequence of numbers from 1 to n given a positive number n.)

52: What does $1 mean in Perl? (score 133432 in 2009)

Question

What does $1 mean in Perl? Further, what does $2 mean? How many $number variables are there?

Answer accepted (score 65)

The $number variables contain the parts of the string that matched the capture groups ( ... ) in the pattern for your last regex match if the match was successful.

For example, take the following string:

$text = "the quick brown fox jumps over the lazy dog.";

After the statement

$text =~ m/ (b.+?) /;

$1 equals the text “brown”.

Answer 2 (score 35)

The number variables are the matches from the last successful match or substitution operator you applied:

my $string = 'abcdefghi';

if( $string =~ /(abc)def(ghi)/ ) {
    print "I found $1 and $2\n";
    }

Always test that the match or substitution was successful before using $1 and so on. Otherwise, you might pick up the leftovers from another operation.

Perl regular expressions are documented in perlre.

Answer 3 (score 11)

$1, $2, etc will contain the value of captures from the last successful match - it’s important to check whether the match succeeded before accessing them, i.e. 

 if ( $var =~ m/( )/ ) {
     # use $1 etc...
 }

An example of the problem - $1 contains ‘Quick’ in both print statements below:

#!/usr/bin/perl

'Quick brown fox' =~ m{ ( quick ) }ix;
print "Found: $1\n";

'Lazy dog' =~ m{ ( quick ) }ix;
print "Found: $1\n";
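One way to sidestep stale captures, sketched with the same two strings: in list context a match returns its captures, or an empty list on failure, so the variable ends up undef instead of holding a leftover $1:

```perl
use strict;
use warnings;

my @results;
for my $text ('Quick brown fox', 'Lazy dog') {
    # On a failed match, $word is undef rather than a stale value
    my ($word) = $text =~ m{ ( quick ) }ix;
    push @results, $word;
    print defined $word ? "Found: $word\n" : "No match in '$text'\n";
}
```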

53: Quickly getting to YYYY-mm-dd HH:MM:SS in Perl (score 133065 in 2009)

Question

When writing Perl scripts I frequently find the need to obtain the current time represented as a string formatted as YYYY-mm-dd HH:MM:SS (say 2009-11-29 14:28:29).

In doing this I find myself taking this quite cumbersome path:

  • man perlfunc
  • /localtime to search for localtime - repeat five times (/ + \n) to reach the relevant section of the manpage
  • Copy the string ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time); from the manpage to my script.
  • Try with my $now = sprintf("%04d-%02d-%02d %02d:%02d:%02d", $year, $mon, $mday, $hour, $min, $sec);
  • Remember gotcha #1: Must add 1900 to $year to get current year.
  • Try with my $now = sprintf("%04d-%02d-%02d %02d:%02d:%02d", $year+1900, $mon, $mday, $hour, $min, $sec);
  • Remember gotcha #2: Must add 1 to $mon to get current month.
  • Try with my $now = sprintf("%04d-%02d-%02d %02d:%02d:%02d", $year+1900, $mon+1, $mday, $hour, $min, $sec);
  • Seems ok. Done!

While the process outlined above works it is far from optimal. I’m sure there is a smarter way, so my question is simply:

What is the easiest way to obtain a YYYY-mm-dd HH:MM:SS of the current date/time in Perl?

Where “easy” encompasses both “easy-to-write” and “easy-to-remember”.

Answer accepted (score 72)

Use strftime in the standard POSIX module. The arguments to strftime in Perl’s binding were designed to align with the return values from localtime and gmtime. Compare

strftime(fmt, sec, min, hour, mday, mon, year, wday = -1, yday = -1, isdst = -1)

with

my          ($sec,$min,$hour,$mday,$mon,$year,$wday,     $yday,     $isdst) = gmtime(time);

Example command-line use is

$ perl -MPOSIX -le 'print strftime "%F %T", localtime $^T'

or from a source file as in

use POSIX;

print strftime "%F %T", localtime time;

Some systems do not support the %F and %T shorthands, so you will have to be explicit with

print strftime "%Y-%m-%d %H:%M:%S", localtime time;

or

print strftime "%Y-%m-%d %H:%M:%S", gmtime time;

Note that time returns the current time when called whereas $^T is fixed to the time when your program started. With gmtime, the return value is the current time in GMT. Retrieve time in your local timezone with localtime.
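A minimal sketch wrapping the approach above in a reusable sub. The name timestamp and its optional $utc flag are hypothetical; the explicit %Y-%m-%d %H:%M:%S format is used because, as noted, not every platform supports %F and %T.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format an epoch time as "YYYY-mm-dd HH:MM:SS".
# $epoch defaults to the current time; a true $utc uses gmtime instead of localtime.
sub timestamp {
    my ($epoch, $utc) = @_;
    $epoch = time unless defined $epoch;
    my @parts = $utc ? gmtime($epoch) : localtime($epoch);
    return strftime('%Y-%m-%d %H:%M:%S', @parts);
}

print timestamp(), "\n";    # current local time
```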

Answer 2 (score 32)

Why not use the DateTime module to do the dirty work for you? It’s easy to write and remember!

use strict;
use warnings;
use DateTime;

my $dt   = DateTime->now;   # Current date and time as a DateTime object (UTC; pass time_zone => 'local' for local time)
my $date = $dt->ymd;   # Retrieves date as a string in 'yyyy-mm-dd' format
my $time = $dt->hms;   # Retrieves time as a string in 'hh:mm:ss' format

my $wanted = "$date $time";   # creates 'yyyy-mm-dd hh:mm:ss' string
print $wanted;

Once you know what’s going on, you can get rid of the temps and save a few lines of code:

use strict;
use warnings;
use DateTime;

my $dt = DateTime->now;
print join ' ', $dt->ymd, $dt->hms;

Answer 3 (score 30)

Try this:

use POSIX qw/strftime/;
print strftime('%Y-%m-%d %H:%M:%S', localtime);

The strftime function does the job effectively for me. Very simple and efficient.


54: How do I get a file’s last modified time in Perl? (score 131340 in )

Question

Suppose I have a filehandle $fh. I can check its existence with -e $fh or its file size with -s $fh or a slew of additional information about the file. How can I get its last modified time stamp?

Answer accepted (score 97)

You can use the built-in stat function, or the core module File::stat (included as of Perl 5.004), which wraps it.

Calling the built-in stat($fh) returns a list with the following information about the file handle passed in (from the perlfunc man page for stat):

  0 dev      device number of filesystem
  1 ino      inode number
  2 mode     file mode  (type and permissions)
  3 nlink    number of (hard) links to the file
  4 uid      numeric user ID of file's owner
  5 gid      numeric group ID of file's owner
  6 rdev     the device identifier (special files only)
  7 size     total size of file, in bytes
  8 atime    last access time since the epoch
  9 mtime    last modify time since the epoch
 10 ctime    inode change time (NOT creation time!) since the epoch
 11 blksize  preferred block size for file system I/O
 12 blocks   actual number of blocks allocated

Element number 9 in this list gives you the last modified time in seconds since the epoch (00:00 January 1, 1970 GMT). From that you can determine the local time:

my $epoch_timestamp = (stat($fh))[9];
my $timestamp       = localtime($epoch_timestamp);

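To avoid the magic number 9 needed in the previous example, use File::stat, which overrides stat to return an object with named accessors such as mtime, together with Time::localtime, another core module (also included as of Perl 5.004) whose ctime function formats the result. This makes for (arguably) more legible code: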

use File::stat;
use Time::localtime;
my $timestamp = ctime(stat($fh)->mtime);
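A self-contained sketch comparing both approaches on a scratch file. File::stat is loaded with an empty import list here (an assumption made for the demo) so the built-in stat is not overridden and both can be used side by side.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::stat ();    # load without importing, so the built-in stat stays available

# Create a scratch file so there is something to inspect.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "hello\n";
close $fh or die "close: $!";

# Built-in stat: element 9 of the returned list is mtime in epoch seconds.
my $mtime = (stat $path)[9];

# File::stat: called by its full name, it returns an object with named accessors.
my $mtime_obj = File::stat::stat($path)->mtime;

# scalar localtime turns the epoch value into a readable local-time string.
print "mtime: $mtime (", scalar localtime($mtime), ")\n";
```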

Answer 2 (score 24)

Use the builtin stat function. Or more specifically:

my $modtime = (stat($fh))[9];

Answer 3 (score 18)

my @array = stat($filehandle);

The modification time, in seconds since the epoch, is stored in $array[9].

Or explicitly:

my ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,
    $atime, $mtime, $ctime, $blksize, $blocks) = stat($filepath);

  0 dev      Device number of filesystem
  1 ino      inode number
  2 mode     File mode  (type and permissions)
  3 nlink    Number of (hard) links to the file
  4 uid      Numeric user ID of file's owner
  5 gid      Numeric group ID of file's owner
  6 rdev     The device identifier (special files only)
  7 size     Total size of file, in bytes
  8 atime    Last access time in seconds since the epoch
  9 mtime    Last modify time in seconds since the epoch
 10 ctime    inode change time in seconds since the epoch
 11 blksize  Preferred block size for file system I/O
 12 blocks   Actual number of blocks allocated

The epoch was at 00:00 January 1, 1970 GMT.

More information is in the documentation for stat (perldoc -f stat).

55: Perl regular expression (using a variable as a search string with Perl operator characters included) (score 131154 in 2018)

Question

$text_to_search = "example text with [foo] and more";
$search_string = "[foo]";

if ($text_to_search =~ m/$search_string/) {
    print "wee";
}

Please observe the above code. I would like to find the text “[foo]” in the $text_to_search variable and print “wee” if I find it. To do this I would have to ensure that [ and ] are replaced with \[ and \], so that Perl treats them as literal characters instead of operators.

How can I do this without having to first replace [ and ] with \[ and \] using a s/// expression?

Answer 2 (score 62)

Use \Q to autoescape any potentially problematic characters in your variable.

if ($text_to_search =~ m/\Q$search_string/) { print "wee"; }

Answer 3 (score 46)

Use the quotemeta function:

$text_to_search = "example text with [foo] and more";
$search_string = quotemeta "[foo]";

print "wee" if ($text_to_search =~ /$search_string/);
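A small sketch showing why escaping matters and that \Q and quotemeta give the same result. The word "food" is just an illustrative input: unescaped, "[foo]" is a character class matching any single "f" or "o", so it produces a false positive there.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text_to_search = "example text with [foo] and more";
my $search_string  = "[foo]";

# Unescaped, "[foo]" is a character class: it matches "food" because of the "f".
my $unescaped = ("food" =~ /$search_string/) ? 1 : 0;

# \Q...\E escapes regex metacharacters in the interpolated value.
my $with_q = ($text_to_search =~ /\Q$search_string\E/) ? 1 : 0;

# quotemeta performs the same escaping as a function call.
my $quoted    = quotemeta $search_string;
my $with_meta = ($text_to_search =~ /$quoted/) ? 1 : 0;

print "wee\n" if $with_q && $with_meta;
```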

56: Turning multiple lines into one line with comma separated (Perl/Sed/AWK) (score 126035 in )

Question

I have the following data in multiple lines:

foo
bar
qux
zuu
sdf
sdfasdf

What I want to do is to convert them with one line and comma separated:

foo,bar,qux,zuu,sdf,sdfasdf

What’s the best unix one-liner to do that?

Answer accepted (score 133)

Using the paste command (-s joins all lines of the file into one, and -d, sets the delimiter to a comma):

paste -d, -s file

Answer 2 (score 65)

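Given this file:

aaa
bbb
ccc
ddd

xargs (with no command, it runs echo, printing its input on one line):

cat file | xargs

Result:

aaa bbb ccc ddd

xargs improved, with sed turning the spaces into commas:

cat file | xargs | sed -e 's/ /,/g'

Result:

aaa,bbb,ccc,ddd

This assumes no line contains spaces or quotes, since xargs splits its input on whitespace and interprets quotes.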

Answer 3 (score 10)

xargs -a your_file | sed 's/ /,/g'

This is a shorter way.
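Since the question also lists Perl, here is a minimal Perl sketch. The sub name join_lines is hypothetical. Unlike the xargs pipelines above, it keeps lines that contain spaces intact.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read every line from a filehandle, strip the newlines, and join them
# with $sep (a comma by default).
sub join_lines {
    my ($fh, $sep) = @_;
    $sep = ',' unless defined $sep;
    chomp(my @lines = <$fh>);
    return join $sep, @lines;
}

# Demo with an in-memory filehandle standing in for the data file.
open my $in, '<', \"foo\nbar\nqux\n" or die "open: $!";
print join_lines($in), "\n";    # foo,bar,qux
```

As a one-liner over a file, the same idea is perl -lne 'push @a, $_; END { print join ",", @a }' file, where -l chomps each input line and adds the final newline.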

57: Perl: function to trim string leading and trailing whitespace (score 125433 in 2011)

Question

Is there a built-in function to trim leading and trailing whitespace such that trim(" hello world ") eq "hello world"?

Answer 2 (score 82)

Here’s one approach using a regular expression:

$string =~ s/^\s+|\s+$//g ;     # remove both leading and trailing whitespace

Perl 6 will include a trim function:

$string .= trim;

Source: Wikipedia

Answer 3 (score 81)

This is available in String::Util with the trim function:

Editor’s note: String::Util is not a core module, but you can install it from CPAN with [sudo] cpan String::Util.

use String::Util 'trim';
my $str = "  hello  ";
$str = trim($str);
print "string is now: '$str'\n";

prints:

string is now: 'hello'

However it is easy enough to do yourself:

$str =~ s/^\s+//;
$str =~ s/\s+$//;
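Wrapped as a function so that trim(" hello world ") eq "hello world", as the question asks. This is a sketch using the two substitutions above; the name trim is chosen to match the question and does not come from a core module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a copy of the string with leading and trailing whitespace removed.
sub trim {
    my ($str) = @_;
    $str =~ s/^\s+//;    # leading whitespace
    $str =~ s/\s+$//;    # trailing whitespace, including a final newline
    return $str;
}

print "'", trim("  hello world \n"), "'\n";    # 'hello world'
```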

58: How do I tell if a variable has a numeric value in Perl? (score 125334 in 2009)

Question

Is there a simple way in Perl that will allow me to determine if a given variable is numeric? Something along the lines of:

if (is_number($x))
{ ... }

would be ideal. A technique that won’t throw warnings when the -w switch is being used is certainly preferred.

Answer accepted (score 122)

Use Scalar::Util::looks_like_number(), which calls the internal Perl C API’s looks_like_number() function and is probably the most efficient way to do this. Note that the strings “inf” and “infinity” are treated as numbers.

Example:
#!/usr/bin/perl

use warnings;
use strict;

use Scalar::Util qw(looks_like_number);

my @exprs = qw(1 5.25 0.001 1.3e8 foo bar 1dd inf infinity);

foreach my $expr (@exprs) {
    print "$expr is", looks_like_number($expr) ? '' : ' not', " a number\n";
}

Gives this output:

1 is a number
5.25 is a number
0.001 is a number
1.3e8 is a number
foo is not a number
bar is not a number
1dd is not a number
inf is a number
infinity is a number
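The question asks for an is_number($x) helper. A minimal sketch as a thin wrapper around looks_like_number; the name is the question’s own hypothetical, and the up-front undef check is an added assumption so that undef simply yields false.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# True (1) if $x looks like a number to Perl, false (0) otherwise.
# undef is rejected before looks_like_number is consulted.
sub is_number {
    my ($x) = @_;
    return (defined $x && looks_like_number($x)) ? 1 : 0;
}

for my $x ('1.3e8', '1dd') {
    print "$x: ", (is_number($x) ? 'number' : 'not a number'), "\n";
}
```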

Answer 2 (score 23)

Check out the CPAN module Regexp::Common. I think it does exactly what you need and handles all the edge cases (e.g. real numbers, scientific notation, etc.). For example:

use Regexp::Common;
if ($var =~ /^$RE{num}{real}$/) { print q{a number}; }    # anchored, so the whole string must be a number

Answer 3 (score 22)

The original question was how to tell if a variable was numeric, not if it “has a numeric value”.

There are a few operators that have separate modes of operation for numeric and string operands, where “numeric” means anything that was originally a number or was ever used in a numeric context (e.g. in $x = "123"; 0+$x, before the addition, $x is a string, afterwards it is considered numeric).

One way to tell is this:

if ( length( do { no warnings "numeric"; $x & "" } ) ) {
    print "$x is numeric\n";
}

If the bitwise feature is enabled, & becomes a numeric-only operator (with a separate string operator, &.), so you must disable it:

if ( length( do { no if $] >= 5.022, "feature", "bitwise"; no warnings "numeric"; $x & "" } ) ) {
    print "$x is numeric\n";
}

(The bitwise feature is available in Perl 5.022 and above, and is enabled by default if you say use 5.028; or higher.)

59: Match whitespace but not newlines (score 125061 in 2016)

Question

I sometimes want to match whitespace but not newline.

So far I’ve been resorting to [ \t]. Is there a less awkward way?

Answer accepted (score 162)

Perl versions 5.10 and later support subsidiary vertical and horizontal character classes, \v and \h, as well as the generic whitespace character class \s.

The cleanest solution is to use the horizontal whitespace character class \h. This will match tab and space from the ASCII set, non-breaking space from extended ASCII, or any of these Unicode characters:

U+0009 CHARACTER TABULATION
U+0020 SPACE
U+00A0 NO-BREAK SPACE (not matched by \s)

U+1680 OGHAM SPACE MARK
U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2007 FIGURE SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+202F NARROW NO-BREAK SPACE
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE

The vertical space pattern \v is less useful, but matches these characters:

U+000A LINE FEED
U+000B LINE TABULATION
U+000C FORM FEED
U+000D CARRIAGE RETURN
U+0085 NEXT LINE (not matched by \s)

U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR

There are seven vertical whitespace characters, which match \v, and eighteen horizontal ones, which match \h. \s matches twenty-three characters.

All whitespace characters are either vertical or horizontal, with no overlap, but \h and \v are not proper subsets of \s: \h also matches U+00A0 NO-BREAK SPACE and \v also matches U+0085 NEXT LINE, neither of which is matched by \s unless Unicode rules are in effect.
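A small sketch comparing \h with the double-negative class [^\S\r\n] (see Answer 2) on a few characters. Characters are written as escapes to keep the source ASCII. Per perlrecharclass, \h matches the same characters regardless of locale or UTF-8 status, whereas whether \s (and so the double-negative class) matches the no-break space depends on the rules in effect, so only the \h result is asserted for it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # \h and \v need Perl 5.10 or later

my %chars = (
    'space'           => ' ',
    'tab'             => "\t",
    'line feed'       => "\n",
    'carriage return' => "\r",
    'no-break space'  => "\x{A0}",
);

# Report, for each character, whether \h and [^\S\r\n] match it.
for my $name (sort keys %chars) {
    my $c      = $chars{$name};
    my $h      = $c =~ /\h/        ? 'yes' : 'no';
    my $double = $c =~ /[^\S\r\n]/ ? 'yes' : 'no';
    printf "%-16s \\h: %-3s  [^\\S\\r\\n]: %s\n", $name, $h, $double;
}
```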

Answer 2 (score 335)

Use a double-negative:

/[^\S\r\n]/

That is, not-not-whitespace (the capital S complements) or not-carriage-return or not-newline. Distributing the outer not (i.e., the complementing ^ in the character class) with De Morgan’s law, this is equivalent to “whitespace but not carriage return or newline.” Including both \r and \n in the pattern correctly handles all of Unix (LF), classic Mac OS (CR), and DOS-ish (CR LF) newline conventions.

No need to take my word for it:

#! /usr/bin/env perl

use strict;
use warnings;

use 5.005;  # for qr//

my $ws_not_crlf = qr/[^\S\r\n]/;

for (' ', '\f', '\t', '\r', '\n') {
  my $qq = qq["$_"];
  printf "%-4s => %s\n", $qq,
    (eval $qq) =~ $ws_not_crlf ? "match" : "no match";
}

Output:

" "  => match
"\f" => match
"\t" => match
"\r" => no match
"\n" => no match

Note that \s did not match the vertical tab before Perl v5.18; from v5.18 on, \s (and therefore this pattern) matches it too.

Before you object too harshly, note that the Perl documentation uses the same technique. A footnote in the “Whitespace” section of perlrecharclass reads

Prior to Perl v5.18, \s did not match the vertical tab. [^\S\cK] (obscurely) matches what \s traditionally did.

The same section of perlrecharclass also suggests other approaches that won’t offend language teachers’ opposition to double-negatives.

Outside locale and Unicode rules or when the /a switch is in effect, “\s matches [\t\n\f\r ] and, starting in Perl v5.18, the vertical tab, \cK.” Discard \r and \n to leave /[\t\f\cK ]/ for matching whitespace but not newline.

If your text is Unicode, use code similar to the sub below to construct a pattern from the table in the aforementioned documentation section.

sub ws_not_nl {
  local($_) = <<'EOTable';
0x0009        CHARACTER TABULATION   h s
0x000a              LINE FEED (LF)    vs
0x000b             LINE TABULATION    vs  [1]
0x000c              FORM FEED (FF)    vs
0x000d        CARRIAGE RETURN (CR)    vs
0x0020                       SPACE   h s
0x0085             NEXT LINE (NEL)    vs  [2]
0x00a0              NO-BREAK SPACE   h s  [2]
0x1680            OGHAM SPACE MARK   h s
0x2000                     EN QUAD   h s
0x2001                     EM QUAD   h s
0x2002                    EN SPACE   h s
0x2003                    EM SPACE   h s
0x2004          THREE-PER-EM SPACE   h s
0x2005           FOUR-PER-EM SPACE   h s
0x2006            SIX-PER-EM SPACE   h s
0x2007                FIGURE SPACE   h s
0x2008           PUNCTUATION SPACE   h s
0x2009                  THIN SPACE   h s
0x200a                  HAIR SPACE   h s
0x2028              LINE SEPARATOR    vs
0x2029         PARAGRAPH SEPARATOR    vs
0x202f       NARROW NO-BREAK SPACE   h s
0x205f   MEDIUM MATHEMATICAL SPACE   h s
0x3000           IDEOGRAPHIC SPACE   h s
EOTable

  my $class;
  while (/^0x([0-9a-f]{4})\s+([A-Z\s()]+)/mg) {
    my($hex,$name) = ($1,$2);
    # skip LF, CR, NEL, and the line/paragraph separators
    next if $name =~ /\b(?:LF|CR|NEL|SEPARATOR)\b/;
    $class .= "\\N{U+$hex}";
  }

  qr/[$class]/u;
}

Other Applications

The double-negative trick is also handy for matching alphabetic characters. Remember that \w matches “word characters”: alphabetic characters, digits, and underscore. We ugly-Americans sometimes want to write it as, say,

if (/[A-Za-z]+/) { ... }

but a double-negative character-class can respect the locale:

if (/[^\W\d_]+/) { ... }

Expressing “a word character but not digit or underscore” this way is a bit opaque. A POSIX character class communicates the intent more directly:

if (/[[:alpha:]]+/) { ... }

or with a Unicode property as szbalint suggested

if (/\p{Letter}+/) { ... }

Answer 3 (score 44)

A variation on Greg’s answer that includes carriage returns too:

/[^\S\r\n]/

This regex is safer than /[^\S\n]/ with no \r. My reasoning is that Windows uses \r\n for newlines, and Mac OS 9 used \r. You’re unlikely to find \r without \n nowadays, but if you do find it, it couldn’t mean anything but a newline. Thus, since \r can mean a newline, we should exclude it too.

60: How can Perl’s print add a newline by default? (score 124937 in 2010)

Question

In Perl most of my print statements take the form

print "hello." . "\n";

Is there a nice way to avoid keeping all the pesky “\n”s lying around?

I know I could make a new function such as myprint that automatically appends \n, but it would be nice if I could override the existing print.

Answer accepted (score 95)

Perl 6 has the say function that automatically appends \n.

You can also use say in Perl 5.10 and later if you add

use feature qw(say);

to the beginning of your program. Or you can use Modern::Perl to get this and other features.

See perldoc feature for more details.
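A minimal sketch of say in Perl 5. It also accepts a filehandle, like print; an in-memory filehandle is used here only so the output can be checked. Writing use v5.10; instead of the use feature line enables say as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw(say);

say "hello.";    # prints "hello." followed by a newline

# say with an explicit filehandle, here writing into a string.
open my $fh, '>', \my $buffer or die "open: $!";
say {$fh} "hello.";
close $fh;
```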

Answer 2 (score 33)

You can use the -l option on the shebang line:

#!/usr/bin/perl -l

$text = "hello";

print $text;
print $text;

Output:

hello
hello
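The -l switch works by setting $\, the output record separator that print appends after every call (with -n or -p it also chomps each input line). A minimal sketch of the same effect inside a script, using local so the change stays confined to one block:

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    local $\ = "\n";    # what -l sets up: every print now ends with "\n"
    print "hello";
    print "hello";
}                       # $\ reverts to its previous value here

print "no automatic newline now", "\n";
```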

Answer 3 (score 24)

If Perl 5.10+ is not an option, here is a quick and dirty approximation. It’s not exactly the same, since say has some magic when its first arg is a handle, but for printing to STDOUT:

sub say {print @_, "\n"}

say 'hello';

61: Negative regex for Perl string pattern match (score 124513 in 2017)

Question

I have this regex:

if($string =~ m/^(Clinton|[^Bush]|Reagan)/i)
  {print "$string\n"};

I want to match with Clinton and Reagan, but not Bush.

It’s not working.

Answer accepted (score 27)

Sample text:

Clinton said
Bush used crayons
Reagan forgot

Just omitting a Bush match:

$ perl -ne 'print if /^(Clinton|Reagan)/' textfile
Clinton said
Reagan forgot

Or if you really want to specify:

$ perl -ne 'print if /^(?!Bush)(Clinton|Reagan)/' textfile
Clinton said
Reagan forgot

Answer 2 (score 132)

Your regex does not work because [] defines a character class, but what you want is a lookahead:

(?=) - Positive look ahead assertion foo(?=bar) matches foo when followed by bar
(?!) - Negative look ahead assertion foo(?!bar) matches foo when not followed by bar
(?<=) - Positive look behind assertion (?<=foo)bar matches bar when preceded by foo
(?<!) - Negative look behind assertion (?<!foo)bar matches bar when NOT preceded by foo
(?>) - Once-only subpatterns (?>\d+)bar Performance enhancing when bar not present
(?(x)) - Conditional subpatterns
(?(3)foo|fu)bar - Matches foo if 3rd subpattern has matched, fu if not
(?#) - Comment (?# Pattern does x y or z)

So try: (?!bush)
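A short sketch applying the negative lookahead to the sample lines (plus one extra, hypothetical line). (?!bush) consumes no characters; with /i it rejects lines starting with “Bush” in any case, and the anchored alternation after it still has to match at the start of the line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = (
    'Clinton said',
    'Bush used crayons',
    'Reagan forgot',
    'Our president, George Bush',
);

# Lookahead plus the wanted names: only Clinton and Reagan lines.
my @wanted = grep { /^(?!bush)(?:clinton|reagan)/i } @lines;

# The lookahead on its own only rules out lines that start with "Bush".
my @not_bush = grep { /^(?!bush)/i } @lines;

print "$_\n" for @wanted;
```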

Answer 3 (score 17)

Your regex says the following:

/^         - if the line starts with
(          - start a capture group
Clinton    - "Clinton"
|          - or
[^Bush]    - Any single character except "B", "u", "s" or "h"
|          - or
Reagan)    - "Reagan". End capture group.
/i         - Make matches case-insensitive 

So, in other words, your middle part of the regex is screwing you up. As it is a “catch-all” kind of group, it will allow any line that does not begin with any of the upper or lower case letters in “Bush”. For example, these lines would match your regex:

Our president, George Bush
In the news today, pigs can fly
012-3123 33

You can either use a negative lookahead, as suggested earlier, or simply use two regexes:

if( ($string =~ m/^(Clinton|Reagan)/i) and
    ($string !~ m/^Bush/i) ) {
   print "$string\n";
}

As mirod has pointed out in the comments, the second check is quite unnecessary when using the caret (^) to match only beginning of lines, as lines that begin with “Clinton” or “Reagan” could never begin with “Bush”.

However, it would be valid without the carets.

62: In Perl, how do I create a hash whose keys come from a given array? (score 121882 in 2015)

Question

Let’s say I have an array, and I know I’m going to be doing a lot of “Does the array contain X?” checks. The efficient way to do this is to turn that array into a hash, where the keys are the array’s elements, and then you can just say
if($hash{X}) { ... }

Is there an easy way to do this array-to-hash conversion? Ideally, it should be versatile enough to take an anonymous array and return an anonymous hash.

Answer accepted (score 114)

%hash = map { $_ => 1 } @array;

It’s not as short as the “@hash{@array} = …” solutions, but those ones require the hash and array to already be defined somewhere else, whereas this one can take an anonymous array and return an anonymous hash.

What this does is take each element in the array and pair it up with a 1. When this list of (key, 1, key, 1, key, 1) pairs is assigned to a hash, the odd-numbered elements become the hash’s keys, and the even-numbered ones become the respective values.

Answer 2 (score 42)

@hash{@array} = (1) x @array;

It’s a hash slice, a list of values from the hash, so it gets the list-y @ in front.

From the docs:

If you’re confused about why you use an ‘@’ there on a hash slice instead of a ‘%’, think of it like this. The type of bracket (square or curly) governs whether it’s an array or a hash being looked at. On the other hand, the leading symbol (‘$’ or ‘@’) on the array or hash indicates whether you are getting back a singular value (a scalar) or a plural one (a list).

Answer 3 (score 37)

@hash{@keys} = undef;

The syntax here where you are referring to the hash with an @ is a hash slice. We’re basically saying $hash{$keys[0]} AND $hash{$keys[1]} AND $hash{$keys[2]} … is a list on the left hand side of the =, an lvalue, and we’re assigning to that list, which actually goes into the hash and sets the values for all the named keys. In this case, I only specified one value, so that value goes into $hash{$keys[0]}, and the other hash entries all auto-vivify (come to life) with undefined values. [My original suggestion here was to set the expression = 1, which would’ve set that one key to 1 and the others to undef. I changed it for consistency, but as we’ll see below, the exact values do not matter.]

When you realize that the lvalue, the expression on the left hand side of the =, is a list built out of the hash, then it’ll start to make some sense why we’re using that @. [Except I think this will change in Perl 6.]

The idea here is that you are using the hash as a set. What matters is not the value I am assigning; it’s just the existence of the keys. So what you want to do is not something like:

if ($hash{$key} == 1) # then key is in the hash

instead:

if (exists $hash{$key}) # then key is in the set

It’s actually more efficient to just run an exists check than to bother with the value in the hash, although to me the important thing here is just the concept that you are representing a set just with the keys of the hash. Also, somebody pointed out that by using undef as the value here, we will consume less storage space than we would assigning a value. (And also generate less confusion, as the value does not matter, and my solution would assign a value only to the first element in the hash and leave the others undef, and some other solutions are turning cartwheels to build an array of values to go into the hash; completely wasted effort).
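A minimal sketch putting both ideas together: the map form builds an anonymous hash inline from an anonymous array, and a slice builds a set whose values are all undef, queried with exists. The sub name array_to_set is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn an array reference into a hash reference used as a set.
# The slice gives every key an undef value; only key existence matters.
sub array_to_set {
    my ($aref) = @_;
    my %set;
    @set{@$aref} = ();
    return \%set;
}

# The map form from the accepted answer, inline on an anonymous list.
my $colors = { map { $_ => 1 } qw(red green blue) };

my $set = array_to_set([qw(apple banana cherry)]);
for my $fruit (qw(banana durian)) {
    print "$fruit: ", (exists $set->{$fruit} ? 'in set' : 'not in set'), "\n";
}
```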

63: How can I see if a Perl hash already has a certain key? (score 120275 in 2015)

Question

I have a Perl script that is counting the number of occurrences of various strings in a text file. I want to be able to check if a certain string is not yet a key in the hash. Is there a better way of doing this altogether?

Here is what I am doing:

foreach $line (@lines){
    if(($line =~ m|my regex|) )
    {
        $string = $1;
        if ($string is not a key in %strings) # "strings" is an associative array
        {
            $strings{$string} = 1;
        }
        else
        {
            $n = ($strings{$string});
            $strings{$string} = $n +1;
        }
    }
}

Answer accepted (score 112)

I believe that to check whether a key exists in a hash, you just do

if (exists $strings{$string}) {
    ...
} else {
    ...
}

Answer 2 (score 10)

I would counsel against using if ($hash{$key}) since it will not do what you expect if the key exists but its value is zero or empty.

Answer 3 (score 9)

Well, your whole code can be reduced to:

foreach $line (@lines){
        $strings{$1}++ if $line =~ m|my regex|;
}

If the value is not there, the ++ operator treats it as 0 (and then increments it to 1). If it is already there, it is simply incremented.
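A self-contained sketch of the counting idiom, with exists used to ask whether a string was ever seen. The sample lines and the id=(\d+) pattern are hypothetical stand-ins for the question’s data and “my regex”.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = ('id=7 ok', 'id=3 ok', 'no id here', 'id=7 again');

# ++ on a missing key starts from 0, so no "is it already a key?" branch is needed.
my %strings;
for my $line (@lines) {
    $strings{$1}++ if $line =~ /id=(\d+)/;
}

for my $key (sort keys %strings) {
    print "$key seen $strings{$key} time(s)\n";
}

# exists answers "is this a key?" even if the stored value were 0 or "".
print "8 was never seen\n" unless exists $strings{8};
```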

64: How do I convert a date/time to epoch time (unix time/seconds since 1970) in Perl? (score 119553 in 2018)

Question

Given a date/time as an array of (year, month, day, hour, minute, second), how would you convert it to epoch time, i.e., the number of seconds since 1970-01-01 00:00:00 GMT?

Bonus question: If given the date/time as a string, how would you first parse it into the (y,m,d,h,m,s) array?

Answer accepted (score 20)

This is the simplest way to get unix time:

use Time::Local;
timelocal($second,$minute,$hour,$day,$month-1,$year);

Note the reverse order of the arguments and that January is month 0. For many more options, see the DateTime module from CPAN.
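A minimal sketch of the conversion described above. The helper to_epoch and its trailing $utc flag are hypothetical: timegm treats the fields as UTC, while timelocal treats them as local time. Both take the fields in reverse order with months numbered 0 to 11, and four-digit years are passed through as-is.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm timelocal);

# Convert (year, month 1-12, day, hour, minute, second) to epoch seconds.
sub to_epoch {
    my ($year, $month, $day, $hour, $minute, $second, $utc) = @_;
    my @args = ($second, $minute, $hour, $day, $month - 1, $year);
    return $utc ? timegm(@args) : timelocal(@args);
}

my $epoch = to_epoch(2000, 1, 1, 0, 0, 0, 1);
print "$epoch\n";                    # 946684800

# gmtime goes the other way, back to a readable UTC string.
print scalar gmtime($epoch), "\n";
```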

As for parsing, see the Date::Parse module from CPAN. If you really need to get fancy with date parsing, the Date::Manip module may be helpful, though its own documentation warns you away from it since it carries a lot of baggage (it knows things like common business holidays, for example) and other solutions are much faster.

If you happen to know something about the format of the date/times you’ll be parsing then a simple regular expression may suffice but you’re probably better off using an appropriate CPAN module. For example, if you know the dates will always be in YMDHMS order, use the CPAN module DateTime::Format::ISO8601.


For my own reference, if nothing else, below is a function I use for an application where I know the dates will always be in YMDHMS order with all or part of the “HMS” part optional. It accepts any delimiters (e.g., “2009-02-15” or “2009.02.15”). It returns the corresponding unix time (seconds since 1970-01-01 00:00:00 GMT) or -1 if it couldn’t parse it (which means you had better be sure you’ll never legitimately need to parse the date 1969-12-31 23:59:59). It also presumes two-digit years XX up to “69” refer to “20XX”, otherwise “19XX” (e.g., “50-02-15” means 2050-02-15 but “75-02-15” means 1975-02-15).

use Time::Local;

sub parsedate { 
  my($s) = @_;
  my($year, $month, $day, $hour, $minute, $second);

  if($s =~ m{^\s*(\d{1,4})\W*0*(\d{1,2})\W*0*(\d{1,2})\W*0*
                 (\d{0,2})\W*0*(\d{0,2})\W*0*(\d{0,2})}x) {
    $year = $1;  $month = $2;   $day = $3;
    $hour = $4;  $minute = $5;  $second = $6;
    $hour ||= 0;  $minute ||= 0;  $second ||= 0;  # default missing fields to 0
    $year = ($year<100 ? ($year<70 ? 2000+$year : 1900+$year) : $year);
    return timelocal($second,$minute,$hour,$day,$month-1,$year);  
  }
  return -1;
}

Answer 2 (score 34)

If you’re using the DateTime module, you can call the epoch() method on a DateTime object, since that’s what you think of as unix time.

Using DateTime objects lets you convert easily between epoch values and dates.

Alternatively, localtime and gmtime convert an epoch value into a list containing the seconds, minutes, hours, day, month, and year, and timelocal and timegm from the Time::Local module do the opposite, converting a list of time elements (seconds, minutes, …, days, months, etc.) into an epoch value.

Answer 3 (score 17)

To parse a date, look at Date::Parse in CPAN.

65: How can I start an interactive console for Perl? (score 117894 in 2015)

Question

How can I start an interactive console for Perl, similar to the irb command for Ruby or python for Python?

Answer accepted (score 285)

You can use the perl debugger on a trivial program, like so:

perl -de1

Alternatively there’s Alexis Sukrieh’s Perl Console application, but I haven’t used it.

Answer 2 (score 59)

Not only did Matt Trout write an article about a REPL, he actually wrote one - Devel::REPL

I’ve used it a bit and it works fairly well, and it’s under active development.

BTW, I have no idea why someone modded down the person who mentioned using “perl -e” from the console. This isn’t really a REPL, true, but it’s fantastically useful, and I use it all the time.

Answer 3 (score 35)

I wrote a script I call “psh”:

#! /usr/bin/perl

while (<>) {
  chomp;
  my $result = eval;
  print "$_ = $result\n";
}

Whatever you type in, it evaluates in Perl:

> gmtime(2**30)
gmtime(2**30) = Sat Jan 10 13:37:04 2004

> $x = 'foo'
$x = 'foo' = foo

> $x =~ s/o/a/g
$x =~ s/o/a/g = 2

> $x
$x = faa

66: How can I output UTF-8 from Perl? (score 115758 in 2013)

Question

I am trying to write a Perl script using the “utf8” pragma, and I’m getting unexpected results. I’m using Mac OS X 10.5 (Leopard), and I’m editing with TextMate. All of my settings for both my editor and operating system are defaulted to writing files in utf-8 format.

However, when I enter the following into a text file, save it as a “.pl”, and execute it, I get the friendly “diamond with a question mark” in place of the non-ASCII characters.

#!/usr/bin/env perl -w

use strict;
use utf8;

my $str = 'Çirçös';
print( "$str\n" );

Any idea what I’m doing wrong? I expect to get ‘Çirçös’ in the output, but I get ‘�ir��s’ instead.

Answer accepted (score 154)

use utf8; does not enable Unicode output - it enables you to type Unicode in your program. Add this to the program, before your print() statement:

binmode(STDOUT, ":utf8");

See if that helps. That should make STDOUT encode its output as UTF-8 instead of writing characters like “Ç” out as single Latin-1 bytes.

Answer 2 (score 82)

You can use the open pragma.

For example, the following sets STDIN, STDOUT, and STDERR to use UTF-8:

use open qw/:std :utf8/;

Answer 3 (score 64)

TMTOWTDI; choose the method that best fits how you work. I use the environment method so I don’t have to think about it.

In the environment:

export PERL_UNICODE=SDL

on the command line:

perl -CSDL -le 'print "\x{1815}"';

or with binmode:

binmode(STDOUT, ":utf8");          #treat as if it is UTF-8
binmode(STDIN, ":encoding(utf8)"); #actually check if it is UTF-8

or with PerlIO:

open my $fh, ">:utf8", $filename
    or die "could not open $filename: $!\n";

open my $fh, "<:encoding(utf-8)", $filename
    or die "could not open $filename: $!\n";

or with the open pragma:

use open ":encoding(utf8)";
use open IN => ":encoding(utf8)", OUT => ":utf8";
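A minimal sketch of the binmode approach, plus Encode::encode (a core module) to make the character-versus-byte difference visible. The string is written with \x{...} escapes, an assumption made so the source stays ASCII and use utf8 is not needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

my $str = "\x{C7}ir\x{E7}\x{F6}s";    # "Çirçös": 6 characters

# An :encoding(UTF-8) layer on STDOUT encodes characters on the way out.
binmode(STDOUT, ':encoding(UTF-8)');
print "$str\n";

# encode performs the same conversion explicitly, producing a byte string.
my $bytes = encode('UTF-8', $str);
printf "%d characters, %d UTF-8 bytes\n", length $str, length $bytes;
```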

67: How can I install a CPAN module into a local directory? (score 115048 in 2017)

Question

I’m using a hosted Linux machine so I don’t have permissions to write into the /usr/lib directory.

When I try to install a CPAN module by doing the usual:

perl Makefile.PL
make test
make install

That module is extracted to a blib/lib/ folder. I have added use blib/lib/ModuleName, but the compiler still says the module cannot be found. I have tried copying the .pm file into the local directory and using require ModuleName, but it still gives me an error.

How can I install a module into some other directory and use it?

Answer accepted (score 36)

I had a similar problem, where I couldn’t even install local::lib.

I created an installer that installed the module somewhere relative to the .pl files

The install goes like:

perl Makefile.PL PREFIX=./modulos
make
make install

Then, in the .pl file that requires the module, which is in ./

use lib qw(./modulos/share/perl/5.8.8/); # You may need to change this path
use module::name;

The rest of the files (Makefile.PL, module.pm, etc.) require no changes.

You can call the .pl file with just

perl file.pl
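A relative path like ./modulos is resolved against the current working directory, so the script above only finds its modules when run from its own directory. A sketch using the core FindBin module to build the path from the script’s location instead. It assumes the module was installed with INSTALL_BASE=modulos (see the perlfaq8 excerpt below), which puts pure-Perl modules under lib/perl5; the directory name is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use FindBin qw($Bin);                # directory containing this script
use lib "$Bin/modulos/lib/perl5";    # hypothetical install dir; adjust to yours

# The library path no longer depends on where the script is launched from.
print "First \@INC entry: $INC[0]\n";
```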

Answer 2 (score 61)

From perlfaq8:


How do I keep my own module/library directory?

When you build modules, tell Perl where to install the modules.

For Makefile.PL-based distributions, use the INSTALL_BASE option when generating Makefiles:

perl Makefile.PL INSTALL_BASE=/mydir/perl

You can set this in your CPAN.pm configuration so modules automatically install in your private library directory when you use the CPAN.pm shell:

% cpan
cpan> o conf makepl_arg INSTALL_BASE=/mydir/perl
cpan> o conf commit

For Build.PL-based distributions, use the --install_base option:

perl Build.PL --install_base /mydir/perl

You can configure CPAN.pm to automatically use this option too:

% cpan
cpan> o conf mbuildpl_arg '--install_base /mydir/perl'
cpan> o conf commit

Answer 3 (score 19)

local::lib will help you. It will convince “make install” (and “Build install”) to install to a directory you can write to, and it will tell perl how to get at those modules.

In general, if you want to use a module that is in a blib/ directory, you want to say perl -Mblib ... where ... is how you would normally invoke your script.

68: The correct way to read a data file into an array (score 114674 in 2014)

Question

I have a data file, with each line having one number, like

10
20
30
40

How do I read this file and store the data into an array?

So that I can conduct some operations on this array.

Answer accepted (score 77)

Just reading the file into an array, one line per element, is trivial:

open my $handle, '<', $path_to_file;
chomp(my @lines = <$handle>);
close $handle;

Now the lines of the file are in the array @lines.

If you want to make sure there is error handling for open and close, do something like this (in the snippet below, we open the file in UTF-8 mode, too):

my $handle;
unless (open $handle, "<:encoding(utf8)", $path_to_file) {
   print STDERR "Could not open file '$path_to_file': $!\n";
   # we return 'undefined', we could also 'die' or 'croak'
   return undef
}
chomp(my @lines = <$handle>);
unless (close $handle) {
   # what does it mean if close yields an error and you are just reading?
   print STDERR "Don't care error while closing '$path_to_file': $!\n";
} 
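A minimal end-to-end sketch of the accepted approach applied to the numbers from the question. It writes the sample data to a temporary file (via the core File::Temp module) only so that it runs on its own, and it uses sum from the core List::Util module as an example operation on the array:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use List::Util qw(sum);

# Write the sample data from the question to a throwaway file
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "10\n20\n30\n40\n";
close $tmp_fh or die "Cannot close '$path': $!";

# Read one number per line and strip the trailing newlines
open my $handle, '<', $path or die "Cannot open '$path': $!";
chomp(my @numbers = <$handle>);
close $handle;

# Any list operation now works on @numbers
my $total = sum(@numbers);
printf "%d numbers, sum %d\n", scalar @numbers, $total;
```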

Answer 2 (score 11)

The easiest method is to use the File::Slurp module:

use File::Slurp;
my @lines = read_file("filename", chomp => 1); # will chomp() each line

If you need some validation for each line you can use grep in front of read_file.

For example, filter lines which contain only integers:

my @lines = grep { /^\d+$/ } read_file("filename", chomp => 1);

Answer 3 (score 10)

I like…

@data = `cat /var/tmp/somefile`;

It’s not as glamorous as others, but, it works all the same. And…

$todays_data = '/var/tmp/somefile' ;
open INFILE, "$todays_data" ; 
@data = <INFILE> ; 
close INFILE ;

Cheers.

69: How to see if a directory exists or not in Perl? (score 113090 in )

Question

To see if a file exists before using it, we can use:

if (-e "filename.cgi")
{
 #proceed with your code
} 

But how do I check whether a directory exists?

Answer accepted (score 93)

Use -d (full list of file tests)

if (-d "cgi-bin") {
    # directory called cgi-bin exists
}
elsif (-e "cgi-bin") {
    # cgi-bin exists but is not a directory
}
else {
    # nothing called cgi-bin exists
}

As a note, -e doesn’t distinguish between files and directories. To check if something exists and is a plain file, use -f.
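A small self-contained sketch of the three tests side by side. The classify helper and the temporary directory are made up for illustration; the directory is created with the core File::Temp module:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Classify a path with the -d, -f and -e file test operators
sub classify {
    my ($path) = @_;
    return -d $path ? 'directory'
         : -f $path ? 'plain file'
         : -e $path ? 'exists, but is neither'
         :            'does not exist';
}

# Throwaway directory containing one plain file
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/data.txt";
open my $fh, '>', $file or die "Cannot create '$file': $!";
close $fh;

print "$_: ", classify($_), "\n" for $dir, $file, "$dir/missing";
```

Testing -d first matters: -e alone would report both the directory and the file as existing.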

70: How to efficiently calculate a running standard deviation? (score 113026 in 2019)

Question

I have an array of lists of numbers, e.g.:

[0] (0.01, 0.01, 0.02, 0.04, 0.03)
[1] (0.00, 0.02, 0.02, 0.03, 0.02)
[2] (0.01, 0.02, 0.02, 0.03, 0.02)
     ...
[n] (0.01, 0.00, 0.01, 0.05, 0.03)

What I would like to do is efficiently calculate the mean and standard deviation at each index of a list, across all array elements.

To do the mean, I have been looping through the array and summing the value at a given index of a list. At the end, I divide each value in my “averages list” by n (I am working with a population, not a sample from the population).

To do the standard deviation, I loop through again, now that I have the mean calculated.

I would like to avoid going through the array twice, once for the mean and then once for the SD (after I have a mean).

Is there an efficient method for calculating both values, only going through the array once? Any code in an interpreted language (e.g. Perl or Python) or pseudocode is fine.

Answer accepted (score 106)

The answer is to use Welford’s algorithm, which is very clearly defined after the “naive methods” in:

It’s more numerically stable than either the two-pass or online simple sum of squares collectors suggested in other responses. The stability only really matters when you have lots of values that are close to each other as they lead to what is known as “catastrophic cancellation” in the floating point literature.

You might also want to brush up on the difference between dividing by the number of samples (N) and N-1 in the variance calculation (squared deviation). Dividing by N-1 leads to an unbiased estimate of variance from the sample, whereas dividing by N on average underestimates variance (because it doesn’t take into account the variance between the sample mean and the true mean).

I wrote two blog entries on the topic which go into more details, including how to delete previous values online:

You can also take a look at my Java implementation; the javadoc, source, and unit tests are all online:
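A minimal Perl sketch of Welford's update applied per index, which is what the question asks for, using the sample rows from the question. It makes a single pass and keeps, for each index, a running mean and a running sum of squared deviations (called @m2 here, a name chosen for illustration):

```perl
use strict;
use warnings;

# Rows from the question; statistics are computed per index (column)
my @rows = (
    [0.01, 0.01, 0.02, 0.04, 0.03],
    [0.00, 0.02, 0.02, 0.03, 0.02],
    [0.01, 0.02, 0.02, 0.03, 0.02],
    [0.01, 0.00, 0.01, 0.05, 0.03],
);

my $width = @{ $rows[0] };
my @mean  = (0) x $width;   # running mean for each index
my @m2    = (0) x $width;   # running sum of squared deviations for each index
my $n     = 0;

# Single pass: Welford's update for every index of every row
for my $row (@rows) {
    $n++;
    for my $i (0 .. $width - 1) {
        my $x     = $row->[$i];
        my $delta = $x - $mean[$i];
        $mean[$i] += $delta / $n;
        $m2[$i]   += $delta * ($x - $mean[$i]);   # uses the updated mean
    }
}

# Population SD divides by $n; divide by ($n - 1) for the sample SD instead
my @sd = map { sqrt($_ / $n) } @m2;

printf "mean: %s\n", join ' ', map { sprintf '%.4f', $_ } @mean;
printf "sd:   %s\n", join ' ', map { sprintf '%.4f', $_ } @sd;
```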

Answer 2 (score 70)

The basic answer is to accumulate the sum of both x (call it ‘sum_x’) and x² (call it ‘sum_x2’) as you go. The value of the standard deviation is then:

stdev = sqrt((sum_x2 / n) - (mean * mean)) 

where

mean = sum_x / n

This divides by ‘n’, so it is the population standard deviation (which is what the question asks for); for the sample standard deviation, use sqrt((sum_x2 - n * mean * mean) / (n - 1)) instead.

You may need to worry about the numerical stability of taking the difference between two large numbers if you are dealing with large samples. Go to the external references in other answers (Wikipedia, etc) for more information.

Answer 3 (score 26)

Perhaps not what you were asking, but … If you use a numpy array, it will do the work for you, efficiently:

from numpy import array

nums = array(((0.01, 0.01, 0.02, 0.04, 0.03),
              (0.00, 0.02, 0.02, 0.03, 0.02),
              (0.01, 0.02, 0.02, 0.03, 0.02),
              (0.01, 0.00, 0.01, 0.05, 0.03)))

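print(nums.std(axis=1))
# [ 0.0116619   0.00979796  0.00632456  0.01788854]

print(nums.mean(axis=1))
# [ 0.022  0.018  0.02   0.02 ]

Note that axis=1 computes the statistics for each row (each list); for the per-index values across all lists that the question asks for, use axis=0 instead.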

By the way, there’s some interesting discussion in this blog post and comments on one-pass methods for computing means and variances:

71: Counting array elements in Perl (score 112206 in 2016)

Question

How do I get the total items in an array, NOT the last id?

Neither of the two ways I found to do this works:

my @a;
# Add some elements (no consecutive ids)
$a[0]= '1';
$a[5]= '2';
$a[23]= '3';

print $#a, "\n"; # Prints 23
print scalar(@a), "\n"; # Prints 24

I expected to get 3…

Answer accepted (score 39)

Edit: Hash versus Array

As cincodenada correctly pointed out in the comment, ysth gave a better answer: I should have answered your question with another question: “Do you really want to use a Perl array? A hash may be more appropriate.”

An array allocates memory for all possible indices up to the largest used so far. In your example, you allocate 24 cells (but use only 3). By contrast, a hash only allocates space for those fields that are actually used.

Array solution: scalar grep

Here are two possible solutions (see below for explanation):

print scalar(grep {defined $_} @a), "\n";  # prints 3
print scalar(grep $_, @a), "\n";            # prints 3

Explanation: After adding $a[23], your array really contains 24 elements — but most of them are undefined (which also evaluates as false). You can count the number of defined elements (as done in the first solution) or the number of true elements (second solution).

What is the difference? If you set $a[10]=0, then the first solution will count it, but the second solution won’t (because 0 is false but defined). If you set $a[3]=undef, neither solution will count it.
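A runnable sketch of exactly those two cases, using the indices from the question plus the $a[10]=0 and $a[3]=undef assignments described above:

```perl
use strict;
use warnings;

my @a;
@a[0, 5, 23] = ('1', '2', '3');
$a[10] = 0;       # defined but false
$a[3]  = undef;   # explicitly undefined

my $slots   = scalar @a;                    # highest index + 1
my $defined = scalar grep { defined } @a;   # counts the 0, skips undef
my $true    = scalar grep { $_ } @a;        # skips both the 0 and undef

print "$slots slots, $defined defined, $true true\n";   # 24 slots, 4 defined, 3 true
```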

Hash solution (by ysth)

As suggested by another solution, you can work with a hash and avoid all the problems:

$a{0}  = 1;
$a{5}  = 2;
$a{23} = 3;
print scalar(keys %a), "\n";  # prints 3

This solution counts zeros and undef values.

Answer 2 (score 16)

It sounds like you want a sparse array. A normal array would have 24 items in it, but a sparse array would have 3. In Perl we emulate sparse arrays with hashes:

#!/usr/bin/perl

use strict;
use warnings;

my %sparse;

@sparse{0, 5, 23} = (1 .. 3);

print "there are ", scalar keys %sparse, " items in the sparse array\n",
    map { "\t$sparse{$_}\n" } sort { $a <=> $b } keys %sparse;

The keys function in scalar context will return the number of items in the sparse array. The only downside to using a hash to emulate a sparse array is that you must sort the keys before iterating over them if their order is important.

You must also remember to use the delete function to remove items from the sparse array (just setting their value to undef is not enough).
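A short sketch of why undef is not enough: setting a value to undef leaves the key in place, so keys still counts it, while delete removes the key itself:

```perl
use strict;
use warnings;

my %sparse;
@sparse{0, 5, 23} = (1 .. 3);

$sparse{5} = undef;                      # key 5 is still present
my $count_after_undef = keys %sparse;

delete $sparse{5};                       # key 5 is gone
my $count_after_delete = keys %sparse;

print "$count_after_undef then $count_after_delete\n";   # 3 then 2
```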

Answer 3 (score 14)

Maybe you want a hash instead (or in addition). Arrays are an ordered set of elements; if you create $foo[23], you implicitly create $foo[0] through $foo[22].

72: Better way to remove specific characters from a Perl string (score 110624 in 2015)

Question

I have dynamically generated strings like @#@!efq@!#!, and I want to remove specific characters from the string using Perl.

Currently I am doing something like this (replacing the characters with nothing):

$varTemp =~ s/['\$','\#','\@','\~','\!','\&','\*','\(','\)','\[','\]','\;','\.','\,','\:','\?','\^',' ', '\`','\\','\/']//g;

Is there a better way of doing this? I am looking for something clean.

Answer accepted (score 24)

You’ve misunderstood how character classes are used:

$varTemp =~ s/[\$#@~!&*()\[\];.,:?^ `\\\/]+//g;

does the same as your regex (assuming you didn’t mean to remove ' characters from your strings).

Edit: The + allows several of those “special characters” to match at once, so it should also be faster.

Answer 2 (score 17)

You could use tr instead:

$p =~ tr/fo//d;

will delete every f and every o from $p. In your case it should be:

$p =~ tr/\$#@~!&*()[];.,:?^ `\\\///d;

Answer 3 (score 7)

With a character class this big it is easier to say what you want to keep. A caret in the first position of a character class inverts its sense, so you can write

$varTemp =~ s/[^"%'+\-0-9<=>a-z_{|}]+//gi;

or, using the more efficient tr

$varTemp =~ tr/"%'+\-0-9<=>A-Z_a-z{|}//cd;
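A small sketch that runs the accepted answer's character class and the tr version from Answer 2 on the example string from the question. Each works on a copy, so the two results can be compared:

```perl
use strict;
use warnings;

my $original = '@#@!efq@!#!';

# Character-class substitution (regex engine)
(my $via_regex = $original) =~ s/[\$#@~!&*()\[\];.,:?^ `\\\/]+//g;

# Transliteration with the /d flag (no regex engine, no interpolation)
(my $via_tr = $original) =~ tr/\$#@~!&*()[];.,:?^ `\\\///d;

print "$via_regex $via_tr\n";   # efq efq
```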

73: how to remove the first two columns in a file using shell (awk, sed, whatever) (score 109506 in 2016)

Question

I have a file with many lines. Each line has many columns (fields) separated by a blank " ", and the number of columns differs from line to line. How can I remove the first two columns?

Answer 2 (score 138)

You can do it with cut:

cut -d " " -f 3- input_filename > output_filename

Explanation:

  • cut: invoke the cut command
  • -d " ": use a single space as the delimiter (cut uses TAB by default)
  • -f: specify fields to keep
  • 3-: all the fields starting with field 3
  • input_filename: use this file as the input
  • > output_filename: write the output to this file.

Alternatively, you can do it with awk:

awk '{$1=""; $2=""; sub("  ", " "); print}' input_filename > output_filename

Explanation:

  • awk: invoke the awk command
  • $1=""; $2="";: set field 1 and 2 to the empty string
  • sub(...);: clean up the output fields because fields 1 & 2 will still be delimited by " "
  • print: print the modified line
  • input_filename > output_filename: same as above.

Answer 3 (score 22)

Here’s one way to do it with Awk that’s relatively easy to understand:

awk '{print substr($0, index($0, $3))}'

This is a simple awk command with no pattern, so action inside {} is run for every input line.

The action simply prints the substring starting at the position of the 3rd field.

  • $0: the whole input line
  • $3: 3rd field
  • index(in, find): returns the position of find in string in
  • substr(string, start): return a substring starting at index start

If you want to use a different delimiter, such as comma, you can specify it with the -F option:

awk -F"," '{print substr($0, index($0, $3))}'

You can also operate this on a subset of the input lines by specifying a pattern before the action in {}. Only lines matching the pattern will have the action run.

awk 'pattern{print substr($0, index($0, $3))}'

Where pattern can be something such as:

  • /abcdef/: use regular expression, operates on $0 by default.
  • $1 ~ /abcdef/: operate on a specific field.
  • $1 == "blabla": use string comparison
  • NR > 1: use record/line number
  • NF > 0: use field/column number
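Since this is a Perl collection, the same job can also be done with a Perl one-liner such as perl -lane 'print "@F[2..$#F]"' input_filename. Below is the equivalent logic as a small script, sketched with made-up input lines that have different numbers of fields:

```perl
use strict;
use warnings;

# Made-up input lines with different numbers of fields
my @input = ("a b c d", "1 2 3", "x y");

# Drop the first two fields of each line and rejoin the rest with spaces
my @output = map {
    my @f = split ' ', $_;
    join ' ', @f[2 .. $#f];
} @input;

print "$_\n" for @output;
```

One design difference: split ' ' treats any run of whitespace as a single separator, whereas cut -d " " treats every single space as a delimiter. On lines with repeated spaces the two can give different results.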

74: What’s the safest way to iterate through the keys of a Perl hash? (score 109288 in 2008)

Question

If I have a Perl hash with a bunch of (key, value) pairs, what is the preferred method of iterating through all the keys? I have heard that using each may in some way have unintended side effects. So, is that true, and is one of the two following methods best, or is there a better way?

# Method 1
while (my ($key, $value) = each(%hash)) {
    # Something
}

# Method 2
foreach my $key (keys(%hash)) {
    # Something
}

Answer accepted (score 192)

The rule of thumb is to use the function most suited to your needs.

If you just want the keys and do not plan to ever read any of the values, use keys():

foreach my $key (keys %hash) { ... }

If you just want the values, use values():

foreach my $val (values %hash) { ... }

If you need the keys and the values, use each():

keys %hash; # reset the internal iterator so a prior each() doesn't affect the loop
while(my($k, $v) = each %hash) { ... }

If you plan to change the keys of the hash in any way except for deleting the current key during the iteration, then you must not use each(). For example, this code to create a new set of uppercase keys with doubled values works fine using keys():

%h = (a => 1, b => 2);

foreach my $k (keys %h)
{
  $h{uc $k} = $h{$k} * 2;
}

producing the expected resulting hash:

(a => 1, A => 2, b => 2, B => 4)

But using each() to do the same thing:

%h = (a => 1, b => 2);

keys %h;
while(my($k, $v) = each %h)
{
  $h{uc $k} = $h{$k} * 2; # BAD IDEA!
}

produces incorrect results in hard-to-predict ways. For example:

(a => 1, A => 2, b => 2, B => 8)

This, however, is safe:

keys %h;
while(my($k, $v) = each %h)
{
  if(...)
  {
    delete $h{$k}; # This is safe
  }
}

All of this is described in the perl documentation:

% perldoc -f keys
% perldoc -f each

Answer 2 (score 25)

One thing you should be aware of when using each is that it has the side effect of adding “state” to your hash (the hash has to remember what the “next” key is). When using code like the snippets posted above, which iterate over the whole hash in one go, this is usually not a problem. However, you will run into hard-to-track-down problems (I speak from experience ;) when using each together with statements like last or return to exit from the while ... each loop before you have processed all keys.

In this case, the hash will remember which keys it has already returned, and when you use each on it the next time (maybe in a totally unrelated piece of code), it will continue at this position.

Example:

my %hash = ( foo => 1, bar => 2, baz => 3, quux => 4 );

# find key 'baz'
while ( my ($k, $v) = each %hash ) {
    print "found key $k\n";
    last if $k eq 'baz'; # found it!
}

# later ...

print "the hash contains:\n";

# iterate over all keys:
while ( my ($k, $v) = each %hash ) {
    print "$k => $v\n";
}

This prints:

found key bar
found key baz
the hash contains:
quux => 4
foo => 1

What happened to keys “bar” and “baz”? They’re still there, but the second each starts where the first one left off, and stops when it reaches the end of the hash, so we never see them in the second loop.

Answer 3 (score 20)

The place where each can cause you problems is that it’s a true, non-scoped iterator. By way of example:

while ( my ($key,$val) = each %a_hash ) {
    print "$key => $val\n";
    last if $val; #exits loop when $val is true
}

# but "each" hasn't reset!!
while ( my ($key,$val) = each %a_hash ) {
    # continues where the last loop left off
    print "$key => $val\n";
}

If you need to be sure that each gets all the keys and values, you need to make sure you use keys or values first (as that resets the iterator). See the documentation for each.
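A small sketch of that reset, reusing the hash from Answer 2. The first loop exits early, leaving the iterator partway through; calling keys in void context resets it, so the second loop sees every key:

```perl
use strict;
use warnings;

my %hash = (foo => 1, bar => 2, baz => 3, quux => 4);

# Leave an each() loop early; the hash iterator stays partway through
while (my ($k, $v) = each %hash) {
    last if $k eq 'baz';
}

# keys in void context resets the iterator
keys %hash;

my @seen;
while (my ($k, $v) = each %hash) {
    push @seen, $k;
}
print scalar(@seen), " keys seen\n";   # 4 keys seen
```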

75: What is the difference between my and our in Perl? (score 107621 in 2015)

Question

I know what my is in Perl. It defines a variable that exists only in the scope of the block in which it is defined. What does our do? How does our differ from my?

Answer accepted (score 211)

Great question: How does our differ from my and what does our do?

In Summary:

Available since Perl 5, my is a way to declare:

  • non-package variables, that are
  • private,
  • new,
  • non-global variables,
  • separate from any package, so the variable cannot be accessed in the form of $package_name::variable.


On the other hand, our variables are:

  • package variables, and thus automatically
  • global variables,
  • definitely not private,
  • nor are they necessarily new; and they
  • can be accessed outside the package (or lexical scope) with the qualified namespace, as $package_name::variable.


Declaring a variable with our allows you to predeclare variables in order to use them under use strict without getting typo warnings or compile-time errors. Since Perl 5.6, it has replaced the obsolete use vars, which was package-scoped rather than lexically scoped like our.

For example, the formal, qualified name for variable $x inside package main is $main::x. Declaring our $x allows you to use the bare $x variable without penalty (i.e., without a resulting error), in the scope of the declaration, when the script uses use strict or use strict "vars". The scope might be one, or two, or more packages, or one small block.

Answer 2 (score 59)

The PerlMonks and PerlDoc links from cartman and Olafur are a great reference - below is my crack at a summary:

my variables are lexically scoped within a single block defined by {} or within the same file if not in {}s. They are not accessible from packages/subroutines defined outside of the same lexical scope / block.

our variables are scoped within a package/file and accessible from any code that uses or requires that package/file; name conflicts are resolved between packages by prepending the appropriate namespace.

Just to round it out, local variables are “dynamically” scoped, differing from my variables in that they are also accessible from subroutines called within the same block.
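To make the “dynamically scoped” part concrete, here is a small sketch (the variable and sub names are made up for illustration). A my variable is invisible to a sub called from its block, while local temporarily replaces the value of the package variable for everything called during that block:

```perl
use strict;
use warnings;

our $level = 'global';              # package variable $main::level

sub show_level { return $level }    # always reads the package variable

sub with_my    { my $level = 'lexical';    return show_level() }
sub with_local { local $level = 'dynamic'; return show_level() }

print with_my(), "\n";      # global: the my variable is private to with_my
print with_local(), "\n";   # dynamic: local is visible to called subs
print show_level(), "\n";   # global: restored when with_local returned
```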

Answer 3 (score 48)

An example:

use strict;

for (1 .. 2){
    # Both variables are lexically scoped to the block.
    our ($o);  # Belongs to 'main' package.
    my  ($m);  # Does not belong to a package.

    # The variables differ with respect to newness.
    $o ++;
    $m ++;
    print __PACKAGE__, " >> o=$o m=$m\n";  # $m is always 1.

    # The package has changed, but we still have direct,
    # unqualified access to both variables, because the
    # lexical scope has not changed.
    package Fubb;
    print __PACKAGE__, " >> o=$o m=$m\n";
}

# The our() and my() variables differ with respect to privacy.
# We can still access the variable declared with our(), provided
# that we fully qualify its name, but the variable declared
# with my() is unavailable.
print __PACKAGE__, " >> main::o=$main::o\n";  # 2
print __PACKAGE__, " >> main::m=$main::m\n";  # Undefined.

# Attempts to access the variables directly won't compile.
# print __PACKAGE__, " >> o=$o\n";
# print __PACKAGE__, " >> m=$m\n";

# Variables declared with use vars() are like those declared
# with our(): belong to a package; not private; and not new.
# However, their scoping is package-based rather than lexical.
for (1 .. 9){
    use vars qw($uv);
    $uv ++;
}

# Even though we are outside the lexical scope where the
# use vars() variable was declared, we have direct access
# because the package has not changed.
print __PACKAGE__, " >> uv=$uv\n";

# And we can access it from another package.
package Bubb;
print __PACKAGE__, " >> main::uv=$main::uv\n";

76: How can I list all of the files in a directory with Perl? (score 107466 in 2014)

Question

Is there a function in Perl that lists all the files and directories in a directory? I remember that Java has File.list() to do this. Is there a comparable method in Perl?

Answer accepted (score 83)

If you want to get the contents of a given directory, and only that directory (i.e. no subdirectories), the best way is to use opendir/readdir/closedir:

opendir my $dir, "/some/path" or die "Cannot open directory: $!";
my @files = readdir $dir;
closedir $dir;

You can also use:

my @files = glob( '/some/path/*' );

But in my opinion it is not as good, mostly because glob is quite a complex thing (it can filter results automatically), and using it just to get all the entries of a directory seems like overkill.

On the other hand, if you need to get content from all of the directories and subdirectories, there is basically one standard solution:

use File::Find;

my @content;
find( \&wanted, '/some/path');
do_something_with( @content );

exit;

sub wanted {
  push @content, $File::Find::name;
  return;
}

Answer 2 (score 12)

readdir() does that.

Check http://perldoc.perl.org/functions/readdir.html

opendir(DIR, $some_dir) || die "can't opendir $some_dir: $!";
@dots = grep { /^\./ && -f "$some_dir/$_" } readdir(DIR);
closedir DIR;
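A self-contained sketch of the usual opendir/readdir pattern, run against a throwaway directory created with the core File::Temp module (the file names are made up). readdir returns bare names, including . and .. and dotfiles, so those usually need filtering, and file tests need the directory prefix:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a throwaway directory so the example runs on its own
my $path = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt .hidden)) {
    open my $fh, '>', "$path/$name" or die "Cannot create $name: $!";
    close $fh;
}
mkdir "$path/subdir" or die "Cannot mkdir: $!";

opendir my $dh, $path or die "Cannot open directory '$path': $!";
my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
closedir $dh;

# Keep only plain files; prefix the path because readdir returns bare names
my @files = sort grep { -f "$path/$_" } @entries;
print "$_\n" for @files;
```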

Answer 3 (score 11)

Or File::Find

use File::Find;
finddepth(\&wanted, '/some/path/to/dir');
sub wanted { print "$File::Find::name\n" };

It’ll go through subdirectories if they exist.

77: How do I sleep for a millisecond in Perl? (score 107325 in 2012)

Question

How do I sleep for shorter than a second in Perl?

Answer accepted (score 104)

From the Perldoc page on sleep:

For delays of finer granularity than one second, the Time::HiRes module (from CPAN, and starting from Perl 5.8 part of the standard distribution) provides usleep().

Actually, it provides usleep() (which sleeps in microseconds) and nanosleep() (which sleeps in nanoseconds). You may want usleep(), which should let you deal with easier numbers. 1 millisecond sleep (using each):

use strict;
use warnings;

use Time::HiRes qw(usleep nanosleep);

# 1 millisecond == 1000 microseconds
usleep(1000);
# 1 millisecond == 1000000 nanoseconds
nanosleep(1000000);

If you don’t want to (or can’t) load a module to do this, you may also be able to use the built-in select() function:

# Sleep for 250 milliseconds
select(undef, undef, undef, 0.25);
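Time::HiRes can also export replacement sleep and time functions that accept and return fractional seconds. A minimal sketch that sleeps for 1 millisecond and measures the elapsed time (the exact figure printed depends on the system):

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

my $start = time();        # floating-point seconds from Time::HiRes
sleep(0.001);              # 0.001 seconds == 1 millisecond
my $elapsed_ms = (time() - $start) * 1000;

printf "slept for about %.3f ms\n", $elapsed_ms;
```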

Answer 2 (score 34)

Time::HiRes:

  use Time::HiRes;
  Time::HiRes::sleep(0.1); #.1 seconds
  Time::HiRes::usleep(1); # 1 microsecond.

http://perldoc.perl.org/Time/HiRes.html

Answer 3 (score 12)

From perlfaq8:


How can I sleep() or alarm() for under a second?

If you want finer granularity than the 1 second that the sleep() function provides, the easiest way is to use the select() function as documented in select in perlfunc. Try the Time::HiRes and the BSD::Itimer modules (available from CPAN, and starting from Perl 5.8 Time::HiRes is part of the standard distribution).

78: In Perl, how to remove ^M from a file? (score 106742 in 2017)

Question

I have a script that is appending new fields to an existing CSV, however ^M characters are appearing at the end of the old lines so the new fields end up on a new row instead of the same one. How do I remove ^M characters from a CSV file using Perl?

Answer accepted (score 14)

You found out you can also do this:

$line=~ tr/\015//d;

Answer 2 (score 46)

^M is carriage return. You can do this:

$str =~ s/\r//g

Answer 3 (score 23)

Or a 1-liner:

perl -p -i -e 's/\r\n$/\n/g' file1.txt file2.txt ... filen.txt
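In the question's situation, a plain chomp removes only the trailing "\n", so the "\r" (^M) stays at the end of the line and anything appended ends up after it. A small sketch with a made-up CSV line, stripping either line ending before appending a field:

```perl
use strict;
use warnings;

# A CSV line as it might be read from a file with Windows (CRLF) endings
my $line = "id,name\r\n";

$line =~ s/\r?\n\z//;      # strip CRLF or plain LF
$line .= ",new_field";     # the new field now stays on the same row

print "$line\n";           # id,name,new_field
```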

79: Split a string into array in Perl (score 106685 in 2018)

Question

my $line = "file1.gz file2.gz file3.gz";
my @abc = split('', $line);
print "@abc\n";

Expected output:

file1.gz
file2.gz
file3.gz

I want the output to be file1.gz in $abc[0], file2.gz in $abc[1], and file3.gz in $abc[2]. How do I split $line?

Answer accepted (score 15)

Splitting a string by whitespace is very simple:

print $_, "\n" for split ' ', 'file1.gz file2.gz file3.gz';

This is a special form of split actually (as this function usually takes patterns instead of strings):

As another special case, split emulates the default behavior of the command line tool awk when the PATTERN is either omitted or a literal string composed of a single space character (such as ' ' or "\x20"). In this case, any leading whitespace in EXPR is removed before splitting occurs, and the PATTERN is instead treated as if it were /\s+/; in particular, this means that any contiguous whitespace (not just a single space character) is used as a separator.


Here’s an answer for the original question (with a simple string without any whitespace):

Perhaps you want to split on .gz extension:

my $line = "file1.gzfile2.gzfile3.gz";
my @abc = split /(?<=\.gz)/, $line;
print $_, "\n" for @abc;

Here I used the (?<=...) construct, which is a look-behind assertion, basically making split break at each point in the line preceded by the .gz substring.

If you work with the fixed set of extensions, you can extend the pattern to include them all:

my $line = "file1.gzfile2.txtfile2.gzfile3.xls";
my @exts = ('txt', 'xls', 'gz');
my $patt = join '|', map { '(?<=\.' . $_ . ')' } @exts;
my @abc = split /$patt/, $line;
print $_, "\n" for @abc;
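For reference, the question's split('', $line) uses an empty pattern, which splits the string into single characters. Here is a small sketch contrasting the special ' ' form quoted above with a literal single-space pattern, using a made-up input string with extra spaces:

```perl
use strict;
use warnings;

# Made-up input with leading and repeated spaces
my $line = "  file1.gz   file2.gz file3.gz";

my @awk_style = split ' ', $line;   # leading whitespace dropped, runs collapsed
my @literal   = split / /, $line;   # every single space separates a field

print scalar(@awk_style), " fields: @awk_style\n";           # 3 fields: file1.gz file2.gz file3.gz
print scalar(@literal), " fields (empty ones included)\n";   # 7 fields (empty ones included)
```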

Answer 2 (score 10)

Having $line as it is now, you can simply split the string based on at least one whitespace separator

my @answer = split(' ', $line); # creates an @answer array

then

print("@answer\n");               # print array on one line

or

print("$_\n") for (@answer);      # print each element on one line

I prefer using () for split, print and for.

Answer 3 (score -1)

I found this one to be very simple!

my $line = "file1.gz file2.gz file3.gz";

my @abc =  ($line =~ /(\w+[.]\w+)/g);

print $abc[0],"\n";
print $abc[1],"\n";
print $abc[2],"\n";

output:

file1.gz 
file2.gz 
file3.gz

Take a look at this tutorial to find more on Perl regular expressions, and scroll down to the More matching section.

80: Printing everything except the first field with awk (score 105846 in 2018)

Question

I have a file that looks like this:

AE  United Arab Emirates
AG  Antigua & Barbuda
AN  Netherlands Antilles
AS  American Samoa
BA  Bosnia and Herzegovina
BF  Burkina Faso
BN  Brunei Darussalam

And I’d like to invert the order, printing first everything except $1 and then $1:

United Arab Emirates AE

How can I do the “everything except field 1” trick?

Answer accepted (score 81)

Assigning $1 works but it will leave a leading space: awk '{first = $1; $1 = ""; print $0, first; }'

You can also find the number of columns in NF and use that in a loop.

Answer 2 (score 101)

$1="" leaves a space as Ben Jackson mentioned, so use a for loop:

awk '{for (i=2; i<=NF; i++) print $i}' filename

So if your string was “one two three”, the output will be:

two
three

If you want the result in one row, you could do as follows:

awk '{for (i=2; i<NF; i++) printf "%s ", $i; print $NF}' filename

This will give you: “two three”

Answer 3 (score 62)

Use the cut command with the --complement option:

$ echo a b c | cut -f 1 -d ' '
a
$ echo a b c | cut -f 1,2 -d ' '
a b
$ echo a b c | cut -f 1 -d ' ' --complement
b c
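In Perl, split with a LIMIT of 2 separates the first field from the rest of the line without disturbing the rest, which avoids the leading-space problem mentioned in the accepted answer. A sketch using two lines from the question:

```perl
use strict;
use warnings;

# Sample lines from the question (two spaces after the code)
my @lines = ("AE  United Arab Emirates", "AG  Antigua & Barbuda");

my @swapped = map {
    my ($code, $name) = split ' ', $_, 2;   # LIMIT 2 keeps the rest of the line intact
    "$name $code";
} @lines;

print "$_\n" for @swapped;
```

Because the separator in split ' ' is any run of whitespace, both spaces after the country code are consumed, and the spacing inside the country name is preserved.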

81: “End of script output before headers” error in Apache (score 105429 in 2014)

Question

Apache on Windows gives me the following error when I try to access my Perl script:

Server error!

The server encountered an internal error and was unable to complete your request.

Error message: 
End of script output before headers: sample.pl

If you think this is a server error, please contact the webmaster.

Error 500

localhost
Apache/2.4.4 (Win32) OpenSSL/1.0.1e PHP/5.5.3

This is my sample script:

#!"C:\xampp\perl\bin\perl.exe"
print "Hello World";

but it does not work in the browser.

Answer 2 (score 18)

Check file permissions.

I had exactly the same error on a Linux machine with the wrong permissions set.

chmod 755 myfile.pl

solved the problem.

Answer 3 (score 15)

If this is a CGI script for the web, then you must output your header:

#!"C:\xampp\perl\bin\perl.exe"

print "Content-Type: text/html\n\n";
print "Hello World";

The error message “End of script output before headers: sample.pl” tells you exactly this.

Or even better, use the CGI module to output the header:

#!"C:\xampp\perl\bin\perl.exe"

use strict;
use warnings;

use CGI;

print CGI::header();
print "Hello World";

82: Neatest way to remove linebreaks in Perl (score 105050 in 2009)

Question

I’m maintaining a script that can get its input from various sources, and works on it per line. Depending on the actual source used, linebreaks might be Unix-style, Windows-style or even, for some aggregated input, mixed(!).

When reading from a file it goes something like this:

@lines = <IN>;
process(\@lines);

...

sub process {
    my $lines = shift;
    foreach my $line (@{$lines}) {
        chomp $line;
        #Handle line by line
    }
}

So, what I need to do is replace the chomp with something that removes either Unix-style or Windows-style linebreaks. I’m coming up with way too many ways of solving this, one of the usual drawbacks of Perl :)

What’s your opinion on the neatest way to chomp off generic linebreaks? What would be the most efficient?

Edit: A small clarification - the method ‘process’ gets a list of lines from somewhere, not necessarily read from a file. Each line might have

  • No trailing linebreaks
  • Unix-style linebreaks
  • Windows-style linebreaks
  • Just Carriage-Return (when original data has Windows-style linebreaks and is read with $/ = ‘’)
  • An aggregated set where lines have different styles

Answer accepted (score 88)

After digging through the perlre docs a bit, I’ll present my best suggestion so far, which seems to work pretty well. Perl 5.10 added the \R escape as a generalized linebreak:

$line =~ s/\R//g;

It’s the same as:

(?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])

I’ll keep this question open a while yet, just to see if there’s more nifty ways waiting to be suggested.

Answer 2 (score 12)

Whenever I go through input and want to remove or replace characters I run it through little subroutines like this one.

sub clean {

    my $text = shift;

    $text =~ s/\n//g;
    $text =~ s/\r//g;

    return $text;
}

It may not be fancy, but this method has been working flawlessly for me for years.

Answer 3 (score 7)

Reading perlport I’d suggest something like

$line =~ s/\015?\012?$//;

to be safe for whatever platform you’re on and whatever linefeed style you may be processing, because the characters that \r and \n stand for may differ between Perl ports, whereas \015 and \012 are explicit.
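A small sketch of the accepted answer's \R applied to the mixed cases listed in the question (the sample strings are made up). It anchors \R at the end with \z so only the trailing linebreak is removed; the accepted answer's s/\R//g would also remove linebreaks in the middle of a string:

```perl
use strict;
use warnings;
use 5.010;   # \R requires Perl 5.10 or later

# Made-up lines: Unix, Windows, bare carriage return, no linebreak
my @lines = ("unix\n", "windows\r\n", "old mac\r", "none");

# \R matches "\r\n" as a single unit, so CRLF is removed in one go
my @clean = map { my $l = $_; $l =~ s/\R\z//; $l } @lines;

say join '|', @clean;   # unix|windows|old mac|none
```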

83: Grep to find item in Perl array (score 105013 in 2017)

Question

Every time I input something the code always tells me that it exists. But I know some of the inputs do not exist. What is wrong?

#!/usr/bin/perl

@array = <>;
print "Enter the word you want to match\n";
chomp($match = <STDIN>);

if (grep($match, @array)) {
    print "found it\n";
}

Answer accepted (score 30)

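The first arg that you give to grep needs to evaluate as true or false to indicate whether there was a match. So it should be (this assumes the lines in @array have been chomped; otherwise each element still ends in "\n" and the eq test never succeeds):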

# note that grep returns a list, so $matched needs to be in brackets to get the 
# actual value, otherwise $matched will just contain the number of matches
if (my ($matched) = grep $_ eq $match, @array) {
    print "found it: $matched\n";
}

If you need to match on a lot of different values, it might also be worth putting the array data into a hash, since hashes let you do this efficiently without having to iterate through the list.

# convert array to a hash with the array elements as the hash keys and the values are simply 1
my %hash = map {$_ => 1} @array;

# check if the hash contains $match
if (defined $hash{$match}) {
    print "found it\n";
}

Answer 2 (score 27)

You seem to be using grep() like the Unix grep utility, which is wrong.

Perl’s grep() in scalar context evaluates the expression for each element of a list and returns the number of times the expression was true. So when $match contains any “true” value, grep($match, @array) in scalar context will always return the number of elements in @array.

Instead, try using the pattern matching operator:

if (grep /$match/, @array) {
    print "found it\n";
}
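A self-contained sketch comparing the question's form with an exact comparison and a hash lookup. The word list and search term are made up for illustration; note that the chomp is needed because lines read with <> keep their trailing "\n":

```perl
use strict;
use warnings;

# Hypothetical word list, chomped as lines read with <> would need to be
chomp(my @array = ("apple\n", "banana\n", "cherry\n"));

my $match = 'ban';

# The question's form: $match is a true string, so every element "matches"
my $buggy_count = grep($match, @array);           # 3

# Exact string comparison against each element
my $exact_count = grep { $_ eq $match } @array;   # 0

# Hash lookup for repeated membership tests
my %seen = map { $_ => 1 } @array;
my $in_hash = exists $seen{banana} ? 1 : 0;       # 1

print "$buggy_count $exact_count $in_hash\n";
```

The pattern form in Answer 2 would report a match for 'ban' (a substring of "banana") and treats the input as a regex; \Q$match\E escapes any regex metacharacters if that matters.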

Answer 3 (score 2)

In addition to what eugene and stevenl posted, you might encounter problems with using both <> and <STDIN> in one script: <> iterates through (i.e. concatenates) all files given as command-line arguments.

However, should a user ever forget to specify a file on the command line, it will read from STDIN, and your code will wait forever on input.

84: automatically get loop index in foreach loop in perl (score 104712 in )

Question

If I have the following array in Perl:

@x = qw(a b c);

and I iterate over it with foreach, then $_ will refer to the current element in the array:

foreach (@x) {
    print;
}

will print:

abc

Is there a similar way to get the index of the current element, without manually updating a counter? Something such as:

foreach (@x) {
    print $index;
}

where $index is updated like $_ to yield the output:

012

Answer accepted (score 106)

Like codehead said, you’d have to iterate over the array indices instead of its elements. I prefer this variant over the C-style for loop:

for my $i (0 .. $#x) {
    print "$i: $x[$i]\n";
}

Answer 2 (score 43)

In Perl prior to 5.10, you can say

#!/usr/bin/perl

use strict;
use warnings;

my @a = qw/a b c d e/;

my $index = 0;
for my $elem (@a) {
    print "At index ", $index++, ", I saw $elem\n";
}

#or

for my $index (0 .. $#a) {
    print "At index $index I saw $a[$index]\n";
}

In Perl 5.10, you can use state to declare a variable that never gets reinitialized (unlike ones created with my). This lets you keep the $index variable in a smaller scope, but can lead to bugs (if you enter the loop a second time it will still have the last value):

#!/usr/bin/perl

use 5.010;
use strict;
use warnings;

my @a = qw/a b c d e/;

for my $elem (@a) {
    state $index;
    say "At index ", $index++, ", I saw $elem";
}

In Perl 5.12 you can say

#!/usr/bin/perl

use 5.012; #this enables strict
use warnings;

my @a = qw/a b c d e/;

while (my ($index, $elem) = each @a) {
    say "At index $index I saw $elem";
}

But be warned: there are restrictions on what you are allowed to do with @a while iterating over it with each.
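Here is a small sketch of one such pitfall, assuming Perl 5.12 or later. each keeps an iterator inside the array, so leaving the loop early with last leaves that iterator part-way through. Calling keys on the array resets it; perlfunc documents this side effect for both hashes and arrays.

```perl
use 5.012;    # enables strict, say, and each on arrays
use warnings;

my @a = qw/a b c d e/;

# Stop part-way: @a's internal iterator is left just after 'c'.
while (my ($index, $elem) = each @a) {
    last if $elem eq 'c';
}

# keys resets the iterator as a side effect; without this line the next
# loop would resume at index 3. $count (5) is not otherwise needed.
my $count = keys @a;

my @seen;
while (my ($index, $elem) = each @a) {
    push @seen, "$index:$elem";
}
say "@seen";    # 0:a 1:b 2:c 3:d 4:e
```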

It won’t help you now, but in Perl 6 you will be able to say

#!/usr/bin/perl6

my @a = <a b c d e>;
for @a Z 0 .. Inf -> $elem, $index {
    say "at index $index, I saw $elem"
}

The Z operator zips the two lists together (i.e. it takes one element from the first list, then one element from the second, then one element from the first, and so on). The second list is a lazy list that contains every integer from 0 to infinity (at least theoretically). The -> $elem, $index says that we are taking two values at a time from the result of the zip. The rest should look normal to you (unless you are not familiar with the say function from 5.10 yet).

Answer 3 (score 22)

perldoc perlvar does not seem to suggest any such variable.

85: Search and replace a particular string in a file using Perl (score 104110 in 2017)

Question

Possible Duplicate:
How to replace a string in an existing file in Perl?

I need to create a subroutine that does a search and replace in a file.

Here’s the contents of myfiletemplate.txt:

CATEGORY1=youknow_<PREF>  
CATEGORY2=your/<PREF>/goes/here/

Here’s my replacement string: ABCD

I need to replace all instances of <PREF> with ABCD.

Answer accepted (score 22)

Quick and dirty:

#!/usr/bin/perl

use strict;
use warnings;

my $file = '/tmp/yourfile.txt';

open(my $in, '<', $file) or die "Cannot read $file: $!";
my @lines = <$in>;
close($in);

foreach (@lines) {
   s/<PREF>/ABCD/g;
}

open(my $out, '>', $file) or die "Cannot write $file: $!";
print $out @lines;
close($out) or die "Cannot close $file: $!";

It may be a good idea not to write the result back to your original file; instead, write it to a copy and check the result first.
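The question asks for a subroutine. Here is a minimal sketch that wraps the same read, replace and write steps, assuming the whole file fits in memory. replace_in_file is a hypothetical name, and \Q...\E makes the search string literal so that characters such as . or * are not treated as regex syntax.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every literal occurrence of $search with
# $replacement in $path, rewriting the file in place.
sub replace_in_file {
    my ($path, $search, $replacement) = @_;

    open(my $in, '<', $path) or die "Cannot read $path: $!";
    my $content = do { local $/; <$in> };   # slurp the whole file
    close($in);

    # \Q...\E quotes $search so it is matched as plain text
    my $count = ($content =~ s/\Q$search\E/$replacement/g) || 0;

    open(my $out, '>', $path) or die "Cannot write $path: $!";
    print {$out} $content;
    close($out) or die "Cannot close $path: $!";

    return $count;   # number of replacements made
}
```

Usage: `my $n = replace_in_file('myfiletemplate.txt', '<PREF>', 'ABCD');`. Following the advice above, you could copy the file first and call the subroutine on the copy.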

Answer 2 (score 41)

A one liner:

perl -pi.back -e 's/<PREF>/ABCD/g;' inputfile

Answer 3 (score 16)

You could also do this:

#!/usr/bin/perl

use strict;
use warnings;

$^I = '.bak'; # create a backup copy 

while (<>) {
   s/<PREF>/ABCD/g; # do the replacement
   print; # print to the modified file
}

Invoke the script with

./script.pl input_file

You will get a file named input_file, containing your changes, and a file named input_file.bak, which is simply a copy of the original file.

86: How do I update all my CPAN modules to their latest versions? (score 103764 in 2016)

Question

How do I update all my CPAN modules to their latest versions?

Answer accepted (score 139)

An alternative method to using upgrade from the default CPAN shell is to use cpanminus and cpan-outdated.

These are so easy and nimble to use that I hardly ever go back to CPAN shell. To upgrade all of your modules in one go, the command is:

cpan-outdated -p | cpanm

I recommend installing cpanminus as the docs describe:

curl -L https://cpanmin.us | perl - App::cpanminus

And then install cpan-outdated along with all other CPAN modules using cpanm:

cpanm App::cpanoutdated

BTW: If you are using perlbrew then you will need to repeat this for every Perl you have installed under it.

You can find out more about cpanminus and cpan-outdated at their GitHub repositories.

Answer 2 (score 48)

An easy way to upgrade all Perl packages (CPAN modules) is the following way:

cpan upgrade /(.*)/

cpan interprets the argument as a regular expression and upgrades every installed module whose name matches it, which with /(.*)/ is all of them.

Answer 3 (score 31)

For Strawberry Perl, try:

cpan -u

87: How should I do integer division in Perl? (score 103299 in 2009)

Question

What is a good way to always do integer division in Perl?

For example, I want:

real / int = int

int / real = int

int / int = int

Answer accepted (score 44)

You can truncate the result to an integer with int():

int(5/1.5);  # 3

Answer 2 (score 83)

The lexically scoped integer pragma forces Perl to use integer arithmetic in its scope:

print 3.0/2.1 . "\n";    # => 1.42857142857143
{
  use integer;
  print 3.0/2.1 . "\n";  # => 1
}
print 3.0/2.1 . "\n";    # => 1.42857142857143
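int() truncates toward zero, so for negative quotients it is not the same as rounding down. The sketch below shows the difference; div_trunc and div_floor are hypothetical helper names, and floor comes from the core POSIX module.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helpers for the two common meanings of "integer division".
sub div_trunc { my ($n, $d) = @_; return int($n / $d) }     # toward zero
sub div_floor { my ($n, $d) = @_; return floor($n / $d) }   # toward -infinity

for my $pair ([7, 2], [-7, 2], [5, 1.5]) {
    my ($n, $d) = @$pair;
    printf "%3s / %-3s  trunc: %2d  floor: %2d\n",
        $n, $d, div_trunc($n, $d), div_floor($n, $d);
}
```

These helpers divide the original values and only convert the quotient. Under use integer, by contrast, perldoc integer states that the operands are truncated too, so 5/1.5 is computed as 5/1 and gives 5, while div_trunc(5, 1.5) gives 3.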

Answer 3 (score 5)

int($x+.5) will round positive values to the nearest integer. Rounding up is harder.

To round toward zero:

int($x)

For the solutions below, include the following statement:

use POSIX;

To round down: POSIX::floor($x)

To round up: POSIX::ceil($x)

To round away from zero: POSIX::floor($x) - int($x) + POSIX::ceil($x)

To round off to the nearest integer: POSIX::floor($x+.5)

Note that int($x+.5) fails badly for negative values. int(-2.1+.5) is int(-1.6), which is -1.
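The rounding recipes above can be compared side by side. This sketch only uses int() and the core POSIX floor/ceil functions; the helper names round_half_up and round_away_from_zero are hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

sub round_half_up        { my ($x) = @_; return floor($x + 0.5) }
sub round_away_from_zero { my ($x) = @_; return floor($x) - int($x) + ceil($x) }

for my $x (2.1, 2.5, -2.1, -2.5) {
    printf "%5s  int(\$x+.5): %2d  floor(\$x+.5): %2d  away from zero: %2d\n",
        $x, int($x + 0.5), round_half_up($x), round_away_from_zero($x);
}
```

Note that floor($x+.5) rounds halves upward, so -2.5 becomes -2 rather than -3.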

88: Rename Files and Directories (Add Prefix) (score 101108 in 2011)

Question

I would like to add a prefix to all files and directories.

Example:

I have

Hi.jpg
1.txt
folder/
this.file_is.here.png
another_folder.ok/

I would like to add prefix “PRE_”

PRE_Hi.jpg
PRE_1.txt
PRE_folder/
PRE_this.file_is.here.png
PRE_another_folder.ok/


Answer 2 (score 192)

Thanks to Peter van der Heijden, here’s one that’ll work for filenames with spaces in them:

for f in * ; do mv -- "$f" "PRE_$f" ; done

("--" is needed to succeed with files whose names begin with dashes, which would otherwise be interpreted as switches for the mv command)

Answer 3 (score 76)

Use the rename script this way:

$ rename 's/^/PRE_/' *

There are no problems with metacharacters or whitespace in filenames.
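The same job can be done from Perl itself with opendir, readdir and the built-in rename. This is a minimal sketch: add_prefix is a hypothetical name, it skips names starting with "." (as the shell's * does), and unlike mv it refuses to overwrite an existing target.

```perl
use strict;
use warnings;

# Hypothetical helper: add $prefix to every file and directory directly
# inside $dir. Names starting with '.' are skipped, like the shell's *.
sub add_prefix {
    my ($dir, $prefix) = @_;

    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = grep { !/^\./ } readdir($dh);   # read everything before renaming
    closedir($dh);

    for my $name (@names) {
        my ($from, $to) = ("$dir/$name", "$dir/$prefix$name");
        die "Refusing to overwrite $to\n" if -e $to;
        rename($from, $to) or die "Cannot rename $from to $to: $!";
    }
    return scalar @names;   # number of entries renamed
}
```

Usage: `add_prefix('.', 'PRE_');` renames the entries in the current directory.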

89: How can I extract substrings from a string in Perl? (score 98783 in 2009)

Question

Consider the following strings:

  1. Scheme ID: abc-456-hu5t10 (High priority) *****

  2. Scheme ID: frt-78f-hj542w (Balanced)

  3. Scheme ID: 23f-f974-nm54w (super formula run) *****

and so on in the above format - the parts in bold are changes across the strings.

==> Imagine I have many strings in the format shown above. I want to pick 3 substrings (the parts shown in bold) from each of these strings:

  • 1st substring: the alphanumeric value (in the example above, “abc-456-hu5t10”)
  • 2nd substring: the words in parentheses (in the example above, “High priority”)
  • 3rd substring: the * (if * is present at the end of the string; otherwise leave it empty)

How do I pick these 3 substrings from each string shown above? I know it can be done using regular expressions in Perl… Can you help with this?

Answer accepted (score 32)

You could do something like this:

my $data = <<END;
1) Scheme ID: abc-456-hu5t10 (High priority) *
2) Scheme ID: frt-78f-hj542w (Balanced)
3) Scheme ID: 23f-f974-nm54w (super formula run) *
END

foreach (split(/\n/,$data)) {
  $_ =~ /Scheme ID: ([a-z0-9-]+)\s+\(([^)]+)\)\s*(\*)?/ || next;
  my ($id,$word,$star) = ($1,$2,$3 // '');  # $3 is undef when the line has no *
  print "$id $word $star\n";
}

The key thing is the regular expression:

Scheme ID: ([a-z0-9-]+)\s+\(([^)]+)\)\s*(\*)?

It breaks down as follows.

The fixed string “Scheme ID:”:

Scheme ID: 

Followed by one or more of the characters a-z, 0-9 or -. We use the brackets to capture it as $1:

([a-z0-9-]+)

Followed by one or more whitespace characters:

\s+

Followed by an opening bracket (which we escape), then one or more characters that aren’t a closing bracket, and then a closing bracket (escaped). We use the unescaped brackets to capture the words as $2:

\(([^)]+)\)

Followed by optional whitespace and maybe a *, captured as $3 (which is undef when there is no *):

\s*(\*)?
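The breakdown above maps directly onto the /x modifier, which lets each piece of the pattern carry its own comment. This is a sketch of the same pattern in that form, not a different technique. Because unescaped whitespace is ignored under /x, \s stands in for the literal spaces.

```perl
use strict;
use warnings;

my $scheme_re = qr{
    Scheme \s ID: \s         # the fixed string "Scheme ID: "
    ( [a-z0-9-]+ )           # capture 1: the id, e.g. abc-456-hu5t10
    \s+                      # one or more whitespace characters
    \( ( [^)]+ ) \)          # capture 2: the text inside the parentheses
    \s* ( \* )?              # capture 3: an optional *, undef when absent
}x;

my @rows;
for my $line (
    '1) Scheme ID: abc-456-hu5t10 (High priority) *',
    '2) Scheme ID: frt-78f-hj542w (Balanced)',
    '3) Scheme ID: 23f-f974-nm54w (super formula run) *',
) {
    my ($id, $words, $star) = $line =~ $scheme_re or next;
    push @rows, [ $id, $words, $star // '' ];
}

print join('|', @$_), "\n" for @rows;
```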

Answer 2 (score 4)

You could use a regular expression such as the following:

/([-a-z0-9]+)\s*\((.*?)\)\s*(\*)?/

So for example:

$s = "abc-456-hu5t10 (High priority) *";
$s =~ /([-a-z0-9]+)\s*\((.*?)\)\s*(\*)?/;
print "$1\n$2\n$3\n";

prints

abc-456-hu5t10
High priority
*

Answer 3 (score 3)

(\S*)\s*\((.*?)\)\s*(\*?)


(\S*)    picks up anything which is NOT whitespace
\s*      0 or more whitespace characters
\(       a literal open parenthesis
(.*?)    anything, non-greedy so stops on first occurrence of...
\)       a literal close parenthesis
\s*      0 or more whitespace characters
(\*?)    0 or 1 occurrences of literal *

90: Why does modern Perl avoid UTF-8 by default? (score 97673 in 2018)

Question

I wonder why most modern solutions built using Perl don’t enable UTF-8 by default.

I understand there are many legacy problems for core Perl scripts, where it may break things. But, from my point of view, in the 21st century, big new projects (or projects with a big perspective) should make their software UTF-8 proof from scratch. Still I don’t see it happening. For example, Moose enables strict and warnings, but not Unicode. Modern::Perl reduces boilerplate too, but no UTF-8 handling.

Why? Are there some reasons to avoid UTF-8 in modern Perl projects in the year 2011?


Commenting @tchrist got too long, so I’m adding it here.

It seems that I did not make myself clear. Let me try to add some things.

tchrist and I see the situation pretty similarly, but our conclusions are at completely opposite ends. I agree the situation with Unicode is complicated, but this is exactly why we (Perl users and coders) need some layer (or pragma) which makes UTF-8 handling as easy as it must be nowadays.

tchrist pointed to many aspects to cover; I will read and think about them for days or even weeks. Still, this is not my point. tchrist tries to prove that there is not one single way “to enable UTF-8”. I don’t have enough knowledge to argue with that. So, I’ll stick to live examples.

I played around with Rakudo and UTF-8 was just there as I needed it. I didn’t have any problems; it just worked. Maybe there are some limitations somewhere deeper, but at the start, everything I tested worked as I expected.

Shouldn’t that be a goal in modern Perl 5 too? I stress it more: I’m not suggesting UTF-8 as the default character set for core Perl, I suggest the possibility to trigger it with a snap for those who develop new projects.

Another example, but with a more negative tone. Frameworks should make development easier. Some years ago, I tried web frameworks, but just threw them away because “enabling UTF-8” was so obscure. I did not find how and where to hook Unicode support. It was so time-consuming that I found it easier to go the old way. Now I saw here there was a bounty to deal with the same problem with Mason 2: How to make Mason2 UTF-8 clean?. So, it is a pretty new framework, but using it with UTF-8 needs deep knowledge of its internals. It is like a big red sign: STOP, don’t use me!

I really like Perl. But dealing with Unicode is painful. I still find myself running against walls. In some ways tchrist is right and answers my question: new projects don’t attract UTF-8 because it is too complicated in Perl 5.

Answer accepted (score 1134)


🌴 🐪🐫🐪🐫🐪 🌞 𝕲𝖔 𝕿𝖍𝖔𝖚 𝖆𝖓𝖉 𝕯𝖔 𝕷𝖎𝖐𝖊𝖜𝖎𝖘𝖊 🌞 🐪🐫🐪 🐁

𝓔𝓭𝓲𝓽 : 𝙎𝙞𝙢𝙥𝙡𝙚𝙨𝙩 : 𝟕 𝘿𝙞𝙨𝙘𝙧𝙚𝙩𝙚 𝙍𝙚𝙘𝙤𝙢𝙢𝙚𝙣𝙙𝙖𝙩𝙞𝙤𝙣𝙨
  1. Set your PERL_UNICODE envariable to AS. This makes all Perl scripts decode @ARGV as UTF‑8 strings, and sets the encoding of all three of stdin, stdout, and stderr to UTF‑8. Both these are global effects, not lexical ones.

  2. At the top of your source file (program, module, library, dohickey), prominently assert that you are running perl version 5.12 or better via:

      use v5.12;  # minimal for unicode string feature
      use v5.14;  # optimal for unicode string feature

  3. Enable warnings, since the previous declaration only enables strictures and features, not warnings. I also suggest promoting Unicode warnings into exceptions, so use both these lines, not just one of them. Note, however, that under v5.14 the utf8 warning class comprises three other subwarnings which can all be separately enabled: nonchar, surrogate, and non_unicode. These you may wish to exert greater control over.

      use warnings;
      use warnings qw( FATAL utf8 );

  4. Declare that this source unit is encoded as UTF‑8. Although once upon a time this pragma did other things, it now serves this one singular purpose alone and no other:

      use utf8;

  5. Declare that anything that opens a filehandle within this lexical scope but not elsewhere is to assume that that stream is encoded in UTF‑8 unless you tell it otherwise. That way you do not affect other modules’ or other programs’ code.

      use open qw( :encoding(UTF-8) :std );

  6. Enable named characters via \N{CHARNAME}.

      use charnames qw( :full :short );

  7. If you have a DATA handle, you must explicitly set its encoding. If you want this to be UTF‑8, then say:

      binmode(DATA, ":encoding(UTF-8)");

There is of course no end of other matters with which you may eventually find yourself concerned, but these will suffice to approximate the stated goal to “make everything just work with UTF‑8”, albeit for a somewhat weakened sense of those terms.

One other pragma, although it is not Unicode related, is:

      use autodie;

It is strongly recommended.


    🎅 𝕹 𝖔 𝕸 𝖆 𝖌 𝖎 𝖈 𝕭 𝖚 𝖑 𝖑 𝖊 𝖙 🎅

    Saying that “Perl should [somehow!] enable Unicode by default” doesn’t even start to begin to think about getting around to saying enough to be even marginally useful in some sort of rare and isolated case. Unicode is much much more than just a larger character repertoire; it’s also how those characters all interact in many, many ways.

    Even the simple-minded minimal measures that (some) people seem to think they want are guaranteed to miserably break millions of lines of code, code that has no chance to “upgrade” to your spiffy new Brave New World modernity.

    It is way way way more complicated than people pretend. I’ve thought about this a huge, whole lot over the past few years. I would love to be shown that I am wrong. But I don’t think I am. Unicode is fundamentally more complex than the model that you would like to impose on it, and there is complexity here that you can never sweep under the carpet. If you try, you’ll break either your own code or somebody else’s. At some point, you simply have to break down and learn what Unicode is about. You cannot pretend it is something it is not.

    🐪 goes out of its way to make Unicode easy, far more than anything else I’ve ever used. If you think this is bad, try something else for a while. Then come back to 🐪: either you will have returned to a better world, or else you will bring knowledge of the same with you so that we can make use of your new knowledge to make 🐪 better at these things.


    💡 𝕴𝖉𝖊𝖆𝖘 𝖋𝖔𝖗 𝖆 𝖀𝖓𝖎𝖈𝖔𝖉𝖊 ⸗ 𝕬𝖜𝖆𝖗𝖊 🐪 𝕷𝖆𝖚𝖓𝖉𝖗𝖞 𝕷𝖎𝖘𝖙 💡

    At a minimum, here are some things that would appear to be required for 🐪 to “enable Unicode by default”, as you put it:

    1. All 🐪 source code should be in UTF-8 by default. You can get that with use utf8 or export PERL5OPT=-Mutf8.

    2. The 🐪 DATA handle should be UTF-8. You will have to do this on a per-package basis, as in binmode(DATA, ":encoding(UTF-8)").

    3. Program arguments to 🐪 scripts should be understood to be UTF-8 by default. export PERL_UNICODE=A, or perl -CA, or export PERL5OPT=-CA.

    4. The standard input, output, and error streams should default to UTF-8. export PERL_UNICODE=S for all of them, or I, O, and/or E for just some of them. This is like perl -CS.

    5. Any other handles opened by 🐪 should be considered UTF-8 unless declared otherwise; export PERL_UNICODE=D or with i and o for particular ones of these; export PERL5OPT=-CD would work. That makes -CSAD for all of them.

    6. Cover both bases plus all the streams you open with export PERL5OPT=-Mopen=:utf8,:std. See uniquote.

    7. You don’t want to miss UTF-8 encoding errors. Try export PERL5OPT=-Mwarnings=FATAL,utf8. And make sure your input streams are always binmoded to :encoding(UTF-8), not just to :utf8.

    8. Code points between 128–255 should be understood by 🐪 to be the corresponding Unicode code points, not just unpropertied binary values. use feature "unicode_strings" or export PERL5OPT=-Mfeature=unicode_strings. That will make uc("\xDF") eq "SS" and "\xE9" =~ /\w/. A simple export PERL5OPT=-Mv5.12 or better will also get that.

    9. Named Unicode characters are not by default enabled, so add export PERL5OPT=-Mcharnames=:full,:short,latin,greek or some such. See uninames and tcgrep.

    10. You almost always need access to the functions from the standard Unicode::Normalize module for various types of decompositions. export PERL5OPT=-MUnicode::Normalize=NFD,NFKD,NFC,NFKC, and then always run incoming stuff through NFD and outbound stuff from NFC. There’s no I/O layer for these yet that I’m aware of, but see nfc, nfd, nfkd, and nfkc.

    11. String comparisons in 🐪 using eq, ne, lc, cmp, sort, &c&c are always wrong. So instead of @a = sort @b, you need @a = Unicode::Collate->new->sort(@b). Might as well add that to your export PERL5OPT=-MUnicode::Collate. You can cache the key for binary comparisons.

    12. 🐪 built-ins like printf and write do the wrong thing with Unicode data. You need to use the Unicode::GCString module for the former, and both that and also the Unicode::LineBreak module as well for the latter. See uwc and unifmt.

    13. If you want them to count as integers, then you are going to have to run your \d+ captures through the Unicode::UCD::num function because 🐪’s built-in atoi(3) isn’t currently clever enough.

    14. You are going to have filesystem issues on 👽 filesystems. Some filesystems silently enforce a conversion to NFC; others silently enforce a conversion to NFD. And others do something else still. Some even ignore the matter altogether, which leads to even greater problems. So you have to do your own NFC/NFD handling to keep sane.

    15. All your 🐪 code involving a-z or A-Z and such MUST BE CHANGED, including m//, s///, and tr///. It should stand out as a screaming red flag that your code is broken. But it is not clear how it must change. Getting the right properties, and understanding their casefolds, is harder than you might think. I use unichars and uniprops every single day.

    16. Code that uses \p{Lu} is almost as wrong as code that uses [A-Za-z]. You need to use \p{Upper} instead, and know the reason why. Yes, \p{Lowercase} and \p{Lower} are different from \p{Ll} and \p{Lowercase_Letter}.

    17. Code that uses [a-zA-Z] is even worse. And it can’t use \pL or \p{Letter}; it needs to use \p{Alphabetic}. Not all alphabetics are letters, you know!

    18. If you are looking for 🐪 variables with /[\$\@\%]\w+/, then you have a problem. You need to look for /[\$\@\%]\p{IDS}\p{IDC}*/, and even that isn’t thinking about the punctuation variables or package variables.

    19. If you are checking for whitespace, then you should choose between \h and \v, depending. And you should never use \s, since it DOES NOT MEAN [\h\v], contrary to popular belief.

    20. If you are using \n for a line boundary, or even \r\n, then you are doing it wrong. You have to use \R, which is not the same!

    21. If you don’t know when and whether to call Unicode::Stringprep, then you had better learn.

    22. Case-insensitive comparisons need to check for whether two things are the same letters no matter their diacritics and such. The easiest way to do that is with the standard Unicode::Collate module. Unicode::Collate->new(level => 1)->cmp($a, $b). There are also eq methods and such, and you should probably learn about the match and substr methods, too. These have distinct advantages over the 🐪 built-ins.

    23. Sometimes that’s still not enough, and you need the Unicode::Collate::Locale module instead, as in Unicode::Collate::Locale->new(locale => "de__phonebook", level => 1)->cmp($a, $b). Consider that Unicode::Collate::->new(level => 1)->eq("d", "ð") is true, but Unicode::Collate::Locale->new(locale=>"is",level => 1)->eq("d", "ð") is false. Similarly, “ae” and “æ” are eq if you don’t use locales, or if you use the English one, but they are different in the Icelandic locale. Now what? It’s tough, I tell you. You can play with ucsort to test some of these things out.

    24. Consider how to match the pattern CVCV (consonant, vowel, consonant, vowel) in the string “niño”. Its NFD form (which you had darned well better have remembered to put it in) becomes “nin\x{303}o”. Now what are you going to do? Even pretending that a vowel is [aeiou] (which is wrong, by the way), you won’t be able to do something like (?=[aeiou])\X) either, because even in NFD a code point like ‘ø’ does not decompose! However, it will test equal to an ‘o’ using the UCA comparison I just showed you. You can’t rely on NFD, you have to rely on UCA.
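Items 10, 22 and 24 can be tried out with the core Unicode::Normalize and Unicode::Collate modules. This is a small sketch using the “niño” example, not a solution to the CVCV matching problem. It shows that NFD lengthens the string, that NFC restores it, and that a level-1 collator treats “niño” and “nino” as equal while the default (all levels) collator does not.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFC);
use Unicode::Collate;

my $precomposed = "ni\x{F1}o";           # "niño" with U+00F1
my $decomposed  = NFD($precomposed);     # n + COMBINING TILDE (U+0303)

printf "NFC length %d, NFD length %d\n", length $precomposed, length $decomposed;
print "NFC restores the original\n" if NFC($decomposed) eq $precomposed;

# level => 1 compares base letters only, ignoring diacritics and case
my $primary = Unicode::Collate->new(level => 1);
my $full    = Unicode::Collate->new;     # default: all levels
printf "level 1: %s, full: %s\n",
    ($primary->eq($precomposed, 'nino') ? 'equal' : 'different'),
    ($full->eq($precomposed, 'nino')    ? 'equal' : 'different');
```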


    💩 𝔸 𝕤 𝕤 𝕦 𝕞 𝕖 𝔹 𝕣 𝕠 𝕜 𝕖 𝕟 𝕟 𝕖 𝕤 𝕤 💩

    And that’s not all. There are a million broken assumptions that people make about Unicode. Until they understand these things, their 🐪 code will be broken.

    1. Code that assumes it can open a text file without specifying the encoding is broken.

    2. Code that assumes the default encoding is some sort of native platform encoding is broken.

    3. Code that assumes that web pages in Japanese or Chinese take up less space in UTF‑16 than in UTF‑8 is wrong.

    4. Code that assumes Perl uses UTF‑8 internally is wrong.

    5. Code that assumes that encoding errors will always raise an exception is wrong.

    6. Code that assumes Perl code points are limited to 0x10_FFFF is wrong.

    7. Code that assumes you can set $/ to something that will work with any valid line separator is wrong.

    8. Code that assumes roundtrip equality on casefolding, like lc(uc($s)) eq $s or uc(lc($s)) eq $s, is completely broken and wrong. Consider that uc("σ") and uc("ς") are both "Σ", but lc("Σ") cannot possibly return both of those.

    9. Code that assumes every lowercase code point has a distinct uppercase one, or vice versa, is broken. For example, "ª" is a lowercase letter with no uppercase; whereas both "ᵃ" and "ᴬ" are letters, but they are not lowercase letters; however, they are both lowercase code points without corresponding uppercase versions. Got that? They are not \p{Lowercase_Letter}, despite being both \p{Letter} and \p{Lowercase}.

    10. Code that assumes changing the case doesn’t change the length of the string is broken.

    11. Code that assumes there are only two cases is broken. There’s also titlecase.

    12. Code that assumes only letters have case is broken. Beyond just letters, it turns out that numbers, symbols, and even marks have case. In fact, changing the case can even make something change its main general category, like a \p{Mark} turning into a \p{Letter}. It can also make it switch from one script to another.

    13. Code that assumes that case is never locale-dependent is broken.

    14. Code that assumes Unicode gives a fig about POSIX locales is broken.

    15. Code that assumes you can remove diacritics to get at base ASCII letters is evil, still, broken, brain-damaged, wrong, and justification for capital punishment.

    16. Code that assumes that diacritics \p{Diacritic} and marks \p{Mark} are the same thing is broken.

    17. Code that assumes \p{GC=Dash_Punctuation} covers as much as \p{Dash} is broken.

    18. Code that assumes dash, hyphens, and minuses are the same thing as each other, or that there is only one of each, is broken and wrong.

    19. Code that assumes every code point takes up no more than one print column is broken.

    20. Code that assumes that all \p{Mark} characters take up zero print columns is broken.

    21. Code that assumes that characters which look alike are alike is broken.

    22. Code that assumes that characters which do not look alike are not alike is broken.

    23. Code that assumes there is a limit to the number of code points in a row that just one \X can match is wrong.

    24. Code that assumes \X can never start with a \p{Mark} character is wrong.

    25. Code that assumes that \X can never hold two non-\p{Mark} characters is wrong.

    26. Code that assumes that it cannot use "\x{FFFF}" is wrong.

    27. Code that assumes a non-BMP code point that requires two UTF-16 (surrogate) code units will encode to two separate UTF-8 characters, one per code unit, is wrong. It doesn’t: it encodes to a single code point.

    28. Code that transcodes from UTF‐16 or UTF‐32 with leading BOMs into UTF‐8 is broken if it puts a BOM at the start of the resulting UTF-8. This is so stupid the engineer should have their eyelids removed.

    29. Code that assumes CESU-8 is a valid UTF encoding is wrong. Likewise, code that thinks encoding U+0000 as "\xC0\x80" is UTF-8 is broken and wrong. These guys also deserve the eyelid treatment.

    30. Code that assumes characters like > always point to the right and < always point to the left is wrong, because they in fact do not.

    31. Code that assumes if you first output character X and then character Y, that those will show up as XY is wrong. Sometimes they don’t.

    32. Code that assumes that ASCII is good enough for writing English properly is stupid, shortsighted, illiterate, broken, evil, and wrong. Off with their heads! If that seems too extreme, we can compromise: henceforth they may type only with their big toe from one foot (the rest still to be duct-taped).

    33. Code that assumes that all \p{Math} code points are visible characters is wrong.

    34. Code that assumes \w contains only letters, digits, and underscores is wrong.

    35. Code that assumes that ^ and ~ are punctuation marks is wrong.

    36. Code that assumes that ü has an umlaut is wrong.

    37. Code that believes things like contain any letters in them is wrong.

    38. Code that believes \p{InLatin} is the same as \p{Latin} is heinously broken.

    39. Code that believes that \p{InLatin} is almost ever useful is almost certainly wrong.

    40. Code that believes that given $FIRST_LETTER as the first letter in some alphabet and $LAST_LETTER as the last letter in that same alphabet, that [${FIRST_LETTER}-${LAST_LETTER}] has any meaning whatsoever is almost always completely broken and wrong and meaningless.

    41. Code that believes someone’s name can only contain certain characters is stupid, offensive, and wrong.

    42. Code that tries to reduce Unicode to ASCII is not merely wrong, its perpetrator should never be allowed to work in programming again. Period. I’m not even positive they should even be allowed to see again, since it obviously hasn’t done them much good so far.

    43. Code that believes there’s some way to pretend textfile encodings don’t exist is broken and dangerous. Might as well poke the other eye out, too.

    44. Code that converts unknown characters to ? is broken, stupid, braindead, and runs contrary to the standard recommendation, which says NOT TO DO THAT! RTFM for why not.

    45. Code that believes it can reliably guess the encoding of an unmarked textfile is guilty of a fatal mélange of hubris and naïveté that only a lightning bolt from Zeus will fix.

    46. Code that believes you can use 🐪 printf widths to pad and justify Unicode data is broken and wrong.

    47. Code that believes once you successfully create a file by a given name, that when you run ls or readdir on its enclosing directory, you’ll actually find that file with the name you created it under is buggy, broken, and wrong. Stop being surprised by this!

    48. Code that believes UTF-16 is a fixed-width encoding is stupid, broken, and wrong. Revoke their programming licence.

    49. Code that treats code points from one plane one whit differently than those from any other plane is ipso facto broken and wrong. Go back to school.

    50. Code that believes that stuff like /s/i can only match "S" or "s" is broken and wrong. You’d be surprised.

    51. Code that uses \PM\pM* to find grapheme clusters instead of using \X is broken and wrong.

    52. People who want to go back to the ASCII world should be whole-heartedly encouraged to do so, and in honor of their glorious upgrade they should be provided gratis with a pre-electric manual typewriter for all their data-entry needs. Messages sent to them should be sent via an ᴀʟʟᴄᴀᴘs telegraph at 40 characters per line and hand-delivered by a courier. STOP.
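Two of the assumptions above, item 8 (casefolding round-trips) and item 10 (changing case keeps the length), can be disproved in a few lines. This is a minimal sketch assuming Perl 5.12 or later for the unicode_strings feature. The output strings are kept ASCII so that the source file needs no use utf8.

```perl
use strict;
use warnings;
use feature qw(say unicode_strings);

# Item 8: casefolding does not round-trip.
my ($sigma, $final_sigma) = ("\x{3C3}", "\x{3C2}");   # small sigma, final sigma
my $upper = uc $final_sigma;                           # capital sigma, U+03A3
say "uc(sigma) eq uc(final sigma): ", (uc($sigma) eq $upper ? 'yes' : 'no');
say "lc(uc(final sigma)) eq final sigma: ", (lc($upper) eq $final_sigma ? 'yes' : 'no');

# Item 10: changing case can change the length of a string.
my $sharp_s = "\x{DF}";                                # LATIN SMALL LETTER SHARP S
say "length before uc: ", length($sharp_s), ", after uc: ", length(uc($sharp_s));
```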


    🎁 🐪 𝕭𝖔𝖎𝖑𝖊𝖗⸗𝖕𝖑𝖆𝖙𝖊 𝖋𝖔𝖗 𝖀𝖓𝖎𝖈𝖔𝖉𝖊⸗𝕬𝖜𝖆𝖗𝖊 𝕮𝖔𝖉𝖊 🐪 🎁

    My own boilerplate these days tends to look like this:

    use 5.014;
    
    use utf8;
    use strict;
    use autodie;
    use warnings; 
    use warnings    qw< FATAL  utf8     >;
    use open        qw< :std  :utf8     >;
    use charnames   qw< :full >;
    use feature     qw< unicode_strings >;
    
    use File::Basename      qw< basename >;
    use Carp                qw< carp croak confess cluck >;
    use Encode              qw< encode decode >;
    use Unicode::Normalize  qw< NFD NFC >;
    
    END { close STDOUT }
    
    if (grep /\P{ASCII}/ => @ARGV) { 
       @ARGV = map { decode("UTF-8", $_) } @ARGV;
    }
    
    $0 = basename($0);  # shorter messages
    $| = 1;
    
    binmode(DATA, ":utf8");
    
    # give a full stack dump on any untrapped exceptions
    local $SIG{__DIE__} = sub {
        confess "Uncaught exception: @_" unless $^S;
    };
    
    # now promote run-time warnings into stackdumped exceptions
    #   *unless* we're in a try (eval) block, in which 
    #   case just generate a clucking stackdump instead
    local $SIG{__WARN__} = sub {
        if ($^S) { cluck   "Trapped warning: @_" } 
        else     { confess "Deadly warning: @_"  }
    };
    
    while (<>)  {
        chomp;
        $_ = NFD($_);
        ...
    } continue {
        say NFC($_);
    }
    
    __END__

    😱 𝕾 𝖀 𝕸 𝕸 𝕬 𝕽 𝖄 😱

    I don’t know how much more “default Unicode in 🐪” you can get than what I’ve written. Well, yes I do: you should be using Unicode::Collate and Unicode::LineBreak, too. And probably more.

    As you see, there are far too many Unicode things that you really do have to worry about for there to ever exist any such thing as “default to Unicode”.

    What you’re going to discover, just as we did back in 🐪 5.8, is that it is simply impossible to impose all these things on code that hasn’t been designed right from the beginning to account for them. Your well-meaning selfishness just broke the entire world.

    And even once you do, there are still critical issues that require a great deal of thought to get right. There is no switch you can flip. Nothing but brain, and I mean real brain, will suffice here. There’s a heck of a lot of stuff you have to learn. Modulo the retreat to the manual typewriter, you simply cannot hope to sneak by in ignorance. This is the 21ˢᵗ century, and you cannot wish Unicode away by willful ignorance.

    You have to learn it. Period. It will never be so easy that “everything just works”, because that will guarantee that a lot of things don’t work — which invalidates the assumption that there can ever be a way to “make it all work”.

    You may be able to get a few reasonable defaults for a very few and very limited operations, but not without thinking about things a whole lot more than I think you have.

    As just one example, canonical ordering is going to cause some real headaches. 😭 "\x{F5}" ‘õ’, "o\x{303}" ‘õ’, "o\x{303}\x{304}" ‘ȭ’, and "o\x{304}\x{303}" ‘ō̃’ should all match ‘õ’, but how in the world are you going to do that? This is harder than it looks, but it’s something you need to account for. 💣
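A small sketch with the core Unicode::Normalize module makes the point concrete (the helper name `same_after_nfd` is made up for illustration). Normalization does make the precomposed "\x{F5}" and "o\x{303}" equal, but "o\x{303}\x{304}" and "o\x{304}\x{303}" stay different: the tilde and the macron share a combining class, so canonical ordering never swaps them, and normalization on its own cannot make all four match.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFC);

# Two strings are canonically equivalent if their NFD forms are identical.
sub same_after_nfd {
    my ($x, $y) = @_;
    return NFD($x) eq NFD($y);
}

my $precomposed  = "\x{F5}";           # õ as a single code point
my $decomposed   = "o\x{303}";         # o + COMBINING TILDE
my $tilde_macron = "o\x{303}\x{304}";  # o + tilde + COMBINING MACRON
my $macron_tilde = "o\x{304}\x{303}";  # o + macron + tilde

# NFC composes the first ordering all the way to U+022D (ȭ);
# the second ordering only composes as far as ō (U+014D).
my $nfc_tm = NFC($tilde_macron);
my $nfc_mt = NFC($macron_tilde);

printf "õ vs o+tilde: %s\n", same_after_nfd($precomposed, $decomposed) ? 'same' : 'different';
printf "tilde+macron vs macron+tilde: %s\n",
    same_after_nfd($tilde_macron, $macron_tilde) ? 'same' : 'different';
```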

    If there’s one thing I know about Perl, it is what its Unicode bits do and do not do, and this thing I promise you: “ ̲ᴛ̲ʜ̲ᴇ̲ʀ̲ᴇ̲ ̲ɪ̲s̲ ̲ɴ̲ᴏ̲ ̲U̲ɴ̲ɪ̲ᴄ̲ᴏ̲ᴅ̲ᴇ̲ ̲ᴍ̲ᴀ̲ɢ̲ɪ̲ᴄ̲ ̲ʙ̲ᴜ̲ʟ̲ʟ̲ᴇ̲ᴛ̲ ̲ ” 😞

    You cannot just change some defaults and get smooth sailing. It’s true that I run 🐪 with PERL_UNICODE set to "SA", but that’s all, and even that is mostly for command-line stuff. For real work, I go through all the many steps outlined above, and I do it very, very carefully.
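For reference, perlrun documents S as covering STDIN, STDOUT and STDERR, and A as treating the @ARGV elements as UTF-8. A rough in-script approximation might look like this; it uses :encoding(UTF-8) layers where -C uses :utf8, and `decode_args` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "S": UTF-8 on the three standard handles (roughly what -CS does)
binmode(STDIN,  ':encoding(UTF-8)');
binmode(STDOUT, ':encoding(UTF-8)');
binmode(STDERR, ':encoding(UTF-8)');

# "A": treat command-line arguments as UTF-8 octets and decode them
sub decode_args {
    return map { decode('UTF-8', $_) } @_;
}

@ARGV = decode_args(@ARGV);
```

This only covers the edges of the program; normalization, collation and the rest still have to be handled explicitly.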


    😈 ¡ƨdləɥ ƨᴉɥʇ ədoɥ puɐ ʻλɐp əɔᴉu ɐ əʌɐɥ ʻʞɔnl poo⅁ 😈

Answer 2 (score 95)

There are two stages to processing Unicode text. The first is “how can I input it and output it without losing information”. The second is “how do I treat text according to local language conventions”.

tchrist’s post covers both, but the second part is where 99% of the text in his post comes from. Most programs don’t even handle I/O correctly, so it’s important to understand that before you even begin to worry about normalization and collation.

This post aims to solve that first problem.

When you read data into Perl, it doesn’t care what encoding it is. It allocates some memory and stashes the bytes away there. If you say print $str, it just blits those bytes out to your terminal, which is probably set to assume everything that is written to it is UTF-8, and your text shows up.

Marvelous.

Except, it’s not. If you try to treat the data as text, you’ll see that Something Bad is happening. You need go no further than length to see that what Perl thinks about your string and what you think about your string disagree. Write a one-liner like perl -E 'while(<>){ chomp; say length }', type in 文字化け, and you get 12… not the correct answer, 4.

That’s because Perl assumes your string is not text. You have to tell it that it’s text before it will give you the right answer.

That’s easy enough; the Encode module has the functions to do that. The generic entry point is Encode::decode (or use Encode qw(decode), of course). That function takes some string from the outside world (what we’ll call “octets”, a fancy way of saying “8-bit bytes”), and turns it into some text that Perl will understand. The first argument is a character encoding name, like “UTF-8” or “ASCII” or “EUC-JP”. The second argument is the string. The return value is the Perl scalar containing the text.

(There is also Encode::decode_utf8, which assumes UTF-8 for the encoding.)

If we rewrite our one-liner:

perl -MEncode=decode -E 'while(<>){ chomp; say length decode("UTF-8", $_) }'

We type in 文字化け and get “4” as the result. Success.
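The same check as a tiny script, with the four characters written as their UTF-8 octets so the source itself stays ASCII:

```perl
use strict;
use warnings;
use Encode qw(decode);

# 文字化け as it arrives from a UTF-8 terminal or file: 12 octets
my $octets = "\xe6\x96\x87\xe5\xad\x97\xe5\x8c\x96\xe3\x81\x91";

# Decoding turns the octets into 4 Perl characters
my $text = decode('UTF-8', $octets);

printf "octets: %d, characters: %d\n", length($octets), length($text);
# prints: octets: 12, characters: 4
```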

That, right there, is the solution to 99% of Unicode problems in Perl.

The key is, whenever any text comes into your program, you must decode it. The Internet cannot transmit characters. Files cannot store characters. There are no characters in your database. There are only octets, and you can’t treat octets as characters in Perl. You must decode the encoded octets into Perl characters with the Encode module.

The other half of the problem is getting data out of your program. That’s easy too; you just say use Encode qw(encode), decide what encoding your data will be in (UTF-8 to terminals that understand UTF-8, UTF-16 for files on Windows, etc.), and then output the result of encode($encoding, $data) instead of just outputting $data.

This operation converts Perl’s characters, which is what your program operates on, to octets that can be used by the outside world. It would be a lot easier if we could just send characters over the Internet or to our terminals, but we can’t: octets only. So we have to convert characters to octets, otherwise the results are undefined.

To summarize: encode all outputs and decode all inputs.
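Both halves together, as a minimal sketch: decode the input octets, work on characters, and encode on the way out. The second part shows the alternative of letting an :encoding(UTF-8) layer do the encoding, here on an in-memory filehandle so nothing touches the terminal. The sub name `shout` is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Decode on input, operate on characters, encode on output.
sub shout {
    my ($octets_in) = @_;
    my $text = decode('UTF-8', $octets_in);  # octets -> characters
    $text = uc $text;                        # character semantics apply here
    return encode('UTF-8', $text);           # characters -> octets
}

# Alternative: let an I/O layer do the encoding when printing.
my $buffer = '';
open(my $out, '>:encoding(UTF-8)', \$buffer) or die "Cannot open in-memory handle: $!";
print {$out} "\x{E9}t\x{E9}";   # "été" as characters
close($out);
# $buffer now holds the UTF-8 octets of "été"
```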

Now we’ll talk about three issues that make this a little challenging. The first is libraries. Do they handle text correctly? The answer is… they try. If you download a web page, LWP will give you your result back as text, provided you call the right method on the result: decoded_content, not content (which is just the octet stream it got from the server). Database drivers can be flaky; if you use DBD::SQLite with just Perl, it will work out, but if some other tool has stored text in your database in some encoding other than UTF-8… well… it’s not going to be handled correctly until you write code to handle it correctly.

Outputting data is usually easier, but if you see “wide character in print”, then you know you’re messing up the encoding somewhere. That warning means “hey, you’re trying to leak Perl characters to the outside world and that doesn’t make any sense”. Your program appears to work (because the other end usually handles the raw Perl characters correctly), but it is very broken and could stop working at any moment. Fix it with an explicit Encode::encode!
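The warning is easy to reproduce. This sketch prints a character above U+00FF to an in-memory handle that has no encoding layer, captures the warning, and then does the same with an explicit Encode::encode. The helper name `print_and_capture` is hypothetical, and the byte comparison relies on Perl writing its internal UTF-8 bytes after it warns.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Print $text to an in-memory handle opened with $mode and
# return (bytes written, reference to the warnings seen).
sub print_and_capture {
    my ($mode, $text) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $buffer = '';
    open(my $fh, $mode, \$buffer) or die "open: $!";
    print {$fh} $text;
    close($fh);
    return ($buffer, \@warnings);
}

my $smiley = "\x{263A}";   # WHITE SMILING FACE, a "wide" character

# No layer: Perl warns "Wide character in print" and leaks its internal bytes.
my ($raw, $raw_warnings) = print_and_capture('>', $smiley);

# Encode first: octets only, no warning.
my ($ok, $ok_warnings) = print_and_capture('>', encode('UTF-8', $smiley));
```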

The second problem is UTF-8 encoded source code. Unless you say use utf8 at the top of each file, Perl will not assume that your source code is UTF-8. This means that each time you say something like my $var = 'ほげ', you’re injecting garbage into your program that will totally break everything horribly. You don’t have to “use utf8”, but if you don’t, you must not use any non-ASCII characters in your program.
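A quick way to see the difference, assuming this file is itself saved as UTF-8. Normally you would put a single `use utf8;` at the top of the file; the block-scoped form here just puts both behaviours side by side.

```perl
use strict;
use warnings;

# Without `use utf8`, the literal is read as its 6 UTF-8 source octets.
my $octets = 'ほげ';

# `use utf8` is lexically scoped; inside this block the same literal
# is decoded from UTF-8 into 2 characters.
my $chars = do { use utf8; 'ほげ' };

printf "without use utf8: %d, with use utf8: %d\n", length($octets), length($chars);
```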

The third problem is how Perl handles The Past. A long time ago, there was no such thing as Unicode, and Perl assumed that everything was Latin-1 text or binary. So when data comes into your program and you start treating it as text, Perl treats each octet as a Latin-1 character. That’s why, when we asked for the length of “文字化け”, we got 12. Perl assumed that we were operating on the Latin-1 string “æå­åã” (which is 12 characters, some of which are non-printing).

This is called an “implicit upgrade”, and it’s a perfectly reasonable thing to do, but it’s not what you want if your text is not Latin-1. That’s why it’s critical to explicitly decode input: if you don’t do it, Perl will, and it might do it wrong.

People run into trouble where half their data is a proper character string, and some is still binary. Perl will interpret the part that’s still binary as though it’s Latin-1 text and then combine it with the correct character data. This will make it look like handling your characters correctly broke your program, but in reality, you just haven’t fixed it enough.

Here’s an example: you have a program that reads a UTF-8-encoded text file, you tack on a Unicode PILE OF POO to each line, and you print it out. You write it like:

while(<>){
    chomp;
    say "$_ 💩";
}

And then run on some UTF-8 encoded data, like:

perl poo.pl input-data.txt

It prints the UTF-8 data with a poo at the end of each line. Perfect, my program works!

But nope, you’re just doing binary concatenation. You’re reading octets from the file, removing a \n with chomp, and then tacking on the bytes in the UTF-8 representation of the PILE OF POO character. When you revise your program to decode the data from the file and encode the output, you’ll notice that you get garbage (“ð©”) instead of the poo. This will lead you to believe that decoding the input file is the wrong thing to do. It’s not.

The problem is that the poo is being implicitly upgraded as Latin-1. If you add use utf8 so that the literal is text instead of binary, then it will work again!
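The whole story can be replayed without any files. In this sketch, `$poo_octets` stands in for the poo literal in a file without `use utf8`, and `$poo_char` for the same literal with it; `add_poo` is a hypothetical helper that decodes its input and encodes its output as recommended above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $line_octets = "caf\xc3\xa9";        # "café" as read from a UTF-8 file
my $poo_octets  = "\xf0\x9f\x92\xa9";   # the literal's bytes without `use utf8`
my $poo_char    = "\x{1F4A9}";          # what `use utf8` gives you: one character

sub add_poo {
    my ($octets_in, $poo) = @_;
    my $text = decode('UTF-8', $octets_in);   # decode the input
    return encode('UTF-8', "$text $poo");     # encode the output
}

# Broken: decoded text plus undecoded poo octets. Each octet is upgraded
# as a Latin-1 character, so the output contains "ð" and "©" instead of the poo.
my $broken = add_poo($line_octets, $poo_octets);

# Fixed: everything inside the program is characters.
my $fixed = add_poo($line_octets, $poo_char);
```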

(That’s the number one problem I see when helping people with Unicode. They did part right and that broke their program. That’s what’s sad about undefined results: you can have a working program for a long time, but when you start to repair it, it breaks. Don’t worry; if you are adding encode/decode statements to your program and it breaks, it just means you have more work to do. Next time, when you design with Unicode in mind from the beginning, it will be much easier!)

That’s really all you need to know about Perl and Unicode. If you tell Perl what your data is, it has the best Unicode support among all popular programming languages. If you assume it will magically know what sort of text you are feeding it, though, then you’re going to trash your data irrevocably. Just because your program works today on your UTF-8 terminal doesn’t mean it will work tomorrow on a UTF-16 encoded file. So make it safe now, and save yourself the headache of trashing your users’ data!

The easy part of handling Unicode is encoding output and decoding input. The hard part is finding all your input and output, and determining which encoding it is. But that’s why you get the big bucks :)

Answer 3 (score 48)

We’re all in agreement that it is a difficult problem for many reasons, but that’s precisely the reason to try to make it easier on everybody.

There is a recent module on CPAN, utf8::all, that attempts to “turn on Unicode. All of it”.

As has been pointed out, you can’t magically make the entire system (outside programs, external web requests, etc.) use Unicode as well, but we can work together to make sensible tools that make doing common problems easier. That’s the reason that we’re programmers.

If utf8::all doesn’t do something you think it should, let’s improve it to make it better. Or let’s make additional tools that together can suit people’s varying needs as well as possible.


91: How can I remove text within parentheses with a regex? (score 97531 in 2009)

Question

I’m trying to handle a bunch of files, and I need to alter them to remove extraneous information in the filenames; notably, I’m trying to remove text inside parentheses. For example:

filename = "Example_file_(extra_descriptor).ext"

and I want to regex a whole bunch of files where the parenthetical expression might be in the middle or at the end, and of variable length.

What would the regex look like? Perl or Python syntax would be preferred.

Answer accepted (score 103)

s/\([^)]*\)//

So in Python, you’d do:

re.sub(r'\([^)]*\)', '', filename)
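In Perl, the accepted answer's substitution might be wrapped like this (a sketch; `strip_parens` is a hypothetical name, and the /g flag is an addition so that names with more than one parenthetical are handled). The underscore before the parenthesis is left in place.

```perl
use strict;
use warnings;

# Remove every "(...)" group; [^)]* stops at the first closing parenthesis.
sub strip_parens {
    my ($name) = @_;
    (my $clean = $name) =~ s/\([^)]*\)//g;
    return $clean;
}

for my $file ('Example_file_(extra_descriptor).ext', 'a_(one)_b_(two).txt') {
    print strip_parens($file), "\n";
}
# Example_file_.ext
# a__b_.txt
```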

Answer 2 (score 36)

The pattern that matches substrings in parentheses having no other ( and ) characters in between (like (xyz 123) in Text (abc(xyz 123))) is

\([^()]*\)

Details:

  • \( - an opening round bracket (note that in POSIX BRE, ( should be used, see sed example below)
  • [^()]* - zero or more (due to the * Kleene star quantifier) characters other than those defined in the negated character class/POSIX bracket expression, that is, any chars other than ( and )
  • \) - a closing round bracket (no escaping in POSIX BRE allowed)

Removing code snippets:

  • JavaScript: string.replace(/\([^()]*\)/g, '')
  • PHP: preg_replace('~\([^()]*\)~', '', $string)
  • Perl: $s =~ s/\([^()]*\)//g
  • Python: re.sub(r'\([^()]*\)', '', s)
  • C#: Regex.Replace(str, @"\([^()]*\)", string.Empty)
  • VB.NET: Regex.Replace(str, "\([^()]*\)", "")
  • Java: s.replaceAll("\\([^()]*\\)", "")
  • Ruby: s.gsub(/\([^()]*\)/, '')
  • R: gsub("\\([^()]*\\)", "", x)
  • Lua: string.gsub(s, "%([^()]*%)", "")
  • Bash/sed: sed 's/([^()]*)//g'
  • Tcl: regsub -all {\([^()]*\)} $s "" result
  • C++ std::regex: std::regex_replace(s, std::regex(R"(\([^()]*\))"), "")
  • Objective-C:
    NSError *error = nil; NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\([^()]*\\)" options:NSRegularExpressionCaseInsensitive error:&error]; NSString *modifiedString = [regex stringByReplacingMatchesInString:string options:0 range:NSMakeRange(0, [string length]) withTemplate:@""];
  • Swift: s.replacingOccurrences(of: "\\([^()]*\\)", with: "", options: [.regularExpression])
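Because \([^()]*\) only ever matches an innermost group, one pass over Text (abc(xyz 123)) leaves Text (abc). A common way to strip nested groups, sketched here in Perl with a hypothetical `strip_nested_parens`, is simply to repeat the substitution until it stops matching:

```perl
use strict;
use warnings;

# Each pass removes the innermost "(...)" groups; repeat until nothing matches.
sub strip_nested_parens {
    my ($s) = @_;
    1 while $s =~ s/\([^()]*\)//g;
    return $s;
}

my $text     = 'Text (abc(xyz 123))';
my $one_pass = $text =~ s/\([^()]*\)//gr;   # /r (Perl 5.14+) returns a modified copy
my $all      = strip_nested_parens($text);

print "[$one_pass]\n";   # [Text (abc)]
print "[$all]\n";        # [Text ]
```

The loop always terminates, because every successful substitution makes the string shorter.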

Answer 3 (score 21)

I would use:

\([^)]*\)

92: How can I store the result of a system command in a Perl variable? (score 97221 in 2013)

Question

$ cat test.pl
my $pid = 5892;
my $not = system("top -H -p $pid -n 1 | grep myprocess | wc -l");
print "not = $not\n";
$ perl test.pl
11
not = 0
$

I want to capture the result i.e. 11 into a variable. How can I do that?

Answer accepted (score 50)

From Perlfaq8:

You’re confusing the purpose of system() and backticks (``). system() runs a command and returns exit status information (as a 16-bit value: the low 7 bits are the signal the process died from, if any, and the high 8 bits are the actual exit value). Backticks (``) run a command and return what it sent to STDOUT.

$exit_status   = system("mail-users");
$output_string = `ls`;

There are many ways to execute external commands from Perl. The most common ones, with their meanings, are:

  • system() : you want to execute a command and don’t want to capture its output
  • exec: you don’t want to return to the calling perl script
  • backticks : you want to capture the output of the command
  • open: you want to pipe the command (as input or output) to your script

Also see How can I capture STDERR from an external command?
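A short Perl sketch of the first, third and fourth options. To keep it runnable anywhere Perl is, it runs the current perl ($^X) instead of the question's top | grep | wc pipeline; substitute your own command.

```perl
use strict;
use warnings;

# Backticks/qx capture STDOUT; $? holds the wait status afterwards.
my $output = qx{$^X -e "print 11"};      # $^X is the path of the running perl
chomp $output;

# system() returns the wait status, not the output.
my $status    = system($^X, '-e', 'exit 3');
my $exit_code = $status >> 8;            # high 8 bits: the actual exit value

# open with '-|' reads the command's STDOUT line by line (list form, no shell).
open(my $pipe, '-|', $^X, '-e', 'print "a\nb\n"') or die "Cannot start command: $!";
chomp(my @lines = <$pipe>);
close($pipe);

print "output=$output exit=$exit_code lines=@lines\n";   # output=11 exit=3 lines=a b
```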

Answer 2 (score 11)

The easiest way is to use backticks (``) in Perl. They execute what is inside them and return what was printed to stdout:

 my $pid = 5892;
 my $var = `top -H -p $pid -n 1 | grep myprocess | wc -l`;
 print "not = $var\n";

This should do it.

Answer 3 (score 7)

Try using qx{command} rather than backticks. To me, it’s a bit better because: you can do SQL with it and not worry about escaping quotes and such. Depending on the editor and screen, my old eyes tend to miss the tiny back ticks, and it shouldn’t ever have an issue with being overloaded like using angle brackets versus glob.

93: How can I compile my Perl script so it can be executed on systems without perl installed? (score 96840 in 2009)

Question

I have a .pl file and I want to execute that file on any system, even one where Perl is not installed. How can I achieve this?

Can anyone show me some good examples of how to do that?

Answer 2 (score 43)

pp can create an executable that includes perl and your script (and any module dependencies), but it will be specific to your architecture, so you couldn’t run it on both Windows and Linux, for instance.

From its doc:

To make a stand-alone executable, suitable for running on a machine that doesn’t have perl installed:

   % pp -o packed.exe source.pl        # makes packed.exe
   # Now, deploy 'packed.exe' to target machine...
   $ packed.exe                        # run it

(% and $ there are command prompts on different machines).

Answer 3 (score 29)

  1. Install PAR::Packer. Example for *nix:

    sudo cpan -i PAR::Packer

    For Strawberry Perl for Windows or for ActivePerl and MSVC installed:

    cpan -i PAR::Packer
  2. Pack it with pp. It will create an executable named “example” or “example.exe” on Windows.

    pp -o example example.pl

This would work only on the OS where it was built.

P.S. It is really hard to find a Unix clone without Perl. Did you mean Windows?

94: How can I delete a newline if it is the last character in a file? (score 96750 in 2017)

Question

I have some files from which I’d like to delete the last newline, if it is the last character in the file. od -c shows me that the command I run does write the file with a trailing newline:

0013600   n   t  >  \n

I’ve tried a few tricks with sed but the best I could think of isn’t doing the trick:

sed -e '$s/\(.*\)\n$/\1/' abc

Any ideas how to do this?

Answer accepted (score 212)

perl -pe 'chomp if eof' filename >filename2

or, to edit the file in place:

perl -pi -e 'chomp if eof' filename

[Editor's note: `-pi -e` was originally `-pie`, but, as noted by several commenters and explained by @hvd, the latter doesn't work.]

This was described as a ‘perl blasphemy’ on the awk website I saw.

But, in a test, it worked.

Answer 2 (score 54)

You can take advantage of the fact that shell command substitutions remove trailing newline characters:

Simple form that works in bash, ksh, zsh:

printf %s "$(< in.txt)" > out.txt

Portable (POSIX-compliant) alternative (slightly less efficient):

printf %s "$(cat in.txt)" > out.txt

Note:

  • If in.txt ends with multiple newline characters, the command substitution removes all of them - thanks, @Sparhawk. (It doesn’t remove whitespace characters other than trailing newlines.)
  • Since this approach reads the entire input file into memory, it is only advisable for smaller files.
  • printf %s ensures that no newline is appended to the output (it is the POSIX-compliant alternative to the nonstandard echo -n; see http://pubs.opengroup.org/onlinepubs/009696799/utilities/echo.html and https://unix.stackexchange.com/a/65819)

A guide to the other answers:

  • If Perl is available, go for the accepted answer - it is simple and memory-efficient (doesn’t read the whole input file at once).

  • Otherwise, consider ghostdog74’s Awk answer - it’s obscure, but also memory-efficient; a more readable equivalent (POSIX-compliant) is:

    • awk 'NR > 1 { print prev } { prev=$0 } END { ORS=""; print }' in.txt
    • Printing is delayed by one line so that the final line can be handled in the END block, where it is printed without a trailing \n due to setting the output-record separator (ORS) to an empty string.
  • If you want a verbose, but fast and robust solution that truly edits in-place (as opposed to creating a temp. file that then replaces the original), consider jrockway’s Perl script.
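jrockway’s script is not reproduced here, but the in-place idea can be sketched in a few lines of Perl: look at the last byte and, if it is \n, truncate the file by one byte, so only a single byte is read however big the file is. `remove_trailing_newline` is a hypothetical name; it removes only a final \n, so a CRLF file keeps its \r.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Remove a single trailing "\n" by truncating the file in place.
# Returns 1 if a newline was removed, 0 otherwise.
sub remove_trailing_newline {
    my ($path) = @_;
    open(my $fh, '+<:raw', $path) or die "Cannot open $path: $!";
    my $size = -s $fh;
    if (!$size) {                     # empty file: nothing to do
        close($fh);
        return 0;
    }
    seek($fh, $size - 1, 0) or die "Cannot seek in $path: $!";
    read($fh, my $last, 1) == 1 or die "Cannot read $path: $!";
    my $removed = 0;
    if ($last eq "\n") {
        truncate($fh, $size - 1) or die "Cannot truncate $path: $!";
        $removed = 1;
    }
    close($fh) or die "Cannot close $path: $!";
    return $removed;
}

# Demo on a temporary file
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "a\nb\n";
close($tmp_fh);
my $first  = remove_trailing_newline($tmp_path);   # 1: newline removed
my $second = remove_trailing_newline($tmp_path);   # 0: file now ends in "b"
```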

Answer 3 (score 45)

You can do this with head from GNU coreutils; it supports arguments that are relative to the end of the file. So to leave off the last byte use:

head -c -1

To test for an ending newline you can use tail and wc. The following example saves the result to a temporary file and subsequently overwrites the original:

if [[ $(tail -c1 file | wc -l) == 1 ]]; then
  head -c -1 file > file.tmp
  mv file.tmp file
fi

You could also use sponge from moreutils to do “in-place” editing:

[[ $(tail -c1 file | wc -l) == 1 ]] && head -c -1 file | sponge file

You can also make a general reusable function by stuffing this in your .bashrc file:

# Example:  remove-last-newline < multiline.txt
function remove-last-newline(){
    local file=$(mktemp)
    cat > "$file"
    if [[ $(tail -c1 "$file" | wc -l) == 1 ]]; then
        head -c -1 "$file" > "$file.tmp"
        mv "$file.tmp" "$file"
    fi
    cat "$file"
    rm -f "$file"
}

Update

As noted by KarlWilbur in the comments and used in Sorentar’s answer, truncate --size=-1 can replace head -c -1 and supports in-place editing.

95: How to compile a Perl script to a Windows executable with Strawberry Perl? (score 96303 in 2016)

Question

What would be the easiest way to compile a simple Perl script to an executable under Windows with Strawberry Perl (as I understand it’s possible and free)?

In the past I’ve used ActiveState compiler and perl2exe and was simple enough. However, now after a few computer changes and OS updates I’ve lost the licenses and I’d like to find a better/permanent solution.

Answer accepted (score 27)

Install PAR::Packer from CPAN (it is free) and use the pp utility.

Answer 2 (score 16)

  :: short answer :
  :: perl -MCPAN -e "install PAR::Packer" 
  pp -o <<DesiredExeName>>.exe <<MyFancyPerlScript>> 

  :: long answer - create the following cmd , adjust vars to your taste ...
  :: next_line_is_templatized
  :: file:compile-morphus.1.2.3.dev.ysg.cmd v1.0.0
  :: disable the echo
  @echo off

  :: this is part of the name of the file - not used
  set _Action=run

  :: the name of the Product next_line_is_templatized
  set _ProductName=morphus

  :: the version of the current Product next_line_is_templatized
  set _ProductVersion=1.2.3

  :: could be dev , test , prod next_line_is_templatized
  set _ProductType=dev

  :: who owns this Product / environment next_line_is_templatized
  set _ProductOwner=ysg

  :: identifies an instance of the tool ( new instance for this version could be created by simply changing the owner )   
  set _EnvironmentName=%_ProductName%.%_ProductVersion%.%_ProductType%.%_ProductOwner%

  :: go the run dir
  cd %~dp0

  :: go up 5 times (the loop runs for i = 1..5)
  for /L %%i in (1,1,5) do pushd ..

  :: The BaseDir is 5 dirs up from the run dir
  set _ProductBaseDir=%CD%
  :: debug echo BEFORE _ProductBaseDir is %_ProductBaseDir%
  :: remove the trailing \

  IF %_ProductBaseDir:~-1%==\ SET _ProductBaseDir=%_ProductBaseDir:~0,-1%
  :: debug echo AFTER _ProductBaseDir is %_ProductBaseDir%
  :: debug pause


  :: The version directory of the Product 
  set _ProductVersionDir=%_ProductBaseDir%\%_ProductName%\%_EnvironmentName%

  :: the dir under which all the perl scripts are placed
  set _ProductVersionPerlDir=%_ProductVersionDir%\sfw\perl

  :: The Perl script performing all the tasks
  set _PerlScript=%_ProductVersionPerlDir%\%_Action%_%_ProductName%.pl

  :: where the log events are stored 
  set _RunLog=%_ProductVersionDir%\data\log\compile-%_ProductName%.cmd.log

  :: define a favorite editor 
  set _MyEditor=textpad

  ECHO Check the variables 
  set _
  :: debug PAUSE
  :: truncate the run log
  echo date is %date% time is %time% > %_RunLog%


  :: uncomment this to debug all the vars 
  :: debug set  >> %_RunLog%

  :: for each perl .pl file check syntax, sending STDOUT and STDERR to the log
  for /f %%i in ('dir %_ProductVersionPerlDir%\*.pl /s /b /a-d' ) do echo %%i >> %_RunLog%&perl -wc %%i 2>&1 | tee -a  %_RunLog%


  :: for each perl .pm file check syntax, sending STDOUT and STDERR to the log
  for /f %%i in ('dir %_ProductVersionPerlDir%\*.pm /s /b /a-d' ) do echo %%i >> %_RunLog%&perl -wc %%i 2>&1 | tee -a  %_RunLog%

  :: now open the run log
  cmd /c start /max %_MyEditor% %_RunLog%


  :: this is the call without debugging
  :: old 
  echo CFPoint1  OK The run cmd script %0 is executed >> %_RunLog%
  echo CFPoint2  OK  compile the exe file  STDOUT and STDERR  to a single _RunLog file >> %_RunLog%
  cd %_ProductVersionPerlDir%

  pp -o %_Action%_%_ProductName%.exe %_PerlScript% 2>&1 | tee -a %_RunLog%

  :: open the run log
  cmd /c start /max %_MyEditor% %_RunLog%

  :: uncomment this line to wait for 5 seconds
  :: ping localhost -n 5

  :: uncomment this line to see what is happening 
  :: PAUSE

  ::
  :::::::
  :: Purpose: 
  :: To compile every *.pl file into *.exe file under a folder 
  :::::::
  :: Requirements : 
  :: perl , pp , win gnu utils tee 
  :: perl -MCPAN -e "install PAR::Packer" 
  :: text editor supporting <<textEditor>> <<FileNameToOpen>> cmd call syntax
  :::::::
  :: VersionHistory
  :: 1.0.0 --- 2012-06-23 12:05:45 --- ysg --- Initial creation from run_morphus.cmd
  :::::::
  :: eof file:compile-morphus.1.2.3.dev.ysg.cmd v1.0.0

Answer 3 (score 5)

There are three packagers, and two compilers:

free packager: PAR
commercial packagers: perl2exe, perlapp
compilers: B::C, B::CC

http://search.cpan.org/dist/B-C/perlcompile.pod

(Note: perlfaq3 is still wrong)

For Strawberry Perl you need perl-5.16 and B-C from git master (1.43), as B-C-1.42 does not support 5.16.

96: How to use a variable in the replacement side of the Perl substitution operator? (score 96218 in 2008)

Question

I would like to do the following:

$find="start (.*) end";
$replace="foo \1 bar";

$var = "start middle end";
$var =~ s/$find/$replace/;

I would expect $var to contain “foo middle bar”, but it does not work. Neither does:

$replace='foo \1 bar';

Somehow I am missing something regarding the escaping.


I fixed the missing ‘s’

Answer accepted (score 76)

On the replacement side, you must use $1, not \1.

And you can only do what you want by making $replace an evalable expression that gives the result you want and telling s/// to eval it with the /ee modifier, like so:

$find="start (.*) end";
$replace='"foo $1 bar"';

$var = "start middle end";
$var =~ s/$find/$replace/ee;

print "var: $var\n";

To see why the "" and double /e are needed, see the effect of the double eval here:

$ perl
$foo = "middle";
$replace='"foo $foo bar"';
print eval('$replace'), "\n";
print eval(eval('$replace')), "\n";
__END__
"foo $foo bar"
foo middle bar

(Though as ikegami notes, a single /e or the first /e of a double e isn’t really an eval(); rather, it tells the compiler that the substitution is code to compile, not a string. Nonetheless, eval(eval(...)) still demonstrates why you need to do what you need to do to get /ee to work as desired.)
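When the replacement can be written as code rather than received as a string, a different technique avoids the double eval altogether: keep the replacement in a closure and call it with a single /e. This is a swapped-in approach, not what the answer above does, and it does not help if the template really must arrive as text from outside the program.

```perl
use strict;
use warnings;

my $find = qr/start (.*) end/;

# The replacement is Perl code (a closure), not a string to be eval'ed.
my $replace = sub { my ($middle) = @_; return "foo $middle bar" };

my $var = 'start middle end';

# Single /e: the right-hand side is ordinary code compiled with the program,
# so nothing in $var or $replace is ever string-eval'ed at run time.
$var =~ s/$find/$replace->($1)/e;

print "var: $var\n";   # var: foo middle bar
```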

Answer 2 (score 12)

Deparse tells us this is what is being executed:

$find = 'start (.*) end';
$replace = "foo \cA bar";
$var = 'start middle end';
$var =~ s/$find/$replace/;

However,

 s/$find/foo \1 bar/

Is interpreted as :

$var =~ s/$find/foo $1 bar/;

Unfortunately it appears there is no easy way to do this.

You can do it with a string eval, but that’s dangerous.

The most sane solution that works for me was this:

$find = "start (.*) end"; 
$replace = 'foo \1 bar';

$var = "start middle end"; 

sub repl { 
    my $find = shift; 
    my $replace = shift; 
    my $var = shift;

    # Capture first 
    my @items = ( $var =~ $find ); 
    $var =~ s/$find/$replace/; 
    for( reverse 0 .. $#items ){ 
        my $n = $_ + 1; 
        #  Many More Rules can go here, ie: \g matchers  and \{ } 
        $var =~ s/\\$n/${items[$_]}/g ;
        $var =~ s/\$$n/${items[$_]}/g ;
    }
    return $var; 
}

print repl $find, $replace, $var;

A rebuttal against the ee technique:

As I said in my answer, I avoid evals for a reason.

$find="start (.*) end";
$replace='do{ print "I am a dirty little hacker" while 1; "foo $1 bar" }';

$var = "start middle end";
$var =~ s/$find/$replace/ee;

print "var: $var\n";

This code does exactly what you think it does.

If your substitution string is in a web application, you just opened the door to arbitrary code execution.

Good Job.

Also, it WON’T work with taint mode turned on, for this very reason.

$find="start (.*) end";
$replace='"' . $ARGV[0] . '"';

$var = "start middle end";
$var =~ s/$find/$replace/ee;

print "var: $var\n"


$ perl /tmp/re.pl  'foo $1 bar'
var: foo middle bar
$ perl -T /tmp/re.pl 'foo $1 bar' 
Insecure dependency in eval while running with -T switch at /tmp/re.pl line 10.

However, the more careful technique is sane, safe, secure, and doesn’t fail taint checks. (Rest assured, though: the string it emits is still tainted, so you don’t lose any security.)

Answer 3 (score 7)

# perl -de 0
$match="hi(.*)"
$sub='$1'
$res="hi1234"
$res =~ s/$match/$sub/gee
p $res
  1234

Be careful, though. This causes two layers of eval to occur, one for each e at the end of the regex:

  1. $sub –> $1
  2. $1 –> final value, in the example, 1234

97: What exactly does Perl’s “bless” do? (score 96108 in 2008)

Question

I understand one uses the “bless” keyword in Perl inside a class’s “new” method:

sub new {
    my $self = bless { };
    return $self;
}    

But what exactly is “bless” doing to that hash reference?

Answer accepted (score 138)

In general, bless associates an object with a class.

package MyClass;
my $object = { };
bless $object, "MyClass";

Now when you invoke a method on $object, Perl knows which package to search for the method.

If the second argument is omitted, as in your example, the current package/class is used.

For the sake of clarity, your example might be written as follows:

sub new { 
  my $class = shift; 
  my $self = { }; 
  bless $self, $class; 
} 

EDIT: See kixx’s good answer for a little more detail.

Answer 2 (score 77)

bless associates a reference with a package.

It doesn’t matter what the reference is to, it can be to a hash (most common case), to an array (not so common), to a scalar (usually this indicates an inside-out object), to a regular expression, subroutine or TYPEGLOB (see the book Object Oriented Perl: A Comprehensive Guide to Concepts and Programming Techniques by Damian Conway for useful examples) or even a reference to a file or directory handle (least common case).

The effect bless-ing has is that it allows you to apply special syntax to the blessed reference.

For example, if a blessed reference is stored in $obj (associated by bless with package “Class”), then $obj->foo(@args) will call a subroutine foo and pass as first argument the reference $obj followed by the rest of the arguments (@args). The subroutine should be defined in package “Class”. If there is no subroutine foo in package “Class”, a list of other packages (taken from the array @ISA in the package “Class”) will be searched and the first subroutine foo found will be called.
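A small self-contained sketch of the points above: a hash-based class, a subclass that finds new through @ISA, and an array-based class to show that the kind of reference does not matter. All package and method names are made up for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

package Animal;
# bless ties the new hash reference to whatever class new() was called on
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub speak { my ($self) = @_; return ref($self) . ' says ' . $self->sound }

package Dog;
our @ISA = ('Animal');   # methods missing from Dog are searched for in Animal
sub sound { return 'Woof' }

package Point;
# The referent can be an array just as well as a hash
sub new { my ($class, $x, $y) = @_; return bless [$x, $y], $class }
sub x   { return $_[0][0] }

package main;

my $dog   = Dog->new(name => 'Rex');   # finds Animal::new via @ISA; $class is 'Dog'
my $point = Point->new(3, 4);

print blessed($dog), ' / ', reftype($dog), "\n";   # Dog / HASH
print $dog->speak, "\n";                           # Dog says Woof
print reftype($point), ' ', $point->x, "\n";       # ARRAY 3
```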

Answer 3 (score 9)

Short version: it’s marking that hash as attached to the current package namespace (so that that package provides its class implementation).

98: How can I ssh inside a Perl script? (score 94512 in 2015)

Question

I want to SSH to a server and execute a simple command like “id” and get the output of it and store it to a file on my primary server. I do not have privileges to install Net::SSH which would make my task very easy. Please provide me a solution for this. I tried using back-ticks but I am not able to store the output on the machine from which my script runs.

Answer 2 (score 14)

The best way to run commands remotely using SSH is

$ ssh user@host "command" > output.file

You can use this either in bash or in Perl. However, if you want to use Perl, you can install the Perl modules in your local directory path as suggested by brian in his comment, or as described in the Perl FAQ at “How do I keep my own module/library directory?”. Instead of using Net::SSH I would suggest using Net::SSH::Perl, as in the example below.

#!/usr/bin/perl -w
use strict;
use lib "/path/to/module/";

use Net::SSH::Perl;

my $hostname = "hostname";
my $username = "username";
my $password = "password";

my $cmd = shift;

my $ssh = Net::SSH::Perl->new("$hostname", debug=>0);
$ssh->login("$username","$password");
my ($stdout,$stderr,$exit) = $ssh->cmd("$cmd");
print $stdout;

Answer 3 (score 8)

You can always install modules locally, and that is the method you should look into; however, you should be able to get away with

#!/usr/bin/perl

use strict;
use warnings;

my $id = qx/ssh remotehost id 2>&1/;

chomp $id;

print "id is [$id]\n"
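Since the question also asks how to keep the output on the machine running the script, here is a sketch that captures a command's output and writes it to a local file. `save_command_output` is a hypothetical helper; the demo at the end runs the current perl ($^X) in place of ssh so it works without a remote host, and the commented ssh line assumes key-based login to a placeholder host.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Run a command given as a list (no shell quoting), capture its STDOUT,
# and write it to $path on the local machine. Returns the exit code.
sub save_command_output {
    my ($path, @command) = @_;
    open(my $pipe, '-|', @command) or die "Cannot run $command[0]: $!";
    my $output = do { local $/; <$pipe> };
    close($pipe);                      # sets $? to the command's wait status
    my $exit_code = $? >> 8;
    open(my $out, '>', $path) or die "Cannot write $path: $!";
    print {$out} defined $output ? $output : '';
    close($out) or die "Cannot close $path: $!";
    return $exit_code;
}

# Real use (placeholder host, assumes key-based ssh login):
# save_command_output('id.txt', 'ssh', 'user@remotehost', 'id');

# Self-contained demo: the running perl ($^X) stands in for ssh.
my $dir    = tempdir(CLEANUP => 1);
my $file   = "$dir/id.txt";
my $status = save_command_output($file, $^X, '-e', 'print "uid=1000(demo)\n"');
```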

99: In Perl, what is the difference between a .pm (Perl module) and .pl (Perl script) file? (score 94271 in 2012)

Question

What is the difference between a .pm (Perl module) and a .pl (Perl script) file?

Please also tell me why we return 1 from the file. If I return 2 or anything else, no error is generated, so why do we return 1 from a Perl module?

Answer 2 (score 74)

At the very core, the file extension you use makes no difference as to how perl interprets those files.

However, putting modules in .pm files, in a directory structure that follows the package name, provides a convenience. So, if you have a module Example::Plot::FourD and you put it in a file Example/Plot/FourD.pm under a directory in your @INC, then use and require will do the right thing when given the package name, as in use Example::Plot::FourD.

The file must return true as the last statement to indicate successful execution of any initialization code, so it’s customary to end such a file with 1; unless you’re sure it’ll return true otherwise. But it’s better just to put the 1;, in case you add more statements.

If EXPR is a bareword, the require assumes a “.pm” extension and replaces “::” with “/” in the filename for you, to make it easy to load standard modules. This form of loading of modules does not risk altering your namespace.

All use does is figure out the filename from the package name provided, require it in a BEGIN block, and invoke import on the package. Nothing prevents you from skipping use and taking those steps manually.
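Spelled out by hand with a core module, the steps look like this (a sketch; `module_to_file` is a hypothetical helper that mimics the bareword-to-filename rule quoted above):

```perl
use strict;
use warnings;

# What `use List::Util qw(sum);` does, spelled out by hand:
BEGIN {
    require List::Util;          # bareword: loads List/Util.pm from @INC
    List::Util->import('sum');   # then imports the requested names
}

# The package-name-to-filename mapping that require performs for barewords
sub module_to_file {
    my ($package) = @_;
    (my $file = $package) =~ s{::}{/}g;
    return "$file.pm";
}

my $total = sum(1, 2, 3);
print "$total ", module_to_file('Example::Plot::FourD'), "\n";   # 6 Example/Plot/FourD.pm
```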

For example, below I put the Example::Plot::FourD package in a file called t.pl and loaded it from a script in file s.pl.

C:\Temp> cat t.pl
package Example::Plot::FourD;

use strict; use warnings;

sub new { bless {} => shift }

sub something { print "something\n" }

"Example::Plot::FourD"

C:\Temp> cat s.pl
#!/usr/bin/perl
use strict; use warnings;

BEGIN {
    # './' is needed on Perl 5.26 and later, where . is no longer in @INC
    require './t.pl';
}

my $p = Example::Plot::FourD->new;
$p->something;


C:\Temp> s
something

This example shows that module files do not have to end in 1; any true value will do.

Answer 3 (score -2)

A .pl is a single script.

In a .pm (Perl module) file you have functions that you can use from other Perl scripts:

A Perl module is a self-contained piece of Perl code that can be used by a Perl program or by other Perl modules. It is conceptually similar to a C link library or a C++ class.
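A minimal sketch of a module handing a function to a script, using the core Exporter module. To keep everything in one file, the package is defined in a BEGIN block and registered in %INC by hand, standing in for a real My/Greeting.pm on @INC; the names My::Greeting and greet are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical module. In a real project this package would live in its own
# file, My/Greeting.pm, somewhere on @INC, and end with "1;".
BEGIN {
    package My::Greeting;
    use Exporter qw(import);          # gives My::Greeting an import method
    our @EXPORT_OK = qw(greet);       # functions callers may ask for
    sub greet { my ($name) = @_; return "Hello, $name!" }
    $INC{'My/Greeting.pm'} = __FILE__;  # tell require it is already loaded
}

# The "script" part: with the module in its own file, this line alone
# would locate, load, and import from it.
use My::Greeting qw(greet);

my $msg = greet('world');
print "$msg\n";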

100: Perl - Multiple condition if statement without duplicating code? (score 94162 in )

Question

This is a Perl program, run using a terminal (Windows Command Line). I am trying to create an “if this and this is true, or this and this is true” if statement using the same block of code for both conditions without having to repeat the code.

if ($name eq "tom" and $password eq "123!") elsif ($name eq "frank" and $password eq "321!") {

print "You have gained access.";

} else {

print "Access denied!";

}

Answer accepted (score 20)

Simple:

if ( $name eq 'tom' && $password eq '123!'
    || $name eq 'frank' && $password eq '321!'
) {

(Use the high-precedence && and || in expressions, and reserve the low-precedence and and or for flow control, to avoid common precedence errors.)
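A short sketch of the kind of precedence error that note warns about: = binds more tightly than or, but less tightly than ||. The helper fallback is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub fallback { return 'guest' }    # hypothetical default-value helper

my $empty = '';

# '=' binds tighter than 'or': this parses as (my $low = $empty) or fallback();
# fallback() runs, but its result is discarded, so $low stays ''.
my $low = $empty or fallback();

# '||' binds tighter than '=': this parses as my $high = ($empty || fallback());
my $high = $empty || fallback();

print "low=[$low] high=[$high]\n";    # prints "low=[] high=[guest]"
```

Inside an if condition with no assignment, as in the question, and/or would happen to work; the trouble starts as soon as the expression is assigned or passed to a list operator.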

Better:

my %password = (
    'tom' => '123!',
    'frank' => '321!',
);

if ( exists $password{$name} && $password eq $password{$name} ) {

Answer 2 (score 3)

I don’t recommend storing passwords in a script, but this is a way to do what you indicate:

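use 5.010;
my %user_table = ( tom => '123!', frank => '321!' );

# exists rejects unknown names; otherwise undef eq '' would grant access
say( exists $user_table{ $name } && $user_table{ $name } eq $password
    ? 'You have gained access.'
    : 'Access denied!'
);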

Any time you want to enforce an association like this, it’s a good idea to think of a table, and the most common form of table in Perl is the hash.
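A minimal sketch that wraps the hash-table lookup from the answers in a hypothetical check_access helper and tries a few name/password pairs, including an unknown name with an empty password.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical user table, as in the answers above.
my %password_for = (
    tom   => '123!',
    frank => '321!',
);

# True only when $name is a known user and $password matches that user's entry.
sub check_access {
    my ($name, $password) = @_;
    return exists $password_for{$name} && $password_for{$name} eq $password;
}

for my $try (['tom', '123!'], ['frank', '123!'], ['alice', '']) {
    my ($name, $password) = @$try;
    print "$name: ",
        (check_access($name, $password) ? 'You have gained access.' : 'Access denied!'),
        "\n";
}
```

Adding a user is then a one-line change to the table rather than another branch in the if condition.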

Answer 3 (score 1)

if (   ($name eq "tom" and $password eq "123!")
    or ($name eq "frank" and $password eq "321!")) {
    print "You have gained access.";
}
else {
    print "Access denied!";
}

(Other people: I’m pretty sure John Doe here isn’t actually hardcoding passwords; he’s just using it as an example.)