Windows PowerShell Tip of the Week

Here’s a quick tip on working with Windows PowerShell. These are published every week for as long as we can come up with new tips. If you have a tip you’d like us to share or a question about how to do something, let us know.

Find more tips in the Windows PowerShell Tip of the Week archive.

Filtering Collections With Regular Expressions

If you were to sit down and make a list of the greatest technological innovations of the past 100 years, the odds are pretty good that the “lowly” wildcard character wouldn’t appear anywhere on that list. There’s no doubt that people tend to take the wildcard character for granted; there’s also no doubt that the wildcard character can be very, very useful, maybe even a lifesaver.

For example, suppose you have several hundred files in a folder, and suppose you need to get back a list of all the .PS1 files in that folder. Are you going to have to retrieve a collection of all those files and then manually look through that collection file-by-file, hoping to spot the files that have a .PS1 file extension? Of course not; instead, you can simply use the following command to filter out everything except the .PS1 files:

dir *.ps1

Or maybe you need a list of all the files that start with the letter q? No problem; wildcards can help you there, too:

dir q*.*

The “lowly” wildcard character? We think not!

One of the cool things about Windows PowerShell is the extent to which PowerShell has embraced the use of the wildcard character. It probably comes as no surprise that the Get-ChildItem cmdlet allows you to use wildcards; after all, Get-ChildItem performs many of the same chores as the venerable dir command:

Get-ChildItem *.ps1

However, plenty of other cmdlets support wildcards as well. What’s that? Which cmdlets support wildcard characters? To be honest, we don’t know, at least not off the top of our heads. But we do know how you can find out if a given cmdlet supports the use of wildcards: check the help documentation. For example, suppose you want to know if Clear-Content supports the use of wildcards. Well, just type Get-Help Clear-Content -Full and then look at the table that accompanies each cmdlet parameter:

-path <string[]>
Specifies the paths to the items from which content is deleted. Wildcards are permitted. The paths must be paths to items, not to
containers. For example, you must specify a path to one or more files, not a path to a directory. Wildcards are permitted. This parameter is
required, but the parameter name ("-Path") is optional.

        Required?                    true
        Position?                    1
        Default value                N/A - The path must be specified
        Accept pipeline input?       true (ByPropertyName)
        Accept wildcard characters?  True

Ah, it looks like the -Path parameter does support wildcards. That means that you can erase the contents of all the .TXT files in a folder by using a command similar to this:

Clear-Content C:\Scripts\*.txt

Pretty cool, huh?
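
Because Clear-Content really does erase things, it can be worth rehearsing a wildcard command before pointing it at real files. Here is a minimal sketch of one way to do that, assuming a scratch folder named WildcardDemo in your temp directory (the folder and file names are made up for illustration). It uses Get-Help with the -Parameter switch to show just the -Path entry, and the -WhatIf switch to preview the command.

```powershell
# Build a scratch folder so nothing real gets erased (all names here are hypothetical).
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'WildcardDemo'
New-Item -ItemType Directory -Path $demo -Force | Out-Null
Set-Content -Path (Join-Path $demo 'a.txt') -Value 'some text'
Set-Content -Path (Join-Path $demo 'b.txt') -Value 'more text'
Set-Content -Path (Join-Path $demo 'c.ps1') -Value '# a script'

# Show the help entry for a single parameter instead of the whole -Full listing.
Get-Help Clear-Content -Parameter Path

# Preview: -WhatIf reports which files would be cleared without touching them.
Clear-Content -Path (Join-Path $demo '*.txt') -WhatIf

# The real thing: empties the .txt files but leaves the files themselves in place.
Clear-Content -Path (Join-Path $demo '*.txt')

# Check the result: the .txt files should now be 0 bytes; c.ps1 is untouched.
Get-ChildItem -Path $demo | Select-Object Name, Length
```

Note that Clear-Content empties files rather than deleting them, so a.txt and b.txt are still there afterwards; they are just empty.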

Of course, wildcards do have their limitations. For example, suppose you have a bunch of files in the folder C:\Scripts and some of these files have numbers in the file name (e.g., Test_002.txt). Let’s further suppose that these numbers actually mean something, and it would be very handy if you could retrieve a list of all the files that have a number somewhere in the file name. Can you retrieve that list using wildcard characters? Well, no, not really. Granted, you could use a command like this one to retrieve any file that has the number 2 in the file name:

Get-ChildItem C:\Scripts\*2*

That works, but it’s not particularly convenient; after all, you’ll now have to repeat the process for all the other digits between 0 and 9. Oh, and what if you were only interested in files that had the numbers 000 through 099 somewhere in the file name? Or what if you didn’t care about numbers at all, but were instead interested in file names that started with the letters L, M, N, O, or P? And what if – well, you get the idea: in those cases, wildcards are only of minimal value.

So is that any reason to fret? Of course not; after all, the Windows PowerShell team thought of everything, especially when it comes to the Where-Object cmdlet.

Regular Expressions and the Where-Object Cmdlet

We don’t have time to discuss regular expressions in any detail today; if you’re new to the subject you might take a look at our TechNet Magazine article on the subject. (The article is written for VBScript scripters, but most – if not all – of the actual regular expression syntax applies to PowerShell as well.) Suffice to say that regular expressions let you do all sorts of things that can’t be done using plain old wildcard characters. For example, you said something about a list of all the files that have a number somewhere in their file name? This command should do the trick:

Get-ChildItem C:\Scripts | Where-Object {$_.Name -match "\d"}

Good question: what are we doing with this command? Well, to begin with, we’re using Get-ChildItem to return a collection of all the files found in the folder C:\Scripts. And you’re right, we don’t want all the files found in C:\Scripts, do we? Instead, we only want those files that have a number somewhere in the file name. That’s why we pipe the collection to the Where-Object cmdlet and let Where-Object do our filtering for us.

And how does Where-Object know which files have a number somewhere in the file name and which ones don’t? Well, Where-Object does a regular expression search on each file name; we know that it’s doing a regular expression search because it’s using the -match operator. (Which, by the way, does a case-insensitive search. To do a case-sensitive search, that is, a search where an uppercase A is viewed as a different character than a lowercase a, use the -cmatch operator instead.)

We also know two other things about Where-Object and what it’s doing. First, we’re matching on the file name; we know that because we reference the $_ object (which represents the current object in the pipeline) and the Name property. Second, it’s looking for a number of any kind. How do we know that? Because the regular expression construction \d means, “Find any digit between 0 and 9.”

That’s how we know that.
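
If you would like to poke at these pieces without touching C:\Scripts, here is a small sketch that builds its own scratch folder. The folder name MatchDemo and the file names are assumptions made up for this example; the filter itself is the same one discussed above.

```powershell
# Scratch folder with a few made-up file names.
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'MatchDemo'
New-Item -ItemType Directory -Path $demo -Force | Out-Null
foreach ($name in 'words2.txt', 'Matches.txt', 'output.txt', 'Test_037.txt') {
    Set-Content -Path (Join-Path $demo $name) -Value 'demo'
}

# $_ is the current file object in the pipeline; .Name is its file name, extension included.
$withDigit = Get-ChildItem -Path $demo |
    Where-Object { $_.Name -match '\d' } |
    ForEach-Object { $_.Name }

# -match ignores case; -cmatch does not.
$insensitive = 'Matches.txt' -match '^m'    # uppercase M still counts
$sensitive   = 'Matches.txt' -cmatch '^m'   # uppercase M is not lowercase m

$withDigit
$insensitive
$sensitive
```

On a fresh scratch folder, $withDigit holds words2.txt and Test_037.txt, $insensitive is True, and $sensitive is False.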

So is this really going to work? Let’s give it a try and find out:

Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---         2/29/2008   9:20 PM       1059 280.txt
-a---          4/3/2008  11:14 AM       1665 bootmgr2.vbs
-a---         3/14/2008   3:31 PM      35962 ctp2.txt
-a---         3/26/2008   8:46 AM     114156 ctp2_cmdlets.txt
-a---         2/12/2008  11:07 AM        253 DebugMe.ps1
-a---          4/9/2008  10:22 AM        649 fv.ps1
-a---         2/23/2008  11:42 PM      31350 temptxt3.123
-a---         2/23/2008  11:42 PM      43560 temptxt4.34
-a---         4/14/2008   2:24 PM        437 test.ps1
-a---         2/25/2008   1:24 PM      25276 test2.ps1
-a---         2/22/2008   1:28 PM      15618 voter55s_results.txt
-a---         2/22/2008   2:02 PM      43560 votes_round2.txt
-a---         2/22/2008   2:02 PM      31260 votes_round43.txt
-a---         2/18/2008   8:18 PM      15408 words2.txt
-a---         2/18/2008   8:18 PM       8880 words3.txt
-a---         2/18/2008   8:18 PM       6624 words4.txt
-a---         2/18/2008   8:18 PM       3440 words5.txt
-a---         2/18/2008   8:18 PM       6368 words6.txt
-a---         2/18/2008   8:18 PM      16624 words7.txt
-a---         2/18/2008   8:18 PM       6496 words8.txt
-a---         2/18/2008   8:18 PM       2336 words9.txt
-a---         4/17/2008   3:00 PM        186 z.ps1

Take a look at each file in the list: each file name should include a number of some kind somewhere in the name.

That’s pretty useful in and of itself; however, we can get even fancier. If you look closely at the output of our first command, you’ll see that, at times, a number appears in the actual name of the file; at other times the number appears in the file extension. (And sometimes we get a number in both places.) Suppose we didn’t want any files that had numbers in the file extension; suppose we wanted to limit the returned data to files that have numbers in the file name itself. Here’s one way to do that:

Get-ChildItem C:\Scripts | Where-Object {$_.Name -match "\d\.[^\d]"}

We’ve taken the same basic approach here as we did with our first command; the only difference, of course, is that we use a new regular expression. In this case, we’re looking for files where the Name property includes any number (\d) followed immediately by a period (\.) and then by a character that is not a number; the caret symbol (^, more properly referred to here as the negation operator) means “not.” Thus the syntax [^\d] means, “Anything but a number.” Note that the number has to sit right before the period, so a file like ctp2_cmdlets.txt, where the number falls in the middle of the name, won’t match.

Yes, it sounds weird, and it looks a little weird, too; that’s typically the case with regular expressions. But, as is also typically the case with regular expressions, it works:

Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---         2/29/2008   9:20 PM       1059 280.txt
-a---          4/3/2008  11:14 AM       1665 bootmgr2.vbs
-a---         3/14/2008   3:31 PM      35962 ctp2.txt
-a---         2/25/2008   1:24 PM      25276 test2.ps1
-a---         2/22/2008   2:02 PM      43560 votes_round2.txt
-a---         2/22/2008   2:02 PM      31260 votes_round43.txt
-a---         2/18/2008   8:18 PM      15408 words2.txt
-a---         2/18/2008   8:18 PM       8880 words3.txt
-a---         2/18/2008   8:18 PM       6624 words4.txt
-a---         2/18/2008   8:18 PM       3440 words5.txt
-a---         2/18/2008   8:18 PM       6368 words6.txt
-a---         2/18/2008   8:18 PM      16624 words7.txt
-a---         2/18/2008   8:18 PM       6496 words8.txt
-a---         2/18/2008   8:18 PM       2336 words9.txt
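
You can see why certain names drop out by running the pattern against plain strings. The sketch below uses a handful of names from the listings above. The second filter is an alternative we are adding for comparison, not part of the original command: it strips the extension with the .NET method [System.IO.Path]::GetFileNameWithoutExtension and then looks for a digit anywhere in what is left.

```powershell
$names = '280.txt', 'ctp2_cmdlets.txt', 'temptxt3.123', 'DebugMe.ps1', 'votes_round43.txt'

# The pattern from above: a digit, then a period, then a non-digit.
$digitBeforeDot = $names | Where-Object { $_ -match '\d\.[^\d]' }

# Alternative (an assumption for illustration): test only the part before the extension.
$digitInBaseName = $names | Where-Object {
    [System.IO.Path]::GetFileNameWithoutExtension($_) -match '\d'
}

$digitBeforeDot     # 280.txt, votes_round43.txt
$digitInBaseName    # 280.txt, ctp2_cmdlets.txt, temptxt3.123, votes_round43.txt
```

The two filters answer slightly different questions. The first keeps names where a digit comes right before the period and the extension starts with a non-digit; the second keeps any name with a digit before the extension, whatever the extension looks like. Which one you want depends on how your files are named.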

Let’s try another numeric example. Suppose you have a folder with file names similar to this:

Test_001.txt
Test_011A.txt
Test_037.txt
Test_224.txt
Test_357.txt
Test_661.txt

Let’s say you want to retrieve only those files that are numbered from 000 to 099. In that case, you might want to try this command:

Get-ChildItem C:\Scripts | Where-Object {$_.Name -match "Test_0[0-9][0-9]\."}

Again, we take the same basic approach: we use Get-ChildItem to retrieve all the files, then use Where-Object to pick out only those files numbered 000 to 099. So how do we determine which files are numbered 000 through 099? That’s actually pretty easy: we simply look for the string value Test_0 followed by any digit 0 through 9 ([0-9]), followed by any digit 0 through 9 ([0-9]), followed by a period (\.). That will filter out a file like Test_224.txt; in that case, the string Test_ is followed by a 2 rather than a 0. Likewise, the file Test_011A.txt will fail to make our final list of files. Why? Because the numeric value 011 is followed by the letter A rather than a period. In fact, only 2 files will make the cut:

Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---         4/17/2008   3:08 PM          0 Test_001.txt
-a---         4/17/2008   3:08 PM          0 Test_037.txt

Admittedly, there are other ways we could write this regular expression; no doubt there are better ways we could write this regular expression. But that’s not the point; the point is to simply show you how you can employ regular expressions to help you target the exact set of files you want to work with.
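
As one example of those "other ways," here is a sketch that runs the original pattern and a slightly more compact variant against the six sample names. The variant is our own suggestion rather than anything official: it uses \d{2} in place of [0-9][0-9] and adds a leading caret so the match has to start at the beginning of the name.

```powershell
$names = 'Test_001.txt', 'Test_011A.txt', 'Test_037.txt',
         'Test_224.txt', 'Test_357.txt', 'Test_661.txt'

# The pattern used above.
$original = $names | Where-Object { $_ -match 'Test_0[0-9][0-9]\.' }

# A variant: \d{2} means "exactly two digits"; ^ anchors the match to the start of the name.
$variant = $names | Where-Object { $_ -match '^Test_0\d{2}\.' }

$original   # Test_001.txt, Test_037.txt
$variant    # Test_001.txt, Test_037.txt
```

Both patterns pick the same two files here. The anchored version would also reject a name like Old_Test_001.txt, which the unanchored pattern would accept.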

Because we mentioned this scenario earlier, let’s try one last command, a command that retrieves all the files that have a file name starting with the letter L, M, N, O, or P:

Get-ChildItem C:\Scripts | Where-Object {$_.Name -match "^[lmnop]"}

As you can see, this is a much simpler regular expression: we’re just asking Where-Object to check the beginning of the string (the caret symbol again; when used outside a square bracket the caret symbol means that we want to match the beginning of the string) for any of the characters enclosed in the square brackets; in other words, any file name starting with L, M, N, O, or P. In turn, we should get back something similar to this:

Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---          2/8/2008   8:47 PM         32 lettercase.txt
-a---         2/18/2008   8:13 PM          0 Matches.txt
-a---          3/4/2008   1:21 PM          0 New Text Document.txt
-a---         2/23/2008  10:13 PM         22 numbers.txt
-a---          4/3/2008   5:55 PM      17875 os_info.vbs
-a---         2/22/2008   1:24 PM      33660 output.txt
-a---         1/14/2008   8:16 AM     229376 pool.mdb
-a---          2/9/2008   9:15 PM        724 presidents.txt
-a---         3/31/2008   8:01 AM        981 progress.htm

Not bad, eh?
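
Since L, M, N, O, and P are consecutive letters, the character class can also be written as a range. The sketch below compares the two spellings against a few names, some borrowed from the listings above and used here as plain strings.

```powershell
$names = 'lettercase.txt', 'Matches.txt', 'numbers.txt', 'os_info.vbs',
         'pool.mdb', 'test.ps1', '280.txt'

# Every letter spelled out, as in the command above.
$explicit = $names | Where-Object { $_ -match '^[lmnop]' }

# The same set written as a range, l through p.
$range = $names | Where-Object { $_ -match '^[l-p]' }

$explicit   # lettercase.txt, Matches.txt, numbers.txt, os_info.vbs, pool.mdb
$range      # same five names
```

Matches.txt makes the list even though the pattern uses lowercase letters, because -match is case-insensitive.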

Incidentally, Where-Object and the -match operator aren’t the only way to put regular expressions to work; you can also use regular expressions with the Select-String cmdlet, a cmdlet that can search a text file for specified values. Suppose, for instance, you typically format phone numbers so they look like this:

(555)-123-4567

Need to know if a phone number similar to that appears anywhere in the file C:\Scripts\Test.txt? Try this command and see what happens:

Get-Content C:\Scripts\Test.txt | Select-String "\(\d{3}\)-\d{3}-\d{4}"
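
To try this without a C:\Scripts\Test.txt of your own, here is a self-contained sketch that writes a small sample file first. The file name PhoneDemo.txt and its contents are made up for the example. It also passes the file to Select-String directly through the -Path parameter, which lets each match report the line number it was found on.

```powershell
# Write a throwaway sample file (the name and contents are hypothetical).
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'PhoneDemo.txt'
Set-Content -Path $file -Value @(
    'Call the front desk at (555)-123-4567.',
    'The fax line is 555-123-4567.',
    'No number on this line.'
)

# Same pattern as above: (ddd)-ddd-dddd, with the parentheses escaped.
$pattern = '\(\d{3}\)-\d{3}-\d{4}'

# Each match object carries the matching line and its line number.
$hits = @(Select-String -Path $file -Pattern $pattern)
$hits | ForEach-Object { '{0}: {1}' -f $_.LineNumber, $_.Line }
```

Only the first line is reported. The fax number on line 2 has no parentheses around the area code, so it does not fit the pattern.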

We’ll see you again next week.