When I speak at conferences, such as Tech•Ed or TechMentor, I get into the habit of making proclamations—general-rule announcements that help users remember key points about things like Windows PowerShell. My latest proclamation is, "If you're parsing a string in Windows PowerShell, you're doing something wrong."
This comes from my philosophy regarding Windows PowerShell™ being an object-oriented shell. If you're doing things like dumping lists of services into a text file and then parsing that text file to see which services are started, you're working too hard. That's a valid approach in a text-based OS such as UNIX, but Windows PowerShell (as well as Windows® itself) lets you use objects in a much more efficient way.
Even your own scripts should produce objects, not formatted text, so that the shell's various formatting, filtering, exporting, and other commands can be used to manipulate the output from your scripts. (In my July 2008 Windows PowerShell column, you can read more about the concept of custom objects as script output.)
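As a minimal sketch of that idea, the function below emits objects instead of formatted text, so the shell's own commands can filter and sort the result. The function name Get-DriveSummary and its hard-coded sample data are hypothetical and exist only for illustration.

```powershell
# Hypothetical function: emits one object per drive instead of formatted text.
# The drive names and free-space figures are made-up sample data.
function Get-DriveSummary {
    $samples = @(
        @{ Name = 'C:'; FreeGB = 12 },
        @{ Name = 'D:'; FreeGB = 250 },
        @{ Name = 'E:'; FreeGB = 3 }
    )
    foreach ($s in $samples) {
        # Turn each hashtable into an object with Name and FreeGB properties.
        New-Object PSObject -Property $s
    }
}

# Because the output is objects, filtering and sorting need no text parsing.
$low = Get-DriveSummary | Where-Object { $_.FreeGB -lt 20 } | Sort-Object FreeGB
$low | Format-Table Name, FreeGB -AutoSize
```

The same output could just as easily be piped to an export or conversion command, which is the point of returning objects rather than text.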
At a recent TechMentor in Orlando, Fla., one of my students reminded me that nearly all rules, especially general rules, have exceptions. "What about finding stuff in an IIS log file? Don't you pretty much have to parse text in that case?"
Well, er, yes. And fortunately, as object-friendly as it may be, Windows PowerShell is no slouch when it comes to parsing strings of text. And now that you mention it, IIS log files, firewall log files, and other text-based logs are perfect examples.
A Real-World Case
I actually had to parse a set of firewall log files for a company I was working for. An employee had been caught viewing some inappropriate Web sites, and as part of the follow-up investigation, the Human Resources department needed a comprehensive list of the Web sites he'd been visiting. While that's a bit tough to pull out of a single day's log file, the folks in HR wanted to go back for weeks—a task I was not eager to do manually.
The company's Dynamic Host Configuration Protocol (DHCP) server indicated that the employee's computer had been using the same IP address (let's say 192.168.17.54 for the purposes of this example) for several months. That, of course, is not unusual since it was a desktop computer that was rarely turned off. And since the firewall log kept a record of source IP addresses, I knew that Windows PowerShell could be of help.
The secret lies in the often-overlooked Select-String command. You'll also need a working knowledge of regular expressions (which I covered back in the November 2007 installment of the Windows PowerShell column).
The Select-String command accepts a path that points to one or more text files, along with a regular expression or a simple string to look for. It then outputs each line from each log file that matches the regular expression or simple string. To begin my task, I simply wanted to get every line containing the IP address of the employee's desktop computer. Each log file line contained a date and time stamp, which is all the folks in the Human Resources department were after.
Here's the command:
select-string -path c:\logs\*.txt -pattern "192.168.17.54" -simplematch
The -simpleMatch parameter specifies that the pattern I provided is just a simple string, not a regular expression. Figure 1 shows some of the output, which could also be piped to a file. Note that the output includes both the file name and the line number where each match was found, which can be very useful if you want to go back at some point and dig for more information.
Figure 1 Output from a Select-String command
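To try the simple-string search without real logs, here is a small sketch that builds a throwaway folder of sample files first. The folder name, file names, and log line layout are assumptions made for illustration, not the format of the firewall logs described in this column.

```powershell
# Build a throwaway log folder with made-up entries (the line format is an assumption).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'ss-simple-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
Set-Content -Path (Join-Path $dir 'day1.txt') -Value @(
    '2008-05-01 09:15:02,192.168.17.54,207.68.172.246',
    '2008-05-01 09:16:40,192.168.17.99,207.68.172.246'
)
Set-Content -Path (Join-Path $dir 'day2.txt') -Value @(
    '2008-05-02 10:01:13,192.168.17.54,10.0.0.8'
)

# -SimpleMatch treats the pattern as literal text, so the periods are not wild cards.
$hits = Select-String -Path (Join-Path $dir '*.txt') -Pattern '192.168.17.54' -SimpleMatch

# Each result is an object; pick out the properties worth reporting.
$hits | Select-Object Filename, LineNumber, Line
```

Because each match is an object with Filename, LineNumber, and Line properties, the results can be sorted, filtered, or exported like any other shell output.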
After I provided the HR folks with exactly what they asked for, they realized that it wasn't what they wanted after all. My report included visits to numerous IP addresses, such as 207.68.172.246 (the MSN® Web site). The next request from the investigator was to trim down my report to include only visits to a specific IP address, which they had identified as belonging to one of the Web sites in question. For this column, I won't reveal the real IP address that came up in the investigation. Instead, I'll use 207.68.172.246 for this example (although the MSN Web site typically isn't considered inappropriate).
This request could be a bit more difficult. In the log file I was working with, the source and destination IP addresses were right next to one another and separated by a comma. So I could simply change my search string to "192.168.17.54,207.68.172.246" and repeat the search.
In a more complex log file, however, it's possible there would be variable data stored between the two IP addresses, and so a simple string match won't work. In that scenario you would have to resort to a regular expression, which will also work well on simpler log formats, so that's what I'll demonstrate here.
Within a regular expression, the period character is a wild card for any single character. And I can use the sub-expression (.)* to search for any number of characters in between my two IP addresses. However, it is necessary that I use a backslash to escape the literal periods that appear in the IP addresses themselves.
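The difference between an escaped and an unescaped period is easy to check with the -match operator. This is a small sketch on made-up strings. It also uses the .NET [regex]::Escape method, which is my addition rather than something the column relies on, to add the backslashes automatically.

```powershell
$ip = '192.168.17.54'

# Unescaped, each period matches any single character, so this look-alike matches too.
$loose = '192x168y17z54' -match $ip                  # True

# [regex]::Escape puts a backslash in front of each period.
$escaped = [regex]::Escape($ip)                      # 192\.168\.17\.54
$strictMiss = '192x168y17z54' -match $escaped        # False
$strictHit  = '192.168.17.54' -match $escaped        # True

# (.)* allows any run of characters between the two addresses.
$pattern = $escaped + '(.)*' + [regex]::Escape('207.68.172.246')
$withGap = '192.168.17.54,TCP,80,207.68.172.246' -match $pattern   # True

$loose, $strictMiss, $strictHit, $withGap
```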
The resulting command is:
select-string -path c:\logs\*.txt -pattern "192\.168\.17\.54(.)*207\.68\.172\.246"
I removed the -simpleMatch parameter since I'm using a regular expression this time. The resulting output showed only those visits from the particular employee's computer to the specific Web site identified as inappropriate. The output also included the date and time stamp information that the investigator wanted. Figure 2 shows a portion of the output you might get when you run a command such as this.
Figure 2 Narrowed search results that show only visits to a specific site
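Here is a sketch of the regular-expression search run against sample data in which extra fields sometimes sit between the two addresses. The temporary folder, file name, and log layout are invented for illustration.

```powershell
# Made-up firewall log; the second line has extra fields between the addresses.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'ss-regex-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
Set-Content -Path (Join-Path $dir 'fw.txt') -Value @(
    '2008-05-01 09:15:02 192.168.17.54,207.68.172.246',
    '2008-05-01 09:17:45 192.168.17.54,TCP,80,207.68.172.246',
    '2008-05-01 09:18:10 192.168.17.54,10.0.0.8',
    '2008-05-01 09:19:33 192.168.17.99,207.68.172.246'
)

# Escaped periods are literal; (.)* spans whatever lies between the two addresses.
$pattern = '192\.168\.17\.54(.)*207\.68\.172\.246'
$visits = Select-String -Path (Join-Path $dir '*.txt') -Pattern $pattern

# Only lines 1 and 2 have the right source AND destination.
$visits | ForEach-Object { '{0}:{1}: {2}' -f $_.Filename, $_.LineNumber, $_.Line }
```

A simple string search for "192.168.17.54,207.68.172.246" would have missed the second line, which is why the regular expression is the safer choice for less regular log formats.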
But I can go one better and pipe the output to Format-Table and use its ability to display calculated columns. I can have the table include the file name of the log file and the line number at which the match was found, and I'll even display the matching line itself. However, I can have the shell replace the regular expression match with an empty string so only the remainder of the line—the date and time stamp in my example—will display. This is an advanced trick, but it further demonstrates how Windows PowerShell can manipulate string data and produce highly customized output, all in a single command line:
select-string -path c:\logs\*.txt -pattern
"192\.168\.17\.54(.)*207\.68\.172\.246" -allmatches |
Figure 3 shows what my final results might look like.
Figure 3 Formatted output from a Select-String command
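One way to sketch the calculated-column technique is shown below. It is an illustration under stated assumptions rather than the exact command used in the investigation: the sample log lines, the temporary folder, and the column label When are all invented, and the expression relies on the Matches property that Select-String attaches to each result.

```powershell
# Made-up log data in a throwaway folder (layout is an assumption).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'ss-format-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
Set-Content -Path (Join-Path $dir 'fw.txt') -Value @(
    '2008-05-01 09:15:02 192.168.17.54,207.68.172.246',
    '2008-05-01 09:18:10 192.168.17.54,10.0.0.8',
    '2008-05-02 14:02:51 192.168.17.54,TCP,80,207.68.172.246'
)

$pattern = '192\.168\.17\.54(.)*207\.68\.172\.246'
$hits = Select-String -Path (Join-Path $dir '*.txt') -Pattern $pattern -AllMatches

# Calculated column: the matching line with the matched text replaced by an
# empty string, which leaves only the date and time stamp in this sample format.
$when = @{
    Label      = 'When'
    Expression = { $_.Line.Replace($_.Matches[0].Value, '').Trim() }
}

$hits | Format-Table Filename, LineNumber, $when -AutoSize
```

Keeping the calculated column in a variable makes the pipeline easier to read. The same hashtable could be written inline in the Format-Table call to keep everything on one command line.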
It's a Stringy World
I am quick to proclaim the object-oriented nature of Windows PowerShell as one of its greatest strengths. But there are, nonetheless, times when objects aren't an option.
Windows PowerShell may live in an object-oriented world, but the Windows PowerShell team recognized that your world frequently contains external data in formatted strings, and so they included the Select-String command. Armed with Select-String and a familiarity with regular expressions, you can use Windows PowerShell to write one-liners that will parse the most complicated strings.