Professor Windows - May 2005
How Log Parser 2.2 Works
Jeffrey A. Johnson, Software Design Engineer in Test, Microsoft Corporation
I have been asked "what does Log Parser do?" many times in the past couple of years, and every time I find myself in real trouble trying to figure out how to describe this tool. Every time I come up with a sentence that tries to describe it, I later discover that the sentence leaves out important facets of the tool and doesn't convey the real essence of its flexibility. Sure, this might be a developer's paranoia, but nonetheless, I never feel totally satisfied with my answer.
I won't tell you what Log Parser does. It's what you do with it that matters. Many times I have stared, surprised, at particular applications of Log Parser that I didn't know were possible. What I'll tell you is how it works, hoping that this will give you an insight into what you can then do with it, and I will show you three of the most common applications of the tool. I am sure you will find surprising new uses for Log Parser as well.
Before we jump to the description of the inner mechanisms of Log Parser, you might want to have some background on its history. Version 2.0 was the first version available outside Microsoft. Log Parser 1.0 was a Microsoft internal test tool that I developed a few years ago to automate tests for IIS logging. The tool gathered so many internal users around it, I made version 2.0 public as a free Web download in November 2001.
Shortly after that, version 2.1 shipped as part of the IIS 6.0 Resource Kit, containing important updates to the SQL engine capabilities.
Finally, version 2.2, the current version, shipped in January 2005, containing a large number of input and output formats, and other significant upgrades to the SQL engine core.
How Does Log Parser Work?
Log Parser gives you a way to create a data processing pipeline by mixing and matching input formats and output formats as needed, using a query written in a dialect of the SQL language to create the plumbing between these two components.
Input formats are components capable of reading and returning information from a variety of sources. The information provided by an input format is similar to a database table, in which records are made up of a number of fields, and each field has values belonging to one of five possible data types: INTEGER, STRING, TIMESTAMP, REAL, and NULL.
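To make this record model concrete, here is a toy sketch in Python. It is illustrative only - Log Parser itself is a native command-line/COM tool, and the field names below are just examples borrowed from the IIS log fields discussed later:

```python
from datetime import datetime

# Toy model of what an input format conceptually produces: a stream of
# records, each mapping field names to values of one of Log Parser's
# data types (INTEGER, STRING, TIMESTAMP, REAL, NULL).
def toy_input_format():
    """Yield records the way an input format conceptually does."""
    yield {
        "cs-uri-stem": "/home/login.asp",          # STRING
        "time-taken": 12040,                       # INTEGER
        "time": datetime(2004, 6, 24, 9, 47, 12),  # TIMESTAMP
        "s-user-time": 0.5,                        # REAL
    }

for record in toy_input_format():
    print(record["cs-uri-stem"], record["time-taken"])
```

A real input format streams millions of such records; the point is only that every record looks like a database row with typed fields.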
The input formats provided by Log Parser 2.2 include:
- Input formats that parse log files generated by IIS and return the entries in the logs
- Input formats that parse generic text log files formatted according to the CSV, TSV, NCSA, W3C, and XML standards and return the fields contained in the logs
- An input format that returns events from the Windows Event Log
- Input formats that return information on Active Directory objects, on files and directories, and on registry keys
- An input format that parses NetMon capture files and returns information on TCP/IP packets and connections
If none of the available input formats satisfies your needs, Log Parser 2.2 allows you to write custom input format plug-ins. These are simple COM objects that can be written in any programming language, including C# and scripting languages. All you have to do is decide what information you want the plug-in to return, and once implemented, you can use it as if it were a "standard" input format.
Output formats perform the opposite function of the input formats: they consume records and do something useful with the fields contained in the records. The output formats provided with Log Parser 2.2 can:
- Save records to text files formatted according to the CSV, TSV, W3C, and XML standards
- Save records to text files formatted according to generic user-specified templates
- Display records to the console or to a GUI window
- Upload records to a table in a SQL database
- Format records according to the Syslog standard, and dispatch records to a Syslog server, to a text file, or to a user
- Create Excel-style charts that present the record's numeric data in a graphical format
Again, if none of the available output formats satisfies your needs, you can use the Log Parser COM objects programmatically to consume the records.
Core SQL Engine
In between the input and output formats is a data processing engine capable of executing queries in a dialect of the SQL language. A Log Parser query is basically the "glue" that holds together an input format and an output format, specifying how the data provided by an input format is to be transformed before reaching the output format.
The simplest instance of this glue is a "SELECT *" query, which simply dictates that the records coming from the input format should be transferred as-is to the output format. This simple query is usually employed when converting file formats: for example, you choose an input format capable of parsing data in the source file, an output format that writes records to a file according to the target format, and use "SELECT *" to transfer the records with no processing.
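As an illustration of this idea outside Log Parser, the following Python sketch performs the same kind of "SELECT *" conversion by pairing a CSV reader (the "input format") with a TSV writer (the "output format"); the sample data is invented:

```python
import csv
import io

def convert_csv_to_tsv(src, dst):
    """Transfer records unchanged from a CSV source to a TSV destination."""
    reader = csv.reader(src)                   # "input format": parses CSV rows
    writer = csv.writer(dst, delimiter="\t")   # "output format": writes TSV rows
    for record in reader:                      # "SELECT *": no transformation
        writer.writerow(record)

src = io.StringIO("cs-uri-stem,time-taken\n/home/login.asp,12040\n")
dst = io.StringIO()
convert_csv_to_tsv(src, dst)
print(dst.getvalue())
```

Swapping either end of the pipeline - a different reader or a different writer - changes the conversion without touching the middle, which is exactly the role the query plays in Log Parser.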
However, the SQL language was designed for more powerful tasks than a simple "SELECT *". You can use Log Parser to filter, sort, and aggregate the data coming from an input format, sending the results of your calculations to an output format that will display and/or store your distilled information.
Command-line vs. Scriptable COM Objects
You can use Log Parser in two different ways: from the command line, using the "LogParser.exe" executable, and from applications, using the Log Parser COM objects exported by the "LogParser.dll" library. The choice of which you want to use depends on the task that you want to achieve with Log Parser.
In the scenarios that follow I will make use of the command-line executable, but you should keep in mind that all the examples can also be implemented as scripts using the Log Parser COM objects. That's flexibility, isn't it?
Searching for Data
One of the most common applications of Log Parser involves searching for specific information in a particular set of data. Imagine that you are managing an IIS Web site, and you have been told that some users are finding that some unspecified pages are taking a very long time to execute.
The first thing you might want to find out is which pages are actually taking so long to execute. Do you want to open the IIS log files with Notepad and stare at millions of entries? I don't think so. If you had your IIS logs stored in a SQL database, you could run a query to find the pages that are taking more than 10 seconds to execute - and if you did, you probably wouldn't need Log Parser, at least not for this problem. But what if you don't have your logs in a SQL database? In this case, you can use Log Parser to parse the IIS log files, find out which pages are taking more than 10 seconds, and display the results to the console window.
Here's how. Assuming that the IIS Web server is logging in the W3C extended log format, you use the IISW3C input format to parse the IIS log files.
Before running any query, you need to know which fields are exported by this input format. To find out, go to the command line, make sure that LogParser.exe is in the path, and execute this command:
C:\>logparser -h -i:IISW3C
This command gives us a concise help on the IISW3C input format, including a listing of all the fields exported by the IISW3C input format:
LogFilename (S)    LogRow (I)          date (T)            time (T)
c-ip (S)           cs-username (S)     s-sitename (S)      s-computername (S)
s-ip (S)           s-port (I)          cs-method (S)       cs-uri-stem (S)
cs-uri-query (S)   sc-status (I)       sc-substatus (I)    sc-win32-status (I)
sc-bytes (I)       cs-bytes (I)        time-taken (I)      cs-version (S)
cs-host (S)        cs(User-Agent) (S)  cs(Cookie) (S)      cs(Referer) (S)
s-event (S)        s-process-type (S)  s-user-time (R)     s-kernel-time (R)
s-page-faults (I)  s-total-procs (I)   s-active-procs (I)  s-stopped-procs (I)
From this output, and with the help of the IIS Logging documentation, you can see that you want the "cs-uri-stem" field of all the log entries whose "time-taken" field is greater than 10,000 (the time-taken field measures the number of milliseconds that the page takes to execute). The letters in parentheses are the types of the fields: the "cs-uri-stem" field is of the STRING type, while the "time-taken" field is of the INTEGER type.
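To see what such a filter amounts to, here is a toy Python sketch - not how Log Parser is implemented - that parses a few invented W3C-style log lines and keeps the "cs-uri-stem" of entries whose "time-taken" exceeds 10,000 milliseconds:

```python
# Invented sample in the W3C extended log style: a #Fields directive
# names the columns, then each entry is a space-separated line.
sample_log = """\
#Fields: date time cs-uri-stem sc-status time-taken
2004-06-24 09:47:12 /home/login.asp 200 12040
2004-06-24 09:47:15 /images/button.gif 200 45
2004-06-24 09:48:05 /home/login.asp 200 15300
"""

def slow_pages(log_text, threshold_ms=10000):
    """Return the cs-uri-stem of entries with time-taken above threshold_ms."""
    fields = []
    hits = []
    for line in log_text.splitlines():
        if line.startswith("#Fields:"):
            fields = line.split()[1:]              # column names after the directive
        elif line and not line.startswith("#"):
            entry = dict(zip(fields, line.split()))
            if int(entry["time-taken"]) > threshold_ms:
                hits.append(entry["cs-uri-stem"])
    return hits

print(slow_pages(sample_log))  # → ['/home/login.asp', '/home/login.asp']
```

The IISW3C input format does all of this parsing (and much more) for you; the SQL query only has to express the "time-taken > 10000" condition.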
Next, navigate to the folder where the IIS log files are saved, and execute this command:
C:\Logs>logparser "SELECT cs-uri-stem FROM ex*.log WHERE time-taken > 10000" -i:IISW3C
Depending on the size of your log files, you might need to wait a few seconds before the command starts spewing out results. An example output could be:
cs-uri-stem
-------------------
/home/login.asp
/home/login.asp
/home/login.asp
/home/login.asp
/images/button.gif
/home/login.asp
/home/login.asp
/home/login.asp
/home/login.asp
/home/login.asp
Press a key...
First, notice that we are using "-i:IISW3C" to specify the input format. This parameter tells Log Parser that we want the IISW3C input format to generate data for us.
Second, notice that we are not specifying any output format; when this happens, Log Parser automatically selects the "NAT" output format, which displays the query records to the console window. This output format displays ten records and then pauses, waiting for you to press a key, after which it goes ahead and displays the next ten records.
Third, notice that the FROM clause of our SQL query specifies the actual source of the data that we want our input format to parse. In the IISW3C input format case, we can specify any number of files, local or remote, even containing wildcards. There are more parameters you can specify in the FROM clause, but I won't talk about these options here - you can find them in the Log Parser documentation under "Input Formats."
Going back to our example, from the results you can see that most of the time it is the "login.asp" page that takes a long time to execute. However, you may also see that some other pages are occasionally taking more than ten seconds to load, so you might want to find out how many times each different page took more than ten seconds to execute. Pages that appear a small number of times in the output are probably spurious "noise" in our logs.
So, go ahead and run this command:
C:\Logs>logparser "SELECT TOP 10 cs-uri-stem, COUNT(*) FROM ex*.log WHERE time-taken > 10000 GROUP BY cs-uri-stem ORDER BY COUNT(*) DESC" -i:IISW3C
Your output might look like this:
cs-uri-stem              COUNT(ALL *)
------------------------ ------------
/home/login.asp          371
/images/lglogo.gif       2
/images/button.gif       1
/news/today.htm          1
This command tells Log Parser to do the following:
- Find all the log entries for pages that took more than ten seconds to execute
- For each distinct page URL in these results, count how many times the page URL appears in the results - in other words, how many times the page took more than ten seconds to execute
- Sort the results according to the calculated number of appearances in descending order, and only show the topmost ten results
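Those three steps map naturally onto a few lines of plain Python; this toy sketch, with invented data, computes the same GROUP BY / COUNT(*) / TOP 10 result:

```python
from collections import Counter

# Invented cs-uri-stem values for the entries that took > 10000 ms,
# mimicking the filtered results of the previous query.
slow_hits = (["/home/login.asp"] * 371
             + ["/images/lglogo.gif"] * 2
             + ["/images/button.gif", "/news/today.htm"])

# GROUP BY cs-uri-stem + COUNT(*) + ORDER BY COUNT(*) DESC + TOP 10:
top10 = Counter(slow_hits).most_common(10)
for url, count in top10:
    print(f"{url:<24} {count}")
```

Of course, Log Parser performs the grouping while streaming through the log files, without requiring you to load every entry into memory yourself.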
From the output above, you can conclude that the "login.asp" page frequently takes more than ten seconds to execute; the other pages in the results are just spurious entries that likely do not indicate any particular problem.
Exporting Logs to a SQL Database
For many users, the query capabilities implemented by Log Parser are not enough. For example, companies that manage Web sites with very heavy traffic usually want the Web server logs to be stored in a central database, where they can run any number of queries and correlate data across multiple Web servers. The same might be true for the Windows Event Logs, and for other log files.
Log Parser can help in these cases as well, by providing a "SQL" output format that can be used to upload query results to a table in a database.
Generally speaking, a simple "SELECT *" query from a log file to a SQL table might do the trick, but most of the time the logs contain entries that are not worth storing persistently in a database. For example, you might want to leave out requests for image files, as well as requests for nonexistent pages.
The following command does the job:
C:\Logs>logparser "SELECT * INTO LogsTable FROM ex*.log WHERE TO_LOWERCASE(EXTRACT_EXTENSION(cs-uri-stem)) NOT IN ('gif';'jpg';'png') AND sc-status <> 404" -i:IISW3C -o:SQL -database:LogsDatabase
This command asks Log Parser to filter out entries for files ending in ".gif", ".jpg", and ".png", regardless of the capitalization of the URL. It also leaves out requests that generated the HTTP error code "404", that is, "page not found".
Everything else is handed to the SQL output format, which will save the records to the table specified in the INTO clause. Also, this output format accepts a "database" parameter that we can use to specify the name of the target database, together with many other parameters that let us specify other properties of the connection to the SQL database.
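The WHERE clause of that command can be expressed as a small Python predicate; this sketch (illustrative only) mirrors the TO_LOWERCASE(EXTRACT_EXTENSION(...)) test and the 404 check:

```python
import os

def worth_storing(cs_uri_stem, sc_status):
    """Mimic the query's WHERE clause: drop image requests and 404s."""
    # EXTRACT_EXTENSION + TO_LOWERCASE, done with os.path.splitext:
    ext = os.path.splitext(cs_uri_stem)[1].lstrip(".").lower()
    return ext not in ("gif", "jpg", "png") and sc_status != 404

print(worth_storing("/home/login.asp", 200))     # True
print(worth_storing("/images/Button.GIF", 200))  # False: image, any capitalization
print(worth_storing("/missing/page.asp", 404))   # False: page not found
```

Records passing the predicate would be the ones handed to the SQL output format.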
To automate the upload, you could put the command above in a batch file, use the SCHTASKS.EXE utility to schedule the batch file to run, say, every hour, and - poof! - you'd have an automated mechanism that periodically uploads entries from your log files to your SQL database.
Note that every time you execute the command above, Log Parser goes through all the log entries from the beginning, uploading the same entries to the database again and again. What we really need is a mechanism that says "only parse the log entries not parsed yet."
Here's where a new Log Parser 2.2 feature comes in handy: the "incremental parsing" feature, also known as "checkpointing."
Most Log Parser input formats support a parameter, called "iCheckpoint", that tells the input format to save to a private data file (a "checkpoint" file) the current state of all the files being parsed. When you execute a query using a checkpoint file generated during a previous run, the input format will only parse those portions of the input files that were not parsed before � that is, new entries only.
So, all you need to do is specify the name of a checkpoint file with the "iCheckpoint" parameter, as follows:
C:\Logs>logparser "SELECT * INTO LogsTable FROM ex*.log WHERE TO_LOWERCASE(EXTRACT_EXTENSION(cs-uri-stem)) NOT IN ('gif';'jpg';'png') AND sc-status <> 404" -i:IISW3C -o:SQL -database:LogsDatabase -iCheckpoint:MyCheckpoint.lpc
The very first time you execute this command, the "MyCheckpoint.lpc" file does not exist, and the IISW3C input format parses all the entries in all the log files, filtering and uploading the resulting data to the database. After parsing all the logs, the IISW3C input format saves the current size of each log file to the "MyCheckpoint.lpc" file.
When you execute the same command at a later time, the IISW3C input format will load the checkpoint file and will parse the logs starting at the position where they were left in the previous run, thus processing, filtering, and uploading new log entries only. When completed, it will update the checkpoint file with the new size of the log files, and the cycle repeats itself continually.
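The checkpoint mechanism itself is easy to picture. The following toy Python sketch (not Log Parser's actual checkpoint file format) stores the last parsed byte offset of each log file and, on the next run, returns only the lines appended since:

```python
import json
import os

def read_new_entries(log_path, checkpoint_path):
    """Return only the lines appended to log_path since the last call."""
    state = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)            # offsets saved by a previous run
    with open(log_path) as f:
        f.seek(state.get(log_path, 0))      # skip what was already parsed
        new_lines = f.readlines()
        state[log_path] = f.tell()          # remember the new position
    with open(checkpoint_path, "w") as f:
        json.dump(state, f)                 # update the "checkpoint" file
    return new_lines
```

Calling this function twice in a row returns the new lines once and then an empty list, which is exactly the behavior you want from a scheduled upload job.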
If you have followed closely the architectural description at the beginning of this article, you might have already figured out that by just changing the input format, you can reuse this command to upload to a database any log supported by the Log Parser input formats. Want to upload event log entries? Just use the EVT input format, as follows:
C:\Logs>logparser "SELECT * INTO EventLogsTable FROM System" -i:EVT -o:SQL -database:LogsDatabase -iCheckpoint:MyCheckpoint.lpc
By now you should have figured out that by selecting the right input format and writing the right query, you can obtain whatever information you need from log files, the system, or whatever else is supported by Log Parser.
However, we have only seen commands that return information to the console window, or that upload records to a SQL database. If you are using Log Parser to generate reports for your management team, you would like something more human-friendly than a SQL table, wouldn't you?
Generating an HTML Page
For this example, assume that we want to generate a report containing all the logons on a specified remote computer, which we will retrieve from the computer's Security event log:
C:\>logparser "SELECT ComputerName, TimeGenerated AS LogonTime, STRCAT(STRCAT(EXTRACT_TOKEN(Strings, 1, '|'), '\\'), EXTRACT_TOKEN(Strings, 0, '|')) AS Username FROM \\SERVER01\Security WHERE EventID IN (552; 528) AND EventCategoryName = 'Logon/Logoff'" -i:EVT
Don't get scared by the query - most of its apparent complexity is due to some string manipulation we're doing in order to extract the full <domain>\<username> string from the event data.
Using the NAT output format, the output would look something like this:
ComputerName LogonTime           Username
------------ ------------------- ----------------------------
SERVER01     2004-06-24 09:47:12 NT AUTHORITY\NETWORK SERVICE
SERVER01     2004-06-24 09:48:05 SERVERDC\Administrator
SERVER01     2004-06-24 09:48:05 NT AUTHORITY\NETWORK SERVICE
SERVER01     2004-06-24 10:00:59 NT AUTHORITY\NETWORK SERVICE
SERVER01     2004-06-24 10:01:00 NT AUTHORITY\LOCAL SERVICE
SERVER01     2004-06-24 10:01:06 SERVER01\Gabriele
SERVER01     2004-06-24 10:01:15 NT AUTHORITY\LOCAL SERVICE
SERVER01     2004-06-24 10:01:18 SERVERDC\Monica
SERVER01     2004-06-24 10:01:18 NT AUTHORITY\LOCAL SERVICE
SERVER01     2004-06-24 10:01:24 NT AUTHORITY\LOCAL SERVICE
In order to format this output as HTML, we will use the TPL output format, which takes as a parameter a text file (a "template" file) that describes how the output is to be formatted. In our case, we will write the template file in such a way that the output becomes an HTML table, as follows:
<LPHEADER>
<HTML>
<HEAD><TITLE>Logon/Logoff Report</TITLE></HEAD>
<BODY>
<H1>Logon/Logoff Report</H1>
<TABLE BGCOLOR="#D0D0E0" BORDER="1" CELLPADDING="10">
<TR>
 <TH>%FIELDNAME_1%</TH>
 <TH>%FIELDNAME_2%</TH>
 <TH>%FIELDNAME_3%</TH>
</TR>
</LPHEADER>
<LPBODY>
<TR>
 <TD><TT>%FIELD_1%</TT></TD>
 <TD><TT>%FIELD_2%</TT></TD>
 <TD><TT>%FIELD_3%</TT></TD>
</TR>
</LPBODY>
<LPFOOTER>
</BODY>
</HTML>
</LPFOOTER>
I'm not going to delve into the details of the template file - its format is exhaustively explained in the online documentation accompanying Log Parser. Notice that the template file is divided into three sections (a header, a body, and a footer), and that each section uses special "%variables%" that get substituted at run-time with values from the query results.
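If you are curious how such a template engine works in principle, here is a toy Python sketch (not the actual TPL implementation) that substitutes %FIELDNAME_n% variables in the header once, %FIELD_n% variables in the body once per record, and appends the footer; the template and record are invented:

```python
# A stripped-down header/body/footer template, in the TPL spirit.
header = "<TABLE><TR><TH>%FIELDNAME_1%</TH><TH>%FIELDNAME_2%</TH></TR>"
body   = "<TR><TD>%FIELD_1%</TD><TD>%FIELD_2%</TD></TR>"
footer = "</TABLE>"

def render(field_names, records):
    """Emit header once, body once per record, footer once."""
    out = header
    for i, name in enumerate(field_names, 1):
        out = out.replace(f"%FIELDNAME_{i}%", name)   # header: field names
    for record in records:
        row = body
        for i, value in enumerate(record, 1):
            row = row.replace(f"%FIELD_{i}%", value)  # body: field values
        out += row
    return out + footer

html = render(["LogonTime", "Username"],
              [("2004-06-24 09:48:05", "SERVERDC\\Administrator")])
print(html)
```

The real TPL output format handles any number of fields and sections, but the substitution model is the same.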
We are now ready to generate our report by using the TPL output format with our previous command:
C:\>logparser "SELECT ComputerName, TimeGenerated AS LogonTime, STRCAT(STRCAT(EXTRACT_TOKEN(Strings, 1, '|'), '\\'), EXTRACT_TOKEN(Strings, 0, '|')) AS Username INTO Report.html FROM \\SERVER01\Security WHERE EventID IN (552; 528) AND EventCategoryName = 'Logon/Logoff'" -i:EVT -o:TPL -tpl:MyTemplate.txt
This command creates a "Report.html" file which, when opened in a browser, will look like Figure 1.
As a last example, I'll show you how you can generate a chart directly out of the results of a query, without going through Excel.
The following command calculates the total number of hits for every hour from IIS log files, using the CHART output format to create a JPG image containing a chart:
C:\>logparser "SELECT TO_STRING(time, 'HH') AS Hour, COUNT(*) AS Hits INTO MyChart.jpg FROM ex*.log GROUP BY Hour ORDER BY Hour ASC" -i:IISW3C -o:CHART -chartType:ColumnClustered -chartTitle:"Hourly Hits" -groupSize:420x280
The output will look something like Figure 2.
We have seen that the Log Parser architecture lends itself to numerous applications, all made possible by the flexibility of Log Parser's "data processing pipeline". I showed you that you can use Log Parser to search for data, to upload logs to a SQL database, and to generate reports, and these are just three examples of what users commonly do with the tool.
When I said that it's hard to come up with a single sentence that describes what Log Parser does, I wasn't kidding - was I?
For More Information
Take a look at the following resources:
- http://www.logparser.com: Dubbed the "Unofficial Log Parser Support Site", this site, owned and maintained by Mike Gunderloy, hosts a public forum where users ask and answer questions, exchange tricks, and report problems, together with a Knowledge Base answering the most frequently asked questions and a wealth of links to other Log Parser-related articles on the Web.
- The "Microsoft Log Parser Toolkit" book is packed with information and scripts from myself and many other Log Parser users. It shows tips and tricks about the tool, helping users get the most out of Log Parser.
- Tales From the Script: This article from the Scripting Guys at TechNet shows how to programmatically use the Log Parser scriptable COM objects, how to select input and output formats from a script, how to set properties, and how to consume records generated by a Log Parser query.