Best Practices for the Predictor Resource

11/12/2009

This topic lists the best practices for deploying and optimizing the Predictor resource, and analyzing results.

Deploying the Predictor Resource

Optimizing the Predictor Resource

Controlling Accuracy

Controlling Number of Cases

Controlling the Fractions of Properties

Deploying the Predictor Resource

If you use an analysis model for real-time online predictions, copy the analysis model onto the Web server.

This way, you can keep your Data Warehouse secured behind the firewall and still use the prediction model. For instructions, see Deploying Predictor and Securing a Predictor Deployment.
Multiple instances of the Predictor resource are supported per Commerce Server Web site, but only one instance is supported per computer.
You can install the Predictor resource on each computer that contains your Commerce Server Data Warehouse.
You can install the Predictor resource on dedicated computers to preserve system memory.
Running the Predictor resource on a separate computer uses more network bandwidth than running the Predictor resource and the Data Warehouse on the same computer.
The Predictor resource requires that you use Microsoft SQL Server as your database. For information about the version of SQL Server required, see Installing Prerequisite Software.
A site can run several different analysis models, and each model can be based on data from a different database.
Commerce Server 2002 does not limit the number of models you can use for each site or server.
Building analysis models is memory intensive. To avoid loading the Web servers with this work, build a model once and then copy it to each of your Web servers (using the IPredictorClient::LoadModelFromDB method with the PredictorClient object). Each Web server can host multiple analysis models, each providing different predictive functionality.

Optimizing the Predictor Resource

For best performance during the model build process, build analysis models on the same computer that contains your Commerce Server Data Warehouse.

You can improve the performance of the Predictor resource in the following ways:

Control the sample size used to build an analysis model
Control the size of the tables that the model configurations use to build the analysis models
Control how many properties are included in an analysis model
Build your model on the same computer that contains your Data Warehouse

Controlling Accuracy

By varying the sample size, you can control both the relative accuracy of an analysis model and the time it takes to build. The larger the sample, the more accurate the analysis model, and the longer it takes to build.

Use a sample of at least 10,000 cases for the Predictor resource to work successfully.

However, for most models, accuracy does not improve significantly once the sample exceeds 30,000 to 50,000 cases. The exception is predicting ad clicks: in that case, the accuracy of the analysis model should continue to increase substantially as the sample size increases.
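The sample-size guidance above can be expressed as a small helper. This is a minimal sketch, not part of the Predictor resource: the function names and constants are hypothetical, and the choice of 50,000 (the upper end of the 30,000 to 50,000 range) as the cap is an assumption made for illustration.

```python
# Hypothetical helper that encodes the sample-size guidance in this topic.
# None of these names belong to Commerce Server; they are illustrative only.

MIN_CASES = 10_000      # minimum sample size recommended in this topic
PLATEAU_CASES = 50_000  # assumed cap: upper end of the 30,000-50,000 range


def suggest_case_count(available_cases: int, predicting_ad_clicks: bool = False) -> int:
    """Suggest how many cases to use when building an analysis model."""
    if available_cases < 0:
        raise ValueError("available_cases must be non-negative")
    if predicting_ad_clicks:
        # Accuracy keeps improving with sample size for ad-click models,
        # so use every available case.
        return available_cases
    # For other models, accuracy gains flatten out beyond the plateau.
    return min(available_cases, PLATEAU_CASES)


def meets_minimum(case_count: int) -> bool:
    """Return True if the sample reaches the recommended minimum size."""
    return case_count >= MIN_CASES
```

For example, `suggest_case_count(200_000)` caps the sample at the assumed plateau, while `suggest_case_count(200_000, predicting_ad_clicks=True)` keeps all cases. The value you settle on is what you would enter as Number of Cases.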

Controlling Number of Cases

You control the number of cases used to build an analysis model by modifying Number of Cases in the Model Build Properties dialog box. You can also pre-sample your data using a view or table and build the model from that sample. If you use this method, sample the cases at random instead of choosing them sequentially.
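The difference between random and sequential pre-sampling can be sketched as follows. This is an illustration only, assuming the case identifiers have already been read into memory. The function name and the integer identifiers are hypothetical, and in practice the sampling would more likely be done in the view or table that feeds the model build.

```python
import random


def sample_case_ids(case_ids, sample_size, seed=None):
    """Pick sample_size case identifiers uniformly at random, without replacement."""
    ids = list(case_ids)
    if sample_size >= len(ids):
        # Fewer cases than requested: use all of them.
        return ids
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    return rng.sample(ids, sample_size)


# Hypothetical data: 100,000 cases identified by sequential integers.
all_cases = range(100_000)

# Sequential selection takes only the first 10,000 cases, which may all come
# from the same period or customer segment.
sequential = list(all_cases)[:10_000]

# Random selection draws from the whole range of cases.
randomized = sample_case_ids(all_cases, 10_000, seed=1)
```

Taking the first rows of a table can bias the model toward whatever those rows have in common, such as the oldest transactions, whereas a uniform random draw covers the whole data set.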

Controlling the Fractions of Properties

By varying the fractions of properties used as inputs and outputs, you can control both the accuracy of an analysis model and the time it takes to build. The higher the fraction, the more accurate the model, but the longer it takes to build. To control the number of properties that are included in an analysis model, modify Input Fraction and Output Fraction in the Model Build Properties dialog box. A value of 1.0 indicates that the Predictor resource is to use all of the available properties to build the model. If the database contains many properties (for example, a product catalog with tens or hundreds of thousands of products), it is best to use a lower value. Unless you have a reason to do otherwise, use equal input and output fractions.

When you try to improve the results of your prediction model, consider the data sizes and limits, and the performance criteria for the Predictor resource.
