Deleting Orphan Data from the Data Warehouse

During normal operation of the Data Warehouse, you should schedule the Delete Data DTS task to clear fact data on a regular basis. The data that the Delete Data DTS task removes is typically the largest amount of data within the Data Warehouse (For example, requests, visits, and the content of the VisitInfo table).

Over time, you might experience slower import times and want to delete orphaned data of particular dimensions that grow large (for example, LogUser, URI, and IPRef).

Orphans are instances of data in the parent class (in the context of a parent-child relationship) that do not have any associated child instances. For example, in the Data Warehouse, the OrderFormheader class is the parent class for the OrderFormLineItems class, which means for each instance in the OrderFormLineItems (childClass) there is a corresponding instance in the OrderFormHeader (parentClass). If a delete operation deleted some instances in OrderFormLineItems, the corresponding instances in OrderFormHeader (parent class) are considered to be orphaned.

You can delete the instances of orphaned data from the Data Warehouse by issuing a delete command with the DELETE ORPHANS clause against the parent class by using the Commerce OLE DB provider. For more information, see Delete Syntax.

You cannot use the delete command with the DELETE ORPHANS clause to remove orphaned data from the URI, LogUser and URIQuery classes because they are parent classes for several facts, such as visit and requests, and their parent-child relationships are enforced in the logimport task rather than by the Commerce OLE DB provider.

Ee796765.note(en-US,CS.20).gifNote

  • A delete command without the DELETE ORPHANS clause will not work against the URI, LogUser and URIQuery classes. This is by design.

In order to delete the orphaned data within these tables, it is necessary to invoke the DeleteOrphans_UriQuery, DeleteOrphans_Uri, and DeleteOrphans_LogUser stored procedures. These stored procedures can be invoked with the SQL Query Analyzer by the following syntax:

exec DeleteOrphans_UriQuery
exec DeleteOrphans_Uri
exec DeleteOrphans_LogUser

Ee796765.note(en-US,CS.20).gifNote

  • The DeleteOrphan stored procedures do not work on the following dimensions:
    • LogUser
    • URI
    • UriQuery

These dimensions participate in an application-level relationship that the OLE DB provider cannot process.

The DeleteOrphans_Uri and DeleteOrphans_LogUser stored procedures must be updated whenever a new or custom Commerce Event is defined.

Use the following code to update the DeleteOrphans_Uri stored procedure:

CREATE PROCEDURE DeleteOrphans_UriQuery
AS
SELECT [QueryStringKey] INTO [#MSCSTempUriQuery] FROM [UriQuery] WHERE (0 = 1)
CREATE UNIQUE CLUSTERED INDEX [URI_del_temp_idx] ON [#MSCSTempUriQuery]([QueryStringKey]) WITH IGNORE_DUP_KEY
INSERT INTO [#MSCSTempUriQuery] SELECT [Request].[QueryStringKey] AS [QueryStringKey] FROM [Request]
INSERT INTO [#MSCSTempUriQuery] SELECT [RequestByDateByUriByQueryString].[QueryStringKey] AS [QueryStringKey] FROM [RequestByDateByUriByQueryString]
DELETE FROM [UriQuery] WHERE ([UriQuery].[QueryStringKey] NOT IN 
   (SELECT [#MSCSTempUriQuery].[QueryStringKey] FROM [#MSCSTempUriQuery])) 
GO
 
CREATE PROCEDURE DeleteOrphans_URI
AS
SELECT [UriKey] INTO [#MSCSTempURI] FROM [URI] WHERE (0 = 1)
CREATE UNIQUE CLUSTERED INDEX [URI_del_temp_idx] ON [#MSCSTempURI]([UriKey]) WITH IGNORE_DUP_KEY
INSERT INTO [#MSCSTempURI] SELECT [Basket].[UriKey] AS [UriKey] FROM [Basket]
INSERT INTO [#MSCSTempURI] SELECT [CampaignEvent].[UriKey] AS [UriKey] FROM [CampaignEvent]
INSERT INTO [#MSCSTempURI] SELECT [FirstUriByDate].[FirstUriKey] AS [UriKey] FROM [FirstUriByDate]
INSERT INTO [#MSCSTempURI] SELECT [HitsInfo].[UriKey] AS [UriKey] FROM [HitsInfo]
INSERT INTO [#MSCSTempURI] SELECT [LastUriByDate].[LastUriKey] AS [UriKey] FROM [LastUriByDate]
INSERT INTO [#MSCSTempURI] SELECT [OpenUserVisit].[FirstUriKey] AS [UriKey] FROM [OpenUserVisit]
INSERT INTO [#MSCSTempURI] SELECT [OpenUserVisit].[SecondUriKey] AS [UriKey] FROM [OpenUserVisit]
INSERT INTO [#MSCSTempURI] SELECT [OpenUserVisit].[LastUriKey] AS [UriKey] FROM [OpenUserVisit]
INSERT INTO [#MSCSTempURI] SELECT [Order].[UriKey] AS [UriKey] FROM [Order]
INSERT INTO [#MSCSTempURI] SELECT [Request].[UriKey] AS [UriKey] FROM [Request]
INSERT INTO [#MSCSTempURI] SELECT [RequestByDateByUriByQueryString].[UriKey] AS [UriKey] FROM [RequestByDateByUriByQueryString]
DELETE FROM [URI] WHERE ([URI].[UriKey] NOT IN 
   (SELECT [#MSCSTempURI].[UriKey] FROM [#MSCSTempURI])) 
GO

CREATE PROCEDURE DeleteOrphans_LogUser
AS
SELECT [UserKey] INTO [#MSCSTempLogUser] FROM [LogUser] WHERE (0 = 1)
CREATE UNIQUE CLUSTERED INDEX [LogUser_del_temp_idx] ON [#MSCSTempLogUser]([UserKey]) WITH IGNORE_DUP_KEY
INSERT INTO [#MSCSTempLogUser] SELECT [Basket].[UserKey] AS [UserKey] FROM [Basket]
INSERT INTO [#MSCSTempLogUser] SELECT [Order].[UserKey] AS [UserKey] FROM [Order]
INSERT INTO [#MSCSTempLogUser] SELECT [CampaignEvent].[UserKey] AS [UserKey] FROM [CampaignEvent]
INSERT INTO [#MSCSTempLogUser] SELECT [OpenUserVisit].[UserKey] AS [UserKey] FROM [OpenUserVisit]
INSERT INTO [#MSCSTempLogUser] SELECT [Visit].[UserKey] AS [UserKey] FROM [Visit]
INSERT INTO [#MSCSTempLogUser] SELECT [VisitInfo].[UserKey] AS [UserKey] FROM [VisitInfo]
INSERT INTO [#MSCSTempLogUser] SELECT [Request].[UserKey] AS [UserKey] FROM [Request]
DELETE FROM [LogUser] WHERE ([LogUser].[UserKey] NOT IN 
   (SELECT [#MSCSTempLogUser].[UserKey] FROM [#MSCSTempLogUser])) 
GO

See Also

Running the Data Deletion DTS Task

ETL Process for the Data Deletion DTS Task

Scripting for the Data Deletion DTS Task

Best Practices for Data Warehouse

Workflow for Running the DTS Tasks

Troubleshooting the Data Warehouse Import Process

Copyright © 2005 Microsoft Corporation.
All rights reserved.