Deleting Orphan Data from the Data Warehouse

During typical operation of the Data Warehouse, you should schedule the Data Deletion DTS task to clear fact data regularly. The data that the Data Deletion DTS task removes is typically the largest amount of data in the Data Warehouse; for example, requests, visits, and the content of the VisitInfo table.

Over time, you might experience slower import times and want to delete orphaned data of particular dimensions that grow large, for example, LogUser, URI, and IPRef.

Orphans are instances of data in the parent class, in the context of a parent-child relationship, that do not have any associated child instances. For example, in the Data Warehouse, the OrderFormheader class is the parent class for the OrderFormLineItems class. This means for each instance in the OrderFormLineItems (childClass) there is a corresponding instance in the OrderFormHeader (parentClass). If a delete operation deletes some instances in OrderFormLineItems, the corresponding instances in OrderFormHeader (parent class) are orphans.

You can delete the instances of orphaned data from the Data Warehouse by issuing a delete command that uses the DELETE ORPHANS clause against the parent class by using the Commerce OLE DB Provider.

You cannot use the delete command that uses the DELETE ORPHANS clause to remove orphaned data from the URI, LogUser and URIQuery classes because they are parent classes for several facts, such as visit and requests, and their parent-child relationships are enforced in the log import task instead of by the Commerce OLE DB Provider.

A delete command without the DELETE ORPHANS clause will not work against the URI, LogUser and URIQuery classes. This is by design.

In order to delete the orphaned data within these tables, you must call the DeleteOrphans_UriQuery, DeleteOrphans_Uri, and DeleteOrphans_LogUser stored procedures. You can call these stored procedures by using the following syntax in SQL Query Analyzer:

exec DeleteOrphans_UriQuery
exec DeleteOrphans_Uri
exec DeleteOrphans_LogUser

The DeleteOrphan stored procedures do not work on the following dimensions:

  • LogUser

  • URI

  • UriQuery

These dimensions participate in an application-level relationship that the OLE DB Provider cannot process.

You must update the DeleteOrphans_Uri and DeleteOrphans_LogUser stored procedures whenever you define a new or custom Commerce Event.

Use the following code to update the DeleteOrphans_Uri stored procedure:

CREATE PROCEDURE DeleteOrphans_UriQuery
AS
SELECT [QueryStringKey] INTO [#MSCSTempUriQuery] FROM [UriQuery] WHERE (0 = 1)
CREATE UNIQUE CLUSTERED INDEX [URI_del_temp_idx] ON [#MSCSTempUriQuery]([QueryStringKey]) WITH IGNORE_DUP_KEY
INSERT INTO [#MSCSTempUriQuery] SELECT [Request].[QueryStringKey] AS [QueryStringKey] FROM [Request]
INSERT INTO [#MSCSTempUriQuery] SELECT [RequestByDateByUriByQueryString].[QueryStringKey] AS [QueryStringKey] FROM [RequestByDateByUriByQueryString]
DELETE FROM [UriQuery] WHERE ([UriQuery].[QueryStringKey] NOT IN 
   (SELECT [#MSCSTempUriQuery].[QueryStringKey] FROM [#MSCSTempUriQuery])) 
GO
 
CREATE PROCEDURE DeleteOrphans_URI
AS
SELECT [UriKey] INTO [#MSCSTempURI] FROM [URI] WHERE (0 = 1)
CREATE UNIQUE CLUSTERED INDEX [URI_del_temp_idx] ON [#MSCSTempURI]([UriKey]) WITH IGNORE_DUP_KEY
INSERT INTO [#MSCSTempURI] SELECT [Basket].[UriKey] AS [UriKey] FROM [Basket]
INSERT INTO [#MSCSTempURI] SELECT [CampaignEvent].[UriKey] AS [UriKey] FROM [CampaignEvent]
INSERT INTO [#MSCSTempURI] SELECT [FirstUriByDate].[FirstUriKey] AS [UriKey] FROM [FirstUriByDate]
INSERT INTO [#MSCSTempURI] SELECT [HitsInfo].[UriKey] AS [UriKey] FROM [HitsInfo]
INSERT INTO [#MSCSTempURI] SELECT [LastUriByDate].[LastUriKey] AS [UriKey] FROM [LastUriByDate]
INSERT INTO [#MSCSTempURI] SELECT [OpenUserVisit].[FirstUriKey] AS [UriKey] FROM [OpenUserVisit]
INSERT INTO [#MSCSTempURI] SELECT [OpenUserVisit].[SecondUriKey] AS [UriKey] FROM [OpenUserVisit]
INSERT INTO [#MSCSTempURI] SELECT [OpenUserVisit].[LastUriKey] AS [UriKey] FROM [OpenUserVisit]
INSERT INTO [#MSCSTempURI] SELECT [Order].[UriKey] AS [UriKey] FROM [Order]
INSERT INTO [#MSCSTempURI] SELECT [Request].[UriKey] AS [UriKey] FROM [Request]
INSERT INTO [#MSCSTempURI] SELECT [RequestByDateByUriByQueryString].[UriKey] AS [UriKey] FROM [RequestByDateByUriByQueryString]
DELETE FROM [URI] WHERE ([URI].[UriKey] NOT IN 
   (SELECT [#MSCSTempURI].[UriKey] FROM [#MSCSTempURI])) 
GO

CREATE PROCEDURE DeleteOrphans_LogUser
AS
SELECT [UserKey] INTO [#MSCSTempLogUser] FROM [LogUser] WHERE (0 = 1)
CREATE UNIQUE CLUSTERED INDEX [LogUser_del_temp_idx] ON [#MSCSTempLogUser]([UserKey]) WITH IGNORE_DUP_KEY
INSERT INTO [#MSCSTempLogUser] SELECT [Basket].[UserKey] AS [UserKey] FROM [Basket]
INSERT INTO [#MSCSTempLogUser] SELECT [Order].[UserKey] AS [UserKey] FROM [Order]
INSERT INTO [#MSCSTempLogUser] SELECT [CampaignEvent].[UserKey] AS [UserKey] FROM [CampaignEvent]
INSERT INTO [#MSCSTempLogUser] SELECT [OpenUserVisit].[UserKey] AS [UserKey] FROM [OpenUserVisit]
INSERT INTO [#MSCSTempLogUser] SELECT [Visit].[UserKey] AS [UserKey] FROM [Visit]
INSERT INTO [#MSCSTempLogUser] SELECT [VisitInfo].[UserKey] AS [UserKey] FROM [VisitInfo]
INSERT INTO [#MSCSTempLogUser] SELECT [Request].[UserKey] AS [UserKey] FROM [Request]
DELETE FROM [LogUser] WHERE ([LogUser].[UserKey] NOT IN 
   (SELECT [#MSCSTempLogUser].[UserKey] FROM [#MSCSTempLogUser])) 
GO

See Also

Other Resources

Deleting Data from the Data Warehouse