Chapter 10. Mining the Web-House

Table of Contents

The Portal as an information source
Architecture
Collecting data: logging and the role of meta-data
Analytics
How to flow the results back
External Information Sources

This chapter reverses our point of view: from building a portal that sells services and products to collecting information about customers during their visits. The two views are closely connected, and that is one of the larger problems: a portal is at the same time an offer (information, services or products) and a measuring tool. The measurements in turn depend on the content the portal offers, and that content is supposedly adjusted once the measurements have been analyzed and the results have flowed back into the portal. Yet the two roles of a portal differ greatly in the technology used and the people involved. The "output" side is all real-time and driven by business ideas. The input side needs much more time for analysis, which is usually done by analysts and by specialists in data-mining and data-warehouses.

Purpose

The purpose of this chapter is to clarify the why and how of integrating a data-warehouse with the enterprise portal, not to provide an introduction to data-warehouses or data-mining in general. Examples of analytics run in the warehouse serve only to define the information needed from the portal.

But before diving into the technicalities, it pays to think about the reasons for the increasing interest in "web-houses".

Example 10.1. An artificial advisor

The banking business has always been service-intensive. Bank personnel knew their customers (this was actually the basis of their credit business) and at the same time knew how the bank worked internally. They knew the applications and how to use them to extract information for customers.

Figure 10.1.

Nowadays banks need to cut personnel costs, and one way to do so is to provide their services through a customer-friendly (read: personalized) portal. We can learn about the requirements of such a portal from the functions that client advisors traditionally provided. These functions fall into two areas: learning information about the customer as input for the banking systems, and using the banking systems to extract, transform and aggregate information for the customer. Together, the two areas form what business calls Customer Relationship Management (CRM).

The Portal as an information source

The information that can be collected through a portal is a direct result of what the portal provides in its user interface. This includes behavioral information (e.g. clickstream data about page impressions), but also transaction information (e.g. orders) and information coming from collaborative services (e.g. forum activity). Customizations performed by a user (e.g. setting filters for news or research) are typically indicators of special interests. A search interface also provides information about interests.
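As a minimal sketch of how these kinds of information might be represented before they are loaded into a warehouse, the Python fragment below defines a single event record. The names PortalEvent, EventKind and interest_hints, as well as all field names, are hypothetical and chosen for illustration only; they are not taken from any particular portal product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class EventKind(Enum):
    """Kinds of information a portal can collect (hypothetical names)."""
    PAGE_IMPRESSION = "page_impression"  # behavioral / clickstream data
    TRANSACTION = "transaction"          # e.g. an order
    COLLABORATION = "collaboration"      # e.g. forum activity
    CUSTOMIZATION = "customization"      # e.g. a news or research filter
    SEARCH = "search"                    # a query entered in the search interface


@dataclass
class PortalEvent:
    """One observation made through the portal's user interface."""
    kind: EventKind
    session_id: str
    timestamp: datetime
    user_id: Optional[str] = None        # None for unauthenticated visitors
    details: dict = field(default_factory=dict)


def interest_hints(events):
    """Return only the events that point directly at a user's interests."""
    wanted = (EventKind.CUSTOMIZATION, EventKind.SEARCH)
    return [e for e in events if e.kind in wanted]


now = datetime.now(timezone.utc)
sample = [
    PortalEvent(EventKind.PAGE_IMPRESSION, "s1", now, details={"page": "/funds"}),
    PortalEvent(EventKind.SEARCH, "s1", now, details={"query": "bond funds"}),
    PortalEvent(EventKind.CUSTOMIZATION, "s1", now, "u42", {"news_filter": "asia"}),
]
hints = interest_hints(sample)
```

Keeping all event kinds in one record type is only one possible design. It makes the later load into the warehouse uniform, at the price of an untyped details field.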

Figure 10.2.

An important question for an enterprise portal is whether users have to be authenticated. Without authentication it is much harder to personalize a site during a session, because of the real-time requirements and the lack of information. Market-basket analysis in supermarkets sometimes tries to deduce individual customers from shopping sequences using e.g. "phenomenal information".
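The effect of authentication on the available information can be illustrated with a small sketch. It assumes, purely for illustration, that each logged event is a tuple of (user id or None, session id, page); the function names visitor_key and group_by_visitor are hypothetical. Authenticated visitors accumulate a history across sessions, while anonymous visitors can only be followed within one session.

```python
from collections import defaultdict


def visitor_key(user_id, session_id):
    """Prefer the authenticated identity; fall back to the anonymous session."""
    if user_id:
        return ("user", user_id)
    return ("session", session_id)


def group_by_visitor(events):
    """Group visited pages by visitor.

    events: iterable of (user_id or None, session_id, page) tuples
    (an assumed log layout, not a real portal format).
    """
    history = defaultdict(list)
    for user_id, session_id, page in events:
        history[visitor_key(user_id, session_id)].append(page)
    return dict(history)


log = [
    ("alice", "s1", "/home"),
    (None, "s2", "/news"),
    ("alice", "s3", "/funds"),   # same user, later session
    (None, "s4", "/news"),       # possibly the same person as s2, but unknowable
]
grouped = group_by_visitor(log)
```

In this sketch the two anonymous visits stay separate even if they came from the same person, which is exactly the lack of information that makes personalization without authentication hard.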