Data Warehouse with Oracle Data Integrator (ODI)





Building a Data Warehouse: typical steps and milestones

  • Research and specify business needs (Key Indicators)
  • Identify data sources relevant to generate key indicators
  • Define business rules to transform source information into key indicators
  • Model the data structure of the target warehouse to store the key indicators
  • Populate the indicators by implementing business rules
  • Measure the overall accuracy of the data by setting up data quality rules
  • Develop reports on key indicators
  • Make key indicators and metadata available to business users through ad-hoc query tools or predefined reports
  • Measure business users’ satisfaction and add/modify key indicators


Using Oracle Data Integrator (ODI) in a Data Warehouse project: actors involved and some assigned tasks



Business User
  • Access the final calculated key indicators
  • Use reports and ad-hoc queries
  • May need to understand the definition of an indicator
  • May need to be aware of data quality issues
    • when was the last time the table was updated?
    • How many records were added, update, removed in the table?
    • What are the rules that calculate a particular indicator?
    • Where does the data come from, and how is it transformed?

Business Analyst
  • Define the key indicators
  • Identify the source applications
    • How many different source applications need to be considered?
    • Is the data needed for key indicators available in the selected pool of source applications?
    • What data quality issues are present in the source systems?
  • Specify business rules to transform source data into meaningful target indicators
    • Projects can use the ODI Designer to directly specify the business rules
    • For each target table, specify also:
      • Target datastore - name the target datastore
      • Description of transformation - describe its purpose
      • Integration strategy - How data should be written to target (replace table, append, update, etc). Each strategy specified will correspond to an ODI Integration Knowledge Module.
      • Refresh frequency
      • Dependencies - what datastores need to be loaded or jobs executed prior to this one
      • Source datastores - source databases, applications, etc used
      • Source ODI Model -
      • Datastore name, Fiel Mappings and transformations, Links or Join criteria, Filters, Data Quality requirements, constraint names and expressions, etc
  • Maintain translation data from operational semantics to the Data Warehouse semantic

Developer
  • Implement the business rules as specified by business analysts
  • Provide executable scenarios to the production team
  • Must understand infrastructure details and have business knowledge of source applications

Metadata Administrator
  • Reverse engineer source and target applications
    • Understand content and structure of source applications
    • Connect to source applications and capture their metadata
    • Define data quality business rules (specified by Business Analyst) in ODI repository
      • What level of Data Quality is required?
      • who are the business owners of source data?
      • What should be done with rejected records?
      • There should be an error recycling strategy?
      • How would business users modify erroneous source data?
      • Should a GUI be provided for source data correction?
  • Guarantee the overall consistency of Metadata in the various environments (dev, test, prod: repository)
  • Participate in the data modeling of key indicators
  • Add comments, descriptions and integrity rules (PK, FK, Check, etc)in the metadata
  • Provide version management

Database Administrator
  • Define technical database structure supporting the various layers in the data warehouse project (and ODI structure)
  • Create database profiles needed by ODI
  • Create schemas and databases for the staging areas
    • Describe columns inside the data dictionary of database - COMMENT ON TABLE/COLUMN
    • Avoid using PKs of source systems as PKs in target tables. Use counter or identity columns instead.
    • Design Referential Integrity and reverse engineerd FKs in ODI models.
    • Do not implement RIs on target database (for performance). Data quality control should guarantee data integrity.
    • Standardize obejct naming conventions
  • Distributed and maintain the descriptions of the environments (topology)

System Admin
  • Maintain technical resources for the Data Warehouse project
  • May install and monitor schedule agents
  • May backup and restore repositories
  • Install and monitor ODI console
  • Setup the various enviroments (dev, test, prod)

Security Admin
  • Define the security policy for the ODI Repository
  • Creates ODI users and grant rights on models, projects and contexts

Operator
  • Import released and tested scenarios into production environment
  • Schedule the execution of production scenarios
  • Monitor execution logs and restart failed sessions




Conceptual Architecture of an ODI solution

  • The ODI Repository is the central component.
  • ODI Repository stores configuration information about
    • IT infrastructure and topology
    • metadata of all applications
    • projects
    • Interfaces
    • Models
    • Packages
    • Scenarios
    • Solutions - versioning control
  • You can have various repositories within an IT infrastructure, linked ti separated environments that exchange metadata and scenarios (i.e. development, test, user acceptance test, production, hot fix)

Example: two environments, components, actors and tasks


(source: Oracle 2008)



Suggested environment configuration for a Data Warehouse project with ODI:


(1) A single master repository:
  • holds all the topology and security information.
  • All the work repositories are registered here.
  • Contains all the versions of objects that are committed by the designers.

(2) “Development” work repository:
  • Shared by all ODI designers
  • Holds all the projects and models under development.

(3) “Testing” work repository
  • shared by the IT testing team.
  • Contains all the projects and models being tested for future release.

(4) “User Acceptance Tests” work repository:
  • shared by the IT testing team and business analysts.
  • Contains all the projects and models about to be released.
  • Business analysts will use the ODI Console on top of this repository to validate the scenarios and transformations before releasing them to production.

(5) “Production” work repository
  • shared by the production team, the operators and the business analysts.
  • Contains all the projects and models in read-only mode for metadata lineage, as well as all the released scenarios.

(6) “Hot fix” work repository:
  • shared by the maintenance team and the development team.
  • Usually empty, but whenever a critical error happens in production, the maintenance team restores the corresponding projects and models in this repository and performs the corrections with the help of the development team.
  • Once the problems are solved, the scenarios are released directly to the production repository and the new models and projects are versioned in the master repository.



Typical environment (I):


(source: Oracle 2010)



Typical environment (II): Separate Master Repository for Production


(source: Oracle 2010)

Wanderlust

(coffee break)
Spending lots of your time configuring RAC, freshing MVs, checking Grid Control charts..
So I go for a walk. Wander about, wonder...

Oracle On-line Analytic Processing (OLAP) (I)



OLAP Cubes can play three (3) roles in an Oracle Installation:


(a) Full-featured, fully integrated multidimensional server
  • Applications can query all data using a dimensional query language (Oracle OLAP API or MDX)
  • OLAP engine runs within the kernel of Oracle Database.
  • Dimensional objects (cubes, measures, dimensions, levels, hierarchies, attributes) are stored in Oracle Database in their native multidimensional format.
  • Cubes and other dimensional objects are first class data objects represented in the data dictionary.
  • Data security is administered in the standard way, by granting and revoking privileges to users and roles.
  • Applications can query dimensional objects using SQL.

(b) A Summary management solution
  • The Oracle cube is used to manage summary data and is accessed transparently by SQL-based BI applications as a Cube-organized Materialized View using Query rewrite feature.
  • Cube-Organized MVs were introduced on Oracle 11g, and play the same role as table-based MVs
  • Cube-organized MVs allow any application to transparently access summary data managed by the cube.
  • Cube-organized MVs provide substantial query performance enhancement with much lower maintenance overhead than table-based MV alternatives
  • With Cube-organized MVs, applications query the detail tables and the database automatically rewrites the query to access summary data in the materialized view.

To expose the cube as a Materialized View, you can use the Analytic Workspace Manager:

Both detail and summary data in the cube are exposed in the Materialized View.
When the detail tables are queried, the optimizer rewrite the query to use the cube-organized materialized view
SQL> select t.calendar_year_name,
       p.department_name, 
       cu.region_name,
       sum(s.quantity) as quantity, 
       sum(s.sales) as sales
from times t, 
     customers cu, 
     products p,
     sales_fact s
where cu.customer_key = s.customer
  and p.item_key = s.product
  and s.day_key  = t.day_key
group by t.calendar_year_name, p.department_name, cu.region_name;


Explain plan: 
OPERATION
-----------------------------------

SELECT STATEMENT
  HASH
    CUBE SCAN CB$SALES_CUBE


(c) A supplier of rich analytic content
  • OLAP cubes are exposed as Cube views.
  • Cube views include detail and aggregate data as individual columns (measures)
  • Cube views can list complex analytic content, including hiearchical aggregations, statistical forecasts, additive and non-additive aggregations, calculated measures, etc


Cubes and Dimensions seen from a relational perspective

Since Oracle provides a complete SQL interface to the OLAP cube:
  • Cubes and Dimensionas can be thought of as relational objects.
  • Cubes as relational objects that offer improved performance and advanced analytic content, and
  • Dimensions as relational objects that include columns with information useful to creating hiearchical queries.
  • The cube is simply another data type in the database.




Some benefits OLAP Cubes bring to Business Intelligence applications


(a) Improved query performance for Ad-hoc query patterns
  • As query patterns become less predictable, creation and maintenance of materialized views for specific queries becomes impractical.
  • Consider a data model with four dimensions (time, customer, product, channel), each with six levels of summarization (i.e in the time dimension: day, week, month, quarter, half year and year).
  • In this case, there are (4**6 -1) = 4095 possible combinations representing sumary level data that users might query.

(b) Improved query performance for summary data
(c) Fast incremental update
(d)Rich analytic content
(e)Metadata that describes the logical business model and the relational representations of the cube



(Q) What two categories can OLAP Metadata be grouped into?

(a) Metadata about the Cube's structure, data, and how it is calculated
(b) Metadata about how the cube is represented for query using SQL cube views
  • Description of the cube's structure include information on: dimensions, hierarchies, levels, attributes and measures.
  • For measures, metadata is available describing how the cube is calculated and the calculation expression of a measure.
SQL> select table_name from dict where table_name like '%CUBE%';
TABLE_NAME
------------------------------
...
DBA_CUBES
DBA_CUBE_ATTRIBUTES
DBA_CUBE_ATTR_VISIBILITY
DBA_CUBE_BUILD_PROCESSES
DBA_CUBE_CALCULATED_MEMBERS
DBA_CUBE_DIMENSIONALITY
DBA_CUBE_DIMENSIONS
DBA_CUBE_DIM_LEVELS
DBA_CUBE_DIM_MODELS
DBA_CUBE_DIM_VIEWS
DBA_CUBE_DIM_VIEW_COLUMNS

TABLE_NAME
------------------------------
DBA_CUBE_HIERARCHIES
DBA_CUBE_HIER_LEVELS
DBA_CUBE_HIER_VIEWS
DBA_CUBE_HIER_VIEW_COLUMNS
DBA_CUBE_MEASURES
DBA_CUBE_VIEWS
DBA_CUBE_VIEW_COLUMNS
...

=> Checking the existing dimensions in SALESTRACK analytic workspace 
SQL> select * from dba_cube_dimensions;

OWNER    DIMENSION_NA DIMENSION_TYPE   AW_NAME      DEFAULT_HIERARCHY_NAME       DESCRIPTION
---------- ------------ ----------------- ------------ ------------------------------ --------------------
OLAPTRAIN  PRODUCT2 STANDARD   SALESTRACK   STANDARD         product2
OLAPTRAIN  CHANNEL STANDARD   SALESTRACK   SALES_CHANNEL        Channel
OLAPTRAIN  GEOGRAPHY STANDARD   SALESTRACK   REGIONAL         Geography
OLAPTRAIN  PRODUCT STANDARD   SALESTRACK   STANDARD         Product
OLAPTRAIN  TIME  TIME    SALESTRACK   CALENDAR         Time

You can use SQL Developer to:
  • check existing hierachies, levels and order within hierarchies in the 'PRODUCT2' dimension.


  • List existing cubes, measures within the cube, whether a measure is derived or stored and the equation that generates that measure:



Cube Views
  • OLAP Cubes are exposed through views on a star schema.
  • As the structure of dimensions and cubes are updated,views are automatically maintained
  • If you add new measures, the cube view will be updated
  • Views are always in sync with the underlying dimensions and cubes
  • A single cube view exposes (as columns) all the aggregations and all the calculations for a cube
SQL> select * from dba_cube_views;

OWNER    CUBE_NAME    VIEW_NAME
---------- --------------- -------------------
OLAPTRAIN  FORECAST    FORECAST_VIEW
OLAPTRAIN  SALES_CUBE    SALES_CUBE_VIEW
OLAPTRAIN  NEW_CUBE    NEW_CUBE_VIEW

SQL> select owner, cube_name, view_name, column_name from dba_cube_view_columns;

OWNER    CUBE_NAME    VIEW_NAME     COLUMN_NAME
---------- --------------- ------------------------------ ------------------------------
OLAPTRAIN  FORECAST    FORECAST_VIEW    BEST_FIT
OLAPTRAIN  FORECAST    FORECAST_VIEW    LINEAR_REGRESSION
OLAPTRAIN  SALES_CUBE    SALES_CUBE_VIEW    SALES
OLAPTRAIN  SALES_CUBE    SALES_CUBE_VIEW    QUANTITY
OLAPTRAIN  SALES_CUBE    SALES_CUBE_VIEW    SALES_YTD
OLAPTRAIN  SALES_CUBE    SALES_CUBE_VIEW    SALES_YTD_PY
...

=> The Cube's SQL interface is also available in SQL Developer: