Data Warehouse: Concepts on ETL Processes (Transforming)










Data Warehouse: Four (4) typical transformation and loading scenarios

(a) Key Lookup Scenario
(b) Business Rule Violation Scenario
(c) Data Error Scenario
(d) Pivoting Scenario



(a) Key Lookup Scenario
  • You need to load sales transaction data into a retail Data Warehouse
  • In the operational system, each product is identified by a Universal Product Code (UPC);
  • In the Data Warehouse, each product is identified by a product_id.
  • You need to
(1) map UPC --> PRODUCT_ID; and
(2) as you load the source data, look up the map table and transform the data accordingly (see the sketch below)
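
A minimal sketch of the lookup-based load (the table and column names sales_src, product_map, sales_fact, upc_code, etc. are illustrative assumptions, not part of the original material):

    -- Load the fact table, translating each UPC into the warehouse product_id
    -- by joining against the UPC-to-product_id mapping table.
    INSERT INTO sales_fact (product_id, store_id, sale_date, quantity, amount)
    SELECT m.product_id, s.store_id, s.sale_date, s.quantity, s.amount
    FROM   sales_src s
           JOIN product_map m ON m.upc_code = s.upc_code;

Note that a plain inner join silently drops source rows whose UPC has no mapping, which is exactly the issue addressed in the next scenario.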



(b) Business Rule Violation Scenario
  • Some UPC Codes are invalid
  • In which ways can you implement the transformation and handle the invalid UPCs?
  • i.e., new sales data might not have valid UPC codes
(1) use Create Table As Select (CTAS) to identify and log invalid UPCs
(2) use an outer join (+) and convert invalid UPCs to NULL
(3) use multi-table INSERT and separate the invalid UPCs to a different table

(1) use CTAS to identify and log invalid UPCs
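
A possible sketch, reusing the illustrative names from above (invalid_upc_log is also an assumed name):

    -- Use CTAS to capture source rows whose UPC has no match in the mapping
    -- table, so they can be reviewed and corrected later.
    CREATE TABLE invalid_upc_log AS
    SELECT s.*
    FROM   sales_src s
    WHERE  NOT EXISTS (SELECT 1
                       FROM   product_map m
                       WHERE  m.upc_code = s.upc_code);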

(2) use an outer join (+) and convert invalid UPCs to NULL
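
A possible sketch using Oracle's (+) outer-join notation, with the same illustrative names:

    -- Load every source row; rows with an unknown UPC get a NULL product_id
    -- because the mapping table is the optional ((+)) side of the join.
    INSERT INTO sales_fact (product_id, store_id, sale_date, quantity, amount)
    SELECT m.product_id, s.store_id, s.sale_date, s.quantity, s.amount
    FROM   sales_src s, product_map m
    WHERE  s.upc_code = m.upc_code (+);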

(3) use multi-table INSERT and separate the invalid UPCs to a different table
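
A possible sketch using a multi-table INSERT (sales_rejected is an assumed name for the table that receives the invalid rows):

    -- Route rows with a valid mapping to the fact table and all other rows
    -- to a separate table for later inspection.
    INSERT FIRST
      WHEN product_id IS NOT NULL THEN
        INTO sales_fact     (product_id, store_id, sale_date, quantity, amount)
        VALUES              (product_id, store_id, sale_date, quantity, amount)
      ELSE
        INTO sales_rejected (upc_code, store_id, sale_date, quantity, amount)
        VALUES              (upc_code, store_id, sale_date, quantity, amount)
    SELECT m.product_id, s.upc_code, s.store_id, s.sale_date, s.quantity, s.amount
    FROM   sales_src s, product_map m
    WHERE  s.upc_code = m.upc_code (+);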



(c) Data Error Scenarios
  • If the quality of the data is unknown, you may face unexpected errors, such as a failed datatype conversion or a constraint violation.
  • Such errors can happen independently of whether the UPC for the particular record is valid or not.
  • Here you can use a DML error logging table to capture unexpected errors (see the sketch below).
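
A possible sketch with Oracle DML error logging, again with illustrative names (err_sales_fact is the assumed error logging table, 'daily_load' an arbitrary tag):

    -- Create the error logging table once for the target table.
    BEGIN
      DBMS_ERRLOG.CREATE_ERROR_LOG(dml_table_name     => 'SALES_FACT',
                                   err_log_table_name => 'ERR_SALES_FACT');
    END;
    /

    -- Rows that fail (e.g. datatype conversion or constraint violations) are
    -- written to the error table instead of aborting the whole load.
    INSERT INTO sales_fact (product_id, store_id, sale_date, quantity, amount)
    SELECT m.product_id, s.store_id, s.sale_date, s.quantity, s.amount
    FROM   sales_src s, product_map m
    WHERE  s.upc_code = m.upc_code (+)
    LOG ERRORS INTO err_sales_fact ('daily_load') REJECT LIMIT UNLIMITED;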

Data Warehouse with Oracle Data Integrator (ODI) (II)





The need for automated integration solutions

Challenges to data integration:
  • Terabytes to petabytes of data
  • Phenomenal data explosion
  • DI/ETL market growing at 15%
  • Most enterprises looking to move away from scripts to solutions
  • Demand for near real-time/real-time data integration
  • Explosion in the number of sources: from 10s to 1000s

Use of ETL tools: (Oct 2011 Forrester survey)
  • 90% of enterprises still use batch data loading into a DW or BI platform
  • 50% performing real-time/near real-time integration
  • 47% performing batch database migrations

New data integration requirements:
  • support for all types of data (structured, semi-structured, unstructured)
  • Support for Big Data (integrate with Hadoop)

Top three benefits of using Oracle Data Integrator
  • improvement in project completion (better resource management, ability to reuse code, less manual hand-coding, fewer errors and less troubleshooting)
  • shorter reporting cycle
  • Deferred hardware upgrade
( source: here )



About Oracle Data Integrator

  • As with Oracle Warehouse Builder (OWB), ODI uses the database (an ISO SQL-92 compliant RDBMS) as the ETL engine
  • ODI supports any ISO SQL-92 compliant RDBMS (Oracle, DB2, SQL Server, etc.)
  • Supports batch, event-based and real-time integration
  • Extensible through Knowledge Modules
  • The ODI engine runs directly on the source and target systems
  • ODI is strong on heterogeneous application sources
  • ODI includes application adapters for various application sources

Benefits and improvements:
  • Simplifies the creation of data flows through a declarative methodology, reducing the number of steps
  • Unifies integration tooling across unstructured, semi-structured and structured data
  • Extends functionality via Knowledge Modules for integrating with applications out of the box and for providing new customizations
  • Integrates management using Oracle Enterprise Manager (OEM) to simplify operations across all middleware, applications and databases



Using ODI: Software Architecture





ODI architecture is built around several components:
  • ODI Studio - Graphical client
  • Agents - Java modules that run on the source, the target, or a WebLogic server
  • ODI Console - web console that runs on WebLogic Server
  • Extension for the Fusion Middleware Control Console
  • Repositories - Master and Work repositories










Designer module

  • Defines declarative rules for data transformation and data integrity.
  • All project development occurs here
  • Database and application metadata are imported and defined here
  • Uses metadata and rules to generate data integration scenarios or load plans for production
  • Core module for developers and metadata administrators

Operator module

  • Monitors data integration processes in production
  • Displays execution logs (error counts, number of processed rows, execution statistics, executed code, etc.)
  • Can be used for debugging at design time

Topology module

  • Defines the physical and logical architecture of the infrastructure
  • Infrastructure or project administrators register servers, database schemas, database catalogs and agents in the master repository

Security module
  • Manages user profiles and privileges
  • Assigns access authorizations to objects and features





  • The Agent coordinates the execution of the ODI scenarios. It
(a) retrieves the code stored in the ODI repository,
(b) connects to the various source and target systems and
(c) orchestrates the overall data integration process
(d) when execution is completed, updates the execution logs in the repository and reports error messages and execution statistics. These can be reviewed in the Operator navigator in ODI Studio, or through the web interfaces.

Two types of Agents:
  • Standalone Agent - can be installed on the source or target systems. Requires a JVM.
  • Java EE Agent - Deployed on Oracle WebLogic Server.











  • The ODI Repository consists of a Master Repository and one or more Work repositories.
  • Repositories are sets of tables that can be stored in an RDBMS (e.g. Oracle, SQL Server, DB2).
  • All objects used by the various ODI modules are stored in the repositories

  • The Master Repository contains
    • Security information (user profiles and privileges)
    • topology information
    • source code for all versions of all ODI objects

  • Several work repositories can coexist in the same installation (development, quality control, production)
  • Work Repositories store information for
    • Models (i.e. metadata) - datastores, columns, data integrity constraints, cross references, data lineage and impact analysis
    • Projects - interfaces, packages, procedures, folders, knowledge modules, variables
    • Runtime information - scenarios, load plans, scheduling information and logs




Big Data, Deep Data, and who needs apples?

So, they took as input "Big Data" from a double-hammer and not much later they had "f = ma" as output.
Or how to generate new laws from context-less data?
I enjoyed this story...