(coffee break)
I have admired Roman Loranc's photographs for a long time.
Born in Poland, Loranc moved to California years ago and has been making incredible photographs.
Check out his website and the interview with him below.
(a) Key Lookup Scenario
(b) Business Rule Violation Scenario
(c) Data Error Scenario
(d) Pivoting Scenario
(a) Key Lookup Scenario
- You need to load sales transaction data into a retail Data Warehouse
- In the operational system, each product is identified by its Universal Product Code (UPC);
- In the Data Warehouse, each product is identified by a product_id.
- You need to
(1) map UPC --> PRODUCT_ID; and
(2) as you load the source data, look up the map table and transform the data accordingly (see the sketch below).
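A minimal sketch of step (2) in plain SQL. The table and column names (SALES_STG for the staged source rows, PRODUCT_MAP for the UPC-to-product_id map, SALES_FACT for the warehouse target) are illustrative assumptions, not part of the original notes.

  -- Assumed illustrative tables: SALES_STG(upc, qty, amount),
  -- PRODUCT_MAP(upc, product_id), SALES_FACT(product_id, qty, amount)
  INSERT INTO sales_fact (product_id, qty, amount)
  SELECT m.product_id, s.qty, s.amount
  FROM   sales_stg   s
  JOIN   product_map m ON m.upc = s.upc;  -- an inner join silently drops unknown UPCs,
                                          -- which is what scenarios (b) and (c) address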
(b) Business Rule Violation Scenario
- Some UPC Codes are invalid
- In which ways can you implement the transformation and handle the invalid UPCs?
- e.g., new sales data might not have valid UPCs yet
(1) use Create Table As Select (CTAS) to identify and log invalid UPCs
(2) use an outer join (+) and convert invalid UPCs to NULL
(3) use a multi-table INSERT and route the invalid UPCs to a separate table
(see the sketches below)
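Minimal sketches of the three approaches, reusing the assumed SALES_STG / PRODUCT_MAP / SALES_FACT tables from above; INVALID_UPC_LOG and INVALID_SALES are likewise hypothetical names.

  -- (1) CTAS: materialize and log the rows whose UPC has no mapping
  CREATE TABLE invalid_upc_log AS
  SELECT s.*
  FROM   sales_stg s
  WHERE  NOT EXISTS (SELECT 1 FROM product_map m WHERE m.upc = s.upc);

  -- (2) Outer join (+): unmapped UPCs load with a NULL product_id
  INSERT INTO sales_fact (product_id, qty, amount)
  SELECT m.product_id, s.qty, s.amount
  FROM   sales_stg s, product_map m
  WHERE  s.upc = m.upc (+);

  -- (3) Multi-table INSERT: route unmapped UPCs to a separate table
  INSERT FIRST
    WHEN product_id IS NOT NULL THEN
      INTO sales_fact    (product_id, qty, amount) VALUES (product_id, qty, amount)
    ELSE
      INTO invalid_sales (upc, qty, amount)        VALUES (upc, qty, amount)
  SELECT s.upc, m.product_id, s.qty, s.amount
  FROM   sales_stg s, product_map m
  WHERE  s.upc = m.upc (+);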
(c) Data Error Scenario
- If the quality of the data is unknown, you may face unexpected errors, such as a failed datatype conversion or a constraint violation.
- Such errors can happen regardless of whether the UPC for a particular record is valid.
- Here you can use a DML error logging table to capture the unexpected errors (see the sketch below).
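A minimal sketch of DML error logging, again with the assumed tables above: DBMS_ERRLOG.CREATE_ERROR_LOG creates the default ERR$_SALES_FACT log table once, and the LOG ERRORS clause then diverts failing rows to it instead of aborting the load.

  -- One-time setup: create the default error-log table ERR$_SALES_FACT
  BEGIN
    DBMS_ERRLOG.CREATE_ERROR_LOG(dml_table_name => 'SALES_FACT');
  END;
  /

  -- Load as before; rows that fail conversion or constraint checks go to the error log
  INSERT INTO sales_fact (product_id, qty, amount)
  SELECT m.product_id, s.qty, s.amount
  FROM   sales_stg s, product_map m
  WHERE  s.upc = m.upc (+)
  LOG ERRORS INTO err$_sales_fact ('nightly_load')  -- tag identifying this run
  REJECT LIMIT UNLIMITED;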
Challenges to data integration:
- Terabytes to petabytes of data
- Phenomenal data explosion
- DI/ETL market growing at 15%
- Most enterprises looking to move away from scripts to solutions
- Demand for near real-time/real-time data integration
- Explosion in the number of sources: from 10s to 1000s
Use of ETL tools: (Oct 2011 Forrester survey)
- 90% of enterprises still use batch data loading into a DW or BI platform
- 50% perform real-time/near real-time integration
- 47% perform batch database migrations
New data integration requirements:
- Support for all types of data (structured, semi-structured, unstructured)
- Support for Big Data (integration with Hadoop)
Top three benefits of using Oracle Data Integrator
(source: here)
- Improvement in project completion (resource management, ability to reuse code, less manual hand-coding, fewer errors, easier troubleshooting)
- Shorter reporting cycle
- Deferred hardware upgrades
ODI Studio modules (navigators):
- Designer module
- Operator module
- Topology module
- Security module
The ODI Agent:
(a) retrieves the code stored in the ODI repository,
(b) connects to the various source and target systems, and
(c) orchestrates the overall data integration process.
(d) When execution is complete, the agent updates the execution logs in the repository and reports error messages and execution statistics. These can be reviewed in the Operator navigator in ODI Studio, or through the web interfaces.
Two types of Agents:
- Standalone Agent - can be installed on the source or target systems. Requires a JVM.
- Java EE Agent - Deployed on Oracle WebLogic Server.
- The ODI Repository consists of a Master Repository and one or more Work repositories.
- Repositories are sets of tables that can be stored in an RDBMS (e.g. Oracle, SQL Server, DB2).
- All objects used by the various ODI modules are stored in the repositories
- The Master Repository contains
- Security information (user profiles and privileges)
- topology information
- source code for all versions of all ODI objects
- Several work repositories can coexist in the same installation (development, quality control, production)
Work Repositories store information for
- Models (i.e. metadata) - datastores, columns, data integrity constraints, cross references, data lineage and impact analysis
- Projects - interfaces, packages, procedures, folders, knowledge modules, variables
- Runtime information - scenarios, load plans, scheduling information and logs