Data Warehouse with Oracle Data Integrator (ODI)





Building a Data Warehouse: typical steps and milestones

  • Research and specify business needs (Key Indicators)
  • Identify data sources relevant to generate key indicators
  • Define business rules to transform source information into key indicators
  • Model the data structure of the target warehouse to store the key indicators
  • Populate the indicators by implementing business rules
  • Measure the overall accuracy of the data by setting up data quality rules
  • Develop reports on key indicators
  • Make key indicators and metadata available to business users through ad-hoc query tools or predefined reports
  • Measure business users’ satisfaction and add/modify key indicators


Using Oracle Data Integrator (ODI) in a Data Warehouse project: actors involved and some assigned tasks



Business User
  • Access the final calculated key indicators
  • Use reports and ad-hoc queries
  • May need to understand the definition of an indicator
  • May need to be aware of data quality issues
    • When was the last time the table was updated?
    • How many records were added, updated, or removed in the table?
    • What are the rules that calculate a particular indicator?
    • Where does the data come from, and how is it transformed?

Business Analyst
  • Define the key indicators
  • Identify the source applications
    • How many different source applications need to be considered?
    • Is the data needed for key indicators available in the selected pool of source applications?
    • What data quality issues are present in the source systems?
  • Specify business rules to transform source data into meaningful target indicators
    • Projects can use the ODI Designer to directly specify the business rules
    • For each target table, specify also:
      • Target datastore - name the target datastore
      • Description of transformation - describe its purpose
      • Integration strategy - how data should be written to the target (replace table, append, update, etc.). Each specified strategy corresponds to an ODI Integration Knowledge Module.
      • Refresh frequency
      • Dependencies - what datastores need to be loaded or jobs executed prior to this one
      • Source datastores - source databases, applications, etc. used
      • Source ODI Model -
      • Datastore name, field mappings and transformations, links or join criteria, filters, data quality requirements, constraint names and expressions, etc.
  • Maintain translation data from operational semantics to the Data Warehouse semantics
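
The integration strategies named above (replace table, append, update) can be illustrated with a small sketch. This is not ODI code and not how an Integration Knowledge Module is implemented; it only shows, under the assumption that rows are plain dictionaries identified by a hypothetical `id` key, how the three strategies differ in their effect on the target.

```python
from typing import Dict, List

Row = Dict[str, object]


def integrate(target: List[Row], incoming: List[Row],
              strategy: str, key: str = "id") -> List[Row]:
    """Return the new target contents after applying one strategy.

    `strategy` and `key` are illustrative names, not ODI parameters.
    """
    if strategy == "replace":
        # Replace table: the target is rebuilt from the incoming rows only.
        return [dict(r) for r in incoming]
    if strategy == "append":
        # Append: existing rows are kept and incoming rows are added after them.
        return [dict(r) for r in target] + [dict(r) for r in incoming]
    if strategy == "update":
        # Update: rows with a matching key are overwritten, new keys are inserted.
        merged = {r[key]: dict(r) for r in target}
        for r in incoming:
            merged[r[key]] = {**merged.get(r[key], {}), **r}
        return list(merged.values())
    raise ValueError(f"unknown strategy: {strategy}")


target = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
incoming = [{"id": 2, "amount": 25}, {"id": 3, "amount": 30}]

for name in ("replace", "append", "update"):
    print(name, integrate(target, incoming, name))
```

Writing the strategy down per target table, as the specification list suggests, makes it explicit whether a reload is expected to keep history (append), overwrite it (replace), or reconcile by key (update).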

Developer
  • Implement the business rules as specified by business analysts
  • Provide executable scenarios to the production team
  • Must understand infrastructure details and have business knowledge of source applications

Metadata Administrator
  • Reverse engineer source and target applications
    • Understand content and structure of source applications
    • Connect to source applications and capture their metadata
    • Define data quality business rules (specified by Business Analyst) in ODI repository
      • What level of Data Quality is required?
      • Who are the business owners of source data?
      • What should be done with rejected records?
      • Should there be an error recycling strategy?
      • How would business users modify erroneous source data?
      • Should a GUI be provided for source data correction?
  • Guarantee the overall consistency of Metadata in the various environments (dev, test, prod: repository)
  • Participate in the data modeling of key indicators
  • Add comments, descriptions and integrity rules (PK, FK, Check, etc.) in the metadata
  • Provide version management
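
The data quality questions above (what to do with rejected records, whether to recycle errors) can be made concrete with a minimal sketch. It is not ODI's check mechanism; it assumes rows are dictionaries, and the rule names (`CK_AMOUNT_POSITIVE`, `NN_CUSTOMER_ID`) and column names are hypothetical. Rows that fail any rule are set aside with the names of the rules they violated, so they can be corrected and resubmitted later.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Rule = Tuple[str, Callable[[Row], bool]]


def check_quality(rows: List[Row], rules: List[Rule]) -> Tuple[List[Row], List[Row]]:
    """Split rows into valid rows and rejected rows annotated with failed rules."""
    valid: List[Row] = []
    rejected: List[Row] = []
    for row in rows:
        failed = [name for name, predicate in rules if not predicate(row)]
        if failed:
            # Keep the original values plus the reason for rejection.
            rejected.append({**row, "failed_rules": failed})
        else:
            valid.append(row)
    return valid, rejected


# Hypothetical rules: one check constraint and one not-null constraint.
rules: List[Rule] = [
    ("CK_AMOUNT_POSITIVE",
     lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] > 0),
    ("NN_CUSTOMER_ID", lambda r: r.get("customer_id") is not None),
]

rows = [
    {"customer_id": 1, "amount": 50},
    {"customer_id": None, "amount": 20},
    {"customer_id": 3, "amount": -5},
]

valid, rejected = check_quality(rows, rules)
print("valid:", valid)
print("rejected:", rejected)
```

An error recycling strategy, in this simplified picture, would amount to feeding the corrected `rejected` rows back through `check_quality` on the next run.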

Database Administrator
  • Define technical database structure supporting the various layers in the data warehouse project (and ODI structure)
  • Create database profiles needed by ODI
  • Create schemas and databases for the staging areas
    • Describe columns inside the data dictionary of database - COMMENT ON TABLE/COLUMN
    • Avoid using PKs of source systems as PKs in target tables. Use counter or identity columns instead.
    • Design Referential Integrity (RI) and reverse-engineered FKs in ODI models.
    • Do not implement RI constraints on the target database (for performance). Data quality control should guarantee data integrity.
    • Standardize object naming conventions
  • Distribute and maintain the descriptions of the environments (topology)
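
The advice to avoid source-system PKs as target PKs can be sketched with an in-memory SQLite database. The table and column names (`dim_customer`, `customer_sk`, `source_system`, `source_customer_id`) are assumptions made for illustration. The target table gets its own counter-style key, and the source key is kept as an ordinary attribute alongside the name of the system it came from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The surrogate key is generated by the database; the source key is only
# an attribute, unique per source system.
conn.execute("""
    CREATE TABLE dim_customer (
        customer_sk        INTEGER PRIMARY KEY AUTOINCREMENT,
        source_system      TEXT NOT NULL,
        source_customer_id TEXT NOT NULL,
        name               TEXT,
        UNIQUE (source_system, source_customer_id)
    )
""")

# Two source systems reuse the same identifier "42" for different customers.
rows = [("CRM", "42", "Alice"), ("ERP", "42", "Bob")]
conn.executemany(
    "INSERT INTO dim_customer (source_system, source_customer_id, name) "
    "VALUES (?, ?, ?)",
    rows,
)

result = conn.execute(
    "SELECT customer_sk, source_system, source_customer_id "
    "FROM dim_customer ORDER BY customer_sk"
).fetchall()
print(result)
```

Because the warehouse key is independent of any source, colliding identifiers from different applications can coexist in the same target table.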

System Admin
  • Maintain technical resources for the Data Warehouse project
  • May install and monitor schedule agents
  • May backup and restore repositories
  • Install and monitor ODI console
  • Set up the various environments (dev, test, prod)

Security Admin
  • Define the security policy for the ODI Repository
  • Create ODI users and grant rights on models, projects and contexts

Operator
  • Import released and tested scenarios into production environment
  • Schedule the execution of production scenarios
  • Monitor execution logs and restart failed sessions




Conceptual Architecture of an ODI solution

  • The ODI Repository is the central component.
  • ODI Repository stores configuration information about
    • IT infrastructure and topology
    • metadata of all applications
    • projects
    • Interfaces
    • Models
    • Packages
    • Scenarios
    • Solutions - version control
  • You can have various repositories within an IT infrastructure, linked to separate environments that exchange metadata and scenarios (e.g. development, test, user acceptance test, production, hot fix)

Example: two environments, components, actors and tasks


(source: Oracle 2008)



Suggested environment configuration for a Data Warehouse project with ODI:


(1) A single master repository:
  • Holds all the topology and security information.
  • All the work repositories are registered here.
  • Contains all the versions of objects that are committed by the designers.

(2) “Development” work repository:
  • Shared by all ODI designers
  • Holds all the projects and models under development.

(3) “Testing” work repository
  • Shared by the IT testing team.
  • Contains all the projects and models being tested for future release.

(4) “User Acceptance Tests” work repository:
  • Shared by the IT testing team and business analysts.
  • Contains all the projects and models about to be released.
  • Business analysts will use the ODI Console on top of this repository to validate the scenarios and transformations before releasing them to production.

(5) “Production” work repository
  • Shared by the production team, the operators and the business analysts.
  • Contains all the projects and models in read-only mode for metadata lineage, as well as all the released scenarios.

(6) “Hot fix” work repository:
  • Shared by the maintenance team and the development team.
  • Usually empty, but whenever a critical error happens in production, the maintenance team restores the corresponding projects and models in this repository and performs the corrections with the help of the development team.
  • Once the problems are solved, the scenarios are released directly to the production repository and the new models and projects are versioned in the master repository.
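
The repository layout above can be summarized as a small data structure. This is only a sketch: the repository names come from the list, but the linear promotion order (Development, then Testing, then User Acceptance Tests, then Production) is an assumption inferred from the descriptions, and `next_repository` is a hypothetical helper, not an ODI function. The one path stated explicitly in the text is that hot fix scenarios are released directly to production.

```python
from typing import Optional

# Assumed promotion order for regular releases (not stated explicitly in the text).
PROMOTION_PATH = [
    "Development",
    "Testing",
    "User Acceptance Tests",
    "Production",
]


def next_repository(current: str) -> Optional[str]:
    """Return the work repository that receives a release from `current`."""
    if current == "Hot fix":
        # Stated in the text: hot fix scenarios go straight to production.
        return "Production"
    index = PROMOTION_PATH.index(current)  # raises ValueError for unknown names
    if index + 1 < len(PROMOTION_PATH):
        return PROMOTION_PATH[index + 1]
    return None  # Production is the end of the path


for repo in PROMOTION_PATH + ["Hot fix"]:
    print(repo, "->", next_repository(repo))
```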



Typical environment (I):


(source: Oracle 2010)



Typical environment (II): Separate Master Repository for Production


(source: Oracle 2010)
