
Building examples of using SQL functions supported by Oracle
Consider the example below:
- Pivoting is a key technique in DWH: it allows you to transform multiple rows of input into fewer, generally wider rows
- It transforms the presentation of the data, rotating rows into columns and presenting data in a crosstabular format, or vice versa
- When pivoting, an aggregation operator is applied for each item in the pivot column value list
- Use the pivot_clause of the SELECT statement to write crosstabulation queries that rotate rows into columns, aggregating data in the process of the rotation.
- Pivoting: Transactional format <=> Crosstabular (relational) format
- The view SALES_VIEW displays data in a transactional format
- It lists amount and quantity sold by product, country, quarter and channel.
- In this case, a product may be sold through various channels in a given country and a given quarter of the year.
- If you need to see the sales by channel_id in a relational or crosstabular format, you can use a pivoting operation:
SQL> CREATE VIEW sales_view AS
       SELECT prod_name product, country_name country, channel_id channel,
              SUBSTR(calendar_quarter_desc, 6, 2) quarter,
              SUM(amount_sold) amount_sold, SUM(quantity_sold) quantity_sold
       FROM sales, times, customers, countries, products
       WHERE sales.time_id = times.time_id
         AND sales.prod_id = products.prod_id
         AND sales.cust_id = customers.cust_id
         AND customers.country_id = countries.country_id
       GROUP BY prod_name, country_name, channel_id,
                SUBSTR(calendar_quarter_desc, 6, 2);
- Using the PIVOT clause, you can present the data in crosstabular form:
SQL> SELECT * FROM
       (SELECT product, channel, amount_sold
        FROM sales_view
        WHERE country = 'United States of America') S
     PIVOT (SUM(amount_sold)
            FOR channel IN (3 AS DIRECT_SALES, 4 AS INTERNET_SALES,
                            5 AS CATALOG_SALES, 9 AS TELESALES))
     ORDER BY product;

PRODUCT                   DIRECT_SALES INTERNET_SALES CATALOG_SALES TELESALES
------------------------- ------------ -------------- ------------- ---------
...
Home Theatre Package...      2052150.1      310431.22
Internal 6X CD-ROM           135284.16       15839.73
Internal 8X CD-ROM           158157.23       24398.22
Keyboard Wrist Rest             110554       22477.41       1354.87
...

71 rows selected.
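On databases without Oracle's PIVOT clause, the same rotation can be written as conditional aggregation: one SUM(CASE ...) column per item in the pivot value list. The sketch below demonstrates this with Python's sqlite3 module; the sample rows are made up for illustration and are not the real SH schema data.

```python
import sqlite3

# In-memory stand-in for SALES_VIEW (illustrative rows, not real SH data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_view (product TEXT, channel INTEGER, amount_sold REAL)")
conn.executemany(
    "INSERT INTO sales_view VALUES (?, ?, ?)",
    [
        ("Keyboard Wrist Rest", 3, 110554.0),
        ("Keyboard Wrist Rest", 4, 22477.41),
        ("Keyboard Wrist Rest", 5, 1354.87),
        ("Internal 6X CD-ROM", 3, 135284.16),
        ("Internal 6X CD-ROM", 4, 15839.73),
    ],
)

# Portable equivalent of PIVOT (SUM(amount_sold) FOR channel IN (...)):
# each CASE expression picks out the rows for one pivot value,
# and GROUP BY product produces one wide row per product.
rows = conn.execute("""
    SELECT product,
           SUM(CASE WHEN channel = 3 THEN amount_sold END) AS direct_sales,
           SUM(CASE WHEN channel = 4 THEN amount_sold END) AS internet_sales,
           SUM(CASE WHEN channel = 5 THEN amount_sold END) AS catalog_sales,
           SUM(CASE WHEN channel = 9 THEN amount_sold END) AS telesales
    FROM sales_view
    GROUP BY product
    ORDER BY product
""").fetchall()

for r in rows:
    print(r)
```

Channels with no sales for a product come back as NULL (None), just as they show up blank in the Oracle output above.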
Declarative Design
- Declarative Design is a design method that focuses on “What” to do (the Declarative Rules) rather than “How” to do it (the Process).
- Declarative Design in ODI uses the well known relational paradigm to declare in the form of an Interface the declarative rules for a data integration task, which includes designation of sources, targets, and transformations.
- Declarative rules often apply to metadata to transform data and are usually described in natural language by business users.
- In a typical data integration project (such as a Data Warehouse project), these rules are defined during the specification phase in documents written by business analysts in conjunction with project managers.
- They can very often be implemented using SQL expressions, provided that the metadata they refer to is known and qualified in a metadata repository.
- Types of declarative rules: Mappings, Joins, Filters and Constraints
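Since declarative rules can often be implemented as SQL expressions, the rule types map naturally onto clauses of an INSERT ... SELECT. The sketch below is a hypothetical illustration (table and column names such as src_customer and rep_name are made up, not from an actual ODI repository), run with Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_customer (cust_id INTEGER, first_name TEXT, last_name TEXT, age INTEGER);
CREATE TABLE src_sales_person (sales_person_id INTEGER, cust_id INTEGER, rep_name TEXT);
CREATE TABLE trg_customer (cust_id INTEGER, cust_name TEXT, rep_name TEXT);
INSERT INTO src_customer VALUES (1, 'Ada', 'Lovelace', 36), (2, 'Alan', 'Turing', 15);
INSERT INTO src_sales_person VALUES (10, 1, 'Smith'), (11, 2, 'Jones');
""")

# Declarative rule types expressed as SQL:
#   Mapping    -> expressions in the SELECT list
#   Join       -> the join condition between the two source datastores
#   Filter     -> the WHERE predicate on the source rows
#   Constraint -> a rule the loaded target rows must satisfy (e.g. NOT NULL)
conn.execute("""
    INSERT INTO trg_customer (cust_id, cust_name, rep_name)
    SELECT c.cust_id,
           UPPER(c.first_name) || ' ' || c.last_name,   -- Mapping
           s.rep_name
    FROM src_customer c
    JOIN src_sales_person s ON s.cust_id = c.cust_id    -- Join
    WHERE c.age >= 18                                   -- Filter
""")
result = conn.execute("SELECT cust_id, cust_name, rep_name FROM trg_customer").fetchall()
print(result)  # only the customer passing the age filter is loaded
```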
The Package above (Load Customers) specifies a workflow that loads a sequence of tables:
(1) Initialization step: Execute procedure Initialization (type: Procedure)
(2) Execute interface Load Countries (type: Flow(interface)), linked object: TRG_COUNTRY
(3) Execute interface Load Regions (type: Flow(interface)), linked object: TRG_REGION
(4) Execute interface Load Cities (type: Flow(interface)), linked object: TRG_CITY
(5) Refresh variable "CUST_FLAG". Load Customers? (type: Evaluate Variable), linked object: CUST_FLAG
IF CUST_FLAG = 1 (true) THEN Load Customers, ELSE Final Step
(6) Final Step (type procedure)
(7) If any of the steps above fails, the Package runs the "Send Admin email" step, which sends an email using an ODI tool.
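The branching in this Package can be sketched as ordinary control flow. The sketch below is only an illustration of the step sequence described above; the callables standing in for ODI steps are placeholders, not the ODI API.

```python
def run_load_customers_package(cust_flag, steps):
    """Sketch of the Load Customers Package control flow.

    `steps` maps step names to callables (placeholders for the ODI steps);
    `cust_flag` plays the role of the refreshed CUST_FLAG variable.
    """
    try:
        steps["Initialization"]()     # (1) type: Procedure
        steps["Load Countries"]()     # (2) -> TRG_COUNTRY
        steps["Load Regions"]()       # (3) -> TRG_REGION
        steps["Load Cities"]()        # (4) -> TRG_CITY
        if cust_flag == 1:            # (5) Evaluate Variable step
            steps["Load Customers"]()
        steps["Final Step"]()         # (6) type: Procedure
    except Exception:
        steps["Send Admin email"]()   # (7) failure branch
        raise

# Usage: record which steps execute when CUST_FLAG is not 1
executed = []
names = ["Initialization", "Load Countries", "Load Regions", "Load Cities",
         "Load Customers", "Final Step", "Send Admin email"]
steps = {n: (lambda n=n: executed.append(n)) for n in names}
run_load_customers_package(cust_flag=0, steps=steps)
print(executed)  # Load Customers is skipped when CUST_FLAG != 1
```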
Interface
It is a set of declarative rules that describe the loading of a datastore or a temporary target structure from one or more source datastores.
An interface is a reusable dataflow.
Components of an Integration Interface:
- Target Datastore - element that will be loaded by the interface.
- May be permanent (defined in a model) or temporary (created by the interface)
- Datasets - One target is loaded with data coming from various datasets
- each dataset corresponds to one diagram of source datastores and the mappings used to load the target datastore from these source datastores.
- Diagram of Source Datastores
- Mapping - defines the transformations performed on one or more source columns to load one target column
- Staging area
- Flow
- Control
In the example above,
- The interface defines a set of rules to load the target TRG_CUSTOMER datastore with data coming from three source datastores (SRC_AGEGROUP, SRC_CUSTOMER, SRC_SALES_PERSON)
- It defines a Join between SRC_CUSTOMER and SRC_SALES_PERSON
- It defines a Lookup: CUST.AGE BETWEEN AGE_GROUP.AGE_MIN AND AGE_GROUP.AGE_MAX
- It defines Mappings between the attributes in the source datastores and attributes in the target datastore
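The lookup rule above is just a non-equi join against the lookup table. A minimal sketch with Python's sqlite3 module, using made-up customers and age groups (the data and the age_group_name column are illustrative, not from the ODI demo schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_customer (cust_id INTEGER, cust_name TEXT, age INTEGER);
CREATE TABLE src_age_group (age_group_name TEXT, age_min INTEGER, age_max INTEGER);
INSERT INTO src_customer VALUES (1, 'Ada', 36), (2, 'Alan', 15);
INSERT INTO src_age_group VALUES ('Junior', 0, 17), ('Adult', 18, 64), ('Senior', 65, 120);
""")

# The lookup CUST.AGE BETWEEN AGE_GROUP.AGE_MIN AND AGE_GROUP.AGE_MAX,
# written as a LEFT JOIN so customers with no matching group still appear:
rows = conn.execute("""
    SELECT c.cust_id, c.cust_name, g.age_group_name
    FROM src_customer c
    LEFT JOIN src_age_group g
      ON c.age BETWEEN g.age_min AND g.age_max
    ORDER BY c.cust_id
""").fetchall()
print(rows)
```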
Interface dataflow
- Automatically generated by ODI from the declarative rules in the interface
- Based on the configuration files and context (target, sources, location of the staging area, etc.)
- ODI automatically computes the flow depending on the configuration in the interface's diagram.
- It proposes default KMs for the data flow.
- The Flow tab enables you to view the data flow and select the KMs used to load and integrate data.
In the flow, the following items appear:
Source Sets:
- Source datastores that are within the same dataset, located on the same physical data server, and joined with Joins located on the Source are grouped in a single source set in the flow diagram.
- A source set represents a group of datastores that can be extracted at the same time.
Staging Area:
- Datasets appear as yellow boxes in the Staging Area.
- The Staging Area appears as a box that includes the different datasets, the target (if located on the same data server), and possibly some of the sources (if located on the same data server).
Target:
- Appears as a separate box if it is located in a different schema from the staging area.
You use the following KMs in the flow:
- Loading KM (LKM):
  - Defines how data is moved.
  - One LKM is selected for each Source Set for moving data from the sources to the staging area.
  - It can also be selected to move data from the Staging Area - when different from the Target - to the Target, when a single-technology IKM is selected for the Staging Area.
- Integration KM (IKM):
  - Defines how data is integrated into the target.
  - One IKM is typically selected on the Target.
  - When the staging area is different from the target, the selected IKM can be a multi-technology IKM that moves and integrates data from the Staging Area into the Target.
Procedure
A Procedure is a reusable component that groups a sequence of operations that do not fit in the interface concept.
Examples of procedures:
- wait for and unzip a file
- send a batch of files via FTP
- receive emails
- purge a database