Tuesday, November 23, 2010

ETL Process Overview

In Oracle BI Apps architecture we have seen ETL is one of the key component. In this post let us see what exactly ETL does...

Following figure illustrates the ETL Overview


Here in the above figure we can see two stages SDE and SIL..
Let us see the each component of  Source Dependent Extract(SDE) in detail.

Business Component: 
The Business Component presents source data elements in commonly understood business terms. It translates operational data from your sources into meaningful business terms, hiding the complexity of the source systems. The Business Component performs joins in the data source to help extract data efficiently and to facilitate incremental extraction. The Business Component is part of Source Dependent Extract (extract mapping) that enables data extracts into universal staging tables in preparation for the source independent loads. Business components and SDE mappings provide significant out-of-the-box value as described below:


• There are thousands of tables in each of the operational systems. It takes time and effort to figure out which of those tables are required for integrating data from the operational source into the Business Analytics Data Warehouse. This provides significant time and effort savings.

• The mechanism to extract data from various operational systems is different. Some have a proprietary API (e.g. ABAP in SAP R/3); others need access to underlying database tables. Oracle BI Applications’ business component configuration has a uniform architecture that leverages different tools (e.g. Informatica Power Connect for SAP R/3, Informatica Power Connect for PeopleSoft etc) from the data integration platform to build the extraction layer, which hides these complexities.

• The components are architected to extract data from the operational sources in the least intrusive form – e.g, rather than join tables in the operational systems again and again for different data warehouse entities, the extract components have been architected to get the data once, which is then leveraged wherever required. This results in reducing the data warehouse load on the operational systems, freeing up the cycles for supporting operations, which is what they are meant to do.

• Wherever possible, SDE mappings are designed to extract data from the operational sources in an incremental fashion and refresh the enterprise data warehouse. This reduces the load on the operational systems considerably and significantly reduces the time required to refresh the data warehouse.

• The SDE mappings are designed to leverage indexes in the operational systems that can speed up the data extraction.

• SDE mappings have a configuration layer where business rules specific to the implementation can be configured. This avoids spaghetti code and results in a consistent, highly maintainable data integration environment.

• The data integration components are provided with a sample extension column (X_CUSTOM) that can be used as a reference for bringing additional custom fields into the Business Analytics Data Warehouse. Hence the task for extending the data model is reduced to identifying the right source columns, adding them to the data model and connecting it in the extraction layer using the sample column as a reference. This significantly reduces the time and effort required to extend the data model with required fields.

• Adherence to strict naming conventions for the data integration components helps increase understandability and maintainability. The data integration components are organized such that integration tasks can be run in parallel wherever possible, depending on the availability of resources. This can help reduce the time required for extracting data from the operational systems and refreshing the enterprise data warehouse.

Source Adapter:
The Source Adapter Mapplet which is part of SDE (Source Dependent Extract) mapping converts source-specific data elements from the source tables into the source independent data elements required for the Source Independent Load mappings for Data Warehouse code standardization.

Universal Staging Area:
Universal staging tables are used as temporary storage for extracted source data prior to transformation in the Source Independent Load process for loading into the Business Analytics Data Warehouse. The two main purposes of the staging tables are:
• Quick and efficient data extraction: Staging tables allow for faster data extraction, leaving complex transformations for later and allowing for a shorter operational window.
• Flexibility with data load: Staging tables integrate data from different source systems.

For Source Independent Load(SIL) refer to the previous post... 



No comments:

Post a Comment