Posts

Showing posts from April, 2012

Facts and Fact table- Data warehouse fundamentals- part 5

Types of fact table

Fact tables are of two types:
- CFT (Cumulative fact table)
- SFT (Snapshot fact table)

Cumulative fact table: if values are loaded into the fact table based on time, it is called a cumulative fact table.
Snapshot fact table: if values are loaded based on the client's requirement (the state of the data at a point in time), it is called a snapshot fact table.

Types of facts

A fact is a numeric value; it is the measure on which we analyze the data. There are three types of facts:
- Additive fact
- Semi-additive fact
- Non-additive fact

Additive fact: a fact that can be summed meaningfully across all the dimensions of the fact table.
Semi-additive fact: a fact that can be summed across some dimensions but not others. Example: transactions at a bank (say I deposit some money in my mom’s account; she only gets to know that an amount is credited. She won’t know …
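The additive/semi-additive distinction above can be sketched with the standard library's `sqlite3`. This is a minimal illustration under assumed table names and sample values (none of them come from the post): a sales amount sums meaningfully across every dimension, while an account balance sums across accounts on one day but not across days.

```python
# Sketch: additive vs semi-additive facts using sqlite3 (stdlib).
# Table names and values are illustrative assumptions, not from the post.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales_fact (day TEXT, product TEXT, amount REAL);
INSERT INTO sales_fact VALUES
  ('2012-04-01','pen',10), ('2012-04-01','book',40),
  ('2012-04-02','pen',20), ('2012-04-02','book',30);

CREATE TABLE balance_fact (day TEXT, account TEXT, balance REAL);
INSERT INTO balance_fact VALUES
  ('2012-04-01','A',100), ('2012-04-01','B',200),
  ('2012-04-02','A',100), ('2012-04-02','B',200);
""")

# Additive fact: summing 'amount' across ALL dimensions (day and product)
# is meaningful -- total sales for the whole period.
total_sales = con.execute("SELECT SUM(amount) FROM sales_fact").fetchone()[0]
print(total_sales)  # 100.0

# Semi-additive fact: summing 'balance' across accounts for ONE day works...
per_day = con.execute(
    "SELECT SUM(balance) FROM balance_fact WHERE day='2012-04-01'"
).fetchone()[0]
print(per_day)  # 300.0

# ...but summing across days double-counts the same money; 600 is not a
# meaningful "total balance". Use a snapshot (last value) or average instead.
over_time = con.execute("SELECT SUM(balance) FROM balance_fact").fetchone()[0]
print(over_time)  # 600.0
```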

Schema - Data warehouse fundamentals - part 4

SCHEMA: a collection of fact tables and dimension tables. There are 3 types of schemas:
- Star schema
- Snowflake schema
- Integrated schema / galaxy schema / fact constellation

STAR SCHEMA
- In a star schema, every dimension table contains one primary key.
- Dimension tables contain de-normalized values.
- The fact table contains the foreign keys of the dimension tables.
- The relationship between the fact table and a dimension table is a primary key / foreign key relationship.
- Every fact table contains one surrogate key (a surrogate key is a system-generated key holding numeric values).
- The fact table contains normalized data.
- The fact table contains numerics, also called measures, facts, or KPIs (Key Performance Indicators).
- In the fact table we store values at the lowest level; this is known as "fact granularity" or "grain of fact". (Data is stored at the year, month, week, or day level; data saved on a daily basis gives more information than data stored weekly or monthly …
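The star-schema points above can be made concrete with a tiny schema in `sqlite3`. All table and column names here are illustrative assumptions: two dimension tables with one primary key each, and a fact table holding their foreign keys plus numeric measures at the lowest grain.

```python
# Sketch of a minimal star schema in sqlite3 (stdlib); table and column
# names are illustrative assumptions, not from the post.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Dimension tables: one primary key each, de-normalized descriptive columns.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, day TEXT, year INTEGER);

-- Fact table: foreign keys into each dimension plus numeric measures,
-- stored at the lowest grain (here: one row per product per day).
CREATE TABLE sales_fact (
  product_key INTEGER REFERENCES dim_product(product_key),
  date_key    INTEGER REFERENCES dim_date(date_key),
  quantity    INTEGER,
  amount      REAL);

INSERT INTO dim_product VALUES (1,'pen','stationery'), (2,'book','stationery');
INSERT INTO dim_date VALUES (20120401,'2012-04-01',2012);
INSERT INTO sales_fact VALUES (1,20120401,5,10.0), (2,20120401,2,40.0);
""")

# A typical star-schema query: join the fact table to a dimension on the
# primary key / foreign key relationship, then aggregate a measure.
row = con.execute("""
  SELECT p.category, SUM(f.amount)
  FROM sales_fact f
  JOIN dim_product p ON f.product_key = p.product_key
  GROUP BY p.category
""").fetchone()
print(row)  # ('stationery', 50.0)
```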

Data Warehouse fundamentals-Part 3

| OLTP | OLAP |
| --- | --- |
| application oriented | subject oriented |
| detailed | summarized, otherwise refined |
| accurate, as of the moment of access | represents values over time, snapshots |
| serves the clerical community | serves the managerial community |
| can be updated | is not updated |
| requirements for processing understood before initial development | requirements for processing not completely understood before development |
| compatible with the Software Development Life Cycle | completely different life cycle |
| performance sensitive | performance relaxed |
| accessed a unit at a time | accessed a set at a time |
| transaction driven | analysis driven |
| control of update a major concern in terms of ownership | control of update no issue |
| high availability | relaxed availability |
| managed in its entirety | managed by subsets |
| non redundancy | redundancy |
| static structure; variable contents | flexible structure |
| small amount of data used in a process | large amount of data used in a process |
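Two of the rows above, "accessed a unit at a time / accessed a set at a time" and "transaction driven / analysis driven", can be contrasted with a small `sqlite3` sketch. The `accounts` table and its rows are illustrative assumptions.

```python
# Sketch contrasting OLTP and OLAP access patterns with sqlite3 (stdlib);
# the 'accounts' table and its rows are illustrative assumptions.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)")
con.executemany("INSERT INTO accounts VALUES (?,?,?)",
                [(1, "alice", 100.0), (2, "bob", 250.0), (3, "carol", 50.0)])

# OLTP style: transaction driven, one unit at a time, data can be updated.
con.execute("UPDATE accounts SET balance = balance - 25 WHERE id = 1")
bal = con.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(bal)  # 75.0

# OLAP style: analysis driven, a whole set at a time, read-only aggregation.
total, n = con.execute("SELECT SUM(balance), COUNT(*) FROM accounts").fetchone()
print(total, n)  # 375.0 3
```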

Approaches-Data warehouse fundamentals-part 2

There are two approaches to building a Data Warehouse:
- Top-Down approach (Inmon approach)
- Bottom-Up approach (drill-up or Kimball approach)

INMON APPROACH (TOP-DOWN OR DRILL-DOWN APPROACH)
- First build the data warehouse, then load the data marts.
- Bill Inmon's definition of a data warehouse: "It is a subject oriented, nonvolatile, integrated, time variant collection of data in support of management's decisions."
- Data marts are dependent on the DWH.
- Slower to deliver first results, since the enterprise warehouse is built before any data mart.

RALPH KIMBALL APPROACH (BOTTOM-UP OR DRILL-UP APPROACH)
- Ralph Kimball's definition of a data warehouse: "Data warehouse is the conglomerate (composed of heterogeneous materials or elements) of all data marts within the enterprise. Information is always stored in the dimensional model."
- First build the data marts, then build the data warehouse.
- Data marts are independent.
- Faster to deliver first results, one data mart at a time.

Data Warehouse Fundamentals-Part 1

Database: a database is simply a collection of data. A data warehouse is an RDBMS used for making decisions about profits, losses, etc., and it contains both historical and current data. A Data Warehouse (DW) is a consolidation of data from a variety of sources that is designed to support strategic and tactical decision making. Its main purpose is to provide a coherent picture of the business at a point in time.

Business Intelligence refers to a set of methods and techniques that are used by organizations for tactical and strategic decision making. It leverages technologies that focus on counts, statistics and business objectives to improve business performance. BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies are reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining …

AGGREGATOR TRANSFORMATION

LET’S PLAY WITH TRANSFORMATIONS

As mentioned earlier, transformations are business logic used to bring inconsistent data into a consistent format.

1) Aggregator Transformation

It is used to perform aggregations such as sum, average, count, and count distinct (the distinct count of non-null column values). This transformation does not generate an error output.

Options
1) Advanced / Basic: displays or hides options to configure multiple aggregations for multiple outputs. By default, the advanced options are hidden.
2) Aggregation Name: in the Advanced display, type a friendly name for the aggregation.
3) Group By Columns: in the Advanced display, select columns for grouping by using the Available Input Columns list as described below.
4) Key Scale: in the Advanced display, optionally specify the approximate number of keys that the aggregation can write. By default, the value of this option is Unspecified. If both the Key Scale and Keys …
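What an aggregator transformation computes can be sketched in plain Python: group rows by a column, then produce sum, average, count, and count-distinct per group. The `rows` data and field names are illustrative assumptions, not part of any specific tool's API.

```python
# Sketch of an aggregator transformation's core work: group-by with sum,
# average, count, and count-distinct. Pure stdlib; the 'rows' data and
# field names are illustrative assumptions.
from collections import defaultdict

rows = [
    {"region": "east", "customer": "a", "amount": 10.0},
    {"region": "east", "customer": "a", "amount": 20.0},
    {"region": "east", "customer": "b", "amount": 30.0},
    {"region": "west", "customer": "c", "amount": 40.0},
]

# Group By Columns: here we group on 'region'.
groups = defaultdict(list)
for r in rows:
    groups[r["region"]].append(r)

# One output row per group, carrying the aggregated measures.
result = {}
for region, members in groups.items():
    amounts = [m["amount"] for m in members]
    result[region] = {
        "sum": sum(amounts),
        "average": sum(amounts) / len(amounts),
        "count": len(members),
        # count distinct over a non-null column: unique customers per region
        "count_distinct": len({m["customer"] for m in members}),
    }

print(result["east"])
# {'sum': 60.0, 'average': 20.0, 'count': 3, 'count_distinct': 2}
```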