
Data Warehouse Fundamentals-Part 1


Database: A database is an organized collection of data.
Data warehouse: A data warehouse is a database (typically built on an RDBMS) that holds both historical and current data and is used to support business decisions, such as analyzing profits and losses.


A Data Warehouse (DW) is simply a consolidation of data from a variety of sources that is designed to support strategic and tactical decision making.  Its main purpose is to provide a coherent picture of the business at a point in time.  


Business Intelligence refers to a set of methods and techniques that are used by organizations for tactical and strategic decision making. It leverages technologies that focus on counts, statistics and business objectives to improve business performance.


BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies are reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining and predictive analytics. 

Example: KFC operates in more than 100 countries. If KFC wants to analyze its profits and losses, a data warehouse can help.
Using the data warehouse, KFC can find out which items sell most in a particular region and use those sales figures to attract customers, for example by introducing new varieties of dishes. For instance, if students are the major customers in a particular region, KFC can focus on items that appeal to students and put less emphasis on other items, so that profit increases.
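The kind of question in this example ("what sells most in each region?") can be sketched as a simple query. This is only an illustration: the table name sales, its columns, and all the figures below are made-up assumptions, and SQLite stands in for a real warehouse.

```python
import sqlite3

# Hypothetical sales data; the table, columns and numbers are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, item TEXT, units INTEGER);
INSERT INTO sales VALUES
    ('North', 'Burger', 120), ('North', 'Wings', 80), ('North', 'Burger', 30),
    ('South', 'Wings', 200), ('South', 'Burger', 90);
""")

# Total units per region and item, highest totals first within each region.
totals = conn.execute("""
SELECT region, item, SUM(units) AS total_units
FROM sales
GROUP BY region, item
ORDER BY region, total_units DESC
""").fetchall()

# Keep only the first (best-selling) item seen for each region.
best_sellers = {}
for region, item, total_units in totals:
    best_sellers.setdefault(region, (item, total_units))

print(best_sellers)
```

With a result like this, a business could decide which items to promote in each region.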

The process of extracting data from different sources such as Oracle, SQL Server, DB2, IMS, etc., transforming the inconsistent data into a consistent format, and loading it into the data warehouse is called the ETL (Extract, Transform, Load) process.
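A minimal sketch of the ETL idea follows. The two in-memory lists stand in for real source systems, and every name here (source_a, source_b, GENDER_MAP, the customer table) is a hypothetical chosen for illustration. The point is only that differently formatted inputs end up in one consistent format before loading.

```python
import sqlite3
from datetime import datetime

# Hypothetical rows from two source systems with inconsistent formats.
source_a = [{"cust": "Asha", "gender": "F", "joined": "2021-03-15"}]
source_b = [{"cust": "Ravi", "gender": "Male", "joined": "15/04/2021"}]

# Assumed mapping from the codes each source uses to one standard code.
GENDER_MAP = {"m": "M", "male": "M", "f": "F", "female": "F"}


def extract():
    """Extract: yield each source row with the date format its source uses."""
    for row in source_a:
        yield row, "%Y-%m-%d"
    for row in source_b:
        yield row, "%d/%m/%Y"


def transform(row, date_format):
    """Transform: convert gender codes and dates to one consistent format."""
    joined = datetime.strptime(row["joined"], date_format).date().isoformat()
    return (row["cust"], GENDER_MAP[row["gender"].lower()], joined)


def load(conn, rows):
    """Load: write the consistent rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer (name TEXT, gender TEXT, joined TEXT)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    conn = sqlite3.connect(":memory:")  # stands in for the warehouse
    load(conn, [transform(row, fmt) for row, fmt in extract()])
    return conn.execute(
        "SELECT name, gender, joined FROM customer ORDER BY name"
    ).fetchall()


print(run_etl())
```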














Staging Area:
The staging area is where extracted data is processed before it is loaded into the data warehouse. It has four main steps:

1) Data Merging: Combining data from different sources using joins and union operators.
2) Data Cleansing: Removing unwanted data using functions such as TRIM() and SUBSTRING().
3) Data Scrubbing: Deriving a new structure from the existing structure.
Example: If a table has a salary column, create an annual salary column with an expression like ASAL = SAL * 12.
Or create a column like FULL NAME = FIRST NAME + LAST NAME.
4) Data Aggregation: Summarizing data using functions like SUM(), AVG(), etc.
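The four steps above can be sketched in one small script. This is only an illustration: the tables emp_source_a, emp_source_b and emp_staged, their columns, and the sample rows are assumptions, and SQLite is used simply because it runs without any setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two hypothetical sources holding the same kind of employee data.
CREATE TABLE emp_source_a (first_name TEXT, last_name TEXT, dept TEXT, sal REAL);
CREATE TABLE emp_source_b (first_name TEXT, last_name TEXT, dept TEXT, sal REAL);
INSERT INTO emp_source_a VALUES ('  Asha ', 'Rao', 'SALES', 1000);
INSERT INTO emp_source_b VALUES ('Ravi', ' Kumar  ', 'SALES', 2000),
                                ('Meena', 'Iyer', 'HR', 1500);
""")

# 1) Merging:   UNION ALL stacks the rows of both sources.
# 2) Cleansing: TRIM() removes stray spaces around the names.
# 3) Scrubbing: full_name and asal (annual salary) are derived columns.
conn.execute("""
CREATE TABLE emp_staged AS
SELECT TRIM(first_name) || ' ' || TRIM(last_name) AS full_name,
       dept,
       sal,
       sal * 12 AS asal
FROM (SELECT * FROM emp_source_a
      UNION ALL
      SELECT * FROM emp_source_b)
""")

# 4) Aggregation: SUM() and AVG() per department.
dept_totals = conn.execute("""
SELECT dept, SUM(asal) AS total_asal, AVG(sal) AS avg_sal
FROM emp_staged
GROUP BY dept
ORDER BY dept
""").fetchall()

full_names = [row[0] for row in conn.execute(
    "SELECT full_name FROM emp_staged ORDER BY full_name")]

print(full_names)
print(dept_totals)
```

In SQLite, strings are concatenated with ||; other databases may use + or CONCAT() for the FULL NAME step.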

DATA MART: A data mart is a subset of the data warehouse. It contains the information for only one department. It is also called an HPQS (High Performance Query Structure).
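As a rough illustration of "subset of the data warehouse", the sketch below copies only one department's rows into a separate table. The names warehouse_facts and sales_mart and the sample rows are hypothetical; a real data mart would usually be designed with its own schema rather than a simple filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical warehouse table covering several departments.
CREATE TABLE warehouse_facts (department TEXT, region TEXT, amount REAL);
INSERT INTO warehouse_facts VALUES
    ('SALES', 'North', 500), ('SALES', 'South', 300), ('HR', 'North', 120);

-- The mart keeps only the rows that one department needs.
CREATE TABLE sales_mart AS
SELECT region, amount
FROM warehouse_facts
WHERE department = 'SALES';
""")

mart_rows = conn.execute(
    "SELECT region, amount FROM sales_mart ORDER BY region"
).fetchall()
print(mart_rows)
```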



