
Data Warehouse Fundamentals - Part 1


Database: A database is simply a collection of data.
A data warehouse is an RDBMS that contains historical and current data and is used to support decisions about profits, losses, and other business measures.


A Data Warehouse (DW) is simply a consolidation of data from a variety of sources that is designed to support strategic and tactical decision making.  Its main purpose is to provide a coherent picture of the business at a point in time.  


Business Intelligence refers to a set of methods and techniques that are used by organizations for tactical and strategic decision making. It leverages technologies that focus on counts, statistics and business objectives to improve business performance.


BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies are reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining and predictive analytics. 

Example: KFC operates in more than 100 countries. If KFC wants to analyze its profits and losses, it needs the help of data warehousing.
Using a data warehouse, it can find out what sells best in a particular region and adjust accordingly. For instance, if students are the major customers in a region, KFC can introduce dishes that attract students and reduce its focus on items that sell poorly there, so that profit increases.

The process of extracting data from different sources such as Oracle, SQL Server, DB2, and IMS, and converting that inconsistent data into a consistent format, is called the ETL (Extract, Transform, Load) process.
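As a minimal sketch of the transform step (the table and column names below are assumptions for illustration, and exact function syntax varies by database), inconsistent extracts from two sources might be standardized into one consistent staging table like this:

-- Assumed staging table that holds customer data in one consistent format
CREATE TABLE stg_customer (
    customer_id   INT,
    full_name     VARCHAR(100),
    country_code  CHAR(2),        -- standardized country code
    signup_date   DATE            -- standardized date type
);

-- Source A (say, an Oracle extract) stores country codes in mixed case and dates as text
INSERT INTO stg_customer (customer_id, full_name, country_code, signup_date)
SELECT cust_id,
       TRIM(cust_name),
       UPPER(cntry),                 -- normalize country codes
       CAST(signup_dt AS DATE)       -- normalize the date format
FROM   src_oracle_customers;

-- Source B (say, a SQL Server extract) splits the name into two columns
INSERT INTO stg_customer (customer_id, full_name, country_code, signup_date)
SELECT id,
       TRIM(first_name) || ' ' || TRIM(last_name),   -- combine into one name
       UPPER(country),
       CAST(created_on AS DATE)
FROM   src_sqlserver_customers;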


Staging Area:
The staging area has four main steps (illustrated in the SQL sketch after this list):

1) Data Merging: combining data from different sources using join and union operations.
2) Data Cleansing: removing unwanted data using functions such as TRIM() and SUBSTRING().
3) Data Scrubbing: creating a new structure from the old structure.
Example: if a salary column is present in a table, create an annual salary column with a query such as ASAL = SAL * 12,
or create a column such as FULL NAME = FIRST NAME + LAST NAME.
4) Data Aggregation: summarizing data using functions such as SUM() and AVG().
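Putting the four steps together, a rough SQL sketch might look like the following (the source tables emp_us and emp_uk and their columns are assumptions, not from the original post):

-- 1) Data Merging: combine the same kind of data from two assumed sources
CREATE TABLE stg_emp AS
SELECT emp_id, first_name, last_name, sal, dept FROM emp_us
UNION ALL
SELECT emp_id, first_name, last_name, sal, dept FROM emp_uk;

-- 2) Data Cleansing: trim stray spaces and keep only the meaningful part of a code
UPDATE stg_emp
SET    first_name = TRIM(first_name),
       last_name  = TRIM(last_name),
       dept       = SUBSTRING(dept, 1, 3);

-- 3) Data Scrubbing: derive new columns from the existing structure
ALTER TABLE stg_emp ADD asal DECIMAL(12,2);
ALTER TABLE stg_emp ADD full_name VARCHAR(100);

UPDATE stg_emp
SET    asal      = sal * 12,                         -- annual salary (ASAL = SAL * 12)
       full_name = first_name || ' ' || last_name;   -- FULL NAME = FIRST NAME + LAST NAME

-- 4) Data Aggregation: summarize the cleaned data per department
SELECT dept,
       SUM(asal) AS total_annual_salary,
       AVG(sal)  AS average_monthly_salary
FROM   stg_emp
GROUP  BY dept;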

DATA MART: A data mart is a subset of a data warehouse. It contains the data of only one department. It is also called an HPQS (High Performance Query Structure).
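As a rough illustration (the warehouse table dw_sales and its columns are assumptions), a department-level data mart can be built by carving out just that department's slice of the warehouse:

-- Sales data mart: only the Sales department's rows from the warehouse
CREATE TABLE dm_sales AS
SELECT sale_id, product, region, amount, sale_date
FROM   dw_sales
WHERE  department = 'SALES';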




