
Data Warehouse Fundamentals - Part 3


OLTP

  • application oriented
  • detailed
  • accurate, as of the moment of access
  • serves the clerical community
  • can be updated
  • requirements for processing understood before initial development
  • compatible with the Software Development Life Cycle
  • performance sensitive
  • accessed a unit at a time
  • transaction driven
  • control of update a major concern in terms of ownership
  • high availability
  • managed in its entirety
  • non-redundancy
  • static structure; variable contents
  • small amount of data used in a process


OLAP

  • subject oriented
  • summarized, otherwise refined
  • represents values over time, snapshots
  • serves the managerial community
  • is not updated
  • requirements for processing not completely understood before development
  • completely different life cycle
  • performance relaxed
  • accessed a set at a time
  • analysis driven
  • control of update no issue
  • relaxed availability
  • managed by subsets
  • redundancy
  • flexible structure
  • large amount of data used in a process
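
OLTP                                          | OLAP
----------------------------------------------|----------------------------------------------
Contains current information                  | Contains historical and current information
Used to run the business                      | Used to analyze the business
Data is inserted, updated and deleted         | Data is only read (refreshed by periodic loads)
Typically smaller (rule of thumb: under 2 TB) | Typically larger (rule of thumb: over 2 TB)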
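The contrast above (current vs. historical data, updated vs. read-only, a unit at a time vs. a set at a time) can be sketched with Python's built-in sqlite3 module. This is an illustrative sketch only: the table names `account` and `account_snapshot` and the helper functions are hypothetical, and a single in-memory SQLite database stands in for what would normally be two separate systems (an OLTP database and a warehouse).

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # OLTP-style table (hypothetical): one row per account, only the current balance.
    conn.execute("CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
    # OLAP-style table (hypothetical): append-only daily snapshots that keep history.
    conn.execute(
        "CREATE TABLE account_snapshot ("
        " snapshot_date TEXT NOT NULL, account_id INTEGER NOT NULL, balance REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 250.0)])
    return conn


def oltp_withdraw(conn: sqlite3.Connection, account_id: int, amount: float) -> None:
    """OLTP: touch one row inside a transaction; the previous value is overwritten."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE account_id = ?",
            (amount, account_id),
        )


def load_snapshot(conn: sqlite3.Connection, snapshot_date: str) -> None:
    """Periodic load into the warehouse: insert a copy of current values, never update."""
    with conn:
        conn.execute(
            "INSERT INTO account_snapshot SELECT ?, account_id, balance FROM account",
            (snapshot_date,),
        )


def olap_total_by_date(conn: sqlite3.Connection) -> list:
    """OLAP: read a whole set of rows and summarize it over time."""
    return conn.execute(
        "SELECT snapshot_date, SUM(balance) FROM account_snapshot "
        "GROUP BY snapshot_date ORDER BY snapshot_date"
    ).fetchall()


conn = build_demo()
load_snapshot(conn, "2024-01-01")
oltp_withdraw(conn, 1, 40.0)
load_snapshot(conn, "2024-01-02")
print(olap_total_by_date(conn))  # [('2024-01-01', 350.0), ('2024-01-02', 310.0)]
```

The OLTP table only ever knows today's balance, while the snapshot table still answers "what was the total on 2024-01-01?" after the withdrawal.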

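There are four types of OLAP:
  1. DOLAP (Desktop OLAP): data is analyzed on the desktop. Ex: FoxPro
  2. ROLAP (Relational OLAP): data is analyzed from relational tables in an RDBMS. Ex: BOXIR2, COGNOS (Analysis)
  3. MOLAP (Multidimensional OLAP): data is analyzed from precomputed cubes. Ex: BOXIR2, COGNOS (Tools)
  4. HOLAP (Hybrid OLAP): a combination of ROLAP and MOLAP.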
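The difference between ROLAP and MOLAP can be sketched in a few lines: ROLAP answers each question with a SQL aggregate over relational rows, while MOLAP precomputes every aggregate into a cube and then answers by lookup. This is a simplified illustration, not how BOXIR2 or COGNOS work internally; the sample rows, the `"*"` roll-up marker and the function names are assumptions made for the example.

```python
import sqlite3
from collections import defaultdict
from itertools import combinations

# Hypothetical sample rows: (region, product, amount).
SALES = [
    ("East", "Pen", 10.0),
    ("East", "Book", 25.0),
    ("West", "Pen", 5.0),
    ("West", "Book", 40.0),
    ("West", "Pen", 7.0),
]
DIMENSIONS = ("region", "product")
ALL = "*"  # marker meaning "all members of this dimension"


def make_sales_db() -> sqlite3.Connection:
    """Relational storage used by the ROLAP side."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", SALES)
    return conn


def rolap_total(conn: sqlite3.Connection, filters: dict) -> float:
    """ROLAP: compute the aggregate on demand with SQL over the relational table."""
    unknown = set(filters) - set(DIMENSIONS)
    if unknown:  # only known column names are ever placed into the SQL text
        raise ValueError(f"unknown dimensions: {unknown}")
    where = " AND ".join(f"{col} = ?" for col in filters) or "1 = 1"
    sql = f"SELECT COALESCE(SUM(amount), 0.0) FROM sales WHERE {where}"
    (total,) = conn.execute(sql, tuple(filters.values())).fetchone()
    return total


def build_cube(rows) -> dict:
    """MOLAP: precompute the total for every combination of dimension members."""
    cube = defaultdict(float)
    for *members, amount in rows:
        # Each row feeds 2**len(DIMENSIONS) cells: keep a member or roll it up to ALL.
        for r in range(len(DIMENSIONS) + 1):
            for kept in combinations(range(len(DIMENSIONS)), r):
                key = tuple(m if i in kept else ALL for i, m in enumerate(members))
                cube[key] += amount
    return dict(cube)


def molap_total(cube: dict, filters: dict) -> float:
    """MOLAP query: a dictionary lookup, no scan of the detail rows."""
    key = tuple(filters.get(dim, ALL) for dim in DIMENSIONS)
    return cube.get(key, 0.0)


conn = make_sales_db()
cube = build_cube(SALES)
print(rolap_total(conn, {"region": "West"}), molap_total(cube, {"region": "West"}))  # 52.0 52.0
```

Both paths return the same numbers; the trade-off is that MOLAP pays the cost up front (and in storage) so that queries become lookups, which is why HOLAP tools mix the two.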
Data Modeling: designing the database and its tables.

There are many types of data modeling, but in data warehousing we mainly use two:
  1. Entity-Relationship (E-R) Model: normalized, so it does not allow duplicate (redundant) values.
  2. Dimensional Model: denormalized, so it allows duplicate (redundant) values.
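To make "duplicates" concrete, here is a small sketch in SQLite: in the E-R (normalized) design a region name is stored once and referenced by key, and a `UNIQUE` constraint rejects a second copy; in the dimensional design the region name is simply repeated on each store row. The tables `region`, `store` and `dim_store` are hypothetical examples, not part of the original post.

```python
import sqlite3


def build_models() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- E-R (normalized) model: each region name is stored exactly once.
        CREATE TABLE region (
            region_id   INTEGER PRIMARY KEY,
            region_name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE store (
            store_id   INTEGER PRIMARY KEY,
            store_name TEXT NOT NULL,
            region_id  INTEGER NOT NULL REFERENCES region(region_id)
        );
        -- Dimensional (denormalized) model: the region name is repeated per store.
        CREATE TABLE dim_store (
            store_key   INTEGER PRIMARY KEY,
            store_name  TEXT NOT NULL,
            region_name TEXT NOT NULL
        );
        INSERT INTO region VALUES (1, 'East'), (2, 'West');
        INSERT INTO store VALUES (10, 'Store A', 1), (11, 'Store B', 1), (12, 'Store C', 2);
        -- Loading the dimension flattens the join, which is where duplicates appear.
        INSERT INTO dim_store (store_key, store_name, region_name)
        SELECT s.store_id, s.store_name, r.region_name
        FROM store AS s JOIN region AS r ON r.region_id = s.region_id;
    """)
    return conn


def er_rejects_duplicate_region(conn: sqlite3.Connection) -> bool:
    """Try to store 'East' a second time in the normalized region table."""
    try:
        conn.execute("INSERT INTO region (region_name) VALUES ('East')")
    except sqlite3.IntegrityError:
        return True  # the UNIQUE constraint blocked the duplicate
    return False


conn = build_models()
print(conn.execute("SELECT COUNT(*) FROM dim_store WHERE region_name = 'East'").fetchone())  # (2,)
print(er_rejects_duplicate_region(conn))  # True
```

The repetition in `dim_store` is deliberate: it saves a join at query time, which suits read-heavy analysis.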

Dimensional modeling has 3 phases:
  1. Conceptual modeling: we identify the tables that are needed and the columns in each.
  2. Logical modeling: once the tables are defined, we define the relationships between them.
  3. Physical modeling: we create all the schemas/tables in the data warehouse.
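One way to picture the three phases is as successive refinements of the same model: first list tables and columns, then add the relationships between them, then generate the actual database objects. The sketch below expresses that idea in Python; the `Table` class, the `to_ddl`/`deploy` helpers and the `dim_customer`/`fact_orders` tables are all hypothetical, and SQLite stands in for the target warehouse.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    columns: dict[str, str]  # conceptual phase: column name -> SQL type
    primary_key: str
    # logical phase: column -> name of the table it references
    references: dict[str, str] = field(default_factory=dict)


def to_ddl(table: Table, tables: dict[str, Table]) -> str:
    """Physical phase: turn one logical table into a CREATE TABLE statement."""
    parts = []
    for col, sql_type in table.columns.items():
        pk = " PRIMARY KEY" if col == table.primary_key else ""
        parts.append(f"{col} {sql_type}{pk}")
    for col, target in table.references.items():
        parts.append(f"FOREIGN KEY ({col}) REFERENCES {target}({tables[target].primary_key})")
    return f"CREATE TABLE {table.name} (\n  " + ",\n  ".join(parts) + "\n)"


def deploy(model: list[Table], conn: sqlite3.Connection) -> None:
    """Create every table; referenced tables are assumed to be listed first."""
    by_name = {t.name: t for t in model}
    for t in model:
        conn.execute(to_ddl(t, by_name))


MODEL = [
    Table("dim_customer", {"customer_key": "INTEGER", "customer_name": "TEXT", "city": "TEXT"},
          "customer_key"),
    Table("fact_orders", {"order_id": "INTEGER", "customer_key": "INTEGER", "amount": "REAL"},
          "order_id", references={"customer_key": "dim_customer"}),
]

conn = sqlite3.connect(":memory:")
deploy(MODEL, conn)
print(to_ddl(MODEL[1], {t.name: t for t in MODEL}))
```

In practice this step is usually done with a modeling tool rather than hand-written code; the sketch only shows how each phase adds information to the previous one.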

NOTE:
SCHEMA: a collection of fact tables and dimension tables.
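A common layout for such a schema is a star schema: one fact table holding numeric measures plus a key for each dimension table around it. The sketch below builds one in SQLite and runs a typical analysis query; the table names (`fact_sales`, `dim_date`, `dim_product`), columns and sample rows are hypothetical.

```python
import sqlite3

STAR_SCHEMA = """
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
-- Fact table: a foreign key to each dimension plus numeric measures.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    amount      REAL
);
"""


def build_star() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR_SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)",
                     [(20230115, "2023-01-15", 2023, 1), (20240210, "2024-02-10", 2024, 2)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Pen", "Stationery"), (2, "Notebook", "Stationery"), (3, "Lamp", "Furniture")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(20230115, 1, 10, 15.0), (20230115, 3, 1, 40.0),
                      (20240210, 1, 5, 7.5), (20240210, 2, 3, 12.0)])
    conn.commit()
    return conn


def sales_by_year_and_category(conn: sqlite3.Connection) -> list:
    """Join the fact table to its dimensions and aggregate the measure."""
    return conn.execute("""
        SELECT d.year, p.category, SUM(f.amount) AS total
        FROM fact_sales AS f
        JOIN dim_date AS d    ON d.date_key = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY d.year, p.category
        ORDER BY d.year, p.category
    """).fetchall()


print(sales_by_year_and_category(build_star()))
# [(2023, 'Furniture', 40.0), (2023, 'Stationery', 15.0), (2024, 'Stationery', 19.5)]
```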

