Skip to main content

Posts

Showing posts from 2015

BIG Data, Hadoop – Chapter 4 - Hadoop Daemons

The back end components of Hadoop system can be visualized as shown below. Name Node and Data Node will be explained in detail in my next blog. All these Daemons are nothing but a piece of code. Java code is running at the background. In order to run Java Code, we need JVM, So each daemon service need some JVM service to run. Job Tracker- Any operation can be considered as a Job, example Read a text file is a job. This is handled by Job Tracker. Task tracker- A job can have many tasks. Like connection to file is one of the task, Reading the data is other task, displaying/processing the data is another task. These are managed by Task Tracker.

BIG Data, Hadoop – Chapter 3 - Hadoop Eco Systems

Pictorial Representation of Hadoop Eco Systems is as shown below. YARN system are not present in first generation of Hadoop development. (Hadoop 1.x versions). Remember, we do not have Yarn Cluster Resource Management System in Hadoop 1.x version which was a disadvantage as any other operations on HDFS, has to be converted to MR code (Map-Reduce Algorithm) and then it use to process the data. With help of YARN (Yet Another Resource Negotiator) in place, we can process HDFS files directly without converting it to into MR code, with the help of some additional languages such as Spark, Giraffe etc.,

BIG Data, Hadoop – Chapter 2 - Data Life Cycle

Data Life Cycle The data life cycle is pictorial defined as show below:     As we see, in our current system, we capture/ Extract our data, then we store it and later we process for reporting and analytics. But in case of big data, the problem lies in storing and then processing it faster. Hence Hadoop takes this portion, where it stores the data in effective format (Hadoop distributed File System) and also process using its engine (Map Reduce Engine). Since Map Reduce engine or Hadoop engine need data on HDFS format to process, We have favorable tools available in market to do this operation. As an example, Scoop is a tool which converts RDBMS to HDFS. Likewise we have SAP BOD to convert sap system data to HDFS.

BIG Data, Hadoop – Chapter 1 - Understanding Big Data & Hadoop

Understanding Big Data We all in recent time, came across the word ‘Big Data’. So the question is what exactly is Big Data? How much TB or GB or data is called a Big Data? Well, there is no standard size definition for Big Data. If current system when not able to handle the data, then, we call such data as Big Data. (Big Data is just a terminology used in IT) As an example, if I take a text file of 50 GB, Processing a text file of 50 GB size on our Laptop or computer is not a huge task but if we take a smart phone, processing 10 GB of data is huge task. That means, for mobile phone, that 50 GB of data is Big Data. Understanding Hadoop Our current systems such as ETL tools, reporting tools, programming environment all have capability of handling few petabyte of Data. And the growth of data annually is shown below in chart And also the growth of unstructured, Semi structured data are increasingly every day. So there is a need of more adv...

Comma Separated Values and Group the Data- SQL

Hi All, Recently got into a situation of grouping set of data with a comma delimited. Here is the requirement. I have a table like this below,     And the way data should be shown as So let us create a sample data set to achieve this: create table #t1 ( ID INT , Name Varchar ( 10 )) Insert into #t1 Select 1 , 'a' Union Select 1 , 'b' Union Select 1 , 'c' Union Select 2 , 'e' Union Select 2 , 'f' Union Select 2 , 'a' Union Select 2 , 'H' Union Select 3 , 'X' And query for the output SELECT   ID        , STUFF (( SELECT ', ' + CAST ( Name  AS VARCHAR ( 10 ))          FROM #t1          WHERE ID = t . ID          FOR XML PATH ( '' ), TYPE ). value ( '.' , 'NVARCHAR(MAX)' ), 1 , 2 , ' ' ...

SSIS Excel Error: Unexpected error from external database driver () when importing data from excel

Recently I had copied a excel file for analysis from SFTP site to development server from my account and asked my team mate to analyze it, by loading it into table. He logged in to the server and used a simple data flow task in SSIS with source as Excel to load the data. When he was connecting excel using excel connection manager, he had a weird error. Unexpected error from external database driver ()  He was unsure of why he was getting the error when selecting the sheet names. Then when reported, we quickly got into the possible reason of security. As I copied the file from SFTP to Development machine using my account, it had limited the access to other users. When he was trying to load, he was facing the error because it was a read-only file for him. Soon I granted full permission to him on excel security and it worked for him. There are various different solution on internet but initially none worked for us when we were trying to resolve. Hope this so...

SCD using date- Derive Effective From and Effective To Date using SQL

We can able to maintain old historical and current data (SCD) using the date column. Let me show you people how to write a small query to achieve it. For example, Assume we have a column called currency which has currency values of all the countries and your business needs it to keep it uniform, say convert all currency of different countries to USD, so it will be easy for Business judgement and analysis. For this, let me take an example of Yahoo! Finance . Yahoo! Finance API provides us currency conversion of all country based on time. I am not here going to tell how to pull the data from Yahoo! Finance API, But I am using such data which I pulled to explain how SCD can be achieved. Below is a image, which depicts, the data which we got from Yahoo! Finance. We wanted this to get converted as shown below: Where we wanted to derive Effective_To column, which indicates, from one date to other date, the currency was so & so for a country against USD. ...

SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005. An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "Transaction (Process ID 68) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction."

SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005. An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "Transaction (Process ID 68) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction." When you receive this error, Please verify if TABLE LOCK option is ticked in OLEDB Destination table. By Checking the TABLE LOCK option, this error will be resolved.

Load JSON Format Data into SQL Server

Hi All, Many situation when we are pulling the data from website or calling the data through an API, we end up with JSON data. If we needed to load JSON Data into Database, we need to break the JSON data into columns. We will see how to break the JSON data using C#. Consider below JSON Data. {     "requestId":"299f#150942d9c47",    "result":[        {           "id":10357,          "leadId":3032,          "activityDate":"2015-06-16T20:20:36Z",          "activityTypeId":1,          "primaryAttributeValueId":26,          "primaryAttributeValue":"eos-celebration",          "attributes":[   ...

Error 2 System.UnauthorizedAccessException: Access to the path '*:\*******\******\******.dtsx' is denied. at System.IO.__Error.WinIOError(Int32 errorCode,String maybeFullPath) at System.IO.File.SetAttributes(String path, FileAttributes fileAttributes) at Microsoft.DataTransformationServices.Project.DataTransformationsProjectBuilder.BuildIncremental(IOutputWindow outputWindow)

Recently, when one of my team mate created a package in the system dedicated to him and later, deployed it into Server. When we tried to run the package from Server, we got this error. All these days, we never had a problem but suddenly we had this issue as Server was recently updated. "Error 2  System.UnauthorizedAccessException:  Access to the path '*:\*******\******\****** .dtsx' is denied.      at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath) at System.IO.File.SetAttributes(String path, FileAttributes fileAttributes) at Microsoft.DataTransformationServices.Project.DataTransformationsProjectBuilder.BuildIncremental(IOutputWindow outputWindow)  0 0 " I did a quick research and found out error is due to security limit given to User accessing the server. Solution: Right Click on the Package and Go to Properties. In Properties, Go to Security Tab and Check if the user is given Full Permission....

Classes, Methods, Objects..!!!! How do we relate to real world to understand it

Hello Everyone, Since, I started off my career, I was hearing about classes, Methods, publics, private, objects blab blab and I tried a lot to understand it but somehow I couldn’t get a chance to learn it properly. (I am Electronics and Communication Engineer, LOL) However, now I made one sincere attempt and learnt a bit of it, which I would like to share with you guys who are having trouble in understanding it, as it was with me. Let us get into some story how an organization works and from there, we will try to relate the classes, objects, methods. We all have come across how a Multi-National Company works. Consider for instance, Microsoft Inc. Microsoft is one of the top MNC, anyone would dream to work with. We also know that there are many departments in Microsoft. As example, Data Management team, Licensing Team, Marketing team, Development team, testing team etc. Everyone has their own functionality and work they do is different from each other though all are work...

SQL Server 70-461 Exams fundamental - Part 2 (Tips from Querying from Microsoft SQL Server Training kit by Itzik-Ben-Gan)

 In continuation with my previous blog,  here , we will discuss further more. Table Columns and Rows Usually the table is represented by “Fields” and “Records” which is wrong terminology. Just to improve our standard on SQL, we should refer as “Columns” and “Rows”. “Fields” are interfaces what we have on client applications and “Records” are what we have in files and cursors. Tables are logical and they are represented by Columns and Rows. There is nothing called “NULL Value” as mentioned above, null represents missing value. So we should rather call just NULL instead of saying it “NULL Value”. SQL Engine This is an interesting part of SQL. How the query is processed? Let us answer that in some time. SQL is English like language which uses common terms we use in English but converts into a code later stage. SEQUEL is original format (Structured English Query Language) was coined first but due to trademark dispute with Airline company, it is renamed as ...