BIG Data, Hadoop – Chapter 1 - Understanding Big Data & Hadoop


Understanding Big Data


Most of us have recently come across the term ‘Big Data’. So what exactly is Big Data? How many GB or TB of data does it take to be called Big Data?

Well, there is no standard size definition for Big Data. When a current system is not able to handle the data, we call that data Big Data. (Big Data is just a term used in IT.)

As an example, take a 50 GB text file. Processing a 50 GB text file on a laptop or desktop computer is not a huge task, but on a smartphone, processing even 10 GB of data is a huge task. That means that, for the mobile phone, that 50 GB file is Big Data.
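The idea that "Big Data" is relative to the system handling it can be written as a small sketch. The function name is_big_data and both capacity figures are assumptions made up for illustration; they are not measurements of any real device.

```python
def is_big_data(data_size_gb: float, capacity_gb: float) -> bool:
    """Return True when the data is larger than what the system can comfortably process."""
    return data_size_gb > capacity_gb


# Hypothetical processing capacities, chosen only to mirror the example in the text.
LAPTOP_CAPACITY_GB = 500
PHONE_CAPACITY_GB = 8

file_size_gb = 50  # the 50 GB text file from the example

print("Laptop:", is_big_data(file_size_gb, LAPTOP_CAPACITY_GB))
print("Phone: ", is_big_data(file_size_gb, PHONE_CAPACITY_GB))
```

The same 50 GB file gives a different answer depending on the system, which is the point of the definition: the label depends on the machine, not on a fixed size.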

Understanding Hadoop

Our current systems, such as ETL tools, reporting tools, and programming environments, are all capable of handling up to a few petabytes of data.

The annual growth of data is shown in the chart below.

[Chart: annual growth of data]

The volume of unstructured and semi-structured data is also increasing every day.



So there is a need for more advanced tools that are capable of storing and processing this data. Such a system also needs to address the four V’s of Big Data (Velocity, Variety, Volume, and Veracity).

Hadoop is one such tool; it helps us handle Big Data, both processing it and storing it effectively. Hadoop can handle all types of data, such as images, videos, database files, and file systems.
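To give a feel for how Hadoop processes data, here is a minimal sketch of the map, shuffle, and reduce steps that Hadoop's MapReduce engine is built around, written as a word count in plain Python. It does not use the Hadoop API and runs on a single machine; the function names and the two sample chunks are assumptions for illustration only. In a real cluster, each chunk would be a block of a file stored across many machines.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the values collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Run map over every chunk, then shuffle and reduce the combined pairs."""
    pairs = []
    for chunk in chunks:
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


# Hypothetical input: two chunks standing in for blocks of a much larger file.
chunks = ["big data is big", "hadoop stores big data"]
counts = word_count(chunks)
print(counts)
```

Because each chunk is mapped independently, the map step can be spread over many machines, which is what lets this style of processing scale to data that one system cannot handle.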



