

Solution : PyMSSql - import _mssql - ImportError: DLL load failed: The specified module could not be found

When you install the pymssql module using pip and then try to import pymssql, you may end up with the “PyMSSql - import _mssql - ImportError: DLL load failed: The specified module could not be found” error, if you are using pymssql version 2.1 or greater.
This is due to security reasons: pymssql is no longer linked with SSL and FreeTDS, unlike previous versions, where SSL and FreeTDS were linked with pymssql during installation. This change is documented on the pymssql project site.
To overcome this, we need to install the supporting components FreeTDS and OpenSSL independently; pymssql will then work without any issue.
Below are the steps to download and configure FreeTDS and OpenSSL.
Download FreeTDS and extract the archive. Now place the extracted folder where your Python modules are installed. (It can be kept anywhere, but to avoid accidental deletion of the folder, I prefer h…
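Once FreeTDS and OpenSSL are in place, a quick way to confirm the fix is a small diagnostic script. This is a minimal sketch: it only checks that the import (and therefore the _mssql C extension and its DLL dependencies) loads, without connecting to any server.

```python
# Diagnostic: confirm that pymssql and its C extension (_mssql) can load.
# If the DLL dependencies (FreeTDS / OpenSSL) are still missing, the import
# itself raises ImportError, which we report instead of crashing.

def check_pymssql():
    try:
        import pymssql  # importing pymssql also pulls in the _mssql extension
        return "pymssql %s imported successfully" % pymssql.__version__
    except ImportError as exc:
        return "pymssql import failed: %s" % exc

print(check_pymssql())
```

If the message still reports a DLL load failure, re-check that the FreeTDS and OpenSSL libraries are on the system PATH.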
Recent posts

HBase Error : zookeeper.znode.parent mismatch

The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
If you come across this error while starting HBase, check the hbase-site.xml file (for me it was in the /usr/lib/hbase-0.96.2-hadoop2/conf folder).
Check whether the znode property is present; if not, add this property to the existing configuration node:
<configuration>
        <property>
                <name>zookeeper.znode.parent</name>
                <value>/hbase-unsecure</value>
        </property>
</configuration>
This is for stand-alone mode; I am not sure how it is done for clusters.
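To check programmatically whether the property is set, the file can be parsed with a short script. This is a sketch using Python's standard-library XML parser; the hbase-site.xml path in the comment is an assumption and should be adjusted to your installation.

```python
import xml.etree.ElementTree as ET

def get_hbase_property(xml_text, name):
    """Return the <value> of the named property in hbase-site.xml, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

# In practice you would read the real file, e.g.:
#   xml_text = open("/usr/lib/hbase-0.96.2-hadoop2/conf/hbase-site.xml").read()
sample = """<configuration>
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase-unsecure</value>
  </property>
</configuration>"""

print(get_hbase_property(sample, "zookeeper.znode.parent"))  # -> /hbase-unsecure
```

The returned value should match whatever the master wrote to ZooKeeper; a mismatch here is exactly what produces the error above.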

SSIS: The Value Was Too Large To Fit In The Output Column

I had an SSIS package where I was calling a stored procedure in an OLEDB Source, and it was returning a “The Value Was Too Large to Fit in the Output Column” error.
Well, my data type in the OLEDB source matched my OLEDB destination table. When I googled, the suggested solutions were to increase the output column size of the OLEDB Source using the Advanced Editor. I was not at all comfortable with that solution, as my source, destination, and all my intermediate transformations had the same length and data type, and I did not want to change them.

Then I found that the SET NOCOUNT ON option was missing in the stored procedure. Once I added it, my data flow task ran successfully.
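For illustration, the change looks like this. The procedure name and body below are hypothetical; only the SET NOCOUNT ON line is the actual fix.

```sql
CREATE PROCEDURE dbo.GetOrders        -- hypothetical procedure name
AS
BEGIN
    -- Suppress the "N rows affected" informational messages, which can
    -- confuse the OLEDB Source's handling of the result set in SSIS.
    SET NOCOUNT ON;

    SELECT OrderID, OrderDate
    FROM dbo.Orders;                  -- hypothetical table
END
```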

BIG Data, Hadoop – Chapter 4 - Hadoop Daemons

The back-end components of the Hadoop system can be visualized as shown below.

Name Node and Data Node will be explained in detail in my next blog.

All these daemons are nothing but pieces of Java code running in the background. To run Java code we need a JVM, so each daemon service needs a JVM instance to run.

Job Tracker - Any operation can be considered a job; for example, reading a text file is a job. Jobs are handled by the Job Tracker.

Task Tracker - A job can have many tasks. Connecting to the file is one task, reading the data is another task, and displaying/processing the data is yet another. These are managed by the Task Tracker.

BIG Data, Hadoop – Chapter 3 - Hadoop Eco Systems

A pictorial representation of the Hadoop Eco System is shown below.

The YARN system is not present in the first generation of Hadoop (the Hadoop 1.x versions).

Remember, we do not have the YARN cluster resource management system in Hadoop 1.x. This was a disadvantage, as any operation on HDFS had to be converted into MR code (the Map-Reduce algorithm) before the data could be processed.

With YARN (Yet Another Resource Negotiator) in place, we can process HDFS files directly, without converting the work into MR code, with the help of additional frameworks such as Spark, Giraph, etc.

BIG Data, Hadoop – Chapter 2 - Data Life Cycle

Data Life Cycle
The data life cycle is pictorially defined as shown below:

As we see, in our current systems we capture/extract our data, then we store it, and later we process it for reporting and analytics.
But in the case of big data, the problem lies in storing the data and then processing it quickly. Hence Hadoop takes over this portion: it stores the data in an efficient format (the Hadoop Distributed File System) and processes it using its engine (the MapReduce engine).

Since the MapReduce engine (the Hadoop engine) needs data in HDFS format to process, there are suitable tools available on the market for this conversion.
As an example, Sqoop is a tool that moves RDBMS data into HDFS. Likewise, we have SAP BODS to move SAP system data into HDFS.

BIG Data, Hadoop – Chapter 1 - Understanding Big Data & Hadoop

Understanding Big Data
We have all come across the term ‘Big Data’ in recent times. So the question is: what exactly is Big Data? How many TB or GB of data count as Big Data?
Well, there is no standard size definition for Big Data. When the current system is not able to handle the data, we call such data Big Data. (Big Data is just a terminology used in IT.)
As an example, processing a 50 GB text file on our laptop or computer is not a huge task, but on a smartphone even processing 10 GB of data is a huge task. That means that, for the mobile phone, that 50 GB of data is Big Data.
Understanding Hadoop
Our current systems, such as ETL tools, reporting tools, and programming environments, all have the capability of handling a few petabytes of data.
The annual growth of data is shown in the chart below.

The volume of unstructured and semi-structured data is also increasing every day.

So there is a need for more advanced tools capable of processing and storing th…