MSBI

Posts

SCALA: Function to load the Data From Data Frame to CSV File using Scala

Hi All, Below is the code which can be created as csv.scala file and can be called inside your MAIN Scala function to push the data from Data Frame to CSV files. import au.com.bytecode.opencsv.CSVWriter import java.time.format.DateTimeFormatter import org.apache.commons.io.FilenameUtils import java.io.FileWriter object CSVExport { def exportCSVFile (oracleUser: String,oraclePassword: String,oracleURL: String,oracleOutPutFilePath:String,oracleOutPutFileDateFormat:String,oracleQueryFilePath:String): Unit = { /* Read date part from the date pattern*/ val dateNow: String = DateTimeFormatter.ofPattern(oracleOutPutFileDateFormat).format(java.time.LocalDate.now) /* Reconstruct the new file name*/ val basename = FilenameUtils.getBaseName(oracleOutPutFilePath) val extension = FilenameUtils....

Scala: Function to create Oracle Connection

Hi All, We can use, below code to create a function in SCALA to connect to Oracle Database and call this function on your main SCALA by passing the parameter. Please be sure to install the necessary drivers - oracle JDBC pool drivers def connJdbc (oracleUser: String,oraclePassword: String,oracleURL: String): java.sql.Connection = { val ods = new OracleDataSource() ods.setUser(oracleUser) ods.setURL(oracleURL) ods.setPassword(oraclePassword) ods.getConnection() } }

Solution : PyMSSql - import _mssql - ImportError: DLL load failed: The specific module could not be found

When you install pymssql module using pip in python and then try to import pymssql, you may end up with “ PyMSSql - import _mssql - ImportError: DLL load failed: The specific module could not be found ” error, if you are using pymssql version 2.1 or greater. This is due to security reason, pymssql now is not linked with SSL and FreeTDS, unlike previous version where SSL and FreeTDS were jointly linked with pymssql during installation. This information can be found on pymssql site found http://pymssql.org/en/latest/freetds.html#windows In order to overcome, we need to install supporting components FreeTDS and OpenSSL independently and then pymssql will work without any issue. Below are the steps to download and configure FreeTDS and OpenSSL. FreeTDS can be downloaded https://github.com/ramiro/freetds/releases And extract the file which is download. Now place the extract folder where your python module is installed. (Can be kept anywhere but to avoid accide...

HBase Error : zookeeper.znode.parent mismatch

The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. If you come across this error, while starting Hbase, then, check Hbase-Site.XML file (For me it was in /usr/lib/hbase-0.96.2-hadoop2/conf folder). Check if Znode is present else add this property to existing XML node. <configuration> <property> <name>zookeeper.znode.parent</name> <value>/hbase-unsecure</value> </property> </configuration> This is done for Stand-Alone mode. I am not sure how it is done for clusters.

SSIS: The Value Was Too Large To Fit In The Output Column

I had a SSIS package where I was calling a stored procedure in OLEDB Source and it was returning a “The Value Was Too Large to Fit in the Output Column” error. Well, My Datatype in OLEDB source was matching with my OLEDB Destination table. However, when I googled, we got solutions like to increase the output of OLEDB Source using Advanced Editor option . I was not at all comfortable with their solution as my source, destination and my intermediate transformation all are having same length and data type and I don’t want to change. Then I found that I was missing SET NOCOUNT ON option was missing in Stored Procedure. Once I added it, my data flow task ran successfully.

BIG Data, Hadoop – Chapter 4 - Hadoop Daemons

The back end components of Hadoop system can be visualized as shown below. Name Node and Data Node will be explained in detail in my next blog. All these Daemons are nothing but a piece of code. Java code is running at the background. In order to run Java Code, we need JVM, So each daemon service need some JVM service to run. Job Tracker- Any operation can be considered as a Job, example Read a text file is a job. This is handled by Job Tracker. Task tracker- A job can have many tasks. Like connection to file is one of the task, Reading the data is other task, displaying/processing the data is another task. These are managed by Task Tracker.