
SCALA: Function to load the Data From Data Frame to CSV File using Scala

Hi All, 

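Below is code that you can save as a csv.scala file and call from your main Scala function. It reads a SQL query from a file, runs it against Oracle over JDBC, and writes the resulting result set to a date-stamped CSV file. It uses opencsv (au.com.bytecode.opencsv), Apache Commons IO, and an OracleConnect.connJdbc helper from the main project that opens the JDBC connection (that helper is not shown here).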


import au.com.bytecode.opencsv.CSVWriter
import org.apache.commons.io.FilenameUtils

import java.io.FileWriter
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.io.Source

object CSVExport {

  def exportCSVFile(oracleUser: String,
                    oraclePassword: String,
                    oracleURL: String,
                    oracleOutPutFilePath: String,
                    oracleOutPutFileDateFormat: String,
                    oracleQueryFilePath: String): Unit = {

    /* Format today's date with the supplied pattern, e.g. "yyyyMMdd" */
    val dateNow: String = DateTimeFormatter.ofPattern(oracleOutPutFileDateFormat).format(LocalDate.now)

    /* Rebuild the file name as <path><basename>_<date>.<extension>.
       getFullPath already ends with the separator, so no extra "\\" is added. */
    val basename = FilenameUtils.getBaseName(oracleOutPutFilePath)
    val extension = FilenameUtils.getExtension(oracleOutPutFilePath)
    val path = FilenameUtils.getFullPath(oracleOutPutFilePath)
    val newFileName = path + basename + "_" + dateNow + "." + extension

    /* Read the query from the file, closing the file afterwards */
    val querySource = Source.fromFile(oracleQueryFilePath)
    val queryString = try querySource.mkString finally querySource.close()

    /* Connect to Oracle DB, run the query and save it to a result set.
       OracleConnect.connJdbc is the project's own helper (not shown);
       it is assumed to return a java.sql.Connection. */
    val conExport = OracleConnect.connJdbc(oracleUser, oraclePassword, oracleURL)
    try {
      val statementExport = conExport.createStatement()
      statementExport.setFetchSize(1000) // fetch rows from Oracle in batches of 1000
      val resultSet: java.sql.ResultSet = statementExport.executeQuery(queryString)

      /* Write the result set to the CSV file; true = write column names as the header row */
      val csvWriter = new CSVWriter(new FileWriter(newFileName), ',')
      try csvWriter.writeAll(resultSet, true)
      finally csvWriter.close()

      resultSet.close()
      statementExport.close()
    } finally {
      conExport.close() // always release the database connection
    }
  }
}
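The file-name step is the easiest part to get wrong, because FilenameUtils.getFullPath already returns the directory with its trailing separator. A minimal sketch of the same idea using only java.nio.file, which lets the JDK handle the separator; datedFileName is a hypothetical helper name, and the yyyyMMdd pattern and sample path are only examples:

```scala
import java.nio.file.{Path, Paths}
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object DatedFileName {

  /** Returns <dir>/<basename>_<date><.extension> for the given path (hypothetical helper). */
  def datedFileName(filePath: String, datePattern: String, date: LocalDate = LocalDate.now): String = {
    val p: Path = Paths.get(filePath)
    val fileName = p.getFileName.toString
    val dot = fileName.lastIndexOf('.')
    // Split "report.csv" into ("report", ".csv"); a name without a dot keeps no extension
    val (base, ext) =
      if (dot > 0) (fileName.substring(0, dot), fileName.substring(dot))
      else (fileName, "")
    val stamp = DateTimeFormatter.ofPattern(datePattern).format(date)
    // resolveSibling keeps the original directory (or none, for a bare file name)
    p.resolveSibling(s"${base}_$stamp$ext").toString
  }

  def main(args: Array[String]): Unit =
    println(datedFileName("/data/out/report.csv", "yyyyMMdd", LocalDate.of(2020, 1, 31)))
}
```

On a Unix-style file system this prints /data/out/report_20200131.csv. Unlike the FilenameUtils version, a file without an extension does not get a trailing dot.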
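With the settings used above (comma separator and opencsv's default double-quote character), the 2.x au.com.bytecode CSVWriter, as I understand it, wraps every field in double quotes and doubles any quote inside a field. The sketch below reproduces that row format in plain Scala so you can predict what the exported file will look like; toCsvLine is a hypothetical name, it assumes non-null strings, and it is not the library's own code:

```scala
object CsvLine {

  /** Quote every field and double any embedded quote characters (hypothetical helper). */
  def toCsvLine(fields: Seq[String], separator: Char = ','): String =
    fields
      .map(f => "\"" + f.replace("\"", "\"\"") + "\"")
      .mkString(separator.toString)

  def main(args: Array[String]): Unit = {
    println(toCsvLine(Seq("ID", "NAME")))              // header row
    println(toCsvLine(Seq("1", "O'Brien, \"Pat\""))) // comma and quotes inside a value
  }
}
```

This prints "ID","NAME" followed by "1","O'Brien, ""Pat""", which is why commas and quotes in Oracle column values do not break the CSV layout.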

