Mar 182013

Let me just sum this up for you right up front: If you have multiple sheets in a spreadsheet with different fields in them, use multiple Excel input steps, not just one.  Read on for the long version.

This week I got some sample data in Excel format.  The file contained 17 sheets, each with a different set of fields in the header row (which was on the 2nd row by the way, not the first).  So I dragged an Excel input step into PDI, and went to work configuring it.  Based on the data model description (provided in a separate file), I got all 17 sheet names and then narrowed them down to the 3 that I needed for my purposes.  I set the start row to “1″ for each (apparently the header row is 0 based), and got all the fields.  There were over 800 of them.  I narrowed those down to just the 80 or so I needed, and continued to set up the dimension tables via the “Dimension Lookup / Update” step.  I was also sure to update the “Spread sheet type (engine)” to Excel 2007 XLSX (Apache POI) to match the file format I was provided.

Once I had it all set up, I ran the transformation, which completed successfully.  Unfortunately it only successfully pulled data out of one of the three worksheets.  Upon closer inspection, I found that the two sheets from which data was not retrieved had some formatting in them.  The cells within that sheet looked like this:


When I hovered over the warning sign, the message was “The number in this cell is formatted as text or preceded by an apostrophe.”  My first thought was that I needed to get a new data set that had no formatting, and I wasn’t sure where to go from here.  I asked around, and Sal Scalisi, a colleague of mine here at Pentaho produced a more simple example with the same formatting I had in which everything worked just fine.  The difference?  He only had one sheet in his Excel file.  He suggested that I use one Excel input for each sheet in the Excel workbook, and he was right on the money.  I rearranged the transformation, and everything works extremely well.  This is the end result:

Hopefully this tip is helpful to some of you out there dealing with Excel data!


  2 Responses to “Pro Tip: Pentaho Microsoft Excel Input Step”

  1. You are required to get out the top loan website
    after which you need to match a straightforward a fairly easy online form
    about the website of the borrowed funds.

  2. I have read so many content concerning the blogger lovers however this paragraph is in fact a nice piece of writing, keep it up.|

 Leave a Reply



You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>