Let me just sum this up for you right up front: If you have multiple sheets in a spreadsheet with different fields in them, use multiple Excel input steps, not just one. Read on for the long version.
This week I got some sample data in Excel format. The file contained 17 sheets, each with a different set of fields in the header row (which was on the 2nd row by the way, not the first). So I dragged an Excel input step into PDI, and went to work configuring it. Based on the data model description (provided in a separate file), I got all 17 sheet names and then narrowed them down to the 3 that I needed for my purposes. I set the start row to “1″ for each (apparently the header row is 0 based), and got all the fields. There were over 800 of them. I narrowed those down to just the 80 or so I needed, and continued to set up the dimension tables via the “Dimension Lookup / Update” step. I was also sure to update the “Spread sheet type (engine)” to Excel 2007 XLSX (Apache POI) to match the file format I was provided.
Once I had it all set up, I ran the transformation, which completed successfully. Unfortunately it only successfully pulled data out of one of the three worksheets. Upon closer inspection, I found that the two sheets from which data was not retrieved had some formatting in them. The cells within that sheet looked like this:
When I hovered over the warning sign, the message was “The number in this cell is formatted as text or preceded by an apostrophe.” My first thought was that I needed to get a new data set that had no formatting, and I wasn’t sure where to go from here. I asked around, and Sal Scalisi, a colleague of mine here at Pentaho produced a more simple example with the same formatting I had in which everything worked just fine. The difference? He only had one sheet in his Excel file. He suggested that I use one Excel input for each sheet in the Excel workbook, and he was right on the money. I rearranged the transformation, and everything works extremely well. This is the end result:
Hopefully this tip is helpful to some of you out there dealing with Excel data!