Notes from Daily Encounters with Technology RSS 2.0
 
# Sunday, June 13, 2010

A few months ago I worked on a small spare time project which included some manipulation of binary Excel (.xls) files. This seemingly simple task soon turned out to be quite a challenge if you want to handle it right. The post you are reading is a short summary of my experiences. They should make your choices easier if you are about to tackle a similar problem.

The most obvious choice for handling .xls files is Excel automation using the Excel object model. As long as your application is always going to be used interactively, you should be fine. It’s probably the best method to use in spite of a few downsides:

  • The object model is COM based which means you’ll have to do interop if you are developing a .NET application. Fortunately there are some nice improvements in .NET Framework which make coping with COM in C# much easier.
  • Your application will have to be compiled in 32-bit in order for it to work on 64-bit Windows. This usually isn’t a real limitation just don’t forget to switch the target platform from Any CPU to x86 if you’re doing development on a 32-bit OS to avoid the problem. (You’re going to notice it yourself when developing on a 64-bit OS.)
  • Don’t try unit testing the Excel automation code outside your IDE (e.g. on your continuous integration server). As described below this scenario is not supported.

As soon as you want your code to be run non-interactively, you’re out of luck with Excel automation. Since all Office applications assume to be running on the interactive desktop, there are several reasons why such usage is not supported and is even strongly discouraged by Microsoft. If you want to run your code on a web server, as a Windows service, a scheduled task or just automatically test it on your build server, you’ll have to find a different approach. And in this case there are no obvious choices.

Your best bet is to use Open XML file format (.xlsx, .xlsm) instead of the binary one (.xls) if this is an option for you. Since this is a well documented XML based format you can manipulate it directly without running Excel at all. You can even use Open XML SDK for Microsoft Office which includes strongly typed classes that simplify many common tasks.

This is not the case with the binary format. Microsoft doesn’t provide or support any SDK or API for manipulating the files directly. You are only provided with detailed documentation of the format. Based on it some third party solutions have been developed. Aspose and SoftArtisans have their own commercial offerings which I haven’t evaluated because as such they weren’t suitable for me.

On the other hand the only free library that I have found is MyXLS and this one leaves much to be desired. It also seems to be pretty much abandoned with the latest release almost a year ago and only a single commit to the repository in this year so far. That being said, it still might prove useful if you only want to create the files, mostly focusing on the appearance with only minimal requirements regarding formulas. According to the samples this seems to work fine. Reading existing files is another story. You are more or less on your own as soon as you need to read cells with formulas. This made the library useless for me therefore I used another approach based on the fact that the OLE DB Provider for Jet can be used to read and write data in Excel worksheets.

At first I haven’t even considered this possibility because I was convinced that this method can only be used on worksheets designed as database tables (having columns of data with or without header columns). This article proved me wrong and in spite of many issues in the demo project it showed off techniques which turned out really useful for me. The most important one was the possibility to address regions, not only complete worksheets – this can be used to retrieve and set individual cell values as well as for accessing database table-like blocks of data in a part of the worksheet. Let’s take a look at a sample:

SELECT F1 FROM [Sheet1$C2:C2];

This query retrieves a value of a single cell. The same syntax can be used to access any region: after the worksheet name with a trailing $ character the region is defined just like in Excel formulas. If the region doesn’t include column headers (specified by including HDR=No in the Extended Properties of the connection string) the columns are named F# where # is the sequential number of the column in the defined region. Similarly the following query can be used to set a value of an individual cell:

UPDATE [Sheet1$C2:C2] SET F1 = 'Value';

A few more things worth mentioning:

  • Be aware of the IMEX option in the connection string. It doesn’t seem to be documented very well, probably its best description is here.
  • Make sure you keep the connection to the file open if you plan to do more data access later. Opening the connection takes quite some time therefore the performance will be terrible if you keep closing and reopening it.
  • The technique doesn’t seem to work if the worksheet name begins with a space. At least I couldn’t make it work, no matter how I set the quotes.
  • I encountered a file which couldn’t be opened in this way but the problem was resolved after I opened the file in Excel and saved it again.
Sunday, June 13, 2010 1:47:10 PM (Central European Daylight Time, UTC+02:00)  #    Comments [0] - Trackback
Development | .NET | Software | Office
# Wednesday, April 28, 2010

Query parameters are a very useful Excel feature allowing parameterization of database queries used to import data in Excel. They are really simple to use as well. On the Definition tab of the Connection Properties dialog there is a Parameters… button at the bottom. It gets enabled as soon as there is a parameter defined in the Command text – you define it by typing in a question mark (?) instead of a value in the WHERE clause of the query, as seen in the image below.

 Connection Propertie dialog

Unfortunately there is a limitation for using this functionality which turns out pretty unintuitive in Excel 2007. It is only supported for Microsoft Query based queries. For all other types of queries available in Excel you get the error “No value given for one or more parameters.” when you add a parameter to the Command text. Since the error doesn’t even hint at the real cause of the problem it took me some time before I figured it out. The thing to remember is: If you want to use query parameters, you must select Microsoft Query as the external data source when importing the data for the first time. This can’t be changed at a later time.

Get External Data From Microsof Query

Wednesday, April 28, 2010 8:01:23 PM (Central European Daylight Time, UTC+02:00)  #    Comments [0] - Trackback
Software | Office
# Saturday, November 24, 2007

Not so long ago I've been called to my boss's office to prevent him losing unsaved work in a PowerPoint presentation. It turned out that when he tried to save the file to a new location the message box with the overwrite warning for some reason didn't render completely and it was impossible to close it. As it turned out at the end I could have just killed the application and restart it, since the AutoRecover feature kicked in and offered a version of the file with all changes applied.

But just to be on the save side I wanted to copy the AutoRecover files to a save location before actually killing the application. But unlike Word or Excel where the location of these files is set in the options, PowerPoint does not have such an option. After some googling I finally stumbled across a page, correctly stating that the files are stored in the %temp% folder and named ppt*.tmp. I decided to publish this info here just in case I need it again.

Saturday, November 24, 2007 3:11:03 PM (Central European Standard Time, UTC+01:00)  #    Comments [0] - Trackback
Software | Office
# Saturday, June 02, 2007

If you need cheap and simple OCR functionality Microsoft Office Document Imaging Type Library (MODI) is a nice option if its requirements (Microsoft Office 2003 or later) and limitations (limited language support) don't bother you. Here is a simple C# function that does OCR on the image with the specified path:

static string OCR(string path)
{
    MODI.Document doc = new MODI.Document();
    doc.Create(path);
    doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, false, false);
    string result = ((MODI.Image)doc.Images[0]).Layout.Text;
    System.Runtime.InteropServices.Marshal.ReleaseComObject(doc);
    return result;
}

However, there is another Microsoft Office object model related problem. For Office 2003 users to be able to use your application, the MODI 11.0 (2003 version) must be referenced in the project and the release version of the application must be compiled on a machine with Office 2003 installed. In such a case VB6 still managed to compile the project on a machine with a newer version of Office installed, since the newer version of the type library was automatically used (MODI 12.0 for Office 2007 in this case). On the other hand strong type checking at compile time prevents that in C#.

If you want to keep using Office 2007 and be able to compile such a project, the only solution is to install Microsoft Office Document Imaging as the only component of Office 2003 along the existing Office 2007 installation. Unfortunately this overwrites the Microsoft Office Document Image Writer printer driver from 2007 with the older version, therefore you'll have to start a lengthy process of repairing the Office 2007 installation afterwards. And don't forget to apply all the service packs and updates for Office 2003 before that since this will also overwrite the printer driver and you'll have to repair Office 2007 once again. I learned that the hard way.

Saturday, June 02, 2007 10:39:34 AM (Central European Daylight Time, UTC+02:00)  #    Comments [4] - Trackback
Development | .NET | Software | Office
Page 1 of 1 in the SoftwareOffice category
Sponsored Ads

About Me
Twitter
I really need to give this a try: Notify Property Weaver : http://t.co/WRiDR7Rt 1 day ago
The Web is the new Terminal: Are you using the Web's Keyboard Shortcuts and Hotkeys? http://t.co/4PSPFgIy via @shanselman 1 day ago
Do Hard Things | Sealed Abstract http://t.co/6LDRAcrb (via Instapaper) 1 day ago
Potepanja v naravi: Abram na Nanosu http://t.co/vtlUEWJg 1 day ago
@MladenPrajdic @andrejt use the middle mouse button then 3 days ago
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

All Content © 2012, Damir Arh, M. Sc. Send mail to the author(s) - Privacy Policy - Sign In
Based on DasBlog theme 'Business' created by Christoph De Baene (delarou)
Social Network Icon Pack by Komodo Media, Rogie King is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.