Wednesday, November 30, 2011

Excel Changes Values and Formatting when Importing CSV

When opening a text-based data format with Excel (CSV, tab separated, etc.) you will find that Excel has a nasty habit of automatically interpreting the data types of the cells.  Once it (thinks it) has determined the proper data type, it will also reformat the data to its default formats for numbers, currency, dates, times, etc.

For example, I work with data that involves a lot of timestamps.  I frequently have timestamp data in a MySQL database that looks like this:

   2011-11-30 13:56:02

This is a typical timestamp format that is recognized by many different programs and programming languages.

Usually I will export data from the database into CSV format and load it into Excel to do some tweaking before I turn it into a visualization.  However, when reading the data from the CSV file, Excel, detecting it as a date/time value, will reformat it according to its default formatting rules, like this:

   11/30/2011 13:02

If I then save the CSV file, Excel will save it out using its default format (MM/dd/yyyy hh:mm) instead of my original format (yyyy-MM-dd hh:mm:ss).

Yes, ladies and gentlemen, Excel just changed my values and corrupted my data.

Here is how to prevent this from happening:


  1. In Windows Explorer rename the file, changing the extension from ".csv" to ".txt".  (If you don't see file extensions, do the following:

    1. Press the Alt key in the Explorer window.  (This will make the menu bar visible.)

    2. In the newly visible menu bar, select Tools -> Folder Options ...

    3. Click on the "View" tab.

    4. UNCHECK the box that says "Hide extensions for known file types."

    5. Click OK.

    6. You should now be able to see the ".csv", ".txt", ".whatever" extensions on the file names.



  2. In Excel, open the newly renamed file.  (You will probably need to change the file type selector to "All Files (*.*)" in order to find your .txt file.)  After clicking "Open", Excel will present you with a window titled "Text Import Wizard - Step 1 of 3".

  3. In Step 1, choose the "Delimited" radio button.  Click "Next".

  4. In Step 2, in the "Delimiters" section, check the "Comma" box (or whatever delimiter your file is using) and make sure that all other delimiter boxes are UNCHECKED.  You should now see your data divided into the appropriate columns at the bottom of this window.  Click "Next".

  5. In Step 3, click on the first column.  Use the scrollbar to scroll over to your rightmost column.  Hold down the Shift key and click on the final column.  All columns should now be selected.

  6. Click on the "Text" radio button in the "column data format" section.  Now click "Finish."


Following these steps, Excel will NOT try to automatically determine a data type, nor will it reformat or change your data in any way.

No comments:

Post a Comment