post Category: coding — Jon Watson @ 9:02 am — post Comments (0)

For my final trick this week, I will share the correct invocation to load a typical Comma Separated Values (CSV) file into a MySQL table.

Not all text files are created the same, but in general CSV files are pretty predictable. A CSV file generally:

  • Has each field enclosed in double quotes
  • Has each field separated by a comma
  • Has a new line character at…well…the end of each line
  • Has a single line at the beginning containing the column names

If this sounds like your CSV file, the – after creating your table in the database – this should work for you:

LOAD DATA INFILE ‘foo.csv’ INTO TABLE FOO FIELDS TERMINATED BY ‘,’ ENCLOSED BY ‘”‘ LINES TERMINATED BY ‘\n’ IGNORE 1 LINES;

Enjoy!

Reblog this post [with Zemanta]

Tags: , ,

Rate this post:

Some related posts:

`
post Category: General Musings — Jon Watson @ 5:46 pm — post Comments (0)

Hot on the heels of my recent escapades with table indexes, I realized that some of the column definitions in my 314 field table were incorrect. The table is far too large to attempt to open in PHPMyAdmin so I had to hand bomb the changes from the command line.

I had a variety of columns that were of the correct type (VARCHAR) but not of the correct length. To change column BAR in table FOO from a VARCHAR(10) to a VARCHAR(20), use the following syntax:

alter table FOO modify BAR varchar(20);

Keep in mind that every record has to be accessed in order to make a change to a column so if you’re lucky like me and have 3 million records, you might want to string all your mods together and go to bed.

Rate this post:

There are no related posts to this one. Have some randomness:

`
post Category: General Musings — Jon Watson @ 2:25 pm — post Comments (0)
MySQL GUI Tools

Image via Wikipedia

I recently had the dubious pleasure of having to deal with a single exported database table with over 3 million records in it. That in itself isn’t a problem, but slugging around a 3GB CSV file and then having to wangle it six ways to Sunday to get it into a MySQL table had a little higher PITA factor than I was looking for.

One of my earlier attempts at getting the data into the MySQL database was to use the (relatively) new CSV engine that MySQL supports. This engine actually allows you to completely by-pass loading your CSV file into MySQL; rather, you can create a table using the CSV engine, and then just replace the data file within the MySQL directory with your actual CSV file. Presto blammo! Instantaneously loaded data. The only real downside from my perspective is that the CSV engine does not support indexes which, when you’re dealing with over 3 million records, is a deal breaker.

More information on using the MySQL CVS engine can be found here.

I ended up creating the table (which has 314 fields) and then using the MySQL LOAD DATA INFILE command to get the file into a regular old ISAM table which supports indexes. I then turned my thoughts to what, exactly, I should index.

First, keep in mind that an index is a method the database engine can employ to break rows up into smaller chunks so it can know very generally where things are in the table. When you query the table, the index is consulted to quickly get to a general set of records and then each record is examined to find the exact one(s) you want. Think of indexing like an address book with alphabetic tabs sticking out. If you’re looking for your friend George Smith’s address, you can open it to the S tab and then thumb through each page looking for George. That’s orders of magnitude faster than starting at page one of the As and then looking at every single page until you stumble across George way down in the Ses. Those tabs are essentially indexes.

Ok, so knowing that, we should index every field, right?

No. We shouldn’t. It is possible to over-index a table and end up shooting yourself in the foot. There are certain types of fields which benefit exactly nothing from being indexed. Here are my very basic, database engine agnostic, and high level requirements for choosing what fields to index in a table:

  • If the field is unique in every row, do not index it. You will actually lose speed because the index will not prevent the database engine rom having to look at every row anyhow, and it has the additional baggage of the useless index to contend with.
  • If the field is the same in every row, do not index it for the same reasons listed in the first point
  • Fields that you think you will probably be using in your WHERE clause should probably be indexed because those are the fields that the database engine is going to be looking at. But remember the first two points.
  • Fields that have over 50% unique values in your table are EXCELLENT candidates for indexing.

Those are my very general rules that I consider when determining what fields to index in a table. Each situation is unique and there are literally hundreds of other factors to take under consideration but if you stick with just these four, you’re more than halfway there.

Reblog this post [with Zemanta]

Tags: , , , , , , ,

Rate this post:

Some related posts:

`