Splitting Large Files Carefully

Every now and then you need to split up a large file. If you have the luxury of being able to cut the file anywhere, you can use something like WinRAR, which will split it into pieces of whatever size you like. If it’s good enough for file sharing, it’s good enough for you, right?

The problem is, sometimes you actually need to process that data. Even worse, sometimes the files aren’t laid out evenly, so if you cut one off at an arbitrary point, you’re in trouble. After I toyed with the head and tail commands and even tried working in vim, Brad mentioned the split command.

split isn’t a perfect fit, since it wants to cut a file into equal-sized pieces, but it’s possible to make it work. First, I decided that the file needed to be in 4 pieces, so I counted up from the bottom of the file. Say I have a file with 4 million records and I want a 1 million-line chunk at the end: I would run split with a --lines parameter of 3000000. In actuality, my file needed to be split at a very precise point, so I used the line number of that point instead. And it worked. I had parts 1-3 combined in the “big” file and part 4 in the “small” one.
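Here is a rough sketch of that first cut, scaled down to an 8-line file so it runs instantly. The file name export.txt and the output prefix part_ are made up for the example, and -l is the short form of --lines. With the real 4-million-line file, the 6 would be 3000000.

```shell
# Work in a throwaway directory so nothing real gets overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the big export: 8 lines containing the numbers 1 through 8.
seq 1 8 > export.txt

# Cut after line 6. split names its output <prefix>aa, <prefix>ab, ...
# so part_aa holds lines 1-6 (the "big" file) and part_ab holds lines 7-8.
split -l 6 export.txt part_

wc -l part_aa part_ab
```

The original file is left untouched, so if the cut lands in the wrong place you can delete the pieces and try again with a different line count.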

Then I took that 3/4 file and did the same thing to peel off piece #3, and so on. Worked great. Just make sure that you pass the right line count on the split command line. The count has to be the size of the larger, leading piece, which means at least half the lines in the file. Otherwise split will cut the remainder at that same line count too, creating a third file, and so on.
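The repeated passes, and the pitfall, look roughly like this on the same kind of 8-line stand-in file. Again, export.txt and the pass1_, pass2_, pass3_ and oops_ prefixes are just names I picked for the sketch, and the line counts would be your own cut points.

```shell
workdir=$(mktemp -d)
cd "$workdir"
seq 1 8 > export.txt

# Pass 1: cut after line 6 -> pass1_aa (lines 1-6), pass1_ab (lines 7-8, piece 4).
split -l 6 export.txt pass1_

# Pass 2: cut the big piece after line 4 -> pass2_aa (1-4), pass2_ab (5-6, piece 3).
split -l 4 pass1_aa pass2_

# Pass 3: cut what is left after line 2 -> pass3_aa (piece 1), pass3_ab (piece 2).
split -l 2 pass2_aa pass3_

# The pitfall: a count below half the file keeps cutting the remainder,
# so 8 lines at 3 per file gives three files instead of two.
split -l 3 export.txt oops_
ls oops_*
```

The four finished pieces end up as pass3_aa, pass3_ab, pass2_ab and pass1_ab, in that order.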

Incidentally, this is very useful for importing a massive export from Movable Type (but it can be used in many other places).

