Splitting a giant CSV into smaller files on macOS
It was an eventful day: I got a request to review a CSV with 10 million rows. I tried opening it in Excel on my Mac, but Excel couldn't even load the whole file (a single Excel worksheet tops out at 1,048,576 rows).
So what do I do? Like any sane person, I decided to use my Python skills to split the file into chunks and save the manual effort. Then it dawned on me: I am just lazy.
So, instead of spending five minutes writing a small Python snippet, I went all out searching for a simpler way. It turns out Linux and macOS already ship with a built-in utility for this, called ‘split’.
So this is how you do it in the terminal:
split -l <lines_per_file> <filename.csv>
The command above splits the file into pieces containing the specified number of lines each. The pieces land in the current directory with names like xaa, xab, xac and so on, without a .csv extension. So, time to fix that with a simple shell loop:
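for i in x??; do mv "$i" "$i.csv"; done
This appends .csv to every file matching split's default output names (an x followed by two letters, such as xaa). A bare * would also rename the original CSV and anything else in the directory, so keep the pattern narrow. Now we can open the chunked files in Excel and spend a merry time processing them.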
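One caveat when the chunks are headed for Excel: split knows nothing about CSV, so only the first chunk keeps the header row. Below is a minimal sketch of a shell function that works around this. It assumes the file has exactly one header line and that no quoted field contains an embedded newline (split counts raw lines, not CSV records). The function name split_csv_with_header, the part_ prefix and the output directory are made up for illustration. Writing the chunks into their own directory also means the renaming step cannot touch any other file.

```shell
# Hypothetical helper (not a built-in): split_csv_with_header INPUT ROWS OUTDIR
# Writes OUTDIR/part_aa.csv, OUTDIR/part_ab.csv, ... where every chunk starts
# with the header row of INPUT, followed by up to ROWS data rows.
# Assumes: one header line, no quoted fields containing newlines, and at most
# 676 chunks (two-letter suffixes, which is what the part_?? glob matches).
split_csv_with_header() {
  input=$1
  rows=$2
  outdir=$3

  mkdir -p "$outdir" || return 1

  # Skip the header and split the remaining lines; "-" makes split read stdin.
  tail -n +2 "$input" | split -l "$rows" - "$outdir/part_" || return 1

  # Put the header on top of each raw chunk, save it with a .csv extension,
  # then delete the extension-less original.
  for chunk in "$outdir"/part_??; do
    { head -n 1 "$input"; cat "$chunk"; } > "$chunk.csv" || return 1
    rm "$chunk"
  done
}
```

For example, split_csv_with_header data.csv 1000000 chunks should turn a 10 million row file into ten files inside a chunks directory, each one small enough for Excel and each starting with the original header.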
The end. Thanks for reading!