Spaces Between Each Character in Text Output

It looks like the file is in some Unicode text format (possibly UTF-16, where each character takes two bytes, which would explain a "space" showing up after every character). The first piece works fine because it starts with the correct header (the byte order mark, or BOM: see http://en.wikipedia.org/wiki/Byte_order_mark). For the other piece files to work too, you must configure GSplit to insert the same header (the same BOM or header data) at the start of every other piece file. To find out which bytes are needed, open your first piece file in a hex editor, for instance, and look at its first few bytes.
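If a hex editor is not at hand, the same check can be sketched in a few lines of Python. This is only an illustration: the helper names `detect_bom` and `detect_file_bom` are made up for this example, and it recognises only the standard UTF-8, UTF-16 and UTF-32 byte order marks.

```python
import codecs

# Standard byte order marks. The UTF-32 marks come first because the
# UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_bom(head: bytes):
    """Return (bom_bytes, encoding_name), or (b"", None) if no BOM is found."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return bom, name
    return b"", None

def detect_file_bom(path):
    """Read only the first four bytes of a file (hypothetical helper)."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

Calling `detect_file_bom` on the first piece file should report the header bytes that the other pieces are missing. Reading four bytes is enough, so this works on a 15 GB file as quickly as on a small one.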

Our office is receiving large text data sets that we need to break into smaller data sets so that we can put them onto the appropriate media and transfer them to our internal GIS. The data we receive are text files in CSV format, roughly 30 million records per file and about 15 GB each. We tried to split a large text file into smaller files of 1.5 GB each. Though the format of the first output file came out right, there is a space inserted between every single character in the subsequent files. It appears that part of the issue is that two carriage returns are inserted at the end of each line in the output files. We’ve also discovered that when we imported a smaller test file into Excel, saved it as a CSV, and then renamed the .CSV file to .TXT, the split worked fine. Unfortunately, we won’t have the luxury of doing that kind of conversion on the real data, given the large file sizes.

The first few lines of the unsplit data appear as follows:

"LRIMOShipNo","Latitude","Longitude","AdditionalInfo","CallSign",Heading,MMSI,"MovementDateTime",MovementID,"ShipName","ShipType","Speed","Beam","Draught","Length","Destination","DestinationTidied","ETA","MoveStatus"
"0","51.2998117","4.3312517","N/A","9205202",0,205286790,"2011-12-31 23:59:40.137",4000000016617601,"AD FUNDUM ","N/A","0","11","0","19"," ","","9999-12-31 23:59:59.000","N/A"
"0","51.23203","4.523385","N/A","OT2873 ",334.10000000000002,205287390,"2011-12-31 23:59:38.137",4000000016617601,"AXIOMA ","Cargo","0","12","0","85","SCHOTEN ","","2011-08-08 00:00:00.000","Under way sailing"

The first few lines of the second split file appear as follows:

"LRIMOShipNo","Latitude","Longitude","AdditionalInfo","CallSign",Heading,MMSI,"MovementDateTime",MovementID,"ShipName","ShipType","Speed","Beam","Draught","Length","Destination","DestinationTidied","ETA","MoveStatus"
1 2 3 : 5 9 : 3 5 . 0 3 0 " , 4 0 0 0 0 0 0 0 1 7 9 2 8 0 0 1 , " S A G I T T A R I U S " , " C a r g o " , " 0 " , " 1 1 " , " 0 " , " 1 1 0 " , " N L A M S 0 2 3 3 M 1 2 2 4 8 0 0 0 1 5 " , " " , " 9 9 9 9 - 1 2 - 3 1 2 3 : 5 9 : 5 9 . 0 0 0 " , " U n d e r w a y u s i n g e n g i n e "

" 0 " , " 5 2 . 2 4 6 4 8 8 3 " , " 6 . 8 0 0 1 8 6 7 " , " N / A " , " P F 3 6 3 1 " , 1 0 6 . 0 9 9 9 9 9 9 9 9 9 9 9 9 9 , 2 4 4 6 6 0 9 0 1 , " 2 0 1 2 - 0 3 - 3 1 2 3 : 5 9 : 2 0 . 0 3 0 " , 4 0 0 0 0 0 0 0 1 7 9 2 7 9 9 1 , " R E H O B O T H " , " N / A " , " 0 " , " 1 0 " , " 2 " , " 8 6 " , " L O C H E M " , " " , " 9 9 9 9 - 1 2 - 3 1 2 3 : 5 9 : 5 9 . 0 0 0 " , " E n g a g e d i n f i s h i n g "

" 0 " , " 5 1 . 6 5 9 6 4 8 3 " , " 6 . 5 7 2 6 2 3 3 " , " N / A " , " P G 9 2 9 1 " , 8 6 . 2 9 9 9 9 9 9 9 9 9 9 9 9 9 7 , 2 4 4 6 6 0 9 0 4 , " 2 0 1 2 - 0 3 - 3 1 2 3 : 5 2 : 5 0 . 0 3 0 " , 4 0 0 0 0 0 0 0 1 7 9 2 7 9 3 1 , " C U N E R A " , " C a r g o " , " 5 . 1 " , " 1 1 " , " 0 . 2 " , " 1 7 7 " , " F R O U A R D " , " " , " 2 0 1 2 - 1 0 - 2 8 1 7 : 2 2 : 0 0 . 0 0 0 " , " U n d e r w a y u s i n g e n g i n e "

Thanks, we will give that a try.
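For anyone hitting the same symptom, the spaced-out lines in the second piece are consistent with two-byte text being shown one byte at a time. The sketch below reproduces that effect under a loudly stated assumption: that the source file is UTF-16 little-endian, which the thread never confirms. The function names are invented for the illustration, and the zero bytes are replaced with spaces only to mimic how many viewers display them.

```python
import codecs

def as_seen_without_bom(text: str) -> str:
    """Mimic a viewer that treats BOM-less UTF-16 LE bytes as single-byte text."""
    raw = text.encode("utf-16-le")       # assumed encoding, written without a BOM
    single_byte_view = raw.decode("latin-1")
    # Every ASCII character is followed by a zero byte; show it as a space.
    return single_byte_view.replace("\x00", " ")

def as_seen_with_bom(text: str) -> str:
    """Mimic a viewer that finds the BOM and decodes the bytes as UTF-16."""
    raw = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    return raw.decode("utf-16")          # the codec consumes the BOM

sample = '"0","52.2464883"'
print(as_seen_without_bom(sample))
print(as_seen_with_bom(sample))
```

The first print shows a gap after every character, much like the second split file, while the second print shows the text unchanged. Under the same assumption, the line endings are also two bytes per character, which may be why a viewer reading the piece byte by byte shows extra blank lines.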