Splitting UTF-16 / UCS-2 LE BOM. CR LF Split and Insert header

cerfus · March 19, 2016, 11:59am

Hello there fellows, I made this post to illustrate some problems that first-time GSplit users may encounter. It took me a while to figure out how to get the desired results.

First of all, keep in mind that GSplit does its magic on a byte-by-byte basis, so get a hex editor to do your tests. Any free one will do.

Search Pattern:
I needed to split SQL Server Management Studio ‘generate scripts’ files, and the first issue I found was that splitting a file after the nth occurrence of 0x0D0x0A didn’t work at all. That is because UTF-16 and UCS-2 LE BOM store ordinary characters, including CR and LF, as 2 bytes each (low byte first), even though your text editor, such as Notepad++, displays them as single characters. In the file, CR is 0x0D 0x00 and LF is 0x0A 0x00, so the byte sequence 0x0D 0x0A never appears. If you want it to correctly split after each CR/LF you will need to set the search pattern to: “0x0D0x000x0A0x00”
[Type and Size][Blocked Pieces][I want to split after the nth occurrence of a specified pattern]
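
To see why the one-byte pattern never matches, here is a small Python sketch (not part of GSplit; the sample text and the helper name to_gsplit are just made up for illustration). It encodes a short script as UTF-16 LE and searches for both patterns:

```python
# Sample text as it would be stored in a UTF-16 LE file (BOM left out here).
data = "SELECT 1;\r\nGO\r\n".encode("utf-16-le")

one_byte_pattern = b"\x0d\x0a"                  # what 0x0D0x0A looks for
two_byte_pattern = "\r\n".encode("utf-16-le")   # CR and LF as 2 bytes each


def to_gsplit(pattern: bytes) -> str:
    """Hypothetical helper: write bytes in the 0x.. notation GSplit uses."""
    return "".join(f"0x{b:02X}" for b in pattern)


print(data.find(one_byte_pattern))    # -1, the pattern is never found
print(data.count(two_byte_pattern))   # 2, one hit per line ending
print(to_gsplit(two_byte_pattern))    # 0x0D0x000x0A0x00
```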

Inserting custom headers:
Again, remember the byte-by-byte processing? Well, if you’re working with a 2-byte encoded file you will need to translate the text you want to insert into hex code, and that hex editor you downloaded will help you here. For example, I will add "My custom header.[CR][LF]"
What I did was create a new text file with the same encoding, write that text down and open it with the hex editor, then copy the hex dump and paste it into Notepad++.
Here is what it looks like:
4d007900200063007500730074006f006d0020006800650061006400650072002e000d000a00
Since it is just a hex dump and GSplit uses 0x to denote hex, you can use a search and replace in Regular expression mode as follows:
search for “(..)” (any two contiguous characters) and replace with “0x\1” (the quotes should be ignored)
And you end up with:
0x4d0x000x790x000x200x000x630x000x750x000x730x000x740x000x6f0x000x6d0x000x200x000x680x000x650x000x610x000x640x000x650x000x720x000x2e0x000x0d0x000x0a0x00
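
If you would rather not do the replace by hand, the same substitution can be sketched in Python. This is only an illustration of the regex step above, using the hex dump from this post; the variable names are my own:

```python
import re

# Hex dump of "My custom header.[CR][LF]" saved as UTF-16 LE, copied from above.
hex_dump = (
    "4d007900200063007500730074006f006d002000"
    "6800650061006400650072002e000d000a00"
)

# Same idea as the Notepad++ replace: put 0x in front of every pair of characters.
gsplit_header = re.sub(r"(..)", r"0x\1", hex_dump)

# Sanity check: decode the dump back to text to confirm it says what we expect.
decoded = bytes.fromhex(hex_dump).decode("utf-16-le")

print(gsplit_header)
print(repr(decoded))
```

Decoding the dump back to text is a cheap way to catch a copy/paste slip before pasting the result into GSplit.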

And just one more thing: if your file uses a Byte Order Mark (BOM), you will need to insert that first. In my case it was UCS-2 LE BOM, so I needed to add 0xff0xfe to the very beginning of the header.
Like this:
0xff0xfe0x4d0x000x790x000x200x000x630x000x750x000x730x000x740x000x6f0x000x6d0x000x200x000x680x000x650x000x610x000x640x000x650x000x720x000x2e0x000x0d0x000x0a0x00
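
For anyone who wants to skip the hex editor entirely, here is a rough Python sketch that builds the whole header string, BOM included, straight from the text. The function name gsplit_header and its with_bom parameter are hypothetical, and it assumes a little-endian file like mine:

```python
import codecs


def gsplit_header(text: str, with_bom: bool = True) -> str:
    """Hypothetical helper: encode text as UTF-16 LE and write it in 0x.. notation."""
    payload = text.encode("utf-16-le")
    if with_bom:
        # codecs.BOM_UTF16_LE is the two bytes FF FE.
        payload = codecs.BOM_UTF16_LE + payload
    return "".join(f"0x{b:02x}" for b in payload)


header = gsplit_header("My custom header.\r\n")
print(header)
```

Pass with_bom=False if your pieces should not start with a BOM.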

[Other properties][Tags and headers][Do not add GSplit tags to pieces (checked)][Insert additional header to pieces (checked)][Insert the following line (special characters allowed)]

I hope this post will help future GSplit users, because I believe it’s a very powerful tool. However, most posts I’ve seen assume that you’re working on a 1-byte encoded file, so these are my two cents.

cerfus.

gdgsupport · March 21, 2016, 6:56pm

Thank you for your great tutorial!

cerfus · March 22, 2016, 7:46am

Thank you for the kind words and for making GSplit available for free. Appreciated.