posted: 07 Jun 2011
As part of some expansion on our corporate intranet, I have begun building out a Documents API, similar to Google's. The XML includes nodes that are more focused on our own particular business and internal arrangements, but it borrows from the Google counterpart in many other ways. I wanted the API to be as RESTful as possible, so it needed to accept files directly via HTTP POST. Integrating this into an application seemed pretty straightforward: CFHTTP allows you to specify a cfhttpparam with type="file", so everything seemed sweet.
Unfortunately, POSTing files through CFHTTP has given me a number of headaches, particularly with .docx files. The resulting file, saved via a pretty conventional CFFILE action="upload" process, grows by 2 bytes, which is somewhat annoying. Here's the code I was using:
I haven't run many systematic tests, so I can't say whether the 'Accept-Encoding' and 'TE' headers caused the problem, but they are needed to stop the server compressing the response, which CFHTTP doesn't like. So I took a delve into the HTTP spec and, in the end, wrote my own HTTP body to achieve the result.

I read the binary file into a byte array 'myfile' using standard CFFILE and then use this to initialize a Java String object with ISO-8859-1 character encoding. The divider can be any string that's not found anywhere else in the body. I don't scan for it, so there is potential for a conflict here, but the length of the hash makes that 'pretty' unlikely.

The important things to note are that the Content-Disposition declaration comes immediately after the divider, that a single blank line separates the Content-Disposition declaration from the content of that POST field, and that a single line break follows the content before the next divider. Finally, every divider is prefixed with '--', and the terminal one is additionally suffixed with '--'.
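To make that layout concrete, here is a minimal sketch of the same body structure in Python rather than CFML. It is not the code from the original post: the function name `build_multipart_body`, the field name and the filename are hypothetical, and the divider is simply a hash of random bytes, which (as above) is assumed not to occur in the file.

```python
import hashlib
import os


def build_multipart_body(field_name, filename, file_bytes):
    """Hypothetical helper: wrap one file in a multipart/form-data body."""
    # Assumption: a 64-character hex hash is very unlikely to appear in the file.
    divider = hashlib.sha256(os.urandom(32)).hexdigest()
    crlf = "\r\n"

    # ISO-8859-1 maps every byte value 0-255 to exactly one character,
    # so the binary content survives the trip through a string.
    file_text = file_bytes.decode("iso-8859-1")

    body = (
        # Every divider line starts with "--".
        "--" + divider + crlf
        # Content-Disposition comes immediately after the divider.
        + 'Content-Disposition: form-data; name="%s"; filename="%s"'
        % (field_name, filename) + crlf
        # A single blank line separates the part headers from the content.
        + crlf
        + file_text
        # A single line break follows the content before the next divider.
        + crlf
        # The terminal divider is also suffixed with "--".
        + "--" + divider + "--" + crlf
    )
    # Encode with the same charset so each character becomes its original byte.
    return divider, body.encode("iso-8859-1")


if __name__ == "__main__":
    divider, body = build_multipart_body("upload", "example.docx", b"PK\x03\x04\xff\x00")
    print("Content-Type: multipart/form-data; boundary=" + divider)
    print(len(body), "bytes in body")
```

The request's own Content-Type header has to carry the same divider as its `boundary` parameter, without the leading '--'. A per-part Content-Type header could also be added on the line after Content-Disposition; it is left out here to mirror the description above.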
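There may be a nicer way to get the file data into the request. Strictly speaking, only the HTTP headers are text and the body is just a sequence of bytes, but I am assembling the body as a string, so the file data has to become a string too. For now, this approach is working. The most important thing during testing was keeping the character encoding consistent, and 'ISO-8859-1' proved the best: it maps every byte value to exactly one character, so the bytes come back out unchanged. UTF-8 didn't work so well, because bytes above 127 are not valid on their own in UTF-8 and get replaced or expanded.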
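The encoding point can be checked in isolation. The sketch below, again in Python rather than CFML and with a hypothetical helper name, decodes some bytes to a string and encodes them back, then reports whether the original bytes survived.

```python
def survives_round_trip(data, encoding):
    """Return True if bytes -> string -> bytes gives back the same bytes."""
    # errors="replace" mimics a lenient decoder that substitutes a
    # replacement character for byte sequences it cannot interpret.
    text = data.decode(encoding, errors="replace")
    return text.encode(encoding, errors="replace") == data


if __name__ == "__main__":
    every_byte = bytes(range(256))  # stand-in for arbitrary binary file content
    for name in ("iso-8859-1", "utf-8"):
        print(name, survives_round_trip(every_byte, name))
```

With ISO-8859-1 the round trip is lossless for any input, because each of the 256 byte values has its own character. With UTF-8 it only holds when the data happens to be valid UTF-8, which binary formats such as .docx generally are not.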