The Secrets of File Upload (Part 2): Saying Goodbye to Encoding Problems

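So far we have implemented form-based file upload, with the parsed files written to the file system as raw bytes. There is one small regret, though: when the request reaches the server, the server has no way of knowing which character set it was encoded with. This is not so much a technical problem as the lack of a single agreed standard for character-set encoding and conversion across the software industry. The job is therefore left to the developer: tell the browser which character set to encode with, and tell the server which character set to decode with.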

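If the developer does not specify an encoding, ISO-8859-1 is assumed by default. Once an encoding has been specified, the browser uses it for the textual parts of the request (ordinary field values and file names), while the bytes of an uploaded file are sent as they are. On the server, the MIME headers of each part (the data between two boundaries) tell us whether its content is binary, in which case it is copied untouched, or text, in which case it needs decoding. Inside the JVM, if the code sets up encoding and decoding correctly for non-binary files such as text files, the program obtains the correct content, but that still does not guarantee the content is finally stored on the file system in the correct encoding.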
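
To make the ISO-8859-1 fallback concrete, here is a small sketch (the class and method names are made up for illustration). It assumes the browser encoded a field value as UTF-8 and the server decoded the bytes as ISO-8859-1. Because ISO-8859-1 maps every byte to exactly one character, the result is garbled but lossless, so the original text can still be recovered by re-encoding and decoding with the right charset.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // What a server ends up with if it decodes UTF-8 bytes using the ISO-8859-1 fallback
    static String decodeWithFallback(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps each byte to one char, so the original bytes survive and can be re-decoded
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String value = "\u6587\u4ef6"; // two CJK characters, written as escapes to avoid source-encoding issues
        byte[] sent = value.getBytes(StandardCharsets.UTF_8); // assumption: the page charset is UTF-8

        String garbled = decodeWithFallback(sent);
        System.out.println(garbled.length());              // 6: one char per byte
        System.out.println(garbled.equals(value));         // false
        System.out.println(repair(garbled).equals(value)); // true
    }
}
```

The repair step only works because ISO-8859-1 is a one-to-one byte mapping; it is a workaround, not a substitute for decoding with the right charset in the first place.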

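The reason is that the JVM reads the file.encoding property at startup to decide the default encoding for reading and writing files, and once the JVM is running, changing this property no longer has any effect. Nobody wants the hassle of converting the content of every uploaded file a second time just because of this setting.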
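
The sketch below (hypothetical class name) shows why relying on the default is fragile: an OutputStreamWriter created without a charset uses Charset.defaultCharset(), which is derived from file.encoding (and, since JDK 18, is UTF-8 unless overridden, per JEP 400), whereas passing a charset explicitly makes the output bytes identical on every machine.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // Encodes text through an OutputStreamWriter; cs == null means "use the JVM default charset"
    static byte[] write(String text, Charset cs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = (cs == null) ? new OutputStreamWriter(out) : new OutputStreamWriter(out, cs)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with an in-memory stream
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u6587\u4ef6"; // two CJK characters
        // Implicit: depends on Charset.defaultCharset(), i.e. on how the JVM was started
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("implicit: " + write(text, null).length + " bytes");
        // Explicit: always 6 bytes in UTF-8, on any machine
        System.out.println("explicit UTF-8: " + write(text, StandardCharsets.UTF_8).length + " bytes");
    }
}
```

The implicit result changes with the platform default (for example 4 bytes under GBK, or two '?' bytes under ISO-8859-1), while the explicit one does not.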

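Therefore, besides decoding file names with the specified encoding, we also have to decode the content of non-binary files. What we want is a simple, one-step solution to the encoding problem.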

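Let's revisit the MultiPartFile class from the previous article and make a small change to it: it now carries the charset used for decoding.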

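import java.io.IOException;

public abstract class MultiPartFile {
	protected String name;
	protected int start;
	protected int end;
	// New: charset used to decode text content; iso-8859-1 unless another one is given
	protected String charset = "iso-8859-1";

	// Writes the next chunk of this part's content, buff[off .. off + len)
	public abstract void append(byte[] buff, int off, int len)
			throws IOException;

	// Flushes and releases the underlying file
	public abstract void close() throws IOException;

	// Other, less important methods are omitted here
	// ...
}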

Binary files are represented by MultiPartBinaryFile, which extends MultiPartFile and writes the content to disk with a FileOutputStream.

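import java.io.FileOutputStream;
import java.io.IOException;

public class MultiPartBinaryFile extends MultiPartFile {
	// Opened in the constructor (omitted here); bytes are written unchanged
	private FileOutputStream fos;

	public void append(byte[] buff, int off, int len) throws IOException {
		fos.write(buff, off, len);
	}

	public void close() throws IOException {
		fos.close();
	}
	// Other, less important code (including the constructor) is omitted here
	// ...
}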

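For text files, we need to specify which encoding is used when the file is stored. Data written to the file is encoded with the specified character set, and a BufferedWriter is used to speed up the writes.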

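import java.io.*;
import java.nio.*;
import java.nio.charset.*;

public class MultiPartTextFile extends MultiPartFile {
	private Writer writer;
	private CharsetDecoder decoder; // stateful: holds a multi-byte char split across append() calls
	private ByteBuffer pending = ByteBuffer.allocate(0);

	public MultiPartTextFile(String name) throws IOException {
		super(name); // keeps the inherited default charset, iso-8859-1
		init(name);
	}
	public MultiPartTextFile(String name, String charset) throws IOException {
		super(name, charset);
		init(name);
	}

	// Decode and encode with the same charset, so the JVM's file.encoding never comes into play
	private void init(String path) throws IOException {
		Charset cs = Charset.forName(charset);
		writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(path), cs));
		decoder = cs.newDecoder().onMalformedInput(CodingErrorAction.REPLACE).onUnmappableCharacter(CodingErrorAction.REPLACE);
	}

	public void append(byte[] buff, int off, int len) throws IOException {
		// Prepend leftover bytes (an incomplete character) from the previous call
		ByteBuffer in = ByteBuffer.allocate(pending.remaining() + len);
		in.put(pending).put(buff, off, len).flip();
		CharBuffer out = CharBuffer.allocate((int) Math.ceil(in.remaining() * decoder.maxCharsPerByte()) + 1);
		decoder.decode(in, out, false); // false: more input may follow
		out.flip();
		writer.append(out);
		pending = in; // the undecoded tail waits for the next chunk
	}

	public void close() throws IOException {
		CharBuffer out = CharBuffer.allocate(pending.remaining() + 8);
		decoder.decode(pending, out, true); // end of input: a dangling partial character becomes U+FFFD
		decoder.flush(out);
		out.flip();
		writer.append(out);
		writer.close();
	}
}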
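
One subtlety with decoding inside append(): the parser hands over the content in chunks, and a chunk can end in the middle of a multi-byte character. Calling new String(bytes, charset) on each chunk separately then turns both halves into replacement characters. The sketch below (hypothetical class, UTF-8 assumed) contrasts that with feeding all chunks through one java.nio.charset.CharsetDecoder, which holds back an incomplete character until the next chunk arrives.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ChunkDecodeDemo {
    // Decodes every chunk on its own, as new String(chunk, charset) inside append() would
    static String naive(byte[][] chunks) {
        StringBuilder sb = new StringBuilder();
        for (byte[] chunk : chunks) {
            sb.append(new String(chunk, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Feeds all chunks through one decoder; an incomplete character is carried to the next chunk
    static String stateful(byte[][] chunks) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        StringBuilder sb = new StringBuilder();
        ByteBuffer pending = ByteBuffer.allocate(0);
        for (byte[] chunk : chunks) {
            ByteBuffer in = ByteBuffer.allocate(pending.remaining() + chunk.length);
            in.put(pending).put(chunk).flip();
            CharBuffer out = CharBuffer.allocate(in.remaining() + 1); // UTF-8: at most one char per byte
            decoder.decode(in, out, false); // false: more input may follow
            out.flip();
            sb.append(out);
            pending = in; // bytes not decoded yet (a split character)
        }
        // End of input: a dangling partial character becomes U+FFFD; then flush the decoder
        CharBuffer tail = CharBuffer.allocate(pending.remaining() + 8);
        decoder.decode(pending, tail, true);
        decoder.flush(tail);
        tail.flip();
        sb.append(tail);
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u6587\u4ef6"; // two CJK characters, 3 bytes each in UTF-8
        byte[] all = text.getBytes(StandardCharsets.UTF_8);
        // Split after byte 4, cutting the second character in two
        byte[][] chunks = { Arrays.copyOfRange(all, 0, 4), Arrays.copyOfRange(all, 4, all.length) };
        System.out.println(naive(chunks).equals(text));    // false
        System.out.println(stateful(chunks).equals(text)); // true
    }
}
```

The same issue affects any charset in which one character can span several bytes, such as UTF-8 or GBK; single-byte charsets like ISO-8859-1 are immune.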

With these two classes in place, we can improve the earlier parsing code so that it tells binary content and text content apart automatically.

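MultiPartFile multiPartFile = null;

// The CRLF that ends the Content-Disposition line; the Content-Type line starts right after it
int pos = BoyerMoore.indexOf(buffer, _CTRF, end);

// The CRLF that ends the Content-Type line
start = BoyerMoore.indexOf(buffer, _CTRF, pos + _CTRF.length);
// Copy just the Content-Type line, without the surrounding CRLFs
byte[] line = new byte[start - pos - _CTRF.length];
System.arraycopy(buffer, pos + _CTRF.length, line, 0, line.length);
// Header bytes are ASCII: decode them with a fixed charset, not the platform default
String contentType = new String(line, "ISO-8859-1");
if (contentType.indexOf(_TEXT_CONTENT_TYPE_PREFIX) != -1) {
	// text content: decoded with the request encoding and stored in that charset
	multiPartFile = new MultiPartTextFile(dir + name, encoding);
} else {
	// anything else is copied byte for byte
	multiPartFile = new MultiPartBinaryFile(dir + name);
}

// The part body begins after the Content-Type line's CRLF and the blank line's CRLF
multiPartFile.setStart(start + _CTRF.length * 2);
return multiPartFile;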
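
The detection step can also be sketched on its own. The helper below is hypothetical (it is not the article's parser and uses neither BoyerMoore nor _TEXT_CONTENT_TYPE_PREFIX): it decodes a part's header bytes as ISO-8859-1, looks up the Content-Type header case-insensitively, since header field names are case-insensitive, and treats any text/* type as text. RFC 7578 says a part without a Content-Type defaults to text/plain; this sketch deliberately treats a missing header as binary so the bytes are kept untouched.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class PartHeaderSketch {
    // Returns the Content-Type value from a part's raw header bytes, or null if there is none
    static String contentType(byte[] headerBlock) {
        // Header bytes are ASCII in practice; ISO-8859-1 decodes any byte without failing
        String headers = new String(headerBlock, StandardCharsets.ISO_8859_1);
        for (String line : headers.split("\r\n")) {
            int colon = line.indexOf(':');
            // Header field names are case-insensitive
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Type")) {
                return line.substring(colon + 1).trim();
            }
        }
        return null;
    }

    // text/* counts as text; everything else, including a missing header, is treated as binary
    static boolean isText(String contentType) {
        return contentType != null && contentType.toLowerCase(Locale.ROOT).startsWith("text/");
    }

    public static void main(String[] args) {
        byte[] header = ("Content-Disposition: form-data; name=\"upload\"; filename=\"a.txt\"\r\n"
                + "Content-Type: text/plain\r\n").getBytes(StandardCharsets.ISO_8859_1);
        String ct = contentType(header);
        System.out.println(ct);                                 // text/plain
        System.out.println(isText(ct));                         // true
        System.out.println(isText("application/octet-stream")); // false
    }
}
```

Unlike a plain indexOf on the raw line, this version also accepts variants such as "TEXT/HTML; charset=UTF-8".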

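That settles the encoding problem, as long as the uploaded text files really are in the configured character set: specify the charset once in the program, and say goodbye to garbled text. Programmers get to be a little "lazier" :)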