Improve encoding support #211

Merged 2 commits on Mar 9, 2016.

22 changes: 22 additions & 0 deletions documentation/api_jszip/generate.md
@@ -18,6 +18,7 @@ options.type | string | `base64` | The type of zip to return, see below
options.comment | string | | The comment to use for the zip file.
options.mimeType | string | `application/zip` | mime-type for the generated file. Useful when you need to generate a file with a different extension, ie: ".ods".
options.platform | string | `DOS` | The platform to use when generating the zip file.
options.encodeFileName | function | encode with UTF-8 | the function to encode the file name / comment.

Possible values for `type` :

@@ -58,6 +59,13 @@ If you set the platform value on nodejs, be sure to use `process.platform`.
force the platform to `UNIX` the generated zip file will have a strange
behavior on UNIX platforms.

__About `encodeFileName`__ :

By default, JSZip uses UTF-8 to encode the file names / comments. You can use
this option to force another encoding. Note : the encoding used is not stored
in the zip file, so not using UTF-8 may lead to encoding issues.
The function takes a string and returns a byte array (Uint8Array or Array).

__Returns__ : The generated zip file.

__Throws__ : An exception if the asked `type` is not available in the browser,
@@ -137,3 +145,17 @@ link.href = url;
```


Using a custom charset :

```js
// using iconv-lite for example
var iconv = require('iconv-lite');

zip.generate({
type: 'uint8array',
encodeFileName: function (string) {
return iconv.encode(string, 'your-encoding');
}
});
```
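
Note on `encodeFileName`: the example above depends on iconv-lite. As a dependency-free illustration of the contract (a string in, a byte array out), here is a minimal sketch of an ISO-8859-1 encoder. `encodeLatin1` is a hypothetical helper, not part of JSZip, and replacing unsupported characters with `?` is an assumption made for this sketch.

```js
// Hypothetical helper (not part of JSZip): encode a string as ISO-8859-1.
function encodeLatin1(string) {
    var bytes = new Uint8Array(string.length);
    for (var i = 0; i < string.length; i++) {
        var code = string.charCodeAt(i);
        // Latin-1 only covers U+0000..U+00FF; anything else becomes "?" (0x3F).
        bytes[i] = code <= 0xFF ? code : 0x3F;
    }
    return bytes;
}

// Possible usage with the option documented in this diff
// (zip is assumed to be an existing JSZip instance):
// zip.generate({ type: 'uint8array', encodeFileName: encodeLatin1 });
```

A real project would more likely rely on a library such as iconv-lite, which covers many more encodings.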

24 changes: 24 additions & 0 deletions documentation/api_jszip/load.md
@@ -23,6 +23,7 @@ options.base64 | boolean | false | set to `true` if the data is
options.checkCRC32 | boolean | false | set to `true` if the read data should be checked against its CRC32.
options.optimizedBinaryString | boolean | false | set to true if (and only if) the input is a string and has already been prepared with a 0xFF mask.
options.createFolders | boolean | false | set to true to create folders in the file path automatically. Leaving it false will result in only virtual folders (i.e. folders that merely represent part of the file path) being created.
options.decodeFileName | function | decode from UTF-8 | the function to decode the file name / comment.

You shouldn't update the data given to this method : it is kept as is, so any
update will impact the stored data.
@@ -39,6 +40,16 @@ Zip features not (yet) supported :
* password protected zip
* multi-volume zip


__About `decodeFileName`__ :

A zip file has a flag to say if the file name and comment are encoded with UTF-8.
If it's not set, JSZip has **no way** to know the encoding used. It is usually
the default encoding of the operating system.

The function takes the byte array (Uint8Array or Array) and returns the
decoded string.

__Returns__ : The current JSZip object.

__Throws__ : An exception if the loaded data is not valid zip data or if it
@@ -79,3 +90,16 @@ zip.folder("subfolder").load(data);
// the content of data will be loaded in subfolder/
```

Using a custom charset :

```js
// using iconv-lite for example
var iconv = require('iconv-lite');

zip.load(content, {
decodeFileName: function (bytes) {
return iconv.decode(bytes, 'your-encoding');
}
});
```
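
Note on `decodeFileName`: as a dependency-free illustration of the contract (a byte array in, a string out), here is a minimal sketch of an ISO-8859-1 decoder. `decodeLatin1` is a hypothetical helper, not part of JSZip; it accepts either a Uint8Array or a plain Array, the two input types named in the documentation above.

```js
// Hypothetical helper (not part of JSZip): decode ISO-8859-1 bytes.
// Each Latin-1 byte maps directly to the Unicode code point of the same value.
function decodeLatin1(bytes) {
    var result = "";
    for (var i = 0; i < bytes.length; i++) {
        result += String.fromCharCode(bytes[i]);
    }
    return result;
}

// Possible usage with the option documented in this diff
// (zip is assumed to be an existing JSZip instance, content the zip data):
// zip.load(content, { decodeFileName: decodeLatin1 });
```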

23 changes: 19 additions & 4 deletions documentation/limitations.md
@@ -67,7 +67,22 @@ Some data are discarded (file metadata) and other are added (subfolders).

### Encodings support

JSZip only supports utf8 : if the names of the files inside the zip are not in
utf8 (or ASCII), they won't be interpreted correctly. If the content is a text
not encoded with utf8 (or ASCII), the `asText()` method won't decode it
correctly.
JSZip only supports UTF-8 natively. A zip file doesn't contain the name of the
encoding used, so you need to know it beforehand.

#### File name

If the name of a file inside the zip is encoded with UTF-8 then JSZip can
detect it (Language encoding flag, Unicode Path Extra Field). If not, JSZip
can't detect the encoding used and will generate [Mojibake](https://en.wikipedia.org/wiki/Mojibake).
You can use the [encodeFileName]({{site.baseurl}}/documentation/api_jszip/generate.html)
option and the [decodeFileName]({{site.baseurl}}/documentation/api_jszip/load.html)
option to encode/decode using a custom encoding.

#### File content

The `asText()` method uses UTF-8 to decode the content. If you have a text in
a different encoding, you can get the byte array with `asUint8Array()` and
decode it with a library (iconv, iconv-lite, etc.) on your side.
To save a text using a non-UTF-8 encoding, do the same : encode it into a
Uint8Array before adding it to JSZip.
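
Note on file content: the round trip described above (encode to a Uint8Array before adding, decode the result of `asUint8Array()` after reading) can be sketched as follows. UTF-16LE is chosen only as an example encoding, and `encodeUtf16le` / `decodeUtf16le` are hypothetical helpers, not part of JSZip.

```js
// Hypothetical helper: encode a string as UTF-16LE (2 bytes per UTF-16 code unit).
function encodeUtf16le(text) {
    var bytes = new Uint8Array(text.length * 2);
    for (var i = 0; i < text.length; i++) {
        var unit = text.charCodeAt(i);
        bytes[2 * i] = unit & 0xFF;   // low byte first (little endian)
        bytes[2 * i + 1] = unit >> 8; // then high byte
    }
    return bytes;
}

// Hypothetical helper: decode UTF-16LE bytes back into a string.
function decodeUtf16le(bytes) {
    var text = "";
    for (var i = 0; i + 1 < bytes.length; i += 2) {
        text += String.fromCharCode(bytes[i] | (bytes[i + 1] << 8));
    }
    return text;
}

// Possible usage (zip is assumed to be an existing JSZip instance):
// zip.file("note.txt", encodeUtf16le("some text"));
// var text = decodeUtf16le(zip.file("note.txt").asUint8Array());
```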
14 changes: 11 additions & 3 deletions lib/load.js
@@ -1,9 +1,17 @@
'use strict';
var base64 = require('./base64');
var utf8 = require('./utf8');
var utils = require('./utils');
var ZipEntries = require('./zipEntries');
module.exports = function(data, options) {
var files, zipEntries, i, input;
options = options || {};
options = utils.extend(options || {}, {
base64: false,
checkCRC32: false,
optimizedBinaryString : false,
createFolders: false,
decodeFileName: utf8.utf8decode
});
if (options.base64) {
data = base64.decode(data);
}
@@ -12,12 +20,12 @@ module.exports = function(data, options) {
files = zipEntries.files;
for (i = 0; i < files.length; i++) {
input = files[i];
this.file(input.fileName, input.decompressed, {
this.file(input.fileNameStr, input.decompressed, {
binary: true,
optimizedBinaryString: true,
date: input.date,
dir: input.dir,
comment : input.fileComment.length ? input.fileComment : null,
comment : input.fileCommentStr.length ? input.fileCommentStr : null,
unixPermissions : input.unixPermissions,
dosPermissions : input.dosPermissions,
createFolders: options.createFolders
57 changes: 22 additions & 35 deletions lib/object.js
@@ -173,24 +173,6 @@ var decToHex = function(dec, bytes) {
return hex;
};

/**
* Merge the objects passed as parameters into a new one.
* @private
* @param {...Object} var_args All objects to merge.
* @return {Object} a new object with the data of the others.
*/
var extend = function() {
var result = {}, i, attr;
for (i = 0; i < arguments.length; i++) { // arguments is not enumerable in some browsers
for (attr in arguments[i]) {
if (arguments[i].hasOwnProperty(attr) && typeof result[attr] === "undefined") {
result[attr] = arguments[i][attr];
}
}
}
return result;
};

/**
* Transforms the (incomplete) options from the user into the complete
* set of options to create a file.
@@ -203,7 +185,7 @@ var prepareFileAttrs = function(o) {
if (o.base64 === true && (o.binary === null || o.binary === undefined)) {
o.binary = true;
}
o = extend(o, defaults);
o = utils.extend(o, defaults);
o.date = o.date || new Date();
if (o.compression !== null) o.compression = o.compression.toUpperCase();

@@ -438,12 +420,16 @@ var generateDosExternalFileAttr = function (dosPermissions, isDir) {
* @param {JSZip.CompressedObject} compressedObject the compressed object.
* @param {number} offset the current offset from the start of the zip file.
* @param {String} platform let's pretend we are this platform (change platform dependents fields)
* @param {Function} encodeFileName the function to encode the file name / comment.
* @return {object} the zip parts.
*/
var generateZipParts = function(name, file, compressedObject, offset, platform) {
var generateZipParts = function(name, file, compressedObject, offset, platform, encodeFileName) {
var data = compressedObject.compressedContent,
useCustomEncoding = encodeFileName !== utf8.utf8encode,
encodedFileName = utils.transformTo("string", encodeFileName(file.name)),
utfEncodedFileName = utils.transformTo("string", utf8.utf8encode(file.name)),
comment = file.comment || "",
encodedComment = utils.transformTo("string", encodeFileName(comment)),
utfEncodedComment = utils.transformTo("string", utf8.utf8encode(comment)),
useUTF8ForFileName = utfEncodedFileName.length !== file.name.length,
useUTF8ForComment = utfEncodedComment.length !== comment.length,
@@ -515,7 +501,7 @@ var generateZipParts = function(name, file, compressedObject, offset, platform)
// Version
decToHex(1, 1) +
// NameCRC32
decToHex(crc32(utfEncodedFileName), 4) +
decToHex(crc32(encodedFileName), 4) +
// UnicodeName
utfEncodedFileName;

@@ -534,7 +520,7 @@ var generateZipParts = function(name, file, compressedObject, offset, platform)
// Version
decToHex(1, 1) +
// CommentCRC32
decToHex(this.crc32(utfEncodedComment), 4) +
decToHex(this.crc32(encodedComment), 4) +
// UnicodeName
utfEncodedComment;

@@ -553,7 +539,7 @@ var generateZipParts = function(name, file, compressedObject, offset, platform)
header += "\x0A\x00";
// general purpose bit flag
// set bit 11 if utf8
header += (useUTF8ForFileName || useUTF8ForComment) ? "\x00\x08" : "\x00\x00";
header += !useCustomEncoding && (useUTF8ForFileName || useUTF8ForComment) ? "\x00\x08" : "\x00\x00";
// compression method
header += compressedObject.compressionMethod;
// last mod file time
@@ -567,20 +553,20 @@ var generateZipParts = function(name, file, compressedObject, offset, platform)
// uncompressed size
header += decToHex(compressedObject.uncompressedSize, 4);
// file name length
header += decToHex(utfEncodedFileName.length, 2);
header += decToHex(encodedFileName.length, 2);
// extra field length
header += decToHex(extraFields.length, 2);


var fileRecord = signature.LOCAL_FILE_HEADER + header + utfEncodedFileName + extraFields;
var fileRecord = signature.LOCAL_FILE_HEADER + header + encodedFileName + extraFields;

var dirRecord = signature.CENTRAL_FILE_HEADER +
// version made by (00: DOS)
decToHex(versionMadeBy, 2) +
// file header (common to file and central directory)
header +
// file comment length
decToHex(utfEncodedComment.length, 2) +
decToHex(encodedComment.length, 2) +
// disk number start
"\x00\x00" +
// internal file attributes TODO
@@ -590,11 +576,11 @@ var generateZipParts = function(name, file, compressedObject, offset, platform)
// relative offset of local header
decToHex(offset, 4) +
// file name
utfEncodedFileName +
encodedFileName +
// extra field
extraFields +
// file comment
utfEncodedComment;
encodedComment;

return {
fileRecord: fileRecord,
@@ -634,7 +620,7 @@ var out = {
}
file = this.files[filename];
// return a new object, don't let the user mess with our internal objects :)
fileClone = new ZipObject(file.name, file._data, extend(file.options));
fileClone = new ZipObject(file.name, file._data, utils.extend(file.options));
relativePath = filename.slice(this.root.length, filename.length);
if (filename.slice(0, this.root.length) === this.root && // the file is in the current root
search(relativePath, fileClone)) { // and the file matches the function
@@ -741,14 +727,15 @@ var out = {
* @return {String|Uint8Array|ArrayBuffer|Buffer|Blob} the zip file
*/
generate: function(options) {
options = extend(options || {}, {
options = utils.extend(options || {}, {
base64: true,
compression: "STORE",
compressionOptions : null,
type: "base64",
platform: "DOS",
comment: null,
mimeType: 'application/zip'
mimeType: 'application/zip',
encodeFileName: utf8.utf8encode
});

utils.checkSupport(options.type);
@@ -770,7 +757,7 @@ var out = {
localDirLength = 0,
centralDirLength = 0,
writer, i,
utfEncodedComment = utils.transformTo("string", this.utf8encode(options.comment || this.comment || ""));
encodedComment = utils.transformTo("string", options.encodeFileName(options.comment || this.comment || ""));

// first, generate all the zip parts.
for (var name in this.files) {
@@ -788,7 +775,7 @@ var out = {

var compressedObject = generateCompressedObjectFrom.call(this, file, compression, compressionOptions);

var zipPart = generateZipParts.call(this, name, file, compressedObject, localDirLength, options.platform);
var zipPart = generateZipParts.call(this, name, file, compressedObject, localDirLength, options.platform, options.encodeFileName);
localDirLength += zipPart.fileRecord.length + compressedObject.compressedSize;
centralDirLength += zipPart.dirRecord.length;
zipData.push(zipPart);
@@ -811,9 +798,9 @@ var out = {
// offset of start of central directory with respect to the starting disk number
decToHex(localDirLength, 4) +
// .ZIP file comment length
decToHex(utfEncodedComment.length, 2) +
decToHex(encodedComment.length, 2) +
// .ZIP file comment
utfEncodedComment;
encodedComment;


// we have all the parts (and the total length)
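
Note on the general purpose bit flag: the change in `generateZipParts` sets bit 11 (the UTF-8 flag) only when no custom encoder is used and the name or comment contains non-ASCII characters. The decision can be sketched in isolation as below. This is an illustration, not the library code: `utf8ByteLength` and `generalPurposeBitFlag` are hypothetical names, and the standard `TextEncoder` stands in for `utf8.utf8encode`.

```js
// Hypothetical helper: number of bytes the string takes once UTF-8 encoded.
function utf8ByteLength(string) {
    return new TextEncoder().encode(string).length;
}

// Returns the 2-byte little-endian flag field as a binary string.
// "\x00\x08" is 0x0800, i.e. bit 11 set.
function generalPurposeBitFlag(name, comment, useCustomEncoding) {
    // A UTF-8 encoded string is longer than the original only if it
    // contains at least one non-ASCII character.
    var useUTF8ForFileName = utf8ByteLength(name) !== name.length;
    var useUTF8ForComment = utf8ByteLength(comment) !== comment.length;
    return !useCustomEncoding && (useUTF8ForFileName || useUTF8ForComment) ?
        "\x00\x08" : "\x00\x00";
}
```

With a custom encoder the flag stays cleared even for non-ASCII names, which matches the documentation note that the encoding used is not stored in the zip file.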
18 changes: 18 additions & 0 deletions lib/utils.js
@@ -324,3 +324,21 @@ exports.isRegExp = function (object) {
return Object.prototype.toString.call(object) === "[object RegExp]";
};

/**
* Merge the objects passed as parameters into a new one.
* @private
* @param {...Object} var_args All objects to merge.
* @return {Object} a new object with the data of the others.
*/
exports.extend = function() {
var result = {}, i, attr;
for (i = 0; i < arguments.length; i++) { // arguments is not enumerable in some browsers
for (attr in arguments[i]) {
if (arguments[i].hasOwnProperty(attr) && typeof result[attr] === "undefined") {
result[attr] = arguments[i][attr];
}
}
}
return result;
};
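
Note on `extend`: because a key is only copied while `result[attr]` is still undefined, earlier arguments take precedence over later ones, which is what lets `load` and `generate` pass the user options first and the defaults second. The sketch below reuses the function body from this diff as a standalone function; the option values are illustrative only.

```js
// Standalone copy of the function moved to lib/utils.js in this diff.
function extend() {
    var result = {}, i, attr;
    for (i = 0; i < arguments.length; i++) {
        for (attr in arguments[i]) {
            if (arguments[i].hasOwnProperty(attr) && typeof result[attr] === "undefined") {
                result[attr] = arguments[i][attr];
            }
        }
    }
    return result;
}

// Illustrative values: user options first, defaults second.
var userOptions = { type: 'uint8array', comment: undefined };
var defaults = { type: 'base64', comment: null, platform: 'DOS' };
var merged = extend(userOptions, defaults);
// merged.type comes from userOptions, merged.platform from defaults.
// merged.comment also comes from defaults: an explicit undefined
// in the user options does not block the default value.
```

The inputs are not modified: `extend` always builds and returns a new object.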

8 changes: 5 additions & 3 deletions lib/zipEntries.js
@@ -48,10 +48,12 @@ ZipEntries.prototype = {
// warning : the encoding depends on the system locale
// On a linux machine with LANG=en_US.utf8, this field is utf8 encoded.
// On a windows machine, this field is encoded with the localized windows code page.
this.zipComment = this.reader.readString(this.zipCommentLength);
var zipComment = this.reader.readData(this.zipCommentLength);
var decodeParamType = support.uint8array ? "uint8array" : "array";
// To get consistent behavior with the generation part, we will assume that
// this is utf8 encoded.
this.zipComment = jszipProto.utf8decode(this.zipComment);
// this is utf8 encoded unless specified otherwise.
var decodeContent = utils.transformTo(decodeParamType, zipComment);
this.zipComment = this.loadOptions.decodeFileName(decodeContent);
},
/**
* Read the end of the Zip 64 central directory.