OLD | NEW |
(Empty) | |
| 1 /*! |
| 2 |
| 3 \page localization Localization |
| 4 |
| 5 The Windows Installer originally dates from 1999, and it hasn't kept up with the times. |
| 6 Most problematic is that as of this writing [2013 Nov] it doesn't have official support for Unicode. |
| 7 Support is largely undocumented, and in any case is only partial. |
| 8 This causes all sorts of problems with modern tools, many of which default to UTF-8. |
| 9 WiX, as XML, does not exactly default to UTF-8, but it's close to it by convention. |
| 10 Some languages are Unicode-only, notably those based on Indic scripts. |
| 11 |
| 12 Overall, these limitations generate two kinds of issues. |
| 13 The first class is deficits in the shipped product. |
| 14 In short, some strings in certain languages simply cannot be localized. |
| 15 The second class is a burden on compliance testing before release. |
| 16 Unicode support is not official in Windows Installer; indeed it's only partial support at that. |
| 17 But also the WiX toolset does not provide a complete set of assurance tools, |
| 18 even though all the necessary compiler mechanisms seem to be present. |
| 19 As a result, manual verification of localization must occur for each language. |
| 20 |
| 21 \par MSI format |
| 22 |
| 23 The format of Windows Installer affects these issues. |
| 24 While it's a proprietary format, it's not totally opaque. |
| 25 Overall, it uses the [COM Structured Storage][structured_storage] format, |
| 26 which is just an archive file with an internal file system. |
| 27 In their lingo, a "storage" is a directory and a "stream" is a file. |
| 28 Two important aspects of localization use mechanisms whose documentation uses these terms. |
| 29 The "Summary Information Stream" is a file that provides metadata for the installer database. |
| 30 Embedded language transforms sit in the final MSI as storages. |
| 31 Some commentary on this topic is available on Rob Mensching's blog; |
| 32 he's the principal author of WiX and a former member of the Windows Installer team. |
| 33 - Mensching |
| 34 [Inside the MSI file format] (http://robmensching.com/blog/posts/2003/11/25/inside-the-msi-file-format) |
| 35 - Mensching |
| 36 [Inside the MSI file format, again] (http://robmensching.com/blog/posts/2004/2/10/inside-the-msi-file-format-again) |
| 37 |
| 38 [structured_storage]: http://msdn.microsoft.com/en-us/library/aa380369%28VS.85%29.aspx |
| 39 |
| 40 The Summary Information Stream is an important structure for two reasons. |
| 41 The first is that it needs to be localized. |
| 42 It contains the strings that appear in the Control Panel tool that lists all installed programs. |
| 43 This includes basic things like the name of the program. |
| 44 The second is that one of its fields lists the languages supported in a multiple-language installer. |
| 45 - MSDN |
| 46 [Summary Property Descriptions] (http://msdn.microsoft.com/en-us/library/windows/desktop/aa372049%28v=vs.85%29.aspx) |
| 47 |
| 48 It's worth noting that these summary properties were not originally designed for installation. |
| 49 Rather, they are derived from the summary information for documents defined by the COM structured storage interface. |
| 50 The names originally used for these fields were carried over to the SIS for installers, |
| 51 even when they didn't fit at all. |
| 52 One example is use of the page count field to represent the minimum required installer version. |
| 53 Likewise, certain fields were dropped entirely, such as "total editing time". |
| 54 - MSDN |
| 55 [Summary Information Property Set] (http://msdn.microsoft.com/en-us/library/windows/desktop/aa380376%28v=vs.85%29.aspx) |
| 56 This page is about COM Structured Storage, not Windows Installer. |
| 57 |
| 58 The tool OpenMCDF, available on SourceForge, is a .NET component to manipulate structured storage. |
| 59 It contains a simple structured storage browser that can show if embedded transforms are present. |
| 60 - SourceForge |
| 61 [OpenMCDF] (http://sourceforge.net/projects/openmcdf/) |
| 62 |
| 63 |
| 64 \par Installation Strings |
| 65 |
| 66 A simple MSI contains two internal files that contain localizable strings. |
| 67 - An installation database itself. |
| 68 This is a set of tables, some of which contain properties and UI strings. |
| 69 - The Summary Information Stream. |
| 70 This data appears principally in the Add/Remove Programs Menu within the Control Panel. |
| 71 We're also using embedded directories (storages) to store language-specific transforms. |
| 72 These transforms, however, are generated from simple MSIs that contain only the two files above. |
| 73 |
| 74 These two files are localized differently and have different levels of support for Unicode. |
| 75 In short, the installer database mostly supports Unicode and the SIS does not. |
| 76 Unicode is not officially supported anywhere, but it seems to work in practice for installer databases. |
| 77 The SIS is another story. |
| 78 Trying to use Unicode in the SIS leads to Windows thinking the MSI is malformed and refusing to install it. |
| 79 The SIS can be localized into many languages, but not all; see the section on code pages below. |
| 80 |
| 81 |
| 82 \par Language Identifiers |
| 83 |
| 84 Microsoft has its own system of identifying languages. |
| 85 (Could it have been otherwise?) |
| 86 A language is identified with an LCID, which variously stands for language code identifier or locale identifier. |
| 87 The inconsistency is Microsoft's own; its usage of LCID is not completely consistent. |
| 88 The earlier LCID is the one accepted by Windows system calls; |
| 89 the later one is used for the .NET library. |
| 90 They are mostly the same, but they're not identical. |
| 91 |
| 92 Windows Installer does not itself use any of the textual identifiers that are available, |
| 93 such as ISO 639 or its IETF usage. |
| 94 WiX has some support for textual identifiers, but doesn't document how LCIDs are generated from them. |
| 95 As a result, WiX should be considered unreliable for this. |
| 96 LCID reference tables are usually specified as hexadecimal constants, |
| 97 since the LCID itself is a bit field. |
| 98 Infuriatingly, WiX doesn't support hexadecimal integer constants, |
| 99 requiring a conversion of the LCID to decimal. |
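
The hexadecimal-to-decimal conversion can be scripted rather than done by hand.
The following is a minimal Python sketch, not part of WiX or any build here;
the function names `wix_decimal` and `split_lcid` are hypothetical.
The bit layout assumed is the one on the MSDN "Language Identifiers" page:
primary language in bits 0-9, sublanguage in bits 10-15, sort order in bits 16-19.

```python
def wix_decimal(hex_lcid):
    """Convert a hexadecimal LCID from a reference table to the decimal text WiX needs."""
    return str(int(hex_lcid, 16))


def split_lcid(lcid):
    """Split an LCID into (primary language, sublanguage, sort order).

    Assumed layout: bits 0-9 primary language, bits 10-15 sublanguage,
    bits 16-19 sort order.
    """
    primary = lcid & 0x3FF
    sublanguage = (lcid >> 10) & 0x3F
    sort_order = (lcid >> 16) & 0xF
    return primary, sublanguage, sort_order


if __name__ == "__main__":
    # 0x0409 is US English.
    print(wix_decimal("0x0409"))
    print(split_lcid(0x0409))
```

A sublanguage of zero marks a generic language identifier, so `split_lcid` is also a quick way to check whether a table entry is generic or specific.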
| 100 |
| 101 The notion of a character set is given by a code page, |
| 102 which is simply an assignment of character strings to glyph strings. |
| 103 It's not as simple as a character-to-glyph map, |
| 104 since a double-byte character set (DBCS) uses an internal escape code |
| 105 and mixes single- and double-byte characters. |
| 106 Regardless of these details, the important distinction is between the so-called "ANSI code pages" and Unicode. |
| 107 |
| 108 - MSDN |
| 109 [Language Identifiers] (http://msdn.microsoft.com/en-us/library/dd318691%28v=vs.85%29.aspx) |
| 110 The format of LCIDs as the operating system recognizes them. |
| 111 - MSDN |
| 112 [Language Identifier Constants and Strings] (http://msdn.microsoft.com/en-us/library/dd318693%28v=vs.85%29.aspx). |
| 113 Note: This is for the .NET version, but contains enough information to derive the Windows system call version. |
| 114 |
| 115 |
| 116 \par Code Pages |
| 117 |
| 118 The SIS is the only place where code pages really matter, because the SIS doesn't support Unicode. |
| 119 |
| 120 Microsoft has its own system of character sets, called code pages. |
| 121 Code pages predate Unicode, and while Microsoft is shifting over to Unicode, the transition isn't complete. |
| 122 As long as the Windows Installer doesn't fully support Unicode, code pages will be relevant. |
| 123 Even afterwards, they'll remain relevant until the first working version is the oldest one requiring support. |
| 124 |
| 125 A code page specifies an encoding from byte strings to character strings. |
| 126 The details aren't particularly relevant to installation issues. |
| 127 What matters are the identifiers. |
| 128 With Unicode, you can settle on a single identifier, "UTF-8", that represents an encoding. |
| 129 Code page identifiers are 16-bit integers. |
| 130 The most common one is 1252 (Latin-1), used for English and most Western European languages. |
| 131 The code page identifier 65001 means UTF-8, but it's not universally supported. |
| 132 |
| 133 Even within the Microsoft environment, code page identifiers are not completely consistent. |
| 134 The Windows operating system accepts a basic set of code pages; |
| 135 overall these are alphabetic writing systems. |
| 136 There are also OEM code pages and non-native code pages. |
| 137 Code pages are used by Windows for font selection (amongst other things). |
| 138 Either or both of the code page itself and the requisite display fonts might be absent on a machine. |
| 139 Furthermore, .NET introduced some others, such as one for UTF-16LE, |
| 140 but these are only available to managed applications. |
| 141 Luckily, there don't seem to be any multiply-assigned code page identifiers. |
| 142 All this matters because various tables of code page identifiers tend not to distinguish clearly between these categories. |
| 143 Even armed with a table, the developer or a linguist should be cautious about selecting a code page. |
| 144 |
| 145 The SIS, however, uses so-called "ANSI code pages", sometimes called "Windows code pages". |
| 146 This doesn't seem to be a particularly well-defined term. |
| 147 It seems to mean "acceptable to a system call ending with A and not W", |
| 148 but even that's not clear. |
| 149 What is clear is that even though 65001 for UTF-8 appears in some tables of ANSI code pages, |
| 150 it isn't one insofar as Windows Installer is concerned. |
| 151 Caveat emptor, indeed. |
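
As a first-pass check before manual verification, one can test whether a candidate SIS string is representable in a given code page at all.
The sketch below is an assumption-laden illustration in Python: the helper name `fits_code_page` is hypothetical,
and it relies on Python's own `cpNNNN` codec tables, which approximate but are not guaranteed to match what Windows Installer accepts.
In particular, it says nothing about whether the code page or its fonts are installed on the target machine.

```python
def fits_code_page(text, code_page):
    """Return True if every character of `text` can be encoded in the code page.

    Uses Python's codec named "cp<number>" (for example "cp1252").
    Raises LookupError if Python has no codec for that code page.
    """
    try:
        text.encode("cp{}".format(code_page))
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # German fits the Western European code page; Hindi does not,
    # since Devanagari has no ANSI code page.
    for sample in ("Größe", "हिन्दी"):
        print(sample, fits_code_page(sample, 1252))
```

A string that fails this check for every candidate ANSI code page is one of the strings that cannot be localized in the SIS.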
| 152 |
| 153 - MSDN |
| 154 [Code Pages] (http://msdn.microsoft.com/en-us/library/windows/desktop/dd317752%28v=vs.85%29.aspx) |
| 155 Commentary and historical information. |
| 156 - Wikipedia |
| 157 [Code page] (http://en.wikipedia.org/wiki/Code_page) |
| 158 Do not take as authoritative, but nevertheless useful for understanding some things you might see elsewhere. |
| 159 - MSDN |
| 160 [Code Page Identifiers] (http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx) |
| 161 - Heath Stewart's blog |
| 162 [MSI Databases and Code Pages] (http://blogs.msdn.com/b/heaths/archive/2005/10/05/msi-databases-and-code-pages.aspx) |
| 163 Explains how older tools dealt with this issue; WiX is much easier. |
| 164 Contains some useful background information. |
| 165 - Mailing list wix-users |
| 166 [Build time selection of codepage...] (http://comments.gmane.org/gmane.comp.windows.devel.wix.user/44388) |
| 167 One developer decided not to localize the SIS at all. |
| 168 - [Character Sets And Code Pages At The Push Of A Button] (http://www.i18nguy.com/unicode/codepages.html) |
| 169 A wealth of information about the specific code page and other encodings. |
| 170 Hopefully we never need to dig this far in. |
| 171 |
| 172 |
| 173 \par Multiple-language Installers |
| 174 |
| 175 The Windows Installer does not have good mechanisms for multiple languages. |
| 176 Each installation package is a single-language installer at the time of installation. |
| 177 In some situations it's feasible to simply ship out single-language installers, |
| 178 such as internal deployment within a multinational company. |
| 179 In others, where multiple languages are required, this leads to bloat because of duplication of installation assets. |
| 180 |
| 181 The documented way to support this situation is to deliver two assets: |
| 182 (1) a basic MSI and (2) a transform (MST) that changes the language of the MSI. |
| 183 There's syntax on the `msiexec` command line to apply a transform before installation. |
| 184 For consumer software, this is too much, so people started using an installation driver executable. |
| 185 This executable would typically present a menu for the locale, |
| 186 or sometimes infer it from the environment. |
| 187 After selecting a locale, it would select and apply an appropriate transform. |
| 188 This is the documented way of doing things. |
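
What such a driver does can be sketched in a few lines.
This Python sketch only builds the command line and does not run it;
the helper names and the file names `setup.msi` and `de.mst` are hypothetical.
It assumes the transform is passed through the `TRANSFORMS` property on the `msiexec` command line.

```python
def choose_transform(locale_lcid, transforms):
    """Return the transform file for a locale, or None to install untransformed.

    `transforms` maps a decimal LCID to the path of an MST file.
    """
    return transforms.get(locale_lcid)


def msiexec_command(msi_path, transform_path=None):
    """Build the argument list for installing an MSI, optionally with a transform."""
    command = ["msiexec", "/i", msi_path]
    if transform_path is not None:
        # The TRANSFORMS property names the transform to apply before installation.
        command.append("TRANSFORMS={}".format(transform_path))
    return command


if __name__ == "__main__":
    available = {1031: "de.mst"}  # hypothetical: one German transform
    print(msiexec_command("setup.msi", choose_transform(1031, available)))
```

A real driver would obtain the locale from a menu or from the environment and then launch the resulting command, for instance with `subprocess.run`.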
| 189 |
| 190 There is, however, an undocumented way to automatically apply a transform. |
| 191 The Summary Information has a property called `Template` that contains a list of languages supported. |
| 192 These languages are specified by a language identifier (LCID), a decimal integer, |
| 193 whose sublanguage may either be generic (zero) or specific. |
| 194 For example, 1033 is US English. |
| 195 The first (or only) language on the list specifies the language of the non-transformed installation database. |
| 196 Each subsequent identifier specifies an additional language. |
| 197 The MSI must contain an embedded transform for each additional language. |
| 198 (If it doesn't, Windows Installer throws up an error message and aborts installation.) |
| 199 An embedded transform is an MST file in a substorage (directory) whose name is the decimal LCID. |
| 200 Windows Installer checks the language list before installation proper begins to determine the most appropriate language. |
| 201 If it finds one that's not first on the list, it applies its embedded transform before installation proper begins. |
| 202 Otherwise it uses the installer database as-is, without any transformation. |
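
The relationship between the `Template` property and the embedded transforms can be checked mechanically.
The following Python sketch assumes the `Template` value has the form `platform;lcid,lcid,...` as described on the MSDN Template Summary page;
the function names and the sample value are hypothetical.
The selection function matches on an exact LCID only, which is a simplifying assumption:
how Windows Installer actually decides the "most appropriate" language is not documented.

```python
def parse_template(template):
    """Split a Template value into (platform, list of decimal LCIDs)."""
    platform, _, languages = template.partition(";")
    lcids = [int(item) for item in languages.split(",") if item.strip()]
    return platform, lcids


def required_transform_names(lcids):
    """Names of the substorages that must hold embedded transforms.

    The first LCID is the language of the untransformed database,
    so only the subsequent ones need a transform.
    """
    return [str(lcid) for lcid in lcids[1:]]


def transform_for(lcids, user_lcid):
    """Return the substorage name to apply, or None to use the database as-is.

    Assumption: exact LCID match only; anything else falls back to the first language.
    """
    if user_lcid in lcids[1:]:
        return str(user_lcid)
    return None


if __name__ == "__main__":
    platform, lcids = parse_template("Intel;1033,1031")  # hypothetical sample value
    print(platform, lcids, required_transform_names(lcids))
```

Comparing `required_transform_names` against the substorages actually present in the MSI catches the missing-transform error before Windows Installer does.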
| 203 |
| 204 - installsite.org |
| 205 [Multi-Language MSI Packages without Setup.exe Launcher] (http://www.installsite.org/pages/en/msi/articles/embeddedlang/) |
| 206 The original page that documents the automatic application of embedded language transforms. |
| 207 First written by Andreas Kerl at Microsoft Germany and then translated into English. |
| 208 - MSDN |
| 209 [Template Summary property] (http://msdn.microsoft.com/en-us/library/windows/desktop/aa372070%28v=vs.85%29.aspx) |
| 210 Documents the format of the Template property in the SIS. |
| 211 From this page: "Merge Modules are the only packages that may have multiple languages." |
| 212 Contrary to this quotation, MSI files are also allowed to have multiple languages, although such support is undocumented. |
| 213 - MSDN |
| 214 [Embedded Transforms] (http://msdn.microsoft.com/en-us/library/windows/desktop/aa368347%28v=vs.85%29.aspx) |
| 215 This page not only isn't very informative, it's also slightly wrong. |
| 216 There's an implication that an embedded transform is stored as a file, which is only sort-of right; |
| 217 the actual mechanism is as a substorage (directory). |
| 218 |
| 219 */ |