summaryrefslogtreecommitdiffstats
path: root/Documentation/i18n.txt
blob: 2dd79db5cbf674d3eeeb70c78cc961d0b15503ab (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
Git is to some extent character encoding agnostic.

 - The contents of the blob objects are uninterpreted sequences
   of bytes.  There is no encoding translation at the core
   level.

 - Path names are encoded in UTF-8 normalization form C. This
   applies to tree objects, the index file, ref names, as well as
   path names in command line arguments, environment variables
   and config files (`.git/config` (see linkgit:git-config[1]),
   linkgit:gitignore[5], linkgit:gitattributes[5] and
   linkgit:gitmodules[5]).
+
Note that Git at the core level treats path names simply as
sequences of non-NUL bytes, there are no path name encoding
conversions (except on Mac and Windows). Therefore, using
non-ASCII path names will mostly work even on platforms and file
systems that use legacy extended ASCII encodings. However,
repositories created on such systems will not work properly on
UTF-8-based systems (e.g. Linux, Mac, Windows) and vice versa.
Additionally, many Git-based tools simply assume path names to
be UTF-8 and will fail to display other encodings correctly.

 - Commit log messages are typically encoded in UTF-8, but other
   extended ASCII encodings are also supported. This includes
   ISO-8859-x, CP125x and many others, but _not_ UTF-16/32,
   EBCDIC and CJK multi-byte encodings (GBK, Shift-JIS, Big5,
   EUC-x, CP9xx etc.).

Although we encourage that the commit log messages are encoded
in UTF-8, both the core and Git Porcelain are designed not to
force UTF-8 on projects.  If all participants of a particular
project find it more convenient to use legacy encodings, Git
does not forbid it.  However, there are a few things to keep in
mind.

. 'git commit' and 'git commit-tree' issues
  a warning if the commit log message given to it does not look
  like a valid UTF-8 string, unless you explicitly say your
  project uses a legacy encoding.  The way to say this is to
  have i18n.commitencoding in `.git/config` file, like this:
+
------------
[i18n]
	commitencoding = ISO-8859-1
------------
+
Commit objects created with the above setting record the value
of `i18n.commitencoding` in its `encoding` header.  This is to
help other people who look at them later.  Lack of this header
implies that the commit log message is encoded in UTF-8.

. 'git log', 'git show', 'git blame' and friends look at the
  `encoding` header of a commit object, and try to re-code the
  log message into UTF-8 unless otherwise specified.  You can
  specify the desired output encoding with
  `i18n.logoutputencoding` in `.git/config` file, like this:
+
------------
[i18n]
	logoutputencoding = ISO-8859-1
------------
+
If you do not have this configuration variable, the value of
`i18n.commitencoding` is used instead.

Note that we deliberately chose not to re-code the commit log
message when a commit is made to force UTF-8 at the commit
object level, because re-coding to UTF-8 is not necessarily a
reversible operation.