author Ævar Arnfjörð Bjarmason <avarab@gmail.com> 2019-06-28 01:39:04 +0200
committer Junio C Hamano <gitster@pobox.com> 2019-06-28 18:11:09 +0200
commit 4e2443b1813dded87c9cc1138f22af73748022b8
tree 6b3939c932a90bb10bee05f7c1f8e3f7867b7b53 /t/t4210-log-i18n.sh
parent The third batch
log tests: test regex backends in "--encode=<enc>" tests
Improve the tests added in 04deccda11 ("log: re-encode commit messages before grepping", 2013-02-11) to test the regex backends. Those tests never worked as advertised, due to the is_fixed() optimization in grep.c (which was in place at the time) and the needle in the tests being a fixed string. We'd thus always use the "fixed" backend during the tests, which would use the kwset() backend. This backend liberally accepts any garbage input, so invalid encodings would be silently accepted.

A follow-up commit will fix this bug; this test only demonstrates the existing issue.

In practice this issue showed up on Windows (see [1]), but we missed it because of how the existing tests were structured and how liberally the kwset code accepts garbage.

Cover this blind spot by testing all our regex engines. The PCRE backend will spot these invalid encodings. It's possible that this test also breaks the "basic" and "extended" backends on systems whose POSIX regex functions are stricter than glibc's about locale encoding issues, but PCRE is more careful about validation.

1. https://public-inbox.org/git/nycvar.QRO.7.76.6.1906271113090.44@tvgsbejvaqbjf.bet/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
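As a rough illustration of the is_fixed() shortcut described above, here is a hypothetical shell re-creation of the check; it is not code from grep.c. It assumes that git treats the characters `$()*+.?[\^{|` as regex-special, which is my reading of git's ctype table. A needle containing none of them is treated as a plain string and takes the fixed-string (kwset) path whatever grep.patternType says, which is why the new tests prefix the needle with ".*" for every engine except "fixed".

```shell
#!/bin/sh
# Hypothetical re-creation of grep.c's is_fixed() idea, not git's code:
# a pattern is "fixed" when it contains none of the characters assumed
# here to be regex-special: $ ( ) * + . ? [ \ ^ { |
is_fixed () {
	case $1 in
	*[\$\(\)\*+.?\[\\^{\|]*)
		return 1 ;;  # contains a regex-special character
	*)
		return 0 ;;  # plain string: eligible for the fixed-string backend
	esac
}

for needle in 'abc' '.*abc' 'a(b)'
do
	if is_fixed "$needle"
	then
		echo "$needle: fixed"
	else
		echo "$needle: regex"
	fi
done
```

Running it prints `abc: fixed`, `.*abc: regex` and `a(b): regex`: prepending ".*" is enough to push an otherwise literal needle onto the regex path.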
Diffstat (limited to 't/t4210-log-i18n.sh')
-rwxr-xr-x t/t4210-log-i18n.sh | 41
1 file changed, 40 insertions, 1 deletion
diff --git a/t/t4210-log-i18n.sh b/t/t4210-log-i18n.sh
index 7c519436ef..86d22c1d4c 100755
--- a/t/t4210-log-i18n.sh
+++ b/t/t4210-log-i18n.sh
@@ -1,12 +1,15 @@
 #!/bin/sh
 
 test_description='test log with i18n features'
-. ./test-lib.sh
+. ./lib-gettext.sh
 
 # two forms of é
 utf8_e=$(printf '\303\251')
 latin1_e=$(printf '\351')
 
+# invalid UTF-8
+invalid_e=$(printf '\303\50)') # ")" at end to close opening "("
+
 test_expect_success 'create commits in different encodings' '
 	test_tick &&
 	cat >msg <<-EOF &&
@@ -53,4 +56,40 @@ test_expect_success 'log --grep does not find non-reencoded values (latin1)' '
 	test_must_be_empty actual
 '
 
+for engine in fixed basic extended perl
+do
+	prereq=
+	result=success
+	if test $engine = "perl"
+	then
+		result=failure
+		prereq="PCRE"
+	else
+		prereq=""
+	fi
+	force_regex=
+	if test $engine != "fixed"
+	then
+		force_regex=.*
+	fi
+	test_expect_$result GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep does not find non-reencoded values (latin1 + locale)" "
+		cat >expect <<-\EOF &&
+		latin1
+		utf8
+		EOF
+		LC_ALL=\"$is_IS_locale\" git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$latin1_e\" >actual &&
+		test_cmp expect actual
+	"
+
+	test_expect_success GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep does not find non-reencoded values (latin1 + locale)" "
+		LC_ALL=\"$is_IS_locale\" git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$utf8_e\" >actual &&
+		test_must_be_empty actual
+	"
+
+	test_expect_$result GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep does not die on invalid UTF-8 value (latin1 + locale + invalid needle)" "
+		LC_ALL=\"$is_IS_locale\" git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$invalid_e\" >actual &&
+		test_must_be_empty actual
+	"
+done
+
 test_done
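The needles in the test script are built from printf octal escapes, which makes their bytes hard to see at a glance. The sketch below prints them as hex; the three printf calls are copied from the script, while the hex_bytes helper is hypothetical and exists only for this illustration. It shows that `\50` is octal for "(" (0x28), so invalid_e is the UTF-8 lead byte 0xC3 followed by "(" and ")" instead of a continuation byte in the range 0x80-0xBF, which is what makes it invalid UTF-8. With the "extended" and "perl" engines "(" opens a group, hence the trailing ")" mentioned in the script's comment.

```shell
#!/bin/sh
# Print each needle from t4210-log-i18n.sh as a string of hex bytes.
# hex_bytes is a hypothetical helper, not part of git's test suite.
hex_bytes () {
	# od -An -tx1 prints one two-digit hex value per byte;
	# tr removes the separating spaces and the trailing newline.
	printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

utf8_e=$(printf '\303\251')     # U+00E9 encoded as UTF-8 (two bytes)
latin1_e=$(printf '\351')       # U+00E9 encoded as ISO-8859-1 (one byte)
invalid_e=$(printf '\303\50)')  # UTF-8 lead byte, then "(" and ")"

echo "utf8_e:    $(hex_bytes "$utf8_e")"
echo "latin1_e:  $(hex_bytes "$latin1_e")"
echo "invalid_e: $(hex_bytes "$invalid_e")"
```

It prints `c3a9`, `e9` and `c32829` for the three needles respectively.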