summaryrefslogtreecommitdiffstats
path: root/t/t4015-diff-whitespace.sh
diff options
context:
space:
mode:
authorStefan Beller <sbeller@google.com>2017-06-30 22:53:07 +0200
committerJunio C Hamano <gitster@pobox.com>2017-06-30 22:59:42 +0200
commit2e2d5ac184de8facde4e14cec8b4e2a154480ed8 (patch)
tree7000d0063d6eacf35fab609487416284da838b05 /t/t4015-diff-whitespace.sh
parentdiff.c: buffer all output if asked to (diff)
downloadgit-2e2d5ac184de8facde4e14cec8b4e2a154480ed8.tar.xz
git-2e2d5ac184de8facde4e14cec8b4e2a154480ed8.zip
diff.c: color moved lines differently
When a patch consists mostly of moving blocks of code around, it can be quite tedious to ensure that the blocks are moved verbatim, and not undesirably modified in the move. To that end, color blocks that are moved within the same patch differently. For example (OM, del, add, and NM are different colors): [OM] -void sensitive_stuff(void) [OM] -{ [OM] - if (!is_authorized_user()) [OM] - die("unauthorized"); [OM] - sensitive_stuff(spanning, [OM] - multiple, [OM] - lines); [OM] -} void another_function() { [del] - printf("foo"); [add] + printf("bar"); } [NM] +void sensitive_stuff(void) [NM] +{ [NM] + if (!is_authorized_user()) [NM] + die("unauthorized"); [NM] + sensitive_stuff(spanning, [NM] + multiple, [NM] + lines); [NM] +} However adjacent blocks may be problematic. For example, in this potentially malicious patch, the swapping of blocks can be spotted: [OM] -void sensitive_stuff(void) [OM] -{ [OMA] - if (!is_authorized_user()) [OMA] - die("unauthorized"); [OM] - sensitive_stuff(spanning, [OM] - multiple, [OM] - lines); [OMA] -} void another_function() { [del] - printf("foo"); [add] + printf("bar"); } [NM] +void sensitive_stuff(void) [NM] +{ [NMA] + sensitive_stuff(spanning, [NMA] + multiple, [NMA] + lines); [NM] + if (!is_authorized_user()) [NM] + die("unauthorized"); [NMA] +} If the moved code is larger, it is easier to hide some permutation in the code, which is why some alternative coloring is needed. This patch implements the first mode: * basic alternating 'Zebra' mode This conveys all information needed to the user. Defer customization to later patches. First I implemented an alternative design, which would try to fingerprint a line by its neighbors to detect if we are in a block or at the boundary. This idea iss error prone as it inspected each line and its neighboring lines to determine if the line was (a) moved and (b) if was deep inside a hunk by having matching neighboring lines. This is unreliable as the we can construct hunks which have equal neighbors that just exceed the number of lines inspected. (Think of 'AXYZBXYZCXYZD..' with each letter as a line, that is permutated to AXYZCXYZBXYZD..'). Instead this provides a dynamic programming greedy algorithm that finds the largest moved hunk and then has several modes on highlighting bounds. A note on the options '--submodule=diff' and '--color-words/--word-diff': In the conversion to use emit_line in the prior patches both submodules as well as word diff output carefully chose to call emit_line with sign=0. All output with sign=0 is ignored for move detection purposes in this patch, such that no weird looking output will be generated for these cases. This leads to another thought: We could pass on '--color-moved' to submodules such that they color up moved lines for themselves. If we'd do so only line moves within a repository boundary are marked up. It is useful to have moved lines colored, but there are annoying corner cases, such as a single line moved, that is very common. For example in a typical patch of C code, we have closing braces that end statement blocks or functions. While it is technically true that these lines are moved as they show up elsewhere, it is harmful for the review as the reviewers attention is drawn to such a minor side annoyance. For now let's have a simple solution of hardcoding the number of moved lines to be at least 3 before coloring them. Note, that the length is applied across all blocks to find the 'lonely' blocks that pollute new code, but do not interfere with a permutated block where each permutation has less lines than 3. Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 't/t4015-diff-whitespace.sh')
-rwxr-xr-xt/t4015-diff-whitespace.sh261
1 files changed, 261 insertions, 0 deletions
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 289806d0c7..29704ae14e 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -972,4 +972,265 @@ test_expect_success 'option overrides diff.wsErrorHighlight' '
'
+test_expect_success 'detect moved code, complete file' '
+ git reset --hard &&
+ cat <<-\EOF >test.c &&
+ #include<stdio.h>
+ main()
+ {
+ printf("Hello World");
+ }
+ EOF
+ git add test.c &&
+ git commit -m "add main function" &&
+ git mv test.c main.c &&
+ test_config color.diff.oldMoved "normal red" &&
+ test_config color.diff.newMoved "normal green" &&
+ git diff HEAD --color-moved --no-renames | test_decode_color >actual &&
+ cat >expected <<-\EOF &&
+ <BOLD>diff --git a/main.c b/main.c<RESET>
+ <BOLD>new file mode 100644<RESET>
+ <BOLD>index 0000000..a986c57<RESET>
+ <BOLD>--- /dev/null<RESET>
+ <BOLD>+++ b/main.c<RESET>
+ <CYAN>@@ -0,0 +1,5 @@<RESET>
+ <BGREEN>+<RESET><BGREEN>#include<stdio.h><RESET>
+ <BGREEN>+<RESET><BGREEN>main()<RESET>
+ <BGREEN>+<RESET><BGREEN>{<RESET>
+ <BGREEN>+<RESET><BGREEN>printf("Hello World");<RESET>
+ <BGREEN>+<RESET><BGREEN>}<RESET>
+ <BOLD>diff --git a/test.c b/test.c<RESET>
+ <BOLD>deleted file mode 100644<RESET>
+ <BOLD>index a986c57..0000000<RESET>
+ <BOLD>--- a/test.c<RESET>
+ <BOLD>+++ /dev/null<RESET>
+ <CYAN>@@ -1,5 +0,0 @@<RESET>
+ <BRED>-#include<stdio.h><RESET>
+ <BRED>-main()<RESET>
+ <BRED>-{<RESET>
+ <BRED>-printf("Hello World");<RESET>
+ <BRED>-}<RESET>
+ EOF
+
+ test_cmp expected actual
+'
+
+test_expect_success 'detect malicious moved code, inside file' '
+ test_config color.diff.oldMoved "normal red" &&
+ test_config color.diff.newMoved "normal green" &&
+ test_config color.diff.oldMovedAlternative "blue" &&
+ test_config color.diff.newMovedAlternative "yellow" &&
+ git reset --hard &&
+ cat <<-\EOF >main.c &&
+ #include<stdio.h>
+ int stuff()
+ {
+ printf("Hello ");
+ printf("World\n");
+ }
+
+ int secure_foo(struct user *u)
+ {
+ if (!u->is_allowed_foo)
+ return;
+ foo(u);
+ }
+
+ int main()
+ {
+ foo();
+ }
+ EOF
+ cat <<-\EOF >test.c &&
+ #include<stdio.h>
+ int bar()
+ {
+ printf("Hello World, but different\n");
+ }
+
+ int another_function()
+ {
+ bar();
+ }
+ EOF
+ git add main.c test.c &&
+ git commit -m "add main and test file" &&
+ cat <<-\EOF >main.c &&
+ #include<stdio.h>
+ int stuff()
+ {
+ printf("Hello ");
+ printf("World\n");
+ }
+
+ int main()
+ {
+ foo();
+ }
+ EOF
+ cat <<-\EOF >test.c &&
+ #include<stdio.h>
+ int bar()
+ {
+ printf("Hello World, but different\n");
+ }
+
+ int secure_foo(struct user *u)
+ {
+ foo(u);
+ if (!u->is_allowed_foo)
+ return;
+ }
+
+ int another_function()
+ {
+ bar();
+ }
+ EOF
+ git diff HEAD --no-renames --color-moved=zebra| test_decode_color >actual &&
+ cat <<-\EOF >expected &&
+ <BOLD>diff --git a/main.c b/main.c<RESET>
+ <BOLD>index 27a619c..7cf9336 100644<RESET>
+ <BOLD>--- a/main.c<RESET>
+ <BOLD>+++ b/main.c<RESET>
+ <CYAN>@@ -5,13 +5,6 @@<RESET> <RESET>printf("Hello ");<RESET>
+ printf("World\n");<RESET>
+ }<RESET>
+ <RESET>
+ <BRED>-int secure_foo(struct user *u)<RESET>
+ <BRED>-{<RESET>
+ <BLUE>-if (!u->is_allowed_foo)<RESET>
+ <BLUE>-return;<RESET>
+ <BRED>-foo(u);<RESET>
+ <BLUE>-}<RESET>
+ <BLUE>-<RESET>
+ int main()<RESET>
+ {<RESET>
+ foo();<RESET>
+ <BOLD>diff --git a/test.c b/test.c<RESET>
+ <BOLD>index 1dc1d85..2bedec9 100644<RESET>
+ <BOLD>--- a/test.c<RESET>
+ <BOLD>+++ b/test.c<RESET>
+ <CYAN>@@ -4,6 +4,13 @@<RESET> <RESET>int bar()<RESET>
+ printf("Hello World, but different\n");<RESET>
+ }<RESET>
+ <RESET>
+ <BGREEN>+<RESET><BGREEN>int secure_foo(struct user *u)<RESET>
+ <BGREEN>+<RESET><BGREEN>{<RESET>
+ <YELLOW>+<RESET><YELLOW>foo(u);<RESET>
+ <BGREEN>+<RESET><BGREEN>if (!u->is_allowed_foo)<RESET>
+ <BGREEN>+<RESET><BGREEN>return;<RESET>
+ <YELLOW>+<RESET><YELLOW>}<RESET>
+ <YELLOW>+<RESET>
+ int another_function()<RESET>
+ {<RESET>
+ bar();<RESET>
+ EOF
+
+ test_cmp expected actual
+'
+
+test_expect_success 'no effect from --color-moved with --word-diff' '
+ cat <<-\EOF >text.txt &&
+ Lorem Ipsum is simply dummy text of the printing and typesetting industry.
+ EOF
+ git add text.txt &&
+ git commit -a -m "clean state" &&
+ cat <<-\EOF >text.txt &&
+ simply Lorem Ipsum dummy is text of the typesetting and printing industry.
+ EOF
+ git diff --color-moved --word-diff >actual &&
+ git diff --word-diff >expect &&
+ test_cmp expect actual
+'
+
+test_expect_success 'move detection ignoring whitespace ' '
+ git reset --hard &&
+ cat <<\EOF >lines.txt &&
+line 1
+line 2
+line 3
+line 4
+line 5
+line 6
+line 7
+EOF
+ git add lines.txt &&
+ git commit -m "add poetry" &&
+ cat <<\EOF >lines.txt &&
+ line 5
+ line 6
+ line 7
+line 1
+line 2
+line 3
+line 4
+EOF
+ test_config color.diff.oldMoved "magenta" &&
+ test_config color.diff.newMoved "cyan" &&
+ git diff HEAD --no-renames --color-moved| test_decode_color >actual &&
+ cat <<-\EOF >expected &&
+ <BOLD>diff --git a/lines.txt b/lines.txt<RESET>
+ <BOLD>index 734156d..eb89ead 100644<RESET>
+ <BOLD>--- a/lines.txt<RESET>
+ <BOLD>+++ b/lines.txt<RESET>
+ <CYAN>@@ -1,7 +1,7 @@<RESET>
+ <GREEN>+<RESET> <GREEN>line 5<RESET>
+ <GREEN>+<RESET> <GREEN>line 6<RESET>
+ <GREEN>+<RESET> <GREEN>line 7<RESET>
+ line 1<RESET>
+ line 2<RESET>
+ line 3<RESET>
+ line 4<RESET>
+ <RED>-line 5<RESET>
+ <RED>-line 6<RESET>
+ <RED>-line 7<RESET>
+ EOF
+ test_cmp expected actual &&
+
+ git diff HEAD --no-renames -w --color-moved| test_decode_color >actual &&
+ cat <<-\EOF >expected &&
+ <BOLD>diff --git a/lines.txt b/lines.txt<RESET>
+ <BOLD>index 734156d..eb89ead 100644<RESET>
+ <BOLD>--- a/lines.txt<RESET>
+ <BOLD>+++ b/lines.txt<RESET>
+ <CYAN>@@ -1,7 +1,7 @@<RESET>
+ <CYAN>+<RESET> <CYAN>line 5<RESET>
+ <CYAN>+<RESET> <CYAN>line 6<RESET>
+ <CYAN>+<RESET> <CYAN>line 7<RESET>
+ line 1<RESET>
+ line 2<RESET>
+ line 3<RESET>
+ line 4<RESET>
+ <MAGENTA>-line 5<RESET>
+ <MAGENTA>-line 6<RESET>
+ <MAGENTA>-line 7<RESET>
+ EOF
+ test_cmp expected actual
+'
+
+test_expect_success 'move detection with submodules' '
+ test_create_repo bananas &&
+ echo ripe >bananas/recipe &&
+ git -C bananas add recipe &&
+ test_commit fruit &&
+ test_commit -C bananas recipe &&
+ git submodule add ./bananas &&
+ git add bananas &&
+ git commit -a -m "bananas are like a heavy library?" &&
+ echo foul >bananas/recipe &&
+ echo ripe >fruit.t &&
+
+ git diff --submodule=diff --color-moved >actual &&
+
+ # no move detection as the moved line is across repository boundaries.
+ test_decode_color <actual >decoded_actual &&
+ ! grep BGREEN decoded_actual &&
+ ! grep BRED decoded_actual &&
+
+ # nor did we mess with it another way
+ git diff --submodule=diff | test_decode_color >expect &&
+ test_cmp expect decoded_actual
+'
+
test_done