diff options
author | Jeff King <peff@peff.net> | 2024-03-12 10:17:39 +0100 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2024-03-12 21:28:10 +0100 |
commit | 2ec225d397d564d9c6bb907d85a58507dec75989 (patch) | |
tree | 8158ad80d8691ddb1d497106e17f6d3330b2fa2b /sequencer.c | |
parent | find multi-byte comment chars in NUL-terminated strings (diff) | |
download | git-2ec225d397d564d9c6bb907d85a58507dec75989.tar.xz git-2ec225d397d564d9c6bb907d85a58507dec75989.zip |
find multi-byte comment chars in unterminated buffers
As with the previous patch, we need to swap out single-byte matching for
something like starts_with() to match all bytes of a multi-byte comment
character. But for cases where the buffer is not NUL-terminated (and we
instead have an explicit size or end pointer), it's not safe to use
starts_with(), as it might walk off the end of the buffer.
Let's introduce a new starts_with_mem() that does the same thing but
also accepts the length of the "haystack" str and makes sure not to walk
past it.
Note that in most cases the existing code did not need a length check at
all, since it was written in a way that knew we had at least one byte
available (and that was all we checked). So I had to read each one to
find the appropriate bounds. The one exception is sequencer.c's
add_commented_lines(), where we can actually get rid of the length
check. Just like starts_with(), our starts_with_mem() handles an empty
haystack variable by not matching (assuming a non-empty prefix).
A few notes on the implementation of starts_with_mem():
- it would be equally correct to take an "end" pointer (and indeed,
many of the callers have this and have to subtract to come up with
the length). I think taking a ptr/size combo is a more usual
interface for our codebase, though, and has the added benefit that
the function signature makes it harder to mix up the three
parameters.
- we could obviously build starts_with() on top of this by passing
strlen(str) as the length. But it's possible that starts_with() is a
relatively hot code path, and it should not pay that penalty (it can
generally return an answer proportional to the size of the prefix,
not the whole string).
- it naively feels like xstrncmpz() should be able to do the same
thing, but that's not quite true. If you pass the length of the
haystack buffer, then strncmp() finds that a shorter prefix string
is "less than" than the haystack, even if the haystack starts with
the prefix. If you pass the length of the prefix, then you risk
reading past the end of the haystack if it is shorter than the
prefix. So I think we really do need a new function.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'sequencer.c')
-rw-r--r-- | sequencer.c | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/sequencer.c b/sequencer.c index 991a2dbe96..664986e3b2 100644 --- a/sequencer.c +++ b/sequencer.c @@ -1840,7 +1840,7 @@ static int is_fixup_flag(enum todo_command command, unsigned flag) static void add_commented_lines(struct strbuf *buf, const void *str, size_t len) { const char *s = str; - while (len > 0 && s[0] == comment_line_char) { + while (starts_with_mem(s, len, comment_line_str)) { size_t count; const char *n = memchr(s, '\n', len); if (!n) @@ -2562,7 +2562,7 @@ static int parse_insn_line(struct repository *r, struct todo_item *item, /* left-trim */ bol += strspn(bol, " \t"); - if (bol == eol || *bol == '\r' || *bol == comment_line_char) { + if (bol == eol || *bol == '\r' || starts_with_mem(bol, eol - bol, comment_line_str)) { item->command = TODO_COMMENT; item->commit = NULL; item->arg_offset = bol - buf; |