| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Forwarding-related changes uncovered this case
in the integration.iter_limits test. Errors can also happen
when consuming data produced by the cache, in which case there's
no suitable server to blame for the error.
|
| |
|
|
|
|
|
|
|
|
| |
And that made the "NO6: is KO" line extraneous.
Example in context:
[select][14162.01] => id: '15271' choosing from addresses: 0 v4 + 1 v6; names to resolve: 6 v4 + 5 v6; force_resolve: 0; NO6: IPv6 is OK
[select][14162.01] => id: '15271' choosing: 'ns1.p31.dynect.net.'@'2600:2000:2210::31#00053' with timeout 774 ms zone cut: 'amazon.com.'
[select][14162.01] => id: '15271' updating: 'ns1.p31.dynect.net.'@'2600:2000:2210::31#00053' zone cut: 'amazon.com.' with rtt 316 to srtt: 311 and variance: 89
|
|
|
|
|
|
|
|
|
| |
This reverts commit 0c9ea1332e1c4475043eab571f60915b90985999 (!1226).
CI rp:fwd-tls6.udp-asan now repeatedly shows use-after-free.
That could be a serious issue, and this commit's feature
seems less important than the risk. Let's revert until the issue
gets deeper investigation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We use "monotonic" time-stamps for the dead_since field;
that breaks on system reboots, in which case we reset the stats
(if the server was categorized as dead).
If the server times out afterwards, we'd fail the condition
`cur_state.consecutive_timeouts == old_state.consecutive_timeouts`
so its stats would not update. Therefore we'd get stuck forever
in a state where the unusable server has high priority (no_rtt_info).
This commit changes a bit more than was necessary to fix this,
including precision of the stats (in some cases).
|
|
|
|
|
|
|
|
|
|
|
|
| |
The approach was dubious: random shuffle, qsort() and choose the first.
The main functional problem was that qsort() isn't a stable sort,
so the effect of pre-shuffling is not reliable, even though I don't have
any evidence of this causing issues in practice.
The new code should also be a bit more efficient in terms of CPU and
consumed randomness, but that probably won't be noticeable.
The arrays passed into select_transport() are now const (no sorting),
which could make the code easier to "understand".
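A minimal sketch of how a single pass can replace shuffle + qsort() while keeping the input const. The struct and function names here are hypothetical and are not taken from select_transport(); ties are broken by reservoir sampling, and rand() is used only for brevity (it has a slight modulo bias).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a candidate; not the resolver's real struct. */
struct candidate {
	unsigned score; /* lower is better */
};

/* Return the index of a lowest-score candidate, choosing uniformly among ties.
 * The array is only read, never reordered. */
size_t pick_best(const struct candidate *cands, size_t len)
{
	assert(cands && len > 0);
	size_t best = 0;
	size_t ties = 1; /* number of candidates sharing the best score so far */
	for (size_t i = 1; i < len; ++i) {
		if (cands[i].score < cands[best].score) {
			best = i;
			ties = 1;
		} else if (cands[i].score == cands[best].score) {
			/* Reservoir sampling: the k-th tie replaces the pick
			 * with probability 1/k. */
			++ties;
			if ((size_t)rand() % ties == 0)
				best = i;
		}
	}
	return best;
}
```

This consumes randomness only when ties actually occur, and the result does not depend on the stability of any sort.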
|
| |
|
| |
|
|
|
|
| |
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
|
|
|
|
|
|
|
| |
They probably couldn't hang open for long, as each client request
should cause some cache-searching and thus close it, and even with
queries stopping I haven't managed to find a case where it would be
left open but... it's nicer to clean up and it should be very cheap.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch caps the timeout set on UDP queries to servers chosen in the
EXPLORE phase of the selection algorithm to two times the timeout that
would be set if we were EXPLOITing.
This means that we no longer spend an unreasonable amount of time
probing servers that are probably dead anyway, while ensuring that we do
probe them from time to time to check whether they have come back to life.
If the timeout value is capped and the server fails to respond, we don't
punish the server for it, i.e. we don't cache the timeout.
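A rough illustration of the capping rule described above. Both helper names and the millisecond parameters are assumptions made for this sketch, not the actual selection code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: limit an EXPLORE timeout to twice the timeout
 * that EXPLOIT would have used, and report whether the cap applied. */
unsigned explore_timeout(unsigned explore_ms, unsigned exploit_ms, bool *capped)
{
	assert(capped);
	const unsigned limit = 2 * exploit_ms;
	*capped = explore_ms > limit;
	return *capped ? limit : explore_ms;
}

/* Hypothetical helper: a timeout under a capped value is not held
 * against the server, so it should not be cached. */
bool should_cache_timeout(bool capped)
{
	return !capped;
}
```

For example, with an EXPLOIT timeout of 400 ms, an EXPLORE timeout of 5000 ms would be cut to 800 ms and a miss would not be recorded.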
|
|
|
|
|
|
|
|
|
|
|
|
| |
Switching to TCP instead of querying very slow servers over UDP had an
unwanted side effect: we would sometimes get stuck with a server
permanently switched to TCP. And if the server happened not to reply over
TCP, we were in trouble.
Therefore, after a TCP connect fails or times out, we give the server one
last chance over UDP. This will not prevent the next request
from trying TCP on this server again, but we don't care because
DNS MUST ******* work over TCP.
|
|
|
|
|
|
|
|
|
|
| |
In particular, non-support of EDNS is implied iff FORMERR without OPT
comes. If OPT is there, one possibility is that there was something
wrong in the OPT that *we* sent, but it seems much more likely that
this particular server is just bad and we want to try another one.
https://tools.ietf.org/html/rfc6891#section-7
In particular, we would be in trouble if we dropped OPT in a zone
that is covered by DNSSEC.
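The rule above could be sketched roughly as follows. The enum and function are hypothetical names for illustration; only the FORMERR rcode value (1) is a fixed protocol constant.

```c
#include <assert.h>
#include <stdbool.h>

#define RCODE_FORMERR 1 /* DNS RCODE 1, Format Error */

/* Hypothetical outcomes for this sketch. */
enum formerr_action {
	RETRY_WITHOUT_EDNS, /* server apparently doesn't speak EDNS */
	TRY_ANOTHER_SERVER, /* server looks broken; keep EDNS */
};

/* Decide what to do with a FORMERR answer, based on whether
 * the response itself carried an OPT record. */
enum formerr_action on_formerr(int rcode, bool response_has_opt)
{
	assert(rcode == RCODE_FORMERR);
	return response_has_opt ? TRY_ANOTHER_SERVER : RETRY_WITHOUT_EDNS;
}
```

Keeping EDNS when OPT is present avoids dropping the DO bit for zones covered by DNSSEC.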
|
|
|
|
|
| |
It's now consistent with KNOT_RCODE_FORMERR and the official name
https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-6
|
|
|
|
|
|
| |
Lame delegations are weird: they breed more lame delegations on broken
zones, since trying another server from the same set usually doesn't help.
We force resolution of another NS name in the hope of getting somewhere.
|
|
|
|
|
|
|
| |
Previously, the resolve_badmsg and resolve_error functions were used
to apply workarounds. This is now moved to selection.c, and iterate.c
just provides feedback using the server selection API. Errors are now
handled centrally in selection.c:error.
|
|
|
|
|
| |
This is done by changing the type of the address field in struct choice to
union inaddr and moving some conversions around.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Small things I've noticed while reading it all.
- line breaks: I believe <90 is OK, as usually the attempts to reduce
lengths impair readability
- avoid unnecessary casts; usually the type was visible
on the same line anyway
- avoid `|` on booleans
- one block gets de-indented (often badly shown in diffs)
- no need for UNRECOVERABLE_ERRORS in a header (and a weird one, too)
- recoverability from failed assertions (in case they're turned off)
|
| |
|
|
|
|
|
|
|
| |
- standardize cache key choice and ensure impossibility of collisions
- comment on interaction with GC; it would be better to give RTT
priority over most other records
- be more robust wrt. the value in cache
|
|
Design discussion: #447
Code discussion: !1030
|