diff --git a/doc/dsc-manual.tex b/doc/dsc-manual.tex
new file mode 100644
index 0000000..501d34a
--- /dev/null
+++ b/doc/dsc-manual.tex
@@ -0,0 +1,1863 @@
+\documentclass{report}
+\usepackage{epsfig}
+\usepackage{path}
+\usepackage{fancyvrb}
+
+\def\dsc{{\sc dsc}}
+
+\DefineVerbatimEnvironment%
+ {MyVerbatim}{Verbatim}
+ {frame=lines,framerule=0.8mm,fontsize=\small}
+
+\renewcommand{\abstractname}{}
+
+\begin{document}
+
+\begin{titlepage}
+\title{DSC Manual}
+\author{Duane Wessels, Measurement Factory\\
+Ken Keys, CAIDA\\
+\\
+http://dns.measurement-factory.com/tools/dsc/}
+\date{\today}
+\end{titlepage}
+
+\maketitle
+
+\begin{abstract}
+\setlength{\parskip}{1ex}
+\section{Copyright}
+
+The DNS Statistics Collector (dsc)
+
+Copyright 2003-2007 by The Measurement Factory, Inc., 2007-2008 by Internet
+Systems Consortium, Inc., 2008-2019 by OARC, Inc.
+
+{\em info@measurement-factory.com\/}, {\em info@isc.org\/}
+
+\section{License}
+
+{\dsc} is licensed under the terms of the BSD license:
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+Neither the name of The Measurement Factory nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
+
+\section{Contributors}
+\begin{itemize}
+\item Duane Wessels, Measurement Factory
+\item Ken Keys, Cooperative Association for Internet Data Analysis
+\item Sebastian Castro, New Zealand Registry Services
+\end{itemize}
+\end{abstract}
+
+
+\tableofcontents
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\chapter{Introduction}
+
+{\dsc} is a system for collecting and presenting statistics from
+a busy DNS server.
+
+\section{Components}
+
+{\dsc} consists of the following components:
+\begin{itemize}
+\item A data collector
+\item A data presenter, where data is archived and rendered
+\item A method for securely transferring data from the collector
+ to the presenter
+\item Utilities and scripts that parse XML and archive files from the collector
+\item Utilities and scripts that generate graphs and HTML pages
+\end{itemize}
+
+\subsection{The Collector}
+
+The collector is a binary program, named {\tt dsc\/}, which snoops
+on DNS messages. It is written in C and uses {\em libpcap\/} for
+packet capture.
+
+{\tt dsc\/} uses a relatively simple configuration file called {\em
+dsc.conf\/} to define certain parameters and options. The configuration
+file also determines the {\em datasets\/} that {\tt dsc\/} collects.
+
+A dataset is a 2-D array of counters of IP/DNS message properties.
+You can define each dimension of the array independently. For
+example, you might define a dataset categorized by DNS query type
+along one dimension and TLD along the other.
+{\tt dsc\/} dumps the datasets from memory to XML files every 60 seconds.
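+
+For example, a dataset definition like the following (taken from the
+sample configuration and explained in detail in a later chapter) counts
+queries by query type along one dimension and by TLD along the other:
+
+\begin{MyVerbatim}
+dataset qtype_vs_tld dns Qtype:qtype TLD:tld queries-only,popular-qtypes
+    max-cells=200;
+\end{MyVerbatim}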
+
+\subsection{XML Data Transfer}
+
+You may run the {\dsc} collector on a remote machine. That
+is, the collector may run on a different machine than where the
+data is archived and displayed. {\dsc} includes some Perl and {\tt /bin/sh}
+scripts to move XML files from collector to presenter. One
+technique uses X.509 certificates and a secure HTTP server. The other
+uses {\em rsync\/}, presumably over {\em ssh\/}.
+
+\subsubsection{X.509/SSL}
+
+To make this work, Apache/mod\_ssl should run on the machine where data
+is archived and presented.
+Data transfer is authenticated via SSL X.509 certificates. A Perl
+CGI script handles all PUT requests on the server. If the client
+certificate is allowed, XML files are stored in the appropriate
+directory.
+
+A shell script runs on the collector to upload the XML files. It
+uses {\tt curl\/}\footnote{http://curl.haxx.se} to establish an
+HTTPS connection. XML files are bundled together with {\tt tar\/}
+before transfer to eliminate per-connection delays.
+You could use {\tt scp\/} or {\tt rsync\/} instead of
+{\tt curl\/} if you like.
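+
+The details are handled by an upload script described in a later
+chapter, but the transfer amounts to an HTTPS PUT of the tar bundle,
+authenticated with the client certificate. A minimal sketch, assuming
+hypothetical file and host names, might look like:
+
+\begin{MyVerbatim}
+% tar cf xml-bundle.tar *.xml
+% curl --cacert cacert.pem --cert client.pem \
+    --upload-file xml-bundle.tar https://presenter.example.com/
+\end{MyVerbatim}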
+
+\path|put-file.pl| is the script that accepts PUT requests on the
+HTTP server. The HTTP server validates the client's X.509 certificate.
+If the certificate is invalid, the PUT request is denied. This
+script reads environment variables to get X.509 parameters. The
+uploaded data is stored in a directory based on the X.509 Organizational
+Unit (server) and Common Name (node) fields.
+
+\subsubsection{rsync/ssh}
+
+This technique uses the {\em rsync\/} utility to transfer files.
+You'll probably want to use {\em ssh\/} as the underlying transport,
+although you can still use the less-secure {\em rsh\/} or native
+rsync server transports if you like.
+
+If you use {\em ssh\/} then you'll need to create passphrase-less
+SSH keys so that the transfer can occur automatically. You may
+want to create special {\em dsc\/} userids on both ends as well.
+
+\subsection{The Extractor}
+
+The XML extractor is a Perl script that reads the XML files from
+{\tt dsc\/}. The extractor essentially converts the XML-structured
+data to a format that is easier (faster) for the graphing tools to
+parse. Currently the extracted data files are line-based ASCII
+text files. Support for SQL databases is planned for the future.
+
+\subsection{The Grapher}
+
+{\dsc} uses {\em Ploticus\/}\footnote{http://ploticus.sourceforge.net/}
+as the graphing engine. A Perl module and CGI script read extracted
+data files and generate Ploticus scriptfiles to generate plots. Plots
+are always generated on demand via the CGI application.
+
+\path|dsc-grapher.pl| is the script that displays graphs from the
+archived data.
+
+
+\section{Architecture}
+
+Figure~\ref{fig-architecture} shows the {\dsc} architecture.
+
+\begin{figure}
+\centerline{\psfig{figure=dsc-arch.eps,width=3.5in}}
+\caption{\label{fig-architecture}The {\dsc} architecture.}
+\end{figure}
+
+Note that {\dsc} utilizes the concept of {\em servers\/} and {\em
+nodes\/}. A server is generally a logical service, which may
+actually consist of multiple nodes. Figure~\ref{fig-architecture}
+shows six collectors (the circles) and two servers (the rounded
+rectangles). For a real-world example, consider a DNS root server.
+IP Anycast allows a DNS root server to have geographically distributed
+nodes that share a single IP address. We call each instance a
+{\em node\/} and all nodes sharing the single IP address belong
+to the same {\em server\/}.
+
+The {\dsc} collector program runs on or near\footnote{by
+``near'' we mean that packets may be sniffed remotely via Ethernet taps, switch
+port mirroring, or a SPAN port.} the remote nodes. Its XML output
+is transferred to the presentation machine via HTTPS PUTs (or something simpler
+if you prefer).
+
+The presentation machine includes an HTTP(S) server. The extractor looks
+for XML files PUT there by the collectors. A CGI script also runs on
+the HTTP server to display graphs and other information.
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\chapter{Installing the Presenter}
+
+You'll probably want to get the Presenter working before the Collector.
+If you're using the secure XML data transfer, you'll need to
+generate both client- and server-side X.509 certificates.
+
+Installing the Presenter involves the following steps:
+\begin{itemize}
+\setlength{\itemsep}{0ex plus 0.5ex minus 0.0ex}
+\item
+ Install Perl dependencies
+\item
+ Install {\dsc} software
+\item
+ Create X.509 certificates (optional)
+\item
+ Set up a secure HTTP server (e.g., Apache and mod\_ssl)
+\item
+ Add some cron jobs
+\end{itemize}
+
+
+\section{Install Perl Dependencies}
+
+{\dsc} uses Perl for the extractor and grapher components. You'll
+need Perl 5.6 or later (Perl 5.8 is recommended). You'll also need
+these readily available third-party Perl modules, which you
+can find via CPAN:
+
+\begin{itemize}
+\setlength{\itemsep}{0ex plus 0.5ex minus 0.0ex}
+ \item CGI-Untaint (CGI::Untaint)
+ \item CGI.pm (CGI)
+ \item Digest-MD5 (Digest::MD5)
+ \item File-Flock (File::Flock)
+ \item File-Spec (File::Spec)
+ \item File-Temp (File::Temp)
+ \item Geography-Countries (Geography::Countries)
+ \item Hash-Merge (Hash::Merge)
+ \item IP-Country (IP::Country)
+ \item MIME-Base64 (MIME::Base64)
+ \item Math-Calc-Units (Math::Calc::Units)
+ \item Scalar-List-Utils (List::Util)
+ \item Text-Template (Text::Template)
+ \item URI (URI::Escape)
+ \item XML-Simple (XML::Simple)
+ \item Net-DNS-Resolver (Net::DNS::Resolver)
+
+\end{itemize}
+
+\noindent
+Also note that XML::Simple requires XML::Parser, which in
+turn requires the {\em expat\/} package.
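+
+One convenient way to install a missing module is with the CPAN shell.
+For example, to install XML::Simple (repeat for each module listed
+above):
+
+\begin{MyVerbatim}
+% perl -MCPAN -e 'install "XML::Simple"'
+\end{MyVerbatim}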
+
+\section{Install Ploticus}
+
+{\dsc} uses Ploticus to generate plots and graphs. You can find
+this software at \verb|http://ploticus.sourceforge.net|. The {\em
+Download\/} page has links to some pre-compiled binaries and packages.
+FreeBSD and NetBSD users can find Ploticus in the ports/packages
+collection.
+
+
+\section{Install {\dsc} Software}
+
+All of the extractor and grapher tools are Perl or {\tt /bin/sh}
+scripts, so there is no need to compile anything. Still,
+you should run {\tt make} first:
+
+\begin{MyVerbatim}
+% cd presenter
+% make
+\end{MyVerbatim}
+
+If you see errors about missing Perl prerequisites, you may want
+to correct those before continuing.
+
+The next step is to install the files. Recall that
+\path|/usr/local/dsc| is the hard-coded installation prefix.
+You must create it manually:
+
+\begin{MyVerbatim}
+% mkdir /usr/local/dsc
+% make install
+\end{MyVerbatim}
+
+Note that {\dsc}'s Perl modules are installed in the
+``site\_perl'' directory. You'll probably need {\em root\/}
+privileges to install files there.
+
+\section{CGI Symbolic Links}
+
+{\dsc} has a couple of CGI scripts that are installed
+into \path|/usr/local/dsc/libexec|. You should add symbolic
+links from your HTTP server's \path|cgi-bin| directory to
+these scripts.
+
+Both of these scripts have been designed to be mod\_perl-friendly.
+
+\begin{MyVerbatim}
+% cd /usr/local/apache/cgi-bin
+% ln -s /usr/local/dsc/libexec/put-file.pl
+% ln -s /usr/local/dsc/libexec/dsc-grapher.pl
+\end{MyVerbatim}
+
+You can skip the \path|put-file.pl| link if you plan to use
+{\em rsync\/} to transfer XML files.
+If you cannot create symbolic links, you'll need to manually
+copy the scripts to the appropriate directory.
+
+
+\section{/usr/local/dsc/data}
+
+\subsection{X.509 method}
+
+This directory is where \path|put-file.pl| writes incoming XML
+files. It should have been created when you ran {\em make install\/} earlier.
+XML files are actually placed in {\em server\/} and {\em
+node\/} subdirectories based on the authorized client X.509 certificate
+parameters. If you want \path|put-file.pl| to automatically create
+the subdirectories, the \path|data| directory must be writable by
+the process owner:
+
+\begin{MyVerbatim}
+% chgrp nobody /usr/local/dsc/data/
+% chmod 2775 /usr/local/dsc/data/
+\end{MyVerbatim}
+
+Alternatively, you can create {\em server\/} and {\em node\/} directories
+in advance and make those writable.
+
+\begin{MyVerbatim}
+% mkdir /usr/local/dsc/data/x-root/
+% mkdir /usr/local/dsc/data/x-root/blah/
+% mkdir /usr/local/dsc/data/x-root/blah/incoming/
+% chgrp nobody /usr/local/dsc/data/x-root/blah/
+% chmod 2775 /usr/local/dsc/data/x-root/blah/incoming/
+\end{MyVerbatim}
+
+Make sure that \path|/usr/local/dsc/data/| is on a large partition with
+plenty of free space. You can make it a symbolic link to another
+partition if necessary. Note that a typical {\dsc} installation
+for a large DNS root server requires about 4GB to hold a year's worth
+of data.
+
+\subsection{rsync Method}
+
+The directory structure is the same as above (for X.509). The only
+differences are that:
+\begin{itemize}
+\item
+ The {\em server\/}, {\em node\/}, and {\em incoming\/}
+ directories must be made in advance.
+\item
+ The directories should be writable by the userid associated
+ with the {\em rsync}/{\em ssh\/} connection. You may want
+	to create a dedicated {\em dsc\/} userid for this, as shown
+	in the example below.
+\end{itemize}
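+
+For example, assuming a server named {\em x-root\/}, a node named
+{\em blah\/}, and a dedicated {\em dsc\/} userid (the names here are
+only illustrative):
+
+\begin{MyVerbatim}
+% mkdir -p /usr/local/dsc/data/x-root/blah/incoming/
+% chown -R dsc /usr/local/dsc/data/x-root/
+\end{MyVerbatim}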
+
+
+\section{/usr/local/dsc/var/log}
+
+The \path|put-file.pl| script logs its activity to
+\path|put-file.log| in this directory. It should have been
+created when you ran {\em make install\/} earlier. The directory
+should be writable by the HTTP server userid (usually {\em nobody\/}
+or {\em www\/}). Unfortunately the installation isn't fancy enough
+to determine that userid yet, so you must change the ownership manually:
+
+\begin{MyVerbatim}
+% chgrp nobody /usr/local/dsc/var/log/
+\end{MyVerbatim}
+
+Furthermore, you probably want to make sure the log file does not
+grow indefinitely. For example, on FreeBSD we add this line to \path|/etc/newsyslog.conf|:
+
+\begin{MyVerbatim}
+/usr/local/dsc/var/log/put-file.log nobody:wheel 644 10 * @T00 BN
+\end{MyVerbatim}
+
+You need not worry about this directory if you are using the
+{\em rsync\/} upload method.
+
+\section{/usr/local/dsc/cache}
+
+This directory, also created by {\em make install\/} above, holds cached
+plot images. It also must be writable by the HTTP userid:
+
+\begin{MyVerbatim}
+% chgrp nobody /usr/local/dsc/cache/
+\end{MyVerbatim}
+
+\section{Cron Jobs}
+
+{\dsc} requires two cron jobs on the Presenter. The first
+is the one that processes incoming XML files. It is called
+\path|refile-and-grok.sh|. We recommend running it every
+minute. You may also want to run the jobs at a lower priority
+with {\tt nice\/}. Here is the cron job that we use:
+
+\begin{MyVerbatim}
+* * * * * /usr/bin/nice -10 /usr/local/dsc/libexec/refile-and-grok.sh
+\end{MyVerbatim}
+
+The other useful cron script is \path|remove-xmls.pl|. It removes
+XML files older than a specified number of days. Since most of the
+information in the XML files is archived into easier-to-parse
+data files, you can remove the XML files after a few days. This is
+the job that we use:
+
+\begin{MyVerbatim}
+@midnight find /usr/local/dsc/data/ | /usr/local/dsc/libexec/remove-xmls.pl 7
+\end{MyVerbatim}
+
+\section{Data URIs}
+
+{\dsc} uses ``Data URIs'' by default. This is a URI where the
+content is base-64 encoded into the URI string. It allows us
+to include images directly in HTML output, such that the browser
+does not have to make additional HTTP requests for the images.
+Data URIs may not work with some browsers.
+
+To disable Data URIs, edit {\em presenter/perllib/DSC/grapher.pm\/}
+and change this line:
+
+\begin{verbatim}
+ $use_data_uri = 1;
+\end{verbatim}
+
+to
+
+\begin{verbatim}
+ $use_data_uri = 0;
+\end{verbatim}
+
+Also make this symbolic link from your HTTP server's ``htdocs'' directory:
+
+\begin{verbatim}
+# cd htdocs
+# ln -s /usr/local/dsc/share/html dsc
+\end{verbatim}
+
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\chapter{Configuring the {\dsc} Presenter}
+
+This chapter describes how to create X.509 certificates and configure
+Apache/mod\_ssl. If you plan on using a different upload
+technique (such as scp or rsync) you can skip these instructions.
+
+\section{Generating X.509 Certificates}
+
+We use X.509 certificates to authenticate both sides
+of an SSL connection when uploading XML data files from
+the collector to the presenter.
+
+Certificate generation is a tricky thing. We use three different
+types of certificates:
+\begin{enumerate}
+\item A self-signed root CA certificate
+\item A server certificate
+\item Client certificates for each collector node
+\end{enumerate}
+
+In the client certificates
+we use X.509 fields to store the collector's server and node name.
+The Organizational Unit Name (OU) becomes the server name and
+the Common Name (CN) becomes the node name.
+
+The {\dsc} source code distribution includes some shell scripts
+that we have
+used to create X.509 certificates. You can find them in the
+\path|presenter/certs| directory. Note these are not installed
+into \path|/usr/local/dsc|. You should edit \path|openssl.conf|
+and enter the relevant information for your organization.
+
+\subsection{Certificate Authority}
+
+You may need to create a self-signed certificate authority if you
+don't already have one. The CA signs client and server certificates.
+You will need to distribute the CA and client certificates to
+collector sites. Here is how to use our \path|create-ca-cert.sh|
+script:
+
+\begin{MyVerbatim}
+% sh create-ca-cert.sh
+CREATING CA CERT
+Generating a 2048 bit RSA private key
+..............................................................................
+............+++
+......+++
+writing new private key to './private/cakey.pem'
+Enter PEM pass phrase:
+Verifying - Enter PEM pass phrase:
+-----
+\end{MyVerbatim}
+
+
+\subsection{Server Certificate}
+
+The server certificate is used by the HTTP server (Apache/mod\_ssl).
+The clients will have a copy of the CA certificate so they
+can validate the server's certificate when uploading XML files.
+Use the \path|create-srv-cert.sh| script to create a server
+certificate:
+
+\begin{MyVerbatim}
+% sh create-srv-cert.sh
+CREATING SERVER REQUEST
+Generating a 1024 bit RSA private key
+..........................++++++
+.....................................++++++
+writing new private key to 'server/server.key'
+Enter PEM pass phrase:
+Verifying - Enter PEM pass phrase:
+-----
+You are about to be asked to enter information that will be incorporated
+into your certificate request.
+What you are about to enter is what is called a Distinguished Name or a DN.
+There are quite a few fields but you can leave some blank
+For some fields there will be a default value,
+If you enter '.', the field will be left blank.
+-----
+Country Name (2 letter code) [AU]:US
+State or Province Name (full name) [Some-State]:Colorado
+Locality Name (eg, city) []:Boulder
+Organization Name (eg, company) [Internet Widgits Pty Ltd]:The Measurement Factory, Inc
+Organizational Unit Name (eg, section) []:DNS
+Common Name (eg, YOUR name) []:dns.measurement-factory.com
+Email Address []:wessels@measurement-factory.com
+
+Please enter the following 'extra' attributes
+to be sent with your certificate request
+A challenge password []:
+An optional company name []:
+Enter pass phrase for server/server.key:
+writing RSA key
+CREATING SERVER CERT
+Using configuration from ./openssl.conf
+Enter pass phrase for ./private/cakey.pem:
+Check that the request matches the signature
+Signature ok
+The Subject's Distinguished Name is as follows
+countryName :PRINTABLE:'US'
+stateOrProvinceName :PRINTABLE:'Colorado'
+localityName :PRINTABLE:'Boulder'
+organizationName :PRINTABLE:'The Measurement Factory, Inc'
+organizationalUnitName:PRINTABLE:'DNS'
+commonName :PRINTABLE:'dns.measurement-factory.com'
+emailAddress :IA5STRING:'wessels@measurement-factory.com'
+Certificate is to be certified until Jun 3 20:06:17 2013 GMT (3000 days)
+Sign the certificate? [y/n]:y
+
+
+1 out of 1 certificate requests certified, commit? [y/n]y
+Write out database with 1 new entries
+Data Base Updated
+\end{MyVerbatim}
+
+Note that the Common Name must match the hostname of the HTTP
+server that receives XML files.
+
+Note that the \path|create-srv-cert.sh| script rewrites the
+server key file without the RSA password. This allows your
+HTTP server to start automatically without prompting for
+the password.
+
+The script leaves the server certificate and key in the \path|server|
+directory. You'll need to copy these over to the HTTP server config
+directory as described later in this chapter.
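+
+For example, using the paths that appear in the Apache configuration
+shown later in this chapter (adjust them to your server's layout):
+
+\begin{MyVerbatim}
+% cp server/server.crt server/server.key /httpd/conf/SSL/server/
+% cp cacert.pem /httpd/conf/SSL/
+\end{MyVerbatim}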
+
+\section{Client Certificates}
+
+Generating client certificates is similar. Remember that
+the Organizational Unit Name and Common Name correspond to the
+collector's {\em server\/} and {\em node\/} names. For example:
+
+\begin{MyVerbatim}
+% sh create-clt-cert.sh
+CREATING CLIENT REQUEST
+Generating a 1024 bit RSA private key
+................................++++++
+..............++++++
+writing new private key to 'client/client.key'
+Enter PEM pass phrase:
+Verifying - Enter PEM pass phrase:
+-----
+You are about to be asked to enter information that will be incorporated
+into your certificate request.
+What you are about to enter is what is called a Distinguished Name or a DN.
+There are quite a few fields but you can leave some blank
+For some fields there will be a default value,
+If you enter '.', the field will be left blank.
+-----
+Country Name (2 letter code) [AU]:US
+State or Province Name (full name) [Some-State]:California
+Locality Name (eg, city) []:Los Angeles
+Organization Name (eg, company) [Internet Widgits Pty Ltd]:Some DNS Server
+Organizational Unit Name (eg, section) []:x-root
+Common Name (eg, YOUR name) []:LAX
+Email Address []:noc@example.com
+
+Please enter the following 'extra' attributes
+to be sent with your certificate request
+A challenge password []:
+An optional company name []:
+CREATING CLIENT CERT
+Using configuration from ./openssl.conf
+Enter pass phrase for ./private/cakey.pem:
+Check that the request matches the signature
+Signature ok
+The Subject's Distinguished Name is as follows
+countryName :PRINTABLE:'US'
+stateOrProvinceName :PRINTABLE:'California'
+localityName :PRINTABLE:'Los Angeles'
+organizationName :PRINTABLE:'Some DNS Server'
+organizationalUnitName:PRINTABLE:'x-root '
+commonName :PRINTABLE:'LAX'
+emailAddress :IA5STRING:'noc@example.com'
+Certificate is to be certified until Jun 3 20:17:24 2013 GMT (3000 days)
+Sign the certificate? [y/n]:y
+
+
+1 out of 1 certificate requests certified, commit? [y/n]y
+Write out database with 1 new entries
+Data Base Updated
+Enter pass phrase for client/client.key:
+writing RSA key
+writing RSA key
+\end{MyVerbatim}
+
+The client's key and certificate will be placed in a directory
+based on the server and node names. For example:
+
+\begin{MyVerbatim}
+% ls -l client/x-root/LAX
+total 10
+-rw-r--r-- 1 wessels wessels 3311 Mar 17 13:17 client.crt
+-rw-r--r-- 1 wessels wessels 712 Mar 17 13:17 client.csr
+-r-------- 1 wessels wessels 887 Mar 17 13:17 client.key
+-rw-r--r-- 1 wessels wessels 1953 Mar 17 13:17 client.pem
+\end{MyVerbatim}
+
+The \path|client.pem| (and \path|cacert.pem|) files should be copied
+to the collector machine.
+
+\section{Apache Configuration}
+
+\noindent
+You need to configure Apache for SSL. Here is what our configuration
+looks like:
+
+\begin{MyVerbatim}
+SSLRandomSeed startup builtin
+SSLRandomSeed startup file:/dev/random
+SSLRandomSeed startup file:/dev/urandom 1024
+SSLRandomSeed connect builtin
+SSLRandomSeed connect file:/dev/random
+SSLRandomSeed connect file:/dev/urandom 1024
+
+<VirtualHost _default_:443>
+DocumentRoot "/httpd/htdocs-ssl"
+SSLEngine on
+SSLCertificateFile /httpd/conf/SSL/server/server.crt
+SSLCertificateKeyFile /httpd/conf/SSL/server/server.key
+SSLCertificateChainFile /httpd/conf/SSL/cacert.pem
+
+# For client-validation
+SSLCACertificateFile /httpd/conf/SSL/cacert.pem
+SSLVerifyClient require
+
+SSLOptions +CompatEnvVars
+Script PUT /cgi-bin/put-file.pl
+</VirtualHost>
+\end{MyVerbatim}
+
+\noindent
+Note the last line of the configuration specifies the CGI script
+that accepts PUT requests. The {\em SSLOptions\/}
+line is necessary so that the CGI script receives certain HTTP
+headers as environment variables. Those headers/variables convey
+the X.509 information to the script so it knows where to store
+received XML files.
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\chapter{Collector Installation}
+
+
+A collector machine needs only the {\em dsc\/} binary, a configuration
+file, and a couple of cron job scripts.
+
+At this point, {\dsc} lacks certain niceties such as a \path|./configure|
+script. The installation prefix, \path|/usr/local/dsc| is currently
+hard-coded.
+
+
+\section{Prerequisites}
+
+You'll need a C/C++ compiler to compile the {\tt dsc\/} source code.
+
+If the collector and archiver are different systems, you'll need a
+way to transfer data files. We recommend that you use the {\tt
+curl\/} HTTP/SSL client. You may use another technique, such as {\tt
+scp\/} or {\tt rsync\/}, if you prefer.
+
+\section{Installation}
+
+You can compile {\tt dsc\/} from the {\tt collector\/} directory:
+
+\begin{MyVerbatim}
+% cd collector
+% make
+\end{MyVerbatim}
+
+Assuming there are no errors or problems during compilation, install
+the {\tt dsc\/} binary and other scripts with:
+
+\begin{MyVerbatim}
+% make install
+\end{MyVerbatim}
+
+This installs five files:
+\begin{Verbatim}
+/usr/local/dsc/bin/dsc
+/usr/local/dsc/etc/dsc.conf.sample
+/usr/local/dsc/libexec/upload-prep.pl
+/usr/local/dsc/libexec/upload-rsync.sh
+/usr/local/dsc/libexec/upload-x509.sh
+\end{Verbatim}
+
+Of course, if you don't want to use the default installation
+prefix, you can manually copy these files to a location
+of your choosing. If you do that, you'll also need to
+edit the cron scripts to match your choice of pathnames, etc.
+
+\section{Uploading XML Files}
+\label{sec-install-collector-cron}
+
+This section describes how XML files are transferred from
+the collector to one or more Presenter systems.
+
+As we'll see in the next chapter, each {\tt dsc} process
+has its own {\em run directory\/}. This is the directory
+where {\tt dsc} leaves its XML files. It usually has a
+name like \path|/usr/local/dsc/run/NODENAME|\@. XML files
+are removed after they are successfully transferred. If the
+Presenter is unreachable, XML files accumulate here until
+they can be transferred. Make sure that you have
+enough disk space to queue a lot of XML files in the
+event of an outage.
+
+In general we want to be able to upload XML files to multiple
+presenters. This is the reason behind the {\tt upload-prep.pl}
+script. This script runs every 60 seconds from cron:
+
+\begin{MyVerbatim}
+* * * * * /usr/local/dsc/libexec/upload-prep.pl
+\end{MyVerbatim}
+
+{\tt upload-prep.pl} looks for \path|dsc.conf| files in
+\path|/usr/local/dsc/etc| by default. For each config file
+found, it cd's to the {\em run\_dir\/} and links\footnote{as in
+``hard link'' made with \path|/bin/ln|.}
+XML files to one or more upload directories. The upload directories
+are named \path|upload/dest1|, \path|upload/dest2|, and so on.
+
+In order for all this to work, you must create the directories
+in advance. For example, if you are collecting stats on
+your nameserver named {\em ns0\/}, and want to send the XML files
+to two presenters (named oarc and archive), the directory structure
+might look like:
+
+\begin{MyVerbatim}
+% set prefix=/usr/local/dsc
+% mkdir $prefix/run
+% mkdir $prefix/run/ns0
+% mkdir $prefix/run/ns0/upload
+% mkdir $prefix/run/ns0/upload/oarc
+% mkdir $prefix/run/ns0/upload/archive
+\end{MyVerbatim}
+
+With that directory structure, the {\tt upload-prep.pl} script moves
+XML files from the \path|ns0| directory to the two
+upload directories, \path|oarc| and \path|archive|.
+
+To actually transfer files to the presenter, use either
+\path|upload-x509.sh| or \path|upload-rsync.sh|.
+
+\subsection{upload-x509.sh}
+
+This cron script is responsible for
+actually transferring XML files from the upload directories
+to the remote server. It creates a {\em tar\/} archive
+of XML files and then uploads it to the remote server with
+{\tt curl}. The script takes three commandline arguments:
+
+\begin{MyVerbatim}
+% upload-x509.sh NODE DEST URI
+\end{MyVerbatim}
+
+{\em NODE\/} must match the name of a directory under
+\path|/usr/local/dsc/run|. Similarly, {\em DEST\/} must match the
+name of a directory under \path|/usr/local/dsc/run/NODE/upload|.
+{\em URI\/} is the URL/URI that the data is uploaded to. Usually
+it is just an HTTPS URL with the name of the destination server.
+We also recommend running this from cron every 60 seconds. For
+example:
+
+\begin{MyVerbatim}
+* * * * * /usr/local/dsc/libexec/upload-x509.sh ns0 oarc \
+ https://collect.oarc.isc.org/
+* * * * * /usr/local/dsc/libexec/upload-x509.sh ns0 archive \
+ https://archive.example.com/
+\end{MyVerbatim}
+
+\path|upload-x509.sh| looks for X.509 certificates in
+\path|/usr/local/dsc/certs|. The client certificate should be named
+\path|/usr/local/dsc/certs/DEST/NODE.pem| and the CA certificate
+should be named
+\path|/usr/local/dsc/certs/DEST/cacert.pem|. Note that {\em DEST\/}
+and {\em NODE\/} must match the \path|upload-x509.sh|
+command line arguments.
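+
+For example, with the {\em ns0\/} and {\em oarc\/} names used above,
+you would place the \path|client.pem| and \path|cacert.pem| files
+created on the presenter like this:
+
+\begin{MyVerbatim}
+% mkdir -p /usr/local/dsc/certs/oarc
+% cp client.pem /usr/local/dsc/certs/oarc/ns0.pem
+% cp cacert.pem /usr/local/dsc/certs/oarc/cacert.pem
+\end{MyVerbatim}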
+
+\subsection{upload-rsync.sh}
+
+This script can be used to transfer XML files from the upload
+directories to the remote server. It uses {\em rsync\/} and
+assumes that {\em rsync\/} will use {\em ssh\/} for transport.
+This script also takes three arguments:
+
+\begin{MyVerbatim}
+% upload-rsync.sh NODE DEST RSYNC-DEST
+\end{MyVerbatim}
+
+Note that {\em DEST\/} is the name of the local ``upload'' directory
+and {\em RSYNC-DEST\/} is an {\em rsync\/} destination (i.e., hostname and remote directory).
+Here is how you might use it in a crontab:
+
+\begin{MyVerbatim}
+* * * * * /usr/local/dsc/libexec/upload-rsync.sh ns0 oarc \
+ dsc@collect.oarc.isc.org:/usr/local/dsc/data/Server/ns0
+* * * * * /usr/local/dsc/libexec/upload-rsync.sh ns0 archive \
+ dsc@archive.oarc.isc.org:/usr/local/dsc/data/Server/ns0
+\end{MyVerbatim}
+
+Also note that \path|upload-rsync.sh| will actually store the remote
+XML files in \path|incoming/YYYY-MM-DD| subdirectories. That is,
+if your {\em RSYNC-DEST\/} is \path|host:/usr/local/dsc/data/Server/ns0|
+then files will actually be written to
+\path|/usr/local/dsc/data/Server/ns0/incoming/YYYY-MM-DD| on {\em host},
+where \path|YYYY-MM-DD| is replaced by the year, month, and date of the
+XML files. These subdirectories reduce filesystem pressure in the event
+of backlogs.
+
+{\em rsync\/} over {\em ssh\/} requires you to use RSA or DSA public keys
+that do not have a passphrase. If you do not want to use one of
+{\em ssh\/}'s default identity files, you can create one specifically
+for this script. It should be named \path|dsc_uploader_id| (and
+\path|dsc_uploader_id.pub|) in the \$HOME/.ssh directory of the user
+that will be running the script. For example, you can create it
+with this command:
+
+\begin{MyVerbatim}
+% ssh-keygen -t dsa -C dsc-uploader -f $HOME/.ssh/dsc_uploader_id
+\end{MyVerbatim}
+
+Then add \path|dsc_uploader_id.pub| to the \path|authorized_keys|
+file of the receiving userid on the presenter system.
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\chapter{Configuring and Running the {\dsc} Collector}
+
+\section{dsc.conf}
+
+Before running {\tt dsc\/} you need to create a configuration file.
+Note that configuration directive lines are terminated with a semi-colon.
+The configuration file currently understands the following directives:
+
+\begin{description}
+
+\item[local\_address]
+
+ Specifies the DNS server's local IP address. It is used
+ to determine the ``direction'' of an IP packet: sending,
+ receiving, or other. You may specify multiple local addresses
+ by repeating the {\em local\_address} line any number of times.
+
+ Example: {\tt local\_address 172.16.0.1;\/}
+ Example: {\tt local\_address 2001:4f8:0:2::13;\/}
+
+\item[run\_dir]
+
+ A directory that should become {\tt dsc\/}'s current directory
+ after it starts. XML files will be written here, as will
+ any core dumps.
+
+ Example: {\tt run\_dir "/var/run/dsc";\/}
+
+\item[minfree\_bytes]
+
+ If the filesystem where {\tt dsc\/} writes its XML files
+ does not have at least this much free space, then
+ {\tt dsc\/} will not write the XML files. This prevents
+ {\tt dsc\/} from filling up the filesystem. The XML
+ files that would have been written are simply lost and
+	cannot be recovered. {\tt dsc\/} will begin writing
+ XML files again when the filesystem has the necessary
+ free space.
+
+\item[bpf\_program]
+
+ A Berkeley Packet Filter program string. Normally you
+ should leave this unset. You may use this to further
+ restrict the traffic seen by {\tt dsc\/}. Note that {\tt
+ dsc\/} currently has one indexer that looks at all IP
+ packets. If you specify something like {\em udp port 53\/}
+ that indexer will not work.
+
+ However, if you want to monitor multiple DNS servers with
+ separate {\dsc} instances on one collector box, then you
+ may need to use {\em bpf\_program} to make sure that each
+ {\tt dsc} process sees only the traffic it should see.
+
+ Note that this directive must go before the {\em interface\/}
+ directive because {\tt dsc\/} makes only one pass through
+ the configuration file and the BPF filter is set when the
+ interface is initialized.
+
+ Example: {\tt bpf\_program "dst host 192.168.1.1";\/}
+
+\item[interface]
+
+ The interface name to sniff packets from or a pcap file to
+ read packets from. You may specify multiple interfaces.
+
+ Example:
+ {\tt interface fxp0;\/}
+ {\tt interface /path/to/dump.pcap;\/}
+
+\item[bpf\_vlan\_tag\_byte\_order]
+
+ {\tt dsc\/} knows about VLAN tags. Some operating systems
+ (FreeBSD-4.x) have a bug whereby the VLAN tag id is
+ byte-swapped. Valid values for this directive are {\tt
+ host\/} and {\tt net\/} (the default). Set this to {\tt
+ host\/} if you suspect your operating system has the VLAN
+ tag byte order bug.
+
+ Example: {\tt bpf\_vlan\_tag\_byte\_order host;\/}
+
+\item[match\_vlan]
+
+ A list of VLAN identifiers (integers). If set, only the
+ packets belonging to these VLANs are counted.
+
+ Example: {\tt match\_vlan 101 102;\/}
+
+\item[qname\_filter]
+
+ This directive allows you to define custom filters
+ to match query names in DNS messages. Please see
+ Section~\ref{sec-qname-filter} for more information.
+
+\item[dataset]
+
+ This directive is the heart of {\dsc}. However, it is also
+ the most complex.
+ To save time we recommend that you copy interesting-looking
+ dataset definitions from \path|dsc.conf.sample|. Comment
+ out any that you feel are irrelevant or uninteresting.
+ Later, as you become more familiar with {\dsc}, you may
+ want to read the next chapter and add your own custom
+ datasets.
+
+\item[output\_format]
+
+	Specifies the output format. This directive may be given multiple
+	times to output in more than one format. The default output
+	format is XML.
+
+	Available formats are:
+	\begin{itemize}
+	\setlength{\itemsep}{0ex plus 0.5ex minus 0.0ex}
+	\item XML
+	\item JSON
+	\end{itemize}
+
+	Example: {\tt output\_format JSON;\/}
+\end{description}
+
+
+\section{A Complete Sample dsc.conf}
+
+Here's how your entire {\em dsc.conf\/} file might look:
+
+\begin{MyVerbatim}
+#bpf_program
+interface em0;
+
+local_address 192.5.5.241;
+
+run_dir "/usr/local/dsc/run/foo";
+
+dataset qtype dns All:null Qtype:qtype queries-only;
+dataset rcode dns All:null Rcode:rcode replies-only;
+dataset opcode dns All:null Opcode:opcode queries-only;
+dataset rcode_vs_replylen dns Rcode:rcode ReplyLen:msglen replies-only;
+dataset client_subnet dns All:null ClientSubnet:client_subnet queries-only
+ max-cells=200;
+dataset qtype_vs_qnamelen dns Qtype:qtype QnameLen:qnamelen queries-only;
+dataset qtype_vs_tld dns Qtype:qtype TLD:tld queries-only,popular-qtypes
+ max-cells=200;
+dataset certain_qnames_vs_qtype dns CertainQnames:certain_qnames
+ Qtype:qtype queries-only;
+dataset client_subnet2 dns Class:query_classification
+ ClientSubnet:client_subnet queries-only max-cells=200;
+dataset client_addr_vs_rcode dns Rcode:rcode ClientAddr:client
+ replies-only max-cells=50;
+dataset chaos_types_and_names dns Qtype:qtype Qname:qname
+ chaos-class,queries-only;
+dataset idn_qname dns All:null IDNQname:idn_qname queries-only;
+dataset edns_version dns All:null EDNSVersion:edns_version queries-only;
+dataset do_bit dns All:null D0:do_bit queries-only;
+dataset rd_bit dns All:null RD:rd_bit queries-only;
+dataset tc_bit dns All:null TC:tc_bit replies-only;
+dataset idn_vs_tld dns All:null TLD:tld queries-only,idn-only;
+dataset ipv6_rsn_abusers dns All:null ClientAddr:client
+    queries-only,aaaa-or-a6-only,root-servers-net-only max-cells=50;
+dataset transport_vs_qtype dns Transport:transport Qtype:qtype queries-only;
+
+dataset direction_vs_ipproto ip Direction:ip_direction IPProto:ip_proto
+ any;
+\end{MyVerbatim}
+
+\section{Running {\tt dsc}}
+
+{\tt dsc\/} accepts a single command line argument, which is
+the name of the configuration file. For example:
+
+\begin{MyVerbatim}
+% cd /usr/local/dsc
+% bin/dsc etc/foo.conf
+\end{MyVerbatim}
+
+If you run {\tt ps} when {\tt dsc} is running, you'll see two processes:
+
+\begin{MyVerbatim}
+60494 ?? S 0:00.36 bin/dsc etc/foo.conf
+69453 ?? Ss 0:10.65 bin/dsc etc/foo.conf
+\end{MyVerbatim}
+
+The first process simply forks off child processes every
+60 seconds. The child processes do the work of analyzing
+and tabulating DNS messages.
+
+Please use NTP or another technique to keep the collector's
+clock synchronized to the correct time.
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\chapter{Viewing {\dsc} Graphs}
+
+To view {\dsc} data in a web browser, simply enter the
+URL to the \path|dsc-grapher.pl| CGI. But before you
+do that, you'll need to create a grapher configuration file.
+
+\path|dsc-grapher.pl| uses a simple configuration file to set certain
+menu options. This configuration file is
+\path|/usr/local/dsc/etc/dsc-grapher.cfg|. You should find
+a sample version in the same directory. For example:
+
+\begin{MyVerbatim}
+server f-root pao1 sfo2
+server isc senna+piquet
+server tmf hq sc lgh
+trace_windows 1hour 4hour 1day 1week 1month
+accum_windows 1day 2days 3days 1week
+timezone Asia/Tokyo
+domain_list isc_tlds br nl ca cz il pt cl
+domain_list isc_tlds sk ph hr ae bg is si za
+valid_domains isc isc_tlds
+
+\end{MyVerbatim}
+
+\begin{figure}
+\centerline{\psfig{figure=screenshot1.eps,width=6.5in}}
+\caption{\label{fig-screenshot1}A sample graph}
+\end{figure}
+
+Refer to Figure~\ref{fig-screenshot1} to see how
+the directives affect the visual display.
+The following three directives should always be set in
+the configuration file:
+
+\begin{description}
+\item[server]
+ This directive tells \path|dsc-grapher.pl| to list
+ the given server and its associated nodes in the
+ ``Servers/Nodes'' section of its navigation menu.
+ You can repeat this directive for each server that
+ the Presenter has.
+\item[trace\_windows]
+ Specifies the ``Time Scale'' menu options for
+ trace-based plots.
+\item[accum\_windows]
+ Specifies the ``Time Scale'' menu options for
+ ``cumulative'' plots, such as the Classification plot.
+\end{description}
+
+Note that the \path|dsc-grapher.cfg| only affects what
+may appear in the navigation window. It does NOT prevent users
+from entering other values in the URL parameters. For example,
+if you have data for a server/node in your
+\path|/usr/local/dsc/data/| directory that is not listed in
+\path|dsc-grapher.cfg|, a user may still be able to view that
+data by manually setting the URL query parameters.
+
+The configuration file accepts a number of optional directives
+as well. You may set these if you like, but they are not
+required:
+
+\begin{description}
+\item[timezone]
+ Sets the time zone for dates and times displayed in the
+ graphs.
+ You can use this if you want to override the system
+ time zone.
+ The value for this directive should be the name
+ of a timezone entry in your system database (usually found
+	in \path|/usr/share/zoneinfo|).
+ For example, if your system time zone is set
+ to UTC but you want the times displayed for the
+ London timezone, you can set this directive to
+ {\tt Europe/London\/}.
+\item[domain\_list]
+	This directive, along with {\em valid\_domains\/}, tells the
+ presenter which domains a nameserver is authoritative for.
+ That information is used in the TLDs subgraphs to differentiate
+ requests for ``valid'' and ``invalid'' domains.
+
+ The {\em domain\_list\/} creates a named list of domains.
+ The first token is a name for the list, and the remaining
+ tokens are domain names. The directive may be repeated with
+ the same list name, as shown in the above example.
+\item[valid\_domains]
+ This directive glues servers and domain\_lists together. The
+ first token is the name of a {\em server\/} and the second token is
+ the name of a {\em domain\_list\/}.
+\item[embargo]
+ The {\em embargo\/} directive may be used to delay the
+ availability of data via the presenter. For example, you
+ may have one instance of {\em dsc-grapher.pl\/} for internal
+ use only (password protected, etc). You may also have a
+ second instance for third-parties where data is delayed by
+ some amount of time, such as hours, days, or weeks. The value
+ of the {\em embargo\/} directive is the number of seconds which
+ data availability should be delayed. For example, if you set
+ it to 604800, then viewers will not be able to see any data
+ less than one week old.
+\item[anonymize\_ip]
+ When the {\em anonymize\_ip\/} directive is given, IP addresses
+ in the display will be anonymized. The anonymization algorithm
+ is currently hard-coded and designed only for IPv4 addresses.
+ It masks off the lower 24 bits and leaves only the first octet
+ in place.
+\item[hide\_nodes]
+ When the {\em hide\_nodes\/} directive is given, the presenter
+	will not display the list of node names underneath the current
+ server. This might be useful if you have a number of nodes
+ but only want viewers to see the server as a whole, without
+ exposing the particular nodes in the cluster. Note, however,
+ that if someone already knows the name of a node they can
+ hand-craft query terms in the URL to display the data for
+	only that node. In other words, the {\em hide\_nodes\/} directive
+	only provides ``security through obscurity.''
+\end{description}
+
+
+The first few times you try \path|dsc-grapher.pl|, be sure to run
+{\tt tail -f} on the HTTP server error.log file.
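+
+For example, assuming a stock Apache layout (your log location may
+differ):
+
+\begin{MyVerbatim}
+% tail -f /usr/local/apache/logs/error_log
+\end{MyVerbatim}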
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\chapter{{\dsc} Datasets}
+
+A {\em dataset\/} is a 2-D array of counters. For example, you
+might have a dataset with ``Query Type'' along one dimension and
+``Query Name Length'' on the other. The result is a table that
+shows the distribution of query name lengths for each query type.
+For example:
+
+\vspace{1ex}
+\begin{center}
+\begin{tabular}{l|rrrrrr}
+Len & A & AAAA & A6 & PTR & NS & SOA \\
+\hline
+$\cdots$ & & & & & \\
+11 & 14 & 8 & 7 & 11 & 2 & 0 \\
+12 & 19 & 2 & 3 & 19 & 4 & 1 \\
+$\cdots$ & & & & & & \\
+255 & 0 & 0 & 0 & 0 & 0 & 0 \\
+\hline
+\end{tabular}
+\end{center}
+\vspace{1ex}
+
+\noindent
+A dataset is defined by the following parameters:
+\begin{itemize}
+\setlength{\itemsep}{0ex plus 0.5ex minus 0.0ex}
+\item A name
+\item A protocol layer (IP or DNS)
+\item An indexer for the first dimension
+\item An indexer for the second dimension
+\item One or more filters
+\item Zero or more options and parameters
+\end{itemize}
+
+\noindent
+The {\em dataset\/} definition syntax in \path|dsc.conf| is:
+
+{\tt dataset\/}
+{\em name\/}
+{\em protocol\/}
+{\em Label1:Indexer1\/}
+{\em Label2:Indexer2\/}
+{\em filter\/}
+{\em [parameters]\/};
+\vspace{2ex}
+
+\section{Dataset Name}
+
+The dataset name is used in the filename for {\tt dsc\/}'s XML
+files. Although this is an opaque string in theory, the Presenter's
+XML extractor routines must recognize the dataset name to properly
+parse it. The source code file
+\path|presenter/perllib/DSC/extractor/config.pm| contains an entry
+for each known dataset name.
+
+\section{Protocol}
+
+{\dsc} currently knows about two protocol layers: IP and DNS.
+On the {\tt dataset\/} line they are written as {\tt ip\/} and {\tt dns\/}.
+
+
+\section{Indexers}
+
+An {\em indexer\/} is simply a function that transforms the attributes
+of an IP/DNS message into an array index. For some attributes the
+transformation is straightforward. For example, the ``Query Type''
+indexer simply extracts the query type value from a DNS message and
+uses this 16-bit value as the array index.
+
+Other attributes are slightly more complicated. For example, the
+``TLD'' indexer extracts the TLD of the QNAME field of a DNS message
+and maps it to an integer. The indexer maintains a simple internal
+table of TLD-to-integer mappings. The actual integer values are
+unimportant because the TLD strings, not the integers, appear in
+the resulting XML data.
+
+When you specify an indexer on a {\tt dataset\/} line, you must
+provide both the name of the indexer and a label. The Label appears
+as an attribute in the XML output. For example,
+Figure~\ref{fig-sample-xml} shows the XML corresponding to this
+{\em dataset\/} line:
+
+\begin{MyVerbatim}
+dataset the_dataset dns Foo:foo Bar:bar queries-only;
+\end{MyVerbatim}
+
+\begin{figure}
+\begin{MyVerbatim}
+<array name="the_dataset" dimensions="2" start_time="1091663940" ...
+ <dimension number="1" type="Foo"/>
+ <dimension number="2" type="Bar"/>
+ <data>
+ <Foo val="1">
+ <Bar val="0" count="4"/>
+ ...
+ <Bar val="100" count="41"/>
+ </Foo>
+ <Foo val="2">
+ ...
+ </Foo>
+ </data>
+</array>
+\end{MyVerbatim}
+\caption{\label{fig-sample-xml}Sample XML output}
+\end{figure}
+
+In theory you are free to choose any label that you like, however,
+the XML extractors look for specific labels. Please use the labels
+given for the indexers in Tables~\ref{tbl-dns-indexers}
+and~\ref{tbl-ip-indexers}.
+
+\subsection{IP Indexers}
+
+\begin{table}
+\begin{center}
+\begin{tabular}{|lll|}
+\hline
+Indexer & Label & Description \\
+\hline
+ip\_direction & Direction & one of sent, recv, or other \\
+ip\_proto & IPProto & IP protocol (icmp, tcp, udp) \\
+ip\_version & - & IP version number (4, 6) \\
+\hline
+\end{tabular}
+\caption{\label{tbl-ip-indexers}IP packet indexers}
+\end{center}
+\end{table}
+
+{\dsc} includes only minimal support for collecting IP-layer
+stats. Mostly we are interested in finding out the mix of
+IP protocols received by the DNS server. It can also show us
+if/when the DNS server is the subject of denial-of-service
+attack.
+Table~\ref{tbl-ip-indexers} shows the indexers for IP packets.
+Here are their longer descriptions:
+
+\begin{description}
+\item[ip\_direction]
+ One of three values: sent, recv, or else. Direction is determined
+ based on the setting for {\em local\_address\/} in the configuration file.
+\item[ip\_proto]
+ The IP protocol type, e.g.: tcp, udp, icmp.
+ Note that the {\em bpf\_program\/} setting affects all traffic
+ seen by {\dsc}. If the program contains the word ``udp''
+ then you won't see any counts for non-UDP traffic.
+\item[ip\_version]
+ The IP version number, e.g.: 4 or 6. Can be used to compare how much
+	traffic comes in via IPv6 compared to IPv4.
+\end{description}
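+
+For example, the following dataset line from the sample configuration
+tabulates all packets by direction and IP protocol:
+
+\begin{MyVerbatim}
+dataset direction_vs_ipproto ip Direction:ip_direction IPProto:ip_proto
+    any;
+\end{MyVerbatim}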
+
+\subsection{IP Filters}
+
+Currently there is only one IP protocol filter: {\tt any\/}.
+It includes all received packets.
+
+
+\subsection{DNS Indexers}
+
+\begin{table}
+\begin{center}
+\begin{tabular}{|lll|}
+\hline
+Indexer & Label & Description \\
+\hline
+certain\_qnames & CertainQnames & Popular query names seen at roots \\
+client\_subnet & ClientSubnet & The client's IP subnet (/24 for IPv4, /96 for IPv6) \\
+client & ClientAddr & The client's IP address \\
+do\_bit & DO & Whether the DO bit is on \\
+edns\_version & EDNSVersion & The EDNS version number \\
+idn\_qname & IDNQname & If the QNAME is in IDN format \\
+msglen & MsgLen & The DNS message length \\
+null & All & A ``no-op'' indexer \\
+opcode & Opcode & DNS message opcode \\
+qclass & - & Query class \\
+qname & Qname & Full query name \\
+qnamelen & QnameLen & Length of the query name \\
+qtype & Qtype & DNS query type \\
+query\_classification & Class & A classification for bogus queries \\
+rcode & Rcode & DNS response code \\
+rd\_bit & RD & Check if Recursion Desired bit set \\
+tc\_bit & TC & Check if Truncated bit set \\
+tld & TLD & TLD of the query name \\
+transport & Transport & Transport protocol for the DNS message (UDP or TCP) \\
+dns\_ip\_version & IPVersion & IP version of the packet carrying the DNS message \\
+\hline
+\end{tabular}
+\caption{\label{tbl-dns-indexers}DNS message indexers}
+\end{center}
+\end{table}
+
+Table~\ref{tbl-dns-indexers} shows the currently-defined indexers
+for DNS messages, and here are their descriptions:
+
+\begin{description}
+\item[certain\_qnames]
+ This indexer isolates the two most popular query names seen
+ by DNS root servers: {\em localhost\/} and {\em
+ [a--m].root-servers.net\/}.
+\item[client\_subnet]
+ Groups DNS messages together by the subnet of the
+	client's IP address. The subnet is masked by /24 for IPv4
+ and by /96 for IPv6. We use this to make datasets with
+ large, diverse client populations more manageable and to
+ provide a small amount of privacy and anonymization.
+\item[client]
+ The IP (v4 and v6) address of the DNS client.
+\item[do\_bit]
+ This indexer has only two values: 0 or 1. It indicates
+ whether or not the ``DO'' bit is set in a DNS query. According to
+	RFC 3225: {\em Setting the DO bit to one in a query indicates
+ to the server that the resolver is able to accept DNSSEC
+ security RRs.}
+\item[edns\_version]
+ The EDNS version number, if any, in a DNS query. EDNS
+ Version 0 is documented in RFC 2671.
+\item[idn\_qname]
+ This indexer has only two values: 0 or 1. It returns 1
+ when the first QNAME in the DNS message question section
+ is an internationalized domain name (i.e., containing
+ non-ASCII characters). Such QNAMEs begin with the string
+ {\tt xn--\/}. This convention is documented in RFC 3490.
+\item[msglen]
+ The overall length (size) of the DNS message.
+\item[null]
+ A ``no-op'' indexer that always returns the same value.
+ This can be used to effectively turn the 2-D table into a
+ 1-D array.
+\item[opcode]
+ The DNS message opcode is a four-bit field. QUERY is the
+ most common opcode. Additional currently defined opcodes
+ include: IQUERY, STATUS, NOTIFY, and UPDATE.
+\item[qclass]
+ The DNS message query class (QCLASS) is a 16-bit value. IN
+ is the most common query class. Additional currently defined
+ query class values include: CHAOS, HS, NONE, and ANY.
+\item[qname]
+ The full QNAME string from the first (and usually only)
+ QNAME in the question section of a DNS message.
+\item[qnamelen]
+ The length of the first (and usually only) QNAME in a DNS
+ message question section. Note this is the ``expanded''
+ length if the message happens to take advantage of DNS
+ message ``compression.''
+\item[qtype]
+ The query type (QTYPE) for the first QNAME in the DNS message
+ question section. Well-known query types include: A, AAAA,
+ A6, CNAME, PTR, MX, NS, SOA, and ANY.
+\item[query\_classification]
+ A stateless classification of ``bogus'' queries:
+ \begin{itemize}
+ \setlength{\itemsep}{0ex plus 0.5ex minus 0.0ex}
+ \item non-auth-tld: when the TLD is not one of the IANA-approved TLDs.
+ \item root-servers.net: a query for a root server IP address.
+ \item localhost: a query for the localhost IP address.
+ \item a-for-root: an A query for the DNS root (.).
+ \item a-for-a: an A query for an IPv4 address.
+ \item rfc1918-ptr: a PTR query for an RFC 1918 address.
+ \item funny-class: a query with an unknown/undefined query class.
+ \item funny-qtype: a query with an unknown/undefined query type.
+ \item src-port-zero: when the UDP message's source port equals zero.
+ \item malformed: a malformed DNS message that could not be entirely parsed.
+ \end{itemize}
+\item[rcode]
+ The RCODE value in a DNS response. The most common response
+ codes are 0 (NO ERROR) and 3 (NXDOMAIN).
+\item[rd\_bit]
+ This indexer returns 1 if the RD (recursion desired) bit is
+ set in the query. Usually only stub resolvers set the RD bit.
+ Usually authoritative servers do not offer recursion to their
+ clients.
+\item[tc\_bit]
+ This indexer returns 1 if the TC (truncated) bit is
+ set (in a response). An authoritative server sets the TC bit
+ when the entire response won't fit into a UDP message.
+\item[tld]
+ the TLD of the first QNAME in a DNS message's question section.
+\item[transport]
+ Indicates whether the DNS message is carried via UDP or TCP\@.
+\item[dns\_ip\_version]
+ The IP version number that carried the DNS message.
+\end{description}
+
+\subsection{DNS Filters}
+
+You must specify one or more of the following filters (separated by commas) on
+the {\tt dataset\/} line:
+
+\begin{description}
+\item[any]
+ The no-op filter, counts all messages.
+\item[queries-only]
+ Count only DNS query messages. A query is a DNS message
+ where the QR bit is set to 0.
+\item[replies-only]
+	Count only DNS response messages. A response is a DNS message
+	where the QR bit is set to 1.
+\item[popular-qtypes]
+ Count only DNS messages where the query type is one of:
+ A, NS, CNAME, SOA, PTR, MX, AAAA, A6, ANY.
+\item[idn-only]
+ Count only DNS messages where the query name is in the
+ internationalized domain name format.
+\item[aaaa-or-a6-only]
+ Count only DNS Messages where the query type is AAAA or A6.
+\item[root-servers-net-only]
+ Count only DNS messages where the query name is within
+ the {\em root-servers.net\/} domain.
+\item[chaos-class]
+ Counts only DNS messages where QCLASS is equal to
+ CHAOS (3). The CHAOS class is generally used
+ for only the special {\em hostname.bind\/} and
+ {\em version.bind\/} queries.
+\end{description}
+
+\noindent
+Note that multiple filters are ANDed together. That is, they
+narrow the input stream, rather than broaden it.
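+
+For example, this dataset line from the sample configuration combines
+the {\tt chaos-class\/} and {\tt queries-only\/} filters, so it counts
+only CHAOS-class query messages:
+
+\begin{MyVerbatim}
+dataset chaos_types_and_names dns Qtype:qtype Qname:qname
+    chaos-class,queries-only;
+\end{MyVerbatim}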
+
+In addition to these pre-defined filters, you can add your own
+custom filters.
+
+\subsubsection{qname\_filter}
+\label{sec-qname-filter}
+
+The {\em qname\_filter} directive defines a new
+filter that uses regular expression matching on the QNAME field of
+a DNS message. This may be useful if you have a server that is
+authoritative for a number of zones, but you want to limit
+your measurements to a small subset. The {\em qname\_filter} directive
+takes two arguments: a name for the filter and a regular expression.
+For example:
+
+\begin{MyVerbatim}
+qname_filter MyFilterName example\.(com|net|org)$ ;
+\end{MyVerbatim}
+
+This filter matches queries (and responses) for names ending with
+{\em example.com\/}, {\em example.net\/}, and {\em example.org\/}.
+You can reference the named filter in the filters part of a {\em
+dataset\/} line. For example:
+
+\begin{MyVerbatim}
+dataset qtype dns All:null Qtype:qtype queries-only,MyFilterName;
+\end{MyVerbatim}
+
+\subsection{Parameters}
+\label{sec-dataset-params}
+
+\noindent
+{\tt dsc\/} currently supports the following optional parameters:
+
+\begin{description}
+\item[min-count={\em NN\/}]
+ Cells with counts less than {\em NN\/} are not included in
+ the output. Instead, they are aggregated into the special
+ values {\tt -:SKIPPED:-\/} and {\tt -:SKIPPED\_SUM:-\/}.
+ This helps reduce the size of datasets with a large number
+ of small counts.
+\item[max-cells={\em NN\/}]
+ A different, perhaps better, way of limiting the size
+ of a dataset. Instead of trying to determine an appropriate
+ {\em min-count\/} value in advance, {\em max-cells\/}
+ allows you put a limit on the number of cells to
+ include for the second dataset dimension. If the dataset
+ has 9 possible first-dimension values, and you specify
+ a {\em max-cell\/} count of 100, then the dataset will not
+ have more than 900 total values. The cell values are sorted
+ and the top {\em max-cell\/} values are output. Values
+ that fall below the limit are aggregated into the special
+	{\tt -:SKIPPED:-\/} and {\tt -:SKIPPED\_SUM:-\/} entries.
+	See the example following this list.
+\end{description}
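+
+For example, this dataset line from the sample configuration keeps at
+most 50 client-address cells per rcode; the remaining client addresses
+are folded into the {\tt -:SKIPPED:-\/} and {\tt -:SKIPPED\_SUM:-\/}
+entries:
+
+\begin{MyVerbatim}
+dataset client_addr_vs_rcode dns Rcode:rcode ClientAddr:client
+    replies-only max-cells=50;
+\end{MyVerbatim}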
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\chapter{Data Storage}
+
+\section{XML Structure}
+
+A dataset XML file has the following structure:
+
+\begin{MyVerbatim}
+<array name="dataset-name" dimensions="2" start_time="unix-seconds"
+ stop_time="unix-seconds">
+ <dimension number="1" type="Label1"/>
+ <dimension number="2" type="Label2"/>
+ <data>
+ <Label1 val="D1-V1">
+ <Label2 val="D2-V1" count="N1"/>
+ <Label2 val="D2-V2" count="N2"/>
+ <Label2 val="D2-V3" count="N3"/>
+ </Label1>
+ <Label1 val="D1-V2">
+ <Label2 val="D2-V1" count="N1"/>
+ <Label2 val="D2-V2" count="N2"/>
+ <Label2 val="D2-V3" count="N3"/>
+ </Label1>
+ </data>
+</array>
+\end{MyVerbatim}
+
+\noindent
+{\em dataset-name\/},
+{\em Label1\/}, and
+{\em Label2\/} come from the dataset definition in {\em dsc.conf\/}.
+
+The {\em start\_time\/} and {\em stop\_time\/} attributes
+are given in Unix seconds. They are normally 60 seconds apart.
+{\tt dsc} usually starts a new measurement interval on 60-second
+boundaries. That is:
+
+\begin{equation}
+stop\_time \bmod 60 = 0
+\end{equation}
+
+The LABEL1 VAL attributes ({\em D1-V1\/}, {\em D1-V2\/}, etc.) are
+values for the first dimension indexer.
+Similarly, the LABEL2 VAL attributes ({\em D2-V1\/}, {\em D2-V2\/},
+{\em D2-V3\/}) are values for the second dimension indexer.
+For some indexers these
+values are numeric, for others they are strings. If the value
+contains certain non-printable characters, the string is base64-encoded
+and the optional BASE64 attribute is set to 1.
+
+There are two special VALs that help keep large datasets down
+to a reasonable size: {\tt -:SKIPPED:-\/} and {\tt -:SKIPPED\_SUM:-\/}.
+These may be present on datasets that use the {\em min-count\/}
+and {\em max-cells\/} parameters (see Section~\ref{sec-dataset-params}).
+{\tt -:SKIPPED:-\/} is the number of cells that were not included
+in the XML output. {\tt -:SKIPPED\_SUM:-\/}, on the other hand, is the
+sum of the counts for all the skipped cells.
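+
+\noindent
+For example, a dataset limited with {\em max-cells\/} might contain
+second-dimension entries like these (the counts are made up for
+illustration):
+
+\begin{MyVerbatim}
+<Qtype val="-:SKIPPED:-" count="42"/>
+<Qtype val="-:SKIPPED_SUM:-" count="1371"/>
+\end{MyVerbatim}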
+
+Note that ``one-dimensional datasets'' still use two dimensions in
+the XML file. The first dimension type and value will be ``All'',
+as shown in the example below.
+
+The {\em count\/} values are always integers. If the count for
+a particular tuple is zero, that tuple is not included in the
+XML file.
+
+Note that the contents of the XML file do not indicate
+where it came from. In particular, the server and node that
+produced it are not recorded. Instead, {\dsc} relies on the
+presenter to store XML files in a directory hierarchy
+with the server and node as directory names.
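+
+\noindent
+For example, a presenter might keep the files in a layout like the
+following, where the server and node names (and the path prefix) are
+hypothetical:
+
+\begin{MyVerbatim}
+/usr/local/dsc/data/my-server/node1/1154649660.dscdata.xml
+/usr/local/dsc/data/my-server/node2/1154649660.dscdata.xml
+\end{MyVerbatim}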
+
+
+\noindent
+Here is a short sample XML file with real content:
+\begin{MyVerbatim}
+<array name="rcode" dimensions="2" start_time="1154649600"
+ stop_time="1154649660">
+ <dimension number="1" type="All"/>
+ <dimension number="2" type="Rcode"/>
+ <data>
+ <All val="ALL">
+ <Rcode val="0" count="70945"/>
+ <Rcode val="3" count="50586"/>
+ <Rcode val="4" count="121"/>
+ <Rcode val="1" count="56"/>
+ <Rcode val="5" count="44"/>
+ </All>
+ </data>
+</array>
+\end{MyVerbatim}
+
+\noindent
+Please see
+\path|http://dns.measurement-factory.com/tools/dsc/sample-xml/|
+for more sample XML files.
+
+The XML is not very strict and might cause XML purists to cringe.
+{\tt dsc} writes the XML files the old-fashioned way (with {\tt printf()})
+and reads them with Perl's {\tt XML::Simple} module.
+Here is a possibly-valid DTD for the dataset XML format.
+Note, however, that the {\em LABEL1\/}
+and {\em LABEL2\/} strings are different
+for each dataset:
+
+\begin{MyVerbatim}
+<!DOCTYPE ARRAY [
+
+<!ELEMENT ARRAY (DIMENSION+, DATA)>
+<!ELEMENT DIMENSION EMPTY>
+<!ELEMENT DATA (LABEL1+)>
+<!ELEMENT LABEL1 (LABEL2+)>
+<!ELEMENT LABEL2 EMPTY>
+
+<!ATTLIST ARRAY NAME CDATA #REQUIRED>
+<!ATTLIST ARRAY DIMENSIONS CDATA #REQUIRED>
+<!ATTLIST ARRAY START_TIME CDATA #REQUIRED>
+<!ATTLIST ARRAY STOP_TIME CDATA #REQUIRED>
+<!ATTLIST DIMENSION NUMBER CDATA #REQUIRED>
+<!ATTLIST DIMENSION TYPE CDATA #REQUIRED>
+<!ATTLIST LABEL1 VAL CDATA #REQUIRED>
+<!ATTLIST LABEL2 VAL CDATA #REQUIRED>
+<!ATTLIST LABEL2 COUNT CDATA #REQUIRED>
+
+]>
+\end{MyVerbatim}
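+
+\noindent
+Since the presenter reads these files with Perl's {\tt XML::Simple},
+a minimal sketch of reading a dataset file might look like the
+following. This is not the actual extractor code; the option choices
+and the data-structure walk are assumptions based on the structure
+shown above:
+
+\begin{MyVerbatim}
+#!/usr/bin/perl
+use strict;
+use warnings;
+use XML::Simple;
+
+# Read the dataset XML; keep nested elements as arrays and do not
+# fold them into hashes keyed by an attribute.
+my $file = shift or die "usage: $0 file.dscdata.xml\n";
+my $x = XMLin($file, ForceArray => 1, KeyAttr => []);
+
+print "dataset $x->{name}: $x->{start_time} .. $x->{stop_time}\n";
+
+# Dimension labels (e.g. "All" and "Rcode") name the nested elements.
+my ($l1, $l2) = map { $_->{type} } @{ $x->{dimension} };
+
+foreach my $d1 (@{ $x->{data}[0]{$l1} }) {
+    foreach my $d2 (@{ $d1->{$l2} }) {
+        print "$d1->{val}\t$d2->{val}\t$d2->{count}\n";
+    }
+}
+\end{MyVerbatim}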
+
+\subsection{XML File Naming Conventions}
+
+{\tt dsc\/} relies on certain file naming conventions for XML files.
+The file name should be of the form:
+
+\begin{quote}
+{\em timestamp\/}.dscdata.xml
+\end{quote}
+
+\noindent
+For example:
+
+\begin{quote}
+1154649660.dscdata.xml
+\end{quote}
+
+NOTE: Versions of DSC prior to 2008-01-30 used a different naming
+convention. Instead of ``dscdata'', the XML file was named after
+the dataset that generated the data. The current XML extraction
+code still supports the older naming convention for backward compatibility.
+If the second component of the XML file name is not ``dscdata'' then
+the extractor assumes it is a dataset name.
+
+\noindent
+Dataset names come from {\em dsc.conf\/}, and should match the NAME
+attribute of the ARRAY tag inside the XML file. The timestamp is in
+Unix epoch seconds and is usually the same as the {\em stop\_time\/}
+value.
+
+
+\section{JSON Structure}
+
+The JSON structure mirrors the XML structure, so the same elements
+are present.
+
+\begin{MyVerbatim}
+{
+ "name": "dataset-name",
+ "start_time": unix-seconds,
+ "stop_time": unix-seconds,
+ "dimensions": [ "Label1", "Label2" ],
+ "data": [
+ {
+ "Label1": "D1-V1",
+ "Label2": [
+ { "val": "D2-V1", "count": N1 },
+ { "val": "D2-V2", "count": N2 },
+ { "val": "D2-V3", "count": N3 }
+ ]
+ },
+ {
+ "Label1": "D1-V1-base64",
+ "base64": true,
+ "Label2": [
+ { "val": "D2-V1", "count": N1 },
+ { "val": "D2-V2-base64", "base64": true, "count": N2 },
+ { "val": "D2-V3", "count": N3 }
+ ]
+ }
+ ]
+}
+\end{MyVerbatim}
+
+
+\section{Archived Data Format}
+
+{\dsc} actually uses four different file formats for archived
+datasets. These are all text-based and designed to be quickly
+read from, and written to, by Perl scripts.
+
+\subsection{Format 1}
+
+\noindent
+\begin{tt}time $k1$ $N_{k1}$ $k2$ $N_{k2}$ $k3$ $N_{k3}$ ...
+\end{tt}
+
+\vspace{1ex}\noindent
+This is a one-dimensional time-series format.\footnote{Which means
+it can only be used for datasets where one of the indexers is set
+to the Null indexer.} The first column is a timestamp (unix seconds).
+The remaining space-separated fields are key-value pairs. For
+example:
+
+\begin{MyVerbatim}
+1093219980 root-servers.net 122 rfc1918-ptr 112 a-for-a 926 funny-qclass 16
+1093220040 root-servers.net 121 rfc1918-ptr 104 a-for-a 905 funny-qclass 15
+1093220100 root-servers.net 137 rfc1918-ptr 116 a-for-a 871 funny-qclass 12
+\end{MyVerbatim}
+
+\subsection{Format 2}
+
+\noindent
+\begin{tt}time $j1$ $k1$:$N_{j1,k1}$:$k2$:$N_{j1,k2}$:... $j2$ $k1$:$N_{j2,k1}$:$k2$:$N_{j2,k2}$:... ...
+\end{tt}
+
+\vspace{1ex}\noindent
+This is a two-dimensional time-series format. In the above,
+$j$ represents the first dimension indexer and $k$ represents
+the second. Key-value pairs for the second dimension are
+separated by colons, rather than space. For example:
+
+\begin{MyVerbatim}
+1093220160 recv icmp:2397:udp:136712:tcp:428 sent icmp:819:udp:119191:tcp:323
+1093220220 recv icmp:2229:udp:124708:tcp:495 sent icmp:716:udp:107652:tcp:350
+1093220280 recv udp:138212:icmp:2342:tcp:499 sent udp:120788:icmp:819:tcp:364
+1093220340 recv icmp:2285:udp:137107:tcp:468 sent icmp:733:udp:118522:tcp:341
+\end{MyVerbatim}
+
+\subsection{Format 3}
+
+\noindent
+\begin{tt}$k$ $N_{k}$
+\end{tt}
+
+\vspace{1ex}\noindent
+This format is used for one-dimensional datasets where the key space
+is (potentially) very large. That is, putting all the key-value pairs
+on a single line would result in a very long line in the datafile.
+Furthermore, for these larger datasets, it would be prohibitive to
+store the data as a time series. Instead, the counters are incremented
+over time. For example:
+
+\begin{MyVerbatim}
+10.0.160.0 3024
+10.0.20.0 92
+10.0.244.0 5934
+\end{MyVerbatim}
+
+\subsection{Format 4}
+
+\noindent
+\begin{tt}$j$ $k$ $N_{j,k}$
+\end{tt}
+
+\vspace{1ex}\noindent
+This format is used for two-dimensional datasets where one or both
+key spaces are very large. Again, counters are incremented over
+time, rather than storing the data as a time series.
+For example:
+
+\begin{MyVerbatim}
+10.0.0.0 non-auth-tld 105
+10.0.0.0 ok 37383
+10.0.0.0 rfc1918-ptr 5941
+10.0.0.0 root-servers.net 1872
+10.0.1.0 a-for-a 6
+10.0.1.0 non-auth-tld 363
+10.0.1.0 ok 144
+\end{MyVerbatim}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\chapter{Bugs}
+
+\begin{itemize}
+
+\item
+	It is confusing to have an opaque name for indexers in the
+	dsc.conf dataset line. The names are pre-determined anyway,
+	since they must match what the XML extractors look for.
+\item
+	It is also redundant to have indexer names and a separate ``Label''
+	for the XML file.
+
+\item
+	{\dsc} perl modules are installed in the ``site\_perl'' directory,
+	but they should probably be installed under /usr/local/dsc.
+
+\item
+	The {\dsc} collector silently drops UDP fragments.
+
+\end{itemize}
+
+\end{document}