diff --git a/doc/dsc-manual.tex b/doc/dsc-manual.tex new file mode 100644 index 0000000..501d34a --- /dev/null +++ b/doc/dsc-manual.tex @@ -0,0 +1,1863 @@ +\documentclass{report} +\usepackage{epsfig} +\usepackage{path} +\usepackage{fancyvrb} + +\def\dsc{{\sc dsc}} + +\DefineVerbatimEnvironment% + {MyVerbatim}{Verbatim} + {frame=lines,framerule=0.8mm,fontsize=\small} + +\renewcommand{\abstractname}{} + +\begin{document} + +\begin{titlepage} +\title{DSC Manual} +\author{Duane Wessels, Measurement Factory\\ +Ken Keys, CAIDA\\ +\\ +http://dns.measurement-factory.com/tools/dsc/} +\date{\today} +\end{titlepage} + +\maketitle + +\begin{abstract} +\setlength{\parskip}{1ex} +\section{Copyright} + +The DNS Statistics Collector (dsc) + +Copyright 2003-2007 by The Measurement Factory, Inc., 2007-2008 by Internet +Systems Consortium, Inc., 2008-2019 by OARC, Inc. + +{\em info@measurement-factory.com\/}, {\em info@isc.org\/} + +\section{License} + +{\dsc} is licensed under the terms of the BSD license: + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions +are met: + +Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. +Redistributions in binary form must reproduce the above copyright +notice, this list of conditions and the following disclaimer in the +documentation and/or other materials provided with the distribution. +Neither the name of The Measurement Factory nor the names of its +contributors may be used to endorse or promote products derived +from this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS +FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE +COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, +INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, +BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN +ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. + +\section{Contributors} +\begin{itemize} +\item Duane Wessels, Measurement Factory +\item Ken Keys, Cooperative Association for Internet Data Analysis +\item Sebastian Castro, New Zealand Registry Services +\end{itemize} +\end{abstract} + + +\tableofcontents + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\chapter{Introduction} + +{\dsc} is a system for collecting and presenting statistics from +a busy DNS server. + +\section{Components} + +{\dsc} consists of the following components: +\begin{itemize} +\item A data collector +\item A data presenter, where data is archived and rendered +\item A method for securely transferring data from the collector + to the presenter +\item Utilities and scripts that parse XML and archive files from the collector +\item Utilities and scripts that generate graphs and HTML pages +\end{itemize} + +\subsection{The Collector} + +The collector is a binary program, named {\tt dsc\/}, which snoops +on DNS messages. It is written in C and uses {\em libpcap\/} for +packet capture. 
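+
+As a minimal, illustrative sketch (not code from the {\dsc} sources),
+this is roughly how a libpcap-based collector opens an interface and
+installs a packet filter; the interface name {\tt em0\/} and the
+filter string are examples only:
+
+\begin{MyVerbatim}
+/* illustrative sketch only -- not the dsc source */
+#include <pcap.h>
+#include <stdio.h>
+
+int main(void)
+{
+    char errbuf[PCAP_ERRBUF_SIZE];
+    struct bpf_program fp;
+    pcap_t *p = pcap_open_live("em0", 65535, 1, 1000, errbuf);
+    if (NULL == p) {
+        fprintf(stderr, "pcap_open_live: %s\n", errbuf);
+        return 1;
+    }
+    /* dsc looks at all IP packets by default; "udp port 53" is
+     * merely an example of a BPF filter string */
+    if (pcap_compile(p, &fp, "udp port 53", 1, 0) < 0 ||
+        pcap_setfilter(p, &fp) < 0) {
+        fprintf(stderr, "filter: %s\n", pcap_geterr(p));
+        return 1;
+    }
+    /* a real collector would now call pcap_dispatch() and hand
+     * each captured DNS message to its dataset counters */
+    pcap_close(p);
+    return 0;
+}
+\end{MyVerbatim}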
+ +{\tt dsc\/} uses a relatively simple configuration file called {\em +dsc.conf\/} to define certain parameters and options. The configuration +file also determines the {\em datasets\/} that {\tt dsc\/} collects. + +A Dataset is a 2-D array of counters of IP/DNS message properties. +You can define each dimension of the array independently. For +example you might define a dataset categorized by DNS query type +along one dimension and TLD along the other. +{\tt dsc\/} dumps the datasets from memory to XML files every 60 seconds. + +\subsection{XML Data Transfer} + +You may run the {\dsc} collector on a remote machine. That +is, the collector may run on a different machine than where the +data is archived and displayed. {\dsc} includes some Perl and {\tt /bin/sh} +scripts to move XML files from collector to presenter. One +technique uses X.509 certificates and a secure HTTP server. The other +uses {\em rsync\/}, presumably over {\em ssh\/}. + +\subsubsection{X.509/SSL} + +To make this work, Apache/mod\_ssl should run on the machine where data +is archived and presented. +Data transfer is authenticated via SSL X.509 certificates. A Perl +CGI script handles all PUT requests on the server. If the client +certificate is allowed, XML files are stored in the appropriate +directory. + +A shell script runs on the collector to upload the XML files. It +uses {\tt curl\/}\footnote{http://curl.haxx.se} to establish an +HTTPS connection. XML files are bundled together with {\tt tar\/} +before transfer to eliminate per-connection delays. +You could use {\tt scp\/} or {\tt rsync\/} instead of +{\tt curl\/} if you like. + +\path|put-file.pl| is the script that accepts PUT requests on the +HTTP server. The HTTP server validates the client's X.509 certificate. +If the certificate is invalid, the PUT request is denied. This +script reads environment variables to get X.509 parameters. The +uploaded-data is stored in a directory based on the X.509 Organizational +Unit (server) and Common Name fields (node). + +\subsubsection{rsync/ssh} + +This technique uses the {\em rsync\/} utility to transfer files. +You'll probably want to use {\em ssh\/} as the underlying transport, +although you can still use the less-secure {\em rsh\/} or native +rsync server transports if you like. + +If you use {\em ssh\/} then you'll need to create passphrase-less +SSH keys so that the transfer can occur automatically. You may +want to create special {\em dsc\/} userids on both ends as well. + +\subsection{The Extractor} + +The XML extractor is a Perl script that reads the XML files from +{\tt dsc\/}. The extractor essentially converts the XML-structured +data to a format that is easier (faster) for the graphing tools to +parse. Currently the extracted data files are line-based ASCII +text files. Support for SQL databases is planned for the future. + +\subsection{The Grapher} + +{\dsc} uses {\em Ploticus\/}\footnote{http://ploticus.sourceforge.net/} +as the graphing engine. A Perl module and CGI script read extracted +data files and generate Ploticus scriptfiles to generate plots. Plots +are always generated on demand via the CGI application. + +\path|dsc-grapher.pl| is the script that displays graphs from the +archived data. + + +\section{Architecture} + +Figure~\ref{fig-architecture} shows the {\dsc} architecture. 
+ +\begin{figure} +\centerline{\psfig{figure=dsc-arch.eps,width=3.5in}} +\caption{\label{fig-architecture}The {\dsc} architecture.} +\end{figure} + +Note that {\dsc} utilizes the concept of {\em servers\/} and {\em +nodes\/}. A server is generally a logical service, which may +actually consist of multiple nodes. Figure~\ref{fig-architecture} +shows six collectors (the circles) and two servers (the rounded +rectangles). For a real-world example, consider a DNS root server. +IP Anycast allows a DNS root server to have geographically distributed +nodes that share a single IP address. We call each instance a +{\em node\/} and all nodes sharing the single IP address belong +to the same {\em server\/}. + +The {\dsc} collector program runs on or near\footnote{by +``near'' we mean that packets may be sniffed remotely via Ethernet taps, switch +port mirroring, or a SPAN port.} the remote nodes. Its XML output +is transferred to the presentation machine via HTTPS PUTs (or something simpler +if you prefer). + +The presentation machine includes an HTTP(S) server. The extractor looks +for XML files PUT there by the collectors. A CGI script also runs on +the HTTP server to display graphs and other information. + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\chapter{Installing the Presenter} + +You'll probably want to get the Presenter working before the Collector. +If you're using the secure XML data transfer, you'll need to +generate both client- and server-side X.509 certificates. + +Installing the Presenter involves the following steps: +\begin{itemize} +\setlength{\itemsep}{0ex plus 0.5ex minus 0.0ex} +\item + Install Perl dependencies +\item + Install {\dsc} software +\item + Create X.509 certificates (optional) +\item + Set up a secure HTTP server (e.g., Apache and mod\_ssl) +\item + Add some cron jobs +\end{itemize} + + +\section{Install Perl Dependencies} + +{\dsc} uses Perl for the extractor and grapher components. Chances are +that you'll need Perl-5.8, or maybe only Perl-5.6. You'll also need +these readily available third-party Perl modules, which you +can find via CPAN: + +\begin{itemize} +\setlength{\itemsep}{0ex plus 0.5ex minus 0.0ex} + \item CGI-Untaint (CGI::Untaint) + \item CGI.pm (CGI) + \item Digest-MD5 (Digest::MD5) + \item File-Flock (File::Flock) + \item File-Spec (File::Spec) + \item File-Temp (File::Temp) + \item Geography-Countries (Geography::Countries) + \item Hash-Merge (Hash::Merge) + \item IP-Country (IP::Country) + \item MIME-Base64 (MIME::Base64) + \item Math-Calc-Units (Math::Calc::Units) + \item Scalar-List-Utils (List::Util) + \item Text-Template (Text::Template) + \item URI (URI::Escape) + \item XML-Simple (XML::Simple) + \item Net-DNS-Resolver (Net::DNS::Resolver) + +\end{itemize} + +\noindent +Also note that XML::Simple requires XML::Parser, which in +turn requires the {\em expat\/} package. + +\section{Install Ploticus} + +{\dsc} uses Ploticus to generate plots and graphs. You can find +this software at \verb|http://ploticus.sourceforge.net|. The {\em +Download\/} page has links to some pre-compiled binaries and packages. +FreeBSD and NetBSD users can find Ploticus in the ports/packages +collection. + + +\section{Install {\dsc} Software} + +All of the extractor and grapher tools are Perl or {\tt /bin/sh} +scripts, so there is no need to compile anything. 
Still, +you should run {\tt make} first: + +\begin{MyVerbatim} +% cd presenter +% make +\end{MyVerbatim} + +If you see errors about missing Perl prerequisites, you may want +to correct those before continuing. + +The next step is to install the files. Recall that +\path|/usr/local/dsc| is the hard-coded installation prefix. +You must create it manually: + +\begin{MyVerbatim} +% mkdir /usr/local/dsc +% make install +\end{MyVerbatim} + +Note that {\dsc}'s Perl modules are installed in the +``site\_perl'' directory. You'll probably need {\em root\/} +privileges to install files there. + +\section{CGI Symbolic Links} + +{\dsc} has a couple of CGI scripts that are installed +into \path|/usr/local/dsc/libexec|. You should add symbolic +links from your HTTP server's \path|cgi-bin| directory to +these scripts. + +Both of these scripts have been designed to be mod\_perl-friendly. + +\begin{MyVerbatim} +% cd /usr/local/apache/cgi-bin +% ln -s /usr/local/dsc/libexec/put-file.pl +% ln -s /usr/local/dsc/libexec/dsc-grapher.pl +\end{MyVerbatim} + +You can skip the \path|put-file.pl| link if you plan to use +{\em rsync\/} to transfer XML files. +If you cannot create symbolic links, you'll need to manually +copy the scripts to the appropriate directory. + + +\section{/usr/local/dsc/data} + +\subsection{X.509 method} + +This directory is where \path|put-file.pl| writes incoming XML +files. It should have been created when you ran {\em make install\/} earlier. +XML files are actually placed in {\em server\/} and {\em +node\/} subdirectories based on the authorized client X.509 certificate +parameters. If you want \path|put-file.pl| to automatically create +the subdirectories, the \path|data| directory must be writable by +the process owner: + +\begin{MyVerbatim} +% chgrp nobody /usr/local/dsc/data/ +% chmod 2775 /usr/local/dsc/data/ +\end{MyVerbatim} + +Alternatively, you can create {\em server\/} and {\em node\/} directories +in advance and make those writable. + +\begin{MyVerbatim} +% mkdir /usr/local/dsc/data/x-root/ +% mkdir /usr/local/dsc/data/x-root/blah/ +% mkdir /usr/local/dsc/data/x-root/blah/incoming/ +% chgrp nobody /usr/local/dsc/data/x-root/blah/ +% chmod 2775 /usr/local/dsc/data/x-root/blah/incoming/ +\end{MyVerbatim} + +Make sure that \path|/usr/local/dsc/data/| is on a large partition with +plenty of free space. You can make it a symbolic link to another +partition if necessary. Note that a typical {\dsc} installation +for a large DNS root server requires about 4GB to hold a year's worth +of data. + +\subsection{rsync Method} + +The directory structure is the same as above (for X.509). The only +differences are that: +\begin{itemize} +\item + The {\em server\/}, {\em node\/}, and {\em incoming\/} + directories must be made in advance. +\item + The directories should be writable by the userid associated + with the {\em rsync}/{\em ssh\/} connection. You may want + to create a dedicated {\em dsc\/} userid for this. +\end{itemize} + + +\section{/usr/local/dsc/var/log} + +The \path|put-file.pl| script logs its activity to +\path|put-file.log| in this directory. It should have been +created when you ran {\em make install\/} earlier. The directory +should be writable by the HTTP server userid (usually {\em nobody\/} +or {\em www\/}). 
Unfortunately the installation isn't fancy enough +to determine that userid yet, so you must change the ownership manually: + +\begin{MyVerbatim} +% chgrp nobody /usr/local/dsc/var/log/ +\end{MyVerbatim} + +Furthermore, you probably want to make sure the log file does not +grow indefinitely. For example, on FreeBSD we add this line to \path|/etc/newsyslog.conf|: + +\begin{MyVerbatim} +/usr/local/dsc/var/log/put-file.log nobody:wheel 644 10 * @T00 BN +\end{MyVerbatim} + +You need not worry about this directory if you are using the +{\em rsync\/} upload method. + +\section{/usr/local/dsc/cache} + +This directory, also created by {\em make install\/} above, holds cached +plot images. It also must be writable by the HTTP userid: + +\begin{MyVerbatim} +% chgrp nobody /usr/local/dsc/cache/ +\end{MyVerbatim} + +\section{Cron Jobs} + +{\dsc} requires two cron jobs on the Presenter. The first +is the one that processes incoming XML files. It is called +\path|refile-and-grok.sh|. We recommend running it every +minute. You also may want to run the jobs at a lowerer priority +with {\tt nice\/}. Here is the cron job that we use: + +\begin{MyVerbatim} +* * * * * /usr/bin/nice -10 /usr/local/dsc/libexec/refile-and-grok.sh +\end{MyVerbatim} + +The other useful cron script is \path|remove-xmls.pl|. It removes +XML files older than a specified number of days. Since most of the +information in the XML files is archived into easier-to-parse +data files, you can remove the XML files after a few days. This is +the job that we use: + +\begin{MyVerbatim} +@midnight find /usr/local/dsc/data/ | /usr/local/dsc/libexec/remove-xmls.pl 7 +\end{MyVerbatim} + +\section{Data URIs} + +{\dsc} uses ``Data URIs'' by default. This is a URI where the +content is base-64 encoded into the URI string. It allows us +to include images directly in HTML output, such that the browser +does not have to make additional HTTP requests for the images. +Data URIs may not work with some browsers. + +To disable Data URIs, edit {\em presenter/perllib/DSC/grapher.pm\/} +and change this line: + +\begin{verbatim} + $use_data_uri = 1; +\end{verbatim} + +to + +\begin{verbatim} + $use_data_uri = 0; +\end{verbatim} + +Also make this symbolic link from your HTTP servers ``htdocs'' directory: + +\begin{verbatim} +# cd htdocs +# ln -s /usr/local/dsc/share/html dsc +\end{verbatim} + + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\chapter{Configuring the {\dsc} Presenter} + +This chapter describes how to create X.509 certificates and configure +Apache/mod\_ssl. If you plan on using a different upload +technique (such as scp or rsync) you can skip these instructions. + +\section{Generating X.509 Certificates} + +We use X.509 certificates to authenticate both sides +of an SSL connection when uploading XML data files from +the collector to the presenter. + +Certificate generation is a tricky thing. We use three different +types of certificates: +\begin{enumerate} +\item A self-signed root CA certificate +\item A server certificate +\item Client certificates for each collector node +\end{enumerate} + +In the client certificates +we use X.509 fields to store the collector's server and node name. +The Organizational Unit Name (OU) becomes the server name and +the Common Name (CN) becomes the node name. + +The {\dsc} source code distribution includes some shell scripts +that we have +used to create X.509 certificates. You can find them in the +\path|presenter/certs| directory. 
Note these are not installed +into \path|/usr/local/dsc|. You should edit \path|openssl.conf| +and enter the relevant information for your organization. + +\subsection{Certificate Authority} + +You may need to create a self-signed certificate authority if you +don't already have one. The CA signs client and server certificates. +You will need to distribute the CA and client certificates to +collector sites. Here is how to use our \path|create-ca-cert.sh| +script: + +\begin{MyVerbatim} +% sh create-ca-cert.sh +CREATING CA CERT +Generating a 2048 bit RSA private key +.............................................................................. +............+++ +......+++ +writing new private key to './private/cakey.pem' +Enter PEM pass phrase: +Verifying - Enter PEM pass phrase: +----- +\end{MyVerbatim} + + +\subsection{Server Certificate} + +The server certificate is used by the HTTP server (Apache/mod\_ssl). +The clients will have a copy of the CA certificate so they +can validate the server's certificate when uploading XML files. +Use the \path|create-srv-cert.sh| script to create a server +certificate: + +\begin{MyVerbatim} +% sh create-srv-cert.sh +CREATING SERVER REQUEST +Generating a 1024 bit RSA private key +..........................++++++ +.....................................++++++ +writing new private key to 'server/server.key' +Enter PEM pass phrase: +Verifying - Enter PEM pass phrase: +----- +You are about to be asked to enter information that will be incorporated +into your certificate request. +What you are about to enter is what is called a Distinguished Name or a DN. +There are quite a few fields but you can leave some blank +For some fields there will be a default value, +If you enter '.', the field will be left blank. +----- +Country Name (2 letter code) [AU]:US +State or Province Name (full name) [Some-State]:Colorado +Locality Name (eg, city) []:Boulder +Organization Name (eg, company) [Internet Widgits Pty Ltd]:The Measurement Factory, Inc +Organizational Unit Name (eg, section) []:DNS +Common Name (eg, YOUR name) []:dns.measurement-factory.com +Email Address []:wessels@measurement-factory.com + +Please enter the following 'extra' attributes +to be sent with your certificate request +A challenge password []: +An optional company name []: +Enter pass phrase for server/server.key: +writing RSA key +CREATING SERVER CERT +Using configuration from ./openssl.conf +Enter pass phrase for ./private/cakey.pem: +Check that the request matches the signature +Signature ok +The Subject's Distinguished Name is as follows +countryName :PRINTABLE:'US' +stateOrProvinceName :PRINTABLE:'Colorado' +localityName :PRINTABLE:'Boulder' +organizationName :PRINTABLE:'The Measurement Factory, Inc' +organizationalUnitName:PRINTABLE:'DNS' +commonName :PRINTABLE:'dns.measurement-factory.com' +emailAddress :IA5STRING:'wessels@measurement-factory.com' +Certificate is to be certified until Jun 3 20:06:17 2013 GMT (3000 days) +Sign the certificate? [y/n]:y + + +1 out of 1 certificate requests certified, commit? [y/n]y +Write out database with 1 new entries +Data Base Updated +\end{MyVerbatim} + +Note that the Common Name must match the hostname of the HTTP +server that receives XML files. + +Note that the \path|create-srv-cert.sh| script rewrites the +server key file without the RSA password. This allows your +HTTP server to start automatically without prompting for +the password. + +The script leaves the server certificate and key in the \path|server| +directory. 
You'll need to copy these over to the HTTP server config +directory as described later in this chapter. + +\section{Client Certificates} + +Generating client certificates is similar. Remember that +the Organizational Unit Name and Common Name correspond to the +collector's {\em server\/} and {\em node\/} names. For example: + +\begin{MyVerbatim} +% sh create-clt-cert.sh +CREATING CLIENT REQUEST +Generating a 1024 bit RSA private key +................................++++++ +..............++++++ +writing new private key to 'client/client.key' +Enter PEM pass phrase: +Verifying - Enter PEM pass phrase: +----- +You are about to be asked to enter information that will be incorporated +into your certificate request. +What you are about to enter is what is called a Distinguished Name or a DN. +There are quite a few fields but you can leave some blank +For some fields there will be a default value, +If you enter '.', the field will be left blank. +----- +Country Name (2 letter code) [AU]:US +State or Province Name (full name) [Some-State]:California +Locality Name (eg, city) []:Los Angeles +Organization Name (eg, company) [Internet Widgits Pty Ltd]:Some DNS Server +Organizational Unit Name (eg, section) []:x-root +Common Name (eg, YOUR name) []:LAX +Email Address []:noc@example.com + +Please enter the following 'extra' attributes +to be sent with your certificate request +A challenge password []: +An optional company name []: +CREATING CLIENT CERT +Using configuration from ./openssl.conf +Enter pass phrase for ./private/cakey.pem: +Check that the request matches the signature +Signature ok +The Subject's Distinguished Name is as follows +countryName :PRINTABLE:'US' +stateOrProvinceName :PRINTABLE:'California' +localityName :PRINTABLE:'Los Angeles' +organizationName :PRINTABLE:'Some DNS Server' +organizationalUnitName:PRINTABLE:'x-root ' +commonName :PRINTABLE:'LAX' +emailAddress :IA5STRING:'noc@example.com' +Certificate is to be certified until Jun 3 20:17:24 2013 GMT (3000 days) +Sign the certificate? [y/n]:y + + +1 out of 1 certificate requests certified, commit? [y/n]y +Write out database with 1 new entries +Data Base Updated +Enter pass phrase for client/client.key: +writing RSA key +writing RSA key +\end{MyVerbatim} + +The client's key and certificate will be placed in a directory +based on the server and node names. For example: + +\begin{MyVerbatim} +% ls -l client/x-root/LAX +total 10 +-rw-r--r-- 1 wessels wessels 3311 Mar 17 13:17 client.crt +-rw-r--r-- 1 wessels wessels 712 Mar 17 13:17 client.csr +-r-------- 1 wessels wessels 887 Mar 17 13:17 client.key +-rw-r--r-- 1 wessels wessels 1953 Mar 17 13:17 client.pem +\end{MyVerbatim} + +The \path|client.pem| (and \path|cacert.pem|) files should be copied +to the collector machine. + +\section{Apache Configuration} + +\noindent +You need to configure Apache for SSL. 
Here is what our configuration +looks like: + +\begin{MyVerbatim} +SSLRandomSeed startup builtin +SSLRandomSeed startup file:/dev/random +SSLRandomSeed startup file:/dev/urandom 1024 +SSLRandomSeed connect builtin +SSLRandomSeed connect file:/dev/random +SSLRandomSeed connect file:/dev/urandom 1024 + +<VirtualHost _default_:443> +DocumentRoot "/httpd/htdocs-ssl" +SSLEngine on +SSLCertificateFile /httpd/conf/SSL/server/server.crt +SSLCertificateKeyFile /httpd/conf/SSL/server/server.key +SSLCertificateChainFile /httpd/conf/SSL/cacert.pem + +# For client-validation +SSLCACertificateFile /httpd/conf/SSL/cacert.pem +SSLVerifyClient require + +SSLOptions +CompatEnvVars +Script PUT /cgi-bin/put-file.pl +</VirtualHost> +\end{MyVerbatim} + +\noindent +Note the last line of the configuration specifies the CGI script +that accepts PUT requests. The {\em SSLOptions\/} +line is necessary so that the CGI script receives certain HTTP +headers as environment variables. Those headers/variables convey +the X.509 information to the script so it knows where to store +received XML files. + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\chapter{Collector Installation} + + +A collector machine needs only the {\em dsc\/} binary, a configuration +file, and a couple of cron job scripts. + +At this point, {\dsc} lacks certain niceties such as a \path|./configure| +script. The installation prefix, \path|/usr/local/dsc| is currently +hard-coded. + + +\section{Prerequisites} + +You'll need a C/C++ compiler to compile the {\tt dsc\/} source code. + +If the collector and archiver are different systems, you'll need a +way to transfer data files. We recommend that you use the {\tt +curl\/} HTTP/SSL client You may use another technique, such as {\tt +scp\/} or {\tt rsync\/} if you prefer. + +\section{\tt Installation} + +You can compile {\tt dsc\/} from the {\tt collector\/} directory: + +\begin{MyVerbatim} +% cd collector +% make +\end{MyVerbatim} + +Assuming there are no errors or problems during compilation, install +the {\tt dsc\/} binary and other scripts with: + +\begin{MyVerbatim} +% make install +\end{MyVerbatim} + +This installs five files: +\begin{Verbatim} +/usr/local/dsc/bin/dsc +/usr/local/dsc/etc/dsc.conf.sample +/usr/local/dsc/libexec/upload-prep.pl +/usr/local/dsc/libexec/upload-rsync.sh +/usr/local/dsc/libexec/upload-x509.sh +\end{Verbatim} + +Of course, if you don't want to use the default installation +prefix, you can manually copy these files to a location +of your choosing. If you do that, you'll also need to +edit the cron scripts to match your choice of pathnames, etc. + +\section{Uploading XML Files} +\label{sec-install-collector-cron} + +This section describes how XML files are transferred from +the collector to one or more Presenter systems. + +As we'll see in the next chapter, each {\tt dsc} process +has its own {\em run directory\/}. This is the directory +where {\tt dsc} leaves its XML files. It usually has a +name like \path|/usr/local/dsc/run/NODENAME|\@. XML files +are removed after they are successfully transferred. If the +Presenter is unreachable, XML files accumulate here until +they can be transferred. Make sure that you have +enough disk space to queue a lot of XML files in the +event of an outage. + +In general we want to be able to upload XML files to multiple +presenters. This is the reason behind the {\tt upload-prep.pl} +script. 
This script runs every 60 seconds from cron: + +\begin{MyVerbatim} +* * * * * /usr/local/dsc/libexec/upload-prep.pl +\end{MyVerbatim} + +{\tt upload-prep.pl} looks for \path|dsc.conf| files in +\path|/usr/local/dsc/etc| by default. For each config file +found, it cd's to the {\em run\_dir\/} and links\footnote{as in +``hard link'' made with \path|/bin/ln|.} +XML files to one or more upload directories. The upload directories +are named \path|upload/dest1|, \path|upload/dest2|, and so on. + +In order for all this to work, you must create the directories +in advance. For example, if you are collecting stats on +your nameserver named {\em ns0\/}, and want to send the XML files +to two presenters (named oarc and archive), the directory structure +might look like: + +\begin{MyVerbatim} +% set prefix=/usr/local/dsc +% mkdir $prefix/run +% mkdir $prefix/run/ns0 +% mkdir $prefix/run/ns0/upload +% mkdir $prefix/run/ns0/upload/oarc +% mkdir $prefix/run/ns0/upload/archive +\end{MyVerbatim} + +With that directory structure, the {\tt upload-prep.pl} script moves +XML files from the \path|ns0| directory to the two +upload directories, \path|oarc| and \path|archive|. + +To actually transfer files to the presenter, use either +\path|upload-x509.sh| or \path|upload-rsync.sh|. + +\subsection{upload-x509.sh} + +This cron script is responsible for +actually transferring XML files from the upload directories +to the remote server. It creates a {\em tar\/} archive +of XML files and then uploads it to the remote server with +{\tt curl}. The script takes three commandline arguments: + +\begin{MyVerbatim} +% upload-x509.sh NODE DEST URI +\end{MyVerbatim} + +{\em NODE\/} must match the name of a directory under +\path|/usr/local/dsc/run|. Similarly, {\em DEST\/} must match the +name of a directory under \path|/usr/local/dsc/run/NODE/upload|. +{\em URI\/} is the URL/URI that the data is uploaded to. Usually +it is just an HTTPS URL with the name of the destination server. +We also recommend running this from cron every 60 seconds. For +example: + +\begin{MyVerbatim} +* * * * * /usr/local/dsc/libexec/upload-x509.sh ns0 oarc \ + https://collect.oarc.isc.org/ +* * * * * /usr/local/dsc/libexec/upload-x509.sh ns0 archive \ + https://archive.example.com/ +\end{MyVerbatim} + +\path|upload-x509.sh| looks for X.509 certificates in +\path|/usr/local/dsc/certs|. The client certificate should be named +\path|/usr/local/dsc/certs/DEST/NODE.pem| and the CA certificate +should be named +\path|/usr/local/dsc/certs/DEST/cacert.pem|. Note that {\em DEST\/} +and {\em NODE\/} must match the \path|upload-x509.sh| +command line arguments. + +\subsection{upload-rsync.sh} + +This script can be used to transfer XML files files from the upload +directories to the remote server. It uses {\em rsync\/} and +assumes that {\em rsync\/} will use {\em ssh\/} for transport. +This script also takes three arguments: + +\begin{MyVerbatim} +% upload-rsync.sh NODE DEST RSYNC-DEST +\end{MyVerbatim} + +Note that {\em DEST\/} is the name of the local ``upload'' directory +and {\em RSYNC-DEST\/} is an {\em rsync\/} destination (i.e., hostname and remote directory). 
+Here is how you might use it in a crontab: + +\begin{MyVerbatim} +* * * * * /usr/local/dsc/libexec/upload-rsync.sh ns0 oarc \ + dsc@collect.oarc.isc.org:/usr/local/dsc/data/Server/ns0 +* * * * * /usr/local/dsc/libexec/upload-rsync.sh ns0 archive \ + dsc@archive.oarc.isc.org:/usr/local/dsc/data/Server/ns0 +\end{MyVerbatim} + +Also note that \path|upload-rsync.sh| will actually store the remote +XML files in \path|incoming/YYYY-MM-DD| subdirectories. That is, +if your {\em RSYNC-DEST\/} is \path|host:/usr/local/dsc/data/Server/ns0| +then files will actually be written to +\path|/usr/local/dsc/data/Server/ns0/incoming/YYYY-MM-DD| on {\em host}, +where \path|YYYY-MM-DD| is replaced by the year, month, and date of the +XML files. These subdirectories reduce filesystem pressure in the event +of backlogs. + +{\em rsync\/} over {\em ssh\/} requires you to use RSA or DSA public keys +that do not have a passphrase. If you do not want to use one of +{\em ssh\/}'s default identity files, you can create one specifically +for this script. It should be named \path|dsc_uploader_id| (and +\path|dsc_uploader_id.pub|) in the \$HOME/.ssh directory of the user +that will be running the script. For example, you can create it +with this command: + +\begin{MyVerbatim} +% ssh-keygen -t dsa -C dsc-uploader -f $HOME/.ssh/dsc_uploader_id +\end{MyVerbatim} + +Then add \path|dsc_uploader_id.pub| to the \path|authorized_keys| +file of the receiving userid on the presenter system. + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\chapter{Configuring and Running the {\dsc} Collector} + +\section{dsc.conf} + +Before running {\tt dsc\/} you need to create a configuration file. +Note that configuration directive lines are terminated with a semi-colon. +The configuration file currently understands the following directives: + +\begin{description} + +\item[local\_address] + + Specifies the DNS server's local IP address. It is used + to determine the ``direction'' of an IP packet: sending, + receiving, or other. You may specify multiple local addresses + by repeating the {\em local\_address} line any number of times. + + Example: {\tt local\_address 172.16.0.1;\/} + Example: {\tt local\_address 2001:4f8:0:2::13;\/} + +\item[run\_dir] + + A directory that should become {\tt dsc\/}'s current directory + after it starts. XML files will be written here, as will + any core dumps. + + Example: {\tt run\_dir "/var/run/dsc";\/} + +\item[minfree\_bytes] + + If the filesystem where {\tt dsc\/} writes its XML files + does not have at least this much free space, then + {\tt dsc\/} will not write the XML files. This prevents + {\tt dsc\/} from filling up the filesystem. The XML + files that would have been written are simply lost and + cannot be receovered. {\tt dsc\/} will begin writing + XML files again when the filesystem has the necessary + free space. + +\item[bpf\_program] + + A Berkeley Packet Filter program string. Normally you + should leave this unset. You may use this to further + restrict the traffic seen by {\tt dsc\/}. Note that {\tt + dsc\/} currently has one indexer that looks at all IP + packets. If you specify something like {\em udp port 53\/} + that indexer will not work. + + However, if you want to monitor multiple DNS servers with + separate {\dsc} instances on one collector box, then you + may need to use {\em bpf\_program} to make sure that each + {\tt dsc} process sees only the traffic it should see. 
+ + Note that this directive must go before the {\em interface\/} + directive because {\tt dsc\/} makes only one pass through + the configuration file and the BPF filter is set when the + interface is initialized. + + Example: {\tt bpf\_program "dst host 192.168.1.1";\/} + +\item[interface] + + The interface name to sniff packets from or a pcap file to + read packets from. You may specify multiple interfaces. + + Example: + {\tt interface fxp0;\/} + {\tt interface /path/to/dump.pcap;\/} + +\item[bpf\_vlan\_tag\_byte\_order] + + {\tt dsc\/} knows about VLAN tags. Some operating systems + (FreeBSD-4.x) have a bug whereby the VLAN tag id is + byte-swapped. Valid values for this directive are {\tt + host\/} and {\tt net\/} (the default). Set this to {\tt + host\/} if you suspect your operating system has the VLAN + tag byte order bug. + + Example: {\tt bpf\_vlan\_tag\_byte\_order host;\/} + +\item[match\_vlan] + + A list of VLAN identifiers (integers). If set, only the + packets belonging to these VLANs are counted. + + Example: {\tt match\_vlan 101 102;\/} + +\item[qname\_filter] + + This directive allows you to define custom filters + to match query names in DNS messages. Please see + Section~\ref{sec-qname-filter} for more information. + +\item[dataset] + + This directive is the heart of {\dsc}. However, it is also + the most complex. + To save time we recommend that you copy interesting-looking + dataset definitions from \path|dsc.conf.sample|. Comment + out any that you feel are irrelevant or uninteresting. + Later, as you become more familiar with {\dsc}, you may + want to read the next chapter and add your own custom + datasets. + +\item[output\_format] + + Specify the output format, can be give multiple times to output in more then + one format. Default output format is XML. 
+ + Available formats are: + - XML + - JSON + + Example: {\tt output\_format JSON} +\end{description} + + +\section{A Complete Sample dsc.conf} + +Here's how your entire {\em dsc.conf\/} file might look: + +\begin{MyVerbatim} +#bpf_program +interface em0; + +local_address 192.5.5.241; + +run_dir "/usr/local/dsc/run/foo"; + +dataset qtype dns All:null Qtype:qtype queries-only; +dataset rcode dns All:null Rcode:rcode replies-only; +dataset opcode dns All:null Opcode:opcode queries-only; +dataset rcode_vs_replylen dns Rcode:rcode ReplyLen:msglen replies-only; +dataset client_subnet dns All:null ClientSubnet:client_subnet queries-only + max-cells=200; +dataset qtype_vs_qnamelen dns Qtype:qtype QnameLen:qnamelen queries-only; +dataset qtype_vs_tld dns Qtype:qtype TLD:tld queries-only,popular-qtypes + max-cells=200; +dataset certain_qnames_vs_qtype dns CertainQnames:certain_qnames + Qtype:qtype queries-only; +dataset client_subnet2 dns Class:query_classification + ClientSubnet:client_subnet queries-only max-cells=200; +dataset client_addr_vs_rcode dns Rcode:rcode ClientAddr:client + replies-only max-cells=50; +dataset chaos_types_and_names dns Qtype:qtype Qname:qname + chaos-class,queries-only; +dataset idn_qname dns All:null IDNQname:idn_qname queries-only; +dataset edns_version dns All:null EDNSVersion:edns_version queries-only; +dataset do_bit dns All:null D0:do_bit queries-only; +dataset rd_bit dns All:null RD:rd_bit queries-only; +dataset tc_bit dns All:null TC:tc_bit replies-only; +dataset idn_vs_tld dns All:null TLD:tld queries-only,idn-only; +dataset ipv6_rsn_abusers dns All:null ClientAddr:client + queries-only,aaaa-or-a6-only,root-servers-n et-only max-cells=50; +dataset transport_vs_qtype dns Transport:transport Qtype:qtype queries-only; + +dataset direction_vs_ipproto ip Direction:ip_direction IPProto:ip_proto + any; +\end{MyVerbatim} + +\section{Running {\tt dsc}} + +{\tt dsc\/} accepts a single command line argument, which is +the name of the configuration file. For example: + +\begin{MyVerbatim} +% cd /usr/local/dsc +% bin/dsc etc/foo.conf +\end{MyVerbatim} + +If you run {\tt ps} when {\tt dsc} is running, you'll see two processes: + +\begin{MyVerbatim} +60494 ?? S 0:00.36 bin/dsc etc/foo.conf +69453 ?? Ss 0:10.65 bin/dsc etc/foo.conf +\end{MyVerbatim} + +The first process simply forks off child processes every +60 seconds. The child processes do the work of analyzing +and tabulating DNS messages. + +Please use NTP or another technique to keep the collector's +clock synchronized to the correct time. + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\chapter{Viewing {\dsc} Graphs} + +To view {\dsc} data in a web browser, simply enter the +URL to the \path|dsc-grapher.pl| CGI. But before you +do that, you'll need to create a grapher configuration file. + +\path|dsc-grapher.pl| uses a simple configuration file to set certain +menu options. This configuration file is +\path|/usr/local/dsc/etc/dsc-grapher.cfg|. You should find +a sample version in the same directory. 
For example: + +\begin{MyVerbatim} +server f-root pao1 sfo2 +server isc senna+piquet +server tmf hq sc lgh +trace_windows 1hour 4hour 1day 1week 1month +accum_windows 1day 2days 3days 1week +timezone Asia/Tokyo +domain_list isc_tlds br nl ca cz il pt cl +domain_list isc_tlds sk ph hr ae bg is si za +valid_domains isc isc_tlds + +\end{MyVerbatim} + +\begin{figure} +\centerline{\psfig{figure=screenshot1.eps,width=6.5in}} +\caption{\label{fig-screenshot1}A sample graph} +\end{figure} + +Refer to Figure~\ref{fig-screenshot1} to see how +the directives affect the visual display. +The following three directives should always be set in +the configuration file: + +\begin{description} +\item[server] + This directive tells \path|dsc-grapher.pl| to list + the given server and its associated nodes in the + ``Servers/Nodes'' section of its navigation menu. + You can repeat this directive for each server that + the Presenter has. +\item[trace\_windows] + Specifies the ``Time Scale'' menu options for + trace-based plots. +\item[accum\_windows] + Specifies the ``Time Scale'' menu options for + ``cumulative'' plots, such as the Classification plot. +\end{description} + +Note that the \path|dsc-grapher.cfg| only affects what +may appear in the navigation window. It does NOT prevent users +from entering other values in the URL parameters. For example, +if you have data for a server/node in your +\path|/usr/local/dsc/data/| directory that is not listed in +\path|dsc-grapher.cfg|, a user may still be able to view that +data by manually setting the URL query parameters. + +The configuration file accepts a number of optional directives +as well. You may set these if you like, but they are not +required: + +\begin{description} +\item[timezone] + Sets the time zone for dates and times displayed in the + graphs. + You can use this if you want to override the system + time zone. + The value for this directive should be the name + of a timezone entry in your system database (usually found + in {\path|/usr/share/zoneinfo|}. + For example, if your system time zone is set + to UTC but you want the times displayed for the + London timezone, you can set this directive to + {\tt Europe/London\/}. +\item[domain\_list] + This directive, along with {\em valid\_domains\/}, tell the + presenter which domains a nameserver is authoritative for. + That information is used in the TLDs subgraphs to differentiate + requests for ``valid'' and ``invalid'' domains. + + The {\em domain\_list\/} creates a named list of domains. + The first token is a name for the list, and the remaining + tokens are domain names. The directive may be repeated with + the same list name, as shown in the above example. +\item[valid\_domains] + This directive glues servers and domain\_lists together. The + first token is the name of a {\em server\/} and the second token is + the name of a {\em domain\_list\/}. +\item[embargo] + The {\em embargo\/} directive may be used to delay the + availability of data via the presenter. For example, you + may have one instance of {\em dsc-grapher.pl\/} for internal + use only (password protected, etc). You may also have a + second instance for third-parties where data is delayed by + some amount of time, such as hours, days, or weeks. The value + of the {\em embargo\/} directive is the number of seconds which + data availability should be delayed. For example, if you set + it to 604800, then viewers will not be able to see any data + less than one week old. 
+\item[anonymize\_ip] + When the {\em anonymize\_ip\/} directive is given, IP addresses + in the display will be anonymized. The anonymization algorithm + is currently hard-coded and designed only for IPv4 addresses. + It masks off the lower 24 bits and leaves only the first octet + in place. +\item[hide\_nodes] + When the {\em hide\_nodes\/} directive is given, the presenter + will not display the list node names underneath the current + server. This might be useful if you have a number of nodes + but only want viewers to see the server as a whole, without + exposing the particular nodes in the cluster. Note, however, + that if someone already knows the name of a node they can + hand-craft query terms in the URL to display the data for + only that node. In other words, the {\em hide\_nodes\/} + only provides ``security through obscurity.'' +\end{description} + + +The first few times you try \path|dsc-grapher.pl|, be sure to run +{\tt tail -f} on the HTTP server error.log file. + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\chapter{{\dsc} Datasets} + +A {\em dataset\/} is a 2-D array of counters. For example, you +might have a dataset with ``Query Type'' along one dimension and +``Query Name Length'' on the other. The result is a table that +shows the distribution of query name lengths for each query type. +For example: + +\vspace{1ex} +\begin{center} +\begin{tabular}{l|rrrrrr} +Len & A & AAAA & A6 & PTR & NS & SOA \\ +\hline +$\cdots$ & & & & & \\ +11 & 14 & 8 & 7 & 11 & 2 & 0 \\ +12 & 19 & 2 & 3 & 19 & 4 & 1 \\ +$\cdots$ & & & & & & \\ +255 & 0 & 0 & 0 & 0 & 0 & 0 \\ +\hline +\end{tabular} +\end{center} +\vspace{1ex} + +\noindent +A dataset is defined by the following parameters: +\begin{itemize} +\setlength{\itemsep}{0ex plus 0.5ex minus 0.0ex} +\item A name +\item A protocol layer (IP or DNS) +\item An indexer for the first dimension +\item An indexer for the second dimension +\item One or more filters +\item Zero or more options and parameters +\end{itemize} + +\noindent +The {\em dataset\/} definition syntax in \path|dsc.conf| is: + +{\tt dataset\/} +{\em name\/} +{\em protocol\/} +{\em Label1:Indexer1\/} +{\em Label2:Indexer2\/} +{\em filter\/} +{\em [parameters]\/}; +\vspace{2ex} + +\section{Dataset Name} + +The dataset name is used in the filename for {\tt dsc\/}'s XML +files. Although this is an opaque string in theory, the Presenter's +XML extractor routines must recognize the dataset name to properly +parse it. The source code file +\path|presenter/perllib/DSC/extractor/config.pm| contains an entry +for each known dataset name. + +\section{Protocol} + +{\dsc} currently knows about two protocol layers: IP and DNS. +On the {\tt dataset\/} line they are written as {\tt ip\/} and {\tt dns\/}. + + +\section{Indexers} + +An {\em indexer\/} is simply a function that transforms the attributes +of an IP/DNS message into an array index. For some attributes the +transformation is straightforward. For example, the ``Query Type'' +indexer simply extracts the query type value from a DNS message and +uses this 16-bit value as the array index. + +Other attributes are slightly more complicated. For example, the +``TLD'' indexer extracts the TLD of the QNAME field of a DNS message +and maps it to an integer. The indexer maintains a simple internal +table of TLD-to-integer mappings. The actual integer values are +unimportant because the TLD strings, not the integers, appear in +the resulting XML data. 
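+
+As a rough sketch (again, not the actual {\dsc} source), an indexer
+amounts to a function from a message attribute to a small integer;
+a string-valued attribute such as the TLD goes through a private
+lookup table. Hypothetical names and sizes are used here:
+
+\begin{MyVerbatim}
+#include <string.h>
+
+/* a numeric attribute indexes directly */
+int qtype_indexer(unsigned short qtype)
+{
+    return (int) qtype;
+}
+
+/* a string attribute maps through an internal table; the integer
+ * itself is unimportant because the TLD string, not the index,
+ * appears in the XML output */
+#define MAX_TLDS 1024
+static char *tld_table[MAX_TLDS];
+static int tld_count = 0;
+
+int tld_indexer(const char *tld)
+{
+    int i;
+    for (i = 0; i < tld_count; i++)
+        if (0 == strcmp(tld_table[i], tld))
+            return i;
+    if (MAX_TLDS == tld_count)
+        return MAX_TLDS - 1;    /* table full */
+    tld_table[tld_count] = strdup(tld);
+    return tld_count++;
+}
+\end{MyVerbatim}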
+ +When you specify an indexer on a {\tt dataset\/} line, you must +provide both the name of the indexer and a label. The Label appears +as an attribute in the XML output. For example, +Figure~\ref{fig-sample-xml} shows the XML corresponding to this +{\em dataset\/} line: + +\begin{MyVerbatim} +dataset the_dataset dns Foo:foo Bar:bar queries-only; +\end{MyVerbatim} + +\begin{figure} +\begin{MyVerbatim} +<array name="the_dataset" dimensions="2" start_time="1091663940" ... + <dimension number="1" type="Foo"/> + <dimension number="2" type="Bar"/> + <data> + <Foo val="1"> + <Bar val="0" count="4"/> + ... + <Bar val="100" count="41"/> + </Foo> + <Foo val="2"> + ... + </Foo> + </data> +</array> +\end{MyVerbatim} +\caption{\label{fig-sample-xml}Sample XML output} +\end{figure} + +In theory you are free to choose any label that you like, however, +the XML extractors look for specific labels. Please use the labels +given for the indexers in Tables~\ref{tbl-dns-indexers} +and~\ref{tbl-ip-indexers}. + +\subsection{IP Indexers} + +\begin{table} +\begin{center} +\begin{tabular}{|lll|} +\hline +Indexer & Label & Description \\ +\hline +ip\_direction & Direction & one of sent, recv, or other \\ +ip\_proto & IPProto & IP protocol (icmp, tcp, udp) \\ +ip\_version & IP version number (4, 6) \\ +\hline +\end{tabular} +\caption{\label{tbl-ip-indexers}IP packet indexers} +\end{center} +\end{table} + +{\dsc} includes only minimal support for collecting IP-layer +stats. Mostly we are interested in finding out the mix of +IP protocols received by the DNS server. It can also show us +if/when the DNS server is the subject of denial-of-service +attack. +Table~\ref{tbl-ip-indexers} shows the indexers for IP packets. +Here are their longer descriptions: + +\begin{description} +\item[ip\_direction] + One of three values: sent, recv, or else. Direction is determined + based on the setting for {\em local\_address\/} in the configuration file. +\item[ip\_proto] + The IP protocol type, e.g.: tcp, udp, icmp. + Note that the {\em bpf\_program\/} setting affects all traffic + seen by {\dsc}. If the program contains the word ``udp'' + then you won't see any counts for non-UDP traffic. +\item[ip\_version] + The IP version number, e.g.: 4 or 6. Can be used to compare how much + traffic comes in via IPv6 compared to IPV4. +\end{description} + +\subsection{IP Filters} + +Currently there is only one IP protocol filter: {\tt any\/}. +It includes all received packets. 
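+
+For example, the IP-layer dataset from the sample \path|dsc.conf|
+combines two of the indexers above with the {\tt any\/} filter:
+
+\begin{MyVerbatim}
+dataset direction_vs_ipproto ip Direction:ip_direction IPProto:ip_proto
+    any;
+\end{MyVerbatim}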
+ + +\subsection{DNS Indexers} + +\begin{table} +\begin{center} +\begin{tabular}{|lll|} +\hline +Indexer & Label & Description \\ +\hline +certain\_qnames & CertainQnames & Popular query names seen at roots \\ +client\_subnet & ClientSubnet & The client's IP subnet (/24 for IPv4, /96 for IPv6) \\ +client & ClientAddr & The client's IP address \\ +do\_bit & DO & Whether the DO bit is on \\ +edns\_version & EDNSVersion & The EDNS version number \\ +idn\_qname & IDNQname & If the QNAME is in IDN format \\ +msglen & MsgLen & The DNS message length \\ +null & All & A ``no-op'' indexer \\ +opcode & Opcode & DNS message opcode \\ +qclass & - & Query class \\ +qname & Qname & Full query name \\ +qnamelen & QnameLen & Length of the query name \\ +qtype & Qtype & DNS query type \\ +query\_classification & Class & A classification for bogus queries \\ +rcode & Rcode & DNS response code \\ +rd\_bit & RD & Check if Recursion Desired bit set \\ +tc\_bit & TC & Check if Truncated bit set \\ +tld & TLD & TLD of the query name \\ +transport & Transport & Transport protocol for the DNS message (UDP or TCP) \\ +dns\_ip\_version & IPVersion & IP version of the packet carrying the DNS message \\ +\hline +\end{tabular} +\caption{\label{tbl-dns-indexers}DNS message indexers} +\end{center} +\end{table} + +Table~\ref{tbl-dns-indexers} shows the currently-defined indexers +for DNS messages, and here are their descriptions: + +\begin{description} +\item[certain\_qnames] + This indexer isolates the two most popular query names seen + by DNS root servers: {\em localhost\/} and {\em + [a--m].root-servers.net\/}. +\item[client\_subnet] + Groups DNS messages together by the subnet of the + client's IP address. The subnet is maked by /24 for IPv4 + and by /96 for IPv6. We use this to make datasets with + large, diverse client populations more manageable and to + provide a small amount of privacy and anonymization. +\item[client] + The IP (v4 and v6) address of the DNS client. +\item[do\_bit] + This indexer has only two values: 0 or 1. It indicates + whether or not the ``DO'' bit is set in a DNS query. According to + RFC 2335: {\em Setting the DO bit to one in a query indicates + to the server that the resolver is able to accept DNSSEC + security RRs.} +\item[edns\_version] + The EDNS version number, if any, in a DNS query. EDNS + Version 0 is documented in RFC 2671. +\item[idn\_qname] + This indexer has only two values: 0 or 1. It returns 1 + when the first QNAME in the DNS message question section + is an internationalized domain name (i.e., containing + non-ASCII characters). Such QNAMEs begin with the string + {\tt xn--\/}. This convention is documented in RFC 3490. +\item[msglen] + The overall length (size) of the DNS message. +\item[null] + A ``no-op'' indexer that always returns the same value. + This can be used to effectively turn the 2-D table into a + 1-D array. +\item[opcode] + The DNS message opcode is a four-bit field. QUERY is the + most common opcode. Additional currently defined opcodes + include: IQUERY, STATUS, NOTIFY, and UPDATE. +\item[qclass] + The DNS message query class (QCLASS) is a 16-bit value. IN + is the most common query class. Additional currently defined + query class values include: CHAOS, HS, NONE, and ANY. +\item[qname] + The full QNAME string from the first (and usually only) + QNAME in the question section of a DNS message. +\item[qnamelen] + The length of the first (and usually only) QNAME in a DNS + message question section. 
Note this is the ``expanded'' + length if the message happens to take advantage of DNS + message ``compression.'' +\item[qtype] + The query type (QTYPE) for the first QNAME in the DNS message + question section. Well-known query types include: A, AAAA, + A6, CNAME, PTR, MX, NS, SOA, and ANY. +\item[query\_classification] + A stateless classification of ``bogus'' queries: + \begin{itemize} + \setlength{\itemsep}{0ex plus 0.5ex minus 0.0ex} + \item non-auth-tld: when the TLD is not one of the IANA-approved TLDs. + \item root-servers.net: a query for a root server IP address. + \item localhost: a query for the localhost IP address. + \item a-for-root: an A query for the DNS root (.). + \item a-for-a: an A query for an IPv4 address. + \item rfc1918-ptr: a PTR query for an RFC 1918 address. + \item funny-class: a query with an unknown/undefined query class. + \item funny-qtype: a query with an unknown/undefined query type. + \item src-port-zero: when the UDP message's source port equals zero. + \item malformed: a malformed DNS message that could not be entirely parsed. + \end{itemize} +\item[rcode] + The RCODE value in a DNS response. The most common response + codes are 0 (NO ERROR) and 3 (NXDOMAIN). +\item[rd\_bit] + This indexer returns 1 if the RD (recursion desired) bit is + set in the query. Usually only stub resolvers set the RD bit. + Usually authoritative servers do not offer recursion to their + clients. +\item[tc\_bit] + This indexer returns 1 if the TC (truncated) bit is + set (in a response). An authoritative server sets the TC bit + when the entire response won't fit into a UDP message. +\item[tld] + the TLD of the first QNAME in a DNS message's question section. +\item[transport] + Indicates whether the DNS message is carried via UDP or TCP\@. +\item[dns\_ip\_version] + The IP version number that carried the DNS message. +\end{description} + +\subsection{DNS Filters} + +You must specify one or more of the following filters (separated by commas) on +the {\tt dataset\/} line: + +\begin{description} +\item[any] + The no-op filter, counts all messages. +\item[queries-only] + Count only DNS query messages. A query is a DNS message + where the QR bit is set to 0. +\item[replies-only] + Count only DNS response messages. A query is a DNS message + where the QR bit is set to 1. +\item[popular-qtypes] + Count only DNS messages where the query type is one of: + A, NS, CNAME, SOA, PTR, MX, AAAA, A6, ANY. +\item[idn-only] + Count only DNS messages where the query name is in the + internationalized domain name format. +\item[aaaa-or-a6-only] + Count only DNS Messages where the query type is AAAA or A6. +\item[root-servers-net-only] + Count only DNS messages where the query name is within + the {\em root-servers.net\/} domain. +\item[chaos-class] + Counts only DNS messages where QCLASS is equal to + CHAOS (3). The CHAOS class is generally used + for only the special {\em hostname.bind\/} and + {\em version.bind\/} queries. +\end{description} + +\noindent +Note that multiple filters are ANDed together. That is, they +narrow the input stream, rather than broaden it. + +In addition to these pre-defined filters, you can add your own +custom filters. + +\subsubsection{qname\_filter} +\label{sec-qname-filter} + +The {\em qname\_filter} directive defines a new +filter that uses regular expression matching on the QNAME field of +a DNS message. This may be useful if you have a server that is +authoritative for a number of zones, but you want to limit +your measurements to a small subset. 
The {\em qname\_filter} directive +takes two arguments: a name for the filter and a regular expression. +For example: + +\begin{MyVerbatim} +qname_filter MyFilterName example\.(com|net|org)$ ; +\end{MyVerbatim} + +This filter matches queries (and responses) for names ending with +{\em example.com\/}, {\em example.net\/}, and {\em example.org\/}. +You can reference the named filter in the filters part of a {\em +dataset\/} line. For example: + +\begin{MyVerbatim} +dataset qtype dns All:null Qtype:qtype queries-only,MyFilterName; +\end{MyVerbatim} + +\subsection{Parameters} +\label{sec-dataset-params} + +\noindent +{\tt dsc\/} currently supports the following optional parameters: + +\begin{description} +\item[min-count={\em NN\/}] + Cells with counts less than {\em NN\/} are not included in + the output. Instead, they are aggregated into the special + values {\tt -:SKIPPED:-\/} and {\tt -:SKIPPED\_SUM:-\/}. + This helps reduce the size of datasets with a large number + of small counts. +\item[max-cells={\em NN\/}] + A different, perhaps better, way of limiting the size + of a dataset. Instead of trying to determine an appropriate + {\em min-count\/} value in advance, {\em max-cells\/} + allows you put a limit on the number of cells to + include for the second dataset dimension. If the dataset + has 9 possible first-dimension values, and you specify + a {\em max-cell\/} count of 100, then the dataset will not + have more than 900 total values. The cell values are sorted + and the top {\em max-cell\/} values are output. Values + that fall below the limit are aggregated into the special + {\tt -:SKIPPED:-\/} and {\tt -:SKIPPED\_SUM:-\/} entries. +\end{description} + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\chapter{Data Storage} + +\section{XML Structure} + +A dataset XML file has the following structure: + +\begin{MyVerbatim} +<array name="dataset-name" dimensions="2" start_time="unix-seconds" + stop_time="unix-seconds"> + <dimension number="1" type="Label1"/> + <dimension number="2" type="Label2"/> + <data> + <Label1 val="D1-V1"> + <Label2 val="D2-V1" count="N1"/> + <Label2 val="D2-V2" count="N2"/> + <Label2 val="D2-V3" count="N3"/> + </Label1> + <Label1 val="D1-V2"> + <Label2 val="D2-V1" count="N1"/> + <Label2 val="D2-V2" count="N2"/> + <Label2 val="D2-V3" count="N3"/> + </Label1> + </data> +</array> +\end{MyVerbatim} + +\noindent +{\em dataset-name\/}, +{\em Label1\/}, and +{\em Label2\/} come from the dataset definition in {\em dsc.conf\/}. + +The {\em start\_time\/} and {\em stop\_time\/} attributes +are given in Unix seconds. They are normally 60-seconds apart. +{\tt dsc} usually starts a new measurement interval on 60 second +boundaries. That is: + +\begin{equation} +stop\_time \bmod{60} == 0 +\end{equation} + +The LABEL1 VAL attributes ({\em D1-V1\/}, {\em D1-V2\/}, etc) are +values for the first dimension indexer. +Similarly, the LABEL2 VAL attributes ({\em D2-V1\/}, {\em D2-V2\/}, +{\em D2-V3\/}) are values for the second dimension indexer. +For some indexers these +values are numeric, for others they are strings. If the value +contains certain non-printable characters, the string is base64-encoded +and the optional BASE64 attribute is set to 1. + +There are two special VALs that help keep large datasets down +to a reasonable size: {\tt -:SKIPPED:-\/} and {\tt -:SKIPPED\_SUM:-\/}. +These may be present on datasets that use the {\em min-count\/} +and {\em max-cells\/} parameters (see Section~\ref{sec-dataset-params}). 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Data Storage}

\section{XML Structure}

A dataset XML file has the following structure:

\begin{MyVerbatim}
<array name="dataset-name" dimensions="2" start_time="unix-seconds"
    stop_time="unix-seconds">
  <dimension number="1" type="Label1"/>
  <dimension number="2" type="Label2"/>
  <data>
    <Label1 val="D1-V1">
      <Label2 val="D2-V1" count="N1"/>
      <Label2 val="D2-V2" count="N2"/>
      <Label2 val="D2-V3" count="N3"/>
    </Label1>
    <Label1 val="D1-V2">
      <Label2 val="D2-V1" count="N1"/>
      <Label2 val="D2-V2" count="N2"/>
      <Label2 val="D2-V3" count="N3"/>
    </Label1>
  </data>
</array>
\end{MyVerbatim}

\noindent
{\em dataset-name\/},
{\em Label1\/}, and
{\em Label2\/} come from the dataset definition in {\em dsc.conf\/}.

The {\em start\_time\/} and {\em stop\_time\/} attributes
are given in Unix seconds.  They are normally 60 seconds apart.
{\tt dsc} usually starts a new measurement interval on 60-second
boundaries.  That is:

\begin{equation}
stop\_time \bmod 60 = 0
\end{equation}

The LABEL1 VAL attributes ({\em D1-V1\/}, {\em D1-V2\/}, etc.) are
values for the first-dimension indexer.
Similarly, the LABEL2 VAL attributes ({\em D2-V1\/}, {\em D2-V2\/},
{\em D2-V3\/}) are values for the second-dimension indexer.
For some indexers these
values are numeric; for others they are strings.  If a value
contains certain non-printable characters, the string is base64-encoded
and the optional BASE64 attribute is set to 1.

There are two special VALs that help keep large datasets down
to a reasonable size: {\tt -:SKIPPED:-\/} and {\tt -:SKIPPED\_SUM:-\/}.
These may be present in datasets that use the {\em min-count\/}
and {\em max-cells\/} parameters (see Section~\ref{sec-dataset-params}).
{\tt -:SKIPPED:-\/} is the number of cells that were not included
in the XML output.  {\tt -:SKIPPED\_SUM:-\/}, on the other hand, is the
sum of the counts for all the skipped cells.

Note that ``one-dimensional datasets'' still use two dimensions in
the XML file.  The first dimension type and value will be ``All'',
as shown in the example below.

The {\em count\/} values are always integers.  If the count for
a particular tuple is zero, it should not be included in the
XML file.

Note that the XML file does not indicate where it came from;
the server and node it was collected on are not named anywhere in
the file.  Instead, DSC relies on the presenter to store XML files
in a directory hierarchy with the server and node as directory names.

\noindent
Here is a short sample XML file with real content:
\begin{MyVerbatim}
<array name="rcode" dimensions="2" start_time="1154649600"
    stop_time="1154649660">
  <dimension number="1" type="All"/>
  <dimension number="2" type="Rcode"/>
  <data>
    <All val="ALL">
      <Rcode val="0" count="70945"/>
      <Rcode val="3" count="50586"/>
      <Rcode val="4" count="121"/>
      <Rcode val="1" count="56"/>
      <Rcode val="5" count="44"/>
    </All>
  </data>
</array>
\end{MyVerbatim}

\noindent
Please see
\path|http://dns.measurement-factory.com/tools/dsc/sample-xml/|
for more sample XML files.

The XML is not very strict and might cause XML purists to cringe.
{\tt dsc} writes the XML files the old-fashioned way (with printf())
and reads them with Perl's XML::Simple module.
Here is a possibly-valid DTD for the dataset XML format.
Note, however, that the {\em LABEL1\/}
and {\em LABEL2\/} strings are different
for each dataset:

\begin{MyVerbatim}
<!DOCTYPE ARRAY [

<!ELEMENT ARRAY (DIMENSION+, DATA)>
<!ELEMENT DIMENSION EMPTY>
<!ELEMENT DATA (LABEL1+)>
<!ELEMENT LABEL1 (LABEL2+)>
<!ELEMENT LABEL2 EMPTY>

<!ATTLIST ARRAY NAME CDATA #REQUIRED>
<!ATTLIST ARRAY DIMENSIONS CDATA #REQUIRED>
<!ATTLIST ARRAY START_TIME CDATA #REQUIRED>
<!ATTLIST ARRAY STOP_TIME CDATA #REQUIRED>
<!ATTLIST DIMENSION NUMBER CDATA #REQUIRED>
<!ATTLIST DIMENSION TYPE CDATA #REQUIRED>
<!ATTLIST LABEL1 VAL CDATA #REQUIRED>
<!ATTLIST LABEL2 VAL CDATA #REQUIRED>
<!ATTLIST LABEL2 COUNT CDATA #REQUIRED>

]>
\end{MyVerbatim}

\subsection{XML File Naming Conventions}

{\tt dsc\/} relies on certain file naming conventions for XML files.
The file name should be of the form:

\begin{quote}
{\em timestamp\/}.dscdata.xml
\end{quote}

\noindent
For example:

\begin{quote}
1154649660.dscdata.xml
\end{quote}

NOTE: Versions of DSC prior to 2008-01-30 used a different naming
convention.  Instead of ``dscdata,'' the XML file was named after
the dataset that generated the data.  The current XML extraction
code still supports the older naming convention for backward
compatibility.  If the second component of the XML file name is not
``dscdata,'' the extractor assumes it is a dataset name.

\noindent
Dataset names come from {\em dsc.conf\/}, and should match the NAME
attribute of the ARRAY tag inside the XML file.  The timestamp is in
Unix epoch seconds and is usually the same as the {\em stop\_time\/}
value.
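To illustrate the XML structure described above, here is a short Python
sketch (it is not part of {\dsc}; the presenter's own extractors are
Perl scripts) that walks one dataset XML file and prints every cell.
It assumes the optional base64 attribute is written as {\tt base64="1"}:

\begin{MyVerbatim}
#!/usr/bin/env python3
# Illustrative sketch: read one dsc dataset XML file and print its cells.
import base64
import sys
import xml.etree.ElementTree as ET

def value_of(element):
    # VAL attributes flagged with base64="1" are base64-encoded strings.
    val = element.get("val")
    if element.get("base64") == "1":
        val = base64.b64decode(val).decode("utf-8", "replace")
    return val

array = ET.parse(sys.argv[1]).getroot()       # the <array> element
print(array.get("name"), array.get("start_time"), array.get("stop_time"))
for d1 in array.find("data"):                 # first-dimension elements
    for d2 in d1:                             # second-dimension elements
        print(value_of(d1), value_of(d2), int(d2.get("count")))
\end{MyVerbatim}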
\section{JSON Structure}

The JSON structure mirrors the XML structure, so the elements are
the same:

\begin{MyVerbatim}
{
  "name": "dataset-name",
  "start_time": unix-seconds,
  "stop_time": unix-seconds,
  "dimensions": [ "Label1", "Label2" ],
  "data": [
    {
      "Label1": "D1-V1",
      "Label2": [
        { "val": "D2-V1", "count": N1 },
        { "val": "D2-V2", "count": N2 },
        { "val": "D2-V3", "count": N3 }
      ]
    },
    {
      "Label1": "D1-V1-base64",
      "base64": true,
      "Label2": [
        { "val": "D2-V1", "count": N1 },
        { "val": "D2-V2-base64", "base64": true, "count": N2 },
        { "val": "D2-V3", "count": N3 }
      ]
    }
  ]
}
\end{MyVerbatim}


\section{Archived Data Format}

{\dsc} actually uses four different file formats for archived
datasets.  These are all text-based and designed to be quickly
read and written by Perl scripts.

\subsection{Format 1}

\noindent
\begin{tt}time $k1$ $N_{k1}$ $k2$ $N_{k2}$ $k3$ $N_{k3}$ ...
\end{tt}

\vspace{1ex}\noindent
This is a one-dimensional time-series format.\footnote{Which means
it can only be used for datasets where one of the indexers is set
to the Null indexer.}  The first column is a timestamp (Unix seconds).
The remaining space-separated fields are key-value pairs.  For
example:

\begin{MyVerbatim}
1093219980 root-servers.net 122 rfc1918-ptr 112 a-for-a 926 funny-qclass 16
1093220040 root-servers.net 121 rfc1918-ptr 104 a-for-a 905 funny-qclass 15
1093220100 root-servers.net 137 rfc1918-ptr 116 a-for-a 871 funny-qclass 12
\end{MyVerbatim}

\subsection{Format 2}

\noindent
\begin{tt}time $j1$ $k1$:$N_{j1,k1}$:$k2$:$N_{j1,k2}$:... $j2$ $k1$:$N_{j2,k1}$:$k2$:$N_{j2,k2}$:... ...
\end{tt}

\vspace{1ex}\noindent
This is a two-dimensional time-series format.  In the above,
$j$ represents the first dimension indexer and $k$ represents
the second.  Key-value pairs for the second dimension are
separated by colons, rather than by spaces.  For example:

\begin{MyVerbatim}
1093220160 recv icmp:2397:udp:136712:tcp:428 sent icmp:819:udp:119191:tcp:323
1093220220 recv icmp:2229:udp:124708:tcp:495 sent icmp:716:udp:107652:tcp:350
1093220280 recv udp:138212:icmp:2342:tcp:499 sent udp:120788:icmp:819:tcp:364
1093220340 recv icmp:2285:udp:137107:tcp:468 sent icmp:733:udp:118522:tcp:341
\end{MyVerbatim}

\subsection{Format 3}

\noindent
\begin{tt}$k$ $N_{k}$
\end{tt}

\vspace{1ex}\noindent
This format is used for one-dimensional datasets where the key space
is (potentially) very large.  That is, putting all the key-value pairs
on a single line would result in a very long line in the datafile.
Furthermore, for these larger datasets, it is prohibitive to
store the data as a time series.  Instead, the counters are incremented
over time.  For example:

\begin{MyVerbatim}
10.0.160.0 3024
10.0.20.0 92
10.0.244.0 5934
\end{MyVerbatim}

\subsection{Format 4}

\noindent
\begin{tt}$j$ $k$ $N_{j,k}$
\end{tt}

\vspace{1ex}\noindent
This format is used for two-dimensional datasets where one or both
key spaces are very large.  Again, the counters are incremented over
time rather than stored as a time series.
For example:

\begin{MyVerbatim}
10.0.0.0 non-auth-tld 105
10.0.0.0 ok 37383
10.0.0.0 rfc1918-ptr 5941
10.0.0.0 root-servers.net 1872
10.0.1.0 a-for-a 6
10.0.1.0 non-auth-tld 363
10.0.1.0 ok 144
\end{MyVerbatim}
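All four formats are easy to parse with a few lines of code.  As an
illustration, here is a minimal sketch (in Python rather than the Perl
used by the {\dsc} presenter) that parses Format 2 lines, the most
involved of the four, into a timestamp and a nested table of counters:

\begin{MyVerbatim}
#!/usr/bin/env python3
# Illustrative sketch: parse "Format 2" archive lines, e.g.
#   1093220160 recv icmp:2397:udp:136712:tcp:428 sent icmp:819:...
import sys

def parse_format2(line):
    fields = line.split()
    timestamp = int(fields[0])
    table = {}
    # After the timestamp, fields alternate: a first-dimension key,
    # then its colon-separated key:count pairs for the second dimension.
    for j, pairs in zip(fields[1::2], fields[2::2]):
        parts = pairs.split(":")
        table[j] = {k: int(n) for k, n in zip(parts[0::2], parts[1::2])}
    return timestamp, table

for line in sys.stdin:
    print(parse_format2(line))
\end{MyVerbatim}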
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Bugs}

\begin{itemize}

\item
	It seems too confusing to have an opaque name for indexers on the
	dsc.conf dataset line.  The names are pre-determined anyway,
	since they must match what the XML extractors look for.

\item
	It is also redundant to have indexer names and a separate ``Label''
	for the XML file.

\item
	The {\dsc} Perl modules are installed in the ``site\_perl'' directory,
	but they should probably be installed under /usr/local/dsc.

\item
	The {\dsc} collector silently drops UDP fragments.

\end{itemize}

\end{document}