Merge branch 'maint-0.2.1' into release-0.2.1

Roger Dingledine 2011-02-22 14:52:38 -05:00
commit f59cad68dc
92 changed files with 19 additions and 20783 deletions

5
changes/torspec.git Normal file

@ -0,0 +1,5 @@
o Packaging changes:
- Stop shipping the Tor specs files and development proposal documents
in the tarball. They are now in a separate git repository at
git://git.torproject.org/torspec.git


@ -2,11 +2,11 @@
EXTRA_DIST = HACKING \
tor-resolve.1 tor-gencert.1 \
tor-osx-dmg-creation.txt tor-rpm-creation.txt \
tor-win32-mingw-creation.txt
tor-win32-mingw-creation.txt spec/README
man_MANS = tor.1 tor-resolve.1 tor-gencert.1
SUBDIRS = design-paper spec
SUBDIRS = design-paper
DIST_SUBDIRS = design-paper spec
DIST_SUBDIRS = design-paper


@ -1,5 +0,0 @@
EXTRA_DIST = tor-spec.txt rend-spec.txt control-spec.txt \
dir-spec.txt socks-extensions.txt path-spec.txt \
version-spec.txt address-spec.txt

11
doc/spec/README Normal file

@ -0,0 +1,11 @@
The Tor specifications and proposals have moved to a new repository.
To browse the specifications, go to
https://gitweb.torproject.org/torspec.git/tree
To check out the specification repository, run
git clone git://git.torproject.org/torspec.git
For other information on the repository, see
https://gitweb.torproject.org/torspec.git


@ -1,68 +0,0 @@
$Id$
Special Hostnames in Tor
Nick Mathewson
1. Overview
Most of the time, Tor treats user-specified hostnames as opaque: When
the user connects to www.torproject.org, Tor picks an exit node and uses
that node to connect to "www.torproject.org". Some hostnames, however,
can be used to override Tor's default behavior and circuit-building
rules.
These hostnames can be passed to Tor as the address part of a SOCKS4a or
SOCKS5 request. If the application is connected to Tor using an IP-only
method (such as SOCKS4, TransPort, or NatdPort), these hostnames can be
substituted for certain IP addresses using the MapAddress configuration
option or the MAPADDRESS control command.
2. .exit
SYNTAX: [hostname].[name-or-digest].exit
[name-or-digest].exit
Hostname is a valid hostname; [name-or-digest] is either the nickname of a
Tor node or the hex-encoded digest of that node's public key.
When Tor sees an address in this format, it uses the node named by
[name-or-digest] as the exit node and connects from it to the specified
hostname. If no "hostname" component is given, Tor defaults to the
published IPv4 address of the exit node.
It is valid to try to resolve hostnames, and in fact upon success Tor
will cache an internal mapaddress of the form
"www.google.com.foo.exit=64.233.161.99.foo.exit" to speed subsequent
lookups.
EXAMPLES:
www.example.com.exampletornode.exit
Connect to www.example.com from the node called "exampletornode."
exampletornode.exit
Connect to the published IP address of "exampletornode" using
"exampletornode" as the exit.
3. .onion
SYNTAX: [digest].onion
The digest is the first eighty bits of a SHA1 hash of the identity key for
a hidden service, encoded in base32.
When Tor sees an address in this format, it tries to look up and connect to
the specified hidden service. See rend-spec.txt for full details.
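As an illustration only, the digest described above can be derived as in the
following Python sketch. The function name onion_address is hypothetical, the
lowercase output is an assumption, and the exact byte encoding of the identity
key that gets hashed is defined in rend-spec.txt, not here; the input below is
a placeholder.

```python
import base64
import hashlib


def onion_address(identity_key: bytes) -> str:
    """Return the .onion name for an already-serialized identity key.

    The serialization of the key is an assumption left to the caller;
    see rend-spec.txt for the real encoding.
    """
    digest = hashlib.sha1(identity_key).digest()
    first_80_bits = digest[:10]  # 80 bits = 10 octets
    # 10 octets encode to exactly 16 base32 characters, with no padding.
    label = base64.b32encode(first_80_bits).decode("ascii").lower()
    return label + ".onion"


if __name__ == "__main__":
    print(onion_address(b"placeholder key bytes"))
```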
4. .noconnect
SYNTAX: [string].noconnect
When Tor sees an address in this format, it immediately closes the
connection without attaching it to any circuit. This is useful for
controllers that want to test whether a given application is indeed using
the same instance of Tor that they're controlling.
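The three special suffixes in sections 2 through 4 could be told apart as in
the following Python sketch. It is not taken from the Tor source: the function
name classify_special_hostname and the returned tuple layout are hypothetical,
and lowercasing the address before matching is an assumption.

```python
from typing import Optional, Tuple


def classify_special_hostname(
    address: str,
) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (kind, hostname, selector) for an address given to Tor.

    kind is "exit", "onion", "noconnect", or "ordinary".
    """
    labels = address.lower().split(".")
    suffix = labels[-1]
    if suffix == "exit" and len(labels) >= 2 and labels[-2]:
        # [hostname].[name-or-digest].exit or [name-or-digest].exit
        hostname = ".".join(labels[:-2]) or None
        return ("exit", hostname, labels[-2])
    if suffix == "onion" and len(labels) == 2 and labels[0]:
        # [digest].onion
        return ("onion", None, labels[0])
    if suffix == "noconnect" and len(labels) >= 2:
        # [string].noconnect: close without attaching to a circuit
        return ("noconnect", None, None)
    return ("ordinary", address, None)


if __name__ == "__main__":
    for name in ("www.example.com.exampletornode.exit",
                 "exampletornode.exit", "www.torproject.org"):
        print(name, "->", classify_special_hostname(name))
```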
5. [XXX Is there a ".virtual" address that we expose too, or is that
just intended to be internal? -RD]


@ -1,250 +0,0 @@
$Id$
Tor bridges specification
0. Preface
This document describes the design decisions around support for bridge
users, bridge relays, and bridge authorities. It acts as an overview
of the bridge design and deployment for developers, and it also tries
to point out limitations in the current design and implementation.
For more details on what all of these mean, look at blocking.tex in
/doc/design-paper/
1. Bridge relays
Bridge relays are just like normal Tor relays except they don't publish
their server descriptors to the main directory authorities.
1.1. PublishServerDescriptor
To configure your relay to be a bridge relay, just add
BridgeRelay 1
PublishServerDescriptor bridge
to your torrc. This will cause your relay to publish its descriptor
to the bridge authorities rather than to the default authorities.
Alternatively, you can say
BridgeRelay 1
PublishServerDescriptor 0
which will cause your relay to not publish anywhere. This could be
useful for private bridges.
1.2. Recommendations.
Bridge relays should use an exit policy of "reject *:*". This is
because they only need to relay traffic between the bridge users
and the rest of the Tor network, so there's no need to let people
exit directly from them.
We invented the RelayBandwidth* options for this situation: Tor clients
who want to allow relaying too. See proposal 111 for details. Relay
operators should feel free to rate-limit their relayed traffic.
1.3. Implementation note.
Vidalia 0.0.15 has turned its "Relay" settings page into a tri-state
"Don't relay" / "Relay for the Tor network" / "Help censored users".
If you click the third choice, it forces your exit policy to reject *:*.
If all the bridges end up on port 9001, that's not so good. On the
other hand, putting the bridges on a low-numbered port in the Unix
world requires jumping through extra hoops. The current compromise is
that Vidalia makes the ORPort default to 443 on Windows, and 9001 on
other platforms.
At the bottom of the relay config settings window, Vidalia displays
the bridge identifier to the operator (see Section 3.1) so he can pass
it on to bridge users.
2. Bridge authorities.
Bridge authorities are like normal v3 directory authorities, except
they don't create their own network-status documents or votes. So if
you ask a bridge authority for a network-status document or consensus,
they behave like a directory mirror: they give you one from one of
the main authorities. But if you ask the bridge authority for the
descriptor corresponding to a particular identity fingerprint, it will
happily give you the latest descriptor for that fingerprint.
To become a bridge authority, add these lines to your torrc:
AuthoritativeDirectory 1
BridgeAuthoritativeDir 1
Right now there's one bridge authority, running on the Tonga relay.
2.1. Exporting bridge-purpose descriptors
We've added a new purpose for server descriptors: the "bridge"
purpose. With the new router-descriptors file format that includes
annotations, it's easy to look through it and find the bridge-purpose
descriptors.
Currently we export the bridge descriptors from Tonga to the
BridgeDB server, so it can give them out according to the policies
in blocking.pdf.
2.2. Reachability/uptime testing
Right now the bridge authorities do active reachability testing of
bridges, so we know which ones to recommend for users.
But in the design document, we suggested that bridges should publish
anonymously (i.e. via Tor) to the bridge authority, so somebody watching
the bridge authority can't just enumerate all the bridges. But if we're
doing active measurement, the game is up. Perhaps we should back off on
this goal, or perhaps we should do our active measurement anonymously?
Answering this issue is scheduled for 0.2.1.x.
2.3. Future work: migrating to multiple bridge authorities
Having only one bridge authority is both a trust bottleneck (if you
break into one place you learn about every single bridge we've got)
and a robustness bottleneck (when it's down, bridge users become sad).
Right now if we put up a second bridge authority, all the bridges would
publish to it, and (assuming the code works) bridge users would query
a random bridge authority. This resolves the robustness bottleneck,
but makes the trust bottleneck even worse.
In 0.2.2.x and later we should think about better ways to have multiple
bridge authorities.
3. Bridge users.
Bridge users are like ordinary Tor users except they use encrypted
directory connections by default, and they use bridge relays as both
entry guards (their first hop) and directory guards (the source of
all their directory information).
To become a bridge user, add the following line to your torrc:
UseBridges 1
and then add at least one "Bridge" line to your torrc based on the
format below.
3.1. Format of the bridge identifier.
The canonical format for a bridge identifier contains an IP address,
an ORPort, and an identity fingerprint:
bridge 128.31.0.34:9009 4C17 FB53 2E20 B2A8 AC19 9441 ECD2 B017 7B39 E4B1
However, the identity fingerprint can be left out, in which case the
bridge user will connect to that relay and use it as a bridge regardless
of what identity key it presents:
bridge 128.31.0.34:9009
This might be useful for cases where only short bridge identifiers
can be communicated to bridge users.
In a future version we may also support bridge identifiers that are
only a key fingerprint:
bridge 4C17 FB53 2E20 B2A8 AC19 9441 ECD2 B017 7B39 E4B1
and the bridge user can fetch the latest descriptor from the bridge
authority (see Section 3.4).
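A minimal Python sketch of parsing the three identifier forms above follows.
The names Bridge and parse_bridge_line are hypothetical. The sketch also
accepts the fingerprint-only form, which the text describes only as a possible
future feature, so treat that branch as an assumption.

```python
import re
from typing import NamedTuple, Optional


class Bridge(NamedTuple):
    address: Optional[str]      # IP address, or None if only a fingerprint
    port: Optional[int]         # ORPort, or None if only a fingerprint
    fingerprint: Optional[str]  # 40 hex digits without spaces, or None


def parse_bridge_line(line: str) -> Bridge:
    words = line.split()
    if not words or words[0].lower() != "bridge":
        raise ValueError("not a bridge line")
    words = words[1:]
    address = None
    port = None
    if words and ":" in words[0]:
        host, _, port_text = words[0].rpartition(":")
        address, port = host, int(port_text)
        if not 0 < port < 65536:
            raise ValueError("ORPort out of range")
        words = words[1:]
    # The fingerprint may be written in space-separated groups of four.
    fingerprint = "".join(words).upper() or None
    if fingerprint is not None and not re.fullmatch(r"[0-9A-F]{40}", fingerprint):
        raise ValueError("fingerprint must be 40 hex digits")
    if address is None and fingerprint is None:
        raise ValueError("bridge line needs an address or a fingerprint")
    return Bridge(address, port, fingerprint)


if __name__ == "__main__":
    print(parse_bridge_line(
        "bridge 128.31.0.34:9009 "
        "4C17 FB53 2E20 B2A8 AC19 9441 ECD2 B017 7B39 E4B1"))
```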
3.2. Bridges as entry guards
For now, bridge users add their bridge relays to their list of "entry
guards" (see path-spec.txt for background on entry guards). They are
managed by the entry guard algorithms exactly as if they were a normal
entry guard -- their keys and timing get cached in the "state" file,
etc. This means that when the Tor user starts up with "UseBridges"
disabled, he will skip past the bridge entries since they won't be
listed as up and usable in his networkstatus consensus. But to be clear,
the "entry_guards" list doesn't currently distinguish guards by purpose.
Internally, each bridge user keeps a smartlist of "bridge_info_t"
that reflects the "bridge" lines from his torrc along with a download
schedule (see Section 3.5 below). When he starts Tor, he attempts
to fetch a descriptor for each configured bridge (see Section 3.4
below). When he succeeds at getting a descriptor for one of the bridges
in his list, he adds it directly to the entry guard list using the
normal add_an_entry_guard() interface. Once a bridge descriptor has
been added, should_delay_dir_fetches() will stop delaying further
directory fetches, and the user begins to bootstrap his directory
information from that bridge (see Section 3.3).
Currently bridge users cache their bridge descriptors to the
"cached-descriptors" file (annotated with purpose "bridge"), but
they don't make any attempt to reuse descriptors they find in this
file. The theory is that either the bridge is available now, in which
case you can get a fresh descriptor, or it's not, in which case an
old descriptor won't do you much good.
We could disable writing out the bridge lines to the state file, if
we think this is a problem.
As an exception, if we get an application request when we have one
or more bridge descriptors but we believe none of them are running,
we mark them all as running again. This is similar to the exception
already in place to help long-idle Tor clients realize they should
fetch fresh directory information rather than just refuse requests.
3.3. Bridges as directory guards
In addition to using bridges as the first hop in their circuits, bridge
users also use them to fetch directory updates. Other than initial
bootstrapping to find a working bridge descriptor (see Section 3.4
below), all further non-anonymized directory fetches will be redirected
to the bridge.
This means that bridge relays need to have cached answers for all
questions the bridge user might ask. This makes the upgrade path
tricky --- for example, if we migrate to a v4 directory design, the
bridge user would need to keep using v3 so long as his bridge relays
only knew how to answer v3 queries.
In a future design, for cases where the user has enough information
to build circuits yet the chosen bridge doesn't know how to answer a
given query, we might teach bridge users to make an anonymized request
to a more suitable directory server.
3.4. How bridge users get their bridge descriptor
Bridge users can fetch bridge descriptors in two ways: by going directly
to the bridge and asking for "/tor/server/authority", or by going to
the bridge authority and asking for "/tor/server/fp/ID". By default,
they will only try the direct queries. If the user sets
UpdateBridgesFromAuthority 1
in his config file, then he will try querying the bridge authority
first for bridges where he knows a digest (if he only knows an IP
address and ORPort, then his only option is a direct query).
If the user has at least one working bridge, then he will do further
queries to the bridge authority through a full three-hop Tor circuit.
But when bootstrapping, he will make a direct begin_dir-style connection
to the bridge authority.
As of Tor 0.2.0.10-alpha, if the user attempts to fetch a descriptor
from the bridge authority and it returns a 404 not found, the user
will automatically fall back to trying a direct query. Therefore it is
recommended that bridge users always set UpdateBridgesFromAuthority,
since at worst it will delay their fetches a little bit and notify
the bridge authority of the identity fingerprint (but not location)
of their intended bridges.
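The order of queries described in this section can be summarized in a short
Python sketch. The function name and the "authority"/"direct" labels are
hypothetical; only the two URL paths come from the text above.

```python
from typing import List, Optional, Tuple


def descriptor_fetch_order(
    fingerprint: Optional[str],
    update_bridges_from_authority: bool,
) -> List[Tuple[str, str]]:
    """Return (where, resource) pairs in the order they would be tried."""
    direct = ("direct", "/tor/server/authority")
    if update_bridges_from_authority and fingerprint is not None:
        # Ask the bridge authority first; on a 404, fall back to the
        # bridge itself (the fallback described for 0.2.0.10-alpha).
        return [("authority", "/tor/server/fp/" + fingerprint), direct]
    # Default, or no known digest: only a direct query is possible.
    return [direct]


if __name__ == "__main__":
    print(descriptor_fetch_order("4C17FB532E20B2A8AC199441ECD2B0177B39E4B1", True))
```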
3.5. Bridge descriptor retry schedule
Bridge users try to fetch a descriptor for each bridge (using the
steps in Section 3.4 above) on startup. Whenever they receive a
bridge descriptor, they reschedule a new descriptor download for 1
hour from then.
If on the other hand it fails, they try again after 15 minutes for the
first attempt, after 15 minutes for the second attempt, and after 60
minutes for subsequent attempts.
In 0.2.2.x we should come up with some smarter retry schedules.
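The schedule above can be written as a small Python function. This is a
sketch of one reading of the text: it assumes "first attempt" and "second
attempt" refer to the retries after the first and second consecutive
failures. The function name is hypothetical.

```python
def next_fetch_delay_minutes(consecutive_failures: int) -> int:
    """Minutes to wait before the next descriptor fetch for one bridge.

    consecutive_failures is 0 right after a successful fetch.
    """
    if consecutive_failures < 0:
        raise ValueError("failure count cannot be negative")
    if consecutive_failures == 0:
        return 60   # success: refresh one hour later
    if consecutive_failures <= 2:
        return 15   # first and second retry
    return 60       # subsequent retries


if __name__ == "__main__":
    print([next_fetch_delay_minutes(n) for n in range(5)])
```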
3.6. Implementation note.
Vidalia 0.1.0 has a new checkbox in its Network config window called
"My ISP blocks connections to the Tor network." Users who click that
box change their configuration to:
UseBridges 1
UpdateBridgesFromAuthority 1
and should add at least one bridge identifier.


@ -1,499 +0,0 @@
$Id$
TC: A Tor control protocol (Version 0)
-1. Deprecation
THIS PROTOCOL IS DEPRECATED. It is still documented here because Tor
0.1.1.x happens to support much of it; but the support for v0 is not
maintained, so you should expect it to rot in unpredictable ways. Support
for v0 will be removed some time after Tor 0.1.2.
0. Scope
This document describes an implementation-specific protocol that is used
for other programs (such as frontend user-interfaces) to communicate
with a locally running Tor process. It is not part of the Tor onion
routing protocol.
We're trying to be pretty extensible here, but not infinitely
forward-compatible.
1. Protocol outline
TC is a bidirectional message-based protocol. It assumes an underlying
stream for communication between a controlling process (the "client") and
a Tor process (the "server"). The stream may be implemented via TCP,
TLS-over-TCP, a Unix-domain socket, or so on, but it must provide
reliable in-order delivery. For security, the stream should not be
accessible by untrusted parties.
In TC, the client and server send typed variable-length messages to each
other over the underlying stream. By default, all messages from the server
are in response to messages from the client. Some client requests, however,
will cause the server to send messages to the client indefinitely far into
the future.
Servers respond to messages in the order they're received.
2. Message format
The messages take the following format:
Length [2 octets; big-endian]
Type [2 octets; big-endian]
Body [Length octets]
Upon encountering a recognized Type, implementations behave as described in
section 3 below. If the type is not recognized, servers respond with an
"ERROR" message (code UNRECOGNIZED; see 3.1 below), and clients simply ignore
the message.
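The framing above (a 2-octet length, a 2-octet type, then the body) can be
sketched in Python as follows. The function names are hypothetical, and the
decoder's convention of returning None for an incomplete buffer is an
assumption made for the example.

```python
import struct
from typing import Optional, Tuple

_HEADER = struct.Struct(">HH")  # Length, Type; both big-endian


def encode_message(msg_type: int, body: bytes = b"") -> bytes:
    if len(body) > 0xFFFF:
        raise ValueError("body too long for a single message")
    return _HEADER.pack(len(body), msg_type) + body


def decode_message(buf: bytes) -> Optional[Tuple[int, bytes, bytes]]:
    """Return (type, body, remaining bytes), or None if buf is incomplete."""
    if len(buf) < _HEADER.size:
        return None
    length, msg_type = _HEADER.unpack_from(buf)
    end = _HEADER.size + length
    if len(buf) < end:
        return None
    return msg_type, buf[_HEADER.size:end], buf[end:]


if __name__ == "__main__":
    wire = encode_message(0x0003, b"ORPort\n")
    print(wire.hex(), decode_message(wire))
```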
2.1. Types and encodings
All numbers are given in big-endian (network) order.
OR identities are given in hexadecimal, in the same format as identity key
fingerprints, but without spaces; see tor-spec.txt for more information.
3. Message types
Message types are drawn from the following ranges:
0x0000-0xEFFF : Reserved for use by official versions of this spec.
0xF000-0xFFFF : Unallocated; usable by unofficial extensions.
3.1. ERROR (Type 0x0000)
Sent in response to a message that could not be processed as requested.
The body of the message begins with a 2-byte error code. The following
values are defined:
0x0000 Unspecified error
[]
0x0001 Internal error
[Something went wrong inside Tor, so that the client's
request couldn't be fulfilled.]
0x0002 Unrecognized message type
[The client sent a message type we don't understand.]
0x0003 Syntax error
[The client sent a message body in a format we can't parse.]
0x0004 Unrecognized configuration key
[The client tried to get or set a configuration option we don't
recognize.]
0x0005 Invalid configuration value
[The client tried to set a configuration option to an
incorrect, ill-formed, or impossible value.]
0x0006 Unrecognized byte code
[The client tried to set a byte code (in the body) that
we don't recognize.]
0x0007 Unauthorized.
[The client tried to send a command that requires
authorization, but it hasn't sent a valid AUTHENTICATE
message.]
0x0008 Failed authentication attempt
[The client sent a well-formed authorization message, but the
server did not accept it.]
0x0009 Resource exhausted
[The server didn't have enough of a given resource to
fulfill a given request.]
0x000A No such stream
0x000B No such circuit
0x000C No such OR
The rest of the body should be a human-readable description of the error.
In general, new error codes should only be added when they don't fall under
one of the existing error codes.
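A controller might decode an ERROR body as in this Python sketch. The names
are hypothetical. The text does not say whether the description is
NUL-terminated, so stripping a trailing NUL is an assumption.

```python
import struct
from typing import Tuple

ERROR_CODES = {
    0x0000: "Unspecified error",
    0x0001: "Internal error",
    0x0002: "Unrecognized message type",
    0x0003: "Syntax error",
    0x0004: "Unrecognized configuration key",
    0x0005: "Invalid configuration value",
    0x0006: "Unrecognized byte code",
    0x0007: "Unauthorized",
    0x0008: "Failed authentication attempt",
    0x0009: "Resource exhausted",
    0x000A: "No such stream",
    0x000B: "No such circuit",
    0x000C: "No such OR",
}


def parse_error_body(body: bytes) -> Tuple[int, str, str]:
    """Return (code, name of the code, human-readable description)."""
    if len(body) < 2:
        raise ValueError("ERROR body must start with a 2-byte code")
    (code,) = struct.unpack_from(">H", body)
    # Assumption: tolerate, but do not require, a trailing NUL.
    description = body[2:].rstrip(b"\x00").decode("utf-8", "replace")
    return code, ERROR_CODES.get(code, "Unknown error code"), description


if __name__ == "__main__":
    print(parse_error_body(b"\x00\x04No such option"))
```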
3.2. DONE (Type 0x0001)
Sent from server to client in response to a request that was successfully
completed, with no more information needed. The body is usually empty but
may contain a message.
3.3. SETCONF (Type 0x0002)
Change the value of a configuration variable. The body contains a list of
newline-terminated key-value configuration lines. An individual key-value
configuration line consists of the key, followed by a space, followed by
the value. The server behaves as though it had just read the key-value pair
in its configuration file.
The server responds with a DONE message on success, or an ERROR message on
failure.
When a configuration option takes multiple values, or when multiple
configuration keys form a context-sensitive group (see below), then
setting _any_ of the options in a SETCONF command is taken to reset all of
the others. For example, if two ORBindAddress values are configured,
and a SETCONF command arrives containing a single ORBindAddress value, the
new command's value replaces the two old values.
To _remove_ all settings for a given option entirely (and go back to its
default value), send a single line containing the key and no value.
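A SETCONF body could be assembled as in the Python sketch below. The function
name is hypothetical, and using None to mean "reset this option to its
default" is a convention of the example, not of the protocol.

```python
from typing import Iterable, Optional, Tuple


def build_setconf_body(settings: Iterable[Tuple[str, Optional[str]]]) -> bytes:
    """Build newline-terminated "key value" lines for a SETCONF message."""
    lines = []
    for key, value in settings:
        if not key or " " in key or "\n" in key:
            raise ValueError("bad configuration key: %r" % (key,))
        if value is None:
            # Key with no value: remove all settings for this option.
            lines.append(key + "\n")
        else:
            if "\n" in value:
                raise ValueError("value may not contain a newline")
            lines.append(key + " " + value + "\n")
    return "".join(lines).encode("ascii")


if __name__ == "__main__":
    print(build_setconf_body([("ORBindAddress", "127.0.0.1:9001")]))
```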
3.4. GETCONF (Type 0x0003)
Request the value of a configuration variable. The body contains one or
more NL-terminated strings for configuration keys. The server replies
with a CONFVALUE message.
If an option appears multiple times in the configuration, all of its
key-value pairs are returned in order.
Some options are context-sensitive, and depend on other options with
different keywords. These cannot be fetched directly. Currently there
is only one such option: clients should use the "HiddenServiceOptions"
virtual keyword to get all HiddenServiceDir, HiddenServicePort,
HiddenServiceNodes, and HiddenServiceExcludeNodes option settings.
3.5. CONFVALUE (Type 0x0004)
Sent in response to a GETCONF message; contains a list of "Key Value\n"
(A non-whitespace keyword, a single space, a non-NL value, a NL)
strings.
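The "Key Value\n" format can be parsed as in this Python sketch. The function
name is hypothetical. It returns a list rather than a dictionary so that an
option appearing several times keeps all of its values in order.

```python
from typing import List, Tuple


def parse_confvalue_body(body: bytes) -> List[Tuple[str, str]]:
    """Split a CONFVALUE body into (key, value) pairs, preserving order."""
    pairs = []
    for line in body.decode("ascii").split("\n"):
        if not line:
            continue  # the text after the final NL is empty
        # The key is everything before the first space.
        key, _, value = line.partition(" ")
        pairs.append((key, value))
    return pairs


if __name__ == "__main__":
    print(parse_confvalue_body(b"ORPort 9001\nNickname example\n"))
```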
3.6. SETEVENTS (Type 0x0005)
Request the server to inform the client about interesting events.
The body contains a list of 2-byte event codes (see "event" below).
Any events *not* listed in the SETEVENTS body are turned off; thus, sending
SETEVENTS with an empty body turns off all event reporting.
The server responds with a DONE message on success, and an ERROR message
if one of the event codes isn't recognized. (On error, the list of active
event codes isn't changed.)
3.7. EVENT (Type 0x0006)
Sent from the server to the client when an event has occurred and the
client has requested that kind of event. The body contains a 2-byte
event code followed by additional event-dependent information. Event
codes are:
0x0001 -- Circuit status changed
Status [1 octet]
0x00 Launched - circuit ID assigned to new circuit
0x01 Built - all hops finished, can now accept streams
0x02 Extended - one more hop has been completed
0x03 Failed - circuit closed (was not built)
0x04 Closed - circuit closed (was built)
Circuit ID [4 octets]
(Must be unique to Tor process/time)
Path [NUL-terminated comma-separated string]
(For extended/failed, is the portion of the path that is
built)
0x0002 -- Stream status changed
Status [1 octet]
(Sent connect=0,sent resolve=1,succeeded=2,failed=3,
closed=4, new connection=5, new resolve request=6,
stream detached from circuit and still retriable=7)
Stream ID [4 octets]
(Must be unique to Tor process/time)
Target [NUL-terminated address-port string]
0x0003 -- OR Connection status changed
Status [1 octet]
(Launched=0,connected=1,failed=2,closed=3)
OR nickname/identity [NUL-terminated]
0x0004 -- Bandwidth used in the last second
Bytes read [4 octets]
Bytes written [4 octets]
0x0005 -- Notice/warning/error occurred
Message [NUL-terminated]
<obsolete: use 0x0007-0x000B instead.>
0x0006 -- New descriptors available
OR List [NUL-terminated, comma-delimited list of
OR identity]
0x0007 -- Debug message occurred
0x0008 -- Info message occurred
0x0009 -- Notice message occurred
0x000A -- Warning message occurred
0x000B -- Error message occurred
Message [NUL-terminated]
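The following Python sketch builds a SETEVENTS body (section 3.6) and decodes
two of the EVENT bodies listed above. All names are hypothetical, and only
the circuit-status and bandwidth events are decoded; other codes are returned
undecoded.

```python
import struct
from typing import Dict, Iterable

EVENT_CIRCUIT_STATUS = 0x0001
EVENT_BANDWIDTH = 0x0004
CIRCUIT_STATUS = {0: "launched", 1: "built", 2: "extended",
                  3: "failed", 4: "closed"}


def build_setevents_body(event_codes: Iterable[int]) -> bytes:
    """Concatenate 2-byte big-endian event codes; empty turns events off."""
    return b"".join(struct.pack(">H", code) for code in event_codes)


def parse_event_body(body: bytes) -> Dict[str, object]:
    (code,) = struct.unpack_from(">H", body, 0)
    if code == EVENT_CIRCUIT_STATUS:
        # Status [1 octet], Circuit ID [4 octets], Path [NUL-terminated]
        status, circuit_id = struct.unpack_from(">BI", body, 2)
        path = body[7:].split(b"\x00", 1)[0].decode("ascii")
        return {"event": "circuit",
                "status": CIRCUIT_STATUS.get(status, "unknown"),
                "circuit_id": circuit_id,
                "path": path.split(",") if path else []}
    if code == EVENT_BANDWIDTH:
        bytes_read, bytes_written = struct.unpack_from(">II", body, 2)
        return {"event": "bandwidth",
                "read": bytes_read, "written": bytes_written}
    return {"event": "other", "code": code, "payload": body[2:]}


if __name__ == "__main__":
    print(parse_event_body(struct.pack(">HBI", 1, 1, 7) + b"a,b\x00"))
```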
3.8. AUTHENTICATE (Type 0x0007)
Sent from the client to the server. Contains a 'magic cookie' to prove
that client is really allowed to control this Tor process. The server
responds with DONE or ERROR.
The format of the 'cookie' is implementation-dependent; see 4.1 below for
information on how the standard Tor implementation handles it.
3.9. SAVECONF (Type 0x0008)
Sent from the client to the server. Instructs the server to write out
its config options into its torrc. Server returns DONE if successful, or
ERROR if it can't write the file or some other error occurs.
3.10. SIGNAL (Type 0x0009)
Sent from the client to the server. The body contains one byte that
indicates the action the client wishes the server to take.
1 (0x01) -- Reload: reload config items, refetch directory.
2 (0x02) -- Controlled shutdown: if server is an OP, exit immediately.
If it's an OR, close listeners and exit after 30 seconds.
10 (0x0A) -- Dump stats: log information about open connections and
circuits.
12 (0x0C) -- Debug: switch all open logs to loglevel debug.
15 (0x0F) -- Immediate shutdown: clean up and exit now.
The server responds with DONE if the signal is recognized (or simply
closes the socket if it was asked to close immediately), else ERROR.
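A complete SIGNAL message is small enough to show in full. In the Python
sketch below, the symbolic names in SIGNALS are hypothetical labels chosen
for the example; only the numeric values come from the list above.

```python
import struct

TYPE_SIGNAL = 0x0009

# Hypothetical names for the byte values listed in this section.
SIGNALS = {
    "reload": 0x01,
    "controlled-shutdown": 0x02,
    "dump-stats": 0x0A,
    "debug": 0x0C,
    "immediate-shutdown": 0x0F,
}


def build_signal_message(name: str) -> bytes:
    """Return the full wire message: Length, Type, one-byte body."""
    body = bytes([SIGNALS[name]])
    return struct.pack(">HH", len(body), TYPE_SIGNAL) + body


if __name__ == "__main__":
    print(build_signal_message("reload").hex())
```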
3.11. MAPADDRESS (Type 0x000A)
Sent from the client to the server. The body contains a sequence of
address mappings, each consisting of the address to be mapped, a single
space, the replacement address, and a NL character.
Addresses may be IPv4 addresses, IPv6 addresses, or hostnames.
The client sends this message to the server in order to tell it that future
SOCKS requests for connections to the original address should be replaced
with connections to the specified replacement address. If the addresses
are well-formed, and the server is able to fulfill the request, the server
replies with a single DONE message containing the source and destination
addresses. If the request is malformed, the server replies with a syntax
error message. If the server can't fulfill the request, it replies with an
internal ERROR message.
The client may decline to provide a body for the original address, and
instead send a special null address ("0.0.0.0" for IPv4, "::0" for IPv6, or
"." for hostname), signifying that the server should choose the original
address itself, and return that address in the DONE message. The server
should ensure that it returns an element of address space that is unlikely
to be in actual use. If there is already an address mapped to the
destination address, the server may reuse that mapping.
If the original address is already mapped to a different address, the old
mapping is removed. If the original address and the destination address
are the same, the server removes any mapping in place for the original
address.
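A MAPADDRESS body, including the null-address form, might be built as in this
Python sketch. The names NULL_ADDRESSES and build_mapaddress_body are
hypothetical.

```python
from typing import Iterable, Tuple

# Special "let the server choose" originals, as listed above.
NULL_ADDRESSES = {"ipv4": "0.0.0.0", "ipv6": "::0", "hostname": "."}


def build_mapaddress_body(mappings: Iterable[Tuple[str, str]]) -> bytes:
    """Each mapping is (original address, replacement address)."""
    lines = []
    for original, replacement in mappings:
        for part in (original, replacement):
            if not part or " " in part or "\n" in part:
                raise ValueError("bad address: %r" % (part,))
        lines.append(original + " " + replacement + "\n")
    return "".join(lines).encode("ascii")


if __name__ == "__main__":
    # Ask the server to pick an unused IPv4 address for a hostname.
    print(build_mapaddress_body([(NULL_ADDRESSES["ipv4"], "www.example.com")]))
```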
{Note: This feature is designed to be used to help Tor-ify applications
that need to use SOCKS4 or hostname-less SOCKS5. There are three
approaches to doing this:
1. Somehow make them use SOCKS4a or SOCKS5-with-hostnames instead.
2. Use tor-resolve (or another interface to Tor's resolve-over-SOCKS
feature) to resolve the hostname remotely. This doesn't work
with special addresses like x.onion or x.y.exit.
3. Use MAPADDRESS to map an IP address to the desired hostname, and then
arrange to fool the application into thinking that the hostname
has resolved to that IP.
This functionality is designed to help implement the 3rd approach.}
[XXXX When, if ever, can mappings expire? Should they expire?]
[XXXX What addresses, if any, are safe to use?]
3.12 GETINFO (Type 0x000B)
Sent from the client to the server. The message body is as for GETCONF:
one or more NL-terminated strings. The server replies with an INFOVALUE
message.
Unlike GETCONF, this message is used for data that are not stored in the
Tor configuration file.
Recognized keys and their values include:
"version" -- The version of the server's software, including the name
of the software. (example: "Tor 0.0.9.4")
"desc/id/<OR identity>" or "desc/name/<OR nickname>" -- the latest server
descriptor for a given OR, NUL-terminated. If no such OR is known, the
corresponding value is an empty string.
"network-status" -- a space-separated list of all known OR identities.
This is in the same format as the router-status line in directories;
see tor-spec.txt for details.
"addr-mappings/all"
"addr-mappings/config"
"addr-mappings/cache"
"addr-mappings/control" -- a NL-terminated list of address mappings, each
in the form of "from-address" SP "to-address". The 'config' key
returns those address mappings set in the configuration; the 'cache'
key returns the mappings in the client-side DNS cache; the 'control'
key returns the mappings set via the control interface; the 'all'
target returns the mappings set through any mechanism.
3.13 INFOVALUE (Type 0x000C)
Sent from the server to the client in response to a GETINFO message.
Contains one or more items of the format:
Key [(NUL-terminated string)]
Value [(NUL-terminated string)]
The keys match those given in the GETINFO message.
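Because both keys and values are NUL-terminated, an INFOVALUE body can be
split on NUL octets. The Python sketch below uses a hypothetical function
name and assumes that values never contain an embedded NUL.

```python
from typing import List, Tuple


def parse_infovalue_body(body: bytes) -> List[Tuple[str, str]]:
    """Split alternating NUL-terminated keys and values into pairs."""
    fields = body.split(b"\x00")
    if fields and fields[-1] == b"":
        fields.pop()  # empty piece after the final terminator
    if len(fields) % 2 != 0:
        raise ValueError("key without a value")
    pairs = []
    for i in range(0, len(fields), 2):
        key = fields[i].decode("ascii")
        value = fields[i + 1].decode("ascii", "replace")
        pairs.append((key, value))
    return pairs


if __name__ == "__main__":
    print(parse_infovalue_body(b"version\x00Tor 0.0.9.4\x00"))
```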
3.14 EXTENDCIRCUIT (Type 0x000D)
Sent from the client to the server. The message body contains two fields:
Circuit ID [4 octets]
Path [NUL-terminated, comma-delimited string of OR nickname/identity]
This request takes one of two forms: either the Circuit ID is zero, in
which case it is a request for the server to build a new circuit according
to the specified path, or the Circuit ID is nonzero, in which case it is a
request for the server to extend an existing circuit with that ID according
to the specified path.
If the request is successful, the server sends a DONE message containing
a message body consisting of the four-octet Circuit ID of the newly created
circuit.
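The request body and the DONE reply for EXTENDCIRCUIT could be handled as in
this Python sketch. Both function names are hypothetical.

```python
import struct
from typing import Sequence


def build_extendcircuit_body(circuit_id: int, path: Sequence[str]) -> bytes:
    """circuit_id 0 asks for a new circuit; nonzero extends an existing one."""
    if not path:
        raise ValueError("path must name at least one OR")
    return struct.pack(">I", circuit_id) + ",".join(path).encode("ascii") + b"\x00"


def parse_extendcircuit_done(body: bytes) -> int:
    """Extract the four-octet Circuit ID from a successful DONE reply."""
    if len(body) != 4:
        raise ValueError("expected a four-octet Circuit ID")
    return struct.unpack(">I", body)[0]


if __name__ == "__main__":
    print(build_extendcircuit_body(0, ["nodeA", "nodeB", "nodeC"]))
```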
3.15 ATTACHSTREAM (Type 0x000E)
Sent from the client to the server. The message body contains two fields:
Stream ID [4 octets]
Circuit ID [4 octets]
This message informs the server that the specified stream should be
associated with the specified circuit. Each stream may be associated with
at most one circuit, and multiple streams may share the same circuit.
Streams can only be attached to completed circuits (that is, circuits that
have sent a circuit status 'built' event).
If the circuit ID is 0, responsibility for attaching the given stream is
returned to Tor.
{Implementation note: By default, Tor automatically attaches streams to
circuits itself, unless the configuration variable
"__LeaveStreamsUnattached" is set to "1". Attempting to attach streams
via TC when "__LeaveStreamsUnattached" is false may cause a race between
Tor and the controller, as both attempt to attach streams to circuits.}
3.16 POSTDESCRIPTOR (Type 0x000F)
Sent from the client to the server. The message body contains one field:
Descriptor [NUL-terminated string]
This message informs the server about a new descriptor.
The descriptor, when parsed, must contain a number of well-specified
fields, including fields for its nickname and identity.
If there is an error in parsing the descriptor, the server must send an
appropriate error message. If the descriptor is well-formed but the server
chooses not to add it, it must reply with a DONE message whose body
explains why the server was not added.
3.17 FRAGMENTHEADER (Type 0x0010)
Sent in either direction. Used to encapsulate messages longer than 65535
bytes in length.
Underlying type [2 bytes]
Total Length [4 bytes]
Data [Rest of message]
A FRAGMENTHEADER message MUST be followed immediately by a number of
FRAGMENT messages, such that lengths of the "Data" fields of the
FRAGMENTHEADER and FRAGMENT messages add to the "Total Length" field of the
FRAGMENTHEADER message.
Implementations MUST NOT fragment messages of length less than 65536 bytes.
Implementations MUST be able to process fragmented messages that are not
optimally packed.
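A sender could split an oversized message as in the Python sketch below. It
assumes that the 65536-byte threshold refers to the length of the message
body, and it packs each fragment as full as the 2-octet Length field allows.
The function names are hypothetical.

```python
import struct
from typing import List

MAX_BODY = 0xFFFF          # largest body a 2-octet Length can describe
TYPE_FRAGMENTHEADER = 0x0010
TYPE_FRAGMENT = 0x0011


def _frame(msg_type: int, body: bytes) -> bytes:
    return struct.pack(">HH", len(body), msg_type) + body


def encode_with_fragmentation(msg_type: int, body: bytes) -> List[bytes]:
    """Return one wire message, or FRAGMENTHEADER plus FRAGMENT messages."""
    if len(body) <= MAX_BODY:
        return [_frame(msg_type, body)]  # MUST NOT fragment short messages
    # Underlying type [2 bytes], Total Length [4 bytes], then Data.
    prefix = struct.pack(">HI", msg_type, len(body))
    first_len = MAX_BODY - len(prefix)
    frames = [_frame(TYPE_FRAGMENTHEADER, prefix + body[:first_len])]
    for start in range(first_len, len(body), MAX_BODY):
        frames.append(_frame(TYPE_FRAGMENT, body[start:start + MAX_BODY]))
    return frames


if __name__ == "__main__":
    frames = encode_with_fragmentation(0x000C, b"x" * 70000)
    print([len(f) for f in frames])
```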
3.18 FRAGMENT (Type 0x0011)
Data [Entire message]
See FRAGMENTHEADER for more information
3.19 REDIRECTSTREAM (Type 0x0012)
Sent from the client to the server. The message body contains two fields:
Stream ID [4 octets]
Address [variable-length, NUL-terminated.]
Tells the server to change the exit address on the specified stream. No
remapping is performed on the new provided address.
To be sure that the modified address will be used, this message must be sent
after a new stream event is received, and before attaching this stream to
a circuit.
3.20 CLOSESTREAM (Type 0x0013)
Sent from the client to the server. The message body contains three
fields:
Stream ID [4 octets]
Reason [1 octet]
Flags [1 octet]
Tells the server to close the specified stream. The reason should be
one of the Tor RELAY_END reasons given in tor-spec.txt. Flags is not
used currently. Tor may hold the stream open for a while to flush
any data that is pending.
3.21 CLOSECIRCUIT (Type 0x0014)
Sent from the client to the server. The message body contains two
fields:
Circuit ID [4 octets]
Flags [1 octet]
Tells the server to close the specified circuit. If the LSB of the flags
field is nonzero, do not close the circuit unless it is unused.
4. Implementation notes
4.1. Authentication
By default, the current Tor implementation trusts all local users.
If the 'CookieAuthentication' option is true, Tor writes a "magic cookie"
file named "control_auth_cookie" into its data directory. To authenticate,
the controller must send the contents of this file.
If the 'HashedControlPassword' option is set, it must contain the salted
hash of a secret password. The salted hash is computed according to the
S2K algorithm in RFC 2440 (OpenPGP), and prefixed with the s2k specifier.
This is then encoded in hexadecimal, prefixed by the indicator sequence
"16:". Thus, for example, the password 'foo' could encode to:
16:660537E3E1CD49996044A3BF558097A981F539FEA2F9DA662B4626C1C2
   ++++++++++++++++**^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         salt                      hashed value
               indicator
You can generate the salted hash of a password by calling
'tor --hash-password <password>'
or by using the example code in the Python and Java controller libraries.
To authenticate under this scheme, the controller sends Tor the original
secret that was used to generate the password.
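The layout above (8 octets of salt, a 1-octet indicator, then the hashed
value) can be produced and checked with the Python sketch below. It assumes
SHA-1 as the hash, which is consistent with the 20-octet hashed value in the
example, and it uses the iterated-and-salted S2K count formula from RFC 2440.
The function names and the default indicator 0x60 (taken from the example
string) are choices made for this sketch, not a statement of what Tor's own
code does.

```python
import binascii
import hashlib
import os
from typing import Optional


def s2k_sha1(secret: bytes, salt: bytes, indicator: int) -> bytes:
    """Iterated and salted S2K (RFC 2440, section 3.6.1.3) using SHA-1."""
    count = (16 + (indicator & 15)) << ((indicator >> 4) + 6)
    material = salt + secret
    h = hashlib.sha1()
    # Feed salt+secret repeatedly until exactly `count` octets are hashed.
    # (Assumes salt+secret is shorter than count, true for normal passwords.)
    while count > 0:
        chunk = material[:count]
        h.update(chunk)
        count -= len(chunk)
    return h.digest()


def hash_password(password: str, salt: Optional[bytes] = None,
                  indicator: int = 0x60) -> str:
    salt = os.urandom(8) if salt is None else salt
    digest = s2k_sha1(password.encode("utf-8"), salt, indicator)
    raw = salt + bytes([indicator]) + digest
    return "16:" + binascii.hexlify(raw).decode("ascii").upper()


def check_password(password: str, hashed: str) -> bool:
    if not hashed.startswith("16:"):
        raise ValueError("missing 16: indicator sequence")
    raw = binascii.unhexlify(hashed[3:])
    salt, indicator, digest = raw[:8], raw[8], raw[9:]
    return s2k_sha1(password.encode("utf-8"), salt, indicator) == digest


if __name__ == "__main__":
    hashed = hash_password("foo")
    print(hashed, check_password("foo", hashed))
```

Because the salt is random, each call to hash_password yields a different
string for the same password; check_password recovers the salt and indicator
from the string before recomputing the hash.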
4.2. Don't let the buffer get too big.
If you ask for lots of events, and 16MB of them queue up on the buffer,
the Tor process will close the socket.

File diff suppressed because it is too large


@ -1,315 +0,0 @@
$Id$
Tor Protocol Specification
Roger Dingledine
Nick Mathewson
0. Preliminaries
THIS SPECIFICATION IS OBSOLETE.
This document specifies the Tor directory protocol as used in version
0.1.0.x and earlier. See dir-spec.txt for a current version.
1. Basic operation
There is a small number of directory authorities, and a larger number of
caches. Clients and servers know the public keys of the directory authorities.
Tor servers periodically upload self-signed "router descriptors" to the
directory authorities. Each authority publishes a self-signed "directory"
(containing all the router descriptors it knows, and a statement on which
are running) and a self-signed "running routers" document containing only
the statement on which routers are running.
All Tors periodically download these documents, downloading the directory
less frequently than they do the "running routers" document. Clients
preferentially download from caches rather than authorities.
1.1. Document format
Router descriptors, directories, and running-routers documents all obey the
following lightweight extensible information format.
The highest level object is a Document, which consists of one or more
Items. Every Item begins with a KeywordLine, followed by zero or more
Objects. A KeywordLine begins with a Keyword, optionally followed by
whitespace and more non-newline characters, and ends with a newline. A
Keyword is a sequence of one or more characters in the set [A-Za-z0-9-].
An Object is a block of encoded data in pseudo-Open-PGP-style
armor. (cf. RFC 2440)
More formally:
Document ::= (Item | NL)+
Item ::= KeywordLine Object*
KeywordLine ::= Keyword NL | Keyword WS ArgumentChar+ NL
Keyword ::= KeywordChar+
KeywordChar ::= 'A' ... 'Z' | 'a' ... 'z' | '0' ... '9' | '-'
ArgumentChar ::= any printing ASCII character except NL.
WS ::= (SP | TAB)+
Object ::= BeginLine Base-64-encoded-data EndLine
BeginLine ::= "-----BEGIN " Keyword "-----" NL
EndLine ::= "-----END " Keyword "-----" NL
The BeginLine and EndLine of an Object must use the same keyword.
When interpreting a Document, software MUST reject any document containing a
KeywordLine that starts with a keyword it doesn't recognize.
The "opt" keyword is reserved for non-critical future extensions. All
implementations MUST ignore any item of the form "opt keyword ....." when
they would not recognize "keyword ....."; and MUST treat "opt keyword ....."
as synonymous with "keyword ......" when keyword is recognized.
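The grammar in this section can be read by a small line-oriented parser. The sketch below is one possible approach, not Tor's parser: parse_document is a hypothetical name, the Object body is returned as the concatenated base64 text without decoding, and printable-ASCII checking of arguments is omitted.

```python
import re

KEYWORD_LINE = re.compile(r"^([A-Za-z0-9-]+)(?:[ \t]+(.*))?$")
BEGIN_LINE = re.compile(r"^-----BEGIN ([A-Za-z0-9-]+)-----$")

def parse_document(text):
    """Split a Document into (keyword, arguments, objects) Items."""
    items = []
    lines = iter(text.split("\n"))
    for line in lines:
        if not line:
            continue  # a bare NL between Items
        begin = BEGIN_LINE.match(line)
        if begin:
            if not items:
                raise ValueError("Object before any KeywordLine")
            end_line = "-----END %s-----" % begin.group(1)
            body = []
            for obj_line in lines:  # consume lines up to the EndLine
                if obj_line == end_line:
                    break
                body.append(obj_line)
            else:
                raise ValueError("missing EndLine for " + begin.group(1))
            items[-1][2].append((begin.group(1), "".join(body)))
            continue
        kw = KEYWORD_LINE.match(line)
        if not kw:
            raise ValueError("malformed KeywordLine: %r" % line)
        items.append((kw.group(1), kw.group(2) or "", []))
    return items
```

Because the EndLine is built from the BeginLine's keyword, an Object whose two lines use different keywords is reported as missing its EndLine.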
2. Router descriptor format.
Every router descriptor MUST start with a "router" Item; MUST end with a
"router-signature" Item and an extra NL; and MUST contain exactly one
instance of each of the following Items: "published" "onion-key" "link-key"
"signing-key" "bandwidth". Additionally, a router descriptor MAY contain
any number of "accept", "reject", "fingerprint", "uptime", and "opt" Items.
Other than "router" and "router-signature", the items may appear in any
order.
The items' formats are as follows:
"router" nickname address ORPort SocksPort DirPort
Indicates the beginning of a router descriptor. "address"
must be an IPv4 address in dotted-quad format. The last
three numbers indicate the TCP ports at which this OR exposes
functionality. ORPort is a port at which this OR accepts TLS
connections for the main OR protocol; SocksPort is deprecated and
should always be 0; and DirPort is the port at which this OR accepts
directory-related HTTP connections. If any port is not supported,
the value 0 is given instead of a port number.
"bandwidth" bandwidth-avg bandwidth-burst bandwidth-observed
Estimated bandwidth for this router, in bytes per second. The
"average" bandwidth is the volume per second that the OR is willing
to sustain over long periods; the "burst" bandwidth is the volume
that the OR is willing to sustain in very short intervals. The
"observed" value is an estimate of the capacity this server can
handle. The server remembers the maximum sustained output bandwidth
over any ten-second period in the past day, and likewise the maximum
sustained input. The "observed" value is the lesser of these two numbers.
"platform" string
A human-readable string describing the system on which this OR is
running. This MAY include the operating system, and SHOULD include
the name and version of the software implementing the Tor protocol.
"published" YYYY-MM-DD HH:MM:SS
The time, in GMT, when this descriptor was generated.
"fingerprint"
A fingerprint (a HASH_LEN-byte hash of the ASN.1-encoded public key, encoded
in hex, with a single space after every 4 characters) for this router's
identity key. A descriptor is considered invalid (and MUST be
rejected) if the fingerprint line does not match the public key.
[We didn't start parsing this line until Tor 0.1.0.6-rc; it should
be marked with "opt" until earlier versions of Tor are obsolete.]
"hibernating" 0|1
If the value is 1, then the Tor server was hibernating when the
descriptor was published, and shouldn't be used to build circuits.
[We didn't start parsing this line until Tor 0.1.0.6-rc; it should
be marked with "opt" until earlier versions of Tor are obsolete.]
"uptime"
The number of seconds that this OR process has been running.
"onion-key" NL a public key in PEM format
This key is used to encrypt EXTEND cells for this OR. The key MUST
be accepted for at least XXXX hours after any new key is published in
a subsequent descriptor.
"signing-key" NL a public key in PEM format
The OR's long-term identity key.
"accept" exitpattern
"reject" exitpattern
These lines, in order, describe the rules that an OR follows when
deciding whether to allow a new stream to a given address. The
'exitpattern' syntax is described below.
"router-signature" NL Signature NL
The "SIGNATURE" object contains a signature of the PKCS1-padded
hash of the entire router descriptor, taken from the beginning of the
"router" line, through the newline after the "router-signature" line.
The router descriptor is invalid unless the signature is performed
with the router's identity key.
"contact" info NL
Describes a way to contact the server's administrator, preferably
including an email address and a PGP key fingerprint.
"family" names NL
'Names' is a whitespace-separated list of server nicknames. If two ORs
list one another in their "family" entries, then OPs should treat them
as a single OR for the purpose of path selection.
For example, if node A's descriptor contains "family B", and node B's
descriptor contains "family A", then node A and node B should never
be used on the same circuit.
"read-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM,NUM,NUM... NL
"write-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM,NUM,NUM... NL
Declare how much bandwidth the OR has used recently. Usage is divided
into intervals of NSEC seconds. The YYYY-MM-DD HH:MM:SS field defines
the end of the most recent interval. The numbers are the number of
bytes used in the most recent intervals, ordered from oldest to newest.
[We didn't start parsing these lines until Tor 0.1.0.6-rc; they should
be marked with "opt" until earlier versions of Tor are obsolete.]
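As an example of reading the items above, the following sketch parses the arguments of the "router" and "bandwidth" lines. The function and tuple names are hypothetical; the sketch checks only what the item descriptions state (an IPv4 dotted-quad address and integer fields).

```python
import ipaddress
from collections import namedtuple

RouterLine = namedtuple("RouterLine",
                        "nickname address or_port socks_port dir_port")
Bandwidth = namedtuple("Bandwidth", "average burst observed")

def parse_router_args(args):
    """Parse the arguments that follow the "router" keyword."""
    nickname, address, *ports = args.split()
    if len(ports) != 3:
        raise ValueError("expected ORPort SocksPort DirPort")
    ipaddress.IPv4Address(address)  # raises ValueError unless dotted-quad IPv4
    or_port, socks_port, dir_port = (int(p) for p in ports)
    return RouterLine(nickname, address, or_port, socks_port, dir_port)

def parse_bandwidth_args(args):
    """Parse the three bytes-per-second values of the "bandwidth" line."""
    average, burst, observed = (int(n) for n in args.split())
    return Bandwidth(average, burst, observed)
```

A port value of 0 is returned unchanged; callers decide how to treat an unsupported port.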
2.1. Nonterminals in router descriptors
nickname ::= between 1 and 19 alphanumeric characters, case-insensitive.
exitpattern ::= addrspec ":" portspec
portspec ::= "*" | port | port "-" port
port ::= an integer between 1 and 65535, inclusive.
addrspec ::= "*" | ip4spec | ip6spec
ip4spec ::= ip4 | ip4 "/" num_ip4_bits | ip4 "/" ip4mask
ip4 ::= an IPv4 address in dotted-quad format
ip4mask ::= an IPv4 mask in dotted-quad format
num_ip4_bits ::= an integer between 0 and 32
ip6spec ::= ip6 | ip6 "/" num_ip6_bits
ip6 ::= an IPv6 address, surrounded by square brackets.
num_ip6_bits ::= an integer between 0 and 128
Ports are required; if they are not included in the router
line, they must appear in the "ports" lines.
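The exitpattern grammar maps fairly directly onto Python's ipaddress module. The sketch below is illustrative only: parse_exitpattern is a hypothetical helper, and representing "*" as None for the address and as the full range 1-65535 for the port is a choice made here, not something the grammar requires.

```python
import ipaddress

def parse_exitpattern(pattern):
    """Return (network, low_port, high_port); network is None for "*"."""
    addrspec, _, portspec = pattern.rpartition(":")
    if not addrspec:
        raise ValueError("exitpattern needs addrspec ':' portspec")
    if portspec == "*":
        low, high = 1, 65535
    else:
        low, _, high = portspec.partition("-")
        low = int(low)
        high = int(high) if high else low
        if not 1 <= low <= high <= 65535:
            raise ValueError("bad port range: " + portspec)
    if addrspec == "*":
        return None, low, high
    if addrspec.startswith("["):
        # ip6spec: "[" address "]" with an optional "/" num_ip6_bits
        ip6, _, bits = addrspec[1:].partition("]")
        network = ipaddress.IPv6Network(ip6 + bits, strict=False)
    else:
        # ip4spec: ip4, ip4 "/" num_ip4_bits, or ip4 "/" ip4mask
        network = ipaddress.IPv4Network(addrspec, strict=False)
    return network, low, high
```

Splitting on the last ":" keeps the colons inside a bracketed IPv6 address from being mistaken for the port separator.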
3. Directory format
A Directory begins with a "signed-directory" item, followed by one each of
the following, in any order: "recommended-software", "published",
"router-status", "dir-signing-key". It may include any number of "opt"
items. After these items, a directory includes any number of router
descriptors, and a single "directory-signature" item.
"signed-directory"
Indicates the start of a directory.
"published" YYYY-MM-DD HH:MM:SS
The time at which this directory was generated and signed, in GMT.
"dir-signing-key"
The key used to sign this directory; see "signing-key" for format.
"recommended-software" comma-separated-version-list
A list of which versions of which implementations are currently
believed to be secure and compatible with the network.
"running-routers" whitespace-separated-list
A description of which routers are currently believed to be up or
down. Every entry consists of an optional "!", followed by either an
OR's nickname, or "$" followed by a hexadecimal encoding of the hash
of an OR's identity key. If the "!" is included, the router is
believed not to be running; otherwise, it is believed to be running.
If a router's nickname is given, exactly one router of that nickname
will appear in the directory, and that router is "approved" by the
directory server. If a hashed identity key is given, that OR is not
"approved". [XXXX The 'running-routers' line is only provided for
backward compatibility. New code should parse 'router-status'
instead.]
"router-status" whitespace-separated-list
A description of which routers are currently believed to be up or
down, and which are verified or unverified. Contains one entry for
every router that the directory server knows. Each entry is of the
format:
!name=$digest [Verified router, currently not live.]
name=$digest [Verified router, currently live.]
!$digest [Unverified router, currently not live.]
or $digest [Unverified router, currently live.]
(where 'name' is the router's nickname and 'digest' is a hexadecimal
encoding of the hash of the router's identity key).
When parsing this line, clients should only mark a router as
'verified' if its nickname AND digest match the one provided.
"directory-signature" nickname-of-dirserver NL Signature
The signature is computed by computing the digest of the
directory, from the characters "signed-directory", through the newline
after "directory-signature". This digest is then padded with PKCS.1,
and signed with the directory server's signing key.
If software encounters an unrecognized keyword in a single router descriptor,
it MUST reject only that router descriptor, and continue using the
others. Because this mechanism is used to add 'critical' extensions to
future versions of the router descriptor format, implementations should treat
it as a normal occurrence and not, for example, report it to the user as an
error. [Versions of Tor prior to 0.1.1 did this.]
If software encounters an unrecognized keyword in the directory header,
it SHOULD reject the entire directory.
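The four "router-status" entry forms described in this section can be told apart by their leading "!" and "$" characters. The following is a minimal sketch with hypothetical function names; it also applies the rule that a router is only marked verified when both nickname and digest match, comparing nicknames case-insensitively as the nickname nonterminal suggests.

```python
def parse_router_status_entry(entry):
    """Return (nickname or None, hex digest, verified, live)."""
    live = not entry.startswith("!")
    if not live:
        entry = entry[1:]
    if entry.startswith("$"):
        return None, entry[1:].upper(), False, live  # unverified router
    name, sep, digest = entry.partition("=$")
    if not sep or not name or not digest:
        raise ValueError("malformed router-status entry")
    return name, digest.upper(), True, live

def is_verified(status_entries, nickname, digest):
    """True only if some entry binds this nickname AND this digest."""
    for entry in status_entries:
        name, entry_digest, verified, _ = parse_router_status_entry(entry)
        if (verified and name.lower() == nickname.lower()
                and entry_digest == digest.upper()):
            return True
    return False
```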
4. Network-status descriptor
A "network-status" (a.k.a "running-routers") document is a truncated
directory that contains only the current status of a list of nodes, not
their actual descriptors. It contains exactly one of each of the following
entries.
"network-status"
Must appear first.
"published" YYYY-MM-DD HH:MM:SS
(see section 3 above)
"router-status" list
(see section 3 above)
"directory-signature" NL signature
(see section 3 above)
5. Behavior of a directory server
lists nodes that are connected currently
speaks HTTP on a socket, spits out directory on request
Directory servers listen on a certain port (the DirPort), and speak a
limited version of HTTP 1.0. Clients send either GET or POST commands.
The basic interactions are:
"%s %s HTTP/1.0\r\nContent-Length: %lu\r\nHost: %s\r\n\r\n",
command, url, content-length, host.
Get "/tor/" to fetch a full directory.
Get "/tor/dir.z" to fetch a compressed full directory.
Get "/tor/running-routers" to fetch a network-status descriptor.
Post "/tor/" to post a server descriptor, with the body of the
request containing the descriptor.
"host" is used to specify the address:port of the dirserver, so
the request can survive going through HTTP proxies.
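The request template above can be filled in as follows. This is a sketch only: build_dir_request is a hypothetical helper that builds the bytes of a request and does no network I/O.

```python
def build_dir_request(command, url, host, body=b""):
    """Format a request using the template given in this section."""
    head = "%s %s HTTP/1.0\r\nContent-Length: %d\r\nHost: %s\r\n\r\n" % (
        command, url, len(body), host)
    return head.encode("ascii") + body
```

For a GET the body is empty and Content-Length is 0; for a POST to "/tor/" the body is the server descriptor being uploaded.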


@ -1,897 +0,0 @@
$Id$
Tor directory protocol, version 2
0. Scope and preliminaries
This directory protocol is used by Tor versions 0.1.1.x and 0.1.2.x. See
dir-spec-v1.txt for information on earlier versions, and dir-spec.txt
for information on later versions.
0.1. Goals and motivation
There were several problems with the way Tor handled directory information
in version 0.1.0.x and earlier. Here are the problems we try to fix with
this new design, already implemented in 0.1.1.x:
1. Directories were very large and used up a lot of bandwidth: clients
downloaded descriptors for all routers several times an hour.
2. Every directory authority was a trust bottleneck: if a single
directory authority lied, it could make clients believe for a time an
arbitrarily distorted view of the Tor network.
3. Our current "verified server" system is kind of nonsensical.
4. Getting more directory authorities would add more points of failure
and worsen possible partitioning attacks.
There are two problems that remain unaddressed by this design.
5. Requiring every client to know about every router won't scale.
6. Requiring every directory cache to know every router won't scale.
We attempt to fix 1-4 here, and to build a solution that will work when we
figure out an answer for 5. We haven't thought at all about what to do
about 6.
1. Outline
There is a small set (say, around 10) of semi-trusted directory
authorities. A default list of authorities is shipped with the Tor
software. Users can change this list, but are encouraged not to do so, in
order to avoid partitioning attacks.
Routers periodically upload signed "descriptors" to the directory
authorities describing their keys, capabilities, and other information.
Routers may act as directory mirrors (also called "caches"), to reduce
load on the directory authorities. They announce this in their
descriptors.
Each directory authority periodically generates and signs a compact
"network status" document that lists that authority's view of the current
descriptors and status for known routers, but which does not include the
descriptors themselves.
Directory mirrors download, cache, and re-serve network-status documents
to clients.
Clients, directory mirrors, and directory authorities all use
network-status documents to find out when their list of routers is
out-of-date. If it is, they download any missing router descriptors.
Clients download missing descriptors from mirrors; mirrors and authorities
download from authorities. Descriptors are downloaded by the hash of the
descriptor, not by the server's identity key: this prevents servers from
attacking clients by giving them descriptors nobody else uses.
All directory information is uploaded and downloaded with HTTP.
Coordination among directory authorities is done client-side: clients
compute a vote-like algorithm among the network-status documents they
have, and base their decisions on the result.
1.1. What's different from 0.1.0.x?
Clients used to download a signed concatenated set of router descriptors
(called a "directory") from directory mirrors, regardless of which
descriptors had changed.
Between downloading directories, clients would download "network-status"
documents that would list which servers were supposed to be running.
Clients would always believe the most recently published network-status
document they were served.
Routers used to upload fresh descriptors all the time, whether their keys
and other information had changed or not.
1.2. Document meta-format
Router descriptors, directories, and running-routers documents all obey the
following lightweight extensible information format.
The highest level object is a Document, which consists of one or more
Items. Every Item begins with a KeywordLine, followed by zero or more
Objects. A KeywordLine begins with a Keyword, optionally followed by
whitespace and more non-newline characters, and ends with a newline. A
Keyword is a sequence of one or more characters in the set [A-Za-z0-9-].
An Object is a block of encoded data in pseudo-Open-PGP-style
armor. (cf. RFC 2440)
More formally:
Document ::= (Item | NL)+
Item ::= KeywordLine Object*
KeywordLine ::= Keyword NL | Keyword WS ArgumentChar+ NL
Keyword ::= KeywordChar+
KeywordChar ::= 'A' ... 'Z' | 'a' ... 'z' | '0' ... '9' | '-'
ArgumentChar ::= any printing ASCII character except NL.
WS ::= (SP | TAB)+
Object ::= BeginLine Base-64-encoded-data EndLine
BeginLine ::= "-----BEGIN " Keyword "-----" NL
EndLine ::= "-----END " Keyword "-----" NL
The BeginLine and EndLine of an Object must use the same keyword.
When interpreting a Document, software MUST ignore any KeywordLine that
starts with a keyword it doesn't recognize; future implementations MUST NOT
require current clients to understand any KeywordLine not currently
described.
The "opt" keyword was used until Tor 0.1.2.5-alpha for non-critical future
extensions. All implementations MUST ignore any item of the form "opt
keyword ....." when they would not recognize "keyword ....."; and MUST
treat "opt keyword ....." as synonymous with "keyword ......" when keyword
is recognized.
Implementations before 0.1.2.5-alpha rejected any document with a
KeywordLine that started with a keyword that they didn't recognize.
Implementations MUST prefix items not recognized by older versions of Tor
with an "opt" until those versions of Tor are obsolete.
Other implementations that want to extend Tor's directory format MAY
introduce their own items. The keywords for extension items SHOULD start
with the characters "x-" or "X-", to guarantee that they will not conflict
with keywords used by future versions of Tor.
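The keyword-handling rules of this section (strip a leading "opt", then ignore anything unrecognized) can be sketched as below. The function name and the known_keywords parameter are hypothetical; the sketch operates on an item's keyword and argument string after the line has been split.

```python
def normalize_item(keyword, arguments, known_keywords):
    """Return (keyword, arguments) to process, or None to skip the item."""
    if keyword == "opt":
        # "opt keyword ....." is synonymous with "keyword ....."
        parts = arguments.split(None, 1)
        if not parts:
            return None
        keyword = parts[0]
        arguments = parts[1] if len(parts) > 1 else ""
    if keyword not in known_keywords:
        return None  # unrecognized keywords are ignored, not treated as errors
    return keyword, arguments
```

Extension items such as those starting with "x-" fall through the same path: an implementation that does not list them in known_keywords simply skips them.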
2. Router operation
ORs SHOULD generate a new router descriptor whenever any of the
following events have occurred:
- A period of time (18 hrs by default) has passed since the last
time a descriptor was generated.
- A descriptor field other than bandwidth or uptime has changed.
- Bandwidth has changed by at least a factor of 2 from the last time a
descriptor was generated, and at least a given interval of time
(20 mins by default) has passed since then.
- Its uptime has been reset (by restarting).
After generating a descriptor, ORs upload it to every directory
authority they know, by posting it to the URL
http://<hostname:port>/tor/
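The regeneration conditions listed above can be expressed as a single predicate. This is a sketch under assumptions: all names are hypothetical, times are in seconds, and "a factor of 2" is read as either doubling or halving.

```python
def should_regenerate(now, last_generated, other_field_changed,
                      old_bandwidth, new_bandwidth, restarted,
                      max_age=18 * 3600, bandwidth_interval=20 * 60):
    """Decide whether an OR should generate a new router descriptor."""
    age = now - last_generated
    if age >= max_age:
        return True  # the periodic refresh (18 hrs by default)
    if other_field_changed or restarted:
        return True  # a non-bandwidth, non-uptime field changed, or uptime reset
    low, high = sorted((old_bandwidth, new_bandwidth))
    changed_by_factor_2 = high != low and high >= 2 * low
    # Bandwidth changes only count once the minimum interval has passed.
    return changed_by_factor_2 and age >= bandwidth_interval
```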
2.1. Router descriptor format
Every router descriptor MUST start with a "router" Item; MUST end with a
"router-signature" Item and an extra NL; and MUST contain exactly one
instance of each of the following Items: "published" "onion-key"
"signing-key" "bandwidth".
A router descriptor MAY have zero or one of each of the following Items,
but MUST NOT have more than one: "contact", "uptime", "fingerprint",
"hibernating", "read-history", "write-history", "eventdns", "platform",
"family".
Additionally, a router descriptor MAY contain any number of "accept",
"reject", and "opt" Items. Other than "router" and "router-signature",
the items may appear in any order.
The items' formats are as follows:
"router" nickname address ORPort SocksPort DirPort
Indicates the beginning of a router descriptor. "address" must be an
IPv4 address in dotted-quad format. The last three numbers indicate
the TCP ports at which this OR exposes functionality. ORPort is a port
at which this OR accepts TLS connections for the main OR protocol;
SocksPort is deprecated and should always be 0; and DirPort is the
port at which this OR accepts directory-related HTTP connections. If
any port is not supported, the value 0 is given instead of a port
number.
"bandwidth" bandwidth-avg bandwidth-burst bandwidth-observed
Estimated bandwidth for this router, in bytes per second. The
"average" bandwidth is the volume per second that the OR is willing to
sustain over long periods; the "burst" bandwidth is the volume that
the OR is willing to sustain in very short intervals. The "observed"
value is an estimate of the capacity this server can handle. The
server remembers the maximum sustained output bandwidth over any ten-second
period in the past day, and likewise the maximum sustained input. The
"observed" value is the lesser of these two numbers.
"platform" string
A human-readable string describing the system on which this OR is
running. This MAY include the operating system, and SHOULD include
the name and version of the software implementing the Tor protocol.
"published" YYYY-MM-DD HH:MM:SS
The time, in GMT, when this descriptor was generated.
"fingerprint"
A fingerprint (a HASH_LEN-byte hash of the ASN.1-encoded public key, encoded in
hex, with a single space after every 4 characters) for this router's
identity key. A descriptor is considered invalid (and MUST be
rejected) if the fingerprint line does not match the public key.
[We didn't start parsing this line until Tor 0.1.0.6-rc; it should
be marked with "opt" until earlier versions of Tor are obsolete.]
"hibernating" 0|1
If the value is 1, then the Tor server was hibernating when the
descriptor was published, and shouldn't be used to build circuits.
[We didn't start parsing this line until Tor 0.1.0.6-rc; it should be
marked with "opt" until earlier versions of Tor are obsolete.]
"uptime"
The number of seconds that this OR process has been running.
"onion-key" NL a public key in PEM format
This key is used to encrypt EXTEND cells for this OR. The key MUST be
accepted for at least 1 week after any new key is published in a
subsequent descriptor.
"signing-key" NL a public key in PEM format
The OR's long-term identity key.
"accept" exitpattern
"reject" exitpattern
These lines describe the rules that an OR follows when
deciding whether to allow a new stream to a given address. The
'exitpattern' syntax is described below. The rules are considered in
order; if no rule matches, the address will be accepted. For clarity,
the last such entry SHOULD be accept *:* or reject *:*.
"router-signature" NL Signature NL
The "SIGNATURE" object contains a signature of the PKCS1-padded
hash of the entire router descriptor, taken from the beginning of the
"router" line, through the newline after the "router-signature" line.
The router descriptor is invalid unless the signature is performed
with the router's identity key.
"contact" info NL
Describes a way to contact the server's administrator, preferably
including an email address and a PGP key fingerprint.
"family" names NL
'Names' is a space-separated list of server nicknames or
hexdigests. If two ORs list one another in their "family" entries,
then OPs should treat them as a single OR for the purpose of path
selection.
For example, if node A's descriptor contains "family B", and node B's
descriptor contains "family A", then node A and node B should never
be used on the same circuit.
"read-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM,NUM,NUM... NL
"write-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM,NUM,NUM... NL
Declare how much bandwidth the OR has used recently. Usage is divided
into intervals of NSEC seconds. The YYYY-MM-DD HH:MM:SS field
defines the end of the most recent interval. The numbers are the
number of bytes used in the most recent intervals, ordered from
oldest to newest.
[We didn't start parsing these lines until Tor 0.1.0.6-rc; they should
be marked with "opt" until earlier versions of Tor are obsolete.]
"eventdns" bool NL
Declare whether this version of Tor is using the newer enhanced
dns logic. Versions of Tor without eventdns SHOULD NOT be used for
reverse hostname lookups.
[All versions of Tor before 0.1.2.2-alpha should be assumed to have
this option set to 0 if it is not present. All Tor versions at
0.1.2.2-alpha or later should be assumed to have this option set to
1 if it is not present. Until 0.1.2.1-alpha-dev, this option was
not generated, even when eventdns was in use. Versions of Tor
before 0.1.2.1-alpha-dev did not parse this option, so it should be
marked "opt". With 0.2.0.1-alpha, the old 'dnsworker' logic has
been removed, rendering this option of historical interest only.]
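To illustrate the "read-history" and "write-history" items, the sketch below attaches an interval end time to each byte count. The helper name is hypothetical; it assumes the timestamp marks the end of the last (newest) value and that each earlier value ends NSEC seconds before the next.

```python
from datetime import datetime, timedelta

def parse_history_args(args):
    """Return a list of (interval_end, byte_count) pairs, oldest first."""
    date, time, nsec_field, unit, values = args.split()
    if not nsec_field.startswith("(") or unit != "s)":
        raise ValueError("expected '(NSEC s)'")
    end = datetime.strptime(date + " " + time, "%Y-%m-%d %H:%M:%S")
    step = timedelta(seconds=int(nsec_field[1:]))
    counts = [int(n) for n in values.split(",")]
    newest = len(counts) - 1
    # Walk back one interval per position before the newest value.
    return [(end - step * (newest - i), n) for i, n in enumerate(counts)]
```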
2.2. Nonterminals in router descriptors
nickname ::= between 1 and 19 alphanumeric characters, case-insensitive.
hexdigest ::= a '$', followed by 40 hexadecimal characters.
[Represents a server by the digest of its identity key.]
exitpattern ::= addrspec ":" portspec
portspec ::= "*" | port | port "-" port
port ::= an integer between 1 and 65535, inclusive.
[Some implementations incorrectly generate ports with value 0.
Implementations SHOULD accept this, and SHOULD NOT generate it.]
addrspec ::= "*" | ip4spec | ip6spec
ip4spec ::= ip4 | ip4 "/" num_ip4_bits | ip4 "/" ip4mask
ip4 ::= an IPv4 address in dotted-quad format
ip4mask ::= an IPv4 mask in dotted-quad format
num_ip4_bits ::= an integer between 0 and 32
ip6spec ::= ip6 | ip6 "/" num_ip6_bits
ip6 ::= an IPv6 address, surrounded by square brackets.
num_ip6_bits ::= an integer between 0 and 128
bool ::= "0" | "1"
Ports are required; if they are not included in the router
line, they must appear in the "ports" lines.
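The "accept"/"reject" rules of Section 2.1 are evaluated in order, and an address that matches no rule is accepted. A minimal sketch of that evaluation follows; the rule representation (a tuple with a network, or None for "*", and an inclusive port range) is an assumption of this example, not a format defined by this document.

```python
import ipaddress

def exit_policy_allows(rules, address, port):
    """rules: list of (action, network or None, low_port, high_port)."""
    addr = ipaddress.ip_address(address)
    for action, network, low, high in rules:
        addr_matches = network is None or (
            addr.version == network.version and addr in network)
        if addr_matches and low <= port <= high:
            return action == "accept"  # the first matching rule decides
    return True  # no rule matched: the address is accepted
```

Ending a policy with an explicit catch-all rule, as recommended above, means the default on the last line is never reached.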
3. Network status format
Directory authorities generate, sign, and compress network-status
documents. Directory servers SHOULD generate a fresh network-status
document when the contents of such a document would be different from the
last one generated, and some time (at least one second, possibly longer)
has passed since the last one was generated.
The network status document contains a preamble, a set of router status
entries, and a signature, in that order.
We use the same meta-format as used for directories and router descriptors
in "tor-spec.txt". Implementations MAY insert blank lines
for clarity between sections; these blank lines are ignored.
Implementations MUST NOT depend on blank lines in any particular location.
As used here, "whitespace" is a sequence of 1 or more tab or space
characters.
The preamble contains:
"network-status-version" -- A document format version. For this
specification, the version is "2".
"dir-source" -- The authority's hostname, current IP address, and
directory port, all separated by whitespace.
"fingerprint" -- A base16-encoded hash of the signing key's
fingerprint, with no additional spaces added.
"contact" -- An arbitrary string describing how to contact the
directory server's administrator. Administrators should include at
least an email address and a PGP fingerprint.
"dir-signing-key" -- The directory server's public signing key.
"client-versions" -- A comma-separated list of recommended client
versions.
"server-versions" -- A comma-separated list of recommended server
versions.
"published" -- The publication time for this network-status object.
"dir-options" -- A set of flags, in any order, separated by whitespace:
"Names" if this directory authority performs name bindings.
"Versions" if this directory authority recommends software versions.
"BadExits" if the directory authority flags nodes that it believes
are performing incorrectly as exit nodes.
"BadDirectories" if the directory authority flags nodes that it
believes are performing incorrectly as directory caches.
The dir-options entry is optional. The "-versions" entries are required if
the "Versions" flag is present. The other entries are required and must
appear exactly once. The "network-status-version" entry must appear first;
the others may appear in any order. Implementations MUST ignore
additional arguments to the items above, and MUST ignore unrecognized
flags.
For each router, the router entry contains: (This format is designed for
conciseness.)
"r" -- followed by the following elements, in order, separated by
whitespace:
- The OR's nickname,
- A hash of its identity key, encoded in base64, with trailing =
signs removed.
- A hash of its most recent descriptor, encoded in base64, with
trailing = signs removed. (The hash is calculated as for
computing the signature of a descriptor.)
- The publication time of its most recent descriptor, in the form
YYYY-MM-DD HH:MM:SS, in GMT.
- An IP address
- An OR port
- A directory port (or "0" for none)
"s" -- A series of whitespace-separated status flags, in any order:
"Authority" if the router is a directory authority.
"BadExit" if the router is believed to be useless as an exit node
(because its ISP censors it, because it is behind a restrictive
proxy, or for some similar reason).
"BadDirectory" if the router is believed to be useless as a
directory cache (because its directory port isn't working,
its bandwidth is always throttled, or for some similar
reason).
"Exit" if the router is useful for building general-purpose exit
circuits.
"Fast" if the router is suitable for high-bandwidth circuits.
"Guard" if the router is suitable for use as an entry guard.
"Named" if the router's identity-nickname mapping is canonical,
and this authority binds names.
"Stable" if the router is suitable for long-lived circuits.
"Running" if the router is currently usable.
"Valid" if the router has been 'validated'.
"V2Dir" if the router implements this protocol.
"v" -- The version of the Tor protocol that this server is running. If
the value begins with "Tor" SP, the rest of the string is a Tor
version number, and the protocol is "The Tor protocol as supported
by the given version of Tor." Otherwise, if the value begins with
some other string, Tor has upgraded to a more sophisticated
protocol versioning system, and the protocol is "a version of the
Tor protocol more recent than any we recognize."
The "r" entry for each router must appear first and is required. The
"s" entry is optional (see Section 3.1 below for how the flags are
decided). Unrecognized flags on the "s" line and extra elements
on the "r" line must be ignored. The "v" line is optional; it was not
supported until 0.1.2.5-alpha, and it must be preceded with an "opt"
until all earlier versions of Tor are obsolete.
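Reading an "r" line mostly means restoring the stripped "=" padding before decoding the two base64 hashes. The sketch below shows one way to do so; the function names and the dictionary layout are hypothetical, and elements beyond the documented ones are dropped, as required above.

```python
import base64
import binascii

def unpad_b64decode(value):
    """Decode base64 whose trailing "=" signs were removed."""
    return base64.b64decode(value + "=" * (-len(value) % 4))

def parse_r_args(args):
    """Parse the elements that follow the "r" keyword."""
    fields = args.split()
    if len(fields) < 8:
        raise ValueError("too few elements on 'r' line")
    (nickname, identity, digest, date, time,
     address, or_port, dir_port) = fields[:8]  # extra elements are ignored
    return {
        "nickname": nickname,
        "identity": binascii.hexlify(unpad_b64decode(identity)).decode().upper(),
        "digest": binascii.hexlify(unpad_b64decode(digest)).decode().upper(),
        "published": date + " " + time,
        "address": address,
        "or_port": int(or_port),
        "dir_port": int(dir_port),
    }
```

Returning the hashes in hexadecimal makes them directly comparable with the hexdigest form used elsewhere in this document.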
The signature section contains:
"directory-signature" nickname-of-dirserver NL Signature
Signature is a signature of this network-status document
(the document up until the signature, including the line
"directory-signature <nick>\n"), using the directory authority's
signing key.
We compress the network status list with zlib before transmitting it.
3.1. Establishing server status
(This section describes how directory authorities choose which status
flags to apply to routers, as of Tor 0.1.1.18-rc. Later directory
authorities MAY do things differently, so long as clients keep working
well. Clients MUST NOT depend on the exact behaviors in this section.)
In the below definitions, a router is considered "active" if it is
running, valid, and not hibernating.
"Valid" -- a router is 'Valid' if it is running a version of Tor not
known to be broken, and the directory authority has not blacklisted
it as suspicious.
"Named" -- Directory authority administrators may decide to support name
binding. If they do, then they must maintain a file of
nickname-to-identity-key mappings, and try to keep this file consistent
with other directory authorities. If they don't, they act as clients, and
report bindings made by other directory authorities (name X is bound to
identity Y if at least one binding directory lists it, and no directory
binds X to some other Y'.) A router is called 'Named' if the router
believes the given name should be bound to the given key.
"Running" -- A router is 'Running' if the authority managed to connect to
it successfully within the last 30 minutes.
"Stable" -- A router is 'Stable' if it is active, and either its
uptime is at least the median uptime for known active routers, or
its uptime is at least 30 days. Routers are never called stable if
they are running a version of Tor known to drop circuits stupidly.
(0.1.1.10-alpha through 0.1.1.16-rc are stupid this way.)
"Fast" -- A router is 'Fast' if it is active, and its bandwidth is
in the top 7/8ths for known active routers.
"Guard" -- A router is a possible 'Guard' if it is 'Stable' and its
bandwidth is above median for known active routers. If the total
bandwidth of active non-BadExit Exit servers is less than one third
of the total bandwidth of all active servers, no Exit is listed as
a Guard.
"Authority" -- A router is called an 'Authority' if the authority
generating the network-status document believes it is an authority.
"V2Dir" -- A router supports the v2 directory protocol if it has an open
directory port, and it is running a version of the directory protocol that
supports the functionality clients need. (Currently, this is
0.1.1.9-alpha or later.)
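As a rough illustration of the 'Stable' and 'Fast' rules above, the sketch
below assigns both flags from a list of routers. The dictionary keys and the
function name are hypothetical, the version-based exclusion for 'Stable' is
left out, and "top 7/8ths" is read as "at most one eighth of active routers
have lower bandwidth"; authorities MAY compute this differently.

```python
from statistics import median

THIRTY_DAYS = 30 * 24 * 60 * 60  # seconds


def assign_stable_fast(routers):
    """Return {nickname: set_of_flags} for 'Stable' and 'Fast'.

    Each router is a dict with hypothetical keys: 'nick', 'active' (bool),
    'uptime' (seconds) and 'bandwidth' (bytes/s).
    """
    active = [r for r in routers if r["active"]]
    flags = {r["nick"]: set() for r in routers}
    if not active:
        return flags
    median_uptime = median(r["uptime"] for r in active)
    # "Top 7/8ths": at most 1/8 of active routers may have lower bandwidth.
    bandwidths = sorted(r["bandwidth"] for r in active)
    fast_cutoff = bandwidths[len(bandwidths) // 8]
    for r in active:
        if r["uptime"] >= median_uptime or r["uptime"] >= THIRTY_DAYS:
            flags[r["nick"]].add("Stable")
        if r["bandwidth"] >= fast_cutoff:
            flags[r["nick"]].add("Fast")
    return flags
```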
Directory server administrators may label some servers or IPs as
blacklisted, and elect not to include them in their network-status lists.
Authorities SHOULD 'disable' any servers in excess of 3 on any single IP.
When there are more than 3 to choose from, authorities should first prefer
authorities to non-authorities, then prefer Running to non-Running, and
then prefer high-bandwidth to low-bandwidth. To 'disable' a server, the
authority *should* advertise it without the Running or Valid flag.
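The per-IP limit and its preference order could be sketched as follows. The
record layout and the function name are assumptions made for illustration;
the function only reports which servers would be advertised without the
Running or Valid flag.

```python
from collections import defaultdict

MAX_SERVERS_PER_IP = 3


def servers_to_disable(servers):
    """Return the set of nicknames to advertise without Running or Valid.

    Each server is a dict with hypothetical keys: 'nick', 'ip',
    'is_authority', 'running' and 'bandwidth'.
    """
    by_ip = defaultdict(list)
    for s in servers:
        by_ip[s["ip"]].append(s)
    disabled = set()
    for group in by_ip.values():
        # Preference order: authorities, then Running, then high bandwidth.
        ranked = sorted(
            group,
            key=lambda s: (s["is_authority"], s["running"], s["bandwidth"]),
            reverse=True,
        )
        disabled.update(s["nick"] for s in ranked[MAX_SERVERS_PER_IP:])
    return disabled
```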
Thus, the network-status list includes all non-blacklisted,
non-expired, non-superseded descriptors.
4. Directory server operation
All directory authorities and directory mirrors ("directory servers")
implement this section, except as noted.
4.1. Accepting uploads (authorities only)
When a router posts a signed descriptor to a directory authority, the
authority first checks whether it is well-formed and correctly
self-signed. If it is, the authority next verifies that the nickname
in question is not already assigned to a router with a different
public key.
Finally, the authority MAY check that the router is not blacklisted
because of its key, IP, or another reason.
If the descriptor passes these tests, and the authority does not already
have a descriptor for a router with this public key, it accepts the
descriptor and remembers it.
If the authority _does_ have a descriptor with the same public key, the
newly uploaded descriptor is remembered if its publication time is more
recent than the most recent old descriptor for that router, and either:
- There are non-cosmetic differences between the old descriptor and the
new one.
- Enough time has passed between the descriptors' publication times.
(Currently, 12 hours.)
Differences between router descriptors are "non-cosmetic" if they would be
sufficient to force an upload as described in section 2 above.
Note that the "cosmetic difference" test only applies to uploaded
descriptors, not to descriptors that the authority downloads from other
authorities.
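The replacement rule for a descriptor with an already-known public key can
be summarized in a few lines. This is a sketch only: the function name is
hypothetical, the cosmetic-difference test is supplied by the caller, and
treating "enough time" as "at least 12 hours" is an assumption.

```python
COSMETIC_INTERVAL = 12 * 60 * 60  # seconds; "Currently, 12 hours"


def should_remember(old, new, non_cosmetic_change):
    """Decide whether an uploaded descriptor replaces the stored one.

    'old' and 'new' are publication times in seconds ('old' is None when no
    descriptor is stored for this key). 'non_cosmetic_change' is a bool that
    the caller computes using the rules of section 2.
    """
    if old is None:
        return True
    if new <= old:
        return False
    return non_cosmetic_change or (new - old) >= COSMETIC_INTERVAL
```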
4.2. Downloading network-status documents (authorities and caches)
All directory servers (authorities and mirrors) try to keep a fresh
set of network-status documents from every authority. To do so,
every 5 minutes, each authority asks every other authority for its
most recent network-status document. Every 15 minutes, each mirror
picks a random authority and asks it for the most recent network-status
documents for all the authorities the authority knows about (including
the chosen authority itself).
Directory servers and mirrors remember and serve the most recent
network-status document they have from each authority. Other
network-status documents don't need to be stored. If the most recent
network-status document is over 10 days old, it is discarded anyway.
Mirrors SHOULD store and serve network-status documents from authorities
they don't recognize, but SHOULD NOT use such documents for any other
purpose. Mirrors SHOULD discard network-status documents older than 48
hours.
4.3. Downloading and storing router descriptors (authorities and caches)
Periodically (currently, every 10 seconds), directory servers check
whether there are any specific descriptors (as identified by descriptor
hash in a network-status document) that they do not have and that they
are not currently trying to download.
If so, the directory server launches requests to the authorities for these
descriptors, such that each authority is only asked for descriptors listed
in its most recent network-status. When more than one authority lists the
descriptor, we choose which to ask at random.
If one of these downloads fails, we do not try to download that descriptor
from the authority that failed to serve it again unless we receive a newer
network-status from that authority that lists the same descriptor.
Directory servers must potentially cache multiple descriptors for each
router. Servers must not discard any descriptor listed by any current
network-status document from any authority. If there is enough space to
store additional descriptors, servers SHOULD try to hold those which
clients are likely to download the most. (Currently, this is judged
based on the interval for which each descriptor seemed newest.)
Authorities SHOULD NOT download descriptors for routers that they would
immediately reject for reasons listed in 3.1.
4.4. HTTP URLs
"Fingerprints" in these URLs are base-16-encoded SHA1 hashes.
The authoritative network-status published by a host should be available at:
http://<hostname>/tor/status/authority.z
The network-status published by a host with fingerprint
<F> should be available at:
http://<hostname>/tor/status/fp/<F>.z
The network-status documents published by hosts with fingerprints
<F1>,<F2>,<F3> should be available at:
http://<hostname>/tor/status/fp/<F1>+<F2>+<F3>.z
The most recent network-status documents from all known authorities,
concatenated, should be available at:
http://<hostname>/tor/status/all.z
The most recent descriptor for a server whose identity key has a
fingerprint of <F> should be available at:
http://<hostname>/tor/server/fp/<F>.z
The most recent descriptors for servers with identity fingerprints
<F1>,<F2>,<F3> should be available at:
http://<hostname>/tor/server/fp/<F1>+<F2>+<F3>.z
(NOTE: Implementations SHOULD NOT download descriptors by identity key
fingerprint. This allows a corrupted server (in collusion with a cache) to
provide a unique descriptor to a client, and thereby partition that client
from the rest of the network.)
The server descriptor with (descriptor) digest <D> (in hex) should be
available at:
http://<hostname>/tor/server/d/<D>.z
The most recent descriptors with digests <D1>,<D2>,<D3> should be
available at:
http://<hostname>/tor/server/d/<D1>+<D2>+<D3>.z
The most recent descriptor for this server should be at:
http://<hostname>/tor/server/authority.z
[Nothing in the Tor protocol uses this resource yet, but it is useful
for debugging purposes. Also, the official Tor implementations
(starting at 0.1.1.x) use this resource to test whether a server's
own DirPort is reachable.]
A concatenated set of the most recent descriptors for all known servers
should be available at:
http://<hostname>/tor/server/all.z
For debugging, directories SHOULD expose non-compressed objects at URLs like
the above, but without the final ".z".
Clients MUST handle compressed concatenated information in two forms:
- A concatenated list of zlib-compressed objects.
- A zlib-compressed concatenated list of objects.
Directory servers MAY generate either format: the former requires less
CPU, but the latter requires less bandwidth.
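One way a client might accept both forms is to inflate zlib streams one
after another until the input is exhausted. The sketch below uses Python's
standard zlib module; the function name is hypothetical and error handling
is kept minimal.

```python
import zlib


def decompress_concatenated(data: bytes) -> bytes:
    """Inflate one or more back-to-back zlib streams and join the results."""
    out = []
    while data:
        d = zlib.decompressobj()
        out.append(d.decompress(data))
        out.append(d.flush())
        if not d.eof:
            raise ValueError("truncated zlib stream")
        # Bytes after the end of this stream belong to the next object.
        data = d.unused_data
    return b"".join(out)
```

A single compressed concatenation is simply the one-stream case, so the same
loop covers both formats.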
Clients SHOULD use upper case letters (A-F) when base16-encoding
fingerprints. Servers MUST accept both upper and lower case fingerprints
in requests.
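For example, a client could build a batched network-status request as
sketched below. The helper name is hypothetical; the URL layout and the
upper-case convention come from the rules above.

```python
def status_url(hostname, fingerprints, compressed=True):
    """Build a /tor/status/fp/ URL for one or more hex fingerprints."""
    if not fingerprints:
        raise ValueError("at least one fingerprint is required")
    # Clients SHOULD send upper-case hex; servers accept either case.
    joined = "+".join(fp.upper() for fp in fingerprints)
    suffix = ".z" if compressed else ""
    return f"http://{hostname}/tor/status/fp/{joined}{suffix}"
```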
5. Client operation: downloading information
Every Tor that is not a directory server (that is, those that do
not have a DirPort set) implements this section.
5.1. Downloading network-status documents
Each client maintains an ordered list of directory authorities.
Insofar as possible, clients SHOULD all use the same ordered list.
For each network-status document a client has, it keeps track of its
publication time *and* the time when the client retrieved it. Clients
consider a network-status document "live" if it was published within the
last 24 hours.
Clients try to have a live network-status document from *every*
authority, and try to periodically get new network-status documents from
each authority in rotation as follows:
If a client is missing a live network-status document for any
authority, it tries to fetch it from a directory cache. On failure,
the client waits briefly, then tries that network-status document
again from another cache. The client does not build circuits until it
has live network-status documents from more than half the authorities
it trusts, and it has descriptors for more than 1/4 of the routers
that it believes are running.
If the most recently _retrieved_ network-status document is over 30
minutes old, the client attempts to download a network-status document.
When choosing which documents to download, clients treat their list of
directory authorities as a circular ring, and begin with the authority
appearing immediately after the authority for their most recently
retrieved network-status document. If this attempt fails (either it
fails to download at all, or the one it gets is not as good as the
one it has), the client retries at other caches several times, before
moving on to the next network-status document in sequence.
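The circular-ring selection can be illustrated as follows. The function name
is hypothetical, and falling back to the first authority when nothing has
been retrieved yet is an assumption.

```python
def next_authority(authorities, last_retrieved):
    """Return the authority to try next, treating the list as a ring."""
    if not authorities:
        raise ValueError("no directory authorities configured")
    if last_retrieved not in authorities:
        return authorities[0]
    i = authorities.index(last_retrieved)
    # Wrap around from the last authority to the first.
    return authorities[(i + 1) % len(authorities)]
```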
Clients discard all network-status documents over 24 hours old.
If enough mirrors (currently 4) claim not to have a given network status,
we stop trying to download that authority's network-status, until we
download a new network-status that makes us believe that the authority in
question is running. Clients should wait a little longer after each
failure.
Clients SHOULD try to batch as many network-status requests as possible
into each HTTP GET.
(Note: clients can and should pick caches based on the network-status
information they have: once they have first fetched network-status info
from an authority, they should not need to go to the authority directly
again.)
5.2. Downloading and storing router descriptors
Clients try to have the best descriptor for each router. A descriptor is
"best" if:
* It is the most recently published descriptor listed for that router
by at least two network-status documents.
OR,
* No descriptor for that router is listed by two or more
network-status documents, and it is the most recently published
descriptor listed by any network-status document.
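A minimal sketch of the "best" rule, assuming each network-status document
contributes one (digest, publication time) pair for the router. The input
layout and the function name are hypothetical.

```python
from collections import Counter


def best_descriptor(listings):
    """Pick the "best" descriptor digest for one router.

    'listings' holds one (digest, published) pair per network-status
    document that lists the router; 'published' is a sortable timestamp.
    """
    if not listings:
        return None
    counts = Counter(digest for digest, _ in listings)
    published = dict(listings)
    # Prefer descriptors listed by two or more documents, if any exist.
    multiply_listed = [d for d, n in counts.items() if n >= 2]
    candidates = multiply_listed or list(counts)
    return max(candidates, key=lambda d: published[d])
```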
Periodically (currently every 10 seconds) clients check whether there are
any "downloadable" descriptors. A descriptor is downloadable if:
- It is the "best" descriptor for some router.
- The descriptor was published at least 10 minutes in the past.
(This prevents clients from trying to fetch descriptors that the
mirrors have probably not yet retrieved and cached.)
- The client does not currently have it.
- The client is not currently trying to download it.
- The client would not discard it immediately upon receiving it.
- The client thinks it is running and valid (see 6.1 below).
If at least 16 known routers have downloadable descriptors, or if
enough time (currently 10 minutes) has passed since the last time the
client tried to download descriptors, it launches requests for all
downloadable descriptors, as described in 5.3 below.
When a descriptor download fails, the client notes it, and does not
consider the descriptor downloadable again until a certain amount of time
has passed. (Currently 0 seconds for the first failure, 60 seconds for the
second, 5 minutes for the third, 10 minutes for the fourth, and 1 day
thereafter.) Periodically (currently once an hour) clients reset the
failure count.
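The retry schedule above amounts to a small lookup table. The names below
are hypothetical; the delays are the "currently" values quoted in the text.

```python
RETRY_DELAYS = [0, 60, 5 * 60, 10 * 60]  # seconds, for failures 1 to 4
LATER_DELAY = 24 * 60 * 60               # one day thereafter


def retry_delay(failure_count):
    """Seconds to wait after the given number of failures."""
    if failure_count < 1:
        return 0
    if failure_count <= len(RETRY_DELAYS):
        return RETRY_DELAYS[failure_count - 1]
    return LATER_DELAY
```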
No descriptors are downloaded until the client has downloaded more than
half of the network-status documents.
Clients retain the most recent descriptor they have downloaded for each
router so long as it is not too old (currently, 48 hours), OR so long as
it is recommended by at least one networkstatus AND no "better"
descriptor has been downloaded. [Versions of Tor before 0.1.2.3-alpha
would discard descriptors simply for being published too far in the past.]
[The code seems to discard descriptors in all cases after they're 5
days old. True? -RD]
5.3. Managing downloads
When a client has no live network-status documents, it downloads
network-status documents from a randomly chosen authority. In all other
cases, the client downloads from mirrors randomly chosen from among those
believed to be V2 directory servers. (This information comes from the
network-status documents; see 6 below.)
When downloading multiple router descriptors, the client chooses multiple
mirrors so that:
- At least 3 different mirrors are used, except when this would result
in more than one request for under 4 descriptors.
- No more than 128 descriptors are requested from a single mirror.
- Otherwise, as few mirrors as possible are used.
After choosing mirrors, the client divides the descriptors among them
randomly.
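One possible reading of these constraints: aim for three requests, but keep
each request between 4 and 128 descriptors, so that at most one request ends
up with fewer than 4. The sketch below follows that reading; the constant
and function names are hypothetical, and each returned batch would go to a
different randomly chosen mirror.

```python
import math
import random

MIN_REQUESTS = 3
MIN_PER_REQUEST = 4
MAX_PER_REQUEST = 128


def split_downloads(digests, rng=random):
    """Shuffle digests and split them into per-mirror batches."""
    digests = list(digests)
    rng.shuffle(digests)
    # Spread over three requests, then clamp the batch size to [4, 128].
    per_request = math.ceil(len(digests) / MIN_REQUESTS)
    per_request = max(MIN_PER_REQUEST, min(MAX_PER_REQUEST, per_request))
    return [digests[i:i + per_request]
            for i in range(0, len(digests), per_request)]
```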
After receiving any response, the client MUST discard any network-status
documents and descriptors that it did not request.
6. Using directory information
Everyone besides directory authorities uses the approaches in this section
to decide which servers to use and what their keys are likely to be.
(Directory authorities just believe their own opinions, as in 3.1 above.)
6.1. Choosing routers for circuits.
Tor implementations only pay attention to "live" network-status documents.
A network status is "live" if it is the most recently downloaded network
status document for a given directory server, and the server is a
directory server trusted by the client, and the network-status document is
no more than 1 day old.
For time-sensitive information, Tor implementations focus on "recent"
network-status documents. A network status is "recent" if it is live, and
if it was published in the last 60 minutes. If there are fewer
than 3 such documents, the most recently published 3 are "recent." If
there are fewer than 3 in all, all are "recent."
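The "recent" rule, including its fallback to the three newest documents,
might look like this. The pair layout and the function name are assumptions;
the caller is expected to pass only live documents.

```python
def recent_statuses(live, now, window=60 * 60, minimum=3):
    """Return the "recent" subset of live documents.

    'live' is a list of (authority, published) pairs with times in seconds.
    """
    recent = [s for s in live if now - s[1] <= window]
    if len(recent) >= minimum:
        return recent
    # Too few fresh documents: fall back to the newest ones overall.
    newest_first = sorted(live, key=lambda s: s[1], reverse=True)
    return newest_first[:minimum]
```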
Circuits SHOULD NOT be built until the client has enough directory
information: network-statuses (or failed attempts to download
network-statuses) for all authorities, network-statuses for more than
half of the authorities, and descriptors for at least 1/4 of the servers
believed to be running.
A server is "listed" if it is included by more than half of the live
network status documents. Clients SHOULD NOT use unlisted servers.
Clients believe the flags "Valid", "Exit", "Fast", "Guard", "Stable", and
"V2Dir" about a given router when they are asserted by more than half of
the live network-status documents. Clients believe the flag "Running" if
it is listed by more than half of the recent network-status documents.
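The majority rule for flags can be sketched as below, assuming the client
has already extracted one set of flags per network-status document for the
router in question (an empty set when a document omits the router). The
function name is hypothetical.

```python
def believed_flags(statuses, flags=("Valid", "Exit", "Fast", "Guard",
                                    "Stable", "V2Dir")):
    """Return flags asserted by more than half of the given documents.

    'statuses' is a list with one set of flag names per live
    network-status document.
    """
    needed = len(statuses) / 2
    return {f for f in flags if sum(f in s for s in statuses) > needed}
```

For "Running", the same function would be called with the recent documents
and flags=("Running",).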
These flags are used as follows:
- Clients SHOULD NOT use non-'Valid' or non-'Running' routers unless
requested to do so.
- Clients SHOULD NOT use non-'Fast' routers for any purpose other than
very-low-bandwidth circuits (such as introduction circuits).
- Clients SHOULD NOT use non-'Stable' routers for circuits that are
likely to need to be open for a very long time (such as those used for
IRC or SSH connections).
- Clients SHOULD NOT choose non-'Guard' nodes when picking entry guard
nodes.
- Clients SHOULD NOT download directory information from non-'V2Dir'
caches.
6.2. Managing naming
In order to provide human-memorable names for individual server
identities, some directory servers bind names to IDs. Clients handle
names in two ways:
When a client encounters a name it has not mapped before:
If all the live "Naming" network-status documents the client has
claim that the name binds to some identity ID, and the client has at
least three live network-status documents, the client maps the name to
ID.
When a user tries to refer to a router with a name that does not have a
mapping under the above rules, the implementation SHOULD warn the user.
After giving the warning, the implementation MAY use a router that at
least one Naming authority maps the name to, so long as no other naming
authority maps that name to a different router. If no Naming authority
maps the name to a router, the implementation MAY use any router that
advertises the name.
Not every router needs a nickname. When a router doesn't configure a
nickname, it publishes with the default nickname "Unnamed". Authorities
SHOULD NOT ever mark a router with this nickname as Named; client software
SHOULD NOT ever use a router in response to a user request for a router
called "Unnamed".
6.3. Software versions
An implementation of Tor SHOULD warn when it has fetched (or has
attempted to fetch and failed four consecutive times) a network-status
for each authority, and it is running a software version
not listed on more than half of the live "Versioning" network-status
documents.
6.4. Warning about a router's status.
If a router tries to publish its descriptor to a Naming authority
that has its nickname mapped to another key, the router SHOULD
warn the operator that it is either using the wrong key or is using
an already claimed nickname.
If a router has fetched (or attempted to fetch and failed four
consecutive times) a network-status for every authority, and at
least one of the authorities is "Naming", and no live "Naming"
authorities publish a binding for the router's nickname, the
router MAY remind the operator that the chosen nickname is not
bound to this key at the authorities, and suggest contacting the
authority operators.
...
6.5. Router protocol versions
A client should believe that a router supports a given feature if that
feature is supported by the router or protocol versions in more than half
of the live networkstatus's "v" entries for that router. In other words,
if the "v" entries for some router are:
v Tor 0.0.8pre1 (from authority 1)
v Tor 0.1.2.11 (from authority 2)
v FutureProtocolDescription 99 (from authority 3)
then the client should believe that the router supports any feature
supported by 0.1.2.11.
This is currently equivalent to believing the median declared version for
a router in all live networkstatuses.
7. Standards compliance
All clients and servers MUST support HTTP 1.0.
7.1. HTTP headers
Servers MAY set the Content-Length: header. Servers SHOULD set
Content-Encoding to "deflate" or "identity".
Servers MAY include an X-Your-Address-Is: header, whose value is the
apparent IP address of the client connecting to them (as a dotted quad).
For directory connections tunneled over a BEGIN_DIR stream, servers SHOULD
report the IP from which the circuit carrying the BEGIN_DIR stream reached
them. [Servers before version 0.1.2.5-alpha reported 127.0.0.1 for all
BEGIN_DIR-tunneled connections.]
Servers SHOULD disable caching of multiple network statuses or multiple
router descriptors. Servers MAY enable caching of single descriptors,
single network statuses, the list of all router descriptors, a v1
directory, or a v1 running routers document. XXX mention times.
7.2. HTTP status codes
XXX We should write down what return codes dirservers send in what situations.


@ -1,423 +0,0 @@
$Id$
Tor Path Specification
Roger Dingledine
Nick Mathewson
Note: This is an attempt to specify Tor as currently implemented. Future
versions of Tor will implement improved algorithms.
This document tries to cover how Tor chooses to build circuits and assign
streams to circuits. Other implementations MAY take other approaches, but
implementors should be aware of the anonymity and load-balancing implications
of their choices.
THIS SPEC ISN'T DONE YET.
1. General operation
Tor begins building circuits as soon as it has enough directory
information to do so (see section 5 of dir-spec.txt). Some circuits are
built preemptively because we expect to need them later (for user
traffic), and some are built because of immediate need (for user traffic
that no current circuit can handle, for testing the network or our
reachability, and so on).
When a client application creates a new stream (by opening a SOCKS
connection or launching a resolve request), we attach it to an appropriate
open circuit if one exists, or wait if an appropriate circuit is
in-progress. We launch a new circuit only
if no current circuit can handle the request. We rotate circuits over
time to avoid some profiling attacks.
To build a circuit, we choose all the nodes we want to use, and then
construct the circuit. Sometimes, when we want a circuit that ends at a
given hop, and we have an appropriate unused circuit, we "cannibalize" the
existing circuit and extend it to the new terminus.
These processes are described in more detail below.
This document describes Tor's automatic path selection logic only; path
selection can be overridden by a controller (with the EXTENDCIRCUIT and
ATTACHSTREAM commands). Paths constructed through these means may
violate some constraints given below.
1.1. Terminology
A "path" is an ordered sequence of nodes, not yet built as a circuit.
A "clean" circuit is one that has not yet been used for any traffic.
A "fast" or "stable" or "valid" node is one that has the 'Fast' or
'Stable' or 'Valid' flag
set respectively, based on our current directory information. A "fast"
or "stable" circuit is one consisting only of "fast" or "stable" nodes.
In an "exit" circuit, the final node is chosen based on waiting stream
requests if any, and in any case it avoids nodes with exit policy of
"reject *:*". An "internal" circuit, on the other hand, is one where
the final node is chosen just like a middle node (ignoring its exit
policy).
A "request" is a client-side stream or DNS resolve that needs to be
served by a circuit.
A "pending" circuit is one that we have started to build, but which has
not yet completed.
A circuit or path "supports" a request if it is okay to use the
circuit/path to fulfill the request, according to the rules given below.
A circuit or path "might support" a request if some aspect of the request
is unknown (usually its target IP), but we believe the path probably
supports the request according to the rules given below.
2. Building circuits
2.1. When we build
2.1.1. Clients build circuits preemptively
When running as a client, Tor tries to maintain at least a certain
number of clean circuits, so that new streams can be handled
quickly. To increase the likelihood of success, Tor tries to
predict what circuits will be useful by choosing from among nodes
that support the ports we have used in the recent past (by default
one hour). Specifically, on startup Tor tries to maintain one clean
fast exit circuit that allows connections to port 80, and at least
two fast clean stable internal circuits in case we get a resolve
request or hidden service request (at least three if we _run_ a
hidden service).
After that, Tor will adapt the circuits that it preemptively builds
based on the requests it sees from the user: it tries to have two fast
clean exit circuits available for every port seen within the past hour
(each circuit can be adequate for many predicted ports -- it doesn't
need two separate circuits for each port), and it tries to have the
above internal circuits available if we've seen resolves or hidden
service activity within the past hour. If there are 12 or more clean
circuits open, it doesn't open more even if it has more predictions.
Only stable circuits can "cover" a port that is listed in the
LongLivedPorts config option. Similarly, hidden service requests
to ports listed in LongLivedPorts make us create stable internal
circuits.
Note that if there are no requests from the user for an hour, Tor
will predict no use and build no preemptive circuits.
The Tor client SHOULD NOT store its list of predicted requests to a
persistent medium.
2.1.2. Clients build circuits on demand
Additionally, when a client request exists that no circuit (built or
pending) might support, we create a new circuit to support the request.
For exit connections, we pick an exit node that will handle the
most pending requests (choosing arbitrarily among ties), launch a
circuit to end there, and repeat until every unattached request
might be supported by a pending or built circuit. For internal
circuits, we pick an arbitrary acceptable path, repeating as needed.
In some cases we can reuse an already established circuit if it's
clean; see Section 2.3 (cannibalizing circuits) for details.
2.1.3. Servers build circuits for testing reachability and bandwidth
Tor servers test reachability of their ORPort once they have
successfully built a circuit (on start and whenever their IP address
changes). They build an ordinary fast internal circuit with themselves
as the last hop. As soon as any testing circuit succeeds, the Tor
server decides it's reachable and is willing to publish a descriptor.
We launch multiple testing circuits (one at a time), until we
have NUM_PARALLEL_TESTING_CIRC (4) such circuits open. Then we
do a "bandwidth test" by sending a certain number of relay drop
cells down each circuit: BandwidthRate * 10 / CELL_NETWORK_SIZE
total cells divided across the four circuits, but never more than
CIRCWINDOW_START (1000) cells total. This exercises both outgoing and
incoming bandwidth, and helps to jumpstart the observed bandwidth
(see dir-spec.txt).
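As a worked example of the cell count, the sketch below assumes
CELL_NETWORK_SIZE is 512 bytes (the cell size used by the Tor protocol; the
value is not stated in this section) and uses integer division. The function
name is hypothetical.

```python
CELL_NETWORK_SIZE = 512        # bytes per cell on the wire (assumed here)
CIRCWINDOW_START = 1000
NUM_PARALLEL_TESTING_CIRC = 4


def cells_per_testing_circuit(bandwidth_rate):
    """Relay drop cells to send on each testing circuit.

    'bandwidth_rate' is the configured BandwidthRate in bytes per second.
    """
    total = min(bandwidth_rate * 10 // CELL_NETWORK_SIZE, CIRCWINDOW_START)
    return total // NUM_PARALLEL_TESTING_CIRC
```

With a BandwidthRate of 20480 bytes/s this gives 400 cells in total, or 100
per circuit; large rates are capped at 1000 cells, or 250 per circuit.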
Tor servers also test reachability of their DirPort once they have
established a circuit, but they use an ordinary exit circuit for
this purpose.
2.1.4. Hidden-service circuits
See section 4 below.
2.1.5. Rate limiting of failed circuits
If we fail to build a circuit N times in an X-second period (see Section
2.3 for how this works), we stop building circuits until the X seconds
have elapsed.
XXXX
2.1.6. When to tear down circuits
XXXX
2.2. Path selection and constraints
We choose the path for each new circuit before we build it. We choose the
exit node first, followed by the other nodes in the circuit. All paths
we generate obey the following constraints:
- We do not choose the same router twice for the same path.
- We do not choose any router in the same family as another in the same
path.
- We do not choose more than one router in a given /16 subnet
(unless EnforceDistinctSubnets is 0).
- We don't choose any non-running or non-valid router unless we have
been configured to do so. By default, we are configured to allow
non-valid routers in "middle" and "rendezvous" positions.
- If we're using Guard nodes, the first node must be a Guard (see 5
below)
- XXXX Choosing the length
For circuits that do not need to be "fast", when choosing among
multiple candidates for a path element, we choose randomly.
For "fast" circuits, we pick a given router as an exit with probability
proportional to its advertised bandwidth [the smaller of the 'rate' and
'observed' arguments to the "bandwidth" element in its descriptor]. If a
router's advertised bandwidth is greater than MAX_BELIEVABLE_BANDWIDTH
(currently 10 MB/s), we clip to that value.
For non-exit positions on "fast" circuits, we pick routers as above, but
we weight the clipped advertised bandwidth of Exit-flagged nodes depending
on the fraction of bandwidth available from non-Exit nodes. Call the
total clipped advertised bandwidth for Exit nodes under consideration E,
and the total clipped advertised bandwidth for all nodes under
consideration T. If E<T/3, we do not consider Exit-flagged nodes.
Otherwise, we weight their bandwidth with the factor (E-T/3)/E. This
ensures that bandwidth is evenly distributed over nodes in 3-hop paths.
Similarly, guard nodes are weighted by the factor (G-T/3)/G, and not
considered for non-guard positions if this value is less than 0.
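The weighting factor can be computed directly from the two totals. In this
sketch the function name is hypothetical, and a result of 0 stands for "do
not consider these nodes for this position".

```python
def position_weight(flagged_total, overall_total):
    """Weight applied to Exit (or Guard) bandwidth for other positions.

    'flagged_total' is E (or G) and 'overall_total' is T, both clipped
    advertised bandwidth sums over the candidate nodes.
    """
    if flagged_total <= 0:
        return 0.0
    weight = (flagged_total - overall_total / 3) / flagged_total
    # A negative value means the flagged nodes are too scarce to share.
    return max(weight, 0.0)
```

For instance, with E = 300 and T = 600 the factor is (300 - 200)/300 = 1/3,
while with E = 100 and T = 600 Exit-flagged nodes are not considered.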
Additionally, we may be building circuits with one or more requests in
mind. Each kind of request puts certain constraints on paths:
- All service-side introduction circuits and all rendezvous paths
should be Stable.
- All connection requests for connections that we think will need to
stay open a long time require Stable circuits. Currently, Tor decides
this by examining the request's target port, and comparing it to a
list of "long-lived" ports. (Default: 21, 22, 706, 1863, 5050,
5190, 5222, 5223, 6667, 6697, 8300.)
- DNS resolves require an exit node whose exit policy is not equivalent
to "reject *:*".
- Reverse DNS resolves require a version of Tor with advertised eventdns
support (available in Tor 0.1.2.1-alpha-dev and later).
- All connection requests require an exit node whose exit policy
supports their target address and port (if known), or which "might
support it" (if the address isn't known). See 2.2.1.
- Rules for Fast? XXXXX
2.2.1. Choosing an exit
If we know what IP address we want to connect to or resolve, we can
trivially tell whether a given router will support it by simulating
its declared exit policy.
Because we often connect to addresses of the form hostname:port, we do not
always know the target IP address when we select an exit node. In these
cases, we need to pick an exit node that "might support" connections to a
given port at an unknown address. An exit node "might support"
such a connection if any clause that accepts any connections to that port
precedes all clauses (if any) that reject all connections to that port.
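The "might support" test could be sketched as a single pass over the policy.
The tuple representation of a clause is a simplification invented for this
example, and returning False when no clause accepts the port follows a
literal reading of the rule above.

```python
def might_support(policy, port):
    """Check whether an exit policy might allow 'port' at an unknown address.

    'policy' is an ordered list of (action, address, low_port, high_port)
    tuples, where action is "accept" or "reject" and address is "*" for a
    wildcard. This tuple layout is a simplification for illustration.
    """
    for action, address, low, high in policy:
        if not low <= port <= high:
            continue
        if action == "accept":
            return True   # an accepting clause came first
        if address == "*":
            return False  # every address is rejected for this port
    return False
```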
Unless requested to do so by the user, we never choose an exit server
flagged as "BadExit" by more than half of the authorities who advertise
themselves as listing bad exits.
2.2.2. User configuration
Users can alter the default behavior for path selection with configuration
options.
- If "ExitNodes" is provided, then every request requires an exit node on
the ExitNodes list. (If a request is supported by no nodes on that list,
and StrictExitNodes is false, then Tor treats that request as if
ExitNodes were not provided.)
- "EntryNodes" and "StrictEntryNodes" behave analogously.
- If a user tries to connect to or resolve a hostname of the form
<target>.<servername>.exit, the request is rewritten to a request for
<target>, and the request is only supported by the exit whose nickname
or fingerprint is <servername>.
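The ".exit" rewrite can be illustrated with a small parser. The function
name is hypothetical, and only the <target>.<servername>.exit form described
above is handled.

```python
def parse_exit_hostname(hostname):
    """Split '<target>.<servername>.exit' into (target, servername).

    Returns (hostname, None) when the name does not use .exit notation.
    """
    labels = hostname.split(".")
    if len(labels) < 3 or labels[-1].lower() != "exit":
        return hostname, None
    # The label just before "exit" names the server; the rest is the target.
    return ".".join(labels[:-2]), labels[-2]
```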
2.3. Cannibalizing circuits
If we need a circuit and have a clean one already established, in
some cases we can adapt the clean circuit for our new
purpose. Specifically,
For hidden service interactions, we can "cannibalize" a clean internal
circuit if one is available, so we don't need to build those circuits
from scratch on demand.
We can also cannibalize clean circuits when the client asks to exit
at a given node -- either via the ".exit" notation or because the
destination is running at the same location as an exit node.
2.4. Handling failure
If an attempt to extend a circuit fails (either because the first create
failed or a subsequent extend failed) then the circuit is torn down and is
no longer pending. (XXXX really?) Requests that might have been
supported by the pending circuit thus become unsupported, and a new
circuit needs to be constructed.
If a stream "begin" attempt fails with an EXITPOLICY error, we
decide that the exit node's exit policy is not correctly advertised,
so we treat the exit node as if it were a non-exit until we retrieve
a fresh descriptor for it.
XXXX
3. Attaching streams to circuits
When a circuit that might support a request is built, Tor tries to attach
the request's stream to the circuit and sends a BEGIN, BEGIN_DIR, or
RESOLVE relay cell as appropriate. If the request completes unsuccessfully,
Tor considers the reason given in the CLOSE relay cell. [XXX yes, and?]
After a request has remained unattached for SocksTimeout (2 minutes
by default), Tor abandons the attempt and signals an error to the
client as appropriate (e.g., by closing the SOCKS connection).
XXX Timeouts and when Tor auto-retries.
* What stream-end-reasons are appropriate for retrying.
If no reply to BEGIN/RESOLVE, then the stream will time out and fail.
4. Hidden-service related circuits
XXX Tracking expected hidden service use (client-side and hidserv-side)
5. Guard nodes
We use Guard nodes (also called "helper nodes" in the literature) to
prevent certain profiling attacks. Here's the risk: if we choose entry and
exit nodes at random, and an attacker controls C out of N servers
(ignoring advertised bandwidth), then the attacker will control the entry
and exit node of any given circuit with probability (C/N)^2. But as we
make many different circuits over time, the probability that the attacker
will see a sample of about (C/N)^2 of our traffic goes to 1. Since
statistical sampling works, the attacker
can be sure of learning a profile of our behavior.
If, on the other hand, we picked an entry node and held it fixed, we would
have probability C/N of choosing a bad entry and being profiled, and
probability (N-C)/N of choosing a good entry and not being profiled.
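To make the comparison concrete, here is a small sketch. It is not part of
Tor: the function names and the example values are assumptions chosen for
illustration. It computes a related quantity, the chance that at least one
of many independently chosen circuits has both its entry and exit under
the attacker's control, next to the fixed C/N risk of holding one entry.

```python
def p_compromised_random(c, n, circuits):
    """Chance that at least one of `circuits` circuits has an
    attacker-controlled entry and exit, when both are picked uniformly
    at random (ignoring advertised bandwidth, as in the text)."""
    per_circuit = (c / n) ** 2
    return 1 - (1 - per_circuit) ** circuits


def p_compromised_fixed_entry(c, n):
    """Chance that a single entry node, chosen once and held fixed,
    is attacker-controlled."""
    return c / n


if __name__ == "__main__":
    # Hypothetical network: attacker runs 10 of 100 servers.
    for k in (1, 100, 1000):
        print(k, p_compromised_random(10, 100, k))
    print("fixed entry:", p_compromised_fixed_entry(10, 100))
```

With random selection the risk grows toward 1 as circuits accumulate; with
a fixed entry it stays at C/N no matter how many circuits are built.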
When guard nodes are enabled, Tor maintains an ordered list of entry nodes
as our chosen guards, and stores this list persistently to disk. If a Guard
node becomes unusable, rather than replacing it, Tor adds new guards to the
end of the list. When choosing the first hop of a circuit, Tor chooses at
random from among the first NumEntryGuards (default 3) usable guards on the
list. If there are not at least 2 usable guards on the list, Tor adds
routers until there are, or until there are no more usable routers to add.
A guard is unusable if any of the following hold:
- it is not marked as a Guard by the networkstatuses,
- it is not marked Valid (and the user hasn't set AllowInvalid entry)
- it is not marked Running
- Tor couldn't reach it the last time it tried to connect
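The usability test and the first-hop choice described above can be sketched
as follows. This is an illustration only, not Tor's implementation: the
Guard class, its field names, and the function names are hypothetical, and
the sketch leaves out adding new guards when too few are usable as well as
the per-circuit restrictions.

```python
import random
from dataclasses import dataclass


@dataclass
class Guard:
    name: str
    is_guard: bool = True    # marked as a Guard by the networkstatuses
    is_valid: bool = True    # marked Valid
    is_running: bool = True  # marked Running
    reachable: bool = True   # last connection attempt succeeded


def is_usable(guard, allow_invalid_entry=False):
    """Apply the four conditions from the list above."""
    return (guard.is_guard
            and (guard.is_valid or allow_invalid_entry)
            and guard.is_running
            and guard.reachable)


def choose_first_hop(guard_list, num_entry_guards=3,
                     allow_invalid_entry=False, rng=random):
    """Pick at random among the first `num_entry_guards` usable guards,
    keeping the list's order. Returns None if no guard is usable."""
    usable = [g for g in guard_list if is_usable(g, allow_invalid_entry)]
    candidates = usable[:num_entry_guards]
    if not candidates:
        return None
    return rng.choice(candidates)
```

Because unusable guards are skipped rather than removed, a guard that
becomes usable again regains its earlier position in the list.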
A guard is unusable for a particular circuit if any of the rules for path
selection in 2.2 are not met. In particular, if the circuit is "fast"
and the guard is not Fast, or if the circuit is "stable" and the guard is
not Stable, or if the guard has already been chosen as the exit node in
that circuit, Tor can't use it as a guard node for that circuit.
If the guard is excluded because of its status in the networkstatuses for
over 30 days, Tor removes it from the list entirely, preserving order.
If Tor fails to connect to an otherwise usable guard, it retries
periodically: every hour for six hours, every 4 hours for 3 days, every
18 hours for a week, and every 36 hours thereafter. Additionally, Tor
retries unreachable guards the first time it adds a new guard to the list,
since it is possible that the old guards were only marked as unreachable
because the network was unreachable or down.
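The retry schedule can be written as a small lookup. This sketch assumes
that each period ("for six hours", "for 3 days", "for a week") is measured
from the moment the guard first became unreachable; the text does not say
so explicitly, and the function name is hypothetical.

```python
HOUR = 60 * 60
DAY = 24 * HOUR


def retry_interval(seconds_unreachable):
    """Return how long to wait between connection attempts, given how
    long the guard has been unreachable (assumed reference point)."""
    if seconds_unreachable < 6 * HOUR:
        return 1 * HOUR
    if seconds_unreachable < 3 * DAY:
        return 4 * HOUR
    if seconds_unreachable < 7 * DAY:
        return 18 * HOUR
    return 36 * HOUR
```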
Tor does not add a guard persistently to the list until the first time we
have connected to it successfully.
6. Router descriptor purposes
There are currently three "purposes" supported for router descriptors:
general, controller, and bridge. Most descriptors are of type general
-- these are the ones listed in the consensus, and the ones fetched
and used in normal cases.
Controller-purpose descriptors are those delivered by the controller
and labelled as such: they will be kept around (and expire like
normal descriptors), and they can be used by the controller in its
CIRCUITEXTEND commands. Otherwise they are ignored by Tor when it
chooses paths.
Bridge-purpose descriptors are for routers that are used as bridges. See
doc/design-paper/blocking.pdf for more design explanation, or proposal
125 for specific details. Currently bridge descriptors are used in place
of normal entry guards, for Tor clients that have UseBridges enabled.
X. Old notes
X.1. Do we actually do this?
How to deal with network down.
- While all helpers are down/unreachable and there are no established
or on-the-way testing circuits, launch a testing circuit. (Do this
periodically in the same way we try to establish normal circuits
when things are working normally.)
(Testing circuits are a special type of circuit, that streams won't
attach to by accident.)
- When a testing circuit succeeds, mark all helpers up and hold
the testing circuit open.
- If a connection to a helper succeeds, close all testing circuits.
Else mark that helper down and try another.
- If the last helper is marked down and we already have a testing
circuit established, then add the first hop of that testing circuit
to the end of our helper node list, close that testing circuit,
and go back to square one. (Actually, rather than closing the
testing circuit, can we get away with converting it to a normal
circuit and beginning to use it immediately?)
[Do we actually do any of the above? If so, let's spec it. If not, let's
remove it. -NM]
X.2. A thing we could do to deal with reachability.
And as a bonus, it leads to an answer to Nick's attack ("If I pick
my helper nodes all on 18.0.0.0:*, then I move, you'll know where I
bootstrapped") -- the answer is to pick your original three helper nodes
without regard for reachability. Then the above algorithm will add some
more that are reachable for you, and if you move somewhere, it's more
likely (though not certain) that some of the originals will become useful.
Is that smart or just complex?
X.3. Some stuff that worries me about entry guards. 2006 Jun, Nickm.
It is unlikely for two users to have the same set of entry guards.
Observing a user is sufficient to learn its entry guards. So, as we move
around, entry guards make us linkable. If we want to change guards when
our location (IP? subnet?) changes, we have two bad options. We could
- Drop the old guards. But if we go back to our old location,
we'll not use our old guards. For a laptop that sometimes gets used
from work and sometimes from home, this is pretty fatal.
- Remember the old guards as associated with the old location, and use
them again if we ever go back to the old location. This would be
nasty, since it would force us to record where we've been.
[Do we do any of this now? If not, this should move into 099-misc or
098-todo. -NM]

View File

@ -1,161 +0,0 @@
Filename: 000-index.txt
Title: Index of Tor Proposals
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson
Created: 26-Jan-2007
Status: Meta
Overview:
This document provides an index to Tor proposals.
This is an informational document.
Everything in this document below the line of '=' signs is automatically
generated by reindex.py; do not edit by hand.
============================================================
Proposals by number:
000 Index of Tor Proposals [META]
001 The Tor Proposal Process [META]
098 Proposals that should be written [META]
099 Miscellaneous proposals [META]
100 Tor Unreliable Datagram Extension Proposal [DEAD]
101 Voting on the Tor Directory System [CLOSED]
102 Dropping "opt" from the directory format [CLOSED]
103 Splitting identity key from regularly used signing key [CLOSED]
104 Long and Short Router Descriptors [CLOSED]
105 Version negotiation for the Tor protocol [CLOSED]
106 Checking fewer things during TLS handshakes [CLOSED]
107 Uptime Sanity Checking [CLOSED]
108 Base "Stable" Flag on Mean Time Between Failures [CLOSED]
109 No more than one server per IP address [CLOSED]
110 Avoiding infinite length circuits [ACCEPTED]
111 Prioritizing local traffic over relayed traffic [CLOSED]
112 Bring Back Pathlen Coin Weight [SUPERSEDED]
113 Simplifying directory authority administration [SUPERSEDED]
114 Distributed Storage for Tor Hidden Service Descriptors [CLOSED]
115 Two Hop Paths [DEAD]
116 Two hop paths from entry guards [DEAD]
117 IPv6 exits [ACCEPTED]
118 Advertising multiple ORPorts at once [ACCEPTED]
119 New PROTOCOLINFO command for controllers [CLOSED]
120 Shutdown descriptors when Tor servers stop [DEAD]
121 Hidden Service Authentication [FINISHED]
122 Network status entries need a new Unnamed flag [CLOSED]
123 Naming authorities automatically create bindings [CLOSED]
124 Blocking resistant TLS certificate usage [SUPERSEDED]
125 Behavior for bridge users, bridge relays, and bridge authorities [CLOSED]
126 Getting GeoIP data and publishing usage summaries [CLOSED]
127 Relaying dirport requests to Tor download site / website [DRAFT]
128 Families of private bridges [DEAD]
129 Block Insecure Protocols by Default [CLOSED]
130 Version 2 Tor connection protocol [CLOSED]
131 Help users to verify they are using Tor [NEEDS-REVISION]
132 A Tor Web Service For Verifying Correct Browser Configuration [DRAFT]
133 Incorporate Unreachable ORs into the Tor Network [DRAFT]
134 More robust consensus voting with diverse authority sets [ACCEPTED]
135 Simplify Configuration of Private Tor Networks [CLOSED]
136 Mass authority migration with legacy keys [CLOSED]
137 Keep controllers informed as Tor bootstraps [CLOSED]
138 Remove routers that are not Running from consensus documents [CLOSED]
139 Download consensus documents only when it will be trusted [CLOSED]
140 Provide diffs between consensuses [ACCEPTED]
141 Download server descriptors on demand [DRAFT]
142 Combine Introduction and Rendezvous Points [DEAD]
143 Improvements of Distributed Storage for Tor Hidden Service Descriptors [OPEN]
144 Increase the diversity of circuits by detecting nodes belonging the same provider [DRAFT]
145 Separate "suitable as a guard" from "suitable as a new guard" [OPEN]
146 Add new flag to reflect long-term stability [OPEN]
147 Eliminate the need for v2 directories in generating v3 directories [ACCEPTED]
148 Stream end reasons from the client side should be uniform [CLOSED]
149 Using data from NETINFO cells [OPEN]
150 Exclude Exit Nodes from a circuit [CLOSED]
151 Improving Tor Path Selection [DRAFT]
152 Optionally allow exit from single-hop circuits [CLOSED]
153 Automatic software update protocol [SUPERSEDED]
154 Automatic Software Update Protocol [SUPERSEDED]
155 Four Improvements of Hidden Service Performance [FINISHED]
156 Tracking blocked ports on the client side [OPEN]
157 Make certificate downloads specific [ACCEPTED]
158 Clients download consensus + microdescriptors [OPEN]
159 Exit Scanning [OPEN]
Proposals by status:
DRAFT:
127 Relaying dirport requests to Tor download site / website
132 A Tor Web Service For Verifying Correct Browser Configuration
133 Incorporate Unreachable ORs into the Tor Network
141 Download server descriptors on demand
144 Increase the diversity of circuits by detecting nodes belonging the same provider
151 Improving Tor Path Selection
NEEDS-REVISION:
131 Help users to verify they are using Tor
OPEN:
143 Improvements of Distributed Storage for Tor Hidden Service Descriptors [for 0.2.1.x]
145 Separate "suitable as a guard" from "suitable as a new guard" [for 0.2.1.x]
146 Add new flag to reflect long-term stability [for 0.2.1.x]
149 Using data from NETINFO cells [for 0.2.1.x]
156 Tracking blocked ports on the client side [for 0.2.?]
158 Clients download consensus + microdescriptors
159 Exit Scanning
ACCEPTED:
110 Avoiding infinite length circuits [for 0.2.1.x] [in 0.2.1.3-alpha]
117 IPv6 exits [for 0.2.1.x]
118 Advertising multiple ORPorts at once [for 0.2.1.x]
134 More robust consensus voting with diverse authority sets [for 0.2.2.x]
140 Provide diffs between consensuses [for 0.2.2.x]
147 Eliminate the need for v2 directories in generating v3 directories [for 0.2.1.x]
157 Make certificate downloads specific [for 0.2.1.x]
META:
000 Index of Tor Proposals
001 The Tor Proposal Process
098 Proposals that should be written
099 Miscellaneous proposals
FINISHED:
121 Hidden Service Authentication [in 0.2.1.x]
155 Four Improvements of Hidden Service Performance [in 0.2.1.x]
CLOSED:
101 Voting on the Tor Directory System [in 0.2.0.x]
102 Dropping "opt" from the directory format [in 0.2.0.x]
103 Splitting identity key from regularly used signing key [in 0.2.0.x]
104 Long and Short Router Descriptors [in 0.2.0.x]
105 Version negotiation for the Tor protocol [in 0.2.0.x]
106 Checking fewer things during TLS handshakes [in 0.2.0.x]
107 Uptime Sanity Checking [in 0.2.0.x]
108 Base "Stable" Flag on Mean Time Between Failures [in 0.2.0.x]
109 No more than one server per IP address [in 0.2.0.x]
111 Prioritizing local traffic over relayed traffic [in 0.2.0.x]
114 Distributed Storage for Tor Hidden Service Descriptors [in 0.2.0.x]
119 New PROTOCOLINFO command for controllers [in 0.2.0.x]
122 Network status entries need a new Unnamed flag [in 0.2.0.x]
123 Naming authorities automatically create bindings [in 0.2.0.x]
125 Behavior for bridge users, bridge relays, and bridge authorities [in 0.2.0.x]
126 Getting GeoIP data and publishing usage summaries [in 0.2.0.x]
129 Block Insecure Protocols by Default [in 0.2.0.x]
130 Version 2 Tor connection protocol [in 0.2.0.x]
135 Simplify Configuration of Private Tor Networks [for 0.2.1.x] [in 0.2.1.2-alpha]
136 Mass authority migration with legacy keys [in 0.2.0.x]
137 Keep controllers informed as Tor bootstraps [in 0.2.1.x]
138 Remove routers that are not Running from consensus documents [in 0.2.1.2-alpha]
139 Download consensus documents only when it will be trusted [in 0.2.1.x]
148 Stream end reasons from the client side should be uniform [in 0.2.1.9-alpha]
150 Exclude Exit Nodes from a circuit [in 0.2.1.3-alpha]
152 Optionally allow exit from single-hop circuits [in 0.2.1.6-alpha]
SUPERSEDED:
112 Bring Back Pathlen Coin Weight
113 Simplifying directory authority administration
124 Blocking resistant TLS certificate usage
153 Automatic software update protocol
154 Automatic Software Update Protocol
DEAD:
100 Tor Unreliable Datagram Extension Proposal
115 Two Hop Paths
116 Two hop paths from entry guards
120 Shutdown descriptors when Tor servers stop
128 Families of private bridges
142 Combine Introduction and Rendezvous Points

View File

@ -1,187 +0,0 @@
Filename: 001-process.txt
Title: The Tor Proposal Process
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson
Created: 30-Jan-2007
Status: Meta
Overview:
This document describes how to change the Tor specifications, how Tor
proposals work, and the relationship between Tor proposals and the
specifications.
This is an informational document.
Motivation:
Previously, our process for updating the Tor specifications was maximally
informal: we'd patch the specification (sometimes forking first, and
sometimes not), then discuss the patches, reach consensus, and implement
the changes.
This had a few problems.
First, even at its most efficient, the old process would often have the
spec out of sync with the code. The worst cases were those where
implementation was deferred: the spec and code could stay out of sync for
versions at a time.
Second, it was hard to participate in discussion, since you had to know
which portions of the spec were a proposal, and which were already
implemented.
Third, it littered the specifications with too many inline comments.
[This was a real problem -NM]
[Especially when it went to multiple levels! -NM]
[XXXX especially when they weren't signed and talked about that
thing that you can't remember after a year]
How to change the specs now:
First, somebody writes a proposal document. It should describe the change
that should be made in detail, and give some idea of how to implement it.
Once it's fleshed out enough, it becomes a proposal.
Like an RFC, every proposal gets a number. Unlike RFCs, proposals can
change over time and keep the same number, until they are finally
accepted or rejected. The history for each proposal will be stored in
the Tor Subversion repository.
Once a proposal is in the repository, we should discuss and improve it
until we've reached consensus that it's a good idea, and that it's
detailed enough to implement. When this happens, we implement the
proposal and incorporate it into the specifications. Thus, the specs
remain the canonical documentation for the Tor protocol: no proposal is
ever the canonical documentation for an implemented feature.
(This process is pretty similar to the Python Enhancement Process, with
the major exception that Tor proposals get re-integrated into the specs
after implementation, whereas PEPs _become_ the new spec.)
{It's still okay to make small changes directly to the spec if the code
can be written more or less immediately, or cosmetic changes if no code
change is required. This document reflects the current developers'
_intent_, not a permanent promise to always use this process in the
future: we reserve the right to get really excited and run off and
implement something in a caffeine-or-m&m-fueled all-night hacking session.}
How new proposals get added:
Once an idea has been proposed on the development list, a properly formatted
(see below) draft exists, and rough consensus within the active development
community exists that this idea warrants consideration, the proposal editor
will officially add the proposal.
To get your proposal in, send it to or-dev.
The current proposal editor is Nick Mathewson.
What should go in a proposal:
Every proposal should have a header containing these fields:
Filename, Title, Version, Last-Modified, Author, Created, Status.
The Version and Last-Modified fields should use the SVN Revision and Date
tags respectively.
These fields are optional but recommended:
Target, Implemented-In.
The Target field should describe which version the proposal is hoped to be
implemented in (if it's Open or Accepted). The Implemented-In field
should describe which version the proposal was implemented in (if it's
Finished or Closed).
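As an illustration of the header rules, here is a minimal checker. It is a
sketch, not an existing Tor tool: the function names are hypothetical, and
it assumes the header is a run of "Name: value" lines ended by the first
blank line.

```python
REQUIRED_FIELDS = ("Filename", "Title", "Version", "Last-Modified",
                   "Author", "Created", "Status")
RECOMMENDED_FIELDS = ("Target", "Implemented-In")


def parse_header(text):
    """Collect "Name: value" pairs up to the first blank line
    (assumed to end the header)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            break
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def missing_required(fields):
    """List the required field names that the header lacks."""
    return [name for name in REQUIRED_FIELDS if name not in fields]


if __name__ == "__main__":
    sample = (
        "Filename: 001-process.txt\n"
        "Title: The Tor Proposal Process\n"
        "Version: $Revision$\n"
        "Last-Modified: $Date$\n"
        "Author: Nick Mathewson\n"
        "Created: 30-Jan-2007\n"
        "Status: Meta\n"
        "\n"
        "Overview:\n"
    )
    print(missing_required(parse_header(sample)))
```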
The body of the proposal should start with an Overview section explaining
what the proposal's about, what it does, and about what state it's in.
After the Overview, the proposal becomes more free-form. Depending on its
length and complexity, the proposal can break into sections as
appropriate, or follow a short discursive format. Every proposal should
contain at least the following information before it is "ACCEPTED",
though the information does not need to be in sections with these names.
Motivation: What problem is the proposal trying to solve? Why does
this problem matter? If several approaches are possible, why take this
one?
Design: A high-level view of what the new or modified features are, how
the new or modified features work, how they interoperate with each
other, and how they interact with the rest of Tor. This is the main
body of the proposal. Some proposals will start out with only a
Motivation and a Design, and wait for a specification until the
Design seems approximately right.
Security implications: What effects the proposed changes might have on
anonymity, how well understood these effects are, and so on.
Specification: A detailed description of what needs to be added to the
Tor specifications in order to implement the proposal. This should
be in about as much detail as the specifications will eventually
contain: it should be possible for independent programmers to write
mutually compatible implementations of the proposal based on its
specifications.
Compatibility: Will versions of Tor that follow the proposal be
compatible with versions that do not? If so, how will compatibility
be achieved? Generally, we try to not drop compatibility if at
all possible; we haven't made a "flag day" change since May 2004,
and we don't want to do another one.
Implementation: If the proposal will be tricky to implement in Tor's
current architecture, the document can contain some discussion of how
to go about making it work.
Performance and scalability notes: If the feature will have an effect
on performance (in RAM, CPU, bandwidth) or scalability, there should
be some analysis on how significant this effect will be, so that we
can avoid really expensive performance regressions, and so we can
avoid wasting time on insignificant gains.
Proposal status:
Open: A proposal under discussion.
Accepted: The proposal is complete, and we intend to implement it.
After this point, substantive changes to the proposal should be
avoided, and regarded as a sign of the process having failed
somewhere.
Finished: The proposal has been accepted and implemented. After this
point, the proposal should not be changed.
Closed: The proposal has been accepted, implemented, and merged into the
main specification documents. The proposal should not be changed after
this point.
Rejected: We're not going to implement the feature as described here,
though we might do some other version. See comments in the document
for details. The proposal should not be changed after this point;
to bring up some other version of the idea, write a new proposal.
Draft: This isn't a complete proposal yet; there are definite missing
pieces. Please don't add any new proposals with this status; put them
in the "ideas" sub-directory instead.
Needs-Revision: The idea for the proposal is a good one, but the proposal
as it stands has serious problems that keep it from being accepted.
See comments in the document for details.
Dead: The proposal hasn't been touched in a long time, and it doesn't look
like anybody is going to complete it soon. It can become "Open" again
if it gets a new proponent.
Needs-Research: There are research problems that need to be solved before
it's clear whether the proposal is a good idea.
Meta: This is not a proposal, but a document about proposals.
The editor maintains the correct status of proposals, based on rough
consensus and his own discretion.
Proposal numbering:
Numbers 000-099 are reserved for special and meta-proposals. 100 and up
are used for actual proposals. Numbers aren't recycled.

View File

@ -1,109 +0,0 @@
Filename: 098-todo.txt
Title: Proposals that should be written
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson, Roger Dingledine
Created: 26-Jan-2007
Status: Meta
Overview:
This document lists ideas that various people have had for improving the
Tor protocol. These should be implemented and specified if they're
trivial, or written up as proposals if they're not.
This is an active document, to be edited as proposals are written and as
we come up with new ideas for proposals. We should take stuff out as it
seems irrelevant.
For some later protocol version.
- It would be great to get smarter about identity and linkability.
It's not crazy to say, "Never use the same circuit for my SSH
connections and my web browsing." How far can/should we take this?
See ideas/xxx-separate-streams-by-port.txt for a start.
- Fix onionskin handshake scheme to be more mainstream, less nutty.
Can we just do
E(HMAC(g^x), g^x) rather than just E(g^x) ?
No, that has the same flaws as before. We should send
E(g^x, C) with random C and expect g^y, HMAC_C(K=g^xy).
Better ask Ian; probably Stephen too.
- Length on CREATE and friends
- Versioning on circuits and create cells, so we have a clear path
to improve the circuit protocol.
- SHA1 is showing its age. We should get a design for upgrading our
hash once the AHS competition is done, or even sooner.
- Not being able to upgrade ciphersuites or increase key lengths is
lame.
- Paul has some ideas about circuit creation; read his PET paper once it's
out.
Any time:
- Some ideas for revising the directory protocol:
- Extend the "r" line in network-status to give a set of buckets (say,
comma-separated) for that router.
- Buckets are deterministic based on IP address.
- Then clients can choose a bucket (or set of buckets) to
download and use.
- We need a way for the authorities to declare that nodes are in a
family. Also, it kinda sucks that family declarations use O(N^2) space
in the descriptors.
- REASON_CONNECTFAILED should include an IP.
- Spec should incorporate some prose from tor-design to be more readable.
- Spec when we should rotate which keys
- Spec how to publish descriptors less often
- Describe pros and cons of non-deterministic path lengths
- We should use a variable-length path length by default -- 3 +/- some
distribution. Need to think harder about allowing values less than 3,
and there's a tradeoff between having a wide variance and performance.
- Clients currently use certs during TLS. Is this wise? It does make it
easier for servers to tell which NATted client is which. We could use a
separate set of certs for each guard, I suppose, but generating so many
certs could get expensive. Omitting them entirely would make OP->OR
easier to tell from OR->OR.
Things that should change...
B.1. ... but which will require backward-incompatible change
- Circuit IDs should be longer.
. IPv6 everywhere.
- Maybe, keys should be longer.
- Maybe, key-length should be adjustable. How to do this without
making anonymity suck?
- Drop backward compatibility.
- We should use a 128-bit subgroup of our DH prime.
- Handshake should use HMAC.
- Multiple cell lengths.
- Ability to split circuits across paths (If this is useful.)
- SENDME windows should be dynamic.
- Directory
- Stop ever mentioning socks ports
B.2. ... and that will require no changes
- Advertised outbound IP?
- Migrate streams across circuits.
- Fix bug 469 by limiting the number of simultaneous connections per IP.
B.3. ... and that we have no idea how to do.
- UDP (as transport)
- UDP (as content)
- Use a better AES mode that has built-in integrity checking,
doesn't grow with the number of hops, is not patented, and
is implemented and maintained by smart people.
Let onion keys be not just RSA but maybe DH too, for Paul's reply onion
design.

View File

@ -1,30 +0,0 @@
Filename: 099-misc.txt
Title: Miscellaneous proposals
Version: $Revision$
Last-Modified: $Date$
Author: Various
Created: 26-Jan-2007
Status: Meta
Overview:
This document is for small proposal ideas that are about one paragraph in
length. From here, ideas can be rejected outright, expanded into full
proposals, or specified and implemented as-is.
Proposals
1. Directory compression.
Gzip would be easier to work with than zlib; bzip2 would result in smaller
data lengths. [Concretely, we're looking at about 10-15% space savings at
the expense of 3-5x longer compression time for using bzip2.] Doing
on-the-fly gzip requires zlib 1.2 or later; doing bzip2 requires bzlib.
Pre-compressing status documents in multiple formats would force us to use
more memory to hold them.
Status: Open
-- Nick Mathewson

View File

@ -1,424 +0,0 @@
Filename: 100-tor-spec-udp.txt
Title: Tor Unreliable Datagram Extension Proposal
Version: $Revision$
Last-Modified: $Date$
Author: Marc Liberatore
Created: 23 Feb 2006
Status: Dead
Overview:
This is a modified version of the Tor specification written by Marc
Liberatore to add UDP support to Tor. For each TLS link, it adds a
corresponding DTLS link: control messages and TCP data flow over TLS, and
UDP data flows over DTLS.
This proposal is not likely to be accepted as-is; see comments at the end
of the document.
Contents
0. Introduction
Tor is a distributed overlay network designed to anonymize low-latency
TCP-based applications. The current tor specification supports only
TCP-based traffic. This limitation prevents the use of tor to anonymize
other important applications, notably voice over IP software. This document
is a proposal to extend the tor specification to support UDP traffic.
The basic design philosophy of this extension is to add support for
tunneling unreliable datagrams through tor with as few modifications to the
protocol as possible. As currently specified, tor cannot directly support
such tunneling, as connections between nodes are built using transport layer
security (TLS) atop TCP. The latency incurred by TCP is likely unacceptable
to the operation of most UDP-based application level protocols.
Thus, we propose the addition of links between nodes using datagram
transport layer security (DTLS). These links allow packets to traverse a
route through tor quickly, but their unreliable nature requires minor
changes to the tor protocol. This proposal outlines the necessary
additions and changes to the tor specification to support UDP traffic.
We note that a separate set of DTLS links between nodes creates a second
overlay, distinct from that composed of TLS links. This separation and
resulting decrease in each anonymity set's size will make certain attacks
easier. However, it is our belief that VoIP support in tor will
dramatically increase its appeal, and correspondingly, the size of its user
base, number of deployed nodes, and total traffic relayed. These increases
should help offset the loss of anonymity that two distinct networks imply.
1. Overview of Tor-UDP and its complications
As described above, this proposal extends the Tor specification to support
UDP with as few changes as possible. Tor's overlay network is managed
through TLS based connections; we will re-use this control plane to set up
and tear down circuits that relay UDP traffic. These circuits will be built atop
DTLS, in a fashion analogous to how Tor currently sends TCP traffic over
TLS.
The unreliability of DTLS circuits creates problems for Tor at two levels:
1. Tor's encryption of the relay layer does not allow independent
decryption of individual records. If record N is not received, then
record N+1 will not decrypt correctly, as the counter for AES/CTR is
maintained implicitly.
2. Tor's end-to-end integrity checking works under the assumption that
all RELAY cells are delivered. This assumption is invalid when cells
are sent over DTLS.
The fix for the first problem is straightforward: add an explicit sequence
number to each cell. To fix the second problem, we introduce a
system of nonces and hashes to RELAY packets.
In the following sections, we mirror the layout of the Tor Protocol
Specification, presenting the necessary modifications to the Tor protocol as
a series of deltas.
2. Connections
Tor-UDP uses DTLS for encryption of some links. All DTLS links must have
corresponding TLS links, as all control messages are sent over TLS. All
implementations MUST support the DTLS ciphersuite "[TODO]".
DTLS connections are formed using the same protocol as TLS connections.
This occurs upon request, following a CREATE_UDP or CREATE_FAST_UDP cell,
as detailed in section 4.6.
Once a paired TLS/DTLS connection is established, the two sides send cells
to one another. All but two types of cells are sent over TLS links. RELAY
cells containing the commands RELAY_UDP_DATA and RELAY_UDP_DROP, specified
below, are sent over DTLS links. [Should all cells still be 512 bytes long?
Perhaps upon completion of a preliminary implementation, we should do a
performance evaluation for some class of UDP traffic, such as VoIP. - ML]
Cells may be sent embedded in TLS or DTLS records of any size or divided
across such records. The framing of these records MUST NOT leak any more
information than the above differentiation on the basis of cell type. [I am
uncomfortable with this leakage, but don't see any simple, elegant way
around it. -ML]
As with TLS connections, DTLS connections are not permanent.
3. Cell format
Each cell contains the following fields:
CircID [2 bytes]
Command [1 byte]
Sequence Number [2 bytes]
Payload (padded with 0 bytes) [507 bytes]
[Total size: 512 bytes]
The 'Command' field holds one of the following values:
0 -- PADDING (Padding) (See Sec 6.2)
1 -- CREATE (Create a circuit) (See Sec 4)
2 -- CREATED (Acknowledge create) (See Sec 4)
3 -- RELAY (End-to-end data) (See Sec 5)
4 -- DESTROY (Stop using a circuit) (See Sec 4)
5 -- CREATE_FAST (Create a circuit, no PK) (See Sec 4)
6 -- CREATED_FAST (Circuit created, no PK) (See Sec 4)
7 -- CREATE_UDP (Create a UDP circuit) (See Sec 4)
8 -- CREATED_UDP (Acknowledge UDP create) (See Sec 4)
9 -- CREATE_FAST_UDP (Create a UDP circuit, no PK) (See Sec 4)
10 -- CREATED_FAST_UDP(UDP circuit created, no PK) (See Sec 4)
The sequence number allows for AES/CTR decryption of RELAY cells
independently of one another; this functionality is required to support
cells sent over DTLS. The sequence number is described in more detail in
section 4.5.
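The proposed cell layout can be sketched with fixed-width packing. The
field sizes and command value come from the tables above; the big-endian
byte order is an assumption carried over from the main Tor specification,
and the function names are hypothetical.

```python
import struct

CELL_LEN = 512
PAYLOAD_LEN = 507
# CircID [2 bytes], Command [1 byte], Sequence Number [2 bytes],
# Payload [507 bytes]. ">" selects big-endian with no padding between
# fields (byte order is an assumption, see above).
CELL_FORMAT = ">HBH507s"

CMD_RELAY = 3


def pack_cell(circ_id, command, seq, payload=b""):
    """Build one 512-byte cell; struct pads a short payload with 0 bytes."""
    if len(payload) > PAYLOAD_LEN:
        raise ValueError("payload longer than 507 bytes")
    return struct.pack(CELL_FORMAT, circ_id, command, seq, payload)


def unpack_cell(cell):
    """Split a 512-byte cell into (circ_id, command, seq, payload)."""
    if len(cell) != CELL_LEN:
        raise ValueError("cell must be exactly 512 bytes")
    return struct.unpack(CELL_FORMAT, cell)
```

The 2-byte sequence number is what reduces the payload from the 509 bytes
that remain after CircID and Command to 507.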
[Should the sequence number only appear in RELAY packets? The overhead is
small, and I'm hesitant to force more code paths on the implementor. -ML]
[There's already a separate relay header that has other material in it,
so it wouldn't be the end of the world to move it there if it's
appropriate. -RD]
[Having separate commands for UDP circuits seems necessary, unless we can
assume a flag day event for a large number of tor nodes. -ML]
4. Circuit management
4.2. Setting circuit keys
Keys are set up for UDP circuits in the same fashion as for TCP circuits.
Each UDP circuit shares keys with its corresponding TCP circuit.
[If the keys are used for both TCP and UDP connections, how does it
work to mix sequence-number-less cells with sequence-numbered cells --
how do you know you have the encryption order right? -RD]
4.3. Creating circuits
UDP circuits are created in the same way as TCP circuits, using the *_UDP
cells as appropriate.
4.4. Tearing down circuits
UDP circuits are torn down in the same way as TCP circuits, using the *_UDP
cells as appropriate.
4.5. Routing relay cells
When an OR receives a RELAY cell, it checks the cell's circID and
determines whether it has a corresponding circuit along that
connection. If not, the OR drops the RELAY cell.
Otherwise, if the OR is not at the OP edge of the circuit (that is,
either an 'exit node' or a non-edge node), it de/encrypts the payload
with AES/CTR, as follows:
'Forward' relay cell (same direction as CREATE):
Use Kf as key; decrypt, using sequence number to synchronize
ciphertext and keystream.
'Back' relay cell (opposite direction from CREATE):
Use Kb as key; encrypt, using sequence number to synchronize
ciphertext and keystream.
Note that in counter mode, decrypt and encrypt are the same operation.
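The sketch below illustrates why a per-cell sequence number lets cells be processed independently and in any order. It is only an illustration: the keystream here is a SHA-256-based stand-in, NOT AES/CTR, and the way the sequence number selects the keystream position is an assumption, since this draft does not define that mapping. The function names are hypothetical.

```python
import hashlib


def keystream(key, seq, length):
    """Stand-in keystream (NOT AES/CTR) that depends only on (key, seq)."""
    out = bytearray()
    block = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + seq.to_bytes(2, "big") + block.to_bytes(4, "big")
        ).digest()
        block += 1
    return bytes(out[:length])


def crypt_payload(key, seq, payload):
    """XOR the payload with the keystream for this sequence number.

    As in counter mode, the same call both encrypts and decrypts.
    """
    ks = keystream(key, seq, len(payload))
    return bytes(a ^ b for a, b in zip(payload, ks))
```

Because the keystream for a cell depends only on the key and that cell's sequence number, a lost or reordered cell does not desynchronize the cells around it.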
[Since the sequence number is only 2 bytes, what do you do when it
rolls over? -RD]
Each stream encrypted by a Kf or Kb has a corresponding unique state,
captured by a sequence number; the originator of each such stream chooses
the initial sequence number randomly, and increments it only with RELAY
cells. [This counts cells; unlike, say, TCP, tor uses fixed-size cells, so
there's no need for counting bytes directly. Right? - ML]
[I believe this is true. You'll find out for sure when you try to
build it. ;) -RD]
The OR then decides whether it recognizes the relay cell, by
inspecting the payload as described in section 5.1 below. If the OR
recognizes the cell, it processes the contents of the relay cell.
Otherwise, it passes the decrypted relay cell along the circuit if
the circuit continues. If the OR at the end of the circuit
encounters an unrecognized relay cell, an error has occurred: the OR
sends a DESTROY cell to tear down the circuit.
When a relay cell arrives at an OP, the OP decrypts the payload
with AES/CTR as follows:
OP receives data cell:
For I=N...1,
Decrypt with Kb_I, using the sequence number as above. If the
payload is recognized (see section 5.1), then stop and process
the payload.
For more information, see section 5 below.
4.6. CREATE_UDP and CREATED_UDP cells
Users set up UDP circuits incrementally. The procedure is similar to that
for TCP circuits, as described in section 4.1. In addition to the TLS
connection to the first node, the OP also attempts to open a DTLS
connection. If this succeeds, the OP sends a CREATE_UDP cell, with a
payload in the same format as a CREATE cell. To extend a UDP circuit past
the first hop, the OP sends an EXTEND_UDP relay cell (see section 5) which
instructs the last node in the circuit to send a CREATE_UDP cell to extend
the circuit.
The relay payload for an EXTEND_UDP relay cell consists of:
Address [4 bytes]
TCP port [2 bytes]
UDP port [2 bytes]
Onion skin [186 bytes]
Identity fingerprint [20 bytes]
The address field and ports denote the IPv4 address and ports of the next OR
in the circuit.
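As a rough sketch, the EXTEND_UDP relay payload could be packed and unpacked as follows. The helper names are hypothetical and big-endian port encoding is an assumption; the field sizes are the ones listed above.

```python
import socket
import struct

ONION_SKIN_LEN = 186
FINGERPRINT_LEN = 20

# Address [4], TCP port [2], UDP port [2], Onion skin [186], Identity fingerprint [20].
EXTEND_UDP = struct.Struct(">4sHH186s20s")


def pack_extend_udp(ipv4, tcp_port, udp_port, onion_skin, fingerprint):
    """Build the relay payload for an EXTEND_UDP cell."""
    if len(onion_skin) != ONION_SKIN_LEN:
        raise ValueError("onion skin must be 186 bytes")
    if len(fingerprint) != FINGERPRINT_LEN:
        raise ValueError("fingerprint must be 20 bytes")
    return EXTEND_UDP.pack(
        socket.inet_aton(ipv4), tcp_port, udp_port, onion_skin, fingerprint
    )


def unpack_extend_udp(data):
    """Return (ipv4, tcp_port, udp_port, onion_skin, fingerprint)."""
    addr, tcp_port, udp_port, skin, fp = EXTEND_UDP.unpack_from(data)
    return socket.inet_ntoa(addr), tcp_port, udp_port, skin, fp
```

The packed payload is 214 bytes long, which is the sum of the five field sizes.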
The payload for a CREATED_UDP cell or the relay payload for a
RELAY_EXTENDED_UDP cell is identical to that of the corresponding CREATED or
RELAY_EXTENDED cell. Both circuits are established using the same key.
Note that the existence of a UDP circuit implies the
existence of a corresponding TCP circuit, sharing keys, sequence numbers,
and any other relevant state.
4.6.1. CREATE_FAST_UDP/CREATED_FAST_UDP cells
As above, the OP must successfully connect using DTLS before attempting to
send a CREATE_FAST_UDP cell. Otherwise, the procedure is the same as in
section 4.1.1.
5. Application connections and stream management
5.1. Relay cells
Within a circuit, the OP and the exit node use the contents of RELAY cells
to tunnel end-to-end commands, TCP connections ("Streams"), and UDP packets
across circuits. End-to-end commands and UDP packets can be initiated by
either edge; streams are initiated by the OP.
The payload of each unencrypted RELAY cell consists of:
Relay command [1 byte]
'Recognized' [2 bytes]
StreamID [2 bytes]
Digest [4 bytes]
Length [2 bytes]
Data [496 bytes]
The relay commands are:
1 -- RELAY_BEGIN [forward]
2 -- RELAY_DATA [forward or backward]
3 -- RELAY_END [forward or backward]
4 -- RELAY_CONNECTED [backward]
5 -- RELAY_SENDME [forward or backward]
6 -- RELAY_EXTEND [forward]
7 -- RELAY_EXTENDED [backward]
8 -- RELAY_TRUNCATE [forward]
9 -- RELAY_TRUNCATED [backward]
10 -- RELAY_DROP [forward or backward]
11 -- RELAY_RESOLVE [forward]
12 -- RELAY_RESOLVED [backward]
13 -- RELAY_BEGIN_UDP [forward]
14 -- RELAY_DATA_UDP [forward or backward]
15 -- RELAY_EXTEND_UDP [forward]
16 -- RELAY_EXTENDED_UDP [backward]
17 -- RELAY_DROP_UDP [forward or backward]
Commands labelled as "forward" must only be sent by the originator
of the circuit. Commands labelled as "backward" must only be sent by
other nodes in the circuit back to the originator. Commands marked
as either can be sent either by the originator or other nodes.
The 'recognized' field in any unencrypted relay payload is always set to
zero.
The 'digest' field can have two meanings. For all cells sent over TLS
connections (that is, all commands and all non-UDP RELAY data), it is
computed as the first four bytes of the running SHA-1 digest of all the
bytes that have been sent reliably and have been destined for this hop of
the circuit or originated from this hop of the circuit, seeded from Df or Db
respectively (obtained in section 4.2 above), and including this RELAY
cell's entire payload (taken with the digest field set to zero). Cells sent
over DTLS connections do not affect this running digest. Each cell sent
over DTLS (that is, RELAY_DATA_UDP and RELAY_DROP_UDP) has the digest field
set to the SHA-1 digest of the current RELAY cell's entire payload, with the
digest field set to zero. Coupled with a randomly-chosen streamID, this
provides per-cell integrity checking on UDP cells.
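A minimal sketch of the per-cell check for cells sent over DTLS follows. It assumes the digest field holds the first four bytes of the SHA-1 digest (the field is only four bytes wide) and that it sits at offset 5 of the relay payload, after the relay command, 'Recognized', and StreamID fields. The function names are hypothetical.

```python
import hashlib

DIGEST_OFFSET = 5  # Relay command (1) + 'Recognized' (2) + StreamID (2)
DIGEST_LEN = 4
DIGEST_END = DIGEST_OFFSET + DIGEST_LEN


def udp_cell_digest(payload):
    """SHA-1 over the whole relay payload with the digest field zeroed,
    truncated to the width of the digest field (an assumption)."""
    zeroed = payload[:DIGEST_OFFSET] + b"\x00" * DIGEST_LEN + payload[DIGEST_END:]
    return hashlib.sha1(zeroed).digest()[:DIGEST_LEN]


def seal_udp_payload(payload):
    """Return a copy of the payload with its digest field filled in."""
    return payload[:DIGEST_OFFSET] + udp_cell_digest(payload) + payload[DIGEST_END:]


def udp_payload_is_intact(payload):
    """Check the digest field of a received RELAY_DATA_UDP or RELAY_DROP_UDP payload."""
    return payload[DIGEST_OFFSET:DIGEST_END] == udp_cell_digest(payload)
```

Unlike the running digest used over TLS, this check needs no state from earlier cells, so it still works when cells are lost or reordered.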
[If you drop malformed UDP relay cells but don't close the circuit,
then this 8 bytes of digest is not as strong as what we get in the
TCP-circuit side. Is this a problem? -RD]
When the 'recognized' field of a RELAY cell is zero, and the digest
is correct, the cell is considered "recognized" for the purposes of
decryption (see section 4.5 above).
(The digest does not include any bytes from relay cells that do
not start or end at this hop of the circuit. That is, it does not
include forwarded data. Therefore if 'recognized' is zero but the
digest does not match, the running digest at that node should
not be updated, and the cell should be forwarded on.)
All RELAY cells pertaining to the same tunneled TCP stream have the
same streamID. Such streamIDs are chosen arbitrarily by the OP. RELAY
cells that affect the entire circuit rather than a particular
stream use a StreamID of zero.
All RELAY cells pertaining to the same UDP tunnel have the same streamID.
This streamID is chosen randomly by the OP, but cannot be zero.
The 'Length' field of a relay cell contains the number of bytes in
the relay payload which contain real payload data. The remainder of
the payload is padded with NUL bytes.
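The relay header can be parsed along the following lines. This is a sketch under assumptions: big-endian integers, and a hypothetical helper name. Only the 11 header bytes are given a fixed layout here; everything after them is treated as the data area, of which the first 'Length' bytes are real data.

```python
import struct

# Relay command [1], 'Recognized' [2], StreamID [2], Digest [4], Length [2].
RELAY_HEADER = struct.Struct(">BHH4sH")

RELAY_DATA_UDP = 14


def parse_relay_payload(payload):
    """Split an unencrypted relay payload into its header fields and data."""
    command, recognized, stream_id, digest, length = RELAY_HEADER.unpack_from(payload)
    body = payload[RELAY_HEADER.size:]
    if length > len(body):
        raise ValueError("Length field exceeds the available data area")
    return {
        "command": command,
        "recognized": recognized,
        "stream_id": stream_id,
        "digest": digest,
        "data": body[:length],  # the rest of the body is NUL padding
    }
```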
If the RELAY cell is recognized but the relay command is not
understood, the cell must be dropped and ignored. Its contents
still count with respect to the digests, though. [Before
0.1.1.10, Tor closed circuits when it received an unknown relay
command. Perhaps this will be more forward-compatible. -RD]
5.2.1. Opening UDP tunnels and transferring data
To open a new anonymized UDP connection, the OP chooses an open
circuit to an exit that may be able to connect to the destination
address, selects a random streamID not yet used on that circuit,
and constructs a RELAY_BEGIN_UDP cell with a payload encoding the address
and port of the destination host. The payload format is:
ADDRESS | ':' | PORT | [00]
where ADDRESS can be a DNS hostname, or an IPv4 address in
dotted-quad format, or an IPv6 address surrounded by square brackets;
and where PORT is encoded in decimal.
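For illustration, the payload format above could be produced and parsed as follows. The function names are hypothetical; the sketch assumes ASCII encoding and adds the square brackets for bare IPv6 addresses.

```python
def encode_begin_udp(address, port):
    """Build ADDRESS | ':' | PORT | [00] for a RELAY_BEGIN_UDP cell."""
    if ":" in address and not address.startswith("["):
        address = "[" + address + "]"  # IPv6 addresses go in square brackets
    return f"{address}:{port}".encode("ascii") + b"\x00"


def decode_begin_udp(payload):
    """Return (address, port); the address keeps its brackets if it is IPv6."""
    text = payload.split(b"\x00", 1)[0].decode("ascii")
    address, _, port = text.rpartition(":")
    if not address or not port.isdigit():
        raise ValueError("malformed RELAY_BEGIN_UDP payload")
    return address, int(port)
```

Splitting on the last ':' keeps the colons inside a bracketed IPv6 address intact, and stopping at the first NUL byte ignores the cell padding.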
[What is the [00] for? -NM]
[It's so the payload is easy to parse out with string funcs -RD]
Upon receiving this cell, the exit node resolves the address as necessary.
If the address cannot be resolved, the exit node replies with a RELAY_END
cell. (See 5.4 below.) Otherwise, the exit node replies with a
RELAY_CONNECTED cell, whose payload is in one of the following formats:
The IPv4 address to which the connection was made [4 octets]
A number of seconds (TTL) for which the address may be cached [4 octets]
or
Four zero-valued octets [4 octets]
An address type (6) [1 octet]
The IPv6 address to which the connection was made [16 octets]
A number of seconds (TTL) for which the address may be cached [4 octets]
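The two RELAY_CONNECTED formats can be told apart by length, as in this sketch. It assumes the caller passes only the real payload bytes (as given by the 'Length' field, without padding) and that integers are big-endian; the function name is hypothetical.

```python
import ipaddress
import struct


def parse_connected(payload):
    """Return (address, ttl_seconds) from a RELAY_CONNECTED payload."""
    if len(payload) == 8:
        # IPv4 address [4 octets], TTL [4 octets]
        address = ipaddress.IPv4Address(payload[:4])
        (ttl,) = struct.unpack(">I", payload[4:8])
        return address, ttl
    if len(payload) == 25 and payload[:4] == b"\x00" * 4 and payload[4] == 6:
        # Four zero octets, address type 6, IPv6 address [16 octets], TTL [4 octets]
        address = ipaddress.IPv6Address(payload[5:21])
        (ttl,) = struct.unpack(">I", payload[21:25])
        return address, ttl
    raise ValueError("unrecognized RELAY_CONNECTED payload")
```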
[XXXX Versions of Tor before 0.1.1.6 ignore and do not generate the TTL
field. No version of Tor currently generates the IPv6 format.]
The OP waits for a RELAY_CONNECTED cell before sending any data.
Once a connection has been established, the OP and exit node
package UDP data in RELAY_DATA_UDP cells, and upon receiving such
cells, echo their contents to the corresponding socket.
RELAY_DATA_UDP cells sent to unrecognized streams are dropped.
RELAY_DROP_UDP cells are long-range dummies; upon receiving such
a cell, the OR or OP must drop it.
5.3. Closing streams
UDP tunnels are closed in a fashion corresponding to TCP connections.
6. Flow Control
UDP streams are not subject to flow control.
7.2. Router descriptor format.
The items' formats are as follows:
"router" nickname address ORPort SocksPort DirPort UDPPort
Indicates the beginning of a router descriptor. "address" must be
an IPv4 address in dotted-quad format. The last four numbers
indicate the ports at which this OR exposes
functionality. ORPort is a port at which this OR accepts TLS
connections for the main OR protocol; SocksPort is deprecated and
should always be 0; DirPort is the port at which this OR accepts
directory-related HTTP connections; and UDPPort is a port at which
this OR accepts DTLS connections for UDP data. If any port is not
supported, the value 0 is given instead of a port number.
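A parser for the extended "router" line might look like the sketch below. The `RouterLine` type and the function name are hypothetical, and the range check on ports is an added assumption.

```python
import ipaddress
from typing import NamedTuple


class RouterLine(NamedTuple):
    nickname: str
    address: str
    or_port: int
    socks_port: int
    dir_port: int
    udp_port: int


def parse_router_line(line):
    """Parse: "router" nickname address ORPort SocksPort DirPort UDPPort."""
    parts = line.split()
    if len(parts) != 7 or parts[0] != "router":
        raise ValueError("not a router line with a UDPPort")
    nickname, address = parts[1], parts[2]
    ipaddress.IPv4Address(address)  # raises ValueError if not dotted-quad IPv4
    ports = [int(p) for p in parts[3:]]
    if any(not 0 <= p <= 65535 for p in ports):
        raise ValueError("port out of range")
    return RouterLine(nickname, address, *ports)
```

A port value of 0 means the corresponding service is not supported, so a caller would check `udp_port != 0` before attempting a DTLS connection.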
Other sections:
What changes need to happen to each node's exit policy to support this? -RD
Switching to UDP means managing the queues of incoming packets better,
so we don't miss packets. How does this interact with doing large public
key operations (handshakes) in the same thread? -RD
========================================================================
COMMENTS
========================================================================
[16 May 2006]
I don't favor this approach; it makes packet traffic partitioned from
stream traffic end-to-end. The architecture I'd like to see is:
A *All* Tor-to-Tor traffic is UDP/DTLS, unless we need to fall back on
TCP/TLS for firewall penetration or something. (This also gives us an
upgrade path for routing through legacy servers.)
B Stream traffic is handled with end-to-end per-stream acks/naks and
retries. On failure, the data is retransmitted in a new RELAY_DATA cell;
a cell isn't retransmitted.
We'll need to do A anyway, to fix our behavior on packet-loss. Once we've
done so, B is more or less inevitable, and we can support end-to-end UDP
traffic "for free".
(Also, there are some details that this draft spec doesn't address. For
example, what happens when a UDP packet doesn't fit in a single cell?)
-NM

@ -1,285 +0,0 @@
Filename: 101-dir-voting.txt
Title: Voting on the Tor Directory System
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson
Created: Nov 2006
Status: Closed
Implemented-In: 0.2.0.x
Overview
This document describes a consensus voting scheme for Tor directories;
instead of publishing different network statuses, directories would vote on
and publish a single "consensus" network status document.
This is an open proposal.
Proposal:
0. Scope and preliminaries
This document describes a consensus voting scheme for Tor directories.
Once it's accepted, it should be merged with dir-spec.txt. Some
preliminaries for authority and caching support should be done during
the 0.1.2.x series; the main deployment should come during the 0.2.0.x
series.
0.1. Goals and motivation: voting.
The current directory system relies on clients downloading separate
network status statements from the caches signed by each directory.
Clients download a new statement every 30 minutes or so, choosing to
replace the oldest statement they currently have.
This creates a partitioning problem: different clients have different
"most recent" networkstatus sources, and different versions of each
(since authorities change their statements often).
It also creates a scaling problem: most of the downloaded networkstatus
are probably quite similar, and the redundancy grows as we add more
authorities.
So if we have clients only download a single multiply signed consensus
network status statement, we can:
- Save bandwidth.
- Reduce client partitioning
- Reduce client-side and cache-side storage
- Simplify client-side voting code (by moving voting away from the
client)
We should try to do this without:
- Assuming that client-side or cache-side clocks are more correct
than we assume now.
- Assuming that authority clocks are perfectly correct.
- Degrading badly if a few authorities die or are offline for a bit.
We do not have to perform well if:
- No clique of more than half the authorities can agree about who
the authorities are.
1. The idea.
Instead of publishing a network status whenever something changes,
each authority instead publishes a fresh network status only once per
"period" (say, 60 minutes). Authorities either upload this network
status (or "vote") to every other authority, or download every other
authority's "vote" (see 3.1 below for discussion on push vs pull).
After an authority has (or has become convinced that it won't be able to
get) every other authority's vote, it deterministically computes a
consensus networkstatus, and signs it. Authorities download (or are
uploaded; see 3.1) one another's signatures, and form a multiply signed
consensus. This multiply-signed consensus is what caches cache and what
clients download.
If an authority is down, authorities vote based on what they *can*
download/get uploaded.
If an authority is "a little" down and only some authorities can reach
it, authorities try to get its info from other authorities.
If an authority computes the vote wrong, its signature isn't included on
the consensus.
Clients use a consensus if it is "trusted": signed by more than half the
authorities they recognize. If clients can't find any such consensus,
they use the most recent trusted consensus they have. If they don't
have any trusted consensus, they warn the user and refuse to operate
(and if DirServers is not the default, beg the user to adapt the list
of authorities).
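The client-side rule can be sketched as follows. The names and the (published, signers) representation are hypothetical; signature verification is assumed to have happened already, so `signers` holds only authorities whose signatures checked out.

```python
def is_trusted(signers, recognized):
    """True if more than half of the recognized authorities signed."""
    valid = set(signers) & set(recognized)
    return len(valid) * 2 > len(set(recognized))


def pick_consensus(candidates, recognized):
    """Return the most recent trusted (published, signers) pair, or None.

    None corresponds to the case where the client has no trusted
    consensus and should warn the user and refuse to operate.
    """
    trusted = [c for c in candidates if is_trusted(c[1], recognized)]
    return max(trusted, key=lambda c: c[0], default=None)
```

Signatures from authorities the client does not recognize are ignored, so they can never push a consensus over the threshold.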
2. Details.
2.0. Versioning
All documents generated here have version "3" given in their
network-status-version entries.
2.1. Vote specifications
Votes in v3 are similar to v2 network status documents. We add these
fields to the preamble:
"vote-status" -- the word "vote".
"valid-until" -- the time when this authority expects to publish its
next vote.
"known-flags" -- a space-separated list of flags that will sometimes
be included on "s" lines later in the vote.
"dir-source" -- as before, except the "hostname" part MUST be the
authority's nickname, which MUST be unique among authorities, and
MUST match the nickname in the "directory-signature" entry.
Authorities SHOULD cache their most recently generated votes so they
can persist them across restarts. Authorities SHOULD NOT generate
another document until valid-until has passed.
Router entries in the vote MUST be sorted in ascending order by router
identity digest. The flags in "s" lines MUST appear in alphabetical
order.
Votes SHOULD be synchronized to half-hour publication intervals (one
hour? XXX say more; be more precise.)
XXXX some way to request older networkstatus docs?
2.2. Consensus directory specifications
Consensuses are like v3 votes, except for the following fields:
"vote-status" -- the word "consensus".
"published" is the latest of all the published times on the votes.
"valid-until" is the earliest of all the valid-until times on the
votes.
"dir-source" and "fingerprint" and "dir-signing-key" and "contact"
are included for each authority that contributed to the vote.
"vote-digest" for each authority that contributed to the vote,
calculated as for the digest in the signature on the vote. [XXX
re-English this sentence]
"client-versions" and "server-versions" are sorted in ascending
order based on version-spec.txt.
"dir-options" and "known-flags" are not included.
[XXX really? why not list the ones that are used in the consensus?
For example, right now BadExit is in use, but no servers would be
labelled BadExit, and it's still worth knowing that it was considered
by the authorities. -RD]
The fields MUST occur in the following order:
"network-status-version"
"vote-status"
"published"
"valid-until"
For each authority, sorted in ascending order of nickname, case-
insensitively:
"dir-source", "fingerprint", "contact", "dir-signing-key",
"vote-digest".
"client-versions"
"server-versions"
The signatures at the end of the document appear as multiple instances
of directory-signature, sorted in ascending order by nickname,
case-insensitively.
A router entry should be included in the result if it is included by more
than half of the authorities (total authorities, not just those whose votes
we have). A router entry has a flag set if it is included by more than
half of the authorities who care about that flag. [XXXX this creates an
incentive for attackers to DOS authorities whose votes they don't like.
Can we remember what flags people set the last time we saw them? -NM]
[Which 'we' are we talking here? The end-users never learn which
authority sets which flags. So you're thinking the authorities
should record the last vote they saw from each authority and if it's
within a week or so, count all the flags that it advertised as 'no'
votes? Plausible. -RD]
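The inclusion and flag rules above can be sketched as a deterministic function over the votes. The data layout is hypothetical, and "cares about a flag" is interpreted here as listing that flag in "known-flags", which is an assumption.

```python
def compute_consensus(votes, total_authorities):
    """votes maps authority -> {"known_flags": set, "routers": {router_id: set_of_flags}}.

    Returns {router_id: sorted list of flags} for the routers in the consensus.
    """
    listed = {}
    for vote in votes.values():
        for router_id in vote["routers"]:
            listed[router_id] = listed.get(router_id, 0) + 1

    all_flags = set().union(*(v["known_flags"] for v in votes.values()))
    result = {}
    for router_id in sorted(listed):
        # More than half of ALL authorities, not just those whose votes we have.
        if listed[router_id] * 2 <= total_authorities:
            continue
        flags = []
        for flag in sorted(all_flags):
            carers = [v for v in votes.values() if flag in v["known_flags"]]
            yes = sum(1 for v in carers if flag in v["routers"].get(router_id, ()))
            if yes * 2 > len(carers):
                flags.append(flag)
        result[router_id] = flags
    return result
```

Iterating over sorted router IDs and sorted flags keeps the output independent of the order in which votes arrived, which matters because every authority must compute the same document.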
The signature hash covers from the "network-status-version" line through
the characters "directory-signature" in the first "directory-signature"
line.
Consensus directories SHOULD be rejected if they are not signed by more
than half of the known authorities.
2.2.1. Detached signatures
Assuming full connectivity, every authority should compute and sign the
same consensus directory in each period. Therefore, it isn't necessary to
download the consensus computed by each authority; instead, the authorities
only push/fetch each others' signatures. A "detached signature" document
contains a single "consensus-digest" entry and one or more
directory-signature entries. [XXXX specify more.]
2.3. URLs and timelines
2.3.1. URLs and timeline used for agreement
An authority SHOULD publish its vote immediately at the start of each voting
period. It does this by making it available at
http://<hostname>/tor/status-vote/current/authority.z
and sending it in an HTTP POST request to each other authority at the URL
http://<hostname>/tor/post/vote
If, N minutes after the voting period has begun, an authority does not have
a current statement from another authority, the first authority retrieves
the other's statement.
Once an authority has a vote from another authority, it makes it available
at
http://<hostname>/tor/status-vote/current/<fp>.z
where <fp> is the fingerprint of the other authority's identity key.
The consensus network status, along with as many signatures as the server
currently knows, should be available at
http://<hostname>/tor/status-vote/current/consensus.z
All of the detached signatures it knows for consensus status should be
available at:
http://<hostname>/tor/status-vote/current/consensus-signatures.z
Once an authority has computed and signed a consensus network status, it
should send its detached signature to each other authority in an HTTP POST
request to the URL:
http://<hostname>/tor/post/consensus-signature
[XXXX Store votes to disk.]
2.3.2. Serving a consensus directory
Once the authority is done getting signatures on the consensus directory,
it should serve it from:
http://<hostname>/tor/status/consensus.z
Caches SHOULD download consensus directories from an authority and serve
them from the same URL.
2.3.3. Timeline and synchronization
[XXXX]
2.4. Distributing routerdescs between authorities
Consensus will be more meaningful if authorities take steps to make sure
that they all have the same set of descriptors _before_ the voting
starts. This is safe, since all descriptors are self-certified and
timestamped: it's always okay to replace a signed descriptor with a more
recent one signed by the same identity.
In the long run, we might want some kind of sophisticated process here.
For now, since authorities already download one another's networkstatus
documents and use them to determine what descriptors to download from one
another, we can rely on this existing mechanism to keep authorities up to
date.
[We should do a thorough read-through of dir-spec again to make sure
that the authorities converge on which descriptor to "prefer" for
each router. Right now the decision happens at the client, which is
no longer the right place for it. -RD]
3. Questions and concerns
3.1. Push or pull?
The URLs above define a push mechanism for publishing votes and consensus
signatures via HTTP POST requests, and a pull mechanism for downloading
these documents via HTTP GET requests. As specified, every authority will
post to every other. The "download if no copy has been received" mechanism
exists only as a fallback.
4. Migration
* It would be cool if caches could get ready to download consensus
status docs, verify enough signatures, and serve them now. That way
once stuff works all we need to do is upgrade the authorities. Caches
don't need to verify the correctness of the format so long as it's
signed (or maybe multisigned?). We need to make sure that caches back
off very quickly from downloading consensus docs until they're
actually implemented.

@ -1,40 +0,0 @@
Filename: 102-drop-opt.txt
Title: Dropping "opt" from the directory format
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson
Created: Jan 2007
Status: Closed
Implemented-In: 0.2.0.x
Overview:
This document proposes a change in the format used to transmit router and
directory information.
This proposal has been accepted, implemented, and merged into dir-spec.txt.
Proposal:
The "opt" keyword in Tor's directory formats was originally intended to
mean, "it is okay to ignore this entry if you don't understand it"; the
default behavior has been "discard a routerdesc if it contains entries you
don't recognize."
But so far, every new flag we have added has been marked 'opt'. It would
probably make sense to change the default behavior to "ignore unrecognized
fields", and add the statement that clients SHOULD ignore fields they don't
recognize. As a meta-principle, we should say that clients and servers
MUST NOT have to understand new fields in order to use directory documents
correctly.
Of course, this will make it impossible to say, "The format has changed a
lot; discard this quietly if you don't understand it." We could do that by
adding a version field.
Status:
* We stopped requiring it as of 0.1.2.5-alpha. We'll stop generating it
once earlier formats are obsolete.

@ -1,206 +0,0 @@
Filename: 103-multilevel-keys.txt
Title: Splitting identity key from regularly used signing key.
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson
Created: Jan 2007
Status: Closed
Implemented-In: 0.2.0.x
Overview:
This document proposes a change in the way identity keys are used, so that
highly sensitive keys can be password-protected and seldom loaded into RAM.
It presents options; it is not yet a complete proposal.
Proposal:
Replacing a directory authority's identity key in the event of a compromise
would be tremendously annoying. We'd need to tell every client to switch
their configuration, or update to a new version with an uploaded list. So
long as some weren't upgraded, they'd be at risk from whoever had
compromised the key.
With this in mind, it's a shame that our current protocol forces us to
store identity keys unencrypted in RAM. We need some kind of signing key
stored unencrypted, since we need to generate new descriptors/directories
and rotate link and onion keys regularly. (And since, of course, we can't
ask server operators to be on-hand to enter a passphrase every time we
want to rotate keys or sign a descriptor.)
The obvious solution seems to be to have a signing-only key that lives
indefinitely (months or longer) and signs descriptors and link keys, and a
separate identity key that's used to sign the signing key. Tor servers
could run in one of several modes:
1. Identity key stored encrypted. You need to pick a passphrase when
you enable this mode, and re-enter this passphrase every time you
rotate the signing key.
1'. Identity key stored separate. You save your identity key to a
floppy, and use the floppy when you need to rotate the signing key.
2. All keys stored unencrypted. In this case, we might not want to even
*have* a separate signing key. (We'll need to support no-separate-
signing-key mode anyway to keep old servers working.)
3. All keys stored encrypted. You need to enter a passphrase to start
Tor.
(Of course, we might not want to implement all of these.)
Case 1 is probably most usable and secure, if we assume that people don't
forget their passphrases or lose their floppies. We could mitigate this a
bit by encouraging people to PGP-encrypt their passphrases to themselves,
or keep a cleartext copy of their secret key secret-split into a few
pieces, or something like that.
Migration presents another difficulty, especially with the authorities. If
we use the current set of identity keys as the new identity keys, we're in
the position of having sensitive keys that have been stored on
media-of-dubious-encryption up to now. Also, we need to keep old clients
(who will expect descriptors to be signed by the identity keys they know
and love, and who will not understand signing keys) happy.
A possible solution:
One thing to consider is that router identity keys are not very sensitive:
if an OR disappears and reappears with a new key, the network treats it as
though an old router had disappeared and a new one had joined the network.
The Tor network continues unharmed; this isn't a disaster.
Thus, the ideas above are mostly relevant for authorities.
The most straightforward solution for the authorities is probably to take
advantage of the protocol transition that will come with proposal 101, and
introduce a new set of signing _and_ identity keys used only to sign votes
and consensus network-status documents. Signing and identity keys could be
delivered to users in a separate, rarely changing "keys" document, so that
the consensus network-status documents wouldn't need to include N signing
keys, N identity keys, and N certifications.
Note also that there is no reason that the identity/signing keys used by
directory authorities would necessarily have to be the same as the identity
keys those authorities use in their capacity as routers. Decoupling these
keys would give directory authorities the following set of keys:
Directory authority identity:
Highly confidential; stored encrypted and/or offline. Used to
identify directory authorities. Shipped with clients. Used to
sign Directory authority signing keys.
Directory authority signing key:
Stored online, accessible to regular Tor process. Used to sign
votes and consensus directories. Downloaded as part of a "keys"
document.
[Administrators SHOULD rotate their signing keys every month or
two, just to keep in practice and keep from forgetting the
password to the authority identity.]
V1-V2 directory authority identity:
Stored online, never changed. Used to sign legacy network-status
and directory documents.
Router identity:
Stored online, seldom changed. Used to sign server descriptors
for this authority in its role as a router. Implicitly certified
by being listed in network-status documents.
Onion key, link key:
As in tor-spec.txt
Extensions to Proposal 101.
Define a new document type, "Key certificate". It contains the
following fields, in order:
"dir-key-certificate-version": As network-status-version. Must be
"3".
"fingerprint": Hex fingerprint, with spaces, based on the directory
authority's identity key.
"dir-identity-key": The long-term identity key for this authority.
"dir-key-published": The time when this directory's signing key was
last changed.
"dir-key-expires": A time after which this key is no longer valid.
"dir-signing-key": As in proposal 101.
"dir-key-certification": A signature of the above fields, in order.
The signed material extends from the beginning of
"dir-key-certificate-version" through the newline after
"dir-key-certification". The identity key is used to generate
this signature.
These elements together constitute a "key certificate". These are
generated offline when starting a v3 authority. Private identity
keys SHOULD be stored offline, encrypted, or both. A running
authority only needs access to the signing key.
Unlike other keys currently used by Tor, the authority identity
keys and directory signing keys MAY be longer than 1024 bits.
(They SHOULD be 2048 bits or longer; they MUST NOT be shorter than
1024.)
Vote documents change as follows:
A key certificate MUST be included in-line in every vote document. With
the exception of "fingerprint", its elements MUST NOT appear in consensus
documents.
Consensus network statuses change as follows:
Remove dir-signing-key.
Change "directory-signature" to take a fingerprint of the authority's
identity key and a fingerprint of the authority's current signing key
rather than the authority's nickname.
Change "dir-source" to take a fingerprint of the authority's
identity key rather than the authority's nickname or hostname.
Add a new document type:
A "keys" document contains all currently known key certificates.
All authorities serve it at
http://<hostname>/tor/status/keys.z
Caches and clients download the keys document whenever they receive a
consensus vote that uses a key they do not recognize. Caches download
from authorities; clients download from caches.
Processing votes:
When receiving a vote, authorities check to see if the key
certificate for the voter is different from the one they have. If
the key certificate _is_ different, and its dir-key-published is
more recent than the most recently known one, and it is
well-formed and correctly signed with the correct identity key,
then authorities remember it as the new canonical key certificate
for that voter.
A key certificate is invalid if any of the following hold:
* The version is unrecognized.
* The fingerprint does not match the identity key.
* The identity key or the signing key is ill-formed.
* The published date is very far in the past or future.
* The signature is not a valid signature of the key certificate
generated with the identity key.
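The checks in the list above could be collected into one function, as in this sketch. Several parts are assumptions: the dictionary layout, the fingerprint being the SHA-1 of the encoded identity key, the one-year bound standing in for "very far", and the two callables that stand in for real key parsing and signature verification.

```python
import hashlib
from datetime import datetime, timedelta


def certificate_problems(cert, now: datetime, key_is_well_formed, signature_is_valid,
                         max_skew=timedelta(days=365)):
    """Return a list of reasons the key certificate is invalid (empty if none)."""
    problems = []
    if cert["version"] != "3":
        problems.append("unrecognized version")
    # Fingerprints are written in hex with spaces; compare without them.
    expected = hashlib.sha1(cert["identity_key"]).hexdigest().upper()
    if cert["fingerprint"].replace(" ", "").upper() != expected:
        problems.append("fingerprint does not match identity key")
    if not (key_is_well_formed(cert["identity_key"])
            and key_is_well_formed(cert["signing_key"])):
        problems.append("ill-formed key")
    if abs(now - cert["published"]) > max_skew:
        problems.append("published date too far from now")
    if not signature_is_valid(cert):
        problems.append("bad signature")
    return problems
```

Returning every problem rather than stopping at the first makes it easier to log why a certificate from another authority was rejected.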
When processing the signatures on consensus, clients and caches act as
follows:
1. Only consider the directory-signature entries whose identity
key hashes match trusted authorities.
2. If any such entries have signing key hashes that match unknown
signing keys, download a new keys document.
3. For every entry with a known (identity key,signing key) pair,
check the signature on the document.
4. If the document has been signed by more than half of the
authorities the client recognizes, treat the consensus as
correctly signed.
If not, but the number of entries with known identity keys but
unknown signing keys might be enough to make the consensus
correctly signed, do not use the consensus, but do not discard
it until we have a new keys document.
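Steps 1 through 4 can be sketched as a single function. The tuple layout, the status strings, and the `verify` callable (which stands in for real signature checking) are all hypothetical.

```python
def check_consensus(signatures, trusted_identities, known_pairs, verify):
    """signatures: iterable of (identity_fp, signing_fp, signature).

    Returns (status, need_keys) where status is "signed", "pending"
    (keep the consensus but do not use it yet), or "unsigned".
    """
    # Step 1: only consider entries from trusted authorities.
    relevant = [s for s in signatures if s[0] in trusted_identities]
    # Step 2: entries with unknown signing keys call for a new keys document.
    unknown = {s[0] for s in relevant if (s[0], s[1]) not in known_pairs}
    # Step 3: check the signature for every known (identity, signing) pair.
    good = {s[0] for s in relevant if (s[0], s[1]) in known_pairs and verify(s)}
    need_keys = bool(unknown)
    # Step 4: more than half of the recognized authorities must have signed.
    if len(good) * 2 > len(trusted_identities):
        return "signed", need_keys
    if len(good | unknown) * 2 > len(trusted_identities):
        return "pending", need_keys
    return "unsigned", need_keys
```

Counting authorities rather than signature entries means a repeated entry from the same authority cannot be counted twice.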

@ -1,183 +0,0 @@
Filename: 104-short-descriptors.txt
Title: Long and Short Router Descriptors
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson
Created: Jan 2007
Status: Closed
Implemented-In: 0.2.0.x
Overview:
This document proposes moving unused-by-clients information from regular
router descriptors into a new "extra info" router descriptor.
Proposal:
Some of the costliest fields in the current directory protocol are ones
that no client actually uses. In particular, the "read-history" and
"write-history" fields are used only by the authorities for monitoring the
status of the network. If we took them out, the size of a compressed list
of all the routers would fall by about 60%. (No other disposable field
would save much more than 2%.)
We propose to remove these fields from descriptors, and have them
uploaded as a part of a separate signed "extra info" to the authorities.
This document will be signed. A hash of this document will be included in
the regular descriptors.
(We considered another design, where routers would generate and upload a
short-form and a long-form descriptor. Only the short-form descriptor would
ever be used by anybody for routing. The long-form descriptor would be
used only for analytics and other tools. We decided against this because
well-behaved tools would need to download short-form descriptors too (as
these would be the only ones indexed), and hence get redundant info. Badly
behaved tools would download only long-form descriptors, and expose
themselves to partitioning attacks.)
Other disposable fields:
Clients don't need these fields, but removing them doesn't help bandwidth
enough to be worthwhile.
contact (save about 1%)
fingerprint (save about 3%)
We could represent these fields more succinctly, but removing them would
only save 1%. (!)
reject
accept
(Apparently, exit policies are highly compressible.)
[Does size-on-disk matter to anybody? Some clients and servers don't
have much disk, or have really slow disk (e.g. USB). And we don't
store caches compressed right now. -RD]
Specification:
1. Extra Info Format.
An "extra info" descriptor contains the following fields:
"extra-info" Nickname Fingerprint
Identifies what router this is an extra info descriptor for.
Fingerprint is encoded in hex (using upper-case letters), with
no spaces.
"published" As currently documented in dir-spec.txt. It MUST match the
"published" field of the descriptor published at the same time.
"read-history"
"write-history"
As currently documented in dir-spec.txt. Optional.
"router-signature" NL Signature NL
A signature of the PKCS1-padded hash of the entire extra info
document, taken from the beginning of the "extra-info" line, through
the newline after the "router-signature" line. An extra info
document is not valid unless the signature is performed with the
identity key whose digest matches FINGERPRINT.
The "extra-info" field is required and MUST appear first. The
router-signature field is required and MUST appear last. All others are
optional. As for other documents, unrecognized fields must be ignored.
2. Existing formats
Implementations that use "read-history" and "write-history" SHOULD
continue accepting router descriptors that contain them. (Prior to
0.2.0.x, this information was encoded in ordinary router descriptors;
in any case they have always been listed as opt, so they should be
accepted anyway.)
Add these fields to router descriptors:
"extra-info-digest" Digest
"Digest" is a hex-encoded digest (using upper-case characters)
of the router's extra-info document, as signed in the router's
extra-info. (If this field is absent, no extra-info-digest
exists.)
"caches-extra-info"
Present if this router is a directory cache that provides
extra-info documents, or an authority that handles extra-info
documents.
(Since implementations before 0.1.2.5-alpha required that the "opt"
keyword precede any unrecognized entry, these keys MUST be preceded
with "opt" until 0.1.2.5-alpha is obsolete.)
3. New communications rules
Servers SHOULD generate and upload one extra-info document after each
descriptor they generate and upload; no more, no less. Servers MUST
upload the new descriptor before they upload the new extra-info.
Authorities receiving an extra-info document SHOULD verify all of the
following:
* They have a router descriptor for some server with a matching
nickname and identity fingerprint.
* That server's identity key has been used to sign the extra-info
document.
* The extra-info-digest field in the router descriptor matches
the digest of the extra-info document.
* The published fields in the two documents match.
Authorities SHOULD drop extra-info documents that do not meet these
criteria.
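
As an illustration only, the four checks might look like the sketch below.
The record types and the accept_extra_info helper are hypothetical, and the
cryptographic signature check is assumed to have been done already, with
its result exposed as signer_fingerprint.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RouterDescriptor:
    nickname: str
    fingerprint: str                  # hex, upper-case
    published: str
    extra_info_digest: Optional[str]  # hex digest, or None if absent


@dataclass
class ExtraInfo:
    nickname: str
    fingerprint: str
    published: str
    digest: str               # digest of the extra-info document
    signer_fingerprint: str   # identity key that produced the signature


def accept_extra_info(extra: ExtraInfo,
                      descriptors: Dict[str, RouterDescriptor]) -> bool:
    """Return True if the extra-info document passes all four checks."""
    # Check 1: a descriptor with matching nickname and fingerprint exists.
    desc = descriptors.get(extra.fingerprint.upper())
    if desc is None or desc.nickname != extra.nickname:
        return False
    # Check 2: the server's identity key signed the extra-info document.
    if extra.signer_fingerprint.upper() != desc.fingerprint.upper():
        return False
    # Check 3: the descriptor's extra-info-digest matches the document.
    if desc.extra_info_digest is None:
        return False
    if desc.extra_info_digest.upper() != extra.digest.upper():
        return False
    # Check 4: the published fields match.
    return desc.published == extra.published
```

Here descriptors is assumed to be keyed by upper-case hex fingerprint. An
authority would drop any document for which this returns False.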
Extra-info documents MAY be uploaded as part of the same HTTP post as
the router descriptor, or separately. Authorities MUST accept both
methods.
Authorities SHOULD try to fetch extra-info documents from one another if
they do not have one matching the digest declared in a router
descriptor.
Caches that are running locally with a tool that needs to use extra-info
documents MAY download and store extra-info documents. They should do
so when they notice that the recommended descriptor has an
extra-info-digest not matching any extra-info document they currently
have. (Caches not running on a host that needs to use extra-info
documents SHOULD NOT download or cache them.)
4. New URLs
http://<hostname>/tor/extra/d/...
http://<hostname>/tor/extra/fp/...
http://<hostname>/tor/extra/all[.z]
(As for /tor/server/ URLs: supports fetching extra-info documents
by their digest, by the fingerprint of their servers, or all
at once. When serving by fingerprint, we serve the extra-info
that corresponds to the descriptor we would serve by that
fingerprint. Only directory authorities are guaranteed to support
these URLs.)
http://<hostname>/tor/extra/authority[.z]
(The extra-info document for this router.)
Extra-info documents are uploaded to the same URLs as regular
router descriptors.
Migration:
For extra info approach:
* First:
* Authorities should accept extra info, and support serving it.
* Routers should upload extra info once authorities accept it.
* Caches should support an option to download and cache it, once
authorities serve it.
* Tools should be updated to use locally cached information.
These tools include:
lefkada's exit.py script.
tor26's noreply script and general directory cache.
https://nighteffect.us/tns/ for its graphs
and check with or-talk for the rest, once it's time.
* Set a cutoff time for including bandwidth in router descriptors, so
that tools that use bandwidth info know that they will need to fetch
extra info documents.
* Once tools that want bandwidth info support fetching extra info:
* Have routers stop including bandwidth info in their router
descriptors.

@ -1,325 +0,0 @@
Filename: 105-handshake-revision.txt
Title: Version negotiation for the Tor protocol.
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson, Roger Dingledine
Created: Jan 2007
Status: Closed
Implemented-In: 0.2.0.x
Overview:
This document was extracted from a modified version of tor-spec.txt that we
had written before the proposal system went into place. It adds two new
cell types to the Tor link connection setup handshake: one used for
version negotiation, and another to prevent MITM attacks.
This proposal is partially implemented, and partially superseded by
proposal 130.
Motivation: Tor versions
Our *current* approach to versioning the Tor protocol(s) has been as
follows:
- All changes must be backward compatible.
- It's okay to add new cell types, if they would be ignored by previous
versions of Tor.
- It's okay to add new data elements to cells, if they would be
ignored by previous versions of Tor.
- For forward compatibility, Tor must ignore cell types it doesn't
recognize, and ignore data in those cells it doesn't expect.
- Clients can inspect the version of Tor declared in the platform line
of a router's descriptor, and use that to learn whether a server
supports a given feature. Servers, however, aren't assumed to all
know about each other, and so don't know the version of who they're
talking to.
This system has these problems:
- It's very hard to change fundamental aspects of the protocol, like the
cell format, the link protocol, any of the various encryption schemes,
and so on.
- The router-to-router link protocol has remained more-or-less frozen
for a long time, since we can't easily have an OR use new features
unless it knows the other OR will understand them.
We need to resolve these problems because:
- Our cipher suite is showing its age: SHA1/AES128/RSA1024/DH1024 will
not seem like the best idea for all time.
- There are many ideas circulating for multiple cell sizes; while it's
not obvious whether these are safe, we can't do them at all without a
mechanism to permit them.
- There are many ideas circulating for alternative circuit building and
cell relay rules: they don't work unless they can coexist in the
current network.
- If our protocol changes a lot, it's hard to describe any coherent
version of it: we need to say "the version that Tor versions W through
X use when talking to versions Y through Z". This makes analysis
harder.
Motivation: Preventing MITM attacks
TLS prevents a man-in-the-middle attacker from reading or changing the
contents of a communication. It does not, however, prevent such an
attacker from observing timing information. Since timing attacks are some
of the most effective against low-latency anonymity nets like Tor, we
should take more care to make sure that we're not only talking to who
we think we're talking to, but that we're using the network path we
believe we're using.
Motivation: Signed clock information
It's very useful for Tor instances to know how skewed they are relative
to one another. The only way to find out currently has been to download
directory information, and check the Date header--but this is not
authenticated, and hence subject to modification on the wire. Using
BEGIN_DIR to create an authenticated directory stream through an existing
circuit is better, but that's an extra step and it might be nicer to
learn the information in the course of the regular protocol.
Proposal:
1.0. Version numbers
The node-to-node TLS-based "OR connection" protocol and the multi-hop
"circuit" protocol are versioned quasi-independently.
Of course, some dependencies will continue to exist: Certain versions
of the circuit protocol may require a minimum version of the connection
protocol to be used. The connection protocol affects:
- Initial connection setup, link encryption, transport guarantees,
etc.
- The allowable set of cell commands
- Allowable formats for cells.
The circuit protocol determines:
- How circuits are established and maintained
- How cells are decrypted and relayed
- How streams are established and maintained.
Version numbers are incremented for backward-incompatible protocol changes
only. Backward-compatible changes are generally implemented by adding
additional fields to existing structures; implementations MUST ignore
fields they do not expect. Unused portions of cells MUST be set to zero.
Though versioning the protocol will make it easier to maintain backward
compatibility with older versions of Tor, we will nevertheless continue to
periodically drop support for older protocols,
- to keep the implementation from growing without bound,
- to limit the maintenance burden of patching bugs in obsolete Tors,
- to limit the testing burden of verifying that many old protocol
versions continue to be implemented properly, and
- to limit the exposure of the network to protocol versions that are
expensive to support.
The Tor protocol as implemented through the 0.1.2.x Tor series will be
called "version 1" in its link protocol and "version 1" in its relay
protocol. Versions of the Tor protocol so old as to be incompatible with
Tor 0.1.2.x can be considered to be version 0 of each, and are not
supported.
2.1. VERSIONS cells
When a Tor connection is established, both parties normally send a
VERSIONS cell before sending any other cells. (But see below.)
VersionsLen [2 bytes]
Versions [VersionsLen bytes]
"Versions" is a sequence of VersionsLen bytes. Each value between 1 and
127 inclusive represents a single version; current implementations MUST
ignore other bytes. Parties should list all of the versions which they
are able and willing to support. Parties can only communicate if they
have some connection protocol version in common.
Version 0.2.0.x-alpha and earlier don't understand VERSIONS cells,
and therefore don't support version negotiation. Thus, waiting until
the other side has sent a VERSIONS cell won't work for these servers:
if the other side sends no cells back, it is impossible to tell
whether they have sent a VERSIONS cell that has been stalled, or
whether they have
dropped our own VERSIONS cell as unrecognized. Therefore, we'll
change the TLS negotiation parameters so that old parties can still
negotiate, but new parties can recognize each other. Immediately
after a TLS connection has been established, the parties check
whether the other side negotiated the connection in an "old" way or a
"new" way. If either party negotiated in the "old" way, we assume a
v1 connection. Otherwise, both parties send VERSIONS cells listing
all their supported versions. Upon receiving the other party's
VERSIONS cell, the implementation begins using the highest-valued
version common to both cells. If the first cell from the other party
has a recognized command, and is _not_ a VERSIONS cell, we assume a
v1 protocol.
(For more detail on the TLS protocol change, see forthcoming draft
proposals from Steven Murdoch.)
Implementations MUST discard VERSIONS cells that are not the first
recognized cells sent on a connection.
The VERSIONS cell must be sent as a v1 cell (2 bytes of circuitID, 1
byte of command, 509 bytes of payload).
[NOTE: The VERSIONS cell is assigned the command number 7.]
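
A minimal sketch of building, parsing, and negotiating with a VERSIONS cell
as laid out above. The function names are hypothetical. Big-endian byte
order and a circuit ID of zero are assumptions made for this example; the
text above does not state them.

```python
import struct

CELL_LEN = 512       # 2 bytes circuitID + 1 byte command + 509 bytes payload
PAYLOAD_LEN = 509
CMD_VERSIONS = 7


def pack_versions_cell(versions):
    """Build a v1-format VERSIONS cell listing the given versions."""
    wanted = sorted(set(versions))
    if any(not 1 <= v <= 127 for v in wanted):
        raise ValueError("versions must be between 1 and 127 inclusive")
    body = bytes(wanted)
    # VersionsLen (2 bytes) followed by one byte per version, zero-padded.
    payload = (struct.pack("!H", len(body)) + body).ljust(PAYLOAD_LEN, b"\x00")
    return struct.pack("!HB", 0, CMD_VERSIONS) + payload


def unpack_versions_cell(cell):
    """Return the set of versions listed in a VERSIONS cell."""
    if len(cell) != CELL_LEN:
        raise ValueError("not a v1-sized cell")
    _circ_id, command = struct.unpack("!HB", cell[:3])
    if command != CMD_VERSIONS:
        raise ValueError("not a VERSIONS cell")
    (count,) = struct.unpack("!H", cell[3:5])
    if count > PAYLOAD_LEN - 2:
        raise ValueError("VersionsLen exceeds the payload")
    # Bytes outside 1..127 are ignored, as required above.
    return {v for v in cell[5:5 + count] if 1 <= v <= 127}


def negotiate(ours, theirs):
    """Pick the highest version common to both sides, or None."""
    common = set(ours) & set(theirs)
    return max(common) if common else None
```

If negotiate returns None, the parties have no connection protocol version
in common and cannot communicate.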
2.2. MITM-prevention and time checking
If we negotiate a v2 connection or higher, the second cell we send SHOULD
be a NETINFO cell. Implementations SHOULD NOT send NETINFO cells at other
times.
A NETINFO cell contains:
Timestamp [4 bytes]
Other OR's address [variable]
Number of addresses [1 byte]
This OR's addresses [variable]
Timestamp is the OR's current Unix time, in seconds since the epoch. If
an implementation receives time values from many ORs that
indicate that its clock is skewed, it SHOULD try to warn the
administrator. (We leave the definition of 'many' intentionally vague
for now.)
Before believing the timestamp in a NETINFO cell, implementations
SHOULD compare the time at which they received the cell to the time
when they sent their VERSIONS cell. If the difference is very large,
it is likely that the cell was delayed long enough that its
contents are out of date.
Each address contains Type/Length/Value as used in Section 6.4 of
tor-spec.txt. The first address is the one that the party sending
the NETINFO cell believes the other has -- it can be used to learn
what your IP address is if you have no other hints.
The rest of the addresses are the advertised addresses of the party
sending the NETINFO cell -- we include them
to block a man-in-the-middle attack on TLS that lets an attacker bounce
traffic through his own computers to enable timing and packet-counting
attacks.
A Tor instance should use the other Tor's reported address
information as part of logic to decide whether to treat a given
connection as suitable for extending circuits to a given address/ID
combination. When we get an extend request, we use an
existing OR connection if the ID matches, and ANY of the following
conditions hold:
- The IP matches the requested IP.
- We know that the IP we're using is canonical because it was
listed in the NETINFO cell.
- We know that the IP we're using is canonical because it was
listed in the server descriptor.
[NOTE: The NETINFO cell is assigned the command number 8.]
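
The reuse rule for extend requests can be sketched as below. OrConnection
and can_reuse_for_extend are hypothetical names for this example; the sketch
assumes the addresses learned from NETINFO and from the server descriptor
are kept as simple sets of strings.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class OrConnection:
    peer_id: str   # identity digest of the OR at the other end
    peer_ip: str   # IP address we are actually connected to
    netinfo_addrs: Set[str] = field(default_factory=set)
    descriptor_addrs: Set[str] = field(default_factory=set)


def can_reuse_for_extend(conn: OrConnection,
                         wanted_id: str, wanted_ip: str) -> bool:
    """Decide whether an existing connection may serve an extend request."""
    # The identity must always match.
    if conn.peer_id != wanted_id:
        return False
    # Then ANY one of the three address conditions is enough.
    return (
        conn.peer_ip == wanted_ip
        or conn.peer_ip in conn.netinfo_addrs      # canonical per NETINFO
        or conn.peer_ip in conn.descriptor_addrs   # canonical per descriptor
    )
```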
Discussion: Versions versus feature lists
Many protocols negotiate lists of available features instead of (or in
addition to) protocol versions. While it's possible that some amount of
feature negotiation could be supported in a later Tor, we should prefer to
use protocol versions whenever possible, for reasons discussed in
the "Anonymity Loves Company" paper.
Discussion: Bytes per version, versions per cell
This document provides for a one-byte count of how many versions a Tor
supports, and allows one byte per version. Thus, it can support only
254 more versions of the protocol beyond the unallocated v0 and the
current v1. If we ever need to split the protocol into 255 incompatible
versions, we've probably screwed up badly somewhere.
Nevertheless, here are two ways we could support more versions:
- Change the version count to a two-byte field that counts the number of
_bytes_ used, and use a UTF8-style encoding: versions 0 through 127
take one byte to encode, versions 128 through 2047 take two bytes to
encode, and so on. We wouldn't need to parse any version higher than
127 right now, since all bytes used to encode higher versions would
have their high bit set.
We'd still have a limit of 380 simultaneous versions that could be
declared in any version. This is probably okay.
- Decide that if we need to support more versions, we can add a
MOREVERSIONS cell that gets sent before the VERSIONS cell. The spec
above requires Tors to ignore unrecognized cell types that they get
before the first VERSIONS cell, and still allows version negotiation
to succeed.
[Resolution: Reserve the high bit and the v0 value for later use. If
we ever have more live versions than we can fit in a cell, we've made a
bad design decision somewhere along the line.]
Discussion: Reducing round-trips
It might be appealing to see if we can cram more information in the
initial VERSIONS cell. For example, the contents of NETINFO will pretty
soon be sent by everybody before any more information is exchanged, but
decoupling them from the version exchange increases round-trips.
Instead, we could speculatively include handshaking information at
the end of a VERSIONS cell, wrapped in a marker to indicate, "if we wind
up speaking VERSION 2, here's the NETINFO I'll send. Otherwise, ignore
this." This could be extended to opportunistically reduce round trips
when possible for future versions when we guess the versions right.
Of course, we'd need to be careful about using a feature like this:
- We don't want to include things that are expensive to compute,
like PK signatures or proof-of-work.
- We don't want to speculate as a mobile client: it may leak our
experience with the server in question.
Discussion: Advertising versions in routerdescs and networkstatuses.
In network-statuses:
The networkstatus "v" line now has the format:
"v" IMPLEMENTATION IMPL-VERSION "Link" LINK-VERSION-LIST
"Circuit" CIRCUIT-VERSION-LIST NL
LINK-VERSION-LIST and CIRCUIT-VERSION-LIST are comma-separated lists of
supported version numbers. IMPLEMENTATION is the name of the
implementation of the Tor protocol (e.g., "Tor"), and IMPL-VERSION is the
version of the implementation.
Examples:
v Tor 0.2.5.1-alpha Link 1,2,3 Circuit 2,5
v OtherOR 2000+ Link 3 Circuit 5
Implementations that release independently of the Tor codebase SHOULD NOT
use "Tor" as the value of their IMPLEMENTATION.
Additional fields on the "v" line MUST be ignored.
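
For illustration, a lenient parser for the "v" line might look like this.
parse_v_line is a hypothetical helper, not part of any Tor codebase. It
accepts the "Link" and "Circuit" lists in either order, which is more
permissive than the grammar above, and it skips any additional fields.

```python
def parse_v_line(line):
    """Parse a networkstatus "v" line into a dictionary."""
    tokens = line.split()
    if len(tokens) < 3 or tokens[0] != "v":
        raise ValueError("not a v line")
    result = {
        "implementation": tokens[1],
        "impl_version": tokens[2],
        "Link": [],
        "Circuit": [],
    }
    rest = tokens[3:]
    i = 0
    while i + 1 < len(rest):
        if rest[i] in ("Link", "Circuit"):
            # The next token is a comma-separated list of version numbers.
            result[rest[i]] = [int(v) for v in rest[i + 1].split(",")]
            i += 2
        else:
            i += 1  # additional fields are ignored
    return result
```

Applied to the first example line above, the "Link" entry comes out as
[1, 2, 3] and the "Circuit" entry as [2, 5].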
In router descriptors:
The router descriptor should contain a line of the form,
"protocols" "Link" LINK-VERSION-LIST "Circuit" CIRCUIT_VERSION_LIST
Additional fields on the "protocols" line MUST be ignored.
[Versions of Tor before 0.1.2.5-alpha rejected router descriptors with
unrecognized items; the protocols line should be preceded with an "opt"
until these Tors are obsolete.]
Security issues:
Client partitioning is the big danger when we introduce new versions; if a
client supports some very unusual set of protocol versions, it will stand
out from others no matter where it goes. If a server supports an unusual
version, it will get a disproportionate amount of traffic from clients who
prefer that version. We can mitigate this somewhat as follows:
- Do not have clients prefer any protocol version by default until that
version is widespread. (First introduce the new version to servers,
and have clients admit to using it only when configured to do so for
testing. Then, once many servers are running the new protocol
version, enable its use by default.)
- Do not multiply protocol versions needlessly.
- Encourage protocol implementors to implement the same protocol version
sets as some popular version of Tor.
- Disrecommend very old/unpopular versions of Tor via the directory
authorities' RecommendedVersions mechanism, even if it is still
technically possible to use them.

@ -1,113 +0,0 @@
Filename: 106-less-tls-constraint.txt
Title: Checking fewer things during TLS handshakes
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson
Created: 9-Feb-2007
Status: Closed
Implemented-In: 0.2.0.x
Overview:
This document proposes that we relax our requirements on the context of
X.509 certificates during initial TLS handshakes.
Motivation:
Later, we want to try harder to avoid protocol fingerprinting attacks.
This means that we'll need to make our connection handshake look closer
to a regular HTTPS connection: one certificate on the server side and
zero certificates on the client side. For now, about the best we
can do is to stop requiring things during handshake that we don't
actually use.
What we check now, and where we check it:
tor_tls_check_lifetime:
peer has certificate
notBefore <= now <= notAfter
tor_tls_verify:
peer has at least one certificate
There is at least one certificate in the chain
At least one of the certificates in the chain is not the one used to
negotiate the connection. (The "identity cert".)
The certificate _not_ used to negotiate the connection has signed the
link cert
tor_tls_get_peer_cert_nickname:
peer has a certificate.
certificate has a subjectName.
subjectName has a commonName.
commonName consists only of characters in LEGAL_NICKNAME_CHARACTERS. [2]
tor_tls_peer_has_cert:
peer has a certificate.
connection_or_check_valid_handshake:
tor_tls_peer_has_cert [1]
tor_tls_get_peer_cert_nickname [1]
tor_tls_verify [1]
If nickname in cert is a known, named router, then its identity digest
must be as expected.
If we initiated the connection, then we got the identity digest we
expected.
USEFUL THINGS WE COULD DO:
[1] We could just not force clients to have any certificate at all, let alone
an identity certificate. Internally to the code, we could assign the
identity_digest field of these or_connections to a random number, or even
not add them to the identity_digest->or_conn map.
[so if somebody connects with no certs, we let them. and mark them as
a client and don't treat them as a server. great. -rd]
[2] Instead of using a restricted nickname character set that makes our
commonName structure look unlike typical SSL certificates, we could treat
the nickname as extending from the start of the commonName up to but not
including the first non-nickname character.
Alternatively, we could stop checking commonNames entirely. We don't
actually _do_ anything based on the nickname in the certificate, so
there's really no harm in letting every router have any commonName it
wants.
[this is the better choice -rd]
[agreed. -nm]
REMAINING WAYS TO RECOGNIZE CLIENT->SERVER CONNECTIONS:
Assuming that we removed the above requirements, and then (in a later
release) had clients not send certificates and started making our DNs a
little less formulaic, client->server OR connections would still be
recognizable by:
having a two-certificate chain sent by the server
using a particular set of ciphersuites
traffic patterns
probing the server later
OTHER IMPLICATIONS:
If we stop verifying the above requirements:
It will be slightly (but only slightly) more common to connect to a non-Tor
server running TLS, and believe that you're talking to a Tor server (until
you send the first cell).
It will be far easier for non-Tor SSL clients to accidentally connect to
Tor servers and speak HTTPS or whatever to them.
If, in a later release, we have clients not send certificates, and we make
DNs less recognizable:
If clients don't send certs, servers don't need to verify them: win!
If we remove these restrictions, it will be easier for people to write
clients to fuzz our protocol: sorta win!
If clients don't send certs, they look slightly less like servers.
OTHER SPEC CHANGES:
When a client doesn't give us an identity, we should never extend any
circuits to it (duh), and we should allow it to set circuit ID however it
wants.

@ -1,56 +0,0 @@
Filename: 107-uptime-sanity-checking.txt
Title: Uptime Sanity Checking
Version: $Revision$
Last-Modified: $Date$
Author: Kevin Bauer & Damon McCoy
Created: 8-March-2007
Status: Closed
Implemented-In: 0.2.0.x
Overview:
This document describes how to cap the uptime that is used when computing
which routers are marked as stable such that highly stable routers cannot
be displaced by malicious routers that report extremely high uptime
values.
This is similar to how bandwidth is capped at 1.5MB/s.
Motivation:
It has been pointed out that an attacker can displace all stable nodes and
entry guard nodes by reporting high uptimes. This is an easy fix that will
prevent highly stable nodes from being displaced.
Security implications:
It should decrease the effectiveness of routing attacks that report high
uptimes while not impacting the normal routing algorithms.
Specification:
So we could patch Section 3.1 of dir-spec.txt to say:
"Stable" -- A router is 'Stable' if it is running, valid, not
hibernating, and either its uptime is at least the median uptime for
known running, valid, non-hibernating routers, or its uptime is at
least 30 days. Routers are never called stable if they are running
a version of Tor known to drop circuits stupidly. (0.1.1.10-alpha
through 0.1.1.16-rc are stupid this way.)
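
The uptime rule in the proposed wording amounts to comparing each uptime
against the smaller of the median and 30 days. A rough sketch, assuming
uptimes are given in seconds for routers that are already known to be
running, valid, and non-hibernating (stable_routers is a hypothetical name,
and the exclusion of the circuit-dropping versions is left out):

```python
from statistics import median

THIRTY_DAYS = 30 * 24 * 60 * 60  # cap on the uptime needed, in seconds


def stable_routers(uptimes):
    """Return the set of router IDs that qualify as Stable by uptime."""
    if not uptimes:
        return set()
    # "At least the median, or at least 30 days" is the same as
    # "at least min(median, 30 days)".
    cutoff = min(median(uptimes.values()), THIRTY_DAYS)
    return {router for router, up in uptimes.items() if up >= cutoff}
```

With the cap in place, a router reporting an enormous uptime can raise the
median but cannot push a router with 30 or more days of uptime out of the
Stable set.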
Compatibility:
There should be no compatibility issues due to uptime capping.
Implementation:
Implemented and merged into dir-spec in 0.2.0.0-alpha-dev (r9788).
Discussion:
Initially, this proposal set the maximum at 60 days, not 30; the 30 day
limit and spec wording was suggested by Roger in an or-dev post on 9 March
2007.
This proposal also led to 108-mtbf-based-stability.txt

@ -1,90 +0,0 @@
Filename: 108-mtbf-based-stability.txt
Title: Base "Stable" Flag on Mean Time Between Failures
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson
Created: 10-Mar-2007
Status: Closed
Implemented-In: 0.2.0.x
Overview:
This document proposes that we change how directory authorities set the
stability flag from inspection of a router's declared Uptime to the
authorities' perceived mean time between failure for the router.
Motivation:
Clients prefer nodes that the authorities call Stable. This flag is (as
of 0.2.0.0-alpha-dev) set entirely based on the node's declared value for
uptime. This creates an opportunity for malicious nodes to declare
falsely high uptimes in order to get more traffic.
Spec changes:
Replace the current rule for setting the Stable flag with:
"Stable" -- A router is 'Stable' if it is active and its observed Stability
for the past month is at or above the median Stability for active routers.
Routers are never called stable if they are running a version of Tor
known to drop circuits stupidly. (0.1.1.10-alpha through 0.1.1.16-rc
are stupid this way.)
Stability shall be defined as the weighted mean length of the runs
observed by a given directory authority. A run begins when an authority
decides that the server is Running, and ends when the authority decides
that the server is not Running. In-progress runs are counted when
measuring Stability. When calculating the mean, runs are weighted by
$\alpha ^ t$, where $t$ is time elapsed since the end of the run, and
$0 < \alpha < 1$. Time when an authority is down does not count toward the
length of the run.
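
A minimal sketch of the weighted mean described above. The function name,
the default alpha of 0.95, and the choice of days as the unit for $t$ are
assumptions made for this example; the text only requires
$0 < \alpha < 1$. The sketch does not subtract time when the authority
itself was down.

```python
def stability(runs, now, alpha=0.95, unit=86400.0):
    """Weighted mean run length, in the same time units as the input.

    runs is a list of (start, end) pairs; end is None for a run that is
    still in progress. Each run is weighted by alpha ** t, where t is the
    time since the run ended, measured in multiples of unit.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for start, end in runs:
        stop = now if end is None else end
        age = (now - stop) / unit   # zero for an in-progress run
        weight = alpha ** age
        weighted_sum += weight * (stop - start)
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0
```

An in-progress run has $t = 0$ and therefore full weight, so a router that
has been up continuously is credited with its whole current run.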
Rejected Alternative:
"A router's Stability shall be defined as the sum of $\alpha ^ d$ for every
$d$ such that the router was considered reachable for the entire day
$d$ days ago."
This allows a simpler implementation: every day, we multiply
yesterday's Stability by alpha, and if the router was observed to be
available every time we looked today, we add 1.
Instead of "day", we could pick an arbitrary time unit. We should
pick alpha to be high enough that long-term stability counts, but low
enough that the distant past is eventually forgotten. Something
between .8 and .95 seems right.
(By requiring that routers be up for an entire day to get their
stability increased, instead of counting fractions of a day, we
capture the notion that stability is more like "probability of
staying up for the next hour" than it is like "probability of being
up at some randomly chosen time over the next hour." The former
notion of stability is far more relevant for long-lived circuits.)
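
The simpler daily update from this rejected alternative could be sketched
as follows. The function name and the default alpha of 0.9 are assumptions
for illustration; the text suggests something between .8 and .95.

```python
def update_daily_stability(previous, up_all_day, alpha=0.9):
    """Decay yesterday's Stability by alpha, adding 1 for a full day up."""
    return previous * alpha + (1.0 if up_all_day else 0.0)
```

Calling this once per day keeps a running total without storing any
per-run history.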
Limitations:
Authorities can have false positives and false negatives when trying to
tell whether a router is up or down. So long as these aren't terribly
wrong, and so long as they aren't significantly biased, we should be able
to use them to estimate stability pretty well.
Probing approaches like the above could miss short incidents of
downtime. If we use the router's declared uptime, we could detect
these: but doing so would penalize routers who reported their uptime
accurately.
Implementation:
For now, the easiest way to store this information at authorities
would probably be in some kind of periodically flushed flat file.
Later, we could move to Berkeley db or something if we really had to.
For each router, an authority will need to store:
The router ID.
Whether the router is up.
The time when the current run started, if the router is up.
The weighted sum length of all previous runs.
The time at which the weighted sum length was last weighted down.
Authorities should probe at random intervals to test whether servers are
running.

@ -1,92 +0,0 @@
Filename: 109-no-sharing-ips.txt
Title: No more than one server per IP address.
Version: $Revision$
Last-Modified: $Date$
Author: Kevin Bauer & Damon McCoy
Created: 9-March-2007
Status: Closed
Implemented-In: 0.2.0.x
Overview:
This document describes a solution to a Sybil attack vulnerability in the
directory servers. Currently, it is possible for a single IP address to
host an arbitrarily high number of Tor routers. We propose that the
directory servers limit the number of Tor routers that may be registered at
a particular IP address to some small (fixed) number, perhaps just one Tor
router per IP address.
While Tor never uses more than one server from a given /16 in the same
circuit, an attacker with multiple servers in the same place is still
dangerous because he can get around the per-server bandwidth cap that is
designed to prevent a single server from attracting too much of the overall
traffic.
Motivation:
Since it is possible for an attacker to register an arbitrarily large
number of Tor routers, it is possible for malicious parties to do this
as part of a traffic analysis attack.
Security implications:
This countermeasure will increase the number of IP addresses that an
attacker must control in order to carry out traffic analysis.
Specification:
For each IP address, each directory authority tracks the number of routers
using that IP address, along with their total observed bandwidth. If there
are more than MAX_SERVERS_PER_IP servers at some IP, the authority should
"disable" all but MAX_SERVERS_PER_IP servers. When choosing which servers
to disable, the authority should first disable non-Running servers in
increasing order of observed bandwidth, and then should disable Running
servers in increasing order of bandwidth.
[[ We don't actually do this part here. -NM
If the total observed
bandwidth of the remaining non-"disabled" servers exceeds MAX_BW_PER_IP,
the authority should "disable" some of the remaining servers until only one
server remains, or until the remaining observed bandwidth of non-"disabled"
servers is under MAX_BW_PER_IP.
]]
Servers that are "disabled" MUST be marked as non-Valid and non-Running.
MAX_SERVERS_PER_IP is 3.
MAX_BW_PER_IP is 8 MB per s.
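
The selection rule above can be sketched roughly as follows. This is an
illustration only, not the authority code: the Router record and the
choose_disabled() helper are hypothetical names, and the bracketed
MAX_BW_PER_IP step is left out because it is not implemented.

```python
from collections import defaultdict
from dataclasses import dataclass

MAX_SERVERS_PER_IP = 3  # value given in this proposal


@dataclass
class Router:  # hypothetical record, for illustration only
    nickname: str
    ip: str
    running: bool
    bandwidth: int  # observed bandwidth, bytes per second


def choose_disabled(routers):
    """Return the routers an authority would "disable" under the rule above."""
    by_ip = defaultdict(list)
    for r in routers:
        by_ip[r.ip].append(r)

    disabled = []
    for group in by_ip.values():
        excess = len(group) - MAX_SERVERS_PER_IP
        if excess <= 0:
            continue
        # Non-Running servers sort first (False < True), then by
        # increasing observed bandwidth within each class.
        order = sorted(group, key=lambda r: (r.running, r.bandwidth))
        disabled.extend(order[:excess])
    return disabled
```

Each router returned here would then be marked non-Valid and non-Running.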
Compatibility:
Upon inspection of a directory server, we found that the following IP
addresses have more than one Tor router:
Scruples 68.5.113.81 ip68-5-113-81.oc.oc.cox.net 443
WiseUp 68.5.113.81 ip68-5-113-81.oc.oc.cox.net 9001
Unnamed 62.1.196.71 pc01-megabyte-net-arkadiou.megabyte.gr 9001
Unnamed 62.1.196.71 pc01-megabyte-net-arkadiou.megabyte.gr 9001
Unnamed 62.1.196.71 pc01-megabyte-net-arkadiou.megabyte.gr 9001
aurel 85.180.62.138 e180062138.adsl.alicedsl.de 9001
sokrates 85.180.62.138 e180062138.adsl.alicedsl.de 9001
moria1 18.244.0.188 moria.mit.edu 9001
peacetime 18.244.0.188 moria.mit.edu 9100
There may exist compatibility issues with this proposed fix. Reasons why
more than one server would share an IP address include:
* Testing. moria1, moria2, peacetime, and other morias all run on one
computer at MIT, because that way we get testing. Moria1 and moria2 are
run by Roger, and peacetime is run by Nick.
* NAT. If there are several servers but they port-forward through the same
IP address, ... we can hope that the operators coordinate with each
other. Also, we should recognize that while they help the network in
terms of increased capacity, they don't help as much as they could in
terms of location diversity. But our approach so far has been to take
what we can get.
* People who have more than 1.5MB/s and want to help out more. For
example, for a while Tonga was offering 10MB/s and its Tor server
would only make use of a bit of it. So Roger suggested that he run
two Tor servers, to use more.
[Note Roger's tweak to this behavior, in
http://archives.seul.org/or/cvs/Oct-2007/msg00118.html]

@ -1,122 +0,0 @@
Filename: 110-avoid-infinite-circuits.txt
Title: Avoiding infinite length circuits
Version: $Revision$
Last-Modified: $Date$
Author: Roger Dingledine
Created: 13-Mar-2007
Status: Accepted
Target: 0.2.1.x
Implemented-In: 0.2.1.3-alpha
History:
Revised 28 July 2008 by nickm: set K.
Revised 3 July 2008 by nickm: rename from relay_extend to
relay_early. Revise to current migration plan. Allow K cells
over circuit lifetime, not just at start.
Overview:
Right now, an attacker can add load to the Tor network by extending a
circuit an arbitrary number of times. Every cell that goes down the
circuit then adds N times that amount of load in overall bandwidth
use. This vulnerability arises because servers don't know their position
on the path, so they can't tell how many nodes there are before them
on the path.
We propose a new set of relay cells that are distinguishable by
intermediate hops as permitting extend cells. This approach will allow
us to put an upper bound on circuit length relative to the number of
colluding adversary nodes; but there are some downsides too.
Motivation:
The above attack can be used to generally increase load all across the
network, or it can be used to target specific servers: by building a
circuit back and forth between two victim servers, even a low-bandwidth
attacker can soak up all the bandwidth offered by the fastest Tor
servers.
The general attacks could be used as a demonstration that Tor isn't
perfect (leading to yet more media articles about "breaking" Tor), and
the targeted attacks will come into play once we have a reputation
system -- it will be trivial to DoS a server so it can't pass its
reputation checks, in turn impacting security.
Design:
We should split RELAY cells into two types: RELAY and RELAY_EARLY.
Only K (say, 10) relay_early cells can be sent across a circuit, and
only relay_early cells are allowed to contain extend requests. We
still support obscuring the length of the circuit (if more research
shows us what to do), because Alice can choose how many of the K to
mark as relay_early. Note that relay_early cells *can* contain any
sort of data cell; so in effect it's actually the relay type cells
that are restricted. By default, she would just send the first K
data cells over the stream as relay_early cells, regardless of their
actual type.
(Note that a circuit that is out of relay_early cells MUST NOT be
cannibalized later, since it can't extend. Note also that it's always okay
to use regular RELAY cells when sending non-EXTEND commands targeted at
the first hop of a circuit, since there is no intermediate hop to try to
learn the relay command type.)
Each intermediate server would pass on the same type of cell that it
received (either relay or relay_early), and the cell's destination
will be able to learn whether it's allowed to contain an Extend request.
If an intermediate server receives more than K relay_early cells, or
if it sees a relay cell that contains an extend request, then it
tears down the circuit (protocol violation).
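
The per-circuit check described above might look roughly like this. It is
a sketch under assumptions, not Tor's implementation: CircuitState and
ProtocolViolation are hypothetical names, K is taken from the Parameters
section, and contains_extend stands for what only the addressed hop can
learn after decrypting the cell.

```python
K = 8  # value from the Parameters section of this proposal


class ProtocolViolation(Exception):
    """Raised when the circuit should be torn down."""


class CircuitState:  # hypothetical per-circuit state kept by one relay
    def __init__(self):
        self.relay_early_seen = 0

    def on_cell(self, is_relay_early, contains_extend=False):
        """Process one inbound cell on this circuit.

        contains_extend is only meaningful at the hop the cell is
        addressed to; other hops cannot read the relay command.
        """
        if is_relay_early:
            self.relay_early_seen += 1
            if self.relay_early_seen > K:
                raise ProtocolViolation("more than K relay_early cells")
        elif contains_extend:
            raise ProtocolViolation("extend request in a plain relay cell")
```

Keeping the counter per circuit, rather than per connection, matches the
statement that K cells are allowed over the lifetime of each circuit.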
Security implications:
The upside is that this limits the bandwidth amplification factor to
K: for an individual circuit to become arbitrary-length, the attacker
would need an adversary-controlled node every K hops, and at that
point the attack is no worse than if the attacker creates N/K separate
K-hop circuits.
On the other hand, we want to pick a large enough value of K that we
don't mind the cap.
If we ever want to take steps to hide the number of hops in the circuit
or a node's position in the circuit, this design probably makes that
more complex.
Migration:
In 0.2.0, servers speaking v2 or later of the link protocol accept
RELAY_EARLY cells, and pass them on. If the next OR in the circuit
is not speaking the v2 link protocol, the server relays the cell as
a RELAY cell.
In 0.2.1.3-alpha, clients begin using RELAY_EARLY cells on v2
connections. This functionality can be safely backported to
0.2.0.x. Clients should pick a random number between (say) K-2 and
K to send.
In 0.2.1.3-alpha, servers close any circuit in which more than K
relay_early cells are sent.
Once all versions that do not send RELAY_EARLY cells are obsolete,
servers can begin to reject any EXTEND requests not sent in a
RELAY_EARLY cell.
Parameters:
Let K = 8, for no terribly good reason.
Spec:
[We can formalize this part once we think the design is a good one.]
Acknowledgements:
This design has been kicking around since Christian Grothoff and I came
up with it at PET 2004. (Nathan Evans, Christian Grothoff's student,
is working on implementing a fix based on this design in the summer
2007 timeframe.)

@ -1,153 +0,0 @@
Filename: 111-local-traffic-priority.txt
Title: Prioritizing local traffic over relayed traffic
Version: $Revision$
Last-Modified: $Date$
Author: Roger Dingledine
Created: 14-Mar-2007
Status: Closed
Implemented-In: 0.2.0.x
Overview:
We describe some ways to let Tor users operate as a relay and enforce
rate limiting for relayed traffic without impacting their locally
initiated traffic.
Motivation:
Right now we encourage people who use Tor as a client to configure it
as a relay too ("just click the button in Vidalia"). Most of these users
are on asymmetric links, meaning they have a lot more download capacity
than upload capacity. But if they enable rate limiting too, suddenly
they're limited to the same download capacity as upload capacity. And
they have to enable rate limiting, or their upstream pipe gets filled
up, starts dropping packets, and now their net connection doesn't work
even for non-Tor stuff. So they end up turning off the relaying part
so they can use Tor (and other applications) again.
So far this hasn't mattered that much: most of our fast relays are
being operated only in relay mode, so the rate limiting makes sense
for them. But if we want to be able to attract many more relays in
the future, we need to let ordinary users act as relays too.
Further, as we begin to deploy the blocking-resistance design and we
rely on ordinary users to click the "Tor for Freedom" button, this
limitation will become a serious stumbling block to getting volunteers
to act as bridges.
The problem:
Tor implements its rate limiting on the 'read' side by only reading
a certain number of bytes from the network in each second. If it has
emptied its token bucket, it doesn't read any more from the network;
eventually TCP notices and stalls until we resume reading. But if we
want to have two classes of service, we can't know what class a given
incoming cell will be until we look at it, at which point we've already
read it.
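
To make the difficulty concrete, here is a minimal sketch of read-side
token-bucket limiting. ReadBucket and its methods are hypothetical names,
not Tor's internals. Note that the bucket is charged for the bytes read
before anything can tell which class of traffic they belonged to.

```python
class ReadBucket:  # hypothetical name; illustration only
    def __init__(self, rate, burst):
        self.rate = rate      # bytes added per second
        self.burst = burst    # maximum bucket size in bytes
        self.tokens = burst

    def refill(self, seconds):
        """Add tokens for the elapsed time, capped at the burst size."""
        self.tokens = min(self.burst, self.tokens + int(self.rate * seconds))

    def bytes_allowed(self, wanted):
        """How many of `wanted` bytes may be read right now."""
        return max(0, min(wanted, self.tokens))

    def record_read(self, n):
        """Charge the bucket for n bytes already read from the network."""
        self.tokens -= n
```

Once bytes_allowed() returns 0 the relay stops reading, and TCP flow
control eventually stalls the sender until the next refill.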
Some options:
Option 1: read when our token bucket is full enough, and if it turns
out that what we read was local traffic, then add the tokens back into
the token bucket. This will work when local traffic load alternates
with relayed traffic load; but it's a poor option in general, because
when we're receiving both local and relayed traffic, there are plenty
of cases where we'll end up with an empty token bucket, and then we're
back where we were before.
More generally, notice that our problem is easy when a given TCP
connection either has entirely local circuits or entirely relayed
circuits. In fact, even if they are both present, if one class is
entirely idle (none of its circuits have sent or received in the past
N seconds), we can ignore that class until it wakes up again. So it
only gets complex when a single connection contains active circuits
of both classes.
Next, notice that local traffic uses only the entry guards, whereas
relayed traffic likely doesn't. So if we're a bridge handling just
a few users, the expected number of overlapping connections would be
almost zero, and even if we're a full relay the number of overlapping
connections will be quite small.
Option 2: build separate TCP connections for local traffic and for
relayed traffic. In practice this will actually only require a few
extra TCP connections: we would only need redundant TCP connections
to at most the number of entry guards in use.
However, this approach has some drawbacks. First, if the remote side
wants to extend a circuit to you, how does it know which TCP connection
to send it on? We would need some extra scheme to label some connections
"client-only" during construction. Perhaps we could do this by seeing
whether any circuit was made via CREATE_FAST; but this still opens
up a race condition where the other side sends a create request
immediately. The only ways I can imagine to avoid the race entirely
are to specify our preference in the VERSIONS cell, or to add some
sort of "nope, not this connection, why don't you try another rather
than failing" response to create cells, or to forbid create cells on
connections that you didn't initiate and on which you haven't seen
any circuit creation requests yet -- this last one would lead to a bit
more connection bloat but doesn't seem so bad. And we already accept
this race for the case where directory authorities establish new TCP
connections periodically to check reachability, and then hope to hang
up on them soon after. (In any case this issue is moot for bridges,
since each destination will be one-way with respect to extend requests:
either receiving extend requests from bridge users or sending extend
requests to the Tor server, never both.)
The second problem with option 2 is that using two TCP connections
reveals that there are two classes of traffic (and probably quickly
reveals which is which, based on throughput). Now, it's unclear whether
this information is already available to the other relay -- he would
easily be able to tell that some circuits are fast and some are rate
limited, after all -- but it would be nice to not add even more ways to
leak that information. Also, it's less clear that an external observer
already has this information if the circuits are all bundled together,
and for this case it's worth trying to protect it.
Option 3: tell the other side about our rate limiting rules. When we
establish the TCP connection, specify the different policy classes we
have configured. Each time we extend a circuit, specify which policy
class that circuit should be part of. Then hope the other side obeys
our wishes. (If he doesn't, hang up on him.) Besides the design and
coordination hassles involved in this approach, there's a big problem:
our rate limiting classes apply to all our connections, not just
pairwise connections. How does one server we're connected to know how
much of our bucket has already been spent by another? I could imagine
a complex and inefficient "ok, now you can send me those two more cells
that you've got queued" protocol. I'm not sure how else we could do it.
(Gosh. How could UDP designs possibly be compatible with rate limiting
with multiple bucket sizes?)
Option 4: put both classes of circuits over a single connection, and
keep track of the last time we read or wrote a high-priority cell. If
it's been less than N seconds, give the whole connection high priority,
else give the whole connection low priority.
Option 5: put both classes of circuits over a single connection, and
play a complex juggling game by periodically telling the remote side
what rate limits to set for that connection, so you end up giving
priority to the right connections but still stick to roughly your
intended bandwidthrate and relaybandwidthrate.
Option 6: ?
Prognosis:
Nick really didn't like option 2 because of the partitioning questions.
I've put option 4 into place as of Tor 0.2.0.3-alpha.
In terms of implementation, it will be easy: just add a time_t to
or_connection_t that specifies client_used (used by the initiator
of the connection to rate limit it differently depending on how
recently the time_t was reset). We currently update client_used
in three places:
- command_process_relay_cell() when we receive a relay cell for
an origin circuit.
- relay_send_command_from_edge() when we send a relay cell for
an origin circuit.
- circuit_deliver_create_cell() when we send a create cell.
We could probably remove the third case and it would still work,
but hey.
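
A rough sketch of the option 4 bookkeeping follows. The class only loosely
mirrors or_connection_t, the method names are hypothetical, and the value
chosen for N is an assumption made for the example.

```python
import time

CLIENT_IDLE_SECONDS = 20  # the "N seconds" of option 4; assumed value


class OrConnection:  # loosely mirrors or_connection_t; illustration only
    def __init__(self):
        self.client_used = 0.0  # last time a local (origin) cell was seen

    def note_client_activity(self, now=None):
        """Call when a cell for an origin circuit is sent or received."""
        self.client_used = time.time() if now is None else now

    def is_high_priority(self, now=None):
        """True if local traffic used this connection within N seconds."""
        now = time.time() if now is None else now
        return now - self.client_used < CLIENT_IDLE_SECONDS
```

The three update points listed above would each call
note_client_activity(); the rate limiter would consult is_high_priority()
to decide which bucket to charge for the whole connection.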

@ -1,165 +0,0 @@
Filename: 112-bring-back-pathlencoinweight.txt
Title: Bring Back Pathlen Coin Weight
Version: $Revision$
Last-Modified: $Date$
Author: Mike Perry
Created:
Status: Superseded
Superseded-By: 115
Overview:
The idea is that users should be able to choose a weight which
probabilistically chooses their path lengths to be 2 or 3 hops. This
weight will essentially be a biased coin that indicates an
additional hop (beyond 2) with probability P. The user should be
allowed to choose 0 for this weight to always get 2 hops and 1 to
always get 3.
This value should be modifiable from the controller, and should be
available from Vidalia.
Motivation:
The Tor network is slow and overloaded. Increasingly often I hear
stories about friends and friends of friends who are behind firewalls,
annoying censorware, or under surveillance that interferes with their
productivity and Internet usage, or chills their speech. These people
know about Tor, but they choose to put up with the censorship because
Tor is too slow to be usable for them. In fact, to download a fresh,
complete copy of levine-timing.pdf for the Anonymity Implications
section of this proposal over Tor took me 3 tries.
There are many ways to improve the speed problem, and of course we
should and will implement as many as we can. Johannes's GSoC project
and my reputation system are longer term, higher-effort things that
will still provide benefit independent of this proposal.
However, reducing the path length to 2 for those who do not need the
(questionable) extra anonymity 3 hops provide not only improves
their Tor experience but also reduces their load on the Tor network by
33%, and can be done in less than 10 lines of code. That's not just
Win-Win, it's Win-Win-Win.
Furthermore, when blocking resistance measures insert an extra relay
hop into the equation, 4 hops will certainly be completely unusable
for these users, especially since it will be considerably more
difficult to balance the load across a dark relay net than balancing
the load on Tor itself (which today is still not without its flaws).
Anonymity Implications:
It has long been established that timing attacks against mix
networks are extremely effective, and that regardless of path
length, if the adversary has compromised your first and last
hop of your path, you can assume they have compromised your
identity for that connection.
In [1], it is demonstrated that for all but the slowest, lossiest
networks, error rates for false positives and false negatives were
very near zero. Only for constant streams of traffic over slow and
(more importantly) extremely lossy network links did the error rate
hit 20%. For loss rates typical to the Internet, even the error rate
for slow nodes with constant traffic streams was 13%.
When you take into account that most Tor streams are not constant,
but probably much more like their "HomeIP" dataset, which consists
mostly of web traffic that exists over finite intervals at specific
times, error rates drop to fractions of 1%, even for the "worst"
network nodes.
Therefore, the user has little benefit from the extra hop, assuming
the adversary does timing correlation on their nodes. The real
protection is the probability of getting both the first and last hop,
and this is constant whether the client chooses 2 hops, 3 hops, or 42.
Partitioning attacks form another concern. Since Tor uses telescoping
to build circuits, it is possible to tell a user is constructing only
two hop paths at the entry node. It is questionable if this data is
actually worth anything though, especially if the majority of users
have easy access to this option, and do actually choose their path
lengths semi-randomly.
Nick has postulated that exits may also be able to tell that you are
using only 2 hops by the amount of time between sending their
RELAY_CONNECTED cell and the first bit of RELAY_DATA traffic they
see from the OP. I doubt that they will be able to make much use
of this timing pattern, since it will likely vary widely depending
upon the type of node selected for that first hop, and the user's
connection rate to that first hop. It is also questionable if this
data is worth anything, especially if many users are using this
option (and I imagine many will).
Perhaps most seriously, two hop paths do allow malicious guards
to easily fail circuits if they do not extend to their colluding peers
for the exit hop. Since guards can detect the number of hops in a
path, they could always fail the 3 hop circuits and focus on
selectively failing the two hop ones until a peer was chosen.
I believe currently guards are rotated if circuits fail, which does
provide some protection, but this could be changed so that an entry
guard is completely abandoned after a certain ratio of extend or
general circuit failures with respect to non-failed circuits. This
could possibly be gamed to increase guard turnover, but such a game
would be much more noticeable than an individual guard failing circuits,
though, since it would affect all clients, not just those who chose
a particular guard.
Why not fix Pathlen=2?:
The main reason I am not advocating that we always use 2 hops is that
in some situations, timing correlation evidence by itself may not be
considered as solid and convincing as an actual, uninterrupted, fully
traced path. Are these timing attacks as effective on a real network
as they are in simulation? Would an extralegal adversary or authoritarian
government even care? In the face of these situation-dependent unknowns,
it should be up to the user to decide if this is a concern for them or not.
It should probably also be noted that even a false positive
rate of 1% for a 200k concurrent-user network could mean that for a
given node, a given stream could be confused with something like 10
users, assuming ~200 nodes carry most of the traffic (ie 1000 users
each). Though of course to really know for sure, someone needs to do
an attack on a real network, unfortunately.
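
The arithmetic behind that estimate, written out using only the figures
quoted in the paragraph above:

```python
concurrent_users = 200_000   # network-wide concurrent users
busy_nodes = 200             # nodes assumed to carry most of the traffic
false_positive_rate = 0.01   # 1% false positives

# Users whose streams pass through one busy node.
users_per_node = concurrent_users // busy_nodes

# Users a given stream could be confused with at that node.
confusable_users = users_per_node * false_positive_rate
```

With these inputs users_per_node is 1000 and confusable_users is about 10.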
Implementation:
new_route_len() can be modified directly with a check of the
PathlenCoinWeight option (converted to percent) and a call to
crypto_rand_int(0,100) for the weighted coin.
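
A minimal sketch of the weighted coin, written in Python rather than C
for brevity. The function borrows the name new_route_len() for
readability but is not the Tor function, and the rng parameter is an
addition that makes the example testable.

```python
import random


def new_route_len(pathlen_coin_weight, rng=random):
    """Return 2 or 3; a third hop is added with the given probability."""
    if not 0.0 <= pathlen_coin_weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    percent = int(round(pathlen_coin_weight * 100))
    # rng.randrange(100) yields 0..99, so weight 0 never adds a hop
    # and weight 1 always does.
    return 3 if rng.randrange(100) < percent else 2
```

Drawing from 0..99 and comparing with a strict "<" keeps both endpoints
exact, which the Overview requires for weights 0 and 1.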
The entry_guard_t structure could have num_circ_failed and
num_circ_succeeded members such that if it exceeds N% circuit
extend failure rate to a second hop, it is removed from the entry list.
N should be sufficiently high to avoid churn from normal Tor circuit
failure as determined by TorFlow scans.
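
The guard bookkeeping could be sketched as below. The threshold and the
minimum sample size are placeholder assumptions; the proposal leaves N to
be determined by TorFlow scans and does not mention a minimum.

```python
MAX_FAILURE_PERCENT = 70  # "N" above; placeholder value, to be tuned
MIN_ATTEMPTS = 20         # assumption: do not judge on too few samples


class EntryGuard:  # stands in for entry_guard_t; illustration only
    def __init__(self, nickname):
        self.nickname = nickname
        self.num_circ_failed = 0
        self.num_circ_succeeded = 0

    def record(self, succeeded):
        """Record the outcome of one extend to a second hop."""
        if succeeded:
            self.num_circ_succeeded += 1
        else:
            self.num_circ_failed += 1

    def should_drop(self):
        """True if the failure rate exceeds the configured percentage."""
        total = self.num_circ_failed + self.num_circ_succeeded
        if total < MIN_ATTEMPTS:
            return False
        return 100 * self.num_circ_failed > MAX_FAILURE_PERCENT * total
```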
The Vidalia option should be presented as a boolean, to minimize confusion
for the user. Something like a radiobutton with:
* "I use Tor for Censorship Resistance, not Anonymity. Speed is more
important to me than Anonymity."
* "I use Tor for Anonymity. I need extra protection at the cost of speed."
and then some explanation in the help for exactly what this means, and
the risks involved with eliminating the adversary's need for timing attacks
with respect to false positives, etc.
Migration:
Phase one: Experiment with the proper ratio of circuit failures
used to expire garbage or malicious guards via TorFlow.
Phase two: Re-enable config and modify new_route_len() to add an
extra hop if coin comes up "heads".
Phase three: Make radiobutton in Vidalia, along with help entry
that explains in layman's terms the risks involved.
[1] http://www.cs.umass.edu/~mwright/papers/levine-timing.pdf

@ -1,87 +0,0 @@
Filename: 113-fast-authority-interface.txt
Title: Simplifying directory authority administration
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson
Created:
Status: Superseded
Overview
The problem:
Administering a directory authority is a pain: you need to go through
emails and manually add new nodes as "named". When bad things come up,
you need to mark nodes (or whole regions) as invalid, badexit, etc.
This means that mostly, authority admins don't: only 2/4 current authority
admins actually bind names or list bad exits, and those two have often
complained about how annoying it is to do so.
Worse, name binding is a common path, but it's a pain in the neck: nobody
has done it for a couple of months.
Digression: who knows what?
It's trivial for Tor to automatically keep track of all of the
following information about a server:
name, fingerprint, IP, last-seen time, first-seen time, declared
contact.
All we need to have the administrator set is:
- Is this name/fingerprint pair bound?
- Is this fingerprint/IP a bad exit?
- Is this fingerprint/IP an invalid node?
- Is this fingerprint/IP to be rejected?
The workflow for authority admins has two parts:
- Periodically, go through tor-ops and add new names. This doesn't
need to be done urgently.
- Less often, mark badly behaved servers as badly behaved. This is more
urgent.
Possible solution #1: Web-interface for name binding.
Deprecate use of the tor-ops mailing list; instead, have operators go to a
webform and enter their server info. This would put the information in a
standardized format, thus allowing quick, nearly-automated approval and
reply.
Possible solution #2: Self-binding names.
Peter Palfrader has proposed that names be assigned automatically to nodes
that have been up and running and valid for a while.
Possible solution #3: Self-maintaining approved-routers file
Mixminion alpha has a neat feature where whenever a new server is seen,
a stub line gets added to a configuration file. For Tor, it could look
something like this:
## First seen with this key on 2007-04-21 13:13:14
## Stayed up for at least 12 hours on IP 192.168.10.10
#RouterName AAAABBBBCCCCDDDDEFEF
(Note that the implementation needs to parse commented lines to make sure
that it doesn't add duplicates, but that's not so hard.)
To add a router as named, administrators would only need to uncomment the
entry. This automatically maintained file could be kept separately from a
manually maintained one.
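
The duplicate check mentioned above might be sketched like this. The
add_stub() helper and the exact line layout are assumptions based on the
example entry; the real file format may differ.

```python
def add_stub(existing_lines, nickname, fingerprint, first_seen, note):
    """Return the lines with a commented-out stub appended, unless the
    fingerprint is already listed (commented out or not)."""
    for line in existing_lines:
        # Strip leading '#' so commented stubs count as already present.
        fields = line.lstrip("#").split()
        if len(fields) == 2 and fields[1] == fingerprint:
            return list(existing_lines)
    return list(existing_lines) + [
        f"## First seen with this key on {first_seen}",
        f"## {note}",
        f"#{nickname} {fingerprint}",
    ]
```

An administrator then approves a router by removing the single leading
"#" from its entry line; the "##" lines remain as comments.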
This could be combined with solution #2, such that Tor would do the hard
work of uncommenting entries for routers that should get Named, but
operators could override its decisions.
Possible solution #4: A separate mailing list for authority operators.
Right now, the tor-ops list is very high volume. There should be another
list that's only for dealing with problems that need prompt action, like
marking a router as !badexit.
Resolution:
Solution #2 is described in "Proposal 123: Naming authorities
automatically create bindings", and that approach is implemented.
There are remaining issues in the problem statement above that need
their own solutions.

@ -1,441 +0,0 @@
Filename: 114-distributed-storage.txt
Title: Distributed Storage for Tor Hidden Service Descriptors
Version: $Revision$
Last-Modified: $Date$
Author: Karsten Loesing
Created: 13-May-2007
Status: Closed
Implemented-In: 0.2.0.x
Change history:
13-May-2007 Initial proposal
14-May-2007 Added changes suggested by Lasse Øverlier
30-May-2007 Changed descriptor format, key length discussion, typos
09-Jul-2007 Incorporated suggestions by Roger, added status of specification
and implementation for upcoming GSoC mid-term evaluation
11-Aug-2007 Updated implementation statuses, included non-consecutive
replication to descriptor format
20-Aug-2007 Renamed config option HSDir as HidServDirectoryV2
02-Dec-2007 Closed proposal
Overview:
The basic idea of this proposal is to distribute the tasks of storing and
serving hidden service descriptors from currently three authoritative
directory nodes among a large subset of all onion routers. The three
reasons to do this are better robustness (availability), better
scalability, and improved security properties. Further,
this proposal suggests changes to the hidden service descriptor format to
prevent new security threats coming from decentralization and to gain even
better security properties.
Status:
As of December 2007, the new hidden service descriptor format is implemented
and usable. However, servers and clients do not yet make use of descriptor
cookies, because there are open usability issues of this feature that might
be resolved in proposal 121. Further, hidden service directories do not
perform replication by themselves, because (unauthorized) replica fetch
requests would allow any attacker to fetch all hidden service descriptors in
the system. As neither issue is critical to the functioning of v2
descriptors and their distribution, this proposal is considered as Closed.
Motivation:
The current design of hidden services exhibits the following performance and
security problems:
First, the three hidden service authoritative directories constitute a
performance bottleneck in the system. The directory nodes are responsible for
storing and serving all hidden service descriptors. As of May 2007 there are
about 1000 descriptors at a time, but this number is assumed to increase in
the future. Further, there is no replication protocol for descriptors between
the three directory nodes, so that hidden services must ensure the
availability of their descriptors by manually publishing them on all
directory nodes. Whenever a fourth or fifth hidden service authoritative
directory is added, hidden services will need to maintain an equally
increasing number of replicas. These scalability issues have an impact on the
current usage of hidden services and put an even higher burden on the
development of new kinds of applications for hidden services that might
require storing even more descriptors.
Second, besides posing a limitation to scalability, storing all hidden
service descriptors on three directory nodes also constitutes a security
risk. The directory node operators could easily analyze the publish and fetch
requests to derive information on service activity and usage and read the
descriptor contents to determine which onion routers work as introduction
points for a given hidden service and need to be attacked or threatened to
shut it down. Furthermore, the contents of a hidden service descriptor offer
only minimal security properties to the hidden service. Whoever learns
the service ID can easily find out whether the service is currently active
and which introduction points it has. This applies to (former)
clients, (former) introduction points, and of course to the directory nodes.
All it takes is requesting the descriptor for the given service ID, which
anyone can do anonymously.
This proposal suggests two major changes to approach the described
performance and security problems:
The first change affects the storage location for hidden service descriptors.
Descriptors are distributed among a large subset of all onion routers instead
of three fixed directory nodes. Each storing node is responsible for a subset
of descriptors for a limited time only. It is not able to choose which
descriptors it stores at a certain time, because this is determined by its
onion ID which is hard to change frequently and in time (only routers which
are stable for a given time are accepted as storing nodes). In order to
resist single node failures and untrustworthy nodes, descriptors are
replicated among a certain number of storing nodes. A first replication
protocol makes sure that descriptors don't get lost when the node population
changes; therefore, a storing node periodically requests the descriptors from
its siblings. A second replication protocol distributes descriptors among
non-consecutive nodes of the ID ring to prevent a group of adversaries from
generating new onion keys until they have consecutive IDs to create a 'black
hole' in the ring and make random services unavailable. Connections to
storing nodes are established by extending existing circuits by one hop to
the storing node. This also ensures that contents are encrypted. The effect
of this first change is that the probability that a single node operator
learns about a certain hidden service is very small and that it is very hard
to track a service over time, even when it collaborates with other node
operators.
The second change concerns the content of hidden service descriptors.
Obviously, security problems cannot be solved only by decentralizing storage;
in fact, they could also get worse if done without caution. At first, a
descriptor ID needs to change periodically in order to be stored on changing
nodes over time. Next, the descriptor ID needs to be computable only for the
service's clients, but should be unpredictable for all other nodes. Further,
the storing node needs to be able to verify that the hidden service is the
true originator of the descriptor with the given ID even though it is not a
client. Finally, a storing node should learn as little information as
necessary by storing a descriptor, because it might not be as trustworthy as
a directory node; for example it does not need to know the list of
introduction points. Therefore, a second key is applied that is only known to
the hidden service provider and its clients and that is not included in the
descriptor. It is used to calculate descriptor IDs and to encrypt the
introduction points. This second key can either be given to all clients
together with the hidden service ID, or to a group or a single client as
an authentication token. In the future this second key could be the result of
some key agreement protocol between the hidden service and one or more
clients. A new text-based format is proposed for descriptors instead of an
extension of the existing binary format for reasons of future extensibility.
Design:
The proposed design is described by the required changes to the current
design. These requirements are grouped by content, rather than by affected
specification documents or code files, and numbered for reference below.
Hidden service clients, servers, and directories:
/1/ Create routing list
All participants can filter the consensus status document received from the
directory authorities to one routing list containing only those servers
that store and serve hidden service descriptors and that have been running
for at least 24 hours. A participant only trusts its own routing list and never
learns about routing information from other parties.
/2/ Determine responsible hidden service directory
All participants can determine the hidden service directory that is
responsible for storing and serving a given ID, as well as the hidden
service directories that replicate its content. Every hidden service
directory is responsible for the descriptor IDs in the interval from
its predecessor, exclusive, to its own ID, inclusive. Further, a hidden
service directory holds replicas for its n predecessors, where n denotes
the number of consecutive replicas. (requires /1/)
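The interval rule in /2/ can be sketched as a lookup on a sorted ring of
directory IDs. This is only an illustrative sketch, not Tor's implementation:
IDs are modeled as plain integers, and the function name
responsible_directories is hypothetical.

```python
from bisect import bisect_left

def responsible_directories(ring, descriptor_id, n):
    """Return the directory responsible for descriptor_id, followed by the
    directories that hold replicas of it.

    ring: IDs of all hidden service directories in the local routing list.
    A directory covers the interval (predecessor, own ID], so the
    responsible directory is the first one whose ID is >= descriptor_id,
    wrapping around at the end of the ID space. Because every directory
    replicates its n predecessors, the replicas sit on the n successors.
    """
    if not ring:
        return []
    ring = sorted(ring)
    start = bisect_left(ring, descriptor_id) % len(ring)
    count = min(n + 1, len(ring))  # never list a directory twice
    return [ring[(start + i) % len(ring)] for i in range(count)]
```

Since each participant filters the consensus itself (/1/), every node would
call this on its own routing list rather than on a list learned from others.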
[/3/ and /4/ were requirements to use BEGIN_DIR cells for directory
requests which have not been fulfilled in the course of the implementation
of this proposal, but elsewhere.]
Hidden service directory nodes:
/5/ Advertise hidden service directory functionality
Every onion router that has its directory port open can decide whether it
wants to store and serve hidden service descriptors by setting a new config
option "HidServDirectoryV2" 0|1 to 1. An onion router with this config
option being set includes the flag "hidden-service-dir" in its router
descriptors that it sends to directory authorities.
/6/ Accept v2 publish requests, parse and store v2 descriptors
Hidden service directory nodes accept publish requests for hidden service
descriptors and store them to their local memory. (It is not necessary to
make descriptors persistent, because after disconnecting, the onion router
would not be accepted as storing node anyway, because it has not been
running for at least 24 hours.) All requests and replies are formatted as
HTTP messages. Requests are directed to the router's directory port and are
contained within BEGIN_DIR cells. A hidden service directory node stores a
descriptor only when it thinks that it is responsible for storing that
descriptor based on its own routing table. Every hidden service directory
node is responsible for the descriptor IDs in the interval of its n-th
predecessor in the ID circle up to its own ID (n denotes the number of
consecutive replicas). (requires /1/)
/7/ Accept v2 fetch requests
Same as /6/, but with fetch requests for hidden service descriptors.
(requires /2/)
/8/ Replicate descriptors with neighbors
A hidden service directory node replicates descriptors from its two
predecessors by downloading them once an hour. Further, it checks its
routing table periodically for changes. Whenever it realizes that a
predecessor has left the network, it establishes a connection to the new
n-th predecessor and requests its stored descriptors in the interval of its
(n+1)-th predecessor and the requested n-th predecessor. Whenever it
realizes that a new onion router has joined with an ID higher than its
former n-th predecessor, it adds it to its predecessors and discards all
descriptors in the interval of its (n+1)-th and its n-th predecessor.
(requires /1/)
[Dec 02: This function has not been implemented, because arbitrary nodes
would have been able to download the entire set of v2 descriptors. An
authorized replication request would be necessary. For the moment, the
system runs without any directory-side replication. -KL]
Authoritative directory nodes:
/9/ Confirm a router's hidden service directory functionality
Directory nodes include a new flag "HSDir" for routers that decided to
provide storage for hidden service descriptors and that are running for at
least 24 hours. The last requirement prevents a node from frequently
changing its onion key to become responsible for an identifier it wants to
target.
Hidden service provider:
/10/ Configure v2 hidden service
Each hidden service provider that has set the config option
"PublishV2HidServDescriptors" 0|1 to 1 is configured to publish v2
descriptors and conform to the v2 connection establishment protocol. When
configuring a hidden service, a hidden service provider checks if it has
already created a random secret_cookie and a hostname2 file; if not, it
creates both of them. (requires /2/)
/11/ Establish introduction points with fresh key
If configured to publish only v2 descriptors and no v0/v1 descriptors any
more, a hidden service provider that is setting up the hidden service at
introduction points does not pass its own public key, but the public key
of a freshly generated key pair. It also includes these fresh public keys
in the hidden service descriptor together with the other introduction point
information. The reason is that the introduction point does not need to and
therefore should not know for which hidden service it works, so as to
prevent it from tracking the hidden service's activity. (If a hidden
service provider supports both, v0/v1 and v2 descriptors, v0/v1 clients
rely on the fact that all introduction points accept the same public key,
so that this new feature cannot be used.)
/12/ Encode v2 descriptors and send v2 publish requests
If configured to publish v2 descriptors, a hidden service provider
publishes a new descriptor whenever its content changes or a new
publication period starts for this descriptor. If the current publication
period would only last for less than 60 minutes (= 2 x 30 minutes to allow
the server to be 30 minutes behind and the client 30 minutes ahead), the
hidden service provider publishes both a current descriptor and one for
the next period. Publication is performed by sending the descriptor to all
hidden service directories that are responsible for keeping replicas for
the descriptor ID. This includes two non-consecutive replicas that are
stored at 3 consecutive nodes each. (requires /1/ and /2/)
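The 60-minute rule in /12/ can be sketched as follows. This is a minimal
sketch under the assumption that a publication period lasts 24 hours; the
constant and function names are hypothetical and not taken from the Tor source.

```python
PERIOD_SECONDS = 24 * 60 * 60   # assumed length of one publication period
OVERLAP_SECONDS = 2 * 30 * 60   # 30 min server lag + 30 min client lead

def periods_to_publish(seconds_into_period):
    """Return the period offsets that need a descriptor: [0] for the
    current period only, or [0, 1] if the current period will last for
    less than 60 more minutes."""
    remaining = PERIOD_SECONDS - seconds_into_period
    return [0, 1] if remaining < OVERLAP_SECONDS else [0]
```

Each returned offset would then be published once per replica to the
responsible hidden service directories.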
Hidden service client:
/13/ Send v2 fetch requests
A hidden service client that has set the config option
"FetchV2HidServDescriptors" 0|1 to 1 handles SOCKS requests for v2 onion
addresses by requesting a v2 descriptor from a randomly chosen hidden
service directory that is responsible for keeping a replica for the
descriptor ID. In total there are six replicas of which the first and the
last three are stored on consecutive nodes. The probability of picking one
of the three consecutive replicas is 1/6, 2/6, and 3/6 to incorporate the
fact that the availability will be the highest on the node with next higher
ID. A hidden service client relies on the hidden service provider to store
two sets of descriptors to compensate clock skew between service and
client. (requires /1/ and /2/)
/14/ Process v2 fetch reply and parse v2 descriptors
A hidden service client that has sent a request for a v2 descriptor can
parse it and store it to the local cache of rendezvous service descriptors.
/15/ Establish connection to v2 hidden service
A hidden service client can establish a connection to a hidden service
using a v2 descriptor. This includes using the secret cookie for decrypting
the introduction points contained in the descriptor. When contacting an
introduction point, the client does not use the public key of the hidden
service provider, but the freshly-generated public key that is included in
the hidden service descriptor. Whether or not a fresh key is used instead
of the key of the hidden service depends on the available protocol versions
that are included in the descriptor; by this, connection establishment is
to a certain extent decoupled from fetching the descriptor.
Hidden service descriptor:
(Requirements concerning the descriptor format are contained in /6/ and /7/.)
The new v2 hidden service descriptor format looks like this:
onion-address = h(public-key) + cookie
descriptor-id = h(h(public-key) + h(time-period + cookie + replica))
descriptor-content = {
descriptor-id,
version,
public-key,
h(time-period + cookie + replica),
timestamp,
protocol-versions,
{ introduction-points } encrypted with cookie
} signed with private-key
The "descriptor-id" needs to change periodically in order for the
descriptor to be stored on changing nodes over time. It may only be
computable by a hidden service provider and all of his clients to prevent
unauthorized nodes from tracking the service activity by periodically
checking whether there is a descriptor for this service. Finally, the
hidden service directory needs to be able to verify that the hidden service
provider is the true originator of the descriptor with the given ID.
Therefore, "descriptor-id" is derived from the "public-key" of the hidden
service provider, the current "time-period" which changes every 24 hours,
a secret "cookie" shared between hidden service provider and clients, and
a "replica" denoting the number of this non-consecutive replica. (The
"time-period" is constructed in a way that time periods do not change at
the same moment for all descriptors by deriving a value between 0:00 and
23:59 hours from h(public-key) and making the descriptors of this hidden
service provider expire at that time of the day.) The "descriptor-id" is
defined to be 160 bits long. [extending the "descriptor-id" length
suggested by LØ]
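The derivation of "descriptor-id" can be sketched as below. Several details
are assumptions made only for illustration and are not fixed by this text:
SHA-1 stands in for h (its 160-bit output matches the stated length), the
per-service offset is taken from the first byte of h(public-key), and the
time period and replica number are encoded as 4 bytes and 1 byte.

```python
import hashlib

SECONDS_PER_DAY = 86400

def h(data: bytes) -> bytes:
    # Assumption: SHA-1 stands in for h; the text only fixes 160 bits.
    return hashlib.sha1(data).digest()

def time_period(now: int, public_key: bytes) -> int:
    # Assumption: scale the first byte of h(public-key) to a time of day,
    # so that periods of different services roll over at different moments.
    offset = h(public_key)[0] * SECONDS_PER_DAY // 256
    return (now + offset) // SECONDS_PER_DAY

def descriptor_id(public_key: bytes, cookie: bytes, replica: int, now: int) -> bytes:
    # Assumption: 4-byte big-endian period and 1-byte replica encoding.
    period = time_period(now, public_key).to_bytes(4, "big")
    secret_part = h(period + cookie + bytes([replica]))
    return h(h(public_key) + secret_part)
```

A node that lacks the cookie cannot compute secret_part and therefore cannot
predict the ID for a future period or another replica.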
Only the hidden service provider and the clients are able to generate
future "descriptor-id"s. Hence, the "onion-address" is extended from the
hash value of "public-key" alone to that hash value followed by the secret
"cookie". The hash of the "public-key" is defined to be 80 bits long,
whereas the "cookie" is dimensioned to be 120 bits long. This makes a total
of 200 bits or 40 base32 chars, which is quite a lot to handle for a human,
but necessary to provide sufficient protection against an adversary
generating a key pair with the same "public-key" hash or guessing the
"cookie".
A hidden service directory can verify that a descriptor was created by the
hidden service provider by checking if the "descriptor-id" corresponds to
the "public-key" and if the signature can be verified with the
"public-key".
The "introduction-points" that are included in the descriptor are encrypted
using the same "cookie" that is shared between hidden service provider and
clients. [correction to use another key than h(time-period + cookie) as
encryption key for introduction points made by LØ]
A new text-based format is proposed for descriptors instead of an extension
of the existing binary format for reasons of future extensibility.
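The directory-side check that a "descriptor-id" corresponds to the
"public-key" can be sketched as follows. This is a sketch under the
assumption that SHA-1 stands in for h; the function name is hypothetical, and
the signature check mentioned alongside it is a separate step not shown here.

```python
import hashlib

def h(data: bytes) -> bytes:
    # Assumption: SHA-1 stands in for h (160-bit output).
    return hashlib.sha1(data).digest()

def id_matches_key(descriptor_id: bytes, public_key: bytes, secret_part: bytes) -> bool:
    """Check descriptor-id = h(h(public-key) + secret_part), where
    secret_part is the h(time-period + cookie + replica) field carried in
    the descriptor. The directory never sees the cookie itself."""
    return descriptor_id == h(h(public_key) + secret_part)
```

This is why the descriptor carries h(time-period + cookie + replica) as its
own field: the directory can recompute the ID without learning the cookie.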
Security implications:
The security implications of the proposed changes are grouped by the roles of
nodes that could perform attacks or on which attacks could be performed.
Attacks by authoritative directory nodes
Authoritative directory nodes are no longer the single places in the
network that know about a hidden service's activity and introduction
points. Thus, they cannot perform attacks using this information, e.g.
track a hidden service's activity or usage pattern or attack its
introduction points. Formerly, it would only require a single corrupted
authoritative directory operator to perform such an attack.
Attacks by hidden service directory nodes
A hidden service directory node could misuse a stored descriptor to track a
hidden service's activity and usage pattern by clients. Though there is no
countermeasure against this kind of attack, it is very expensive to track a
certain hidden service over time. An attacker would need to run a large
number of stable onion routers that work as hidden service directory nodes
to have a good probability to become responsible for its changing
descriptor IDs. For each period, the probability is:
1-(N-c choose r)/(N choose r) for N-c>=r and 1 otherwise,
with N as total number of hidden service directories, c as compromised
nodes, and r as number of replicas
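As a worked example, the tracking probability can be evaluated with a few
lines of Python. The function name p_tracking is hypothetical; the formula is
the one stated above, and c <= N is assumed.

```python
from math import comb

def p_tracking(N, c, r):
    """Probability that at least one of r replicas is stored on one of the
    c compromised directories out of N, for a single period."""
    if N - c < r:
        return 1.0
    return 1 - comb(N - c, r) / comb(N, r)
```

The value grows with both c and r, so more replicas improve availability
at the price of making tracking somewhat easier.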
The hidden service directory nodes could try to make a certain hidden
service unavailable to its clients. Therefore, they could discard all
stored descriptors for that hidden service and reply to clients that there
is no descriptor for the given ID or return an old or false descriptor
content. The client would detect a false descriptor, because it could not
contain a correct signature. But an old content or an empty reply could
confuse the client. Therefore, the countermeasure is to replicate
descriptors among a small number of hidden service directories, e.g. 5.
The probability of a group of collaborating nodes to make a hidden service
completely unavailable is in each period:
(c choose r)/(N choose r) for c>=r and N>=r, and 0 otherwise,
with N as total number of hidden service directories, c as compromised
nodes, and r as number of replicas
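The unavailability probability can be evaluated the same way. The function
name p_blackout is hypothetical; the formula is the one stated above, and
c <= N is assumed.

```python
from math import comb

def p_blackout(N, c, r):
    """Probability that all r replicas are stored on compromised
    directories, given c compromised directories out of N, per period."""
    if c < r or N < r:
        return 0.0
    return comb(c, r) / comb(N, r)
```

Unlike the tracking probability, this value shrinks quickly as r grows,
which is the argument for replicating descriptors at all.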
A hidden service directory could try to find out which introduction points
are working on behalf of a hidden service. In contrast to the previous
design, this is not possible anymore, because this information is encrypted
to the clients of a hidden service.
Attacks on hidden service directory nodes
An anonymous attacker could try to swamp a hidden service directory with
false descriptors for a given descriptor ID. This is prevented by requiring
that descriptors are signed.
Anonymous attackers could swamp a hidden service directory with correct
descriptors for non-existing hidden services. There is no countermeasure
against this attack. However, the creation of valid descriptors is more
expensive than verification and storage in local memory. This should make
this kind of attack unattractive.
Attacks by introduction points
Current or former introduction points could try to gain information on the
hidden service they serve. But due to the fresh key pair that is used by
the hidden service, this attack is not possible anymore.
Attacks by clients
Current or former clients could track a hidden service's activity, attack
its introduction points, or determine the responsible hidden service
directory nodes and attack them. There is nothing that could prevent them
from doing so, because honest clients need the full descriptor content to
establish a connection to the hidden service. At the moment, the only
countermeasure against dishonest clients is to change the secret cookie and
pass it only to the honest clients.
Compatibility:
The proposed design is meant to replace the current design for hidden service
descriptors and their storage in the long run.
There should be a first transition phase in which both, the current design
and the proposed design are served in parallel. Onion routers should start
serving as hidden service directories, and hidden service providers and
clients should make use of the new design if both sides support it. Hidden
service providers should be allowed to publish descriptors of the current
format in parallel, and authoritative directories should continue storing and
serving these descriptors.
After the first transition phase, hidden service providers should stop
publishing descriptors on authoritative directories, and hidden service
clients should not try to fetch descriptors from the authoritative
directories. However, the authoritative directories should continue serving
hidden service descriptors for a second transition phase. As of this point,
all v2 config options should be set to a default value of 1.
After the second transition phase, the authoritative directories should stop
serving hidden service descriptors.


@ -1,387 +0,0 @@
Filename: 115-two-hop-paths.txt
Title: Two Hop Paths
Version: $Revision$
Last-Modified: $Date$
Author: Mike Perry
Created:
Status: Dead
Supersedes: 112
Overview:
The idea is that users should be able to choose if they would like
to have either two or three hop paths through the tor network.
Let us be clear: the users who would choose this option should be
those that are concerned with IP obfuscation only: ie they would not be
targets of a resource-intensive multi-node attack. It is sometimes said
that these users should find some other network to use other than Tor.
This is a foolish suggestion: more users improve the security of everyone,
and the current small userbase size is a critical hindrance to
anonymity, as is discussed below and in [1].
This value should be modifiable from the controller, and should be
available from Vidalia.
Motivation:
The Tor network is slow and overloaded. Increasingly often I hear
stories about friends and friends of friends who are behind firewalls,
annoying censorware, or under surveillance that interferes with their
productivity and Internet usage, or chills their speech. These people
know about Tor, but they choose to put up with the censorship because
Tor is too slow to be usable for them. In fact, to download a fresh,
complete copy of levine-timing.pdf for the Theoretical Argument
section of this proposal over Tor took me 3 tries.
Furthermore, the biggest current problem with Tor's anonymity for
those who really need it is not someone attacking the network to
discover who they are. It's instead the extreme danger that so few
people use Tor because it's so slow, that those who do use it have
essentially no confusion set.
The recent case where the professor and the rogue Tor user were the
only Tor users on campus, and thus suspected in an incident involving
Tor and that University underscores this point: "That was why the police
had come to see me. They told me that only two people on our campus were
using Tor: me and someone they suspected of engaging in an online scam.
The detectives wanted to know whether the other user was a former
student of mine, and why I was using Tor"[1].
Not only does Tor provide no anonymity if you use it to be anonymous
but are obviously from a certain institution, location or circumstance,
it is also dangerous to use Tor for risk of being accused of having
something significant enough to hide to be willing to put up with
the horrible performance as opposed to using some weaker alternative.
There are many ways to improve the speed problem, and of course we
should and will implement as many as we can. Johannes's GSoC project
and my reputation system are longer term, higher-effort things that
will still provide benefit independent of this proposal.
However, reducing the path length to 2 for those who do not need the
extra anonymity 3 hops provide not only improves their Tor experience
but also reduces their load on the Tor network by 33%, and should
increase adoption of Tor by a good deal. That's not just Win-Win, it's
Win-Win-Win.
Who will enable this option?
This is the crux of the proposal. Admittedly, there is some anonymity
loss and some degree of decreased investment required on the part of
the adversary to attack 2 hop users versus 3 hop users, even if it is
minimal and limited mostly to up-front costs and false positives.
The key questions are:
1. Are these users in a class such that their risk is significantly
less than the amount of this anonymity loss?
2. Are these users able to identify themselves?
Many many users of Tor are not at risk for an adversary capturing c/n
nodes of the network just to see what they do. These users use Tor to
circumvent aggressive content filters, or simply to keep their IP out of
marketing and search engine databases. Most content filters have no
interest in running Tor nodes to catch violators, and marketers
certainly would never consider such a thing, both on a cost basis and a
legal one.
In a sense, this represents an alternate threat model against these
users who are not at risk for Tor's normal threat model.
It should be evident to these users that they fall into this class. All
that should be needed is a radio button
* "I use Tor for local content filter circumvention and/or IP obfuscation,
not anonymity. Speed is more important to me than high anonymity.
No one will make considerable efforts to determine my real IP."
* "I use Tor for anonymity and/or national-level, legally enforced
censorship. It is possible effort will be taken to identify
me, including but not limited to network surveillance. I need more
protection."
and then some explanation in the help for exactly what this means, and
the risks involved with eliminating the adversary's need for timing
attacks with respect to false positives. Ultimately, the decision is a
simple one that can be made without this information, however. The user
does not need Paul Syverson to instruct them on the deep magic of Onion
Routing to make this decision. They just need to know why they use Tor.
If they use it just to stay out of marketing databases and/or bypass a
local content filter, two hops is plenty. This is likely the vast
majority of Tor users, and many non-users we would like to bring on
board.
So, having established this class of users, let us now go on to
examine theoretical and practical risks we place them at, and determine
if these risks violate the users' needs, or introduce additional risk
to node operators who may be subject to requests from law enforcement
to track users who need 3 hops, but use 2 because they enjoy the
thrill of Russian roulette.
Theoretical Argument:
It has long been established that timing attacks against mix
and onion networks are extremely effective, and that regardless
of path length, if the adversary has compromised your first and
last hop of your path, you can assume they have compromised your
identity for that connection.
In fact, it was demonstrated that for all but the slowest, lossiest
networks, error rates for false positives and false negatives were
very near zero[2]. Only for constant streams of traffic over slow and
(more importantly) extremely lossy network links did the error rate
hit 20%. For loss rates typical to the Internet, even the error rate
for slow nodes with constant traffic streams was 13%.
When you take into account that most Tor streams are not constant,
but probably much more like their "HomeIP" dataset, which consists
mostly of web traffic that exists over finite intervals at specific
times, error rates drop to fractions of 1%, even for the "worst"
network nodes.
Therefore, the user has little benefit from the extra hop, assuming
the adversary does timing correlation on their nodes. Since timing
correlation is simply an implementation issue and is most likely
a single up-front cost (and one that is likely quite a bit cheaper
than the cost of the machines purchased to host the nodes to mount
an attack), the real protection is the low probability of getting
both the first and last hop of a client's stream.
Practical Issues:
Theoretical issues aside, there are several practical issues with the
implementation of Tor that need to be addressed to ensure that
identity information is not leaked by the implementation.
Exit policy issues:
If a client chooses an exit with a very restrictive exit policy
(such as an IP or IP range), the first hop then knows a good deal
about the destination. For this reason, clients should not select
exits that match their destination IP with anything other than "*".
Partitioning:
Partitioning attacks form another concern. Since Tor uses telescoping
to build circuits, it is possible to tell a user is constructing only
two hop paths at the entry node and on the local network. An external
adversary can potentially differentiate 2 and 3 hop users, and decide
that all IP addresses connecting to Tor and using 3 hops have something
to hide, and should be scrutinized more closely or outright apprehended.
One solution to this is to use the "leaky-circuit" method of attaching
streams: The user always creates 3-hop circuits, but if the option
is enabled, they always exit from their 2nd hop. The ideal solution
would be to create a RELAY_SHISHKABOB cell which contains onion
skins for every host along the path, but this requires protocol
changes at the nodes to support.
Guard nodes:
Since guard nodes can rotate due to client relocation, network
failure, node upgrades and other issues, if you amortize the risk a
mobile, dialup, or otherwise intermittently connected user is exposed to
over any reasonable duration of Tor usage (on the order of a year), it
is the same with or without guard nodes. Assuming an adversary has
c%/n% of network bandwidth, and guards rotate on average with period R,
statistically speaking, it's merely a question of if the user wishes
their risk to be concentrated with probability c/n over an expected
period of R*c, and probability 0 over an expected period of R*(n-c),
versus a continuous risk of (c/n)^2. So statistically speaking, guards
only create a time-tradeoff of risk over the long run for normal Tor
usage. Rotating guards do not reduce risk for normal client usage long
term.[3]
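One reading of this time-tradeoff argument can be checked with exact
fractions. This is a sketch of the arithmetic only, with hypothetical
function names; it assumes that entry and exit are chosen independently and
in proportion to the adversary's share c/n.

```python
from fractions import Fraction

def average_risk_with_guards(c, n):
    # With rotating guards the user is exposed with probability c/n during
    # a share c/n of the time (R*c out of R*n) and not exposed otherwise.
    # The rotation period R cancels out of the long-run average.
    exposed_share = Fraction(c, n)
    return Fraction(c, n) * exposed_share + 0 * (1 - exposed_share)

def risk_without_guards(c, n):
    # Without guards every circuit independently needs a bad entry
    # and a bad exit.
    return Fraction(c, n) ** 2
```

Both functions return the same value for any c and n, which is the sense in
which rotating guards only shift risk in time rather than reduce it.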
On the other hand, assuming a more stable method of guard selection
and preservation is devised, or a more stable client side network than
my own is typical (which rotates guards frequently due to network issues
and moving about), guard nodes provide a tradeoff in the form of c/n% of
the users being "sacrificial users" who are exposed to high risk O(c/n)
of identification, while the rest of the network is exposed to zero
risk.
The nature of Tor makes it likely an adversary will take a "shock and
awe" approach to suppressing Tor by rounding up a few users whose
browsing activity has been observed to be made into examples, in an
attempt to prove that Tor is not perfect.
Since this "shock and awe" attack can be applied with or without guard
nodes, stable guard nodes do offer a measure of accountability of sorts.
If a user was using a small set of guard nodes and knows them well, and
then is suddenly apprehended as a result of Tor usage, having a fixed
set of entry points to suspect is a lot better than suspecting the whole
network. Conversely, it can also give non-apprehended users comfort
that they are likely to remain safe indefinitely with their set of (now
presumably trusted) guards. This is probably the most beneficial
property of reliable guards: they deter the adversary from mounting
"shock and awe" attacks because the surviving users will not be
intimidated, but instead made more confident. Of course, guards need to
be made much more stable and users need to be encouraged to know their
guards for this property to really take effect.
This beneficial property of client vigilance also carries over to an
active adversary, except in this case instead of relying on the user
to remember their guard nodes and somehow communicate them after
apprehension, the code can alert them to the presence of an active
adversary before they are apprehended. But only if they use guard nodes.
So let's consider the active adversary: Two hop paths allow malicious
guards to get considerably more benefit from failing circuits if they do
not extend to their colluding peers for the exit hop. Since guards can
detect the number of hops in a path via either timing or by statistical
analysis of the exit policy of the 2nd hop, they can perform this attack
predominantly against 2 hop users.
This can be addressed by completely abandoning an entry guard after a
certain ratio of extend or general circuit failures with respect to
non-failed circuits. The proper value for this ratio can be determined
experimentally with TorFlow. There is the possibility that the local
network can abuse this feature to cause certain guards to be dropped,
but they can do that anyways with the current Tor by just making guards
they don't like unreachable. With this mechanism, Tor will complain
loudly if any guard failure rate exceeds the expected in any failure
case, local or remote.
Eliminating guards entirely would actually not address this issue due
to the time-tradeoff nature of risk. In fact, it would just make it
worse. Without guard nodes, it becomes much more difficult for clients
to become alerted to Tor entry points that are failing circuits to make
sure that they only devote bandwidth to carrying traffic for streams of
which they observe both ends. Yet the rogue entry points are still able to
significantly increase their success rates by failing circuits.
For this reason, guard nodes should remain enabled for 2 hop users,
at least until an IP-independent, undetectable guard scanner can
be created. TorFlow can scan for failing guards, but after a while,
its unique behavior gives away the fact that its IP is a scanner and
it can be given selective service.
Consideration of risks for node operators:
There is a serious risk for two hop users in the form of guard
profiling. If an adversary running an exit node notices that a
particular site is always visited from a fixed previous hop, it is
likely that this is a two hop user using a certain guard, which could be
monitored to determine their identity. Thus, for the protection of both
2 hop users and node operators, 2 hop users should limit their guard
duration to a sufficient number of days to verify reliability of a node,
but not much more. This duration can be determined experimentally by
TorFlow.
Considering a Tor client builds on average 144 circuits/day (10
minutes per circuit), if the adversary owns c/n% of exits on the
network, they can expect to see 144*c/n circuits from this user, or
about 14 minutes of usage per day per percentage of network penetration.
Since it will take several occurrences of user-linkable exit content
from the same predecessor hop for the adversary to have any confidence
this is a 2 hop user, it is very unlikely that any sort of demands made
upon the predecessor node would be guaranteed to be effective (ie it
actually was a guard), let alone be executed in time to apprehend the
user before they rotated guards.
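The exposure estimate above is simple arithmetic and can be reproduced as
follows. The names are hypothetical; the figures of 144 circuits per day and
10 minutes per circuit come from the text.

```python
CIRCUITS_PER_DAY = 144       # one new circuit every 10 minutes
MINUTES_PER_CIRCUIT = 10

def observed_per_day(exit_fraction):
    """Return (circuits, minutes) of one user's daily traffic that an
    adversary controlling exit_fraction of the exits expects to see."""
    circuits = CIRCUITS_PER_DAY * exit_fraction
    return circuits, circuits * MINUTES_PER_CIRCUIT
```

For an adversary holding 1% of the exits (exit_fraction=0.01) this gives
roughly 1.4 circuits, or about 14 minutes of usage, per day.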
The reverse risk also warrants consideration. If a malicious guard has
orders to surveil Mike Perry, it can determine Mike Perry is using two
hops by observing his tendency to choose a 2nd hop with a viable exit
policy. This can be done relatively quickly, unfortunately, and
indicates Mike Perry should spend some of his time building real 3 hop
circuits through the same guards, to require them to at least wait for
him to actually use Tor to determine his style of operation, rather than
collect this information from his passive building patterns.
However, to actively determine where Mike Perry is going, the guard
will need to require logging ahead of time at multiple exit nodes that
he may use over the course of the few days while he is at that guard,
and correlate the usage times of the exit node with Mike Perry's
activity at that guard for the few days he uses it. At this point, the
adversary is mounting a scale and method of attack (widespread logging,
timing attacks) that works pretty much just as effectively against 3
hops, so exit node operators are exposed to no more danger than
they otherwise normally are.
Why not fix Pathlen=2?:
The main reason I am not advocating that we always use 2 hops is that
in some situations, timing correlation evidence by itself may not be
considered as solid and convincing as an actual, uninterrupted, fully
traced path. Are these timing attacks as effective on a real network as
they are in simulation? Maybe the circuit multiplexing of Tor can serve
to frustrate them to a degree? Would an extralegal adversary or
authoritarian government even care? In the face of these situation
dependent unknowns, it should be up to the user to decide if this is
a concern for them or not.
It should probably also be noted that even a false positive
rate of 1% for a 200k concurrent-user network could mean that for a
given node, a given stream could be confused with something like 10
users, assuming ~200 nodes carry most of the traffic (ie 1000 users
each). Though of course to really know for sure, someone needs to do
an attack on a real network, unfortunately.
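The back-of-the-envelope estimate above can be written out as follows. The
function name is hypothetical, and the even split of users across the busy
nodes is the same simplifying assumption the text makes.

```python
def confusable_users(concurrent_users, busy_nodes, false_positive_rate):
    """Estimate how many users at one node a given stream could be
    confused with, assuming users spread evenly over the busy nodes."""
    users_per_node = concurrent_users / busy_nodes
    return users_per_node * false_positive_rate
```

With 200,000 concurrent users, about 200 busy nodes and a 1% false positive
rate, this comes to roughly 10 users per stream.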
Additionally, at some point cover traffic schemes may be implemented to
frustrate timing attacks on the first hop. It is possible some expert
users may do this ad-hoc already, and may wish to continue using 3 hops
for this reason.
Implementation:
new_route_len() can be modified directly with a check of the
Pathlen option. However, circuit construction logic should be
altered so that both 2 hop and 3 hop users build the same types of
circuits, and the option should ultimately govern circuit selection,
not construction. This improves coverage against guard nodes being
able to passively profile users who aren't even using Tor.
PathlenCoinWeight, anyone? :)
The exit policy hack is a bit more tricky. compare_addr_to_addr_policy
needs to return an alternate ADDR_POLICY_ACCEPTED_WILDCARD or
ADDR_POLICY_ACCEPTED_SPECIFIC return value for use in
circuit_is_acceptable.
The leaky exit is trickier still. handle_control_attachstream
does allow paths to exit at a given hop. Presumably something similar
can be done in connection_ap_handshake_process_socks, and elsewhere?
Circuit construction would also have to be performed such that the
2nd hop's exit policy is what is considered, not the 3rd's.
The entry_guard_t structure could have num_circ_failed and
num_circ_succeeded members such that if it exceeds F% circuit
extend failure rate to a second hop, it is removed from the entry list.
F should be sufficiently high to avoid churn from normal Tor circuit
failure as determined by TorFlow scans.
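[Illustrative sketch, not part of the original proposal: one way the
counters could drive guard expiry. GuardStats is a hypothetical
stand-in for the proposed entry_guard_t members, and min_attempts is
an extra assumption added here so that a guard is not dropped on the
basis of a handful of circuits. The real value of F would come from
TorFlow measurements.]

```python
from dataclasses import dataclass


@dataclass
class GuardStats:
    """Hypothetical stand-in for the counters proposed for entry_guard_t."""
    num_circ_failed: int = 0
    num_circ_succeeded: int = 0

    def record(self, succeeded: bool) -> None:
        # Count one circuit-extend attempt to a second hop.
        if succeeded:
            self.num_circ_succeeded += 1
        else:
            self.num_circ_failed += 1

    def failure_rate(self) -> float:
        total = self.num_circ_failed + self.num_circ_succeeded
        return self.num_circ_failed / total if total else 0.0


def should_expire(stats: GuardStats, max_failure_rate: float,
                  min_attempts: int = 20) -> bool:
    """Return True if the guard exceeds the failure threshold F.

    max_failure_rate is F expressed as a fraction (0.5 means 50%).
    min_attempts is an assumption of this sketch, not of the proposal.
    """
    attempts = stats.num_circ_failed + stats.num_circ_succeeded
    if attempts < min_attempts:
        return False
    return stats.failure_rate() > max_failure_rate
```

Keeping the threshold as a parameter matches the text's point that F
has to be tuned against the normal circuit failure rate.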
The Vidalia option should be presented as a radio button.
Migration:
Phase 1: Adjust exit policy checks if Pathlen is set, implement leaky
circuit ability, and 2-3 hop circuit selection logic governed by
Pathlen.
Phase 2: Experiment to determine the proper ratio of circuit
failures used to expire garbage or malicious guards via TorFlow
(pending Bug #440 backport+adoption).
Phase 3: Implement guard expiration code to kick off failure-prone
guards and warn the user. Cap 2 hop guard duration to a proper number
of days determined sufficient to establish guard reliability (to be
determined by TorFlow).
Phase 4: Make radiobutton in Vidalia, along with help entry
that explains in layman's terms the risks involved.
Phase 5: Allow user to specify path length by HTTP URL suffix.
[1] http://p2pnet.net/story/11279
[2] http://www.cs.umass.edu/~mwright/papers/levine-timing.pdf
[3] Proof available upon request ;)


@ -1,120 +0,0 @@
Filename: 116-two-hop-paths-from-guard.txt
Title: Two hop paths from entry guards
Version: $Revision$
Last-Modified: $Date$
Author: Michael Lieberman
Created: 26-Jun-2007
Status: Dead
This proposal is related to (but different from) Mike Perry's proposal 115
"Two Hop Paths."
Overview:
Volunteers who run entry guards should have the option of using only 2
additional tor nodes when constructing their own tor circuits.
While the option of two hop paths should perhaps be extended to every client
(as discussed in Mike Perry's thread), I believe the anonymity properties of
two hop paths are particularly well-suited to client computers that are also
serving as entry guards.
First I will describe the details of the strategy, as well as possible
avenues of attack. Then I will list advantages and disadvantages. Then, I
will discuss some possibly safer variations of the strategy, and finally
some implementation issues.
Details:
Suppose Alice is an entry guard, and wants to construct a two hop circuit.
Alice chooses a middle node at random (not using the entry guard strategy),
and gains anonymity by having her traffic look just like traffic from
someone else using her as an entry guard.
Can Alice's middle node figure out that she is the initiator of the traffic? I
can think of four possible approaches for distinguishing traffic from Alice
from traffic through Alice:
1) Notice that communication from Alice comes too fast: Experimentation is
needed to determine if traffic from Alice can be distinguished from traffic
from a computer with a decent link to Alice.
2) Monitor Alice's network traffic to discover the lack of incoming packets
at the appropriate times. If an adversary has this ability, then Alice
already has problems in the current system, because the adversary can run a
standard timing attack on Alice's traffic.
3) Notice that traffic from Alice is unique in some way such that if Alice
was just one of 3 entry guards for this traffic, then the traffic should be
coming from two other entry guards as well. An example of "unique traffic"
could be always sending 117 packets every 3 minutes to an exit node that
exits to port 4661. However, if such patterns existed with sufficient
precision, then it seems to me that Tor already has a problem. (This "unique
traffic" may not be a problem if clients often end up choosing a single
entry guard because their other two are down. Does anyone know if this is
the case?)
4) First, control the middle node *and* some other part of the traffic,
using standard attacks on a two hop circuit without entry nodes (my recent
paper on Browser-Based Attacks would work well for this
http://petworkshop.org/2007/papers/PET2007_preproc_Browser_based.pdf). With
control of the circuit, we can now cause "unique traffic" as in 3).
Alternatively, if we know something about Alice independently, and we can
see what websites are being visited, we might be able to guess that she is
the kind of person that would visit those websites.
Anonymity Advantages:
-Alice never has the problem of choosing a malicious entry guard. In some
sense, Alice acts as her own entry guard.
Anonymity Disadvantages:
-If Alice's traffic is identified as originating from herself (see above for
how hard that might be), then she has the anonymity of a 2 hop circuit
without entry guards.
Additional advantages:
-A discussion of the latency advantages of two hop circuits is going on in
Mike Perry's thread already.
-Also, we can advertise this change as "Run an entry guard and decrease your
own Tor latency." This incentive has the potential to add nodes to the
network, improving the network as a whole.
Safer variations:
To solve the "unique traffic" problem, Alice could use two hop paths only
1/3 of the time, and choose 2 other entry guards for the other 2/3 of the
time. All the advantages are now 1/3 as useful (possibly more, if the other
2 entry guards are not always up).
To solve the problem that Alice's responses are too fast, Alice could delay
her responses (ideally based on some real data of response time when Alice
is used as an entry guard). This loses most of the speed advantages of the two
hop path, but if Alice is a fast entry guard, it doesn't lose everything. It
also still has the (arguable) anonymity advantage that Alice doesn't have to
worry about having a malicious entry guard.
Implementation details:
For Alice to remain anonymous using this strategy, she has to actually be
acting as an entry guard for other nodes. This means the two hop option can
only be available to whatever high-performance threshold is currently set on
entry guards. Alice may need to somehow check her own current status as an
entry guard before choosing this two hop strategy.
Another thing to consider: suppose Alice is also an exit node. If the
fraction of exit nodes in existence is too small, she may rarely or never be
chosen as an entry guard. It would be sad if we offered an incentive to run
an entry guard that didn't extend to exit nodes. I suppose clients of Exit
nodes could pull the same trick, and bypass using Tor altogether (zero hop
paths), though that has additional issues.*
Mike Lieberman
MIT
*Why we shouldn't recommend Exit nodes pull the same trick:
1) Exit nodes would suffer heavily from the problem of "unique traffic"
mentioned above.
2) It would give governments an incentive to confiscate exit nodes to see if
they are pulling this trick.

View File

@ -1,412 +0,0 @@
Filename: 117-ipv6-exits.txt
Title: IPv6 exits
Version: $Revision$
Last-Modified: $Date$
Author: coderman
Created: 10-Jul-2007
Status: Accepted
Target: 0.2.1.x
Overview
Extend Tor for TCP exit via IPv6 transport and DNS resolution of IPv6
addresses. This proposal does not imply any IPv6 support for OR
traffic, only exit and name resolution.
Contents
0. Motivation
As the IPv4 address space becomes more scarce there is increasing
effort to provide Internet services via the IPv6 protocol. Many
hosts are available at IPv6 endpoints which are currently
inaccessible for Tor users.
Extending Tor to support IPv6 exit streams and IPv6 DNS name
resolution will allow users of the Tor network to access these hosts.
This capability would be present for those who do not currently have
IPv6 access, thus increasing the utility of Tor and furthering
adoption of IPv6.
1. Design
1.1. General design overview
There are three main components to this proposal. The first is a
method for routers to advertise their ability to exit IPv6 traffic.
The second is the manner in which routers resolve names to IPv6
addresses. Last but not least is the method in which clients
communicate with Tor to resolve and connect to IPv6 endpoints
anonymously.
1.2. Router IPv6 exit support
In order to specify exit policies and IPv6 capability new directives
in the Tor configuration will be needed. If a router advertises IPv6
exit policies in its descriptor this will signal the ability to
provide IPv6 exit. There are a number of additional default deny
rules associated with this new address space which are detailed in
the addendum.
When Tor is started on a host it should check for the presence of a
global unicast IPv6 address and if present include the default IPv6
exit policies and any user specified IPv6 exit policies.
If a user provides IPv6 exit policies but no global unicast IPv6
address is available Tor should generate a warning and not publish the
IPv6 policies in the router descriptor.
It should be noted that IPv4 mapped IPv6 addresses are not valid exit
destinations. This mechanism is mainly used to interoperate with
both IPv4 and IPv6 clients on the same socket. Any attempts to use
an IPv4 mapped IPv6 address, perhaps to circumvent exit policy for
IPv4, must be refused.
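[Illustrative sketch, not part of the original proposal: the mapped
address check expressed with Python's standard ipaddress module. The
function name is hypothetical, and the check covers only the rule in
this paragraph, not exit policy evaluation.]

```python
import ipaddress


def is_acceptable_exit_destination(text: str) -> bool:
    """Return False for IPv4-mapped IPv6 addresses such as ::ffff:192.0.2.1."""
    addr = ipaddress.ip_address(text)
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        # Refuse: this is an IPv4 destination dressed up as IPv6, which
        # could otherwise slip past the IPv4 exit policy.
        return False
    return True
```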
1.3. DNS name resolution of IPv6 addresses (AAAA records)
In addition to exit support for IPv6 TCP connections, a method to
resolve domain names to their respective IPv6 addresses is also
needed. This is accomplished in the existing DNS system via AAAA
records. Routers will perform both A and AAAA requests when
resolving a name so that the client can utilize an IPv6 endpoint when
available or preferred.
To avoid potential problems with caching DNS servers that behave
poorly all NXDOMAIN responses to AAAA requests should be ignored if a
successful response is received for an A request. This implies that
both AAAA and A requests will always be performed for each name
resolution.
For reverse lookups on IPv6 addresses, like that used for
RESOLVE_PTR, Tor will perform the necessary PTR requests via
IP6.ARPA.
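[Illustrative sketch, not part of the original proposal: how the PTR
query name is formed. An IPv6 address is expanded to 32 hex nibbles,
reversed, and suffixed with ip6.arpa. Python's ipaddress module
exposes this as reverse_pointer; the helper name is hypothetical.]

```python
import ipaddress


def ptr_query_name(text: str) -> str:
    """Return the reverse-lookup name for an IPv4 or IPv6 address."""
    return ipaddress.ip_address(text).reverse_pointer


# 2001:db8::1 is a documentation-only address used as sample input.
name = ptr_query_name("2001:db8::1")
print(name)
```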
All routers which perform DNS resolution on behalf of clients
(RELAY_RESOLVE) should perform and respond with both A and AAAA
resources.
[NOTE: In a future version, when we extend the behavior of RESOLVE to
encapsulate more of real DNS, it will make sense to allow more
flexibility here. -nickm]
1.4. Client interaction with IPv6 exit capability
1.4.1. Usability goals
There are a number of behaviors which Tor can provide when
interacting with clients that will improve the usability of IPv6 exit
capability. These behaviors are designed to make it simple for
clients to express a preference for IPv6 transport and utilize IPv6
host services.
1.4.2. SOCKSv5 IPv6 client behavior
The SOCKS version 5 protocol supports IPv6 connections. When using
SOCKSv5 with hostnames it is difficult to determine if a client
wishes to use an IPv4 or IPv6 address to connect to the desired host
if it resolves to both address types.
In order to make this more intuitive the SOCKSv5 protocol can be
supported on a local IPv6 endpoint, [::1] port 9050 for example.
When a client requests a connection to the desired host via an IPv6
SOCKS connection Tor will prefer IPv6 addresses when resolving the
host name and connecting to the host.
Likewise, RESOLVE and RESOLVE_PTR requests from an IPv6 SOCKS
connection will return IPv6 addresses when available, and fall back
to IPv4 addresses if not.
[NOTE: This means that SocksListenAddress and DNSListenAddress should
support IPv6 addresses. Perhaps there should also be a general option
to have listeners that default to 127.0.0.1 and 0.0.0.0 listen
additionally or instead on ::1 and :: -nickm]
1.4.3. MAPADDRESS behavior
The MAPADDRESS capability supports clients that may not be able to
use the SOCKSv4a or SOCKSv5 hostname support to resolve names via
Tor. This ability should be extended to IPv6 addresses in SOCKSv5 as
well.
When a client requests an address mapping from the wildcard IPv6
address, [::0], the server will respond with a unique local IPv6
address on success. It is important to note that there may be two
mappings for the same name if both an IPv4 and IPv6 address are
associated with the host. In this case a CONNECT to a mapped IPv6
address should prefer IPv6 for the connection to the host, if
available, while CONNECT to a mapped IPv4 address will prefer IPv4.
It should be noted that IPv6 does not provide the concept of a host
local subnet, like 127.0.0.0/8 in IPv4. For this reason integration
of Tor with IPv6 clients should consider a firewall or filter rule to
drop unique local addresses to or from the network when possible.
These packets should not be routed, however, keeping them off the
subnet entirely is worthwhile.
1.4.3.1. Generating unique local IPv6 addresses
The usual manner of generating a unique local IPv6 address is to
select a Global ID part randomly, along with a Subnet ID, and sharing
this prefix among the communicating parties who each have their own
distinct Interface ID. In this style a given Tor instance might
select a random Global and Subnet ID and provide MAPADDRESS
assignments with a random Interface ID as needed. This has the
potential to associate unique Global/Subnet identifiers with a given
Tor instance and may expose attacks against the anonymity of Tor
users.
To avoid this potential problem entirely, MAPADDRESS must always
generate the Global, Subnet, and Interface IDs randomly for each
request. It is also highly suggested that explicitly specifying an
IPv6 source address instead of the wildcard address not be supported
to ensure that a good random address is used.
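[Illustrative sketch, not part of the original proposal: drawing every
field of a unique local address at random for each request. It assumes
the locally assigned half of the RFC 4193 range (fd00::/8), so the 40
bit Global ID, 16 bit Subnet ID and 64 bit Interface ID are all filled
from one 120 bit random value. The function name is hypothetical.]

```python
import ipaddress
import secrets

# fc00::/7 with the "locally assigned" bit set, per RFC 4193.
ULA_PREFIX_BYTE = 0xFD


def random_unique_local_address() -> ipaddress.IPv6Address:
    """Return a fresh address with random Global, Subnet and Interface IDs."""
    random_bits = secrets.randbits(120)  # 40 + 16 + 64 bits
    return ipaddress.IPv6Address((ULA_PREFIX_BYTE << 120) | random_bits)
```

Because nothing is reused between calls, two mappings handed out by
the same Tor instance share no identifying prefix beyond fd00::/8.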
1.4.4. DNSProxy IPv6 client behavior
A new capability in recent Tor versions is the transparent DNS proxy.
This feature will need to return both A and AAAA resource records
when responding to client name resolution requests.
The transparent DNS proxy should also support reverse lookups for
IPv6 addresses. It is suggested that any such requests to the
deprecated IP6.INT domain should be translated to IP6.ARPA instead.
This translation is not likely to be used and is of low priority.
It would be nice to support DNS over IPv6 transport as well, however,
this is not likely to be used and is of low priority.
1.4.5. TransPort IPv6 client behavior
Tor also provides transparent TCP proxy support via the Trans*
directives in the configuration. The TransListenAddress directive
should accept an IPv6 address in addition to IPv4 so that IPv6 TCP
connections can be transparently proxied.
1.5. Additional changes
The RedirectExit option should be deprecated rather than extending
this feature to IPv6.
2. Spec changes
2.1. Tor specification
In '6.2. Opening streams and transferring data' the following should
be changed to indicate IPv6 exit capability:
"No version of Tor currently generates the IPv6 format."
In '6.4. Remote hostname lookup' the following should be updated to
reflect use of ip6.arpa in addition to in-addr.arpa.
"For a reverse lookup, the OP sends a RELAY_RESOLVE cell containing an
in-addr.arpa address."
In 'A.1. Differences between spec and implementation' the following
should be updated to indicate IPv6 exit capability:
"The current codebase has no IPv6 support at all."
[NOTE: the EXITPOLICY end-cell reason says that it can hold an ipv4 or an
ipv6 address, but doesn't say how. We may want a separate EXITPOLICY2
type that can hold an ipv6 address, since the way we encode ipv6
addresses elsewhere ("0.0.0.0 indicates that the next 16 bytes are ipv6")
is a bit dumb. -nickm]
[Actually, the length field lets us distinguish EXITPOLICY. -nickm]
2.2. Directory specification
In '2.1. Router descriptor format' a new set of directives is needed
for IPv6 exit policy. The existing accept/reject directives should
be clarified to indicate IPv4 or wildcard address relevance. The new
IPv6 directives will be in the form of:
"accept6" exitpattern NL
"reject6" exitpattern NL
The section describing accept6/reject6 should explain that the
presence of accept6 or reject6 exit policies in a router descriptor
signals the ability of that router to exit IPv6 traffic (according to
IPv6 exit policies).
The "[::]/0" notation is used to represent "all IPv6 addresses".
"[::0]/0" may also be used for this representation.
If a user specifies a 'reject6 [::]/0:*' policy in the Tor
configuration this will be interpreted as forcing no IPv6 exit
support and no accept6/reject6 policies will be included in the
published descriptor. This will prevent IPv6 exit if the router host
has a global unicast IPv6 address present.
It is important to note that a wildcard address in an accept or
reject policy applies to both IPv4 and IPv6 addresses.
2.3. Control specification
In '3.8. MAPADDRESS' the potential to have two addresses for a given
name should be explained. The method for generating unique local
addresses for IPv6 mappings needs explanation as described above.
When IPv6 addresses are used in this document they should include the
brackets for consistency. For example, the null IPv6 address should
be written as "[::0]" and not "::0". The control commands will
expect the same syntax as well.
In '3.9. GETINFO' the "address" command should return both public
IPv4 and IPv6 addresses if present. These addresses should be
separated via \r\n.
2.4. Tor SOCKS extensions
In '2. Name lookup' a description of IPv6 address resolution is
needed for SOCKSv5 as described above. IPv6 addresses should be
supported in both the RESOLVE and RESOLVE_PTR extensions.
A new section describing the ability to accept SOCKSv5 clients on a
local IPv6 address to indicate a preference for IPv6 transport as
described above is also needed. The behavior of Tor SOCKSv5 proxy
with an IPv6 preference should be explained, for example, preferring
IPv6 transport to a named host with both IPv4 and IPv6 addresses
available (A and AAAA records).
3. Questions and concerns
3.1. DNS A6 records
A6 is explicitly avoided in this document. There are potential
reasons for implementing this, however, the inherent complexity of
the protocol and resolvers make this unappealing. Is there a
compelling reason to consider A6 as part of IPv6 exit support?
[IMO not till anybody needs it. -nickm]
3.2. IPv4 and IPv6 preference
The design above tries to infer a preference for IPv4 or IPv6
transport based on client interactions with Tor. It might be useful
to provide more explicit control over this preference. For example,
an IPv4 SOCKSv5 client may want to use IPv6 transport to named hosts
in CONNECT requests while the current implementation would assume an
IPv4 preference. Should more explicit control be available, through
either configuration directives or control commands?
Many applications support an inet6-only or prefer-family type option
that provides the user manual control over address preference. This
could be provided as a Tor configuration option.
An explicit preference is still possible by resolving names and then
CONNECTing to an IPv4 or IPv6 address as desired, however, not all
client applications may have this option available.
3.3. Support for IPv6 only transparent proxy clients
It may be useful to support IPv6 only transparent proxy clients using
IPv4 mapped IPv6 like addresses. This would require transparent DNS
proxy using IPv6 transport and the ability to map A record responses
into IPv4 mapped IPv6 like addresses in the manner described in the
"NAT-PT" RFC for a traditional Basic-NAT-PT with DNS-ALG. The
transparent TCP proxy would thus need to detect these mapped addresses
and connect to the desired IPv4 host.
The IPv6 prefix used for this purpose must not be the actual IPv4
mapped IPv6 address prefix, though the manner in which IPv4 addresses
are embedded in IPv6 addresses would be the same.
The lack of any IPv6 only hosts which would use this transparent proxy
method makes this a lot of work for very little gain. Is there a
compelling reason to support this NAT-PT like capability?
3.4. IPv6 DNS and older Tor routers
It is expected that many routers will continue to run with older
versions of Tor when the IPv6 exit capability is released. Clients
who wish to use IPv6 will need to route RELAY_RESOLVE requests to the
newer routers which will respond with both A and AAAA resource
records when possible.
One way to do this is to route RELAY_RESOLVE requests to routers with
IPv6 exit policies published, however, this would not utilize current
routers that can resolve IPv6 addresses even if they can't exit such
traffic.
There was also concern expressed about the ability of existing clients
to cope with new RELAY_RESOLVE responses that contain IPv6 addresses.
If this breaks backward compatibility, a new request type may be
necessary, like RELAY_RESOLVE6, or some other mechanism of indicating
the ability to parse IPv6 responses when making the request.
3.5. IPv4 and IPv6 bindings in MAPADDRESS
It may be troublesome to try and support two distinct address mappings
for the same name in the existing MAPADDRESS implementation. If this
cannot be accommodated then the behavior should replace existing
mappings with the new address regardless of family. A warning when
this occurs would be useful to assist clients who encounter problems
when both an IPv4 and IPv6 application are using MAPADDRESS for the
same names concurrently, causing lost connections for one of them.
4. Addendum
4.1. Sample IPv6 default exit policy
reject 0.0.0.0/8
reject 169.254.0.0/16
reject 127.0.0.0/8
reject 192.168.0.0/16
reject 10.0.0.0/8
reject 172.16.0.0/12
reject6 [0000::]/8
reject6 [0100::]/8
reject6 [0200::]/7
reject6 [0400::]/6
reject6 [0800::]/5
reject6 [1000::]/4
reject6 [4000::]/3
reject6 [6000::]/3
reject6 [8000::]/3
reject6 [A000::]/3
reject6 [C000::]/3
reject6 [E000::]/4
reject6 [F000::]/5
reject6 [F800::]/6
reject6 [FC00::]/7
reject6 [FE00::]/9
reject6 [FE80::]/10
reject6 [FEC0::]/10
reject6 [FF00::]/8
reject *:25
reject *:119
reject *:135-139
reject *:445
reject *:1214
reject *:4661-4666
reject *:6346-6429
reject *:6699
reject *:6881-6999
accept *:*
# accept6 [2000::]/3:* is implied
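[Illustrative sketch, not part of the original proposal: a quick check
that the reject6 prefixes above, together with the implied 2000::/3,
cover the whole IPv6 address space exactly once. It uses Python's
standard ipaddress module; the helper name is hypothetical.]

```python
import ipaddress

# The reject6 prefixes from the sample policy, in the order listed.
REJECT6 = """
[0000::]/8 [0100::]/8 [0200::]/7 [0400::]/6 [0800::]/5 [1000::]/4
[4000::]/3 [6000::]/3 [8000::]/3 [A000::]/3 [C000::]/3 [E000::]/4
[F000::]/5 [F800::]/6 [FC00::]/7 [FE00::]/9 [FE80::]/10 [FEC0::]/10
[FF00::]/8
""".split()


def to_network(pattern: str) -> ipaddress.IPv6Network:
    """Convert the bracketed "[addr]/bits" notation to an IPv6Network."""
    addr, _, bits = pattern.partition("/")
    return ipaddress.IPv6Network(f"{addr.strip('[]')}/{bits}")


rejected = [to_network(p) for p in REJECT6]
implied_accept = ipaddress.IPv6Network("2000::/3")

# Merging the rejected prefixes with the implied accept range should
# leave a single network covering everything.
collapsed = list(ipaddress.collapse_addresses(rejected + [implied_accept]))

# No rejected prefix should overlap the global unicast range.
overlaps_accept = any(net.overlaps(implied_accept) for net in rejected)

# If the prefixes tile the space without overlap, the sizes add up.
total = sum(net.num_addresses for net in rejected) + implied_accept.num_addresses

print(collapsed, overlaps_accept, total == 2 ** 128)
```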
4.2. Additional resources
'DNS Extensions to Support IP Version 6'
http://www.ietf.org/rfc/rfc3596.txt
'DNS Extensions to Support IPv6 Address Aggregation and Renumbering'
http://www.ietf.org/rfc/rfc2874.txt
'SOCKS Protocol Version 5'
http://www.ietf.org/rfc/rfc1928.txt
'Unique Local IPv6 Unicast Addresses'
http://www.ietf.org/rfc/rfc4193.txt
'INTERNET PROTOCOL VERSION 6 ADDRESS SPACE'
http://www.iana.org/assignments/ipv6-address-space
'Network Address Translation - Protocol Translation (NAT-PT)'
http://www.ietf.org/rfc/rfc2766.txt

View File

@ -1,86 +0,0 @@
Filename: 118-multiple-orports.txt
Title: Advertising multiple ORPorts at once
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson
Created: 09-Jul-2007
Status: Accepted
Target: 0.2.1.x
Overview:
This document is a proposal for servers to advertise multiple
address/port combinations for their ORPort.
Motivation:
Sometimes servers want to support multiple ports for incoming
connections, either in order to support multiple address families, to
better use multiple interfaces, or to support a variety of
FascistFirewallPorts settings. This is easy to set up now, but
there's no way to advertise it to clients.
New descriptor syntax:
We add a new line in the router descriptor, "or-address". This line
can occur zero, one, or multiple times. Its format is:
or-address SP ADDRESS ":" PORTLIST NL
ADDRESS = IPV6ADDR / IPV4ADDR
IPV6ADDR = an ipv6 address, surrounded by square brackets.
IPV4ADDR = an ipv4 address, represented as a dotted quad.
PORTLIST = PORTSPEC | PORTSPEC "," PORTLIST
PORTSPEC = PORT | PORT "-" PORT
[This is the regular format for specifying sets of addresses and
ports in Tor.]
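[Illustrative sketch, not part of the original proposal: a small
parser for the "or-address" line as described by the grammar above.
The function name and return shape are hypothetical; ports come back
as inclusive (low, high) pairs.]

```python
import ipaddress
import re

# ADDRESS is a bracketed IPv6 address or a dotted-quad IPv4 address.
_OR_ADDRESS_RE = re.compile(
    r"or-address (\[[0-9A-Fa-f:.]+\]|[0-9.]+):([0-9,\-]+)"
)


def parse_or_address(line: str):
    """Parse one or-address line (without the trailing NL).

    Returns (address, [(low_port, high_port), ...]) and raises
    ValueError on malformed input.
    """
    match = _OR_ADDRESS_RE.fullmatch(line)
    if match is None:
        raise ValueError(f"malformed or-address line: {line!r}")

    raw_addr, portlist = match.groups()
    address = ipaddress.ip_address(raw_addr.strip("[]"))
    if raw_addr.startswith("[") and address.version != 6:
        raise ValueError("only IPv6 addresses may be bracketed")

    ranges = []
    for spec in portlist.split(","):
        low_text, dash, high_text = spec.partition("-")
        low = int(low_text)                      # ValueError if empty
        high = int(high_text) if dash else low   # single PORT or PORT-PORT
        if not 1 <= low <= high <= 65535:
            raise ValueError(f"bad port range: {spec!r}")
        ranges.append((low, high))
    return address, ranges
```

For example, "or-address [2001:db8::1]:9001,9030-9035" yields the
IPv6 address with the ranges (9001, 9001) and (9030, 9035).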
New OR behavior:
We add two more options to supplement ORListenAddress:
ORPublishedListenAddress, and ORPublishAddressSet. The former
listens on an address-port combination and publishes it in addition
to the regular address. The latter advertises a set of address-port
combinations, but does not listen on them. [To use this option, the
server operator should set up port forwarding to the regular ORPort,
as for example with firewall rules.]
Servers should extend their testing to include advertised addresses
and ports. No address or port should be advertised until it's been
tested. [This might get expensive in practice.]
New authority behavior:
Authorities should spot-test descriptors, and reject any where a
substantial part of the addresses can't be reached.
New client behavior:
When connecting to another server, clients SHOULD pick an
address-port combination at random as supported by their
reachableaddresses. If a client has a connection to a server at one
address, it SHOULD use that address for any simultaneous connections
to that server. Clients SHOULD use the canonical address for any
server when generating extend cells.
Not addressed here:
* There's no reason to listen on multiple dirports; current Tors
mostly don't connect directly to the dirport anyway.
* It could be advantageous to list something about extra addresses in
the network-status document. This would, however, eat space there.
More analysis is needed, particularly in light of proposal 141
("Download server descriptors on demand").
Dependencies:
Testing for canonical connections needs to be implemented before it's
safe to use this proposal.
Notes 3 July:
- Write up the simple version of this. No ranges needed yet. No
networkstatus changes yet.

View File

@ -1,142 +0,0 @@
Filename: 119-controlport-auth.txt
Title: New PROTOCOLINFO command for controllers
Version: $Revision$
Last-Modified: $Date$
Author: Roger Dingledine
Created: 14-Aug-2007
Status: Closed
Implemented-In: 0.2.0.x
Overview:
Here we describe how to help controllers locate the cookie
authentication file when authenticating to Tor, so we can a) require
authentication by default for Tor controllers and b) still keep
things usable. Also, we propose an extensible, general-purpose mechanism
for controllers to learn about a Tor instance's protocol and
authentication requirements before authenticating.
The Problem:
When we first added the controller protocol, we wanted to make it
easy for people to play with it, so by default we didn't require any
authentication from controller programs. We allowed requests only from
localhost as a stopgap measure for security.
Due to an increasing number of vulnerabilities based on this approach,
it's time to add authentication in default configurations.
We have a number of goals:
- We want the default Vidalia bundles to transparently work. That
means we don't want the users to have to type in or know a password.
- We want to allow multiple controller applications to connect to the
control port. So if Vidalia is launching Tor, it can't just keep the
secrets to itself.
Right now there are three authentication approaches supported
by the control protocol: NULL, CookieAuthentication, and
HashedControlPassword. See Sec 5.1 in control-spec.txt for details.
There are a couple of challenges here. The first is: if the controller
launches Tor, how should we teach Tor what authentication approach
it should require, and the secret that goes along with it? Next is:
how should this work when the controller attaches to an existing Tor,
rather than launching Tor itself?
Cookie authentication seems most amenable to letting multiple controller
applications interact with Tor. But that brings in yet another question:
how does the controller guess where to look for the cookie file,
without first knowing what DataDirectory Tor is using?
Design:
We should add a new controller command PROTOCOLINFO that can be sent
as a valid first command (the others being AUTHENTICATE and QUIT). If
PROTOCOLINFO is sent as the first command, the second command must be
either a successful AUTHENTICATE or a QUIT.
If the initial command sequence is not valid, Tor closes the connection.
Spec:
C: "PROTOCOLINFO" *(SP PIVERSION) CRLF
S: "250+PROTOCOLINFO" SP PIVERSION CRLF *InfoLine "250 OK" CRLF
InfoLine = AuthLine / VersionLine / OtherLine
AuthLine = "250-AUTH" SP "METHODS=" AuthMethod *("," AuthMethod)
*(SP "COOKIEFILE=" AuthCookieFile) CRLF
VersionLine = "250-VERSION" SP "Tor=" TorVersion [SP Arguments] CRLF
AuthMethod =
"NULL" / ; No authentication is required
"HASHEDPASSWORD" / ; A controller must supply the original password
"COOKIE" ; A controller must supply the contents of a cookie
AuthCookieFile = QuotedString
TorVersion = QuotedString
OtherLine = "250-" Keyword [SP Arguments] CRLF
For example:
C: PROTOCOLINFO CRLF
S: "250+PROTOCOLINFO 1" CRLF
S: "250-AUTH METHODS=HASHEDPASSWORD,COOKIE COOKIEFILE="/tor/cookie"" CRLF
S: "250-VERSION Tor="0.2.0.5-alpha"" CRLF
S: "250 OK" CRLF
Tor MAY give its InfoLines in any order; controllers MUST ignore InfoLines
with keywords they do not recognize. Controllers MUST ignore extraneous
data on any InfoLine.
PIVERSION is there in case we drastically change the syntax one day. For
now it should always be "1", for the controller protocol. Controllers MAY
provide a list of the protocol versions they support; Tor MAY select a
version that the controller does not support.
Right now only two "topics" (AUTH and VERSION) are included, but more
may be included in the future. Controllers must accept lines with
unexpected topics.
AuthCookieFile = QuotedString
AuthMethod is used to specify one or more control authentication
methods that Tor currently accepts.
AuthCookieFile specifies the absolute path and filename of the
authentication cookie that Tor is expecting and is provided iff
the METHODS field contains the method "COOKIE". Controllers MUST handle
escape sequences inside this string.
The VERSION line contains the Tor version.
[What else might we want to include that could be useful? -RD]
Compatibility:
Tor 0.1.2.16 and 0.2.0.4-alpha hang up after the first failed
command. Earlier Tors don't know about this command but don't hang
up. That means controllers will need a mechanism for distinguishing
whether they're talking to a Tor that speaks PROTOCOLINFO or not.
I suggest that the controllers attempt a PROTOCOLINFO. Then:
- If it works, great. Authenticate as required.
- If they get hung up on, reconnect and do a NULL AUTHENTICATE.
- If it's unrecognized but they're not hung up on, do a NULL
AUTHENTICATE.
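[Illustrative sketch, not part of the original proposal: the three
cases above written as a decision function. The enum and function
names are hypothetical, and the returned strings are descriptions
only, not protocol commands.]

```python
from enum import Enum


class ProbeResult(Enum):
    """Possible outcomes of sending PROTOCOLINFO as the first command."""
    OK = "ok"                      # Tor answered with a 250 reply
    HUNG_UP = "hung_up"            # Tor closed the connection
    UNRECOGNIZED = "unrecognized"  # error reply, connection still open


def next_step(result: ProbeResult) -> str:
    """Describe what the controller should do after the probe."""
    if result is ProbeResult.OK:
        return "authenticate as the PROTOCOLINFO reply requires"
    if result is ProbeResult.HUNG_UP:
        return "reconnect, then send a NULL AUTHENTICATE"
    return "send a NULL AUTHENTICATE on the same connection"
```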
Unsolved problems:
If Torbutton wants to be a Tor controller one day... talking TCP is
bad enough, but reading from the filesystem is even harder. Is there
a way to let simple programs work with the controller port without
needing all the auth infrastructure?
Once we put this approach in place, the next vulnerability we see will
involve an attacker somehow getting read access to the victim's files
--- and then we're back where we started. This means we still need
to think about how to demand password-based authentication without
bothering the user about it.

View File

@ -1,85 +0,0 @@
Filename: 120-shutdown-descriptors.txt
Title: Shutdown descriptors when Tor servers stop
Version: $Revision$
Last-Modified: $Date$
Author: Roger Dingledine
Created: 15-Aug-2007
Status: Dead
[Proposal dead as of 11 Jul 2008. The point of this proposal was to give
routers a good way to get out of the networkstatus early, but proposal
138 (already implemented) has achieved this.]
Overview:
Tor servers should publish a last descriptor whenever they shut down,
to let others know that they are no longer offering service.
The Problem:
The main reason for this is in reaction to Internet services that want
to treat connections from the Tor network differently. Right now,
if a user experiments with turning on the "relay" functionality, he
is punished by being locked out of some websites, some IRC networks,
etc --- and this lockout persists for several days even after he turns
the server off.
Design:
During the "slow shutdown" period if exiting, or shortly after the
user sets his ORPort back to 0 if not exiting, Tor should publish a
final descriptor with the following characteristics:
1) Exit policy is listed as "reject *:*"
2) It includes a new entry called "opt shutdown 1"
The first step is so current blacklists will no longer list this node
as exiting to whatever the service is.
The second step is so directory authorities can avoid wasting time
doing reachability testing. Authorities should automatically not list
as Running any router whose latest descriptor says it shut down.
[I originally had in mind a third step --- Advertised bandwidth capacity
is listed as "0" --- so current Tor clients will skip over this node
when building most circuits. But since clients won't fetch descriptors
from nodes not listed as Running, this step seems pointless. -RD]
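The authority-side rule could look roughly like the following Python sketch. It assumes the descriptor is available as text with one entry per line; the function names and the sample descriptor are hypothetical and heavily abbreviated.

```python
def says_shut_down(descriptor_text):
    """True if the descriptor carries the proposed "opt shutdown 1" entry."""
    return any(line.split() == ["opt", "shutdown", "1"]
               for line in descriptor_text.splitlines())


def may_list_as_running(latest_descriptor_text, reachable):
    """Decide whether an authority may list the router as Running."""
    # A shutdown descriptor overrides any reachability result, so the
    # authority does not need to test reachability at all in that case.
    if says_shut_down(latest_descriptor_text):
        return False
    return reachable


# Hypothetical, abbreviated final descriptor.
final_descriptor = "router example 192.0.2.1 9001 0 0\nreject *:*\nopt shutdown 1\n"
```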
Spec:
TBD but should be pretty straightforward.
Security issues:
Now external people can learn exactly when a node stopped offering
relay service. How bad is this? I can see a few minor attacks based
on this knowledge, but on the other hand as it is we don't really take
any steps to keep this information secret.
Overhead issues:
We are creating more descriptors that want to be remembered. However,
since the router won't be marked as Running, ordinary clients won't
fetch the shutdown descriptors. Caches will, though. I hope this is ok.
Implementation:
To make things easy, we should publish the shutdown descriptor only
on controlled shutdown (SIGINT as opposed to SIGTERM). That would
leave enough time for publishing that we probably wouldn't need any
extra synchronization code.
If that turns out to be too unintuitive for users, I could imagine doing
it on SIGTERMs too, and just delaying exit until we had successfully
published to at least one authority, at which point we'd hope that it
propagated from there.
Acknowledgements:
tup suggested this idea.
Comments:
2) Maybe add a rule "Don't do this for hibernation if we expect to wake
up before the next consensus is published"?
- NM 9 Oct 2007

View File

@ -1,778 +0,0 @@
Filename: 121-hidden-service-authentication.txt
Title: Hidden Service Authentication
Version: $Revision$
Last-Modified: $Date$
Author: Tobias Kamm, Thomas Lauterbach, Karsten Loesing, Ferdinand Rieger,
Christoph Weingarten
Created: 10-Sep-2007
Status: Finished
Implemented-In: 0.2.1.x
Change history:
26-Sep-2007 Initial proposal for or-dev
08-Dec-2007 Incorporated comments by Nick posted to or-dev on 10-Oct-2007
15-Dec-2007 Rewrote complete proposal for better readability, modified
authentication protocol, merged in personal notes
24-Dec-2007 Replaced misleading term "authentication" by "authorization"
and added some clarifications (comments by Sven Kaffille)
28-Apr-2008 Updated most parts of the concrete authorization protocol
04-Jul-2008 Add a simple algorithm to delay descriptor publication for
different clients of a hidden service
19-Jul-2008 Added INTRODUCE1V cell type (1.2), improved replay
protection for INTRODUCE2 cells (1.3), described limitations
for auth protocols (1.6), improved hidden service protocol
without client authorization (2.1), added second, more
scalable authorization protocol (2.2), rewrote existing
authorization protocol (2.3); changes based on discussion
with Nick
31-Jul-2008 Limit maximum descriptor size to 20 kilobytes to prevent
abuse.
01-Aug-2008 Use first part of Diffie-Hellman handshake for replay
protection instead of rendezvous cookie.
01-Aug-2008 Remove improved hidden service protocol without client
authorization (2.1). It might get implemented in proposal
142.
Overview:
This proposal deals with a general infrastructure for performing
authorization (not necessarily implying authentication) of requests to
hidden services at three points: (1) when downloading and decrypting
parts of the hidden service descriptor, (2) at the introduction point,
and (3) at Bob's Tor client before contacting the rendezvous point. A
service provider will be able to restrict access to his service at these
three points to authorized clients only. Further, the proposal contains
specific authorization protocols as instances that implement the
presented authorization infrastructure.
This proposal is based on v2 hidden service descriptors as described in
proposal 114 and introduced in version 0.2.0.10-alpha.
The proposal is structured as follows: The next section motivates the
integration of authorization mechanisms in the hidden service protocol.
Then we describe a general infrastructure for authorization in hidden
services, followed by specific authorization protocols for this
infrastructure. At the end we discuss a number of attacks and non-attacks
as well as compatibility issues.
Motivation:
Most hidden services do not require client authorization now and won't
do so in the future. On the contrary, many clients would not want to be
(pseudonymously) identifiable by the service (though this is unavoidable
to some extent), but would rather use the service anonymously. These
services are not addressed by this proposal.
However, there may be certain services which are intended to be accessed
by a limited set of clients only. A possible application might be a
wiki or forum that should only be accessible for a closed user group.
Another, less intuitive example might be a real-time communication
service, where someone provides a presence and messaging service only to
his buddies. Finally, a possible application would be a personal home
server that should be remotely accessed by its owner.
Performing authorization for a hidden service within the Tor network, as
proposed here, offers a range of advantages compared to allowing all
client connections in the first instance and deferring authorization to
the transported protocol:
(1) Reduced traffic: Unauthorized requests would be rejected as early as
possible, thereby reducing the overall traffic in the network generated
by establishing circuits and sending cells.
(2) Better protection of service location: Unauthorized clients could not
force Bob to create circuits to their rendezvous points, thus preventing
the attack described by Øverlier and Syverson in their paper "Locating
Hidden Servers" even without the need for guards.
(3) Hiding activity: Apart from performing the actual authorization, a
service provider could also hide the mere presence of his service from
unauthorized clients when not providing hidden service descriptors to
them, rejecting unauthorized requests already at the introduction
point (ideally without leaking presence information at any of these
points), or not answering unauthorized introduction requests.
(4) Better protection of introduction points: When providing hidden
service descriptors to authorized clients only and encrypting the
introduction points as described in proposal 114, the introduction points
would be unknown to unauthorized clients and thereby protected from DoS
attacks.
(5) Protocol independence: Authorization could be performed for all
transported protocols, regardless of their own capabilities to do so.
(6) Ease of administration: A service provider running multiple hidden
services would be able to configure access at a single place uniformly
instead of doing so for all services separately.
(7) Optional QoS support: Bob could adapt his node selection algorithm
for building the circuit to Alice's rendezvous point depending on a
previously guaranteed QoS level, thus providing better latency or
bandwidth for selected clients.
A disadvantage of performing authorization within the Tor network is
that a hidden service cannot make use of authorization data in
the transported protocol. Tor hidden services were designed to be
independent of the transported protocol. Therefore it's only possible to
either grant or deny access to the whole service, but not to specific
resources of the service.
Authorization often implies authentication, i.e. proving one's identity.
However, when performing authorization within the Tor network, untrusted
points should not gain any useful information about the identities of
communicating parties, neither server nor client. A crucial challenge is
to remain anonymous towards directory servers and introduction points.
However, trying to hide identity from the hidden service is a futile
task, because a client would never know if he is the only authorized
client and therefore perfectly identifiable. Therefore, hiding client
identity from the hidden service is not an aim of this proposal.
The current implementation of hidden services does not provide any kind
of authorization. The hidden service descriptor version 2, introduced by
proposal 114, was designed to use a descriptor cookie for downloading and
decrypting parts of the descriptor content, but this feature is not yet
in use. Further, most relevant cell formats specified in rend-spec
contain fields for authorization data, but those fields are neither
implemented nor do they suffice entirely.
Details:
1. General infrastructure for authorization to hidden services
We spotted three possible authorization points in the hidden service
protocol:
(1) when downloading and decrypting parts of the hidden service
descriptor,
(2) at the introduction point, and
(3) at Bob's Tor client before contacting the rendezvous point.
The general idea of this proposal is to allow service providers to
restrict access to some or all of these points to authorized clients
only.
1.1. Client authorization at directory
Since the implementation of proposal 114 it is possible to combine a
hidden service descriptor with a so-called descriptor cookie. If this is done,
the descriptor cookie becomes part of the descriptor ID, thus having an
effect on the storage location of the descriptor. Someone who has learned
about a service, but is not aware of the descriptor cookie, won't be able
to determine the descriptor ID and download the current hidden service
descriptor; he won't even know whether the service has uploaded a
descriptor recently. Descriptor IDs are calculated as follows (see
section 1.2 of rend-spec for the complete specification of v2 hidden
service descriptors):
descriptor-id =
H(service-id | H(time-period | descriptor-cookie | replica))
Currently, service-id is equivalent to permanent-id which is calculated
as in the following formula. But in principle it could be any public
key.
permanent-id = H(permanent-key)[:10]
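As a rough illustration of the two formulas above, the following Python sketch computes both identifiers. It assumes that H is SHA-1, that time-period is encoded as a 4-octet big-endian integer and replica as a single octet, and that the permanent key is passed in as already-encoded bytes; rend-spec is the authority on all of these encodings. The function names are made up for this example.

```python
import hashlib
import struct


def H(data):
    """Hash function H, assumed here to be SHA-1 (20-octet output)."""
    return hashlib.sha1(data).digest()


def permanent_id(permanent_key_bytes):
    """permanent-id = H(permanent-key)[:10]"""
    return H(permanent_key_bytes)[:10]


def descriptor_id(service_id, time_period, descriptor_cookie, replica):
    """descriptor-id = H(service-id | H(time-period | descriptor-cookie | replica))"""
    # Assumed encodings: 4-octet big-endian time period, 1-octet replica.
    inner = H(struct.pack(">I", time_period) + descriptor_cookie + bytes([replica]))
    return H(service_id + inner)
```

Without the descriptor cookie the inner hash cannot be computed, which is why someone who only knows the service ID cannot locate the descriptor.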
The second purpose of the descriptor cookie is to encrypt the list of
introduction points, including optional authorization data. Hence, the
hidden service directories won't learn any introduction information from
storing a hidden service descriptor. This feature is implemented but
unused at the moment. So this proposal will harness the advantages
of proposal 114.
The descriptor cookie can be used for authorization by keeping it secret
from everyone but authorized clients. A service could then decide whether
to publish hidden service descriptors using that descriptor cookie later
on. An authorized client being aware of the descriptor cookie would be
able to download and decrypt the hidden service descriptor.
The number of concurrently used descriptor cookies for one hidden service
is not restricted. A service could use a single descriptor cookie for all
users, a distinct cookie per user, or something in between, like one
cookie per group of users. It is up to the specific protocol and how it
is applied by a service provider.
Two or more hidden service descriptors for different groups or users
should not be uploaded at the same time. A directory node could conclude
easily that the descriptors were issued by the same hidden service, thus
being able to link the two groups or users. Therefore, descriptors for
different users or clients that ought to be stored on the same directory
are delayed, so that only one descriptor is uploaded to a directory at a
time. The remaining descriptors are uploaded with a delay of up to
30 seconds.
Further, descriptors for different groups or users that are to be stored
on different directories are delayed for a random time of up to 30
seconds to hide relations from colluding directories. Certainly, this
does not prevent linking entirely, but it makes it somewhat harder.
There is a conflict between hiding links between clients and making a
service available in a timely manner.
Although this part of the proposal is meant to describe a general
infrastructure for authorization, changing the way of using the
descriptor cookie to look up hidden service descriptors, e.g. applying
some sort of asymmetric crypto system, would require in-depth changes
that would be incompatible with v2 hidden service descriptors. In
contrast, using another key for en-/decrypting the introduction point
part of a hidden service descriptor, e.g. a different symmetric key or
asymmetric encryption, would be easy to implement and compatible with v2
hidden service descriptors as understood by hidden service directories
(clients and services would have to be upgraded anyway for using the new
features).
An adversary could try to abuse the fact that introduction points can be
encrypted by storing arbitrary, unrelated data in the hidden service
directory. This abuse can be limited by setting a hard descriptor size
limit, forcing the adversary to split data into multiple chunks. There
are some limitations that make splitting data across multiple descriptors
unattractive: 1) The adversary would not be able to choose descriptor IDs
freely and would therefore have to implement his own indexing
structure. 2) Validity of descriptors is limited to at most 24 hours
after which descriptors need to be republished.
The regular descriptor size in bytes is 745 + num_ipos * 837 + auth_data.
A large descriptor with 7 introduction points and 5 kilobytes of
authorization data would be 11724 bytes in size. The upper size limit of
descriptors should be set to 20 kilobytes, which limits the effect of
abuse while retaining enough flexibility in designing authorization
protocols.
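The size estimate can be checked with a few lines of Python. The sketch assumes that a kilobyte means 1024 bytes, which is the reading that reproduces the 11724-byte figure; the constant and function names are chosen for this example only.

```python
BASE_BYTES = 745               # fixed part of a v2 descriptor
BYTES_PER_INTRO_POINT = 837    # cost of each introduction point
MAX_DESCRIPTOR_BYTES = 20 * 1024   # proposed upper limit (assuming 1 kB = 1024 bytes)


def descriptor_size(num_intro_points, auth_data_bytes):
    """Regular descriptor size: 745 + num_ipos * 837 + auth_data."""
    return BASE_BYTES + num_intro_points * BYTES_PER_INTRO_POINT + auth_data_bytes


def within_limit(num_intro_points, auth_data_bytes):
    """True if the descriptor stays at or below the proposed size limit."""
    return descriptor_size(num_intro_points, auth_data_bytes) <= MAX_DESCRIPTOR_BYTES
```

With 7 introduction points and 5 * 1024 bytes of authorization data, descriptor_size gives 745 + 5859 + 5120 = 11724 bytes, well below the 20480-byte limit.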
1.2. Client authorization at introduction point
The next possible authorization point after downloading and decrypting
a hidden service descriptor is the introduction point. It may be important
for authorization, because it offers the last chance to hide the presence
of a hidden service from unauthorized clients. Further, performing
authorization at the introduction point might reduce traffic in the
network, because unauthorized requests would not be passed to the
hidden service. This applies to those clients who are aware of a
descriptor cookie and thereby of the hidden service descriptor, but do
not have authorization data to pass the introduction point or access the
service (such a situation might occur when authorization data for
authorization at the directory is not issued on a per-user basis, but
authorization data for authorization at the introduction point is).
It is important to note that the introduction point must be considered
untrustworthy, and therefore cannot replace authorization at the hidden
service itself. Nor should the introduction point learn any sensitive
identifiable information from either the service or the client.
In order to perform authorization at the introduction point, three
message formats need to be modified: (1) v2 hidden service descriptors,
(2) ESTABLISH_INTRO cells, and (3) INTRODUCE1 cells.
A v2 hidden service descriptor needs to contain authorization data that
is introduction-point-specific and sometimes also authorization data
that is introduction-point-independent. Therefore, v2 hidden service
descriptors as specified in section 1.2 of rend-spec already contain two
reserved fields "intro-authorization" and "service-authorization"
(originally, the names of these fields were "...-authentication")
containing an authorization type number and arbitrary authorization
data. We propose that authorization data consists of base64 encoded
objects of arbitrary length, surrounded by "-----BEGIN MESSAGE-----" and
"-----END MESSAGE-----". This will increase the size of hidden service
descriptors, which is acceptable as long as they stay below the size
limit proposed in Section 1.1.
The current ESTABLISH_INTRO cells as described in section 1.3 of
rend-spec contain neither authorization data nor version
information. Therefore, we propose a new version 1 of the ESTABLISH_INTRO
cell that adds both, as follows:
V Format byte: set to 255 [1 octet]
V Version byte: set to 1 [1 octet]
KL Key length [2 octets]
PK Bob's public key [KL octets]
HS Hash of session info [20 octets]
AUTHT The auth type that is supported [1 octet]
AUTHL Length of auth data [2 octets]
AUTHD Auth data [variable]
SIG Signature of above information [variable]
From the format it is possible to determine the maximum allowed size for
authorization data: given the fact that cells are 512 octets long, of
which 498 octets are usable (see section 6.1 of tor-spec), and assuming
1024 bit = 128 octet long keys, there are 215 octets left for
authorization data. Hence, authorization protocols are bound to use no
more than these 215 octets, regardless of the number of clients that
shall be authorized at the introduction point. Otherwise, one would
need to send multiple ESTABLISH_INTRO cells or split them up, which we do
not specify here.
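The cell layout and the 215-octet budget can be reproduced with the Python sketch below. It assumes network (big-endian) byte order for the length fields and a signature as long as the 128-octet key; the signature itself is passed in precomputed, and all function names are hypothetical.

```python
import struct

USABLE_CELL_OCTETS = 498


def pack_establish_intro_v1(public_key, session_hash, auth_type, auth_data, signature):
    """Serialize a v1 ESTABLISH_INTRO payload in the field order given above."""
    if len(session_hash) != 20:
        raise ValueError("HS must be 20 octets")
    # Format byte 255, version byte 1, then the 2-octet key length.
    body = struct.pack(">BBH", 255, 1, len(public_key)) + public_key + session_hash
    body += struct.pack(">BH", auth_type, len(auth_data)) + auth_data
    cell = body + signature
    if len(cell) > USABLE_CELL_OCTETS:
        raise ValueError("payload exceeds the usable cell size")
    return cell


def max_auth_data(key_octets=128, signature_octets=128):
    """Octets left for AUTHD once all fixed-size fields are accounted for."""
    fixed = 1 + 1 + 2 + key_octets + 20 + 1 + 2 + signature_octets
    return USABLE_CELL_OCTETS - fixed
```

With the defaults the fixed fields take 283 octets, so max_auth_data() returns 498 - 283 = 215.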
A relay must run a sufficiently recent Tor version to understand a v1
ESTABLISH_INTRO cell. Hidden services need to be able to identify relays
that understand the new v1 cell formats and can perform authorization.
We propose to use the version number
that is contained in networkstatus documents to find capable
introduction points.
The current INTRODUCE1 cell as described in section 1.8 of rend-spec is
not designed to carry authorization data and has no version number either.
Unfortunately, unversioned INTRODUCE1 cells consist only of a fixed-size,
seemingly random PK_ID, followed by the encrypted INTRODUCE2 cell. This
makes it impossible to distinguish unversioned INTRODUCE1 cells from any
later format. In particular, it is not possible to introduce some kind of
format and version byte for newer versions of this cell. That's probably
where the comment "[XXX011 want to put intro-level auth info here, but no
version. crap. -RD]" that was part of rend-spec some time ago comes from.
We propose that new versioned INTRODUCE1 cells use the new cell type 41
RELAY_INTRODUCE1V (where V stands for versioned):
Cleartext
V Version byte: set to 1 [1 octet]
PK_ID Identifier for Bob's PK [20 octets]
AUTHT The auth type that is included [1 octet]
AUTHL Length of auth data [2 octets]
AUTHD Auth data [variable]
Encrypted to Bob's PK:
(RELAY_INTRODUCE2 cell)
The maximum length of contained authorization data depends on the length
of the contained INTRODUCE2 cell. A calculation follows below when
describing the INTRODUCE2 cell format we propose to use.
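A minimal Python sketch of the cleartext part of the proposed cell follows, assuming big-endian length fields. The encrypted INTRODUCE2 part is treated as an opaque byte string, and the function names are invented for this example.

```python
import struct


def pack_introduce1v(pk_id, auth_type, auth_data, encrypted_introduce2):
    """Serialize a RELAY_INTRODUCE1V payload: cleartext header plus opaque rest."""
    if len(pk_id) != 20:
        raise ValueError("PK_ID must be 20 octets")
    header = struct.pack(">B", 1) + pk_id
    header += struct.pack(">BH", auth_type, len(auth_data))
    return header + auth_data + encrypted_introduce2


def parse_introduce1v(payload):
    """Split a payload into (version, pk_id, auth_type, auth_data, encrypted)."""
    version = payload[0]
    pk_id = payload[1:21]
    auth_type, auth_len = struct.unpack(">BH", payload[21:24])
    auth_data = payload[24:24 + auth_len]
    encrypted = payload[24 + auth_len:]
    return version, pk_id, auth_type, auth_data, encrypted
```

With no authorization data the cleartext header is 1 + 20 + 1 + 2 = 24 octets, which matches the overhead this proposal attributes to the INTRODUCE1V cell.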
1.3. Client authorization at hidden service
The time when a hidden service receives an INTRODUCE2 cell constitutes
the last possible authorization point during the hidden service
protocol. Performing authorization here is easier than at the other two
authorization points, because there are no possibly untrusted entities
involved.
In general, a client that is successfully authorized at the introduction
point should be granted access at the hidden service, too. Otherwise, the
client would receive a positive INTRODUCE_ACK cell from the introduction
point and conclude that it may connect to the service, but the request
will be dropped without notice. This would appear as a failure to
clients. Therefore, the number of cases in which a client successfully
passes the introduction point but fails at the hidden service should be
zero. However, this does not lead to the conclusion that the
authorization data used at the introduction point and the hidden service
must be the same, but only that both authorization data should lead to
the same authorization result.
Authorization data is transmitted from client to server via an
INTRODUCE2 cell that is forwarded by the introduction point. There are
versions 0 to 2 specified in section 1.8 of rend-spec, but none of these
contain fields for carrying authorization data. We propose a slightly
modified version of v3 INTRODUCE2 cells that is specified in section
1.8.1 and which is not implemented as of December 2007. In contrast to
the specified v3 we avoid specifying (and implementing) IPv6 capabilities,
because Tor relays will be required to support IPv4 addresses for a long
time in the future, so that this seems unnecessary at the moment. The
proposed format of v3 INTRODUCE2 cells is as follows:
VER Version byte: set to 3. [1 octet]
AUTHT The auth type that is used [1 octet]
AUTHL Length of auth data [2 octets]
AUTHD Auth data [variable]
TS Timestamp (seconds since 1-1-1970) [4 octets]
IP Rendezvous point's address [4 octets]
PORT Rendezvous point's OR port [2 octets]
ID Rendezvous point identity ID [20 octets]
KLEN Length of onion key [2 octets]
KEY Rendezvous point onion key [KLEN octets]
RC Rendezvous cookie [20 octets]
g^x Diffie-Hellman data, part 1 [128 octets]
The maximum possible length of authorization data is related to the
enclosing INTRODUCE1V cell. A v3 INTRODUCE2 cell with
1024 bit = 128 octets long public key without any authorization data
occupies 306 octets (AUTHL is only used when AUTHT has a value != 0),
plus 58 octets for hybrid public key encryption (see
section 5.1 of tor-spec on hybrid encryption of CREATE cells). The
surrounding INTRODUCE1V cell requires 24 octets. This leaves only 110
of the 498 available octets free, which must be shared between
authorization data to the introduction point _and_ to the hidden
service.
When receiving a v3 INTRODUCE2 cell, Bob checks whether a client has
provided valid authorization data to him. He also requires that the
timestamp is no more than 30 minutes in the past or future and that the
first part of the Diffie-Hellman handshake has not been used in the past
60 minutes to prevent replay attacks by rogue introduction points. (The
reason for not using the rendezvous cookie to detect replays---even
though it is only sent once in the current design---is that it might be
desirable to re-use rendezvous cookies for multiple introduction requests
in the future.) If all checks pass, Bob builds a circuit to the provided
rendezvous point. Otherwise he drops the cell.
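The timestamp and replay checks can be sketched as follows in Python. Validation of the authorization data is left out, times are plain seconds since the epoch, and the class is a hypothetical in-memory cache rather than anything taken from the Tor source.

```python
TIMESTAMP_SKEW = 30 * 60   # accepted clock difference in seconds
REPLAY_WINDOW = 60 * 60    # how long a seen g^x value is remembered


class ReplayCache:
    """Remembers the first DH handshake parts seen within the replay window."""

    def __init__(self):
        self._seen = {}  # g^x bytes -> time it was first accepted

    def accept(self, dh_part1, timestamp, now):
        """Return True if the cell passes the timestamp and replay checks."""
        # Forget handshake values that have aged out of the window.
        self._seen = {gx: t for gx, t in self._seen.items()
                      if now - t < REPLAY_WINDOW}
        # Timestamp must be within 30 minutes, in the past or the future.
        if abs(now - timestamp) > TIMESTAMP_SKEW:
            return False
        # Same g^x within the window means a replayed introduction.
        if dh_part1 in self._seen:
            return False
        self._seen[dh_part1] = now
        return True
```

Keying the cache on g^x rather than on the rendezvous cookie follows the reasoning given above: cookies may be re-used across introduction requests in the future, while a fresh handshake value is expected for each request.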
1.4. Summary of authorization data fields
In summary, the proposed descriptor format and cell formats provide the
following fields for carrying authorization data:
(1) The v2 hidden service descriptor contains:
- a descriptor cookie that is used for the lookup process, and
- an arbitrary encryption schema to ensure authorization to access
introduction information (currently symmetric encryption with the
descriptor cookie).
(2) For performing authorization at the introduction point we can use:
- the fields intro-authorization and service-authorization in
hidden service descriptors,
- a maximum of 215 octets in the ESTABLISH_INTRO cell, and
- one part of 110 octets in the INTRODUCE1V cell.
(3) For performing authorization at the hidden service we can use:
- the fields intro-authorization and service-authorization in
hidden service descriptors,
- the other part of 110 octets in the INTRODUCE2 cell.
It will also still be possible to access a hidden service without any
authorization or only use a part of the authorization infrastructure.
However, this requires considering all parts of the infrastructure. For
example, authorization at the introduction point relying on confidential
intro-authorization data transported in the hidden service descriptor
cannot be performed without using an encryption schema for introduction
information.
1.5. Managing authorization data at servers and clients
In order to provide authorization data at the hidden service and the
authorized clients, we propose to use files---either the Tor
configuration file or separate files. The exact format of these special
files depends on the authorization protocol used.
Currently, rend-spec contains the proposition to encode client-side
authorization data in the URL, like in x.y.z.onion. This was never used
and is also a bad idea, because in case of HTTP the requested URL may be
contained in the Host and Referer fields.
1.6. Limitations for authorization protocols
There are two limitations of the current hidden service protocol for
authorization protocols that shall be identified here.
1. The three cell types ESTABLISH_INTRO, INTRODUCE1V, and INTRODUCE2
restrict the amount of data that can be used for authorization.
This forces authorization protocols that require per-user
authorization data at the introduction point to restrict the number
of authorized clients artificially. A possible solution could be to
split contents among multiple cells and reassemble them at the
introduction points.
2. The current hidden service protocol does not specify cell types to
perform interactive authorization between client and introduction
point or hidden service. If there should be an authorization
protocol that requires interaction, new cell types would have to be
defined and integrated into the hidden service protocol.
2. Specific authorization protocol instances
In the following we present two specific authorization protocols that
make use of (parts of) the new authorization infrastructure:
1. The first protocol allows a service provider to restrict access
to clients with a previously received secret key only, but does not
attempt to hide service activity from others.
2. The second protocol, albeit being feasible for a limited set of about
16 clients, performs client authorization and hides service activity
from everyone but the authorized clients.
These two protocol instances extend the existing hidden service protocol
version 2. Hidden services that perform client authorization may run in
parallel to other services running versions 0, 2, or both.
2.1. Service with large-scale client authorization
The first client authorization protocol aims at performing access control
while consuming as few additional resources as possible. A service
provider should be able to permit access to a large number of clients
while denying access for everyone else. However, the price for
scalability is that the service won't be able to hide its activity from
unauthorized or formerly authorized clients.
The main idea of this protocol is to encrypt the introduction-point part
in hidden service descriptors to authorized clients using symmetric keys.
This ensures that nobody else but authorized clients can learn which
introduction points a service currently uses, nor can someone send a
valid INTRODUCE1 message without knowing the introduction key. Therefore,
a subsequent authorization at the introduction point is not required.
A service provider generates symmetric "descriptor cookies" for his
clients and distributes them outside of Tor. The suggested key size is
128 bits, so that descriptor cookies can be encoded in 22 base64 chars
(which can hold up to 22 * 6 = 132 bits, leaving 4 bits to encode the
authorization type (here: "0") and allow a client to distinguish this
authorization protocol from others like the one proposed below).
Typically, the contact information for a hidden service using this
authorization protocol looks like this:
v2cbb2l4lsnpio4q.onion Ll3X7Xgz9eHGKCCnlFH0uz
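One way to fit a 128-bit cookie and a 4-bit type value into 22 base64 characters is sketched below in Python. The exact bit placement is an assumption made for this example (the type value goes into the upper four bits of an extra 17th octet); the proposal only states that the four spare bits carry the authorization type. The function names are hypothetical.

```python
import base64


def encode_cookie(cookie, type_bits):
    """Encode a 16-octet cookie plus a 4-bit type value as 22 base64 chars."""
    if len(cookie) != 16 or not 0 <= type_bits < 16:
        raise ValueError("need a 16-octet cookie and a 4-bit type value")
    # 17 octets encode to 23 characters plus "="; the 23rd character only
    # carries the unused low bits, so the first 22 characters are enough.
    extended = cookie + bytes([type_bits << 4])
    return base64.b64encode(extended).decode("ascii")[:22]


def decode_cookie(text):
    """Inverse of encode_cookie: return (cookie, type_bits)."""
    if len(text) != 22:
        raise ValueError("expected 22 base64 characters")
    # Re-append the dropped all-zero character and the padding.
    raw = base64.b64decode(text + "A=")
    return raw[:16], raw[16] >> 4
```

The 22 characters carry 22 * 6 = 132 bits: 128 for the cookie and 4 for the type value.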
When generating a hidden service descriptor, the service encrypts the
introduction-point part with a single randomly generated symmetric
128-bit session key using AES-CTR as described for v2 hidden service
descriptors in rend-spec. Afterwards, the service encrypts the session
key to all descriptor cookies using AES. An authorized client should be
able to efficiently find the session key that is encrypted for him/her, so
4-octet client IDs are derived from the descriptor cookie and
initialization vector. Descriptors always contain a number of
encrypted session keys that is a multiple of 16 by adding fake entries.
Encrypted session keys are ordered by client IDs in order to conceal
addition or removal of authorized clients by the service provider.
ATYPE Authorization type: set to 1. [1 octet]
ALEN Number of clients := 1 + ((clients - 1) div 16) [1 octet]
for each symmetric descriptor cookie:
ID Client ID: H(descriptor cookie | IV)[:4] [4 octets]
SKEY Session key encrypted with descriptor cookie [16 octets]
(end of client-specific part)
RND Random data [(15 - ((clients - 1) mod 16)) * 20 octets]
IV AES initialization vector [16 octets]
IPOS Intro points, encrypted with session key [remaining octets]
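The length arithmetic of this format can be sketched in Python as follows. H is assumed to be SHA-1, each client entry is 4 + 16 = 20 octets as listed above, and the function names are chosen for this example; the AES encryption steps themselves are not shown.

```python
import hashlib

ENTRY_OCTETS = 4 + 16   # ID plus SKEY
BLOCK_CLIENTS = 16      # entries are padded to a multiple of 16


def client_id(descriptor_cookie, iv):
    """ID = H(descriptor cookie | IV)[:4], assuming H is SHA-1."""
    return hashlib.sha1(descriptor_cookie + iv).digest()[:4]


def alen(clients):
    """ALEN := 1 + ((clients - 1) div 16), for clients >= 1."""
    return 1 + (clients - 1) // BLOCK_CLIENTS


def padding_octets(clients):
    """Length of RND: (15 - ((clients - 1) mod 16)) * 20 octets."""
    return (15 - (clients - 1) % BLOCK_CLIENTS) * ENTRY_OCTETS


def client_part_octets(clients):
    """Real entries plus random padding, as seen by an observer."""
    return clients * ENTRY_OCTETS + padding_octets(clients)
```

For any number of clients from 1 to 16 the client-specific part plus padding is 320 octets, so adding or removing a client inside one block of 16 does not change the descriptor length.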
An authorized client needs to configure Tor to use the descriptor cookie
when accessing the hidden service. Therefore, a user adds the contact
information that she received from the service provider to her torrc
file. Upon downloading a hidden service descriptor, Tor finds the
encrypted introduction-point part and attempts to decrypt it using the
configured descriptor cookie. (In the rare event of two or more client
IDs being equal a client tries to decrypt all of them.)
Upon sending the introduction, the client includes her descriptor cookie
as auth type "1" in the INTRODUCE2 cell that she sends to the service.
The hidden service checks whether the included descriptor cookie is
authorized to access the service and either responds to the introduction
request, or not.
2.2. Authorization for limited number of clients
A second, more sophisticated client authorization protocol goes the extra
mile of hiding service activity from unauthorized clients. With all else
being equal to the preceding authorization protocol, the second protocol
publishes hidden service descriptors for each user separately and gets
along with encrypting the introduction-point part of descriptors to a
single client. This allows the service to stop publishing descriptors for
removed clients. As long as a removed client cannot link descriptors
issued for other clients to the service, it cannot derive service
activity any more. The downside of this approach is limited scalability.
Even though the distributed storage of descriptors (cf. proposal 114)
tackles the problem of limited scalability to a certain extent, this
protocol should not be used for services with more than 16 clients. (In
fact, Tor should refuse to advertise services for more than this number
of clients.)
A hidden service generates an asymmetric "client key" and a symmetric
"descriptor cookie" for each client. The client key is used as
replacement for the service's permanent key, so that the service uses a
different identity for each of his clients. The descriptor cookie is used
to store descriptors at changing directory nodes that are unpredictable
for anyone but service and client, to encrypt the introduction-point
part, and to be included in INTRODUCE2 cells. Once the service has
created client key and descriptor cookie, he passes them to the client
outside of Tor. The contact information string looks similar to the one
used by the preceding authorization protocol (with the only difference
that it has "1" encoded as auth-type in the remaining 4 of 132 bits
instead of "0" as before).
When creating a hidden service descriptor for an authorized client, the
hidden service uses the client key and descriptor cookie to compute
secret ID part and descriptor ID:
secret-id-part = H(time-period | descriptor-cookie | replica)
descriptor-id = H(client-key[:10] | secret-id-part)
The hidden service also replaces permanent-key in the descriptor with
client-key and encrypts introduction-points with the descriptor cookie.
ATYPE Authorization type: set to 2. [1 octet]
IV AES initialization vector [16 octets]
IPOS Intro points, encr. with descriptor cookie [remaining octets]
When uploading descriptors, the hidden service needs to make sure that
descriptors for different clients are not uploaded at the same time (cf.
Section 1.1) which is also a limiting factor for the number of clients.
When a client is requested to establish a connection to a hidden service
it looks up whether it has any authorization data configured for that
service. If the user has configured authorization data for authorization
protocol "2", the descriptor ID is determined as described above.
Upon receiving a descriptor, the client decrypts the
introduction-point part using its descriptor cookie. Further, the client
includes its descriptor cookie as auth-type "2" in INTRODUCE2 cells that
it sends to the service.
2.3. Hidden service configuration
A hidden service that is meant to perform client authorization adds a
new option HiddenServiceAuthorizeClient to its hidden service
configuration. This option contains the authorization type which is
either "1" for the protocol described in 2.1 or "2" for the protocol in
2.2 and a comma-separated list of human-readable client names, so that
Tor can create authorization data for these clients:
HiddenServiceAuthorizeClient auth-type client-name,client-name,...
If this option is configured, HiddenServiceVersion is automatically
reconfigured to contain only version numbers of 2 or higher.
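A minimal sketch of how the option value might be parsed is given below. It
is not Tor's parser: the function name is hypothetical, rejecting duplicate
names is an assumption, and the limit of 16 clients is applied only to
auth-type 2 on the reading that the limit stated earlier belongs to the
protocol of Section 2.2.

```python
from typing import List, Tuple

MAX_CLIENTS_TYPE_2 = 16  # limit the text gives for the protocol of Section 2.2


def parse_authorize_client(value: str) -> Tuple[int, List[str]]:
    """Parse 'auth-type client-name,client-name,...' (illustrative only)."""
    parts = value.split()
    if len(parts) != 2:
        raise ValueError("expected: auth-type client-name,client-name,...")
    if parts[0] not in ("1", "2"):
        raise ValueError("auth-type must be 1 or 2")
    auth_type = int(parts[0])
    names = parts[1].split(",")
    if any(not name for name in names):
        raise ValueError("empty client name")
    if len(set(names)) != len(names):
        raise ValueError("duplicate client name")  # assumed restriction
    if auth_type == 2 and len(names) > MAX_CLIENTS_TYPE_2:
        raise ValueError("too many clients for authorization type 2")
    return auth_type, names
```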
Tor stores all generated authorization data for the authorization
protocols described in Sections 2.1 and 2.2 in a new file using the
following file format:
"client-name" human-readable client identifier NL
"descriptor-cookie" 128-bit key ^= 22 base64 chars NL
If the authorization protocol of Section 2.2 is used, Tor also generates
and stores the following data:
"client-key" NL a public key in PEM format
2.4. Client configuration
Clients need to make their authorization data known to Tor using another
configuration option that contains a service name (mainly for the sake of
convenience), the service address, and the descriptor cookie that is
required to access a hidden service (the authorization protocol number is
encoded in the descriptor cookie):
HidServAuth service-name service-address descriptor-cookie
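The sketch below shows one way a client could split such a line and recover
the authorization bits from the cookie. It assumes the cookie is given as 22
base64 characters (132 bits), with the 128-bit key in the leading bits and
the auth-type in the trailing 4 bits, as the description of the contact
information string suggests. The function names and the example address are
hypothetical.

```python
import string
from typing import Tuple

B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def split_cookie(encoded: str) -> Tuple[bytes, int]:
    """Return (128-bit key, 4-bit auth-type value) from 22 base64 chars."""
    if len(encoded) != 22:
        raise ValueError("descriptor cookie must be 22 base64 characters")
    value = 0
    for ch in encoded:
        index = B64_ALPHABET.find(ch)
        if index < 0:
            raise ValueError("invalid base64 character: %r" % ch)
        value = (value << 6) | index
    # 22 * 6 = 132 bits: assumed layout is key (128 bits) then auth-type (4 bits).
    return (value >> 4).to_bytes(16, "big"), value & 0x0F


def parse_hid_serv_auth(value: str) -> Tuple[str, str, bytes, int]:
    """Parse 'service-name service-address descriptor-cookie'."""
    parts = value.split()
    if len(parts) != 3:
        raise ValueError("expected: service-name service-address descriptor-cookie")
    name, address, cookie = parts
    key, auth_bits = split_cookie(cookie)
    return name, address, key, auth_bits
```

According to the text above, the encoded value is "0" for the protocol of
Section 2.1 and "1" for the protocol of Section 2.2, so a caller would still
have to map the 4-bit value to a protocol number.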
Security implications:
In the following we want to discuss possible attacks by dishonest
entities in the presented infrastructure and specific protocol. These
security implications would have to be verified once more when adding
another protocol. The dishonest entities (theoretically) include the
hidden service itself, the authenticated clients, hidden service directory
nodes, introduction points, and rendezvous points. The relays that are
part of circuits used during protocol execution, but never learn about
the exchanged descriptors or cells by design, are not considered.
Obviously, this list makes no claim to be complete. The discussed attacks
are sorted by the difficulty to perform them, in ascending order,
starting with roles that everyone could attempt to take and ending with
partially trusted entities abusing the trust put in them.
(1) A hidden service directory could attempt to conclude presence of a
service from the existence of a locally stored hidden service descriptor:
This passive attack is possible only for a single client-service
relation, because descriptors need to contain a publicly visible
signature of the service using the client key.
A possible protection would be to increase the number of hidden service
directories in the network.
(2) A hidden service directory could try to break the descriptor cookies
of locally stored descriptors: This attack can be performed offline. The
only useful countermeasure against it might be using safe passwords that
are generated by Tor.
[passwords? where did those come in? -RD]
(3) An introduction point could try to identify the pseudonym of the
hidden service on behalf of which it operates: This is impossible by
design, because the service uses a fresh public key for every
establishment of an introduction point (see proposal 114) and the
introduction point receives a fresh introduction cookie, so that there is
no identifiable information about the service that the introduction point
could learn. The introduction point cannot even tell if client accesses
belong to the same client or not, nor can it know the total number of
authorized clients. The only information might be the pattern of
anonymous client accesses, but that is hardly enough to reliably identify
a specific service.
(4) An introduction point could want to learn the identities of accessing
clients: This is also impossible by design, because all clients use the
same introduction cookie for authorization at the introduction point.
(5) An introduction point could try to replay a correct INTRODUCE1 cell
to other introduction points of the same service, e.g. in order to force
the service to create a huge number of useless circuits: This attack is
not possible by design, because INTRODUCE1 cells are encrypted using a
freshly created introduction key that is only known to authorized
clients.
(6) An introduction point could attempt to replay a correct INTRODUCE2
cell to the hidden service, e.g. for the same reason as in the last
attack: This attack is stopped by the fact that a service will drop
INTRODUCE2 cells containing a DH handshake it has seen recently.
(7) An introduction point could block client requests by sending either
positive or negative INTRODUCE_ACK cells back to the client, but without
forwarding INTRODUCE2 cells to the server: This attack is an annoyance
for clients, because they might wait for a timeout to elapse until trying
another introduction point. However, this attack is not introduced by
performing authorization and it cannot be targeted towards a specific
client. A countermeasure might be for the server to periodically perform
introduction requests to his own service to see if introduction points
are working correctly.
(8) The rendezvous point could attempt to identify either server or
client: This remains impossible as it was before, because the
rendezvous cookie does not contain any identifiable information.
(9) An authenticated client could swamp the server with valid INTRODUCE1
and INTRODUCE2 cells, e.g. in order to force the service to create
useless circuits to rendezvous points; as opposed to an introduction
point replaying the same INTRODUCE2 cell, a client could include a new
rendezvous cookie for every request: The countermeasure for this attack
is the restriction to 10 connection establishments per client per hour.
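The limit in (9) could be enforced with a per-client counter. The sketch
below uses a sliding one-hour window; the text does not say how the hour is
measured, so the window type, the class name, and the use of the client name
as key are all assumptions.

```python
from collections import defaultdict, deque


class IntroductionLimiter:
    """Allow at most `limit` connection establishments per client per window."""

    def __init__(self, limit: int = 10, window: float = 3600.0) -> None:
        self.limit = limit
        self.window = window
        self._seen = defaultdict(deque)  # client name -> timestamps of accepted requests

    def allow(self, client: str, now: float) -> bool:
        timestamps = self._seen[client]
        # Forget requests that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True
```

A service would call allow() with the client identified by the descriptor
cookie in the INTRODUCE2 cell and drop the request when it returns False.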
Compatibility:
An implementation of this proposal would require changes to hidden
services and clients to process authorization data and encode and
understand the new formats. However, both services and clients would
remain compatible with regular hidden services without authorization.
Implementation:
The implementation of this proposal can be divided into a number of
changes to hidden service and client side. There are no
changes necessary on directory, introduction, or rendezvous nodes. All
changes are marked with either [service] or [client] to denote on which
side they need to be made.
/1/ Configure client authorization [service]
- Parse configuration option HiddenServiceAuthorizeClient containing
authorized client names.
- Load previously created client keys and descriptor cookies.
- Generate missing client keys and descriptor cookies, add them to
client_keys file.
- Rewrite the hostname file.
- Keep client keys and descriptor cookies of authorized clients in
memory.
[- In case of reconfiguration, mark which client authorizations were
added and whether any were removed. This can be used later when
deciding whether to rebuild introduction points and publish new
hidden service descriptors. Not implemented yet.]
/2/ Publish hidden service descriptors [service]
- Create and upload hidden service descriptors for all authorized
clients.
[- See /1/ for the case of reconfiguration.]
/3/ Configure permission for hidden services [client]
- Parse configuration option HidServAuth containing service
authorization, store authorization data in memory.
/5/ Fetch hidden service descriptors [client]
- Look up client authorization upon receiving a hidden service request.
- Request hidden service descriptor ID including client key and
descriptor cookie. Only request v2 descriptors, no v0.
/6/ Process hidden service descriptor [client]
- Decrypt introduction points with descriptor cookie.
/7/ Create introduction request [client]
- Include descriptor cookie in INTRODUCE2 cell to introduction point.
- Pass descriptor cookie around between involved connections and
circuits.
/8/ Process introduction request [service]
- Read descriptor cookie from INTRODUCE2 cell.
- Check whether descriptor cookie is authorized for access, including
checking access counters.
- Log access for accountability.

@ -1,138 +0,0 @@
Filename: 122-unnamed-flag.txt
Title: Network status entries need a new Unnamed flag
Version: $Revision$
Last-Modified: $Date$
Author: Roger Dingledine
Created: 04-Oct-2007
Status: Closed
Implemented-In: 0.2.0.x
1. Overview:
Tor's directory authorities can give certain servers a "Named" flag
in the network-status entry, when they want to bind that nickname to
that identity key. This allows clients to specify a nickname rather
than an identity fingerprint and still be certain they're getting the
"right" server. As dir-spec.txt describes it,
Name X is bound to identity Y if at least one binding directory lists
it, and no directory binds X to some other Y'.
In practice, clients can refer to servers by nickname whether they are
Named or not; if they refer to nicknames that aren't Named, a complaint
shows up in the log asking them to use the identity key in the future
--- but it still works.
The problem? Imagine a Tor server with nickname Bob. Bob and his
identity fingerprint are registered in tor26's approved-routers
file, but none of the other authorities registered him. Imagine
there are several other unregistered servers also with nickname Bob
("the imposters").
While Bob is online, all is well: a) tor26 gives a Named flag to
the real one, and refuses to list the other ones; and b) the other
authorities list the imposters but don't give them a Named flag. Clients
who have all the network-statuses can compute which one is the real Bob.
But when the real Bob disappears and his descriptor expires? tor26
continues to refuse to list any of the imposters, and the other
authorities continue to list the imposters. Clients don't have any
idea that there exists a Named Bob, so they can ask for server Bob and
get one of the imposters. (A warning will also appear in their log,
but so what.)
2. The stopgap solution:
tor26 should start accepting and listing the imposters, but it should
assign them a new flag: "Unnamed".
This would produce three cases in terms of assigning flags in the consensus
networkstatus:
i) a router gets the Named flag in the v3 networkstatus if
a) it's the only router with that nickname that has the Named flag
out of all the votes, and
b) no vote lists it as Unnamed
else,
ii) a router gets the Unnamed flag if
a) some vote lists a different router with that nickname as Named, or
b) at least one vote lists it as Unnamed, or
c) there are other routers with the same nickname that are Unnamed
else,
iii) the router neither gets a Named nor an Unnamed flag.
(This whole proposal is meant only for v3 dir flags; we shouldn't try
to backport it to the v2 dir world.)
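The three cases above can be sketched in Python as follows. The data model
(a vote as a mapping from identity to nickname and flags) is hypothetical,
and rule ii.c is read here as referring to routers that end up Unnamed in
the consensus, which is one possible interpretation of the text.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A vote maps a router identity to (nickname, naming flags in that vote).
Vote = Dict[str, Tuple[str, Set[str]]]


def consensus_naming_flags(votes: List[Vote]) -> Dict[str, str]:
    """Return "Named", "Unnamed" or "" for each identity seen in any vote."""
    nickname: Dict[str, str] = {}
    voted_named: Set[str] = set()
    voted_unnamed: Set[str] = set()
    for vote in votes:
        for ident, (nick, flags) in vote.items():
            nickname[ident] = nick  # assumes all votes agree on the nickname
            if "Named" in flags:
                voted_named.add(ident)
            if "Unnamed" in flags:
                voted_unnamed.add(ident)

    by_nick: Dict[str, Set[str]] = defaultdict(set)
    for ident, nick in nickname.items():
        by_nick[nick].add(ident)

    result: Dict[str, str] = {}
    for ident, nick in nickname.items():
        rivals_named = (by_nick[nick] - {ident}) & voted_named
        if ident in voted_named and not rivals_named and ident not in voted_unnamed:
            result[ident] = "Named"      # case i
        elif rivals_named or ident in voted_unnamed:
            result[ident] = "Unnamed"    # case ii.a / ii.b
        else:
            result[ident] = ""           # case iii, unless ii.c applies below

    # Case ii.c: an unflagged router sharing a nickname with an Unnamed one.
    changed = True
    while changed:
        changed = False
        for ident, nick in nickname.items():
            others = by_nick[nick] - {ident}
            if result[ident] == "" and any(result[o] == "Unnamed" for o in others):
                result[ident] = "Unnamed"
                changed = True
    return result
```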
Then client behavior is:
a) If there's a Bob with a Named flag, pick that one.
else b) If the Bobs don't have the Unnamed flag (notice that they should
either all have it, or none), pick one of them and warn.
else c) They all have the Unnamed flag -- no router found.
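The client rules a) to c) might look like this in code. The router tuples
and the function name are hypothetical; "pick one of them" is implemented as
taking the first candidate, and a mixed set of flagged and unflagged routers
(which the text says should not occur) is conservatively treated like case c).

```python
import logging
from typing import List, Optional, Tuple

# (identity, nickname, flag) with flag one of "Named", "Unnamed" or "".
Router = Tuple[str, str, str]


def pick_by_nickname(nick: str, routers: List[Router]) -> Optional[str]:
    candidates = [(ident, flag) for ident, name, flag in routers if name == nick]
    # a) Prefer the router that carries the Named flag.
    for ident, flag in candidates:
        if flag == "Named":
            return ident
    # b) No candidate is Unnamed: pick one and warn.
    if candidates and all(flag != "Unnamed" for _, flag in candidates):
        logging.warning("nickname %s is not Named; please use the identity key", nick)
        return candidates[0][0]
    # c) The candidates are Unnamed, or there are none: no router found.
    return None
```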
3. Problems not solved by this stopgap:
3.1. Naming authorities can go offline.
If tor26 is the only authority that provides a binding for Bob, when
tor26 goes offline we're back in our previous situation -- the imposters
can be referenced with a mere ignorable warning in the client's log.
If some other authority Names a different Bob, and tor26 goes offline,
then that other Bob becomes the unique Named Bob.
So be it. We should try to solve these one day, but there's no clear way
to do it that doesn't destroy usability in other ways, and if we want
to get the Unnamed flag into v3 network statuses we should add it soon.
3.2. V3 dir spec magnifies brief discrepancies.
Another point to notice is if tor26 names Bob(1), doesn't know about
Bob(2), but moria lists Bob(2). Then Bob(2) doesn't get an Unnamed flag
even if it should (and Bob(1) is not around).
Right now, in v2 dirs, the case where an authority doesn't know about
a server but the other authorities do know is rare. That's because
authorities periodically ask for other networkstatuses and then fetch
descriptors that are missing.
With v3, if that window occurs at the wrong time, it is extended for the
entire period. We could solve this by making the voting more complex,
but that doesn't seem worth it.
[3.3. Tor26 is only one tor26.
We need more naming authorities, possibly with some kind of auto-naming
feature. This is out-of-scope for this proposal -NM]
4. Changes to the v2 directory
Previously, v2 authorities that had a binding for a server named Bob did
not list any other server named Bob. This will change too:
Version 2 authorities will start listing all routers they know about,
whether they conflict with a name-binding or not: Servers for which
this authority has a binding will continue to be marked Named,
additionally all other servers of that nickname will be listed without the
Named flag (i.e. there will be no Unnamed flag in v2 status documents).
Clients already should handle having a named Bob alongside unnamed
Bobs correctly, and having the unnamed Bobs in the status file even
without the named server is no worse than the current status quo where
clients learn about those servers from other authorities.
The benefit of this is that an authority's opinion on a server like
Guard, Stable, Fast etc. can now be learned by clients even if that
specific authority has reserved that server's name for somebody else.
5. Other benefits:
This new flag will allow people to operate servers that happen to have
the same nickname as somebody who registered their server two years ago
and left soon after. Right now there are dozens of nicknames that are
registered on all three binding directory authorities, yet haven't been
running for years. While it's bad that these nicknames are effectively
blacklisted from the network, the really bad part is that this logic
is really unintuitive to prospective new server operators.

@ -1,56 +0,0 @@
Filename: 123-autonaming.txt
Title: Naming authorities automatically create bindings
Version: $Revision$
Last-Modified: $Date$
Author: Peter Palfrader
Created: 2007-10-11
Status: Closed
Implemented-In: 0.2.0.x
Overview:
Tor's directory authorities can give certain servers a "Named" flag
in the network-status entry, when they want to bind that nickname to
that identity key. This allows clients to specify a nickname rather
than an identity fingerprint and still be certain they're getting the
"right" server.
Authority operators name a server by adding its nickname and
identity fingerprint to the 'approved-routers' file. Historically
being listed in the file was required for a router, at first for being
listed in the directory at all, and later in order to be used by
clients as a first or last hop of a circuit.
Adding identities to the list of named routers so far has been a
manual, time-consuming, and boring job. Given that, and the fact that
the Tor network works just fine without named routers, the last
authority to keep a current binding list stopped updating it well over
half a year ago.
Naming, if it were done, would serve a useful purpose however in that
users can have a reasonable expectation that the exit server Bob they
are using in their http://www.google.com.bob.exit/ URL is the same
Bob every time.
Proposal:
I propose that identity<->name binding be completely automated:
New bindings should be added after the router has been around for a
bit and its name has not been used by other routers. Similarly, names
that have not appeared on the network for a long time should be freed
in case a new router wants to use them.
The following rules are suggested:
i) If a named router has not been online for half a year, the
identity<->name binding for that name is removed. The nickname
is free to be taken by other routers now.
ii) If a router claims a certain nickname and
a) has been on the network for at least two weeks, and
b) that nickname is not yet linked to a different router, and
c) no other router has wanted that nickname in the last month,
a new binding should be created for this router and its desired
nickname.
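The suggested rules translate fairly directly into two predicates. In the
sketch below, "half a year" is taken as 182 days and "the last month" as 30
days, and "on the network for at least two weeks" is approximated by the
time since the router was first seen; all three are assumptions, as are the
function and parameter names.

```python
from datetime import datetime, timedelta
from typing import Optional

HALF_YEAR = timedelta(days=182)  # assumed length of "half a year"
TWO_WEEKS = timedelta(days=14)
ONE_MONTH = timedelta(days=30)   # assumed length of "the last month"


def binding_expired(last_seen_online: datetime, now: datetime) -> bool:
    """Rule i: drop the binding if the named router was offline for half a year."""
    return now - last_seen_online >= HALF_YEAR


def may_bind(identity: str,
             first_seen: datetime,
             bound_identity: Optional[str],
             last_claim_by_other: Optional[datetime],
             now: datetime) -> bool:
    """Rule ii: decide whether `identity` may receive its desired nickname."""
    long_enough = now - first_seen >= TWO_WEEKS                        # ii.a
    not_taken = bound_identity is None or bound_identity == identity   # ii.b
    uncontested = (last_claim_by_other is None
                   or now - last_claim_by_other > ONE_MONTH)           # ii.c
    return long_enough and not_taken and uncontested
```

Since the proposal notes that the automaton can live outside the Tor code,
predicates like these could run in an external tool that rewrites the
approved-routers file.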
This automaton does not necessarily need to live in the Tor code, it
can do its job just as well when it's an external tool.

@ -1,315 +0,0 @@
Filename: 124-tls-certificates.txt
Title: Blocking resistant TLS certificate usage
Version: $Revision$
Last-Modified: $Date$
Author: Steven J. Murdoch
Created: 2007-10-25
Status: Superseded
Overview:
To be less distinguishable from HTTPS web browsing, only Tor servers should
present TLS certificates. This should be done whilst maintaining backwards
compatibility with Tor nodes which present and expect client certificates, and
while preserving existing security properties. This specification describes
the negotiation protocol, what certificates should be presented during the TLS
negotiation, and how to move the client authentication within the encrypted
tunnel.
Motivation:
In Tor's current TLS [1] handshake, both client and server present a
two-certificate chain. Since TLS performs authentication prior to establishing
the encrypted tunnel, the contents of these certificates are visible to an
eavesdropper. In contrast, during normal HTTPS web browsing, the server
presents a single certificate, signed by a root CA and the client presents no
certificate. Hence it is possible to distinguish Tor from HTTPS by identifying
this pattern.
To resist blocking based on traffic identification, Tor should behave as close
to HTTPS as possible, i.e. servers should offer a single certificate and not
request a client certificate; clients should present no certificate. This
presents two difficulties: clients are no longer authenticated and servers are
authenticated by the connection key, rather than identity key. The link
protocol must thus be modified to preserve the old security semantics.
Finally, in order to maintain backwards compatibility, servers must correctly
identify whether the client supports the modified certificate handling. This
is achieved by modifying the cipher suites that clients advertise support
for. These cipher suites are selected to be similar to those chosen by web
browsers, in order to resist blocking based on client hello.
Terminology:
Initiator: OP or OR which initiates a TLS connection ("client" in TLS
terminology)
Responder: OR which receives an incoming TLS connection ("server" in TLS
terminology)
Version negotiation and cipher suite selection:
In the modified TLS handshake, the responder does not request a certificate
from the initiator. This request would normally occur immediately after the
responder receives the client hello (the first message in a TLS handshake) and
so the responder must decide whether to request a certificate based only on
the information in the client hello. This is achieved by examining the cipher
suites in the client hello.
List 1: cipher suite lists offered by version 0/1 Tor
From src/common/tortls.c, revision 12086:
TLS1_TXT_DHE_RSA_WITH_AES_128_SHA
TLS1_TXT_DHE_RSA_WITH_AES_128_SHA : SSL3_TXT_EDH_RSA_DES_192_CBC3_SHA
SSL3_TXT_EDH_RSA_DES_192_CBC3_SHA
Client hello sent by initiator:
Initiators supporting version 2 of the Tor connection protocol MUST
offer a different cipher suite list from those sent by pre-version 2
Tors, contained in List 1. To maintain compatibility with older Tor
versions and common browsers, the cipher suite list MUST include
support for:
TLS_DHE_RSA_WITH_AES_256_CBC_SHA
TLS_DHE_RSA_WITH_AES_128_CBC_SHA
SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA
SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA
Client hello received by responder/server hello sent by responder:
Responders supporting version 2 of the Tor connection protocol should compare
the cipher suite list in the client hello with those in List 1. If it matches
any in the list then the responder should assume that the initiator supports
version 1, and thus should maintain the version 1 behavior, i.e. send a
two-certificate chain, request a client certificate and do not send or expect
a VERSIONS cell [2].
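The comparison against List 1 can be sketched as an exact match on the
ordered list of cipher suite identifiers. The numeric values 0x0033 and
0x0016 are the ones this document's appendix labels "Tor v1 default" and
"Tor v1 fallback"; treating the match as order-sensitive and the function
name are assumptions.

```python
from typing import Sequence

DHE_RSA_AES_128_SHA = 0x0033   # TLS1_TXT_DHE_RSA_WITH_AES_128_SHA
EDH_RSA_3DES_SHA = 0x0016      # SSL3_TXT_EDH_RSA_DES_192_CBC3_SHA

# The three cipher suite lists of List 1.
V1_CIPHER_LISTS = {
    (DHE_RSA_AES_128_SHA,),
    (DHE_RSA_AES_128_SHA, EDH_RSA_3DES_SHA),
    (EDH_RSA_3DES_SHA,),
}


def initiator_is_v1(client_hello_suites: Sequence[int]) -> bool:
    """True if the client hello offers exactly one of the List 1 lists."""
    return tuple(client_hello_suites) in V1_CIPHER_LISTS
```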
Otherwise, the responder should assume version 2 behavior and select a cipher
suite following TLS [1] behavior, i.e. select the first entry from the client
hello cipher list which is acceptable. Responders MUST NOT select any suite
that lacks ephemeral keys, or whose symmetric keys are less than KEY_LEN bits,
or whose digests are less than HASH_LEN bits. Implementations SHOULD NOT
allow other SSLv3 ciphersuites.
Should no mutually acceptable cipher suite be found, the connection MUST be
closed.
If the responder is implementing version 2 of the connection protocol it
SHOULD send a server certificate with random contents. The organizationName
field MUST NOT be "Tor", "TOR" or "t o r".
Server certificate received by initiator:
If the server certificate has an organizationName of "Tor", "TOR" or "t o r",
the initiator should assume that the responder does not support version 2 of
the connection protocol. In which case the initiator should respond following
version 1, i.e. send a two-certificate client chain and do not send or expect
a VERSIONS cell.
[SJM: We could also use the fact that a client certificate request was sent]
If the server hello contains a ciphersuite which does not comply with the key
length requirements above, even if it was one offered in the client hello, the
connection MUST be closed. This will only occur if the responder is not a Tor
server.
Backward compatibility:
v1 Initiator, v1 Responder: No change
v1 Initiator, v2 Responder: Responder detects v1 initiator by client hello
v2 Initiator, v1 Responder: Responder accepts v2 client hello. Initiator
detects v1 server certificate and continues with v1 protocol
v2 Initiator, v2 Responder: Responder accepts v2 client hello. Initiator
detects v2 server certificate and continues with v2 protocol.
Additional link authentication process:
Following VERSION and NETINFO negotiation, both responder and
initiator MUST send a certification chain in a CERT cell. If one
party does not have a certificate, the CERT cell MUST still be sent,
but with a length of zero.
A CERT cell is a variable length cell, of the format
CircID [2 bytes]
Command [1 byte]
Length [2 bytes]
Payload [<length> bytes]
CircID MUST be set to 0x0000
Command is [SJM: TODO]
Length is the length of the payload
Payload contains 0 or more certificates, each is of the format:
Cert_Length [2 bytes]
Certificate [<cert_length> bytes]
Each certificate MUST sign the one preceding it. The initiator MUST
place its connection certificate first; the responder, having
already sent its connection certificate as part of the TLS handshake
MUST place its identity certificate first.
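To make the CERT cell layout concrete, here is a sketch that packs and
unpacks it. Big-endian (network) byte order is assumed, and because the
Command value is still marked TODO above, it is passed in as a parameter
rather than fixed. The function names are hypothetical, and no signature
checking is performed.

```python
import struct
from typing import List


def pack_cert_cell(command: int, certs: List[bytes]) -> bytes:
    """Build CircID | Command | Length | Payload for a list of certificates."""
    payload = b"".join(struct.pack(">H", len(cert)) + cert for cert in certs)
    if len(payload) > 0xFFFF:
        raise ValueError("payload does not fit the 2-byte Length field")
    return struct.pack(">HBH", 0x0000, command, len(payload)) + payload


def unpack_cert_cell(cell: bytes) -> List[bytes]:
    """Return the certificates carried in a CERT cell, in order."""
    circ_id, _command, length = struct.unpack_from(">HBH", cell, 0)
    if circ_id != 0x0000:
        raise ValueError("CircID must be 0x0000")
    payload = cell[5:5 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    certs = []
    offset = 0
    while offset < length:
        if offset + 2 > length:
            raise ValueError("truncated Cert_Length")
        (cert_len,) = struct.unpack_from(">H", payload, offset)
        offset += 2
        if offset + cert_len > length:
            raise ValueError("truncated certificate")
        certs.append(payload[offset:offset + cert_len])
        offset += cert_len
    return certs
```

A party without a certificate would send pack_cert_cell(command, []), which
yields a five-byte cell with a Length of zero.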
Initiators who send a CERT cell MUST follow that with a LINK_AUTH
cell to prove that they possess the corresponding private key.
A LINK_AUTH cell is fixed-length, of the format:
CircID [2 bytes]
Command [1 byte]
Length [2 bytes]
Payload (padded with 0 bytes) [PAYLOAD_LEN - 2 bytes]
CircID MUST be set to 0x0000
Command is [SJM: TODO]
Length is the valid portion of the payload
Payload is of the format:
Signature version [1 byte]
Signature [<length> - 1 bytes]
Padding [PAYLOAD_LEN - <length> - 2 bytes]
Signature version: Identifies the type of signature, currently 0x00
Signature: Digital signature under the initiator's connection key of the
following item, in PKCS #1 block type 1 [3] format:
HMAC-SHA1, using the TLS master secret as key, of the
following elements concatenated:
- The signature version (0x00)
- The NUL terminated ASCII string: "Tor initiator certificate verification"
- client_random, as sent in the Client Hello
- server_random, as sent in the Server Hello
- SHA-1 hash of the initiator connection certificate
- SHA-1 hash of the responder connection certificate
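The HMAC input listed above can be assembled as in the following sketch. It
covers only the MAC; the PKCS #1 signature under the initiator's connection
key is not shown. Hashing the raw certificate bytes as passed in is an
assumption, and the function name is hypothetical.

```python
import hashlib
import hmac

SIGNATURE_VERSION = b"\x00"
LABEL = b"Tor initiator certificate verification\x00"  # NUL-terminated ASCII


def link_auth_mac(master_secret: bytes,
                  client_random: bytes,
                  server_random: bytes,
                  initiator_cert: bytes,
                  responder_cert: bytes) -> bytes:
    """HMAC-SHA1, keyed with the TLS master secret, over the listed elements."""
    message = b"".join([
        SIGNATURE_VERSION,
        LABEL,
        client_random,
        server_random,
        hashlib.sha1(initiator_cert).digest(),  # assumed: hash of the encoded cert
        hashlib.sha1(responder_cert).digest(),
    ])
    return hmac.new(master_secret, message, hashlib.sha1).digest()
```

Because the master secret and both random values enter the MAC, the result
is bound to one TLS session and cannot be replayed on another connection.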
Security checks:
- Before sending a LINK_AUTH cell, a node MUST ensure that the TLS
connection is authenticated by the responder key.
- For the handshake to have succeeded, the initiator MUST confirm:
- That the TLS handshake was authenticated by the
responder connection key
- That the responder connection key was signed by the first
certificate in the CERT cell
- That each certificate in the CERT cell was signed by the
following certificate, with the exception of the last
- That the last certificate in the CERT cell is the expected
identity certificate for the node being connected to
- For the handshake to have succeeded, the responder MUST confirm
either:
A) - A zero length CERT cell was sent and no LINK_AUTH cell was
sent
In which case the responder shall treat the identity of the
initiator as unknown
or
B) - That the LINK_AUTH MAC contains a signature by the first
certificate in the CERT cell
- That the MAC signed matches the expected value
- That each certificate in the CERT cell was signed by the
following certificate, with the exception of the last
In which case the responder shall treat the identity of the
initiator as that of the last certificate in the CERT cell
Protocol summary:
1. I(nitiator) <-> R(esponder): TLS handshake, including responder
authentication under connection certificate R_c
2. I <-> R: VERSION and NETINFO negotiation
3. R -> I: CERT (Responder identity certificate R_i (which signs R_c))
4. I -> R: CERT (Initiator connection certificate I_c,
Initiator identity certificate I_i (which signs I_c))
5. I -> R: LINK_AUTH (Signature, under I_c, of HMAC-SHA1(master_secret,
"Tor initiator certificate verification" ||
client_random || server_random ||
I_c hash || R_c hash))
Notes: I -> R doesn't need to wait for R_i before sending its own
messages (reduces round-trips).
Certificate hash is calculated like identity hash in CREATE cells.
Initiator signature is calculated in a similar way to Certificate
Verify messages in TLS 1.1 (RFC4346, Sections 7.4.8 and 4.7).
If I is an OP, a zero length certificate chain may be sent in step 4,
in which case step 5 is not performed.
Rationale:
- Version and netinfo negotiation before authentication: The version cell needs
to come before the rest of the protocol, since we may choose to alter
the rest at some later point, e.g. switch to a different MAC/signature scheme.
It is useful to keep the NETINFO and VERSION cells close to each other, since
the time between them is used to check if there is a delay-attack. Still, a
server might want to not act on NETINFO data from an initiator until the
authentication is complete.
Appendix A: Cipher suite choices
This specification intentionally does not put any constraints on the
TLS ciphersuite lists presented by clients, other than a minimum
required for compatibility. However, to maximize blocking
resistance, ciphersuite lists should be carefully selected.
Recommended client ciphersuite list
Source: http://lxr.mozilla.org/security/source/security/nss/lib/ssl/sslproto.h
0xc00a: TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
0xc014: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
0x0039: TLS_DHE_RSA_WITH_AES_256_CBC_SHA
0x0038: TLS_DHE_DSS_WITH_AES_256_CBC_SHA
0xc00f: TLS_ECDH_RSA_WITH_AES_256_CBC_SHA
0xc005: TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA
0x0035: TLS_RSA_WITH_AES_256_CBC_SHA
0xc007: TLS_ECDHE_ECDSA_WITH_RC4_128_SHA
0xc009: TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
0xc011: TLS_ECDHE_RSA_WITH_RC4_128_SHA
0xc013: TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
0x0033: TLS_DHE_RSA_WITH_AES_128_CBC_SHA
0x0032: TLS_DHE_DSS_WITH_AES_128_CBC_SHA
0xc00c: TLS_ECDH_RSA_WITH_RC4_128_SHA
0xc00e: TLS_ECDH_RSA_WITH_AES_128_CBC_SHA
0xc002: TLS_ECDH_ECDSA_WITH_RC4_128_SHA
0xc004: TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA
0x0004: SSL_RSA_WITH_RC4_128_MD5
0x0005: SSL_RSA_WITH_RC4_128_SHA
0x002f: TLS_RSA_WITH_AES_128_CBC_SHA
0xc008: TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA
0xc012: TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA
0x0016: SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA
0x0013: SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA
0xc00d: TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA
0xc003: TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA
0xfeff: SSL_RSA_FIPS_WITH_3DES_EDE_CBC_SHA (168-bit Triple DES with RSA and a SHA1 MAC)
0x000a: SSL_RSA_WITH_3DES_EDE_CBC_SHA
Order specified in:
http://lxr.mozilla.org/security/source/security/nss/lib/ssl/sslenum.c#47
Recommended options:
0x0000: Server Name Indication [4]
0x000a: Supported Elliptic Curves [5]
0x000b: Supported Point Formats [5]
Recommended compression:
0x00
Recommended server ciphersuite selection:
The responder should select the first entry in this list which is
listed in the client hello:
0x0039: TLS_DHE_RSA_WITH_AES_256_CBC_SHA [ Common Firefox choice ]
0x0033: TLS_DHE_RSA_WITH_AES_128_CBC_SHA [ Tor v1 default ]
0x0016: SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA [ Tor v1 fallback ]
0x0013: SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA [ Valid IE option ]
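The recommended selection rule amounts to walking the responder's own
preference list and taking the first suite the client also offered. A small
sketch, with a hypothetical function name:

```python
from typing import Iterable, Optional

# Responder preference order, as listed above.
RESPONDER_PREFERENCE = [
    0x0039,  # TLS_DHE_RSA_WITH_AES_256_CBC_SHA
    0x0033,  # TLS_DHE_RSA_WITH_AES_128_CBC_SHA
    0x0016,  # SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA
    0x0013,  # SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA
]


def select_cipher_suite(client_hello_suites: Iterable[int]) -> Optional[int]:
    """Return the first preferred suite the client offers, or None."""
    offered = set(client_hello_suites)
    for suite in RESPONDER_PREFERENCE:
        if suite in offered:
            return suite
    return None  # no mutually acceptable suite: the connection must be closed
```

Note that the client's own ordering does not influence the result here; only
membership in the client hello matters.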
References:
[1] The Transport Layer Security (TLS) Protocol, Version 1.1, RFC4346, IETF
[2] Version negotiation for the Tor protocol, Tor proposal 105
[3] B. Kaliski, "Public-Key Cryptography Standards (PKCS) #1:
RSA Cryptography Specifications Version 1.5", RFC 2313,
March 1998.
[4] TLS Extensions, RFC 3546
[5] Elliptic Curve Cryptography (ECC) Cipher Suites for Transport Layer Security (TLS)
% <!-- Local IspellDict: american -->

@ -1,293 +0,0 @@
Filename: 125-bridges.txt
Title: Behavior for bridge users, bridge relays, and bridge authorities
Version: $Revision$
Last-Modified: $Date$
Author: Roger Dingledine
Created: 11-Nov-2007
Status: Closed
Implemented-In: 0.2.0.x
0. Preface
This document describes the design decisions around support for bridge
users, bridge relays, and bridge authorities. It acts as an overview
of the bridge design and deployment for developers, and it also tries
to point out limitations in the current design and implementation.
For more details on what all of these mean, look at blocking.tex in
/doc/design-paper/
1. Bridge relays
Bridge relays are just like normal Tor relays except they don't publish
their server descriptors to the main directory authorities.
1.1. PublishServerDescriptor
To configure your relay to be a bridge relay, just add
BridgeRelay 1
PublishServerDescriptor bridge
to your torrc. This will cause your relay to publish its descriptor
to the bridge authorities rather than to the default authorities.
Alternatively, you can say
BridgeRelay 1
PublishServerDescriptor 0
which will cause your relay to not publish anywhere. This could be
useful for private bridges.
1.2. Exit policy
Bridge relays should use an exit policy of "reject *:*". This is
because they only need to relay traffic between the bridge users
and the rest of the Tor network, so there's no need to let people
exit directly from them.
1.3. RelayBandwidthRate / RelayBandwidthBurst
We invented the RelayBandwidth* options for this situation: Tor clients
who want to allow relaying too. See proposal 111 for details. Relay
operators should feel free to rate-limit their relayed traffic.
1.4. Helping the user with port forwarding, NAT, etc.
Just as for operating normal relays, our documentation and hints for
how to make your ORPort reachable are inadequate for normal users.
We need to work harder on this step, perhaps in 0.2.2.x.
1.5. Vidalia integration
Vidalia has turned its "Relay" settings page into a tri-state
"Don't relay" / "Relay for the Tor network" / "Help censored users".
If you click the third choice, it forces your exit policy to reject *:*.
If all the bridges end up on port 9001, that's not so good. On the
other hand, putting the bridges on a low-numbered port in the Unix
world requires jumping through extra hoops. The current compromise is
that Vidalia makes the ORPort default to 443 on Windows, and 9001 on
other platforms.
At the bottom of the relay config settings window, Vidalia displays
the bridge identifier to the operator (see Section 3.1) so he can pass
it on to bridge users.
1.6. What if the default ORPort is already used?
If the user already has a webserver or some other application
bound to port 443, then Tor will fail to bind it and complain to the
user, probably in a cryptic way. Rather than just working on a better
error message (though we should do this), we should consider an
"ORPort auto" option that tells Tor to try to find something that's
bindable and reachable. This would also help us tolerate ISPs that
filter incoming connections on port 80 and port 443. But this should
be a different proposal, and can wait until 0.2.2.x.
2. Bridge authorities.
Bridge authorities are like normal directory authorities, except they
don't create their own network-status documents or votes. So if you
ask an authority for a network-status document or consensus, they
behave like a directory mirror: they give you one from one of the main
authorities. But if you ask the bridge authority for the descriptor
corresponding to a particular identity fingerprint, it will happily
give you the latest descriptor for that fingerprint.
To become a bridge authority, add these lines to your torrc:
AuthoritativeDirectory 1
BridgeAuthoritativeDir 1
Right now there's one bridge authority, running on the Tonga relay.
2.1. Exporting bridge-purpose descriptors
We've added a new purpose for server descriptors: the "bridge"
purpose. With the new router-descriptors file format that includes
annotations, it's easy to look through it and find the bridge-purpose
descriptors.
Currently we export the bridge descriptors from Tonga to the
BridgeDB server, so it can give them out according to the policies
in blocking.pdf.
2.2. Reachability/uptime testing
Right now the bridge authorities do active reachability testing of
bridges, so we know which ones to recommend for users.
But in the design document, we suggested that bridges should publish
anonymously (i.e. via Tor) to the bridge authority, so somebody watching
the bridge authority can't just enumerate all the bridges. But if we're
doing active measurement, the game is up. Perhaps we should back off on
this goal, or perhaps we should do our active measurement anonymously?
Answering this issue is scheduled for 0.2.1.x.
2.3. Migrating to multiple bridge authorities
Having only one bridge authority is both a trust bottleneck (if you
break into one place you learn about every single bridge we've got)
and a robustness bottleneck (when it's down, bridge users become sad).
Right now if we put up a second bridge authority, all the bridges would
publish to it, and (assuming the code works) bridge users would query
a random bridge authority. This resolves the robustness bottleneck,
but makes the trust bottleneck even worse.
In 0.2.2.x and later we should think about better ways to have multiple
bridge authorities.
3. Bridge users.
Bridge users are like ordinary Tor users except they use encrypted
directory connections by default, and they use bridge relays as both
entry guards (their first hop) and directory guards (the source of
all their directory information).
To become a bridge user, add the following line to your torrc:
UseBridges 1
and then add at least one "Bridge" line to your torrc based on the
format below.
3.1. Format of the bridge identifier.
The canonical format for a bridge identifier contains an IP address,
an ORPort, and an identity fingerprint:
bridge 128.31.0.34:9009 4C17 FB53 2E20 B2A8 AC19 9441 ECD2 B017 7B39 E4B1
However, the identity fingerprint can be left out, in which case the
bridge user will connect to that relay and use it as a bridge regardless
of what identity key it presents:
bridge 128.31.0.34:9009
This might be useful for cases where only short bridge identifiers
can be communicated to bridge users.
In a future version we may also support bridge identifiers that are
only a key fingerprint:
bridge 4C17 FB53 2E20 B2A8 AC19 9441 ECD2 B017 7B39 E4B1
and the bridge user can fetch the latest descriptor from the bridge
authority (see Section 3.4).
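The two currently supported forms of the bridge identifier can be parsed as in the following sketch. It is only an illustration, not Tor's actual parser: the names BridgeLine and parse_bridge_line are hypothetical, and it assumes an IPv4 address and a fingerprint of exactly 40 hex digits, optionally written in space-separated groups. The fingerprint-only form mentioned as future work is rejected.

```python
import re
from typing import NamedTuple, Optional


class BridgeLine(NamedTuple):
    address: str
    port: int
    fingerprint: Optional[str]  # 40 hex digits, or None if omitted


def parse_bridge_line(line: str) -> BridgeLine:
    """Parse 'bridge IP:ORPort [fingerprint]' (hypothetical helper)."""
    parts = line.split()
    if len(parts) < 2 or parts[0].lower() != "bridge":
        raise ValueError("not a bridge line: %r" % line)
    address, sep, port_text = parts[1].rpartition(":")
    if not sep or not address or not port_text.isdigit():
        raise ValueError("expected IP:ORPort, got %r" % parts[1])
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError("port out of range: %d" % port)
    # The fingerprint may be split into groups of four; join them.
    fingerprint = "".join(parts[2:]).upper() or None
    if fingerprint is not None and not re.fullmatch(r"[0-9A-F]{40}", fingerprint):
        raise ValueError("fingerprint must be 40 hex digits")
    return BridgeLine(address, port, fingerprint)
```

When the fingerprint is None, the caller would accept whatever identity key the relay presents, as described above.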
3.2. Bridges as entry guards
For now, bridge users add their bridge relays to their list of "entry
guards" (see path-spec.txt for background on entry guards). They are
managed by the entry guard algorithms exactly as if they were a normal
entry guard -- their keys and timing get cached in the "state" file,
etc. This means that when the Tor user starts up with "UseBridges"
disabled, he will skip past the bridge entries since they won't be
listed as up and usable in his networkstatus consensus. But to be clear,
the "entry_guards" list doesn't currently distinguish guards by purpose.
Internally, each bridge user keeps a smartlist of "bridge_info_t"
that reflects the "bridge" lines from his torrc along with a download
schedule (see Section 3.5 below). When he starts Tor, he attempts
to fetch a descriptor for each configured bridge (see Section 3.4
below). When he succeeds at getting a descriptor for one of the bridges
in his list, he adds it directly to the entry guard list using the
normal add_an_entry_guard() interface. Once a bridge descriptor has
been added, should_delay_dir_fetches() will stop delaying further
directory fetches, and the user begins to bootstrap his directory
information from that bridge (see Section 3.3).
Currently bridge users cache their bridge descriptors to the
"cached-descriptors" file (annotated with purpose "bridge"), but
they don't make any attempt to reuse descriptors they find in this
file. The theory is that either the bridge is available now, in which
case you can get a fresh descriptor, or it's not, in which case an
old descriptor won't do you much good.
We could disable writing out the bridge lines to the state file, if
we think this is a problem.
As an exception, if we get an application request when we have one
or more bridge descriptors but we believe none of them are running,
we mark them all as running again. This is similar to the exception
already in place to help long-idle Tor clients realize they should
fetch fresh directory information rather than just refuse requests.
3.3. Bridges as directory guards
In addition to using bridges as the first hop in their circuits, bridge
users also use them to fetch directory updates. Other than initial
bootstrapping to find a working bridge descriptor (see Section 3.4
below), all further non-anonymized directory fetches will be redirected
to the bridge.
This means that bridge relays need to have cached answers for all
questions the bridge user might ask. This makes the upgrade path
tricky --- for example, if we migrate to a v4 directory design, the
bridge user would need to keep using v3 so long as his bridge relays
only knew how to answer v3 queries.
In a future design, for cases where the user has enough information
to build circuits yet the chosen bridge doesn't know how to answer a
given query, we might teach bridge users to make an anonymized request
to a more suitable directory server.
3.4. How bridge users get their bridge descriptor
Bridge users can fetch bridge descriptors in two ways: by going directly
to the bridge and asking for "/tor/server/authority", or by going to
the bridge authority and asking for "/tor/server/fp/ID". By default,
they will only try the direct queries. If the user sets
UpdateBridgesFromAuthority 1
in his config file, then he will try querying the bridge authority
first for bridges where he knows a digest (if he only knows an IP
address and ORPort, then his only option is a direct query).
If the user has at least one working bridge, then he will do further
queries to the bridge authority through a full three-hop Tor circuit.
But when bootstrapping, he will make a direct begin_dir-style connection
to the bridge authority.
As of Tor 0.2.0.10-alpha, if the user attempts to fetch a descriptor
from the bridge authority and it returns a 404 not found, the user
will automatically fall back to trying a direct query. Therefore it is
recommended that bridge users always set UpdateBridgesFromAuthority,
since at worst it will delay their fetches a little bit and notify
the bridge authority of the identity fingerprint (but not location)
of their intended bridges.
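The choice between the two fetch methods can be summarized in a short sketch. This is a simplified model of the rules in this section, not the code in Tor; the function and constant names are hypothetical.

```python
from typing import List, Optional

DIRECT = "direct"        # ask the bridge itself for /tor/server/authority
AUTHORITY = "authority"  # ask the bridge authority for /tor/server/fp/ID


def descriptor_fetch_order(fingerprint: Optional[str],
                           update_from_authority: bool) -> List[str]:
    """Return the fetch methods to try, in order, for one bridge."""
    if update_from_authority and fingerprint is not None:
        # Authority first; on a 404 the user falls back to a direct query.
        return [AUTHORITY, DIRECT]
    # Default behavior, or no known digest: direct queries only.
    return [DIRECT]


def authority_query_transport(have_working_bridge: bool) -> str:
    """How a query to the bridge authority is carried."""
    if have_working_bridge:
        return "three-hop circuit"
    return "direct begin_dir"  # bootstrapping case
```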
3.5. Bridge descriptor retry schedule
Bridge users try to fetch a descriptor for each bridge (using the
steps in Section 3.4 above) on startup. Whenever they receive a
bridge descriptor, they reschedule a new descriptor download for 1
hour from then.
If on the other hand it fails, they try again after 15 minutes for the
first attempt, after 15 minutes for the second attempt, and after 60
minutes for subsequent attempts.
In 0.2.2.x we should come up with some smarter retry schedules.
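For illustration, the current schedule can be written as a single function of the number of consecutive failed fetches. This is a sketch under one reading of the text above (a 15 minute wait after each of the first two failures, 60 minutes after every later one); the function name is hypothetical.

```python
MINUTE = 60  # seconds


def next_fetch_delay(consecutive_failures: int) -> int:
    """Seconds to wait before the next descriptor fetch for one bridge."""
    if consecutive_failures < 0:
        raise ValueError("failure count cannot be negative")
    if consecutive_failures == 0:
        # The last fetch succeeded: refresh the descriptor in 1 hour.
        return 60 * MINUTE
    if consecutive_failures <= 2:
        # First and second failures: retry after 15 minutes.
        return 15 * MINUTE
    # Third and later failures: retry after 60 minutes.
    return 60 * MINUTE
```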
3.6. Vidalia integration
Vidalia 0.0.16 has a checkbox in its Network config window called
"My ISP blocks connections to the Tor network." Users who click that
box change their configuration to:
UseBridges 1
UpdateBridgesFromAuthority 1
and should specify at least one Bridge identifier.
3.7. Do we need a second layer of entry guards?
If the bridge user uses the bridge as its entry guard, then the
triangulation attacks from Lasse and Paul's Oakland paper work to
locate the user's bridge(s).
Worse, this is another way to enumerate bridges: if the bridge users
keep rotating through second hops, then if you run a few fast servers
(and avoid getting considered an Exit or a Guard) you'll quickly get
a list of the bridges in active use.
That's probably the strongest reason why bridge users will need to
pick second-layer guards. Would this mean bridge users should switch
to four-hop circuits?
We should figure this out in the 0.2.1.x timeframe.

View File

@ -1,412 +0,0 @@
Filename: 126-geoip-reporting.txt
Title: Getting GeoIP data and publishing usage summaries
Version: $Revision$
Last-Modified: $Date$
Author: Roger Dingledine
Created: 2007-11-24
Status: Closed
Implemented-In: 0.2.0.x
0. Status
In 0.2.0.x, this proposal is implemented to the extent needed to
address its motivations. See notes below with the text "RESOLUTION"
for details.
1. Background and motivation
Right now we can keep a rough count of Tor users, both total and by
country, by watching connections to a single directory mirror. Being
able to get usage estimates is useful both for our funders (to
demonstrate progress) and for our own development (so we know how
quickly we're scaling and can design accordingly, and so we know which
countries and communities to focus on more). This need for information
is the only reason we haven't deployed "directory guards" (think of
them like entry guards but for directory information; in practice,
it would seem that Tor clients should simply use their entry guards
as their directory guards; see also proposal 125).
With the move toward bridges, we will no longer be able to track Tor
clients that use bridges, since they use their bridges as directory
guards. Further, we need to be able to learn which bridges stop seeing
use from certain countries (and are thus likely blocked), so we can
avoid giving them out to other users in those countries.
Right now we already do GeoIP lookups in Vidalia: Vidalia draws relays
and circuits on its 'network map', and it performs anonymized GeoIP
lookups to its central servers to know where to put the dots. Vidalia
caches answers it gets -- to reduce delay, to reduce overhead on
the network, and to reduce anonymity issues where users reveal their
knowledge about the network through which IP addresses they ask about.
But with the advent of bridges, Tor clients are asking about IP
addresses that aren't in the main directory. In particular, bridge
users inform the central Vidalia servers about each bridge as they
discover it and their Vidalia tries to map it.
Also, we wouldn't mind letting Vidalia do a GeoIP lookup on the client's
own IP address, so it can provide a more useful map.
Finally, Vidalia's central servers leave users open to partitioning
attacks, even if they can't target specific users. Further, as we
start using GeoIP results for more operational or security-relevant
goals, such as avoiding or including particular countries in circuits,
it becomes more important that users can't be singled out in terms of
their IP-to-country mapping beliefs.
2. The available GeoIP databases
There are at least two classes of GeoIP database out there: "IP to
country", which tells us the country code for the IP address but
no more details, and "IP to city", which tells us the country code,
the name of the city, and some basic latitude/longitude guesses.
A recent ip-to-country.csv is 3421362 bytes. Compressed, it is 564252
bytes. A typical line is:
"205500992","208605279","US","USA","UNITED STATES"
http://ip-to-country.webhosting.info/node/view/5
Similarly, the maxmind GeoLite Country database is also about 500KB
compressed.
http://www.maxmind.com/app/geolitecountry
The maxmind GeoLite City database gives more finegrained detail like
geo coordinates and city name. Vidalia currently makes use of this
information. On the other hand it's 16MB compressed. A typical line is:
206.124.149.146,Bellevue,WA,US,47.6051,-122.1134
http://www.maxmind.com/app/geolitecity
There are other databases out there, like
http://www.hostip.info/faq.html
http://www.webconfs.com/ip-to-city.php
that want more attention, but for now let's assume that all the db's
are around this size.
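As a rough sketch of how a local IP-to-country lookup could work on a file in the ip-to-country.csv format shown above: the code below assumes that the first two columns are the start and end of an IPv4 range written as 32-bit integers, and that ranges do not overlap. The class name CountryTable is hypothetical.

```python
import bisect
import csv
import io
import ipaddress


class CountryTable:
    """Sorted, non-overlapping IPv4 ranges mapped to two-letter codes."""

    def __init__(self, csv_text):
        rows = []
        for row in csv.reader(io.StringIO(csv_text)):
            if len(row) != 5:
                continue  # skip blank or malformed lines
            start, end, code2, _code3, _name = row
            rows.append((int(start), int(end), code2))
        rows.sort()
        self._starts = [row[0] for row in rows]
        self._rows = rows

    def lookup(self, ip):
        """Return the country code for a dotted-quad address, or None."""
        value = int(ipaddress.IPv4Address(ip))
        # Find the last range whose start is <= value.
        i = bisect.bisect_right(self._starts, value) - 1
        if i >= 0 and value <= self._rows[i][1]:
            return self._rows[i][2]
        return None
```

A lookup that returns None corresponds to an address that is missing from the database.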
3. What we'd like to solve
Goal #1a: Tor relays collect IP-to-country user stats and publish
sanitized versions.
Goal #1b: Tor bridges collect IP-to-country user stats and publish
sanitized versions.
Goal #2a: Vidalia learns IP-to-city stats for Tor relays, for better
mapping.
Goal #2b: Vidalia learns IP-to-country stats for Tor relays, so the user
can pick countries for her paths.
Goal #3: Vidalia doesn't do external lookups on bridge relay addresses.
Goal #4: Vidalia resolves the Tor client's IP-to-country or IP-to-city
for better mapping.
Goal #5: Reduce partitioning opportunities where Vidalia central
servers can give different (distinguishing) responses.
4. Solution overview
Our goal is to allow Tor relays, bridges, and clients to learn enough
GeoIP information so they can do local private queries.
4.1. The IP-to-country db
Directory authorities should publish a "geoip" file that contains
IP-to-country mappings. Directory caches will mirror it, and Tor clients
and relays (including bridge relays) will fetch it. Thus we can solve
goals 1a and 1b (publish sanitized usage info). Controllers could also
use this to solve goal 2b (choosing path by country attributes). It
also solves goal 4 (learning the Tor client's country), though for
huge countries like the US we'd still need to decide where the "middle"
should be when we're mapping that address.
The IP-to-country details are described further in Sections 5 and
6 below.
[RESOLUTION: The geoip file in 0.2.0.x is not distributed through
Tor. Instead, it is shipped with the bundle.]
4.2. The IP-to-city db
In an ideal world, the IP-to-city db would be small enough that we
could distribute it in the above manner too. But for now, it is too
large. Here's where the design choice forks.
Option A: Vidalia should continue doing its anonymized IP-to-city
queries. Thus we can achieve goals 2a and 2b. We would solve goal
3 by only doing lookups on descriptors that are purpose "general"
(see Section 4.2.1 for how). We would leave goal 5 unsolved.
Option B: Each directory authority should keep an IP-to-city db,
look up the value for each router it lists, and include that line in
the router's network-status entry. The network-status consensus would
then use the line that appears in the majority of votes. This approach
also solves goals 2a and 2b, goal 3 (Vidalia doesn't do any lookups
at all now), and goal 5 (reduced partitioning risks).
Option B has the advantage that Vidalia can simplify its operation,
and the advantage that this consensus IP-to-city data is available to
other controllers besides just Vidalia. But it has the disadvantage
that the networkstatus consensus becomes larger, even though most of
the GeoIP information won't change from one consensus to the next. Is
there another reasonable location for it that can provide similar
consensus security properties?
[RESOLUTION: IP-to-city is not supported.]
4.2.1. Controllers can query for router annotations
Vidalia needs to stop doing queries on bridge relay IP addresses.
It could do that by only doing lookups on descriptors that are in
the networkstatus consensus, but that precludes designs like Blossom
that might want to map its relay locations. The best answer is that it
should learn the router annotations, with a new controller 'getinfo'
command:
"GETINFO desc-annotations/id/<OR identity>"
which would respond with something like
@downloaded-at 2007-11-29 08:06:38
@source "128.31.0.34"
@purpose bridge
[We could also make the answer include the digest for the router in
question, which would enable us to ask GETINFO router-annotations/all.
Is this worth it? -RD]
Then Vidalia can avoid doing lookups on descriptors with purpose
"bridge". Even better would be to add a new annotation "@private true"
so Vidalia can know how to handle new purposes that we haven't created
yet. Vidalia could special-case "bridge" for now, for compatibility
with the current 0.2.0.x-alphas.
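A controller could act on such a reply roughly as follows. This is only a sketch of the idea: the reply format is the proposed one shown above, "@private true" is the suggested (not yet existing) annotation, and treating a missing purpose as "general" is an assumption. The function names are hypothetical.

```python
def parse_annotations(reply_text):
    """Turn '@key value' lines into a dict, dropping surrounding quotes."""
    annotations = {}
    for line in reply_text.splitlines():
        line = line.strip()
        if not line.startswith("@"):
            continue
        key, _, value = line[1:].partition(" ")
        annotations[key] = value.strip().strip('"')
    return annotations


def may_lookup_externally(annotations):
    """Decide whether an external GeoIP lookup on this router is acceptable."""
    if annotations.get("private") == "true":
        return False
    # Special-case "bridge" for compatibility with current 0.2.0.x-alphas.
    return annotations.get("purpose", "general") != "bridge"
```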
4.3. Recommendation
My overall recommendation is that we should implement 4.1 soon
(e.g. early in 0.2.1.x), and we can go with 4.2 option A for now,
with the hope that later we discover a better way to distribute the
IP-to-city info and can switch to 4.2 option B.
Below we discuss more how to go about achieving 4.1.
5. Publishing and caching the GeoIP (IP-to-country) database
Each v3 directory authority should put a copy of the "geoip" file in
its datadirectory. Then its network-status votes should include a hash
of this file (Recommended-geoip-hash: %s), and the resulting consensus
directory should specify the consensus hash.
There should be a new URL for fetching this geoip db (by "current.z"
for testing purposes, and by hash.z for typical downloads). Authorities
should fetch and serve the one listed in the consensus, even when they
vote for their own. This would argue for storing the cached version
in a better filename than "geoip".
Directory mirrors should keep a copy of this file available via the
same URLs.
We assume that the file would change at most a few times a month. Should
Tor ship with a bootstrap geoip file? An out-of-date geoip file may
open you up to partitioning attacks, but for the most part it won't
be that different.
There should be a config option to disable updating the geoip file,
in case users want to use their own file (e.g. they have a proprietary
GeoIP file they prefer to use). In that case we leave it up to the
user to update his geoip file out-of-band.
[XXX Should consider forward/backward compatibility, e.g. if we want
to move to a new geoip file format. -RD]
[RESOLUTION: Not done over Tor.]
6. Controllers use the IP-to-country db for mapping and for path building
Down the road, Vidalia could use the IP-to-country mappings for placing
on its map:
- The location of the client
- The location of the bridges, or other relays not in the
networkstatus, on the map.
- Any relays that it doesn't yet have an IP-to-city answer for.
Other controllers can also use it to set EntryNodes, ExitNodes, etc
in a per-country way.
To support these features, we need to export the IP-to-country data
via the Tor controller protocol.
Is it sufficient just to add a new GETINFO command?
GETINFO ip-to-country/128.31.0.34
250+ip-to-country/128.31.0.34="US","USA","UNITED STATES"
[RESOLUTION: Not done now, except for the getinfo command.]
6.1. Other interfaces
Robert Hogan has also suggested a
GETINFO relays-by-country/cn
as well as torrc options for ExitCountryCodes, EntryCountryCodes,
ExcludeCountryCodes, etc.
[RESOLUTION: Not implemented in 0.2.0.x. Fodder for a future proposal.]
7. Relays and bridges use the IP-to-country db for usage summaries
Once bridges have a GeoIP database locally, they can start to publish
sanitized summaries of client usage -- how many users they see and from
what countries. This might also be a more useful way for ordinary Tor
relays to convey the level of usage they see, which would allow us to
switch to using directory guards for all users by default.
But how to safely summarize this information without opening too many
anonymity leaks?
7.1. Attacks to think about
First, note that we need to have a large enough time window that we're
not aiding correlation attacks much. I hope 24 hours is enough. So
that means no publishing stats until you've been up at least 24 hours.
And you can't publish follow-up stats more often than every 24 hours,
or people could look at the differential.
Second, note that we need to be sufficiently vague about the IP
addresses we're reporting. We are hoping that just specifying the
country will be vague enough. But a) what about active attacks where
we convince a bridge to use a GeoIP db that labels each suspect IP
address as a unique country? We have to assume that the consensus GeoIP
db won't be malicious in this way. And b) could such singling-out
attacks occur naturally, for example because of countries that have
a very small IP space? We should investigate that.
7.2. Granularity of users
Do we only want to report countries that have a sufficient anonymity set
(that is, number of users) for the day? For example, we might avoid
listing any countries that have seen fewer than five addresses over
the 24 hour period. This approach would be helpful in reducing the
singling-out opportunities -- in the extreme case, we could imagine a
situation where one blogger from the Sudan used Tor on a given day, and
we can discover which entry guard she used.
But I fear that especially for bridges, seeing only one hit from a
given country in a given day may be quite common.
As a compromise, we should start out with an "Other" category in
the reported stats, which is the sum of unlisted countries; if that
category is consistently interesting, we can think harder about how
to get the right data from it safely.
But note that bridge summaries will not be made public individually,
since doing so would help people enumerate bridges. Whereas summaries
from normal relays will be public. So perhaps that means we can afford
to be more specific in bridge summaries? In particular, I'm thinking the
"other" category should be used by public relays but not for bridges
(or if it is, used with a lower threshold).
Even for countries that have many Tor users, we might not want to be
too specific about how many users we've seen. For example, we might
round down the number of users we report to the nearest multiple of 5.
My instinct for now is that this won't be that useful.
7.3. Other issues
Another note: we'll likely be overreporting in the case of users with
dynamic IP addresses: if they rotate to a new address over the course
of the day, we'll count them twice. So be it.
7.4. Where to publish the summaries?
We designed extrainfo documents for information like this. So they
should just be more entries in the extrainfo doc.
But if we want to publish summaries every 24 hours (no more often,
no less often), aren't we tied to the router descriptor publishing
schedule? That is, if we publish a new router descriptor at the 18
hour mark, and nothing much has changed at the 24 hour mark, won't
the new descriptor get dropped as being "cosmetically similar", and
then nobody will know to ask about the new extrainfo document?
One solution would be to make and remember the 24 hour summary at the
24 hour mark, but not actually publish it anywhere until we happen to
publish a new descriptor for other reasons. If we happen to go down
before publishing a new descriptor, then so be it, at least we tried.
7.5. What if the relay is unreachable or goes to sleep?
Even if you've been up for 24 hours, if you were hibernating for 18
of them, then we're not getting as much fuzziness as we'd like. So
I guess that means that we need a 24-hour period of being "awake"
before we're willing to publish a summary. A similar attack works if
you've been awake but unreachable for the first 18 of the 24 hours. As
another example, a bridge that's on a laptop might be suspended for
some of each day.
This implies that some relays and bridges will never publish summary
stats, because they're not ever reliably working for 24 hours in
a row. If a significant percentage of our reporters end up being in
this boat, we should investigate whether we can accumulate 24 hours of
"usefulness", even if there are holes in the middle, and publish based
on that.
What other issues are like this? It seems that just moving to a new
IP address shouldn't be a reason to cancel stats publishing, assuming
we were usable at each address.
7.6. IP addresses that aren't in the geoip db
Some IP addresses aren't in the public geoip databases. In particular,
I've found that a lot of African countries are missing, but there
are also some common ones in the US that are missing, like parts of
Comcast. We could just lump unknown IP addresses into the "other"
category, but it might be useful to gather a general sense of how many
lookups are failing entirely, by adding a separate "Unknown" category.
We could also contribute back to the geoip db, by letting bridges set
a config option to report the actual IP addresses that failed their
lookup. Then the bridge authority operators can manually make sure
the correct answer will be in later geoip files. This config option
should be disabled by default.
7.7. Bringing it all together
So here's the plan:
24 hours after starting up (modulo Section 7.5 above), bridges and
relays should construct a daily summary of client countries they've
seen, including the above "Unknown" category (Section 7.6) as well.
Non-bridge relays lump all countries with fewer than K (e.g. K=5) users
into the "Other" category (see Sec 7.2 above), whereas bridge relays are
willing to list a country even when it has only one user for the day.
Whenever we have a daily summary on record, we include it in our
extrainfo document whenever we publish one. The daily summary we
remember locally gets replaced with a newer one when another 24
hours pass.
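The summary step of this plan might look like the sketch below. It takes one country code per distinct client address seen in the period (None where the GeoIP lookup failed) and applies the "Other" threshold only for non-bridge relays. Keeping "Unknown" out of the "Other" bucket even when it is below K is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def daily_summary(countries, is_bridge, k=5):
    """Build a country -> user count summary for one 24-hour period."""
    counts = Counter()
    for country in countries:
        counts[country if country is not None else "Unknown"] += 1
    if is_bridge:
        # Bridges list a country even if it has only one user.
        return dict(counts)
    summary = {}
    other = 0
    for country, n in counts.items():
        if country != "Unknown" and n < k:
            other += n  # too few users: fold into "Other"
        else:
            summary[country] = n
    if other:
        summary["Other"] = other
    return summary
```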
7.8. Some forward secrecy
How should we remember addresses locally? If we convert them into
country-codes immediately, we will count them again if we see them
again. On the other hand, we don't really want to keep a list hanging
around of all IP addresses we've seen in the past 24 hours.
Step one is that we should never write this stuff to disk. Keeping it
only in ram will make things somewhat better. Step two is to avoid
keeping any timestamps associated with it: rather than a rolling
24-hour window, which would require us to remember the various times
we've seen that address, we can instead just throw out the whole list
every 24 hours and start over.
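A minimal sketch of these two steps: an in-memory set with no per-address timestamps, discarded as a whole once 24 hours have passed. The class name is hypothetical, and the caller is assumed to pass the current time in seconds.

```python
class SeenAddresses:
    """RAM-only record of client addresses seen in the current period."""

    PERIOD = 24 * 60 * 60  # seconds

    def __init__(self, now):
        self._period_start = now  # one timestamp for the whole list
        self._seen = set()

    def note(self, address, now):
        """Record an address; return True if it is new in this period."""
        if now - self._period_start >= self.PERIOD:
            # Throw out the whole list and start over.
            self._seen.clear()
            self._period_start = now
        if address in self._seen:
            return False
        self._seen.add(address)
        return True
```

Nothing here is written to disk, and no time is stored per address, so a snapshot of memory reveals which addresses were seen in the current period but not when.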
We could hash the addresses, and then compare hashes when deciding if
we've seen a given address before. We could even do keyed hashes. Or
Bloom filters. But if our goal is to defend against an adversary
who steals a copy of our ram while we're running and then does
guess-and-check on whatever blob we're keeping, we're in bad shape.
We could drop the last octet of the IP address as soon as we see
it. That would cause us to undercount some users from cablemodem and
DSL networks that have a high density of Tor users. And it wouldn't
really help that much -- indeed, the extent to which it does help is
exactly the extent to which it makes our stats less useful.
Other ideas?

View File

@ -1,157 +0,0 @@
Filename: 127-dirport-mirrors-downloads.txt
Title: Relaying dirport requests to Tor download site / website
Version: $Revision$
Last-Modified: $Date$
Author: Roger Dingledine
Created: 2007-12-02
Status: Draft
1. Overview
Some countries and networks block connections to the Tor website. As
time goes by, this will remain a problem and it may even become worse.
We have a big pile of mirrors (google for "Tor mirrors"), but few of
our users think to try a search like that. Also, many of these mirrors
might be automatically blocked since their pages contain words that
might cause them to get banned. And lastly, we can imagine a future
where the blockers are aware of the mirror list too.
Here we describe a new set of URLs for Tor's DirPort that will relay
connections from users to the official Tor download site. Rather than
trying to cache a bunch of new Tor packages (which is a hassle in terms
of keeping them up to date, and a hassle in terms of drive space used),
we instead just proxy the requests directly to Tor's /dist page.
Specifically, we should support
GET /tor/dist/$1
and
GET /tor/website/$1
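To make the mapping concrete, here is a sketch of how a DirPort request path could be translated to an upstream URL. The upstream base URLs are assumptions chosen for illustration (the proposal only says "Tor's /dist page"), and the function name is hypothetical. How the request is then carried to the upstream site is discussed in the sections below.

```python
from typing import Optional

# Hypothetical upstream locations; not specified by this proposal.
UPSTREAM = {
    "/tor/dist/": "https://www.torproject.org/dist/",
    "/tor/website/": "https://www.torproject.org/",
}


def upstream_url(request_path: str) -> Optional[str]:
    """Map a DirPort request path to the URL to proxy, or None."""
    for prefix, base in UPSTREAM.items():
        if request_path.startswith(prefix):
            rest = request_path[len(prefix):]
            # Refuse attempts to climb out of the mirrored tree.
            if rest.startswith("/") or ".." in rest.split("/"):
                return None
            return base + rest
    return None  # not a mirror request
```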
2. Direct connections, one-hop circuits, or three-hop circuits?
We could relay the connections directly to the download site -- but
this produces recognizable outgoing traffic on the bridge or cache's
network, which will probably surprise our nice volunteers. (Is this
a good enough reason to discard the direct connection idea?)
Even if we don't do direct connections, should we do a one-hop
begindir-style connection to the mirror site (make a one-hop circuit
to it, then send a 'begindir' cell down the circuit), or should we do
a normal three-hop anonymized connection?
If these mirrors are mainly bridges, doing either a direct or a one-hop
connection creates another way to enumerate bridges. That would argue
for three-hop. On the other hand, downloading a 10+ megabyte installer
through a normal Tor circuit can't be fun. But if you're already getting
throttled a lot because you're in the "relayed traffic" bucket, you're
going to have to accept a slow transfer anyway. So three-hop it is.
Speaking of which, we would want to label this connection
as "relay" traffic for the purposes of rate limiting; see
connection_counts_as_relayed_traffic() and or_conn->client_used. This
will be a bit tricky though, because these connections will use the
bridge's guards.
3. Scanning resistance
One other goal we'd like to achieve, or at least not hinder, is making
it hard to scan large swaths of the Internet to look for responses
that indicate a bridge.
In general this is a really hard problem, so we shouldn't demand to
solve it here. But we can note that some bridges should open their
DirPort (and offer this functionality), and others shouldn't. Then
some bridges provide a download mirror while others can remain
scanning-resistant.
4. Integrity checking
If we serve this stuff in plaintext from the bridge, anybody in between
the user and the bridge can intercept and modify it. The bridge can too.
If we do an anonymized three-hop connection, the exit node can also
intercept and modify the exe it sends back.
Are we setting ourselves up for rogue exit relays, or rogue bridges,
that trojan our users?
Answer #1: Users need to do pgp signature checking. Not a very good
answer, a) because it's complex, and b) because they don't know the
right signing keys in the first place.
Answer #2: The mirrors could exit from a specific Tor relay, using the
'.exit' notation. This would make connections a bit more brittle, but
would resolve the rogue exit relay issue. We could even round-robin
among several, and the list could be dynamic -- for example, all the
relays with an Authority flag that allow exits to the Tor website.
Answer #3: The mirrors should connect to the main distribution site
via SSL. That way the exit relay can't influence anything.
Answer #4: We could suggest that users only use trusted bridges for
fetching a copy of Tor. Hopefully they heard about the bridge from a
trusted source rather than from the adversary.
Answer #5: What if the adversary is trawling for Tor downloads by
network signature -- either by looking for known bytes in the binary,
or by looking for "GET /tor/dist/"? It would be nice to encrypt the
connection from the bridge user to the bridge. And we can! The bridge
already supports TLS. Rather than initiating a TLS renegotiation after
connecting to the ORPort, the user should actually request a URL. Then
the ORPort can either pass the connection off as a linked conn to the
dirport, or renegotiate and become a Tor connection, depending on how
the client behaves.
5. Linked connections: at what level should we proxy?
Check out the connection_ap_make_link() function, as called from
directory.c. Tor clients use this to create a "fake" socks connection
back to themselves, and then they attach a directory request to it,
so they can launch directory fetches via Tor. We can piggyback on
this feature.
We need to decide if we're going to be passing the bytes back and
forth between the web browser and the main distribution site, or if
we're going to be actually acting like a proxy (parsing out the file
they want, fetching that file, and serving it back).
Advantages of proxying without looking inside:
- We don't need to build any sort of http support (including
continues, partial fetches, etc etc).
Disadvantages:
- If the browser thinks it's speaking http, are there easy ways
to pass the bytes to an https server and have everything work
correctly? At the least, it would seem that the browser would
complain about the cert. More generally, ssl wants to be negotiated
before the URL and headers are sent, yet we need to read the URL
and headers to know that this is a mirror request; so we have an
ordering problem here.
- Makes it harder to do caching later on, if we don't look at what
we're relaying. (It might be useful down the road to cache the
answers to popular requests, so we don't have to keep getting
them again.)
6. Outstanding problems
1) HTTP proxies already exist. Why waste our time cloning one
badly? When we clone existing stuff, we usually regret it.
2) It's overbroad. We only seem to need a secure get-a-tor feature,
and instead we're contemplating building a locked-down HTTP proxy.
3) It's going to add a fair bit of complexity to our code. We do
not currently implement HTTPS. We'd need to refactor lots of the
low-level connection stuff so that "SSL" and "Cell-based" were no
longer synonymous.
4) It's still unclear how effective this proposal would be in
practice. You need to know that this feature exists, which means
somebody needs to tell you about a bridge (mirror) address and tell
you how to use it. And if they're doing that, they could (e.g.) tell
you about a gmail autoresponder address just as easily, and then you'd
get better authentication of the Tor program to boot.


@ -1,66 +0,0 @@
Filename: 128-bridge-families.txt
Title: Families of private bridges
Version: $Revision$
Last-Modified: $Date$
Author: Roger Dingledine
Created: 2007-12-xx
Status: Dead
1. Overview
Proposal 125 introduced the basic notion of how bridge authorities,
bridge relays, and bridge users should behave. But it doesn't get into
the various mechanisms of how to distribute bridge relay addresses to
bridge users.
One of the mechanisms we have in mind is called 'families of bridges'.
If a bridge user knows about only one private bridge, and that bridge
shuts off for the night or gets a new dynamic IP address, the bridge
user is out of luck and needs to re-bootstrap manually or wait and
hope it comes back. On the other hand, if the bridge user knows about
a family of bridges, then as long as one of those bridges is still
reachable his Tor client can automatically learn about where the
other bridges have gone.
So in this design, a single volunteer could run multiple coordinated
bridges, or a group of volunteers could each run a bridge. We abstract
out the details of how these volunteers find each other and decide to
set up a family.
2. Other notes.
somebody needs to run a bridge authority
it needs to have a torrc option to publish networkstatuses of its bridges
it should also do reachability testing just of those bridges
people ask for the bridge networkstatus by asking for a url that
contains a password. (it's safe to do this because of begin_dir.)
so the bridge users need to know a) a password, and b) a bridge
authority line.
the bridge users need to know the bridge authority line.
the bridge authority needs to know the password.
3. Current state
I implemented a BridgePassword config option. Bridge authorities
should set it, and users who want to use those bridge authorities
should set it.
Now there is a new directory URL "/tor/networkstatus-bridges" that
directory mirrors serve if BridgeAuthoritativeDir is set and it's a
begin_dir connection. It looks for the header
Authorization: Basic %s
where %s is the base-64 bridge password.
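As a rough illustration of the header described above, the sketch below builds and checks the "Authorization: Basic %s" line. It assumes the token is simply the base-64 encoding of the password bytes, as the text says; the helper names are hypothetical and are not part of Tor.

```python
import base64


def bridge_auth_header(bridge_password: str) -> str:
    """Build the header line a client would send (hypothetical helper)."""
    # Assumption: the token is base64(password), per the wording above,
    # not the base64("user:password") form of ordinary HTTP Basic auth.
    token = base64.b64encode(bridge_password.encode("utf-8")).decode("ascii")
    return "Authorization: Basic " + token


def header_matches(header_line: str, bridge_password: str) -> bool:
    """Check a received header line against the configured password."""
    return header_line.strip() == bridge_auth_header(bridge_password)
```

A bridge authority would compare the received header against the value derived from its own BridgePassword setting before serving the URL.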
I never got around to teaching clients how to set the header though,
so it may or may not work, and may or may not do what we ultimately want.
I've marked this proposal dead; it really never should have left the
ideas/ directory. Somebody should pick it up sometime and finish the
design and implementation.


@ -1,116 +0,0 @@
Filename: 129-reject-plaintext-ports.txt
Title: Block Insecure Protocols by Default
Version: $Revision$
Last-Modified: $Date$
Author: Kevin Bauer & Damon McCoy
Created: 2008-01-15
Status: Closed
Implemented-In: 0.2.0.x
Overview:
Below is a proposal to mitigate insecure protocol use over Tor.
This document 1) demonstrates the extent to which insecure protocols are
currently used within the Tor network, and 2) proposes a simple solution
to prevent users from unknowingly using these insecure protocols. By
insecure, we consider protocols that explicitly leak sensitive user names
and/or passwords, such as POP, IMAP, Telnet, and FTP.
Motivation:
As part of a general study of Tor use in 2006/2007 [1], we attempted to
understand what types of protocols are used over Tor. While we observed an
enormous volume of Web and Peer-to-peer traffic, we were surprised by the
number of insecure protocols that were used over Tor. For example, over an
8 day observation period, we observed the following number of connections
over insecure protocols:
POP and IMAP: 10,326 connections
Telnet: 8,401 connections
FTP: 3,788 connections
Each of the above listed protocols exchanges user name and password
information in plain-text. As an upper bound, we could have observed
22,515 user names and passwords. This observation echoes the reports of
a Tor router logging and posting e-mail passwords in August 2007 [2]. The
response from the Tor community has been to further educate users
about the dangers of using insecure protocols over Tor. However, we
recently repeated our Tor usage study from last year and noticed that the
trend in insecure protocol use has not declined. Therefore, we propose that
additional steps be taken to protect naive Tor users from inadvertently
exposing their identities (and even passwords) over Tor.
Security Implications:
This proposal is intended to improve Tor's security by limiting the
use of insecure protocols.
Roger added: By adding these warnings for only some of the risky
behavior, users may do other risky behavior, not get a warning, and
believe that it is therefore safe. But overall, I think it's better
to warn for some of it than to warn for none of it.
Specification:
As an initial step towards mitigating the use of the above-mentioned
insecure protocols, we propose that the default ports for each respective
insecure service be blocked at the Tor client's socks proxy. These default
ports include:
23 - Telnet
109 - POP2
110 - POP3
143 - IMAP
Notice that FTP is not included in the proposed list of ports to block. This
is because FTP is often used anonymously, i.e., without any identifying
user name or password.
This blocking scheme can be implemented as a set of flags in the client's
torrc configuration file:
BlockInsecureProtocols 0|1
WarnInsecureProtocols 0|1
When the warning flag is activated, a message should be displayed to
the user similar to the message given when Tor's socks proxy is given an IP
address rather than resolving a host name.
We recommend that the default torrc configuration file block insecure
protocols and provide a warning to the user to explain the behavior.
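The port check proposed above could be sketched roughly as follows. The port list and the two flags come from this proposal; the function name, return shape, and warning text are assumptions made for illustration only.

```python
# Default ports listed in the Specification section above.
INSECURE_PORTS = {23: "Telnet", 109: "POP2", 110: "POP3", 143: "IMAP"}


def handle_socks_request(port, block_insecure=True, warn_insecure=True):
    """Return (allowed, warning) for a SOCKS request to `port`.

    `block_insecure` and `warn_insecure` stand in for the proposed
    BlockInsecureProtocols and WarnInsecureProtocols options.
    """
    protocol = INSECURE_PORTS.get(port)
    if protocol is None:
        return True, None  # not on the list, e.g. FTP on port 21
    warning = None
    if warn_insecure:
        warning = (f"Port {port} ({protocol}) usually carries "
                   "passwords in plaintext.")
    return (not block_insecure), warning
```

With both flags at their recommended defaults, a request to port 23 is refused and a warning is produced, while port 21 passes through untouched.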
Finally, there are many popular web pages that do not offer secure
login features, such as MySpace, and it would be prudent to provide
additional rules to Privoxy to attempt to protect users from unknowingly
submitting their login credentials in plain-text.
Compatibility:
None, as the proposed changes are to be implemented in the client.
References:
[1] Shining Light in Dark Places: A Study of Anonymous Network Usage.
University of Colorado Technical Report CU-CS-1032-07. August 2007.
[2] Rogue Nodes Turn Tor Anonymizer Into Eavesdropper's Paradise.
http://www.wired.com/politics/security/news/2007/09/embassy_hacks.
Wired. September 10, 2007.
Implementation:
Roger added this feature in
http://archives.seul.org/or/cvs/Jan-2008/msg00182.html
He also added a status event for Vidalia to recognize attempts to use
vulnerable-plaintext ports, so it can help the user understand what's
going on and how to fix it.
Next steps:
a) Vidalia should learn to recognize this controller status event,
so we don't leave users out in the cold when we enable this feature.
b) We should decide which ports to reject by default. The current
consensus is 23,109,110,143 -- the same set that we warn for now.


@ -1,186 +0,0 @@
Filename: 130-v2-conn-protocol.txt
Title: Version 2 Tor connection protocol
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson
Created: 2007-10-25
Status: Closed
Implemented-In: 0.2.0.x
Overview:
This proposal describes the significant changes to be made in the v2
Tor connection protocol.
This proposal relates to other proposals as follows:
It refers to and supersedes:
Proposal 124: Blocking resistant TLS certificate usage
It refers to aspects of:
Proposal 105: Version negotiation for the Tor protocol
In summary, the Tor connection protocol has been in need of a redesign
for a while. This proposal describes how we can add to the Tor
protocol:
- A new TLS handshake (to achieve blocking resistance without
breaking backward compatibility)
- Version negotiation (so that future connection protocol changes
can happen without breaking compatibility)
- The actual changes in the v2 Tor connection protocol.
Motivation:
For motivation, see proposal 124.
Proposal:
0. Terminology
The version of the Tor connection protocol implemented up to now is
"version 1". This proposal describes "version 2".
"Old" or "Older" versions of Tor are ones not aware that version 2
of this protocol exists;
"New" or "Newer" versions are ones that are.
The connection initiator is referred to below as the Client; the
connection responder is referred to below as the Server.
1. The revised TLS handshake.
For motivation, see proposal 124. This is a simplified version of the
handshake that uses TLS's renegotiation capability in order to avoid
some of the extraneous steps in proposal 124.
The Client connects to the Server and, as in ordinary TLS, sends a
list of ciphers. Older versions of Tor will send only ciphers from
the list:
TLS_DHE_RSA_WITH_AES_256_CBC_SHA
TLS_DHE_RSA_WITH_AES_128_CBC_SHA
SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA
SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA
Clients that support the revised handshake will send the recommended
list of ciphers from proposal 124, in order to emulate the behavior of
a web browser.
If the server notices that the list of ciphers contains only ciphers
from this list, it proceeds with Tor's version 1 TLS handshake as
documented in tor-spec.txt.
(The server may also notice cipher lists used by other implementations
of the Tor protocol (in particular, the BouncyCastle default cipher
list as used by some Java-based implementations), and whitelist them.)
On the other hand, if the server sees a list of ciphers that could not
have been sent from an older implementation (because it includes other
ciphers, and does not match any known-old list), the server sends a
reply containing a single connection certificate, constructed as for
the link certificate in the v1 Tor protocol. The subject names in
this certificate SHOULD NOT have any strings to identify them as
coming from a Tor server. The server does not ask the client for
certificates.
Old Servers will (mostly) ignore the cipher list and respond as in the v1
protocol, sending back a two-certificate chain.
After the Client gets a response from the server, it checks for the
number of certificates it received. If there are two certificates,
the client assumes a V1 connection and proceeds as in tor-spec.txt.
But if there is only one certificate, the client assumes a V2 or later
protocol and continues.
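The two decisions described so far (the server looking at the cipher list, the client counting certificates) can be sketched as below. This is only a model of the decision logic under the rules stated above, not of the TLS machinery; the function names and the `known_old_lists` parameter are hypothetical.

```python
# The cipher list sent by older (v1-only) versions of Tor, from the text above.
V1_CIPHERS = frozenset({
    "TLS_DHE_RSA_WITH_AES_256_CBC_SHA",
    "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
    "SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA",
    "SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA",
})


def server_handshake_version(client_ciphers, known_old_lists=()):
    """Server side: pick the v1 or v2 handshake from the offered ciphers."""
    offered = tuple(client_ciphers)
    if set(offered) <= V1_CIPHERS:
        return 1  # only ciphers an older Tor would send
    if any(offered == tuple(old) for old in known_old_lists):
        return 1  # whitelisted list from another v1 implementation
    return 2


def client_handshake_version(num_server_certs):
    """Client side: infer the handshake from the certificate count."""
    if num_server_certs == 2:
        return 1
    if num_server_certs == 1:
        return 2  # "V2 or later"; the client goes on to renegotiate
    raise ValueError("unexpected certificate chain length")
```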
At this point, the client has established a TLS connection with the
server, but the parties have not been authenticated: the server hasn't
sent its identity certificate, and the client hasn't sent any
certificates at all. To fix this, the client begins a TLS session
renegotiation. This time, the server continues with two certificates
as usual, and asks for certificates so that the client will send
certificates of its own. Because the TLS connection has been
established, all of this is encrypted. (The certificate sent by the
server in the renegotiated connection need not be the same as that
sent in the original connection.)
The server MUST NOT write any data until the client has renegotiated.
Once the renegotiation is finished, the server and client check one
another's certificates as in V1. Now they are mutually authenticated.
1.1. Revised TLS handshake: implementation notes.
It isn't so easy to adjust server behavior based on the client's
ciphersuite list. Here's how we can do it using OpenSSL. This is a
bit of an abuse of the OpenSSL APIs, but it's the best we can do, and
we won't have to do it forever.
We can use OpenSSL's SSL_set_info_callback() to register a function to
be called when the state changes. The type/state tuple of
SSL_CB_ACCEPT_LOOP/SSL3_ST_SW_SRVR_HELLO_A
happens when we have completely parsed the client hello, and are about
to send a response. From this callback, we can check the cipherlist
and act accordingly:
* If the ciphersuite list indicates a v1 protocol, we set the
verify mode to SSL_VERIFY_PEER with a callback (so we get
certificates).
* If the ciphersuite list indicates a v2 protocol, we set the
verify mode to SSL_VERIFY_NONE with no callback (so we get
no certificates) and set the SSL_MODE_NO_AUTO_CHAIN flag (so that
we send only 1 certificate in the response).
Once the handshake is done, the server clears the
SSL_MODE_NO_AUTO_CHAIN flag and sets the callback as for the V1
protocol. It then starts reading.
The other problem to take care of is missing ciphers and OpenSSL's
cipher sorting algorithms. The two main issues are a) OpenSSL doesn't
support some of the default ciphers that Firefox advertises, and b)
OpenSSL sorts the list of ciphers it offers in a different way than
Firefox sorts them, so unless we fix that Tor will still look different
than Firefox.
[XXXX more on this.]
1.2. Compatibility for clients using libraries less hackable than OpenSSL.
As discussed in proposal 105, servers advertise which protocol
versions they support in their router descriptors. Clients can simply
behave as v1 clients when connecting to servers that do not support
link version 2 or higher, and as v2 clients when connecting to servers
that do support link version 2 or higher.
(Servers can't use this strategy because we do not assume that servers
know one another's capabilities when connecting.)
2. Version negotiation.
Version negotiation proceeds as described in proposal 105, except as
follows:
* Version negotiation only happens if the TLS handshake as described
above completes.
* The TLS renegotiation must be finished before the client sends a
VERSIONS cell; the server sends its VERSIONS cell in response.
* The VERSIONS cell uses the following variable-width format:
Circuit [2 octets; set to 0]
Command [1 octet; set to 7 for VERSIONS]
Length [2 octets; big-endian]
Data [Length bytes]
The Data in the cell is a series of big-endian two-byte integers.
* It is not allowed to negotiate V1 connections once the v2 protocol
has been used. If this happens, Tor instances should close the
connection.
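The variable-width VERSIONS cell layout given above can be illustrated with a small encoder and decoder. The field sizes and the command value 7 come from the list above; the function names are hypothetical and the error handling is a simplification.

```python
import struct

VERSIONS_COMMAND = 7   # from the cell format above
HEADER = ">HBH"        # Circuit (2 octets), Command (1), Length (2), big-endian


def encode_versions_cell(versions):
    """Pack a list of link protocol versions into a VERSIONS cell."""
    data = b"".join(struct.pack(">H", v) for v in versions)
    return struct.pack(HEADER, 0, VERSIONS_COMMAND, len(data)) + data


def decode_versions_cell(cell):
    """Parse a VERSIONS cell and return the list of versions it carries."""
    if len(cell) < 5:
        raise ValueError("cell too short")
    circ_id, command, length = struct.unpack(HEADER, cell[:5])
    if circ_id != 0 or command != VERSIONS_COMMAND:
        raise ValueError("not a VERSIONS cell")
    data = cell[5:5 + length]
    if len(data) != length or length % 2:
        raise ValueError("bad Length field")
    return [v for (v,) in struct.iter_unpack(">H", data)]
```

For example, advertising versions 1 and 2 yields a 9-byte cell: a 5-byte header followed by two 2-byte integers.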
3. The rest of the "v2" protocol
Once a v2 protocol has been negotiated, NETINFO cells are exchanged
as in proposal 105, and communications begin as per tor-spec.txt.
Until NETINFO cells have been exchanged, the connection is not open.


@ -1,150 +0,0 @@
Filename: 131-verify-tor-usage.txt
Title: Help users to verify they are using Tor
Version: $Revision$
Last-Modified: $Date$
Author: Steven J. Murdoch
Created: 2008-01-25
Status: Needs-Revision
Overview:
Websites for checking whether a user is accessing them via Tor are a
very helpful aid to configuring web browsers correctly. Existing
solutions have both false positives and false negatives when
checking if Tor is being used. This proposal will discuss how to
modify Tor so as to make testing more reliable.
Motivation:
Currently deployed websites for detecting Tor use work by comparing
the client IP address for a request with a list of known Tor nodes.
This approach is generally effective, but suffers from both false
positives and false negatives.
If a user has a Tor exit node installed, or just happens to have
been allocated an IP address previously used by a Tor exit node, any
web requests will be incorrectly flagged as coming from Tor. If any
customer of an ISP which implements a transparent proxy runs an exit
node, all other users of the ISP will be flagged as Tor users.
Conversely, if the exit node chosen by a Tor user has not yet been
recorded by the Tor checking website, requests will be incorrectly
flagged as not coming via Tor.
The only reliable way to tell whether Tor is being used or not is for
the Tor client to flag this to the browser.
Proposal:
A DNS name should be registered and point to an IP address
controlled by the Tor project and likely to remain so for the
useful lifetime of a Tor client. A web server should be placed
at this IP address.
Tor should be modified to treat requests to port 80 at the
specified DNS name or IP address specially. Instead of opening a
circuit, it should respond to a HTTP request with a helpful web
page:
- If the request to open a connection was to the domain name, the web
page should state that Tor is working properly.
- If the request was to the IP address, the web page should state
that there is a DNS-leakage vulnerability.
If the request goes through to the real web server, the page
should state that Tor has not been set up properly.
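The three outcomes above reduce to a small decision table, sketched here. The result strings are borrowed from the "Automatic Firefox Notification" extension in this proposal; the function and parameter names are hypothetical.

```python
def check_page(intercepted_by_tor: bool, requested_by_name: bool) -> str:
    """Pick which of the three pages the user ends up seeing.

    intercepted_by_tor: the request reached Tor's SOCKS port at all.
    requested_by_name:  Tor was handed the DNS name rather than the IP.
    """
    if not intercepted_by_tor:
        # The request went to the real web server: Tor is not in use.
        return "failure"
    if requested_by_name:
        return "success"
    # Tor only saw the IP address, so the name was resolved elsewhere.
    return "dnsleak"
```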
Extensions:
Identifying proxy server:
If needed, other applications between the web browser and Tor (e.g.
Polipo and Privoxy) could piggyback on the same mechanism to flag
whether they are in use. All three possible web pages should include
a machine-readable placeholder, into which another program could
insert their own message.
For example, the webpage returned by Tor to indicate a successful
configuration could include the following HTML:
<h2>Connection chain</h2>
<ul>
<li>Tor 0.1.2.14-alpha</li>
<!-- Tor Connectivity Check: success -->
</ul>
When the proxy server observes this string, in response to a request
for the Tor connectivity check web page, it would prepend its own
message, resulting in the following being returned to the web
browser:
<h2>Connection chain</h2>
<ul>
<li>Tor 0.1.2.14-alpha</li>
<li>Polipo version 1.0.4</li>
<!-- Tor Connectivity Check: success -->
</ul>
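A proxy could perform the rewrite shown above with a single string substitution, as in this sketch. The marker text is taken from the example; the function name is hypothetical, and a real proxy would also have to adjust headers such as Content-Length, which is not shown.

```python
import html

MARKER = "<!-- Tor Connectivity Check: success -->"


def add_proxy_entry(page: str, proxy_name: str, marker: str = MARKER) -> str:
    """Insert a list item for this proxy just before the placeholder."""
    if marker not in page:
        return page  # not the check page; leave it untouched
    entry = "<li>" + html.escape(proxy_name) + "</li>\n"
    return page.replace(marker, entry + marker, 1)
```

Because the marker is kept in place, a second proxy further down the chain can apply the same substitution again.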
Checking external connectivity:
If Tor intercepts a request, and returns a response itself, the user
will not actually confirm whether Tor is able to build a successful
circuit. It may then be advantageous to include an image in the web
page which is loaded from a different domain. If this is able to be
loaded then the user will know that external connectivity through
Tor works.
Automatic Firefox Notification:
All forms of the website should return valid XHTML and have a
hidden link with an id attribute "TorCheckResult" and a target
property that can be queried to determine the result. For example,
a hidden link would convey success like this:
<a id="TorCheckResult" target="success" href="/"></a>
failure like this:
<a id="TorCheckResult" target="failure" href="/"></a>
and DNS leaks like this:
<a id="TorCheckResult" target="dnsleak" href="/"></a>
Firefox extensions such as Torbutton would then be able to
issue an XMLHttpRequest for the page and query the result
with resultXML.getElementById("TorCheckResult").target
to automatically report the Tor status to the user when
they first attempt to enable Tor activity, or whenever
they request a check from the extension preferences window.
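Outside a browser, the same lookup can be approximated with a standard HTML parser. The sketch below extracts the "target" attribute of the link with id "TorCheckResult"; it is an illustration in Python rather than the extension code itself, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class _TorCheckParser(HTMLParser):
    """Remember the target of the first <a id="TorCheckResult"> seen."""

    def __init__(self):
        super().__init__()
        self.result = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (self.result is None and tag == "a"
                and attributes.get("id") == "TorCheckResult"):
            self.result = attributes.get("target")


def tor_check_result(page: str):
    """Return "success", "failure", "dnsleak", or None if no link exists."""
    parser = _TorCheckParser()
    parser.feed(page)
    parser.close()
    return parser.result
```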
If the check website is to be themed with heavy graphics and/or
extensive documentation, the check result itself should be
contained in a separate lightweight iframe that extensions can
request via an alternate url.
Security and resiliency implications:
What attacks are possible?
If the IP address used for this feature moves there will be two
consequences:
- A new website at this IP address will remain inaccessible over
Tor
- Tor users who are leaking DNS will be informed that Tor is not
working, rather than that it is active but leaking DNS
We should thus attempt to find an IP address which we reasonably
believe can remain static.
Open issues:
If a Tor version which does not support this extra feature is used,
the webpage returned will indicate that Tor is not being used. Can
this be safely fixed?
Related work:
The proposed mechanism is very similar to config.privoxy.org. The
most significant difference is that if the web browser is
misconfigured, Tor will only get an IP address. Even in this case,
Tor should be able to respond with a webpage to notify the user of how
to fix the problem. This also implies that Tor must be told of the
special IP address, and so must be effectively permanent.


@ -1,147 +0,0 @@
Filename: 132-browser-check-tor-service.txt
Title: A Tor Web Service For Verifying Correct Browser Configuration
Version: $Revision$
Last-Modified: $Date$
Author: Robert Hogan
Created: 2008-03-08
Status: Draft
Overview:
Tor should operate a primitive web service on the loopback network device
that tests the operation of the user's browser, privacy proxy and Tor client.
The tests are performed by serving unique, randomly generated elements in
image URLs embedded in static HTML. The images are only displayed if the DNS
and HTTP requests for them are routed through Tor, otherwise the 'alt' text
may be displayed. The proposal assumes that 'alt' text is not displayed on
all browsers so suggests that text and links should accompany each image
advising the user on next steps in case the test fails.
The service is primarily for the use of controllers, since presumably users
aren't going to want to edit text files and then type something exotic like
127.0.0.1:9999 into their address bar. In the main use case the controller
will have configured the actual port for the webservice so will know where
to direct the request. It would also be the responsibility of the controller
to ensure the webservice is available, and tor is running, before allowing
the user to access the page through their browser.
Motivation:
This is a complementary approach to proposal 131. It overcomes some of the
limitations of the approach described in proposal 131: reliance
on a permanent, real IP address and compatibility with older versions of
Tor. Unlike 131, it is not as useful to Tor users who are not running a
controller.
Objective:
Provide a reliable means of helping users to determine if their Tor
installation, privacy proxy and browser are properly configured for
anonymous browsing.
Proposal:
When configured to do so, Tor should run a basic web service available
on a configured port on 127.0.0.1. The purpose of this web service is to
serve a number of basic test images that will allow the user to determine
if their browser is properly configured and that Tor is working normally.
The service can consist of a single web page with two columns. The left
column contains images, the right column contains advice on what the
display/non-display of each image means.
The rest of this proposal assumes that the service is running on port
9999. The port should be configurable, and configuring the port enables the
service. The service must run on 127.0.0.1.
In all the examples below [uniquesessionid] refers to a random, base64
encoded string that is unique to the URL it is contained in. Tor only ever
stores the most recently generated [uniquesessionid] for each URL, storing 3
in total. Tor should generate a [uniquesessionid] for each of the test URLs
below every time a HTTP GET is received at 127.0.0.1:9999 for index.htm.
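The bookkeeping described above might look like the following sketch: one fresh identifier per test URL on every request for index.htm, with only the latest three kept. The class name, the test names, and the choice of 12 random bytes with URL-safe base64 are assumptions; the proposal only asks for a random base64-encoded string.

```python
import base64
import os

# One identifier per test image described in this proposal.
TEST_URLS = ("dns", "proxy", "connectivity")


class SessionIds:
    """Keep only the most recently generated id for each test URL."""

    def __init__(self):
        self._latest = {}

    def regenerate(self):
        """Call on every GET for index.htm; older ids are discarded."""
        for name in TEST_URLS:
            raw = os.urandom(12)  # 12 bytes encode to 16 chars, no padding
            self._latest[name] = base64.urlsafe_b64encode(raw).decode("ascii")
        return dict(self._latest)

    def is_current(self, name, candidate):
        """True if `candidate` is the id most recently issued for `name`."""
        return self._latest.get(name) == candidate
```

Whether every base64 character is acceptable inside a hostname (as the DNS test needs) is left open here; an implementation may need a more restricted alphabet.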
The most suitable image for each test case is an implementation decision.
Tor will need to store and serve images for the first and second test
images, and possibly the third (see 'Open Issues').
1. DNS Request Test Image
This is a HTML element embedded in the page served by Tor at
http://127.0.0.1:9999:
<IMG src="http://[uniquesessionid]:9999/torlogo.jpg" alt="If you can see
this text, your browser's DNS requests are not being routed through Tor."
width="200" height="200" align="middle" border="2">
If the browser's DNS request for [uniquesessionid] is routed through Tor,
Tor will intercept the request and return 127.0.0.1 as the resolved IP
address. This will shortly be followed by a HTTP request from the browser
for http://127.0.0.1:9999/torlogo.jpg. This request should be served with
the appropriate image.
If the browser's DNS request for [uniquesessionid] is not routed through Tor
the browser may display the 'alt' text specified in the html element. The
HTML served by Tor should also contain text accompanying the image to advise
users what it means if they do not see an image. It should also provide a
link to click that provides information on how to remedy the problem. This
behaviour also applies to the images described in 2. and 3. below, so should
be assumed there as well.
2. Proxy Configuration Test Image
This is a HTML element embedded in the page served by Tor at
http://127.0.0.1:9999:
<IMG src="http://torproject.org/[uniquesessionid].jpg" alt="If you can see
this text, your browser is not configured to work with Tor." width="200"
height="200" align="middle" border="2">
If the HTTP request for the resource [uniquesessionid].jpg is received by
Tor it will serve the appropriate image in response. It should serve this
image itself, without attempting to retrieve anything from the Internet.
If Tor can identify the name of the proxy application requesting the
resource then it could store and serve an image identifying the proxy to the
user.
3. Tor Connectivity Test Image
This is a HTML element embedded in the page served by Tor at
http://127.0.0.1:9999:
<IMG src="http://torproject.org/[uniquesessionid]-torlogo.jpg" alt="If you
can see this text, your Tor installation cannot connect to the Internet."
width="200" height="200" align="middle" border="2">
The referenced image should actually exist on the Tor project website. If
Tor receives the request for the above resource it should remove the random
base64 encoded digest from the request (i.e. [uniquesessionid]-) and attempt
to retrieve the real image.
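Removing the identifier from the requested path is a simple prefix check, sketched below. The function name is hypothetical, and the sketch assumes Tor compares against the identifier it most recently issued for this test.

```python
def strip_session_prefix(path: str, session_id: str):
    """Map "/<id>-torlogo.jpg" to "/torlogo.jpg" for the current id.

    Returns None when the path does not carry the expected identifier,
    in which case the request is treated as ordinary traffic.
    """
    prefix = "/" + session_id + "-"
    if path.startswith(prefix) and len(path) > len(prefix):
        return "/" + path[len(prefix):]
    return None
```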
Even on a fully operational Tor client this test may not always succeed. The
user should be advised that one or more attempts to retrieve this image may
be necessary to confirm a genuine problem.
Open Issues:
The final connectivity test relies on an externally maintained resource; if
this resource becomes unavailable, the connectivity test will always fail.
Either the text accompanying the test should advise of this possibility or
Tor clients should be advised of the location of the test resource in the
main network directory listings.
Any number of misconfigurations may make the web service unreachable; it is
the responsibility of the user's controller to recognize these and assist
the user in eliminating them. Tor can mitigate against the specific
misconfiguration of routing HTTP traffic to 127.0.0.1 to Tor itself by
serving such requests through the SOCKS port as well as the configured web
service port.
Now Tor is inspecting the URLs requested on its SOCKS port and 'dropping'
them. It already inspects for raw IP addresses (to warn of DNS leaks) but
maybe the behaviour proposed here is qualitatively different. Maybe this is
an unwelcome precedent that can be used to beat the project over the head in
future. Or maybe it's not such a bad thing, Tor is merely attempting to make
normally invalid resource requests valid for a given purpose.


@ -1,128 +0,0 @@
Filename: 133-unreachable-ors.txt
Title: Incorporate Unreachable ORs into the Tor Network
Author: Robert Hogan
Created: 2008-03-08
Status: Draft
Overview:
Propose a scheme for harnessing the bandwidth of ORs who cannot currently
participate in the Tor network because they can only make outbound
TCP connections.
Motivation:
Restrictive local and remote firewalls are preventing many willing
candidates from becoming ORs on the Tor network. These
ORs have a casual interest in joining the network but their operators are not
sufficiently motivated or adept to complete the necessary router or firewall
configuration. The Tor network is losing out on their bandwidth. At the
moment we don't even know how many such 'candidate' ORs there are.
Objective:
1. Establish how many ORs are unable to qualify for publication because
they cannot establish that their ORPort is reachable.
2. Devise a method for making such ORs available to clients for circuit
building without prejudicing their anonymity.
Proposal:
ORs whose ORPort reachability testing fails a specified number of
consecutive times should:
1. Enlist themselves with the authorities setting a 'Fallback' flag. This
flag indicates that the OR is up and running but cannot connect to
itself.
2. Open an orconn with all ORs whose fingerprint begins with the same
byte as their own. The management of this orconn will be transferred
entirely to the OR at the other end.
3. The fallback OR should update its router status to contain the
'Running' flag if it has managed to open an orconn with 3/4 of the ORs
with an FP beginning with the same byte as its own.
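The 'Running' condition in the last step can be written down as a threshold test. This is a sketch only: fingerprints are assumed to be hex strings, "3/4 of the ORs" is read as "at least three quarters", and the function names are hypothetical.

```python
def first_byte(fingerprint_hex: str) -> int:
    """First byte of a hex-encoded fingerprint."""
    return int(fingerprint_hex[:2], 16)


def may_claim_running(own_fp, published_fps, connected_fps):
    """True once orconns are open to >= 3/4 of the same-first-byte ORs."""
    peers = [fp for fp in published_fps
             if first_byte(fp) == first_byte(own_fp)]
    if not peers:
        return False  # nobody to connect to, so never 'Running'
    connected = sum(1 for fp in peers if fp in connected_fps)
    # Integer form of connected / len(peers) >= 3/4, avoiding floats.
    return 4 * connected >= 3 * len(peers)
```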
Tor ORs who are contacted by fallback ORs requesting an orconn should:
1. Accept the orconn until they have reached a defined limit of orconn
connections with fallback ORs.
2. Only accept such orconn requests from listed fallback ORs whose
FP begins with the same byte as their own.
Tor clients can include fallback ORs in the network by doing the
following:
1. When building a circuit, observe the fingerprint of each node they
wish to connect to.
2. When randomly selecting a node from the set of all eligible nodes,
add all published, running fallback nodes to the set where the first
byte of the fingerprint matches the previous node in the circuit.
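The client-side selection rule above could be sketched as follows. Fingerprints are assumed to be hex strings and selection is uniform for simplicity (real path selection weights nodes, which is not modeled here); all names are hypothetical.

```python
import random


def first_byte(fingerprint_hex: str) -> int:
    """First byte of a hex-encoded fingerprint."""
    return int(fingerprint_hex[:2], 16)


def candidates_for_next_hop(eligible, fallbacks, previous_fp):
    """Eligible nodes plus the fallback ORs reachable via `previous_fp`.

    `fallbacks` should contain only published, running fallback nodes.
    """
    wanted = first_byte(previous_fp)
    extra = [fp for fp in fallbacks if first_byte(fp) == wanted]
    return list(eligible) + extra


def pick_next_hop(eligible, fallbacks, previous_fp, rng=random):
    """Uniformly pick the next node from the extended candidate set."""
    return rng.choice(candidates_for_next_hop(eligible, fallbacks, previous_fp))
```

A fallback OR only ever joins the candidate set when the previous hop shares its first fingerprint byte, since that hop is one of the nodes holding an orconn to it.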
Anonymity Implications:
At least some, and possibly all, nodes on the network will have a set
of nodes that only they and a few others can build circuits on.
1. This means that fallback ORs might be unsuitable for use as middlemen
nodes, because if the exit node is the attacker it knows that the
number of nodes that could be the entry guard in the circuit is
reduced to roughly 1/256th of the network, or worse 1/256th of all
nodes listed as Guards. For the same reason, fallback nodes would
appear to be unsuitable for two-hop circuits.
2. This is not a problem if fallback ORs are always exit nodes. If
the fallback OR is an attacker it will not be able to reduce the
set of possible nodes for the entry guard any further than a normal,
published OR.
Possible Attacks/Open Issues:
1. Gaming Node Selection
Does running a fallback OR customized for a specific set of published ORs
improve an attacker's chances of seeing traffic from that set of published
ORs? Would such a strategy be any more effective than running published
ORs with other 'attractive' properties?
2. DOS Attack
An attacker could prevent all other legitimate fallback ORs with a
given byte-1 in their FP from functioning by running 20 or 30 fallback ORs
and monopolizing all available fallback slots on the published ORs.
This same attacker would then be in a position to monopolize all the
traffic of the fallback ORs on that byte-1 network segment. I'm not sure
what this would allow such an attacker to do.
3. Circuit-Sniffing
An observer watching exit traffic from a fallback server will know that the
previous node in the circuit is one of a very small, identifiable
subset of the total ORs in the network. To establish the full path of the
circuit they would only have to watch the exit traffic from the fallback
OR and all the traffic from the 20 or 30 ORs it is likely to be connected
to. This means it is substantially easier to establish all members of a
circuit which has a fallback OR as an exit (sniff and analyse 10-50 (i.e.
1/256 varying) + 1 ORs) rather than a normal published OR (sniff all 2560
or so ORs on the network). The same mechanism that allows the client to
expect a specific fallback OR to be available from a specific published OR
allows an attacker to prepare his ground.
Mitigant:
In terms of the resources and access required to monitor 2000 to 3000
nodes, the effort of the adversary is not significantly diminished when he
is only interested in 20 or 30. It is hard to see how an adversary who can
obtain access to a randomly selected portion of the Tor network would face
any new or qualitatively different obstacles in attempting to access much
of the rest of it.
Implementation Issues:
The number of ORs this proposal would add to the Tor network is not known.
This is because there is no mechanism at present for recording unsuccessful
attempts to become an OR. If the proposal is considered promising it may be
worthwhile to issue an alpha series release where candidate ORs post a
primitive fallback descriptor to the authority directories. This fallback
descriptor would not contain any other flag that would make it eligible for
selection by clients. It would act solely as a means of sizing the number of
Tor instances that try and fail to become ORs.
The upper limit on the number of orconns from fallback ORs a normal,
published OR should be willing to accept is an open question. Is one
hundred, mostly idle, such orconns too onerous?

@ -1,105 +0,0 @@
Filename: 134-robust-voting.txt
Title: More robust consensus voting with diverse authority sets
Author: Peter Palfrader
Created: 2008-04-01
Status: Accepted
Target: 0.2.2.x
Overview:
A means to arrive at a valid directory consensus even when voters
disagree on who is an authority.
Motivation:
Right now there are about five authoritative directory servers in the
Tor network, though this number is expected to rise to about 15 eventually.
Adding a new authority requires synchronized action from all operators of
directory authorities so that at any time during the update at least half of
all authorities are running and agree on who is an authority. The latter
requirement is there so that the authorities can arrive at a common
consensus: Each authority builds the consensus based on the votes from
all authorities it recognizes, and so a different set of recognized
authorities will lead to a different consensus document.
Objective:
The modified voting procedure outlined in this proposal obsoletes the
requirement for most authorities to exactly agree on the list of
authorities.
Proposal:
The vote document each authority generates contains a list of
authorities recognized by the generating authority. This will be
a list of authority identity fingerprints.
Authorities will accept votes from, and also serve/mirror votes for,
authorities they do not recognize. (Votes contain the signing key, the
authority key, and the certificate linking them, so they can be
verified even without knowing the authority beforehand.)
Before building the consensus we will check which votes to use for
building:
1) We build a directed graph of which authority/vote recognizes
whom.
2) (Parts of the graph that aren't reachable, directly or
indirectly, from any authorities we recognize can be discarded
immediately.)
3) We find the largest fully connected subgraph.
(Should there be more than one subgraph of the same size there
needs to be some arbitrary ordering so we always pick the same.
E.g. pick the one who has the smaller (XOR of all votes' digests)
or something.)
4) If we are part of that subgraph, great. This is the list of
votes we build our consensus with.
5) If we are not part of that subgraph, remove all the nodes that
are part of it and go to 3.
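Steps 1 to 5 above might be sketched as follows. This is a brute-force illustration under stated assumptions, not the proposed implementation: `recognizes` maps each authority whose vote we hold to the set of authorities that vote lists, `digests` maps each authority to its vote digest as an integer, and all function names are hypothetical. Exhaustive search is only reasonable because the number of authorities is small.

```python
from itertools import combinations

def reachable_from(me, recognizes):
    """Step 2: keep only authorities reachable from `me` via votes we hold."""
    seen, stack = {me}, [me]
    while stack:
        node = stack.pop()
        for nxt in recognizes.get(node, set()):
            if nxt in recognizes and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def fully_connected(group, recognizes):
    """True if every pair in `group` recognizes each other."""
    return all(b in recognizes.get(a, set()) and a in recognizes.get(b, set())
               for a, b in combinations(group, 2))

def xor_digests(group, digests):
    acc = 0
    for node in group:
        acc ^= digests[node]
    return acc

def largest_clique(nodes, recognizes, digests):
    """Step 3: largest fully connected subgraph, ties broken by smaller XOR."""
    for size in range(len(nodes), 0, -1):
        cliques = [c for c in combinations(sorted(nodes), size)
                   if fully_connected(c, recognizes)]
        if cliques:
            return set(min(cliques, key=lambda c: xor_digests(c, digests)))
    return set()

def votes_to_use(me, recognizes, digests):
    """Steps 4 and 5: repeat until we find the subgraph that contains `me`."""
    nodes = reachable_from(me, recognizes)
    while nodes:
        clique = largest_clique(nodes, recognizes, digests)
        if me in clique:
            return clique
        nodes -= clique
    return set()
```

With three old authorities A, B, C and a new authority D that only A recognizes so far, A keeps voting with {A, B, C} while D ends up alone.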
Using this procedure authorities that are updated to recognize a
new authority will continue voting with the old group until a
sufficient number has been updated to arrive at a consensus with
the recently added authority.
In fact, the old set of authorities will probably be voting among
themselves until all but one has been updated to recognize the
new authority. Then which set of votes is used for consensus
building depends on which of the two equally large sets gets
ordered before the other in step (3) above.
It is necessary to continue with the process in (5) even if we
are not in the largest subgraph. Otherwise one rogue authority
could create a number of extra votes (by new authorities) so that
everybody stops at 5 and no consensus is built, even though it would
be trusted by all clients.
Anonymity Implications:
The author does not believe this proposal to have anonymity
implications.
Possible Attacks/Open Issues/Some thinking required:
Q: Can a number (fewer than or exactly half) of the authorities cause an honest
authority to vote for "their" consensus rather than the one that would
result were all authorities taken into account?
Q: Can a set of votes from external authorities, i.e. authorities of whom we
trust either none or at least not all, cause us to change the set of
consensus makers we pick?
A: Yes, if other authorities decide they would rather build a consensus with
them, then they'll be thrown out in step 3. But that's ok since those other
authorities will never vote with us anyway.
If we trust none of them then we throw them out even sooner, so no harm done.
Q: Can this ever force us to build a consensus with authorities we do not
recognize?
A: No, we can never build a fully connected set with them in step 3.

@ -1,283 +0,0 @@
Filename: 135-private-tor-networks.txt
Title: Simplify Configuration of Private Tor Networks
Version: $Revision$
Last-Modified: $Date$
Author: Karsten Loesing
Created: 29-Apr-2008
Status: Closed
Target: 0.2.1.x
Implemented-In: 0.2.1.2-alpha
Change history:
29-Apr-2008 Initial proposal for or-dev
19-May-2008 Included changes based on comments by Nick to or-dev and
added a section for test cases.
18-Jun-2008 Changed testing-network-only configuration option names.
Overview:
Configuring a private Tor network has become a time-consuming and
error-prone task with the introduction of the v3 directory protocol. In
addition to that, operators of private Tor networks need to set an
increasing number of non-trivial configuration options, and it is hard
to keep FAQ entries describing this task up-to-date. In this proposal we
(1) suggest (optionally) accelerating the timing of the v3 directory voting
process and (2) introduce an umbrella config option specifically aimed at
creating private Tor networks.
Design:
1. Accelerate Timing of v3 Directory Voting Process
Tor has reasonable defaults for setting up a large, Internet-scale
network with comparably high latencies and possibly wrong server clocks.
However, those defaults are bad when it comes to quickly setting up a
private Tor network for testing, either on a single node or LAN (things
might be different when creating a test network on PlanetLab or
something). Some time constraints should be made configurable for private
networks. The general idea is to accelerate everything that has to do
with propagation of directory information, but nothing else, so that a
private network is available as soon as possible. (As a possible
safeguard, changing these configuration values could be made dependent on
the umbrella configuration option introduced in 2.)
1.1. Initial Voting Schedule
When a v3 directory does not know any consensus, it assumes an initial,
hard-coded VotingInterval of 30 minutes, VoteDelay of 5 minutes, and
DistDelay of 5 minutes. This is important for multiple, simultaneously
restarted directory authorities to meet at a common time and create an
initial consensus. Unfortunately, this means that it may take up to half
an hour (or even more) for a private Tor network to bootstrap.
We propose to make these three time constants configurable (note that
V3AuthVotingInterval, V3AuthVoteDelay, and V3AuthDistDelay do not have an
effect on the _initial_ voting schedule, but only on the schedule that a
directory authority votes for). This can be achieved by introducing three
new configuration options: TestingV3AuthInitialVotingInterval,
TestingV3AuthInitialVoteDelay, and TestingV3AuthInitialDistDelay.
As a first safeguard, Tor should only accept configuration values for
TestingV3AuthInitialVotingInterval that divide evenly into the default
value of 30 minutes. The effect is that even if people misconfigured
their directory authorities, they would meet at the default values at the
latest. The second safeguard is to allow configuration only when the
umbrella configuration option TestingTorNetwork is set.
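The two safeguards could be checked along these lines. This is a minimal sketch, assuming intervals are handled in seconds; the function name and error messages are illustrative and not taken from Tor.

```python
DEFAULT_INITIAL_VOTING_INTERVAL = 30 * 60  # hard-coded default, in seconds

def validate_initial_voting_interval(interval_seconds, testing_tor_network):
    """Return the interval if it passes both safeguards, else raise."""
    # Second safeguard: non-default values need TestingTorNetwork.
    if (interval_seconds != DEFAULT_INITIAL_VOTING_INTERVAL
            and not testing_tor_network):
        raise ValueError("may only be changed in testing Tor networks")
    # First safeguard: the value must divide evenly into 30 minutes, so
    # misconfigured authorities still meet at the default boundaries.
    if (interval_seconds <= 0
            or DEFAULT_INITIAL_VOTING_INTERVAL % interval_seconds != 0):
        raise ValueError("must divide evenly into 30 minutes")
    return interval_seconds
```

For instance, 5 minutes (300 seconds) is accepted in a testing network, while 7 minutes is rejected because 1800 is not a multiple of 420.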
1.2. Immediately Provide Reachability Information (Running flag)
The default behavior of a directory authority is to provide the Running
flag only after the authority is available for at least 30 minutes. The
rationale is that before that time, an authority simply cannot deliver
useful information about other running nodes. But for private Tor
networks this may be different. This is currently implemented in the code
as:
/** If we've been around for less than this amount of time, our
* reachability information is not accurate. */
#define DIRSERV_TIME_TO_GET_REACHABILITY_INFO (30*60)
There should be another configuration option
TestingAuthDirTimeToLearnReachability with a default value of 30 minutes
that can be changed when running testing Tor networks, e.g. to 0 minutes.
The configuration value would simply replace the quoted constant. Again,
changing this option could be safeguarded by requiring the umbrella
configuration option TestingTorNetwork to be set.
1.3. Reduce Estimated Descriptor Propagation Time
Tor currently assumes that it takes up to 10 minutes until router
descriptors are propagated from the authorities to directory caches.
This is not very useful for private Tor networks, and we want to be able
to reduce this time, so that clients can download router descriptors in a
timely manner.
/** Clients don't download any descriptor this recent, since it will
* probably not have propagated to enough caches. */
#define ESTIMATED_PROPAGATION_TIME (10*60)
We suggest introducing a new config option
TestingEstimatedDescriptorPropagationTime which defaults to 10 minutes
but can be set to any lower non-negative value, e.g. 0 minutes. The
same safeguards as in 1.2 could be used here, too.
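The effect of the proposed option on clients can be illustrated with a small sketch. The function name and the use of plain Unix timestamps are assumptions; in Tor itself the check is made against the quoted constant.

```python
def old_enough_to_download(published, now, estimated_propagation_time=10 * 60):
    """True if a descriptor published at `published` may be fetched at `now`.

    Both times are seconds since the epoch. Descriptors younger than the
    estimated propagation time are skipped, since they have probably not
    reached enough caches yet.
    """
    return published + estimated_propagation_time <= now
```

With the default of 10 minutes a descriptor published five minutes ago is skipped; with the option set to 0 it is fetched immediately.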
2. Umbrella Option for Setting Up Private Tor Networks
Setting up a private Tor network requires a number of specific settings
that are not required or useful when running Tor in the public Tor
network. Instead of writing down these options in a FAQ entry, there
should be a single configuration option, e.g. TestingTorNetwork, that
changes all required settings at once. Newer Tor versions would keep the
set of configuration options up-to-date. It should still remain possible
to manually overwrite the settings that the umbrella configuration option
affects.
The following configuration options are set by TestingTorNetwork:
- ServerDNSAllowBrokenResolvConf 1
Ignore the situation that private relays are not aware of any name
servers.
- DirAllowPrivateAddresses 1
Allow router descriptors containing private IP addresses.
- EnforceDistinctSubnets 0
Permit building circuits with relays in the same subnet.
- AssumeReachable 1
Omit self-testing for reachability.
- AuthDirMaxServersPerAddr 0
- AuthDirMaxServersPerAuthAddr 0
Permit an unlimited number of nodes on the same IP address.
- ClientDNSRejectInternalAddresses 0
Believe in DNS responses resolving to private IP addresses.
- ExitPolicyRejectPrivate 0
Allow exiting to private IP addresses. (This one is a matter of
taste---it might be dangerous to make this a default in a private
network, although people setting up private Tor networks should know
what they are doing.)
- V3AuthVotingInterval 5 minutes
- V3AuthVoteDelay 20 seconds
- V3AuthDistDelay 20 seconds
Accelerate voting schedule after first consensus has been reached.
- TestingV3AuthInitialVotingInterval 5 minutes
- TestingV3AuthInitialVoteDelay 20 seconds
- TestingV3AuthInitialDistDelay 20 seconds
Accelerate initial voting schedule until first consensus is reached.
- TestingAuthDirTimeToLearnReachability 0 minutes
Consider routers as Running from the start of running an authority.
- TestingEstimatedDescriptorPropagationTime 0 minutes
Clients try downloading router descriptors from directory caches,
even when they are not 10 minutes old.
In addition to changing the defaults for these configuration options,
TestingTorNetwork can only be set when a user has manually configured
DirServer lines.
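One way to picture the umbrella option is as a second layer of defaults that sits between the normal defaults and the user's explicit settings. The sketch below assumes durations in seconds and a plain dictionary of options; `effective_config` and its arguments are hypothetical, and the normal defaults are passed in rather than listed because this proposal only states some of them.

```python
# Values set by TestingTorNetwork, as listed above (durations in seconds).
TESTING_NETWORK_DEFAULTS = {
    "ServerDNSAllowBrokenResolvConf": 1,
    "DirAllowPrivateAddresses": 1,
    "EnforceDistinctSubnets": 0,
    "AssumeReachable": 1,
    "AuthDirMaxServersPerAddr": 0,
    "AuthDirMaxServersPerAuthAddr": 0,
    "ClientDNSRejectInternalAddresses": 0,
    "ExitPolicyRejectPrivate": 0,
    "V3AuthVotingInterval": 300,
    "V3AuthVoteDelay": 20,
    "V3AuthDistDelay": 20,
    "TestingV3AuthInitialVotingInterval": 300,
    "TestingV3AuthInitialVoteDelay": 20,
    "TestingV3AuthInitialDistDelay": 20,
    "TestingAuthDirTimeToLearnReachability": 0,
    "TestingEstimatedDescriptorPropagationTime": 0,
}

def effective_config(user_options, normal_defaults):
    """Layer normal defaults, testing defaults, then explicit user settings."""
    testing = user_options.get("TestingTorNetwork", 0) == 1
    if testing and "DirServer" not in user_options:
        raise ValueError("TestingTorNetwork needs manually configured "
                         "DirServer lines")
    config = dict(normal_defaults)
    if testing:
        config.update(TESTING_NETWORK_DEFAULTS)
    # Explicit settings always win, so manual overrides remain possible.
    config.update(user_options)
    return config
```

Because the user's own settings are applied last, resetting an overridden option falls back to the testing-network default rather than the public-network one.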
Test:
The implementation of this proposal must pass the following tests:
1. Set TestingTorNetwork and see if dependent configuration options are
correctly changed.
tor DataDirectory . ControlPort 9051 TestingTorNetwork 1 DirServer \
"mydir 127.0.0.1:1234 0000000000000000000000000000000000000000"
telnet 127.0.0.1 9051
AUTHENTICATE
GETCONF TestingTorNetwork TestingAuthDirTimeToLearnReachability
250-TestingTorNetwork=1
250 TestingAuthDirTimeToLearnReachability=0
QUIT
2. Set TestingTorNetwork and a dependent configuration value to see if
the provided value is used for the dependent option.
tor DataDirectory . ControlPort 9051 TestingTorNetwork 1 DirServer \
"mydir 127.0.0.1:1234 0000000000000000000000000000000000000000" \
TestingAuthDirTimeToLearnReachability 5
telnet 127.0.0.1 9051
AUTHENTICATE
GETCONF TestingTorNetwork TestingAuthDirTimeToLearnReachability
250-TestingTorNetwork=1
250 TestingAuthDirTimeToLearnReachability=5
QUIT
3. Start with TestingTorNetwork set and change a dependent configuration
option later on.
tor DataDirectory . ControlPort 9051 TestingTorNetwork 1 DirServer \
"mydir 127.0.0.1:1234 0000000000000000000000000000000000000000"
telnet 127.0.0.1 9051
AUTHENTICATE
SETCONF TestingAuthDirTimeToLearnReachability=5
GETCONF TestingAuthDirTimeToLearnReachability
250 TestingAuthDirTimeToLearnReachability=5
QUIT
4. Start with TestingTorNetwork set and a dependent configuration value,
and reset that dependent configuration value. The result should be
the testing-network specific default value.
tor DataDirectory . ControlPort 9051 TestingTorNetwork 1 DirServer \
"mydir 127.0.0.1:1234 0000000000000000000000000000000000000000" \
TestingAuthDirTimeToLearnReachability 5
telnet 127.0.0.1 9051
AUTHENTICATE
GETCONF TestingAuthDirTimeToLearnReachability
250 TestingAuthDirTimeToLearnReachability=5
RESETCONF TestingAuthDirTimeToLearnReachability
GETCONF TestingAuthDirTimeToLearnReachability
250 TestingAuthDirTimeToLearnReachability=0
QUIT
5. Leave TestingTorNetwork unset and check if dependent configuration
options are left unchanged.
tor DataDirectory . ControlPort 9051 DirServer \
"mydir 127.0.0.1:1234 0000000000000000000000000000000000000000"
telnet 127.0.0.1 9051
AUTHENTICATE
GETCONF TestingTorNetwork TestingAuthDirTimeToLearnReachability
250-TestingTorNetwork=0
250 TestingAuthDirTimeToLearnReachability=1800
QUIT
6. Leave TestingTorNetwork unset, but set dependent configuration option
which should fail.
tor DataDirectory . ControlPort 9051 DirServer \
"mydir 127.0.0.1:1234 0000000000000000000000000000000000000000" \
TestingAuthDirTimeToLearnReachability 0
[warn] Failed to parse/validate config:
TestingAuthDirTimeToLearnReachability may only be changed in testing
Tor networks!
7. Start with TestingTorNetwork unset and change dependent configuration
option later on which should fail.
tor DataDirectory . ControlPort 9051 DirServer \
"mydir 127.0.0.1:1234 0000000000000000000000000000000000000000"
telnet 127.0.0.1 9051
AUTHENTICATE
SETCONF TestingAuthDirTimeToLearnReachability=0
513 Unacceptable option value: TestingAuthDirTimeToLearnReachability
may only be changed in testing Tor networks!
8. Start with TestingTorNetwork unset and set it later on which should
fail.
tor DataDirectory . ControlPort 9051 DirServer \
"mydir 127.0.0.1:1234 0000000000000000000000000000000000000000"
telnet 127.0.0.1 9051
AUTHENTICATE
SETCONF TestingTorNetwork=1
553 Transition not allowed: While Tor is running, changing
TestingTorNetwork is not allowed.
9. Start with TestingTorNetwork set and unset it later on which should
fail.
tor DataDirectory . ControlPort 9051 TestingTorNetwork 1 DirServer \
"mydir 127.0.0.1:1234 0000000000000000000000000000000000000000"
telnet 127.0.0.1 9051
AUTHENTICATE
RESETCONF TestingTorNetwork
513 Unacceptable option value: TestingV3AuthInitialVotingInterval may
only be changed in testing Tor networks!
10. Set TestingTorNetwork, but do not provide an alternate DirServer
which should fail.
tor DataDirectory . ControlPort 9051 TestingTorNetwork 1
[warn] Failed to parse/validate config: TestingTorNetwork may only be
configured in combination with a non-default set of DirServers.
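When automating the control-port tests above, the GETCONF replies need to be turned into values that can be compared. A minimal sketch, assuming each reply has already been read as a list of lines without the trailing CRLF (the function name is hypothetical):

```python
def parse_getconf_reply(lines):
    """Parse '250-Key=Value' / '250 Key=Value' lines into a dictionary.

    Raises ValueError on any other status code, e.g. the 513 and 553
    errors expected in tests 7 to 9.
    """
    values = {}
    for line in lines:
        code, separator, rest = line[:3], line[3:4], line[4:]
        if code != "250":
            raise ValueError("unexpected reply: " + line)
        key, _, value = rest.partition("=")
        values[key] = value
        if separator == " ":  # a space marks the final line of the reply
            break
    return values
```

Test 1 would then pass if the parsed reply maps TestingTorNetwork to "1" and TestingAuthDirTimeToLearnReachability to "0".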

@ -1,100 +0,0 @@
Filename: 136-legacy-keys.txt
Title: Mass authority migration with legacy keys
Author: Nick Mathewson
Created: 13-May-2008
Status: Closed
Implemented-In: 0.2.0.x
Overview:
This document describes a mechanism to change the keys of more than
half of the directory servers at once without breaking old clients
and caches immediately.
Motivation:
If a single authority's identity key is believed to be compromised,
the solution is obvious: remove that authority from the list,
generate a new certificate, and treat the new cert as belonging to a
new authority. This approach works fine so long as less than 1/2 of
the authority identity keys are bad.
Unfortunately, the mass-compromise case is possible if there is a
sufficiently bad bug in Tor or in any OS used by a majority of v3
authorities. Let's be prepared for it!
We could simply stop using the old keys and start using new ones,
and tell all clients running insecure versions to upgrade.
Unfortunately, this breaks our caching system pretty badly, since
caches won't cache a consensus that they don't believe in. It would
be nice to have everybody become secure the moment they upgrade to a
version listing the new authority keys, _without_ breaking upgraded
clients until the caches upgrade.
So, let's come up with a way to provide a time window where the
consensuses are signed with the new keys and with the old.
Design:
We allow directory authorities to list a single "legacy key"
fingerprint in their votes. Each authority may add a single legacy
key. The format for this line is:
legacy-dir-key FINGERPRINT
We describe a new consensus method for generating directory
consensuses. This method is consensus method "3".
When the authorities decide to use method "3" (as described in 3.4.1
of dir-spec.txt), for every included vote with a legacy-dir-key line,
the consensus includes an extra dir-source line. The fingerprint in
this extra line is as in the legacy-dir-key line. The ports and
addresses are in the dir-source line. The nickname is as in the
dir-source line, with the string "-legacy" appended.
[We need to include this new dir-source line because the code
won't accept or preserve signatures from authorities not listed
as contributing to the consensus.]
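The construction of the extra dir-source entry can be sketched as below. The dictionary layout and the function name are assumptions made for illustration; only the rule itself (same address and ports, legacy fingerprint, nickname with "-legacy" appended) comes from the text above.

```python
def legacy_dir_source(dir_source, legacy_fingerprint):
    """Build the extra dir-source entry for a vote with a legacy-dir-key.

    `dir_source` is a dict with (hypothetical) keys such as "nickname",
    "fingerprint", "address", "dirport" and "orport".
    """
    entry = dict(dir_source)  # ports and addresses are copied unchanged
    entry["nickname"] = dir_source["nickname"] + "-legacy"
    entry["fingerprint"] = legacy_fingerprint
    return entry
```

The original entry is left untouched, so the consensus ends up listing both the real and the "-legacy" source for that authority.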
Authorities using legacy dir keys include two signatures on their
consensuses: one generated with a signing key signed with their real
signing key, and another generated with a signing key signed with
another signing key attested to by their identity key. These
signing keys MUST be different. Authorities MUST serve both
certificates if asked.
Process:
In the event of a mass key failure, we'll follow the following
(ugly) procedure:
- All affected authorities generate new certificates and identity
keys, and circulate their new dirserver lines. They copy their old
certificates and old broken keys, but put them in new "legacy
key files".
- At the earliest time that can be arranged, the authorities
replace their signing keys, identity keys, and certificates
with the new uncompromised versions, and update to the new list
of dirserver lines.
- They add a "V3DirAdvertiseLegacyKey 1" option to their torrc.
- Now, new consensuses will be generated using the new keys, but
the results will also be signed with the old keys.
- Clients and caches are told they need to upgrade, and given a
time window to do so.
- At the end of the time window, authorities remove the
V3DirAdvertiseLegacyKey option.
Notes:
It might be good to get caches to cache consensuses that they do not
believe in. I'm not sure of the best way to do this.
It's a superficially neat idea to have new signing keys and have
them signed by the new and by the old authority identity keys. This
breaks some code, though, and doesn't actually gain us anything,
since we'd still need to include each signature twice.
It's also a superficially neat idea, if identity keys and signing
keys are compromised, to at least replace all the signing keys.
I don't think this gains us anything either, though.

@ -1,237 +0,0 @@
Filename: 137-bootstrap-phases.txt
Title: Keep controllers informed as Tor bootstraps
Version: $Revision$
Last-Modified: $Date$
Author: Roger Dingledine
Created: 07-Jun-2008
Status: Closed
Implemented-In: 0.2.1.x
1. Overview.
Tor has many steps to bootstrapping directory information and
initial circuits, but from the controller's perspective we just have
a coarse-grained "CIRCUIT_ESTABLISHED" status event. Tor users with
slow connections or with connectivity problems can wait a long time
staring at the yellow onion, wondering if it will ever change color.
This proposal describes a new client status event so Tor can give
more details to the controller. Section 2 describes the changes to the
controller protocol; Section 3 describes Tor's internal bootstrapping
phases when everything is going correctly; Section 4 describes when
Tor detects a problem and issues a bootstrap warning; Section 5 covers
suggestions for how controllers should display the results.
2. Controller event syntax.
The generic status event is:
"650" SP StatusType SP StatusSeverity SP StatusAction
[SP StatusArguments] CRLF
So in this case we send
650 STATUS_CLIENT NOTICE/WARN BOOTSTRAP \
PROGRESS=num TAG=Keyword SUMMARY=String \
[WARNING=String REASON=Keyword COUNT=num RECOMMENDATION=Keyword]
The arguments MAY appear in any order. Controllers MUST accept unrecognized
arguments.
"Progress" gives a number between 0 and 100 for how far through
the bootstrapping process we are. "Summary" is a string that can be
displayed to the user to describe the *next* task that Tor will tackle,
i.e., the task it is working on after sending the status event. "Tag"
is an optional string that controllers can use to recognize bootstrap
phases from Section 3, if they want to do something smarter than just
blindly displaying the summary string.
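A controller might parse the bootstrap status event along these lines. This is a sketch under assumptions: the event arrives as a single line, quoted strings follow shell-like quoting closely enough for `shlex` to split them, and the function name and returned dictionary layout are made up for this example.

```python
import shlex

def parse_bootstrap_event(line):
    """Parse a '650 STATUS_CLIENT <severity> BOOTSTRAP ...' line.

    Returns None for other events. Arguments may come in any order, and
    unrecognized arguments are kept in "extra" rather than rejected.
    """
    parts = shlex.split(line)
    if (len(parts) < 4 or parts[:2] != ["650", "STATUS_CLIENT"]
            or parts[3] != "BOOTSTRAP"):
        return None
    args = {}
    for token in parts[4:]:
        key, separator, value = token.partition("=")
        if separator:
            args[key.upper()] = value
    known = ("PROGRESS", "TAG", "SUMMARY")
    return {
        "severity": parts[2],
        "progress": int(args["PROGRESS"]) if "PROGRESS" in args else None,
        "tag": args.get("TAG"),
        "summary": args.get("SUMMARY"),
        "extra": {k: v for k, v in args.items() if k not in known},
    }
```

A controller that only wants a progress bar can use "progress" and "summary" and ignore everything else.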
The severity describes whether this is a normal bootstrap phase
(severity notice) or an indication of a bootstrapping problem
(severity warn). If the severity is warn, the event should also include a "warning"
argument string with any hints Tor has to offer about why it's having
troubles bootstrapping, a "reason" string that lists one of the reasons
allowed in the ORConn event, a "count" number that tells how many
bootstrap problems there have been so far at this phase, and a
"recommendation" keyword to indicate how the controller ought to react.
3. The bootstrap phases.
This section describes the various phases currently reported by
Tor. Controllers should not assume that the percentages and tags listed
here will continue to match up, or even that the tags will stay in
the same order. Some phases might also be skipped (not reported) if the
associated bootstrap step is already complete, or if the phase no longer
is necessary. Only "starting" and "done" are guaranteed to exist in all
future versions.
Current Tor versions enter these phases in order, monotonically;
future Tors MAY revisit earlier stages.
Phase 0:
tag=starting summary="starting"
Tor starts out in this phase.
Phase 5:
tag=conn_dir summary="Connecting to directory mirror"
Tor sends this event as soon as Tor has chosen a directory mirror ---
one of the authorities if bootstrapping for the first time or after
a long downtime, or one of the relays listed in its cached directory
information otherwise.
Tor will stay at this phase until it has successfully established
a TCP connection with some directory mirror. Problems in this phase
generally happen because Tor doesn't have a network connection, or
because the local firewall is dropping SYN packets.
Phase 10:
tag=handshake_dir summary="Finishing handshake with directory mirror"
This event occurs when Tor establishes a TCP connection with a relay used
as a directory mirror (or its https proxy if it's using one). Tor remains
in this phase until the TLS handshake with the relay is finished.
Problems in this phase generally happen because Tor's firewall is
doing more sophisticated MITM attacks on it, or doing packet-level
keyword recognition of Tor's handshake.
Phase 15:
tag=onehop_create summary="Establishing one-hop circuit for dir info"
Once TLS is finished with a relay, Tor will send a CREATE_FAST cell
to establish a one-hop circuit for retrieving directory information.
It will remain in this phase until it receives the CREATED_FAST cell
back, indicating that the circuit is ready.
Phase 20:
tag=requesting_status summary="Asking for networkstatus consensus"
Once we've finished our one-hop circuit, we will start a new stream
for fetching the networkstatus consensus. We'll stay in this phase
until we get the 'connected' relay cell back, indicating that we've
established a directory connection.
Phase 25:
tag=loading_status summary="Loading networkstatus consensus"
Once we've established a directory connection, we will start fetching
the networkstatus consensus document. This could take a while; this
phase is a good opportunity for using the "progress" keyword to indicate
partial progress.
This phase could stall if the directory mirror we picked doesn't
have a copy of the networkstatus consensus so we have to ask another,
or it does give us a copy but we don't find it valid.
Phase 40:
tag=loading_keys summary="Loading authority key certs"
Sometimes when we've finished loading the networkstatus consensus,
we find that we don't have all the authority key certificates for the
keys that signed the consensus. At that point we put the consensus we
fetched on hold and fetch the keys so we can verify the signatures.
Phase 45:
tag=requesting_descriptors summary="Asking for relay descriptors"
Once we have a valid networkstatus consensus and we've checked all
its signatures, we start asking for relay descriptors. We stay in this
phase until we have received a 'connected' relay cell in response to
a request for descriptors.
Phase 50:
tag=loading_descriptors summary="Loading relay descriptors"
We will ask for relay descriptors from several different locations,
so this step will probably make up the bulk of the bootstrapping,
especially for users with slow connections. We stay in this phase until
we have descriptors for at least 1/4 of the usable relays listed in
the networkstatus consensus. This phase is also a good opportunity to
use the "progress" keyword to indicate partial steps.
Phase 80:
tag=conn_or summary="Connecting to entry guard"
Once we have a valid consensus and enough relay descriptors, we choose
some entry guards and start trying to build some circuits. This step
is similar to the "conn_dir" phase above; the only difference is
the context.
If a Tor starts with enough recent cached directory information,
its first bootstrap status event will be for the conn_or phase.
Phase 85:
tag=handshake_or summary="Finishing handshake with entry guard"
This phase is similar to the "handshake_dir" phase, but it gets reached
if we finish a TCP connection to a Tor relay and we have already reached
the "conn_or" phase. We'll stay in this phase until we complete a TLS
handshake with a Tor relay.
Phase 90:
tag=circuit_create summary="Establishing circuits"
Once we've finished our TLS handshake with an entry guard, we will
set about trying to make some 3-hop circuits in case we need them soon.
Phase 100:
tag=done summary="Done"
A full 3-hop circuit has been established. Tor is ready to handle
application connections now.
4. Bootstrap problem events.
When an OR Conn fails, we send a "bootstrap problem" status event, which
is like the standard bootstrap status event except with severity warn.
We include the same progress, tag, and summary values as we would for
a normal bootstrap event, but we also include "warning", "reason",
"count", and "recommendation" key/value combos.
The "reason" values are long-term-stable controller-facing tags to
identify particular issues in a bootstrapping step. The warning
strings, on the other hand, are human-readable. Controllers SHOULD
NOT rely on the format of any warning string. Currently the possible
values for "recommendation" are either "ignore" or "warn" -- if ignore,
the controller can accumulate the string in a pile of problems to show
the user if the user asks; if warn, the controller should alert the
user that Tor is pretty sure there's a bootstrapping problem.
Currently Tor uses recommendation=ignore for the first nine bootstrap
problem reports for a given phase, and then uses recommendation=warn
for subsequent problems at that phase. Hopefully this is a good
balance between tolerating occasional errors and reporting serious
problems quickly.
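The threshold described above, and one possible controller reaction to it, can be sketched as follows. The names are hypothetical, and the sketch assumes the "count" argument starts at 1 for the first problem in a phase.

```python
def recommendation_for(problem_count):
    """Tor's current policy: 'ignore' for the first nine reports, then 'warn'."""
    return "ignore" if problem_count <= 9 else "warn"

class ProblemLog:
    """Controller-side bookkeeping for bootstrap problem events."""

    def __init__(self):
        self.pending = []  # shown only if the user asks
        self.alerts = []   # shown to the user right away

    def handle(self, warning, recommendation):
        if recommendation == "warn":
            self.alerts.append(warning)
        else:
            self.pending.append(warning)
```

Controllers should act on the "recommendation" keyword they receive rather than recompute it from the count, since the threshold may change in later versions.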
5. Suggested controller behavior.
Controllers should start out with a yellow onion or the equivalent
("starting"), and then watch for either a bootstrap status event
(meaning the Tor they're using is sufficiently new to produce them,
and they should load up the progress bar or whatever they plan to use
to indicate progress) or a circuit_established status event (meaning
bootstrapping is finished).
In addition to a progress bar in the display, controllers should also
have some way to indicate progress even when no controller window is
open. For example, folks using Tor Browser Bundle in hostile Internet
cafes don't want a big splashy screen up. One way to let the user keep
informed of progress in a more subtle way is to change the task tray
icon and/or tooltip string as more bootstrap events come in.
Controllers should also have some mechanism to alert their user when
bootstrapping problems are reported. Perhaps we should gather a set of
help texts and the controller can send the user to the right anchor in a
"bootstrapping problems" page in the controller's help subsystem?
6. Getting up to speed when the controller connects.
There's a new "GETINFO status/bootstrap-phase" option, which returns
the most recent bootstrap phase status event sent. Specifically,
it returns a string starting with either "NOTICE BOOTSTRAP ..." or
"WARN BOOTSTRAP ...".
Controllers should use this getinfo when they connect or attach to
Tor to learn its current state.

@ -1,51 +0,0 @@
Filename: 138-remove-down-routers-from-consensus.txt
Title: Remove routers that are not Running from consensus documents
Version: $Revision$
Last-Modified: $Date$
Author: Peter Palfrader
Created: 11-Jun-2008
Status: Closed
Implemented-In: 0.2.1.2-alpha
1. Overview.
Tor directory authorities vote hourly and agree on a consensus document
that lists all the routers on the network together with some of their
basic properties, such as whether a router is an exit node, whether it is
stable, or whether it is a version 2 directory mirror.
One of the properties given with each router is the 'Running' flag.
Clients do not use routers that are not listed as running.
This proposal suggests that routers without the Running flag are not
listed at all.
2. Current status
At a typical bootstrap a client downloads a 140KB consensus, about
10KB of certificates to verify that consensus, and about 1.6MB of
server descriptors, about 1/4 of which it requires before it will
start building circuits.
Another proposal deals with how to get that huge 1.6MB fraction to
effectively zero (by downloading only individual descriptors, on
demand). Should that get successfully implemented that will leave the
140KB compressed consensus as a large fraction of what a client needs
to get in order to work.
About one third of the routers listed in a consensus are not running
and will therefore never be used by clients who use this consensus.
Not listing those routers will save about 30% to 40% in size.
3. Proposed change
Authority directory servers produce vote documents that include all
the servers they know about, running or not, like they currently
do. In addition these vote documents also state that the authority
supports a new consensus forming method (method number 4).
If more than two thirds of votes that an authority has received claim
they support method 4 then this new method will be used: The
consensus document is formed like before but a new last step removes
all routers from the listing that are not marked as Running.
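
The rule above can be sketched as follows. This is a minimal
illustration, not the authority code: the function names and the
representation of a router as a (nickname, flags) pair are assumptions
made for the example.

```python
def use_method_4(votes_supporting_method_4, total_votes):
    """True if more than two thirds of the received votes support method 4."""
    return 3 * votes_supporting_method_4 > 2 * total_votes


def final_router_list(routers, method_4):
    """Apply the new last step: drop routers that are not marked Running.

    routers is a list of (nickname, set_of_flags) pairs (an assumed shape).
    """
    if not method_4:
        return list(routers)
    return [(name, flags) for name, flags in routers if "Running" in flags]


routers = [("alpha", {"Running", "Exit"}), ("beta", {"Stable"})]
print(final_router_list(routers, use_method_4(5, 7)))
```

Integer arithmetic is used for the two-thirds test so that exactly two
thirds (for example 4 of 6 votes) does not count as "more than".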

@ -1,94 +0,0 @@
Filename: 139-conditional-consensus-download.txt
Title: Download consensus documents only when it will be trusted
Author: Peter Palfrader
Created: 2008-04-13
Status: Closed
Implemented-In: 0.2.1.x
Overview:
Servers only provide a consensus document to a client when it is known
that the client will trust it.
Motivation:
When clients[1] want a new network status consensus they request it
from a Tor server using the URL path /tor/status-vote/current/consensus.
Then after downloading the client checks if this consensus can be
trusted. Whether the client trusts the consensus depends on the
authorities that the client trusts and how many of those
authorities signed the consensus document.
If the client cannot trust the consensus document it is disregarded
and a new download is tried at a later time. In that case several
hundred kilobytes of server bandwidth were wasted by this single
client's request.
With hundreds of thousands of clients this will have undesirable
consequences when the list of authorities has changed so much that a
large number of established clients no longer can trust any consensus
document formed.
Objective:
The objective of this proposal is to make clients not download
consensuses they will not trust.
Proposal:
The list of authorities that are trusted by a client is encoded in
the URL it sends to the directory server when requesting a consensus
document.
The directory server then only sends back the consensus when more than
half of the authorities listed in the request have signed the
consensus. If it is known that the consensus will not be trusted
a 404 error code is sent back to the client.
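
A minimal sketch of the server-side decision described above, assuming
the server already knows which authorities signed its current consensus.
The function name and argument shapes are illustrative only.

```python
def consensus_status_code(requested_fingerprints, signer_fingerprints):
    """Return 200 if more than half of the requested authorities signed
    the consensus the server holds, otherwise 404."""
    signers = {fp.upper() for fp in signer_fingerprints}
    requested = {fp.upper() for fp in requested_fingerprints}
    signed = len(requested & signers)
    # "More than half" is tested with integers to avoid rounding issues.
    return 200 if 2 * signed > len(requested) else 404


print(consensus_status_code(["AA", "BB", "CC"], ["aa", "bb", "DD"]))
```

The server only compares fingerprints here; as the proposal notes, it
does not need to verify signatures from authorities it does not
recognize.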
This proposal does not require directory caches to keep more than one
consensus document. This proposal also does not require authorities
to verify the signature on the consensus document of authorities they
do not recognize.
The new URL scheme to download a consensus is
/tor/status-vote/current/consensus/<F> where F is a list of
fingerprints, sorted in ascending order, and concatenated using a +
sign.
Fingerprints are uppercase hexadecimal encodings of the authority
identity key's digest. Servers should also accept requests that
use lower case or mixed case hexadecimal encodings.
A .z URL for compressed versions of the consensus will be provided
similarly to existing resources and is the URL that usually should
be used by clients.
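
The URL scheme above could be built along these lines. This is a sketch
under assumptions: the helper name is made up, and appending ".z" to the
end of the path is inferred from how existing compressed resources are
named rather than stated explicitly in this proposal.

```python
BASE_PATH = "/tor/status-vote/current/consensus"


def conditional_consensus_path(trusted_fingerprints, compressed=True):
    """Build the request path from the fingerprints a client trusts."""
    # Uppercase hex, sorted in ascending order, joined with '+'.
    fingerprints = sorted(fp.upper() for fp in trusted_fingerprints)
    path = BASE_PATH + "/" + "+".join(fingerprints)
    return path + ".z" if compressed else path


# Shortened, made-up fingerprints keep the example readable.
print(conditional_consensus_path(["ffee01", "00ab12"]))
```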
Migration:
The old location of the consensus should continue to work
indefinitely. Not only is it used by old clients, but it is a useful
resource for automated tools that do not particularly care which
authorities have signed the consensus.
Authorities that are known to the client a priori by being shipped
with the Tor code are assumed to handle this format.
When downloading a consensus document from caches that do not support this
new format, clients fall back to the old download location.
Caches support the new format starting with Tor version 0.2.1.1-alpha.
Anonymity Implications:
By supplying the list of authorities a client trusts to the directory
server we leak information (like likely version of Tor client) to the
directory server. In the current system we also leak that we are
very old - by re-downloading the consensus over and over again, but
only when we are so old that we no longer can trust the consensus.
Footnotes:
1. For the purpose of this proposal a client can be any Tor instance
that downloads a consensus document. This includes relays,
directory caches as well as end users.

@ -1,149 +0,0 @@
Filename: 140-consensus-diffs.txt
Title: Provide diffs between consensuses
Version: $Revision$
Last-Modified: $Date$
Author: Peter Palfrader
Created: 13-Jun-2008
Status: Accepted
Target: 0.2.2.x
1. Overview.
Tor clients and servers need a list of which relays are on the
network. This list, the consensus, is created by authorities
hourly and clients fetch a copy of it, with some delay, hourly.
This proposal suggests that clients download diffs of consensuses
once they have a consensus instead of hourly downloading a full
consensus.
2. Numbers
After implementing proposal 138 which removes nodes that are not
running from the list a consensus document is about 92 kilobytes
in size after compression.
The diff between two consecutive consensuses, in ed format, is on
average 13 kilobytes compressed.
3. Proposal
3.1 Clients
If a client has a consensus that is recent enough it SHOULD
try to download a diff to get the latest consensus rather than
fetching a full one.
[XXX: what is recent enough?
time delta in hours / size of compressed diff
0 20
1 9650
2 17011
3 23150
4 29813
5 36079
6 39455
7 43903
8 48907
9 54549
10 60057
11 67810
12 71171
13 73863
14 76048
15 80031
16 84686
17 89862
18 94760
19 94868
20 94223
21 93921
22 92144
23 90228
[ size of gzip compressed "diff -e" between the consensus on
2008-06-01-00:00:00 and the following consensuses that day.
Consensuses have been modified to exclude down routers per
proposal 138. ]
Data suggests that for the first few hours diffs are very useful,
saving about 60% for the first three hours, 30% for the first 10,
and almost nothing once we are past 16 hours.
]
3.2 Servers
Directory authorities and servers need to keep up to X [XXX: depends
on how long clients try to download diffs per above] old consensus
documents so they can build diffs. They should offer a diff to the
most recent consensus at the URL
http://tor.noreply.org/tor/status-vote/current/consensus/diff/<HASH>/<FPRLIST>
where HASH is the full digest of the consensus the client currently
has, and FPRLIST is a list of (abbreviated) fingerprints of
authorities the client trusts.
Servers will only return a consensus if more than half of the requested
authorities have signed the document, otherwise a 404 error will be sent
back. The fingerprints can be shortened to a length of any multiple of
two, using only the leftmost part of the encoded fingerprint. Tor uses
3 bytes (6 hex characters) of the fingerprint. (This is just like the
conditional consensus downloads that Tor supports starting with
0.2.1.1-alpha.)
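
As an illustration of the request described above, the following sketch
builds the path part of a diff URL. The helper name is hypothetical, and
joining the abbreviated fingerprints with "+" in sorted order is an
assumption carried over from the conditional consensus download scheme;
this proposal does not spell out the FPRLIST separator.

```python
def consensus_diff_path(consensus_digest_hex, trusted_fingerprints,
                        prefix_hex_chars=6):
    """Build the diff request path for the consensus the client holds."""
    if prefix_hex_chars <= 0 or prefix_hex_chars % 2:
        raise ValueError("prefix length must be a positive multiple of two")
    # Keep only the leftmost part of each fingerprint (3 bytes by default).
    short = sorted(fp.upper()[:prefix_hex_chars] for fp in trusted_fingerprints)
    return "/tor/status-vote/current/consensus/diff/%s/%s" % (
        consensus_digest_hex.upper(), "+".join(short))


# Made-up digest and fingerprints, shortened for readability.
print(consensus_diff_path("abcdef0123", ["ffee0199aa", "00ab12cd34"]))
```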
If a server cannot offer a diff from the consensus identified by the
hash but has a current consensus it MUST return the full consensus.
[XXX: what should we do when the client already has the latest
consensus? I can think of the following options:
- send back 3xx not modified
- send back 200 ok and an empty diff
- send back 404 nothing newer here.
I currently lean towards the empty diff.]
4. Diff Format
Diffs start with the token "network-status-diff-version" followed by a
space and the version number, currently "1".
If a document does not start with "network-status-diff-version" it is
assumed to be a full consensus download and would therefore currently
start with "network-status-version 3".
Following the network-status-diff header line is a diff, or patch, in
limited ed format. We choose this format because it is easy to create
and process with standard tools (patch, diff -e, ed). This will help
us in developing and testing this proposal and it should make future
debugging easier.
[ If at some point in the future we decide that the space benefits of
a custom diff format outweigh these benefits we can always
introduce a new diff format and offer it at for instance
../diff2/... ]
We support the following ed commands, each on a line by itself:
- "<n1>d" Delete line n1
- "<n1>,<n2>d" Delete lines n1 through n2, inclusive
- "<n1>c" Replace line n1 with the following block
- "<n1>,<n2>c" Replace lines n1 through n2, inclusive, with the
following block.
- "<n1>a" Append the following block after line n1.
- "a" Append the following block after the current line.
- "s/.//" Remove the first character in the current line.
Note that line numbers always apply to the file after all previous
commands have already been applied.
The "current line" is either the first line of the file, if this is
the first command, the last line of a block we added in an append or
change command, or the line immediately following a set of lines we
just deleted (or the last line of the file if there are no lines after
that).
The replace and append commands take blocks. These blocks are simply
appended to the diff after the line with the command. A line with
just a period (".") ends the block (and is not part of the lines
to add). Note that it is impossible to insert a line with just
a single dot. The recommended procedure is to insert a line with
two dots, then remove the first character of that line using s/.//.
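
The limited ed format described in this section can be applied with a
short interpreter. The sketch below is one possible reading of the rules
above, not a reference implementation: the function name is made up,
lines are handled as Python strings without trailing newlines, and "0a"
(append before the first line) is accepted as in standard ed even though
the text does not mention it.

```python
import re

COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")


def apply_consensus_diff(base_lines, diff_lines):
    """Apply a version 1 network-status diff to a list of lines."""
    if not diff_lines or diff_lines[0] != "network-status-diff-version 1":
        raise ValueError("not a version 1 network-status diff")
    lines = list(base_lines)
    current = 1  # 1-based "current line"; the first line before any command
    i = 1        # index of the next unread diff line
    while i < len(diff_lines):
        command = diff_lines[i]
        i += 1
        if command == "s/.//":
            if not 1 <= current <= len(lines):
                raise ValueError("no current line")
            lines[current - 1] = lines[current - 1][1:]
            continue
        if command == "a":
            start = end = current
            op = "a"
        else:
            match = COMMAND.match(command)
            if match is None or (match.group(3) == "a" and match.group(2)):
                raise ValueError("unsupported command: %r" % command)
            start = int(match.group(1))
            end = int(match.group(2) or start)
            op = match.group(3)
        lowest = 0 if op == "a" else 1
        if not lowest <= start <= end <= len(lines):
            raise ValueError("line number out of range: %r" % command)
        block = []
        if op in "ac":
            # Read the block up to the terminating line with a single dot.
            while True:
                if i >= len(diff_lines):
                    raise ValueError("unterminated block")
                line = diff_lines[i]
                i += 1
                if line == ".":
                    break
                block.append(line)
        if op == "a":
            lines[start:start] = block
            current = start + len(block)
        else:
            lines[start - 1:end] = block  # for "d" the block is empty
            if block:
                current = start - 1 + len(block)
            else:
                current = min(start, len(lines))
    return lines


diff = ["network-status-diff-version 1", "3,4c", "C", ".", "1d"]
print(apply_consensus_diff(["a", "b", "c", "d"], diff))
```

Because each line number refers to the file as it stands after earlier
commands, a diff produced by "diff -e" lists its changes from the bottom
of the file upward, as in the example.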

@ -1,325 +0,0 @@
Filename: 141-jit-sd-downloads.txt
Title: Download server descriptors on demand
Version: $Revision$
Last-Modified: $Date$
Author: Peter Palfrader
Created: 15-Jun-2008
Status: Draft
1. Overview
Downloading all server descriptors is the most expensive part
of bootstrapping a Tor client. These server descriptors currently
amount to about 1.5 Megabytes of data, and this size will grow
linearly with network size.
Fetching all these server descriptors takes a long while for people
behind slow network connections. It is also a considerable load on
our network of directory mirrors.
This document describes proposed changes to the Tor network and
directory protocol so that clients will no longer need to download
all server descriptors.
These changes consist of moving load balancing information into
network status documents, implementing a means to download server
descriptors on demand in an anonymity-preserving way, and dealing
with exit node selection.
2. What is in a server descriptor
When a Tor client starts the first thing it will try to get is a
current network status document: a consensus signed by a majority
of directory authorities. This document is currently about 100
Kilobytes in size, though it will grow linearly with network size.
This document lists all servers currently running on the network.
The Tor client will then try to get a server descriptor for each
of the running servers. All server descriptors currently amount
to about 1.5 Megabytes of downloads.
A Tor client learns several things about a server from its descriptor.
Some of these it already learned from the network status document
published by the authorities, but the server descriptor contains it
again in a single statement signed by the server itself, not just by
the directory authorities.
Tor clients use the information from server descriptors for
different purposes, which are considered in the following sections.
#three ways: One, to determine if a server will be able to handle
#this client's request; two, to actually communicate or use the server;
#three, for load balancing decisions.
#
#These three points are considered in the following subsections.
2.1 Load balancing
The Tor load balancing mechanism is quite complex in its details, but
it has a simple goal: The more traffic a server can handle the more
traffic it should get. That means the more traffic a server can
handle the more likely a client will use it.
For this purpose each server descriptor has bandwidth information
which tries to convey a server's capacity to clients.
Currently we weigh servers differently for different purposes. There
is a weight for when we use a server as a guard node (our entry to the
Tor network), there is one weight we assign servers for exit duties,
and a third for when we need intermediate (middle) nodes.
2.2 Exit information
When a Tor client wants to exit to some resource on the internet it
will build a circuit to an exit node that allows access to that
resource's IP address and TCP Port.
When building that circuit the client can make sure that the circuit
ends at a server that will be able to fulfill the request because the
client already learned of all the servers' exit policies from their
descriptors.
2.3 Capability information
Server descriptors contain information about the specific versions of
the Tor protocol they understand [proposal 105].
Furthermore the server descriptor also contains the exact version of
the Tor software that the server is running and some decisions are
made based on the server version number (for instance a Tor client
will only make conditional consensus requests [proposal 139] when
talking to Tor servers version 0.2.1.1-alpha or later).
2.4 Contact/key information
A server descriptor lists a server's IP address and TCP ports on which
it accepts onion and directory connections. Furthermore it contains
the onion key (a short lived RSA key to which clients encrypt CREATE
cells).
2.5 Identity information
A Tor client learns the digest of a server's key from the network
status document. Once it has a server descriptor this descriptor
contains the full RSA identity key of the server. Clients verify
that 1) the digest of the identity key matches the expected digest
it got from the consensus, and 2) that the signature on the descriptor
from that key is valid.
3. No longer require clients to have copies of all SDs
3.1 Load balancing info in consensus documents
One of the reasons why clients download all server descriptors is for
doing proper load balancing as described in 2.1. In order for
clients to not require all server descriptors this information will
have to move into the network status document.
Consensus documents will have a new line per router similar
to the "r", "s", and "v" lines that already exist. This line
will convey weight information to clients.
"w Bandwidth=193"
The bandwidth number is the lesser of observed bandwidth and bandwidth
rate limit from the server descriptor that the "r" line references by
digest (1st and 3rd field of the bandwidth line in the descriptor).
It is given in kilobytes per second so the byte value in the
descriptor has to be divided by 1024 (and is then truncated, i.e.
rounded down).
Authorities will cap the bandwidth number at some arbitrary value,
currently 10MB/sec. If a router claims a larger bandwidth an
authority's vote will still only show Bandwidth=10240.
The consensus value for bandwidth is the median of all bandwidth
numbers given in votes. In case of an even number of votes we use
the lower median. (Using this procedure allows us to change the
cap value more easily.)
Clients should believe the bandwidth as presented in the consensus,
not capping it again.
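
The arithmetic in this section can be sketched as follows. The function
names are illustrative, and the sample byte values are made up so that
the result matches the "w Bandwidth=193" example line.

```python
BANDWIDTH_CAP_KB = 10240  # the 10MB/sec cap mentioned above, in KB/sec


def vote_bandwidth(rate_limit_bytes, observed_bytes):
    """Bandwidth value an authority would put in its vote, in KB/sec."""
    kilobytes = min(rate_limit_bytes, observed_bytes) // 1024  # truncate
    return min(kilobytes, BANDWIDTH_CAP_KB)


def consensus_bandwidth(vote_values):
    """Median of the voted values; the lower median for an even count."""
    ordered = sorted(vote_values)
    return ordered[(len(ordered) - 1) // 2]


print("w Bandwidth=%d" % vote_bandwidth(200000, 197632))
print(consensus_bandwidth([100, 300, 200, 400]))
```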
3.2 Fetching descriptors on demand
As described in 2.4 a descriptor lists IP address, OR- and Dir-Port,
and the onion key for a server.
A client already knows the IP address and the ports from the consensus
documents, but without the onion key it will not be able to send
CREATE/EXTEND cells for that server. Since the client needs the onion
key it needs the descriptor.
If a client only downloaded a few descriptors in an observable manner
then that would leak which nodes it was going to use.
This proposal suggests the following:
1) when connecting to a guard node for which the client does not
yet have a cached descriptor it requests the descriptor it
expects by hash. (The consensus document that the client holds
has a hash for the descriptor of this server. We want exactly
that descriptor, not a different one.)
It does that by sending a RELAY_REQUEST_SD cell.
A client MAY cache the descriptor of the guard node so that it does
not need to request it every single time it contacts the guard.
2) when a client wants to extend a circuit that currently ends in
server B to a new next server C, the client will send a
RELAY_REQUEST_SD cell to server B. This cell contains in its
payload the hash of a server descriptor the client would like
to obtain (C's server descriptor). The server sends back the
descriptor and the client can now form a valid EXTEND/CREATE cell
encrypted to C's onion key.
Clients MUST NOT cache such descriptors. If they did they might
leak that they already extended to that server at least once
before.
Replies to RELAY_REQUEST_SD requests need to be padded to some
constant upper limit in order to conceal a client's destination
from anybody who might be counting cells/bytes.
RELAY_REQUEST_SD cells contain the following information:
- hash of the server descriptor requested
- hash of the identity digest of the server for which we want the SD
- IP address and OR-port of the server for which we want the SD
- padding factor - the number of cells we want the answer
padded to.
[XXX this just occurred to me and it might be smart. or it might
be stupid. clients would learn the padding factor they want
to use from the consensus document. This allows us to grow
the replies later on should SDs become larger.]
[XXX: figure out a decent padding size]
3.3 Protocol versions
Server descriptors contain optional information of supported
link-level and circuit-level protocols in the form of
"opt protocols Link 1 2 Circuit 1". These are not currently needed
and will probably eventually move into the "v" (version) line in
the consensus. This proposal does not deal with them.
Similarly a server descriptor contains the version number of
a Tor node. This information is already present in the consensus
and is thus available to all clients immediately.
3.4 Exit selection
Currently finding an appropriate exit node for a user's request is
easy for a client because it has complete knowledge of all the exit
policies of all servers on the network.
The consensus document will once again be extended to contain the
information required by clients. This information will be a summary
of each node's exit policy. The exit policy summary will only contain
the list of ports to which a node exits to most destination IP
addresses.
A summary should claim a router exits to a specific TCP port if,
ignoring private IP addresses, the exit policy indicates that the
router would exit to this port to most IP addresses (that is, it
rejects at most 2^25 addresses for that port: either two /8
netblocks, or one /8 and a couple of /12s, or any other combination).
The exact algorithm used is this: Going through all exit policy items
- ignore any accept that is not for all IP addresses ("*"),
- ignore rejects for these netblocks (exactly, no subnetting):
0.0.0.0/8, 169.254.0.0/16, 127.0.0.0/8, 192.168.0.0/16, 10.0.0.0/8,
and 172.16.0.0/12,
- for each reject count the number of IP addresses rejected against
the affected ports,
- once we hit an accept for all IP addresses ("*") add the ports in
that policy item to the list of accepted ports, if they don't have
more than 2^25 IP addresses (that's two /8 networks) counted
against them (i.e. if the router exits to a port to everywhere but
at most two /8 networks).
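
The steps above can be sketched as shown below. This is a simplified
reading, not the authority implementation: policy items are assumed to
be (action, address, first_port, last_port) tuples with the address
either "*" or an IPv4 netblock, overlapping rejects are simply added
together, and a port is treated as decided once the first accept for all
addresses covering it has been seen.

```python
import ipaddress

IGNORED_NETBLOCKS = {
    ipaddress.ip_network(block)
    for block in ("0.0.0.0/8", "169.254.0.0/16", "127.0.0.0/8",
                  "192.168.0.0/16", "10.0.0.0/8", "172.16.0.0/12")
}
MAX_REJECTED = 2 ** 25  # two /8 networks


def accepted_ports(policy):
    """Return the set of ports the summary should list as accepted."""
    rejected = [0] * 65536      # rejected address count, indexed by port
    decided = [False] * 65536   # True once an accept-all item covered a port
    accepted = set()
    for action, address, first, last in policy:
        if action == "accept":
            if address != "*":
                continue  # accepts for specific addresses are ignored
            for port in range(first, last + 1):
                if not decided[port]:
                    decided[port] = True
                    if rejected[port] <= MAX_REJECTED:
                        accepted.add(port)
        else:
            if address == "*":
                count = 2 ** 32
            else:
                network = ipaddress.ip_network(address)
                if network in IGNORED_NETBLOCKS:
                    continue  # exact matches of private netblocks
                count = network.num_addresses
            for port in range(first, last + 1):
                rejected[port] += count
    return accepted


policy = [
    ("reject", "10.0.0.0/8", 1, 65535),
    ("reject", "18.0.0.0/8", 1, 65535),
    ("accept", "*", 80, 80),
    ("accept", "*", 443, 443),
    ("reject", "*", 1, 65535),
]
print(sorted(accepted_ports(policy)))
```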
An exit policy summary will be included in votes and consensus as a
new line attached to each exit node. The line will have the format
"p" <space> "accept"|"reject" <portlist>
where portlist is a comma separated list of single port numbers or
port ranges (e.g. "22,80-88,1024-6000,6667").
Whether the summary shows the list of accepted ports or the list of
rejected ports depends on which list is shorter (has a shorter string
representation). In case of ties we choose the list of accepted
ports. As an exception to this rule an allow-all policy is
represented as "accept 1-65535" instead of "reject " and a reject-all
policy is similarly given as "reject 1-65535".
Summary items are compressed, that is instead of "80-88,89-100" there
only is a single item of "80-100", similarly instead of "20,21" a
summary will say "20-21".
Port lists are sorted in ascending order.
The maximum allowed length of a policy summary (including the "accept "
or "reject ") is 1000 characters. If a summary exceeds that length we
use an accept-style summary and list as much of the port list as is
possible within these 1000 characters.
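
The formatting rules above might be implemented along these lines. The
function names are made up for the example, and cutting an over-long
summary at an item boundary is an assumption; the text only says to list
as much of the port list as fits.

```python
MAX_SUMMARY_LENGTH = 1000


def compress(ports):
    """Turn a set of ports into sorted items such as '20-21' or '80'."""
    ranges = []
    for port in sorted(ports):
        if ranges and ranges[-1][1] == port - 1:
            ranges[-1][1] = port
        else:
            ranges.append([port, port])
    return [str(lo) if lo == hi else "%d-%d" % (lo, hi) for lo, hi in ranges]


def policy_summary(accepted):
    """Build the text after the 'p' keyword from the accepted ports."""
    accepted = set(accepted)
    accept_items = compress(accepted)
    reject_items = compress(set(range(1, 65536)) - accepted)
    if not reject_items:
        return "accept 1-65535"
    if not accept_items:
        return "reject 1-65535"
    accept_list = ",".join(accept_items)
    reject_list = ",".join(reject_items)
    if len(accept_list) <= len(reject_list):  # ties go to the accept list
        summary = "accept " + accept_list
    else:
        summary = "reject " + reject_list
    if len(summary) > MAX_SUMMARY_LENGTH:
        # Fall back to an accept-style summary, truncated at an item boundary.
        summary, separator = "accept", " "
        for item in accept_items:
            if len(summary) + len(separator) + len(item) > MAX_SUMMARY_LENGTH:
                break
            summary += separator + item
            separator = ","
    return summary


print("p " + policy_summary({20, 21, 80, 443}))
```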
3.4.1 Consensus selection
When building a consensus, authorities have to agree on a digest of
the server descriptor to list in the router line for each router.
This is documented in dir-spec section 3.4.
All authorities that listed that agreed upon descriptor digest in
their vote should also list the same exit policy summary - or list
none at all if the authority has not been upgraded to list that
information in their vote.
If we have votes with matching server descriptor digest of which at
least one of them has an exit policy then we distinguish between two
cases:
a) all authorities agree (or abstained) on the policy summary, and we
use the exit policy summary that they all listed in their vote,
b) something went wrong (or some authority is playing foul) and we
have different policy summaries. In that case we pick the one
that is most commonly listed in votes with the matching
descriptor. We break ties in favour of the lexicographically larger
vote.
If none of the votes with a matching server descriptor digest has
an exit policy summary we use the most commonly listed one in all
votes, breaking ties like in case b above.
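
A small sketch of this selection rule follows. It assumes each vote is
reduced to a (descriptor_digest, summary) pair with None for a missing
summary, and it reads "lexicographically larger vote" as the larger
summary string; both are assumptions made for the example.

```python
from collections import Counter


def pick_policy_summary(votes, agreed_digest):
    """Choose the exit policy summary to list in the consensus."""
    matching = [summary for digest, summary in votes
                if digest == agreed_digest and summary is not None]
    # Fall back to all votes when no matching vote carries a summary.
    candidates = matching or [summary for _, summary in votes
                              if summary is not None]
    if not candidates:
        return None
    counts = Counter(candidates)
    # Most commonly listed first; ties go to the larger string.
    return max(counts, key=lambda summary: (counts[summary], summary))


votes = [("d1", "accept 80"), ("d1", "accept 443"), ("d1", None),
         ("d2", "reject 25")]
print(pick_policy_summary(votes, "d1"))
```

Case a) needs no special handling here: if all matching votes that list
a summary agree, that summary is trivially the most common one.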
3.4.2 Client behaviour
When choosing an exit node for a specific request a Tor client will
choose from the list of nodes that exit to the requested port as given
by the consensus document. If a client has additional knowledge (like
cached full descriptors) that indicates the so chosen exit node will
reject the request then it MAY use that knowledge (or not include such
nodes in the selection to begin with). However, clients MUST NOT use
nodes that do not list the port as accepted in the summary (but for
which they know that the node would exit to that address from other
sources, like a cached descriptor).
An exception to this is exit enclave behaviour: A client MAY use the
node at a specific IP address to exit to any port on the same address
even if that node is not listed as exiting to the port in the summary.
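
On the client side, checking a port against a summary line could look
like the sketch below. The function names are hypothetical, and the exit
enclave exception and any knowledge from cached descriptors are
deliberately left out.

```python
def parse_summary(line):
    """Parse a 'p' line into ('accept' or 'reject', list of port ranges)."""
    keyword, kind, portlist = line.split(" ", 2)
    if keyword != "p" or kind not in ("accept", "reject"):
        raise ValueError("not an exit policy summary: %r" % line)
    ranges = []
    for item in portlist.split(","):
        first, _, last = item.partition("-")
        ranges.append((int(first), int(last or first)))
    return kind, ranges


def summary_allows(line, port):
    """True if the summary lists the port as one the node exits to."""
    kind, ranges = parse_summary(line)
    listed = any(first <= port <= last for first, last in ranges)
    return listed if kind == "accept" else not listed


print(summary_allows("p accept 22,80-88,1024-6000,6667", 85))
```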
4. Migration
4.1 Consensus document changes.
The consensus will need to include
- bandwidth information (see 3.1)
- exit policy summaries (3.4)
A new consensus method (number TBD) will be chosen for this.
5. Future possibilities
This proposal still requires that all servers have the descriptors of
every other node in the network in order to answer RELAY_REQUEST_SD
cells. These cells are sent when a circuit is extended from ending at
node B to a new node C. In that case B would have to answer a
RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest).
In order to answer that request B obviously needs a copy of C's server
descriptor. The RELAY_REQUEST_SD cell already has all the info that
B needs to contact C so it can ask about the descriptor before passing it
back to the client.

@ -1,279 +0,0 @@
Filename: 142-combine-intro-and-rend-points.txt
Title: Combine Introduction and Rendezvous Points
Version: $Revision$
Last-Modified: $Date$
Author: Karsten Loesing, Christian Wilms
Created: 27-Jun-2008
Status: Dead
Change history:
27-Jun-2008 Initial proposal for or-dev
04-Jul-2008 Give first security property the new name "Responsibility"
and change new cell formats according to rendezvous protocol
version 3 draft.
19-Jul-2008 Added comment by Nick (but no solution, yet) that sharing of
circuits between multiple clients is not supported by Tor.
Overview:
Establishing a connection to a hidden service currently involves two Tor
relays, introduction and rendezvous point, and 10 more relays distributed
over four circuits to connect to them. The introduction point is
established in the mid-term by a hidden service to transfer introduction
requests from client to the hidden service. The rendezvous point is set
up by the client for a single hidden service request and actually
transfers end-to-end encrypted application data between client and hidden
service.
There are some reasons for separating the two roles of introduction and
rendezvous point: (1) Responsibility: A relay shall not be made
responsible that it relays data for a certain hidden service; in the
original design as described in [1] an introduction point relays no
application data, and a rendezvous point neither knows the hidden
service nor can it decrypt the data. (2) Scalability: The hidden service
shall not have to maintain a number of open circuits proportional to the
expected number of client requests. (3) Attack resistance: The effect of
an attack on the only visible parts of a hidden service, its introduction
points, shall be as small as possible.
However, elimination of a separate rendezvous connection as proposed by
Øverlier and Syverson [2] is the most promising approach to improve the
delay in connection establishment. Of all substeps of connection
establishment, extending a circuit by only a single hop is responsible
for a major part of the delay. Reducing on-demand circuit extensions
from two to
one results in a decrease of mean connection establishment times from 39
to 29 seconds [3]. Particularly, eliminating the delay on hidden-service
side allows the client to better observe progress of connection
establishment, thus allowing it to use smaller timeouts. Proposal 114
introduced new introduction keys for introduction points and provides for
user authorization data in hidden service descriptors; it will be shown
in this proposal that introduction keys in combination with new
introduction cookies provide for the first security property
responsibility. Further, eliminating the need for a separate introduction
connection benefits the overall network load by decreasing the number of
circuit extensions. After all, having only one connection between client
and hidden service reduces the overall protocol complexity.
Design:
1. Hidden Service Configuration
Hidden services should be able to choose whether they would like to use
this protocol. This might be opt-in for 0.2.1.x and opt-out for later
major releases.
2. Contact Point Establishment
When preparing a hidden service, a Tor client selects a set of relays to
act as contact points instead of introduction points. The contact point
combines both roles of introduction and rendezvous point as proposed in
[2]. The only requirement for a relay to be picked as contact point is
its capability of performing this role. This can be determined from the
Tor version number that needs to be equal to or higher than the first
version that implements this proposal.
The easiest way to implement establishment of contact points is to
introduce v2 ESTABLISH_INTRO cells. By convention, the relay recognizes
version 2 ESTABLISH_INTRO cells as requests to establish a contact point
rather than an introduction point.
V Format byte: set to 255 [1 octet]
V Version byte: set to 2 [1 octet]
KLEN Key length [2 octets]
PK Public introduction key [KLEN octets]
HS Hash of session info [20 octets]
SIG Signature of above information [variable]
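
To make the layout concrete, here is a sketch that serializes the fields
listed above. It assumes network byte order for the two-octet key
length, as is usual for Tor cells, and takes the signature as an opaque
input; the function name and the placeholder values are made up.

```python
import struct


def pack_establish_intro_v2(public_key, session_hash, signature):
    """Serialize the body of a v2 ESTABLISH_INTRO cell."""
    if len(session_hash) != 20:
        raise ValueError("HS must be 20 octets")
    # Format byte 255, version byte 2, then KLEN as an unsigned 16-bit value.
    header = struct.pack("!BBH", 255, 2, len(public_key))
    return header + public_key + session_hash + signature


# Placeholder bytes stand in for a real key, hash, and signature.
body = pack_establish_intro_v2(b"K" * 140, b"H" * 20, b"S" * 128)
print(len(body), body[:4].hex())
```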
The hidden service does not create a fixed number of contact points, like
3 in the current protocol. It uses a minimum of 3 contact points, but
increases this number depending on the history of client requests within
the last hour. The hidden service also increases this number depending on
the frequency of failing contact points in order to defend against
attacks on its contact points. When client authorization as described in
proposal 121 is used, a hidden service can also use the number of
authorized clients as first estimate for the required number of contact
points.
3. Hidden Service Descriptor Creation
A hidden service needs to issue a fresh introduction cookie for each
established introduction point. By requiring clients to use this cookie
in a later connection establishment, an introduction point cannot access
the hidden service that it works for. Together with the fresh
introduction key that was introduced in proposal 114, this reduces
responsibility of a contact point for a specific hidden service.
The v2 hidden service descriptor format contains an
"intro-authentication" field that may contain introduction-point specific
keys. The hidden service creates a random string, comparable to the
rendezvous cookie, and includes it in the descriptor as introduction
cookie for auth-type "1". By convention, clients recognize existence of
auth-type 1 as possibility to connect to a hidden service via a contact
point rather than an introduction point. Older clients that do not
understand this new protocol simply ignore that cookie.
4. Connection Establishment
When establishing a connection to a hidden service a client learns about
the capability of using the new protocol from the hidden service
descriptor. It may choose whether to use this new protocol or not,
whereas older clients cannot understand the new capability and can only
use the current protocol. Clients using version 0.2.1.x should be able to
opt-in for using the new protocol, which should change to opt-out for
later major releases.
When using the new capability the client creates a v2 INTRODUCE1 cell
that extends an unversioned INTRODUCE1 cell by adding the content of an
ESTABLISH_RENDEZVOUS cell. Further, the client sends this cell using the
new cell type 41 RELAY_INTRODUCE1_VERSIONED to the introduction point,
because unversioned and versioned INTRODUCE1 cells are indistinguishable:
Cleartext
V Version byte: set to 2 [1 octet]
PK_ID Identifier for Bob's PK [20 octets]
RC Rendezvous cookie [20 octets]
Encrypted to introduction key:
VER Version byte: set to 3. [1 octet]
AUTHT The auth type that is supported [1 octet]
AUTHL Length of auth data [2 octets]
AUTHD Auth data [variable]
RC Rendezvous cookie [20 octets]
g^x Diffie-Hellman data, part 1 [128 octets]
The cleartext part contains the rendezvous cookie that the contact point
remembers just as a rendezvous point would do.
The encrypted part contains the introduction cookie as auth data for the
auth type 1. The rendezvous cookie is contained as before, but there is
no further rendezvous point information, as there is no separate
rendezvous point.
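
The two parts of the v2 INTRODUCE1 cell can be laid out as in the sketch
below. It only assembles bytes: encryption to the introduction key is
omitted, network byte order is assumed for the auth data length, and the
function names and placeholder values are hypothetical.

```python
import struct


def pack_introduce1_cleartext(pk_id, rendezvous_cookie):
    """Cleartext part: version byte 2, PK_ID, rendezvous cookie."""
    if len(pk_id) != 20 or len(rendezvous_cookie) != 20:
        raise ValueError("PK_ID and RC must be 20 octets each")
    return bytes([2]) + pk_id + rendezvous_cookie


def pack_introduce1_plaintext(intro_cookie, rendezvous_cookie, dh_public):
    """Data to be encrypted to the introduction key (not encrypted here)."""
    if len(rendezvous_cookie) != 20 or len(dh_public) != 128:
        raise ValueError("RC must be 20 octets and g^x 128 octets")
    # Version byte 3, auth type 1, then the auth data length and data.
    header = struct.pack("!BBH", 3, 1, len(intro_cookie))
    return header + intro_cookie + rendezvous_cookie + dh_public


clear = pack_introduce1_cleartext(b"P" * 20, b"R" * 20)
plain = pack_introduce1_plaintext(b"C" * 20, b"R" * 20, b"X" * 128)
print(len(clear), len(plain))
```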
5. Rendezvous Establishment
The contact point recognizes a v2 INTRODUCE1 cell with auth type 1 as a
request to be used in the new protocol. It remembers the contained
rendezvous cookie, replies to the client with an INTRODUCE_ACK cell
(omitting the RENDEZVOUS_ESTABLISHED cell), and forwards the encrypted
part of the INTRODUCE1 cell as INTRODUCE2 cell to the hidden service.
6. Introduction at Hidden Service
The hidden service recognizes an INTRODUCE2 cell containing an
introduction cookie as authorization data. In this case, it does not
extend a circuit to a rendezvous point, but sends a RENDEZVOUS1 cell
directly back to its contact point as usual.
7. Rendezvous at Contact Point
The contact point processes a RENDEZVOUS1 cell just as a rendezvous point
does. The only difference is that the hidden-service-side circuit is not
exclusive for the client connection, but shared among multiple client
connections.
[Tor does not allow sharing of a single circuit among multiple client
connections easily. We need to think about a smart and efficient way to
implement this. Comment by Nick. -KL]
Security Implications:
(1) Responsibility
One of the original reasons for the separation of introduction and
rendezvous points is that a relay should not be held responsible for
relaying data for a certain hidden service. In the current design an
introduction point relays no application data and a rendezvous point
neither knows the hidden service nor can it decrypt the data.
This property is also fulfilled in this new design. A contact point only
learns a fresh introduction key instead of the hidden service key, so
that it cannot recognize a hidden service. Further, the introduction
cookie, which is unknown to the contact point, prevents it from accessing
the hidden service itself. The only way for a contact point to access a
hidden service is to look up whether it is contained in the descriptors
of known hidden services. A contact point cannot directly be held
responsible for the hidden service for which it is working. In addition,
it cannot learn the data that it transfers, because all communication
between client and hidden service is end-to-end encrypted.
(2) Scalability
Another goal of the existing hidden service protocol is that a hidden
service does not have to maintain a number of open circuits proportional
to the expected number of client requests. The rationale behind this is
better scalability.
The new protocol eliminates the need for a hidden service to extend
circuits on demand, which has a positive effect on circuit establishment
times and overall network load. The solution presented here to establish
a number of contact points proportional to the history of connection
requests reduces the number of circuits to a minimum number that fits the
hidden service's needs.
(3) Attack resistance
The third goal of separating introduction and rendezvous points is to
limit the effect of an attack on the only visible parts of a hidden
service, which are the contact points in this protocol.
In theory, the new protocol is more vulnerable to this attack. An
attacker who can take down a contact point does not only eliminate an
access point to the hidden service, but also breaks current client
connections to the hidden service using that contact point.
Øverlier and Syverson proposed the concept of valet nodes as an additional
safeguard for introduction/contact points [4]. Unfortunately, this
increases hidden service protocol complexity conceptually and from an
implementation point of view. Therefore, it is not included in this
proposal.
However, in practice attacking a contact point (or introduction point) is
not as rewarding as it might appear. The cost for a hidden service to set
up a new contact point and publish a new hidden service descriptor is
minimal compared to the efforts necessary for an attacker to take a Tor
relay down. As a countermeasure to further frustrate this attack, the
hidden service raises the number of contact points as a function of
previous contact point failures.
Further, the probability of breaking client connections due to attacking
a contact point is minimal. It can be assumed that the probability of one
of the other five involved relays in a hidden service connection failing
or being shut down is higher than that of a successful attack on a
contact point.
(4) Resistance against Locating Attacks
Clients are no longer able to force a hidden service to create or extend
circuits. This further reduces an attacker's capabilities of locating a
hidden server as described by Øverlier and Syverson [5].
Compatibility:
The presented protocol does not raise compatibility issues with current
Tor versions. New relay versions support both the existing and the
proposed protocol as introduction/rendezvous/contact points. A contact
point simultaneously acts as an introduction point. Hidden services and
clients can opt in to use the new protocol, which might change to
opt-out some time in the future.
References:
[1] Roger Dingledine, Nick Mathewson, and Paul Syverson, Tor: The
Second-Generation Onion Router. In the Proceedings of the 13th USENIX
Security Symposium, August 2004.
[2] Lasse Øverlier and Paul Syverson, Improving Efficiency and Simplicity
of Tor Circuit Establishment and Hidden Services. In the Proceedings of
the Seventh Workshop on Privacy Enhancing Technologies (PET 2007),
Ottawa, Canada, June 2007.
[3] Christian Wilms, Improving the Tor Hidden Service Protocol Aiming at
Better Performance, diploma thesis, June 2008, University of Bamberg.
[4] Lasse Øverlier and Paul Syverson, Valet Services: Improving Hidden
Servers with a Personal Touch. In the Proceedings of the Sixth Workshop
on Privacy Enhancing Technologies (PET 2006), Cambridge, UK, June 2006.
[5] Lasse Øverlier and Paul Syverson, Locating Hidden Servers. In the
Proceedings of the 2006 IEEE Symposium on Security and Privacy, May 2006.

View File

@ -1,196 +0,0 @@
Filename: 143-distributed-storage-improvements.txt
Title: Improvements of Distributed Storage for Tor Hidden Service Descriptors
Version: $Revision$
Last-Modified: $Date$
Author: Karsten Loesing
Created: 28-Jun-2008
Status: Open
Target: 0.2.1.x
Change history:
28-Jun-2008 Initial proposal for or-dev
Overview:
An evaluation of the distributed storage for Tor hidden service
descriptors and subsequent discussions have brought up a few improvements
to proposal 114. All improvements are backwards compatible with the
implementation of proposal 114.
Design:
1. Report Bad Directory Nodes
Bad hidden service directory nodes could deny existence of previously
stored descriptors. A bad directory node that does this with all stored
descriptors causes harm to the distributed storage in general, but
replication will cope with this problem in most cases. However, an
adversary that attempts to make a specific hidden service unavailable by
running relays that become responsible for all of a service's
descriptors poses a more serious threat. The distributed storage needs to
defend against this attack by detecting and removing bad directory nodes.
As a countermeasure, hidden services try to download their descriptors
every hour at random times from the hidden service directories that are
responsible for storing them. If a directory node replies with 404 (Not
found), the hidden service reports the supposedly bad directory node to
a random selection of half of the directory authorities (with version
numbers equal to or higher than the first version that implements this
proposal). The hidden service posts a complaint message using HTTP 'POST'
to a URL "/tor/rendezvous/complain" with the following message format:
"hidden-service-directory-complaint" identifier NL
[At start, exactly once]
The identifier of the hidden service directory node to be
investigated.
"rendezvous-service-descriptor" descriptor NL
[At end, exactly once]
The hidden service descriptor that the supposedly bad directory node
does not serve.
The directory authority checks whether the descriptor is valid and
whether the hidden service directory is responsible for storing it. It
waits for a random time of up to 30 minutes before posting the descriptor
to the hidden service directory. If the publication is acknowledged, the
directory authority waits another random time of up to 30 minutes before
attempting to request the descriptor that it has posted. If the directory
node replies with 404 (Not found), it will be blacklisted from being a
hidden service directory node for the next 48 hours.
A blacklisted hidden service directory is assigned the new flag BadHSDir
instead of the HSDir flag in the vote that a directory authority creates.
In a consensus a relay is only assigned a HSDir flag if the majority of
votes contains a HSDir flag and no more than one third of votes contains
a BadHSDir flag. As a result, clients do not have to learn about the
BadHSDir flag. A blacklisted directory node will simply not be assigned
the HSDir flag in the consensus.
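The consensus rule for the HSDir flag can be sketched as below. The function name and the representation of votes as sets of flag names are assumptions for this example, and "majority" is read as strictly more than half of the votes.

```python
def consensus_assigns_hsdir(votes):
    """Decide whether a relay gets the HSDir flag in the consensus.

    votes: list with one set of flag names per authority vote.
    Rule from the text: a majority of votes contains HSDir and no
    more than one third of votes contains BadHSDir.
    """
    total = len(votes)
    if total == 0:
        return False
    hsdir_votes = sum(1 for flags in votes if "HSDir" in flags)
    bad_votes = sum(1 for flags in votes if "BadHSDir" in flags)
    # Integer arithmetic avoids rounding problems with the fractions.
    return 2 * hsdir_votes > total and 3 * bad_votes <= total
```

With six votes, two BadHSDir votes are tolerated as long as the other four contain HSDir; a third BadHSDir vote removes the flag.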
In order to prevent an attacker from setting up new nodes as replacement
for blacklisted directory nodes, all directory nodes in the same /24
subnet are blacklisted, too. Furthermore, if two or more directory nodes
are blacklisted in the same /16 subnet concurrently, all other directory
nodes in that /16 subnet are blacklisted, too. Blacklisting holds for at
most 48 hours.
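A minimal sketch of the subnet rule follows. It assumes IPv4 addresses given as strings, and it assumes that only directly blacklisted nodes count towards the "two or more in the same /16" threshold; the text does not say whether nodes blacklisted through the /24 rule count as well. The function names are hypothetical.

```python
import ipaddress
from collections import Counter


def subnet(addr, prefix):
    """Return the enclosing network of addr with the given prefix length."""
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)


def extended_blacklist(directly_blacklisted, all_hsdirs):
    """Return the set of directory node addresses that end up blacklisted.

    directly_blacklisted: addresses of nodes that failed the authority's test.
    all_hsdirs: addresses of all hidden service directory nodes.
    """
    bad_24 = {subnet(a, 24) for a in directly_blacklisted}
    per_16 = Counter(subnet(a, 16) for a in directly_blacklisted)
    # Assumption: only directly blacklisted nodes count towards the /16 rule.
    bad_16 = {net for net, count in per_16.items() if count >= 2}
    return {a for a in all_hsdirs
            if subnet(a, 24) in bad_24 or subnet(a, 16) in bad_16}
```

The 48-hour expiry is not modeled here; an implementation would recompute the set as individual blacklist entries time out.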
2. Publish Fewer Replicas
The evaluation has shown that the probability that a directory node
serves a previously stored descriptor is 85.7% (more precisely, this is
the 0.001-quantile of the empirical distribution with the rationale that
it holds for 99.9% of all empirical cases). If descriptors are replicated
to x directory nodes, the probability that at least one of the replicas
is available to clients is 1 - (1 - 85.7%) ^ x. In order to achieve an
overall availability of 99.9%, x = 3.55 replicas need to be stored. It
follows that 4 replicas are sufficient, rather than the currently
stored 6 replicas.
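The figure of 3.55 replicas follows from solving 1 - (1 - p) ^ x = 0.999 for x. The short sketch below reproduces the arithmetic; the function names are chosen for this example only.

```python
import math


def replicas_needed(p_single, target):
    """Solve 1 - (1 - p_single) ** x = target for x."""
    return math.log(1 - target) / math.log(1 - p_single)


def availability(p_single, replicas):
    """Probability that at least one of the replicas is available."""
    return 1 - (1 - p_single) ** replicas


x = replicas_needed(0.857, 0.999)   # about 3.55
replica_count = math.ceil(x)        # rounded up to a whole replica: 4
```

Three replicas give an availability of roughly 99.7%, which misses the target, while four give roughly 99.96%.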
Further, the current design stores 2 sets of descriptors on 3 directory
nodes with consecutive identities. Originally, this was meant to
facilitate replication between directory nodes, which has not been and
will not be implemented (the selection criterion of 24 hours uptime does
not make it necessary). As a result, storing descriptors on directory
nodes with consecutive identities is not required. In fact, it should be
avoided, because it enables an attacker to create "black holes" in the
identifier ring.
Hidden services should store their descriptors on 4 non-consecutive
directory nodes, and clients should request descriptors from these
directory nodes only. For compatibility reasons, hidden services also
store their descriptors on 2 consecutive directory nodes. Hence, 0.2.0.x
clients will be able to retrieve 4 out of 6 descriptors, but will fail
for the remaining 2 descriptors, which is sufficient for reliability. As
soon as 0.2.0.x is deprecated, hidden services can stop publishing the
additional 2 replicas.
3. Change Default Value of Being Hidden Service Directory
The requirements for becoming a hidden service directory node are an open
directory port and an uptime of at least 24 hours. The evaluation has
shown that there are 300 hidden service directory candidates on average,
but only 6 of them are configured to act as hidden service directories.
This is bad, because those 6 nodes need to serve a large share of all
hidden service descriptors. Optimally, there should be hundreds of hidden
service directories. Having a large number of 0.2.1.x directory nodes
also has a positive effect on 0.2.0.x hidden services and clients.
Therefore, the new default of HidServDirectoryV2 should be 1, so that a
Tor relay that has an open directory port automatically accepts and
serves v2 hidden service descriptors. A relay operator can still opt out
of running a hidden service directory by changing HidServDirectoryV2 to 0.
The additional bandwidth requirements for running a hidden service
directory node in addition to being a directory cache are negligible.
4. Make Descriptors Persistent on Directory Nodes
Hidden service directories that are restarted by their operators or after
a failure will not be selected as hidden service directories within the
next 24 hours. However, some clients might still think that these nodes
are responsible for certain descriptors, because they work on the basis
of network consensuses that are up to three hours old. The directory
nodes should be able to serve the previously received descriptors to
these clients. Therefore, directory nodes make all received descriptors
persistent and load previously received descriptors on startup.
5. Store and Serve Descriptors Regardless of Responsibility
Currently, directory nodes only accept descriptors for which they think
they are responsible. This may lead to problems when a directory node
uses an older or newer network consensus than the hidden service or
client, or when a directory node has been restarted recently. In fact,
there are no security issues in storing or serving descriptors for which
a directory node thinks it is not responsible. On the contrary, doing so
may improve reliability in border cases. As a result, a directory node
does not pay attention to responsibility when receiving a publication or
fetch request, but stores or serves the requested descriptor. Likewise,
the directory node does not remove descriptors when it thinks it is not
responsible for them any more.
6. Avoid Periodic Descriptor Re-Publication
In the current implementation a hidden service re-publishes its
descriptor either when its content changes or an hour elapses. However,
the evaluation has shown that failures of hidden service directory nodes,
i.e. of nodes that have not failed within the last 24 hours, are very
rare. Together with making descriptors persistent on directory nodes,
there is no necessity to re-publish descriptors hourly.
The only two events leading to descriptor re-publication should be a
change of the descriptor content and a new directory node becoming
responsible for the descriptor. Hidden services should therefore consider
re-publication every time they learn about a new network consensus
instead of hourly.
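The two re-publication triggers can be sketched as a check that a hidden service runs whenever it learns about a new network consensus. The function name and argument types are assumptions for this example; how the set of responsible directories is derived from a consensus is not shown.

```python
def should_republish(old_descriptor, new_descriptor,
                     old_responsible, new_responsible):
    """Return True if the descriptor should be published again.

    old_descriptor / new_descriptor: encoded descriptor content.
    old_responsible / new_responsible: identifiers of the directory nodes
    responsible under the previous and the new consensus.
    """
    content_changed = old_descriptor != new_descriptor
    newly_responsible = set(new_responsible) - set(old_responsible)
    return content_changed or bool(newly_responsible)
```

A directory node that merely stops being responsible does not trigger re-publication in this sketch, since only new responsible nodes lack the descriptor.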
7. Discard Expired Descriptors
The current implementation lets directory nodes keep a descriptor for two
days before discarding it. However, with the v2 design, descriptors are
only valid for at most one day. Directory nodes should determine the
validity of stored descriptors and discard them one hour after they have
expired (to compensate for wrong clocks on clients).
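As an illustration, the discard rule amounts to keeping a descriptor until one hour past its expiry time. The data layout (a dictionary from descriptor ID to expiry time in seconds) and the names are assumptions for this sketch.

```python
GRACE_SECONDS = 60 * 60  # one hour of slack for wrong clocks on clients


def purge_expired(expiry_by_id, now):
    """Return the descriptors that a directory node should keep.

    expiry_by_id: dict mapping descriptor ID to its expiry time in seconds.
    now: current time in seconds, on the same clock.
    """
    return {desc_id: expiry
            for desc_id, expiry in expiry_by_id.items()
            if now < expiry + GRACE_SECONDS}
```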
8. Shorten Client-Side Descriptor Fetch History
When clients try to download a hidden service descriptor, they memorize
fetch requests to directory nodes for up to 15 minutes. This allows them
to request all replicas of a descriptor to avoid bad or failing directory
nodes, but without querying the same directory node twice.
The downside is that a client that has requested a descriptor without
success will not be able to find a hidden service that has been started
during the 15 minutes following the client's last request.
This can be improved by shortening the fetch history to only 5 minutes.
This time should be sufficient to complete requests for all replicas of a
descriptor, but without ending in an infinite request loop.
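A client-side fetch history with the shortened lifetime could look like the sketch below. The class and method names are hypothetical and are not taken from the Tor source; times are plain seconds supplied by the caller.

```python
FETCH_HISTORY_SECONDS = 5 * 60  # proposed value, down from 15 minutes


class FetchHistory:
    """Remembers recent descriptor requests per directory node."""

    def __init__(self, lifetime=FETCH_HISTORY_SECONDS):
        self.lifetime = lifetime
        self._last_request = {}  # (directory_id, descriptor_id) -> time

    def may_request(self, directory_id, descriptor_id, now):
        """True if this directory was not asked for this descriptor recently."""
        last = self._last_request.get((directory_id, descriptor_id))
        return last is None or now - last >= self.lifetime

    def note_request(self, directory_id, descriptor_id, now):
        """Record that a request was sent at time now."""
        self._last_request[(directory_id, descriptor_id)] = now
```

Because each directory node is blocked for the lifetime of the entry, a client works through all replicas once and then pauses, which avoids the infinite request loop mentioned above.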
Compatibility:
All proposed improvements are compatible with the currently implemented
design as described in proposal 114.

View File

@ -1,165 +0,0 @@
Filename: 144-enforce-distinct-providers.txt
Title: Increase the diversity of circuits by detecting nodes belonging to the
same provider
Author: Mfr
Created: 2008-06-15
Status: Draft
Overview:
Increase network security by reducing the ability of relay operators or
ISPs, whether monitoring on their own initiative or under requisition,
to observe a large part of Tor traffic in an attempt to break circuit
privacy. This is a way to increase the diversity of circuits without
killing network performance.
Motivation:
Since Roger and Nick's 2004 publication about diversity [1], very fast
Tor relays have become concentrated among half a dozen providers,
each controlling the traffic of some dozens of routers [2].
Likewise, the spread of clonable VMs billed by the hour makes it
possible to start, within a few minutes and at small cost, a set of
very high-speed relays that can attract a large amount of traffic
within a few hours. This traffic can then be analyzed, increasing the
vulnerability of the network.
Whether they are ISPs or domU providers, these providers usually
hold several class B IP ranges. The existing EnforceDistinctSubnets
restriction, which automatically excludes relays in the same class B
subnet, is therefore only partially effective. By contrast, a
restriction at the class A level would be too restrictive.
Therefore it seems necessary to consider another approach.
Proposal:
Add a provider check based on the AS number. The router adds its AS
number to its descriptor, the directory authorities verify it, and
clients use it like the declared family field when building circuits.
Design:
Step 1:
Add to the router descriptor provider information that the router
itself obtains by request [4].
"provider" name NL
'name' is the AS number of the router, formatted like this:
'ASxxxxxx', where AS is fixed and xxxxxx is the AS number,
left-aligned (e.g. AS98304, AS4096, AS1). If the AS number is
missing, the class A network number is used instead, like this:
'ANxxx', where AN is fixed and xxx is the first 3 digits of
the IP (e.g. AN1 for the IP 1.1.1.2). If it's a local network
IP, the value 'L' is set.
If two ORs list one another in their "provider" entries,
then OPs should treat them as a single OR for the purpose
of path selection.
For example, if node A's descriptor contains "provider B",
and node B's descriptor contains "provider A", then node A
and node B should never be used on the same circuit.
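The formatting of the "provider" name can be sketched as follows. The function name is hypothetical, the AS number is assumed to have been looked up already, and "local network IP" is approximated here by Python's notion of a private address, which may be broader or narrower than what the proposal intends.

```python
import ipaddress


def provider_name(ip, as_number=None):
    """Format the 'provider' value for a router descriptor.

    ip: the router's IPv4 address as a string.
    as_number: the AS number obtained by request, or None if missing.
    """
    if as_number is not None:
        return f"AS{as_number}"
    # Assumption: 'local network IP' means a private address.
    if ipaddress.ip_address(ip).is_private:
        return "L"
    # Fall back to the class A network number, i.e. the first octet.
    first_octet = ip.split(".")[0]
    return f"AN{first_octet}"
```

For the address 1.1.1.2 without a known AS number this yields AN1, matching the example in the text.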
Add the corresponding config option to torrc:
EnforceDistinctProviders, set to 1 by default.
If set to 0, Tor permits building circuits with relays from the
same provider.
With regard to proposal 135: if TestingTorNetwork is set,
EnforceDistinctProviders needs to be unset.
Verification of the AS numbers by the directory authorities:
The directory authorities check the AS number in each newly
uploaded node descriptor.
If the node runs an old version, this test is bypassed.
If the AS number obtained by request differs from the one in the
descriptor, the router is flagged as non-Valid by the testing
authority for the voting process.
Step 2: When a 'significant number of nodes' among valid routers are
generating descriptors with provider information.
Add, for the circuit user, the ability to obtain missing provider
information by DNS request:
During circuit building, the OP first applies the family check and
the EnforceDistinctSubnets directive, for performance. Then, if
provider info is needed and missing from a router descriptor, it
tries to get the AS provider info by DNS request [4]. This
information could be cached by DNS. The AN (class A number) is
never generated during this process, to prevent DNS blocking
problems. If the DNS request fails, the OP ignores the failure and
continues building the circuit.
Step 3: When the 'whole majority' of valid Tor clients are issuing
the DNS request.
Older versions are deprecated and marked as non-Valid.
EnforceDistinctProviders replaces the EnforceDistinctSubnets functionality.
EnforceDistinctSubnets is removed.
Functionalities deployed in step 2 are removed.
Security implications:
This provider measure will increase the number of providers'
addresses that an attacker must use in order to carry out
traffic analysis.
Compatibility:
The presented protocol does not raise compatibility issues
with current Tor versions. The compatibility is preserved by
implementing this functionality in 3 steps, giving time to
network users to upgrade clients and routers.
Performance and scalability notes:
Changing provider at every router could slightly reduce
performance if the circuit is too long.
During step 2, fetching missing provider information could increase
path-building time and should have a timeout.
Possible Attacks/Open Issues/Some thinking required:
This proposal seems to be compatible with proposal 135, Simplify
Configuration of Private Tor Networks.
This proposal does not address owners of multiple ASes or traffic
monitoring by top-level providers [5].
Unresolved AS numbers are treated as a class A network. Perhaps they
should be marked as invalid, but there were only five such items at
the last check; see [2].
We need to define what a 'significant number of nodes' and a
'whole majority' are ;-)
References:
[1] Location Diversity in Anonymity Networks by Nick Feamster and Roger
Dingledine.
In the Proceedings of the Workshop on Privacy in the Electronic Society
(WPES 2004), Washington, DC, USA, October 2004
http://freehaven.net/anonbib/#feamster:wpes2004
[2] http://as4jtw5gc6efb267.onion/IPListbyAS.txt
[3] see Goodell Tor Exit Page
http://cassandra.eecs.harvard.edu/cgi-bin/exit.py
[4] see the great IP to ASN DNS Tool
http://www.team-cymru.org/Services/ip-to-asn.html
[5] Sampled Traffic Analysis by Internet-Exchange-Level Adversaries by
Steven J. Murdoch and Piotr Zielinski.
In the Proceedings of the Seventh Workshop on Privacy Enhancing Technologies
(PET 2007), Ottawa, Canada, June 2007.
http://freehaven.net/anonbib/#murdoch-pet2007
[6] http://bugs.noreply.org/flyspray/index.php?do=details&id=690

View File

@ -1,41 +0,0 @@
Filename: 145-newguard-flag.txt
Title: Separate "suitable as a guard" from "suitable as a new guard"
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson
Created: 1-Jul-2008
Status: Open
Target: 0.2.1.x
[This could be obsoleted by proposal 141, which could replace NewGuard
with a Guard weight.]
Overview
Right now, Tor has one flag that clients use both to tell which
nodes should be kept as guards, and which nodes should be picked
when choosing new guards. This proposal separates this flag into
two.
Motivation
Balancing clients among guards is not done well by our current
algorithm. When a new guard appears, it is chosen by clients
looking for a new guard with the same probability as all existing
guards... but new guards are likelier to be under capacity, whereas
old guards are likelier to be under more use.
Implementation
We add a new flag, NewGuard. Clients will change so that when they
are choosing new guards, they only consider nodes with the NewGuard
flag set.
For now, authorities will always set NewGuard if they are setting
the Guard flag. Later, it will be easy to migrate authorities to
set NewGuard for underused guards.
Alternatives
We might instead have authorities list weights with which nodes
should be picked as guards.

View File

@ -1,86 +0,0 @@
Filename: 146-long-term-stability.txt
Title: Add new flag to reflect long-term stability
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson
Created: 19-Jun-2008
Status: Open
Target: 0.2.1.x
Overview
This document proposes a new flag to indicate that a router has
existed at the same address for a long time, describes how to
implement it, and explains what it's good for.
Motivation
Tor has had three notions of "stability" for servers. Older
directory protocols based a server's stability on its
(self-reported) uptime: a server that had been running for a day was
more stable than a server that had been running for five minutes,
regardless of their past history. Current directory protocols track
weighted mean time between failure (WMTBF) and weighted fractional
uptime (WFU). WFU is computed as the fraction of time for which the
server is running, with measurements weighted to exponentially
decay such that old days count less. WMTBF is computed as the
average length of intervals for which the server runs between
downtime, with old intervals weighted to count less.
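The two measures can be sketched as weighted averages. The decay factor per day and the input layout are assumptions for this example and are not the parameters that authorities actually use.

```python
def weighted_fractional_uptime(observations, decay=0.9):
    """WFU: weighted fraction of observed time for which the server ran.

    observations: iterable of (age_days, seconds_up, seconds_observed).
    decay: assumed per-day weight factor, so older days count less.
    """
    weighted_up = 0.0
    weighted_total = 0.0
    for age_days, seconds_up, seconds_observed in observations:
        weight = decay ** age_days
        weighted_up += weight * seconds_up
        weighted_total += weight * seconds_observed
    return weighted_up / weighted_total if weighted_total else 0.0


def weighted_mtbf(runs, decay=0.9):
    """WMTBF: weighted average length of uninterrupted runs.

    runs: iterable of (age_days, run_length_seconds).
    """
    weighted_length = 0.0
    weight_sum = 0.0
    for age_days, run_length in runs:
        weight = decay ** age_days
        weighted_length += weight * run_length
        weight_sum += weight
    return weighted_length / weight_sum if weight_sum else 0.0
```

In both functions a recent observation dominates an old one, which is the property the text describes as "old days count less".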
WMTBF is useful in answering the question: "If a server is running
now, how long is it likely to stay running?" This makes it a good
choice for picking servers for streams that need to be long-lived.
WFU is useful in answering the question: "If I try connecting to
this server at an arbitrary time, is it likely to be running?" This
makes it an important factor for picking guard nodes, since we want
guard nodes to be usually-up.
There are other questions that clients want to answer, however, for
which the current flags aren't very useful. The one that this
proposal addresses is,
"If I found this server in an old consensus, is it likely to
still be running at the same address?"
This one is useful when we're trying to find directory mirrors in a
fallback-consensus file. This property is equivalent to,
"If I find this server in a current consensus, how long is it
likely to exist on the network?"
This one is useful if we're trying to pick introduction points or
something and care more about churn rate than about whether every IP
will be up all the time.
Implementation:
I propose we add a new flag, called "Longterm." Authorities should
set this flag for routers if their Longevity is in the upper
quartile of all routers. A router's Longevity is computed as the
total number of days in the last year or so[*] for which the router has
been Running at least once at its current IP:orport pair.
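One way to pick the upper quartile is sketched below. The function name, the input layout, and the exact quartile rule (the cutoff is the value at three quarters of the sorted list) are assumptions; ties at the cutoff can put slightly more than a quarter of routers above it.

```python
def longterm_routers(longevity_days):
    """Return the routers whose Longevity is in the upper quartile.

    longevity_days: dict mapping router ID to the number of days in the
    observation window on which the router was Running at least once
    at its current IP:orport pair.
    """
    if not longevity_days:
        return set()
    ranked = sorted(longevity_days.values())
    # First value that belongs to the top quarter of the sorted list.
    cutoff = ranked[(3 * len(ranked)) // 4]
    return {router for router, days in longevity_days.items()
            if days >= cutoff}
```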
Clients should use directory servers from a fallback-consensus only
if they have the Longterm flag set.
Authority ops should be able to mark particular routers as not
Longterm, regardless of history. (For instance, it makes sense to
remove the Longterm flag from a router whose op says that it will
need to shut down in a month.)
[*] This is deliberately vague, to permit efficient implementations.
Compatibility and migration issues:
The voting protocol already acts gracefully when new flags are
added, so no change to the voting protocol is needed.
Tor won't have collected this data, however. It might be desirable
to bootstrap it from historical consensuses. Alternatively, we can
just let the algorithm run for a month or two.
Issues and future possibilities:
Longterm is a really awkward name.

View File

@ -1,60 +0,0 @@
Filename: 147-prevoting-opinions.txt
Title: Eliminate the need for v2 directories in generating v3 directories
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson
Created: 2-Jul-2008
Status: Accepted
Target: 0.2.1.x
Overview
We propose a new v3 vote document type to replace the role of v2
networkstatus information in generating v3 consensuses.
Motivation
When authorities vote on which descriptors are to be listed in the
next consensus, it helps if they all know about the same descriptors
as one another. But a hostile, confused, or out-of-date server may
upload a descriptor to only some authorities. In the current v3
directory design, the authorities don't have a good way to tell one
another about the new descriptor until they exchange votes... but by
the time this happens, they are already committed to their votes,
and they can't add anybody they learn about from other authorities
until the next voting cycle. That's no good!
The current Tor implementation avoids this problem by having
authorities also look at v2 networkstatus documents, but we'd like
in the long term to eliminate these, once 0.1.2.x is obsolete.
Design:
We add a new value for vote-status in v3 consensus documents in
addition to "consensus" and "vote": "opinion". Authorities generate
and sign an opinion document as if they were generating a vote,
except that they generate opinions earlier than they generate votes.
Authorities don't need to generate more than one opinion document
per voting interval, but may. They should send it to the other
authorities they know about, at the regular vote upload URL, before
the authorities begin voting, so that enough time remains for the
authorities to fetch new descriptors.
Additionally, authorities make their opinions available at
http://<hostname>/tor/status-vote/next/opinion.z
and download opinions from authorities they haven't heard from in a
while.
Authorities MAY generate opinions on demand.
Upon receiving an opinion document, authorities scan it for any
descriptors that:
- They might accept.
- Are for routers they don't know about, or are published more
recently than any descriptor they have for that router.
Authorities then begin downloading such descriptors from authorities
that claim to have them.
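The scan of an opinion document can be sketched as a filter over its router entries. The tuple layout, the might_accept callback and the function name are assumptions for this example; parsing and signature checking of the opinion are not shown.

```python
def descriptors_to_fetch(opinion_entries, newest_known, might_accept):
    """Pick descriptor digests worth downloading from an opinion.

    opinion_entries: iterable of (router_id, published_time, digest).
    newest_known: dict mapping router_id to the publication time of the
        most recent descriptor this authority already has.
    might_accept: callable(router_id) -> bool, the local acceptance policy.
    """
    wanted = []
    for router_id, published_time, digest in opinion_entries:
        if not might_accept(router_id):
            continue
        known_time = newest_known.get(router_id)
        # Unknown router, or a newer descriptor than any we hold.
        if known_time is None or published_time > known_time:
            wanted.append(digest)
    return wanted
```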
Authorities MAY cache opinion documents, but don't need to.

View File

@ -1,59 +0,0 @@
Filename: 148-uniform-client-end-reason.txt
Title: Stream end reasons from the client side should be uniform
Version: $Revision$
Last-Modified: $Date$
Author: Roger Dingledine
Created: 2-Jul-2008
Status: Closed
Implemented-In: 0.2.1.9-alpha
Overview
When a stream closes before it's finished, the end relay cell that's
sent includes an "end stream reason" to tell the other end why it
closed. It's useful for the exit relay to send a reason to the client,
so the client can choose a different circuit, inform the user, etc. But
there's no reason to include it from the client to the exit relay,
and in some cases it can even harm anonymity.
We should pick a single reason for the client-to-exit-relay direction
and always just send that.
Motivation
Back when I first deployed the Tor network, it was useful to have
the Tor relays learn why a stream closed, so I could debug both ends
of the stream at once. Now that streams have worked for many years,
there's no need to continue telling the exit relay whether the client
gave up on a stream because of "timeout" or "misc" or what.
Then in Tor 0.2.0.28-rc, I fixed this bug:
- Fix a bug where, when we were choosing the 'end stream reason' to
put in our relay end cell that we send to the exit relay, Tor
clients on Windows were sometimes sending the wrong 'reason'. The
anonymity problem is that exit relays may be able to guess whether
the client is running Windows, thus helping partition the anonymity
set. Down the road we should stop sending reasons to exit relays,
or otherwise prevent future versions of this bug.
It turned out that non-Windows clients were choosing their reason
correctly, whereas Windows clients were potentially looking at errno
wrong and so always choosing 'misc'.
I fixed that particular bug, but I think we should prevent future
versions of the bug too.
(We already fixed it so *circuit* end reasons don't get sent from
the client to the exit relay. But we appear to have skipped over
stream end reasons thus far.)
Design:
One option would be to no longer include any 'reason' field in end
relay cells. But that would introduce a partitioning attack ("users
running the old version" vs "users running the new version").
Instead I suggest that clients all switch to sending the "misc" reason,
like most of the Windows clients currently do and like the non-Windows
clients already do sometimes.

View File

@ -1,44 +0,0 @@
Filename: 149-using-netinfo-data.txt
Title: Using data from NETINFO cells
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson
Created: 2-Jul-2008
Status: Open
Target: 0.2.1.x
Overview
Current Tor versions send signed IP and timestamp information in
NETINFO cells, but don't use them to their fullest. This proposal
describes how they should start using this info in 0.2.1.x.
Motivation
Our directory system relies on clients and routers having
reasonably accurate clocks to detect replayed directory info, and
to set accurate timestamps on directory info they publish
themselves. NETINFO cells contain timestamps.
Also, the directory system relies on routers having a reasonable
idea of their own IP addresses, so they can publish correct
descriptors. This is also in NETINFO cells.
Learning the time and IP
We need to think about attackers here. Just because a router tells
us that we have a given IP or a given clock skew doesn't mean that
it's true. We believe this information only if we've heard it from
a majority of the routers we've connected to recently, including at
least 3 routers. Routers only believe this information if the
majority includes at least one authority.
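The acceptance rule can be sketched as a vote over recent reports. The function name and input layout are hypothetical, "including at least 3 routers" is read as requiring at least three agreeing reporters, and values are compared for exact equality, so a clock skew would have to be rounded into buckets first.

```python
from collections import Counter


def believed_value(reports, is_router=False):
    """Return the reported value we are willing to believe, or None.

    reports: list of (value, from_authority) pairs, one per router we
        have connected to recently.
    is_router: routers additionally require an authority in the majority.
    """
    if not reports:
        return None
    counts = Counter(value for value, _ in reports)
    value, agreeing = counts.most_common(1)[0]
    # Must be a strict majority and backed by at least 3 routers.
    if 2 * agreeing <= len(reports) or agreeing < 3:
        return None
    if is_router and not any(auth for v, auth in reports if v == value):
        return None
    return value
```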
Avoiding MITM attacks
Current Tors use the IP addresses published in the other router's
NETINFO cells to see whether the connection is "canonical". Right
now, we prefer to extend circuits over "canonical" connections. In
0.2.1.x, we should refuse to extend circuits over non-canonical
connections without first trying to build a canonical one.

View File

@ -1,48 +0,0 @@
Filename: 150-exclude-exit-nodes.txt
Title: Exclude Exit Nodes from a circuit
Version: $Revision$
Author: Mfr
Created: 2008-06-15
Status: Closed
Implemented-In: 0.2.1.3-alpha
Overview
Right now, Tor users can manually exclude a node from all positions
in their circuits created using the directive ExcludeNodes.
This proposal makes this exclusion less restrictive, allowing users to
exclude a node only from the exit part of a circuit.
Motivation
This feature would help the integration into Vidalia (tor exit
branch) or other tools of features to exclude a country for exit
without reducing circuit possibilities and privacy. This feature
could help people from a country where many sites are blocked to
exclude this country for browsing, giving them more stable
navigation. It could also add the possibility for the user to
exclude a currently used exit node.
Implementation
ExcludeExitNodes is similar to ExcludeNodes, except that only
the exit node is excluded when building a circuit.
Tor doesn't warn if a node from this list is not an exit node.
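The difference between the two options can be sketched as a candidate filter used during path selection. The function name and the "position" argument are made up for this example; real path selection involves many more constraints.

```python
def allowed_candidates(nodes, exclude_nodes, exclude_exit_nodes, position):
    """Filter candidate nodes for one position of a circuit.

    nodes: candidate node identifiers.
    exclude_nodes: nodes excluded from every position (ExcludeNodes).
    exclude_exit_nodes: nodes excluded from the exit position only
        (ExcludeExitNodes).
    position: "entry", "middle" or "exit".
    """
    banned = set(exclude_nodes)
    if position == "exit":
        banned |= set(exclude_exit_nodes)
    return [node for node in nodes if node not in banned]
```

A node listed only in ExcludeExitNodes therefore stays available as an entry or middle node.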
Security implications:
This also opens up possibilities for future reporting of bad exits by users.
Risks:
Use of this option can make users partitionable under certain attack
assumptions. However, ExitNodes already creates this possibility,
so there isn't much increased risk in ExcludeExitNodes.
We should still encourage people who exclude an exit node because
of bad behavior to report it instead of just adding it to their
ExcludeExitNodes list. It would be unfortunate if we didn't find out
about broken exits because of this option. This issue can probably
be addressed sufficiently with documentation.


@ -1,147 +0,0 @@
Filename: 151-path-selection-improvements.txt
Title: Improving Tor Path Selection
Version:
Last-Modified:
Author: Fallon Chen, Mike Perry
Created: 5-Jul-2008
Status: Draft
Overview
The performance of paths selected can be improved by adjusting the
CircuitBuildTimeout and avoiding failing guard nodes. This proposal
describes a method of tracking buildtime statistics at the client, and
using those statistics to adjust the CircuitBuildTimeout.
Motivation
Tor's performance can be improved by excluding those circuits that
have long buildtimes (and by extension, high latency). For those Tor
users who require better performance and have lower requirements for
anonymity, this would be a very useful option to have.
Implementation
Storing Build Times
Circuit build times will be stored in the circular array
'circuit_build_times' consisting of uint16_t elements as milliseconds.
The total size of this array will be based on the number of circuits
it takes to converge on a good fit of the long term distribution of
the circuit builds for a fixed link. We do not want this value to be
too large, because it will make it difficult for clients to adapt to
moving between different links.
From our initial observations, this value appears to be on the order
of 1000, but will be configurable in a #define NCIRCUITS_TO_OBSERVE.
The exact value for this #define will be determined by performing
goodness of fit tests using measurements obtained from the shufflebt.py
script from TorFlow.
Long Term Storage
The long-term storage representation will be implemented by storing a
histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when
writing out the statistics to disk. The format of this histogram on disk
is yet to be finalized, but it will likely be of the format
'CircuitBuildTimeBin <bin> <count>', with the total specified as
'TotalBuildTimes <total>'.
Example:
TotalBuildTimes 100
CircuitBuildTimeBin 1 50
CircuitBuildTimeBin 2 25
CircuitBuildTimeBin 3 13
...
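As an illustration of the on-disk form above, the following sketch turns a list of build times into histogram lines. The format is not finalized in this proposal, and the bin numbering used here (bin = build time divided by the bin width, rounded down) is an assumption, as is the function name.

```python
from collections import Counter

BUILDTIME_BIN_WIDTH = 50  # milliseconds per bucket (default from the text)

def histogram_lines(build_times_ms, bin_width=BUILDTIME_BIN_WIDTH):
    """Render build times (in ms) as state-file style histogram lines."""
    # Assumed numbering: a time t falls into bin floor(t / bin_width).
    counts = Counter(t // bin_width for t in build_times_ms)
    lines = ["TotalBuildTimes %d" % len(build_times_ms)]
    for bin_number in sorted(counts):
        lines.append("CircuitBuildTimeBin %d %d" % (bin_number, counts[bin_number]))
    return lines
```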
Reading the histogram in will entail multiplying each bin by the
BUILDTIME_BIN_WIDTH and then inserting <count> values into the
circuit_build_times array each with the value of
<bin>*BUILDTIME_BIN_WIDTH. In order to evenly distribute the
values in the circular array, a form of index skipping must
be employed. Values from bin #N with bin count C and total T
will occupy indexes specified by N+((T/C)*k)-1, where k is the
set of integers ranging from 0 to C-1.
For example, this would mean that the values from bin 1 would
occupy indexes 1+(100/50)*k-1, or 0, 2, 4, 6, 8, 10 and so on.
The values for bin 2 would occupy positions 1, 5, 9, 13. Collisions
will be inserted at the first empty position in the array greater
than the selected index (which may require looping around the
array back to index 0).
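The index-skipping scheme can be sketched as below. The sketch assumes that T/C is integer division, that the array has exactly T slots, and that None marks an empty slot; the function name is hypothetical.

```python
def load_histogram(bins, total, bin_width=50):
    """Spread histogram bins over an array of `total` slots.

    bins: list of (bin_number, count) pairs read from the state file.
    """
    if sum(count for _, count in bins) > total:
        raise ValueError("bin counts exceed the stated total")
    array = [None] * total
    for n, count in bins:
        if count <= 0:
            continue
        step = total // count                 # T/C, assumed integer division
        for k in range(count):
            idx = (n + step * k - 1) % total  # N + (T/C)*k - 1
            # On collision, take the next empty slot, wrapping to index 0.
            while array[idx] is not None:
                idx = (idx + 1) % total
            array[idx] = n * bin_width        # <bin> * BUILDTIME_BIN_WIDTH
    return array
```

For the example above (total 100, bin 1 with count 50, bin 2 with count 25), bin 1 lands on the even indexes and bin 2 on indexes 1, 5, 9, 13 and so on, so no collisions occur.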
Learning the CircuitBuildTimeout
Based on studies of build times, we found that the distribution of
circuit buildtimes appears to be a Pareto distribution.
We will calculate the parameters for a Pareto distribution
fitting the data using the estimators at
http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation.
The timeout itself will be calculated by solving the CDF for
a percentile cutoff BUILDTIME_PERCENT_CUTOFF. This value
represents the percentage of paths the Tor client will accept out of
the total number of paths. We have not yet determined a good
cutoff for this mathematically, but 85% seems a good choice for now.
From http://en.wikipedia.org/wiki/Pareto_distribution#Definition,
the calculation we need is Xm/pow(1.0 - BUILDTIME_PERCENT_CUTOFF/100.0, 1.0/k).
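A minimal sketch of the fit and the cutoff, assuming the standard maximum-likelihood estimators for a Pareto distribution (Xm is the smallest sample, k is n divided by the sum of ln(x/Xm)) and the standard CDF F(x) = 1 - (Xm/x)^k. Setting F(x) equal to the cutoff fraction and solving for x gives x = Xm / (1 - cutoff)^(1/k). The function names are hypothetical.

```python
import math

def pareto_fit(samples):
    """Estimate (Xm, k) from build times using the usual ML estimators."""
    xm = min(samples)
    log_sum = sum(math.log(x / xm) for x in samples)
    if log_sum == 0:
        raise ValueError("all samples are equal; k is undefined")
    return xm, len(samples) / log_sum

def pareto_timeout(xm, k, percent_cutoff=85.0):
    """Build time below which percent_cutoff percent of circuits complete."""
    q = percent_cutoff / 100.0
    # Invert F(x) = 1 - (xm / x) ** k at F(x) = q.
    return xm / math.pow(1.0 - q, 1.0 / k)
```

For instance, with Xm = 1000 ms and k = 1, a 50% cutoff gives a timeout of 2000 ms, and plugging the result back into the CDF recovers the cutoff fraction.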
Testing
After circuit build times, storage, and learning are implemented,
the resulting histogram should be checked for consistency by
verifying it persists across successive Tor invocations where
no circuits are built. In addition, we can also use the existing
buildtime scripts to record build times, and verify that the histogram
the python produces matches that which is output to the state file in Tor,
and verify that the Pareto parameters and cutoff points also match.
Soft timeout vs Hard Timeout
At some point, it may be desirable to change the cutoff from a
single hard cutoff that destroys the circuit to a soft cutoff and
a hard cutoff, where the soft cutoff merely triggers the building
of a new circuit, and the hard cutoff triggers destruction of the
circuit.
Good values for hard and soft cutoffs seem to be 85% and 65%
respectively, but we should eventually justify this with observation.
When to Begin Calculation
The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before
changing the CircuitBuildTimeout will be tunable via a #define. From
our measurements, a good value for NCIRCUITS_TO_CUTOFF appears to be
on the order of 100.
Dealing with Timeouts
Timeouts should be counted as the expectation of the region
of the Pareto distribution beyond the cutoff. The proposal will
be updated with this value soon.
Also, in the event of network failure, the observation mechanism
should stop collecting timeout data.
Client Hints
Some research still needs to be done to provide initial values
for CircuitBuildTimeout based on values learned from modem
users, DSL users, Cable Modem users, and dedicated links. A
radiobutton in Vidalia should eventually be provided that
sets CircuitBuildTimeout to one of these values and also
provides the option of purging all learned data, should any exist.
These values can either be published in the directory, or
shipped hardcoded for a particular Tor version.
Issues
Impact on anonymity
Since this follows a Pareto distribution, large reductions on the
timeout can be achieved without cutting off a great number of the
total paths. This will eliminate a great deal of the performance
variation of Tor usage.


@ -1,64 +0,0 @@
Filename: 152-single-hop-circuits.txt
Title: Optionally allow exit from single-hop circuits
Version:
Last-Modified:
Author: Geoff Goodell
Created: 13-Jul-2008
Status: Closed
Implemented-In: 0.2.1.6-alpha
Overview
Provide a special configuration option that adds a line to descriptors
indicating that a router can be used as an exit for one-hop circuits,
and allow clients to attach streams to one-hop circuits provided
that the descriptor for the router in the circuit includes this
configuration option.
Motivation
At some point, code was added to restrict the attachment of streams
to one-hop circuits.
The idea seems to be that we can use the cost of forking and
maintaining a patch as a lever to prevent people from writing
controllers that jeopardize the operational security of routers
and the anonymity properties of the Tor network by creating and
using one-hop circuits rather than the standard three-hop circuits.
It may be, for example, that some users do not actually seek true
anonymity but simply reachability through network perspectives
afforded by the Tor network, and since anonymity is stronger in
numbers, forcing users to contribute to anonymity and decrease the
risk to server operators by using full-length paths may be reasonable.
As presently implemented, the sweeping restriction of one-hop circuits
for all routers limits the usefulness of Tor as a general-purpose
technology for building circuits. In particular, we should allow
for controllers, such as Blossom, that create and use single-hop
circuits involving routers that are not part of the Tor network.
Design
Introduce a configuration option for Tor servers that, when set,
indicates that a router is willing to provide exit from one-hop
circuits. Routers with this policy will not require that a circuit
has at least two hops when it is used as an exit.
In addition, routers for which this configuration option
has been set will have a line in their descriptors, "opt
exit-from-single-hop-circuits". Clients will keep track of which
routers have this option and allow streams to be attached to
single-hop circuits that include such routers.
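The client-side check described above might look like the following sketch. It assumes descriptors are available as plain text keyed by router nickname; both function names and that representation are hypothetical.

```python
SINGLE_HOP_LINE = "opt exit-from-single-hop-circuits"

def allows_single_hop_exit(descriptor_text):
    """True if the descriptor carries the single-hop exit line."""
    return any(line.strip() == SINGLE_HOP_LINE
               for line in descriptor_text.splitlines())

def may_attach_stream(circuit_hops, descriptors):
    """circuit_hops: router nicknames in the circuit, in order.
    descriptors: dict mapping nickname to descriptor text."""
    if len(circuit_hops) == 0:
        return False
    if len(circuit_hops) > 1:
        return True       # multi-hop circuits are unaffected
    # One-hop circuit: the single router must have opted in.
    return allows_single_hop_exit(descriptors.get(circuit_hops[0], ""))
```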
Security Considerations
This approach seems to eliminate the worry about operational router
security, since server operators will not set the configuration
option unless they are willing to take on such risk.
To reduce the impact on anonymity of the network resulting
from including such "risky" routers in regular Tor path
selection, clients may systematically exclude routers with "opt
exit-from-single-hop-circuits" when choosing random paths through
the Tor network.


@ -1,177 +0,0 @@
Filename: 153-automatic-software-update-protocol.txt
Title: Automatic software update protocol
Version: $Revision$
Last-Modified: $Date$
Author: Jacob Appelbaum
Created: 14-July-2008
Status: Superseded
[Superseded by thandy-spec.txt]
Automatic Software Update Protocol Proposal
0.0 Introduction
The Tor project and its users require a robust method to update shipped
software bundles. The software bundles often include Vidalia, Privoxy, Polipo,
Torbutton and of course Tor itself. It is not inconceivable that an update
could include all of the Tor Browser Bundle. It seems reasonable to make this
a standalone program that can be called in shell scripts, cronjobs or by
various Tor controllers.
0.1 Minimal Tasks To Implement Automatic Updating
At the most minimal, an update must be able to do the following:
0 - Detect the current Tor version, note the working status of Tor.
1 - Detect the latest Tor version.
2 - Fetch the latest version in the form of a platform specific package(s).
3 - Verify the integrity of the downloaded package(s).
4 - Install the verified package(s).
5 - Test that the new package(s) works properly.
0.2 Specific Enumeration Of Minimal Tasks
To implement requirement 0, we need to detect the current Tor version of both
the updater and the current running Tor. The update program itself should be
versioned internally. This requirement should also test connecting through Tor
itself and note if such connections are possible.
To implement requirement 1, we need to learn the consensus from the directory
authorities or fall back to a known good URL with cryptographically signed
content.
To implement requirement 2, we need to download Tor - hopefully over Tor.
To implement requirement 3, we need to verify the package signature.
To implement requirement 4, we need to use a platform specific method of
installation. The Tor controller performing the update performs these platform
specific methods.
To implement requirement 5, we need to be able to extend circuits and reach
the internet through Tor.
0.x Implementation Goals
The update system will be cross platform and rely on as little external code
as possible. If the update system uses external code, that code must be updated
by the update system itself. It will consist only of free software and will not rely on any
non-free components until the actual installation phase. If a package manager
is in use, it will be platform specific and thus only invoked by the update
system implementing the update protocol.
The update system itself will attempt to perform update related network
activity over Tor. Possibly it will attempt to use a hidden service first.
It will attempt to use novel and not so novel caching
when possible. It will always verify cryptographic signatures before any
remotely fetched code is executed. In the event of an unusable Tor system,
it will be able to attempt to fetch updates without Tor. This should be user
configurable: some users will be unwilling to update without the protection of
using Tor, while others will simply be unable to because the main Tor
website is blocked.
The update system will track current version numbers of Tor and supporting
software. The update system will also track known working versions to assist
with automatic The update system itself will be a standalone library. It will be
strongly versioned internally to match the Tor bundle it was shipped with. The
update system will keep track of the given platform, cpu architecture, lsb_release,
package management functionality and any other platform specific metadata.
We have referenced two popular automatic update systems. Neither fits
our needs, but both are useful as an idea of what others are doing in the same
area.
The first is sparkle[0] but it is sadly only available for Cocoa
environments and is written in Objective C. This doesn't meet our requirements
because it is directly tied into the private Apple framework.
The second is the Mozilla Automatic Update System[1]. It is possibly useful
as an idea of how other free software projects automatically update. It is
however not useful in its currently documented form.
[0] http://sparkle.andymatuschak.org/documentation/
[1] http://wiki.mozilla.org/AUS:Manual
0.x Previous methods of Tor and related software update
Previously, Tor users updated their Tor related software by hand. There has
been no fully automatic method for any user to update. In addition, there
hasn't been any specific way to find out the most current stable version of Tor
or related software as voted on by the directory authority consensus.
0.x Changes to the directory specification
We will want to supplement client-versions and server-versions in the
consensus voting with another version identifier known as
'auto-update-versions'. This will keep track of the current consensus of
specific versions that are best per platform and per architecture. It should
be noted that while the Mac OS X universal binary may be the best for x86
processors with Tiger, it may not be the best for PPC users on Panther. This
goes for all of the package updates. We want to prevent updates that cause Tor
to break even if the updating program can recover gracefully.
x.x Assumptions About Operating System Package Management
It is assumed that users will use their package manager unless they are on
Microsoft Windows (any version) or Mac OS X (any version). Microsoft Windows
users will have integration with the normal "add/remove program" functionality
that said users would expect.
x.x Package Update System Failure Modes
The package update will try to ensure that a user always has a working Tor at
the very least. It will keep state to remember versions of Tor that were able
to bootstrap properly and reach the rest of the Tor network. It will also keep
note of which versions broke. It will select the best Tor that works for the
user. It will also allow for anonymized bug reporting on the packages
available and tested by the auto-update system.
x.x Package Signature Verification
The update system will be aware of replay attacks against the update signature
system itself. It will not allow package update signatures that are radically
out of date. It will be a multi-key system to prevent any single party from
forging an update. The key will be updated regularly. This is like authority
key (see proposal 103) usage.
x.x Package Caching
The update system will iterate over different update methods. Whichever method
is picked will have caching functionality. Each Tor server itself should be
able to serve cached update files. This will be an option that friendly server
administrators can turn on should they wish to support caching. In addition,
it is possible to cache the full contents of a package in an
authoritative DNS zone. Users can then query the DNS zone for their package.
If we wish to further distribute the update load, we can also offer packages
with encrypted bittorrent. Clients who wish to share the updates but do not
wish to be a server can help distribute Tor updates. This can be tied together
with the DNS caching[2][3] if needed.
[2] http://www.netrogenic.com/dnstorrent/
[3] http://www.doxpara.com/ozymandns_src_0.1.tgz
x.x Helping Our Users Spread Tor
There should be a way for a user to participate in the package caching as
described in section x.x. This option should be presented by the Tor
controller.
x.x Simple HTTP Proxy To The Tor Project Website
It has been suggested that we should provide a simple proxy that allows a user
to visit the main Tor website to download packages. This was part of a
previous proposal and has not been closely examined.
x.x Package Installation
Platform specific methods for proper package installation will be left to the
controller that is calling for an update. Each platform is different, so the
installation options and user interface will be specific to the controller in
question.
x.x Other Things
Other things should be added to this proposal. What are they?


@ -1,379 +0,0 @@
Filename: 154-automatic-updates.txt
Title: Automatic Software Update Protocol
Version: $Revision$
Last-Modified: $Date$
Author: Matt Edman
Created: 30-July-2008
Status: Superseded
Target: 0.2.1.x
Superseded by thandy-spec.txt
Scope
This proposal specifies the method by which an automatic update client can
determine the most recent recommended Tor installation package for the
user's platform, download the package, and then verify that the package was
downloaded successfully. While this proposal focuses on only the Tor
software, the protocol defined is sufficiently extensible such that other
components of the Tor bundles, like Vidalia, Polipo, and Torbutton, can be
managed and updated by the automatic update client as well.
The initial target platform for the automatic update framework is Windows,
given that's the platform used by a majority of our users and that it lacks
a sane package management system that many Linux distributions already have.
Our second target platform will be Mac OS X, and so the protocol will be
designed with this near-future direction in mind.
Other client-side aspects of the automatic update process, such as user
interaction, the interface presented, and actual package installation
procedure, are outside the scope of this proposal.
Motivation
Tor releases new versions frequently, often with important security,
anonymity, and stability fixes. Thus, it is important for users to be able
to promptly recognize when new versions are available and to easily
download, authenticate, and install updated Tor and Tor-related software
packages.
Tor's control protocol [2] provides a method by which controllers can
identify when the user's Tor software is obsolete or otherwise no longer
recommended. Currently, however, no mechanism exists for clients to
automatically download and install updated Tor and Tor-related software for
the user.
Design Overview
The core of the automatic update framework is a well-defined file called a
"recommended-packages" file. The recommended-packages file is accessible via
HTTP[S] at one or more well-defined URLs. An example recommended-packages
URL may be:
https://updates.torproject.org/recommended-packages
The recommended-packages document is formatted according to Section 1.2
below and specifies the most recent recommended installation package
versions for Tor or Tor-related software, as well as URLs at which the
packages and their signatures can be downloaded.
An automatic update client process runs on the Tor user's computer and
periodically retrieves the recommended-packages file according to the method
described in Section 2.0. As described further in Section 1.2, the
recommended-packages file is signed and can be verified by the automatic
update client with one or more public keys included in the client software.
Since it is signed, the recommended-packages file can be mirrored by
multiple hosts (e.g., Tor directory authorities), whose URLs are included in
the automatic update client's configuration.
After retrieving and verifying the recommended-packages file, the automatic
update client compares the versions of the recommended software packages
listed in the file with those currently installed on the end-user's
computer. If one or more of the installed packages is determined to be out
of date, an updated package and its signature will be downloaded from one of
the package URLs listed in the recommended-packages file as described in
Section 2.2.
The automatic update system uses a multilevel signing key scheme for package
signatures. There are a small number of entities we call "packaging
authorities" that each have their own signing key. A packaging authority is
responsible for signing and publishing the recommended-packages file.
Additionally, each individual packager responsible for producing an
installation package for one or more platforms has their own signing key.
Every packager's signing key must be signed by at least one of the packaging
authority keys.
Specification
1. recommended-packages Specification
In this section we formally specify the format of the published
recommended-packages file.
1.1. Document Meta-format
The recommended-packages document follows the lightweight extensible
information format defined in Tor's directory protocol specification [1]. In
the interest of self-containment, we have reproduced the relevant portions
of that format's specification in this Section. (Credits to Nick Mathewson
for much of the original format definition language.)
The highest level object is a Document, which consists of one or more
Items. Every Item begins with a KeywordLine, followed by zero or more
Objects. A KeywordLine begins with a Keyword, optionally followed by
whitespace and more non-newline characters, and ends with a newline. A
Keyword is a sequence of one or more characters in the set [A-Za-z0-9-].
An Object is a block of encoded data in pseudo-Open-PGP-style
armor. (cf. RFC 2440)
More formally:
Document ::= (Item | NL)+
Item ::= KeywordLine Object*
KeywordLine ::= Keyword NL | Keyword WS ArgumentChar+ NL
Keyword ::= KeywordChar+
KeywordChar ::= 'A' ... 'Z' | 'a' ... 'z' | '0' ... '9' | '-'
ArgumentChar ::= any printing ASCII character except NL.
WS ::= (SP | TAB)+
Object ::= BeginLine Base-64-encoded-data EndLine
BeginLine ::= "-----BEGIN " Keyword "-----" NL
EndLine ::= "-----END " Keyword "-----" NL
The BeginLine and EndLine of an Object must use the same keyword.
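To make the grammar concrete, here is a small parser sketch for the meta-format. It is an illustration only: the function name and the dictionary layout of the returned Items are assumptions, and no base64 validation is attempted.

```python
import re

KEYWORD_LINE = re.compile(r"^([A-Za-z0-9-]+)(?:[ \t]+(.+))?$")
BEGIN_LINE = re.compile(r"^-----BEGIN ([A-Za-z0-9-]+)-----$")
END_LINE = re.compile(r"^-----END ([A-Za-z0-9-]+)-----$")

def parse_document(text):
    """Parse a Document into a list of Items (keyword, args, objects)."""
    items = []
    lines = text.split("\n")
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        if line == "":
            continue                      # bare NL between Items
        begin = BEGIN_LINE.match(line)    # test before KeywordLine: '-' is a KeywordChar
        if begin:
            if not items:
                raise ValueError("Object before any KeywordLine")
            body = []
            while i < len(lines) and not END_LINE.match(lines[i]):
                body.append(lines[i])
                i += 1
            if i == len(lines):
                raise ValueError("unterminated Object")
            if END_LINE.match(lines[i]).group(1) != begin.group(1):
                raise ValueError("BeginLine and EndLine keywords differ")
            i += 1
            items[-1]["objects"].append((begin.group(1), "".join(body)))
            continue
        keyword = KEYWORD_LINE.match(line)
        if not keyword:
            raise ValueError("malformed KeywordLine: %r" % line)
        items.append({"keyword": keyword.group(1),
                      "args": keyword.group(2) or "",
                      "objects": []})
    return items
```

Each returned Item keeps its Objects as (keyword, encoded data) pairs, so a caller can locate the signature Object without re-scanning the text.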
In our Document description below, we also tag Items with a multiplicity in
brackets. Possible tags are:
"At start, exactly once": These items MUST occur in every instance of the
document type, and MUST appear exactly once, and MUST be the first item in
their documents.
"Exactly once": These items MUST occur exactly one time in every
instance of the document type.
"Once or more": These items MUST occur at least once in any instance
of the document type, and MAY occur more than once.
"At end, exactly once": These items MUST occur in every instance of
the document type, and MUST appear exactly once, and MUST be the
last item in their documents.
1.2. recommended-packages Document Format
When interpreting a recommended-packages Document, software MUST ignore
any KeywordLine that starts with a keyword it doesn't recognize; future
implementations MUST NOT require current automatic update clients to
understand any KeywordLine not currently described.
In lines that take multiple arguments, extra arguments SHOULD be
accepted and ignored.
The currently defined Items contained in a recommended-packages document
are:
"recommended-packages-format" SP number NL
[Exactly once]
This Item specifies the version of the recommended-packages format that
is contained in the subsequent document. The version defined in this
proposal is version "1". Subsequent iterations of this protocol MUST
increment this value if they introduce incompatible changes to the
document format and MAY increment this value if they only introduce
additional Keywords.
"published" SP YYYY-MM-DD SP HH:MM:SS NL
[Exactly once]
The time, in GMT, when this recommended-packages document was generated.
Automatic update clients SHOULD ignore Documents over 60 days old.
"tor-stable-win32-version" SP TorVersion NL
[Exactly once]
This keyword specifies the latest recommended release of Tor's "stable"
branch for the Windows platform that has an installation package
available. Note that this version does not necessarily correspond to the
most recently tagged stable Tor version, since that version may not yet
have an installer package available, or may have known issues on
Windows.
The TorVersion field is formatted according to Section 2 of Tor's
version specification [3].
"tor-stable-win32-package" SP Url NL
[Once or more]
This Item specifies the location from which the most recent
recommended Windows installation package for Tor's stable branch can be
downloaded.
When this Item appears multiple times within the Document, automatic
update clients SHOULD select randomly from the available package
mirrors.
"tor-dev-win32-version" SP TorVersion NL
[Exactly once]
This Item specifies the latest recommended release of Tor's
"development" branch for the Windows platform that has an installation
package available. The same caveats from the description of
"tor-stable-win32-version" also apply to this keyword.
The TorVersion field is formatted according to Section 2 of Tor's
version specification [3].
"tor-dev-win32-package" SP Url NL
[Once or more]
This Item specifies the location from which the most recent recommended
Windows installation package and its signature for Tor's development
branch can be downloaded.
When this Item appears multiple times within the Document, automatic
update clients SHOULD select randomly from the available package
mirrors.
"signature" NL SIGNATURE NL
[At end, exactly once]
The "SIGNATURE" Object contains a PGP signature (using a packaging
authority signing key) of the entire document, taken from the beginning
of the "recommended-packages-format" keyword, through the newline after
the "signature" Keyword.
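The multiplicity tags and the 60-day age limit can be checked mechanically. The sketch below assumes the Document has already been split into (keyword, arguments) pairs; the function name, the rule table layout, and the wording of the problem strings are hypothetical. Signature verification is out of scope here.

```python
from datetime import datetime, timedelta

RULES = {
    "recommended-packages-format": "exactly once",
    "published": "exactly once",
    "tor-stable-win32-version": "exactly once",
    "tor-stable-win32-package": "once or more",
    "tor-dev-win32-version": "exactly once",
    "tor-dev-win32-package": "once or more",
    "signature": "exactly once",
}
MAX_AGE = timedelta(days=60)

def check_document(items, now):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    keywords = [keyword for keyword, _ in items]
    for keyword, rule in RULES.items():       # unknown keywords are ignored
        count = keywords.count(keyword)
        if count == 0 or (rule == "exactly once" and count > 1):
            problems.append("%s: expected %s, found %d" % (keyword, rule, count))
    if keywords and keywords[-1] != "signature":
        problems.append("signature: must be the last item")
    for keyword, args in items:
        if keyword == "published":
            # Extra arguments are accepted and ignored.
            stamp = " ".join(args.split()[:2])
            published = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
            if now - published > MAX_AGE:
                problems.append("published: document is over 60 days old")
    return problems
```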
2. Automatic Update Client Behavior
The client-side component of the automatic update framework is an
application that runs on the end-user's machine. It is responsible for
fetching and verifying a recommended-packages document, as well as
downloading, verifying, and subsequently installing any necessary updated
software packages.
2.1. Download and verify a recommended-packages document
The first step in the automatic update process is for the client to download
a copy of the recommended-packages file. The automatic update client
contains a (hardcoded and/or user-configurable) list of URLs from which it
will attempt to retrieve a recommended-packages file.
Connections to each of the recommended-packages URLs SHOULD be attempted in
the following order:
1) HTTPS over Tor
2) HTTP over Tor
3) Direct HTTPS
4) Direct HTTP
If the client fails to retrieve a recommended-packages document via any of
the above connection methods from any of the configured URLs, the client
SHOULD retry its download attempts following an exponential back-off
algorithm. After the first failed attempt, the client SHOULD delay one hour
before attempting again, up to a maximum of 24 hours delay between retry
attempts.
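A sketch of the attempt order and the retry delay described above. The text fixes the first delay (one hour) and the cap (24 hours); doubling the delay after each failed round is an assumption, as is trying every connection method for one URL before moving on to the next URL. The function names are hypothetical.

```python
METHODS = ["HTTPS over Tor", "HTTP over Tor", "Direct HTTPS", "Direct HTTP"]

def attempt_order(urls):
    """List (url, method) pairs in the order they would be tried."""
    return [(url, method) for url in urls for method in METHODS]

def retry_delay_hours(failed_rounds, first_delay=1, max_delay=24):
    """Hours to wait after `failed_rounds` consecutive failed rounds."""
    if failed_rounds < 1:
        return 0
    # Assumed doubling: 1, 2, 4, 8, 16, then capped at 24.
    return min(max_delay, first_delay * 2 ** (failed_rounds - 1))
```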
After successfully downloading a recommended-packages file, the automatic
update client will verify the signature using one of the public keys
distributed with the client software. If more than one recommended-packages
file is downloaded and verified, the file with the most recent "published"
date that is verified will be retained and the rest discarded.
2.2. Download and verify the updated packages
The automatic update client next compares the latest recommended package
version from the recommended-packages document with the currently installed
Tor version. If the user currently has installed a Tor version from Tor's
"development" branch, then the version specified in "tor-dev-*-version" Item
is used for comparison. Similarly, if the user currently has installed a Tor
version from Tor's "stable" branch, then the version specified in the
"tor-stable-*-version" Item is used for comparison. Version comparisons are
done according to Tor's version specification [3].
If the automatic update client determines an installation package newer than
the user's currently installed version is available, it will attempt to
download a package appropriate for the user's platform and Tor branch from a
URL specified by a "tor-[branch]-[platform]-package" Item. If more than one
mirror for the selected package is available, a mirror will be chosen at
random from all those available.
The automatic update client must also download a ".asc" signature file for
the retrieved package. The URL for the package signature is the same as that
for the package itself, except with the extension ".asc" appended to the
package URL.
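Mirror selection and the signature URL rule can be sketched as follows, assuming the Document is available as (keyword, arguments) pairs and that the package URL is the first argument. The function names are hypothetical.

```python
import random

def package_keyword(branch, platform):
    """Keyword of the form tor-[branch]-[platform]-package."""
    return "tor-%s-%s-package" % (branch, platform)

def choose_package(items, branch, platform, rng=random):
    """Pick a mirror at random; return (package_url, signature_url) or None."""
    wanted = package_keyword(branch, platform)
    mirrors = [args.split()[0] for keyword, args in items
               if keyword == wanted and args.split()]
    if not mirrors:
        return None
    url = rng.choice(mirrors)
    return url, url + ".asc"     # signature URL is the package URL plus ".asc"
```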
Connections to download the updated package and its signature SHOULD be
attempted in the same order described in Section 2.1.
After completing the steps described in Sections 2.1 and 2.2, the automatic
update client will have downloaded and verified a copy of the latest Tor
installation package. It can then take whatever subsequent platform-specific
steps are necessary to install the downloaded software updates.
2.3. Periodic checking for updates
The automatic update client SHOULD maintain a local state file in which it
records (at a minimum) the timestamp at which it last retrieved a
recommended-packages file and the timestamp at which the client last
successfully downloaded and installed a software update.
Automatic update clients SHOULD check for an updated recommended-packages
document at most once per day but at least once every 30 days.
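The checking bounds can be expressed as a clamp on whatever interval the client is configured with. This is a sketch under that assumption; the configurable interval and the function name are not part of the proposal.

```python
from datetime import datetime, timedelta

MIN_INTERVAL = timedelta(days=1)    # check at most once per day
MAX_INTERVAL = timedelta(days=30)   # check at least once every 30 days

def update_check_due(last_check, now, interval=MIN_INTERVAL):
    """True if a new recommended-packages fetch should be attempted.

    last_check: timestamp from the local state file, or None if absent.
    """
    if last_check is None:
        return True
    interval = max(MIN_INTERVAL, min(MAX_INTERVAL, interval))
    return now - last_check >= interval
```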
3. Future Extensions
There are several possible areas for future extensions of this framework.
The extensions below are merely suggestions and should be the subject of
their own proposal before being implemented.
3.1. Additional Software Updates
There are several software packages often included in Tor bundles besides
Tor, such as Vidalia, Privoxy or Polipo, and Torbutton. The versions and
download locations of updated installation packages for these bundle
components can be easily added to the recommended-packages document
specification above.
3.2. Including ChangeLog Information
It may be useful for automatic update clients to be able to display for
users a summary of the changes made in the latest Tor or Tor-related
software release, before the user chooses to install the update. In the
future, we can add keywords to the specification in Section 1.2 that specify
the location of a ChangeLog file for the latest recommended package
versions. It may also be desirable to allow localized ChangeLog information,
so that the automatic update client can fetch release notes in the
end-user's preferred language.
3.3. Weighted Package Mirror Selection
We defined in Section 1.2 a method by which automatic update clients can
select from multiple available package mirrors. We may want to add a Weight
argument to the "*-package" Items that allows the recommended-packages file
to suggest to clients the probability with which a package mirror should be
chosen. This will allow clients to more appropriately distribute package
downloads across available mirrors proportional to their approximate
bandwidth.
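If a Weight argument were added, selection could look like the sketch below. The (url, weight) pair representation and the function name are assumptions, since the proposal does not define the syntax of the Weight argument.

```python
import random

def choose_weighted_mirror(mirrors, rng=random):
    """mirrors: list of (url, weight) pairs with non-negative weights."""
    urls = [url for url, _ in mirrors]
    weights = [weight for _, weight in mirrors]
    # random.choices picks each URL with probability proportional to its weight.
    return rng.choices(urls, weights=weights, k=1)[0]
```

A mirror with weight 0 is never chosen, and a mirror with twice the weight of another is chosen about twice as often.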
Implementation
Implementation of this proposal will consist of two separate components.
The first component is a small "au-publish" tool that takes as input a
configuration file specifying the information described in Section 1.2 and a
private key. The tool is run by a "packaging authority" (someone responsible
for publishing updated installation packages), who will be prompted to enter
the passphrase for the private key used to sign the recommended-packages
document. The output of the tool is a document formatted according to
Section 1.2, with a signature appended at the end. The resulting document
can then be published to any of the update mirrors.
The second component is an "au-client" tool that is run on the end-user's
machine. It periodically checks for updated installation packages according
to Section 2 and fetches the packages if necessary. The public keys used
to sign the recommended-packages file and any of the published packages are
included in the "au-client" tool.
References
[1] Tor directory protocol (version 3),
https://tor-svn.freehaven.net/svn/tor/trunk/doc/spec/dir-spec.txt
[2] Tor control protocol (version 2),
https://tor-svn.freehaven.net/svn/tor/trunk/doc/spec/control-spec.txt
[3] Tor version specification,
https://tor-svn.freehaven.net/svn/tor/trunk/doc/spec/version-spec.txt


@ -1,122 +0,0 @@
Filename: 155-four-hidden-service-improvements.txt
Title: Four Improvements of Hidden Service Performance
Version: $Revision$
Last-Modified: $Date$
Author: Karsten Loesing, Christian Wilms
Created: 25-Sep-2008
Status: Finished
Implemented-In: 0.2.1.x
Change history:
25-Sep-2008 Initial proposal for or-dev
Overview:
A performance analysis of hidden services [1] has brought up a few
possible design changes to reduce advertisement time of a hidden service
in the network as well as connection establishment time. Some of these
design changes have side-effects on anonymity or overall network load
which had to be weighed against individual performance gains. A
discussion of seven possible design changes [2] has led to a selection
of four changes [3] that are proposed for implementation here.
Design:
1. Shorter Circuit Extension Timeout
When establishing a connection to a hidden service a client cannibalizes
an existing circuit and extends it by one hop to one of the service's
introduction points. In most cases this can be accomplished within a few
seconds. Therefore, the current timeout of 60 seconds for extending a
circuit is far too high.
Assuming that the timeout would be reduced to a lower value, for example
30 seconds, a second (or third) attempt to cannibalize and extend would
be started earlier. With the current timeout of 60 seconds, 93.42% of all
circuits can be established; with a timeout of 30 seconds this fraction
would have been 92.55%, only 0.87 percentage points smaller.
For a timeout of 30 seconds the performance gain would be approximately 2
seconds in the mean as opposed to the current timeout of 60 seconds. At
the same time a smaller timeout leads to discarding an increasing number
of circuits that might have been completed within the current timeout of
60 seconds.
Measurements with simulated low-bandwidth connectivity have shown that
there is no significant effect of client connectivity on circuit
extension times. The reason for this might be that extension messages are
small and thereby independent of the client bandwidth. Further, the
connection between client and entry node only constitutes a single hop of
a circuit, so that its influence on the whole circuit is limited.
The exact value of the new timeout does not necessarily have to be 30
seconds, but might also depend on the results of circuit build timeout
measurements as described in proposal 151.
2. Parallel Connections to Introduction Points
An additional approach to accelerate extension of introduction circuits
is to extend a second circuit in parallel to a different introduction
point. Such parallel extension attempts should be started after a short
delay of, e.g., 15 seconds in order to prevent unnecessary circuit
extensions and thereby save network resources. Whichever circuit
extension succeeds first is used for introduction, while the other
attempt is aborted.
An evaluation has been performed for the more resource-intensive approach
of starting two parallel circuits immediately instead of waiting for a
short delay. The result was a reduction of connection establishment times
from 27.4 seconds in the original protocol to 22.5 seconds.
While the effect of the proposed approach of delayed parallelization on
mean connection establishment times is expected to be smaller,
variability of connection attempt times can be reduced significantly.
3. Increase Count of Internal Circuits
Hidden services need to create or cannibalize and extend a circuit to a
rendezvous point for every client request. Really popular hidden services
require more than two internal circuits in the pool to answer multiple
client requests at the same time. This scenario has not yet been analyzed, but
will probably exhibit worse performance than measured in the previous
analysis. The number of preemptively built internal circuits should be a
function of connection requests in the past to adapt to changing needs.
Furthermore, an increased number of internal circuits on the client side
would allow clients to establish connections to more than one hidden
service at a time.
Under the assumption that a popular hidden service cannot make use of
cannibalization for connecting to rendezvous points, the circuit creation
time needs to be added to the current results. In the mean, the
connection establishment time to a popular hidden service would increase
by 4.7 seconds.
4. Build More Introduction Circuits
When establishing introduction points, a hidden service should launch 5
instead of 3 introduction circuits at the same time and use only the
first 3 that could be established. The remaining two circuits could still
be used for other purposes afterwards.
The effect has been simulated using previously measured data, too. For
this purpose, circuit establishment times were derived from log files and
written to an array. Afterwards, a simulation with 10,000 runs was
performed picking 5 (4, 6) random values and using the 3 lowest values in
contrast to picking only 3 values at random. The result is that the mean
time of the 3-out-of-3 approach is 8.1 seconds, while the mean time of
the 3-out-of-5 approach is 4.4 seconds.
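The simulation described above might be sketched as follows. This is not
the authors' code: the function names are hypothetical, the measured log
data is not reproduced here, and the sketch assumes that the reported time
is the moment at which the third circuit completes (the 3rd-lowest of the
sampled build times).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_LAUNCHED 16 /* arbitrary upper bound for this sketch */

static int
cmp_double(const void *a, const void *b)
{
  double x = *(const double *)a, y = *(const double *)b;
  return (x > y) - (x < y);
}

/* Time until `needed` circuits are established when `launched` circuits
 * are started at once: the needed-th smallest of the build times.
 * Returns -1.0 on invalid input. */
double
time_until_ready(const double *times, size_t launched, size_t needed)
{
  double tmp[MAX_LAUNCHED];
  size_t i;
  if (needed == 0 || needed > launched || launched > MAX_LAUNCHED)
    return -1.0;
  for (i = 0; i < launched; ++i)
    tmp[i] = times[i];
  qsort(tmp, launched, sizeof(double), cmp_double);
  return tmp[needed - 1];
}

/* Mean of time_until_ready() over `runs` trials, each drawing `launched`
 * values at random (with replacement) from the measured samples. */
double
simulate_mean(const double *samples, size_t n_samples, size_t launched,
              size_t needed, unsigned runs, unsigned seed)
{
  double sum = 0.0, draw[MAX_LAUNCHED];
  unsigned r;
  size_t i;
  if (n_samples == 0 || runs == 0 || needed == 0 || needed > launched ||
      launched > MAX_LAUNCHED)
    return -1.0;
  srand(seed);
  for (r = 0; r < runs; ++r) {
    for (i = 0; i < launched; ++i)
      draw[i] = samples[(size_t)rand() % n_samples];
    sum += time_until_ready(draw, launched, needed);
  }
  return sum / runs;
}
```

Comparing simulate_mean(samples, n, 3, 3, 10000, seed) with
simulate_mean(samples, n, 5, 3, 10000, seed) on the measured build times
corresponds to the 3-out-of-3 versus 3-out-of-5 comparison above.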
The effect on network load is minimal, because the hidden service can
reuse the slower internal circuits for other purposes, e.g., rendezvous
circuits. The only change is that a hidden service starts establishing
more circuits at once instead of establishing them one after another.
References:
[1] http://freehaven.net/~karsten/hidserv/perfanalysis-2008-06-15.pdf
[2] http://freehaven.net/~karsten/hidserv/discussion-2008-07-15.pdf
[3] http://freehaven.net/~karsten/hidserv/design-2008-08-15.pdf


@ -1,529 +0,0 @@
Filename: 156-tracking-blocked-ports.txt
Title: Tracking blocked ports on the client side
Version: $Revision$
Last-Modified: $Date$
Author: Robert Hogan
Created: 14-Oct-2008
Status: Open
Target: 0.2.?
Motivation:
Tor clients that are behind extremely restrictive firewalls can end up
waiting a while for their first successful OR connection to a node on the
network. Worse, the more restrictive their firewall the more susceptible
they are to an attacker guessing their entry nodes. Tor routers that
are behind extremely restrictive firewalls can only offer a limited,
'partitioned' service to other routers and clients on the network. Exit
nodes behind extremely restrictive firewalls may advertise ports that they
are actually not able to connect to, wasting network resources in circuit
constructions that are doomed to fail at the last hop on first use.
Proposal:
When a client attempts to connect to an entry guard it should avoid
further attempts on ports that fail once until it has connected to at
least one entry guard successfully. (Maybe it should wait for more than
one failure to reduce the skew on the first node selection.) Thereafter
it should select entry guards regardless of port and warn the user
whenever the number of failed connections to a given port, counted since
the last success (or in total, if none has ever succeeded), reaches a
multiple of 5.
Tor should warn the operators of exit, middleman and entry nodes whenever
the number of failed connections to a given port, counted since the last
success (or in total, if none has ever succeeded), reaches a multiple of
5. If attempts on a port fail 20 or more times under the same counting
rule, Tor should add the port
to a 'blocked-ports' entry in its descriptor's extra-info. Some thought
needs to be given to what the authorities might do with this information.
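The counting rule in the two paragraphs above can be sketched as a small
per-port counter. This follows the thresholds given in the prose (5 and
20), not the constants in the attached patch, which counts distinct
routers and uses its own limits. All names here are hypothetical.

```c
#include <assert.h>

#define WARN_INTERVAL 5      /* warn on every multiple of 5 failures */
#define BLOCKED_THRESHOLD 20 /* list in 'blocked-ports' from 20 failures */

typedef enum {
  PORT_OK = 0,            /* nothing to report */
  PORT_WARN = 1,          /* warn the user or operator */
  PORT_REPORT_BLOCKED = 2 /* add the port to 'blocked-ports' */
} port_action_t;

/* Hypothetical per-port state: failures seen since the last success, or
 * in total if there has never been a success. */
typedef struct {
  unsigned failures_since_success;
} port_counter_t;

/* A successful connection resets the failure count for the port. */
void
port_note_success(port_counter_t *p)
{
  p->failures_since_success = 0;
}

/* Record one failed connection and say what the caller should do. */
port_action_t
port_note_failure(port_counter_t *p)
{
  ++p->failures_since_success;
  if (p->failures_since_success >= BLOCKED_THRESHOLD)
    return PORT_REPORT_BLOCKED;
  if (p->failures_since_success % WARN_INTERVAL == 0)
    return PORT_WARN;
  return PORT_OK;
}
```

In this sketch the 5th, 10th and 15th consecutive failures produce a
warning, and every failure from the 20th onward reports the port as
blocked until a connection succeeds.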
Related TODO item:
"- Automatically determine what ports are reachable and start using
those, if circuits aren't working and it's a pattern we
recognize ("port 443 worked once and port 9001 keeps not
working")."
I've had a go at implementing all of this in the attached.
Addendum:
Just a note on the patch, storing the digest of each router that uses the port
is a bit of a memory hog, and its only real purpose is to provide a count of
routers using that port when warning the user. That could be achieved when
warning the user by iterating through the routerlist instead.
Index: src/or/connection_or.c
===================================================================
--- src/or/connection_or.c (revision 17104)
+++ src/or/connection_or.c (working copy)
@@ -502,6 +502,9 @@
connection_or_connect_failed(or_connection_t *conn,
int reason, const char *msg)
{
+ if ((reason == END_OR_CONN_REASON_NO_ROUTE) ||
+ (reason == END_OR_CONN_REASON_REFUSED))
+ or_port_hist_failure(conn->identity_digest,TO_CONN(conn)->port);
control_event_or_conn_status(conn, OR_CONN_EVENT_FAILED, reason);
if (!authdir_mode_tests_reachability(get_options()))
control_event_bootstrap_problem(msg, reason);
@@ -580,6 +583,7 @@
/* already marked for close */
return NULL;
}
+
return conn;
}
@@ -909,6 +913,7 @@
control_event_or_conn_status(conn, OR_CONN_EVENT_CONNECTED, 0);
if (started_here) {
+ or_port_hist_success(TO_CONN(conn)->port);
rep_hist_note_connect_succeeded(conn->identity_digest, now);
if (entry_guard_register_connect_status(conn->identity_digest,
1, now) < 0) {
Index: src/or/rephist.c
===================================================================
--- src/or/rephist.c (revision 17104)
+++ src/or/rephist.c (working copy)
@@ -18,6 +18,7 @@
static void bw_arrays_init(void);
static void predicted_ports_init(void);
static void hs_usage_init(void);
+static void or_port_hist_init(void);
/** Total number of bytes currently allocated in fields used by rephist.c. */
uint64_t rephist_total_alloc=0;
@@ -89,6 +90,25 @@
digestmap_t *link_history_map;
} or_history_t;
+/** or_port_hist_t contains our router/client's knowledge of
+ all OR ports offered on the network, and how many servers with each port we
+ have succeeded or failed to connect to. */
+typedef struct {
+ /** The port this entry is tracking. */
+ uint16_t or_port;
+ /** Have we ever connected to this port on another OR? */
+ unsigned int success:1;
+ /** The ORs using this port. */
+ digestmap_t *ids;
+ /** The ORs using this port we have failed to connect to. */
+ digestmap_t *failure_ids;
+ /** Are we excluding ORs with this port during entry selection?*/
+ unsigned int excluded;
+} or_port_hist_t;
+
+static unsigned int still_searching = 0;
+static smartlist_t *or_port_hists;
+
/** When did we last multiply all routers' weighted_run_length and
* total_run_weights by STABILITY_ALPHA? */
static time_t stability_last_downrated = 0;
@@ -164,6 +184,16 @@
tor_free(hist);
}
+/** Helper: free storage held by a single OR port history entry. */
+static void
+or_port_hist_free(or_port_hist_t *p)
+{
+ tor_assert(p);
+ digestmap_free(p->ids,NULL);
+ digestmap_free(p->failure_ids,NULL);
+ tor_free(p);
+}
+
/** Update an or_history_t object <b>hist</b> so that its uptime/downtime
* count is up-to-date as of <b>when</b>.
*/
@@ -1639,7 +1669,7 @@
tmp_time = smartlist_get(predicted_ports_times, i);
if (*tmp_time + PREDICTED_CIRCS_RELEVANCE_TIME < now) {
tmp_port = smartlist_get(predicted_ports_list, i);
- log_debug(LD_CIRC, "Expiring predicted port %d", *tmp_port);
+ log_debug(LD_HIST, "Expiring predicted port %d", *tmp_port);
smartlist_del(predicted_ports_list, i);
smartlist_del(predicted_ports_times, i);
rephist_total_alloc -= sizeof(uint16_t)+sizeof(time_t);
@@ -1821,6 +1851,12 @@
tor_free(last_stability_doc);
built_last_stability_doc_at = 0;
predicted_ports_free();
+ if (or_port_hists) {
+ SMARTLIST_FOREACH(or_port_hists, or_port_hist_t *, p,
+ or_port_hist_free(p));
+ smartlist_free(or_port_hists);
+ or_port_hists = NULL;
+ }
}
/****************** hidden service usage statistics ******************/
@@ -2356,3 +2392,225 @@
tor_free(fname);
}
+/** Create a new entry in the port tracking cache for the or_port in
+ * <b>ri</b>. */
+void
+or_port_hist_new(const routerinfo_t *ri)
+{
+ or_port_hist_t *result;
+ const char *id=ri->cache_info.identity_digest;
+
+ if (!or_port_hists)
+ or_port_hist_init();
+
+ SMARTLIST_FOREACH(or_port_hists, or_port_hist_t *, tp,
+ {
+ /* Cope with routers that change their advertised OR port or are
+ dropped from the networkstatus. We don't discard the failures of
+ dropped routers because they are still valid when counting
+ consecutive failures on a port.*/
+ if (digestmap_get(tp->ids, id) && (tp->or_port != ri->or_port)) {
+ digestmap_remove(tp->ids, id);
+ }
+ if (tp->or_port == ri->or_port) {
+ if (!(digestmap_get(tp->ids, id)))
+ digestmap_set(tp->ids, id, (void*)1);
+ return;
+ }
+ });
+
+ result = tor_malloc_zero(sizeof(or_port_hist_t));
+ result->or_port=ri->or_port;
+ result->success=0;
+ result->ids=digestmap_new();
+ digestmap_set(result->ids, id, (void*)1);
+ result->failure_ids=digestmap_new();
+ result->excluded=0;
+ smartlist_add(or_port_hists, result);
+}
+
+/** Create the port tracking cache. */
+/*XXX: need to call this when we rebuild/update our network status */
+static void
+or_port_hist_init(void)
+{
+ routerlist_t *rl = router_get_routerlist();
+
+ if (!or_port_hists)
+ or_port_hists=smartlist_create();
+
+ if (rl && rl->routers) {
+ SMARTLIST_FOREACH(rl->routers, routerinfo_t *, ri,
+ {
+ or_port_hist_new(ri);
+ });
+ }
+}
+
+#define NOT_BLOCKED 0
+#define FAILURES_OBSERVED 1
+#define POSSIBLY_BLOCKED 5
+#define PROBABLY_BLOCKED 10
+/** Return the list of blocked ports for our router's extra-info.*/
+char *
+or_port_hist_get_blocked_ports(void)
+{
+ char blocked_ports[2048];
+ char *bp;
+
+ tor_snprintf(blocked_ports,sizeof(blocked_ports),"blocked-ports");
+ SMARTLIST_FOREACH(or_port_hists, or_port_hist_t *, tp,
+ {
+ if (digestmap_size(tp->failure_ids) >= PROBABLY_BLOCKED)
+ tor_snprintf(blocked_ports+strlen(blocked_ports),
+ sizeof(blocked_ports)-strlen(blocked_ports)," %u,",tp->or_port);
+ });
+ if (strlen(blocked_ports) == 13)
+ return NULL;
+ bp=tor_strdup(blocked_ports);
+ bp[strlen(bp)-1]='\n';
+ bp[strlen(bp)]='\0';
+ return bp;
+}
+
+/** Warn that we have seen too many failures on a port or range of ports,
+ * and suggest disabling our server if we are running one.*/
+static void
+or_port_hist_report_block(unsigned int min_severity)
+{
+ or_options_t *options=get_options();
+ char failures_observed[2048],possibly_blocked[2048],probably_blocked[2048];
+ char port[1024];
+
+ memset(failures_observed,0,sizeof(failures_observed));
+ memset(possibly_blocked,0,sizeof(possibly_blocked));
+ memset(probably_blocked,0,sizeof(probably_blocked));
+
+ SMARTLIST_FOREACH(or_port_hists, or_port_hist_t *, tp,
+ {
+ unsigned int failures = digestmap_size(tp->failure_ids);
+ if (failures >= min_severity) {
+ tor_snprintf(port, sizeof(port), " %u (%u failures %s out of %u on the"
+ " network)",tp->or_port,failures,
+ (!tp->success)?"and no successes": "since last success",
+ digestmap_size(tp->ids));
+ if (failures >= PROBABLY_BLOCKED) {
+ strlcat(probably_blocked, port, sizeof(probably_blocked));
+ } else if (failures >= POSSIBLY_BLOCKED)
+ strlcat(possibly_blocked, port, sizeof(possibly_blocked));
+ else if (failures >= FAILURES_OBSERVED)
+ strlcat(failures_observed, port, sizeof(failures_observed));
+ }
+ });
+
+ log_warn(LD_HIST,"%s%s%s%s%s%s%s%s",
+ server_mode(options) &&
+ ((min_severity==FAILURES_OBSERVED) || strlen(probably_blocked))?
+ "You should consider disabling your Tor server. ":"",
+ (min_severity==FAILURES_OBSERVED)?
+ "Tor appears to be blocked from connecting to a range of ports "
+ "with the result that it cannot connect to one tenth of the Tor "
+ "network. ":"",
+ strlen(failures_observed)?
+ "Tor has observed failures on the following ports: ":"",
+ failures_observed,
+ strlen(possibly_blocked)?
+ "Tor is possibly blocked on the following ports: ":"",
+ possibly_blocked,
+ strlen(probably_blocked)?
+ "Tor is almost certainly blocked on the following ports: ":"",
+ probably_blocked);
+
+}
+
+/** Record the success of our connection to an OR listening on port
+ * <b>or_port</b>. */
+void
+or_port_hist_success(uint16_t or_port)
+{
+ SMARTLIST_FOREACH(or_port_hists, or_port_hist_t *, tp,
+ {
+ if (tp->or_port != or_port)
+ continue;
+ /*Reset our failure stats so we can notice if this port ever gets
+ blocked again.*/
+ tp->success=1;
+ if (digestmap_size(tp->failure_ids)) {
+ digestmap_free(tp->failure_ids,NULL);
+ tp->failure_ids=digestmap_new();
+ }
+ if (still_searching) {
+ still_searching=0;
+ SMARTLIST_FOREACH(or_port_hists,or_port_hist_t *,t,t->excluded=0;);
+ }
+ return;
+ });
+}
+/** Record the failure of our connection to <b>digest</b>'s
+ * OR port. Warn, exclude the port from future entry guard selection, or
+ * add port to blocked-ports in our server's extra-info as appropriate. */
+void
+or_port_hist_failure(const char *digest, uint16_t or_port)
+{
+ int total_failures=0, ports_excluded=0, report_block=0;
+ int total_routers=smartlist_len(router_get_routerlist()->routers);
+
+ SMARTLIST_FOREACH(or_port_hists, or_port_hist_t *, tp,
+ {
+ ports_excluded += tp->excluded;
+ total_failures+=digestmap_size(tp->failure_ids);
+ if (tp->or_port != or_port)
+ continue;
+ /* We're only interested in unique failures */
+ if (digestmap_get(tp->failure_ids, digest))
+ return;
+
+ total_failures++;
+ digestmap_set(tp->failure_ids, digest, (void*)1);
+ if (still_searching && !tp->success) {
+ tp->excluded=1;
+ ports_excluded++;
+ }
+ if ((digestmap_size(tp->ids) >= POSSIBLY_BLOCKED) &&
+ !(digestmap_size(tp->failure_ids) % POSSIBLY_BLOCKED))
+ report_block=POSSIBLY_BLOCKED;
+ });
+
+ if (total_failures >= (int)(total_routers/10))
+ or_port_hist_report_block(FAILURES_OBSERVED);
+ else if (report_block)
+ or_port_hist_report_block(report_block);
+
+ if (ports_excluded >= smartlist_len(or_port_hists)) {
+ log_warn(LD_HIST,"During entry node selection Tor tried every port "
+ "offered on the network on at least one server "
+ "and didn't manage a single "
+ "successful connection. This suggests you are behind an "
+ "extremely restrictive firewall. Tor will keep trying to find "
+ "a reachable entry node.");
+ SMARTLIST_FOREACH(or_port_hists, or_port_hist_t *, tp, tp->excluded=0;);
+ }
+}
+
+/** Add any ports marked as excluded in or_port_hist_t to <b>rt</b> */
+void
+or_port_hist_exclude(routerset_t *rt)
+{
+ SMARTLIST_FOREACH(or_port_hists, or_port_hist_t *, tp,
+ {
+ char portpolicy[9];
+ if (tp->excluded) {
+ tor_snprintf(portpolicy,sizeof(portpolicy),"*:%u", tp->or_port);
+ log_warn(LD_HIST,"Port %u may be blocked, excluding it temporarily "
+ "from entry guard selection.", tp->or_port);
+ routerset_parse(rt, portpolicy, "Ports");
+ }
+ });
+}
+
+/** Allow the exclusion of ports during our search for an entry node. */
+void
+or_port_hist_search_again(void)
+{
+ still_searching=1;
+}
Index: src/or/or.h
===================================================================
--- src/or/or.h (revision 17104)
+++ src/or/or.h (working copy)
@@ -3864,6 +3864,13 @@
int any_predicted_circuits(time_t now);
int rep_hist_circbuilding_dormant(time_t now);
+void or_port_hist_failure(const char *digest, uint16_t or_port);
+void or_port_hist_success(uint16_t or_port);
+void or_port_hist_new(const routerinfo_t *ri);
+void or_port_hist_exclude(routerset_t *rt);
+void or_port_hist_search_again(void);
+char *or_port_hist_get_blocked_ports(void);
+
/** Possible public/private key operations in Tor: used to keep track of where
* we're spending our time. */
typedef enum {
Index: src/or/routerparse.c
===================================================================
--- src/or/routerparse.c (revision 17104)
+++ src/or/routerparse.c (working copy)
@@ -1401,6 +1401,8 @@
goto err;
}
+ or_port_hist_new(router);
+
if (!router->platform) {
router->platform = tor_strdup("<unknown>");
}
Index: src/or/router.c
===================================================================
--- src/or/router.c (revision 17104)
+++ src/or/router.c (working copy)
@@ -1818,6 +1818,7 @@
char published[ISO_TIME_LEN+1];
char digest[DIGEST_LEN];
char *bandwidth_usage;
+ char *blocked_ports;
int result;
size_t len;
@@ -1825,7 +1826,6 @@
extrainfo->cache_info.identity_digest, DIGEST_LEN);
format_iso_time(published, extrainfo->cache_info.published_on);
bandwidth_usage = rep_hist_get_bandwidth_lines(1);
-
result = tor_snprintf(s, maxlen,
"extra-info %s %s\n"
"published %s\n%s",
@@ -1835,6 +1835,16 @@
if (result<0)
return -1;
+ blocked_ports = or_port_hist_get_blocked_ports();
+ if (blocked_ports) {
+ result = tor_snprintf(s+strlen(s), maxlen-strlen(s),
+ "%s",
+ blocked_ports);
+ tor_free(blocked_ports);
+ if (result<0)
+ return -1;
+ }
+
if (should_record_bridge_info(options)) {
static time_t last_purged_at = 0;
char *geoip_summary;
Index: src/or/circuitbuild.c
===================================================================
--- src/or/circuitbuild.c (revision 17104)
+++ src/or/circuitbuild.c (working copy)
@@ -62,6 +62,7 @@
static void entry_guards_changed(void);
static time_t start_of_month(time_t when);
+static int num_live_entry_guards(void);
/** Iterate over values of circ_id, starting from conn-\>next_circ_id,
* and with the high bit specified by conn-\>circ_id_type, until we get
@@ -1627,12 +1628,14 @@
smartlist_t *excluded;
or_options_t *options = get_options();
router_crn_flags_t flags = 0;
+ routerset_t *_ExcludeNodes;
if (state && options->UseEntryGuards &&
(purpose != CIRCUIT_PURPOSE_TESTING || options->BridgeRelay)) {
return choose_random_entry(state);
}
+ _ExcludeNodes = routerset_new();
excluded = smartlist_create();
if (state && (r = build_state_get_exit_router(state))) {
@@ -1670,12 +1673,18 @@
if (options->_AllowInvalid & ALLOW_INVALID_ENTRY)
flags |= CRN_ALLOW_INVALID;
+ if (options->ExcludeNodes)
+ routerset_union(_ExcludeNodes,options->ExcludeNodes);
+
+ or_port_hist_exclude(_ExcludeNodes);
+
choice = router_choose_random_node(
NULL,
excluded,
- options->ExcludeNodes,
+ _ExcludeNodes,
flags);
smartlist_free(excluded);
+ routerset_free(_ExcludeNodes);
return choice;
}
@@ -2727,6 +2736,7 @@
entry_guards_update_state(or_state_t *state)
{
config_line_t **next, *line;
+ unsigned int have_reachable_entry=0;
if (! entry_guards_dirty)
return;
@@ -2740,6 +2750,7 @@
char dbuf[HEX_DIGEST_LEN+1];
if (!e->made_contact)
continue; /* don't write this one to disk */
+ have_reachable_entry=1;
*next = line = tor_malloc_zero(sizeof(config_line_t));
line->key = tor_strdup("EntryGuard");
line->value = tor_malloc(HEX_DIGEST_LEN+MAX_NICKNAME_LEN+2);
@@ -2785,6 +2796,11 @@
if (!get_options()->AvoidDiskWrites)
or_state_mark_dirty(get_or_state(), 0);
entry_guards_dirty = 0;
+
+ /* XXX: Is this the place to decide that we no longer have any reachable
+ guards? */
+ if (!have_reachable_entry)
+ or_port_hist_search_again();
}
/** If <b>question</b> is the string "entry-guards", then dump


@ -1,104 +0,0 @@
Filename: 157-specific-cert-download.txt
Title: Make certificate downloads specific
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson
Created: 2-Dec-2008
Status: Accepted
Target: 0.2.1.x
History:
2008 Dec 2, 22:34
Changed name of cross certification field to match the other authority
certificate fields.
Status:
As of 0.2.1.9-alpha:
Cross-certification is implemented for new certificates, but not yet
required. Directories support the tor/keys/fp-sk urls.
Overview:
Tor's directory specification gives two ways to download a certificate:
by its identity fingerprint, or by the digest of its signing key. Both
are error-prone. We propose a new download mechanism to make sure that
clients get the certificates they want.
Motivation:
When a client wants a certificate to verify a consensus, it has two choices
currently:
- Download by identity key fingerprint. In this case, the client risks
getting a certificate for the same authority, but with a different
signing key than the one used to sign the consensus.
- Download by signing key fingerprint. In this case, the client risks
getting a forged certificate that contains the right signing key
signed with the wrong identity key. (Since caches are willing to
cache certs from authorities they do not themselves recognize, the
attacker wouldn't need to compromise an authority's key to do this.)
Current solution:
Clients fetch by identity keys, and re-fetch with backoff if they don't get
certs with the signing key they want.
Proposed solution:
Phase 1: Add a URL type for clients to download certs by identity _and_
signing key fingerprint. Unless both fields match, the client doesn't
accept the certificate(s). Clients begin using this method when their
randomly chosen directory cache supports it.
Phase 1A: Simultaneously, add a cross-certification element to
certificates.
Phase 2: Once many directory caches support phase 1, clients should prefer
to fetch certificates using that protocol when available.
Phase 2A: Once all authorities are generating cross-certified certificates
as in phase 1A, require cross-certification.
Specification additions:
The key certificate whose identity key fingerprint is <F> and whose signing
key fingerprint is <S> should be available at:
http://<hostname>/tor/keys/fp-sk/<F>-<S>.z
As usual, clients may request multiple certificates using:
http://<hostname>/tor/keys/fp-sk/<F1>-<S1>+<F2>-<S2>.z
Clients SHOULD use this format whenever they know both key fingerprints for
a desired certificate.
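As an illustration of the URL format above, a client-side helper might
build the request path like this. The function name and error handling are
hypothetical; the sketch does not validate the fingerprints and assumes
they are already encoded as the directory expects.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "http://<hostname>/tor/keys/fp-sk/<F1>-<S1>+<F2>-<S2>.z" into out
 * for n (identity, signing) fingerprint pairs. Returns 0 on success, -1
 * if n is zero or the buffer is too small. */
int
format_fp_sk_url(char *out, size_t outlen, const char *hostname,
                 const char *const *id_fps, const char *const *sk_fps,
                 size_t n)
{
  size_t used, i;
  int r;
  if (n == 0)
    return -1;
  r = snprintf(out, outlen, "http://%s/tor/keys/fp-sk/", hostname);
  if (r < 0 || (size_t)r >= outlen)
    return -1;
  used = (size_t)r;
  for (i = 0; i < n; ++i) {
    /* Pairs are joined with '+'; the two fingerprints with '-'. */
    r = snprintf(out + used, outlen - used, "%s%s-%s",
                 i ? "+" : "", id_fps[i], sk_fps[i]);
    if (r < 0 || (size_t)r >= outlen - used)
      return -1;
    used += (size_t)r;
  }
  r = snprintf(out + used, outlen - used, ".z");
  if (r < 0 || (size_t)r >= outlen - used)
    return -1;
  return 0;
}
```

A client that knows only one of the two fingerprints cannot use this form
and has to fall back to the existing download methods.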
Certificates SHOULD contain the following field (at most once):
"dir-key-crosscert" NL CrossSignature NL
where CrossSignature is a signature, made using the certificate's signing
key, of the digest of the PKCS1-padded hash of the certificate's identity
key. For backward compatibility with broken versions of the parser, we
wrap the base64-encoded signature in -----BEGIN ID SIGNATURE----- and
-----END ID SIGNATURE----- tags. (See bug 880.) Implementations MUST allow
the "ID " portion to be omitted, however.
When encountering a certificate with a dir-key-crosscert entry,
implementations MUST verify that the signature is a correct signature of
the hash of the identity key using the signing key.
(In a future version of this specification, dir-key-crosscert entries will
be required.)
Why cross-certify too?
Cross-certification protects clients who haven't updated yet, by reducing
the number of caches that are willing to hold and serve bogus certificates.
References:
This is related to part 2 of bug 854.


@ -1,207 +0,0 @@
Filename: 158-microdescriptors.txt
Title: Clients download consensus + microdescriptors
Version: $Revision$
Last-Modified: $Date$
Author: Roger Dingledine
Created: 17-Jan-2009
Status: Open
1. Overview
This proposal replaces section 3.2 of proposal 141, which was
called "Fetching descriptors on demand". Rather than modifying the
circuit-building protocol to fetch a server descriptor inline at each
circuit extend, we instead put all of the information that clients need
either into the consensus itself, or into a new set of data about each
relay called a microdescriptor. The microdescriptor is a direct
transform from the relay descriptor, so relays don't even need to know
this is happening.
Descriptor elements that are small and frequently changing should go
in the consensus itself, and descriptor elements that are small and
relatively static should go in the microdescriptor. If we ever end up
with descriptor elements that aren't small yet clients need to know
them, we'll need to resume considering some design like the one in
proposal 141.
2. Motivation
See
http://archives.seul.org/or/dev/Nov-2008/msg00000.html and
http://archives.seul.org/or/dev/Nov-2008/msg00001.html and especially
http://archives.seul.org/or/dev/Nov-2008/msg00007.html
for a discussion of the options and why this is currently the best
approach.
3. Design
There are three pieces to the proposal. First, authorities will list in
their votes (and thus in the consensus) what relay descriptor elements
are included in the microdescriptor, and also list the expected hash
of the microdescriptor for each relay. Second, directory mirrors will serve
microdescriptors. Third, clients will ask for them and cache them.
3.1. Consensus changes
V3 votes should include a new line:
microdescriptor-elements bar baz foo
listing each descriptor element (sorted alphabetically) that authority
included when it calculated its expected microdescriptor hashes.
We also need to include the hash of each expected microdescriptor in
the routerstatus section. I suggest a new "m" line for each stanza,
with the base64 of the hash of the elements that the authority voted
for above.
The consensus microdescriptor-elements and "m" lines are then computed
as described in Section 3.1.2 below.
I believe that means we need a new consensus-method "6" that knows
how to compute the microdescriptor-elements and add "m" lines.
3.1.1. Descriptor elements to include for now
To start, the element list that authorities suggest should be
family onion-key
(Note that the or-dev posts above only mention onion-key, but if
we don't also include family then clients will never learn it. It
seemed like it should be relatively static, so putting it in the
microdescriptor is smarter than trying to fit it into the consensus.)
We could imagine a config option that takes a value like
"family,onion-key" so authorities could change their voted preferences
without needing to upgrade.
3.1.2. Computing consensus for microdescriptor-elements and "m" lines
One approach is for the consensus microdescriptor-elements line to
include every element listed by a majority of authorities, sorted. The
problem here is that it will no longer be deterministic what the correct
hash for the "m" line should be. We could imagine telling the authority
to go look in its descriptor and produce the right hash itself, but
we don't want consensus calculation to be based on external data like
that. (Plus, the authority may not have the descriptor that everybody
else voted to use.)
The better approach is to take the exact set that has the most votes
(breaking ties by the set that has the most elements, and breaking
ties after that by whichever is alphabetically first). That will
increase the odds that we actually get a microdescriptor hash that
is both a) for the descriptor we're putting in the consensus, and b)
over the elements that we're declaring it should be for.
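The selection rule in the paragraph above could be sketched as follows.
The sketch assumes each authority's microdescriptor-elements line is
already in canonical form (elements sorted alphabetically and separated by
single spaces), so that identical sets compare equal as strings. The
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the space-separated elements in one microdescriptor-elements
 * line. */
static size_t
count_elements(const char *s)
{
  size_t n = 0;
  int in_word = 0;
  for (; *s; ++s) {
    if (*s == ' ') {
      in_word = 0;
    } else if (!in_word) {
      in_word = 1;
      ++n;
    }
  }
  return n;
}

/* Count how many authorities voted for exactly this set. */
static size_t
count_votes(const char *const *votes, size_t n, const char *set)
{
  size_t i, c = 0;
  for (i = 0; i < n; ++i)
    if (strcmp(votes[i], set) == 0)
      ++c;
  return c;
}

/* Pick the exact set with the most votes; break ties by the larger set,
 * then by whichever is alphabetically first. Returns NULL if n is 0. */
const char *
choose_element_set(const char *const *votes, size_t n)
{
  const char *best = NULL;
  size_t best_votes = 0, best_elems = 0, i;
  for (i = 0; i < n; ++i) {
    size_t v = count_votes(votes, n, votes[i]);
    size_t e = count_elements(votes[i]);
    if (!best || v > best_votes ||
        (v == best_votes && e > best_elems) ||
        (v == best_votes && e == best_elems &&
         strcmp(votes[i], best) < 0)) {
      best = votes[i];
      best_votes = v;
      best_elems = e;
    }
  }
  return best;
}
```

Because the winner is always one of the sets that some authority actually
voted for, at least that authority's "m" hashes were computed over exactly
the chosen elements.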
Then the "m" line for a given relay is the one that gets the most votes
from authorities that both a) voted for the microdescriptor-elements
line we're using, and b) voted for the descriptor we're using.
(If there's a tie, use the smaller hash. But really, if there are
multiple such votes and they differ about a microdescriptor, we caught
one of them lying or being buggy. We should log it to track down why.)
If there are no such votes, then we leave out the "m" line for that
relay. That means clients should avoid it for this time period. (As
an extension it could instead mean that clients should fetch the
descriptor and figure out its microdescriptor themselves. But let's
not get ahead of ourselves.)
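The rule for choosing a relay's "m" line could be sketched like this. The
m_vote_t layout is hypothetical, and "smaller hash" is assumed here to mean
a byte-wise comparison of the base64 strings, which the text does not
specify.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one authority's vote about one relay. */
typedef struct {
  int voted_for_elements;   /* voted for the chosen elements line? */
  int voted_for_descriptor; /* voted for the chosen descriptor? */
  const char *m_hash;       /* base64 microdescriptor hash it voted for */
} m_vote_t;

static int
eligible(const m_vote_t *v)
{
  return v->voted_for_elements && v->voted_for_descriptor;
}

/* Return the hash with the most eligible votes, breaking ties by the
 * smaller hash. Returns NULL if no vote is eligible, in which case the
 * "m" line is left out for this relay. */
const char *
choose_m_line(const m_vote_t *votes, size_t n)
{
  const char *best = NULL;
  size_t best_count = 0, i, j;
  for (i = 0; i < n; ++i) {
    size_t count = 0;
    if (!eligible(&votes[i]))
      continue;
    for (j = 0; j < n; ++j)
      if (eligible(&votes[j]) &&
          strcmp(votes[j].m_hash, votes[i].m_hash) == 0)
        ++count;
    if (!best || count > best_count ||
        (count == best_count && strcmp(votes[i].m_hash, best) < 0)) {
      best = votes[i].m_hash;
      best_count = count;
    }
  }
  return best;
}
```

An implementation would probably also log any disagreement between
eligible votes, as suggested above, since it points to a lying or buggy
authority.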
It would be nice to have a more foolproof way to agree on what
microdescriptor hash each authority should vote for, so we can avoid
missing "m" lines. Just switching to a new consensus-method each time
we change the set of microdescriptor-elements won't help though, since
each authority will still have to decide what hash to vote for before
knowing what consensus-method will be used.
Here's one way we could do it. Each vote / consensus includes
the microdescriptor-elements that were used to compute the hashes,
and also a preferred-microdescriptor-elements set. If an authority
has a consensus from the previous period, then it should use the
consensus preferred-microdescriptor-elements when computing its votes
for microdescriptor-elements and the appropriate hashes in the upcoming
period. (If it has no previous consensus, then it just writes its
own preferences in both lines.)
3.2. Directory mirrors serve microdescriptors
Directory mirrors should then read the microdescriptor-elements line
from the consensus, and learn how to answer requests. (Directory mirrors
continue to serve normal relay descriptors too, a) to serve old clients
and b) to be able to construct microdescriptors on the fly.)
The microdescriptors with hashes <D1>,<D2>,<D3> should be available at:
http://<hostname>/tor/micro/d/<D1>+<D2>+<D3>.z
All the microdescriptors from the current consensus should also be
available at:
http://<hostname>/tor/micro/all.z
so a client that's bootstrapping doesn't need to send a 70KB URL just
to name every microdescriptor it's looking for.
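As a rough illustration, the two URL forms above could be built like this in Python. The function names are hypothetical; only the URL layout comes from the text.

```python
from typing import Sequence


def micro_url(hostname: str, hashes: Sequence[str]) -> str:
    """URL for a specific set of microdescriptors, hashes joined by '+'."""
    if not hashes:
        raise ValueError("need at least one microdescriptor hash")
    joined = "+".join(hashes)
    return "http://" + hostname + "/tor/micro/d/" + joined + ".z"


def micro_all_url(hostname: str) -> str:
    """URL for every microdescriptor in the current consensus."""
    return "http://" + hostname + "/tor/micro/all.z"
```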
The format of a microdescriptor is the header line
"microdescriptor-header"
followed by each element (keyword and body), alphabetically. There's
no need to mention what hash it's for, since it's self-identifying:
you can hash the elements to learn this.
(Do we need a footer line to show that it's over, or is the next
microdescriptor line or EOF enough of a hint? A footer line wouldn't
hurt much. Also, no fair voting for the microdescriptor-element
"microdescriptor-header".)
The hash of the microdescriptor is simply the hash of the concatenated
elements -- not counting the header line or hypothetical footer line.
Unless you prefer that?
Is there a reasonable way to version these things? We could say that
the microdescriptor-header line can contain arguments which clients
must ignore if they don't understand them. Any better ways?
Directory mirrors should check to make sure that the microdescriptors
they're about to serve match the right hashes (either the hashes from
the fetch URL or the hashes from the consensus, respectively).
We will probably want to consider some sort of smart data structure to
be able to quickly convert microdescriptor hashes into the appropriate
microdescriptor. Clients will want this anyway when they load their
microdescriptor cache and want to match it up with the consensus to
see what's missing.
3.3. Clients fetch them and cache them
When a client gets a new consensus, it looks to see if there are any
microdescriptors it needs to learn. If it needs to learn more than
some threshold of the microdescriptors (half?), it requests 'all',
else it requests only the missing ones.
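The decision above could be sketched as follows in Python. The name plan_fetch, its return convention, and the default threshold of one half are assumptions for illustration; the text leaves the threshold open.

```python
from typing import Iterable, List, Optional, Union


def plan_fetch(consensus_hashes: Iterable[str],
               cached_hashes: Iterable[str],
               threshold: float = 0.5) -> Optional[Union[str, List[str]]]:
    """Decide what a client should request after getting a new consensus.

    Returns None if nothing is missing, the string "all" if more than
    `threshold` of the consensus is missing, else the sorted missing hashes.
    """
    wanted = set(consensus_hashes)
    missing = wanted - set(cached_hashes)
    if not missing:
        return None
    if len(missing) > threshold * len(wanted):
        return "all"
    return sorted(missing)
```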
Clients maintain a cache of microdescriptors along with metadata like
when each one was last referenced by a consensus. They keep a microdescriptor
until it hasn't been mentioned in any consensus for a week. Future
clients might cache them for longer or shorter times.
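A small Python sketch of that expiry rule, assuming the cache metadata is a mapping from microdescriptor hash to the time (in seconds) it was last referenced by a consensus. The names and the data layout are hypothetical.

```python
from typing import Dict

WEEK_SECONDS = 7 * 24 * 60 * 60


def prune_cache(last_referenced: Dict[str, float],
                now: float,
                max_idle: float = WEEK_SECONDS) -> Dict[str, float]:
    """Keep only microdescriptors mentioned by a consensus within max_idle."""
    return {h: seen for h, seen in last_referenced.items()
            if now - seen < max_idle}
```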
3.3.1. Information leaks from clients
If a client asks you for a set of microdescs, then you know she didn't
have them cached before. How much does that leak? What about when
we're all using our entry guards as directory guards, and we've seen
that user make a bunch of circuits already?
Fetching "all" when you need at least half is a good first order fix,
but might not be all there is to it.
Another future option would be to fetch some of the microdescriptors
anonymously (via a Tor circuit).
4. Transition and deployment
Phase one, the directory authorities should start voting on
microdescriptors and microdescriptor elements, and putting them in the
consensus. This should happen during the 0.2.1.x series, and should
be relatively easy to do.
Phase two, directory mirrors should learn how to serve them, and learn
how to read the consensus to find out what they should be serving. This
phase could be done either in 0.2.1.x or early in 0.2.2.x, depending
on how messy it turns out to be and how quickly we get around to it.
Phase three, clients should start fetching and caching them instead
of normal descriptors. This should happen post 0.2.1.x.

View File

@ -1,144 +0,0 @@
Filename: 159-exit-scanning.txt
Title: Exit Scanning
Version: $Revision$
Last-Modified: $Date$
Author: Mike Perry
Created: 13-Feb-2009
Status: Open
Overview:
This proposal describes the implementation and integration of an
automated exit node scanner for scanning the Tor network for malicious,
misconfigured, firewalled or filtered nodes.
Motivation:
Tor exit nodes can be run by anyone with an Internet connection. Often,
these users aren't fully aware of limitations of their networking
setup. Content filters, antivirus software, advertisements injected by
their service providers, malicious upstream providers, and the resource
limitations of their computer or networking equipment have all been
observed on the current Tor network.
It is also possible that some nodes exist purely for malicious
purposes. In the past, there have been intermittent instances of
nodes spoofing SSH keys, as well as nodes being used for purposes of
plaintext surveillance.
While it is not realistic to expect to catch extremely targeted or
completely passive malicious adversaries, the goal is to prevent
malicious adversaries from deploying dragnet attacks against large
segments of the Tor userbase.
Scanning methodology:
The first scans to be implemented are HTTP, HTML, Javascript, and
SSL scans.
The HTTP scan scrapes Google for common filetype urls such as exe, msi,
doc, dmg, etc. It then fetches these urls through Non-Tor and Tor, and
compares the SHA1 hashes of the resulting content.
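The comparison step might look like the following Python sketch. Fetching the two copies is out of scope here; the function name and the idea of passing both bodies in as bytes are assumptions, not a description of the actual scanner code.

```python
import hashlib


def content_matches(non_tor_body: bytes, tor_body: bytes) -> bool:
    """Compare the SHA1 hashes of content fetched without and with Tor."""
    non_tor_digest = hashlib.sha1(non_tor_body).hexdigest()
    tor_digest = hashlib.sha1(tor_body).hexdigest()
    return non_tor_digest == tor_digest
```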
The SSL scan downloads certificates for all IPs a domain will locally
resolve to and compares these certificates to those seen over Tor. The
scanner notes if a domain had rotated certificates locally in the
results for each scan.
The HTML scan checks HTML, Javascript, and plugin content for
modifications. Because of the dynamic nature of most of the web, the
scanner has a number of mechanisms built in to filter out false
positives that are used when a change is noticed between Tor and
Non-Tor.
All tests also share a URL-based false positive filter that
automatically removes results retroactively if the number of failures
exceeds a certain percentage of nodes tested with the URL.
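One way to picture this filter, as a hedged Python sketch: results are assumed to be stored per URL as a mapping from node to pass/fail, and max_failure_fraction stands in for the unspecified percentage. Both the data layout and the names are illustrative.

```python
from typing import Dict, List


def urls_to_discard(results: Dict[str, Dict[str, bool]],
                    max_failure_fraction: float) -> List[str]:
    """Return URLs whose failures exceed the allowed fraction of tested nodes.

    results maps url -> {node_id: passed}. Results for the returned URLs
    would then be removed retroactively.
    """
    discard = []
    for url, passed_by_node in results.items():
        tested = len(passed_by_node)
        failures = sum(1 for ok in passed_by_node.values() if not ok)
        if tested and failures > max_failure_fraction * tested:
            discard.append(url)
    return sorted(discard)
```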
Deployment Stages:
To avoid instances where bugs cause us to mark exit nodes as BadExit
improperly, it is proposed that we begin use of the scanner in stages.
1. Manual Review:
In the first stage, basic scans will be run by a small number of
people while we stabilize the scanner. The scanner has the ability
to resume crashed scans, and to rescan nodes that fail various
tests.
2. Human Review:
In the second stage, results will be automatically mailed to
an email list of interested parties for review. We will also begin
classifying failure types into three to four different severity
levels, based on both the reliability of the test and the nature of
the failure.
3. Automatic BadExit Marking:
In the final stage, the scanner will begin marking exits depending
on the failure severity level in one of three different ways: by
node idhex, by node IP, or by node IP mask. A potential fourth, less
severe category of results may still be delivered via email only for
review.
BadExit markings will be delivered in batches upon completion
of whole-network scans, so that the final false positive
filter has an opportunity to filter out URLs that exhibit
dynamic content beyond what we can filter.
Specification of Exit Marking:
Technically, BadExit could be marked via SETCONF AuthDirBadExit over
the control port, but this would allow full access to the directory
authority configuration and operation.
The approved-routers file could also be used, but currently it only
supports fingerprints, and it also contains other data unrelated to
exit scanning that would be difficult to coordinate.
Instead, we propose a new badexit-routers file with the following
keywords:
BadExitNet 1*[exitpattern from 2.3 in dir-spec.txt]
BadExitFP 1*[hexdigest from 2.3 in dir-spec.txt]
BadExitNet lines would follow the codepaths used by AuthDirBadExit to
set authdir_badexit_policy, and BadExitFP would follow the codepaths
from approved-router's !badexit lines.
The scanner would have exclusive ability to write, append, rewrite,
and modify this file. Prior to building a new consensus vote, a
participating Tor authority would read in a fresh copy.
Security Implications:
Aside from evading the scanner's detection, there are two additional
high-level security considerations:
1. Ensure nodes cannot be marked BadExit by an adversary at will
It is possible individual website owners will be able to target certain
Tor nodes, but once they begin to attempt to fail more than the URL
filter percentage of the exits, their sites will be automatically
discarded.
Failing specific nodes is possible, but scanned results are fully
reproducible, and BadExits should be rare enough that humans are never
fully removed from the loop.
State (cookies, cache, etc) does not otherwise persist in the scanner
between exit nodes to enable one exit node to bias the results of a
later one.
2. Ensure that scanner compromise does not yield authority compromise
Having a separate file that is under the exclusive control of the
scanner allows us to heavily isolate the scanner from the Tor
authority, potentially even running them on separate machines.

View File

@ -1,39 +0,0 @@
Notes on an auto updater:
steve wants a "latest" symlink so he can always just fetch that.
roger worries that this will exacerbate the "what version are you
using?" "latest." problem.
weasel suggests putting the latest recommended version in dns. then
we don't have to hit the website. it's got caching, it's lightweight,
it scales. just put it in a TXT record or something.
but, no dnssec.
roger suggests a file on the https website that lists the latest
recommended version (or filename or url or something like that).
(steve seems to already be doing this with xerobank. he additionally
suggests a little blurb that can be displayed to the user to describe
what's new.)
how to verify you're getting the right file?
a) it's https.
b) ship with a signing key, and use some openssl functions to verify.
c) both
andrew reminds us that we have a "recommended versions" line in the
consensus directory already.
if only we had some way to point out the "latest stable recommendation"
from this list. we could list it first, or something.
the recommended versions line also doesn't take into account which
packages are available -- e.g. on Windows one version might be the best
available, and on OS X it might be a different one.
aren't there existing solutions to this? surely there is a beautiful,
efficient, crypto-correct auto updater lib out there. even for windows.

View File

@ -1,174 +0,0 @@
How to hand out bridges.
Divide bridges into 'strategies' as they come in. Do this uniformly
at random for now.
For each strategy, we'll hand out bridges in a different way to
clients. This document describes two strategies: email-based and
IP-based.
0. Notation:
HMAC(k,v) : an HMAC of v using the key k.
A|B: The string A concatenated with the string B.
1. Email-based.
Goal: bootstrap based on one or more popular email service's sybil
prevention algorithms.
Parameters:
HMAC -- an HMAC function
P -- a time period
K -- the number of bridges to send in a period.
Setup: Generate two nonces, N and M.
As bridges arrive, put them into a ring according to HMAC(N,ID)
where ID is the bridge's identity digest.
Divide time into divisions of length P.
When we get an email:
If it's not from a supported email service, reject it.
If we already sent a response to that email address (normalized)
in this period, send _exactly_ the same response.
If it is from a supported service, generate X = HMAC(M,PS|E) where E
is the lowercased normalized email address for the user, and
where PS is the start of the current period. Send
the first K bridges in the ring after point X.
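The ring lookup above can be sketched in Python as follows. This is only an illustration under stated assumptions: HMAC-SHA256 stands in for the unspecified HMAC function, PS is encoded as a decimal string before concatenation, and the class and function names (BridgeRing, bridges_for_email) are hypothetical.

```python
import bisect
import hashlib
import hmac
from typing import List, Tuple


def hmac_point(key: bytes, value: bytes) -> bytes:
    """HMAC(k, v) from the Notation section; SHA-256 is an assumption."""
    return hmac.new(key, value, hashlib.sha256).digest()


class BridgeRing:
    """Bridges kept sorted by HMAC(N, ID)."""

    def __init__(self, ring_key: bytes) -> None:
        self._key = ring_key
        self._entries: List[Tuple[bytes, bytes]] = []  # (position, bridge ID)

    def add(self, bridge_id: bytes) -> None:
        position = hmac_point(self._key, bridge_id)
        bisect.insort(self._entries, (position, bridge_id))

    def after(self, x: bytes, k: int) -> List[bytes]:
        """Return up to k bridge IDs at or after point x, wrapping around."""
        if not self._entries:
            return []
        positions = [pos for pos, _ in self._entries]
        start = bisect.bisect_left(positions, x)
        count = min(k, len(self._entries))
        size = len(self._entries)
        return [self._entries[(start + i) % size][1] for i in range(count)]


def bridges_for_email(ring: BridgeRing, m_key: bytes, period_start: int,
                      email: str, k: int) -> List[bytes]:
    """X = HMAC(M, PS|E); email must already be normalized."""
    x = hmac_point(m_key, str(period_start).encode("ascii") + email.encode("ascii"))
    return ring.after(x, k)
```

Because X depends only on M, PS and E, a repeat query in the same period gets the same answer as long as the ring itself has not changed, which is the caveat raised in the bracketed discussion.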
[If we want to make sure that repeat queries are given exactly the
same results, then we can't let the ring change during the
time period. For a long time period like a month, that's quite a
hassle. How about instead just keeping a replay cache of addresses
that have been answered, and sending them a "sorry, you already got
your addresses for the time period; perhaps you should try these
other fine distribution strategies while you wait?" response? This
approach would also resolve the "Make sure you can't construct a
distinct address to match an existing one" note below. -RD]
[I think, if we get a replay, we need to send back the same
answer as we did the first time, not say "try again."
Otherwise we need to worry that an attacker can keep people
from getting bridges by preemptively asking for them,
or that an attacker may force them to prove they haven't
gotten any bridges by asking. -NM]
[While we're at it, if we do the replay cache thing and don't need
repeatable answers, we could just pick K random answers from the
pool. Is it beneficial that a bridge user who knows about a clump of
nodes will be sharing them with other users who know about a similar
(overlapping) clump? One good aspect is against an adversary who
learns about a clump this way and watches those bridges to learn
other users and discover *their* bridges: he doesn't learn about
as many new bridges as he might if they were randomly distributed.
A drawback is against an adversary who happens to pick two email
addresses in P that include overlapping answers: he can measure
the difference in clumps and estimate how quickly the bridge pool
is growing. -RD]
[Random is one more darn thing to implement; rings are already
there. -NM]
[If we make the period P be mailbox-specific, and make it a random
value around some mean, then we make it harder for an attacker to
know when to try using his small army of gmail addresses to gather
another harvest. But we also make it harder for users to know when
they can try again. -RD]
[Letting the users know about when they can try again seems
worthwhile. Otherwise users and attackers will all probe and
probe and probe until they get an answer. No additional
security will be achieved, but bandwidth will be lost. -NM]
To normalize an email address:
Start with the RFC822 address. Consider only the mailbox {???}
portion of the address (username@domain). Put this into lowercase
ascii.
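A minimal Python sketch of this normalization, using the standard library's email.utils.parseaddr to pull the username@domain part out of a header value. It does not address the open question about unusual character encodings, and the function name is hypothetical.

```python
from email.utils import parseaddr


def normalize_email(header_value: str) -> str:
    """Return the lowercased username@domain part of an address header."""
    _display_name, address = parseaddr(header_value)
    if address.count("@") != 1:
        raise ValueError("no usable mailbox in %r" % header_value)
    return address.lower()
```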
Questions:
What to do with weird character encodings? Look up the RFC.
Notes:
Make sure that you can't force a single email address to appear
in lots of different ways. IOW, if nickm@freehaven.net and
NICKM@freehaven.net aren't treated the same, then I can get lots
more bridges than I should.
Make sure you can't construct a distinct address to match an
existing one. IOW, if we treat nickm@X and nickm@Y as the same
user, then anybody can register nickm@Z and use it to tell which
bridges nickm@X got (or would get).
Make sure that we actually check headers so we can't be trivially
used to spam people.
2. IP-based.
Goal: avoid handing out all the bridges to users in a similar IP
space and time.
Parameters:
T_Flush -- how long it should take a user on a single network to
see a whole cluster of bridges.
N_C -- the number of clusters that areas are grouped into.
K -- the number of bridges we hand out in response to a single
request.
Setup: using an AS map or a geoip map or some other flawed input
source, divide IP space into "areas" such that surveying a large
collection of "areas" is hard. For v0, use /24 address blocks.
Group areas into N_C clusters.
Generate secrets L, M, N.
Set the period P such that P*(bridges-per-cluster/K) = T_Flush.
Don't set P to greater than a week, or less than three hours.
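Solving that relation for P and applying the two bounds gives, as a Python sketch (times in seconds; the function and constant names are illustrative):

```python
THREE_HOURS = 3 * 60 * 60
ONE_WEEK = 7 * 24 * 60 * 60


def period_seconds(t_flush: float, bridges_per_cluster: int, k: int) -> float:
    """P = T_Flush * K / bridges-per-cluster, clamped to [3 hours, 1 week]."""
    p = t_flush * k / bridges_per_cluster
    return min(max(p, THREE_HOURS), ONE_WEEK)
```

For example, with T_Flush of 30 days, 100 bridges per cluster and K = 3, P comes out to 77760 seconds, or 21.6 hours.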
When we get a bridge:
Based on HMAC(L,ID), assign the bridge to a cluster. Within each
cluster, keep the bridges in a ring based on HMAC(M,ID).
[Should we re-sort the rings for each new time period, so the ring
for a given cluster is based on HMAC(M,PS|ID)? -RD]
When we get a connection:
If it's http, redirect it to https.
Let area be the incoming IP network. Let PS be the current
period. Compute X = HMAC(N, PS|area). Return the next K bridges
in the ring after X.
[Don't we want to compute C = HMAC(key, area) to learn what cluster
to answer from, and then X = HMAC(key, PS|area) to pick a point in
that ring? -RD]
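For v0 with /24 address blocks, the area and ring point computation could be sketched like this in Python. It covers IPv4 only, uses HMAC-SHA256 as a stand-in for the unspecified HMAC, and encodes PS as a decimal string; all of these, and the function names, are assumptions. It follows the X = HMAC(N, PS|area) form in the main text, not the two-HMAC variant suggested in the comment.

```python
import hashlib
import hmac
import ipaddress


def area_of(ip: str) -> str:
    """Map an IPv4 address to its /24 block, e.g. 192.0.2.77 -> 192.0.2.0/24."""
    return str(ipaddress.ip_network(ip + "/24", strict=False))


def ring_point(n_key: bytes, period_start: int, ip: str) -> bytes:
    """X = HMAC(N, PS|area) for the connecting client's area."""
    message = (str(period_start) + area_of(ip)).encode("ascii")
    return hmac.new(n_key, message, hashlib.sha256).digest()
```

Two clients in the same /24 land on the same point in the same period, so they see the same K bridges.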
Need to clarify that some HMACs are for rings, and some are for
partitions. How rings scale is clear. How do we grow the number of
partitions? Looking at successive bits from the HMAC output is one way.
3. Open issues
Denial of service attacks
A good view of network topology
at some point we should learn some reliability stats on our bridges. when
we say above 'give out k bridges', we might give out 2 reliable ones and
k-2 others. we count around the ring the same way we do now, to find them.

View File

@ -1,44 +0,0 @@
Author: Geoff Goodell
Title: Allow controller to manage circuit extensions
Date: 12 March 2006
History:
This was once bug 268. Moving it into the proposal system for posterity.
Test:
Tor controllers should have a means of learning more about circuits built
through Tor routers. Specifically, if a Tor controller is connected to a Tor
router, it should be able to subscribe to a new class of events, perhaps
"onion" or "router" events. A Tor router SHOULD then ensure that the
controller is informed:
(a) (NEW) when it receives a connection from some other location, in which
case it SHOULD indicate (1) a unique identifier for the circuit, and (2) a
ServerID in the event of an OR connection from another Tor router, and
Hostname otherwise.
(b) (REQUEST) when it receives a request to extend an existing circuit to a
successive Tor router, in which case it SHOULD provide (1) the unique
identifier for the circuit, (2) a Hostname (or, if possible, ServerID) of the
previous Tor router in the circuit, and (3) a ServerID for the requested
successive Tor router in the circuit;
(c) (EXTEND) Tor will attempt to extend the circuit to some other router, in
which case it SHOULD provide the same fields as provided for REQUEST.
(d) (SUCCEEDED) The circuit has been successfully extended to some other
router, in which case it SHOULD provide the same fields as provided for
REQUEST.
We also need a new configuration option analogous to _leavestreamsunattached,
specifying whether the controller is to manage circuit extensions or not.
Perhaps we can call it "_leavecircuitsunextended". When set to 0, Tor
manages everything as usual. When set to 1, a circuit received by the Tor
router cannot transition from "REQUEST" to "EXTEND" state without being
directed by a new controller command. The controller command probably does
not need any arguments, since circuits are extended per client source
routing, and all that the controller does is accept or reject the extension.
This feature can be used as a basis for enforcing routing policy.

View File

@ -1,44 +0,0 @@
1. Scanning process
A. Non-HTML/JS HTTP mime types compared via SHA1 hash
B. Dynamic HTTP content filtered at 4 levels:
1. IP change+Tor cookie utilization
- Tor cookies replayed with new IP in case of changes
2. HTML Tag+Attribute+JS comparison
- Comparisons made based only on "relevant" HTML tags
and attributes
3. HTML Tag+Attribute+JS diffing
- Tags, attributes and JS AST nodes that change during
Non-Tor fetches pruned from comparison
4. URLS with > N% of node failures removed
- results purged from filesystem at end of scan loop
C. SSL scanning handles some forms of dynamic certs
1. Catalogs certs for all IPs resolved locally
by getaddrinfo over the duration of the scan.
- Updated each test.
2. If the domain presents a new cert for each IP, this
is noted on the failure result for the node
3. If the same IP presents two different certs locally,
the cert list is first refreshed, and if it happens
again, discarded
4. A N% node failure filter also applies
D. Scanner can be restarted from any point in the event
of scanner or system crashes, or graceful shutdown.
- Results+scan state pickled to filesystem continuously
2. Cron job checks results periodically for reporting
A. Divide failures into three types of BadExit based on type
and frequency over time and incident rate
B. write reject lines to approved-routers for those three types:
1. ID Hex based (for misconfig/network problems easily fixed)
2. IP based (for content modification)
3. IP+mask based (for continuous/egregious content modification)
C. Emails results to tor-scanners@freehaven.net
3. Human Review and Appeal
A. ID Hex-based BadExit is meant to be easy to remove
without needing to beg us.
- Should this behavior be encouraged?
B. Optionally can reserve IP based badexits for human review
1. Results are encapsulated fully on the filesystem and can be
reviewed without network access
2. Soat has --rescan to rescan failed nodes from a data directory
- New set of URLs used

View File

@ -1,137 +0,0 @@
Abstract
This document explains how to estimate roughly how many Tor users
there are, and how many there are in each country. Statistics are
involved.
Motivation
There are a few reasons we need to keep track of which countries
Tor users (in aggregate) are coming from:
- Resource allocation. Knowing about underserved countries with
lots of users can let us know about where we need to direct
translation and outreach efforts.
- Anticensorship. Sudden drops in usage on a national basis can
indicate the arrival of a censorious firewall.
- Sponsor outreach and self-evaluation. Many people and
organizations who are interested in funding The Tor Project's
work want to know that we're successfully serving parts of the
world they're interested in, and that efforts to expand our
userbase are actually succeeding. So do we.
Goals
We want to know approximately how many Tor users there are, and which
countries they're in, even in the presence of a hypothetical
"directory guard" feature. Some uncertainty is okay, but we'd like
to be able to put a bound on the uncertainty.
We need to make sure this information isn't exposed in a way that
helps an adversary.
Methods for current clients:
Every client downloads network status documents. There are
currently three methods (one hypothetical) for clients to get them.
- 0.1.2.x clients (and earlier) fetch a v2 networkstatus
document about every NETWORKSTATUS_CLIENT_DL_INTERVAL [30
minutes].
- 0.2.0.x clients fetch a v3 networkstatus consensus document
at a random interval between when their current document is no
longer freshest, and when their current document is about to
expire.
[In both of the above cases, clients choose a running
directory cache at random with odds roughly proportional to
its bandwidth. If they're just starting, they know a XXXX FIXME -NM]
- In some future version, clients will choose directory caches
to serve as their "directory guards" to avoid profiling
attacks, similarly to how clients currently start all their
circuits at guard nodes.
We assume that a directory cache can tell which of these three
categories a client is in by the format of its status request.
A directory cache can be made to count distinct client IP
addresses that make a certain request of it in a given timeframe,
and total requests made to it over that timeframe. For the first
two cases, a cache can get a picture of the overall
number and countries of users in the network by dividing the IP
count by the probability with which they (as a cache) would be
chosen. Assuming that our listed bandwidth is such that we expect
to be chosen with probability P for any given request, and we've
been counting IPs for long enough that we expect the average
client to have made N requests, they will have visited us at least
once with probability P' = 1-(1-P)^N, and so we divide the IP
counts we've seen by P' for our estimate. To estimate total
number of clients of a given type, determine how many requests a
client of that type will make over that time, and assume we'll
have seen P of them.
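The IP-count estimate above works out as follows, as a small Python sketch. The function names are illustrative; P and N are as defined in the paragraph.

```python
def visit_probability(p: float, n: float) -> float:
    """P' = 1 - (1 - P)^N: chance a client hit this cache at least once."""
    return 1.0 - (1.0 - p) ** n


def estimate_clients(distinct_ips_seen: int, p: float, n: float) -> float:
    """Scale the distinct IPs we counted up to a network-wide estimate."""
    return distinct_ips_seen / visit_probability(p, n)
```

For instance, a cache chosen with probability P = 0.5 that has counted 750 distinct IPs over a window in which clients make N = 2 requests would estimate 750 / 0.75 = 1000 clients.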
Both of these numbers are useful: the IP counts will give the
total number of IPs connecting to the network, and the request
counts will give the total number of users on the network at any
given time.
Notes:
- [Over H hours, the N for V2 clients is 2*H, and the N for V3
clients is currently around H/2 or H/3.]
- (We should only count requests that we actually intend to answer;
503 requests shouldn't count.)
- These measurements should also be taken at a directory
authority if possible: their picture of the network is skewed
by clients that fetch from them directly. These clients,
however, are all the clients that are just bootstrapping
(assuming that the fallback-consensus feature isn't yet used
much).
- These measurements also overestimate the V2 download rate if
some downloads fail and clients retry them later after backing
off.
Methods for directory guards:
If directory guards are in use, directory guards get a picture of
all those users who chose them as a guard when they were listed
as a good choice for a guard, and who are also on the network
now. The cleanest data here will come from nodes that were listed
as good new-guards choices for a while, and have not been so for a
while longer (to study decay rates); nodes that have been listed
as good new-guard choices consistently for a long time (to get a
sample of the network); and nodes that have been listed as good
new-guard choices only recently (to get a sample of new users and
users whose guards have died out.)
Since directory guards are currently unspecified, we'll need to
make some guesses about how they'll turn out to work. Here are
a couple of approaches that could work.
- We could have clients pick completely new directory guards on
a rolling basis every two months or so. This would ensure
that staying as a guard for a while would be sufficient to
see a sample of users. This is potentially advantageous for
load-balancing the network as well, though it might lose some
of the benefits of directory guard. We need to quantify the
impact of this; it might not actually make stuff worse in
practice, if most guards don't stay good guards for a month
or two.
- We could try to collect statistics at several directory
guards and combine their statistics, but we would need to make
sure that for all time, at least one of the directory guards
had been recommended as a good choice for new guards. By
looking at new-IP rates for guards, we could get an idea of
user uptake; by looking at old-IP decay rates, we could get
an idea of turnover. This approach would entail significant
complexity, and we'd probably need to record more information
than we'd really like to.

View File

@ -1,97 +0,0 @@
Right now as I understand it, there are n big scaling problems heading
our way:
1) Clients need to learn all the relay descriptors they could use. That's
a lot of bytes through a potentially small pipe.
2) Relays need to hold open TCP connections to most other relays.
3) Clients need to learn the whole networkstatus. Even using v3, as
the network grows that will become unwieldy.
4) Dir mirrors need to mirror all the relay descriptors; eventually this
will get big too.
Here's my plan.
--------------------------------------------------------------------
Piece one: download O(1) descriptors rather than O(n) descriptors.
We need to change our circuit extend protocol so it fetches a relay
descriptor at every 'extend' operation:
- Client fetches networkstatus, picks guards, connects to one.
- Client picks middle hop out of networkstatus, asks guard for
its descriptor, then extends to it.
- Client picks exit hop out of networkstatus, asks middle hop
for its descriptor, then extends to it. Done.
The client needs to ask for the descriptor even if it already has a
copy, because otherwise we leak too much. Also, the descriptor needs to
be padded to some large (but not too large) size to prevent the middle
hops from guessing about it.
The first step towards this is to instrument the current code to see
how much of a win this would actually be -- I am guessing it is already
a win even with the current number of descriptors.
We also would need to assign the 'Exit' flag more usefully, and make
clients pay attention to it when picking their last hop, since they
don't actually know the exit policies of the relays they're choosing from.
We also need to think harder about other implications -- for example,
a relay with a tiny exit policy won't get the Exit flag, and thus won't
ever get picked as an exit relay. Plus, our "enclave exit" model is out
the window unless we figure out a cool trick.
More generally, we'll probably want to compress the descriptors that we
send back; maybe 8k is a good upper bound? I wonder if we could ask for
several descriptors, and bundle back all of the ones that fit in the 8k?
We'd also want to put the load balancing weights into the networkstatus,
so clients can choose fast nodes more often without needing to see the
descriptors. This is a good opportunity for the authorities to be able
to put "more accurate" weights in if they learn to detect attacks. It
also means we should consider running automated audits to make sure the
authorities aren't trying to snooker everybody.
I'm aiming to get Peter Palfrader to tackle this problem in mid 2008,
but I bet he could use some help.
--------------------------------------------------------------------
Piece two: inter-relay communication uses UDP
If relays send packets to/from other relays via UDP, they don't need a
new descriptor for each such link. Thus we'll still need to keep state
for each link, but we won't max out on sockets.
Clearly a lot more work needs to be done here. Ian Goldberg has a student
who has been working on it, and if all goes well we'll be chipping in
some funding to continue that. Also, Camilo Viecco has been doing his
PhD thesis on it.
--------------------------------------------------------------------
Piece three: networkstatus documents get partitioned
While the authorities should be expected to be able to handle learning
about all the relays, there's no reason the clients or the mirrors need
to. Authorities should put a cap on the number of relays listed in a
single networkstatus, and split them when they get too big.
We'd need a good way to have each authority come to the same conclusion
about which partition a given relay goes into.
Directory mirrors would then mirror all the relay descriptors in their
partition. This is compatible with 'piece one' above, since clients in
a given partition will only ask about descriptors in that partition.
More complex versions of this design would involve overlapping partitions,
but that would seem to start contradicting other parts of this proposal
right quick.
Nobody is working on this piece yet. It's hard to say when we'll need
it, but it would be nice to have some more thought on it before the week
that we need it.
--------------------------------------------------------------------

View File

@ -1,39 +0,0 @@
Filename: xxx-hide-platform.txt
Title: Hide Tor Platform Information
Version: $Revision$
Last-Modified: $Date$
Author: Jacob Appelbaum
Created: 24-July-2008
Status: Draft
Hiding Tor Platform Information
0.0 Introduction
The current Tor program publishes its specific Tor version and related OS
platform information. This information could be misused by an attacker.
0.1 Current Implementation
Currently, the Tor binary sends data that looks like the following:
Tor 0.2.0.26-rc (r14597) on Darwin Power Macintosh
Tor 0.1.2.19 on Windows XP Service Pack 3 [workstation] {terminal services,
single user}
1.0 Suggested changes
It would be useful to allow a user to configure the disclosure of such
information. Such a change would be an option in the torrc file like so:
HidePlatform Yes
1.1 Suggested default behavior in the future
If a user would like to disclose this information, they could configure their
Tor to do so.
HidePlatform No

View File

@ -1,93 +0,0 @@
Filename: xxx-port-knocking.txt
Title: Port knocking for bridge scanning resistance
Version: $Revision$
Last-Modified: $Date$
Author: Jacob Appelbaum
Created: 19-April-2009
Status: Draft
Port knocking for bridge scanning resistance
0.0 Introduction
This document is a collection of ideas relating to improving scanning
resistance for private bridge relays. This is intended to stop opportunistic
network scanning and subsequent discovery of private bridge relays.
0.1 Current Implementation
Currently private bridges are only hidden by their obscurity. If you know
a bridge's IP address, the bridge can be detected trivially and added to a block
list.
0.2 Configuring an external port knocking program to control the firewall
It is currently possible for bridge operators to configure a port knocking
daemon that controls access to the incoming OR port. This is currently out of
scope for Tor and Tor configuration. This process requires the firewall to know
the current nodes in the Tor network.
1.0 Suggested changes
Private bridge operators should be able to configure a method of hiding their
relay. Only authorized users should be able to communicate with the private
bridge. This should be done with Tor and if possible without the help of the
firewall. It should be possible for a Tor user to enter a secret key into
Tor or optionally Vidalia on a per bridge basis. This secret key should be
used to authenticate the bridge user to the private bridge.
1.x Issues with low ports and bind() for ORPort
Tor opens low numbered ports during startup and then drops privileges. It is
no longer possible to rebind to those lower ports after they are closed.
1.x Issues with OS level packet filtering
Tor does not know about any OS level packet filtering. Currently there are no
packet filters that understand the Tor network in real time.
1.x Possible partitioning of users by bridge operator
Depending on implementation, it may be possible for bridge operators to
uniquely identify users. This appears to be a general bridge issue when a
bridge operator uniquely deploys bridges per user.
2.0 Implementation ideas
This is a suggested set of methods for port knocking.
2.x Using SPA port knocking
Single Packet Authentication port knocking encodes all required data into a
single UDP packet. Improperly formatted packets may be simply discarded.
Properly formatted packets should be processed and appropriate actions taken.
2.x Using DNS as a transport for SPA
It should be possible for Tor to bind to port 53 at startup and merely drop all
packets that are not valid. UDP does not require a response and invalid packets
will not trigger a response from Tor. With base32 encoding it should be
possible to encode SPA as valid DNS requests. This should allow use of the
public DNS infrastructure for authorization requests if desired.
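
As a rough illustration of the base32 idea above, the sketch below packs an
opaque SPA payload into DNS labels and recovers it again. The function names,
the "zone" suffix, and the choice to strip base32 padding are assumptions made
for this example only; the proposal does not define a packet format.

```python
import base64

MAX_LABEL = 63   # DNS limit on the length of one label
MAX_NAME = 253   # DNS limit on the length of a full name


def spa_to_dns_name(payload, zone):
    """Encode an opaque SPA payload as base32 labels under a (hypothetical) zone."""
    encoded = base64.b32encode(payload).decode("ascii").rstrip("=").lower()
    labels = [encoded[i:i + MAX_LABEL] for i in range(0, len(encoded), MAX_LABEL)]
    name = ".".join(labels + [zone])
    if len(name) > MAX_NAME:
        raise ValueError("payload too large for a single DNS name")
    return name


def dns_name_to_spa(name, zone):
    """Recover the payload; raise ValueError if the name is not under the zone."""
    suffix = "." + zone
    if not name.endswith(suffix):
        raise ValueError("name is not under the expected zone")
    encoded = name[:-len(suffix)].replace(".", "").upper()
    encoded += "=" * (-len(encoded) % 8)  # restore the stripped padding
    return base64.b32decode(encoded)
```

A listener following the proposal would silently drop any query for which
decoding or later authentication fails, rather than answering it.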
2.x Ghetto firewalling with opportunistic connection closing
Until a user has authenticated with Tor, Tor only has a UDP listener. This
listener should never send data in response; it should only open an ORPort
when a user has successfully authenticated. After a user has authenticated
with Tor to open an ORPort, only users who have authenticated will be able
to use it. All other users, as identified by their IP address, will have their
connection closed before any data is sent or received. This should be
accomplished with an access policy. By default, the access policy should block
all access to the ORPort.
2.x Timing and reset of access policies
Access to the ORPort is sensitive. The bridge should remove any exceptions
to its access policy regularly when the ORPort is unused. Valid users should
reauthenticate if they do not use the ORPort within a given time frame.
2.x Additional considerations
There are many. Defining the packet format and the crypto involved would be a
good start.

@ -1,63 +0,0 @@
1. Overview
We should rate limit the volume of stream creations at exits:
2.1. Per-circuit limits
If a given circuit opens more than N streams in X seconds, further
stream requests over the next Y seconds should fail with the reason
'resourcelimit'. Clients will automatically notice this and switch to
a new circuit.
The goal is to limit the effects of port scans on a given exit relay,
so the relay's ISP won't get hassled as much.
First thoughts for parameters would be N=100 streams in X=5 seconds
causes 30 seconds of fails; and N=300 streams in X=30 seconds causes
30 seconds of fails.
We could simplify by, instead of having a "for 30 seconds" parameter,
just marking the circuit as forever failing new requests. (We don't want
to just close the circuit because it may still have open streams on it.)
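
The per-circuit rule above can be sketched as a sliding-window counter. This
is only an illustration: the class name and method are hypothetical, and the
defaults are the "first thoughts" values from the text (N=100, X=5, Y=30), not
settled parameters.

```python
import collections


class CircuitStreamLimiter(object):
    """Hypothetical per-circuit limiter: more than N streams in X seconds
    causes Y seconds of refused stream requests."""

    def __init__(self, max_streams=100, window=5.0, penalty=30.0):
        self.max_streams = max_streams   # N
        self.window = window             # X, in seconds
        self.penalty = penalty           # Y, in seconds
        self._opened = collections.deque()
        self._blocked_until = 0.0

    def allow(self, now):
        """Return True if a stream requested at time `now` may be opened."""
        if now < self._blocked_until:
            return False
        # Forget streams that have fallen out of the X-second window.
        while self._opened and self._opened[0] <= now - self.window:
            self._opened.popleft()
        if len(self._opened) >= self.max_streams:
            # This request would be stream N+1 within the window.
            self._blocked_until = now + self.penalty
            return False
        self._opened.append(now)
        return True
```

An exit would answer a refused request with the 'resourcelimit' reason. The
"forever failing" simplification corresponds to an infinite penalty.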
2.2. Per-destination limits
If a given circuit opens more than N1 streams in X seconds to a single
IP address, or all the circuits combined open more than N2 streams,
then we should fail further attempts to reach that address for a while.
The goal is to limit the abuse that Tor exit relays can dish out
to a single target either for socket DoS or for web crawling, in
the hopes of a) not triggering their automated defenses, and b) not
making them upset at Tor. Hopefully these self-imposed bans will be
much shorter-lived than bans or barriers put up by the websites.
3. Issues
3.1. Circuit-creation overload
Making clients move to new circuits more often will cause more circuit
creation requests.
3.2. How to pick the parameters?
If we pick the numbers too low, then popular sites are effectively
cut out of Tor. If we pick them too high, we don't do much good.
Worse, picking them wrong isn't easy to fix, since the deployed Tor
servers will ship with a certain set of numbers.
We could put numbers (or "general settings") in the networkstatus
consensus, and Tor exits would adapt more dynamically.
We could also have a local config option about how aggressive this
server should be with its parameters.
4. Client-side limitations
Perhaps the clients should have built-in rate limits too, so they avoid
harassing the servers by default?
Tricky if we want to get Tor clients in use at large enclaves.

@ -1,61 +0,0 @@
Filename: xxx-separate-streams-by-port.txt
Title: Separate streams across circuits by destination port
Version: $Revision$
Last-Modified: $Date$
Author: Robert Hogan
Created: 21-Oct-2008
Status: Draft
Here's a patch Robert Hogan wrote to use only one destination port per
circuit. It's based on a wishlist item Roger wrote, to never send AIM
usernames over the same circuit that we're hoping to browse anonymously
through. The remaining open question is: how many extra circuits does this
cause an ordinary user to create? My guess is not very many, but I'm wary
of putting this in until we have some better estimate. On the other hand,
not putting it in means that we have a known security flaw. Hm.
Index: src/or/or.h
===================================================================
--- src/or/or.h (revision 17143)
+++ src/or/or.h (working copy)
@@ -1874,6 +1874,7 @@
uint8_t state; /**< Current status of this circuit. */
uint8_t purpose; /**< Why are we creating this circuit? */
+ uint16_t service; /**< Port conn must have to use this circuit. */
/** How many relay data cells can we package (read from edge streams)
* on this circuit before we receive a circuit-level sendme cell asking
Index: src/or/circuituse.c
===================================================================
--- src/or/circuituse.c (revision 17143)
+++ src/or/circuituse.c (working copy)
@@ -62,10 +62,16 @@
return 0;
}
- if (purpose == CIRCUIT_PURPOSE_C_GENERAL)
+ if (purpose == CIRCUIT_PURPOSE_C_GENERAL) {
if (circ->timestamp_dirty &&
circ->timestamp_dirty+get_options()->MaxCircuitDirtiness <= now)
return 0;
+ /* If the circuit is dirty and used for services on another port,
+ then it is not suitable. */
+ if (circ->service && conn->socks_request->port &&
+ (circ->service != conn->socks_request->port))
+ return 0;
+ }
/* decide if this circ is suitable for this conn */
@@ -1351,7 +1357,9 @@
if (connection_ap_handshake_send_resolve(conn) < 0)
return -1;
}
-
+ if (conn->socks_request->port
+ && (TO_CIRCUIT(circ)->purpose == CIRCUIT_PURPOSE_C_GENERAL))
+ TO_CIRCUIT(circ)->service = conn->socks_request->port;
return 1;
}

@ -1,140 +0,0 @@
Filename: xxx-what-uses-sha1.txt
Title: Where does Tor use SHA-1 today?
Version: $Revision$
Last-Modified: $Date$
Author: Nick Mathewson
Created: 30-Dec-2008
Status: Meta
Introduction:
Tor uses SHA-1 as a message digest. SHA-1 is showing its age:
theoretical attacks for finding collisions against it get better
every year or two, and it will likely be broken in practice before
too long.
According to smart crypto people, the SHA-2 functions (SHA-256, etc)
share too much of SHA-1's structure to be very good. Some people
like other hash functions; most of these have not seen enough
analysis to be widely regarded as an extra-good idea.
By 2012, the NIST SHA-3 competition will be done, and with luck we'll
have something good to switch to. But it's probably a bad idea to
wait until 2012 to figure out _how_ to migrate to a new hash
function, for two reasons:
1) It's not inconceivable we'll want to migrate in a hurry
some time before then.
2) It's likely that migrating to a new hash function will
require protocol changes, and it's easiest to make protocol
changes backward compatible if we lay the groundwork in
advance. It would suck to have to break compatibility with
a big hard-to-test "flag day" protocol change.
This document attempts to list everything Tor uses SHA-1 for today.
This is the first step in getting all the design work done to switch
to something else.
This document SHOULD NOT be a clearinghouse of what to do about our
use of SHA-1. That's better left for other individual proposals.
Why now?
The recent publication of "MD5 considered harmful today: Creating a
rogue CA certificate" by Alexander Sotirov, Marc Stevens, Jacob
Appelbaum, Arjen Lenstra, David Molnar, Dag Arne Osvik, and Benne de
Weger has reminded me that:
* You can't rely on theoretical attacks to stay theoretical.
* It's quite unpleasant when theoretical attacks become practical
and public on days you were planning to leave for vacation.
* Broken hash functions (which SHA-1 is not quite yet AFAIU)
should be dropped like hot potatoes. Failure to do so can make
one look silly.
What Tor uses hashes for today:
1. Infrastructure.
A. Our X.509 certificates are signed with SHA-1.
B. TLS uses SHA-1 (and MD5) internally to generate keys.
C. Some of the TLS ciphersuites we allow use SHA-1.
D. When we sign our code with GPG, it might be using SHA-1.
E. Our GPG keys might be authenticated with SHA-1.
F. OpenSSL's random number generator uses SHA-1, I believe.
2. The Tor protocol
A. Everything we sign, we sign using SHA-1-based OAEP-MGF1.
B. Our CREATE cell format uses SHA-1 for: OAEP padding.
C. Our EXTEND cells use SHA-1 to hash the identity key of the
target server.
D. Our CREATED cells use SHA-1 to hash the derived key data.
E. The data we use in CREATE_FAST cells to generate a key is the
length of a SHA-1.
F. The data we send back in a CREATED/CREATED_FAST cell is the length
of a SHA-1.
G. We use SHA-1 to derive our circuit keys from the negotiated g^xy value.
H. We use SHA-1 to derive the digest field of each RELAY cell, but that's
used more as a checksum than as a strong digest.
3. Directory services
A. All signatures are generated on the SHA-1 of their corresponding
documents, using PKCS1 padding.
B. Router descriptors identify their corresponding extra-info documents
by their SHA-1 digest.
C. Fingerprints in router descriptors are taken using SHA-1.
D. Fingerprints in authority certs are taken using SHA-1.
E. Fingerprints in dir-source lines of votes and consensuses are taken
using SHA-1.
F. Networkstatuses refer to routers' identity keys and descriptors by their
SHA-1 digests.
G. Directory-signature lines identify which key is doing the signing by
the SHA-1 digests of the authority's signing key and its identity key.
H. The following items are downloaded by the SHA-1 of their contents:
XXXX list them
I. The following items are downloaded by the SHA-1 of an identity key:
XXXX list them too.
4. The rendezvous protocol
A. Hidden servers use SHA-1 to establish introduction points on relays,
and relays use SHA-1 to check incoming introduction point
establishment requests.
B. Hidden servers use SHA-1 in multiple places when generating hidden
service descriptors.
C. Hidden servers performing basic-type client authorization for their
services use SHA-1 when encrypting introduction points contained in
hidden service descriptors.
D. Hidden service directories use SHA-1 to check whether a given hidden
service descriptor may be published under a given descriptor
identifier or not.
E. Hidden servers use SHA-1 to derive .onion addresses of their
services.
F. Clients use SHA-1 to generate the current hidden service descriptor
identifiers for a given .onion address.
G. Hidden servers use SHA-1 to remember digests of the first parts of
Diffie-Hellman handshakes contained in introduction requests in order
to detect replays.
H. Hidden servers use SHA-1 during the Diffie-Hellman key exchange with
a connecting client.
5. The bridge protocol
XXXX write me
6. The Tor user interface
A. We log information about servers based on SHA-1 hashes of their
identity keys.
B. The controller identifies servers based on SHA-1 hashes of their
identity keys.
C. Nearly all of our configuration options that list servers allow SHA-1
hashes of their identity keys.
D. The deprecated .exit notation uses SHA-1 hashes of identity keys.

@ -1,117 +0,0 @@
#!/usr/bin/python

import re, os

class Error(Exception): pass

STATUSES = """DRAFT NEEDS-REVISION NEEDS-RESEARCH OPEN ACCEPTED META FINISHED
   CLOSED SUPERSEDED DEAD""".split()
REQUIRED_FIELDS = [ "Filename", "Status", "Title" ]
CONDITIONAL_FIELDS = { "OPEN" : [ "Target" ],
                       "ACCEPTED" : [ "Target" ],
                       "CLOSED" : [ "Implemented-In" ],
                       "FINISHED" : [ "Implemented-In" ] }
FNAME_RE = re.compile(r'^(\d\d\d)-.*[^\~]$')
DIR = "."
OUTFILE = "000-index.txt"
TMPFILE = OUTFILE+".tmp"

def indexed(seq):
    # Yield (index, item) pairs, like enumerate().
    n = 0
    for i in seq:
        yield n, i
        n += 1

def readProposal(fn):
    # Parse the "Field: value" header block at the top of a proposal.
    # Parsing stops at the first blank line; indented lines continue the
    # previous field.
    fields = { }
    f = open(fn, 'r')
    lastField = None
    try:
        for lineno, line in indexed(f):
            line = line.rstrip()
            if not line:
                return fields
            if line[0].isspace():
                fields[lastField] += " %s"%(line.strip())
            else:
                parts = line.split(":", 1)
                if len(parts) != 2:
                    raise Error("%s:%s: Neither field nor continuation"%
                                (fn,lineno))
                else:
                    fields[parts[0]] = parts[1].strip()
                    lastField = parts[0]

        return fields
    finally:
        f.close()

def checkProposal(fn, fields):
    # Check that required fields are present and normalize Title and Status.
    status = fields.get("Status")
    need_fields = REQUIRED_FIELDS + CONDITIONAL_FIELDS.get(status, [])
    for f in need_fields:
        if f not in fields:
            raise Error("%s has no %s field"%(fn, f))
    if fn != fields['Filename']:
        print("%r %r"%(fn, fields['Filename']))
        raise Error("Mismatched Filename field in %s"%fn)

    if fields['Title'][-1] == '.':
        fields['Title'] = fields['Title'][:-1]

    status = fields['Status'] = status.upper()
    if status not in STATUSES:
        raise Error("I've never heard of status %s in %s"%(status,fn))
    if status in [ "SUPERSEDED", "DEAD" ]:
        for f in [ 'Implemented-In', 'Target' ]:
            if f in fields: del fields[f]

def readProposals():
    # Read every numbered proposal (NNN-*.txt) in DIR.
    res = []
    for fn in os.listdir(DIR):
        m = FNAME_RE.match(fn)
        if not m: continue
        if not fn.endswith(".txt"):
            raise Error("%s doesn't end with .txt"%fn)
        num = m.group(1)
        fields = readProposal(fn)
        checkProposal(fn, fields)
        fields['num'] = num
        res.append(fields)
    return res

def writeIndexFile(proposals):
    # Rewrite OUTFILE: keep everything up to the "=====" line, then
    # regenerate the by-number and by-status listings.
    proposals.sort(key=lambda f:f['num'])
    seenStatuses = set()
    for p in proposals:
        seenStatuses.add(p['Status'])

    out = open(TMPFILE, 'w')
    inf = open(OUTFILE, 'r')
    for line in inf:
        out.write(line)
        if line.startswith("====="): break
    inf.close()

    out.write("Proposals by number:\n\n")
    for prop in proposals:
        out.write("%(num)s %(Title)s [%(Status)s]\n"%prop)
    out.write("\n\nProposals by status:\n\n")
    for s in STATUSES:
        if s not in seenStatuses: continue
        out.write(" %s:\n"%s)
        for prop in proposals:
            if s == prop['Status']:
                out.write(" %(num)s %(Title)s"%prop)
                if 'Target' in prop:
                    out.write(" [for %(Target)s]"%prop)
                if 'Implemented-In' in prop:
                    out.write(" [in %(Implemented-In)s]"%prop)
                out.write("\n")
    out.close()
    os.rename(TMPFILE, OUTFILE)

# Remove any stale temporary file left over from an earlier run.
try:
    os.unlink(TMPFILE)
except OSError:
    pass

writeIndexFile(readProposals())

@ -1,768 +0,0 @@
$Id$
Tor Rendezvous Specification
0. Overview and preliminaries
Read
https://www.torproject.org/doc/design-paper/tor-design.html#sec:rendezvous
before you read this specification. It will make more sense.
Rendezvous points provide location-hidden services (server
anonymity) for the onion routing network. With rendezvous points,
Bob can offer a TCP service (say, a webserver) via the onion
routing network, without revealing the IP of that service.
Bob does this by anonymously advertising a public key for his
service, along with a list of onion routers to act as "Introduction
Points" for his service. He creates forward circuits to those
introduction points, and tells them about his public key. To
connect to Bob, Alice first builds a circuit to an OR to act as
her "Rendezvous Point." She then connects to one of Bob's chosen
introduction points, optionally provides authentication or
authorization information, and asks it to tell him about her Rendezvous
Point (RP). If Bob chooses to answer, he builds a circuit to her
RP, and tells it to connect him to Alice. The RP joins their
circuits together, and begins relaying cells. Alice's 'BEGIN'
cells are received directly by Bob's OP, which passes data to
and from the local server implementing Bob's service.
Below we describe a network-level specification of this service,
along with interfaces to make this process transparent to Alice
(so long as she is using an OP).
0.1. Notation, conventions and prerequisites
In the specifications below, we use the same notation and terminology
as in "tor-spec.txt". The service specified here also requires the
existence of an onion routing network as specified in that file.
H(x) is a SHA1 digest of x.
PKSign(SK,x) is a PKCS.1-padded RSA signature of x with SK.
PKEncrypt(SK,x) is a PKCS.1-padded RSA encryption of x with SK.
Public keys are all RSA, and encoded in ASN.1.
All integers are stored in network (big-endian) order.
All symmetric encryption uses AES in counter mode, except where
otherwise noted.
In all discussions, "Alice" will refer to a user connecting to a
location-hidden service, and "Bob" will refer to a user running a
location-hidden service.
An OP is (as defined elsewhere) an "Onion Proxy" or Tor client.
An OR is (as defined elsewhere) an "Onion Router" or Tor server.
An "Introduction point" is a Tor server chosen to be Bob's medium-term
'meeting place'. A "Rendezvous point" is a Tor server chosen by Alice to
be a short-term communication relay between her and Bob. All Tor servers
potentially act as introduction and rendezvous points.
0.2. Protocol outline
1. Bob->Bob's OP: "Offer IP:Port as
public-key-name:Port". [configuration]
(We do not specify this step; it is left to the implementor of
Bob's OP.)
2. Bob's OP generates keypair and rendezvous service descriptor:
"Meet public-key X at introduction point A, B, or C." (signed)
3. Bob's OP->Introduction point via Tor: [introduction setup]
"This pk is me."
4. Bob's OP->directory service via Tor: publishes Bob's service
descriptor [advertisement]
5. Out of band, Alice receives a [x.y.]z.onion:port address.
She opens a SOCKS connection to her OP, and requests
x.y.z.onion:port.
6. Alice's OP retrieves Bob's descriptor via Tor. [descriptor lookup.]
7. Alice's OP chooses a rendezvous point, opens a circuit to that
rendezvous point, and establishes a rendezvous circuit. [rendezvous
setup.]
8. Alice connects to the Introduction point via Tor, and tells it about
her rendezvous point and optional authentication/authorization
information. (Encrypted to Bob.) [Introduction 1]
9. The Introduction point passes this on to Bob's OP via Tor, along the
introduction circuit. [Introduction 2]
10. Bob's OP decides whether to connect to Alice, and if so, creates a
circuit to Alice's RP via Tor. Establishes a shared circuit.
[Rendezvous.]
11. Alice's OP sends begin cells to Bob's OP. [Connection]
0.3. Constants and new cell types
Relay cell types
32 -- RELAY_ESTABLISH_INTRO
33 -- RELAY_ESTABLISH_RENDEZVOUS
34 -- RELAY_INTRODUCE1
35 -- RELAY_INTRODUCE2
36 -- RELAY_RENDEZVOUS1
37 -- RELAY_RENDEZVOUS2
38 -- RELAY_INTRO_ESTABLISHED
39 -- RELAY_RENDEZVOUS_ESTABLISHED
40 -- RELAY_COMMAND_INTRODUCE_ACK
0.4. Version overview
There are several parts in the hidden service protocol that have
changed over time, each of them having its own version number, whereas
other parts remained the same. The following list of potentially
versioned protocol parts should help reduce some confusion:
- Hidden service descriptor: the binary-based v0 was the default for
a long time, and an ascii-based v2 has been added by proposal
114. See 1.2.
- Hidden service descriptor propagation mechanism: currently related to
the hidden service descriptor version -- v0 publishes to the original
hs directory authorities, whereas v2 publishes to a rotating subset
of relays with the "hsdir" flag; see 1.4 and 1.6.
- Introduction protocol for how to generate an introduction cell:
v0 specified a nickname for the rendezvous point and assumed the
relay would know about it, whereas v2 now specifies IP address,
port, and onion key so the relay doesn't need to already recognize
it. See 1.8.
1. The Protocol
1.1. Bob configures his local OP.
We do not specify a format for the OP configuration file. However,
OPs SHOULD allow Bob to provide more than one advertised service
per OP, and MUST allow Bob to specify one or more virtual ports per
service. Bob provides a mapping from each of these virtual ports
to a local IP:Port pair.
1.2. Bob's OP generates service descriptors.
The first time the OP provides an advertised service, it generates
a public/private keypair (stored locally). Periodically, the OP
generates and publishes a descriptor of type "V0".
The "V0" descriptor contains:
  KL   Key length                         [2 octets]
  PK   Bob's public key                   [KL octets]
  TS   A timestamp                        [4 octets]
  NI   Number of introduction points      [2 octets]
  Ipt  A list of NUL-terminated ORs       [variable]
  SIG  Signature of above fields          [variable]
KL is the length of PK, in octets.
TS is the number of seconds elapsed since Jan 1, 1970.
The members of Ipt may be either (a) nicknames, or (b) identity key
digests, encoded in hex, and prefixed with a '$'. Clients must
accept both forms. Services must only generate the second form.
Once 0.0.9.x is obsoleted, we can drop the first form.
[It's ok for Bob to advertise 0 introduction points. He might want
to do that if he previously advertised some introduction points,
and now he doesn't have any. -RD]
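
The "V0" layout above can be read with a few fixed-width unpack steps. The
following is a minimal sketch, not Tor's parser: the function name and the
returned dictionary keys are made up for illustration, and it does not verify
SIG or decode PK.

```python
import struct


def parse_v0_descriptor(data):
    """Split a "V0" descriptor into its fields (no signature check)."""
    (kl,) = struct.unpack_from("!H", data, 0)        # KL, big-endian
    pos = 2
    pk = data[pos:pos + kl]                          # PK, KL octets
    if len(pk) != kl:
        raise ValueError("truncated public key")
    pos += kl
    ts, ni = struct.unpack_from("!IH", data, pos)    # TS and NI
    pos += 6
    ipts = []
    for _ in range(ni):
        end = data.index(b"\x00", pos)               # each entry is NUL-terminated
        ipts.append(data[pos:end].decode("ascii"))
        pos = end + 1
    return {"public_key": pk, "timestamp": ts,
            "intro_points": ipts, "signature": data[pos:]}
```

A real client would additionally check SIG against PK over everything that
precedes it, and would accept both nickname and '$'-prefixed hex entries.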
Beginning with 0.2.0.10-alpha, Bob's OP encodes "V2" descriptors in
addition to "V0" descriptors. The format of a "V2" descriptor is as
follows:
"rendezvous-service-descriptor" descriptor-id NL
[At start, exactly once]
Indicates the beginning of the descriptor. "descriptor-id" is a
periodically changing identifier of 160 bits formatted as 32 base32
chars that is calculated by the hidden service and its clients. If
the optional "descriptor-cookie" is used, this "descriptor-id"
cannot be computed by anyone else. (Everyone can verify that this
"descriptor-id" belongs to the rest of the descriptor, even without
knowing the optional "descriptor-cookie", as described below.) The
"descriptor-id" is calculated by performing the following operation:
descriptor-id =
H(permanent-id | H(time-period | descriptor-cookie | replica))
"permanent-id" is the permanent identifier of the hidden service,
consisting of 80 bits. It can be calculated by computing the hash value
of the public hidden service key and truncating after the first 80 bits:
permanent-id = H(public-key)[:10]
"H(time-period | descriptor-cookie | replica)" is the (possibly
secret) id part that is
necessary to verify that the hidden service is the true originator
of this descriptor. It can only be created by the hidden service
and its clients, but the "signature" below can only be created by
the service.
"descriptor-cookie" is an optional secret password of 128 bits that
is shared between the hidden service provider and its clients.
"replica" denotes the number of the non-consecutive replica.
(Each descriptor is replicated on a number of _consecutive_ nodes
in the identifier ring by making every storing node responsible
for the identifier intervals starting from its 3rd predecessor's
ID to its own ID. In addition to that, every service publishes
multiple descriptors with different descriptor IDs in order to
distribute them to different places on the ring. Therefore,
"replica" chooses one of the _non-consecutive_ replicas. -KL)
The "time-period" changes periodically depending on the global time and
as a function of "permanent-id". The current value for "time-period" can
be calculated using the following formula:
time-period = (current-time + permanent-id-byte * 86400 / 256)
/ 86400
"current-time" contains the current system time in seconds since
1970-01-01 00:00, e.g. 1188241957. "permanent-id-byte" is the first
(unsigned) byte of the permanent identifier (which is in network
order), e.g. 143. Adding the product of "permanent-id-byte" and
86400 (seconds per day), divided by 256, prevents "time-period" from
changing for all descriptors at the same time of the day. The result
of the overall operation is a (network-ordered) 32-bit integer, e.g.
13753 or 0x000035B9 with the example values given above.
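
The formulas above can be checked with a short sketch. The byte encodings
used here are assumptions for illustration: "time-period" is packed as a
4-octet big-endian integer (as the text describes), "replica" is assumed to be
a single octet, and "public_key" stands for the ASN.1-encoded key bytes. The
function names are hypothetical.

```python
import base64
import hashlib
import struct


def time_period(current_time, permanent_id_byte):
    """time-period = (current-time + permanent-id-byte * 86400 / 256) / 86400"""
    return (current_time + permanent_id_byte * 86400 // 256) // 86400


def permanent_id(public_key):
    """First 80 bits of H(public-key)."""
    return hashlib.sha1(public_key).digest()[:10]


def secret_id_part(period, replica, descriptor_cookie=b""):
    """H(time-period | descriptor-cookie | replica); replica as one octet (assumed)."""
    data = struct.pack("!I", period) + descriptor_cookie + bytes(bytearray([replica]))
    return hashlib.sha1(data).digest()


def descriptor_id(perm_id, secret_part):
    """H(permanent-id | secret-id-part), as 32 base32 characters."""
    digest = hashlib.sha1(perm_id + secret_part).digest()
    return base64.b32encode(digest).decode("ascii").lower()
```

With the example values from the text, time_period(1188241957, 143) evaluates
to 13753, matching the stated result.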
"version" version-number NL
[Exactly once]
The version number of this descriptor's format. In this case: 2.
"permanent-key" NL a public key in PEM format
[Exactly once]
The public key of the hidden service which is required to verify the
"descriptor-id" and the "signature".
"secret-id-part" secret-id-part NL
[Exactly once]
The result of the following operation as explained above, formatted as
32 base32 chars. Using this secret id part, everyone can verify that
the signed descriptor belongs to "descriptor-id".
secret-id-part = H(time-period | descriptor-cookie | replica)
"publication-time" YYYY-MM-DD HH:MM:SS NL
[Exactly once]
A timestamp when this descriptor has been created.
"protocol-versions" version-string NL
[Exactly once]
A comma-separated list of recognized and permitted version numbers
for use in INTRODUCE cells; these versions are described in section
1.8 below.
"introduction-points" NL encrypted-string
[At most once]
A list of introduction points. If the optional "descriptor-cookie" is
used, this list is encrypted with AES in CTR mode with a random
initialization vector of 128 bits that is written to
the beginning of the encrypted string, and the "descriptor-cookie" as
secret key of 128 bits length.
The string containing the introduction point data (either encrypted
or not) is encoded in base64, and surrounded with
"-----BEGIN MESSAGE-----" and "-----END MESSAGE-----".
The unencrypted string may begin with:
["service-authentication" auth-type NL auth-data ... reserved]
[At start, any number]
The service-specific authentication data can be used to perform
client authentication. This data is independent of the selected
introduction point as opposed to "intro-authentication" below.
Subsequently, an arbitrary number of introduction point entries may
follow, each containing the following data:
"introduction-point" identifier NL
[At start, exactly once]
The identifier of this introduction point: the base-32 encoded
hash of this introduction point's identity key.
"ip-address" ip-address NL
[Exactly once]
The IP address of this introduction point.
"onion-port" port NL
[Exactly once]
The TCP port on which the introduction point is listening for
incoming onion requests.
"onion-key" NL a public key in PEM format
[Exactly once]
The public key that can be used to encrypt messages to this
introduction point.
"service-key" NL a public key in PEM format
[Exactly once]
The public key that can be used to encrypt messages to the hidden
service.
["intro-authentication" auth-type NL auth-data ... reserved]
[Any number]
The introduction-point-specific authentication data can be used
to perform client authentication. This data depends on the
selected introduction point as opposed to "service-authentication"
above.
(This ends the fields in the encrypted portion of the descriptor.)
"signature" NL signature-string
[At end, exactly once]
A signature of all fields above with the private key of the hidden
service.
1.2.1. Other descriptor formats we don't use.
The V1 descriptor format was understood and accepted from
0.1.1.5-alpha-cvs to 0.2.0.6-alpha-dev, but no Tors generated it and
it was removed:
  V      Format byte: set to 255               [1 octet]
  V      Version byte: set to 1                [1 octet]
  KL     Key length                            [2 octets]
  PK     Bob's public key                      [KL octets]
  TS     A timestamp                           [4 octets]
  PROTO  Protocol versions: bitmask            [2 octets]
  NI     Number of introduction points         [2 octets]
  For each introduction point: (as in INTRODUCE2 cells)
    IP     Introduction point's address        [4 octets]
    PORT   Introduction point's OR port        [2 octets]
    ID     Introduction point identity ID      [20 octets]
    KLEN   Length of onion key                 [2 octets]
    KEY    Introduction point onion key        [KLEN octets]
  SIG    Signature of above fields             [variable]
A hypothetical "V1" descriptor, that has never been used but might
be useful for historical reasons, contains:
  V      Format byte: set to 255                  [1 octet]
  V      Version byte: set to 1                   [1 octet]
  KL     Key length                               [2 octets]
  PK     Bob's public key                         [KL octets]
  TS     A timestamp                              [4 octets]
  PROTO  Rendezvous protocol versions: bitmask    [2 octets]
  NA     Number of auth mechanisms accepted       [1 octet]
  For each auth mechanism:
    AUTHT  The auth type that is supported        [2 octets]
    AUTHL  Length of auth data                    [1 octet]
    AUTHD  Auth data                              [variable]
  NI     Number of introduction points            [2 octets]
  For each introduction point: (as in INTRODUCE2 cells)
    ATYPE  An address type (typically 4)          [1 octet]
    ADDR   Introduction point's IP address        [4 or 16 octets]
    PORT   Introduction point's OR port           [2 octets]
    AUTHT  The auth type that is supported        [2 octets]
    AUTHL  Length of auth data                    [1 octet]
    AUTHD  Auth data                              [variable]
    ID     Introduction point identity ID         [20 octets]
    KLEN   Length of onion key                    [2 octets]
    KEY    Introduction point onion key           [KLEN octets]
  SIG    Signature of above fields                [variable]
AUTHT specifies which authentication/authorization mechanism is
required by the hidden service or the introduction point. AUTHD
is arbitrary data that can be associated with an auth approach.
Currently only AUTHT of [00 00] is supported, with an AUTHL of 0.
See section 2 of this document for details on auth mechanisms.
1.3. Bob's OP establishes his introduction points.
The OP establishes a new introduction circuit to each introduction
point. These circuits MUST NOT be used for anything but hidden service
introduction. To establish the introduction, Bob sends a
RELAY_ESTABLISH_INTRO cell, containing:
  KL   Key length                         [2 octets]
  PK   Bob's public key                   [KL octets]
  HS   Hash of session info               [20 octets]
  SIG  Signature of above information     [variable]
[XXX011, need to add auth information here. -RD]
To prevent replay attacks, the HS field contains a SHA-1 hash based on the
shared secret KH between Bob's OP and the introduction point, as
follows:
HS = H(KH | "INTRODUCE")
That is:
HS = H(KH | [49 4E 54 52 4F 44 55 43 45])
(KH, as specified in tor-spec.txt, is H(g^xy | [00]) .)
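
For illustration, the HS computation can be written as follows. The function
name is hypothetical, and "kh" is assumed to be the 20-octet KH value already
derived during the circuit handshake.

```python
import hashlib

# The ASCII string "INTRODUCE" is exactly the octets [49 4E 54 52 4F 44 55 43 45].
INTRODUCE = bytes(bytearray.fromhex("494E54524F44554345"))


def establish_intro_hs(kh):
    """HS = H(KH | "INTRODUCE"), where H is SHA-1."""
    return hashlib.sha1(kh + INTRODUCE).digest()
```

Because KH is specific to one circuit handshake, an HS value captured from one
circuit does not verify on another.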
Upon receiving such a cell, the OR first checks that the signature is
correct with the included public key. If so, it checks whether HS is
correct given the shared state between Bob's OP and the OR. If either
check fails, the OR discards the cell; otherwise, it associates the
circuit with Bob's public key, and dissociates any other circuits
currently associated with PK. On success, the OR sends Bob a
RELAY_INTRO_ESTABLISHED cell with an empty payload.
If a hidden service is configured to publish only v2 hidden service
descriptors, Bob's OP does not include its own public key in the
RELAY_ESTABLISH_INTRO cell, but the public key of a freshly generated
key pair. The OP also includes these fresh public keys in the v2 hidden
service descriptor together with the other introduction point
information. The reason is that the introduction point does not need to
and therefore should not know for which hidden service it works, so as
to prevent it from tracking the hidden service's activity. If the hidden
service is configured to publish both v0 and v2 descriptors, two
separate sets of introduction points are established.
1.4. Bob's OP advertises his service descriptor(s).
Bob's OP opens a stream to each directory server's directory port via Tor.
(He may re-use old circuits for this.) Over this stream, Bob's OP makes
an HTTP 'POST' request, to a URL "/tor/rendezvous/publish" relative to the
directory server's root, containing as its body Bob's service descriptor.
Bob should upload a service descriptor for each version format that
is supported in the current Tor network.
Upon receiving a descriptor, the directory server checks the signature,
and discards the descriptor if the signature does not match the enclosed
public key. Next, the directory server checks the timestamp. If the
timestamp is more than 24 hours in the past or more than 1 hour in the
future, or the directory server already has a newer descriptor with the
same public key, the server discards the descriptor. Otherwise, the
server discards any older descriptors with the same public key and
version format, and associates the new descriptor with the public key.
The directory server remembers this descriptor for at least 24 hours
after its timestamp. At least every 18 hours, Bob's OP uploads a
fresh descriptor.
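The timestamp rules above can be sketched as a small predicate. This is
only an illustration: it assumes the signature has already been verified,
that times are Unix seconds, and that the caller looks up the timestamp of
any descriptor already stored for the same public key. The name
should_store is hypothetical.

```python
from typing import Optional

HOUR = 3600

def should_store(now: int, new_ts: int, stored_ts: Optional[int]) -> bool:
    """Decide whether a signature-checked descriptor should be kept.

    now       -- directory server's current time (Unix seconds)
    new_ts    -- timestamp of the uploaded descriptor
    stored_ts -- timestamp of the descriptor already held for this
                 public key, or None if there is none
    """
    if new_ts < now - 24 * HOUR:      # more than 24 hours in the past
        return False
    if new_ts > now + 1 * HOUR:       # more than 1 hour in the future
        return False
    if stored_ts is not None and stored_ts > new_ts:
        return False                  # a newer descriptor is already held
    return True
```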
If Bob's OP is configured to publish v2 descriptors instead of or in
addition to v0 descriptors, it does so to a changing subset of all v2
hidden service directories instead of the authoritative directory
servers. Therefore, Bob's OP opens a stream via Tor to each
responsible hidden service directory. (He may re-use old circuits
for this.) Over this stream, Bob's OP makes an HTTP 'POST' request to a
URL "/tor/rendezvous2/publish" relative to the hidden service
directory's root, containing as its body Bob's service descriptor.
At any time, there are 6 hidden service directories responsible for
keeping replicas of a descriptor; they consist of 2 sets of 3 hidden
service directories with consecutive onion IDs. Bob's OP learns about
the complete list of hidden service directories by filtering the
consensus status document received from the directory authorities. A
hidden service directory is deemed responsible for all descriptor IDs in
the interval from its direct predecessor, exclusive, to its own ID,
inclusive; it further holds replicas for its 2 predecessors. A
participant only trusts its own routing list and never learns about
routing information from other parties.
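One way to read the responsibility rule above: for a given descriptor ID,
the responsible directories are the first directory whose ID is greater
than or equal to the descriptor ID, plus the next two directories on the
ring (because each directory also holds replicas for its 2 predecessors).
The Python sketch below follows that reading. It assumes IDs are
equal-length byte strings compared lexicographically and that the list of
hidden service directories has already been filtered from the consensus;
the function name is hypothetical.

```python
from bisect import bisect_left
from typing import List

def responsible_dirs(desc_id: bytes, dir_ids: List[bytes],
                     n: int = 3) -> List[bytes]:
    """Return the n directories responsible for desc_id, in ring order."""
    ring = sorted(dir_ids)
    # First directory whose ID is >= desc_id (interval is inclusive at
    # the directory's own ID, exclusive at its predecessor).
    start = bisect_left(ring, desc_id)
    count = min(n, len(ring))
    # Wrap around the end of the sorted list to close the ring.
    return [ring[(start + i) % len(ring)] for i in range(count)]
```

With two descriptor IDs per service (one per replica set), calling this
once per descriptor ID yields the 2 sets of 3 directories described above.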
Bob's OP publishes a new v2 descriptor once an hour or whenever its
content changes. V2 descriptors can be found by clients within a given
time period of 24 hours, after which they change their ID as described
under 1.2. If a published descriptor would be valid for less than 60
minutes (= 2 x 30 minutes to allow the server to be 30 minutes behind
and the client 30 minutes ahead), Bob's OP publishes the descriptor
under the IDs of both the current and the next publication period.
1.5. Alice receives a x.y.z.onion address.
When Alice receives a pointer to a location-hidden service, it is as a
hostname of the form "z.onion" or "y.z.onion" or "x.y.z.onion", where
z is a base-32 encoding of a 10-octet hash of Bob's service's public
key, computed as follows:
1. Let H = H(PK).
2. Let H' = the first 80 bits of H, considering each octet from
most significant bit to least significant bit.
3. Generate a 16-character encoding of H', using base32 as defined
in RFC 3548.
(We only use 80 bits instead of the 160 bits from SHA1 because we
don't need to worry about arbitrary collisions, and because it will
make handling the URLs more convenient.)
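The three steps above can be sketched in Python as follows. The sketch
assumes that the input is the DER encoding of Bob's RSA public key (which
is what tor-spec.txt means by "the hash of a public key") and that H is
SHA-1. Lowercasing the result is a presentation choice of this sketch;
the function name is hypothetical.

```python
import base64
import hashlib

def onion_label(pk_der: bytes) -> str:
    """Compute the "z" part of z.onion from a DER-encoded public key."""
    digest = hashlib.sha1(pk_der).digest()   # step 1: H(PK), 20 octets
    first_80_bits = digest[:10]              # step 2: H', 10 octets
    # Step 3: 80 bits encode to exactly 16 base32 characters, so no
    # '=' padding appears in the output.
    return base64.b32encode(first_80_bits).decode("ascii").lower()
```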
The string "x", if present, is the base-32 encoding of the
authentication/authorization required by the introduction point.
The string "y", if present, is the base-32 encoding of the
authentication/authorization required by the hidden service.
Omitting a string is taken to mean auth type [00 00].
See section 2 of this document for details on auth mechanisms.
[Yes, numbers are allowed at the beginning. See RFC 1123. -NM]
1.6. Alice's OP retrieves a service descriptor.
Alice opens a stream to a directory server via Tor, and makes an HTTP GET
request for the document '/tor/rendezvous/<z>', where '<z>' is replaced
with the encoding of Bob's public key as described above. (She may re-use
old circuits for this.) The directory replies with a 404 HTTP response if
it does not recognize <z>, and otherwise returns Bob's most recently
uploaded service descriptor.
If Alice's OP receives a 404 response, it tries the other directory
servers, and only fails the lookup if none recognize the public key hash.
Upon receiving a service descriptor, Alice verifies with the same process
as the directory server uses, described above in section 1.4.
The directory server gives a 400 response if it cannot understand Alice's
request.
Alice should cache the descriptor locally, but should not use
descriptors whose timestamp is more than 24 hours in the past.
[Caching may make her partitionable, but she fetched it anonymously,
and we can't very well *not* cache it. -RD]
Alice's OP fetches v2 descriptors in parallel to v0 descriptors. Similarly
to the description in section 1.4, the OP fetches a v2 descriptor from a
randomly chosen hidden service directory out of the changing subset of
6 nodes. If the request is unsuccessful, Alice retries the other
remaining responsible hidden service directories in a random order.
Alice relies on Bob to account for potential clock skew between the two
by possibly storing two sets of descriptors (see end of section 1.4).
Alice's OP opens a stream via Tor to the chosen v2 hidden service
directory. (She may re-use old circuits for this.) Over this stream,
Alice's OP makes an HTTP 'GET' request for the document
"/tor/rendezvous2/<z>", where z is replaced with the encoding of the
descriptor ID. The directory replies with a 404 HTTP response if it does
not recognize <z>, and otherwise returns Bob's most recently uploaded
service descriptor.
1.7. Alice's OP establishes a rendezvous point.
When Alice requests a connection to a given location-hidden service,
and Alice's OP does not have an established circuit to that service,
the OP builds a rendezvous circuit. It does this by establishing
a circuit to a randomly chosen OR, and sending a
RELAY_ESTABLISH_RENDEZVOUS cell to that OR. The body of that cell
contains:
RC Rendezvous cookie [20 octets]
[XXX011 this looks like an auth mechanism. should we generalize here? -RD]
The rendezvous cookie is an arbitrary 20-byte value, chosen randomly by
Alice's OP.
Upon receiving a RELAY_ESTABLISH_RENDEZVOUS cell, the OR associates the
RC with the circuit that sent it. It replies to Alice with an empty
RELAY_RENDEZVOUS_ESTABLISHED cell to indicate success.
Alice's OP MUST NOT use the circuit which sent the cell for any purpose
other than rendezvous with the given location-hidden service.
1.8. Introduction: from Alice's OP to Introduction Point
Alice builds a separate circuit to one of Bob's chosen introduction
points, and sends it a RELAY_INTRODUCE1 cell containing:
   Cleartext
      PK_ID  Identifier for Bob's PK        [20 octets]
   Encrypted to Bob's PK: (in the v0 intro protocol)
      RP     Rendezvous point's nickname    [20 octets]
      RC     Rendezvous cookie              [20 octets]
      g^x    Diffie-Hellman data, part 1    [128 octets]
   OR (in the v1 intro protocol)
      VER    Version byte: set to 1.        [1 octet]
      RP     Rendezvous point nick or ID    [42 octets]
      RC     Rendezvous cookie              [20 octets]
      g^x    Diffie-Hellman data, part 1    [128 octets]
   OR (in the v2 intro protocol)
      VER    Version byte: set to 2.        [1 octet]
      IP     Rendezvous point's address     [4 octets]
      PORT   Rendezvous point's OR port     [2 octets]
      ID     Rendezvous point identity ID   [20 octets]
      KLEN   Length of onion key            [2 octets]
      KEY    Rendezvous point onion key     [KLEN octets]
      RC     Rendezvous cookie              [20 octets]
      g^x    Diffie-Hellman data, part 1    [128 octets]
PK_ID is the hash of Bob's public key. RP is NUL-padded and
terminated. In version 0, it must contain a nickname. In version 1,
it must contain EITHER a nickname or an identity key digest that is
encoded in hex and prefixed with a '$'.
The hybrid encryption to Bob's PK works just like the hybrid
encryption in CREATE cells (see tor-spec). Thus the payload of the
version 0 RELAY_INTRODUCE1 cell on the wire will contain
20+42+16+20+20+128=246 bytes, and the version 1 and version 2
introduction formats have other sizes.
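To make the layouts and the size arithmetic above concrete, here is a
small Python sketch. It packs only the plaintext of the v2 "encrypted to
Bob's PK" part (before hybrid encryption), and spells out the v0 size sum
as a constant. The function and constant names are hypothetical, and the
sketch assumes an IPv4 address given in dotted-quad form.

```python
import socket
import struct

# v0 cell body on the wire: PK_ID, then OAEP padding overhead, the
# hybrid-encryption symmetric key, RP, RC, and g^x.
V0_WIRE_SIZE = 20 + 42 + 16 + 20 + 20 + 128

def pack_intro_v2_plaintext(ip: str, port: int, rp_id: bytes,
                            onion_key: bytes, cookie: bytes,
                            gx: bytes) -> bytes:
    """Pack VER|IP|PORT|ID|KLEN|KEY|RC|g^x for the v2 intro protocol."""
    if len(rp_id) != 20 or len(cookie) != 20 or len(gx) != 128:
        raise ValueError("ID and RC must be 20 octets, g^x 128 octets")
    header = struct.pack("!B4sH", 2, socket.inet_aton(ip), port)
    klen = struct.pack("!H", len(onion_key))   # big-endian, 2 octets
    return header + rp_id + klen + onion_key + cookie + gx
```

The v2 plaintext is therefore 177 + KLEN octets long before encryption.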
Through Tor 0.2.0.6-alpha, clients only generated the v0 introduction
format, whereas hidden services have understood and accepted v0,
v1, and v2 since 0.1.1.x. As of Tor 0.2.0.7-alpha and 0.1.2.18,
clients switched to using the v2 intro format.
If Alice has downloaded a v2 descriptor, she uses the contained public
key ("service-key") instead of Bob's public key to create the
RELAY_INTRODUCE1 cell as described above.
1.8.1. Other introduction formats we don't use.
We briefly speculated about using the following format for the
"encrypted to Bob's PK" part of the introduction, but no Tors have
ever generated these.
   VER    Version byte: set to 3.          [1 octet]
   ATYPE  An address type (typically 4)    [1 octet]
   ADDR   Rendezvous point's IP address    [4 or 16 octets]
   PORT   Rendezvous point's OR port       [2 octets]
   AUTHT  The auth type that is supported  [2 octets]
   AUTHL  Length of auth data              [1 octet]
   AUTHD  Auth data                        [variable]
   ID     Rendezvous point identity ID     [20 octets]
   KLEN   Length of onion key              [2 octets]
   KEY    Rendezvous point onion key       [KLEN octets]
   RC     Rendezvous cookie                [20 octets]
   g^x    Diffie-Hellman data, part 1      [128 octets]
1.9. Introduction: From the Introduction Point to Bob's OP
If the Introduction Point recognizes PK_ID as a public key which has
established a circuit for introductions as in 1.3 above, it sends the body
of the cell in a new RELAY_INTRODUCE2 cell down the corresponding circuit.
(If the PK_ID is unrecognized, the RELAY_INTRODUCE1 cell is discarded.)
After sending the RELAY_INTRODUCE2 cell, the OR replies to Alice with an
empty RELAY_COMMAND_INTRODUCE_ACK cell. If no RELAY_INTRODUCE2 cell can
be sent, the OR replies to Alice with a non-empty cell to indicate an
error. (The semantics of the cell body may be determined later; the
current implementation sends a single '1' byte on failure.)
When Bob's OP receives the RELAY_INTRODUCE2 cell, it decrypts it with
the private key for the corresponding hidden service, and extracts the
rendezvous point's nickname, the rendezvous cookie, and the value of g^x
chosen by Alice.
1.10. Rendezvous
Bob's OP builds a new Tor circuit ending at Alice's chosen rendezvous
point, and sends a RELAY_RENDEZVOUS1 cell along this circuit, containing:
   RC     Rendezvous cookie    [20 octets]
   g^y    Diffie-Hellman       [128 octets]
   KH     Handshake digest     [20 octets]
(Bob's OP MUST NOT use this circuit for any other purpose.)
If the RP recognizes RC, it relays the rest of the cell down the
corresponding circuit in a RELAY_RENDEZVOUS2 cell, containing:
   g^y    Diffie-Hellman       [128 octets]
   KH     Handshake digest     [20 octets]
(If the RP does not recognize the RC, it discards the cell and
tears down the circuit.)
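The rendezvous point's bookkeeping described in 1.7 and above can be
sketched as a cookie-to-circuit table. This is an illustration only: the
class and method names are hypothetical, circuits are represented by
plain integers, and removing a cookie once it has been used is an
assumption of this sketch rather than something stated here.

```python
from typing import Dict, Optional, Tuple

class RendezvousTable:
    def __init__(self) -> None:
        # Rendezvous cookie -> circuit that sent ESTABLISH_RENDEZVOUS.
        self._waiting: Dict[bytes, int] = {}

    def establish(self, circ_id: int, cookie: bytes) -> None:
        """Handle RELAY_ESTABLISH_RENDEZVOUS from Alice's OP."""
        if len(cookie) != 20:
            raise ValueError("rendezvous cookie must be 20 octets")
        self._waiting[cookie] = circ_id

    def rendezvous1(self, body: bytes) -> Optional[Tuple[int, bytes]]:
        """Handle RELAY_RENDEZVOUS1 from Bob's OP.

        Returns (Alice's circuit, RENDEZVOUS2 body), or None if the
        cookie is unknown, in which case the caller discards the cell
        and tears down the circuit.
        """
        cookie, rest = body[:20], body[20:]
        circ_id = self._waiting.pop(cookie, None)  # one-shot: assumption
        if circ_id is None:
            return None
        return circ_id, rest   # rest is g^y | KH
```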
When Alice's OP receives a RELAY_RENDEZVOUS2 cell on a circuit which
has sent a RELAY_ESTABLISH_RENDEZVOUS cell but which has not yet received
a reply, it uses g^y and H(g^xy) to complete the handshake as in the Tor
circuit extend process: they establish a 60-octet string as
K = SHA1(g^xy | [00]) | SHA1(g^xy | [01]) | SHA1(g^xy | [02])
and generate
KH = K[0..15]
Kf = K[16..31]
Kb = K[32..47]
Subsequently, the rendezvous point passes relay cells, unchanged, from
each of the two circuits to the other. When Alice's OP sends
RELAY cells along the circuit, it first encrypts them with the
Kf, then with all of the keys for the ORs in Alice's side of the circuit;
and when Alice's OP receives RELAY cells from the circuit, it decrypts
them with the keys for the ORs in Alice's side of the circuit, then
decrypts them with Kb. Bob's OP does the same, with Kf and Kb
interchanged.
1.11. Creating streams
To open TCP connections to Bob's location-hidden service, Alice's OP sends
a RELAY_BEGIN cell along the established circuit, using the special
address "", and a chosen port. Bob's OP chooses a destination IP and
port, based on the configuration of the service connected to the circuit,
and opens a TCP stream. From then on, Bob's OP treats the stream as an
ordinary exit connection.
[ Except he doesn't include addr in the connected cell or the end
cell. -RD]
Alice MAY send multiple RELAY_BEGIN cells along the circuit, to open
multiple streams to Bob. Alice SHOULD NOT send RELAY_BEGIN cells for any
other address along her circuit to Bob; if she does, Bob MUST reject them.
2. Authentication and authorization.
Foo.
3. Hidden service directory operation
This section has been introduced with the v2 hidden service descriptor
format. It describes all operations of the v2 hidden service descriptor
fetching and propagation mechanism that are required for the protocol
described in section 1 to succeed with v2 hidden service descriptors.
3.1. Configuring as hidden service directory
Every onion router that has its directory port open can decide whether it
wants to store and serve hidden service descriptors. An onion router which
is configured as such includes the "hidden-service-dir" flag in its router
descriptors that it sends to directory authorities.
The directory authorities include a new flag "HSDir" for routers that
decided to provide storage for hidden service descriptors and that
have been running for at least 24 hours.
3.2. Accepting publish requests
Hidden service directory nodes accept publish requests for v2 hidden service
descriptors and store them to their local memory. (It is not necessary to
make descriptors persistent, because after restarting, the onion router
would not be accepted as a storing node anyway, because it has not been
running for at least 24 hours.) All requests and replies are formatted as
HTTP messages. Requests are initiated via BEGIN_DIR cells directed to
the router's directory port, and formatted as HTTP POST requests to the URL
"/tor/rendezvous2/publish" relative to the hidden service directory's root,
containing as its body a v2 service descriptor.
A hidden service directory node parses every received descriptor and only
stores it when it thinks that it is responsible for storing that descriptor
based on its own routing table. See section 1.4 for more information on how
to determine responsibility for a certain descriptor ID.
3.3. Processing fetch requests
Hidden service directory nodes process fetch requests for hidden service
descriptors by looking them up in their local memory. (They do not need to
determine if they are responsible for the passed ID, because it does no harm
if they deliver a descriptor for which they are not (any more) responsible.)
All requests and replies are formatted as HTTP messages. Requests are
initiated via BEGIN_DIR cells directed to the router's directory port,
and formatted as HTTP GET requests for the document "/tor/rendezvous2/<z>",
where z is replaced with the encoding of the descriptor ID.

View File

@ -1,79 +0,0 @@
$Id$
Tor's extensions to the SOCKS protocol
1. Overview
The SOCKS protocol provides a generic interface for TCP proxies. Client
software connects to a SOCKS server via TCP, and requests a TCP connection
to another address and port. The SOCKS server establishes the connection,
and reports success or failure to the client. After the connection has
been established, the client application uses the TCP stream as usual.
Tor supports SOCKS4 as defined in [1], SOCKS4A as defined in [2], and
SOCKS5 as defined in [3].
The stickiest issue for Tor in supporting clients, in practice, is forcing
DNS lookups to occur at the OR side: if clients do their own DNS lookup,
the DNS server can learn which addresses the client wants to reach.
SOCKS4 supports addressing by IPv4 address; SOCKS4A is a kludge on top of
SOCKS4 to allow addressing by hostname; SOCKS5 supports IPv4, IPv6, and
hostnames.
1.1. Extent of support
Tor supports the SOCKS4, SOCKS4A, and SOCKS5 standards, except as follows:
BOTH:
- The BIND command is not supported.
SOCKS4,4A:
- SOCKS4 usernames are ignored.
SOCKS5:
- The (SOCKS5) "UDP ASSOCIATE" command is not supported.
- IPv6 is not supported in CONNECT commands.
- Only the "NO AUTHENTICATION" (SOCKS5) authentication method [00] is
supported.
2. Name lookup
As an extension to SOCKS4A and SOCKS5, Tor implements a new command value,
"RESOLVE" [F0]. When Tor receives a "RESOLVE" SOCKS command, it initiates
a remote lookup of the hostname provided as the target address in the SOCKS
request. The reply is either an error (if the address couldn't be
resolved) or a success response. In the case of success, the address is
stored in the portion of the SOCKS response reserved for remote IP address.
(We support RESOLVE in SOCKS4 too, even though it is unnecessary.)
For SOCKS5 only, we support reverse resolution with a new command value,
"RESOLVE_PTR" [F1]. In response to a "RESOLVE_PTR" SOCKS5 command with
an IPv4 address as its target, Tor attempts to find the canonical
hostname for that IPv4 address, and returns it in the "server bound
address" portion of the reply.
(This command was not supported before Tor 0.1.2.2-alpha.)
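For illustration, a SOCKS5 request using the RESOLVE command value could
be built as in the Python sketch below. It uses the ordinary SOCKS5
request layout from RFC 1928 (version, command, reserved byte, address
type, address, port) with the command byte set to F0 and a domain-name
address. Sending port 0 is an assumption of this sketch, and the function
name is hypothetical. The SOCKS5 method negotiation that precedes the
request is not shown.

```python
import struct

SOCKS5_VERSION = 5
CMD_RESOLVE = 0xF0       # Tor extension described above
ATYP_DOMAINNAME = 3      # RFC 1928 address type for hostnames

def socks5_resolve_request(hostname: str) -> bytes:
    """Build the bytes of a SOCKS5 RESOLVE request for hostname."""
    name = hostname.encode("ascii")
    if not 0 < len(name) <= 255:
        raise ValueError("hostname must be 1..255 octets")
    header = bytes([SOCKS5_VERSION, CMD_RESOLVE, 0, ATYP_DOMAINNAME,
                    len(name)])
    # Port is carried as a 2-octet big-endian field; 0 is used here as
    # a placeholder since only the name is being looked up.
    return header + name + struct.pack("!H", 0)
```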
3. Other command extensions.
Tor 0.1.2.4-alpha added a new command value: "CONNECT_DIR" [F2].
In this case, Tor will open an encrypted direct TCP connection to the
directory port of the Tor server specified by address:port (the port
specified should be the ORPort of the server). It uses a one-hop tunnel
and a "BEGIN_DIR" relay cell to accomplish this secure connection.
The F2 command value was removed in Tor 0.2.0.10-alpha in favor of a
new use_begindir flag in edge_connection_t.
4. HTTP-resistance
Tor checks the first byte of each SOCKS request to see whether it looks
more like an HTTP request (that is, it starts with a "G", "H", or "P"). If
so, Tor returns a small webpage, telling the user that his/her browser is
misconfigured. This is helpful for the many users who mistakenly try to
use Tor as an HTTP proxy instead of a SOCKS proxy.
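The first-byte check above amounts to something like the following
Python sketch (the function name is hypothetical). Genuine SOCKS4 and
SOCKS5 requests begin with the version byte 4 or 5, so they never match.

```python
def looks_like_http(first_byte: int) -> bool:
    """True if the first byte of a request is 'G', 'H', or 'P'."""
    return first_byte in b"GHP"
```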
References:
[1] http://archive.socks.permeo.com/protocol/socks4.protocol
[2] http://archive.socks.permeo.com/protocol/socks4a.protocol
[3] SOCKS5: RFC1928

View File

@ -1,993 +0,0 @@
$Id$
Tor Protocol Specification
Roger Dingledine
Nick Mathewson
Note: This document aims to specify Tor as implemented in 0.2.1.x. Future
versions of Tor may implement improved protocols, and compatibility is not
guaranteed. Compatibility notes are given for versions 0.1.1.15-rc and
later; earlier versions are not compatible with the Tor network as of this
writing.
This specification is not a design document; most design criteria
are not examined. For more information on why Tor acts as it does,
see tor-design.pdf.
0. Preliminaries
0.1. Notation and encoding
PK -- a public key.
SK -- a private key.
K -- a key for a symmetric cipher.
a|b -- concatenation of 'a' and 'b'.
[A0 B1 C2] -- a three-byte sequence, containing the bytes with
hexadecimal values A0, B1, and C2, in that order.
All numeric values are encoded in network (big-endian) order.
H(m) -- a cryptographic hash of m.
0.2. Security parameters
Tor uses a stream cipher, a public-key cipher, the Diffie-Hellman
protocol, and a hash function.
KEY_LEN -- the length of the stream cipher's key, in bytes.
PK_ENC_LEN -- the length of a public-key encrypted message, in bytes.
PK_PAD_LEN -- the number of bytes added in padding for public-key
encryption, in bytes. (The largest number of bytes that can be encrypted
in a single public-key operation is therefore PK_ENC_LEN-PK_PAD_LEN.)
DH_LEN -- the number of bytes used to represent a member of the
Diffie-Hellman group.
DH_SEC_LEN -- the number of bytes used in a Diffie-Hellman private key (x).
HASH_LEN -- the length of the hash function's output, in bytes.
PAYLOAD_LEN -- The longest allowable cell payload, in bytes. (509)
CELL_LEN -- The length of a Tor cell, in bytes.
0.3. Ciphers
For a stream cipher, we use 128-bit AES in counter mode, with an IV of all
0 bytes.
For a public-key cipher, we use RSA with 1024-bit keys and a fixed
exponent of 65537. We use OAEP-MGF1 padding, with SHA-1 as its digest
function. We leave the optional "Label" parameter unset. (For OAEP
padding, see ftp://ftp.rsasecurity.com/pub/pkcs/pkcs-1/pkcs-1v2-1.pdf)
For Diffie-Hellman, we use a generator (g) of 2. For the modulus (p), we
use the 1024-bit safe prime from rfc2409 section 6.2 whose hex
representation is:
"FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD129024E08"
"8A67CC74020BBEA63B139B22514A08798E3404DDEF9519B3CD3A431B"
"302B0A6DF25F14374FE1356D6D51C245E485B576625E7EC6F44C42E9"
"A637ED6B0BFF5CB6F406B7EDEE386BFB5A899FA5AE9F24117C4B1FE6"
"49286651ECE65381FFFFFFFFFFFFFFFF"
As an optimization, implementations SHOULD choose DH private keys (x) of
320 bits. Implementations that do this MUST NOT use any DH key more
than once.
[May other implementations reuse their DH keys?? -RD]
[Probably not. Conceivably, you could get away with changing DH keys once
per second, but there are too many oddball attacks for me to be
comfortable that this is safe. -NM]
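As a sketch of the Diffie-Hellman parameters above, the following Python
code builds the modulus from the hex string given in this section and
generates a fresh 320-bit private key for each handshake. The function
names are hypothetical, and the sketch performs no validation of the
peer's public value, which a real implementation would need to consider.

```python
import secrets

# Modulus copied from the hex string above (rfc2409 section 6.2).
P = int(
    "FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD129024E08"
    "8A67CC74020BBEA63B139B22514A08798E3404DDEF9519B3CD3A431B"
    "302B0A6DF25F14374FE1356D6D51C245E485B576625E7EC6F44C42E9"
    "A637ED6B0BFF5CB6F406B7EDEE386BFB5A899FA5AE9F24117C4B1FE6"
    "49286651ECE65381FFFFFFFFFFFFFFFF",
    16,
)
G = 2
DH_LEN = 128           # bytes used to represent a group element
DH_SEC_BITS = 320      # DH_SEC_LEN = 40 bytes

def dh_keypair():
    """Return (x, g^x) with a fresh private key x; never reuse x."""
    x = 1 + secrets.randbelow((1 << DH_SEC_BITS) - 1)   # 1 <= x < 2^320
    return x, pow(G, x, P).to_bytes(DH_LEN, "big")

def dh_shared(x: int, peer_public: bytes) -> bytes:
    """Compute g^xy from our private key and the peer's public value."""
    peer = int.from_bytes(peer_public, "big")
    return pow(peer, x, P).to_bytes(DH_LEN, "big")
```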
For a hash function, we use SHA-1.
KEY_LEN=16.
DH_LEN=128; DH_SEC_LEN=40.
PK_ENC_LEN=128; PK_PAD_LEN=42.
HASH_LEN=20.
When we refer to "the hash of a public key", we mean the SHA-1 hash of the
DER encoding of an ASN.1 RSA public key (as specified in PKCS.1).
All "random" values should be generated with a cryptographically strong
random number generator, unless otherwise noted.
The "hybrid encryption" of a byte sequence M with a public key PK is
computed as follows:
1. If M is shorter than PK_ENC_LEN-PK_PAD_LEN bytes, pad and encrypt M with PK.
2. Otherwise, generate a KEY_LEN byte random key K.
Let M1 = the first PK_ENC_LEN-PK_PAD_LEN-KEY_LEN bytes of M,
and let M2 = the rest of M.
Pad and encrypt K|M1 with PK. Encrypt M2 with our stream cipher,
using the key K. Concatenate these encrypted values.
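The two cases above can be sketched in Python as follows. The actual RSA
and AES operations are passed in as callables, because they are outside
the scope of this sketch: pk_encrypt, stream_encrypt, and random_key are
hypothetical placeholders for "pad and encrypt with PK", "encrypt with
our stream cipher under key K", and "generate random bytes".

```python
KEY_LEN = 16
PK_ENC_LEN = 128
PK_PAD_LEN = 42

def hybrid_encrypt(m: bytes, pk_encrypt, stream_encrypt, random_key) -> bytes:
    """Hybrid-encrypt m as described in steps 1 and 2 above."""
    if len(m) < PK_ENC_LEN - PK_PAD_LEN:
        # Step 1: M fits in a single public-key operation.
        return pk_encrypt(m)
    # Step 2: split M, put K and the first part under the public key,
    # and the rest under the stream cipher keyed with K.
    k = random_key(KEY_LEN)
    split = PK_ENC_LEN - PK_PAD_LEN - KEY_LEN    # 70 bytes of M
    m1, m2 = m[:split], m[split:]
    return pk_encrypt(k + m1) + stream_encrypt(k, m2)
```

With these parameters, a message of n >= 86 bytes produces
PK_ENC_LEN + (n - 70) bytes of output.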
[XXX Note that this "hybrid encryption" approach does not prevent
an attacker from adding or removing bytes to the end of M. It also
allows attackers to modify the bytes not covered by the OAEP --
see Goldberg's PET2006 paper for details. We will add a MAC to this
scheme one day. -RD]
0.4. Other parameter values
CELL_LEN=512
1. System overview
Tor is a distributed overlay network designed to anonymize
low-latency TCP-based applications such as web browsing, secure shell,
and instant messaging. Clients choose a path through the network and
build a ``circuit'', in which each node (or ``onion router'' or ``OR'')
in the path knows its predecessor and successor, but no other nodes in
the circuit. Traffic flowing down the circuit is sent in fixed-size
``cells'', which are unwrapped by a symmetric key at each node (like
the layers of an onion) and relayed downstream.
1.1. Keys and names
Every Tor server has multiple public/private keypairs:
- A long-term signing-only "Identity key" used to sign documents and
certificates, and used to establish server identity.
- A medium-term "Onion key" used to decrypt onion skins when accepting
circuit extend attempts. (See 5.1.) Old keys MUST be accepted for at
least one week after they are no longer advertised. Because of this,
servers MUST retain old keys for a while after they're rotated.
- A short-term "Connection key" used to negotiate TLS connections.
Tor implementations MAY rotate this key as often as they like, and
SHOULD rotate this key at least once a day.
Tor servers are also identified by "nicknames"; these are specified in
dir-spec.txt.
2. Connections
Connections between two Tor servers, or between a client and a server,
use TLS/SSLv3 for link authentication and encryption. All
implementations MUST support the SSLv3 ciphersuite
"SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA", and SHOULD support the TLS
ciphersuite "TLS_DHE_RSA_WITH_AES_128_CBC_SHA" if it is available.
There are three acceptable ways to perform a TLS handshake when
connecting to a Tor server: "certificates up-front", "renegotiation", and
"backwards-compatible renegotiation". ("Backwards-compatible
renegotiation" is, as the name implies, compatible with both other
handshake types.)
Before Tor 0.2.0.21, only "certificates up-front" was supported. In Tor
0.2.0.21 or later, "backwards-compatible renegotiation" is used.
In "certificates up-front", the connection initiator always sends a
two-certificate chain, consisting of an X.509 certificate using a
short-term connection public key and a second, self-signed X.509
certificate containing its identity key. The other party sends a similar
certificate chain. The initiator's ClientHello MUST NOT include any
ciphersuites other than:
TLS_DHE_RSA_WITH_AES_256_CBC_SHA
TLS_DHE_RSA_WITH_AES_128_CBC_SHA
SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA
SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA
In "renegotiation", the connection initiator sends no certificates, and
the responder sends a single connection certificate. Once the TLS
handshake is complete, the initiator renegotiates the handshake, with each
party sending a two-certificate chain as in "certificates up-front".
The initiator's ClientHello MUST include at least one ciphersuite not in
the list above. The responder SHOULD NOT select any ciphersuite besides
those in the list above.
[The above "should not" is because some of the ciphers that
clients list may be fake.]
In "backwards-compatible renegotiation", the connection initiator's
ClientHello MUST include at least one ciphersuite other than those listed
above. The connection responder examines the initiator's ciphersuite list
to see whether it includes any ciphers other than those included in the
list above. If extra ciphers are included, the responder proceeds as in
"renegotiation": it sends a single certificate and does not request
client certificates. Otherwise (in the case that no extra ciphersuites
are included in the ClientHello) the responder proceeds as in
"certificates up-front": it requests client certificates, and sends a
two-certificate chain. In either case, once the responder has sent its
certificate or certificates, the initiator counts them. If two
certificates have been sent, it proceeds as in "certificates up-front";
otherwise, it proceeds as in "renegotiation".
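The decision logic of "backwards-compatible renegotiation" can be
sketched as two small functions, one for each side. This is only an
illustration: ciphersuites are represented by their names as strings,
and the function names are hypothetical.

```python
# The four ciphersuites allowed in a "certificates up-front" ClientHello.
V1_CIPHERSUITES = frozenset({
    "TLS_DHE_RSA_WITH_AES_256_CBC_SHA",
    "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
    "SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA",
    "SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA",
})

def responder_mode(client_ciphersuites) -> str:
    """Responder: pick a handshake from the initiator's ciphersuite list."""
    has_extra = any(c not in V1_CIPHERSUITES for c in client_ciphersuites)
    return "renegotiation" if has_extra else "certificates up-front"

def initiator_mode(certs_received: int) -> str:
    """Initiator: pick a handshake by counting the responder's certificates."""
    return "certificates up-front" if certs_received == 2 else "renegotiation"
```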
All new implementations of the Tor server protocol MUST support
"backwards-compatible renegotiation"; clients SHOULD do this too. If
this is not possible, new client implementations MUST support both
"renegotiation" and "certificates up-front" and use the router's
published link protocols list (see dir-spec.txt on the "protocols" entry)
to decide which to use.
In all of the above handshake variants, certificates sent in the clear
SHOULD NOT include any strings to identify the host as a Tor server. In
the "renegotiation" and "backwards-compatible renegotiation" handshakes,
the initiator SHOULD choose a list of ciphersuites and TLS extensions that
mimics one used by a popular web browser.
Responders MUST NOT select any TLS ciphersuite that lacks ephemeral keys,
or whose symmetric keys are shorter than KEY_LEN bytes, or whose digests are
shorter than HASH_LEN bytes. Responders SHOULD NOT select any SSLv3
ciphersuite other than those listed above.
Even though the connection protocol is identical, we will think of the
initiator as either an onion router (OR) if it is willing to relay
traffic for other Tor users, or an onion proxy (OP) if it only handles
local requests. Onion proxies SHOULD NOT provide long-term-trackable
identifiers in their handshakes.
In all handshake variants, once all certificates are exchanged, all
parties receiving certificates must confirm that the identity key is as
expected. (When initiating a connection, the expected identity key is
the one given in the directory; when creating a connection because of an
EXTEND cell, the expected identity key is the one given in the cell.) If
the key is not as expected, the party must close the connection.
When connecting to an OR, all parties SHOULD reject the connection if that
OR has a malformed or missing certificate. When accepting an incoming
connection, an OR SHOULD NOT reject incoming connections from parties with
malformed or missing certificates. (However, an OR should not believe
that an incoming connection is from another OR unless the certificates
are present and well-formed.)
[Before version 0.1.2.8-rc, ORs rejected incoming connections from ORs and
OPs alike if their certificates were missing or malformed.]
Once a TLS connection is established, the two sides send cells
(specified below) to one another. Cells are sent serially. All
cells are CELL_LEN bytes long. Cells may be sent embedded in TLS
records of any size or divided across TLS records, but the framing
of TLS records MUST NOT leak information about the type or contents
of the cells.
TLS connections are not permanent. Either side MAY close a connection
if there are no circuits running over it and an amount of time
(KeepalivePeriod, defaults to 5 minutes) has passed since the last time
any traffic was transmitted over the TLS connection. Clients SHOULD
also hold a TLS connection with no circuits open, if it is likely that a
circuit will be built soon using that connection.
(As an exception, directory servers may try to stay connected to all of
the ORs -- though this will be phased out for the Tor 0.1.2.x release.)
To avoid being trivially distinguished from servers, client-only Tor
instances are encouraged but not required to use a two-certificate chain
as well. Clients SHOULD NOT keep using the same certificates when
their IP address changes. Clients MAY send no certificates at all.
3. Cell Packet format
The basic unit of communication for onion routers and onion
proxies is a fixed-width "cell".
On a version 1 connection, each cell contains the following
fields:
   CircID                                [2 bytes]
   Command                               [1 byte]
   Payload (padded with 0 bytes)         [PAYLOAD_LEN bytes]
On a version 2 connection, all cells are as in version 1 connections,
except for the initial VERSIONS cell, whose format is:
   Circuit                               [2 octets; set to 0]
   Command                               [1 octet; set to 7 for VERSIONS]
   Length                                [2 octets; big-endian integer]
   Payload                               [Length bytes]
The CircID field determines which circuit, if any, the cell is
associated with.
The 'Command' field holds one of the following values:
   0 -- PADDING       (Padding)                   (See Sec 7.2)
   1 -- CREATE        (Create a circuit)          (See Sec 5.1)
   2 -- CREATED       (Acknowledge create)        (See Sec 5.1)
   3 -- RELAY         (End-to-end data)           (See Sec 5.5 and 6)
   4 -- DESTROY       (Stop using a circuit)      (See Sec 5.4)
   5 -- CREATE_FAST   (Create a circuit, no PK)   (See Sec 5.1)
   6 -- CREATED_FAST  (Circuit created, no PK)    (See Sec 5.1)
   7 -- VERSIONS      (Negotiate proto version)   (See Sec 4)
   8 -- NETINFO       (Time and address info)     (See Sec 4)
   9 -- RELAY_EARLY   (End-to-end data; limited)  (See Sec 5.6)
The interpretation of 'Payload' depends on the type of the cell.
   PADDING: Payload is unused.
   CREATE:  Payload contains the handshake challenge.
   CREATED: Payload contains the handshake response.
   RELAY:   Payload contains the relay header and relay body.
   DESTROY: Payload contains a reason for closing the circuit.
            (see 5.4)
Upon receiving any other value for the command field, an OR must
drop the cell. Since more cell types may be added in the future, ORs
should generally not warn when encountering unrecognized commands.
The payload is padded with 0 bytes.
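The two cell layouts above can be sketched with Python's struct module.
The function names are hypothetical; the constants come from sections
0.2 and 0.4.

```python
import struct

CELL_LEN = 512
PAYLOAD_LEN = 509

def pack_cell(circ_id: int, command: int, payload: bytes = b"") -> bytes:
    """Pack a fixed-width cell: CircID, Command, zero-padded payload."""
    if len(payload) > PAYLOAD_LEN:
        raise ValueError("payload longer than PAYLOAD_LEN")
    # "!HB" is a big-endian 2-byte CircID followed by a 1-byte Command.
    return struct.pack("!HB", circ_id, command) + payload.ljust(PAYLOAD_LEN, b"\x00")

def pack_variable_cell(command: int, payload: bytes) -> bytes:
    """Pack the initial variable-length cell format (Circuit set to 0)."""
    return struct.pack("!HBH", 0, command, len(payload)) + payload
```

Every cell produced by pack_cell is exactly CELL_LEN bytes long, since
2 + 1 + PAYLOAD_LEN = 512.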
PADDING cells are currently used to implement connection keepalive.
If there is no other traffic, ORs and OPs send one another a PADDING
cell every few minutes.
CREATE, CREATED, and DESTROY cells are used to manage circuits;
see section 5 below.
RELAY cells are used to send commands and data along a circuit; see
section 6 below.
VERSIONS and NETINFO cells are used to set up connections. See section 4
below.
4. Negotiating and initializing connections
4.1. Negotiating versions with VERSIONS cells
There are multiple instances of the Tor link connection protocol. Any
connection negotiated using the "certificates up-front" handshake (see
section 2 above) is "version 1". In any connection where both parties
have behaved as in the "renegotiation" handshake, the link protocol
version is 2 or higher.
To determine the version, in any connection where the "renegotiation"
handshake was used (that is, where the server sent only one certificate
at first and where the client did not send any certificates until
renegotiation), both parties MUST send a VERSIONS cell immediately after
the renegotiation is finished, before any other cells are sent. Parties
MUST NOT send any other cells on a connection until they have received a
VERSIONS cell.
The payload in a VERSIONS cell is a series of big-endian two-byte
integers. Both parties MUST select as the link protocol version the
highest number contained both in the VERSIONS cell they sent and in the
VERSIONS cell they received. If they have no such version in common,
they cannot communicate and MUST close the connection.
Since the version 1 link protocol does not use the "renegotiation"
handshake, implementations MUST NOT list version 1 in their VERSIONS
cell.
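The selection rule above can be sketched as follows. This is an illustrative sketch only, not the code Tor uses; the function names parse_versions_payload and choose_link_version are hypothetical.

```python
import struct
from typing import Optional


def parse_versions_payload(payload: bytes) -> set:
    """Decode a VERSIONS payload: a series of big-endian two-byte integers."""
    if len(payload) % 2 != 0:
        raise ValueError("VERSIONS payload must have an even length")
    count = len(payload) // 2
    return set(struct.unpack(">%dH" % count, payload))


def choose_link_version(sent: bytes, received: bytes) -> Optional[int]:
    """Return the highest version present in both payloads.

    None means there is no common version, in which case the parties
    cannot communicate and the connection must be closed.
    """
    common = parse_versions_payload(sent) & parse_versions_payload(received)
    return max(common) if common else None
```

A caller would close the connection whenever choose_link_version returns None.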
4.2. NETINFO cells
If version 2 or higher is negotiated, each party sends the other a
NETINFO cell. The cell's payload is:
Timestamp [4 bytes]
Other OR's address [variable]
Number of addresses [1 byte]
This OR's addresses [variable]
The address format is a type/length/value sequence as given in section
6.4 below. The timestamp is a big-endian unsigned integer number of
seconds since the unix epoch.
Implementations MAY use the timestamp value to help decide if their
clocks are skewed. Initiators MAY use "other OR's address" to help
learn which address their connections are originating from, if they do
not know it. Initiators SHOULD use "this OR's address" to make sure
that they have connected to another OR at its canonical address.
[As of 0.2.0.23-rc, implementations use none of the above values.]
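As a rough illustration, a NETINFO payload could be assembled as below. The sketch assumes that each address is encoded as just Type, Length, and Value (with the type codes from section 6.4) and carries no TTL field; the text above does not state this explicitly. The helper names encode_address and build_netinfo_payload are hypothetical.

```python
import ipaddress
import struct


def encode_address(ip: str) -> bytes:
    """Encode one address as Type (1 byte), Length (1 byte), Value.

    Assumption: no TTL follows the value inside a NETINFO cell.
    """
    packed = ipaddress.ip_address(ip).packed
    addr_type = 0x04 if len(packed) == 4 else 0x06  # type codes from 6.4
    return bytes([addr_type, len(packed)]) + packed


def build_netinfo_payload(timestamp: int, other_addr: str, my_addrs: list) -> bytes:
    """Timestamp | other OR's address | number of addresses | this OR's addresses."""
    body = struct.pack(">I", timestamp)      # big-endian seconds since the epoch
    body += encode_address(other_addr)
    body += bytes([len(my_addrs)])           # one-byte address count
    for addr in my_addrs:
        body += encode_address(addr)
    return body
```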
5. Circuit management
5.1. CREATE and CREATED cells
Users set up circuits incrementally, one hop at a time. To create a
new circuit, OPs send a CREATE cell to the first node, with the
first half of the DH handshake; that node responds with a CREATED
cell with the second half of the DH handshake plus the first 20 bytes
of derivative key data (see section 5.2). To extend a circuit past
the first hop, the OP sends an EXTEND relay cell (see section 5)
which instructs the last node in the circuit to send a CREATE cell
to extend the circuit.
The payload for a CREATE cell is an 'onion skin', which consists
of the first step of the DH handshake data (also known as g^x).
This value is hybrid-encrypted (see 0.3) to Bob's onion key, giving
an onion-skin of:
PK-encrypted:
Padding [PK_PAD_LEN bytes]
Symmetric key [KEY_LEN bytes]
First part of g^x [PK_ENC_LEN-PK_PAD_LEN-KEY_LEN bytes]
Symmetrically encrypted:
Second part of g^x [DH_LEN-(PK_ENC_LEN-PK_PAD_LEN-KEY_LEN)
bytes]
The relay payload for an EXTEND relay cell consists of:
Address [4 bytes]
Port [2 bytes]
Onion skin [DH_LEN+KEY_LEN+PK_PAD_LEN bytes]
Identity fingerprint [HASH_LEN bytes]
The port and address fields denote the IPv4 address and port of the next
onion router in the circuit; the identity fingerprint is the hash of the PKCS#1
ASN1 encoding of the next onion router's identity (signing) key. (See 0.3
above.) Including this hash allows the extending OR to verify that it is
indeed connected to the correct target OR, and prevents certain
man-in-the-middle attacks.
The payload for a CREATED cell, or the relay payload for an
EXTENDED cell, contains:
DH data (g^y) [DH_LEN bytes]
Derivative key data (KH) [HASH_LEN bytes] <see 5.2 below>
The CircID for a CREATE cell is an arbitrarily chosen 2-byte integer,
selected by the node (OP or OR) that sends the CREATE cell. To prevent
CircID collisions, when one node sends a CREATE cell to another, it chooses
from only one half of the possible values based on the ORs' public
identity keys: if the sending node has a lower key, it chooses a CircID with
an MSB of 0; otherwise, it chooses a CircID with an MSB of 1.
(An OP with no public key MAY choose any CircID it wishes, since an OP
never needs to process a CREATE cell.)
Public keys are compared numerically by modulus.
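A minimal sketch of the CircID selection rule for two ORs follows. The function name choose_circ_id is hypothetical, the identity keys are represented only by their integer moduli, and skipping the value 0 is a conservative assumption that the text above does not require.

```python
import random


def choose_circ_id(my_modulus: int, peer_modulus: int, in_use: set) -> int:
    """Pick an unused 2-byte CircID from our half of the ID space.

    The node with the lower key uses IDs with an MSB of 0;
    otherwise it uses IDs with an MSB of 1.
    """
    msb = 0x0000 if my_modulus < peer_modulus else 0x8000
    free = [
        msb | low
        for low in range(0x8000)
        if (msb | low) != 0 and (msb | low) not in in_use  # 0 skipped (assumption)
    ]
    if not free:
        raise RuntimeError("no free CircID in our half of the space")
    return random.choice(free)
```

An OP without a public key is not covered by this sketch, since it may choose any CircID.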
As usual with DH, x and y MUST be generated randomly.
5.1.1. CREATE_FAST/CREATED_FAST cells
When initializing the first hop of a circuit, the OP has already
established the OR's identity and negotiated a secret key using TLS.
Because of this, it is not always necessary for the OP to perform the
public key operations to create a circuit. In this case, the
OP MAY send a CREATE_FAST cell instead of a CREATE cell for the first
hop only. The OR responds with a CREATED_FAST cell, and the circuit is
created.
A CREATE_FAST cell contains:
Key material (X) [HASH_LEN bytes]
A CREATED_FAST cell contains:
Key material (Y) [HASH_LEN bytes]
Derivative key data [HASH_LEN bytes] (See 5.2 below)
The values of X and Y must be generated randomly.
If an OR sees a circuit created with CREATE_FAST, the OR is sure to be the
first hop of a circuit. ORs SHOULD reject attempts to create streams with
RELAY_BEGIN exiting the circuit at the first hop: letting Tor be used as a
single hop proxy makes exit nodes a more attractive target for compromise.
5.2. Setting circuit keys
Once the handshake between the OP and an OR is completed, both can
now calculate g^xy with ordinary DH. Before computing g^xy, both client
and server MUST verify that the received g^x or g^y value is not degenerate;
that is, it must be strictly greater than 1 and strictly less than p-1
where p is the DH modulus. Implementations MUST NOT complete a handshake
with degenerate keys. Implementations MUST NOT discard other "weak"
g^x values.
(Discarding degenerate keys is critical for security; if bad keys
are not discarded, an attacker can substitute the server's CREATED
cell's g^y with 0 or 1, thus creating a known g^xy and impersonating
the server. Discarding other keys may allow attacks to learn bits of
the private key.)
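The degeneracy check described above amounts to a single range test. In this sketch the name is_degenerate is hypothetical and the DH values are plain Python integers.

```python
def is_degenerate(value: int, p: int) -> bool:
    """Return True if a received g^x or g^y must be rejected.

    A usable value is strictly greater than 1 and strictly less
    than p-1, where p is the DH modulus.
    """
    return not (1 < value < p - 1)
```

Only this range test is applied; other "weak" values are deliberately not rejected.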
If CREATE or EXTEND is used to extend a circuit, the client and server
base their key material on K0=g^xy, represented as a big-endian unsigned
integer.
If CREATE_FAST is used, the client and server base their key material on
K0=X|Y.
From the base key material K0, they compute KEY_LEN*2+HASH_LEN*3 bytes of
derivative key data as
K = H(K0 | [00]) | H(K0 | [01]) | H(K0 | [02]) | ...
The first HASH_LEN bytes of K (bytes 1-20) form KH; the next HASH_LEN bytes
(21-40) form the forward digest Df; the next HASH_LEN bytes (41-60) form the
backward digest Db; the next KEY_LEN bytes (61-76) form Kf; and the final
KEY_LEN bytes (77-92) form Kb. Excess bytes from K are discarded.
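The key expansion and split can be sketched as follows. The sketch assumes H is SHA-1 with HASH_LEN=20 and KEY_LEN=16; those values come from the preliminaries of the specification, outside this section. The name derive_keys is hypothetical.

```python
import hashlib

HASH_LEN = 20  # assumed: SHA-1 output length
KEY_LEN = 16   # assumed: symmetric key length


def derive_keys(k0: bytes) -> dict:
    """Compute K = H(K0 | [00]) | H(K0 | [01]) | ... and split it."""
    needed = KEY_LEN * 2 + HASH_LEN * 3
    k = b""
    counter = 0
    while len(k) < needed:
        k += hashlib.sha1(k0 + bytes([counter])).digest()
        counter += 1
    offsets = {}
    pos = 0
    for name, size in (("KH", HASH_LEN), ("Df", HASH_LEN), ("Db", HASH_LEN),
                       ("Kf", KEY_LEN), ("Kb", KEY_LEN)):
        offsets[name] = k[pos:pos + size]
        pos += size
    return offsets  # bytes of K beyond `needed` are discarded
```

For CREATE_FAST, k0 would be X|Y; for CREATE or EXTEND, it would be the big-endian encoding of g^xy.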
KH is used in the handshake response to demonstrate knowledge of the
computed shared key. Df is used to seed the integrity-checking hash
for the stream of data going from the OP to the OR, and Db seeds the
integrity-checking hash for the data stream from the OR to the OP. Kf
is used to encrypt the stream of data going from the OP to the OR, and
Kb is used to encrypt the stream of data going from the OR to the OP.
5.3. Creating circuits
When creating a circuit through the network, the circuit creator
(OP) performs the following steps:
1. Choose an onion router as an exit node (R_N), such that the onion
router's exit policy includes at least one pending stream that
needs a circuit (if there are any).
2. Choose a chain of (N-1) onion routers
(R_1...R_N-1) to constitute the path, such that no router
appears in the path twice.
3. If not already connected to the first router in the chain,
open a new connection to that router.
4. Choose a circID not already in use on the connection with the
first router in the chain; send a CREATE cell along the
connection, to be received by the first onion router.
5. Wait until a CREATED cell is received; finish the handshake
and extract the forward key Kf_1 and the backward key Kb_1.
6. For each subsequent onion router R (R_2 through R_N), extend
the circuit to R.
To extend the circuit by a single onion router R_M, the OP performs
these steps:
1. Create an onion skin, encrypted to R_M's public onion key.
2. Send the onion skin in a relay EXTEND cell along
the circuit (see section 5).
3. When a relay EXTENDED cell is received, verify KH, and
calculate the shared keys. The circuit is now extended.
When an onion router receives an EXTEND relay cell, it sends a CREATE
cell to the next onion router, with the enclosed onion skin as its
payload. As special cases, if the extend cell includes a digest of
all zeroes, or asks to extend back to the relay that sent the extend
cell, the circuit will fail and be torn down. The initiating onion
router chooses some circID not yet used on the connection between the
two onion routers. (But see section 5.1 above, concerning choosing
circIDs based on the ORs' public identity keys.)
When an onion router receives a CREATE cell, if it already has a
circuit on the given connection with the given circID, it drops the
cell. Otherwise, after receiving the CREATE cell, it completes the
DH handshake, and replies with a CREATED cell. Upon receiving a
CREATED cell, an onion router packs its payload into an EXTENDED relay
cell (see section 5), and sends that cell up the circuit. Upon
receiving the EXTENDED relay cell, the OP can retrieve g^y.
(As an optimization, OR implementations may delay processing onions
until a break in traffic allows time to do so without harming
network latency too greatly.)
5.3.1. Canonical connections
It is possible for an attacker to launch a man-in-the-middle attack
against a connection by telling OR Alice to extend to OR Bob at some
address X controlled by the attacker. The attacker cannot read the
encrypted traffic, but the attacker is now in a position to count all
bytes sent between Alice and Bob (assuming Alice was not already
connected to Bob.)
To prevent this, when an OR gets an extend request, it SHOULD use an
existing OR connection if the ID matches, and ANY of the following
conditions hold:
- The IP matches the requested IP.
- The OR knows that the IP of the connection it's using is canonical
because it was listed in the NETINFO cell.
- The OR knows that the IP of the connection it's using is canonical
because it was listed in the server descriptor.
[This is not implemented in Tor 0.2.0.23-rc.]
5.4. Tearing down circuits
Circuits are torn down when an unrecoverable error occurs along
the circuit, or when all streams on a circuit are closed and the
circuit's intended lifetime is over. Circuits may be torn down
either completely or hop-by-hop.
To tear down a circuit completely, an OR or OP sends a DESTROY
cell to the adjacent nodes on that circuit, using the appropriate
direction's circID.
Upon receiving an outgoing DESTROY cell, an OR frees resources
associated with the corresponding circuit. If it's not the end of
the circuit, it sends a DESTROY cell for that circuit to the next OR
in the circuit. If the node is the end of the circuit, then it tears
down any associated edge connections (see section 6.1).
After a DESTROY cell has been processed, an OR ignores all data or
destroy cells for the corresponding circuit.
To tear down part of a circuit, the OP may send a RELAY_TRUNCATE cell
signaling a given OR (Stream ID zero). That OR sends a DESTROY
cell to the next node in the circuit, and replies to the OP with a
RELAY_TRUNCATED cell.
When an unrecoverable error occurs along one connection in a
circuit, the nodes on either side of the connection should, if they
are able, act as follows: the node closer to the OP should send a
RELAY_TRUNCATED cell towards the OP; the node farther from the OP
should send a DESTROY cell down the circuit.
The payload of a RELAY_TRUNCATED or DESTROY cell contains a single octet,
describing why the circuit is being closed or truncated. When sending a
TRUNCATED or DESTROY cell because of another TRUNCATED or DESTROY cell,
the error code should be propagated. The origin of a circuit always sets
this error code to 0, to avoid leaking its version.
The error codes are:
0 -- NONE (No reason given.)
1 -- PROTOCOL (Tor protocol violation.)
2 -- INTERNAL (Internal error.)
3 -- REQUESTED (A client sent a TRUNCATE command.)
4 -- HIBERNATING (Not currently operating; trying to save bandwidth.)
5 -- RESOURCELIMIT (Out of memory, sockets, or circuit IDs.)
6 -- CONNECTFAILED (Unable to reach server.)
7 -- OR_IDENTITY (Connected to server, but its OR identity was not
as expected.)
8 -- OR_CONN_CLOSED (The OR connection that was carrying this circuit
died.)
9 -- FINISHED (The circuit has expired for being dirty or old.)
10 -- TIMEOUT (Circuit construction took too long)
11 -- DESTROYED (The circuit was destroyed w/o client TRUNCATE)
12 -- NOSUCHSERVICE (Request for unknown hidden service)
5.5. Routing relay cells
When an OR receives a RELAY or RELAY_EARLY cell, it checks the cell's
circID and determines whether it has a corresponding circuit along that
connection. If not, the OR drops the cell.
Otherwise, if the OR is not at the OP edge of the circuit (that is,
either an 'exit node' or a non-edge node), it de/encrypts the payload
with the stream cipher, as follows:
'Forward' relay cell (same direction as CREATE):
Use Kf as key; decrypt.
'Back' relay cell (opposite direction from CREATE):
Use Kb as key; encrypt.
Note that in counter mode, decrypt and encrypt are the same operation.
The OR then decides whether it recognizes the relay cell, by
inspecting the payload as described in section 6.1 below. If the OR
recognizes the cell, it processes the contents of the relay cell.
Otherwise, it passes the decrypted relay cell along the circuit if
the circuit continues. If the OR at the end of the circuit
encounters an unrecognized relay cell, an error has occurred: the OR
sends a DESTROY cell to tear down the circuit.
When a relay cell arrives at an OP, the OP decrypts the payload
with the stream cipher as follows:
OP receives data cell:
For I=N...1,
Decrypt with Kb_I. If the payload is recognized (see
section 6.1), then stop and process the payload.
For more information, see section 6 below.
5.6. Handling relay_early cells
A RELAY_EARLY cell is designed to limit the length any circuit can reach.
When an OR receives a RELAY_EARLY cell, and the next node in the circuit
is speaking v2 of the link protocol or later, the OR relays the cell as a
RELAY_EARLY cell. Otherwise, it relays it as a RELAY cell.
If a node ever receives more than 8 RELAY_EARLY cells on a given
outbound circuit, it SHOULD close the circuit. (For historical reasons,
we don't limit the number of inbound RELAY_EARLY cells; they should
be harmless anyway because clients won't accept extend requests. See
bug 1038.)
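The per-circuit limit can be pictured with a small counter. This is a sketch only; the class name OutboundCircuit and its attributes are hypothetical.

```python
MAX_RELAY_EARLY = 8  # limit given in the text above


class OutboundCircuit:
    """Tracks RELAY_EARLY cells seen on one outbound circuit."""

    def __init__(self) -> None:
        self.relay_early_seen = 0
        self.closed = False

    def on_relay_early(self) -> bool:
        """Count one RELAY_EARLY cell; return False once the circuit is closed."""
        self.relay_early_seen += 1
        if self.relay_early_seen > MAX_RELAY_EARLY:
            self.closed = True  # more than 8 cells: SHOULD close the circuit
        return not self.closed
```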
When speaking v2 of the link protocol or later, clients MUST only send
EXTEND cells inside RELAY_EARLY cells. Clients SHOULD send the first ~8
RELAY cells that are not targeted at the first hop of any circuit as
RELAY_EARLY cells too, in order to partially conceal the circuit length.
[In a future version of Tor, servers will reject any EXTEND cell not
received in a RELAY_EARLY cell. See proposal 110.]
6. Application connections and stream management
6.1. Relay cells
Within a circuit, the OP and the exit node use the contents of
RELAY packets to tunnel end-to-end commands and TCP connections
("Streams") across circuits. End-to-end commands can be initiated
by either edge; streams are initiated by the OP.
The payload of each unencrypted RELAY cell consists of:
Relay command [1 byte]
'Recognized' [2 bytes]
StreamID [2 bytes]
Digest [4 bytes]
Length [2 bytes]
Data [CELL_LEN-14 bytes]
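The layout above maps onto a fixed 11-byte header followed by the data area. The sketch below assumes CELL_LEN is 512, a value defined elsewhere in the specification; the function names are hypothetical.

```python
import struct

CELL_LEN = 512                            # assumed fixed cell size
RELAY_DATA_LEN = CELL_LEN - 14            # size of the Data field
RELAY_HEADER = struct.Struct(">BHH4sH")   # command, recognized, stream, digest, length


def pack_relay_payload(command: int, stream_id: int, data: bytes,
                       digest: bytes = b"\x00" * 4, recognized: int = 0) -> bytes:
    """Build an unencrypted relay payload, padding the data with NUL bytes."""
    if len(data) > RELAY_DATA_LEN:
        raise ValueError("data does not fit in one relay cell")
    header = RELAY_HEADER.pack(command, recognized, stream_id, digest, len(data))
    return header + data.ljust(RELAY_DATA_LEN, b"\x00")


def unpack_relay_payload(payload: bytes) -> tuple:
    """Split a relay payload into its fields, trimming padding by Length."""
    command, recognized, stream_id, digest, length = RELAY_HEADER.unpack_from(payload)
    start = RELAY_HEADER.size
    return command, recognized, stream_id, digest, payload[start:start + length]
```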
The relay commands are:
1 -- RELAY_BEGIN [forward]
2 -- RELAY_DATA [forward or backward]
3 -- RELAY_END [forward or backward]
4 -- RELAY_CONNECTED [backward]
5 -- RELAY_SENDME [forward or backward] [sometimes control]
6 -- RELAY_EXTEND [forward] [control]
7 -- RELAY_EXTENDED [backward] [control]
8 -- RELAY_TRUNCATE [forward] [control]
9 -- RELAY_TRUNCATED [backward] [control]
10 -- RELAY_DROP [forward or backward] [control]
11 -- RELAY_RESOLVE [forward]
12 -- RELAY_RESOLVED [backward]
13 -- RELAY_BEGIN_DIR [forward]
32..40 -- Used for hidden services; see rend-spec.txt.
Commands labelled as "forward" must only be sent by the originator
of the circuit. Commands labelled as "backward" must only be sent by
other nodes in the circuit back to the originator. Commands marked
as either can be sent either by the originator or other nodes.
The 'recognized' field in any unencrypted relay payload is always set
to zero; the 'digest' field is computed as the first four bytes of
the running digest of all the bytes that have been destined for
this hop of the circuit or originated from this hop of the circuit,
seeded from Df or Db respectively (obtained in section 5.2 above),
and including this RELAY cell's entire payload (taken with the digest
field set to zero).
When the 'recognized' field of a RELAY cell is zero, and the digest
is correct, the cell is considered "recognized" for the purposes of
decryption (see section 5.5 above).
(The digest does not include any bytes from relay cells that do
not start or end at this hop of the circuit. That is, it does not
include forwarded data. Therefore if 'recognized' is zero but the
digest does not match, the running digest at that node should
not be updated, and the cell should be forwarded on.)
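The recognition test and the rule about not updating the running digest can be sketched as below. The sketch assumes SHA-1 as the hash and a running hashlib object that was seeded earlier with Df or Db; check_recognized is a hypothetical name.

```python
import hashlib


def check_recognized(running, payload: bytes):
    """Return (recognized, digest_state) for a decrypted relay payload.

    `running` is a hashlib object holding the running digest for this hop.
    It is left untouched unless the cell is recognized.
    """
    if payload[1:3] != b"\x00\x00":            # 'Recognized' field must be zero
        return False, running
    zeroed = payload[:5] + b"\x00" * 4 + payload[9:]   # digest field set to zero
    trial = running.copy()                     # do not disturb the real state yet
    trial.update(zeroed)
    if trial.digest()[:4] != payload[5:9]:     # first four bytes must match
        return False, running                  # forwarded data: digest not updated
    return True, trial                         # recognized: adopt the new state
```

Working on a copy is one simple way to ensure that forwarded cells never affect the running digest.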
All RELAY cells pertaining to the same tunneled stream have the
same stream ID. StreamIDs are chosen arbitrarily by the OP. RELAY
cells that affect the entire circuit rather than a particular
stream use a StreamID of zero -- they are marked in the table above
as "[control]" style cells. (Sendme cells are marked as "sometimes
control" because they can include a StreamID or not depending
on their purpose -- see Section 7.)
The 'Length' field of a relay cell contains the number of bytes in
the relay payload which contain real payload data. The remainder of
the payload is padded with NUL bytes.
If the RELAY cell is recognized but the relay command is not
understood, the cell must be dropped and ignored. Its contents
still count with respect to the digests, though.
6.2. Opening streams and transferring data
To open a new anonymized TCP connection, the OP chooses an open
circuit to an exit that may be able to connect to the destination
address, selects an arbitrary StreamID not yet used on that circuit,
and constructs a RELAY_BEGIN cell with a payload encoding the address
and port of the destination host. The payload format is:
ADDRESS | ':' | PORT | [00]
where ADDRESS can be a DNS hostname, or an IPv4 address in
dotted-quad format, or an IPv6 address surrounded by square brackets;
and where PORT is a decimal integer between 1 and 65535, inclusive.
[What is the [00] for? -NM]
[It's so the payload is easy to parse out with string funcs -RD]
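For illustration, the RELAY_BEGIN payload format could be produced like this. The function name build_begin_payload is hypothetical, and hostnames are assumed to be plain ASCII.

```python
import ipaddress


def build_begin_payload(address: str, port: int) -> bytes:
    """Return ADDRESS | ':' | PORT | [00] for a RELAY_BEGIN cell."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        host = address                      # not an IP literal: treat as hostname
    else:
        # IPv6 addresses are surrounded by square brackets; IPv4 is dotted-quad.
        host = "[%s]" % ip if ip.version == 6 else str(ip)
    return ("%s:%d" % (host, port)).encode("ascii") + b"\x00"
```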
Upon receiving this cell, the exit node resolves the address as
necessary, and opens a new TCP connection to the target port. If the
address cannot be resolved, or a connection can't be established, the
exit node replies with a RELAY_END cell. (See 6.3 below.)
Otherwise, the exit node replies with a RELAY_CONNECTED cell, whose
payload is in one of the following formats:
The IPv4 address to which the connection was made [4 octets]
A number of seconds (TTL) for which the address may be cached [4 octets]
or
Four zero-valued octets [4 octets]
An address type (6) [1 octet]
The IPv6 address to which the connection was made [16 octets]
A number of seconds (TTL) for which the address may be cached [4 octets]
[XXXX No version of Tor currently generates the IPv6 format.]
[Tor servers before 0.1.2.0 set the TTL field to a fixed value. Later
versions set the TTL to the last value seen from a DNS server, and expire
their own cached entries after a fixed interval. This prevents certain
attacks.]
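A receiver could distinguish the two RELAY_CONNECTED formats as sketched below. The sketch assumes the TTL is a big-endian integer, which the text does not state, and parse_connected_payload is a hypothetical name.

```python
import ipaddress
import struct


def parse_connected_payload(payload: bytes) -> tuple:
    """Return (address, ttl) from a RELAY_CONNECTED payload."""
    # IPv6 form: four zero octets, address type 6, 16-octet address, TTL.
    if len(payload) >= 25 and payload[:4] == b"\x00" * 4 and payload[4] == 6:
        addr = ipaddress.IPv6Address(payload[5:21])
        (ttl,) = struct.unpack(">I", payload[21:25])   # byte order assumed
        return addr, ttl
    # IPv4 form: 4-octet address followed by the TTL.
    if len(payload) >= 8:
        addr = ipaddress.IPv4Address(payload[:4])
        (ttl,) = struct.unpack(">I", payload[4:8])     # byte order assumed
        return addr, ttl
    raise ValueError("RELAY_CONNECTED payload too short")
```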
The OP waits for a RELAY_CONNECTED cell before sending any data.
Once a connection has been established, the OP and exit node
package stream data in RELAY_DATA cells, and upon receiving such
cells, echo their contents to the corresponding TCP stream.
RELAY_DATA cells sent to unrecognized streams are dropped.
RELAY_DROP cells are long-range dummies; upon receiving such
a cell, the OR or OP must drop it.
6.2.1. Opening a directory stream
If a Tor server is a directory server, it should respond to a
RELAY_BEGIN_DIR cell as if it had received a BEGIN cell requesting a
connection to its directory port. RELAY_BEGIN_DIR cells ignore exit
policy, since the stream is local to the Tor process.
If the Tor server is not running a directory service, it should respond
with a REASON_NOTDIRECTORY RELAY_END cell.
Clients MUST generate an all-zero payload for RELAY_BEGIN_DIR cells,
and servers MUST ignore the payload.
[RELAY_BEGIN_DIR was not supported before Tor 0.1.2.2-alpha; clients
SHOULD NOT send it to routers running earlier versions of Tor.]
6.3. Closing streams
When an anonymized TCP connection is closed, or an edge node
encounters an error on any stream, it sends a 'RELAY_END' cell along the
circuit (if possible) and closes the TCP connection immediately. If
an edge node receives a 'RELAY_END' cell for any stream, it closes
the TCP connection completely, and sends nothing more along the
circuit for that stream.
The payload of a RELAY_END cell begins with a single 'reason' byte to
describe why the stream is closing, plus optional data (depending on
the reason.) The values are:
1 -- REASON_MISC (catch-all for unlisted reasons)
2 -- REASON_RESOLVEFAILED (couldn't look up hostname)
3 -- REASON_CONNECTREFUSED (remote host refused connection) [*]
4 -- REASON_EXITPOLICY (OR refuses to connect to host or port)
5 -- REASON_DESTROY (Circuit is being destroyed)
6 -- REASON_DONE (Anonymized TCP connection was closed)
7 -- REASON_TIMEOUT (Connection timed out, or OR timed out
while connecting)
8 -- (unallocated) [**]
9 -- REASON_HIBERNATING (OR is temporarily hibernating)
10 -- REASON_INTERNAL (Internal error at the OR)
11 -- REASON_RESOURCELIMIT (OR has no resources to fulfill request)
12 -- REASON_CONNRESET (Connection was unexpectedly reset)
13 -- REASON_TORPROTOCOL (Sent when closing connection because of
Tor protocol violations.)
14 -- REASON_NOTDIRECTORY (Client sent RELAY_BEGIN_DIR to a
non-directory server.)
(With REASON_EXITPOLICY, the 4-byte IPv4 address or 16-byte IPv6 address
forms the optional data, along with a 4-byte TTL; no other reason
currently has extra data.)
OPs and ORs MUST accept reasons not on the above list, since future
versions of Tor may provide more fine-grained reasons.
Tors SHOULD NOT send any reason except REASON_MISC for a stream that they
have originated.
[*] Older versions of Tor also send this reason when connections are
reset.
[**] Due to a bug in versions of Tor through 0.0.9.5, error reason 8 must
remain allocated until that version is obsolete.
--- [The rest of this section describes unimplemented functionality.]
Because TCP connections can be half-open, we follow an equivalent
to TCP's FIN/FIN-ACK/ACK protocol to close streams.
An exit connection can have a TCP stream in one of three states:
'OPEN', 'DONE_PACKAGING', and 'DONE_DELIVERING'. For the purposes
of modeling transitions, we treat 'CLOSED' as a fourth state,
although connections in this state are not, in fact, tracked by the
onion router.
A stream begins in the 'OPEN' state. Upon receiving a 'FIN' from
the corresponding TCP connection, the edge node sends a 'RELAY_FIN'
cell along the circuit and changes its state to 'DONE_PACKAGING'.
Upon receiving a 'RELAY_FIN' cell, an edge node sends a 'FIN' to
the corresponding TCP connection (e.g., by calling
shutdown(SHUT_WR)) and changes its state to 'DONE_DELIVERING'.
When a stream already in 'DONE_DELIVERING' receives a 'FIN', it
also sends a 'RELAY_FIN' along the circuit, and changes its state
to 'CLOSED'. When a stream already in 'DONE_PACKAGING' receives a
'RELAY_FIN' cell, it sends a 'FIN' and changes its state to
'CLOSED'.
If an edge node encounters an error on any stream, it sends a
'RELAY_END' cell (if possible) and closes the stream immediately.
6.4. Remote hostname lookup
To find the address associated with a hostname, the OP sends a
RELAY_RESOLVE cell containing the hostname to be resolved with a nul
terminating byte. (For a reverse lookup, the OP sends a RELAY_RESOLVE
cell containing an in-addr.arpa address.) The OR replies with a
RELAY_RESOLVED cell containing a status byte, and any number of
answers. Each answer is of the form:
Type (1 octet)
Length (1 octet)
Value (variable-width)
TTL (4 octets)
"Length" is the length of the Value field.
"Type" is one of:
0x00 -- Hostname
0x04 -- IPv4 address
0x06 -- IPv6 address
0xF0 -- Error, transient
0xF1 -- Error, nontransient
If any answer has a type of 'Error', then no other answer may be given.
The RELAY_RESOLVE cell must use a nonzero, distinct streamID; the
corresponding RELAY_RESOLVED cell must use the same streamID. No stream
is actually created by the OR when resolving the name.
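The answer format can be walked with a simple loop. In this sketch the input is assumed to be only the sequence of answers, with any leading status byte already removed by the caller, and the TTL is assumed to be big-endian; parse_resolved_answers is a hypothetical name.

```python
import struct


def parse_resolved_answers(data: bytes) -> list:
    """Parse a sequence of Type | Length | Value | TTL answers.

    Returns a list of (type, value, ttl) tuples.
    """
    answers = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated answer header")
        answer_type, length = data[pos], data[pos + 1]
        end = pos + 2 + length + 4          # header + value + 4-octet TTL
        if end > len(data):
            raise ValueError("truncated answer")
        value = data[pos + 2:pos + 2 + length]
        (ttl,) = struct.unpack(">I", data[end - 4:end])   # byte order assumed
        answers.append((answer_type, value, ttl))
        pos = end
    return answers
```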
7. Flow control
7.1. Link throttling
Each client or relay should do appropriate bandwidth throttling to
keep its user happy.
Communicants rely on TCP's default flow control to push back when they
stop reading.
The mainline Tor implementation uses token buckets (one for reads,
one for writes) for the rate limiting.
Since 0.2.0.x, Tor has let the user specify an additional pair of
token buckets for "relayed" traffic, so people can deploy a Tor relay
with strict rate limiting, but also use the same Tor as a client. To
avoid partitioning concerns we combine both classes of traffic over a
given OR connection, and keep track of the last time we read or wrote
a high-priority (non-relayed) cell. If it's been less than N seconds
(currently N=30), we give the whole connection high priority, else we
give the whole connection low priority. We also give low priority
to reads and writes for connections that are serving directory
information. See proposal 111 for details.
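For readers unfamiliar with token buckets, a generic sketch follows. It illustrates the general technique only and is not taken from the Tor source; the class name, parameters, and units are hypothetical.

```python
class TokenBucket:
    """Generic token bucket: `rate` tokens per second, capped at `burst`."""

    def __init__(self, rate: int, burst: int) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst

    def refill(self, elapsed_seconds: float) -> None:
        """Add tokens for the elapsed time, never exceeding the burst size."""
        self.tokens = min(self.burst, self.tokens + int(self.rate * elapsed_seconds))

    def consume(self, wanted: int) -> int:
        """Return how many of the `wanted` bytes may be read or written now."""
        allowed = min(wanted, self.tokens)
        self.tokens -= allowed
        return allowed
```

One bucket would be kept for reads and one for writes, with a second pair for relayed traffic where that option is in use.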
7.2. Link padding
Link padding can be created by sending PADDING cells along the
connection; relay cells of type "DROP" can be used for long-range
padding.
Currently nodes are not required to do any sort of link padding or
dummy traffic. Because strong attacks exist even with link padding,
and because link padding greatly increases the bandwidth requirements
for running a node, we plan to leave out link padding until this
tradeoff is better understood.
7.3. Circuit-level flow control
To control a circuit's bandwidth usage, each OR keeps track of two
'windows', consisting of how many RELAY_DATA cells it is allowed to
originate (package for transmission), and how many RELAY_DATA cells
it is willing to consume (receive for local streams). These limits
do not apply to cells that the OR receives from one host and relays
to another.
Each 'window' value is initially set to 1000 data cells
in each direction (cells that are not data cells do not affect
the window). When an OR is willing to deliver more cells, it sends a
RELAY_SENDME cell towards the OP, with Stream ID zero. When an OR
receives a RELAY_SENDME cell with stream ID zero, it increments its
packaging window.
Each of these cells increments the corresponding window by 100.
The OP behaves identically, except that it must track a packaging
window and a delivery window for every OR in the circuit.
An OR or OP sends cells to increment its delivery window when the
corresponding window value falls under some threshold (900).
If a packaging window reaches 0, the OR or OP stops reading from
TCP connections for all streams on the corresponding circuit, and
sends no more RELAY_DATA cells until receiving a RELAY_SENDME cell.
[this stuff is badly worded; copy in the tor-design section -RD]
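The window bookkeeping for a single OR might look like the sketch below. The class and method names are hypothetical, and "falls under some threshold (900)" is read here as "reaches or drops below 900", mirroring the "<=" comparison that section 7.4 states for streams; that reading is an assumption.

```python
CIRC_WINDOW_START = 1000
CIRC_WINDOW_INCREMENT = 100
CIRC_SENDME_THRESHOLD = 900


class CircuitWindows:
    """Packaging and delivery windows for one circuit at one OR."""

    def __init__(self) -> None:
        self.package = CIRC_WINDOW_START
        self.deliver = CIRC_WINDOW_START

    def can_package(self) -> bool:
        """False means: stop reading from TCP for all streams on the circuit."""
        return self.package > 0

    def on_data_packaged(self) -> None:
        if not self.can_package():
            raise RuntimeError("packaging window is empty")
        self.package -= 1

    def on_sendme_received(self) -> None:
        self.package += CIRC_WINDOW_INCREMENT

    def on_data_delivered(self) -> bool:
        """Return True if a RELAY_SENDME (Stream ID zero) should be sent now."""
        self.deliver -= 1
        if self.deliver <= CIRC_SENDME_THRESHOLD:   # threshold reading is assumed
            self.deliver += CIRC_WINDOW_INCREMENT
            return True
        return False
```

Only RELAY_DATA cells would be passed to these methods, since other cells do not affect the windows.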
7.4. Stream-level flow control
Edge nodes use RELAY_SENDME cells to implement end-to-end flow
control for individual connections across circuits. Similarly to
circuit-level flow control, edge nodes begin with a window of cells
(500) per stream, and increment the window by a fixed value (50)
upon receiving a RELAY_SENDME cell. Edge nodes initiate RELAY_SENDME
cells when both a) the window is <= 450, and b) there are fewer than
ten cell payloads remaining to be flushed at that edge.
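The two conditions for initiating a stream-level RELAY_SENDME reduce to one predicate. The function and parameter names in this sketch are hypothetical.

```python
STREAM_WINDOW_START = 500
STREAM_WINDOW_INCREMENT = 50


def should_send_stream_sendme(window: int, payloads_to_flush: int) -> bool:
    """True when a) the window is <= 450 and b) fewer than ten cell
    payloads remain to be flushed at this edge."""
    threshold = STREAM_WINDOW_START - STREAM_WINDOW_INCREMENT  # 450
    return window <= threshold and payloads_to_flush < 10
```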
A.1. Differences between spec and implementation
- The current specification requires all ORs to have IPv4 addresses, but
allows servers to exit and resolve to IPv6 addresses, and to declare IPv6
addresses in their exit policies. The current codebase has no IPv6
support at all.

@ -1,45 +0,0 @@
$Id$
HOW TOR VERSION NUMBERS WORK
1. The Old Way
Before 0.1.0, versions were of the format:
MAJOR.MINOR.MICRO(status(PATCHLEVEL))?(-cvs)?
where MAJOR, MINOR, MICRO, and PATCHLEVEL are numbers, status is one
of "pre" (for an alpha release), "rc" (for a release candidate), or
"." for a release. As a special case, "a.b.c" was equivalent to
"a.b.c.0". We compare the elements in order (major, minor, micro,
status, patchlevel, cvs), with "cvs" preceding non-cvs.
We would start each development branch with a final version in mind:
say, "0.0.8". Our first pre-release would be "0.0.8pre1", followed by
(for example) "0.0.8pre2-cvs", "0.0.8pre2", "0.0.8pre3-cvs",
"0.0.8rc1", "0.0.8rc2-cvs", and "0.0.8rc2". Finally, we'd release
0.0.8. The stable CVS branch would then be versioned "0.0.8.1-cvs",
and any eventual bugfix release would be "0.0.8.1".
2. The New Way
After 0.1.0, versions are of the format:
MAJOR.MINOR.MICRO(.PATCHLEVEL)(-status_tag)
The stuff in parentheses is optional. As before, MAJOR, MINOR, MICRO,
and PATCHLEVEL are numbers, with an absent number equivalent to 0.
All versions should be distinguishable purely by those four
numbers. The status tag is purely informational, and lets you know how
stable we think the release is: "alpha" is pretty unstable; "rc" is a
release candidate; and no tag at all means that we have a final
release. If the tag ends with "-cvs" or "-dev", you're looking at a
development snapshot that came after a given release. If we *do*
encounter two versions that differ only by status tag, we compare them
lexically.
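The new-style format and its comparison rule can be sketched with a regular expression and tuple comparison. This is an illustration only; parse_version is a hypothetical name, and old-style versions such as "0.0.8pre1" are deliberately not handled.

```python
import re

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?(?:-(.+))?$")


def parse_version(text: str) -> tuple:
    """Parse MAJOR.MINOR.MICRO(.PATCHLEVEL)(-status_tag) into a sortable tuple.

    An absent PATCHLEVEL counts as 0. The status tag is compared lexically,
    and only matters when all four numbers are equal.
    """
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError("not a new-style Tor version: %r" % text)
    major, minor, micro, patch, tag = match.groups()
    return (int(major), int(minor), int(micro), int(patch or 0), tag or "")
```

Because Python compares tuples element by element, the resulting tuples can be compared or sorted directly.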
Now, we start each development branch with (say) 0.1.1.1-alpha. The
patchlevel increments consistently as the status tag changes, for
example, as in: 0.1.1.2-alpha, 0.1.1.3-alpha, 0.1.1.4-rc, 0.1.1.5-rc.
Eventually, we release 0.1.1.6. The next patch release is 0.1.1.7.
Between these releases, CVS is versioned with a -cvs tag: after
0.1.1.1-alpha comes 0.1.1.1-alpha-cvs, and so on. But starting with
0.1.2.1-alpha-dev, we switched to SVN and started using the "-dev"
suffix instead of the "-cvs" suffix.