Network Working Group                                            R. Peon
Internet-Draft                                            Facebook, Inc.
Intended status: Informational                                 J. Pinner
Expires: July 21, 2018                                        Lyft, Inc.
                                                        January 17, 2018


                     Proposal for QUIC Abstractions
                 draft-peon-pinner-quic-abstractions-03

Abstract

   This document proposes abstraction layers for QUIC and makes
   recommendations for the v1 draft.

Note to Readers

   Discussion of this draft takes place on the QUIC working group
   mailing list (quic@ietf.org), which is archived at
   https://mailarchive.ietf.org/arch/search/?email_list=quic [1].

   Working Group information can be found at https://github.com/quicwg
   [2]; source code and issues list for this draft can be found at
   https://github.com/quicwg/base-drafts/labels/-http [3].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 21, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.


Peon & Pinner             Expires July 21, 2018                 [Page 1]

Internet-Draft                     I-D                      January 2018


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

1.  Introduction

   This document proposes five layers of abstraction for QUIC: QUIC
   (packets), QUIC connections, QUIC streams, 'H3', and HTTP on QUIC.

2.  Abstractions

2.1.  QUIC provides:

   o  Packets

   o  MTU discovery (packet sizing)

   o  Version negotiation

   o  Packet loss detection

   o  A cryptographic context enabling data encryption within a packet

   o  Zero-RTT connection establishment with limited data payload

   o  One-RTT connection establishment

2.2.  QUIC Connections provide:

   o  Identification of the connection including a Connection ID in
      addition to the 5-tuple

   o  Alternate connection IDs (connection ID 'renaming') without
      requiring connection re-establishment

   o  Multiplexed streams, free of head-of-line (HoL) blocking between
      streams

   o  Congestion control on a per-path basis

   o  Data (not packet) retransmission

   o  Flow control on a per-connection basis

   o  Mechanisms to prove liveness and measure RTT

2.3.  QUIC Streams provide:

   o  Flow control on a per-stream basis

   o  Ordered but not necessarily in-order bytestreams

   o  Grouping: a statement that streams should be delivered to the same
      endpoint through proxies

   o  Data frames

   o  Support for non-data frames

2.4.  'H3' provides:

   o  Flow-controlled headers frames on streams

   o  Robust compression of header data that trades off HoL blocking
      against compression efficiency

2.5.  HTTP on QUIC:

   o  Maps requests to streams using H3

   o  Defines restrictions on header/data frame sequencing in line with
      HTTP semantics

2.6.  APIs above these layers:

   APIs above these layers will then determine how and when data is
   presented to the application.  This includes deciding whether to
   present ordered data in order (i.e., socket-like) or as if it were a
   file (ordered but not necessarily in-order), and when to request
   retransmissions or discards (i.e., fully or partially reliable
   delivery).

   Note that HTTP does not imply reliable delivery; HTTP implies
   request-response semantics.
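
   The two presentations of an ordered stream can be sketched in
   Python.  StreamReassembly is a hypothetical helper, not part of any
   QUIC API, and it assumes received chunks never overlap.
   read_in_order behaves like a socket, while read_as_file exposes each
   received chunk at its offset, gaps included.

```python
from typing import Dict, List, Tuple


class StreamReassembly:
    """Two ways to present one ordered stream (illustrative sketch).

    Assumes chunks never overlap; a real implementation must handle that.
    """

    def __init__(self) -> None:
        self._chunks: Dict[int, bytes] = {}  # offset -> data not yet read
        self._read_offset = 0

    def on_data(self, offset: int, data: bytes) -> None:
        self._chunks[offset] = data

    def read_in_order(self) -> bytes:
        """Socket-like: return only the contiguous prefix not yet read."""
        out = bytearray()
        while self._read_offset in self._chunks:
            chunk = self._chunks.pop(self._read_offset)
            out += chunk
            self._read_offset += len(chunk)
        return bytes(out)

    def read_as_file(self) -> List[Tuple[int, bytes]]:
        """File-like: every unread chunk with its offset, gaps allowed."""
        return sorted(self._chunks.items())
```

   With a gap at offset 0, read_in_order returns nothing while
   read_as_file already exposes the later bytes; which view to offer is
   exactly the API-level decision described above.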

3.  Deeper explanations

3.1.  QUIC (Packets):

   In order to establish connections, QUIC sends packets before QUIC
   connections can be confirmed to be established.  The QUIC-layer
   abstraction thus includes all parts necessary to operate on a
   per-packet basis without already being in the context of a QUIC
   connection.

   QUIC packets are UDP datagrams.  These may or may not have a 1:1
   correspondence to IP packets based on path MTU estimation and IP
   fragmentation.

   Payload data is AEAD-encrypted; only minimal routing data is left
   unencrypted.  In particular, this means that acknowledgements (and
   thus congestion control and loss recovery) are end-to-end rather
   than hop-by-hop.

   Packets are NOT reliably delivered or retransmitted.  Some of the
   application payload carried by a packet MAY be retransmitted but that
   is not required.

   Note that this does not preclude the L2 layer from performing its
   own retransmissions; duplicate packets may therefore be received even
   when the sender did not send duplicates.

   All other intermediaries must "participate" in the QUIC connection:
   they must be "terminating" intermediaries that hold the encryption
   keys necessary to terminate connections.  Tunneling L5-over-L5 still
   requires an initial connection to be terminated at the proxy.

   All packets sent before the 1-RTT keys are established for a
   connection must be versioned.  The location of the version number in
   these packets must be static across all versions of the protocol.
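
   The fixed-location requirement can be illustrated with a minimal
   Python sketch.  The header layout is a hypothetical assumption
   loosely modeled on the long header of contemporary transport drafts
   (a flag bit in the first byte, a 64-bit connection ID, then a 32-bit
   version); neither the layout nor the name read_version comes from
   this document.

```python
import struct
from typing import Optional

# Hypothetical layout (an assumption, not defined by this document):
# byte 0 carries a "long header" flag in its high bit, bytes 1-8 hold a
# 64-bit connection ID, and bytes 9-12 hold a 32-bit version number.
LONG_HEADER_BIT = 0x80
VERSION_OFFSET = 9
VERSION_LEN = 4


def read_version(datagram: bytes) -> Optional[int]:
    """Return the version of a long-header packet, or None otherwise.

    Because VERSION_OFFSET is the same in every protocol version, this
    works even for versions the reader does not implement.
    """
    if len(datagram) < VERSION_OFFSET + VERSION_LEN:
        return None
    if not datagram[0] & LONG_HEADER_BIT:
        return None  # short header: in this sketch, carries no version
    (version,) = struct.unpack_from("!I", datagram, VERSION_OFFSET)
    return version
```

   Because the offset never moves, a server can read the version of a
   packet from a version it does not support and respond with version
   negotiation instead of silently dropping it.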

3.2.  QUIC Connections

   QUIC connections may be created between two endpoints communicating
   over UDP.  A QUIC connection consists of a shared cryptographic
   context and a set of multiplexed "streams".  Connections are created
   through a combined cryptographic and transport handshake that is
   capable of providing 0-RTT connection establishment when
   communicating with a known peer.  Finally, in order to be resilient
   to NAT rebindings and changes in network topology, connections may
   persist across changes to the client's or server's IP address and
   port.

   QUIC connections are identified by a set of 64-bit unsigned numbers,
   one chosen randomly by the client and one or more chosen by the
   server, in addition to the "5-tuple" used to identify the underlying
   UDP connection.  The QUIC connection identifiers allow for the client
   and server IP address or port number (or the connection identifier
   itself) to change throughout the lifetime of the connection, while
   still allowing datagrams to be correctly routed between the two
   endpoints.
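
   A minimal Python sketch of the lookup this enables, assuming a
   hypothetical in-memory table on the endpoint: packets are matched to
   connection state by connection ID rather than by 5-tuple, so a change
   of the peer's address does not lose the connection.  The class and
   field names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Address = Tuple[str, int]  # (IP address, UDP port)


@dataclass
class Connection:
    """Hypothetical, minimal per-connection state."""
    connection_ids: Set[int]
    peer_address: Address


@dataclass
class ConnectionTable:
    by_id: Dict[int, Connection] = field(default_factory=dict)

    def add(self, conn: Connection) -> None:
        # Every identifier of the connection maps to the same state.
        for cid in conn.connection_ids:
            self.by_id[cid] = conn

    def lookup(self, cid: int, source: Address) -> Optional[Connection]:
        """Find a connection by ID; follow it if the peer's address changed.

        A real endpoint would validate a new path before trusting it;
        that step is omitted from this sketch.
        """
        conn = self.by_id.get(cid)
        if conn is not None and conn.peer_address != source:
            conn.peer_address = source  # e.g. after a NAT rebinding
        return conn
```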

3.2.1.  0-RTT Connection Establishment

   TLS 1.3 enables 0-RTT, and QUIC endpoints should support it.

   Since packets are not required to arrive in order (or to arrive at
   all), an endpoint may receive 0-RTT data for a connection that has
   yet to be established.  Implementations should make appropriate
   tradeoffs about how much of this data to buffer, so as not to render
   0-RTT connection establishment infeasible in practice.
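
   One way to make the buffering tradeoff concrete is a bounded buffer
   with per-connection and global caps, sketched below in Python.  The
   limits, the oldest-first eviction policy, and the EarlyDataBuffer
   name are assumptions for illustration, not recommendations from this
   document.

```python
from typing import Dict, List

# Hypothetical limits, for illustration only.
MAX_PACKETS_PER_CONNECTION = 4
MAX_BUFFERED_CONNECTIONS = 2


class EarlyDataBuffer:
    """Holds 0-RTT packets that arrive before their connection exists."""

    def __init__(self) -> None:
        # Dicts preserve insertion order, so the first key is the oldest.
        self._pending: Dict[int, List[bytes]] = {}

    def buffer(self, cid: int, packet: bytes) -> bool:
        """Buffer a packet; return False if it had to be dropped."""
        if cid not in self._pending:
            if len(self._pending) >= MAX_BUFFERED_CONNECTIONS:
                oldest = next(iter(self._pending))
                del self._pending[oldest]  # evict the oldest connection
            self._pending[cid] = []
        packets = self._pending[cid]
        if len(packets) >= MAX_PACKETS_PER_CONNECTION:
            return False
        packets.append(packet)
        return True

    def release(self, cid: int) -> List[bytes]:
        """Hand over buffered packets once the connection is established."""
        return self._pending.pop(cid, [])
```

   Larger caps make 0-RTT succeed more often under reordering; smaller
   caps bound the memory an attacker can consume with packets for
   connections that will never be established.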

   An endpoint can always "pretend" it does not have decryption keys for
   0-RTT content.  Servers can always force a fallback to a 1-RTT
   establishment handshake.  The existence of this fallback is important
   since it is the only mechanism for a server to do address validation
   (and thus protect itself from some classes of denial-of-service
   attacks).

3.2.2.  L4 routing and Connection migration: Requires Working Group
        decisions

   While the protocol allows for both connection migration across
   changes of the endpoint's underlying network address and for changes
   of the connection identifiers, it is unclear (under the current
   specification) that connection migration can be implemented in a
   scalable, interoperable manner.

   For data within a QUIC connection to be of utility, packets intended
   to be associated with that connection should flow to a specific
   endpoint.

   For large deployments, there are likely to be a number of L4 load
   balancers deployed to ensure that this happens while utilizing L7
   endpoints effectively.  A set of TCP load balancers in a deployment,
   for instance, would forward packets with the same source IP address
   and port number to the same host, regardless of which load balancer
   received the packet.

   A QUIC connection is determined by both the network address and a set
   of connection identifiers.  As a result, L4 load balancing which uses
   only IP address and port number is insufficient to ensure that
   packets associated with a QUIC connection actually arrive at the
   correct endpoint.  A reasonable solution to this problem might be to
   hash on the connection ID instead of hashing on the network address;
   however, if multiple identifiers are used simultaneously throughout
   the lifetime of the connection, this is insufficient given all
   identifiers would have to hash to the same host.
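
   The difference between hashing on the network address and hashing
   on the connection ID can be sketched in Python.  Rendezvous
   (highest-random-weight) hashing is used here as an assumed stand-in
   for whatever stateless scheme a deployment actually uses; it lets
   independent load balancers agree on a host without sharing state.
   All names are hypothetical.

```python
import hashlib
from typing import Sequence


def pick_backend(key: bytes, backends: Sequence[str]) -> str:
    """Rendezvous (highest-random-weight) hashing.

    Every load balancer that runs this with the same backend list picks
    the same backend for the same key, without sharing any state.
    """
    def weight(backend: str) -> int:
        digest = hashlib.sha256(backend.encode() + b"|" + key).digest()
        return int.from_bytes(digest[:8], "big")

    return max(backends, key=weight)


def tuple_key(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bytes:
    # The protocol (UDP) is the same for every QUIC packet, so it is omitted.
    return f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()


def cid_key(cid: int) -> bytes:
    # A 64-bit connection ID, as described in this document.
    return cid.to_bytes(8, "big")
```

   Keying on tuple_key moves a migrated connection to whichever host its
   new address hashes to.  Keying on cid_key keeps it in place, but only
   for that one identifier: an independently chosen second connection ID
   will generally hash to a different host.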

   There are several strategies that can be employed to solve the L4 LB
   problem with alternate connection-IDs.  The simplest and most
   scalable approach requires shared knowledge between the L4 LB and the
   endpoint of the connection, specifically an encryption key and/or
   cryptographic algorithm.  This allows the L7 endpoint to compute a
   new connection ID that the L4 LB can successfully deliver to the
   correct L7 endpoint.  Other means of making this work (global NAT
   tables in a cluster, distributed NAT tables) require additional hops
   within datacenters and make successful implementations more
   difficult, while also likely decreasing performance.
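
   A minimal Python sketch of the shared-key approach follows.  It
   assumes a key shared out of band between the L4 load balancers and
   the L7 endpoints, 8-byte connection IDs, and 16-bit host identifiers.
   The construction (a random nonce plus an HMAC-derived mask over the
   host identifier) is a hypothetical illustration, not a vetted design.

```python
import hashlib
import hmac
import os

# Assumption: load balancers and L7 endpoints share a key out of band.
# Illustrative construction only; it has not been security-reviewed.
NONCE_LEN = 6  # random bytes at the front of the 8-byte connection ID


def _mask(key: bytes, nonce: bytes) -> int:
    tag = hmac.new(key, nonce, hashlib.sha256).digest()
    return int.from_bytes(tag[:2], "big")


def new_connection_id(key: bytes, server_id: int) -> bytes:
    """Run on the L7 endpoint: mint a fresh ID that routes back to it."""
    if not 0 <= server_id < 1 << 16:
        raise ValueError("server_id must fit in 16 bits")
    nonce = os.urandom(NONCE_LEN)
    hidden = server_id ^ _mask(key, nonce)
    return nonce + hidden.to_bytes(2, "big")


def route(key: bytes, cid: bytes) -> int:
    """Run on any L4 load balancer: recover the server ID, no table needed."""
    nonce, hidden = cid[:NONCE_LEN], cid[NONCE_LEN:]
    return int.from_bytes(hidden, "big") ^ _mask(key, nonce)
```

   Because each identifier carries a fresh nonce, two identifiers minted
   for the same host share no visible bits, yet any load balancer
   holding the key can route both.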

   In order to associate multiple alternative connection IDs with the
   same connection, we must expose some data to the L4 load balancer to
   allow it to correctly map IDs to the expected L7 host.  This data
   could take the form of some structure embedded in the connection
   identifier and agreed upon between all intermediaries on the path,
   for example choosing some number of bits to be used for routing that
   must be identical between all identifiers for a given connection.
   This is most certainly a potential avenue for ossification.
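
   For contrast, the unencrypted variant described above, in which some
   bits of every identifier are fixed per host, can be sketched in
   Python.  The 16-bit split and the function names are assumptions.

```python
import secrets

# Hypothetical convention: the top ROUTING_BITS bits of every 64-bit
# connection ID name the L7 host; the remaining bits are random.
ROUTING_BITS = 16
RANDOM_BITS = 64 - ROUTING_BITS


def make_cid(host: int) -> int:
    """Mint a connection ID whose routing bits select `host`."""
    if not 0 <= host < 1 << ROUTING_BITS:
        raise ValueError("host does not fit in the routing bits")
    return (host << RANDOM_BITS) | secrets.randbits(RANDOM_BITS)


def routing_bits(cid: int) -> int:
    """What every on-path load balancer would read to route a packet."""
    return cid >> RANDOM_BITS
```

   Every identifier for a connection exposes the same routing bits,
   which makes them easier for observers to link and invites middleboxes
   to depend on the layout: the ossification risk noted above.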

   The use of multiple connection IDs to identify a connection is
   provided as a mechanism to prevent a passive observer from
   correlating activity for the same connection across multiple paths
   during connection migration.  It is worth noting that while a client
   may want to use a new connection identifier, it requires the server
   to issue new identifiers, and no mechanism is provided in the
   specification for the client to request them or require the server to
   issue them.  In addition, multi-path support will arguably do a more
   effective job of making packet inspection difficult than having
   multiple connection IDs would, for those connections where multiple
   paths are available.  For connections where multiple paths are not
   available, the client has the option to open multiple connections to
   achieve the same effect.

   W.I.P.: The other argument for multiple connection IDs is not packet
   inspection but privacy, i.e., linkability between IP addresses.  If
   multi-path requires the ability to share connection state between
   multiple paths, could this be extended to the application layer, to
   share state across multiple connections, each with its own
   connection ID?  If so, then there is no privacy concern, since the
   client can instead open one connection per path.

   Recommendation: defer alternative connection IDs to the v2
   specification.

   Even excluding the association of multiple server-selected
   connection IDs with a single connection, the connection is still
   identified by two identifiers: the one randomly selected by the
   client and the one chosen by the server.  Without a mechanism for
   intermediaries to route both identifiers to the same endpoint, load
   balancers must instead perform some form of address translation in
   order to associate both identifiers with the same host.
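
   The address-translation fallback amounts to per-connection state in
   the load balancer.  A minimal Python sketch, assuming (hypothetically)
   that the balancer can observe the server-chosen identifier in the
   host's outgoing packets and assigns new connections round-robin:

```python
import itertools
from typing import Dict, Iterator, List


class TranslatingBalancer:
    """Stateful L4 balancer that remembers which host owns each ID.

    Hypothetical sketch.  A real deployment would have to share or
    replicate this table across every balancer instance.
    """

    def __init__(self, hosts: List[str]) -> None:
        self._owner: Dict[int, str] = {}
        self._next_host: Iterator[str] = itertools.cycle(hosts)

    def on_client_packet(self, cid: int) -> str:
        # Unknown ID: treat it as a new connection and assign a host.
        if cid not in self._owner:
            self._owner[cid] = next(self._next_host)
        return self._owner[cid]

    def on_server_packet(self, host: str, server_cid: int) -> None:
        # Learn the server-chosen ID so later packets using it also map here.
        self._owner[server_cid] = host
```

   The table (and keeping it consistent across balancers) is the cost
   that stateless schemes based on shared keys or routing bits avoid.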

3.2.3.  Multi-Path

   Connection migration across network addresses requires the connection
   to (briefly) exist simultaneously across multiple paths and as such
   should instead be considered in the context of broader multi-path
   support.

3.3.  Streams

   A stream is an ordered sequence of bytes.  A QUIC connection contains
   a multiplexed set of streams that are grouped into four different
   namespaces based upon two properties: whether the stream is
   client-initiated or server-initiated, and whether it is
   unidirectional or bidirectional.  Streams are flow controlled, both
   individually and in aggregate across the connection.
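
   Classifying a stream into one of the four namespaces is a matter of
   reading two bits, assuming the encoding used by the QUIC transport
   drafts (and later RFC 9000), in which the least significant bit of
   the stream ID gives the initiator and the next bit gives the
   direction.  That encoding is not defined in this document; the sketch
   below is illustrative.

```python
from enum import Enum
from typing import Tuple


class Initiator(Enum):
    CLIENT = 0
    SERVER = 1


class Direction(Enum):
    BIDIRECTIONAL = 0
    UNIDIRECTIONAL = 1


def classify(stream_id: int) -> Tuple[Initiator, Direction]:
    """Return the (initiator, direction) namespace of a stream ID."""
    initiator = Initiator(stream_id & 0x1)         # bit 0: who opened it
    direction = Direction((stream_id >> 1) & 0x1)  # bit 1: one or two ways
    return initiator, direction
```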

   Questions/recommendations:

   o  Streams really have whatever reliability is used by the two
      endpoints of the connection.  Intermediaries must assume
      unreliability, and we should verify that congestion control and
      flow control do not depend on any reliability assumption.

   o  Now that there are four stream types (unidirectional or
      bidirectional, client- or server-initiated), we should not attempt
      to provide a "mapping" to TCP or to sockets.  Streams now need to
      be their own concept, independent of prior art, and this should be
      made explicit.

   o  Stream closure is unreliable in QUIC: when either endpoint closes
      a stream, data is not required to be flushed.  This also leads to
      connection-level flow control requirements (i.e., do not wait to
      receive the data before increasing the window, or the connection
      will deadlock).

   Stream data is not required (at the QUIC layer) to be retransmitted
   or to be transmitted in order.  Streams provide no guarantee that
   their data will be transmitted in its entirety.

   Flow control windows are increased when a receiver decides that it is
   willing to accept (and possibly discard) bytes from a stream up to a
   given offset.  An increase is not a signal that the receiver has
   received all bytes below the new limit, nor is a receiver obligated
   to treat its flow control window as a contiguous range of bytes
   within the stream.
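
   Receiver-side accounting consistent with this description can be
   sketched in Python.  Credit is charged by the highest offset seen on
   each stream rather than by bytes actually received, so gaps and
   out-of-order frames are permitted, and limits rise only when the
   receiver chooses to raise them.  The class, its method names, and the
   choice to report violations with a boolean are assumptions.

```python
from typing import Dict


class ReceiveLimits:
    """Receiver-side flow-control accounting (illustrative sketch).

    Each stream is charged up to the highest offset seen on it, whether
    or not the bytes below that offset ever arrived.  The connection is
    charged the sum over all streams.
    """

    def __init__(self, max_data: int, max_stream_data: int) -> None:
        self.max_data = max_data                  # connection-wide limit
        self.initial_stream_limit = max_stream_data
        self.stream_limit: Dict[int, int] = {}    # per-stream limits
        self.highest: Dict[int, int] = {}         # highest offset seen

    def on_stream_frame(self, stream_id: int, offset: int, length: int) -> bool:
        """Return False if the frame would exceed a limit (a peer error)."""
        end = offset + length
        limit = self.stream_limit.setdefault(stream_id, self.initial_stream_limit)
        if end > limit:
            return False
        previous = self.highest.get(stream_id, 0)
        new_highest = max(previous, end)
        total = sum(self.highest.values()) - previous + new_highest
        if total > self.max_data:
            return False
        self.highest[stream_id] = new_highest
        return True

    def raise_stream_limit(self, stream_id: int, new_limit: int) -> None:
        # The receiver is now willing to accept (and possibly discard)
        # bytes up to new_limit; limits never shrink.
        current = self.stream_limit.get(stream_id, self.initial_stream_limit)
        self.stream_limit[stream_id] = max(current, new_limit)
```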

   Because streams are flow controlled both individually and in
   aggregate, and because there is no QUIC-layer requirement that stream
   data be transmitted in its entirety, the connection may deadlock if
   the application increases the flow control window only upon
   receiving data carried in streams.  In particular, any application
   that handles out-of-order data within a stream must carefully manage
   flow control at the QUIC layer.
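
   The deadlock, and one way to avoid it, can be sketched in Python.
   The hypothetical StreamReceiver below contrasts a policy that extends
   credit only as contiguous data arrives with one that extends credit
   once the application has consumed or discarded a range; the names and
   the fixed-window policy are assumptions.

```python
class StreamReceiver:
    """Two policies for extending one stream's flow-control limit.

    Hypothetical sketch: `window` is how far beyond the application's
    progress the receiver is willing to let the sender go.
    """

    def __init__(self, window: int) -> None:
        self.window = window
        self.limit = window    # currently advertised limit
        self.done_upto = 0     # offset below which the app is finished

    def naive_update(self, contiguous_received: int) -> int:
        # Extends credit only as contiguous data arrives.  If a gap is
        # never filled (QUIC does not require retransmission), this stops
        # advancing and the sender may stall for good.
        self.limit = max(self.limit, contiguous_received + self.window)
        return self.limit

    def release(self, upto: int) -> int:
        # Extends credit once the application has consumed, or chosen to
        # discard, everything below `upto`, whether or not it arrived.
        self.done_upto = max(self.done_upto, upto)
        self.limit = max(self.limit, self.done_upto + self.window)
        return self.limit
```

   If bytes 0-9 are lost and never resent, naive_update stays at the
   initial limit no matter how much later data arrives; release lets the
   application, rather than byte arrival, move the limit forward.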

3.3.1.  Grouping of Streams

   As this hasn't been discussed within the working group, this likely
   needs to be deferred to v2.

   Streams may be placed within groups (by default there is only one
   group), in which case a different frame-type is used for data and
   headers within that stream.  This is why grouping is at the stream
   layer and not below.

   Groups signal to the L7 routing fabric that the data on multiple
   streams should be routed to the same (L7) endpoint.
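
   Since grouping is not yet specified, the following Python sketch
   only illustrates the intended effect on an L7 routing fabric, under
   the assumption that the fabric hashes on (connection, group) for
   grouped streams and on (connection, stream) otherwise.  The function
   and its parameters are hypothetical.

```python
import hashlib
from typing import Optional, Sequence


def route_stream(conn_id: int, stream_id: int, group: Optional[int],
                 backends: Sequence[str]) -> str:
    """Pick an L7 backend for a stream (hypothetical grouping rule).

    Streams in the same group hash on (connection, group), so they all
    land on one backend; ungrouped streams hash on (connection, stream)
    and may be spread across backends.
    """
    if group is not None:
        key = f"{conn_id}/g{group}"
    else:
        key = f"{conn_id}/s{stream_id}"
    digest = hashlib.sha256(key.encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]
```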

   Video is a good example use case, though pub-sub and similar
   applications end up with the same problem set.  With video, there
   are various components of the video stream that can be interpreted
   separately, for example I-frames and P-frames.  I-frames are
   essentially JPEGs and encode an image.  P-frames encode a difference
   from some prior state (or to some other state, depending on one's
   perspective).  If the application carries these at the same priority
   within one stream, the result would be substantially suboptimal.
   However, without groups, if the application carries these as
   different streams, they may not be routed to the same L7 endpoint,
   which is essential for correct understanding of the data given the
   inherently stateful nature of video codecs (and almost any
   compression).  Breaking the video up into multiple items allows it
   to be transported and cached reasonably using HTTP semantics.

   Pub-sub, as mentioned above, works far better when groups exist.  A
   subscription is established, and any number of responses may flow
   back to the subscriber; if the subscriber wishes to update the
   subscription, it sends a new request within the same group, ensuring
   that the subscription state can be correctly managed.

4.  References

4.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

4.2.  URIs

   [1] https://mailarchive.ietf.org/arch/search/?email_list=quic

   [2] https://github.com/quicwg

   [3] https://github.com/quicwg/base-drafts/labels/-http

Authors' Addresses

   Roberto Peon
   Facebook, Inc.

   Email: fenix@fb.com


   Jeff Pinner
   Lyft, Inc.

   Email: jpinner@lyft.com