Denmark
janus@steinwurf.com

Code On Network Coding LLC
Cambridge
USA
xshi@alum.mit.edu

Code On Network Coding LLC
Cambridge
USA
muriel.medard@codeontechnologies.com

Inmarsat PLC
London
United Kingdom
Vince.Chook@inmarsat.com

IRTF
NWCRG
This document describes the use of Random Linear Network Coding
(RLNC) schemes for erasure correction in data transfer, with an emphasis
on RLNC encoded symbol representations in data packets. Both block and
sliding window RLNC code implementations are described. By providing
erasure correction using randomly generated repair symbols, such
RLNC-based schemes offer advantages in accommodating varying frame sizes
and dynamically changing connections, reducing the need for feedback,
and lowering the amount of state information needed at the sender and
receiver.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.

Network Coding is an emerging coding discipline that jointly improves
network reliability and efficiency. In general communication networks,
source coding is performed as a compression mechanism to reduce data
redundancy and to reduce resources necessary for data transportation
over the network. Channel coding, on the other hand, introduces
redundancy for data transmission reliability over lossy channels.
Network coding adds another layer of coding in-between these two. Random
Linear Network Coding (RLNC) is an efficient network coding approach
that enables network nodes to independently and randomly generate
linear mappings of input to output data symbols over a finite field.
This document describes the use of RLNC schemes, with an emphasis on
RLNC encoded symbol representations in data packets. Specifically, a
block RLNC scheme and a sliding window RLNC scheme are discussed in the
context of varying data frame sizes.
Unlike conventional communication systems based on the
"store-and-forward" principle, RLNC allows network nodes to
independently and randomly combine input source data into coded
symbols over a finite field. Such an approach
enables receivers to decode and recover the original source data as
long as enough linearly independent coded symbols, with sufficient
degrees of freedom, are received. At the sender, RLNC can introduce
redundancy into data streams in a granular way. At the receiver, RLNC
enables progressive decoding and reduces feedback necessary for
retransmission. Collectively, RLNC improves network utilization and
throughput, provides a high degree of robustness and
decentralization, reduces transmission latency, and simplifies
feedback and state management.
To encode using RLNC, original source data are divided into symbols
of a given size and linearly combined. Each symbol is multiplied with
a scalar coding coefficient drawn randomly from a finite field, and
the resulting coded symbol is of the same size as the original data
symbols.
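As an illustrative sketch (not part of any specification in this document), the operation above can be written over the binary field FF(2), where each coding coefficient is a random bit and addition of symbols is a byte-wise XOR; all names here are hypothetical:

```python
import random

def rlnc_encode_gf2(symbols, rng):
    """Generate one coded symbol over FF(2): draw a random bit per source
    symbol (the coding vector) and XOR together the selected symbols.
    The coded symbol has the same size as each source symbol."""
    size = len(symbols[0])
    coding_vector = [rng.randrange(2) for _ in symbols]
    coded = bytearray(size)
    for coeff, sym in zip(coding_vector, symbols):
        if coeff:
            for i in range(size):
                coded[i] ^= sym[i]
    return coding_vector, bytes(coded)

rng = random.Random(7)
symbols = [b"\x01\x02", b"\x03\x04", b"\x05\x06", b"\x07\x08"]
vec, coded = rlnc_encode_gf2(symbols, rng)
assert len(coded) == len(symbols[0])  # same size as the source symbols
```

Any number of such coded symbols can be drawn from the same set of source symbols by repeating the call with fresh random coefficients.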
Thus, each RLNC encoding operation can be viewed as creating a
linear equation in the data symbols, where the random scalar coding
coefficients can be grouped and viewed as a coding vector. Similarly,
the overall encoding process where multiple coded symbols are
generated can be viewed as a system of linear equations with randomly
generated coefficients. Any number of coded symbols can be generated
from a set of data symbols, similarly to expandable forward error
correction codes. Coding vectors must be implicitly or explicitly
transmitted from the sender to the receiver for successful decoding of
the original data. For example, sending a seed for generating
pseudo-random coding coefficients can be considered as an implicit
transmission of the coding vectors. In addition, while coding vectors
are often transmitted together with coded data in the same data
packet, it is also possible to separate the transmission of coding
coefficient vectors from the coded data, if desired.
To reconstruct the original data from coded symbols, a network node
collects a finite but sufficient number of degrees of freedom for
solving the system of linear equations. This is beneficial over
conventional approaches as the network node is no longer required to
gather each individual data symbol. In general, the network node needs
to collect slightly more independent coded symbols than there are
original data symbols, where the slight overhead arises because coding
coefficients are drawn at random, with a non-zero probability that a
coding vector is linearly dependent on another coding vector, and that
one coded symbol is linearly dependent on another coded symbol. This
overhead can be made arbitrarily small, provided that the finite field
used is sufficiently large.
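A matching decoder sketch under the same FF(2) assumption: the received coding vectors and coded symbols form a linear system that is solved by Gaussian elimination, with every row operation applied to both the vector and the symbol. Function and variable names are illustrative only:

```python
def gf2_decode(coding_vectors, coded_symbols, n):
    """Recover n source symbols from coded symbols over FF(2) by Gaussian
    elimination, or return None if fewer than n degrees of freedom were
    received (i.e. the coding vectors do not have full rank)."""
    rows = [(list(vec), bytearray(sym))
            for vec, sym in zip(coding_vectors, coded_symbols)]
    for col in range(n):
        pivot = next((r for r in range(col, len(rows)) if rows[r][0][col]), None)
        if pivot is None:
            return None  # not enough independent coded symbols yet
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(len(rows)):
            if r != col and rows[r][0][col]:
                for i in range(n):
                    rows[r][0][i] ^= rows[col][0][i]
                for i in range(len(rows[r][1])):
                    rows[r][1][i] ^= rows[col][1][i]
    return [bytes(rows[i][1]) for i in range(n)]

# Three independent FF(2) combinations of three source symbols:
src = [b"\x01", b"\x02", b"\x04"]
vecs = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
coded = [b"\x01", b"\x03", b"\x07"]
assert gf2_decode(vecs, coded, 3) == src
```

The `None` return corresponds to the case discussed above where the received coding vectors are linearly dependent, so additional coded symbols must be collected.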
A unique advantage of RLNC is the ability to re-encode or "recode"
without first decoding. Recoding can be performed jointly on existing
coded symbols, partially decoded symbols, or uncoded systematic data
symbols. This feature allows intermediate network nodes to re-encode
and generate new linear combinations on the fly, thus increasing the
likelihood of innovative transmissions to the receiver. Recoded
symbols and recoded coefficient vectors have the same size as before
and are indistinguishable from the original coded symbols and
coefficient vectors.
In practical implementations of RLNC, the original source data are
often divided into multiple coding blocks or "generations" where
coding is performed over each individual generation to lower the
computational complexity of the encoding and decoding operations.
Alternatively, a convolutional approach can be used, where coding is
applied to overlapping spans of data symbols, possibly of different
spanning widths, viewed as a sliding coding window. In
generation-based RLNC, not all symbols within a single generation need
to be present for coding to start. Similarly, a sliding window can be
variable-sized, with more data symbols added to the coding window as
they arrive. Thus, innovative coded symbols can be generated as data
symbols arrive. This "on-the-fly" coding technique reduces coding
delays at transmit buffers, and together with rateless encoding
operations, enables the sender to start emitting coded packets as soon
as data is received from an upper layer in the protocol stack,
adapting to fluctuating incoming traffic flows. Injecting coded
symbols based on a dynamic transmission window also breaks the
decoding delay lower bound imposed by traditional block codes and is
well suited for delay-sensitive applications and streaming
protocols.
When coded symbols are transmitted through a communication network,
erasures may occur, depending on channel conditions and interactions
with underlying transport protocols. RLNC can efficiently repair such
erasures, potentially improving protocol response to erasure events to
ensure reliability and throughput over the communication network. For
example, in a point-to-point connection, RLNC can proactively
compensate for packet erasures by generating Forward Erasure
Correcting (FEC) redundancy, especially when a packet erasure
probability can be estimated. As any number of coded symbols may be
generated from a set of data symbols, RLNC is naturally suited for
adapting to network conditions by adjusting redundancy dynamically to
fit the level of erasures, and by updating coding parameters during a
session. Alternatively, packet erasures may be repaired reactively by
using feedback requests from the receiver to the sender, or by a
combination of FEC and retransmission. RLNC simplifies state and
feedback management and coordination as only a desired number of
degrees of freedom needs to be communicated from the receiver to the
sender, instead of indications of the exact packets to be
retransmitted. The need to exchange packet arrival state information
is therefore greatly reduced in feedback operations.
The advantages of RLNC in state and feedback management are
apparent in a multicast setting. In this one-to-many setup,
uncorrelated losses may occur, and any retransmitted data symbol is
likely to benefit only a single receiver. By comparison, a transmitted
RLNC coded symbol is likely to carry a new degree of freedom that may
correct different errors at different receivers simultaneously.
Similarly, RLNC offers advantages in coordinating multiple paths,
multiple sources, mesh networking and cooperation, and peer-to-peer
operations.
A more detailed introduction to network coding, including RLNC, is
provided in the books "Network Coding: Fundamentals and Applications"
and "Network Coding: An Introduction".
This section describes a generation-based RLNC scheme.
In generation-based RLNC, input data as received from an upper
layer in the protocol stack is segmented into equal-sized blocks,
denoted as generations, and each generation is further segmented into
equal-sized data symbols for encoding, with padding added when
necessary. Encoding and decoding are performed over each individual
generation. Figure 1 below provides an illustrative example where each
generation includes four data symbols, and a systematic RLNC code is
generated with rate 2/3.
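The segmentation step can be sketched as follows; the symbol and generation sizes are arbitrary illustrative values, and the helper name is hypothetical:

```python
def segment(data, symbol_size, generation_size):
    """Split input data into generations of `generation_size` symbols of
    `symbol_size` bytes each, zero-padding the tail as necessary."""
    gen_bytes = symbol_size * generation_size
    padded = data + b"\x00" * (-len(data) % gen_bytes)
    return [[padded[g + i:g + i + symbol_size]
             for i in range(0, gen_bytes, symbol_size)]
            for g in range(0, len(padded), gen_bytes)]

gens = segment(b"0123456789", symbol_size=2, generation_size=4)
# 10 bytes -> 2 generations of 4 two-byte symbols, with 6 bytes of padding.
assert len(gens) == 2
assert gens[1][0] == b"89" and gens[1][1] == b"\x00\x00"
```

Encoding and decoding then proceed independently within each generation.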
Symbols can be of any size, although symbol sizes typically depend
on application or system specifications. In scenarios with highly
varying input data frame sizes, a small symbol size may be desirable
for achieving flexibility and transmission efficiency, with one or
more symbols concatenated to form a coded data packet. In this
context, existing basic FEC schemes do not
support the use of a single header for multiple coded symbols, whereas
the symbol representation design as described herein provides this
option.
For any protocol that utilizes generation-based RLNC, a setup
process is necessary for establishing a connection and conveying
coding parameters from the sender to the receiver. Such coding
parameters can include one or more of field size, code specifications,
index of the current generation being encoded at the sender,
generation size, code rate, and desired feedback frequency or
probability. Some coding parameters are updated dynamically during the
transmission process, reflecting the coding operations over sequences
of generations, and adjusting to channel conditions and resource
availability. For example, an outer header can be added to the symbol
representation specified in this document to indicate the current
generation encoded within the symbol representation. Such information
is essential for proper recoding and decoding operations, but the
exact design of the outer header is outside the scope of the current
document. At a minimum, an outer header should indicate the current
generation, generation size, symbol size, and field size. Section 2
provides a detailed discussion of coding parameter considerations.
This section describes a sliding-window RLNC scheme.
In sliding-window RLNC, input data as received from an upper layer
in the protocol stack is segmented into equal-sized data symbols for
encoding. In some implementations, the sliding encoding window can
expand in size as new data packets arrive, until it is closed off by
an explicit instruction, such as a feedback message that re-initiates
the encoding window. In some implementations, the size of the sliding
encoding window is upper bounded by some parameter, fixed or
dynamically determined by online behavior such as packet loss or
congestion estimation. Figure 3 below provides an example of a
systematic finite sliding window code with rate 2/3.
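One possible, non-normative shape of such an encoder, again over FF(2) for brevity: every source symbol is emitted systematically, one repair symbol is emitted after every second source symbol (rate 2/3), and the window is bounded by a fixed limit. All names are illustrative:

```python
import random

def sliding_window_stream(source_symbols, rng, window_limit=4):
    """Systematic rate-2/3 sliding-window sketch over FF(2): emit each
    source symbol uncoded, and after every two source symbols emit one
    repair symbol coded over (at most) the last `window_limit` symbols."""
    out, window = [], []
    for k, sym in enumerate(source_symbols):
        window.append((k, sym))
        window = window[-window_limit:]      # bound the encoding window
        out.append(("systematic", k, sym))
        if (k + 1) % 2 == 0:                 # one repair per two source symbols
            vec = {i: rng.randrange(2) for i, _ in window}
            coded = bytearray(len(sym))
            for i, s in window:
                if vec[i]:
                    for j in range(len(s)):
                        coded[j] ^= s[j]
            out.append(("coded", vec, bytes(coded)))
    return out

packets = sliding_window_stream([bytes([x]) for x in (1, 2, 3, 4)],
                                random.Random(1))
assert sum(1 for p in packets if p[0] == "coded") == 2
```

Because the window grows as symbols arrive, repair symbols can be generated on the fly without waiting for a full block, as described above.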
For any protocol that utilizes sliding-window RLNC, a setup process
is necessary for establishing a connection and conveying coding
parameters from the sender to the receiver. Such coding parameters can
include one or more of field size, code specifications, symbol
ordering, encoding window position, encoding window size, code rate,
and desired feedback frequency or probability. Some coding parameters
can also be updated dynamically during the transmission process in
accordance with channel conditions and resource availability. For
example, an outer header can be added to the symbol representation
specified in this document to indicate an encoding window position, as
a starting index for current data symbols being encoded within the
symbol representation. Again, such information is essential for proper
recoding and decoding operations, but the exact design of the outer
header is outside the scope of the current document. At a minimum,
an outer header should indicate the current encoding window position,
encoding window size, symbol size, and field size. Section 2 provides
a detailed discussion of coding parameter considerations.
Once a connection is established, RLNC coded packets comprising one
or more coded symbols are transmitted from the sender to the receiver.
The sender can transmit in either a systematic or coded fashion, with
or without receiver feedback. In progressive decoding of RLNC coded
symbols, the notion of "seen" packets can be utilized to provide
degrees-of-freedom feedback. Seen packets are those packets that
have contributed to a received coded packet, where generally the
oldest such packet that has yet to be declared seen is declared
seen.
Recoding is the process where coded or partially decoded symbols
are re-encoded without first being fully decoded. To recode, both
coded symbols and corresponding coding coefficient vectors are
linearly combined, respectively, with new randomly generated recoding
coefficients. Recoded symbols and recoded coefficient vectors
generally have the same size as before and are indistinguishable from
the original coded symbols and coding coefficient vectors. Recoding is
typically performed at intermediate network nodes, in either an
intra-session or an inter-session fashion. Intra-session coding refers
to coding between packets of the same flow or session, while
inter-session coding refers to combination of packets from different
flows or sessions in a network.
As recoding requires the same operations on the coding coefficient
vectors as applied to the coded symbols, coding coefficients must be
updated by recoding coefficients. This is generally achieved by having
the coding coefficients accessible at recoding nodes so that they may be
updated using the recoding coefficients. Thus, either the original
coding coefficients or reversible representations of the coding
coefficients need to be communicated from upstream nodes to the
recoding nodes.
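A minimal FF(2) recoding sketch illustrating this point: the randomly drawn recoding coefficients are applied identically to the coded symbols and to their coding vectors, so the output pair remains internally consistent without any decoding step. All names are illustrative:

```python
import random

def recode_gf2(coded_symbols, coding_vectors, rng):
    """Combine already-coded symbols into a new coded symbol without
    decoding, applying the same recoding coefficients to the symbols
    and to their coding vectors alike."""
    n, size = len(coding_vectors[0]), len(coded_symbols[0])
    new_vec, new_sym = [0] * n, bytearray(size)
    for vec, sym in zip(coding_vectors, coded_symbols):
        if rng.randrange(2):
            for i in range(n):
                new_vec[i] ^= vec[i]
            for i in range(size):
                new_sym[i] ^= sym[i]
    return new_vec, bytes(new_sym)

# Source symbols a, b; two coded symbols: a (vector [1,0]) and a^b ([1,1]).
a, b = b"\x01", b"\x02"
new_vec, new_sym = recode_gf2([a, bytes([a[0] ^ b[0]])],
                              [[1, 0], [1, 1]], random.Random(3))
# Applying new_vec to the original source symbols reproduces new_sym.
check = (a[0] if new_vec[0] else 0) ^ (b[0] if new_vec[1] else 0)
assert new_sym == bytes([check])
```

The recoded vector and symbol have the same sizes as their inputs, so a downstream node cannot distinguish them from symbols coded at the source.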
For any protocol that utilizes generation-based or sliding-window
RLNC, several coding parameters must be communicated from the sender to
the receiver as part of the protocol design. Without elaborating on all
such parameters, this section examines those essential for symbol
representation design, including field size, symbol size, maximum number
of symbols, and maximum generation or window size.
As RLNC is performed over a finite field, field size determines the
complexity of the required mathematical computations. Larger field sizes
translate to higher probability of successful decoding, as randomly
generated coding coefficient vectors are more likely to be independent
from each other. However, larger field sizes may also result in higher
computational complexity, leading to longer decoding latency, higher
energy usage, and other hardware requirements for both the encoder and
the decoder. A finite field size of 2 or the binary Finite Field FF(2)
should always be supported since addition, multiplication, and division
over FF(2) are equivalent to elementary XOR, AND, and IDENTITY
operations respectively. It is also desirable to support a field size of
2^8, corresponding to a single byte, where operations are performed
over the binary extension field FF(2^8) with polynomial
x^8+x^4+x^3+x^2+1.
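For illustration, multiplication over FF(2^8) with the polynomial above (bit pattern 1 0001 1101, i.e. 0x11D) can be implemented by carry-less shift-and-add with reduction; this sketch is not mandated by this document:

```python
def gf256_mul(a, b):
    """Multiply two elements of FF(2^8), reducing by x^8+x^4+x^3+x^2+1
    (0x11D) whenever an intermediate product reaches degree 8."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a          # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1
        if a & 0x100:       # degree 8 reached: subtract the polynomial
            a ^= 0x11D
    return p

assert gf256_mul(0x01, 0xAB) == 0xAB   # 1 is the multiplicative identity
assert gf256_mul(0x80, 0x02) == 0x1D   # x^7 * x = x^4+x^3+x^2+1
```

Production implementations typically replace this loop with log/antilog lookup tables for speed; the arithmetic is the same.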
The choice of symbol size typically depends on the application or
system specification. For example, a symbol size may be chosen based on
the size of a maximum transmission unit (MTU) so that datagrams are not
fragmented as they traverse a network, while also ensuring no symbol
bits are unnecessarily wasted. The current symbol representation design
is flexible and can accommodate any symbol size in bytes. For example,
an IP packet is typically in the range between 500 and 1500 bytes, while
a much smaller datagram having a size of 90 bytes may be used by
satellite communication networks. The current symbol representation can
be configured to support either or both cases in different
implementations.
The generation size or coding window size is a tradeoff between the
strength of the code and the computational complexity of performing the
coding operations. With a larger generation/window size, fewer
generations or coding windows are needed to enclose a data message of a
given size, thus reducing protocol overhead for coordinating individual
generations or coding windows. In addition, a larger generation/window
size increases the likelihood that a received coded symbol is innovative
with respect to previously received symbols, thus amortizing
retransmission or FEC overheads. Conversely, when coding coefficients
are attached, a larger generation/window size also leads to larger
overhead per packet. The generation/window size to be used can be
signaled between the sender and receiver when the connection is first
established.
Lastly, to successfully decode RLNC coded symbols, sufficient degrees
of freedom are required at the decoder. The maximum number of redundant
symbols that can be transmitted is therefore limited by the number of
linearly independent coding coefficient vectors that can be supported by
the system. For example, if coding vectors are constructed using a
pseudo-random generator, the maximum number of redundant symbols that
can be transmitted is limited by the number of available generator
states.
This section provides a symbol representation design for implementing
RLNC-based erasure correction schemes. In this symbol representation
design, multiple symbols are concatenated and associated with a single
symbol representation header.
The symbol representation design is provided for constructing a data
payload portion of a data packet for a protocol that utilizes a
generation-based or sliding-window RLNC, where recoding can be used at
intermediate nodes. A data packet data payload comprises one or more
symbol representations. Each symbol representation in turn comprises one
or more symbols that can be systematic, coded or recoded. The use of
this symbol representation design is not limited by transmission
schemes. It can be applied to unicast, multiple-unicast, multicast,
multi-path, and multi-source settings and the like.
Coding coefficient vectors must be implicitly or explicitly
transmitted from the sender to the receiver, generally along with the
coded data for successful decoding of the original data. One option is
to attach each coding coefficient vector to the corresponding coded
symbol as a header, thus also enabling recoding at intermediate nodes.
Another option is to attach the current state of a pseudo-random
generator for generating the coding coefficient vector, to reduce the
size of the header. Adding a header to each symbol may result in a high
overhead when the symbol size is small or when generation or sliding
window size is large. Adding a joint header to the beginning of each
generation may also cause synchronization to be re-initiated only at the
beginning of each generation instead of every symbol. In what follows, a
symbol representation is provided that allows for both of these options
such that both a general representation with coding coefficients and a
compact representation with a seed for generating the coding
coefficients can be used, in order to reduce the header overhead.
This section specifies a symbol representation that enables both a
general form with coding vectors attached, and a compact form where a
seed is attached instead for the first symbol in the symbol
representation, and where subsequent coding vectors are deduced from
the first one. Different maximum generation and window sizes are
supported for RLNC encoding, recoding, and decoding.
To encode over a set of data symbols, a coding vector as described
in Section 1.1 is first generated, comprising a number of finite field
elements as specified by a generation/window size variable. In the
case of systematic codes, systematic symbols correspond to unit coding
vectors.
Figure 4 illustrates the general symbol representation design. Four
header fields precede the symbol data: TYPE flag, SYMBOLS, ENCODER
RANK, and SEED or CODING COEFFICIENTS. The TYPE flag indicates whether
the symbol is systematic, coded, or recoded. SYMBOLS indicates the
number of symbols in the "Symbol(s) Data" field. ENCODER RANK
represents the current rank of the encoder, which is the number of
symbols being linearly combined. SEED is used to generate the coding
coefficient vector(s) using a pseudo-random number generator, for a
compact form of the symbol representation. The CODING COEFFICIENTS
field is a list of SYMBOLS number of coding vectors used to generate
the ensuing SYMBOL(S) DATA.
The TYPE Flag indicates if the symbol is systematic, coded, or
recoded, and has the following properties:
2 bits long.
If the TYPE flag is '1', all symbols included in
this symbol representation are systematic or uncoded, with symbol
index starting from ENCODER RANK. This option allows for efficient
representation of systematic symbols.
If the TYPE is '2', all symbols included in this symbol
representation are coded, with coding vectors generated using the
included SEED and the ENCODER RANK. This option allows for compact
and efficient representation of coded symbols, which may also
subsequently be recoded.
If the TYPE is '3', all symbols included in this symbol
representation are either coded or recoded, with the coding
coefficients included constituting ENCODER RANK coefficients
each.

SYMBOLS indicates the number of symbols in the "Symbol(s)
Data" field, and has the following properties:
4 bits long. A maximum number of 15 symbols are concatenated
within each symbol representation.
The special case of SYMBOLS = 0 indicates that zero symbols are
included, and consequently the size of 'Symbol(s) Data' is 0
bytes. This can, for example, be used to implement a flush
functionality or to ensure that protocol operations do not stall
in certain cases for purely event-driven protocols.

ENCODER RANK represents the current rank of the encoder, and has
the following properties:
MUST be no larger than generation/window size.
If TYPE flag is '1', ENCODER RANK is the symbol index of the
first data symbol in this symbol representation.
If TYPE flag is '2' or '3', ENCODER RANK is the number of data
symbols over which coding was performed for all coded symbols in
this symbol representation.
Coded symbols can be generated before a generation or window is
filled. ENCODER RANK describes the number of original symbols
included in the coded symbol(s).

SEED is used to generate the coding coefficient vector(s) using a
pseudo-random number generator, for a compact form of the symbol
representation, and has the following properties:
The SEED field is only present when TYPE flag is '2'. If TYPE
is '1' or '3', this field is absent.
The pseudo-random generator MUST be seeded with this value and
all coding coefficient vectors are produced by the same generator.
For example, if ENCODER RANK is 12, then the coding vector for
the first symbol in this symbol representation is coefficients 0
through 11 generated by the pseudo-random generator seeded by
SEED, and the coding vector for the second symbol is coefficients
12 through 23 generated by the same generator. If generation/window
size is larger than ENCODER RANK, the remaining coefficients in the
coding vector are zero.
To ensure that SEED can be interpreted correctly at the
receiver, the same pseudo-random number generator MUST be used by
the sender and a recoding or receiving node. Otherwise, more than
one SEED field would need to be used.
8 bits long. Thus, 256 different seed values are available. One
SEED is used per symbol representation, each of which can contain
up to 15 symbols, all derived using the same SEED. For distinct
ENCODER RANKs, different coding vectors would be generated from
the same SEED, since only an ENCODER RANK number of coefficients
from the random generator is grouped as a coding coefficient
vector, before progressing to the next coding vector for the next
symbol in the symbol representation. Consequently, the maximal
number of coded symbols that can be generated for a generation is
|SEED| * |SYMBOLS| * |ENCODER RANK| which in the best case is
(2^8)*(2^4-1)*(2^10) ~ 2^22, which for all practical purposes
can be treated as inexhaustible. If all coded symbols that can be
represented using a SEED are exhausted, symbols with explicitly
included coding vectors can be sent instead.
In the case where no random number generator is available, or
where its use is not desired, the coding coefficients can be
produced by other means, such as functions of the data, state of
the network, or the like, and transmitted explicitly by setting
the TYPE flag to '3'.
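The seed-based derivation above can be sketched as follows. Python's `random.Random` stands in here for whatever shared pseudo-random generator the sender and receiver agree on, and the field and window sizes are illustrative values:

```python
import random

def vectors_from_seed(seed, encoder_rank, num_symbols, window_size, field=256):
    """Derive the coding vectors for all symbols in one symbol
    representation from a single SEED: one shared generator, ENCODER RANK
    consecutive coefficients per symbol, zero-extended to the window size."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(num_symbols):
        coeffs = [rng.randrange(field) for _ in range(encoder_rank)]
        vectors.append(coeffs + [0] * (window_size - encoder_rank))
    return vectors

# ENCODER RANK 12: symbol 0 uses draws 0..11, symbol 1 uses draws 12..23.
vs = vectors_from_seed(seed=4, encoder_rank=12, num_symbols=2, window_size=16)
assert len(vs) == 2 and len(vs[0]) == 16 and vs[0][12:] == [0, 0, 0, 0]
```

A receiver re-running the same generator with the same SEED and ENCODER RANK reconstructs the identical vectors, which is why both ends MUST use the same generator.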

CODING COEFFICIENTS field is a list of SYMBOLS number of coding
vectors used to generate the ensuing SYMBOL(S) DATA, and has the
following properties:
The CODING COEFFICIENTS field is only present when TYPE flag is
'3'. If TYPE is '1' or '2', this field is absent.
Each coding vector comprises ENCODER RANK number of coding
coefficients, each coding coefficient having a predetermined field
size.

In a first small encoding window symbol representation, ENCODER
RANK is 10 bits long, and the maximum generation/window size is
2^10.
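With a 2-bit TYPE, a 4-bit SYMBOLS field and a 10-bit ENCODER RANK, this header fits in exactly two bytes. The bit ordering below is an assumption made for illustration only, since the figures defining the authoritative layout are not reproduced here:

```python
def pack_header(type_flag, symbols, encoder_rank):
    """Pack TYPE (2 bits), SYMBOLS (4 bits) and ENCODER RANK (10 bits)
    into two bytes, most significant field first (assumed ordering)."""
    assert 0 <= type_flag < 4 and 0 <= symbols < 16 and 0 <= encoder_rank < 1024
    word = (type_flag << 14) | (symbols << 10) | encoder_rank
    return word.to_bytes(2, "big")

def unpack_header(data):
    """Inverse of pack_header: return (type_flag, symbols, encoder_rank)."""
    word = int.from_bytes(data[:2], "big")
    return word >> 14, (word >> 10) & 0xF, word & 0x3FF

assert unpack_header(pack_header(2, 3, 700)) == (2, 3, 700)
```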
Figures 5 to 7 below illustrate systematic, coded, and recoded
symbol representations within an encoding window of size 2^10.
Systematic symbols are uncoded. Coded symbols are compact in form and
comprise a seed for coding coefficient generation. Recoded symbols
are general in form and comprise the coding coefficient vectors
explicitly.
The following examples show different symbol representations for
an illustrative case where the symbol size is 2 bytes,
generation/window size is 8, and field size is 2^8.
Example 1: Three systematic symbols with ID 0, 1 and 2. As the
TYPE flag is '1', the SEED/CODING COEFFICIENTS field is absent, and ENCODER
RANK is the symbol index of the first data symbol with ID 0 in this
compact symbol representation.
Example 2: Two coded symbols using a compact representation. In
this example, TYPE is '2', the SEED to the pseudo-random number
generator shared by the sender and receiver is 4. The coding vector
for Symbol A is coefficients 0 to 7 generated by the pseudo-random
number generator, and the coding vector for Symbol B is coefficients
8 to 15 generated by the same generator.
Example 3: Two recoded symbols. Coefficients A0 to A7 constitute
the coding vector for Symbol A, and coefficients B0 to B7 constitute
the coding vector for Symbol B. In practical implementations, symbol
sizes are much larger than 2 bytes, leading to amortization of the
coding coefficient overhead.
In a second, large encoding window symbol representation, ENCODER
RANK is 18 bits long, and the maximum generation/window size is
2^18.
Figures 11 to 13 below illustrate systematic, coded, and recoded
symbol representations within an encoding window of size 2^18.
Systematic symbols are uncoded. Coded symbols are compact in form and
comprise a seed for coding coefficient generation. Recoded symbols
are general in form and comprise the coding coefficient vectors
explicitly.
This document does not present new security considerations.
This document has no actions for IANA.
Informative References

   Bradner, S., "Key words for use in RFCs to Indicate Requirement
   Levels", BCP 14, RFC 2119, March 1997.

   "The Benefits of Coding over Routing in a Randomized Setting",
   Proc. IEEE International Symposium on Information Theory (ISIT),
   2003.

   Médard, M. and A. Sprintson (Eds.), "Network Coding: Fundamentals
   and Applications", Academic Press, 2012.

   Ho, T. and D. Lun, "Network Coding: An Introduction", Cambridge
   University Press, 2008.

   "Network Coding Meets TCP", Proc. IEEE INFOCOM, 2009.