Abstract. The goal of this work is to separately control individual secure sessions between unlimited pairs of multicast receivers and senders. At the same time, the solution given preserves the scalability of receiver initiated Internet multicast for the data transfer itself. Unlike other multicast key management solutions, there are absolutely no side effects on other receivers when a single receiver joins or leaves a session and no smartcards are required. The cost per receiver-session is typically just one short set-up message exchange with a key manager. Key managers can be replicated without limit because they are only loosely coupled to the senders who can remain oblivious to members being added or removed. The technique is a general solution for access to an arbitrary sub-range of a sequence of information and for its revocation, as long as the end of each sub-range can be planned at the time each access is requested.
We adopt an approach where the key used to encrypt sent data is systematically changed for each new unit of application data. The keys are taken from a sequence seeded with values initially known only to senders. A key sequence construction is presented where arbitrarily different sub-sequences can be revealed to each receiver by revealing only a small number of intermediate seed values, rather than every key in each sub-sequence. Specifically, a maximum of O(log(N)) seeds needs to be revealed once per session to each receiver in order to reconstruct a sub-sequence N keys long. This should be compared with the most efficient multicast key management solutions to date, which require a message of length O(log(n)) to be multicast to all n receivers every time a receiver or group of receivers joins or leaves. Further, calculation of each key in the sequence requires a mean of under two fast hash operations. (Notation is explained in Appendix B.)
In contrast, whenever a receiver is added or removed with the present scheme, there is zero side effect on other receivers. A special group key change isn't required because systematic changes occur sufficiently regularly anyway. No keys are sent over multicast, therefore reliable multicast isn't required. If key managers are delegated to handle requests to set-up receiver sessions, the senders can be completely oblivious to any receiver addition or removal. Thus, there is absolutely no coupling back to the senders. In many commercial scenarios (e.g. prepayment) key managers can be stateless, allowing performance to grow linearly with unbounded key manager replication. Resilience of the whole system would also be assured in such scenarios, even in the face of partial failures, due to the complete decoupling of all the elements.
Our thesis is that there are many applications that only rarely if ever require premature eviction, e.g. pre-paid or subscription pay-TV or pay-per-view. Thus, we don't present a solution for unplanned eviction, but instead concentrate on the pragmatic scenario of pre-planned eviction, which we believe is a novel approach. Each eviction from the multicast group is planned at each session set-up, but each is still allowed to occur at an arbitrary time. Nonetheless, we briefly describe how the occasional unplanned eviction can be catered for by modular combination with existing solutions at the expense of loss of simplicity and scalability.
Four other key sequence constructions are presented in a companion technical report [Briscoee99], which also gives a mathematical model that encompasses all five schemes and others in the same class (including the one-way function tree (OFT) [McGrew98]). Each scheme in the companion report has particular strengths: one is useful for sessions of unknown duration, another multiplies the effective key length against brute force attack (without increasing the operational key length), and yet another is extremely simple and efficient in terms of message bandwidth but has limited commercial applicability. The scheme chosen for this paper is the simplest and is secure enough for most commercial scenarios.
In Section 2, we discuss requirements and describe related work on multicast key management and other multicast security issues. In Section 3 we use an example application to put the paper into a practical context and to highlight the scalability advantages of using systematic key changes. In Section 4 we present the key sequence construction that allows different portions of a key sequence to be reconstructed from various combinations of intermediate seeds. Section 5 discusses the efficiency and security of the construction. Section 6 very briefly describes variations on the approach to add other security requirements such as multi-sender multicast, a watermarked audit trail and unplanned eviction, although more detail and a wider literature review can be found in [Briscoee99]. Finally, limitations of the approach are discussed, followed by conclusions.
If a multicast sender wishes to restrict its data to a set of receivers, it will typically encrypt the data at the application level. End-to-end access is then controlled by limiting the circulation of the key. A new receiver could have been storing away the encrypted stream before it joined the secure session. Therefore, every time a receiver is allowed in, the key needs to be changed (termed backward security [McGrew98]). Similarly, after a receiver is thrown out or requests to leave, it will still be able to decrypt the stream unless the key is changed again (forward security). Most approaches work on the basis that, when the key needs to be changed, every receiver has to be given a new key. Continually changing keys clearly has messaging side effects on all receivers other than the one joining or leaving.
We define a 'secure multicast session' as the set of data that a receiver could understand, having passed one access control test. If one key is used for many related multicast groups, they all form one secure session. If a particular receiver leaves a multicast group and later re-joins, but could have decrypted the information she missed in between, the whole transmission is still a single secure session. We envisage very large receiver communities, e.g. ten million viewers for a popular Internet pay-TV channel. Even if just 10% of the audience tuned in or out within a fifteen minute period, this would potentially cause thousands of secure joins or leaves per second.
We use the term 'application data unit' (ADU) as a more general term for the minimum useful atom of data from a security or commercial point of view (one second of viewing in the above pay-TV example, if access is charged per second). The ADU equates to the aggregation interval used in Chang et al [Chang99] and has also been called a cryptoperiod when measured in units of time. ADU size is application and security scenario dependent. It may be an initialisation frame and its set of associated 'P-frames' in a video sequence, or it may be ten minutes of access to a network game. Note that the ADU from a security point of view can be different from that used at a different layer of the application. ADU size can vary throughout the duration of a stream, dependent on the content. ADU size is a primary determinant of system scalability: if a million receivers were to join within fifteen minutes, but the ADU size were also fifteen minutes, only one re-key event would be required.
However, reduction in re-keying requirements isn't the only scalability issue. In the above example, a system that can handle a million requests in fifteen minutes still has to be provided, even if its output is just one re-key request to the senders. With just such scalability problems in mind, many multicast key management architectures introduce a key manager role as a separate concern from the senders. This deals with policy concerns over membership and isolates the senders from much of the messaging traffic needed for access requests.
An alternative class of approaches involves a single key for the multicast data, but a hierarchy of keys under which to send out a new key over the same multicast channel as the data. These approaches involve a degree of redundant re-keying traffic arriving at every receiver in order for the occasional message to arrive that is decipherable by that receiver. The state of the art in this class is Chang et al [Chang99]. The group members are arranged as the leaves of a binary tree with the group session key at the root. Two auxiliary keys are assigned per layer of the tree; if each member is assigned a different user identity number (UID), this effectively assigns a pair of auxiliary keys to each bit in the UID space. The first of each pair is given to users with a 1 at that bit position in their UID, and the other to users with a 0. When a single member leaves, a new group session key is randomly generated and multicast, encrypted with every auxiliary key in the tree except those held by the leaving member. This guarantees (subject to the reliability of the multicast) that every remaining member will get at least one message they can decrypt. A variant recognises the potential to aggregate member removals if many occur within the timespan of one ADU: the group session key is multicast to the group multiple times, each copy encrypted under a different logical combination of the auxiliary keys, to ensure all members but the leaving ones can decrypt at least one message. Finding this minimised set has the same solution as the familiar problem of reducing the number of logic gates and inputs in the field of logic hardware design.

Wong et al [Wong98] take an approach that is a generalisation of Chang et al, analysing key graphs as a general case of trees. They find that a tree of degree four, rather than binary, is the most efficient for large groups.

The standardised approach to pay-TV key management also falls into this class [ITU-R.810]. A set of secondary keys is created and each receiver holds a sub-set of these in tamper-resistant storage. The group key is also unknown outside the tamper-resistant part of the receiver. In case the group key becomes compromised, a new one is regularly generated and broadcast multiple times under different secondary keys to ensure the appropriate receivers can re-key.
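To illustrate the single-leave case in this class of scheme, the following is a minimal sketch of our reading of Chang et al, not their actual protocol; encrypt() is a toy placeholder cipher of our own. Every remaining member differs from the leaver in at least one UID bit, so encrypting the new group key under the complementary auxiliary key at each bit position reaches everyone except the leaver.

```python
import hashlib

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Placeholder cipher: XOR with an MD5-derived keystream (illustration only)."""
    stream = hashlib.md5(key).digest()
    return bytes(p ^ stream[j % 16] for j, p in enumerate(plaintext))

def rekey_after_leave(new_group_key: bytes, leaver_uid: int, uid_bits: int,
                      aux_keys: dict) -> list:
    """Build the multicast re-key messages after one member leaves.
    aux_keys[(pos, val)] is the auxiliary key held by members whose UID
    has bit value `val` at bit position `pos`. The leaver holds none of
    the keys used below, so only the leaver is excluded."""
    messages = []
    for pos in range(uid_bits):
        val = 1 - ((leaver_uid >> pos) & 1)   # the bit value the leaver lacks
        messages.append(encrypt(aux_keys[(pos, val)], new_group_key))
    return messages
```

Each remaining member decrypts whichever message was encrypted under an auxiliary key she holds; the leaver holds the key for her own bit value at every position, hence none of the complementary keys used above.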
All work in this class of approaches uses multicast itself as the transport to send new keys. As 'reliable multicast' is still to some extent a contradiction in terms, all such approaches have to allow for some receivers missing the occasional multicast of a new key due to localised transmission losses. Some approaches include redundancy in the re-keying to allow for losses, but this reduces their efficiency and increases their complexity. Others simply ignore the possibility of losses, delegating the problem to a choice of a sufficiently reliable multicast scheme.
The Nark scheme [Briscoec99] falls into the same class as the present work because the group key is systematically changed for each new ADU in a stream. However, unlike the present approach, Nark requires a smartcard in order to provide non-repudiation of delivery and of latency, so the smartcard's presence can also be exploited to control which keys in the sequence are revealed. Each receiver has a proxy of the sender running within her smartcard, so all smartcards can be sent one primary seed for the whole key sequence. The proxy on the smartcard then determines which keys to give out, depending on the policy it was given by the key manager when the receiver set up the session. The present paper shows how to construct a key sequence such that it can be partially reconstructed from intermediate seeds, thus removing the need for a smartcard if non-repudiation is not a requirement.
Beyond the requirement we focus on, two taxonomies of multicast security requirements [Bagnall99, Canetti99] include many other possible combinations of security requirements for multicast. It is generally agreed that a modular approach is required to building solutions for combined requirements, rather than searching for a single monolithic 'super-solution'. Later, as examples of this modular approach, we show how a number of variations can be added to our basic key management schemes to achieve a selection of the more commercially important requirements.
We deliberately choose an example where the financial value of an ADU (defined in Section 2) doesn't relate to time or data volume, but only to a completely application-specific factor. In this example, participation is charged per 'game-minute', a duration that is not strictly related to real-time minutes, but is defined and signalled by the game time-keeper. The game consists of many virtual zones, each moderated by a different zone controller. The zone controllers provide the background events and data that bring the zone to life. They send this data encrypted on a multicast address per zone, but the same ADU index and hence key is used at any one time in all zones. Thus the whole game is one single 'secure multicast session' (defined in Section 2) despite being spread across many multicast addresses. Players can tune in to the background data for any zone as long as they have the current key. The foreground events created by the players in the zone are not encrypted, but they are meaningless without reference to this background data.
Fig 3.1 shows only the data flows relevant to game security, and only those present once the game is in progress, not during set-up. Clearly all players are sending data, but the figure only shows the encrypting senders, S - the zone controllers. Similarly, only receivers that decrypt, R, are shown - the game players. A game controller sets up the game security (not shown, but described below). Key management operations are delegated to a number of replicated key managers, KM, that use secure Web server technology.
The key to the secure multicast session is changed every game-minute (every ADU) in a sequence. All encrypted data is headed by an ADU index in the clear, which refers to the key needed to decrypt it. After the set-up phase, the game controller, zone controllers and key managers hold initial seeds that enable them to calculate the sequence of keys to be used for the entire duration of the game.
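The framing implied here can be sketched as follows; the 8-byte big-endian index field is our illustrative choice, not something the scheme prescribes.

```python
import struct

def frame_adu(adu_index: int, ciphertext: bytes) -> bytes:
    """Prefix each encrypted ADU with its index in the clear, so a receiver
    can pick the right key from its reconstructed key sequence."""
    return struct.pack("!Q", adu_index) + ciphertext

def parse_adu(frame: bytes):
    """Return (adu_index, ciphertext); the index selects the decryption key."""
    (adu_index,) = struct.unpack("!Q", frame[:8])
    return adu_index, frame[8:]
```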
In this example, pre-payment is used to buy seeds. This ensures that key managers hold no state about their customers, so they can be replicated without limit: no central state repository is required, as there would be if seeds were bought on account and the customer's account status had to be checked. Thus performance can grow linearly with key manager replication, and system resilience is independent of the resilience of any individual key manager.
Thus:

    b0(s) = b(r0(s));    b1(s) = b(r1(s))
For instance, the first well-known blinding function could be a one-bit left circular shift followed by an MD5 hash, while the second could be a one-bit right circular shift followed by an MD5 hash. Other alternatives might be to precede one blinding function with an XOR with 1, or with a concatenation with a well-known word. It is advantageous to choose two functions that consume minimal but equal amounts of processor resource: this balances the load in all cases and limits susceptibility to a covert channel, since otherwise the level of processor load would reveal which function was being executed. Alternatively, for efficiency, two variants of a hash function could be used, e.g. MD5 with two different initialisation vectors. However, it seems ill advised to tamper with tried-and-tested algorithms.
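As a concrete illustration, here is a minimal sketch of the first pair of alternatives above, assuming 128-bit (16-byte) seeds to match MD5's output size; the function and constant names are ours.

```python
import hashlib

SEED_BITS = 128
MASK = (1 << SEED_BITS) - 1

def _rot_then_md5(seed: bytes, left_shift: int) -> bytes:
    """Circularly shift a 128-bit seed, then hash it back to 128 bits."""
    x = int.from_bytes(seed, "big")
    x = ((x << left_shift) | (x >> (SEED_BITS - left_shift))) & MASK
    return hashlib.md5(x.to_bytes(SEED_BITS // 8, "big")).digest()

def b0(seed: bytes) -> bytes:
    """Left blinding function: 1-bit left circular shift, then MD5."""
    return _rot_then_md5(seed, 1)

def b1(seed: bytes) -> bytes:
    """Right blinding function: 1-bit right circular shift, then MD5.
    A right shift by 1 is a left shift by 127 on a 128-bit word."""
    return _rot_then_md5(seed, SEED_BITS - 1)
```

Both functions cost one rotation and one hash, meeting the equal-load criterion discussed above.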
The key sequence is constructed as follows. Seeds are arranged in a binary tree. A single initial seed is randomly generated and placed at the root, in layer d = D. Applying the left blinding function b0() to a seed in layer d gives its left child seed in layer d-1; applying the right blinding function b1() gives its right child. Blinding is repeated down the tree until it yields 2^D leaf seeds in layer d = 0, which, taken left to right, are used as the sequence of keys. Because the blinding functions cannot be inverted, knowledge of any seed reveals every key in the sub-tree below it, but nothing outside that sub-tree.
Note that each layer can be assigned an arbitrary value of d as long as it uniquely identifies the layer. Nothing relies on the actual value of d or D. Therefore it is not necessary for the sender to reveal how far the tree extends upwards, thus improving security.
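To make this concrete, the following sketch derives any leaf key from the root seed by walking the bits of the key index, most significant first. It reuses b0() and b1() from the sketch above; the function name and the (layer, index) convention are ours.

```python
def leaf_seed(root_seed: bytes, D: int, i: int) -> bytes:
    """Derive leaf seed i (the i'th key, 0 <= i < 2**D) from the layer-D root.
    Each bit of i, most significant first, selects the left (b0) or
    right (b1) blinding function at successive layers of the tree."""
    s = root_seed
    for bit in range(D - 1, -1, -1):
        s = b1(s) if (i >> bit) & 1 else b0(s)
    return s
```

For example, with D = 4, key 13 (binary 1101) is b1(b0(b1(b1(root_seed)))).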
Often a session will have an unknown duration when it starts. Clearly, the choice of D limits the maximum length of key sequence from any one starting point. The simplest work-around is just to generate a new initial seed and start a new binary hash tree alongside the old one if required. If D is known by all senders and receivers, a range of keys that overflows the maximum key index, 2^D, will be immediately apparent to all parties. In such cases it would be sensible to allocate a 'tree id' for each new tree and specify this along with the seeds for each tree.
For senders and receivers using the BHT, it is most efficient to store only the seeds on the branch of the tree from a root to the key following the one currently in use. Note that there may be multiple roots, particularly for receivers, where each revealed seed is a root. In practice this principle translates into being able to deallocate the memory for a parent seed as soon as it has been hashed to produce its right child. If leaf seeds are also deallocated as soon as the next in the sequence is in use, the tree only ever holds log(N) seeds in memory, on top of any revealed seeds being held to generate the rest of the tree to the right of the current key.
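The seeds that a key manager must reveal for an arbitrary key range m..n can be found with the textbook bottom-up cover of a binary tree. The sketch below (our naming, same (layer, index) convention as above) returns the identities of the covering seeds; their values would then be derived as in the earlier sketch and sent to the receiver.

```python
def seed_cover(m: int, n: int):
    """Return the minimal list of (layer, index) seeds whose sub-trees
    together cover exactly the leaf keys m..n inclusive. A seed at
    (d, i) covers leaves i * 2**d .. (i + 1) * 2**d - 1."""
    cover = []
    lo, hi, d = m, n, 0
    while lo <= hi:
        if lo & 1:            # lo is a right child: its parent covers too much
            cover.append((d, lo))
            lo += 1
        if not hi & 1:        # hi is a left child: likewise
            cover.append((d, hi))
            hi -= 1
        lo >>= 1
        hi >>= 1
        d += 1
    return cover
```

For instance, seed_cover(1, 6) returns [(0, 1), (0, 6), (1, 1), (1, 2)] - two single leaves plus two pairs covering keys 1..6 - and in general never much more than 2 log(N) seeds.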
Re-using the earlier example in Fig 4.2.2, the key calculation sequence can be followed step by step; in general form it is a depth-first traversal, as sketched below. For brevity we assume keys are synonymous with their corresponding leaf seeds:
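The following sketch yields the keys under each revealed seed in order while holding only the branch to the current key, matching the deallocation strategy above. It again assumes b0() and b1() from the earlier sketch; Python's generator idiom stands in for whatever incremental interface an implementation would use.

```python
def keys_under(seed: bytes, d: int):
    """Yield, left to right, the 2**d leaf keys below a revealed layer-d
    seed, holding at most d + 1 seeds in memory at any moment."""
    stack = [(d, seed)]                        # pending sub-trees: the current branch
    while stack:
        layer, s = stack.pop()
        if layer == 0:
            yield s                            # leaf seeds double as keys here
        else:
            stack.append((layer - 1, b1(s)))   # right child: needed later
            stack.append((layer - 1, b0(s)))   # left child: needed next
```

Note that a parent seed is discarded (popped) as soon as both its children have been derived, exactly as described above.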
Since each key requires the hashes down its branch, but higher branches are shared between many keys, the mean number of hash operations per key over the whole tree is:

    (mean no. of hashes per key) = (no. of branches) / (no. of leaves)
                                 = (2^(D+1) - 1) / 2^D
                                 < 2

If memory is extremely scarce (e.g. an embedded device) but some clock cycles are spare, storage can be traded off against processing. Any intermediate seeds down the branch of the tree to the current key need to be calculated, but they don't all need to be stored. Those closest to the leaves should be stored (cached), as they will be needed soonest to calculate the next few keys. As intermediate seeds nearer the root are required, they can be recalculated, as long as the seeds originally sent by the key manager are never discarded until the sequence has left them behind.
Unlike senders or receivers, a key manager cannot guarantee to access the key-space only sequentially. It has to respond to requests for seeds from anywhere in the tree. However, in most scenarios requests will tend not to be randomly distributed, so a key manager can use an identical approach to the device with scarce memory: it can calculate seeds in any part of the tree from the initial seeds, but cache those used most frequently. This simply requires a fixed-size cache memory allocation, discarding the least recently used values.
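A sketch of such a key manager lookup, using a memoised recursion from any requested seed back up to the root. ROOT_D, ROOT_SEED and the cache size are illustrative stand-ins, and b0() and b1() are assumed from the earlier sketch.

```python
import os
from functools import lru_cache

ROOT_D = 20                    # tree depth: 2**20 keys (illustrative)
ROOT_SEED = os.urandom(16)     # the key manager's initial seed (illustrative)

@lru_cache(maxsize=4096)       # fixed-size cache, least-recently-used discard
def seed_at(d: int, i: int) -> bytes:
    """Seed at (layer d, index i), for 0 <= d <= ROOT_D: recomputed from
    the root on a cache miss, so frequently requested sub-trees stay hot."""
    if d == ROOT_D:
        return ROOT_SEED
    parent = seed_at(d + 1, i >> 1)
    return b1(parent) if i & 1 else b0(parent)
```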
In the following, R, S and KM are the receiver, sender and key manager respectively, as defined in Section 3, and:
N (= n - m + 1) is the length of the range of keys that the receiver requires, randomly positioned in the key space;
ws is the size of a seed (typically 128b);
wh is the size of the key management protocol header overhead;
ts is the processor time to blind a seed (plus one relatively negligible circular shift operation).
| | | | BHT |
|---|---|---|---|
| per R | (unicast message size - wh)/ws, or (min storage)/ws | min | 1 |
| | | max | 2(log(N+2) - 1) |
| | | mean | O(log(N) - 1) |
| per R | (processing latency)/ts | min | 0 |
| | | max | log(N) |
| | | mean | O(log(N)/2) |
| per R or S | (processing per key)/ts | min | 1 |
| | | max | log(N) |
| | | mean | 2 |
| per S or KM | (min storage)/ws | | 1 |
| per S | (min random bits)/ws | | 1 |
The unicast message size for each receiver's session set-up is shown equated to the minimum amount of storage each receiver requires. This is the storage required before starting the session, not once keys have started to be calculated. The minimum sender storage row has the same meaning. The processing latency is the time required for one receiver to be ready to decrypt incoming data after having received the unicast set-up message for its session. Note that there is no latency cost when other members join or leave, as there is in schemes that cater for unplanned eviction. The figures for processing per key assume sequential access of keys and the caching strategy described in Section 5.1. The exceptional cases when a session starts or ends are not included in the figures for per-key processing. Only the sender (or a group controller, if there are multiple senders) is required to generate random bits for the initial seeds. The number of bits required is clearly equal to the minimum sender storage of these initial seeds.
It can be seen that the only parameters that depend on the size of the group membership are those that are per receiver. The cost of two of these (storage and processing latency) is distributed across the group membership and is thus constant per receiver. Only the unicast message size causes a cost at a key manager that rises linearly with group membership size, but that cost is borne only once per receiver session. Certainly, none of the per-receiver costs is itself dependent on the group size, as it would be in schemes that allow unplanned eviction. Thus, the BHT construction is highly scalable.
Generally, the more random values that are needed to build a tree, the better it can contain a sustained attack to within the bounds of the sub-tree created from each new random seed. However, for long-running sessions, there is a trade-off between this security and the convenience of a continuous key-space (as against concatenating BHTs side by side, described earlier). The quality of the random number generation used for the initial seeds is another potential area of weakness that must be correctly designed.
Any key sequence construction like those discussed here is vulnerable to collusion between valid group members. If a sub-group of members agree amongst themselves to each buy a different range of the key space, they can all share the seeds they are sent, so that each can access the union of their otherwise separate key spaces. Arbitrage is a variant of member collusion that has already been discussed: one group member buys the whole key sequence, then sells portions of it more cheaply than the key manager does, still making a profit if most keys are bought by more than one customer. Protection against collusion with non-group members is discussed in Section 6.2 on watermarking.
Finally, the total system security for any particular application clearly depends on the strength of the security used when setting up the session. The example scenario in Section 3 describes the issues that need to be addressed and suggests standard cryptographic techniques to meet them. As always, the overall security of an application is as strong as the weakest part, which is more likely to be some 'human' element than the key sequence construction discussed here.
In Chameleon a stream is ciphered by combining a regular stream cipher with a large block of bits (512kB in Chameleon's concrete example). Each receiver is given a long-term copy of the block to decipher the stream. The block is watermarked for each receiver in a way specific to the medium. Because the block is only used for the XOR operation, the position of any watermarked bits is preserved in the output, allowing the approach to be generic. Thus, the keys generated by the BHT construction can be treated as a sequence of intermediate keys from which a watermarked sequence of final keys is generated, thus enforcing watermarked decryption.
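A sketch of how the BHT output could feed such a scheme: each intermediate key is XORed with bytes selected from the receiver's long-term watermarked block to give the final key. The index schedule below is a deterministic stand-in of our own devising; Chameleon's actual schedule differs.

```python
def watermarked_final_key(intermediate_key: bytes, key_block: bytes) -> bytes:
    """Derive a final key by XORing the intermediate key with bytes selected
    from the receiver's long-term, per-receiver watermarked block. Because
    only XOR is applied, any watermarked bits in the block flow through to
    the final key and hence into the decrypted output."""
    out = bytearray(intermediate_key)
    for j in range(len(out)):
        # Stand-in index schedule, derived deterministically from the
        # intermediate key so both ends select the same block positions.
        idx = (out[j] * 131 + j * 31) % len(key_block)
        out[j] ^= key_block[idx]
    return bytes(out)
```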
However, this approach suffers from an applicability limitation of Chameleon that has not been previously discussed to our knowledge. Chameleon doesn't detect 'semi-internal' leakage to users who legitimately hold a valid long term key block. Intermediate keys, rather than final ones, can be leaked to any such receiver. For instance, in the above network game example, a group of players can collude to each buy a different game-hour and share the (unwatermarked) intermediate keys that each buys between themselves. Thus a receiver not entitled to certain of the intermediate keys can create final keys watermarked with her own key block and hence decrypt the cipherstream. Although the keys and data produced are stamped with her own watermark, this only gives an audit trail to the target of the leak, not the source (shutting the stable door after the horse has bolted).
Chameleon does nonetheless create an audit trail for any keys or data that are passed to a completely unauthorised receiver - that is a receiver without a long-term key block, e.g. someone who has not played the game recently. In such cases the traitor who revealed the keys or data can be traced if the keys or data are traced. Similarly, there is an audit trail if one of the players passes on their long-term key block instead, as it also contains a watermark traceable to the source of the leak. Thus Chameleon 'raises the bar' against leakage, and is therefore still a valid candidate for modular combination with BHT.
Indeed, any number of intermediate keys can be combined (e.g. using XOR) to meet multiple requirements simultaneously. For instance, MARKS, LKH++ and Chameleon intermediate keys can be combined to simultaneously achieve low cost planned eviction, occasional unplanned eviction and a watermarked audit trail against leakage outside the long-term group. Formally, the final key,
    ki,j,... = c(k'i, k'j, ...),

where the intermediate keys k' can be generated from sequences using a BHT construction or by any other means, such as Chameleon or LKH++, and c() is a combining function, such as XOR.
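For concreteness, a minimal sketch of c() as byte-wise XOR over equal-length intermediate keys:

```python
def c(*intermediate_keys: bytes) -> bytes:
    """Combining function: bitwise XOR of equal-length intermediate keys.
    A receiver can form the final key only if it holds every component."""
    out = bytearray(len(intermediate_keys[0]))
    for k in intermediate_keys:
        for j, byte in enumerate(k):
            out[j] ^= byte
    return bytes(out)
```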
In general, combination in this way produces an aggregate scheme with storage costs that are the sum of the individual component schemes. However, combining LKH++ with MARKS, where most evictions are planned, cuts out all the re-keying messages of LKH++ unless an unplanned eviction is actually required.
We have assumed that knowledge of more than one value blinded in different ways from the same starting value doesn't lead to an analytical solution to calculate the original value. Until proofs exist showing any blinding function is resistant to analytical (as against brute force) attack, it won't be possible to prove whether an analytical attack has been made easier by our techniques.
Finally, due to pressure of time, we have not analysed trees of degree three and above. They potentially offer greater efficiency at the expense of additional complexity. For instance, the experiments in Wong et al recommend a tree of degree four, but the pattern of usage to which their tree is subjected is only tenuously related to the present work.
State of the art techniques that allow unplanned eviction from the group are still costly in messaging terms. In contrast, we have focussed on the problem of planned eviction. That is, eviction per receiver after some arbitrary future ADU, but planned at the time the receiver requests a session. We have asserted that many commercial scenarios based on pre-payment or subscription don't require unplanned eviction but do require arbitrary planned eviction; examples are pay-TV, pay-per-view TV and network gaming. To achieve planned but arbitrary eviction, we have designed a key sequence construction that is used by the senders to systematically change the group key. It is designed such that an arbitrary sub-range of the sequence can be reconstructed by revealing a small number of seeds (16B each). We can reveal N keys to each receiver using O(log(N)) seeds. The scheme requires on average just O(log(N)/2) fast hash operations to get started, then on average no more than two more hashes to calculate each new key in the sequence. This implies under 10µs of processing time to generate each ADU key with today's technology.
To put this work in context, for pay TV charged per second with 10% of ten million viewers tuning in or out within a fifteen minute period, the best alternative scheme (Chang et al) might generate a re-key message of the order of tens of kB every second, multicast to every group member. The present work requires a message of a few hundred bytes unicast just once to each receiver at the start of perhaps four hours of viewing. This comparison is not strictly fair as, unlike the present scheme, Chang et al and the other schemes of its class allow for unplanned eviction from the group, thus allowing accurate charging for serendipitous viewing. However, the purpose of this work is to present a far more scalable solution for commercial scenarios where unplanned eviction is not required. Another way of putting this is that the cost of scenarios requiring unplanned eviction might make them economically unviable compared to those that can make do with planned eviction.
Nonetheless, if unplanned eviction is occasionally required, we have shown how to combine our scheme with Chang's to get the best of both worlds. Combining schemes sums the storage requirements of each, but both are very low in this respect. We also show how to further combine with the Chameleon watermarking scheme to give rudimentary detection of information leakage outside the group.