New Memory Model

From gem5

Overview

The new memory model supports three kinds of accesses. This section gives a brief overview of key attributes.

  1. Timing accesses provide precise event-driven modeling of memory system behavior. All accesses are two-phase: a request packet is sent (via sendTiming()), then later (on some subsequent tick) a response packet is received (as a result of the responder calling sendTiming(), ending up at the requester's recvTiming()). The response packet is not necessarily the same C++ object as the request packet, though in common cases the responder will re-use the request packet for efficiency.
  2. Atomic accesses provide fast approximate modeling of the memory system. All request processing completes before returning from a call to sendAtomic(). By the time the call returns, the request packet has been transformed in place into the response packet.
  3. Functional accesses provide a means for querying and updating values in the memory system orthogonal to any timing or organizational aspects. That is, cache contents are not updated. As with atomic accesses, the request packet is transformed in place into the response packet by the time the sendFunctional() call returns.
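
To make the three access kinds concrete, here is a minimal sketch using toy stand-ins. ToyPacket, ToyEventQueue and ToyMemory are hypothetical names invented for this illustration and are not the simulator's actual classes; only the function names sendTiming(), sendAtomic() and sendFunctional() come from the text above, and their signatures here are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

using Tick = std::uint64_t;

// Hypothetical packet: one byte of data plus a command that flips
// from request to response when the access is serviced.
struct ToyPacket {
    enum class Cmd { ReadReq, WriteReq, ReadResp, WriteResp };
    Cmd cmd;
    std::uint64_t addr;
    std::uint8_t data;
};

// Hypothetical event queue: runs callbacks in tick order.
class ToyEventQueue {
  public:
    void schedule(Tick when, std::function<void()> fn) {
        events_.emplace(when, std::move(fn));
    }
    // Run every event whose tick is <= until.
    void runUntil(Tick until) {
        while (!events_.empty() && events_.begin()->first <= until) {
            auto it = events_.begin();
            now_ = it->first;
            std::function<void()> fn = std::move(it->second);
            events_.erase(it);
            fn();
        }
        now_ = until;
    }
    Tick now() const { return now_; }
  private:
    std::multimap<Tick, std::function<void()>> events_;
    Tick now_ = 0;
};

// Hypothetical memory with a fixed latency (an assumption of this sketch).
class ToyMemory {
  public:
    ToyMemory(ToyEventQueue &eq, Tick latency) : eq_(eq), latency_(latency) {}

    // Functional: the packet is a response by the time the call returns;
    // no time is modeled.
    void sendFunctional(ToyPacket &pkt) { access(pkt); }

    // Atomic: same in-place transformation, plus an approximate latency.
    Tick sendAtomic(ToyPacket &pkt) { access(pkt); return latency_; }

    // Timing: phase one accepts the request; phase two delivers a response
    // on a later tick. The response is a separate object here, which the
    // text above permits.
    void sendTiming(const ToyPacket &req,
                    std::function<void(const ToyPacket &)> recvTiming) {
        eq_.schedule(eq_.now() + latency_, [this, req, recvTiming]() {
            ToyPacket resp = req;
            access(resp);
            recvTiming(resp);
        });
    }

  private:
    // Turn a request into its response in place.
    void access(ToyPacket &pkt) {
        if (pkt.cmd == ToyPacket::Cmd::WriteReq) {
            store_[pkt.addr] = pkt.data;
            pkt.cmd = ToyPacket::Cmd::WriteResp;
        } else if (pkt.cmd == ToyPacket::Cmd::ReadReq) {
            pkt.data = store_[pkt.addr];
            pkt.cmd = ToyPacket::Cmd::ReadResp;
        }
    }
    ToyEventQueue &eq_;
    Tick latency_;
    std::map<std::uint64_t, std::uint8_t> store_;
};
```

In this sketch the atomic and functional calls leave the caller holding a response packet immediately, while the timing call returns before any response exists and the callback fires only once the event queue reaches the scheduled tick.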

Rules:

  • Timing and atomic accesses cannot co-exist in the memory system at the same point in time.
  • Functional accesses can be made to the memory system legally at any time, even if there are outstanding timing accesses.

Discussion:

  • Now that we've been using the system, I'm not that happy with the names sendFunctional() and sendAtomic(), since they don't "just" send the request. Maybe doFunctional() and doAtomic() would feel more natural.

Data buffer allocation

The policy is that the data pointer in the packet may be NULL. There is no point in pre-allocating a dynamic buffer and sending it down the memory hierarchy only for another device to have to allocate a bigger memory region anyway, or, if the packet is copied, for each copy to allocate extra data. Thus, if the requester has a static member it uses for requests (e.g. the SimpleCPU), it can place the preallocated buffer in the Request. Otherwise the DataPtr should be NULL, and the responder will allocate it.

Since the requester knows if it has a static member it uses for the data, it is up to the receiving device to free the data it receives.
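
The allocation policy might look roughly like the sketch below. Everything in it is hypothetical (BufPacket, respond(), StaticRequester and DynamicRequester are names made up for this example), and it assumes one reading of the ownership rule: the device that receives the filled-in packet frees the buffer only when it did not supply a static one itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical packet: the data pointer may be null when the request is sent.
struct BufPacket {
    std::uint8_t *data;
    std::size_t size;
};

// Hypothetical responder: allocates a buffer only if the requester did not
// provide one, then fills it. Returns true if it had to allocate.
bool respond(BufPacket &pkt, std::uint8_t fill) {
    bool allocated = false;
    if (pkt.data == nullptr) {
        pkt.data = new std::uint8_t[pkt.size];
        allocated = true;
    }
    std::memset(pkt.data, fill, pkt.size);
    return allocated;
}

// Requester with a static member used for every request
// (the SimpleCPU case mentioned in the text).
struct StaticRequester {
    std::uint8_t buffer[8];
    std::uint8_t read(std::uint8_t fill) {
        BufPacket pkt{buffer, sizeof(buffer)};
        respond(pkt, fill);
        return pkt.data[0];  // buffer is a member, so nothing to free
    }
};

// Requester without a preallocated buffer: it sends a null pointer and,
// under this sketch's assumption, frees what the responder allocated.
struct DynamicRequester {
    std::uint8_t read(std::uint8_t fill) {
        BufPacket pkt{nullptr, 8};
        respond(pkt, fill);
        std::uint8_t value = pkt.data[0];
        delete[] pkt.data;
        return value;
    }
};
```

The point of the design is that no flag has to travel with the packet: the requester already knows whether the pointer it gets back refers to its own static storage.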

Functional accesses

Functional accesses must be supported to some extent by the coherence protocol so that the up-to-date value(s) of a particular location can be tracked down (and possibly modified) even if the cache block is in some transitional state.

Tentative plan:

The result of a functional access could depend on where that access is injected into the system. For example, with a weak consistency model, the value of a particular memory location could depend on which CPU issues the load, and functional accesses should probably reflect this. Also, functional accesses that go directly to the main memory model need not propagate up the hierarchy to caches. The main motivation for this policy is to allow initial program loading (which goes straight to main memory) to run faster by not having to operate at the cache-block level. If we want to maintain that efficiency, we have to distinguish those accesses from later accesses (e.g. from syscall emulation) that do need to go through the coherence protocol. As of right now, using the port location (cache vs. memory) works for that purpose, and I see no compelling need in the future for forcing functional main-memory accesses to deal with coherence (since they can always use a cache port instead).

Magical memory tracker

This cool but far-off idea should be documented.

Resolved Issues

Deferred Timing Accesses

We discussed at some point having two functions:

sendTiming(Packet &pkt)
sendTiming(Packet &pkt, Tick time)

Each would call the corresponding recvTiming call on the peer with the same parameters. In the case of the sendTiming that had a Tick parameter, we would either allow the object to implement its own timing (like the bus, which has its own sub-eventq), or the port class would have a default implementation that scheduled an event for the time given in the function and then, when that event occurred, just called the recvTiming call without the tick parameter.
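
As a rough illustration of the default implementation described above, the sketch below uses toy types. Pkt, EventQ, SketchPort and RecordingPort are hypothetical names, and the bool return value of the tick-less recvTiming (true meaning the packet was accepted) is an assumption of this example rather than something stated on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

using Tick = std::uint64_t;

struct Pkt { std::uint64_t addr; };  // hypothetical packet

// Hypothetical event queue: runs callbacks in tick order.
class EventQ {
  public:
    void schedule(Tick when, std::function<void()> fn) {
        events_.emplace(when, std::move(fn));
    }
    void runUntil(Tick until) {
        while (!events_.empty() && events_.begin()->first <= until) {
            auto it = events_.begin();
            now_ = it->first;
            std::function<void()> fn = std::move(it->second);
            events_.erase(it);
            fn();
        }
        now_ = until;
    }
    Tick now() const { return now_; }
  private:
    std::multimap<Tick, std::function<void()>> events_;
    Tick now_ = 0;
};

class SketchPort {
  public:
    explicit SketchPort(EventQ &eq) : eq_(eq) {}
    virtual ~SketchPort() = default;
    void setPeer(SketchPort *peer) { peer_ = peer; }

    // Each send variant calls the matching recv variant on the peer.
    bool sendTiming(Pkt &pkt) { return peer_->recvTiming(pkt); }
    void sendTiming(Pkt &pkt, Tick when) { peer_->recvTiming(pkt, when); }

    virtual bool recvTiming(Pkt &pkt) = 0;

    // Default deferred receive: schedule an event for `when` that calls the
    // tick-less recvTiming. The packet must outlive the event. The return
    // value is dropped, so this sketch has no way to handle a rejection.
    virtual void recvTiming(Pkt &pkt, Tick when) {
        Pkt *p = &pkt;
        eq_.schedule(when, [this, p]() { recvTiming(*p); });
    }

  protected:
    EventQ &eq_;
    SketchPort *peer_ = nullptr;
};

// Hypothetical device that records when each packet actually arrives.
class RecordingPort : public SketchPort {
  public:
    using SketchPort::SketchPort;
    using SketchPort::recvTiming;  // keep the deferred overload visible
    bool recvTiming(Pkt &pkt) override {
        received.push_back({eq_.now(), pkt.addr});
        return true;
    }
    std::vector<std::pair<Tick, std::uint64_t>> received;
};
```

A device such as the bus, which keeps its own sub-eventq, would override the two-argument recvTiming instead of relying on the default.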

Somewhere along the way we removed that code, perhaps to move towards a model where each device kept its own sub-eventq to hold on to events before it did the request. The choice is really whether we want to stick to the old model where the timing is done in just the bus (or ports), or whether each device (caches/memories) needs to have its own sub-eventq.

Does anyone remember what we decided to do? --Rdreslin 13:14, 23 February 2006 (EST)

Type #2 can't work, because the request can be rejected by its peer and you would need to have a way to retransmit it, which can't really be done in a generic fashion. --Ali 01:20, 28 February 2006 (EST)