WO1998013763A2 - Multiport cache memory with address conflict detection - Google Patents

Publication number
WO1998013763A2
Authority
WO
WIPO (PCT)
Prior art keywords
port
bank
cache
banks
ports
Prior art date
Application number
PCT/IB1997/001146
Other languages
French (fr)
Other versions
WO1998013763A3 (en)
Inventor
Eino Jacobs
Original Assignee
Philips Electronics N.V.
Philips Norden Ab
Application filed by Philips Electronics N.V., Philips Norden Ab
Priority to JP10515453A (patent JP2000501539A)
Priority to KR1019980703828A (patent KR19990071554A)
Priority to EP97940270A (patent EP0875030A2)
Publication of WO1998013763A2
Publication of WO1998013763A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0844 Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F12/0846 Cache with multiple tag or data arrays being simultaneously accessible
    • G06F12/0851 Cache with interleaved addressing

Definitions

  • the present invention relates to a processing system with a cache memory, and more particularly to a cache having multiple access ports.
  • a cache is a small, fast memory placed between a processor and main memory in order to reduce the effective time required by a processor to access addresses, instructions or data that are normally stored in main memory. For example, when a processor reads a word from main memory, the word and neighbouring words are read as a block from main memory into the cache. Typically, there is a high probability that the processor will next attempt to access one of the neighbouring words within the block. Because of this locality of reference property, main memory bus traffic is reduced since the processor is likely to engage in subsequent data transactions directly with the cache. Cache accesses take less time than main memory accesses. Consequently, the use of a cache increases processor throughput.
  • the processor may attempt to execute memory operations simultaneously. In those cases, the processor may require simultaneous access to multiple words stored within cache memory. Accordingly, the cache may include multiple ports, each port for conducting a separate data transaction.
  • a multi-port cache may be implemented as a single multi-port SRAM. However, such a configuration is very slow in operation and occupies a relatively large chip area.
  • a dual-port cache may be implemented with two single-port memory arrays, each corresponding to one of the cache ports. The two arrays have the same address space. This cumbersome arrangement requires complex data coherency circuitry to ensure that the arrays store the same data when data is modified at one of the cache ports. Further, the use of two arrays to store redundant copies of the same data occupies an unnecessarily large chip area.
  • the present invention provides a multi-port cache memory.
  • the multi-port cache operates in a microprocessor system, and includes multiple memory banks and multiple ports for enabling accesses to the banks.
  • Conflict detection circuitry detects simultaneous addressing of a first memory bank through a first port and a second port, and stalls microprocessor operations for a predetermined number of clock cycles in response to the detection of simultaneous addressing.
  • Conflict resolution circuitry allows access to the first bank through the first port during the stall, and allows access through the second port after the stall is complete.
  • the conflict resolution circuitry allows access through ports that are attempting to access the first memory bank in order of ascending priority during successive clock cycles while the microprocessor is stalled.
  • One or more of the ports attempting to access the first bank may be allowed access before or after the time the microprocessor is stalled.
  • Each bank is single-ported. The banks have non-overlapping address spaces, and are addressed so that words within a cache block are distributed among multiple banks.
  • Figure 1 illustrates a computer system having a multi-port cache of the present invention.
  • Figure 2 is a block diagram illustrating a processor coupled to a multi-port cache of the present invention.
  • Figure 3A is a timing diagram illustrating cache timing in the absence of a bank conflict.
  • Figure 3B is a timing diagram illustrating cache timing in the presence of a bank conflict.
  • FIG. 1 illustrates a computer system having a multi-port CPU 100, a main memory 102, a main memory interface 104, and a multi-port cache 106 of the present invention.
  • the main memory interface 104 manages the information exchange between the cache 106 and main memory 102 to maintain cache coherency when a CPU access misses the cache or when the CPU writes new data into the cache.
  • the cache 106 is shown as having two ports, although those skilled in the art will recognize that the present invention is easily extended to a cache having any number of ports.
  • the processor is capable of executing multiple parallel operations, and thus may require simultaneous access to more than one word stored within the cache.
  • processors or other agents may each require access to a corresponding cache port.
  • FIG. 2 is a detailed block diagram of a processor 100 coupled to an embodiment of the cache 106 of the present invention.
  • the cache is a two-way set-associative cache.
  • the cache of the present invention does not employ a dual-port SRAM or redundant single-port arrays that store the same data.
  • the present invention employs multiple single-port memory banks, where each bank stores data for a non-overlapping address space.
  • each bank may be accessed by any of the ports. As long as no two ports attempt to access the same bank, all ports can execute simultaneous accesses to the cache.
  • the cache controls the timing of the accesses as described below.
  • the CPU 100 can issue multiple accesses to the cache 106, represented as a first address A0 and a second address A1. These addresses correspond to the two ports 201 and 203 of the cache of this example.
  • the cache itself comprises a first bank 200, bank0, and a second bank 202, bank1.
  • each bank holds eight kilobytes (8 KB) of data, where four bytes comprise one 32-bit word.
  • each bank stores 2K words.
  • each cache block is two words long, and two blocks comprise one set of the two-way set-associative cache of this example.
  • Each bank is coupled to a plurality of read buses 204 through a corresponding tri-state bus driver 206, each read bus 204 corresponding to one of the ports.
  • Each bank is further coupled to a plurality of write buses 208 through a write multiplexer 210, each write bus 208 corresponding to one of the ports.
  • the read and write busses 204, 208 are coupled to the input/output ports of the CPU 100 (the coupling is not shown to keep the figure simple).
  • each port is coupled to a dual tag RAM 212, where each tag array 214 corresponds to a way of the two-way set-associative cache.
  • the tag from each array is fed into a corresponding comparator 216, which compares the tag to the tag field of the corresponding port address.
  • the resulting hit signal is passed to a corresponding port input of a hit multiplexer 218 for each bank.
  • the hit signal here is a two-bit "one hot" signal in which at most one bit may take on a logical one value.
  • Each bank also is coupled to a row multiplexer 220 that receives the set index field of each port address.
  • read/write control signals are passed from each CPU port to a corresponding input of a read/write multiplexer (not shown) for each bank to indicate whether a read or write memory operation is to be performed.
  • a write enable signal from each port is passed to a corresponding input of a write multiplexer.
  • a read enable signal from each port is passed to a corresponding input of a read multiplexer.
  • the output of the multiplexers is coupled to write enable and read enable inputs, respectively, of the corresponding bank.
  • the read and write multiplexers together are referred to herein as the "read/write multiplexer."
  • the address circuitry that is common to both ports includes conflict detect circuitry 222 that receives the bank address portion of the port addresses.
  • each bank address passes through a 1:2 bank decoder 224, which produces a bank select signal in response. For example, if a zero bank address bit represents a selection of bank0, then the bank decoder 224 will output a one from its bank0 output and a zero from its bank1 output.
  • the bank select signal (bd) from each port's decoder is fed into a corresponding conflict resolution circuit 226 for each bank.
  • the output of the conflict resolution circuitry 226 controls the row multiplexer 220, the hit multiplexer 218 and the read/write multiplexer (not shown) for each bank to determine which port will have access to the bank.
  • the conflict resolution circuitry 226 also controls the tri-state drivers 206 for the read buses 204 (Figure 2 assumes active high) and the write bus multiplexers 210 to assure access to the bus corresponding to the selected port.
  • each bank stores 8 KB of data with each word comprising four bytes.
  • Each cache block comprises two words.
  • the memory contains 1K sets with two blocks per set because the cache is a two-way set-associative cache. Bit 2 of the address selects the bank, whereas bits 3-12 select one of the sets. Bits 13-31 of the address are used in the tag comparison to indicate the presence of an addressed block in the cache.
  • Figure 3A illustrates cache timing where there is no bank conflict.
  • Figure 3B illustrates cache timing with a bank conflict.
  • the CPU attempts to perform simultaneous accesses of the cache by issuing an address A0 from a first CPU port 228 and an address A1 from a second CPU port 230.
  • the addresses are respectively received by a first cache port 201 and a second cache port 203 over an internal CPU bus 232.
  • bit 2 of each address is fed into the conflict detection circuitry 222 to determine whether both ports are attempting to access the same memory bank.
  • the conflict resolution circuitry 226 determines which port input will be passed by the row multiplexer 220, the hit multiplexer 218 and the read/write multiplexer to each bank, and selects the proper read or write bus to communicate with the bank (depending upon whether a read or write operation is being performed).
  • the two-bit signal sel0 represents the two port-select signals for bank0
  • the two-bit signal sel1 represents the two port-select signals for bank1. These combined signals select the appropriate port input to the multiplexers.
  • the conflict resolution logic may be implemented by any circuitry that embodies the logic of Table 1.
  • x/y indicates that the port select signal takes on a value of x in one clock cycle followed by a value of y in a subsequent clock cycle.
  • the conflict resolution circuitry 226 determines which port communicates with each bank. This selection is based upon the bank address field of the port addresses, which is bit 2 in this example. The other bits are used to address a particular word within the banks. Bits 3-12 are the set index fed into the dual tag array for each port. In this example, a set comprises two blocks, with one block in each bank. Bits 13-31 comprise the tag address field that is compared to the tags from the dual tag array 212.
  • the hit signal selects the word within the block.
  • the miss is handled by loading the miss block into the cache. Operation resumes as if the miss did not occur, resulting in a hit. For example, if one instruction attempts two simultaneous accesses and one port hits while the other port misses, the miss is first handled. Then, the instruction is restarted, resulting in two hits with the conflict resolution circuitry operating as described herein.
  • the set index and the hit signal are routed to the correct bank through the multiplexers controlled by the conflict resolution circuitry 226. Assume hits for both port addresses.
  • the hit signal, hit0, from port0 201 is routed through the hit multiplexers 218 to the hit input of bank0 200, whereas the hit signal, hit1, from port1 203 is routed through the hit multiplexers 218 to the hit input of bank1 202.
  • the data read from or written to port0 201 is represented by X
  • the data read from or written to port1 203 is represented by Y.
  • both of these ports are in communication with a bank.
  • X data from port0 201 is read from or written to bank0 200
  • Y data from port1 203 is read from or written to bank1 202.
  • Figure 3B is a timing diagram illustrating the operation of the cache of the present invention in case of a bank conflict.
  • the conflict detection circuitry 222 will stall the operations of the CPU 100 in the next cycle, i.e., cycle 1.
  • the mechanism employed by the conflict detection circuitry 222 to stall the CPU can be implemented using circuitry similar to that employed by standard cache control logic to stall the CPU during a cache miss.
  • the bank select signals for each bank are OR'ed together by an OR gate 250 having an output fed into a bank enable input. If no port attempts to access a bank, then the bank is not enabled. Here, bank1 is not being accessed.
  • sel_ctrl is asserted during the stall (cycle 1) so as to force sel0 to select port1 during cycle 1. See Figure 3B and Table 1.
  • the hit signal, hit1, from port1 is routed through the hit multiplexers 218 to the hit_bank0 input of bank0 so that the data word Y can be outputted through port1 during the next cycle, cycle 2.
  • the result of a read operation for port0 is latched on the read bus for port0 by latching circuitry on the bus (not shown).
  • data X read from port0 and data Y from port1 appear simultaneously during cycle 2. Because CPU operations are stalled during cycle 1, it appears to the CPU that the dual port cache access occurs simultaneously in a cycle immediately following cycle 0.
  • conflict resolution circuitry 226 grants priority access to port0 in case of a conflict.
  • the conflict resolution circuitry 226 may grant access to conflicting ports in any order of priority.
  • the ports are numbered so that low-numbered ports correspond to those requiring high-priority access, whereas high-numbered ports can wait longer for access.
  • There are two selection control signals, shared by all banks, to override priorities of bank conflict resolution: sel_ctrl1, sel_ctrl2. If sel_ctrl1 is asserted, then port 1 is selected. If sel_ctrl2 is asserted, then port 2 is selected. If neither sel_ctrl1 nor sel_ctrl2 is asserted, then port 0 has priority.
  • bank conflicts are avoided in the compiler and application software by allocating variables in nearby instructions to addresses in different banks. Thus, it is highly unlikely that the same bank would be addressed in the same cycle. Further, the organization of the address space itself helps to reduce the chance of a bank conflict. By using lower order address bits, e.g., the second bit, to select the bank, adjacent words of the cache block are evenly distributed among all the banks. In this manner, the addressing of adjacent words will result in the addressing of different banks. Because of the locality of reference property, this organization thus reduces the chance of conflict.
  • the cache can be organized as an eight-way set-associative cache of eight banks.
  • address bits 6-10 act as the set index.
  • Each set comprises two rows in each bank.
  • Bit 5 selects one of the two rows, and bits 2-4 select the bank.
  • the address bits 11-31 are used for the tag comparison.
  • Bits 0-1 correspond to the byte within a word.
  • the present invention can be applied to a pipelined cache.

Abstract

A multi-port cache memory is disclosed. The multi-port cache operates in a microprocessor system, and includes multiple memory banks and multiple ports for enabling accesses to the banks. Conflict detection circuitry detects simultaneous addressing of a first memory bank through a first port and a second port, and stalls microprocessor operations for a predetermined number of clock cycles in response to the detection of simultaneous addressing. Conflict resolution circuitry allows access to the first bank through the first port during the stall, and allows access through the second port after the stall is complete. Generally, the conflict resolution circuitry allows access through ports that are attempting to access the first memory bank in order of ascending priority during successive clock cycles while the microprocessor is stalled. One or more of the ports attempting to access the first bank may be allowed access before or after the time the microprocessor is stalled. Each bank is single-ported. The banks have non-overlapping address spaces, and are addressed so that words within a cache block are distributed among multiple banks.

Description

MULTI-PORT CACHE MEMORY WITH ADDRESS CONFLICT DETECTION
The present invention relates to a processing system with a cache memory, and more particularly to a cache having multiple access ports.
A cache is a small, fast memory placed between a processor and main memory in order to reduce the effective time required by a processor to access addresses, instructions or data that are normally stored in main memory. For example, when a processor reads a word from main memory, the word and neighbouring words are read as a block from main memory into the cache. Typically, there is a high probability that the processor will next attempt to access one of the neighbouring words within the block. Because of this locality of reference property, main memory bus traffic is reduced since the processor is likely to engage in subsequent data transactions directly with the cache. Cache accesses take less time than main memory accesses. Consequently, the use of a cache increases processor throughput.
Many modern microprocessors execute multiple instructions within the same processor clock cycle. In some instances, the processor may attempt to execute memory operations simultaneously. In those cases, the processor may require simultaneous access to multiple words stored within cache memory. Accordingly, the cache may include multiple ports, each port for conducting a separate data transaction.
A multi-port cache may be implemented as a single multi-port SRAM. However, such a configuration is very slow in operation and occupies a relatively large chip area. Alternatively, as described in U.S. Patent No. 5,359,557, issued to Aipperspach et al., a dual-port cache may be implemented with two single-port memory arrays, each corresponding to one of the cache ports. The two arrays have the same address space. This cumbersome arrangement requires complex data coherency circuitry to ensure that the arrays store the same data when data is modified at one of the cache ports. Further, the use of two arrays to store redundant copies of the same data occupies an unnecessarily large chip area.
Accordingly, there is a desire to find a smaller, more efficient means of implementing a multi-port cache memory.
The present invention provides a multi-port cache memory. The multi-port cache operates in a microprocessor system, and includes multiple memory banks and multiple ports for enabling accesses to the banks. Conflict detection circuitry detects simultaneous addressing of a first memory bank through a first port and a second port, and stalls microprocessor operations for a predetermined number of clock cycles in response to the detection of simultaneous addressing. Conflict resolution circuitry allows access to the first bank through the first port during the stall, and allows access through the second port after the stall is complete. Generally, the conflict resolution circuitry allows access through ports that are attempting to access the first memory bank in order of ascending priority during successive clock cycles while the microprocessor is stalled. One or more of the ports attempting to access the first bank may be allowed access before or after the time the microprocessor is stalled. Each bank is single-ported. The banks have non-overlapping address spaces, and are addressed so that words within a cache block are distributed among multiple banks.
The objects, features and advantages of the present invention will be apparent to one skilled in the art in light of the detailed description in which the following figures provide examples of the structure and operation of the invention:
Figure 1 illustrates a computer system having a multi-port cache of the present invention.
Figure 2 is a block diagram illustrating a processor coupled to a multi-port cache of the present invention.
Figure 3A is a timing diagram illustrating cache timing in the absence of a bank conflict.
Figure 3B is a timing diagram illustrating cache timing in the presence of a bank conflict.
The present invention provides a multi-port cache memory having multiple memory banks. In the following description, numerous details are set forth in order to enable a thorough understanding of the present invention. However, it will be understood by those of ordinary skill in the art that these specific details are not required in order to practice the invention. Further, well-known elements, devices, process steps and the like are not set forth in detail in order to avoid obscuring the present invention.
Figure 1 illustrates a computer system having a multi-port CPU 100, a main memory 102, a main memory interface 104, and a multi-port cache 106 of the present invention. The main memory interface 104 manages the information exchange between the cache 106 and main memory 102 to maintain cache coherency when a CPU access misses the cache or when the CPU writes new data into the cache. The cache 106 is shown as having two ports, although those skilled in the art will recognize that the present invention is easily extended to a cache having any number of ports.
Preferably, the processor is capable of executing multiple parallel operations, and thus may require simultaneous access to more than one word stored within the cache. In another configuration (not shown), separate processors or other agents may each require access to a corresponding cache port.
Figure 2 is a detailed block diagram of a processor 100 coupled to an embodiment of the cache 106 of the present invention. In this example, the cache is a two-way set-associative cache. Unlike the prior art, the cache of the present invention does not employ a dual-port SRAM or redundant single-port arrays that store the same data. Instead, the present invention employs multiple single-port memory banks, where each bank stores data for a non-overlapping address space. Preferably, each bank may be accessed by any of the ports. As long as no two ports attempt to access the same bank, all ports can execute simultaneous accesses to the cache. In the event of a bank conflict, i.e., when two ports attempt to access the same bank, the cache controls the timing of the accesses as described below.
According to the present invention, the CPU 100 can issue multiple accesses to the cache 106, represented as a first address A0 and a second address A1. These addresses correspond to the two ports 201 and 203 of the cache of this example. In this example, the cache itself comprises a first bank 200, bank0, and a second bank 202, bank1. Here, each bank holds eight kilobytes (8 KB) of data, where four bytes comprise one 32-bit word. Thus, each bank stores 2K words. Further, each cache block is two words long, and two blocks comprise one set of the two-way set-associative cache of this example. Those skilled in the art will recognize that the present invention is applicable to other memory configurations, and that, in particular, the number of banks need not necessarily equal the number of ports.
Each bank is coupled to a plurality of read buses 204 through a corresponding tri-state bus driver 206, each read bus 204 corresponding to one of the ports. Each bank is further coupled to a plurality of write buses 208 through a write multiplexer 210, each write bus 208 corresponding to one of the ports.
The read and write busses 204, 208 are coupled to the input/output ports of the CPU 100 (the coupling is not shown to keep the figure simple).
The circuitry for addressing the banks is divided into address circuitry dedicated to a corresponding port and address circuitry common to both ports. In this embodiment, each port is coupled to a dual tag RAM 212, where each tag array 214 corresponds to a way of the two-way set-associative cache. The tag from each array is fed into a corresponding comparator 216, which compares the tag to the tag field of the corresponding port address. The resulting hit signal is passed to a corresponding port input of a hit multiplexer 218 for each bank. The hit signal here is a two-bit "one hot" signal in which at most one bit may take on a logical one value. Each bank also is coupled to a row multiplexer 220 that receives the set index field of each port address. Further, read/write control signals are passed from each CPU port to a corresponding input of a read/write multiplexer (not shown) for each bank to indicate whether a read or write memory operation is to be performed. In one embodiment, a write enable signal from each port is passed to a corresponding input of a write multiplexer. Similarly, a read enable signal from each port is passed to a corresponding input of a read multiplexer. The output of the multiplexers is coupled to write enable and read enable inputs, respectively, of the corresponding bank. The read and write multiplexers together are referred to herein as the "read/write multiplexer."
The address circuitry that is common to both ports includes conflict detect circuitry 222 that receives the bank address portion of the port addresses. In this example, each bank address passes through a 1:2 bank decoder 224, which produces a bank select signal in response. For example, if a zero bank address bit represents a selection of bank0, then the bank decoder 224 will output a one from its bank0 output and a zero from its bank1 output. The bank select signal (bd) from each port's decoder is fed into a corresponding conflict resolution circuit 226 for each bank.
The output of the conflict resolution circuitry 226 controls the row multiplexer 220, the hit multiplexer 218 and the read/write multiplexer (not shown) for each bank to determine which port will have access to the bank. The conflict resolution circuitry 226 also controls the tri-state drivers 206 for the read buses 204 (Figure 2 assumes active high) and the write bus multiplexers 210 to assure access to the bus corresponding to the selected port.
In one example of the memory organization of the cache of Figure 2, each bank stores 8 KB of data with each word comprising four bytes. Each cache block comprises two words. The memory contains 1K sets with two blocks per set because the cache is a two-way set-associative cache. Bit 2 of the address selects the bank, whereas bits 3-12 select one of the sets. Bits 13-31 of the address are used in the tag comparison to indicate the presence of an addressed block in the cache.
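The 1:2 bank decoder and the same-bank check can be modelled in a few lines of Python. This is only a behavioural sketch, not the circuit of Figure 2: the function names `decode_bank` and `detect_conflict` are hypothetical, and the bank address is assumed to be a single bit, as in the two-bank example.

```python
def decode_bank(bank_bit: int) -> list[int]:
    """Model of a 1:2 decoder: returns the one-hot bank select [bank0, bank1].

    Assumes, as in the text's example, that a zero bank address bit selects bank0.
    """
    return [1 if bank_bit == 0 else 0, 1 if bank_bit == 1 else 0]


def detect_conflict(bd: list[list[int]]) -> bool:
    """Return True if two or more ports select the same bank.

    bd[j][i] is 1 when port j addresses bank i (same indexing as the text).
    """
    num_banks = len(bd[0])
    return any(sum(port[i] for port in bd) > 1 for i in range(num_banks))


# Port 0 addresses bank0 and port 1 addresses bank1: no conflict.
bd_no_conflict = [decode_bank(0), decode_bank(1)]
# Both ports address bank0: conflict.
bd_conflict = [decode_bank(0), decode_bank(0)]
```

With these inputs, `detect_conflict(bd_no_conflict)` is false and `detect_conflict(bd_conflict)` is true.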
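The address layout of this example can be illustrated with a short Python sketch. The function name `split_address` and the dictionary keys are hypothetical; the bank, set index and tag positions come from the text, while treating bits 0-1 as the byte offset within a word is an inference from the four-byte word size.

```python
def split_address(addr: int) -> dict[str, int]:
    """Split a 32-bit address into the fields of the two-bank example."""
    return {
        "byte": addr & 0x3,                # bits 0-1: byte within a word (inferred)
        "bank": (addr >> 2) & 0x1,         # bit 2: bank select
        "set_index": (addr >> 3) & 0x3FF,  # bits 3-12: one of 1K sets
        "tag": (addr >> 13) & 0x7FFFF,     # bits 13-31: 19-bit tag
    }


# Capacity check: 1K sets x 2 blocks per set x 2 words per block x 4 bytes per word
# should equal 2 banks x 8 KB.
total_bytes = (1 << 10) * 2 * 2 * 4
assert total_bytes == 2 * 8 * 1024

fields = split_address(0x2004)
```

Here `0x2004` decodes to bank 1, set index 0 and tag 1, while the adjacent word at `0x2000` falls in bank 0, which shows how consecutive words alternate between the two banks.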
The operation of the cache of the present invention will be described with respect to the timing diagrams of Figures 3A and 3B. Figure 3A illustrates cache timing where there is no bank conflict. Figure 3B illustrates cache timing with a bank conflict. In both cases, the CPU attempts to perform simultaneous accesses of the cache by issuing an address A0 from a first CPU port 228 and an address A1 from a second CPU port 230. The addresses are respectively received by a first cache port 201 and a second cache port 203 over an internal CPU bus 232. In this example, during cycle 0, bit 2 of each address is fed into the conflict detection circuitry 222 to determine whether both ports are attempting to access the same memory bank. Here, assume that A0[2] = 0 and A1[2] = 1. In that case, the bank address decoder 224 for port0 201 will output a bank select signal bd[0][0] = 1 to the conflict resolution circuitry 226 for bank0 200 and a bank select signal bd[0][1] = 0 to the conflict resolution circuitry 226 for bank1 202. The bank address decoder 224 for port1 203 will output a bank select signal bd[1][0] = 0 to the conflict resolution circuitry 226 for bank0 200 and a bank select signal bd[1][1] = 1 to the conflict resolution circuitry 226 for bank1 202. In cycle 0, the conflict resolution circuitry 226 determines which port input will be passed by the row multiplexer 220, the hit multiplexer 218 and the read/write multiplexer to each bank, and selects the proper read or write bus to communicate with the bank (depending upon whether a read or write operation is being performed).
For this two-port example, the conflict resolution circuitry 226 implements the following logic equations:
    sel[0][i] = bd[0][i] AND NOT (sel_ctrl[1])
    sel[1][i] = (NOT (bd[0][i]) AND bd[1][i]) OR sel_ctrl[1]
where the port select signal sel[j][i] gives input port j access to bank i if sel[j][i] = 1. When a bank conflict occurs, the conflict resolution circuitry first allows the lower-numbered port, port0, to access the addressed bank. In that clock cycle sel_ctrl[1] = 0. In the next cycle, the override signal sel_ctrl[1] takes on a value of 1 to give priority of access to port 1.
In Figure 2, the two-bit signal sel0 represents the two port-select signals for bank0, and the two-bit signal sel1 represents the two port-select signals for bank1. These combined signals select the appropriate port input to the multiplexers. Alternatively, the conflict resolution logic may be implemented by any circuitry that embodies the logic of Table 1.
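The two logic equations can be evaluated directly in software. The Python sketch below transcribes them literally for the two-port case; the function name `port_select` is hypothetical, and signals are modelled as the integers 0 and 1.

```python
def port_select(bd: list[list[int]], sel_ctrl1: int) -> list[list[int]]:
    """Evaluate the two-port select equations for every bank i.

    bd[j][i] is the bank select from port j for bank i; sel_ctrl1 is the
    override signal sel_ctrl[1]. Returns sel, where sel[j][i] = 1 gives
    port j access to bank i.
    """
    num_banks = len(bd[0])
    not_ctrl = 1 - sel_ctrl1
    # sel[0][i] = bd[0][i] AND NOT sel_ctrl[1]
    sel0 = [bd[0][i] & not_ctrl for i in range(num_banks)]
    # sel[1][i] = (NOT bd[0][i] AND bd[1][i]) OR sel_ctrl[1]
    sel1 = [((1 - bd[0][i]) & bd[1][i]) | sel_ctrl1 for i in range(num_banks)]
    return [sel0, sel1]


# Both ports address bank0.
conflict_bd = [[1, 0], [1, 0]]
first_cycle = port_select(conflict_bd, sel_ctrl1=0)   # port 0 wins bank0
second_cycle = port_select(conflict_bd, sel_ctrl1=1)  # override hands bank0 to port 1
```

Because the equations are transcribed as written, asserting the override sets sel[1][i] for every bank i, not only for the bank in conflict.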
TABLE 1
In the table, "x/y" indicates that the port select signal takes on a value of x in one clock cycle followed by a value of y in a subsequent clock cycle. In this example, bd[0][0] = 1 and bd[1][0] = 0, whereas bd[0][1] = 0 and bd[1][1] = 1. Thus, in cycle 0 of Figure 3A:
    sel[0][0] = 1
    sel[0][1] = 0
    sel[1][0] = 0
    sel[1][1] = 1
In the absence of a conflict, the sel_ctrl override signal is inoperative. As a result, bank0 is accessible to port0 and bank1 is accessible to port1.
In sum, the conflict resolution circuitry 226 determines which port communicates with each bank. This selection is based upon the bank address field of the port addresses, which is bit 2 in this example. The other bits are used to address a particular word within the banks. Bits 3-12 are the set index fed into the dual tag array for each port. In this example, a set comprises two blocks, with one block in each bank. Bits 13-31 comprise the tag address field that is compared to the tags from the dual tag array 212.
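As an illustration of the address fields just described, the sketch below splits a 32-bit address for the two-bank example. The function name decode_address is hypothetical, and treating bits 0-1 as the byte within a word is an assumption carried over from the eight-bank configuration described later; the text does not state it for this example.

```python
def decode_address(addr):
    """Split a 32-bit address into the fields of the two-bank example."""
    return {
        "byte": addr & 0x3,                 # bits 0-1 (assumed byte-in-word offset)
        "bank": (addr >> 2) & 0x1,          # bit 2: bank address field
        "set_index": (addr >> 3) & 0x3FF,   # bits 3-12: 10-bit set index
        "tag": (addr >> 13) & 0x7FFFF,      # bits 13-31: 19-bit tag field
    }


# Two consecutive words fall in different banks because bit 2 differs.
word_a = decode_address(0x2000)
word_b = decode_address(0x2004)
```

Here word_a and word_b share the same set index and tag but carry bank values 0 and 1, so a simultaneous access to both would not conflict.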
If the tag comparison results in a hit in one of the arrays, the hit signal selects the word within the block. In case of a cache miss for any one of the ports, the miss is handled by loading the miss block into the cache. Operation resumes as if the miss did not occur, resulting in a hit. For example, if one instruction attempts two simultaneous accesses and one port hits while the other port misses, the miss is first handled. Then, the instruction is restarted, resulting in two hits with the conflict resolution circuitry operating as described herein.
The set index and the hit signal are routed to the correct bank through the multiplexers controlled by the conflict resolution circuitry 226. Assume hits for both port addresses. During cycle 0, the hit signal, hit0, from port0 201 is routed through the hit multiplexers 218 to the hit input of bank0 200, whereas the hit signal, hit1, from port1 203 is routed through the hit multiplexers 218 to the hit input of bank1 202. The data read from or written to port0 201 is represented by X, whereas the data read from or written to port1 203 is represented by Y. During cycle 1, both of these ports are in communication with a bank. Here, X data from port0 201 is read from or written to bank0 200, and Y data from port1 203 is read from or written to bank1 202.
Figure 3B is a timing diagram illustrating the operation of the cache of the present invention in case of a bank conflict. In this example, assume that bit 2 of both port addresses equals zero, i.e., both ports attempt to access bank0. In response, the conflict detection circuitry 222 will stall the operations of the CPU 100 in the next cycle, i.e., cycle 1. The mechanism employed by the conflict detection circuitry 222 to stall the CPU can be implemented using circuitry similar to that employed by standard cache control logic to stall the CPU during a cache miss.
In this example A0[2] = A1[2] = 0. Thus, bd[0][0] = 1 and bd[1][0] = 1, whereas bd[0][1] = 0 and bd[1][1] = 0. Accordingly:
sel[0][0] = 1 AND NOT (sel_ctrl[1])
sel[0][1] = 0 AND NOT (sel_ctrl[1])
sel[1][0] = (NOT (1) AND 1) OR sel_ctrl[1]
sel[1][1] = (NOT (0) AND 0) OR sel_ctrl[1]
The bank select signals for each bank are OR'ed together by an OR gate 250 having an output fed into a bank enable input. If no port attempts to access a bank, then the bank is not enabled. Here, bank1 is not being accessed. Consequently, the signal sel[1] (i.e., sel[0][1] and sel[1][1]) for bank1 has no effect. However, both ports are attempting to read from bank0. Assume hits for both port addresses. First, during cycle 0, the hit signal, hit0, from port0 is routed through the hit multiplexers 218 to the hit_bank0 input of bank0 so that the data word X can be output from port0 during cycle 1.
Second, sel_ctrl is asserted during the stall (cycle 1) so as to force sel0 to select port1 during cycle 1. See Figure 3B and Table 1. As a result, during the stall cycle 1, the hit signal, hit1, from port1 is routed through the hit multiplexers 218 to the hit_bank0 input of bank0 so that the data word Y can be output through port1 during the next cycle, cycle 2. Further, during the stall cycle, the result of a read operation for port0 is latched on the read bus for port0 by latching circuitry on the bus (not shown). As a result, data X read from port0 and data Y from port1 appear simultaneously during cycle 2. Because CPU operations are stalled during cycle 1, it appears to the CPU that the dual port cache access occurs simultaneously in a cycle immediately following cycle 0.
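The cycle-by-cycle behaviour of Figures 3A and 3B can be summarised in a short sketch. It assumes two ports, two banks selected by address bit 2, and hits on both reads; the function name two_port_timeline and the dictionary keys are illustrative assumptions rather than names used in the disclosure.

```python
def two_port_timeline(a0, a1):
    """Summarise when each port is selected and when the data is visible."""
    bank_of_port0 = (a0 >> 2) & 1   # bank address field of port0's address
    bank_of_port1 = (a1 >> 2) & 1   # bank address field of port1's address
    if bank_of_port0 != bank_of_port1:
        # Figure 3A: both ports are selected in cycle 0, data in cycle 1.
        return {"conflict": False, "stall_cycles": [],
                "select_cycle": {0: 0, 1: 0}, "data_cycle": 1}
    # Figure 3B: port0 is selected in cycle 0, port1 during the stall
    # (cycle 1); X is latched, so X and Y both appear in cycle 2.
    return {"conflict": True, "stall_cycles": [1],
            "select_cycle": {0: 0, 1: 1}, "data_cycle": 2}


no_conflict = two_port_timeline(0x0, 0x4)   # bank0 and bank1
conflict = two_port_timeline(0x0, 0x8)      # both addresses map to bank0
```

Because the stalled CPU does not count cycle 1, the data in the conflict case still appears to arrive in the cycle immediately following cycle 0.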
Note that the conflict resolution circuitry 226 grants priority access to port0 in case of a conflict. Those skilled in the art will recognize that the conflict resolution circuitry 226 may grant access to conflicting ports in any order of priority. In the examples described herein, the ports are numbered so that low-numbered ports correspond to those requiring high-priority access, whereas high-numbered ports can wait longer for access.
Further, the conflict resolution circuitry is not limited to resolving conflicts between only two ports. For a cache having K ports, if N ≥ 2 ports attempt to access the same bank, then access may first be given to the lowest numbered port, and the CPU stalled for N-1 cycles to allow access by the remaining conflicting ports in ascending order by port number. For example, for a cache with K=3 ports, for each bank i, there are three bank select signals per bank, bd[0][i], bd[1][i], bd[2][i], one for each port. There are three port select signals sel[0][i], sel[1][i], sel[2][i], indicating that port 0, 1 or 2 is selected to address the bank.
There are two selection control signals, shared by all banks, to override priorities of bank conflict resolution: sel_ctrl1 and sel_ctrl2. If sel_ctrl1 is asserted, then port 1 is selected. If sel_ctrl2 is asserted, then port 2 is selected. If neither sel_ctrl1 nor sel_ctrl2 is asserted, then port 0 has priority. For each bank i:
sel[0][i] = bd[0][i] AND NOT (sel_ctrl1 OR sel_ctrl2)
sel[1][i] = (NOT (bd[0][i]) AND bd[1][i]) OR sel_ctrl1
sel[2][i] = (NOT (bd[0][i]) AND NOT (bd[1][i]) AND bd[2][i]) OR sel_ctrl2
In general, for K ports, where K>3, for each bank i, for each port j (0 ≤ j < K), for each bank select signal bd[j][i] of port j in bank i, and for each selection control signal sel_ctrl[m] (m = 1, ..., K-1), the conflict resolution circuitry generates output select signals sel[j][i] as follows:
sel[0][i] = bd[0][i] AND NOT (sel_ctrl[1] OR sel_ctrl[2] OR ... OR sel_ctrl[K-1])
sel[1][i] = (NOT (bd[0][i]) AND bd[1][i]) OR sel_ctrl[1]
sel[2][i] = (NOT (bd[0][i]) AND NOT (bd[1][i]) AND bd[2][i]) OR sel_ctrl[2]
sel[j][i] = (NOT (bd[0][i]) AND NOT (bd[1][i]) AND ... AND NOT (bd[j-1][i]) AND (bd[j][i])) OR sel_ctrl[j]
One can see that a large number of bank conflicts would give rise to many stall cycles that would hinder overall performance. Thus, it is advantageous to limit the number of bank conflicts within the same CPU cycle. According to one embodiment of the present invention, bank conflicts are avoided in the compiler and application software by allocating variables in nearby instructions to addresses in different banks. Thus, it is highly unlikely that the same bank would be addressed in the same cycle.
Further, the organization of the address space itself helps to reduce the chance of a bank conflict. By using lower order address bits, e.g., bit 2, to select the bank, adjacent words of the cache block are evenly distributed among all the banks. In this manner, the addressing of adjacent words will result in the addressing of different banks. Because of the locality of reference property, this organization thus reduces the chance of conflict.
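The general K-port select equations given above can be transcribed into code for a single bank. This is a sketch under stated assumptions: the function names select_signals and conflict_sequence are hypothetical, signals are modelled as the integers 0 and 1, and the way the override signals are stepped through the stall cycles (one sel_ctrl[j] asserted per stall cycle, in ascending port order) is an interpretation of the text. The example exercises only a conflict that includes port 0; other cases are not modelled.

```python
def select_signals(bd, sel_ctrl):
    """Evaluate sel[j][i] for one bank i.

    bd[j]       -- bank select signal bd[j][i] of port j (0 or 1)
    sel_ctrl[m] -- override signal for port m; index 0 is unused
    """
    k = len(bd)
    override = int(any(sel_ctrl[1:]))
    # sel[0][i] = bd[0][i] AND NOT (sel_ctrl[1] OR ... OR sel_ctrl[K-1])
    sel = [bd[0] & (1 - override)]
    for j in range(1, k):
        # 1 when no lower-numbered port requests this bank
        lower_idle = int(not any(bd[:j]))
        # sel[j][i] = (NOT bd[0][i] AND ... AND bd[j][i]) OR sel_ctrl[j]
        sel.append((lower_idle & bd[j]) | sel_ctrl[j])
    return sel


def conflict_sequence(bd):
    """Select vectors for the first access cycle and each stall cycle."""
    k = len(bd)
    requesters = [j for j in range(k) if bd[j]]
    cycles = [select_signals(bd, [0] * k)]      # lowest-numbered port first
    for j in requesters[1:]:                    # one stall cycle per remaining port
        ctrl = [0] * k
        ctrl[j] = 1
        cycles.append(select_signals(bd, ctrl))
    return cycles


three_way = conflict_sequence([1, 1, 1])   # K = 3, all ports address the bank
stall_count = len(three_way) - 1           # N - 1 stall cycles
```

For three conflicting ports the sequence serves port 0, then port 1, then port 2, using two stall cycles, which agrees with the N-1 stall count stated for N conflicting ports.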
Although the invention has been described in conjunction with particular embodiments, it will be appreciated that various modifications and alterations may be made by those skilled in the art without departing from the spirit and scope of the invention. For example, the cache can be organized as an eight-way set-associative cache of eight banks. In that configuration, address bits 6-10 act as the set index. Each set comprises two rows in each bank. Bit 5 selects one of the two rows, and bits 2-4 select the bank. The address bits 11-31 are used for the tag comparison. Bits 0-1 correspond to the byte within a word. Further, the present invention can be applied to a pipelined cache.
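The field layout of the eight-bank configuration described above can be illustrated in the same way. The sketch below assumes 32-bit addresses; the function name decode_eight_bank is hypothetical. It also shows the interleaving effect discussed earlier: consecutive words land in different banks.

```python
def decode_eight_bank(addr):
    """Split a 32-bit address for the eight-bank, eight-way configuration."""
    return {
        "byte": addr & 0x3,                # bits 0-1: byte within a word
        "bank": (addr >> 2) & 0x7,         # bits 2-4: one of eight banks
        "row": (addr >> 5) & 0x1,          # bit 5: one of the two rows per set
        "set_index": (addr >> 6) & 0x1F,   # bits 6-10: set index
        "tag": (addr >> 11) & 0x1FFFFF,    # bits 11-31: tag comparison field
    }


# Eight consecutive 4-byte words starting at address 0.
banks_of_adjacent_words = [decode_eight_bank(4 * w)["bank"] for w in range(8)]
```

Each of the eight adjacent words maps to a distinct bank, so simultaneous accesses to neighbouring words would not raise a bank conflict in this layout.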

Claims

CLAIMS:
1. A microprocessor system with a multi-port cache comprising: a plurality of memory banks; a plurality of ports for enabling accesses to the banks; and conflict detection circuitry for detecting simultaneous addressing of a first memory bank through a first port and a second port, and for stalling processor operations for a predetermined time in response to the detection of simultaneous addressing.
2. The processor system of claim 1, further comprising: conflict resolution circuitry for allowing access to the first memory bank through the first port during the stall and for allowing access to the first memory bank through the second port after the stall is complete.
3. The processor system of claim 1, wherein each bank is single-ported.
4. The processor system of claim 1, wherein the banks are addressed so that words within a cache block are distributed among multiple banks.
5. The processor system of claim 1, wherein the banks have non-overlapping address spaces.
6. A processor system according to Claim 1, 3, 4 or 5, comprising conflict resolution circuitry for allowing access to the first memory bank through ports that are attempting to access the first memory bank in order of ascending priority during successive clock cycles while the processor is stalled.
7. A multiport memory comprising a plurality of memory banks; a plurality of ports for enabling accesses to the banks; and conflict detection circuitry for detecting simultaneous addressing of a first memory bank through a first port and a second port, and an output for a signal to stall processor operations for a predetermined time in response to the detection of simultaneous addressing.
8. A multiport memory according to Claim 7, comprising conflict resolution circuitry for allowing access to the first memory bank through the first port during the stall and for allowing access to the first memory bank through the second port after the stall is complete.
9. A multiport memory according to Claim 8, comprising conflict resolution circuitry for allowing access to the first memory bank through ports that are attempting to access the first memory bank in order of ascending priority during successive clock cycles while the processor is stalled.
PCT/IB1997/001146 1996-09-25 1997-09-23 Multiport cache memory with address conflict detection WO1998013763A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP10515453A JP2000501539A (en) 1996-09-25 1997-09-23 Multi-port cache memory with address conflict detection
KR1019980703828A KR19990071554A (en) 1996-09-25 1997-09-23 Multi-port cache memory with address conflict detection
EP97940270A EP0875030A2 (en) 1996-09-25 1997-09-23 Multi-port cache memory with address conflict detection

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US71960996A 1996-09-25 1996-09-25
US08/719,609 1996-09-25

Publications (2)

Publication Number Publication Date
WO1998013763A2 true WO1998013763A2 (en) 1998-04-02
WO1998013763A3 WO1998013763A3 (en) 1998-06-04

Family

ID=24890679

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB1997/001146 WO1998013763A2 (en) 1996-09-25 1997-09-23 Multiport cache memory with address conflict detection

Country Status (4)

Country Link
EP (1) EP0875030A2 (en)
JP (1) JP2000501539A (en)
KR (1) KR19990071554A (en)
WO (1) WO1998013763A2 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999045474A2 (en) * 1998-03-06 1999-09-10 Pact Informationstechnologie Gmbh Speed-optimized cache system
WO2001074134A2 (en) * 2000-03-31 2001-10-11 Intel Corporation System having a configurable cache/sram memory
US6539457B1 (en) * 2000-02-21 2003-03-25 Hewlett-Packard Company Cache address conflict mechanism without store buffers
US6557078B1 (en) * 2000-02-21 2003-04-29 Hewlett Packard Development Company, L.P. Cache chain structure to implement high bandwidth low latency cache memory subsystem
US6606684B1 (en) 2000-03-31 2003-08-12 Intel Corporation Multi-tiered memory bank having different data buffer sizes with a programmable bank select
WO2004049171A2 (en) * 2002-11-26 2004-06-10 Advanced Micro Devices, Inc. Microprocessor including cache memory supporting multiple accesses per cycle
WO2007111492A1 (en) * 2006-03-29 2007-10-04 Fidelix Co., Ltd. Multi-port memory device including plurality of shared blocks
US7769950B2 (en) 2004-03-24 2010-08-03 Qualcomm Incorporated Cached memory system and cache controller for embedded digital signal processor
CN102622192A (en) * 2012-02-27 2012-08-01 北京理工大学 Weak correlation multiport parallel store controller
US8583873B2 (en) 2010-03-10 2013-11-12 Samsung Electronics Co., Ltd. Multiport data cache apparatus and method of controlling the same
US8977800B2 (en) 2011-02-25 2015-03-10 Samsung Electronics Co., Ltd. Multi-port cache memory apparatus and method

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100780621B1 (en) * 2005-09-29 2007-11-29 주식회사 하이닉스반도체 Multi port memory device
US7613065B2 (en) 2005-09-29 2009-11-03 Hynix Semiconductor, Inc. Multi-port memory device
US9171594B2 (en) * 2012-07-19 2015-10-27 Arm Limited Handling collisions between accesses in multiport memories
KR102346629B1 (en) * 2014-12-05 2022-01-03 삼성전자주식회사 Method and apparatus for controlling access for memory

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5056002A (en) * 1987-02-09 1991-10-08 Nec Corporation Cache memory for use with multiprocessor systems
US5274790A (en) * 1990-04-30 1993-12-28 Nec Corporation Cache memory apparatus having a plurality of accessibility ports
US5276850A (en) * 1988-12-27 1994-01-04 Kabushiki Kaisha Toshiba Information processing apparatus with cache memory and a processor which generates a data block address and a plurality of data subblock addresses simultaneously
US5434989A (en) * 1991-02-19 1995-07-18 Matsushita Electric Industrial Co., Ltd. Cache memory for efficient access with address selectors

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5056002A (en) * 1987-02-09 1991-10-08 Nec Corporation Cache memory for use with multiprocessor systems
US5276850A (en) * 1988-12-27 1994-01-04 Kabushiki Kaisha Toshiba Information processing apparatus with cache memory and a processor which generates a data block address and a plurality of data subblock addresses simultaneously
US5274790A (en) * 1990-04-30 1993-12-28 Nec Corporation Cache memory apparatus having a plurality of accessibility ports
US5434989A (en) * 1991-02-19 1995-07-18 Matsushita Electric Industrial Co., Ltd. Cache memory for efficient access with address selectors

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999045474A3 (en) * 1998-03-06 1999-11-11 Pact Inf Tech Gmbh Speed-optimized cache system
WO1999045474A2 (en) * 1998-03-06 1999-09-10 Pact Informationstechnologie Gmbh Speed-optimized cache system
US6539457B1 (en) * 2000-02-21 2003-03-25 Hewlett-Packard Company Cache address conflict mechanism without store buffers
US6557078B1 (en) * 2000-02-21 2003-04-29 Hewlett Packard Development Company, L.P. Cache chain structure to implement high bandwidth low latency cache memory subsystem
US6898690B2 (en) 2000-03-31 2005-05-24 Intel Corporation Multi-tiered memory bank having different data buffer sizes with a programmable bank select
WO2001074134A2 (en) * 2000-03-31 2001-10-11 Intel Corporation System having a configurable cache/sram memory
WO2001074134A3 (en) * 2000-03-31 2002-05-23 Intel Corp System having a configurable cache/sram memory
US6606684B1 (en) 2000-03-31 2003-08-12 Intel Corporation Multi-tiered memory bank having different data buffer sizes with a programmable bank select
EP2312447A1 (en) * 2000-03-31 2011-04-20 Intel Corporation System having a configurable cache/SRAM memory
US6446181B1 (en) 2000-03-31 2002-09-03 Intel Corporation System having a configurable cache/SRAM memory
US7073026B2 (en) 2002-11-26 2006-07-04 Advanced Micro Devices, Inc. Microprocessor including cache memory supporting multiple accesses per cycle
WO2004049171A3 (en) * 2002-11-26 2004-11-04 Advanced Micro Devices Inc Microprocessor including cache memory supporting multiple accesses per cycle
CN1717664B (en) * 2002-11-26 2010-10-27 先进微装置公司 Microprocessor, cache memory sub-system and cumputer system
WO2004049171A2 (en) * 2002-11-26 2004-06-10 Advanced Micro Devices, Inc. Microprocessor including cache memory supporting multiple accesses per cycle
US7769950B2 (en) 2004-03-24 2010-08-03 Qualcomm Incorporated Cached memory system and cache controller for embedded digital signal processor
US8316185B2 (en) 2004-03-24 2012-11-20 Qualcomm Incorporated Cached memory system and cache controller for embedded digital signal processor
WO2007111492A1 (en) * 2006-03-29 2007-10-04 Fidelix Co., Ltd. Multi-port memory device including plurality of shared blocks
US8583873B2 (en) 2010-03-10 2013-11-12 Samsung Electronics Co., Ltd. Multiport data cache apparatus and method of controlling the same
US8977800B2 (en) 2011-02-25 2015-03-10 Samsung Electronics Co., Ltd. Multi-port cache memory apparatus and method
CN102622192A (en) * 2012-02-27 2012-08-01 北京理工大学 Weak correlation multiport parallel store controller

Also Published As

Publication number Publication date
EP0875030A2 (en) 1998-11-04
JP2000501539A (en) 2000-02-08
WO1998013763A3 (en) 1998-06-04
KR19990071554A (en) 1999-09-27

Similar Documents

Publication Publication Date Title
US5640534A (en) Method and system for concurrent access in a data cache array utilizing multiple match line selection paths
US5247649A (en) Multi-processor system having a multi-port cache memory
US5239642A (en) Data processor with shared control and drive circuitry for both breakpoint and content addressable storage devices
US4805098A (en) Write buffer
US8032715B2 (en) Data processor
US6192458B1 (en) High performance cache directory addressing scheme for variable cache sizes utilizing associativity
US6321296B1 (en) SDRAM L3 cache using speculative loads with command aborts to lower latency
JPH06309216A (en) Data processor with cache memory capable of being used as linear ram bank
US5251310A (en) Method and apparatus for exchanging blocks of information between a cache memory and a main memory
US6157980A (en) Cache directory addressing scheme for variable cache sizes
US5668972A (en) Method and system for efficient miss sequence cache line allocation utilizing an allocation control cell state to enable a selected match line
US5805855A (en) Data cache array having multiple content addressable fields per cache line
EP0875030A2 (en) Multi-port cache memory with address conflict detection
JPH05173837A (en) Data processing system wherein static masking and dynamic masking of information in operand are both provided
US6381686B1 (en) Parallel processor comprising multiple sub-banks to which access requests are bypassed from a request queue when corresponding page faults are generated
US6988167B2 (en) Cache system with DMA capabilities and method for operating same
US5761714A (en) Single-cycle multi-accessible interleaved cache
EP0340668B1 (en) Multi-processor system having a multi-port cache memory
US5809537A (en) Method and system for simultaneous processing of snoop and cache operations
US5161219A (en) Computer system with input/output cache
US5890221A (en) Method and system for offset miss sequence handling in a data cache array having multiple content addressable field per cache line utilizing an MRU bit
JPH06318174A (en) Cache memory system and method for performing cache for subset of data stored in main memory
US20020108021A1 (en) High performance cache and method for operating same
US5696938A (en) Computer system permitting mulitple write buffer read-arounds and method therefor
US7346746B2 (en) High performance architecture with shared memory

Legal Events

Date Code Title Description
AK Designated states Kind code of ref document: A2; Designated state(s): JP KR
AL Designated countries for regional patents Kind code of ref document: A2; Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE
WWE WIPO information: entry into national phase Ref document number: 1997940270; Country of ref document: EP
WWE WIPO information: entry into national phase Ref document number: 1019980703828; Country of ref document: KR
ENP Entry into the national phase Ref country code: JP; Ref document number: 1998 515453; Kind code of ref document: A; Format of ref document f/p: F
AK Designated states Kind code of ref document: A3; Designated state(s): JP KR
AL Designated countries for regional patents Kind code of ref document: A3; Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE
121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWP WIPO information: published in national office Ref document number: 1997940270; Country of ref document: EP
WWP WIPO information: published in national office Ref document number: 1019980703828; Country of ref document: KR
WWW WIPO information: withdrawn in national office Ref document number: 1997940270; Country of ref document: EP
WWW WIPO information: withdrawn in national office Ref document number: 1019980703828; Country of ref document: KR