U.S. Patent Feb. 23, 1999 Sheet 4 of 6 5,875,470
[FIG. 5: address-bit diagram. Horizontal axis: bit position in address, 1 through 22. Labeled fields: 22-bit byte address; 19-bit address to chip; 2-bit PROC-ID; 9 bits to select row in bank; 2 bits to select section; 2 bits to select bank; 1st of 3 bits to select page in sense amplifier latches; 2nd and 3rd of 3 bits to select page within sense amplifier latches; 3 bits to select 8 bytes from port register.]
ACCESS DRAM CHIP
This application is a continuation of application Ser. No. 08/535,395, filed Sep. 28, 1995, now abandoned.
This invention relates to a multi-port multi-bank memory architected to enable manufacture of the memory in a single DRAM chip having a plurality of input/output ports and being capable of handling a large number of accesses in parallel.
The prior art has a multitude of single-port single-bank DRAM memory chips and of memory configurations of such memory chips in single-port and multiple-port arrangements. However, the prior art is not known to disclose any single chip architecture for structuring a single DRAM semiconductor chip with multiple ports and multiple DRAM banks, which is the primary object of the subject invention. A clear distinction needs to be made between different types of memory chips.
For example, U.S. Pat. No. 4,745,545 shows a memory using memory banks in which "the memory banks are organized into each section of memory in a sequential and interleaved fashion", which is not the way the internals of the subject invention are organized (and in which it is believed that each memory bank may be a separate chip). U.S. Pat. No. 4,745,545 focuses on having unidirectional ports (read or write), on conflict resolution among its ports, and on supporting an interleaved memory for multiple processor accesses, none of which is a focus within the chip of the subject invention, which does not have unidirectional ports.
SUMMARY OF THE INVENTION
The invention provides an architecture for a semiconductor chip containing multiple bidirectional ports supporting a plurality of independent DRAM banks all packaged in the single chip. This invention can support access requests from plural processors, or plural execution units, connected to different ports of the chip. Simultaneous, independent accesses may be made in the separate DRAM banks within the chip without causing conflict among parallel requests to different sections of the memory within the chip. Any conflict among parallel requests to the same section of the memory may be resolved within the chip. This invention is not concerned with access conflict resolution made outside of the memory chip.
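As a rough illustration of the conflict rule just stated, the following sketch groups simultaneous requests by memory section and reports the sections targeted by more than one port. The request layout (pairs of port and section numbers) and the function name `find_section_conflicts` are hypothetical. The text does not say how the chip resolves a conflict, so the sketch only detects one.

```python
from collections import defaultdict

def find_section_conflicts(requests):
    """Return {section: [ports]} for sections hit by more than one request.

    `requests` is a list of (port, section) pairs. This layout is an
    assumption made for illustration, not taken from the patent.
    """
    by_section = defaultdict(list)
    for port, section in requests:
        by_section[section].append(port)
    # Only sections targeted by more than one port need on-chip resolution.
    return {s: ports for s, ports in by_section.items() if len(ports) > 1}

# Four ports, each addressing a different section: no conflict.
no_conflict = find_section_conflicts([(0, 1), (1, 2), (2, 3), (3, 4)])
print(no_conflict)  # {}

# Ports 0 and 2 both address section 1: that section must arbitrate.
conflict = find_section_conflicts([(0, 1), (1, 2), (2, 1), (3, 4)])
print(conflict)  # {1: [0, 2]}
```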
Access parallelism provided within the memory semiconductor chip of this invention enables a high data rate to/from the chip to an extent that enables the chip to replace a more expensive SRAM (static random access memory) in a memory hierarchy of a computer system. For example, the chip may be used as a second level (L2) cache communicating data in parallel to/from plural processor private caches in a computer system.
Access requests to the memory chip may be made by multiple execution units and/or instruction unit(s) in the same processor, or by different processors; and the processors may be of any type, such as central processors and I/O processors, etc. The single chip architecture of this invention results in a complex high-speed memory packaged in a smaller semiconductor die size than for chips using either
SRAM or CDRAM technology for equivalent high-speed memory capacity. This DRAM single chip architecture allows lower development cost and lower manufacturing complexity than prior memory chips using combined SRAM/DRAM technology (as in Cached DRAM).
A preferred embodiment of this invention combines within a semiconductor chip a plurality of independent memory banks (comprised of DRAM arrays) into multiple memory sections, a cross-point switch for simultaneously connecting plural data buses of the multiple memory sections to a plurality of port registers, and to transfer data between the port register and a plurality of ports on the chip in either data direction to effectively support a high data rate to/from the memory chip. The data may be transferred entirely in parallel between the port and a corresponding port register, or the data may be multiplexed between the port and its port register in sets of parallel bits. Each of the DRAM banks in the chip is independently addressed through a bank address control in the chip which receives all address requests from processors in a computer system.
The banks in the chip are divided into a plurality of memory sections, and all DRAM banks in each section are connected to the same section data bus, which is connected to the matrix switch. Data flows through each data bus into or out of one of the port registers through the matrix switch. Each data bus is comprised of a large number of data lines that transfer data bits in parallel to/from all of the DRAM cells in an address-selected row in one of the DRAM banks in the section.
This invention extends the "banking" concept beyond that of prior SDRAM chip technology which uses independent Row Address Strobe (RAS) banks of memory cells, complete with their associated I/O overhead, which are multiplexed onto the output ports. A "memory section" concept used in this invention sub-divides each complete memory section into plural parallel DRAM banks providing parallel address space which time shares the I/O overhead of the section. This architecture realizes DRAM latency advantages in multiprocessor systems without increased secondary data bus overhead (and therefore die area) requirements.
A cross-point switch within the chip is connected between all section data buses and the plurality of port registers. Each port register receives and stores all parallel bits for a row of a bank in any connected section of the chip. Each port register also may be permanently connected to one of a plurality of ports of the chip. Each port is comprised of a plurality of chip I/O pins, each pin transferring a bit of data in or out of the port. The number of parallel bits for each port may be a fraction of the number of data bits stored in parallel by its port register, and this difference may be accommodated by a bit selector located between each port and its port register. The bit selector selects a subset of bits of the register during each cycle for transfer between the port and port register, so that k cycles may be used to multiplex a row of bits between the port register and its port.
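The bit selector's role can be sketched as follows, assuming the port register width is an exact multiple of the port width so that a row drains in k equal slices. The function name `multiplex_row`, the list-of-bits representation, and the toy sizes are all hypothetical; the text gives no register or port widths at this point.

```python
def multiplex_row(register_bits, port_width):
    """Yield successive port-width slices of a port register, one per cycle.

    `register_bits` is a list of 0/1 values and `port_width` is the number
    of port pins (both names are illustrative assumptions). The register
    width is assumed to be an exact multiple of the port width.
    """
    if len(register_bits) % port_width != 0:
        raise ValueError("register width must be a multiple of port width")
    k = len(register_bits) // port_width  # number of transfer cycles
    for cycle in range(k):
        start = cycle * port_width
        yield register_bits[start:start + port_width]

# Toy sizes only: a 16-bit register drained through a 4-pin port, k = 4.
row = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
cycles = list(multiplex_row(row, 4))
print(len(cycles))  # 4
print(cycles[0])    # [1, 0, 1, 1]
```

Concatenating the k slices in order reproduces the full row, which is the property a receiver on the port side would rely on.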
An I/O selection control in the chip controls the switching by the cross-point switch—to accommodate a plurality of simultaneous section-bus/port-register data transfers in either direction. When the matrix switch connects the plural section data buses to the plurality of port registers, any section data bus may be connected to any port register. Thus, the I/O selection control may simultaneously connect the data content of a row in one bank in each of the plurality of memory sections to a different one of the plurality of port registers in either direction, in order to pass data in parallel between all ports and all data buses.
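One way to picture a legal switch setting is as a one-to-one mapping from section data buses to port registers. The sketch below checks that property. The function name `connect`, the zero-based indices, and the default counts of four buses and four registers are assumptions for illustration; in particular, the number of port registers is not stated in this passage.

```python
def connect(assignments, n_sections=4, n_registers=4):
    """Validate a cross-point setting: section bus index -> port register index.

    Any bus may go to any register, but no register may be driven by two
    buses at once. The default counts are illustrative assumptions.
    """
    for section, register in assignments.items():
        if not 0 <= section < n_sections:
            raise ValueError("no such section data bus: %r" % section)
        if not 0 <= register < n_registers:
            raise ValueError("no such port register: %r" % register)
    if len(set(assignments.values())) != len(assignments):
        raise ValueError("two section data buses routed to one port register")
    return dict(assignments)

# Every bus reaches a different register, so all four transfers can overlap.
setting = connect({0: 2, 1: 0, 2: 3, 3: 1})
print(setting)  # {0: 2, 1: 0, 2: 3, 3: 1}
```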
The timing of the parallel data transfers between the ports and the banks is synchronized by system timing used by all processors connected to the ports. The processor timing signals are provided to the chip to synchronize the data transfers. Generally, all section data buses may start and end their bank row transfers in the same cycles; the cross-point switch may then switch between the synchronized data bus transfers; and all processors may then synchronously transfer a line of data to/from their assigned ports at the same time.
Extraordinarily high parallelism of both data access and data transfer is provided by the novel DRAM structure of this invention in its multiple-bank, multiple-data-bus structure in a single chip, in which all data-buses may be transferring data in parallel to/from different banks in the chip, while all other banks in the chip may be simultaneously accessing data in parallel with each other and in parallel with all data-bus transfers to/from multiple requesters external of the chip.
The memory access speed of DRAMs and SRAMs is not changed by this invention, and each DRAM memory access remains substantially slower than an SRAM memory access when comparing DRAM and SRAM memories using the same semiconductor manufacturing technology on the same chip size, including in the DRAM structure of this invention.
It is therefore not obvious that the subject DRAM chip structure, operating at the slower access speed of DRAMs, actually can provide substantially faster average memory access on a system-wide basis than can be provided with any conventional SRAM chip structure occupying the same chip size and manufacturing technology. Further, the DRAM chip structure of this invention can provide a significantly better cost/performance than can such SRAM chips.
Yet with the DRAM access time disadvantage, this invention's DRAM shared cache chip nevertheless obtains significantly faster average access time, and a significant cost/performance improvement, over conventional SRAM shared caches, particularly for shared caches of the type commonly called L2 caches in the prior art.
In today's technology, the on-chip data bit density of DRAMs is about 10 times the bit density of SRAMs. Because of the novel DRAM structure of this invention, this DRAM/SRAM density difference enables the extraordinarily high access and transfer parallelism to obtain about a three times better memory-hit ratio compared to a conventional SRAM cache on the same size chip. In this manner, the invention exploits the well-known higher data density of DRAM technology over SRAM technology.
It is therefore not obvious that system level performance can be significantly improved by this DRAM invention with slower processor access to the shared cache. That is, the invention exploits its extraordinary parallelism capability to obtain a very high hit ratio for multiple simultaneous processor accesses that more than makes up for slower individual processor memory accesses.
Accordingly, in the unique chip memory structure of this invention, the DRAM density is exploited by its extraordinarily-high parallel accessing and transferring of data to improve performance for the DRAM shared cache even though individual processor accesses are slower.
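The trade-off argued above can be checked with the standard average-access-time model, in which the mean time per access is the hit time plus the miss ratio times the miss penalty. Every number in the sketch below is invented for illustration and none comes from the patent: the DRAM cache is given twice the hit time and one third of the misses of the SRAM cache, which is only one possible reading of "three times better". The model also ignores the parallelism between ports.

```python
def average_access_time(hit_time, misses_per_100, miss_penalty):
    """Mean time per access: hit time plus the amortized cost of misses."""
    return hit_time + misses_per_100 * miss_penalty / 100

# Hypothetical figures in arbitrary time units; not taken from the patent.
sram = average_access_time(hit_time=1, misses_per_100=9, miss_penalty=50)
dram = average_access_time(hit_time=2, misses_per_100=3, miss_penalty=50)
print(sram, dram)  # 5.5 3.5
```

Under these assumed figures the slower cache with fewer misses comes out ahead, because the miss penalty dominates the extra hit latency.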
SUMMARY DESCRIPTION OF THE DRAWINGS
FIGS. 1A and 1B represent a unique DRAM semiconductor chip embodiment structured according to this invention.
FIG. 2 shows a detailed example of the structure of two DRAM banks in any section of the chip shown in FIGS. 1A and 1B.
FIG. 3 shows a computer system arrangement which contains an L2 shared cache made of chips of the type shown in FIGS. 1A and 1B.
FIG. 4 represents a computer system having an L3 main memory comprised of a plurality of chips of the type shown in FIGS. 1A and 1B.
FIG. 5 is an address-bit diagram showing relative positions of subgroups of bits in a received processor address, and operations of these subgroups in accessing a row of data bits within the chip represented in FIGS. 1A and 1B.
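The subgroup idea in FIG. 5 can be sketched as a bit-field split of the 19-bit address sent to the chip. The field widths (9 row bits, 2 section bits, 2 bank bits, 3 page bits, 3 bits selecting 8 bytes from the port register) are the ones labeled in FIG. 5. The packing order from high to low, the treatment of the three page bits as contiguous, and all names in the code are assumptions, since the bit positions cannot be read from the figure as reproduced here.

```python
# Field widths as labeled in FIG. 5; the high-to-low order is ASSUMED.
FIELDS = [
    ("row", 9),              # selects a row in the bank
    ("section", 2),          # selects one of four sections
    ("bank", 2),             # selects one of four banks in the section
    ("page", 3),             # selects a page in the sense amplifier latches
    ("port_reg_select", 3),  # selects 8 bytes from the port register
]

def decode_chip_address(addr):
    """Split a 19-bit chip address into named fields, highest field first."""
    total = sum(width for _, width in FIELDS)
    if not 0 <= addr < (1 << total):
        raise ValueError("address does not fit in %d bits" % total)
    out = {}
    shift = total
    for name, width in FIELDS:
        shift -= width
        out[name] = (addr >> shift) & ((1 << width) - 1)
    return out

fields = decode_chip_address(0b000000101_10_01_011_100)
print(fields)
# {'row': 5, 'section': 2, 'bank': 1, 'page': 3, 'port_reg_select': 4}
```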
DESCRIPTION OF THE DETAILED
FIGS. 1A and 1B together provide a block diagram of circuit logic showing the structure of a unique DRAM semiconductor chip made according to the teachings of this invention. This chip may contain an entire memory, or may be a part of a memory, for a designated level in a memory hierarchy of a computer system. For example, this one chip may provide an entire second level (L2) memory (L2 cache), accessed by a plurality of processors (central processors and I/O processors) of a computer system. And the chip also may support a level three (L3) main memory of a computer system; and such L3 main memory may be comprised of one or a plurality of chips of the type shown in FIGS. 1A and 1B to accommodate from small to very large main memory sizes.
FIG. 3 is an example of a computer system having four central processors (CPUs) accessing an L2 shared memory (cache) comprised of a single chip of the type as shown in FIGS. 1A and 1B, and accessing a conventional L3 main memory which may be made of conventional DRAM chips.
FIG. 4 shows another computer system having an L3 main memory made of a plurality of chips in which each chip is the type shown in FIGS. 1A and 1B, but without items 25, 26, 27, 28 and 29 being in these chips because these items are not needed in this main memory configuration. No L2 shared cache is represented in the system of FIG. 4. However, a computer system may have both an L2 shared cache and an L3 main memory made from chips of the type shown in FIGS. 1A and 1B.
FIGS. 1A and 1B show the preferred embodiment of the chip, which contains data storage cells, addressing interface circuits and data interface circuits which may support a multiprocessor having four independently operating central processors (CPUs) and one or more I/O processors. The plural processors may be simultaneously accessing data (reading and/or writing data) in different DRAM storage banks in the chip.
The chip structure shown in FIGS. 1A and 1B is logically divided into distinct parts, including DRAM storage parts shown in FIG. 1A, and an input/output part shown in FIG. 1B. The storage part in FIG. 1A comprises four DRAM memory sections 1, 2, 3 and 4. Each memory section contains four DRAM storage banks connected to one data bus 5. The four storage sections 1, 2, 3, 4 each have a respective data bus 5-1, 5-2, 5-3, 5-4. The four sections therefore have a total of 16 DRAM banks 1.1 through 4.4.
Chip Memory Sections and Bank Addressing
Further, each DRAM bank is connected to a respective one of sixteen address buses 11-1 through 14-4, which are connected to a bank address control 10 within the chip. Bank address control 10 may receive all memory addresses