METHOD AND APPARATUS FOR
PERFORMING BI-ENDIAN BYTE AND
SHORT ACCESSES IN A SINGLE-ENDIAN
MICROPROCESSOR


BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to the field of microprocessors and more specifically to single-endian microprocessors that are compatible with bi-endian systems.

2. Art Background

Byte ordering determines how data is read from or written to memory and buses and ultimately how data is stored in the memory. The two byte ordering types are referred to as little endian and big endian. Consider a word having bytes A, B, C, and D where A is the most significant byte, B is the next most significant byte, C is the third most significant byte, and D is the least significant byte. Little endian systems store words such that the least significant byte is at the lowest address in memory. If a little endian ordered word is stored at 1000H, for example, D is stored at 1000H, C is stored at 1001H, B is stored at 1002H, and A is stored at 1003H, i.e. ABCD. A big endian ordered word stores the least significant byte at the highest byte address in memory. Therefore, if a big endian word is stored at 1000H, A is stored at 1000H, B is stored at 1001H, C is stored at 1002H, and D is stored at 1003H, i.e. DCBA.
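By way of illustration only, the two layouts described above can be sketched in Python. The byte values standing in for A through D and the helper name `layout` are assumptions made for this sketch and do not appear in the description. Note that Python lists bytes lowest address first, which is the reverse of the letter strings ABCD and DCBA used in the text.

```python
# Word with bytes A (most significant) through D (least significant).
# The values 0xA0..0xD0 are arbitrary stand-ins for the bytes A..D.
A, B, C, D = 0xA0, 0xB0, 0xC0, 0xD0
word = (A << 24) | (B << 16) | (C << 8) | D

BASE = 0x1000  # the example address 1000H


def layout(value, byteorder):
    """Map each byte address to the byte stored there."""
    raw = value.to_bytes(4, byteorder)  # raw[0] sits at the lowest address
    return {BASE + i: raw[i] for i in range(4)}


little = layout(word, "little")  # D at 1000H ... A at 1003H
big = layout(word, "big")        # A at 1000H ... D at 1003H
```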

Typically, a processor will operate in either a little endian or big endian mode and the bus attached to the processor operates in the same mode. Although some processors can operate either in the big endian mode or little endian mode, most processors typically operate in one mode and perform, as necessary, a translation of data received from a memory or other external devices prior to use by the processor. For example, processors manufactured by Intel Corporation use little endian format internally. Therefore, the processor performs operations in little endian format and likewise the internal bus which connects the processor is also little endian. Translations are performed prior to use by the processor, for example at the bus controller, so that the information is in the proper format prior to receipt by the processor.

The typical translation converts big endian ordered data to little endian ordered data, or vice versa, by switching the order of the bytes. For example, in processors manufactured by Intel Corporation, a big endian ordered word DCBA received by the microprocessor is converted to little endian ABCD by the bus controller before being placed on the internal bus. Likewise, when the processor stores little endian data ABCD to a big endian external memory, the bus controller converts the data to DCBA before storing. Thus, the translation is performed during both load and store accesses to big endian ordered memory locations.

However, this byte ordering translation does not correctly handle byte and short accesses to big endian ordered memory where a processor has a little endian data cache or where a little endian processor promotes cacheable byte or short (two bytes) accesses to word accesses. For example, big endian ordered data DCBA is stored at memory location 1000H. A copy of the big endian data is stored at 1000H in the data cache unit in little endian format as ABCD. Suppose that the processor requests byte data D at 1003H. In a first case, the access "hits" the data cache. But data A at 1003H in the data cache is not a copy of data D at 1003H in external memory. Therefore, the data cache returns incorrect data to the processor. Effectively the same result occurs in a second case when the byte access does not hit the data cache. In that case the byte access to 1003H is promoted to a word access to 1000H by the bus controller. The bus controller returns, after byte ordering translation, ABCD to the internal bus. Again, data A returned to the internal bus at the position corresponding to 1003H is not the same as data D at 1003H in external memory.
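The mismatch described above can be reproduced with a small Python model. This is only a sketch: the dictionaries `external` and `cached`, and the byte values standing in for A through D, are assumptions introduced for illustration.

```python
BASE = 0x1000
A, B, C, D = 0xA0, 0xB0, 0xC0, 0xD0  # arbitrary stand-ins for bytes A..D

# Big endian external memory: A at 1000H ... D at 1003H.
external = {BASE + i: b for i, b in enumerate([A, B, C, D])}

# The byte-swapped little endian copy held at the same addresses.
cached = {BASE + i: b for i, b in enumerate([D, C, B, A])}

requested = BASE + 3            # byte access to 1003H
naive = cached[requested]       # indexing the copy with the unmodified address
expected = external[requested]  # what external memory actually holds there
```

Indexing the translated copy with the unmodified address yields A, whereas external memory holds D at that address.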

Therefore, a method and apparatus for performing bi-endian byte and short accesses in a single-endian microprocessor is needed.

SUMMARY OF THE INVENTION

The present invention is drawn to an apparatus, system, and method for performing bi-endian byte and short accesses in a single-endian microprocessor. The microprocessor of the present invention comprises a core means, a bus controller means, a converting means and a pointer means.

The core means issues instructions, including instructions to load sub-word and word data from a memory external to the microprocessor, on a local bus. The bus controller means loads sub-word and word data from the external memory in response to the load instructions issued by the core means. The bus controller means places the loaded data on the local bus. The core means receives the loaded data from the local bus. The bus controller means selectively promotes sub-word loads of sub-word data to word loads of word data. The word data loaded by a promoted sub-word load includes the sub-word data requested by the sub-word load.

The microprocessor only handles data of a first endian ordering. The external memory stores data of the first endian ordering in a first region and data of a second endian ordering in a second region.

The converting means converts data loaded from the second region of the external memory to the first endian ordering before the bus controller means places the loaded data on the local bus.

The pointer means points to sub-word data within word data placed on the local bus by the bus controller means when the bus controller means promotes a sub-word load of sub-word data to a word load of word data. The pointer means thereby allows the core means to receive the sub-word data from the local bus. The pointer means points to the sub-word data independently of whether the word data is loaded from the first region of the external memory or the second region of the external memory.

One object of the present invention is to allow the data cache of a single-endian microprocessor to correctly handle sub-word memory accesses to bi-endian external memory. For this end the present invention provides an apparatus and method for pointing to the appropriate corresponding location in the data cache independently of whether the microprocessor accesses data in big endian or little endian ordered regions of the external memory.

Another object of the present invention is to allow a bus controller that promotes sub-word loads to word loads to point the processor core to the correct sub-word data returned as part of the word data on a local bus. For this end the present invention provides an apparatus and method for pointing to the appropriate sub-word data within the promoted word data returned to the local bus independently of whether the bus controller loads data from big endian or little endian ordered regions of the external memory.


Another object of the present invention is to improve the flexibility of a computer system by providing a computer system that uses a single endian microprocessor capable of accessing bi-endian external memory.

Yet another object of the present invention is to provide bi-endian compatibility in a microprocessor without undue or overly complex modifications to existing bus controller or data cache circuits.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of the preferred embodiment of the microprocessor of the present invention.

FIG. 2 is a block diagram of the DCU pointer unit 26 and the BCL pointer unit 32 of the present invention.

FIG. 3 is a flowchart of the method of the preferred embodiment of the present invention as embodied in the data cache unit 16.

FIG. 4 is a flowchart of the method of the preferred embodiment of the present invention as embodied in the bus controller logic 18.

FIG. 5 is a block diagram of one overall system embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

A method and apparatus for performing bi-endian byte and short accesses in a single-endian microprocessor is described. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods and circuits are shown in block diagram form in order not to obscure the present invention unnecessarily. It is understood that the present invention is comprised of transistor circuits that are readily manufacturable using well known CMOS (complementary metal-oxide semiconductor) technology, or other equivalent semiconductor manufacturing process.

OVERALL DESIGN OF THE PREFERRED EMBODIMENT OF THE PRESENT INVENTION


FIG. 1 illustrates a block diagram of the preferred embodiment of the present invention. The microprocessor 10 of the present invention includes a processor core 12, a data cache unit (DCU) 16, and a bus controller logic (BCL) 18, all coupled to a memory-side machine bus (MMB) 14. The memory-side machine bus 14 allows a common data transfer and control path between the units that are connected to it. The processor core 12 issues instructions on the memory-side machine bus 14 and processes instructions and data. The instructions issued by the processor core 12 include instructions for accessing data in an external memory 22. For example, a LOAD instruction is for loading or retrieving data from external memory and a STORE instruction is for storing data in external memory. The memory access instructions load and store byte, short, and word data. (A word consists of four bytes; a short consists of two bytes.) The external memory 22 stores little endian ordered data in a first memory region 21 and big endian ordered data in a second memory region 23. The bus controller logic 18 and the external memory 22 are coupled to a system bus 20. The system bus 20 is used to transfer data between the microprocessor 10 and external devices such as external memory 22. The bus controller logic 18 controls data transfers on the system bus 20. A memory region table in the bus controller logic 18 divides the external memory 22 into regions. Each entry in the table identifies characteristics of a region of external memory. For example, some characteristics determined by the memory region table are whether a region is little endian or big endian, and whether a region is cacheable. The data cache unit 16 stores copies of data that the bus controller logic 18 retrieves from the external memory 22 for subsequent use by the processor core 12.
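The description does not give the format of the memory region table. The following Python sketch shows one plausible software model of such a table; the `Region` fields, the address ranges, and the `lookup` helper are all hypothetical and chosen only to illustrate the per-region endianness and cacheability attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    start: int        # first byte address of the region (hypothetical)
    end: int          # last byte address of the region, inclusive (hypothetical)
    big_endian: bool  # True if the region holds big endian ordered data
    cacheable: bool   # True if accesses to the region may be cached


# Hypothetical table: one little endian region and one big endian region.
REGION_TABLE = [
    Region(0x0000, 0x0FFF, big_endian=False, cacheable=True),
    Region(0x1000, 0x2FFF, big_endian=True, cacheable=True),
]


def lookup(address: int) -> Optional[Region]:
    """Return the table entry covering the address, or None if unmapped."""
    for region in REGION_TABLE:
        if region.start <= address <= region.end:
            return region
    return None
```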

The data cache unit 16 receives a CACHEABLE12 30 signal from the bus controller logic 18. The CACHEABLE12 30 signal indicates whether an access by the processor core 12 is to a cacheable region of the external memory 22. The CACHEABLE12 30 signal causes the data cache unit 16 to handle only cacheable data accesses.

The bus controller logic 18 receives a DCULDMISSQ21 28 signal from the data cache unit 16. The DCULDMISSQ21 28 signal indicates whether the data cache unit 16 stores a copy of the cacheable data requested by the processor core 12. The DCULDMISSQ21 28 signal causes the bus controller logic 18 to retrieve cacheable data from the external memory 22 only when a copy of the requested data is not stored in the data cache unit 16.

The microprocessor 10 of the present invention is entirely little endian. The processor core 12 performs operations in little endian format, the memory-side machine bus 14 is little endian, and the data cache unit 16 stores and returns copy data in little endian format. A byte converter 24 performs byte ordering translation when the bus controller logic 18 transfers word and short data between the little endian microprocessor 10 and a big endian region of the external memory 22. For example, when the processor core 12 stores a little endian word ABCD to a big endian ordered region of the external memory 22, the byte converter 24 converts the word to DCBA before the bus controller logic 18 places it on the system bus 20 to be stored in the external memory 22. Likewise, when the processor core 12 loads a big endian word DCBA from a big endian ordered region of the external memory 22, the byte converter 24 converts the word to ABCD before the bus controller logic 18 returns it to the processor core 12 on the memory-side machine bus 14. The translation is similar for short STOREs and for LOADs that are not promoted to word accesses. For example, when the processor core 12 stores a little endian short data EF to a big endian ordered region of the external memory 22, the byte converter 24 converts the short data to FE before the bus controller logic 18 places it on the system bus 20. Likewise, when the processor core 12 loads a big endian short FE from a big endian ordered region of the external memory 22, the byte converter 24 converts the short data to EF before the bus controller logic 18 returns it on the memory-side machine bus 14.
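The translation performed by the byte converter 24 amounts to reversing the bytes of the short or word being transferred. A minimal Python sketch follows; the function name `convert` and the numeric values are assumptions made for illustration.

```python
def convert(raw: bytes) -> bytes:
    """Reverse the byte order of a short (2 bytes) or a word (4 bytes)."""
    if len(raw) not in (2, 4):
        raise ValueError("expected a short (2 bytes) or a word (4 bytes)")
    return raw[::-1]


# A word and a short as the little endian processor holds them.
word_le = (0xA0B0C0D0).to_bytes(4, "little")
short_le = (0xE0F0).to_bytes(2, "little")

# What the converter would place on the system bus for a big endian region.
word_be = convert(word_le)
short_be = convert(short_le)
```

Because the same reversal is applied in both directions, a STORE followed by a LOAD of the same size returns the original value.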

As will be further explained, the microprocessor 10 of the present invention selectively promotes short or byte LOADs to word LOADs. The BYTENUM1 34 and BYTENUM0 36 signals are control signals carried by the memory-side machine bus 14 that indicate to the processor core 12 the position of sub-word data within a word returning to the core 12 on the memory-side machine bus 14. Each unit coupled to the memory-side machine bus 14 can drive the BYTENUM1 34 and BYTENUM0 36 signals when returning data to the processor core 12.

The data cache unit 16 is a 1 kilobyte, direct-mapped, write through, little endian cache. The cache memory array 17 is organized in 64 lines consisting of 16 bytes each. Each line comprises four words of four bytes each, each word having a valid bit to indicate whether the corresponding word is valid. A tag array of 64 entries corresponds to the 64 lines in the data cache unit 16. The data cache unit 16 always returns word data when a LOAD "hits" the data cache. Therefore, when a sub-word LOAD "hits" the data cache unit 16, the BYTENUM1 34 and BYTENUM0 36 signals must point to the correct sub-word data within the word returned to the memory-side machine bus 14. The BYTENUM1 34 and BYTENUM0 36 signals are also used locally within the data cache unit 16 during a cacheable sub-word STORE as the two least significant bits (LSBs) of the data address to address the cache memory array 17 and, thereby, store the sub-word data in the correct location in the cache memory array 17.
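The stated geometry (64 lines of 16 bytes, four words per line) suggests the conventional direct-mapped address split sketched below in Python. The description does not specify which address bits form the index and tag, so the bit positions and the helper name `split_address` are assumptions based on the usual arrangement.

```python
LINES = 64        # lines in the cache memory array
LINE_BYTES = 16   # bytes per line
WORD_BYTES = 4    # bytes per word, so four words per line


def split_address(address: int):
    """Split an address into (tag, line index, word in line, byte in word).

    Assumed bit assignment: bits 1:0 select the byte, bits 3:2 the word,
    bits 9:4 the line, and the remaining upper bits form the tag.
    """
    byte_in_word = address & 0x3
    word_in_line = (address >> 2) & 0x3
    line_index = (address >> 4) & (LINES - 1)
    tag = address >> 10
    return tag, line_index, word_in_line, byte_in_word
```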

A DCU pointer unit 26 drives the BYTENUM1 34 and BYTENUM0 36 signals during a data cache access to allow the data cache unit 16 to correctly handle sub-word (byte or short) LOADs and STOREs by the processor core 12 to big endian regions of the external memory 22. Normally, BYTENUM1 34 and BYTENUM0 36 signals are the same as the two least significant bits (LSBs) of the address, i.e. A1 and A0. However, for big endian short or byte accesses the values are different. For example, big endian ordered data DCBA is stored in the external memory 22 at 1000H. A copy of the big endian data is stored at 1000H in the data cache unit 16 in little endian format as ABCD. Suppose that the processor core 12 issues a LOAD instruction requesting the byte D that is stored at 1003H in the external memory 22. The data cache unit 16 receives the address 1003H from the processor core 12 via the memory-side machine bus 14. The data cache unit 16 returns the word ABCD containing byte D to the memory-side machine bus 14. Because the access is a byte access from big endian ordered memory, the DCU pointer unit 26 drives the BYTENUM1 34 and BYTENUM0 36 signals to A1# and A0#, respectively, (where A1 and A0 are the two least significant bits (LSBs) of the address and A1# means the complement of A1 and A0# means the complement of A0) such that the processor core 12 receives the data D corresponding to location 1000H instead of the data A corresponding to location 1003H.
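The byte-access rule above (complement both address bits for big endian regions, pass them through otherwise) can be modelled in Python as follows. The function name `bytenum_for_byte` and the byte values are assumptions made for this sketch; the cached word is listed lowest offset first.

```python
def bytenum_for_byte(address: int, big_endian: bool):
    """Return (BYTENUM1, BYTENUM0) for a byte access."""
    a1 = (address >> 1) & 1
    a0 = address & 1
    if big_endian:
        return a1 ^ 1, a0 ^ 1  # A1#, A0#
    return a1, a0              # A1, A0


# Little endian copy of the word, offset 0 first: D, C, B, A.
cached_word = bytes([0xD0, 0xC0, 0xB0, 0xA0])

# Byte LOAD from 1003H in a big endian region.
b1, b0 = bytenum_for_byte(0x1003, big_endian=True)
selected = cached_word[(b1 << 1) | b0]
```

With both bits complemented, the pointer selects offset 0 of the returned word, which holds D.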

The DCU pointer unit 26 also adjusts for short accesses to big endian memory. For example, big endian ordered data HGFE is stored in the external memory 22 at 2000H. A copy of the big endian data is stored at 2000H in the data cache unit 16 in little endian format as EFGH. Suppose that the processor core 12 issues a STORE instruction to store short AB to 2000H, a cacheable location in the external memory 22. The byte converter 24 reorders the short to BA before the bus controller logic 18 stores the short at location 2000H in the external memory 22. The result is that word data HGBA now resides at 2000H in the external memory 22. In addition, the data cache unit 16 will store a copy of the short AB in order to maintain an accurate copy of the external data. Because the access is a short access to big endian ordered memory, the DCU pointer unit 26 drives the BYTENUM1 34 and BYTENUM0 36 signals to A1# and A0, respectively. The data cache unit 16 uses the BYTENUM1 34 and BYTENUM0 36 signals locally as the two LSBs of the address that addresses the cache memory array 17 such that the data cache unit 16 stores short data AB at location 2002H instead of 2000H. Therefore, the data cache unit 16 stores a correct copy data ABGH at 2000H of the big endian data HGBA at 2000H in the external memory 22.
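The short STORE example can be traced with the Python sketch below, which applies the A1#, A0 rule to the cache copy and the byte converter's reversal to external memory. The helper name and the byte values standing in for A, B, E, F, G, and H are assumptions; both arrays are listed lowest address first, the reverse of the letter strings in the text.

```python
def bytenum_for_short(address: int, big_endian: bool):
    """Return (BYTENUM1, BYTENUM0) for a short access."""
    a1 = (address >> 1) & 1
    a0 = address & 1
    return (a1 ^ 1, a0) if big_endian else (a1, a0)  # A1#, A0 when big endian


A, B = 0x0A, 0x0B                       # arbitrary stand-ins for the short AB
E, F, G, H = 0x0E, 0x0F, 0x10, 0x11     # arbitrary stand-ins for the old word

external = bytearray([E, F, G, H])  # big endian word HGFE at 2000H
cache = bytearray([H, G, F, E])     # little endian copy EFGH

# STORE short AB to 2000H in a big endian region.
short_le = bytes([B, A])                 # little endian short, low byte first
external[0:2] = short_le[::-1]           # byte converter writes A, B at 2000H
b1, b0 = bytenum_for_short(0x2000, big_endian=True)
offset = (b1 << 1) | b0                  # offset 2, i.e. location 2002H
cache[offset:offset + 2] = short_le      # cache keeps the little endian form
```

After the store, the cache word is again the byte-reversed image of the external word, as the text requires.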

The bus controller logic 18 promotes all cacheable byte and short LOADs that "miss" the data cache unit 16 to word accesses. (This is done because the valid bits in the cache memory array 17 of the data cache unit 16 have word granularity.) The promotion is performed by treating the sub-word LOAD as a word LOAD and treating the two least significant bits of the address, A1 and A0, as zeros. For example, a cacheable byte LOAD from address 1003H, or binary 0001 0000 0000 0011, that "misses" data cache unit 16 is promoted to a word access and the two least significant bits are treated as zeros, making the effective address 1000H, or binary 0001 0000 0000 0000. Therefore, the bus controller logic 18 will return the word data at 1000H in external memory.
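Treating A1 and A0 as zeros is equivalent to masking off the two low address bits. A one-line Python sketch, with the hypothetical name `promote_to_word`:

```python
def promote_to_word(address: int) -> int:
    """Return the word-aligned address with A1 and A0 treated as zeros."""
    return address & ~0x3
```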

A BCL pointer unit 32 allows the bus controller logic 18 to correctly handle sub-word LOADs that are promoted to word accesses. As already discussed, the bus controller logic 18 promotes all sub-word cacheable LOADs that "miss" the data cache unit 16 to word accesses. The BCL pointer unit 32 drives the BYTENUM1 34 and BYTENUM0 36 signals to point the processor core 12 to the correct sub-word data within the word data returned to the memory-side machine bus 14 by the bus controller logic 18. The result of using word promotion and the BYTENUM pointer is that the processor core 12 receives the byte or short data that it originally requested and the data cache unit 16 receives a full word to update the data cache. (Using a full word to update the data cache unit 16 is desirable because of the word granularity of the valid bits in the cache memory array 17.) Normally, BYTENUM1 34 and BYTENUM0 36 signals are the same as the two least significant bits (LSBs) of the address, A1 and A0, respectively. However, for big endian short or byte accesses that are promoted to word accesses the values are different. For example, big endian ordered data DCBA is stored at memory location 1000H. Suppose that the processor core 12 issues a LOAD instruction requesting the byte B stored at 1001H in the external memory 22 and no copy of the data is stored in the data cache unit 16. The bus controller logic 18 promotes the byte LOAD to a word LOAD and retrieves the word DCBA from 1000H in the external memory 22. The byte converter 24 reorders the word to little endian format ABCD before the bus controller logic 18 returns the word to the memory-side machine bus 14. The BCL pointer unit 32 receives the two LSBs of the address 1001H and drives the BYTENUM1 34 and BYTENUM0 36 signals to A1# and A0#, respectively, such that the processor core 12 is pointed to data B corresponding to location 1002H instead of data C corresponding to location 1001H. Therefore, the processor core 12 correctly receives data B instead of data C. The BCL pointer unit 32 also adjusts the BYTENUM pointer values for word-promoted short LOADs from big endian external memory by driving the BYTENUM1 and BYTENUM0 signals to A1# and A0, respectively.
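The promoted byte LOAD path can be modelled end to end in Python: align the address, fetch the word, reverse it if the region is big endian, then select the byte using the adjusted pointer. The function `promoted_byte_load`, its parameters, and the byte values are assumptions for this sketch and are not taken from the description.

```python
def promoted_byte_load(memory: bytes, base: int, address: int,
                       big_endian: bool) -> int:
    """Model a byte LOAD that misses the cache and is promoted to a word LOAD."""
    word_address = address & ~0x3             # treat A1 and A0 as zeros
    offset = word_address - base
    raw = memory[offset:offset + 4]           # word as stored externally
    word = raw[::-1] if big_endian else raw   # byte converter output
    a1, a0 = (address >> 1) & 1, address & 1
    if big_endian:
        a1, a0 = a1 ^ 1, a0 ^ 1               # BYTENUM1, BYTENUM0 = A1#, A0#
    return word[(a1 << 1) | a0]               # byte the core is pointed to


A, B, C, D = 0xA0, 0xB0, 0xC0, 0xD0   # arbitrary stand-ins for bytes A..D
big_memory = bytes([A, B, C, D])      # big endian word at 1000H
result = promoted_byte_load(big_memory, 0x1000, 0x1001, big_endian=True)
```

In this model the core receives the byte held at the requested external address for either ordering, which is the behaviour the pointer adjustment is intended to guarantee.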

Because the microprocessor 10 of the present invention is little endian and the external memory 22 is both little endian and big endian, the DCU pointer unit 26 and the BCL pointer unit 32 must account for both types of ordering, i.e., they must allow the data cache unit 16 and the bus controller logic 18 to correctly handle byte, short, word, and multiple word accesses by the processor core 12 to little endian or big endian ordered memory. To accomplish this, the DCU pointer unit 26 and the BCL pointer unit 32 adjust the BYTENUM1 34 and BYTENUM0 36 pointer values in logical relation to the two LSBs of the address.

FIG. 2 is a block diagram of the DCU pointer unit 26 of the data cache unit 16 and the BCL pointer unit 32 of the bus controller logic 18. The DCU pointer unit 26 and the BCL pointer unit 32 are functionally equivalent. The DCU pointer unit 26 receives address bits A1 44 and A0 46, and control
