This flowchart hides the most awful parts (IMO) of x86 prefixes: some combinations of prefixes are invalid but still parsed and executed, like combining two segment overrides, or placing a legacy prefix after a REX prefix.
The CPU also doesn't care if you use prefixes that aren't valid for a specific instruction, for example a REP on a non-repeatable instruction. The LOCK prefix is the only prefix that makes the sane choice to reject invalid combinations, rather than silently accept them.
Also, the (E)VEX prefix doesn't behave like the other prefixes: it must be placed last, and can therefore only appear once. All other prefixes can be repeated.
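A rough Python sketch of the prefix rules described above. It assumes 64-bit mode and handles only legacy and REX prefixes (no VEX, EVEX or REX2). The name `scan_prefixes` and the warning strings are made up for illustration. Like the hardware, it rejects nothing and only notes the questionable combinations.

```python
# Legacy prefix bytes and the names used for them in this sketch.
LEGACY_PREFIXES = {
    0xF0: "LOCK", 0xF2: "REPNE", 0xF3: "REP",
    0x2E: "CS", 0x36: "SS", 0x3E: "DS", 0x26: "ES", 0x64: "FS", 0x65: "GS",
    0x66: "OPSIZE", 0x67: "ADDRSIZE",
}
SEGMENT_OVERRIDES = {"CS", "SS", "DS", "ES", "FS", "GS"}


def scan_prefixes(code: bytes):
    """Split 64-bit-mode code into (legacy prefixes, effective REX, rest, warnings)."""
    legacy, warnings = [], []
    rex = None
    i = 0
    while i < len(code):
        b = code[i]
        if b in LEGACY_PREFIXES:
            name = LEGACY_PREFIXES[b]
            if rex is not None:
                # A REX byte only counts when it directly precedes the opcode.
                warnings.append(f"{name} after REX: the REX byte is ignored")
                rex = None
            if name in SEGMENT_OVERRIDES and any(p in SEGMENT_OVERRIDES for p in legacy):
                warnings.append("more than one segment override")
            legacy.append(name)  # repeated prefixes are simply accepted
        elif 0x40 <= b <= 0x4F:
            rex = b  # assumption of this sketch: a later REX byte replaces an earlier one
        else:
            break  # first non-prefix byte: the opcode starts here
        i += 1
    return legacy, rex, code[i:], warnings
```

For example, `scan_prefixes(bytes([0x48, 0xF3, 0xA5]))` reports that the REX byte is dropped, while `F3 48 A5` (REP MOVSQ) keeps it.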
bonzini just now
Yes, I wish it were this simple. :) There are many other complications:
* Some instructions require VEX.L or VEX.W to be 0 or 1, and some encodings result in completely different instructions if you change VEX.L.
* Different bits of the EVEX prefix are valid depending on the opcode byte.
* Some encodings (called groups) produce different instructions depending on bits 3-5 of the modrm byte (the byte that follows the opcode). Some encodings further produce different groups depending on whether bits 6-7 (mod) of the modrm byte identify a register or not.
* Some instructions read a whole vector register but only a scalar if the same instruction has a memory operand. Sometimes this is clear in the manual, sometimes it is not, sometimes the manual is downright wrong.
* Some instructions do not allow using the legacy high-8-bits registers even though they don't do anything with bits 8 and above of the operand: they only want a 32- or 64-bit register as their operand.
* APX (EVEX map 4) looks a lot like legacy map 0, but actually a few instructions were moved there from other maps for good reasons, a few more were moved there for no apparent reason (SHLD/SHRD iirc), and a few more are new.
* REX2 does not extend SSE and AVX instructions to 32 registers even though REX does extend them to 16.
* Intel defines a thing called VEX instruction classes, which makes sense except for a dozen or two instructions where it doesn't. For these, sometimes AMD uses a different class, sometimes doesn't; sometimes AMD's choice makes sense, sometimes it doesn't.
And many more that I found out while writing QEMU's current x86 decoder (which tries to be table based but sometimes that's just impossible).
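To make the "groups" item concrete, here is a minimal Python sketch for opcode 0xFF, where bits 3-5 (reg) of the modrm byte pick the instruction and the mod field decides whether the operand is a register. The function names are made up for illustration; this is not how QEMU's decoder is organised.

```python
def split_modrm(modrm: int):
    """Return the (mod, reg, rm) fields of a modrm byte."""
    return modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111


# Opcode 0xFF: the reg field (bits 3-5) selects the instruction; /7 is undefined.
GROUP_FF = ["INC", "DEC", "CALL", "CALLF", "JMP", "JMPF", "PUSH", None]


def decode_ff(modrm: int) -> str:
    """Name the instruction encoded by opcode 0xFF plus the given modrm byte."""
    mod, reg, rm = split_modrm(modrm)
    name = GROUP_FF[reg]
    if name is None:
        return "(undefined)"
    # Far CALL/JMP only exist with a memory operand; mod == 3 means a register.
    if name in ("CALLF", "JMPF") and mod == 3:
        return "(undefined)"
    kind = "reg" if mod == 3 else "mem"
    return f"{name} {kind}"
```

So `FF D0` decodes as `CALL reg`, `FF 30` as `PUSH mem`, and `FF D8` (far call with a register operand) as undefined.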
peterfirefly 4 hours ago
> The CPU also doesn't care if you use prefixes that aren't valid for a specific instruction, for example a REP on a non-repeatable instruction.
This is one of the reasons why the x86 could be extended so much. PAUSE is just REP NOP, for example. Segment prefixes in front of conditional branches were used as static branch prediction hints (which I believe have returned in some newer Intel CPUs). Useful if you want to make a hint on newer CPUs that is harmless on older CPUs.
Some prefixes have become part of the encoding for certain SIMD instructions, but that is a different case because those prefixes aren't hints.
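A small Python sketch of the two hint encodings mentioned above: `F3 90` (REP NOP, later PAUSE) and the `2E`/`3E` segment-override bytes in front of a short conditional branch. The flags `knows_pause` and `knows_branch_hints` are hypothetical stand-ins for "old CPU" versus "new CPU". The point is that the old reading is harmless either way.

```python
def describe(code: bytes, knows_pause: bool, knows_branch_hints: bool) -> str:
    """Say how a CPU with the given (hypothetical) capabilities reads the bytes."""
    # F3 90: REP prefix + NOP. Older CPUs ignore the REP, newer ones treat it as PAUSE.
    if code[:2] == b"\xF3\x90":
        return "PAUSE" if knows_pause else "NOP (REP ignored)"
    # 2E / 3E in front of Jcc rel8 (opcodes 70..7F): CS/DS override reused as a hint.
    if len(code) >= 2 and code[0] in (0x2E, 0x3E) and 0x70 <= code[1] <= 0x7F:
        if knows_branch_hints:
            hint = "not taken" if code[0] == 0x2E else "taken"
            return f"Jcc rel8, hinted {hint}"
        return "Jcc rel8 (segment override ignored)"
    return "(not handled in this sketch)"
```

Either way the program computes the same result; only timing or prediction changes, which is what makes this kind of reuse safe.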
adrian_b 1 hour ago
Not at all.
The correct mechanism for allowing future extensions was already introduced by Intel with the 80186, in 1982: an invalid instruction exception, to be raised for all undefined instruction opcodes.
This behavior was unlike the 8086/8088, which happily executed any undefined instructions, most of them being aliases of defined instructions.
For any opcode where current CPUs generate invalid instruction exceptions, it is very easy to redefine it in future CPUs to encode a useful instruction. Had REP NOP generated exceptions in old CPUs, it would still have been fine for it to become PAUSE in current CPUs. Unfortunately, the designers of Intel CPUs have not always followed their own documentation, so not all invalid opcodes generate the exception, as they should. The lack of enforcement has led to the existence of even commercial programs that are invalid, and of compilers that generate officially invalid instructions.
It is true that there are a few cases where Intel has exploited the fact that some encodings were equivalent to a NOP on old CPUs, by reusing them for some instruction on new CPUs, which allowed a program compiled for new CPUs to run on old CPUs. However, this has been possible only for very few instructions, e.g. for branch direction hints, where not executing them on old CPUs does not change the result of a program.
In general, reusing an opcode for a new instruction when that opcode does not generate exceptions on old CPUs is very dangerous, because executing a program compiled for new CPUs on old CPUs will have unpredictable consequences, such as destroying some property of the user.
Your example with PAUSE is also one of the very few examples, besides branch hints, where the execution of a new program on old computers is not dangerous, despite the reassignment of the opcode.
Some time ago there was a discussion about a bug in some CPU (I do not remember which one) that was triggered when the REP prefix and the 64-bit REX prefix appeared in an invalid order. Older CPUs ignored the invalid order instead of generating the appropriate exception, so invalid programs ran on them without any bad effects, but those same programs triggered the bug on that specific new CPU.
The new CPU should have been bug-free, but also the programs that triggered the bug should not have existed, as they should have crashed immediately on any older CPU.
vardump 4 hours ago
I wonder whether there are some prefixes that cause (some) CPUs to execute the instruction a lot slower.
debugnik 10 hours ago
This site redirects to HN when it notices HN in the referrer.
koito17 2 hours ago
The fact sites do evil things with these headers is why I configure Firefox with
network.http.referer.XOriginTrimmingPolicy
set to 2. It "breaks" sites, but often in good ways (such as the site in TFA).
mghackerlady 1 hour ago
Referrer links are a dumb idea. Why the hell do you want to know where I'm coming from, other than to track me?
st_goliath 7 hours ago
If you have JavaScript enabled, that is. JWZ at least does the redirect on the server side.
Copy the URL and manually paste it into a new tab, no referrer then.
trashb 6 hours ago
This is an interesting way to prevent the hug of death. I wonder what the author's reasoning is. Also, would it really be effective?
debugnik 6 hours ago
I doubt it: the redirect is client-side, and I got a flash of the page before the redirect.
philjackson 5 hours ago
If anything, it's going to at least double its traffic this way when people click again assuming they hit back somehow.
therein 8 hours ago
Wow, I didn't even notice because I have extensions that strip the referrer header. Excellent.
chimpontherun 8 hours ago
open in new tab
yellowapple 6 hours ago
That doesn't seem to clear the referrer, at least on Firefox. Gotta go a step further and outright copy/paste the URL into an already-created tab.
high_na_euv 4 hours ago
Open in private works
st_goliath 6 hours ago
Fun little tidbit: The 0x40-0x4f range used for the REX prefix actually clashes with the single-byte encodings for increment/decrement.
When AMD designed the 64-bit extension, they had run out of available single-byte opcodes to use as a prefix and decided to re-use those. The INC/DEC instructions are still available in 64-bit mode, but not in their single-byte encodings.
TheAdamist 2 hours ago
Which clever code can use to determine which mode it's running in, and branch appropriately depending on whether the INC/DEC was executed or not.
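A toy Python illustration of that trick. It interprets the bytes `31 C0 40 90` once as 32-bit code and once as 64-bit code. The `run` function and the `PROBE` name are made up for this sketch and cover only the three instructions needed, so this is a model of the idea, not real mode-detection code.

```python
PROBE = bytes([0x31, 0xC0, 0x40, 0x90])  # xor eax, eax ; 0x40 ; nop


def run(code: bytes, long_mode: bool) -> int:
    """Interpret a tiny instruction subset and return the final EAX value."""
    eax = 0xFFFFFFFF  # arbitrary starting value
    i = 0
    while i < len(code):
        if code[i:i + 2] == b"\x31\xC0":   # XOR EAX, EAX
            eax = 0
            i += 2
        elif code[i] == 0x40:
            if not long_mode:
                # 32-bit mode: single-byte INC EAX.
                eax = (eax + 1) & 0xFFFFFFFF
            # 64-bit mode: an empty REX prefix for the next instruction, no effect here.
            i += 1
        elif code[i] == 0x90:              # NOP
            i += 1
        else:
            raise ValueError(f"byte {code[i]:#04x} is outside this sketch")
    return eax
```

Here `run(PROBE, long_mode=False)` leaves EAX at 1 and `run(PROBE, long_mode=True)` leaves it at 0, so a following conditional branch can tell the two modes apart.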
dataflow 2 hours ago
There's EVEX2 now. It's hard to keep up...
snvzz 8 hours ago
This is in no small part why x86 code density is awful despite variable size encoding.
themafia 7 hours ago
Awful compared to what?
I've seen benchmarks that go both ways in terms of a "winner" but in terms of overall variance there seems to be very little. There are some cases where ARM64 or RISCV do better and there are some cases where x86_64 does better. I can't see code density being a relevant factor when picking one ISA over another.
We've got good compilers now anyways.. outside of power consumption.. the ISA wars are dead.
adgjlsfhk1 just now
> outside of power consumption.
This is a pretty huge caveat. >90% of cpus are <1W (usb cables, wifi cards, storage controllers etc), and 99% are <10W (phones, lots of laptops)
whobre 2 hours ago
Compared to VAX, of course…
bell-cot 5 hours ago
Technically, code density still matters - because both L1 cache memory and L1 instruction fetch misses are very expensive.
But as you point out, code density gets far less attention in tech circles these days. And higher-level decision makers rightfully focus on higher-level system performance metrics.
The following is pulled in from `https://soc.me/assets/js/turnBack.js`:
I wonder why Reddit is "temporarily not undesirable". https://github.com/soc/soc.me/blame/main/assets/js/turnBack....
Although, when we inspect the author's profile on lobste.rs, we'll see that he's banned:
https://lobste.rs/~soc [Banned 4 years ago by pushcx: Troll.]
Maybe he's banned from HN as well, and this 'undesirables' list is a method of taking some kind of revenge.
https://news.ycombinator.com/user?id=soc
https://soc.me/interfaces/intels-original-64bit-extensions-f...
where there are links to a couple of patents filed by Intel in 2000, about a 64-bit extension of the x86 ISA, which had been implemented in Pentium 4, but which had been nonetheless disabled and hidden from the users, in order to not compete with Itanium.
The page explains the content of the patents.
As already mentioned by another poster, at least on Firefox you have to open a tab and then copy this link there, to avoid being identified as an "undesirable" :-)