LDDQU (Intel x86/64 assembly instruction)
Load Unaligned Integer 128 Bits
아래 표를 해석하는 방법은 x86-64 명령어 레퍼런스 읽는 법 글을 참조하시기 바랍니다.
Load unaligned data from mem and return double quadword in xmm1.
Load unaligned packed integer values from mem to xmm1.
Load unaligned packed integer values from mem to ymm1.
Instruction Operand Encoding
The instruction is functionally similar to (V)MOVDQU ymm/xmm, m256/m128 for loading from memory. That is: 32/16 bytes of data starting at an address specified by the source memory operand (second operand) are fetched from memory and placed in a destination register (first operand). The source operand need not be aligned on a 32/16-byte boundary. Up to 64/32 bytes may be loaded from memory; this is implementation dependent.
This instruction may improve performance relative to (V)MOVDQU if the source operand crosses a cache line boundary. In situations that require the data loaded by (V)LDDQU be modified and stored to the same location, use (V)MOVDQU or (V)MOVDQA instead of (V)LDDQU. To move a double quadword to or from memory locations that are known to be aligned on 16-byte boundaries, use the (V)MOVDQA instruction.
If the source is aligned to a 32/16-byte boundary, based on the implementation, the 32/16 bytes may be loaded more than once. For that reason, the usage of (V)LDDQU should be avoided when using uncached or write-combining (WC) memory regions. For uncached or WC memory regions, keep using (V)MOVDQU.
This instruction is a replacement for (V)MOVDQU (load) in situations where cache line splits significantly affect performance. It should not be used in situations where store-load forwarding is performance critical. If performance of store-load forwarding is critical to the application, use (V)MOVDQA store-load pairs when data is 256/128-bit aligned or (V)MOVDQU store-load pairs when data is 256/128-bit unaligned.
If the memory address is not aligned on 32/16-byte boundary, some implementations may load up to 64/32 bytes and return 32/16 bytes in the destination. Some processor implementations may issue multiple loads to access the appropriate 32/16 bytes. Developers of multi-threaded or multi-processor software should be aware that on these processors the loads will be performed in a non-atomic way.
If alignment checking is enabled (CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3), an alignment-check exception (#AC) may or may not be generated (depending on processor implementation) when the memory address is not aligned on an 8-byte boundary.
In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).
Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.
LDDQU (128-bit Legacy SSE version)
DEST[127:0] <- SRC[127:0] DEST[VLMAX-1:128] (Unmodified)
VLDDQU (VEX.128 encoded version)
DEST[127:0] <- SRC[127:0] DEST[VLMAX-1:128] <- 0
VLDDQU (VEX.256 encoded version)
DEST[255:0] <- SRC[255:0]
Intel C/C++ Compiler Intrinsic Equivalent
LDDQU : __m128i _mm_lddqu_si128(__m128i* p); VLDDQU : __m256i _mm256_lddqu_si256(__m256i* p);
See Exceptions Type 4;
Note treatment of #AC varies.