모두의 코드
VPBROADCAST (Intel x86/64 assembly instruction)
VPBROADCAST
Load Integer and Broadcast
참고 사항
아래 표를 해석하는 방법은 x86-64 명령어 레퍼런스 읽는 법 글을 참조하시기 바랍니다.
Opcode/ | Op / | 64/32 | CPUID | Description |
---|---|---|---|---|
| RM | V/V | AVX2 | Broadcast a byte integer in the source operand to sixteen locations in xmm1. |
| RM | V/V | AVX2 | Broadcast a byte integer in the source operand to thirty-two locations in ymm1. |
| T1S | V/V | AVX512VL | Broadcast a byte integer in the source operand to locations in xmm1 subject to writemask k1. |
| T1S | V/V | AVX512VL | Broadcast a byte integer in the source operand to locations in ymm1 subject to writemask k1. |
| T1S | V/V | AVX512BW | Broadcast a byte integer in the source operand to 64 locations in zmm1 subject to writemask k1. |
| RM | V/V | AVX2 | Broadcast a word integer in the source operand to eight locations in xmm1. |
| RM | V/V | AVX2 | Broadcast a word integer in the source operand to sixteen locations in ymm1. |
| T1S | V/V | AVX512VL | Broadcast a word integer in the source operand to locations in xmm1 subject to writemask k1. |
| T1S | V/V | AVX512VL | Broadcast a word integer in the source operand to locations in ymm1 subject to writemask k1. |
| T1S | V/V | AVX512BW | Broadcast a word integer in the source operand to 32 locations in zmm1 subject to writemask k1. |
| RM | V/V | AVX2 | Broadcast a dword integer in the source operand to four locations in xmm1. |
| RM | V/V | AVX2 | Broadcast a dword integer in the source operand to eight locations in ymm1. |
| T1S | V/V | AVX512VL | Broadcast a dword integer in the source operand to locations in xmm1 subject to writemask k1. |
| T1S | V/V | AVX512VL | Broadcast a dword integer in the source operand to locations in ymm1 subject to writemask k1. |
| T1S | V/V | AVX512F | Broadcast a dword integer in the source operand to locations in zmm1 subject to writemask k1. |
| RM | V/V | AVX2 | Broadcast a qword element in source operand to two locations in xmm1. |
| RM | V/V | AVX2 | Broadcast a qword element in source operand to four locations in ymm1. |
| T1S | V/V | AVX512VL | Broadcast a qword element in source operand to locations in xmm1 subject to writemask k1. |
| T1S | V/V | AVX512VL | Broadcast a qword element in source operand to locations in ymm1 subject to writemask k1. |
| T1S | V/V | AVX512F | Broadcast a qword element in source operand to locations in zmm1 subject to writemask k1. |
| T2 | V/V | AVX512VL | Broadcast two dword elements in source operand to locations in xmm1 subject to writemask k1. |
Opcode/ | Op / | 64/32 | CPUID | Description |
---|---|---|---|---|
| T2 | V/V | AVX512VL | Broadcast two dword elements in source operand to locations in ymm1 subject to writemask k1. |
| T2 | V/V | AVX512DQ | Broadcast two dword elements in source operand to locations in zmm1 subject to writemask k1. |
| RM | V/V | AVX2 | Broadcast 128 bits of integer data in mem to low and high 128-bits in ymm1. |
| T4 | V/V | AVX512VL | Broadcast 128 bits of 4 doubleword integer data in mem to locations in ymm1 using writemask k1. |
| T4 | V/V | AVX512F | Broadcast 128 bits of 4 doubleword integer data in mem to locations in zmm1 using writemask k1. |
| T2 | V/V | AVX512VL | Broadcast 128 bits of 2 quadword integer data in mem to locations in ymm1 using writemask k1. |
| T2 | V/V | AVX512DQ | Broadcast 128 bits of 2 quadword integer data in mem to locations in zmm1 using writemask k1. |
| T8 | V/V | AVX512DQ | Broadcast 256 bits of 8 doubleword integer data in mem to locations in zmm1 using writemask k1. |
| T4 | V/V | AVX512F | Broadcast 256 bits of 4 quadword integer data in mem to locations in zmm1 using writemask k1. |
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA |
T1S, T2, T4, T8 | ModRM:reg (w) | ModRM:r/m (r) | NA | NA |
Description
Load integer data from the source operand (the second operand) and broadcast to all elements of the destination operand (the first operand).
VEX256-encoded VPBROADCASTB/W/D/Q: The source operand is 8-bit, 16-bit, 32-bit, 64-bit memory location or the low 8-bit, 16-bit 32-bit, 64-bit data in an XMM register. The destination operand is a YMM register. VPBROADCASTI128 support the source operand of 128-bit memory location. Register source encodings for VPBROADCASTI128 is reserved and will #UD. Bits (MAXVL-1:256) of the destination register are zeroed.
EVEX-encoded VPBROADCASTD/Q: The source operand is a 32-bit, 64-bit memory location or the low 32-bit, 64-bit data in an XMM register. The destination operand is a ZMM/YMM/XMM register and updated according to the writemask k1.
VPBROADCASTI32X4 and VPBROADCASTI64X4: The destination operand is a ZMM register and updated according to the writemask k1. The source operand is 128-bit or 256-bit memory location. Register source encodings for VBROADCASTI32X4 and VBROADCASTI64X4 are reserved and will #UD.
Note: VEX.vvvv and EVEX.vvvv are reserved and must be 1111b otherwise instructions will #UD.
If VPBROADCASTI128 is encoded with VEX.L= 0, an attempt to execute the instruction encoded with VEX.L= 0 will cause an #UD exception.
Operation
VPBROADCASTB (EVEX encoded versions)
(KL, VL) = (16, 128), (32, 256), (64, 512) FOR j <- 0 TO KL-1 i <- j * 8 IF k1[j] OR *no writemask* THEN DEST[i+7:i] <- SRC[7:0] ELSE IF *merging-masking* ; merging-masking THEN *DEST[i+7:i] remains unchanged* ELSE ; zeroing-masking DEST[i+7:i] <- 0 FI FI; ENDFOR DEST[MAX_VL-1:VL] <- 0
VPBROADCASTW (EVEX encoded versions)
(KL, VL) = (8, 128), (16, 256), (32, 512) FOR j <- 0 TO KL-1 i <- j * 16 IF k1[j] OR *no writemask* THEN DEST[i+15:i] <- SRC[15:0] ELSE IF *merging-masking* ; merging-masking THEN *DEST[i+15:i] remains unchanged* ELSE ; zeroing-masking DEST[i+15:i] <- 0 FI FI; ENDFOR DEST[MAX_VL-1:VL] <- 0
VPBROADCASTD (128 bit version)
temp <- SRC[31:0] DEST[31:0] <- temp DEST[63:32] <- temp DEST[95:64] <- temp DEST[127:96] <- temp DEST[MAX_VL-1:128] <- 0
VPBROADCASTD (VEX.256 encoded version)
temp <- SRC[31:0] DEST[31:0] <- temp DEST[63:32] <- temp DEST[95:64] <- temp DEST[127:96] <- temp DEST[159:128] <- temp DEST[191:160] <- temp DEST[223:192] <- temp DEST[255:224] <- temp DEST[MAX_VL-1:256] <- 0 VPBROADCASTD (EVEX encoded versions) (KL, VL) = (4, 128), (8, 256), (16, 512) FOR j <- 0 TO KL-1 i <- j * 32 IF k1[j] OR *no writemask* THEN DEST[i+31:i] <- SRC[31:0] ELSE IF *merging-masking* ; merging-masking THEN *DEST[i+31:i] remains unchanged* ELSE ; zeroing-masking DEST[i+31:i] <- 0 FI FI; ENDFOR DEST[MAX_VL-1:VL] <- 0
VPBROADCASTQ (VEX.256 encoded version)
temp <- SRC[63:0] DEST[63:0] <- temp DEST[127:64] <- temp DEST[191:128] <- temp DEST[255:192] <- temp DEST[MAX_VL-1:256] <- 0
VPBROADCASTQ (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512) FOR j <- 0 TO KL-1 i <- j * 64 IF k1[j] OR *no writemask* THEN DEST[i+63:i] <- SRC[63:0] ELSE IF *merging-masking* ; merging-masking THEN *DEST[i+63:i] remains unchanged* ELSE ; zeroing-masking DEST[i+63:i] <- 0 FI FI; ENDFOR DEST[MAX_VL-1:VL] <- 0 VBROADCASTI32x2 (EVEX encoded versions) (KL, VL) = (4, 128), (8, 256), (16, 512) FOR j <- 0 TO KL-1 i <-!= j * 32 n <-!= (j mod 2) * 32 IF k1[j] OR *no writemask* THEN DEST[i+31:i] <- SRC[n+31:n] ELSE IF *merging-masking* ; merging-masking THEN *DEST[i+31:i] remains unchanged* ELSE ; zeroing-masking DEST[i+31:i] <- 0 FI FI; ENDFOR DEST[MAX_VL-1:VL] <- 0
VBROADCASTI128 (VEX.256 encoded version)
temp <- SRC[127:0] DEST[127:0] <- temp DEST[255:128] <- temp DEST[MAX_VL-1:256] <- 0
VBROADCASTI32X4 (EVEX encoded versions)
(KL, VL) = (8, 256), (16, 512) FOR j <- 0 TO KL-1 i <-!= j* 32 n <-!= (j modulo 4) * 32 IF k1[j] OR *no writemask* THEN DEST[i+31:i] <- SRC[n+31:n] ELSE IF *merging-masking* ; merging-masking THEN *DEST[i+31:i] remains unchanged* ELSE ; zeroing-masking DEST[i+31:i] <- 0 FI FI; ENDFOR DEST[MAX_VL-1:VL] <- 0
VBROADCASTI64X2 (EVEX encoded versions)
(KL, VL) = (8, 256), (16, 512) FOR j <- 0 TO KL-1 i <- j * 64 n <- (j modulo 2) * 64 IF k1[j] OR *no writemask* THEN DEST[i+63:i] <- SRC[n+63:n] ELSE IF *merging-masking* ; merging-masking THEN *DEST[i+63:i] remains unchanged* ELSE ; zeroing-masking DEST[i+63:i] = 0 FI FI; ENDFOR;
VBROADCASTI32X8 (EVEX.U1.512 encoded version)
FOR j <- 0 TO 15 i <- j * 32 n <- (j modulo 8) * 32 IF k1[j] OR *no writemask* THEN DEST[i+31:i] <- SRC[n+31:n] ELSE IF *merging-masking* ; merging-masking THEN *DEST[i+31:i] remains unchanged* ELSE ; zeroing-masking DEST[i+31:i] <- 0 FI FI; ENDFOR DEST[MAX_VL-1:VL] <- 0
VBROADCASTI64X4 (EVEX.512 encoded version)
FOR j <- 0 TO 7 i <- j * 64 n <-!= (j modulo 4) * 64 IF k1[j] OR *no writemask* THEN DEST[i+63:i] <- SRC[n+63:n] ELSE IF *merging-masking* ; merging-masking THEN *DEST[i+63:i] remains unchanged* ELSE ; zeroing-masking DEST[i+63:i] <- 0 FI FI; ENDFOR DEST[MAX_VL-1:VL] <- 0
Intel C/C++ Compiler Intrinsic Equivalent
VPBROADCASTB __m512i _mm512_broadcastb_epi8(__m128i a); VPBROADCASTB __m512i _mm512_mask_broadcastb_epi8(__m512i s, __mmask64 k, __m128i a); VPBROADCASTB __m512i _mm512_maskz_broadcastb_epi8(__mmask64 k, __m128i a); VPBROADCASTB __m256i _mm256_broadcastb_epi8(__m128i a); VPBROADCASTB __m256i _mm256_mask_broadcastb_epi8(__m256i s, __mmask32 k, __m128i a); VPBROADCASTB __m256i _mm256_maskz_broadcastb_epi8(__mmask32 k, __m128i a); VPBROADCASTB __m128i _mm_mask_broadcastb_epi8(__m128i s, __mmask16 k, __m128i a); VPBROADCASTB __m128i _mm_maskz_broadcastb_epi8(__mmask16 k, __m128i a); VPBROADCASTB __m128i _mm_broadcastb_epi8(__m128i a); VPBROADCASTD __m512i _mm512_broadcastd_epi32(__m128i a); VPBROADCASTD __m512i _mm512_mask_broadcastd_epi32(__m512i s, __mmask16 k, __m128i a); VPBROADCASTD __m512i _mm512_maskz_broadcastd_epi32(__mmask16 k, __m128i a); VPBROADCASTD __m256i _mm256_broadcastd_epi32(__m128i a); VPBROADCASTD __m256i _mm256_mask_broadcastd_epi32(__m256i s, __mmask8 k, __m128i a); VPBROADCASTD __m256i _mm256_maskz_broadcastd_epi32(__mmask8 k, __m128i a); VPBROADCASTD __m128i _mm_broadcastd_epi32(__m128i a); VPBROADCASTD __m128i _mm_mask_broadcastd_epi32(__m128i s, __mmask8 k, __m128i a); VPBROADCASTD __m128i _mm_maskz_broadcastd_epi32(__mmask8 k, __m128i a); VPBROADCASTQ __m512i _mm512_broadcastq_epi64(__m128i a); VPBROADCASTQ __m512i _mm512_mask_broadcastq_epi64(__m512i s, __mmask8 k, __m128i a); VPBROADCASTQ __m512i _mm512_maskz_broadcastq_epi64(__mmask8 k, __m128i a); VPBROADCASTQ __m256i _mm256_broadcastq_epi64(__m128i a); VPBROADCASTQ __m256i _mm256_mask_broadcastq_epi64(__m256i s, __mmask8 k, __m128i a); VPBROADCASTQ __m256i _mm256_maskz_broadcastq_epi64(__mmask8 k, __m128i a); VPBROADCASTQ __m128i _mm_broadcastq_epi64(__m128i a); VPBROADCASTQ __m128i _mm_mask_broadcastq_epi64(__m128i s, __mmask8 k, __m128i a); VPBROADCASTQ __m128i _mm_maskz_broadcastq_epi64(__mmask8 k, __m128i a); VPBROADCASTW __m512i _mm512_broadcastw_epi16(__m128i a); VPBROADCASTW __m512i _mm512_mask_broadcastw_epi16(__m512i s, __mmask32 k, __m128i a); VPBROADCASTW __m512i _mm512_maskz_broadcastw_epi16(__mmask32 k, __m128i a); VPBROADCASTW __m256i _mm256_broadcastw_epi16(__m128i a); VPBROADCASTW __m256i _mm256_mask_broadcastw_epi16(__m256i s, __mmask16 k, __m128i a); VPBROADCASTW __m256i _mm256_maskz_broadcastw_epi16(__mmask16 k, __m128i a); VPBROADCASTW __m128i _mm_broadcastw_epi16(__m128i a); VPBROADCASTW __m128i _mm_mask_broadcastw_epi16(__m128i s, __mmask8 k, __m128i a); VPBROADCASTW __m128i _mm_maskz_broadcastw_epi16(__mmask8 k, __m128i a); VBROADCASTI32x2 __m512i _mm512_broadcast_i32x2(__m128i a); VBROADCASTI32x2 __m512i _mm512_mask_broadcast_i32x2(__m512i s, __mmask16 k, __m128i a); VBROADCASTI32x2 __m512i _mm512_maskz_broadcast_i32x2(__mmask16 k, __m128i a); VBROADCASTI32x2 __m256i _mm256_broadcast_i32x2(__m128i a); VBROADCASTI32x2 __m256i _mm256_mask_broadcast_i32x2(__m256i s, __mmask8 k, __m128i a); VBROADCASTI32x2 __m256i _mm256_maskz_broadcast_i32x2(__mmask8 k, __m128i a); VBROADCASTI32x2 __m128i _mm_broadcastq_i32x2(__m128i a); VBROADCASTI32x2 __m128i _mm_mask_broadcastq_i32x2(__m128i s, __mmask8 k, __m128i a); VBROADCASTI32x2 __m128i _mm_maskz_broadcastq_i32x2(__mmask8 k, __m128i a); VBROADCASTI32x4 __m512i _mm512_broadcast_i32x4(__m128i a); VBROADCASTI32x4 __m512i _mm512_mask_broadcast_i32x4(__m512i s, __mmask16 k, __m128i a); VBROADCASTI32x4 __m512i _mm512_maskz_broadcast_i32x4(__mmask16 k, __m128i a); VBROADCASTI32x4 __m256i _mm256_broadcast_i32x4(__m128i a); VBROADCASTI32x4 __m256i _mm256_mask_broadcast_i32x4(__m256i s, __mmask8 k, __m128i a); VBROADCASTI32x4 __m256i _mm256_maskz_broadcast_i32x4(__mmask8 k, __m128i a); VBROADCASTI32x8 __m512i _mm512_broadcast_i32x8(__m256i a); VBROADCASTI32x8 __m512i _mm512_mask_broadcast_i32x8(__m512i s, __mmask16 k, __m256i a); VBROADCASTI32x8 __m512i _mm512_maskz_broadcast_i32x8(__mmask16 k, __m256i a); VBROADCASTI64x2 __m512i _mm512_broadcast_i64x2(__m128i a); VBROADCASTI64x2 __m512i _mm512_mask_broadcast_i64x2(__m512i s, __mmask8 k, __m128i a); VBROADCASTI64x2 __m512i _mm512_maskz_broadcast_i64x2(__mmask8 k, __m128i a); VBROADCASTI64x2 __m256i _mm256_broadcast_i64x2(__m128i a); VBROADCASTI64x2 __m256i _mm256_mask_broadcast_i64x2(__m256i s, __mmask8 k, __m128i a); VBROADCASTI64x2 __m256i _mm256_maskz_broadcast_i64x2(__mmask8 k, __m128i a); VBROADCASTI64x4 __m512i _mm512_broadcast_i64x4(__m256i a); VBROADCASTI64x4 __m512i _mm512_mask_broadcast_i64x4(__m512i s, __mmask8 k, __m256i a); VBROADCASTI64x4 __m512i _mm512_maskz_broadcast_i64x4(__mmask8 k, __m256i a);
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instructions, see Exceptions Type 6;
EVEX-encoded instructions, syntax with reg/mem operand, see Exceptions Type E6.
#UD If VEX.L = 0 for VPBROADCASTQ, VPBROADCASTI128.
If EVEX.L'L = 0 for VBROADCASTI32X4/VBROADCASTI64X2.
If EVEX.L'L < 10b for VBROADCASTI32X8/VBROADCASTI64X4.
댓글을 불러오는 중입니다..