모두의 코드
VPERMILPS (Intel x86/64 assembly instruction)
VPERMILPS
Permute In-Lane of Quadruples of Single-Precision Floating-Point Values
참고 사항
아래 표를 해석하는 방법은 x86-64 명령어 레퍼런스 읽는 법 글을 참조하시기 바랍니다.
Opcode/ | Op / En | 64/32 | CPUID | Description |
---|---|---|---|---|
| RVM | V/V | AVX | Permute single-precision floating-point values in xmm2 using controls from xmm3/m128 and store result in xmm1. |
| RM | V/V | AVX | Permute single-precision floating-point values in xmm2/m128 using controls from imm8 and store result in xmm1. |
| RVM | V/V | AVX | Permute single-precision floating-point values in ymm2 using controls from ymm3/m256 and store result in ymm1. |
| RM | V/V | AVX | Permute single-precision floating-point values in ymm2/m256 using controls from imm8 and store result in ymm1. |
| FV-RVM | V/V | AVX512VL | Permute single-precision floating-point values xmm2 using control from xmm3/m128/m32bcst and store the result in xmm1 using writemask k1. |
| FV-RVM | V/V | AVX512VL | Permute single-precision floating-point values ymm2 using control from ymm3/m256/m32bcst and store the result in ymm1 using writemask k1. |
| FV-RVM | V/V | AVX512F | Permute single-precision floating-point values zmm2 using control from zmm3/m512/m32bcst and store the result in zmm1 using writemask k1. |
| FV-RM | V/V | AVX512VL | Permute single-precision floating-point values xmm2/m128/m32bcst using controls from imm8 and store the result in xmm1 using writemask k1. |
| FV-RM | V/V | AVX512VL | Permute single-precision floating-point values ymm2/m256/m32bcst using controls from imm8 and store the result in ymm1 using writemask k1. |
| FV-RM | V/V | AVX512F | Permute single-precision floating-point values zmm2/m512/m32bcst using controls from imm8 and store the result in zmm1 using writemask k1. |
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
RVM | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA |
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA |
FV-RVM | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA |
FV-RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA |
Description
(variable control version)
Permute quadruples of single-precision floating-point values in the first source operand (second operand), each quadruplet using a 2-bit control field in the corresponding dword element of the second source operand. Permuted results are stored in the destination operand (first operand).
The 2-bit control fields are located at the low two bits of each dword element (see Figure 5-26). Each control deter-mines which of the source element in an input quadruple is selected for the destination element. Each quadruple of source elements must lie in the same 128-bit region as the destination.
EVEX version: The second source operand (third operand) is a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a 32-bit memory location. Permuted results are written to the destination under the writemask.
(immediate control version)
Permute quadruples of single-precision floating-point values in the first source operand (second operand), each quadruplet using a 2-bit control field in the imm8 byte. Each 128-bit lane in the destination operand (first operand) use the four control fields of the same imm8 byte.
VEX version: The source operand is a YMM/XMM register or a 256/128-bit memory location and the destination operand is a YMM/XMM register.
EVEX version: The source operand (second operand) is a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a 32-bit memory location. Permuted results are written to the destination under the writemask.
Note: For the imm8 version, VEX.vvvv and EVEX.vvvv are reserved and must be 1111b otherwise instruction will #UD.
Operation
VPERMILPS (EVEX immediate versions)
(KL, VL) = (4, 128), (8, 256), (16, 512) FOR j <- 0 TO KL-1 i <- j * 32 IF (EVEX.b = 1) AND (SRC1 *is memory*) THEN TMP_SRC1[i+31:i] <- SRC1[31:0]; ELSE TMP_SRC1[i+31:i] <- SRC1[i+31:i]; FI; ENDFOR; TMP_DEST[31:0] <- Select4(TMP_SRC1[127:0], imm8[1:0]); TMP_DEST[63:32] <- Select4(TMP_SRC1[127:0], imm8[3:2]); TMP_DEST[95:64] <- Select4(TMP_SRC1[127:0], imm8[5:4]); TMP_DEST[127:96] <- Select4(TMP_SRC1[127:0], imm8[7:6]); FI; IF VL >= 256 TMP_DEST[159:128] <- Select4(TMP_SRC1[255:128], imm8[1:0]); FI; TMP_DEST[191:160] <- Select4(TMP_SRC1[255:128], imm8[3:2]); FI; TMP_DEST[223:192] <- Select4(TMP_SRC1[255:128], imm8[5:4]); FI; TMP_DEST[255:224] <- Select4(TMP_SRC1[255:128], imm8[7:6]); FI; FI; IF VL >= 512 TMP_DEST[287:256] <- Select4(TMP_SRC1[383:256], imm8[1:0]); FI; TMP_DEST[319:288] <- Select4(TMP_SRC1[383:256], imm8[3:2]); FI; TMP_DEST[351:320] <- Select4(TMP_SRC1[383:256], imm8[5:4]); FI; TMP_DEST[383:352] <- Select4(TMP_SRC1[383:256], imm8[7:6]); FI; TMP_DEST[415:384] <- Select4(TMP_SRC1[511:384], imm8[1:0]); FI; TMP_DEST[447:416] <- Select4(TMP_SRC1[511:384], imm8[3:2]); FI; TMP_DEST[479:448] <- Select4(TMP_SRC1[511:384], imm8[5:4]); FI; TMP_DEST[511:480] <- Select4(TMP_SRC1[511:384], imm8[7:6]); FI; FI; FOR j <- 0 TO KL-1 i <- j * 32 IF k1[j] OR *no writemask* THEN DEST[i+31:i] <- TMP_DEST[i+31:i] ELSE IF *merging-masking* THEN *DEST[i+31:i] remains unchanged* ELSE DEST[i+31:i] <- 0 ;zeroing-masking FI; FI; ENDFOR DEST[MAX_VL-1:VL] <-!= 0
VPERMILPS (256-bit immediate version)
DEST[31:0] <- Select4(SRC1[127:0], imm8[1:0]); DEST[63:32] <- Select4(SRC1[127:0], imm8[3:2]); DEST[95:64] <- Select4(SRC1[127:0], imm8[5:4]); DEST[127:96] <- Select4(SRC1[127:0], imm8[7:6]); DEST[159:128] <- Select4(SRC1[255:128], imm8[1:0]); DEST[191:160] <- Select4(SRC1[255:128], imm8[3:2]); DEST[223:192] <- Select4(SRC1[255:128], imm8[5:4]); DEST[255:224] <- Select4(SRC1[255:128], imm8[7:6]);
VPERMILPS (128-bit immediate version)
DEST[31:0] <- Select4(SRC1[127:0], imm8[1:0]); DEST[63:32] <- Select4(SRC1[127:0], imm8[3:2]); DEST[95:64] <- Select4(SRC1[127:0], imm8[5:4]); DEST[127:96] <- Select4(SRC1[127:0], imm8[7:6]); DEST[MAX_VL-1:128]<- 0
VPERMILPS (EVEX variable versions)
(KL, VL) = (16, 512) FOR j <- 0 TO KL-1 i <- j * 32 IF (EVEX.b = 1) AND (SRC2 *is memory*) THEN TMP_SRC2[i+31:i] <- SRC2[31:0]; ELSE TMP_SRC2[i+31:i] <- SRC2[i+31:i]; FI; ENDFOR; TMP_DEST[31:0] <- Select4(SRC1[127:0], TMP_SRC2[1:0]); TMP_DEST[63:32] <- Select4(SRC1[127:0], TMP_SRC2[33:32]); TMP_DEST[95:64] <- Select4(SRC1[127:0], TMP_SRC2[65:64]); TMP_DEST[127:96] <- Select4(SRC1[127:0], TMP_SRC2[97:96]); IF VL >= 256 TMP_DEST[159:128] <- Select4(SRC1[255:128], TMP_SRC2[129:128]); TMP_DEST[191:160] <- Select4(SRC1[255:128], TMP_SRC2[161:160]); TMP_DEST[223:192] <- Select4(SRC1[255:128], TMP_SRC2[193:192]); TMP_DEST[255:224] <- Select4(SRC1[255:128], TMP_SRC2[225:224]); FI; IF VL >= 512 TMP_DEST[287:256] <- Select4(SRC1[383:256], TMP_SRC2[257:256]); TMP_DEST[319:288] <- Select4(SRC1[383:256], TMP_SRC2[289:288]); TMP_DEST[351:320] <- Select4(SRC1[383:256], TMP_SRC2[321:320]); TMP_DEST[383:352] <- Select4(SRC1[383:256], TMP_SRC2[353:352]); TMP_DEST[415:384] <- Select4(SRC1[511:384], TMP_SRC2[385:384]); TMP_DEST[447:416] <- Select4(SRC1[511:384], TMP_SRC2[417:416]); TMP_DEST[479:448] <- Select4(SRC1[511:384], TMP_SRC2[449:448]); TMP_DEST[511:480] <- Select4(SRC1[511:384], TMP_SRC2[481:480]); FI; FOR j <- 0 TO KL-1 i <- j * 32 IF k1[j] OR *no writemask* THEN DEST[i+31:i] <- TMP_DEST[i+31:i] ELSE IF *merging-masking* THEN *DEST[i+31:i] remains unchanged* ELSE DEST[i+31:i] <- 0 ;zeroing-masking FI; FI; ENDFOR DEST[MAX_VL-1:VL] <-!= 0
VPERMILPS (256-bit variable version)
DEST[31:0] <- Select4(SRC1[127:0], SRC2[1:0]); DEST[63:32] <- Select4(SRC1[127:0], SRC2[33:32]); DEST[95:64] <- Select4(SRC1[127:0], SRC2[65:64]); DEST[127:96] <- Select4(SRC1[127:0], SRC2[97:96]); DEST[159:128] <- Select4(SRC1[255:128], SRC2[129:128]); DEST[191:160] <- Select4(SRC1[255:128], SRC2[161:160]); DEST[223:192] <- Select4(SRC1[255:128], SRC2[193:192]); DEST[255:224] <- Select4(SRC1[255:128], SRC2[225:224]); DEST[MAX_VL-1:256]<- 0
VPERMILPS (128-bit variable version)
DEST[31:0] <- Select4(SRC1[127:0], SRC2[1:0]); DEST[63:32] <- Select4(SRC1[127:0], SRC2[33:32]); DEST[95:64] <- Select4(SRC1[127:0], SRC2[65:64]); DEST[127:96] <- Select4(SRC1[127:0], SRC2[97:96]); DEST[MAX_VL-1:128]<- 0
Intel C/C++ Compiler Intrinsic Equivalent
VPERMILPS __m512 _mm512_permute_ps(__m512 a, int imm); VPERMILPS __m512 _mm512_mask_permute_ps(__m512 s, __mmask16 k, __m512 a, int imm); VPERMILPS __m512 _mm512_maskz_permute_ps(__mmask16 k, __m512 a, int imm); VPERMILPS __m256 _mm256_mask_permute_ps(__m256 s, __mmask8 k, __m256 a, int imm); VPERMILPS __m256 _mm256_maskz_permute_ps(__mmask8 k, __m256 a, int imm); VPERMILPS __m128 _mm_mask_permute_ps(__m128 s, __mmask8 k, __m128 a, int imm); VPERMILPS __m128 _mm_maskz_permute_ps(__mmask8 k, __m128 a, int imm); VPERMILPS __m512 _mm512_permutevar_ps(__m512i i, __m512 a); VPERMILPS __m512 _mm512_mask_permutevar_ps(__m512 s, __mmask16 k, __m512i i, __m512 a); VPERMILPS __m512 _mm512_maskz_permutevar_ps(__mmask16 k, __m512i i, __m512 a); VPERMILPS __m256 _mm256_mask_permutevar_ps(__m256 s, __mmask8 k, __m256 i, __m256 a); VPERMILPS __m256 _mm256_maskz_permutevar_ps(__mmask8 k, __m256 i, __m256 a); VPERMILPS __m128 _mm_mask_permutevar_ps(__m128 s, __mmask8 k, __m128 i, __m128 a); VPERMILPS __m128 _mm_maskz_permutevar_ps(__mmask8 k, __m128 i, __m128 a); VPERMILPS __m128 _mm_permute_ps(__m128 a, int control); VPERMILPS __m256 _mm256_permute_ps(__m256 a, int control); VPERMILPS __m128 _mm_permutevar_ps(__m128 a, __m128i control); VPERMILPS __m256 _mm256_permutevar_ps(__m256 a, __m256i control);
SIMD Floating-Point Exceptions
None
Other Exceptions
Non-EVEX-encoded instruction, see Exceptions Type 4;
#UD If VEX.W = 1.
EVEX-encoded instruction, see Exceptions Type E4NF.
#UD If either (E)VEX.vvvv != 1111B and with imm8.
댓글을 불러오는 중입니다..