모두의 코드
VCVTPS2PH (Intel x86/64 assembly instruction)

Published: 2020-09-01

VCVTPS2PH

Convert Single-Precision FP value to 16-bit FP value

Note

For how to read the table below, please see the article on how to read the x86-64 instruction reference.

| Opcode | Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description |
|---|---|---|---|---|---|
| VEX.128.66.0F3A.W0 1D /r ib | VCVTPS2PH xmm1/m64, xmm2, imm8 | MRI | V/V | F16C | Convert four packed single-precision floating-point values in xmm2 to packed half-precision (16-bit) floating-point values in xmm1/m64. Imm8 provides rounding controls. |
| VEX.256.66.0F3A.W0 1D /r ib | VCVTPS2PH xmm1/m128, ymm2, imm8 | MRI | V/V | F16C | Convert eight packed single-precision floating-point values in ymm2 to packed half-precision (16-bit) floating-point values in xmm1/m128. Imm8 provides rounding controls. |
| EVEX.128.66.0F3A.W0 1D /r ib | VCVTPS2PH xmm1/m64 {k1}{z}, xmm2, imm8 | HVM | V/V | AVX512VL AVX512F | Convert four packed single-precision floating-point values in xmm2 to packed half-precision (16-bit) floating-point values in xmm1/m64. Imm8 provides rounding controls. |
| EVEX.256.66.0F3A.W0 1D /r ib | VCVTPS2PH xmm1/m128 {k1}{z}, ymm2, imm8 | HVM | V/V | AVX512VL AVX512F | Convert eight packed single-precision floating-point values in ymm2 to packed half-precision (16-bit) floating-point values in xmm1/m128. Imm8 provides rounding controls. |
| EVEX.512.66.0F3A.W0 1D /r ib | VCVTPS2PH ymm1/m256 {k1}{z}, zmm2{sae}, imm8 | HVM | V/V | AVX512F | Convert sixteen packed single-precision floating-point values in zmm2 to packed half-precision (16-bit) floating-point values in ymm1/m256. Imm8 provides rounding controls. |

Instruction Operand Encoding

| Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
|---|---|---|---|---|
| MRI | ModRM:r/m (w) | ModRM:reg (r) | Imm8 | NA |
| HVM | ModRM:r/m (w) | ModRM:reg (r) | Imm8 | NA |

Description

Converts packed single-precision floating-point values in the source operand to half-precision (16-bit) floating-point values and stores them in the destination operand. The rounding mode is specified using the immediate field (imm8).

Underflow results (i.e., tiny results) are converted to denormals. MXCSR.FTZ is ignored. If a source element is denormal relative to the input format with DM masked and at least one of PM or UM unmasked, a SIMD exception will be raised with DE, UE and PE set.
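The underflow behavior can be illustrated in software. The following is a minimal sketch, assuming Python's `struct` module as a stand-in for the hardware: its `e` format is IEEE 754 binary16 and rounds to nearest even, which corresponds to imm8 = 0. The helper names `f32_to_f16_bits` and `f16_bits_to_float` are made up for this example, and exception flags are not modeled.

```python
import struct

def f32_to_f16_bits(x):
    """Round x to binary32 first, then to binary16; return the 16-bit pattern."""
    as_f32 = struct.unpack("<f", struct.pack("<f", x))[0]
    return struct.unpack("<H", struct.pack("<e", as_f32))[0]

def f16_bits_to_float(bits):
    """Interpret a 16-bit pattern as a binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

# 2**-20 is below the smallest normal half (2**-14), so the result is a
# denormal: the exponent field is 0 and the value lives in the fraction bits.
tiny = f32_to_f16_bits(2.0 ** -20)
print(hex(tiny), (tiny >> 10) & 0x1F, f16_bits_to_float(tiny) == 2.0 ** -20)
```

The tiny input is kept as a denormal half value rather than being flushed to zero, which matches the statement that MXCSR.FTZ is ignored.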

The immediate byte defines several bit fields that control the rounding operation. The effect and encoding of the RC field are listed in Table 5-3.

[Figure 5-7. VCVTPS2PH (128-bit Version)]

VEX.128 version: The source operand is an XMM register. The destination operand is an XMM register or a 64-bit memory location. If the destination operand is a register, the upper bits (MAX_VL-1:64) of the corresponding register are zeroed.

VEX.256 version: The source operand is a YMM register. The destination operand is an XMM register or a 128-bit memory location. If the destination operand is a register, the upper bits (MAX_VL-1:128) of the corresponding destination register are zeroed.

Note: VEX.vvvv and EVEX.vvvv are reserved (must be 1111b).

EVEX encoded versions: The source operand is a ZMM/YMM/XMM register. The destination operand is a YMM/XMM/XMM (low 64-bits) register or a 256/128/64-bit memory location, conditionally updated with writemask k1. Bits (MAX_VL-1:256/128/64) of the corresponding destination register are zeroed.

Table 5-3. Immediate Byte Encoding for 16-bit Floating-Point Conversion Instructions

| Bits | Field Name/value | Description | Comment |
|---|---|---|---|
| Imm[1:0] | RC=00B | Round to nearest even | If Imm[2] = 0 |
| | RC=01B | Round down | |
| | RC=10B | Round up | |
| | RC=11B | Truncate | |
| Imm[2] | MS1=0 | Use imm[1:0] for rounding | Ignore MXCSR.RC |
| | MS1=1 | Use MXCSR.RC for rounding | |
| Imm[7:3] | Ignored | Ignored by processor | |
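The immediate-byte encoding in Table 5-3 can be sketched in software as follows. This is an illustration only, not the processor's implementation: `decode_imm8` and `round_to_half` are hypothetical helper names, the input is assumed to be a finite or infinite value that is exactly representable as binary32, and NaN handling and exception flags are not modeled.

```python
import math
import struct

HALF_MAX = 65504.0  # largest finite binary16 value

def decode_imm8(imm8, mxcsr_rc=0b00):
    """Return the effective RC value: imm8[1:0] if MS1 (imm8[2]) is 0, else MXCSR.RC."""
    ms1 = (imm8 >> 2) & 1          # imm8[7:3] is ignored
    return mxcsr_rc if ms1 else (imm8 & 0b11)

def round_to_half(x, rc):
    """Round x onto the binary16 grid using rounding control rc; return the bit pattern."""
    if x == 0.0 or math.isinf(x):
        return struct.unpack("<H", struct.pack("<e", x))[0]
    negative = x < 0
    mag = abs(x)
    # Exponent of the binade, clamped so that tiny values use the denormal spacing.
    exp = max(math.frexp(mag)[1] - 1, -14)
    ulp = math.ldexp(1.0, exp - 10)      # spacing of half values in this binade
    q = mag / ulp                        # exact: division by a power of two
    n = math.floor(q)
    frac = q - n
    if rc == 0b00:                       # round to nearest even
        bump = frac > 0.5 or (frac == 0.5 and n % 2 == 1)
    elif rc == 0b01:                     # round down (toward -infinity)
        bump = negative and frac > 0
    elif rc == 0b10:                     # round up (toward +infinity)
        bump = (not negative) and frac > 0
    else:                                # truncate (toward zero)
        bump = False
    mag = (n + bump) * ulp
    if mag > HALF_MAX:                   # overflow: infinity or largest finite value
        to_inf = rc == 0b00 or (rc == 0b01 and negative) or (rc == 0b10 and not negative)
        mag = math.inf if to_inf else HALF_MAX
    value = -mag if negative else mag
    return struct.unpack("<H", struct.pack("<e", value))[0]

# 1 + 2**-11 lies exactly halfway between two adjacent half values.
x = 1.0 + 2.0 ** -11
for imm8 in range(4):
    print(imm8, hex(round_to_half(x, decode_imm8(imm8))))
```

With MS1 = 1 (for example imm8 = 4), `decode_imm8` returns whatever rounding mode is passed in as `mxcsr_rc`, mirroring the "Use MXCSR.RC for rounding" row of the table.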

Operation

VCVTPS2PH (EVEX encoded versions) when dest is a register

(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j <-  0 TO KL-1
    i <-  j * 16
    k <-  j * 32
    IF k1[j] OR *no writemask*
          THEN DEST[i+15:i] <-
                vCvt_s2h(SRC[k+31:k])
          ELSE 
                IF *merging-masking* ; merging-masking
                      THEN *DEST[i+15:i] remains unchanged*
                      ELSE  ; zeroing-masking
                            DEST[i+15:i] <-  0
                FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL/2] <-  0
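The register-destination pseudocode above can be mirrored by a small software model. This is a sketch under simplifying assumptions: lanes are held in Python lists instead of vector registers, the conversion always rounds to nearest even (as with imm8 = 0), and the zeroing of the upper register bits is not represented. The names `cvt_s2h` and `vcvtps2ph_evex_reg` are invented for the example.

```python
import struct

def cvt_s2h(x):
    """Convert one value to a binary16 bit pattern (round to nearest even)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def vcvtps2ph_evex_reg(src, dest, k1=None, zeroing=False):
    """Model the EVEX register form.

    src     : list of KL source floats
    dest    : list of KL previous 16-bit destination lanes
    k1      : writemask as an integer, or None for "no writemask"
    zeroing : True for zeroing-masking ({z}), False for merging-masking
    """
    out = []
    for j, x in enumerate(src):
        if k1 is None or (k1 >> j) & 1:
            out.append(cvt_s2h(x))       # selected lane: converted value
        elif zeroing:
            out.append(0)                # zeroing-masking
        else:
            out.append(dest[j])          # merging-masking: keep old lane
    return out

src = [1.0, 2.0, -1.0, 0.5]
old = [0xAAAA] * 4
print([hex(v) for v in vcvtps2ph_evex_reg(src, old, k1=0b0101)])
print([hex(v) for v in vcvtps2ph_evex_reg(src, old, k1=0b0101, zeroing=True)])
```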

VCVTPS2PH (EVEX encoded versions) when dest is memory

(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j <-  0 TO KL-1
    i <-  j * 16
    k <-  j * 32
    IF k1[j] OR *no writemask*
          THEN DEST[i+15:i] <-
                vCvt_s2h(SRC[k+31:k])
          ELSE 
                *DEST[i+15:i] remains unchanged* ; merging-masking
    FI;
ENDFOR
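For the memory-destination form, the pseudocode above leaves unselected elements untouched. A minimal sketch of that behavior, assuming a `bytearray` as a stand-in for memory and round-to-nearest-even conversion (the function name `vcvtps2ph_evex_store` is hypothetical):

```python
import struct

def vcvtps2ph_evex_store(mem, offset, src, k1=None):
    """Store converted lanes into mem; lanes not selected by k1 are not written."""
    for j, x in enumerate(src):
        if k1 is None or (k1 >> j) & 1:
            # Each half-precision element occupies 2 bytes, little-endian.
            struct.pack_into("<e", mem, offset + 2 * j, x)

mem = bytearray(b"\xff" * 8)
vcvtps2ph_evex_store(mem, 0, [1.0, 2.0, 3.0, 4.0], k1=0b1001)
print(mem.hex())
```

Only lanes 0 and 3 are written here; the bytes belonging to lanes 1 and 2 keep their previous contents.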

VCVTPS2PH (VEX.256 encoded version)

DEST[15:0] <- vCvt_s2h(SRC1[31:0]);
DEST[31:16] <- vCvt_s2h(SRC1[63:32]);
DEST[47:32] <- vCvt_s2h(SRC1[95:64]);
DEST[63:48] <- vCvt_s2h(SRC1[127:96]);
DEST[79:64] <- vCvt_s2h(SRC1[159:128]);
DEST[95:80] <- vCvt_s2h(SRC1[191:160]);
DEST[111:96] <- vCvt_s2h(SRC1[223:192]);
DEST[127:112] <- vCvt_s2h(SRC1[255:224]);
DEST[MAX_VL-1:128] <-  0

VCVTPS2PH (VEX.128 encoded version)

DEST[15:0] <- vCvt_s2h(SRC1[31:0]);
DEST[31:16] <- vCvt_s2h(SRC1[63:32]);
DEST[47:32] <- vCvt_s2h(SRC1[95:64]);
DEST[63:48] <- vCvt_s2h(SRC1[127:96]);
DEST[MAX_VL-1:64] <-  0
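The two VEX forms above differ only in the number of lanes. The following sketch models both, assuming the destination register is represented as a Python integer whose bits above the packed result are zero, and assuming round-to-nearest-even conversion. `vcvtps2ph_vex` is a made-up name for this illustration.

```python
import struct

def vcvtps2ph_vex(src):
    """Pack 4 (VEX.128) or 8 (VEX.256) floats into 16-bit lanes.

    Returns the destination register value as an integer: lane j occupies
    bits 16*j+15 : 16*j, and all higher bits are zero.
    """
    if len(src) not in (4, 8):
        raise ValueError("expected 4 or 8 source elements")
    packed = struct.pack(f"<{len(src)}e", *src)   # lane 0 first
    return int.from_bytes(packed, "little")

print(hex(vcvtps2ph_vex([1.0, 2.0, 0.5, -2.0])))
```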

Flags Affected

None

Intel C/C++ Compiler Intrinsic Equivalent

VCVTPS2PH __m256i _mm512_cvtps_ph(__m512 a);
VCVTPS2PH __m256i _mm512_mask_cvtps_ph(__m256i s, __mmask16 k, __m512 a);
VCVTPS2PH __m256i _mm512_maskz_cvtps_ph(__mmask16 k, __m512 a);
VCVTPS2PH __m256i _mm512_cvt_roundps_ph(__m512 a, const int imm);
VCVTPS2PH __m256i _mm512_mask_cvt_roundps_ph(__m256i s, __mmask16 k, __m512 a,
                                             const int imm);
VCVTPS2PH __m256i _mm512_maskz_cvt_roundps_ph(__mmask16 k, __m512 a,
                                              const int imm);
VCVTPS2PH __m128i _mm256_mask_cvtps_ph(__m128i s, __mmask8 k, __m256 a);
VCVTPS2PH __m128i _mm256_maskz_cvtps_ph(__mmask8 k, __m256 a);
VCVTPS2PH __m128i _mm_mask_cvtps_ph(__m128i s, __mmask8 k, __m128 a);
VCVTPS2PH __m128i _mm_maskz_cvtps_ph(__mmask8 k, __m128 a);
VCVTPS2PH __m128i _mm_cvtps_ph(__m128 m1, const int imm);
VCVTPS2PH __m128i _mm256_cvtps_ph(__m256 m1, const int imm);

SIMD Floating-Point Exceptions

Invalid, Underflow, Overflow, Precision, Denormal (if MXCSR.DAZ=0);

Other Exceptions

VEX-encoded instructions, see Exceptions Type 11 (do not report #AC);

EVEX-encoded instructions, see Exceptions Type E11.

#UD If VEX.W=1.

#UD If VEX.vvvv != 1111B or EVEX.vvvv != 1111B.
