This section describes the load, set, and store operations, which let you load data from memory into __m128 variables and store it back to memory. The load and set operations are similar in that both initialize __m128 data. However, the set operations take float arguments and are intended for initialization with constants, whereas the load operations take a pointer to floating-point data and are intended to mimic the instructions for loading data from memory. The store operations write __m128 data to the address given.
The intrinsics are listed in the following table. The syntax and a brief description of each are given in the following topics.
The prototypes for Streaming SIMD Extensions (SSE) intrinsics are in the xmmintrin.h header file.
Intrinsic Name | Alternate Name | Operation | Corresponding Instruction |
---|---|---|---|
_mm_load_ss | | Load the low value and clear the three high values | MOVSS |
_mm_load_ps1 | _mm_load1_ps | Load one value into all four words | MOVSS + Shuffling |
_mm_load_ps | | Load four values, address aligned | MOVAPS |
_mm_loadu_ps | | Load four values, address unaligned | MOVUPS |
_mm_loadr_ps | | Load four values in reverse order, address aligned | MOVAPS + Shuffling |
_mm_set_ss | | Set the low value and clear the three high values | Composite |
_mm_set_ps1 | _mm_set1_ps | Set all four words to the same value | Composite |
_mm_set_ps | | Set four values | Composite |
_mm_setr_ps | | Set four values, in reverse order | Composite |
_mm_setzero_ps | | Clear all four values | Composite |
_mm_store_ss | | Store the low value | MOVSS |
_mm_store_ps1 | _mm_store1_ps | Store the low value across all four words, address aligned | Shuffling + MOVAPS |
_mm_store_ps | | Store four values, address aligned | MOVAPS |
_mm_storeu_ps | | Store four values, address unaligned | MOVUPS |
_mm_storer_ps | | Store four values in reverse order, address aligned | MOVAPS + Shuffling |
_mm_move_ss | | Set the low word from the second operand and pass through the three high words of the first | MOVSS |
_mm_getcsr | | Return the contents of the MXCSR control/status register | STMXCSR |
_mm_setcsr | | Set the MXCSR control/status register | LDMXCSR |
_mm_prefetch | | Prefetch one cache line of data closer to the processor | PREFETCH |
_mm_stream_pi | | Store 64 bits without polluting the caches | MOVNTQ |
_mm_stream_ps | | Store four values without polluting the caches, address aligned | MOVNTPS |
_mm_sfence | | Make all preceding stores globally visible before any subsequent store | SFENCE |
_mm_cvtss_f32 | | Extract the low value as a float | None (the compiler chooses the sequence) |
__m128 _mm_load_ss(float const*a)
Loads a single-precision floating-point (SP FP) value into the low word and clears the upper three words.
r0 := *a
r1 := 0.0 ; r2 := 0.0 ; r3 := 0.0
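A minimal sketch of _mm_load_ss, assuming a C11 compiler targeting x86/x86-64 with SSE enabled (the default on x86-64). `load_ss_demo` is a hypothetical helper that writes all four words back to memory with `_mm_storeu_ps` so the cleared upper words can be inspected.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: load one float into the low word of an __m128
   and copy all four words to out[] for inspection. */
void load_ss_demo(const float *src, float out[4])
{
    assert(src && out);
    __m128 v = _mm_load_ss(src);   /* r0 = *src, r1 = r2 = r3 = 0.0f */
    _mm_storeu_ps(out, v);         /* unaligned store: out[] needs no 16-byte alignment */
}
```

Only four bytes are read from `src`, so the source does not need to be 16-byte-aligned.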
__m128 _mm_load_ps1(float const*a)
Loads a single SP FP value, copying it into all four
words.
r0 := *a
r1 := *a
r2 := *a
r3 := *a
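A typical use of _mm_load_ps1 is broadcasting a scalar factor so that it can be applied to four values at once. This is a sketch assuming an SSE-enabled x86/x86-64 C compiler; `scale4` is a hypothetical helper.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: multiply four floats by one scalar factor.
   _mm_load_ps1 copies *factor into all four words. */
void scale4(const float *factor, const float in[4], float out[4])
{
    assert(factor && in && out);
    __m128 f = _mm_load_ps1(factor);   /* r0 = r1 = r2 = r3 = *factor */
    __m128 v = _mm_loadu_ps(in);       /* in[] may be unaligned */
    _mm_storeu_ps(out, _mm_mul_ps(v, f));
}
```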
__m128 _mm_load_ps(float const*a)
Loads four SP FP values. The address must be 16-byte-aligned.
r0 := a[0]
r1 := a[1]
r2 := a[2]
r3 := a[3]
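The sketch below sums an array with _mm_load_ps and checks the 16-byte alignment requirement explicitly. It assumes a C11 compiler targeting x86/x86-64 with SSE, and an element count that is a multiple of 4; `sum_aligned` is a hypothetical helper. Callers can obtain aligned storage with C11 `_Alignas(16)`.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: sum n floats (n a multiple of 4) from a 16-byte-aligned array. */
float sum_aligned(const float *a, int n)
{
    assert(a && n % 4 == 0);
    assert(((uintptr_t)a & 15) == 0);           /* _mm_load_ps requires 16-byte alignment */
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_load_ps(a + i));   /* four partial sums */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```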
__m128 _mm_loadu_ps(float const*a)
Loads four SP FP values. The address need not be 16-byte-aligned.
r0 := a[0]
r1 := a[1]
r2 := a[2]
r3 := a[3]
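_mm_loadu_ps is the variant to use when the address comes from an arbitrary offset into an array, such as a sliding window. A sketch assuming an SSE-enabled x86/x86-64 C compiler; `add_windows` is a hypothetical helper.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: element-wise sum of two 4-float windows that may start
   at any address; _mm_loadu_ps has no alignment requirement. */
void add_windows(const float *x, const float *y, float out[4])
{
    assert(x && y && out);
    __m128 a = _mm_loadu_ps(x);
    __m128 b = _mm_loadu_ps(y);
    _mm_storeu_ps(out, _mm_add_ps(a, b));
}
```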
__m128 _mm_loadr_ps(float const*a)
Loads four SP FP values in reverse order. The address
must be 16-byte-aligned.
r0 := a[3]
r1 := a[2]
r2 := a[1]
r3 := a[0]
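A sketch of _mm_loadr_ps reversing four aligned floats, assuming a C11 compiler targeting x86/x86-64 with SSE; `reverse4` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: read a[0..3] in reverse order into out[]. */
void reverse4(const float *a, float out[4])
{
    assert(a && out);
    assert(((uintptr_t)a & 15) == 0);   /* _mm_loadr_ps requires 16-byte alignment */
    __m128 r = _mm_loadr_ps(a);         /* r0 = a[3], r1 = a[2], r2 = a[1], r3 = a[0] */
    _mm_storeu_ps(out, r);
}
```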
__m128 _mm_set_ss(float a)
Sets the low word to the SP FP value a and clears the upper three words.
r0 := a
r1 := r2 := r3 := 0.0
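_mm_set_ss is handy for feeding a scalar into the scalar (`_ss`) instructions, which operate on the low word only. This sketch assumes an SSE-enabled x86/x86-64 C compiler; `scalar_sqrt` and `set_ss_words` are hypothetical helpers.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: single-precision square root through SQRTSS.
   _mm_set_ss(x) yields {x, 0, 0, 0}; _mm_sqrt_ss works on the low word. */
float scalar_sqrt(float x)
{
    assert(x >= 0.0f);
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
}

/* Hypothetical helper: show the word layout produced by _mm_set_ss. */
void set_ss_words(float x, float out[4])
{
    assert(out);
    _mm_storeu_ps(out, _mm_set_ss(x));   /* out = {x, 0, 0, 0} */
}
```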
__m128 _mm_set_ps1(float a)
Sets the four SP FP values to a.
r0 := r1 := r2 := r3 := a
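Broadcast constants from _mm_set_ps1 pair naturally with the packed min/max operations, for example to clamp four values to a range. A sketch assuming an SSE-enabled x86/x86-64 C compiler; `clamp4` is a hypothetical helper.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: clamp v[0..3] in place to the range [lo, hi]. */
void clamp4(float v[4], float lo, float hi)
{
    assert(v && lo <= hi);
    __m128 x = _mm_loadu_ps(v);
    x = _mm_max_ps(x, _mm_set_ps1(lo));   /* every word compared against lo */
    x = _mm_min_ps(x, _mm_set_ps1(hi));   /* every word compared against hi */
    _mm_storeu_ps(v, x);
}
```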
__m128 _mm_set_ps(float a, float b, float c, float d)
Sets the four SP FP values to the four inputs. The last argument goes into the low word.
r0 := d
r1 := c
r2 := b
r3 := a
__m128 _mm_setr_ps(float a, float b, float c, float d)
Sets the four SP FP values to the four inputs in reverse order relative to _mm_set_ps, so the first argument goes into the low word.
r0 := a
r1 := b
r2 := c
r3 := d
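The argument order of _mm_set_ps and _mm_setr_ps is a common source of bugs. In compilers that provide xmmintrin.h (for example GCC and Clang), _mm_set_ps places its last argument in the low word and _mm_setr_ps places its first argument there. This sketch, assuming an SSE-enabled x86/x86-64 C compiler, checks that both calls below build the same vector; `set_and_setr_agree` and `set_ps_words` are hypothetical helpers.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: returns 1 if the two constructions produce identical words. */
int set_and_setr_agree(void)
{
    __m128 s  = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);   /* last argument -> low word  */
    __m128 sr = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);  /* first argument -> low word */
    /* _mm_cmpeq_ps sets a word to all ones where equal; _mm_movemask_ps gathers
       the four sign bits, so 0xF means all four words matched. */
    return _mm_movemask_ps(_mm_cmpeq_ps(s, sr)) == 0xF;
}

/* Hypothetical helper: spill _mm_set_ps(4, 3, 2, 1) to memory to show word order. */
void set_ps_words(float out[4])
{
    assert(out);
    _mm_storeu_ps(out, _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f));  /* out = {1, 2, 3, 4} */
}
```

Because _mm_setr_ps lists values in memory order, it is often the less error-prone choice when the constants come from an array-like description.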
__m128 _mm_setzero_ps(void)
Clears the four SP FP values.
r0 := r1 := r2 := r3 := 0.0
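_mm_setzero_ps is the usual way to initialize a vector accumulator. A sketch of a dot product, assuming an SSE-enabled x86/x86-64 C compiler and a length that is a multiple of 4; `dot_n` is a hypothetical helper.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: dot product of two float arrays (n a multiple of 4). */
float dot_n(const float *a, const float *b, int n)
{
    assert(a && b && n % 4 == 0);
    __m128 acc = _mm_setzero_ps();   /* {0, 0, 0, 0} */
    for (int i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```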
void _mm_store_ss(float *v, __m128 a)
Stores the lower SP FP value.
*v := a0
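A sketch showing that _mm_store_ss writes exactly one float and leaves neighbouring memory untouched, assuming an SSE-enabled x86/x86-64 C compiler; `store_ss_demo` is a hypothetical helper.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: store only the low word of a four-word vector. */
void store_ss_demo(float *dst)
{
    assert(dst);
    __m128 v = _mm_setr_ps(10.0f, 20.0f, 30.0f, 40.0f);  /* low word = 10.0f */
    _mm_store_ss(dst, v);                                 /* writes dst[0] only */
}
```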
void _mm_store_ps1(float *v, __m128 a)
Stores the lower SP FP value across four words. The address must be 16-byte-aligned.
v[0] := a0
v[1] := a0
v[2] := a0
v[3] := a0
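A sketch that fills four aligned floats with one value through _mm_store_ps1, assuming a C11 compiler targeting x86/x86-64 with SSE; `fill4_aligned` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: dst[0..3] = value. */
void fill4_aligned(float *dst, float value)
{
    assert(dst && ((uintptr_t)dst & 15) == 0);   /* 16-byte alignment required */
    _mm_store_ps1(dst, _mm_set_ss(value));       /* replicate the low word four times */
}
```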
void _mm_store_ps(float *v, __m128 a)
Stores four SP FP values. The address must be 16-byte-aligned.
v[0] := a0
v[1] := a1
v[2] := a2
v[3] := a3
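Aligned loads and _mm_store_ps combine into the basic element-wise loop. A sketch assuming a C11 compiler targeting x86/x86-64 with SSE, all three arrays 16-byte-aligned, and a length that is a multiple of 4; `add_arrays_aligned` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n). */
void add_arrays_aligned(float *dst, const float *a, const float *b, int n)
{
    assert(dst && a && b && n % 4 == 0);
    /* All three pointers must be 16-byte-aligned for MOVAPS. */
    assert((((uintptr_t)dst | (uintptr_t)a | (uintptr_t)b) & 15) == 0);
    for (int i = 0; i < n; i += 4)
        _mm_store_ps(dst + i, _mm_add_ps(_mm_load_ps(a + i), _mm_load_ps(b + i)));
}
```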
void _mm_storeu_ps(float *v, __m128 a)
Stores four SP FP values. The address need not be 16-byte-aligned.
v[0] := a0
v[1] := a1
v[2] := a2
v[3] := a3
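A sketch of _mm_storeu_ps writing back to an arbitrary (possibly unaligned) position inside a larger array, assuming an SSE-enabled x86/x86-64 C compiler; `scale4_at` is a hypothetical helper.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: multiply p[0..3] in place by k; p need not be aligned. */
void scale4_at(float *p, float k)
{
    assert(p);
    _mm_storeu_ps(p, _mm_mul_ps(_mm_loadu_ps(p), _mm_set_ps1(k)));
}
```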
void _mm_storer_ps(float *v, __m128 a)
Stores four SP FP values in reverse order. The address
must be 16-byte-aligned.
v[0] := a3
v[1] := a2
v[2] := a1
v[3] := a0
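A sketch that reverses four aligned floats in place with _mm_storer_ps, assuming a C11 compiler targeting x86/x86-64 with SSE; `reverse4_inplace` is a hypothetical helper. The load completes before the store, so reading and writing the same address is safe.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: a[0..3] = old a[3], a[2], a[1], a[0]. */
void reverse4_inplace(float *a)
{
    assert(a && ((uintptr_t)a & 15) == 0);   /* both operations need 16-byte alignment */
    _mm_storer_ps(a, _mm_load_ps(a));
}
```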
__m128 _mm_move_ss(__m128 a, __m128 b)
Sets the low word to the SP FP value of b.
The upper 3 SP FP values are passed through from a.
r0 := b0
r1 := a1
r2 := a2
r3 := a3
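A sketch of _mm_move_ss replacing only the low word, assuming an SSE-enabled x86/x86-64 C compiler; `move_ss_demo` is a hypothetical helper.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: low word from b, upper three words from a. */
void move_ss_demo(float out[4])
{
    assert(out);
    __m128 a = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    __m128 b = _mm_setr_ps(9.0f, 8.0f, 7.0f, 6.0f);
    _mm_storeu_ps(out, _mm_move_ss(a, b));   /* out = {9, 2, 3, 4} */
}
```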
unsigned int _mm_getcsr(void)
Returns the contents of the MXCSR control/status register.
void _mm_setcsr(unsigned int i)
Sets the MXCSR control/status register to the value specified.
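The usual pattern is read-modify-write: read MXCSR with _mm_getcsr, change one field, write it back with _mm_setcsr, and restore the saved value afterwards. This sketch changes the rounding-control field using the `_MM_ROUND_MASK` and `_MM_ROUND_*` macros that xmmintrin.h provides, assuming an SSE-enabled x86/x86-64 C compiler; `set_sse_rounding` is a hypothetical helper.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: replace the rounding-control bits of MXCSR with `mode`
   and return the previous MXCSR value so the caller can restore it. */
unsigned int set_sse_rounding(unsigned int mode)
{
    assert((mode & ~_MM_ROUND_MASK) == 0);           /* only rounding-control bits */
    unsigned int saved = _mm_getcsr();               /* STMXCSR */
    _mm_setcsr((saved & ~_MM_ROUND_MASK) | mode);    /* LDMXCSR */
    return saved;
}
```

Compilers may assume the default floating-point environment when optimizing, so code that relies on a non-default rounding mode may also need `#pragma STDC FENV_ACCESS ON` or an option such as GCC's `-frounding-math`.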
void _mm_prefetch(char const*a, int sel)
(uses PREFETCH) Loads one cache line of data from address a to a location "closer" to the processor. The value sel specifies the type of prefetch operation. On IA-32, use the constants _MM_HINT_T0, _MM_HINT_T1, _MM_HINT_T2, and _MM_HINT_NTA, which select the PREFETCHT0, PREFETCHT1, PREFETCHT2, and PREFETCHNTA instructions, respectively. On Itanium®-based systems, use the constants _MM_HINT_T1, _MM_HINT_NT1, _MM_HINT_NT2, and _MM_HINT_NTA.
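A sketch of software prefetching in a streaming loop, assuming an SSE-enabled x86/x86-64 C compiler. The prefetch distance (64 floats ahead) and the 64-byte cache-line size are assumptions to be tuned per machine, and `sum_with_prefetch` is a hypothetical helper. Because _mm_prefetch is only a hint, the result is the same with or without it.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical prefetch distance in floats (256 bytes); tune for the target. */
enum { PREFETCH_AHEAD = 64 };

/* Hypothetical helper: scalar sum that requests upcoming cache lines early. */
float sum_with_prefetch(const float *a, int n)
{
    assert(n >= 0 && (a || n == 0));
    float s = 0.0f;
    for (int i = 0; i < n; ++i) {
        /* One request per 16 floats, i.e. per assumed 64-byte line. */
        if (i % 16 == 0 && i + PREFETCH_AHEAD < n)
            _mm_prefetch((const char *)(a + i + PREFETCH_AHEAD), _MM_HINT_T0);
        s += a[i];
    }
    return s;
}
```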
void _mm_stream_pi(__m64 *p, __m64 a)
(uses MOVNTQ) Stores the data in a to the address p without polluting the caches. Because a is an MMX (__m64) value, you must empty the multimedia state with _mm_empty (the EMMS instruction) before any subsequent x87 floating-point code. See the topic The EMMS Instruction: Why You Need It and When to Use It.
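A sketch of a non-temporal 64-bit store with _mm_stream_pi followed by the required EMMS, assuming GCC or Clang targeting x86/x86-64 (where the MMX intrinsics are available; MSVC does not provide them on x64). `stream_pair` and `read_pair` are hypothetical helpers, and the _mm_sfence call is added so the streamed data is ordered before later stores.

```c
#include <assert.h>
#include <string.h>
#include <xmmintrin.h>   /* also provides the MMX intrinsics from mmintrin.h */

/* Hypothetical helper: stream two ints (lo in the low half) to *dst. */
void stream_pair(__m64 *dst, int lo, int hi)
{
    assert(dst);
    _mm_stream_pi(dst, _mm_set_pi32(hi, lo));   /* MOVNTQ */
    _mm_sfence();                               /* order the non-temporal store */
    _mm_empty();                                /* EMMS: leave the MMX state */
}

/* Hypothetical helper: copy the two 32-bit halves back out for inspection. */
void read_pair(const __m64 *src, int out[2])
{
    memcpy(out, src, 2 * sizeof(int));
}
```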
void _mm_stream_ps(float *p, __m128 a)
(uses MOVNTPS) Stores the data in a to the address p without polluting the caches. The address must be 16-byte-aligned.
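Non-temporal stores suit large buffers that will not be read back soon. A sketch assuming a C11 compiler targeting x86/x86-64 with SSE, a 16-byte-aligned destination, and a length that is a multiple of 4; `stream_fill` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: dst[0..n) = value using cache-bypassing stores. */
void stream_fill(float *dst, int n, float value)
{
    assert(dst && ((uintptr_t)dst & 15) == 0 && n % 4 == 0);
    __m128 v = _mm_set_ps1(value);
    for (int i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v);   /* MOVNTPS */
    _mm_sfence();                    /* make the streamed stores globally visible */
}
```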
void _mm_sfence(void)
(uses SFENCE) Guarantees that every preceding store is globally visible before any subsequent store.
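On x86, non-temporal stores are weakly ordered, so a C11 release store alone does not order them. An _mm_sfence is placed between the data and a "ready" flag in a producer. This sketch assumes a C11 compiler with `<stdatomic.h>` targeting x86/x86-64 with SSE; `publish` is a hypothetical helper, and the consumer side is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical producer: stream n floats into buf, then raise *ready. */
void publish(float *buf, int n, float value, atomic_int *ready)
{
    assert(buf && ready && ((uintptr_t)buf & 15) == 0 && n % 4 == 0);
    __m128 v = _mm_set_ps1(value);
    for (int i = 0; i < n; i += 4)
        _mm_stream_ps(buf + i, v);
    _mm_sfence();   /* streamed data becomes globally visible before the flag */
    atomic_store_explicit(ready, 1, memory_order_release);
}
```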
float _mm_cvtss_f32(__m128 a)
This intrinsic extracts a single-precision floating-point value from the first (lowest) vector element of an __m128. It does so in the most efficient manner possible in the context used. This intrinsic doesn't map to any specific SSE instruction.
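_mm_cvtss_f32 is typically the last step of a horizontal reduction, once the answer sits in the low word. A sketch of a four-way minimum using shuffles, assuming an SSE-enabled x86/x86-64 C compiler; `hmin4` is a hypothetical helper.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: minimum of v[0..3]. */
float hmin4(const float v[4])
{
    assert(v);
    __m128 x = _mm_loadu_ps(v);
    /* Swap the 64-bit halves: {x2, x3, x0, x1}; min gives {m02, m13, m02, m13}. */
    __m128 t = _mm_min_ps(x, _mm_shuffle_ps(x, x, _MM_SHUFFLE(1, 0, 3, 2)));
    /* Swap adjacent words: the low word becomes min(m02, m13). */
    t = _mm_min_ps(t, _mm_shuffle_ps(t, t, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtss_f32(t);   /* extract the low word */
}
```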