Data operations with SSE instructions: how should they be handled? (assembly language)

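The xmm registers in SSE are all 128 bits wide. So far I have learned instructions such as movss and movps, but I don't know how to do parallel computation with them. For example, say I have four multipliers a1, a2, a3, a4 and four multiplicands b1, b2, b3, b4. I want to put a1, a2, a3, a4 into bits [0-31], [34-63], [64-95], [96-128] of xmm0, put b1, b2, b3, b4 into the same positions of xmm1, and then multiply them in parallel. How do I get a1, a2, a3, a4 into the corresponding positions of the xmm register? movss plus shifting is surely not the best way. I just want to know whether there is a dedicated instruction for this kind of operation. I also looked at shufps, but I may not have understood its description.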

------Solution--------------------------------------------------------
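The SSE instructions movaps and movups can do this. The former requires the memory address to be 16-byte aligned; the latter does not.
Either instruction loads four consecutive single-precision floats from memory into a 128-bit XMM register.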
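
As a rough illustration of the load-then-multiply pattern described here, the following C sketch uses the SSE intrinsics from <xmmintrin.h> instead of hand-written assembly. The helper name mul4 is made up for this example. Compilers typically turn _mm_loadu_ps into MOVUPS and _mm_mul_ps into MULPS, though the exact instruction selection is up to the compiler.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_mul_ps, _mm_storeu_ps */

/* mul4 is a hypothetical helper, not part of any library.
   It multiplies a[0..3] by b[0..3] element-wise and writes the
   four products to out[0..3]. No alignment is assumed. */
void mul4(const float *a, const float *b, float *out)
{
    /* Unaligned packed load: a[0] lands in bits 0-31, a[1] in 32-63,
       a[2] in 64-95, a[3] in 96-127 of the 128-bit value. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);

    /* One packed multiply computes all four products at once. */
    __m128 prod = _mm_mul_ps(va, vb);

    /* Unaligned packed store back to memory. */
    _mm_storeu_ps(out, prod);
}
```

Calling mul4 with a = {1, 2, 3, 4} and b = {5, 6, 7, 8} should leave {5, 12, 21, 32} in out.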

The best reference for SSE instructions is Intel's own documentation. See "Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 2A: Instruction Set Reference, A-M".
The copy I have at hand is the October 2010 edition, file name 253666.pdf, size 2988 KB.

The manual summarizes the two instructions as:
MOVAPS—Move Aligned Packed Single-Precision Floating-Point Values
MOVUPS—Move Unaligned Packed Single-Precision Floating-Point Values

Here is the entry for MOVUPS from that document.
MOVUPS—Move Unaligned Packed Single-Precision Floating-Point Values

MOVUPS xmm1, xmm2/m128
Move packed single-precision floating-point values from xmm2/m128 to xmm1.

MOVUPS xmm2/m128, xmm1
Move packed single-precision floating-point values from xmm1 to xmm2/m128.

Description
Moves a double quadword containing four packed single-precision floating-point
values from the source operand (second operand) to the destination operand (first
operand). This instruction can be used to load an XMM register from a 128-bit
memory location, store the contents of an XMM register into a 128-bit memory location,
or move data between two XMM registers. When the source or destination
operand is a memory operand, the operand may be unaligned on a 16-byte boundary
without causing a general-protection exception (#GP) to be generated.
To move packed single-precision floating-point values to and from memory locations
that are known to be aligned on 16-byte boundaries, use the MOVAPS instruction.
While executing in 16-bit addressing mode, a linear address for a 128-bit data access
that overlaps the end of a 16-bit segment is not allowed and is defined as reserved
behavior. A specific processor implementation may or may not generate a general-protection
exception (#GP) in this situation, and the address that spans the end of
the segment may or may not wrap around to the beginning of the segment.
In 64-bit mode, use of the REX.R prefix permits this instruction to access additional
registers (XMM8-XMM15).
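
To make the aligned versus unaligned distinction in this description concrete, here is a small C sketch with SSE intrinsics. The helpers is_aligned16 and load4 are hypothetical names chosen for illustration. _mm_load_ps requires a 16-byte-aligned address (the MOVAPS case) and _mm_loadu_ps accepts any address (the MOVUPS case). The runtime check is only there to show the rule; real code usually knows its alignment ahead of time and picks one form.

```c
#include <stdint.h>     /* uintptr_t */
#include <xmmintrin.h>  /* __m128, _mm_load_ps, _mm_loadu_ps */

/* Hypothetical helper: nonzero when p sits on a 16-byte boundary. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: load four floats starting at p, using the
   aligned form only when the address actually permits it. */
__m128 load4(const float *p)
{
    if (is_aligned16(p)) {
        return _mm_load_ps(p);   /* aligned load; would fault on a misaligned address */
    }
    return _mm_loadu_ps(p);      /* unaligned load; valid for any address */
}
```

With a 16-byte-aligned array buf, load4(buf) takes the aligned path, while load4(buf + 1) starts 4 bytes past the boundary and takes the unaligned path.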

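Finally, a correction to the bit ranges in your question: [34-63] should be [32-63], and the last range should be [96-127] rather than [96-128].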
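
Since the question also mentions shufps, the sketch below shows how the four 32-bit lanes can be rearranged once they are in a register. It uses the _mm_shuffle_ps intrinsic (which corresponds to SHUFPS) and the _MM_SHUFFLE macro from <xmmintrin.h>. The helper names lane, broadcast_lane0 and reverse_lanes are invented for this example. Lane i here means bits 32*i through 32*i+31.

```c
#include <xmmintrin.h>  /* __m128, _mm_storeu_ps, _mm_shuffle_ps, _MM_SHUFFLE */

/* Hypothetical helper: return the float held in lane i (0..3) of v. */
float lane(__m128 v, int i)
{
    float tmp[4];
    _mm_storeu_ps(tmp, v);  /* lane 0 goes to tmp[0], lane 3 to tmp[3] */
    return tmp[i];
}

/* Hypothetical helper: copy lane 0 of v into all four lanes. */
__m128 broadcast_lane0(__m128 v)
{
    /* _MM_SHUFFLE lists source lanes for result lanes 3, 2, 1, 0. */
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
}

/* Hypothetical helper: reverse the order of the four lanes. */
__m128 reverse_lanes(__m128 v)
{
    /* Result lane 3 takes source lane 0, ..., result lane 0 takes source lane 3. */
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3));
}
```

Note that _mm_set_ps takes its arguments from the highest lane down, so _mm_set_ps(4, 3, 2, 1) places 1 in bits [0-31] and 4 in bits [96-127].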