* initial talkie support, coherent output
* reorder to follow convention
* absorb inverse rope
* stop folding scalars to improve quantization
* use broadcasting instead of duplication
* style cleanup
* add scaling support to LoraTorchTensor; use that path in conversion
* use layer_out_scale instead of embd_skip_scale