Analysis of mean-field models arising from self-attention dynamics in transformer architectures with layer normalization. [PDF]
Burger M +4 more
europepmc +1 more source