This post looks at some similarities in how the deep learning field has approached two seemingly different problems, namely:
A) recent work on neural networks with self-attention (“transformers”) that tackles one of their biggest shortcomings, the quadratic computational complexity with respect to the number of input items, and
B) work decoupling capacity and…