Not used head_dim = dim_out // num_heads self.scale = head_dim**-0.5 F.scaled_dot_product_attention takes care of this automatically.