SN550 - Why does it use 512B sectors instead of 4096B?

I reinitialized to 4096B sector size (LBA Format 1) and experienced a performance drop. I was under the impression that switching to advanced format (the 4096 Byte sector size interface) would be beneficial to performance. I run all the latest (Linux) software on an Intel Q65 chipset (PCIe v2 with 4 PCIe lanes dedicated to the NVMe drive).
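For anyone who wants to check the same thing on their own drive, here is a rough sketch of how the available LBA formats can be listed. It assumes nvme-cli is installed and that its human-readable output tags the active format with "(in use)"; the helper name `lba_formats` is just something I made up for illustration.

```shell
# Reduce `nvme id-ns -H` output (read from stdin) to the LBA format lines
# and flag the active one. Assumes nvme-cli marks it with "(in use)".
lba_formats() {
    grep 'LBA Format' | sed 's/(in use)/<== active/'
}

# Only query the drive if nvme-cli and the device are present (needs root).
if command -v nvme >/dev/null 2>&1 && [ -e /dev/nvme0n1 ]; then
    nvme id-ns -H /dev/nvme0n1 | lba_formats || true
fi

# Switching formats is destructive (it wipes the namespace); with nvme-cli
# it would be something like:
#   nvme format /dev/nvme0n1 --lbaf=1
```

The "Relative Performance" field on each line is the drive's own hint about which format it prefers, so it is worth comparing against the measured numbers.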

Before:
Performance @ 512B sector interface (LBA Format 0):

root@xubuntu:~# hdparm -t /dev/nvme0n1p1
/dev/nvme0n1p1:
 Timing buffered disk reads: 4658 MB in  3.00 seconds = 1552.07 MB/sec

After:
Performance @ 4096B sector interface (LBA Format 1):

# hdparm -t /dev/nvme0n1p1
/dev/nvme0n1p1:
 Timing buffered disk reads: 2844 MB in  3.00 seconds = 947.51 MB/sec

Sequential read performance dropped from 1552.07 MB/sec to 947.51 MB/sec, whereas one would expect a performance gain.
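To put a number on it, the relative drop can be worked out from the two hdparm figures above. This is only arithmetic on those two values (a single run each), nothing more; it comes out at roughly 39%.

```shell
# hdparm -t results from the two runs above, in MB/sec
before=1552.07   # 512B sectors (LBA Format 0)
after=947.51     # 4096B sectors (LBA Format 1)

# Relative drop in percent; LC_ALL=C keeps the decimal point locale-independent
drop=$(LC_ALL=C awk -v b="$before" -v a="$after" \
    'BEGIN { printf "%.1f", (b - a) / b * 100 }')

echo "Sequential read throughput dropped by ${drop}%"
```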

What is going on here?

I verified that the partition is aligned to the physical sector size of 4096B, so alignment should be okay:

$ sudo parted /dev/nvme0n1 
GNU Parted 3.3
Using /dev/nvme0n1
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p                                                                
Model: WDC WDS500G2B0C-00PXH0 (nvme)
Disk /dev/nvme0n1: 500GB
Sector size (logical/physical): 4096B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start  End    Size   File system  Name         Flags
 1      300GB  500GB  200GB  ext4         ubuntu-root

(parted) align-check opt 1                                                
1 aligned
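The sector sizes parted shows can also be read straight from sysfs, as sketched below. Note that this only gives what the drive advertises to the kernel, which is not necessarily the page size used internally by the flash. The function name `block_sizes` and its optional second argument (an alternative sysfs root, only useful for testing) are my own invention.

```shell
# Print the logical and physical block sizes the kernel sees for a device.
# $1: device name (e.g. nvme0n1)
# $2: sysfs root, defaults to /sys (hypothetical override, for testing only)
block_sizes() {
    dev=$1
    root=${2:-/sys}
    q="$root/block/$dev/queue"
    if [ -r "$q/logical_block_size" ] && [ -r "$q/physical_block_size" ]; then
        echo "$dev logical=$(cat "$q/logical_block_size") physical=$(cat "$q/physical_block_size")"
    else
        echo "$dev: no queue attributes found under $root" >&2
        return 1
    fi
}

# Report for the drive in question; ignore the error if it is not present.
block_sizes nvme0n1 || true
```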

What is the internal physical sector size that the SN550 NVMe drive uses?

WD, please help out here. Why do I get a drop in performance when re-initializing to the 4096B sector size?

How can I get the expected performance gain from going to the 4096B sector size (advanced format)?