SanDisk X300 SD7SB7S010T1122 gives 1/3rd the advertised performance numbers

The SanDisk X300 SD7SB7S010T1122 (1 TB) drive is delivering roughly one third of the performance advertised in this data sheet:

https://us-new.ingrammicro.com/Documents/vendors/s/sandisk/x300_datasheet.pdf

The disk caches are all enabled:

hdparm -W /dev/sda

/dev/sda:
 write-caching =  1 (on)

hdparm -A /dev/sda

/dev/sda:
 look-ahead    =  1 (on)

hdparm -a /dev/sda

/dev/sda:
 readahead     = 256 (on)

The benchmark tool was fio, running a random-read test with the job file below:

ioengine=libaio
invalidate=1
direct=1
time_based
runtime=60s
filename=/dev/sda
group_reporting=1
iodepth=64
rw=randread
bs=4k
numjobs=4
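
For anyone wanting to reproduce the run, here is a minimal Python sketch that renders these options as a fio job file. The helper name `render_fio_job` is hypothetical. The section name `myjob` is taken from the fio output further down, and `rw=randread` is assumed because that is what the output reports. A value of `None` stands for a bare flag such as `time_based`.

```python
# Options as listed in this post; rw=randread is an assumption based on
# the "rw=randread" shown in the fio output.
JOB_OPTIONS = {
    "ioengine": "libaio",
    "invalidate": "1",
    "direct": "1",
    "time_based": None,  # bare flag, written without "=value"
    "runtime": "60s",
    "filename": "/dev/sda",
    "group_reporting": "1",
    "iodepth": "64",
    "rw": "randread",
    "bs": "4k",
    "numjobs": "4",
}


def render_fio_job(job_name, options):
    """Return fio job-file text: a [job_name] section followed by its options."""
    lines = [f"[{job_name}]"]
    for key, value in options.items():
        # Flags with no value are emitted as the bare key.
        lines.append(key if value is None else f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the job file; redirect to t1.fio and run "fio t1.fio".
    print(render_fio_job("myjob", JOB_OPTIONS), end="")
```

The printed text can be saved as t1.fio. Note that with filename=/dev/sda the test reads the raw device, so it needs root privileges.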

The results were:

fio t1.fio

myjob: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64

fio-2.2.8
Starting 4 processes
Jobs: 4 (f=4): [r(4)] [100.0% done] [151.4MB/0KB/0KB /s] [38.8K/0/0 iops] [eta 00m:00s]
myjob: (groupid=0, jobs=4): err= 0: pid=2691: Wed Feb  8 10:20:24 2017
  read : io=9082.9MB, bw=155003KB/s, iops=38750, runt= 60004msec
    slat (usec): min=0, max=3968, avg=101.91, stdev=555.85
    clat (usec): min=510, max=10015, avg=6502.91, stdev=574.65
     lat (usec): min=519, max=10576, avg=6604.93, stdev=187.08
    clat percentiles (usec):
     |  1.00th=[3376],  5.00th=[6496], 10.00th=[6560], 20.00th=[6560],
     | 30.00th=[6560], 40.00th=[6560], 50.00th=[6560], 60.00th=[6560],
     | 70.00th=[6624], 80.00th=[6624], 90.00th=[6752], 95.00th=[6752],
     | 99.00th=[7328], 99.50th=[7392], 99.90th=[7520], 99.95th=[7584],
     | 99.99th=[9152]
    bw (KB  /s): min=37061, max=39152, per=25.02%, avg=38787.30, stdev=339.29
    lat (usec) : 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.01%, 4=3.09%, 10=96.91%, 20=0.01%
  cpu          : usr=0.86%, sys=2.44%, ctx=73696, majf=0, minf=13174
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=2325202/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=9082.9MB, aggrb=155003KB/s, minb=155003KB/s, maxb=155003KB/s, mint=60004msec, maxt=60004msec

Disk stats (read/write):
  sda: ios=2321776/0, merge=2/0, ticks=8485391/0, in_queue=8487477, util=99.87%

We are getting ~38K IOPS, whereas the data sheet advertises 98K IOPS for random reads.
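
As a sanity check on the figures above, here is a small Python sketch that cross-checks the fio output against itself. It assumes the queue is kept full for the whole run, so Little's law applies with 4 jobs x 64 iodepth = 256 outstanding I/Os. The function names are hypothetical, and the 98K figure is the data sheet number quoted in this post. The test conditions behind that number are not restated here.

```python
JOBS = 4                 # numjobs from the job file
IODEPTH = 64             # iodepth from the job file
BLOCK_KB = 4             # bs=4k
MEASURED_IOPS = 38750    # iops reported by fio
ADVERTISED_IOPS = 98000  # random-read figure quoted from the data sheet


def bandwidth_kb_s(iops, block_kb):
    """Throughput implied by an IOPS figure at a fixed block size."""
    return iops * block_kb


def littles_law_latency_usec(outstanding_ios, iops):
    """Average latency implied by Little's law: L = N / throughput."""
    return outstanding_ios / iops * 1_000_000


if __name__ == "__main__":
    outstanding = JOBS * IODEPTH
    print("implied bandwidth (KB/s):", bandwidth_kb_s(MEASURED_IOPS, BLOCK_KB))
    print("implied latency (usec):",
          round(littles_law_latency_usec(outstanding, MEASURED_IOPS), 1))
    print("fraction of advertised IOPS:",
          round(MEASURED_IOPS / ADVERTISED_IOPS, 2))
```

The implied bandwidth (155000 KB/s) and latency (about 6606 usec) line up with the bw=155003KB/s and lat avg=6604.93 usec that fio reports, so the run looks internally consistent with a full queue of 256 requests. The measured rate works out to roughly 40% of the data sheet figure.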

Any clues or suggestions as to what might be hampering performance, and is there any way to tune the drive to improve these numbers?

Thanks in advance!

-naveen