How is ExpressCache 1.3.2 working?

Just a thought:

As I look over the delay problems reported in this thread, I'm not so sure they are specifically related to firmware 1.3.2 so much as to the general issue of SSD over-provisioning. For those needing an intro to the subject, here is a good one written by an LSI/SandForce engineer: http://www.edn.com/design/systems-design/4404566/1/Understanding-SSD-over-provisioning .

My thought is this:  Suppose Sandisk offered this same caching SSD product in a 64 GB size with a 32 GB active partition.

The retail cost should only be about $20 more than the current product. Considering how much caches get written to and how much housekeeping is involved over time, I'd pay an extra $20 for a "double provisioned" 32 GB cache with both increased performance and longevity.

Yeah, you're right, I did typo that. I did do an 18 GB partition. The partition command I ran was: ECCmd -partition (drive number) 18432

So far my cache has filled up to 14 GB, but I haven't loaded any PC games or anything yet. Do we know the maximum cache partition size we can use before we start seeing problems?

NWGuy, the thought that the drive is too small and should be larger and overprovisioned makes tons of sense. The OCZ Synapse drive that I had was set up that way: it was a 64 GB drive overprovisioned to 32 GB. I was wondering why ReadyCache wasn't set up that way.

I thought I was getting a better drive that was better supported via software by replacing my OCZ Synapse (which uses the now-defunct Dataplex software)… over time I am discovering that may not be the case :(.

I think the SanDisk / Condusiv ReadyCache product is good, especially since the release of the 1.3.2 software update; there have been relatively few bug reports since then.

A 17 GB fill of the default 29.xx GB partition is the lowest reported fill where delays occurred; that was AlleyViper. All the other reports I found started at just over 23 GB, with several in the 23.xx range and then up from there. It bears noting that the cache is still working at 100% fill for almost everyone; the task at this point is mostly just trying to improve performance.

Just to clarify: when I posted about that delay with only 17 GB cached, it was due to a Windows logo freeze at the usual point, but it was way shorter (only about 2-3 s) than the annoying ones with >22 GB filled. They seem related, anyway.

Also, after reducing the caching partition I've had no more occasional freezes later in the boot with the HDD LED stuck lit. Those happened when the cache was near full.

Most probably, a cache partition around 2/3 of the SSD's size should be near the limit before trouble happens for most people, given that it won't fill completely. I still hope most issues can be solved either by software or by an SSD firmware update.


Reset the cache twice, still hanging on Windows boot.


Hello,

When will 1.3.3 be coming out to address these boot delay issues?

Thanks.

Hooked it up to my RAID card to see if that works better again :P It just disappears after a while and sets the RAID alarm off, so I guess it's RMA time :P

mattschnaidt, rlewandowski23, NWGuy, and anyone else that tried a cache size reduction: did you also notice any improvements after this period of testing?

With my cache drive capped at 18 GB I haven't had any problems yet. I wish SanDisk would fix this so I didn't have to cap the drive. Honestly, it's beginning to seem like a design flaw; they should have made it a 64 GB drive that is overprovisioned to 32 GB.

This is exactly what the OCZ Synapse drive I used to use did. Honestly, SanDisk should just release a new drive and give us all discounts on the new one if we send the old one in.

I tried "Preview" before submitting a long post and could not find a way to edit the post further; in so doing I lost the post. Out of time to type it all again now.

So without explanations, my recommendation for anyone experiencing what they feel are excessive boot delays is to try a partition size of 25 GB (25 x 1024 = 25600) using one of the methods previously discussed in this thread. That may be the largest cache size you can use to reduce this problem if you are experiencing it.
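For example, following the ECCmd syntax quoted earlier in the thread, that would be something like:

    ECCmd -partition (your drive number) 25600

The drive number varies per system, so substitute your own before running it.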

Regardless of the partition size one uses, occasionally seeing the Windows logo for up to two revolutions of the dots is part of normal cache operation. Testing continues; I have two of these in different machines now, one of which will remain at the original 29.xx GB partition while the other will vary.

I've given more thought to overprovisioning since I mentioned it as a possible way to reduce cache housekeeping delays. If I personally bought a 64 GB SSD cache drive, I'd probably run it in the 50-60 GB range rather than half of that for overprovisioning. In other words, I'd accept reduced longevity and occasional housekeeping delays to gain additional cached files.

Another issue is marketing. Although one could justify overprovisioning a small SSD used as a cache due to the relatively low total cost of going from 32 to 64 GB of NAND, what about a 1 TB SSD? How are you going to sell increased reliability for one product and not another? I can see it for 'mission critical' enterprise-level installations where cost is relatively no object, but for the home market, "I can give you TWICE the storage for the same money" would be impossible (IMO) to beat.

Yeah, that would be an unfortunate design flaw if it's failing because the relatively cheap controller it uses can't keep up with all the writes when full, causing resets or whatever.

That being said, I never got close to full on mine even after a while, so I guess it simply failed.

I don't think it's the controller; it should easily be able to keep up with LBA requests. I agree with AlleyViper that this is most likely a Condusiv software glitch. My hypothesis is that some permanent or temporary data array is, under some circumstances, running out of pre-allocated or available space, or possibly some numerical value is exceeding its type's allowed range (not likely these days).

Two things that can negatively affect caching performance:

Security software scans. We all (hopefully) do a quick scan each boot, and probably idle-time scans, and every so often I scan every single file, archive, etc. This makes the cache record lots of LBAs, many of which are otherwise rarely read.

Defragging moves data from one LBA to another, so the cache has to adjust to new read patterns. Active defraggers that try to move files around in real time to keep them contiguous generate 'spurious' (from a caching perspective) LBA reads and change which LBAs hold which data, again forcing the cache to reorder/reprioritize.

I reformatted my PC and installed Windows 8.1 with Update x64. I thought I would give ExpressCache 1.3.2 another shot at working without modification, but once the cache filled up it started causing problems, like my PC taking forever to boot up.

I took the suggestion above and capped my cache at 25 GB. That has helped.

SanDisk, when is this going to get fixed?! It is taking way too long to release updates for this product.

Well, to finally confirm it: after more than a month and a half of use with only 16 GB, this computer exhibited no ReadyCache-related problems. No temporary Windows logo freezes, and no later catastrophic system freezes while running Windows with a near-full cache. I'll now try a 3/4-of-drive partition (22900 MB), again excluding dedicated storage and download drives (torrents mess up the cache badly by creating too much cached activity, which leads to a quicker purge of more important system data). As I've had repeated trouble with only about ~22-24 GB filled, I'll undershoot 25 GB a bit more than suggested.

For a short time I used a laptop with a 24 GB mSATA cache drive running ReadyCache, and there seemed to be no issues. That SSD was almost full all of the time, because one partition held hybrid boot data (maintained by another program), and the remaining ~16 GB were kept close to full.

Then again, I guess overprovisioning should yield better IOPS (especially on writes) and better drive longevity via wear leveling, but startup freezes that last many seconds seem more related to a software fault (as NWGuy pointed out) than to the added latency of a non-optimized SSD or weak controller. Hence less cached data, up to some 20-25 GB, might strain the software less.

I also believe that many cache reset problems can be due to Windows updates, file scans, scheduled defrags, or hours of torrenting plus moving files, which cause havoc on the LBA list, leading to an expected total resync.

AlleyViper, here's why I'm thinking you can go higher than 23 GB without delays, even though you and others have seen issues starting near that point: as far as I know, all these reports were based on running the full available OEM 29.82 GB size for the active partition!

Now if that is not the case, and you have experienced cache delays with smaller partition sizes, by all means don't go above what works for you, and please post here.

We also need to be sure everyone reporting has upgraded to firmware 1.3.2, as bug reports of all kinds have decreased significantly with that revision, and I know some of the delay issues involving partition fill percentages were posted using prior firmware.

OK, here goes:  How a cache works.

You start with a certain low number of disk reads per time period or reboot cycle to initially fill the cache. Let's say you start by caching LBAs that have 5 reads.

At some point your cache gets close to full (let's say 80%); clearly, caching all LBAs with 5 reads is going to take more space than is available. So you increment the required read count for an LBA to be cached by one, making it 6 in this case, and then delete any cached LBAs with fewer than 6 reads from the cache (see the sketch below).
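Here's a minimal sketch of that threshold behavior in Python. To be clear, this is just my mental model of it, not Condusiv's actual code; the capacity, block size, and trigger point are made-up illustration values:

    # Hypothetical model of the read-count-threshold caching described above.
    CACHE_CAPACITY_GB = 16.0
    HOUSEKEEPING_TRIGGER = 0.80          # start housekeeping near 80% full
    BLOCK_GB = 4096 / 1024**3            # pretend each cached unit is 4 KiB

    read_counts = {}                     # LBA -> reads seen so far
    cached = set()                       # LBAs currently held in the cache
    threshold = 5                        # reads required before caching

    def on_read(lba):
        global threshold
        read_counts[lba] = read_counts.get(lba, 0) + 1
        if read_counts[lba] >= threshold:
            cached.add(lba)
        # Near full? Raise the bar by one read, evict what no longer qualifies.
        if len(cached) * BLOCK_GB >= CACHE_CAPACITY_GB * HOUSEKEEPING_TRIGGER:
            threshold += 1
            for lba in [x for x in cached if read_counts[x] < threshold]:
                cached.discard(lba)

The eviction pass at the end is the 'housekeeping' step; the bigger the partition, the more entries it has to walk through.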

In my case, with a 16 GB partition, I observed the cache filling, then decreasing, then refilling again, just as you would expect, with slight boot delays where you would expect housekeeping to be clearing out deleted low-activity LBAs. I've observed several cycles; the highest cache fill I've seen was 86.25%, and the lowest (after the initial fill) was 76.25%.

What people are observing using the OEM 29.82 GB partition is that the initial cache housekeeping pass, triggered somewhere around 80% fill, is taking a lot longer than 5 seconds or so to complete, yet initial housekeeping passes on smaller partitions finish quickly, disproportionately quickly. It's possible that some absolute number is being exceeded (say I ran a "scan every file" virus scan over and over again to try to max out the array of LBA read counts), but my guess is that that scenario has been anticipated. Most likely we're running out of what I'll call 'scratch space' when the partition is set to 29.82 GB.

As an example, when deleting slower-moving LBAs, one may wish to populate a separate array for housekeeping to work off of; perhaps there is not enough room in the 2.18 GB (code?) space between 29.82 GB and 32 GB for this array. Maybe at one time everything fit, but some code change ended up taking more space.

If that's the case, it MAY be a very near miss; maybe a 29.81 GB partition would work. Most likely, making another full 2.18 GB available by reducing the active partition size to 29.82 - 2.18 = 27.64 GB would work. That would cover the "Oh, I thought I had the entire 2.18 GB space for data structures!" scenario. Not that that sort of thing ever happens :) Or likely is happening here, as everyone's cache would be affected.

HOWEVER, there's always the chance that something does happen as cache fill nears 23 GB in absolute terms. Having observed a maximum fill of 86% prior to housekeeping, let's say 90% to be safe: a 25 GB partition x 0.9 = 22.5 GB, so AlleyViper, I really think you will be 'safe' with a 25 GB partition.

There is another possibility too. We've been addressing this problem by reformatting and repartitioning the drive using firmware 1.3.2; what if that is all that's necessary? Has anyone tried a 'clear out and start over' from the command line using firmware 1.3.2 and the OEM partition size?

@NWGuy,

Perhaps SanDisk should simply fix the software?

ReadyCache is supposed to be transparent to the user.

I will never buy a SanDisk SSD type product again.

Could you blame me?

Flavio :angry:

@NWGuy

Btw, the partitioned 29.82 GB is already the full declared ~32 GB without any spare/scratch or provisioning space (that would be the case for a drive sold as 30 GB with the remaining 2 GB inaccessible, reserved for provisioning). Manufacturers use gigabytes (1000^3 bytes) instead of the formatted gibibytes (1024^3 bytes) for storage space.

This 32 GB SSD has 62533296 available physical sectors at 512 bytes per sector, which gives 32017047552 bytes (~32000000000), or 32.02 GB. If you divide those bytes by 1024 three times, you'll end up with the 29.82 GB commonly reported by an OS that prefers a 1024 base instead of 1000 (or, more precisely, 29.82 GiB).
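A quick way to double-check that arithmetic in Python (the sector count is the one quoted above):

    sectors = 62533296            # physical sectors reported for this SSD
    total = sectors * 512         # 512 bytes per sector

    print(total)                  # 32017047552 bytes
    print(total / 1000**3)        # ~32.02 -> manufacturer's decimal GB
    print(total / 1024**3)        # ~29.82 -> binary GiB, what the OS shows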

For now, I'll keep this 3/4 partition to check for abnormal delays or freezes. Given that this drive receives no TRIM commands from Windows because it has no assigned drive letter, a guaranteed 25% of free space should be good enough for its own housekeeping and wear leveling (independent of the current ReadyCache software issues).

If everything goes well, I'll reclaim a bit more space as you suggest.

Btw, you're right in your assumption: every freeze I've had with 22-24 GB filled, and once even with 17 GB, was under a full-sized 29.82 GB partition. Unfortunately this testing will take some time, as I'm not always near this PC.

 @Flavio

No one here is happy with this situation; the only "good" sign is SanDisk acknowledging the issues and the hope for a fix. As a matter of fact, I've retired my ReadyCache to a PC I built for a family member because of all the frustrating problems, and went with a decent-sized SSD from another brand on my desktop. This was more than half a year ago.

If it had worked right from the start, or even the way it's finally working now with these workarounds, that investment wouldn't have been necessary. Just try reducing your cache partition size, and your experience with this drive might be less frustrating. You have nothing to lose until there's a proper fix.

AlleyViper, I can't believe that got by me, especially since I've had the SSD partition pulled up in Partition Wizard more than once. Thanks for the good explanation.

I just had a cache 'reset' happen on my OEM-partitioned unit at somewhere between 26.3 and 27 GB; 26.3 was the last working reading I saw, and the cache was filling very slowly at that point. The following shutdown took a long time (minutes) of disk activity, and the next bootup went very slowly, with Windows logo delays. When I looked at the cache it was starting over at 0.07 GB, but it began filling again, up to 3+ GB in 10 minutes and 12+ GB by the end of the day.

29.82 x 0.90 = 26.84

29.82 x 0.86 = 25.65

The delay issues I've seen occurred in this range.

29.82 x 0.78 = 23.26

29.82 x 0.74 = 22.07

With the exception of your 17 GB observation, around 75% is the lowest cache fill reported with observed delays, with more reports around 80%.

  

Both my caches are working fine overall and are definitely speeding up disk IO and reducing physical disk wear and tear.

I don't mind occasional small housekeeping delays at bootup, maybe because that's how I expect a cache to work. And I don't mind leaving some working space for cache operation; that's the easy fix for SanDisk, just reduce the OEM partition size.

AlleyViper will be trying around a 23 GB partition. I'm going to try 25 GB on both of mine. It would be great if someone could try 27 GB, watching cache size daily once fill exceeds 23 GB. It may be a while before I report back.

Well, things didn't go as I planned, and I learned some things.

 

On the machine where I had tried the 16 GB partition, I decided to try a 27 GB partition and to start the GUI splash screen with Windows to keep a close eye on the numbers.

 

Oddly enough, even after the command-line steps, including clearing the old partition and formatting the new one, the cache seemed to start with the 12+ GB fill it had on the 16 GB partition, so I guess those ECCmd commands don't clear the "LBAs to be cached" table. Wanting a fresh start, I used Clear Cache from the Options on the GUI, and that started the partition over at 0.07 GB.

 

Now, on the other computer, the one with the never-modified original partition: remember, it had recently reset after getting quite full and was refilling much more rapidly than it had the first time. Well, it has been operating fine ever since, roughly in the 75% to 85% fill range, with no resets. Also, interestingly, I've observed it decreasing the cache size during operation; watching Task Manager during these times, both ExpressCache and the ExpressCache service grab 25 to 30 MB of memory and consume around 1.5% CPU. Very good performance in my opinion. Now I can't say for sure whether the actual NAND housekeeping occurred 'live', but I can say there were no major boot delays either before or after the cache decreases.

 

Summarizing, what I observed was a very slow cache fill (many days) the first time through, then a cache reset to 0.07 GB, and then a very fast cache fill and normal cache operation thereafter.

Now back to the 27 GB machine: I observed the exact same pattern of slow fill, reset, fast fill, normal operation.

 

When the reset occurred on this machine, however, I was loading the GUI at startup, and an additional text box appeared (in my words, I didn't write it down): "The cache has been reset so operations will be at normal speed. Cache resets can be caused by . . ." and it listed several things, including a Windows update.

My conclusion is that this program is currently designed so that the first time you use it, it fills the cache slowly while generating an LBA use table, then clears the cache and starts over with that table. I don't remember reading that anywhere, so maybe Condusiv and SanDisk need to explain a little more what normal operation looks like, so folks aren't surprised when the cache resets.

 

I also think SanDisk needs to make two changes to the GUI. First, the informative text box saying that a cache reset has occurred should display whether or not you are starting the GUI with Windows. Second, the small ExpressCache icon should load in the system tray by default, again whether or not you are starting the GUI with Windows.

 

For folks who, for whatever reason, want to avoid any cache reset EVER, it may not be possible, although I don't remember ever seeing a cache reset when I had the partition size down to 16 GB.