I just did a test of voice recording on my 1GB Clip running v.20 firmware. I recorded an A-440 tuning fork. So far as I can tell, the playback from the Clip is the same frequency as the live tuning fork. I also tried copying the .wav file over to my laptop and playing it there, and again it seemed the same as the live fork.
Both the Clip and your computer depend upon a crystal oscillator for their sense of time. To the extent that the actual frequency of a specific crystal varies from its nominal value, you will see variations in pitch and recording time. But I would not expect such a variation to be large enough to be audible in the short term. So I don’t really have an explanation for what you describe. Is it as far off as, say, a semitone, with A playing back as B-flat?
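
To put rough numbers on that, here is a small sketch of the arithmetic. The 100 ppm crystal error is purely an assumed, illustrative figure, not a measured spec for the Clip or for any particular computer, and the function names are my own.

```python
import math

def played_frequency(nominal_hz, ppm_error):
    """Frequency heard when the playback clock runs ppm_error parts per million fast."""
    return nominal_hz * (1.0 + ppm_error / 1_000_000)

def cents_offset(ppm_error):
    """Pitch shift in cents (100 cents = one equal-tempered semitone) for a given clock error."""
    return 1200.0 * math.log2(1.0 + ppm_error / 1_000_000)

# Assumed, illustrative crystal error; real parts may be better or worse.
ASSUMED_PPM = 100

# Clock error that would be needed to shift the pitch by a full semitone (A to B-flat).
semitone_ppm = (2 ** (1 / 12) - 1) * 1_000_000

print("A-440 played back at:", played_frequency(440.0, ASSUMED_PPM), "Hz")
print("Shift in cents:", cents_offset(ASSUMED_PPM))
print("ppm needed for a semitone:", semitone_ppm)
```

Under that assumption the shift comes out to a small fraction of a cent, while a full semitone would need a clock error several hundred times larger. So if the playback really is off by anything close to a semitone, ordinary crystal tolerance seems an unlikely cause.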