Wednesday, July 27, 2016

Improving Game Loading Times

Faster computing is a game of eliminating bottlenecks. Every component in a system is waiting for something, whether it be the results from CPU calculations, a piece of information from memory, storage, or the network, or even just input from the user.

Ideally, the computer would always be waiting on the user rather than the other way around. For most people, most of the time, today's computing experience approaches that ideal. That is why it's all the more jarring when you do have to wait, which is often the case with large games.

For game loading, a common piece of advice to improve load times is to get an SSD if you are using a regular hard drive as storage. And it definitely helps.

Regular hard drives are so much more sluggish that replacing them with SSDs improves the general responsiveness of computers more than just about any other upgrade. And for game loading times, it makes sense that faster storage devices lead to faster loading times. But at some point, storage devices will become so fast that they will no longer be the bottleneck.

It turns out that if you have an SSD, you are already there: even if you increase the speed of your storage device by an order of magnitude, as is the case with RAM drives versus SSDs, game loading times are basically unchanged.**

Why?

For many programs, the bottleneck moves back to the CPU and the rest of the system. Rather than the CPU waiting on storage, it's the user waiting on the CPU to process the instructions that set up the game. To demonstrate this, I clocked my CPU at 1.2, 2.4, 3.6, and 4.8GHz, and then measured initial and subsequent loading times for Killing Floor 2.*


Although a faster CPU clearly helps loading times, the benefit shrinks as CPU frequency increases, even in percentage terms: the load time of the 2.4GHz run was 62% of the 1.2GHz run despite its 100% clock speed advantage, while the load time of the 4.8GHz run was 71% of the 2.4GHz run despite the same 100% advantage. On top of that, each further increase in CPU frequency is disproportionately harder to achieve. Thankfully, overclocks in the low to mid 4GHz range happen to be the sweet spot for the processors Intel has released over the past few years, so most of the load time benefit is available to anyone with an overclockable system or one of the latest processors, e.g. the 4790k turbos to 4.4GHz and the 6700k boosts to 4.2GHz.
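One way to see why doubling the clock buys less each time is a simple two-part model, which is my assumption and not something measured: part of the load time does not speed up with the CPU clock at all (I/O waits, memory latency, and so on), and the rest scales as 1/frequency. This is the same shape as Amdahl's law, with clock speed standing in for core count. The sketch below fits the split to the single 62% ratio above and then predicts the 4.8GHz/2.4GHz ratio. The function names are hypothetical.

```python
# Hypothetical model (an assumption, not from measurements): load time at clock f is
#   T(f) = fixed + scaled * (2.4 / f)
# where "fixed" does not speed up with the CPU clock and "scaled" does.


def split_from_ratio(ratio_2_4_over_1_2: float) -> tuple[float, float]:
    """Return (fixed, scaled) shares of the 2.4GHz load time, normalized so T(2.4) = 1."""
    # With T(2.4) = fixed + scaled = 1, halving the clock doubles the scaled part:
    # T(1.2) = fixed + 2 * scaled = 1 + scaled, and T(2.4) / T(1.2) = ratio.
    scaled = 1.0 / ratio_2_4_over_1_2 - 1.0
    fixed = 1.0 - scaled
    return fixed, scaled


def predicted_ratio_4_8_over_2_4(ratio_2_4_over_1_2: float) -> float:
    """Predict T(4.8) / T(2.4) from the measured T(2.4) / T(1.2) ratio."""
    fixed, scaled = split_from_ratio(ratio_2_4_over_1_2)
    # Doubling the clock from 2.4 to 4.8GHz halves only the scaled part.
    return (fixed + scaled / 2.0) / (fixed + scaled)


fixed, scaled = split_from_ratio(0.62)  # 62% is the measured ratio from the runs above
print(f"fixed share at 2.4GHz: {fixed:.0%}, clock-scaled share: {scaled:.0%}")
print(f"predicted 4.8GHz / 2.4GHz load time: {predicted_ratio_4_8_over_2_4(0.62):.0%}")
```

Under this model roughly 39% of the 2.4GHz load time does not scale with the clock, and the predicted 4.8GHz/2.4GHz ratio comes out at about 69%, close to the 71% I measured. So a roughly constant chunk of non-CPU-bound time is enough to explain most of the diminishing returns, though two data points are far from proof.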

The second load runs were performed to see the effect of the Windows cache. These runs were about 15 seconds faster across all CPU frequencies. If having parts of the program preloaded into memory can save that much time, it makes me wonder why RAM drives don't perform better. I have a feeling that file system overhead might be to blame. Or maybe game resources end up unpacked in RAM, whereas on a RAM drive they still need to be decompressed. To test that theory, I'd have to unpack the files, if they are even compressed. Unpacking would have the benefit of shifting some of the burden off the CPU and onto storage. But my Killing Floor 2 directory already takes up 30GB, so if even mild compression is in use, unpacking could easily balloon it to a size where the loading time saved isn't worth the precious SSD space it uses up. It's worth trying someday.

In any case, the Windows cache after running Killing Floor 2 was 3GB. If all of that represents game assets loaded straight from disk, then reading it accounts for at least six seconds of the 15 (3GB at roughly 500MB/s, about the sequential ceiling of a SATA SSD), and probably more, given that the SSD's read rate peaked at around 200MB/s during loading and that the all-important 4K QD1 performance of typical SSDs, like mine, is only about 29MB/s.
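The estimate above is just size divided by read rate. The sketch below runs the 3GB figure through each of the three rates mentioned. Two assumptions are mine: decimal units (3GB = 3000MB), and 500MB/s as the sequential ceiling behind the "at least six seconds" figure. The 200MB/s and 29MB/s rates are the ones quoted above.

```python
CACHE_MB = 3000  # 3GB Windows cache after the run; decimal units assumed

# Read rates in MB/s. The 500MB/s SATA sequential ceiling is an assumption;
# the other two figures come from the loading run and the 4K QD1 benchmark.
RATES_MB_S = {
    "sequential, ~SATA ceiling (assumed)": 500,
    "peak observed during loading": 200,
    "4K QD1 benchmark": 29,
}


def read_seconds(size_mb: float, rate_mb_s: float) -> float:
    """Time to read size_mb megabytes at a constant rate_mb_s."""
    return size_mb / rate_mb_s


for label, rate in RATES_MB_S.items():
    print(f"{label}: {read_seconds(CACHE_MB, rate):.1f} s")
```

At 500MB/s the 3GB takes 6 seconds and at 200MB/s it takes 15. At the 4K QD1 rate it would take over 100 seconds, far more than the whole 15 second difference, which suggests that most of the data is not actually read as tiny single requests.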

Benching my SSD (Seagate 600 Pro 240GB SSD)

Then again, almost all of the RAM drive vs. SSD game loading time comparisons online suggest this is not the case, i.e. that storage speed accounts for very little of the load time.

4K QD1 read performance is important because games, like most typical programs, mostly issue low queue depth accesses. Resource Monitor showed a queue length of less than 3 during loading. This type of workload is tough to optimize, and even the fastest multi-thousand-dollar NVMe PCIe datacenter SSDs are no better at it than a decent consumer drive using old-fashioned SATA. That said, 4K QD1 may be a red herring in this case, given the lack of a RAM drive advantage even on a 2500k at 4.5GHz.
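The reason faster interfaces barely help at QD1 is that only one request is in flight at a time, so throughput is set entirely by per-request latency (Little's law: requests in flight = IOPS × latency), not by link bandwidth. The sketch below converts a QD1 throughput figure back into the average latency per request it implies. It assumes "4K" means 4096 bytes and MB means 10^6 bytes; benchmark tools differ on these conventions, so treat the results as approximate. The function name is hypothetical.

```python
def qd1_latency_us(throughput_mb_s: float, block_bytes: int = 4096) -> float:
    """Average per-request latency (microseconds) implied by a QD1 throughput figure.

    At queue depth 1 exactly one request is outstanding, so by Little's law
    latency = 1 / IOPS. Assumes MB = 10**6 bytes and "4K" = 4096 bytes.
    """
    iops = throughput_mb_s * 1e6 / block_bytes  # requests completed per second
    return 1e6 / iops  # seconds per request, expressed in microseconds


# 4K QD1 figures quoted in this post
for name, mb_s in [("Seagate 600 Pro (my drive)", 29), ("Samsung SM961", 60)]:
    print(f"{name}: {mb_s}MB/s -> ~{qd1_latency_us(mb_s):.0f} us per 4K read")
```

By this arithmetic my drive spends roughly 141 microseconds per 4K read and the SM961 roughly 68. Halving latency is the only way to double QD1 throughput, which is a much harder engineering problem than adding bandwidth.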

As an aside that I'm not going to split off into its own post: the best SSDs, like the Samsung SM961, can do 60MB/s for 4K QD1. That's very good performance, and it's basically what the ACARD ANS-9010, a SATA-based drive built on much faster DDR2 DRAM, could do (63MB/s, 70MB/s, or 55MB/s depending on who you ask). On the other hand, it shows just how much overhead can hamper performance. One user was able to get double or triple that performance (130 to 210MB/s) from DDR2 and a Core 2 Duo with a software RAM drive. I don't know if it's SATA overhead or something else, but that's a very significant hit.

Now if only someone would make a PCIe drive using DDR3...

RAM drive bench on my system (3930k 4.6GHz DDR3-2133)

Imagine what a newer computer with DDR4-4000+ could do! (It'd probably fill up the bars.) But I'd still rather have a hardware solution than a software one. That's how I feel about every task where there's a choice between hardware and software: REAL TIME. DEDICATED. GUARANTEED PERFORMANCE. - not - "your task will be completed when the Windows scheduler feels like it and as long as the hundred other programs running play nice with each other."

Anyway, performance monitoring software revealed some other interesting facts during the test. Even on the 1.2GHz run, the maximum CPU use of any single thread was 80%, even though CPU speed was clearly constraining the load time. (It was 60% for the first run at 4.8GHz and 45% for subsequent runs.) The unused CPU capacity might be the result of threads waiting on one another or on other parts of the system, but I think it's safe to say there is room for optimization on the software or hardware side.**
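A thread can be limited by CPU speed and still never reach 100% use if its work is a serial chain that alternates computing with waiting. The sketch below is a hypothetical model, not a profile of the game: a single loading thread whose compute time scales with 1/frequency and whose waiting time stays fixed, with the split fitted to the measured 62% ratio between the 2.4GHz and 1.2GHz runs. It estimates how busy that thread would look at each clock speed.

```python
# Hypothetical serial-loader model (an assumption): at clock f the loading thread spends
#   compute = CPU_SHARE * (REF_GHZ / f)   (speeds up with the clock)
#   wait    = WAIT_SHARE                  (does not)
# in units where the 2.4GHz load time is 1.
REF_GHZ = 2.4
MEASURED_RATIO = 0.62  # 2.4GHz load time / 1.2GHz load time, from the runs above

# Halving the clock doubles only the compute part: 1 / MEASURED_RATIO = WAIT_SHARE + 2 * CPU_SHARE
CPU_SHARE = 1.0 / MEASURED_RATIO - 1.0
WAIT_SHARE = 1.0 - CPU_SHARE


def busy_fraction(ghz: float) -> float:
    """Share of wall-clock load time the thread spends computing at the given clock."""
    compute = CPU_SHARE * REF_GHZ / ghz
    return compute / (compute + WAIT_SHARE)


for ghz in (1.2, 2.4, 3.6, 4.8):
    print(f"{ghz}GHz: thread busy ~{busy_fraction(ghz):.0%}")
```

The model gives about 76% at 1.2GHz and about 44% at 4.8GHz. That follows the same downward trend as the monitoring data, although the measured 60% for the first 4.8GHz run is higher than the model predicts, so this is a rough picture at best.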

But it's completely understandable if game loading time is very low on the list of developer priorities. 

* Measured from starting the program to firing the first shot, with loading screens disabled using the "-nostartupmovies" launch option. This actually saves a good amount of time.

** I tried changing core affinities and core counts, process priority, Hyper-Threading, and RAM speed (1066-2133), and saw no changes. Since I was timing with a stopwatch, there may have been small differences I missed, but nothing like the effect CPU frequency had.
