Monday, March 1, 2010

Memory management (part 1)

Ok, finally the second part of my ramblings about memory management.

After writing a bit about the different types of memory I want to write about the act of managing the memory in a game.

Note that there is a big difference between systems with fixed (and known) memory blocks (consoles) and platforms that support different memory configurations (PCs/Macs). Another big factor is the availability of virtual memory (an MMU and external swap space). I will mostly write about platforms that have a known, fixed and non-virtual memory configuration and will probably make some notes about virtual memory in a later post.

Ok, we have some memory blocks and lots of data that we have to load in a game. The data normally consists of:
  • The executable. Normally this one is just one big block, but in some situations it can make sense to split the executable into multiple sections ("overlays").
  • On most platforms you need some memory for system utilities (things like virtual keyboards, network configuration tools, save/load utilities and some more). 
  • Another part is the static data section (both zero- and value-initialized). It is allocated during the executable loading phase and is static in most cases.
  • Stack memory is required for each thread. This is normally a relatively small section.
  • The remaining memory is used for the actual game data and that's the most interesting part.
The game data can be further divided into two different sections:
  • Pregenerated/static "assets" that are prebuilt and have to be loaded from your distribution medium. These are normally read-only and don't change. Examples of static data are 3D models, animations, quest text in RPGs and configuration data.
  • Runtime/dynamic "instance data". This is the data that changes depending on actual gameplay actions. Examples are the position/velocity/orientation of all 3D objects in the scene, the current score of all players, the placement of objects in games like "The Sims" and basically everything else that the user can change during the game. In most games this data is MUCH smaller than the static/definition part. There are games that generate at run time some of the data that is normally pregenerated (for example, most 96k demos/games generate 3D models and animations at run time).

Managing the game data:

Game data is the biggest consumer of the available memory resources. Effectively managing the memory for game data really depends on the actual game and platform restrictions. In general there are only a few different game types regarding memory management:
  1. Games that completely fit into the given memory. This is VERY rare in high-end games but is not unheard of in download games.
  2. Games whose structure lends itself to a natural way of partitioning. A lot of games are structured in levels, and those levels can be used for memory management as well: normally the game data can be divided into global data that is needed in (nearly) every level and level-specific data. This is the common case for racing games, map-based shooters and beat-em-ups.
  3. Games that are one-big-world (tm), where you can transition between different parts of the world without any visible loading times. This is the common scheme for MMORPGs, RPGs, action adventures and open-world shooters.
If your game is in the first group you are basically done: just allocate one block of memory for everything, load the static data and create the dynamic data.
Unfortunately this case is really rare in practice, so we have to handle dynamic memory management most of the time.

For scenario 2 you can use a very simple memory allocation scheme: first allocate the global data from the available memory block in a linear fashion (this is simply a pointer to the beginning of the block that you increase by the size of each allocation to get the start of the next one). After you have created all the global data you just store the current pointer and allocate the level-specific data in the same way. After the level has been finished you reset the current allocation pointer to the stored pointer to free all the level data at once. With this scheme you don't have any problems with fragmentation and you have only 8 bytes of overhead for the whole allocator (you can use nearly the complete memory block for useful data).
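The scheme above can be sketched in C++ roughly like this (a minimal illustration with made-up names, not code from any particular engine; alignment handling is the simplest possible):

```cpp
#include <cstddef>
#include <cstdint>

// Linear ("bump pointer") allocator over one fixed memory block.
struct LinearAllocator {
    uint8_t* begin;   // start of the managed memory block
    uint8_t* current; // next free byte
    uint8_t* end;     // one past the last byte

    void* allocate(size_t size, size_t alignment) {
        // Round the current pointer up to the requested alignment.
        uintptr_t p = reinterpret_cast<uintptr_t>(current);
        uintptr_t aligned = (p + alignment - 1u) & ~(uintptr_t(alignment) - 1u);
        if (aligned + size > reinterpret_cast<uintptr_t>(end)) {
            return nullptr; // out of memory
        }
        current = reinterpret_cast<uint8_t*>(aligned + size);
        return reinterpret_cast<void*>(aligned);
    }

    // Remember the position after all the global data has been created...
    uint8_t* getMarker() const { return current; }

    // ...and free all level data at once by resetting to that position.
    void reset(uint8_t* marker) { current = marker; }
};
```

Freeing a whole level is just one pointer assignment, so there is nothing that can fragment, and the only bookkeeping is the current pointer plus the stored marker.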

Memory fragmentation: a short note on memory fragmentation: fragmentation means that you can't allocate memory of a given size even if that size is smaller than (or equal to) the amount of available/unused memory managed by the allocator. This means that the amount of memory that you can actually use is reduced over time. Fragmentation happens, for example, in block-based "general-purpose" memory allocators that allow allocations to happen "out-of-order": allocate blocks 1-3 and free block 2, and now we most likely have a "hole" between blocks 1 and 3 (of at least the size of block 2). Possible solutions for memory fragmentation are either an explicit defragmentation of the memory (which requires moving memory blocks inside the allocator and therefore changing all references/pointers to the moving blocks) or not using a general-purpose allocator in the first place (most games don't need one anyway).

Ok, now back to scenario 3: games with no visible load times and an open world create the illusion that everything fits into memory and that you don't need to load at all. The reality is that those games load data ALL the time. To make this work, we have to make some assumptions about the game data:
  • The "working set" of game data that is required in one specific frame / moment in time fits completely into a fixed section (<= 50%) of memory (it gets a LOT more complicated if this assumption doesn't hold). This assumption is nearly always true because you would have severe problems using that much data in a single frame and still staying at an acceptable frame rate.
  • The required game data in an area of the game is "stable". What I mean by this is that you always need the same data in the same spatial region of the game. This can be assumed for most games as well because you rarely have wildly changing geometry/textures depending on time/other parameters.  
  • The target platform is able to load the required data for a working set/region in a worst-case time lower than the time that is required to traverse the previous region.
If these assumptions hold for your application/game you can use a relatively easy way to load the data of the next section while you play the current one. The idea is that you partition your memory area into N sections where each section contains the complete data required for one zone. While the current region is used to display/play the current section of the game, the system loads the next section into memory. This normally means that you have to make each of the N sections the same fixed size, which can be a bit too restrictive. If you use exactly 2 sections you can use a two-sided version of the allocator algorithm I talked about earlier: just allocate memory for the first section from the start of the memory block upwards (increasing addresses) and for the next section downwards from the other end (decreasing addresses). This makes it possible to have regions bigger than 50% of your game data memory block, as long as the adjacent sections need less memory.
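The two-sided variant might look like this (again an illustrative sketch with made-up names; alignment handling is omitted for brevity):

```cpp
#include <cstddef>
#include <cstdint>

// Two-sided linear allocator: the current section grows upwards from the
// start of the block, the next section grows downwards from the end.
struct DoubleEndedAllocator {
    uint8_t* begin; // start of the memory block
    uint8_t* lower; // next free byte for the lower (current) section
    uint8_t* upper; // start of the used area of the upper (next) section
    uint8_t* end;   // one past the last byte

    void* allocateLower(size_t size) {
        if (lower + size > upper) return nullptr; // sections would collide
        void* result = lower;
        lower += size;
        return result;
    }

    void* allocateUpper(size_t size) {
        if (upper - size < lower) return nullptr; // sections would collide
        upper -= size;
        return upper;
    }

    // Free one whole section at a time, just like in the one-sided version.
    void resetLower() { lower = begin; }
    void resetUpper() { upper = end; }
};
```

When a section transition happens, the old section is reset and becomes the target for the section after the next one, so the two ends simply alternate.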

One way of starting the loading process is the placement of triggers in the world that are bound to a specific section. These triggers can either be placed explicitly by level designers and/or generated automatically from the spatial partitioning of the world.

Because the loading time has no real upper bound on most systems (the player can, for example, remove the disc or have a nearly broken HDD) you always have to have a sync point before switching to the next section (to make sure that the loading of the next section has finished). Common ways of doing that are doors that don't open as long as the loading is in progress or, in the worst case, displaying a loading screen/overlay. This shouldn't be necessary most of the time though.
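The sync point boils down to a tiny state machine (names are my own, for illustration only): the door or gate only opens once the background load of the next section reports completion.

```cpp
// Minimal sketch of the sync point before a section switch.
enum class LoadState { Idle, Loading, Ready };

struct SectionStreamer {
    LoadState state = LoadState::Idle;

    // Called when the player hits the trigger for the next section.
    void startLoadingNextSection() { state = LoadState::Loading; }

    // Called by the streaming system when all data has arrived.
    void onLoadFinished() { state = LoadState::Ready; }

    // Called at the transition point (e.g. the door): only switch when the
    // load is done, otherwise keep the door closed or show a loading overlay.
    bool canSwitchSections() const { return state == LoadState::Ready; }
};
```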

Ok, that's it for the second part... I hope that the third part won't take another month..


Tuesday, January 26, 2010

Memory management (part 0)

As promised I will get into the details of some of the base module components. I'll start with one of the most important ones: memory management.
Memory is (at least on current consoles) a very limited resource and has to be managed efficiently. The main properties of a block of memory are:

  • the size of the block
  • the alignment of the first memory element
  • the access granularity that can be used to access the memory
  • the access speed (both throughput and latency) in both directions
  • the presence of an address translation unit that maps virtual addresses to physical addresses
  • the connection to other processors in the computer (speed, restrictions)

In nearly all consoles (and PCs) you have multiple distinct blocks of memory that have different properties. One common scheme for memory architectures is to have local graphics memory "near" the graphics processor.
Another option is to have a (usually small) "scratchpad" memory block that has lower latency and/or higher bandwidth to one of the processors and that can be used for certain algorithms.

Due to the growing discrepancy between the processing speed of modern processors and the speed of memory, nearly every architecture has some form of cache: a copy of some data in a smaller and faster block of memory that is kept in sync with the original location in the 'main' memory. Important attributes of caches are:

  • all of the above (because it's just another block of memory)
  • the associativity of the cache. (wikipedia)
  • the presence of special cache-control instructions in the processor

The last point is pretty important for explicit control of the cache memory. Most of the time the cache works transparently to the programmer, but sometimes you want to control the cache behavior explicitly to gain a bit more performance. Examples for that are:

  • If you know that you won't need the data that you are about to write anytime soon, you can prevent 'cache pollution' (wikipedia) by writing around the cache directly into main memory. This is normally done through some kind of write-combining (wikipedia) hardware and/or by writing in cache-line granularity.
  • If your memory access pattern is known in advance you can give the cache a hint about what you will access next. This 'prefetching' will on most platforms cause one (or more) cache lines to be read into the cache while the next instructions are executed. It depends on the platform which operations are allowed while a cache line is read in the background.
  • on some platforms caches can be used to synchronize multiple processors. This commonly needs special operations to work.
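Prefetching, for example, might look like this with GCC/Clang-style compilers. Note that __builtin_prefetch is compiler-specific, the function name is made up, and the prefetch distance of 16 elements is an arbitrary illustrative value that would need tuning per platform:

```cpp
#include <cstddef>

// Sum an array while hinting the cache about upcoming reads.
float sumWithPrefetch(const float* data, size_t count) {
    float sum = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        if (i + 16 < count) {
            // Ask the cache to start fetching data we will need soon while
            // the current additions are still executing. rw=0 means "read",
            // locality=0 means "no temporal reuse expected" (streaming).
            __builtin_prefetch(&data[i + 16], /*rw=*/0, /*locality=*/0);
        }
        sum += data[i];
    }
    return sum;
}
```

On consoles the equivalent is usually a platform-specific intrinsic rather than this GCC builtin, but the pattern is the same.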

On current hardware you will generally be memory-bandwidth limited in your processing (in games at least), and therefore any kind of compression can be very beneficial.

to be continued...

posted while being intoxicated with Long Island Ice Tea.

Saturday, December 26, 2009

Dragons - tomorrow

The 26C3 starts tomorrow (ok, technically it starts TODAY, but I'm going to sleep in between, so it's tomorrow for me ;). If you're interested in hacking, (internet) politics and/or technical computer stuff you might want to check it out here:

motor3d - on google code

I just released the first piece of the motor3d code. It's currently heavily under development and probably not interesting to anyone ;)

I just released parts of the (new) graphics module that I work on right now. Most of the other (required) modules are not released yet.

I mainly wanted to secure the project name on google code and see if I could use the google code subversion server directly as my development server. I'm not entirely sure about the latter point though. On the one hand, their server is much faster and more reliable than my shitty Linux box and I would have access to the latest code from everywhere; on the other hand, it would be problematic with my approach of committing the complete compiler binaries (I guess Microsoft wouldn't like it if I checked in the complete msvc 8.0 compiler ;). Not to mention the 1GB size limit on the code repository (which should be enough for the source code).

I'm still thinking about the options.. But either way I will publish the source code on google code.

Tuesday, December 22, 2009

Building motor3d

Ok, this finally is the first technical post about the motor3d engine:

The first thing I have to talk about is the build system that I use. After years of pain and frustration we finally came up with a solution that I can live with. When I say "we", I mean mostly my colleagues at keengames, who did most of the actual implementation. You can find the build system we use exclusively at keengames here. It's called 'lace' and it is basically a replacement for make with a real programming language (ruby) behind it.

The first and most important requirement for the build system was that we needed to be able to reproduce a build after months/years on a different computer without too much trouble. That's not really a big problem for me (though it's a nice feature to be able to just compile your project very easily on another machine). But additionally, the way lace supports modules is very flexible and useful even in single-person projects like motor3d.

I won't get into lace much more because you can just read the documentation or the source above if you're interested.

Sunday, December 20, 2009

Blogs that I read

Someone asked me to post a list of programming related blogs that I read. So here it is:


Friday, December 18, 2009

Awesome repository visualization software

A colleague of mine just found this wonderful application that can be used to visualize your source code repository over time.

Very beautiful and nerdy..