NME Rendering Methods (Benchmarked)

This morning, I decided to fork BunnyMark to find out how the NME rendering methods compare to one another.

I was surprised by the results. I suspect that some of these tests can be improved, so feel free to send pull requests.

The benchmark was run on a MacBook Pro with OS X 10.8.

One of the things you need to bear in mind when developing for mobile devices is that the GPU is almost entirely responsible for adequate performance.

On a laptop or a desktop, CPU-based rendering can perform as well as or better than GPU rendering, depending on the system. All the native tests used GPU rendering, while the Flash Player tests used software rendering.

Based on my experience with Flash Player, I would expect it to perform 66% worse on mobile. This is not meant to be a comparison with Stage 3D, but you can see some results here.

Native

  • 35000 bunnies using drawTiles
  • 10750 bunnies using drawTriangles (no alpha)
  • 9250 bunnies using drawRect (no alpha, scale or rotation)
  • 1750 bunnies using Bitmap
  • 1750 bunnies using drawTiles (without batching)
  • 700 bunnies using copyPixels (no alpha, scale or rotation)

Flash

  • 11750 using copyPixels (no alpha, scale or rotation)
  • 900 using Bitmap
  • 900 using drawTriangles (no alpha)
  • 100 using drawRect (no alpha, scale or rotation)

These are the numbers of bunnies that were rendered at a consistent 60 FPS. As you can see from the notes above, some of the tests did not render the alpha, scale or rotation values of the benchmark.
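To make the gaps easier to read, the counts can be expressed relative to the best method on each target. The sketch below does only that arithmetic on the figures listed above. The function names are made up for illustration, and the ratios say nothing about the features each test skipped.

```python
# Bunny counts at a steady 60 FPS, copied from the lists above.
NATIVE = {
    "drawTiles": 35000,
    "drawTriangles": 10750,
    "drawRect": 9250,
    "Bitmap": 1750,
    "drawTiles (without batching)": 1750,
    "copyPixels": 700,
}
FLASH = {
    "copyPixels": 11750,
    "Bitmap": 900,
    "drawTriangles": 900,
    "drawRect": 100,
}


def relative_to_best(counts):
    """Return each method's count as a fraction of the best count."""
    best = max(counts.values())
    return {name: count / best for name, count in counts.items()}


def report(title, counts):
    """Print one line per method, best first."""
    print(title)
    shares = relative_to_best(counts)
    for name in sorted(counts, key=counts.get, reverse=True):
        print(f"  {name}: {counts[name]} bunnies ({shares[name]:.1%} of best)")


report("Native", NATIVE)
report("Flash", FLASH)
```

On these numbers, Bitmap on native reaches 5% of the drawTiles count, and drawRect in Flash reaches under 1% of the copyPixels count.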

Based on these tests, the drawTiles API is the clear performance winner for native. However, drawRect and drawTriangles both performed very well, though they lacked some rendering features.
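The difference between drawTiles with and without batching (35000 versus 1750 bunnies) is easier to picture with a small model. The sketch below is not NME code. DrawCallCounter is a hypothetical stand-in that only counts submissions, the flat [x, y, tileID] layout is an assumption about how the tile data is packed, and "without batching" is assumed to mean one draw call per bunny.

```python
class DrawCallCounter:
    """Hypothetical stand-in for a renderer; it only counts submissions."""

    def __init__(self):
        self.calls = 0
        self.values_submitted = 0

    def draw_tiles(self, tile_data):
        # One call models one submission to the renderer,
        # however many tiles the list carries.
        self.calls += 1
        self.values_submitted += len(tile_data)


def draw_unbatched(renderer, bunnies):
    """Submit every bunny in its own call (assumed meaning of 'without batching')."""
    for x, y, tile_id in bunnies:
        renderer.draw_tiles([x, y, tile_id])


def draw_batched(renderer, bunnies):
    """Pack every bunny into one flat list and submit it in a single call."""
    tile_data = []
    for x, y, tile_id in bunnies:
        tile_data.extend((x, y, tile_id))
    renderer.draw_tiles(tile_data)


# 1000 bunnies, all using tile 0 (an arbitrary choice for the example).
bunnies = [(float(i), float(i * 2), 0) for i in range(1000)]

unbatched = DrawCallCounter()
draw_unbatched(unbatched, bunnies)

batched = DrawCallCounter()
draw_batched(batched, bunnies)

print("unbatched calls:", unbatched.calls)
print("batched calls:", batched.calls)
```

Both paths hand over the same amount of data, but the batched path does it in one call instead of one call per bunny, and that per-call overhead is what this model is meant to illustrate.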

Using Bitmaps did not perform as well as the Graphics methods, but it still performed faster than in Flash.

If you are targeting Flash Player, the fastest software rendering method is copyPixels, unless you have a project that does not need a lot of instances on-screen. This also sacrifices features for performance. Bitmap may still be the faster method if you need alpha, scale or rotation.
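The recommendations above can be condensed into a small decision helper. This is only a sketch of the conclusions drawn from this one benchmark, not an NME API. The function name and its parameters are hypothetical, and it leaves out the case of projects with few instances on-screen.

```python
def suggest_method(target, needs_transforms):
    """Suggest a rendering method based on the benchmark's conclusions.

    target: "native" or "flash".
    needs_transforms: True if sprites need alpha, scale or rotation.
    """
    if target == "native":
        # drawTiles was the clear winner on native.
        return "drawTiles"
    if target == "flash":
        # copyPixels was fastest in Flash, but its test skipped alpha,
        # scale and rotation; Bitmap is the suggested fallback when
        # those features are needed.
        return "Bitmap" if needs_transforms else "copyPixels"
    raise ValueError(f"unknown target: {target!r}")


for target in ("native", "flash"):
    for needs_transforms in (False, True):
        print(target, needs_transforms, suggest_method(target, needs_transforms))
```

A real project would want to rerun the benchmark on its own target hardware before trusting a rule like this.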

  • Martamius

    Awesome. Thanks!

  • HalfDayDeemo

    Forgive these questions but I am trying to get my head around the rendering methods in NME.

    When you say ‘Bitmap’ above, do you mean using the Display List?
    Does the above list imply that drawTiles is not available for Flash?

    Cheers.

  • No problem!

    Yes, I should have been clearer. When I say that I’m testing “Bitmap”, I mean multiple Bitmap instances on the display list.

    “drawTiles” is an API that does not exist in Flash. NME provides a compatibility layer for Flash, which is currently based on either “drawRect” or “drawTriangles”, depending on whether certain features are used in the draw array. This is meant more for testing purposes than for best performance.

    Philippe Elsass has a “TileLayer” library which provides a mirroring of drawTiles using copyPixels instead, which should perform much faster.

  • HalfDayDeemo

    Thanks for the reply Joshua.

    I am currently trying to build a game that I really want to get ‘out there’, and obviously cross-platform is very handy for this. Using Haxe NME I really like being able to quickly build/test using the Flash target, but ultimately I want the game to be aimed at iOS/Android, so understanding these things is important.

    Would it be silly to use the Display List as a baseline for early stages of development? Moving to drawTiles should the need for optimisation occur?

  • That’s not silly at all. That is exactly how I would build my own games 🙂

  • HalfDayDeemo

    Brilliant. Thanks for your quick and informative replies. I would hope you might see me around a lot on the forums over the next month or so….yeah sorry in advance! 😀

  • Reppy Gibbs

    Thank you for this comparison.
    However, it would be nice to compare the drawTiles results to something more “low-level” like SFML. I know it’s too much work for little result, but I have no idea how it would compare; I’d like to know at least the order of magnitude (e.g.: SFML is [10/100/1000] times faster). Do you have any idea on that? I’m not asking you to program the thing in C++ with SFML, I’m just asking for a somewhat reliable guess. 🙂

    Thanks.

  • My guess is that this performs better than a casually written C++ project, but not quite as fast as one that is perfectly optimized, though it may be close. The renderer is written in C++ and OpenGL, and has been optimized quite a lot. All the application code becomes C++ as well and is compiled together. There is a little performance loss when converting data types, but almost all of the performance cost is going to be in the renderer, which is optimized C++, just as you might write for an SFML project.

    Just curious, do you have experience with SFML? Do you know how it compares to SDL? I wouldn’t expect a performance difference between the two, but I’ve heard recommendations for SFML, and wonder how it may handle our windowing differently or better than SDL does, currently.

  • Reppy Gibbs

    Thank you for your reply.

    Yeah, fortunately I’ve compared them once, and surprisingly SFML was way faster, consistently 200-300% faster (3-4x), depending a lot on the hardware. SFML is way better imho: it’s more modern, fully OpenGL based, more “high-level”, and the performance is also much better. Of course you could use the OpenGL API directly on SDL to get much better performance I think, but that won’t make much sense imho. I made that test a long time ago, and I wouldn’t be surprised if the latest version of SFML is even faster. There are benchmarks showing that SFML is 50x faster than SDL; that’s bullshit imho, because it’s hardly real-world stuff (unless you have a game that really needs thousands of alpha-blended sprites :)). Overall SFML is faster (not 50x), simpler and easier. I wouldn’t recommend SDL nowadays at all.

  • OpenGL over SDL is what we’re using now, but if SFML has better windowing or more features, it might be worth switching 🙂