Author Topic: INFO: Draw calls not geometry as a bottleneck in NWN.  (Read 2876 times)

Legacy_virusman

INFO: Draw calls not geometry as a bottleneck in NWN.
« Reply #105 on: May 06, 2014, 06:50:10 pm »


               

If only we had the headers... Switching from immediate mode to VBOs and replacing shadows with shaders would probably boost the performance a lot.



               
               

               
            

Legacy_Zwerkules

« Reply #106 on: May 06, 2014, 08:20:51 pm »


               


Turn off shadows




I wish I could have liked this post more than once!


               
               

               
            

Legacy_Zarathustra217

« Reply #107 on: May 07, 2014, 09:52:59 am »


               

The end users can themselves switch off the shadows if they have performance issues, so I don't see any reason to do that on the model. On the other hand, it is a viable solution to disable shadows on complex meshes and then make simpler invisible meshes cast the shadows instead.



               
               

               
            

Legacy_OldMansBeard

« Reply #108 on: May 07, 2014, 01:59:13 pm »


               


The end users can themselves switch off the shadows if they have performance issues, so I don't see any reason to do that on the model. On the other hand, it is a viable solution to disable shadows on complex meshes and then make simpler invisible meshes cast the shadows instead.




Okay, if you are making complex meshes, shadows are only half the problem. Fair point. Rendering them is the other half.


               
               

               
            

Legacy_OldTimeRadio

« Reply #109 on: May 07, 2014, 07:31:18 pm »


               

I had some time open up, so I tried to reproduce this.  I installed FRAPS and got an on-screen FPS display without having to configure anything.


 


I tried the first test module (here, file is "File bundle: module + hak + uncompiled tile models") in the game, using the instructions here, which includes starting out staring at one's feet then zooming out as much as possible.  My feet shots look like this, my zoomed ones look like this


Q: OldMansBeard, are those shots representative of what you see when testing?


  


Area     My results (zoomed out/zoomed in)    OldMansBeard's results
-----    ---------------------------------    ----------------------
1x1      148/172                              112/180 fps
2x2      151/172                              112/180 fps
4x4      132/168                              101/180 fps
8x8       83/160                               69/160 fps


 


These kinds of numbers seem close enough to say I've reproduced what you saw.  But there's a problem: actually walking around in any of the test areas, including the 1x1, brings on horrible lag instantly.  Something's not right, but I can't tell what it is.  I gave your models and walkmesh a cursory inspection and nothing stood out as being improper.


 


Q: Can you or anyone else reproduce this lag and help figure out where it's coming from?  I'm getting it just trying to walk anywhere.  I'm not sure whether that would affect the FPS tests or not.  It only seems to happen when I'm moving so my guess is not but it's still unexpected behavior.  For anyone who's interested: FRAPS can be downloaded here.  Just install it and you'll get the FPS overlay without having to change anything else.  Test module can be downloaded here, and I'm using the file called "File bundle: module + hak + uncompiled tile models".


 


Interestingly, using that same test module in the toolset, zooming all the way out there was a marked increase in FPS for consolidated tiles:


Area1x1  - 28 FPS (non-optimized, all the rest are optimized)


Area2x2  - 50 FPS


Area4x4  - 83  FPS


Area8x8  - 76 FPS


Area16x8 - 70 FPS


 


So at least up to a point (4x4), it seems to increase the FPS quite a bit. 


 


Q: Can anyone reproduce that in the toolset?


 


Q: Anyone have opinions on what the difference is?


 


------------


 


At least on the geometry (and probably texture) front, check out these scenes and the FPS I'm getting in each with the same settings on my client- namely shadows, shiny water and VSYNC off:


 


[Four screenshots with FPS overlays: xEA43lc.jpg, BG9N7Zj.jpg, 0Ij2mRJ.jpg, IRSE1wu.jpg]


 


If anyone wants to actually compare their FPS in these scenes, FRAPS can be downloaded here, and here is the module I just whipped together that the above screenshots are from.  You'll need the hak in this archive to play it.  I can absolutely understand how I can get 55 FPS in Megaton compared to the 122 FPS in Rural.  Harder to explain is why I only get 5 more FPS in Tropical and 10 fewer in TNO from those vantages.


 


Q (to anyone): So what is going on to either allow Megaton to run so smoothly or TNO to run so slowly?  I'll take either answer as long as it's something I can reproduce.  If you use NWN Explorer Reborn 1.63, with the option of "Outline Polygons in Model Meshes" turned on, you can see the raw geometry that's at play in the Megaton hak.  And the ~85 insanely hi-res (many alpha) textures involved.



               
               

               
            

Legacy_OldTimeRadio

« Reply #110 on: May 07, 2014, 07:41:55 pm »


               


The end users can themselves switch off the shadows if they have performance issues, so I don't see any reason to do that on the model. On the other hand, it is a viable solution to disable shadows on complex meshes and then make simpler invisible meshes cast the shadows instead.




 


Also, if an area maker wanted to be sneaky, I believe that a custom environment in environment.2da could turn shadows off, even if the meshes had shadows enabled and the client viewing them had shadows turned on.  Re: LIGHT_SHADOWS and DARK_SHADOWS.  I believe the description for DARK_SHADOWS is a typo and that it's for night, not day, which is what LIGHT_SHADOWS is for.  YMMV, but I think that's right.


               
               

               
            

Legacy_OldTimeRadio

« Reply #111 on: May 07, 2014, 07:52:29 pm »


               


If only we had the headers... Switching from immediate mode to VBOs and replacing shadows with shaders would probably boost the performance a lot.




 


It's one of the first things I'm going to be asking Overhaul for if they ever work their magic on this game.  Trent & Cam are destined/cursed to revisit NWN.  After MDK2 HD and Baldur's Enhanced, it's the natural "next step".


 


Relevant from the Omnibus, from RTrifts, thread "Emitters & CPU usage":

 



It's a little more complicated than that though.

Emitters and creatures are inherently not optimized for the way that OpenGL works with NWN. (Tilesets on the other hand, are optimized much more tightly, so one polygon in a tile is less than one polygon in a creature or on an emitter in game engine load terms, even if both are animated).  The current vertex pool code in NWN is designed around the old Vertex Array Range (VAR) style of rendering, where the application would be forced to do all memory management for geometry data. A little over a year ago, the Vertex Buffer Object (VBO) extension was introduced to OpenGL, which moved the burden of memory management back into the driver. This is what Bio uses in KotOR in the Odyssey engine, but Aura is still VAR style.  If VBO was implemented in NWN, it would permit creatures and emitters or even entire tilesets to be packaged up as objects. That would mean much less copying of data, being able to download static geometry to video memory for much faster rendering, etc... That's what KotOR has - but we don't have it in NWN.  Anyways - long story short: a polygon is not a polygon when you are comparing tiles vs creatures & emitters.

So sayeth roboius, and on stuff like this, I salute him smartly.



 


Examination with gDEBugger shows that at least some types of emitters are optimized (multiple particles being drawn in a single pass), but his overall point holds.



               
               

               
            

Legacy_virusman

« Reply #112 on: May 07, 2014, 09:07:43 pm »


               


It's one of the first things I'm going to be asking Overhaul for if they ever work their magic on this game.  Trent & Cam are destined/cursed to revisit NWN.  After MDK2 HD and Baldur's Enhanced, it's the natural "next step".


 


Relevant from the Omnibus, from RTrifts, thread "Emitters & CPU usage":

 


 


Examination with gDEBugger shows that at least some types of emitters are optimized (multiple particles being drawn in a single pass), but his overall point holds.




Unfortunately, NWN:EE is highly unlikely: Trent Oster has said so multiple times on Twitter: https://twitter.com/...385563095814144


Yes, some things are batched, but the majority of the calls are single vertices - you can see that in gDEBugger detailed view. Without headers, I can't easily tell what types of meshes are batched.



               
               

               
            

Legacy_OldMansBeard

« Reply #113 on: May 07, 2014, 09:31:46 pm »


               

I had some time open up, so I tried to reproduce this.  I installed FRAPS and got an on-screen FPS display without having to configure anything.

 

I tried the first test module (here, file is "File bundle: module + hak + uncompiled tile models") in the game, using the instructions here, which includes starting out staring at one's feet then zooming out as much as possible.  My feet shots look like this, my zoomed ones look like this
Q: OldMansBeard, are those shots representative of what you see when testing?



 


Yes, those screenshots are right. You are doing the same tests. Only slight difference is that I set creature shadows to high, which, with an invisible PC, gives no shadow at all at your feet rather than the blurred circle of simple shadow. But it doesn't affect the figures much.


 



Area     My results (zoomed out/zoomed in)    OldMansBeard's results
-----    ---------------------------------    ----------------------
1x1      148/172                              112/180 fps
2x2      151/172                              112/180 fps
4x4      132/168                              101/180 fps
8x8       83/160                               69/160 fps

 

These kinds of numbers seem close enough to say I've reproduced what you saw.  But there's a problem: actually walking around in any of the test areas, including the 1x1, brings on horrible lag instantly.  Something's not right, but I can't tell what it is.  I gave your models and walkmesh a cursory inspection and nothing stood out as being improper.



 


That's interesting. I didn't get that. I found fps drops a bit when moving but not hugely. Just a thought - try the same thing with my V2 upload. Does it happen in the same way? If it does or doesn't, that might give us a clue.


 




At least on the geometry (and probably texture) front, check out these scenes and the FPS I'm getting in each with the same settings on my client- namely shadows, shiny water and VSYNC off:

 
[Four screenshots with FPS overlays: xEA43lc.jpg, BG9N7Zj.jpg, 0Ij2mRJ.jpg, IRSE1wu.jpg]

 

If anyone wants to actually compare their FPS in these scenes, FRAPS can be downloaded here, and here is the module I just whipped together that the above screenshots are from.  You'll need the hak in this archive to play it.  I can absolutely understand how I can get 55 FPS in Megaton compared to the 122 FPS in Rural.  Harder to explain is why I only get 5 more FPS in Tropical and 10 fewer in TNO from those vantages.

 
Q (to anyone): So what is going on to either allow Megaton to run so smoothly or TNO to run so slowly?  I'll take either answer as long as it's something I can reproduce.  If you use NWN Explorer Reborn 1.63, with the option of "Outline Polygons in Model Meshes" turned on, you can see the raw geometry that's at play in the Megaton hak.  And the ~85 insanely hi-res (many alpha) textures involved.



 


Just looking at a small sample, TNO01 tiles seem to have about 5x the poly count of the TCN01 ones. That might account for some of the lag.


               
               

               
            

Legacy_OldTimeRadio

« Reply #114 on: May 07, 2014, 10:50:21 pm »


               

@Virusman - In the immediate future, like this year or 2015?  I agree with you.  But I think it's something they want to do if they can.  If KotOR can be ported to the iPad 2, it seems the technical hurdles are not as insurmountable as they might seem to bring NWN to mobile.  I do hate that the MMO might in any way slow down the possibility of Overhaul touching it, though.


 




Yes, some things are batched, but the majority of the calls are single vertices - you can see that in gDEBugger detailed view. Without headers, I can't easily tell what types of meshes are batched.




 


Not single vertices but single mesh nodes.  I know you probably know this but I don't want others to get confused.  Here's one of the ~200 calls on the Spelljammer.  I'm assuming the number 33699 after GL_TRIANGLES really means 33k verts and my guess is it's only around 11,000 actual triangles, though.  If you have an ATI card, AMD acquired the code for gDEBugger and continued the project as CodeXL.  I'm still getting familiar with it, but it's very similar to gDEBugger.
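To spell out the arithmetic behind that guess: with GL_TRIANGLES, every three vertices form one independent triangle, so if 33699 is the vertex count passed to the draw call (an assumption on my part, based on how the count argument normally works), that is 11233 triangles. A minimal Python sketch; the function name is made up for illustration:

```python
def triangles_from_vertex_count(vertex_count: int) -> int:
    """Triangles drawn by one GL_TRIANGLES call given its vertex count."""
    # GL_TRIANGLES consumes vertices in independent groups of three;
    # any leftover vertices that do not complete a triangle are ignored.
    return vertex_count // 3


# 33699 is the number shown after GL_TRIANGLES in the gDEBugger call log.
print(triangles_from_vertex_count(33699))
```

Multiply that by the ~200 calls only with care: each call can carry a very different vertex count, so this is a per-call figure, not a scene total.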


 


Out of curiosity, Virusman, I am assuming that the headers are something, like the symbols file, that either falls in your lap or is never accessible at all?  Could something like that be reverse-engineered from what I think is a sort of an older "debug" build of the engine?  Because the Bioware model viewer (can also download here if that link doesn't work) has a lot more that's...um...I guess "viewable", strings-wise at least, than anything else I know of.  For a time, I had access to IDA Pro 5.5 with HexRays and hoped to use the model viewer's weakness to my advantage but still couldn't learn all that much, sadly.


 


@OldMansBeard -


Screenshots: Good.  Thank you again for all  the observations.


 


Walklag: Yep, even on the v2.  Just did some testing because this was just too odd.  Turns out it's a function (at least on my end) of your choice of appearances.  809 is an invisible human male at 10% size.  For some reason this is what was triggering the ungodly lag.  Oddly, 298 (which I usually use for invisible) didn't have this problem nor did other creature appearances (I tried Formian, Basilisk, et al.)  Seems like it has less of an effect as I get closer to 100% size and no lag when it's close to, at or over 100% size.  This game is so much "What the hell?", sometimes.  I was thinking this might be some bizarre graphical thing but maybe the bad behavior is from my PC's movement or anims trying to play through that 10%-sized guy.


 


Diff between TNO and TCN: Maybe.  It makes sense between the two of them but I don't get the difference between TNO and Megaton, if it's based on the heaviness of geometry, anyway.



               
               

               
            

Legacy_OldMansBeard

« Reply #115 on: May 08, 2014, 07:12:50 am »


               

@OldMansBeard -
Screenshots: Good.  Thank you again for all  the observations.

 
Walklag: Yep, even on the v2.  Just did some testing because this was just too odd.  Turns out it's a function (at least on my end) of your choice of appearances.  809 is an invisible human male at 10% size.  For some reason this is what was triggering the ungodly lag.  Oddly, 298 (which I usually use for invisible) didn't have this problem nor did other creature appearances (I tried Formian, Basilisk, et al.)  Seems like it has less of an effect as I get closer to 100% size and no lag when it's close to, at or over 100% size.  This game is so much "What the hell?", sometimes.  I was thinking this might be some bizarre graphical thing but maybe the bad behavior is from my PC's movement or anims trying to play through that 10%-sized guy.




That's bizarre. Glad you bottomed it - I would never have thought of that.


It looks like your results generally are the same as mine - different numbers but same trend (or lack of it) - and that's reassuring. The machine I was doing the tests on is unavailable for the next few weeks and I'm reduced to using a puny notebook, so I can't do any other confirmations for a while, but it looks like you are set to go with all this.


 


(edit - added)


 


I now have access to a laptop with integrated graphics that runs NWN about as fast as my desktop. I've re-run the V2 tests on it and something interesting has come up:



Chunk    Camera    Camera    Camera
 Size      A         B         C
-----    ------    ------    ------
 1x1      184       153        36
 2x2      185       155        50
 4x4      181       139        61
 8x8      176        88        62
16x8      176        89        62

The first two columns (zoom-in/zoom-out) are no surprise, but look at column C (camera height 200.0, looking down on the whole area): performance improves with consolidation!


 


So different hardware may give different results qualitatively.
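FPS is awkward to compare directly, so here is a rough sketch that converts the laptop figures above into milliseconds per frame and a speedup relative to the unconsolidated 1x1 case. The numbers are copied from the table; the helper names are just for illustration:

```python
# FPS per chunk size and camera position, copied from the laptop table above.
FPS_BY_CHUNK = {
    "1x1": {"A": 184, "B": 153, "C": 36},
    "2x2": {"A": 185, "B": 155, "C": 50},
    "4x4": {"A": 181, "B": 139, "C": 61},
    "8x8": {"A": 176, "B": 88, "C": 62},
    "16x8": {"A": 176, "B": 89, "C": 62},
}


def frame_time_ms(fps: float) -> float:
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def speedup(baseline_fps: float, fps: float) -> float:
    """How many times faster than the baseline (values below 1.0 are slower)."""
    return fps / baseline_fps


for camera in ("A", "B", "C"):
    baseline = FPS_BY_CHUNK["1x1"][camera]
    for chunk, cams in FPS_BY_CHUNK.items():
        fps = cams[camera]
        print(f"camera {camera} {chunk:>4}: {frame_time_ms(fps):5.1f} ms/frame, "
              f"{speedup(baseline, fps):.2f}x vs 1x1")
```

On camera C the frame time falls from about 27.8 ms at 1x1 to about 16.1 ms at 8x8, while on camera B the larger chunks come out slower than 1x1.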


 


My original tests were on a Q6600 (4 x 2.4GHz) desktop running Win 8.1 in 6GB of ram, with a GT610 graphics card (2GB DDR3).


 


These tests are on a Core2 Duo (2 x 2.4GHz) laptop running Win 7 in 4GB, with a Mobility Radeon HD4530 (512MB) card.



               
               

               
            

Legacy_Zarathustra217

« Reply #116 on: May 12, 2014, 09:57:55 am »


               

I figure one useful thing we can derive from these numbers is that it's perfectly fine to make 2x2 tilegroups consolidated. At times this could reduce the work you have to do, as well as resulting in fewer polys, since you don't have to split up faces that cross the inner tile edges.
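To put rough numbers on what consolidation buys, here is a small sketch that counts how many separate tile models an area needs at each chunk size. It assumes a 16x8-tile area, which is only a guess from the largest chunk size in the tests, and the function name is made up. Each model still contains several mesh nodes, so these are model counts, not draw-call counts:

```python
import math


def chunk_count(area_w: int, area_h: int, chunk_w: int, chunk_h: int) -> int:
    """Number of consolidated models needed to cover an area measured in tiles."""
    # Round up so a partial chunk at the edge still counts as one model.
    return math.ceil(area_w / chunk_w) * math.ceil(area_h / chunk_h)


# Hypothetical 16x8-tile area (assumption, not stated in the thread).
AREA_W, AREA_H = 16, 8

for chunk_w, chunk_h in [(1, 1), (2, 2), (4, 4), (8, 8), (16, 8)]:
    count = chunk_count(AREA_W, AREA_H, chunk_w, chunk_h)
    print(f"{chunk_w}x{chunk_h}: {count} models")
```

Going from 1x1 to 2x2 already cuts the model count to a quarter, and the absolute savings shrink quickly after that.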



               
               

               
            

Legacy_OldMansBeard

« Reply #117 on: May 13, 2014, 09:56:57 pm »


               

I've been doing some more tests, using the TNO01 house 1 2x2 group, this time with the camera looking horizontally along a street of identical houses. I started 5m from the southern edge looking north and stepped in 10m intervals up to 155m, at which point the camera is looking at the northern sky with no houses visible, measuring the frame rate at each step. The difference in view between each step and the next is the number of houses visible in the far distance.


 


What the numbers are telling me is that consolidation up to 2x2 does improve frame rate for distant tiles (more than about 40m away) but not for tiles that are close to the camera; on those it has no effect. If this is right, then performance in large areas with long straight streets can be improved by consolidation, but in areas with short streets with buildings across the ends, it doesn't help.


 


This is quite surprising. I will do some more experimentation.


 


(edit -added) That's with fog turned off (fog distance set to 4500). With fog set to the default 45m, frame rate doesn't change until you get right to the end of the street because the number of houses in view is limited by fog anyway, and consolidation has no effect.



               
               

               
            

Legacy_virusman

« Reply #118 on: May 14, 2014, 08:23:26 pm »


               


Not single vertices but single mesh nodes.  I know you probably know this but I don't want others to get confused.  Here's one of the ~200 calls on the Spelljammer.  I'm assuming the number 33699 after GL_TRIANGLES really means 33k verts and my guess is it's only around 11,000 actual triangles, though.  If you have an ATI card, AMD acquired the code for gDEBugger and continued the project as CodeXL.  I'm still getting familiar with it, but it's very similar to gDEBugger.


 


Out of curiosity, Virusman, I am assuming that the headers are something, like the symbols file, that either falls in your lap or is never accessible at all?  Could something like that be reverse-engineered from what I think is a sort of an older "debug" build of the engine?  Because the Bioware model viewer (can also download here if that link doesn't work) has a lot more that's...um...I guess "viewable", strings-wise at least, than anything else I know of.  For a time, I had access to IDA Pro 5.5 with HexRays and hoped to use the model viewer's weakness to my advantage but still couldn't learn all that much, sadly.




I already have the symbols file (under non-distribution terms though), but that doesn't contain typeinfo (objects' memory layout) and other stuff, so it'd still take a lot of time to figure out how the engine works without headers. Mapping memory structures isn't easy.