Author Topic: INFO: Draw calls not geometry as a bottleneck in NWN.  (Read 2873 times)

Legacy_OldMansBeard

« Reply #45 on: April 26, 2014, 06:24:33 pm »


               

Okay, but that will have to wait while I install 3dsMax. I've been just doing stuff in notepad today.


 


Something I might try is using more complex tiles - perhaps streets of identical houses from tcn - and going up in chunk size 1x1, 2x2, 4x4, 8x8, 16x16 in a 16x16 area.


               
               

               
            

Legacy_OldMansBeard

« Reply #46 on: April 26, 2014, 08:01:32 pm »


               

I've taken the Barracks 2x2 group out of tcn and filled a 16x16 area with it.


 


Comparing:


(1) With the four individual tiles in the group, just with shadows & tilefade turned off, meshes compacted by bitmap and compiled.


(2) After moving all the trimeshes onto one of the tiles & compacting it by bitmap again, leaving the other three tiles with just walkmesh & lights.


 


The number of trimesh nodes on the four tiles was 72 in case (1), reducing to 22 in case (2). In other words, the group as a whole uses 22 distinct bitmaps.
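Compacting by bitmap, as described above, amounts to merging every trimesh that shares a texture into a single node, so the node count falls to the number of distinct bitmaps. A minimal sketch of that idea follows. The `Trimesh` class and `compact_by_bitmap` function are hypothetical names for illustration, not part of any NWN tool, and a real merge would also have to re-index vertices and bring the meshes into a common coordinate frame.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Trimesh:
    """Hypothetical stand-in for a trimesh node: one bitmap plus a face list."""
    bitmap: str
    faces: list = field(default_factory=list)


def compact_by_bitmap(meshes):
    """Merge all trimeshes that share a bitmap into one trimesh per bitmap."""
    merged = defaultdict(list)
    for mesh in meshes:
        merged[mesh.bitmap].extend(mesh.faces)
    return [Trimesh(bitmap, faces) for bitmap, faces in merged.items()]


# Toy data (not the Barracks tiles): four nodes using two bitmaps.
nodes = [
    Trimesh("stone", [(0, 1, 2)]),
    Trimesh("wood", [(3, 4, 5)]),
    Trimesh("stone", [(6, 7, 8), (9, 10, 11)]),
    Trimesh("wood", [(12, 13, 14)]),
]
compacted = compact_by_bitmap(nodes)
print(len(nodes), "nodes ->", len(compacted), "nodes")
```

The face total is unchanged by the merge; only the number of nodes drops.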


 


Results: no difference. 112/180 fps in both cases.


 


So 2x2 chunking a moderately complex tile group is neutral with regard to performance.



               
               

               
            

Legacy_OldMansBeard

« Reply #47 on: April 26, 2014, 08:38:53 pm »


               

A thought - suppose that what matters is the combined polycount of all the distinct meshes that intersect the field of view of the camera.


 


Up to a point, chunking would make no difference; the same polys would be rendered regardless of which meshes they are part of. But if you make the chunks too big, the engine will be pipelining polys that are well out of the field of view, just because they are part of the same mesh that is partly in view. If there is a poly in view that is rendered with a particular bitmap, all the polys with that bitmap in the whole area would get rendered, even the ones that are completely out of view. So it would get worse.


 


With the Barracks group, the field of view looking straight down at widest zoom is about one group. If the theory is right, I would expect to see progressive degradation as I successively double the chunk sizes of the barracks tiles.
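The hypothesis can be turned into a toy model: if a chunk is drawn in full whenever any part of it overlaps the view, the number of polys submitted grows with chunk size even though the view stays the same. The sketch below is only a model of that theory, not of how the Aurora engine actually culls. The function name, the rectangular view, and the uniform `polys_per_tile` are all assumptions made for illustration.

```python
def polys_submitted(chunk, view_x, view_y, view_size, polys_per_tile=1, area=16):
    """Polys sent to the pipeline if a chunk is drawn whole whenever any part
    of it overlaps the square view. All positions and sizes are in tiles."""
    total = 0
    for cx in range(0, area, chunk):
        for cy in range(0, area, chunk):
            overlaps_x = cx < view_x + view_size and view_x < cx + chunk
            overlaps_y = cy < view_y + view_size and view_y < cy + chunk
            if overlaps_x and overlaps_y:
                total += chunk * chunk * polys_per_tile
    return total


# A view covering one 2x2 group, positioned on a group boundary.
for chunk in (1, 2, 4, 8, 16):
    print(f"{chunk}x{chunk} chunks:", polys_submitted(chunk, 6, 6, 2))
```

In this model the 1x1 and 2x2 cases submit the same number of polys, and the count rises from 4x4 onward. A view that straddles chunk boundaries would pull in more chunks.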



               
               

               
            

Legacy_MerricksDad

« Reply #48 on: April 26, 2014, 08:44:57 pm »


               

I'm still going with the fact that there is a huge fundamental difference between how the Aurora engine renders stuff and how other engines, including the toolset, render stuff. But I totally agree with all the testing you guys are doing.



               
               

               
            

Legacy_OldMansBeard

« Reply #49 on: April 27, 2014, 04:20:22 pm »


               






 


Next measurement: 4x4 chunks, 101/180 fps. Performance is starting to drop off.


               
               

               
            

Legacy_OldTimeRadio

« Reply #50 on: April 27, 2014, 05:41:08 pm »


               

From looking over your methodology, I can't find anything which jumps out as flawed, but...


 


Since I don't have much time and I want to make sure I'm understanding your observations, consider the following:


 


A. A normal 10x10 area where each tile is composed of one flat plane (1,000cmx1,000cm) and one simple cube.  And, of course, a walkmesh.  "Cube" in this sense would be 2 meter squared and above the flat plane (the ground) some distance.


 


B. A 10x10 group where 9 tiles contain only walkmesh and one tile has a single uncut plane of 10,000cm x 10,000cm parented to it and also 10 cubes, which have been Attached in Max to be 1 mesh and which are also parented to it.


 


Based on your observations so far, which of those is preferable for optimal performance?


 


Are you using gDEBugger to look at this?  Any consolidated meshes should "pop" into view completely for each draw call as confirmation of consolidation.



               
               

               
            

Legacy_OldMansBeard

« Reply #51 on: April 27, 2014, 05:53:23 pm »


               






I guess you mean 100 cubes in B, a 10x10 array of cubes, one for each original tile.


Based on the limited cases I've tested so far, I would expect your case B to be marginally worse than A.


 


I'm going to do the 8x8 barracks next, then finally the whole 16x16. It's a bit slow because I'm checking the models in notepad as I go, as well as by visual inspection, to make sure the meshes are what they are supposed to be and I'm not overlooking anything.


               
               

               
            

Legacy_rjshae

« Reply #52 on: April 27, 2014, 06:23:20 pm »


               

I agree with the OP. My understanding is that it is better to use a single merged texture file, or as few as possible, in order to reduce draw calls. Just by looking at the texture files used in NWN2 placeables, you can tell the NWN2 developers took a lot of effort to do this. It's inconvenient in terms of wrapping the UV map, but I'm trying to make more of an effort to follow their lead.


 


My $.02 worth.



               
               

               
            

Legacy_OldMansBeard

« Reply #53 on: April 27, 2014, 07:04:11 pm »


               

Next datum point: 8x8 chunked Barracks: 69/160 fps. Definitely degrading. To summarise:



1x1 = 112/180 fps
2x2 = 112/180 fps
4x4 = 101/180 fps
8x8 =  69/160 fps

The 8x8 chunk is a 2.5 Mbyte model when compiled, with about 25k polys total. The largest single trimesh in it is 5,664 polys, bitmapped with tcn_stone20. In NWN1 terms, this is a heavy model. Even though there are only four models in the whole area, typically only one of them is in camera shot at a time, so it's the weight of that model that dominates performance.


 


On the basis of these tests, I'm concluding that in NWN1, chunking tile groups to consolidate meshes across bitmaps (to save draw calls) doesn't improve performance because you end up with heavier models with too many polys, which pulls the performance down by more than you gain.


 


This is based on observing frame-rate in-game with FRAPS.



               
               

               
            

Legacy_OldMansBeard

« Reply #54 on: April 28, 2014, 10:15:06 am »


               

I can't do the 16x16 test. A bit disappointing, but the model exceeds the theoretical capacity of the game engine. It goes like this:


  1. After consolidating the four tiles in the Barracks2x2 group, the largest trimesh in the model is the one bitmapped with tcn01_stone20 and that has 354 faces (81+33 from tcn01_s19_01, 72+22 from tcn01_s20_01, 32+14 from tcn01_t19_01 and 82+18 from tcn01_t20_01).

  2. It takes 64 such groups to fill a 16x16 area, so after making a single model for the whole area and consolidating trimeshes by bitmap, we get a single trimesh bitmapped with tcn01_stone20, with 64x354 = 22656 faces.

  3. When a model is compiled or loaded, the tverts in each trimesh are exploded so that each face gets its own 3 tverts. So we would have a trimesh with 3x22656 = 67968 tverts.

  4. In the game engine, tverts are indexed by 16-bit unsigned integers, so you can't have more than 65536 of them in a single trimesh.

  5. 67968 > 65536.

  6. Bang!
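The arithmetic in the steps above can be checked with a short script. The face counts and the 65536 limit are taken from the post. The function name `exploded_tverts` is hypothetical, and the script assumes the area is tiled entirely with the Barracks 2x2 group, as in the test.

```python
MAX_TVERTS = 2 ** 16  # 65536, the limit implied by 16-bit unsigned indices

# Faces in the tcn01_stone20 trimesh, summed over the four tiles of the group.
FACES_PER_GROUP = (81 + 33) + (72 + 22) + (32 + 14) + (82 + 18)  # 354


def exploded_tverts(tiles_x, tiles_y):
    """Tverts in the consolidated trimesh once each face gets its own 3 tverts."""
    groups = (tiles_x // 2) * (tiles_y // 2)  # each group covers 2x2 tiles
    return 3 * groups * FACES_PER_GROUP


for size in ((8, 8), (16, 16)):
    tverts = exploded_tverts(*size)
    verdict = "fits" if tverts <= MAX_TVERTS else "exceeds the limit"
    print(size, tverts, verdict)
```

The 8x8 case corresponds to the 5,664-face trimesh reported earlier in the thread and stays well under the limit; the 16x16 case does not.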


               
               

               
            

Legacy_Carcerian

« Reply #55 on: April 28, 2014, 10:32:05 am »


               


 



 




 


[image: world-exploding-o.gif]


 


Doh!



               
               

               
            

Legacy_OldMansBeard

« Reply #56 on: April 28, 2014, 05:53:58 pm »


               

Just a rider on that 16x16 attempt:


 


The ascii model is about 4.6MB


  • NWNExplorer displays it correctly

  • The BW compiler can't compile it and drops out reporting a "Major Error" in the model.

  • The toolset crashes if it tries to load it

  • The game itself crashes if it tries to load it

nwnmdlcomp can compile it, without exploding the tverts


The resulting (nwnmdlcomp) compiled model is about 9.6MB


  • It displays correctly in NWNExplorer

  • The toolset crashes if it tries to load it

  • The game itself crashes if it tries to load it

Moral: don't consolidate meshes beyond 21845 faces.


I think I might build that into CM3.
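A guard of this kind could be sketched as follows. This is not CM3's code. The `split_faces` helper is hypothetical and simply cuts an oversized face list into pieces that each stay within 65536 // 3 = 21845 faces, so that three tverts per face never exceed the 16-bit index range.

```python
MAX_TVERTS = 2 ** 16          # limit implied by 16-bit unsigned tvert indices
MAX_FACES = MAX_TVERTS // 3   # 21845 faces once each face has its own 3 tverts


def split_faces(faces, limit=MAX_FACES):
    """Split one face list into consecutive pieces of at most `limit` faces."""
    return [faces[i:i + limit] for i in range(0, len(faces), limit)]


# The 16x16 consolidated trimesh from the test had 22656 faces.
parts = split_faces(list(range(22656)))
print([len(part) for part in parts])
```

Splitting costs one extra trimesh per overflow, which is a small price next to a model that cannot be loaded at all.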


               
               

               
            

Legacy_Zarathustra217

« Reply #57 on: April 28, 2014, 08:16:57 pm »


               

I believe the number of drawcalls taxes the CPU rather than the GPU, meaning you would only notice the difference in situations where the CPU is the bottleneck.


 


I'm honestly a bit sceptical that it's the actual drawcalls that cause performance issues - you'd need quite a large number for that - but it could perhaps be related to some other function/feature that's called in relation to rendering each sub-mesh (and thus in conjunction with each drawcall).


 


An interesting test could be to measure when the number of meshes starts to become an issue. Ideally, such a test would also be run in a way that frequently changes texture between the meshes, to give something more comparable to actual everyday use - but setting that up may be a bit time consuming.



               
               

               
            

Legacy_Bannor Bloodfist

« Reply #58 on: April 28, 2014, 09:49:22 pm »


               


 



 




I heard that bang all the way over here, across that little thing they call a pond...


 


NWN is great, for its day, but there were all sorts of limitations, and even though we have pushed those limits far beyond what BioWare originally thought possible, we are still working with an ancient (in computer parlance) engine, one that truly does not utilize the power of our computer systems.


 


This whole experiment was an interesting thing to investigate, but we still end up with the results we started with: an engine that cannot handle high poly objects of the sort that newer games can handle, an engine that does not use the texture abilities provided by all of that wonderful new hardware that we all currently use, etc.  However, having said all of that, I believe that NWN still provides much more power to a potential world builder than any other game out there, even today.  No other engine allows a single person to build with the speed and power that NWN allows.


 


Yeah, we are a bit limited in some of the graphics capabilities that some of the new games offer, as in things like Skyrim, however, I think more people have built useful and interesting worlds with NWN than anything I have actually played or seen with Skyrim.  Given that the graphics are a bit dated, and the landscape building options are limited due to the tile based system, we still have MUCH more ability with NWN than the other games I have investigated.  Granting of course, that I have NOT tried everything, nor will I, unless someone points me to something that will actually blow my socks off.


 


From what playing around I have done in Skyrim and its more powerful graphical engine, I have found it to be MUCH more difficult to actually get working models into the game, and have found no real way to build terrain systems for it similar to the way I have built tilesets for NWN.  I am not saying it is not possible, just that it is an extremely time and brain power consuming process, and the tools required to do even simple things make it nearly worthless to me personally.  Why anyone would wish to have to use 5 or 6 different tools just to get ONE completely NEW item into the game is not something that I can fathom.


 


Thank you for your efforts OMB, and I am sure you enjoyed trying to achieve it as well, since I know how much you love attacking mathematical problems looking for better solutions.


 


P.S.  If anyone has a game or even an engine that they think might surprise me or be something that I would enjoy working with, PLEASE let me know!


               
               

               
            

Legacy_OldMansBeard

« Reply #59 on: April 29, 2014, 01:24:27 pm »


               

I've been able to do 16x8 without breaking the engine but that's the limit. No real surprise, the downward trend continued.



 1x1 = 112/180 fps
 2x2 = 112/180 fps
 4x4 = 101/180 fps
 8x8 =  69/160 fps
16x8 =  64/120 fps
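To put the trend in relative terms, the table can be expressed as a percentage change against the 1x1 baseline. The snippet below is only a convenience calculation on the figures above. It treats the two numbers in each row as a pair exactly as reported and makes no assumption about what each of them represents.

```python
# Frame rates exactly as reported in the table, keyed by chunk size.
measurements = {
    "1x1": (112, 180),
    "2x2": (112, 180),
    "4x4": (101, 180),
    "8x8": (69, 160),
    "16x8": (64, 120),
}

baseline = measurements["1x1"]

# Percentage change of each figure relative to the 1x1 case.
changes = {
    chunk: tuple(round(100 * (value - base) / base, 1)
                 for value, base in zip(pair, baseline))
    for chunk, pair in measurements.items()
}

for chunk, change in changes.items():
    print(f"{chunk:>4}: {change[0]:+.1f}% / {change[1]:+.1f}%")
```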


It's all rather disappointing, really. I was quite looking forward to writing some wizard software for automagically baking areas into single models. But, there we are. As an aged physicist, I'm used to finding that real measurements don't support the theories that one would like to be true.


 


*OMB goes back to sleep for 100 years*