Author Topic: Hashsets vs Lists  (Read 631 times)

Legacy_Kato -

  • Hero Member
  • *****
  • Posts: 747
  • Karma: +0/-0
Hashsets vs Lists
« on: September 01, 2012, 06:16:39 pm »


               Hi,

I'm wondering which is the most efficient between hashsets and lists, since they both seem pretty similar in terms of functionality. Up to now, I've been using Paul Speed's list system(pg_lists_i), yet I've seen the nwnx hashset plugin, not mentioning I could have missed an even better system out there. Almost all my custom systems use lists, except when a struct can do the job(when elements do not need to be retrieved by index or iterated over), so needless to say an efficient system would be more than welcome.

Thanks for any advice!


Kato 
               
               

               


                     Modifié par Kato_Yang, 01 septembre 2012 - 05:22 .
                     
                  


            

Legacy_Shadooow

  • Hero Member
  • *****
  • Posts: 7698
  • Karma: +0/-0
Hashsets vs Lists
« Reply #1 on: September 01, 2012, 07:09:29 pm »


               Im also usings lists, my own script as i didnt find this functionality on vault in the time when I was starting with nwn. Never considered switching to nwnx_hashset as thats different structure and if I understand it has different usage than list (or in other words i wont pretend I understood it at all ':lol:')

Also I considered the need for nwnx to this system work quite unsatisfactory as I prefer to be able to test as many things as possible just with F9.

Yet it is true that the "list" in nwn terms is really not very effective. All we can do in NWN is pseudo-list using local variables on object. Adding, retriveing and removing values is fast and easy but searching in the list for specific value is very unefficient because one has to loop over all values in list and compare them (this is where any nwnx based data structures has big advantage) - though rarely need.

I think I would be able to create a nwnx plugin for list data structure, yet it never seemed to me important enought to spend time for this - the pseudo-list Ive done in NWScript never disappointed me so far.

Anyway - could you post pspeed's pseudo list code? I would like to see how he managed to do it - compare with my approach and possible suggest improvement into pspeed's code or my own code if pspeed's would be better.
               
               

               


                     Modifié par ShaDoOoW, 01 septembre 2012 - 06:11 .
                     
                  


            

Legacy_Kato -

  • Hero Member
  • *****
  • Posts: 747
  • Karma: +0/-0
Hashsets vs Lists
« Reply #2 on: September 01, 2012, 07:26:10 pm »


               Of course, here it is. (First time I use pastebin, hopefully the link is ok)

pastebin.com/KzCSb3CF


Kato
               
               

               


                     Modifié par Kato_Yang, 01 septembre 2012 - 06:26 .
                     
                  


            

Legacy_Shadooow

  • Hero Member
  • *****
  • Posts: 7698
  • Karma: +0/-0
Hashsets vs Lists
« Reply #3 on: September 01, 2012, 07:35:34 pm »


               ok I see - my system should be more efficient as im using string local variable to store elements

it looks like this:

LIST_TEST / string / 1234567890abcde
LIST_TEST_1 / any / any
LIST_TEST_2 / any / any
...
LIST_TEST_e / any / any

this makes removing elements easy as they doesnt need to be re-ordered (which is what pspeed doing)

I will try to separate my list function from my base include (they uses few custom functions like GetStringLeftFrom etc. so i need to find it all and rewrite it) and post it later this day
               
               

               
            

Legacy_Kato -

  • Hero Member
  • *****
  • Posts: 747
  • Karma: +0/-0
Hashsets vs Lists
« Reply #4 on: September 01, 2012, 08:34:19 pm »


               TYVM ShaDoOoW. It seems I did not grasp the usage of the hashset data structure, yet at first sight it looks like it(the hashset plugin) stores local vars on objects and allows data retrieval by iteration or key (thanks for correcting me if I'm wrong)

Kato
               
               

               


                     Modifié par Kato_Yang, 01 septembre 2012 - 07:43 .
                     
                  


            

Legacy_Squatting Monk

  • Hero Member
  • *****
  • Posts: 776
  • Karma: +0/-0
Hashsets vs Lists
« Reply #5 on: September 01, 2012, 10:46:01 pm »


               Here's my list functions if either of you want. I use a modified version of MemeticAI's list functions, with some additions inspired by pspeed's. It allows you to remove items without screwing up the list order (like pspeed's) but also allows you to just move the last item to the deleted spot. The nice thing is you can do both with the same function. Also, it maintains separate lists for each data type.

http://pastebin.myrror.net/3239

EDIT: Fixed some documentation errors caused by copy-pasting.
               
               

               


                     Modifié par Squatting Monk, 01 septembre 2012 - 10:57 .
                     
                  


            

Legacy_Kato -

  • Hero Member
  • *****
  • Posts: 747
  • Karma: +0/-0
Hashsets vs Lists
« Reply #6 on: September 01, 2012, 11:23:01 pm »


               Nice system, Squatting Monk, thank you for sharing it '<img'>


Kato
               
               

               
            

Legacy_Shadooow

  • Hero Member
  • *****
  • Posts: 7698
  • Karma: +0/-0
Hashsets vs Lists
« Reply #7 on: September 02, 2012, 12:54:36 am »


               Ok managed to separate it from my base include, here is source.

this is usage:

#include "_sh_inc_list"

void main()
{
object oList = OBJECT_SELF;
//create list - this is actually not needed, the Add functions will maintain list if not exists
ListCreate(oList,"TEST");
//fill some values inside
ListAddInt(oList,"TEST",1);
ListAddString(oList,"TEST","something");
int n;
 for(;n < 5;n++)
 {
 ListAddInt(oList,"TEST",d100());
 }
//this will set value of 6 to the third item in list
ListSetInt(oList,"TEST",3,6);
//get total number of items/elements in list
int nNum = ListGetElementCount(oList,"TEST");
PrintInteger(nNum);
//this will get integer value of fourth item in list
int nTest = ListGetInt(oList,"TEST",4);
PrintInteger(nTest);
//this is how to find value in list
int nFind = ListFindInt(oList,"TEST",100);
PrintInteger(nFind);
//print all values in list
PrintString("------");
struct element item = ListGetFirstElement(oList,"TEST");
 while(ListGetIsElementValid(item))
 {
 PrintInteger(item.Int);
 PrintFloat(item.Float);
 PrintString(item.String);
 PrintObject(item.Object);
 PrintString("------");
 item = ListGetNextElement(oList,"TEST");
 }
//remove all items from list
 while(ListGetElementCount(oList,"TEST") > 0)
 {
 ListRemoveElement(oList,"TEST",1);
 nNum = ListGetElementCount(oList,"TEST");
 PrintInteger(nNum);//this will print 6, 5, 4, 3, 2, 1, 0
 }
}


The main advantage of my system and the really only difference is that i use extra local variable that stores each item as a character. This allows to easily add (with slight modification items/elements can be add also at the beginning of the list), remove and sort items in list without looping all previous items or re-numbering them.

Disadvantage is that the ammount of items per list is limited. Dont know whats max but its certainly not 255 because some chars cannot be written in nwscript - using name of some object might allow more though. As I said, so far didnt need more with the list functionality. For simple storage, (pseudo)array is better suited.

Possible improvement: I was considering to rewrite the system to create object like waypoint for each list, something like this:

#include "_sh_inc_list"

void main()
{
//create list - this is actually not needed, the Add functions will maintain list if not exists
object oList = ListCreate("TEST"); //CreateObject(OBJECT_TYPE_WAYPOINT,"list_template",GetStartingLocation(),FALSE,"LIST_TEST");

//remove all items from list
ListDestroy(oList); //DestroyObject(oList);
}

yet dont know if its worth it, seems to me as only typographical adjustion. May even slow down loading of starting location.
               
               

               
            

Legacy_Kato -

  • Hero Member
  • *****
  • Posts: 747
  • Karma: +0/-0
Hashsets vs Lists
« Reply #8 on: September 02, 2012, 01:23:06 am »


               TYVM ShaDoOoW. I'll make sure to also take a look at pseudo-arrays, who knows...


Kato
               
               

               
            

Legacy_henesua

  • Hero Member
  • *****
  • Posts: 6519
  • Karma: +0/-0
Hashsets vs Lists
« Reply #9 on: September 03, 2012, 05:38:23 am »


               Excuse my ignorance... what is a hashset?
               
               

               
            

Legacy_Lightfoot8

  • Hero Member
  • *****
  • Posts: 4797
  • Karma: +0/-0
Hashsets vs Lists
« Reply #10 on: September 03, 2012, 07:08:41 am »


               

henesua wrote...

Excuse my ignorance... what is a hashset?


I had to look it up myself.   It was introduced in C#,  Kind of explains whay I never heard of one.  My days of keeping up with computers ended before C# came out.

Here is the answer I came up with.


COPY AND PASTED.



1-a) A HashSet holds a set of objects, but in a way that it allows you to easily and quickly determine whether an object is already in the set or not. It does so by internally managing an array and storing the object using an index which is calculated from the hashcode of the object.

1-'B)' HashSet is an unordered collection containing unique elements. It has the standard collection operations Add, Remove, Contains, but since it uses a hash-based implementation, these operation are O(1). (As opposed to List for example, which is O(n) for Contains and Remove.) HashSet also provides standard set operations such as union, intersection, and symmetric difference.

2) There are different implementations of Sets. Some make insertion and lookup operations super fast by hashing elements. However that means that the order in which the elements were added is lost. Other implementations preserve the added order at the cost of slower running times.

The HashSet class in C# goes for the second approach, thus preserving the order of elements. It is still much faster than a regular List. Some basic benchmarks showed that HashSet is decently faster when dealing with primary types (int, double, bool, etc.). It is a lot faster when working with class objects. So that point is that HashSet is fast.

The only catch of HashSet is that there is no access by indices. To access elements you can either use an enumerator or use the built-in function to convert the HashSet into a List and iterate through that.Take a look here
               
               

               
            

Legacy_Kato -

  • Hero Member
  • *****
  • Posts: 747
  • Karma: +0/-0
Hashsets vs Lists
« Reply #11 on: September 03, 2012, 07:33:14 am »


               Nicely explained, L8. Considering that similar values associated to different keys create "collisions" involving some workarounds, I had to reject the data structure as a possible candidate for what I'm doing, sadly... And since list systems do not allow to retrieve data by key, I had to resort to storing local vars, using the name of the var as the "key". It does the job but then...


Kato
               
               

               
            

Legacy_meaglyn

  • Hero Member
  • *****
  • Posts: 1451
  • Karma: +0/-0
Hashsets vs Lists
« Reply #12 on: September 04, 2012, 02:05:05 pm »


               Fwiw, hashsets are not a c# construct. They pre-date c# by years. Basically, it's a set with its implementation specified. You could create a set and implement the actual data storage many
different ways. Hashsets just specifiy that it must be a single (unique) key hash table essentially.

Kato_Yang, what you are doing with key named local variables, for all intents and purposes
_is_ a hash set. And you must be doing something yourself to deal with collisions, no?

Also, similar values associated to different keys should never be a collision. That would be a bug.
Different keys are, well, different. Collisions only occur if two different values produce the _same_ key.
Collisions are always possible when converting more data into less (making a fixed length key
from a given value). Good choice of hash function can reduce but not remove that possibility.
(unless you just use the whole value as the key...).

Cheers,
Meaglyn
               
               

               
            

Legacy_Shadooow

  • Hero Member
  • *****
  • Posts: 7698
  • Karma: +0/-0
Hashsets vs Lists
« Reply #13 on: September 04, 2012, 04:43:19 pm »


               

Kato_Yang wrote...

Nicely explained, L8. Considering that similar values associated to different keys create "collisions" involving some workarounds, I had to reject the data structure as a possible candidate for what I'm doing, sadly... And since list systems do not allow to retrieve data by key, I had to resort to storing local vars, using the name of the var as the "key". It does the job but then...


Kato

Cant really say I understood this programming crap really, I read about this at wiki already and this has no clue what is this usefull for and where it is better than any other data structure. Could someone provide an example, preferably in NWScript with the use of the nwnx_hashset functions?':wizard:'
               
               

               
            

Legacy_Kato -

  • Hero Member
  • *****
  • Posts: 747
  • Karma: +0/-0
Hashsets vs Lists
« Reply #14 on: September 04, 2012, 07:52:44 pm »


               @Meaglyn: So you're saying that 2(or more) keys could have the same associated value? I guess I misunderstood the explanations on the wiki then.(Hash Table topic, workarounds proposed in the example associating person names(key) to their phone number(value))

@ShaDoOoW: I'd like to build a cache holding some damage-related data. The cache will be queried very often, most of the time in loops, so, for instance, querying 2das or a DB is dubious considering the required data access speed(on-hit events etc...). The cached data comes from iprp_damagecost.2da(CEP version), associating damage rank to its average. So I had thought of doing this, dunno if it makes sense:

// Meant to be called once, in module load event
void CacheDmg()
{
   int nLast = 158; // last index in the 2DA file
   int nNth, nNum, nDie, nToDivide;
   object oMod = GetModule();
   string sKey;
   HashSetCreate(oMod, "DMG", ++nLast);
   for(nNth; nNth < nLast; nNth++)
   {
      sKey = Get2DAString("iprp_damagecost", "Rank", nNth); // using the rank as the key
      nNum = StringToInt(Get2DAString("iprp_damagecost", "NumDice", nNth));
      nDie = StringToInt(Get2DAString("iprp_damagecost", "Die", nNth));
      if(!nNum) nToDivide = nDie*2;
      else nToDivide = nNum+nNum*nDie;
      HashSetSetLocalInt(oMod, "DMG", sKey, nToDivide); // floats cannot be stored and integers are fast
   }
}

Then, to retrieve, say, avg damage from 2da rank 8:
float fAvgDmg = HashSetGetLocalInt(GetModule(), "DMG", "8")/2.0; // fAvgDmg = 5.0

Or retrieve avg damage from 2da index 8: (assuming that the key order does not change)
float fAvgDmg = HashSetGetLocalInt(GetModule(), "DMG", HashSetGetNthKey(9))/2.0; // fAvgDmg = 4.5


Kato
               
               

               


                     Modifié par Kato_Yang, 04 septembre 2012 - 08:33 .