RunUO Community


hashtable -> btree

noobie

Wanderer
hashtable -> btree

this is probably a bad idea, but I am going to throw it out anyway :)

one of the problems with RunUO is that it uses a lot of RAM. I thought about what could be done about it and came up with an idea.

it stores mobiles/items in hashtables, which waste about 4/10 of the memory they use. (memory is allocated for 10 entries but only about 6 are stored; 7 is the maximum before a resize, and that is not something you can control, so let's assume 6 on average)

so, if a server has 1 million items, it allocates memory for about 1.6 million entries, of which 600K are wasted.

if we use some kind of btree implemented with arrays/ArrayLists (plain arrays might work better), we could lower the "waste ratio". it should have 2^x branches, where x might be 8 or 9 (8 should work for almost any server):
for x = 8 -> number of branches = 256
wasted memory is about 65K slots
max number of items is 16.5 million

for x = 9 -> number of branches = 512
wasted memory is about 250K slots
max number of items is 130 million

you can use different branch numbers for mobiles and items. (could be 7 for mobiles)
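the numbers above can be sanity-checked. assuming a tree with 3 levels of internal pointer arrays and items stored only in the leaf arrays (my reading of the proposal), the waste is the internal pointer slots:

```python
# Sanity check of the overhead figures above, assuming (my reading of the
# proposal) 3 levels of internal pointer arrays with items only in the leaves.
def tree_overhead(branch):
    internal = 1 + branch + branch ** 2   # pointer slots at depths 0, 1, 2
    capacity = branch ** 3                # leaf slots available for items
    return internal, capacity

for x in (8, 9):
    internal, capacity = tree_overhead(2 ** x)
    print(f"2^{x} branches: ~{internal:,} internal slots, capacity {capacity:,}")
```

which reproduces roughly the 65K / 16.5 million and 250K / 130 million figures quoted above.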

here are some advantages and disadvantages of the system:

advantages:
a) accessing an item/mobile by its serial causes no efficiency problem. it could even be faster in this system, if hashing an integer is more expensive than a few shifts. (this is why the branch number is a power of 2)

b) it needs less memory.

disadvantages:
a) this system requires re-serialization: the first item/mobile in the server has to have the serial value 0x...01

b) re-serialization should be invoked automatically by the system if any item/mobile is created with the max serial. (or you can schedule something for it)

c) converting the existing system needs some work. implementing the IEnumerable interface and an indexer property would probably make the system easier to port.
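advantage (a) rests on the branch count being a power of two: the serial splits into per-level child indices with shifts and masks, no hash needed. a minimal sketch (the 3-level layout and 8-bit field width are my assumptions):

```python
# Hypothetical serial-to-path decomposition for a power-of-two branch count:
# each tree level reads a fixed-width bit field of the serial, so the lookup
# is a few shifts and masks instead of a hash.
BITS = 8                 # branch count 2^8 = 256 children per node
MASK = (1 << BITS) - 1   # 0xFF

def path(serial, levels=3):
    """Per-level child indices, root level first."""
    return [(serial >> (BITS * i)) & MASK for i in reversed(range(levels))]

print(path(0x0102A3))  # [1, 2, 163] -> root[1][2][163]
```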


any comments on the system? I am not saying it is perfect; obviously it has some backward-compatibility issues.

a) it sucks :)
b) interesting :confused:
c) worth a try ;)
d) excellent :p
 

Mark

Knight
Your idea is interesting, but you must remember that in UO items are being created and destroyed constantly. Using a b-tree would require many more smaller allocations and fragment the memory map, essentially doing more harm than good. We could implement some fancy caching of course, but I think the Hashtable handles allocation and fragmentation nicely as it is.

Remember, managed memory is a blessing and a curse all in one.
 

xir

Wanderer
noobie said:
it stores mobiles/items in hashtables, which waste about 4/10 of the memory they use. (memory is allocated for 10 entries but only about 6 are stored; 7 is the maximum before a resize, and that is not something you can control, so let's assume 6 on average)

so, if a server has 1 million items, it allocates memory for about 1.6 million entries, of which 600K are wasted.

Forgive me, I do not know the implementation details of the .NET data structures, so I was wondering where you got these numbers from. Normally a dynamic data structure like a hash table doesn't actually "allocate" room for an item until it has to be inserted. In some implementations the item isn't even allocated there at all: the table just stores a reference to it. To my knowledge the capacity of a hash table depends on its current load factor. So say you are storing 6 items: the capacity is going to be something like 17, meaning 17 buckets are allocated, but that doesn't mean there are 6 items plus 11 empty item-sized slots, does it? The buckets themselves take up space, but I doubt the "container" actually reserves space up to its capacity for the item type it contains; that would be silly. To my knowledge only the buckets take up space, hence you are trading space for speed.
Again I may be incorrect, so by all means correct me if my assumptions are wrong.
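xir's mental model of a load-factor-driven table can be sketched as follows; the 0.72 load factor is borrowed from later in the thread and the model is an assumption, not the actual .NET implementation:

```python
# Rough model (an assumption, not the real .NET implementation) of how a
# load-factor-based hash table sizes its bucket array: it needs at least
# count / load_factor bucket slots, and the empty slots hold only null
# references, not empty items.
import math

def min_buckets(count, load_factor=0.72):
    return math.ceil(count / load_factor)

print(min_buckets(6))          # 9 buckets for 6 entries
print(min_buckets(1_000_000))  # ~1.39 million buckets for 1M entries
```

(the real table also rounds the bucket count up to a prime, which can push the capacity higher still, hence figures like 17 for 6 items.)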

noobie said:
if we use some kind of btree implemented with arrays/ArrayLists (plain arrays might work better), we could lower the "waste ratio". it should have 2^x branches, where x might be 8 or 9 (8 should work for almost any server):
for x = 8 -> number of branches = 256
wasted memory is about 65K slots
max number of items is 16.5 million

for x = 9 -> number of branches = 512
wasted memory is about 250K slots
max number of items is 130 million

you can use different branch numbers for mobiles and items. (could be 7 for mobiles)


Again, I'm not entirely sure where you are getting these numbers from. A binary tree, to my knowledge, isn't implemented with an array/list (maybe you are thinking of a heap?). It is something like a linked-list structure in which each node contains a pointer or reference to its two children. There is no waste factor involved (except maybe the two references, which take up 8 bytes per node in total). The structure grows as you insert items, so unlike a hash table you don't waste space by allocating empty buckets.

noobie said:
advantages:
a) accessing an item/mobile by its serial causes no efficiency problem. it could even be faster in this system, if hashing an integer is more expensive than a few shifts. (this is why the branch number is a power of 2)

Remember, the data structure to be used would be sorting on the serial id. Since serials are assigned incrementally (I assume, in RunUO), they are also saved and loaded from lower to higher into the data structure. Take your standard binary tree implementation and add the numbers 1-10 in that order. What do you get? Nothing more than a lopsided tree, which in efficiency is comparable to a linked list (plus you are using extra space for the extra reference, assuming a unidirectional linked-list structure).

Inserting into, retrieving from, and deleting from a hash table is constant time, O(1).
With a perfectly balanced binary tree (which you aren't going to get, given the nature of the data) the best you will get is O(log n), but it is probably going to be closer to O(n) (a lopsided binary tree is just a linked list).
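the degeneration xir describes is easy to demonstrate with a plain, unbalanced binary search tree fed sequential serials (a sketch, not RunUO code):

```python
# Demonstration of xir's point: inserting already-sorted keys into a plain,
# unbalanced binary search tree yields a chain as deep as the key count.
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def depth(node):
    return 0 if node is None else 1 + max(depth(node.left), depth(node.right))

root = None
for serial in range(1, 11):  # serials handed out incrementally, 1..10
    root = insert(root, serial)

print(depth(root))  # 10: effectively a linked list, so lookups are O(n)
```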

You could probably use other balanced tree structures, but at a cost in speed. I would say a hashtable is decent for serials.

Also, if you have access to the hash table implementation, you can improve its performance by not performing the hash at all. Since serials are unique, you don't need to hash (collisions are impossible, too). Since you have two hash tables (items/mobiles), I don't think this would work well unless you put a max load on each table and perform a modulus on the second one (I think).

Mark said:
Your idea is interesting, but you must remember that in UO items are being created and destroyed constantly. Using a b-tree would require many more smaller allocations and fragment the memory map, essentially doing more harm than good. We could implement some fancy caching of course, but I think the Hashtable handles allocation and fragmentation nicely as it is.

Remember, managed memory is a blessing and a curse all in one.

Back to my previous question: are you sure the hash table actually reserves memory (up to its capacity) for the items it contains, and not just for the buckets? Also, fragmentation shouldn't really be an issue, since the .NET memory management takes care of allocation and deallocation.
 

xir

Wanderer
Ravatar said:
This is no longer an issue if they use the Dictionary class in .NET 2.0

Could you clarify what "this" and "issue" refer to? Are you referring to the implementation of the hash table in .NET 1.0? If so, what issue were you referring to that was fixed in .NET 2.0?
- Thanks
 

noobie

Wanderer
well, this was actually just an idea, and I am not really insistent about it.

from the design side, it might not be practical and could cause some problems.

but still, I am going to answer a few questions.

@xir:

-here is an article about hashtables


-it is not a binary tree. I said btree, but what I had in mind wasn't exactly a btree either; the concept is the same, though (it needs a different implementation). a binary tree has only two child nodes; a btree can have n children, where n is up to you. in the tree I had in mind, only the leaf arrays hold objects.
ex) branch number = 3
if the height is 4:
root->node->node->node->node
wasted nodes: 1 (root) + 3 (h=1) + 9 (h=2) + 27 (h=3) = 40
the leaf nodes hold the objects and have a combined length of 81.
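the arithmetic in this example is just a geometric series:

```python
# Checking the example above: branch number 3, items at height 4.
branch, height = 3, 4
internal = sum(branch ** h for h in range(height))  # 1 + 3 + 9 + 27 "wasted" nodes
leaf_slots = branch ** height                       # combined length of the leaf arrays
print(internal, leaf_slots)  # 40 81
```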

-in such an algorithm the operations look log(n)-based, but they aren't really: n is a fixed number (the max number of items), so the number of operations needed is also fixed (say, 10 basic instructions).

-a hashtable uses a hash method to hash the given key; after computing the hash value, it finds the index in the table. it is considered O(n) since it needs a fixed number of operations. I don't really know .NET's default hashing algorithm, so I can't say how expensive it is.

-about the fragmentation part: it actually does matter, although .NET handles it itself. unless you free the chunks you allocated, the GC won't be invoked (and you won't free the chunks).


@ravatar:
the Dictionary class is the strongly typed version of Hashtable, so it will waste the same space too. but accessing objects will no longer need an explicit cast.


Overall, I prefer the existing system, since it can be more flexible over time.
 

xir

Wanderer
noobie said:
-here is an article about hashtables


Thanks for the article. It seems it's a double-hashing hash table that is implemented.

noobie said:
-it is not a binary tree. I said btree, but what I had in mind wasn't exactly a btree either; the concept is the same, though (it needs a different implementation). a binary tree has only two child nodes; a btree can have n children, where n is up to you. in the tree I had in mind, only the leaf arrays hold objects.

I think you are talking about a k-ary tree, where k is the max number of children per node. In such a structure you still have to search the elements in each node to choose which path to take, so the search per node is O(k) (if searching linearly) or O(log k) if you use binary search or organize each node as a BST. The best performance you are going to get with it is O(log_k n * log k) = O(log n) (I think), which still isn't better than your hash table's O(1) access time.
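the per-node choice xir mentions can be sketched as a binary search over a node's separator keys (illustrative only; the key layout is my assumption):

```python
# Per-node descent in a k-ary search tree: choosing the child to follow means
# searching the node's sorted separator keys -- O(k) with a linear scan,
# O(log k) with binary search (bisect).
import bisect

def child_index(separators, target):
    """Index of the child subtree to descend into for `target`."""
    return bisect.bisect_right(separators, target)

print(child_index([10, 20, 30], 25))  # 2: the subtree for keys in (20, 30]
```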


noobie said:
-a hashtable uses a hash method to hash the given key; after computing the hash value, it finds the index in the table. it is considered O(n) since it needs a fixed number of operations. I don't really know .NET's default hashing algorithm, so I can't say how expensive it is.

A hash table's access time depends on the hashing algorithm and the collision scheme, but in general the hash gives a good distribution over the table, so access is close to constant time, O(1) (more if there is a collision, but that is the rare case). O(n) means linear access time.

noobie said:
-about the fragmentation part: it actually does matter, although .NET handles it itself. unless you free the chunks you allocated, the GC won't be invoked (and you won't free the chunks).
From my knowledge, .NET does internal reference counting. When garbage collection is done, anything not referenced (unless you can tag it) is deleted. It also manages fragmentation itself. Unless you actually "allocate" from unsafe code (using new), I don't see why you would need to manage memory at all. You could probably use memory pools, but .NET probably already has that implemented quite well. To be honest, I don't see the point of using a high-level runtime environment if memory is such an issue.
 

noobie

Wanderer
ok, this is just an example:
say you have ten items (serials 00, 01, ..., 09) and an array of length 10.

if you need to access 08, you don't search for it; myItems[8] does it. there is no searching in this system. but you do need to calculate which array, and which index within it, the item is located at, and that takes a fixed number of instructions. so it's not O(n); it's O(k), which effectively means O(1).

about the fragmentation part: yes, you don't have to handle it, but the GC clearly uses CPU, right? so a bad implementation could cause a lot of CPU usage in the GC, which would be your own fault.
 

ASayre

RunUO Developer
noobie said:
ok, this is just an example:
say you have ten items (serials 00, 01, ..., 09) and an array of length 10.

if you need to access 08, you don't search for it; myItems[8] does it. there is no searching in this system. but you do need to calculate which array, and which index within it, the item is located at, and that takes a fixed number of instructions. so it's not O(n); it's O(k), which effectively means O(1).

about the fragmentation part: yes, you don't have to handle it, but the GC clearly uses CPU, right? so a bad implementation could cause a lot of CPU usage in the GC, which would be your own fault.

That is similar to what a hashtable does, hence the reason hashtables are O(1). A btree still needs to do a little searching (albeit small compared to an O(n) algorithm).
 

noobie

Wanderer
:)

root array length: 3
for each index of the root array there is another array (length 3)

what would be the place for 08?

root[2][1] is the answer. there is no searching.
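the arithmetic here is a div/mod pair; it lines up with root[2][1] if serials start at 0x...01, as the re-serialization point earlier in the thread proposes (my reading):

```python
# The two-level lookup noobie describes: with serials starting at 1 (per the
# re-serialization idea earlier in the thread -- my reading), serial s lives
# at index s - 1, split by div/mod into (outer, inner).
BRANCH = 3

def locate(serial):
    index = serial - 1            # first serial is 0x...01
    return divmod(index, BRANCH)  # (root slot, leaf slot)

print(locate(8))  # (2, 1) -> root[2][1], found with no searching
```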

but again, I reconsidered the system: it sucks :)

end of discussion :p
 

Ravatar

Knight
xir said:
Could you clarify what "this" and "issue" refer to? Are you referring to the implementation of the hash table in .NET 1.0? If so, what issue were you referring to that was fixed in .NET 2.0?
- Thanks

I'm referring to the lookup/storage method of Hashtables in .NET 2.0, and how the new method compares to the old one, as explained below.


noobie said:
the Dictionary class is the strongly typed version of Hashtable, so it will waste the same space too. but accessing objects will no longer need an explicit cast.

This is untrue, as you can see here.
 

Zippy

Razor Creator
Also remember that even though hashtables waste this much space, these are not wasted items; they are just wasted object references, which, depending on the exact implementation, are probably only about 4 bytes each. Thus your 600K wasted entries come to only about 2.4 MB of wasted memory, a LOT less than 600K actual items would use.
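Zippy's figure is simple arithmetic (the 4-byte reference size assumes a 32-bit runtime):

```python
# Zippy's back-of-the-envelope figure: the wasted slots hold empty object
# references (~4 bytes each on a 32-bit runtime -- an assumption), not items.
wasted_slots = 600_000
bytes_per_reference = 4
wasted_bytes = wasted_slots * bytes_per_reference
print(wasted_bytes)  # 2400000 bytes, i.e. about 2.4 MB
```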
 

noobie

Wanderer
@Zippy: hehe, actually those were pointers too..

@Ravatar: hmm, that article is really interesting. it looks like RunUO's performance will improve a lot with .NET 2.0
 

Phenos

Wanderer
noobie said:
advantages:
[...]
b) it needs less memory.
Actually I think the opposite is true.

Like you said, if a hashtable has a load factor of 0.72, we have a minimum space overhead of 28% (of the size of the hashtable, of course, which just stores references to the items).

But what about the space overhead of a self-balancing binary tree?
I don't know which particular implementation you are referring to, but most implementations that I know of require at least 3 references per node (to the parent node and the two child nodes).
So for each reference to an item there would be at least 3 references needed by the structure, which results in a space overhead of 75%.

In fact, I think binary trees might be preferred over hashtables mainly for their worst-case time complexity (O(log N), versus O(N) for hashtables).
On the other hand, hashtables are usually preferred for their average time complexity (O(1), i.e. independent of the size of the table) and for their greatly reduced space overhead.
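Phenos's two overhead figures, side by side:

```python
# Phenos's comparison: per stored item, a parent/left/right tree node carries
# 3 structural references next to the 1 item reference, so 3 of every 4
# references are overhead; a hashtable at a 0.72 load factor leaves 28% of
# its bucket slots empty.
structural_refs, item_refs = 3, 1
tree_overhead = structural_refs / (structural_refs + item_refs)

load_factor = 0.72
table_overhead = 1 - load_factor

print(tree_overhead)             # 0.75
print(round(table_overhead, 2))  # 0.28
```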
 

Magnuss

Sorceror
hashtable

Sorry, but I have to ask something: if something works very well, why screw with it? My guess would be to leave it alone...
But that might just be me
 

Courageous

Wanderer
While Mark's and Zippy's opinions really require no further elaboration, they are quite right. Any attempt to "save" memory by using a btree or other O(log N) tree structure is not the right approach for this class of optimization problem. The hash table is in fact the best data structure to use here. No question.

Zippy's observation that the hashtable's "waste" really comes down to a few megabytes of object references at most is also correct. Hash tables are the best solution in their class for insertion, deletion, and lookup. FYI.

C//
 