Go Back   RunUO - Ultima Online Emulation > Developer's Corner > Programming > C#

C# C# Discussion

Reply
 
Thread Tools Display Modes
Old 10-31-2006, 11:55 AM   #26 (permalink)
Forum Expert
 
Join Date: Aug 2004
Location: Redmond, WA
Age: 21
Posts: 1,288
Send a message via AIM to Sep102 Send a message via MSN to Sep102
Default

Well, it's not necessarily an inadequacy, it could very well be a design decision. Say there's one intern table shared by everything running under the CLR, now you need some way to clean up only the strings that were interned by the application that is currently being unloaded.

The application could have a list of indices into the table that it iterates through and decrements a reference count in the intern table for each of those entries and deallocate any that have no more references.

Of course, thinking about it, we could also extend this to, say threads. Why shouldn't the intern table be cleaned after a thread exits and all of it's interned strings removed?

That's all just conjecture though, there's probably a good reason why it's done this way that I just don't know about, probably performance though. It might be worth a look to see if the .NET Compact Framework handles interned strings differently to see if it was a conscious design decision for the general .NET Framework to have semantics that may waste some memory but will end up being quicker.
Sep102 is offline   Reply With Quote
Old 10-31-2006, 01:26 PM   #27 (permalink)
Forum Expert
 
Courageous's Avatar
 
Join Date: Nov 2005
Location: San Diego, CA
Posts: 1,824
Default

That's one of the things I don't understand. In the large, string internalization compacts memory, because instead of using the number of bytes in the string each time it is instanced, you get a pointer to a lexigraphically identical string.

I guess this is as obviously compact on 64 bit systems as it is on 32 bit ones, but be that as it may...

C//

p.s. have implemented string internalization tables myself, for special projects.
Courageous is offline   Reply With Quote
Old 10-31-2006, 06:50 PM   #28 (permalink)
Forum Expert
 
Join Date: Aug 2004
Location: Redmond, WA
Age: 21
Posts: 1,288
Send a message via AIM to Sep102 Send a message via MSN to Sep102
Default

Quote:
Originally Posted by Courageous
That's one of the things I don't understand. In the large, string internalization compacts memory, because instead of using the number of bytes in the string each time it is instanced, you get a pointer to a lexigraphically identical string.

I guess this is as obviously compact on 64 bit systems as it is on 32 bit ones, but be that as it may...

C//

p.s. have implemented string internalization tables myself, for special projects.
As interning pertains to commonly used strings, I don't disagree with you at all Courageous, I agree completely, interning can produce a great increase in the efficiency of both memory usage and certain computations involving the instances.

What the main problem stems from is either the automatic interning of strings or the reckless interning of strings (essentially one in the same). This is where string interning fails, since there may very well be thousands of strings in the intern table that have only been used once and will never be used again, yet they still are present (unless of course some kind of caching and weak reference system was used, but that's another topic) until at least the end of the application. And those could just be the strings that were blatantly created in the program, also stored in the table would be all strings resulting from operations on a string (one such example I could think of was profiling operations, such as ("Running time: " + msRan + " ms"), now, likely, every time would represent a different entry in the table).

Also, although string internalization can greatly help lower the memory usage of your application, even judicious usage could potentially hamper you. For example, say that my application interns 100 kilobytes worth of strings that it uses commonly throughout the program. I now have an extra 100 kilobytes of memory added on to my working set at all times. Also, consider if at any one time I was only using five percent of the strings in the table. Normally only 5 kilobytes would need to be present in memory at a time, since the garbage collector would have cleaned up the rest when the program was not using them and reallocated them when they were needed again, now all of the data must constantly reside in memory for at least the life of the application. Essentially, my concerns boil down to the fact that doing this seems to tread on the ground that the GC is best at walking and saying that you will be doing a better job than it, at least in this instance (of which I have no problem with, just so long as it's done carefully and is backed up with statistics confirming the reasoning used, and, of course, this is really only true if this is done solely in the name of memory conservation and efficiency and not, for instance, for string comparison efficiency gains or some other advantage interning offers).


Given all this, I've reached the same conclusion I had before, all things in moderation, lest they be misused through unintentional overuse. String interning suffers from the same caveats as any other feature and, just like everything else, needs to be used judiciously and have logic backing up the reasons for its use.

Last edited by Sep102; 10-31-2006 at 08:04 PM.
Sep102 is offline   Reply With Quote
Old 10-31-2006, 06:55 PM   #29 (permalink)
Forum Expert
 
Join Date: Aug 2004
Location: Redmond, WA
Age: 21
Posts: 1,288
Send a message via AIM to Sep102 Send a message via MSN to Sep102
Default

Also, if you, like me, happen to be interested in tidbits about the performance of various parts of the .NET Framework, I highly suggest reading the weblogs of the Microsoft .NET performance team. Rico Mariani's, for example has a lot of interesting things in the articles (it's also the only one I currently have as an RSS feed on my main screen, which is why it gets a link).
Sep102 is offline   Reply With Quote
Old 10-31-2006, 11:05 PM   #30 (permalink)
Forum Expert
 
Courageous's Avatar
 
Join Date: Nov 2005
Location: San Diego, CA
Posts: 1,824
Wink

What the main problem stems from is either the automatic interning of strings or the reckless interning of strings (essentially one in the same).

A fair comment. Certain patterns of use, however, yield good information to the interpreter about the probable use of a string. One might argue, for example, that any string that goes into a dictionary, be it as a key, or as a value regardless, should be interned as a matter of course. And so forth.

C//
Courageous is offline   Reply With Quote
Old 10-31-2006, 11:34 PM   #31 (permalink)
Forum Expert
 
Join Date: Aug 2004
Location: Redmond, WA
Age: 21
Posts: 1,288
Send a message via AIM to Sep102 Send a message via MSN to Sep102
Default

Quote:
Originally Posted by Courageous
What the main problem stems from is either the automatic interning of strings or the reckless interning of strings (essentially one in the same).

A fair comment. Certain patterns of use, however, yield good information to the interpreter about the probable use of a string. One might argue, for example, that any string that goes into a dictionary, be it as a key, or as a value regardless, should be interned as a matter of course. And so forth.

C//
Heh, thought I covered that in my "judicious use" statement or "commonly used strings". That is, I agree with you for the same reason I did before. A good argument could very well be made as to why that is a valid use of interning, as there is a good chance the string will be used and a decent chance that it will be used often, thus a good use of string interning to cut down on both memory usage and time spent doing comparisons.

Like I, thought, I guess, I said was that usage of interning, just like pretty much everything else needs to be determined on a case by case basis and can't really just be applied as a general rule for everything.
Sep102 is offline   Reply With Quote
Old 11-01-2006, 04:11 PM   #32 (permalink)
Forum Expert
 
Courageous's Avatar
 
Join Date: Nov 2005
Location: San Diego, CA
Posts: 1,824
Default

Well I was just saying that for a dictionary, the runtime itself could make the decision to intern. And a variety of other similar cases.

C//
Courageous is offline   Reply With Quote
Old 11-01-2006, 05:45 PM   #33 (permalink)
Forum Expert
 
Join Date: Jul 2005
Location: Istanbul/Turkey
Age: 27
Posts: 425
Default

I dont see a point why it should be interned if it is used in a dictionary. If it is a key, probably there wont be many of it. and for the comparison part, it uses a hash method and doesnt compare the exact string with anything..

It is not good to have a default internalization behaviour. It is best that it would be up to you..
__________________
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
noobie is offline   Reply With Quote
Old 11-01-2006, 11:21 PM   #34 (permalink)
Forum Expert
 
Courageous's Avatar
 
Join Date: Nov 2005
Location: San Diego, CA
Posts: 1,824
Default

People put thing into dictionaries to look thing up. Since this is the essential purpose of a dictionary, the fact of entering something into it is a major clue to its use. Items entered as keys into dictionaries have to be matched on a dictionary lookup (as a final part of the hashing cycle one must test for definite match). Internalization makes matching a string one or two instructions only. I should hope you should see the probabilistic nature of this benefit. I.e., "quite likely to be beneficial... to the point of being certain."

BTW, if you don't like the fuzzy nature of "quite likely to be beneficial," you should step out of the use of C# dictionaries altogether, and proceed to something like a red-black tree, as the fundamental algorithm that the dictionary is implemented with (typically, and specifically in C#) is itself one that is statistical and not at all certain. FYI.

C//

Last edited by Courageous; 11-01-2006 at 11:33 PM.
Courageous is offline   Reply With Quote
Old 11-02-2006, 03:20 AM   #35 (permalink)
Forum Expert
 
Join Date: Jul 2005
Location: Istanbul/Turkey
Age: 27
Posts: 425
Default

You might have millions of items in a hashtable, but that doesnt mean necessarily you will look for every one of them. And actually it would be pointless to use interned strings to speed up the program in such a case because speed increase is neglectable.

btw, why do you think dictionaries/hashtables are implemented as RBT? Because they are not..
__________________
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
noobie is offline   Reply With Quote
Old 11-02-2006, 04:23 AM   #36 (permalink)
Forum Expert
 
Courageous's Avatar
 
Join Date: Nov 2005
Location: San Diego, CA
Posts: 1,824
Default

noobie,

Several of your posts have illustrated that when you reply to my posts, what you need to do is read them more carefully first.

For example, I never said that dictionaries/hashtables are implemented as RBT, that's your invention.

Your prior remark that it does not compare the exact string is mistaken, unless you happen to believe that C# hashtables use heavy hashing, like MD5, without a match test, which I highly doubt.


C//

Last edited by Courageous; 11-02-2006 at 04:54 AM.
Courageous is offline   Reply With Quote
Old 11-02-2006, 05:19 AM   #37 (permalink)
Forum Expert
 
Join Date: Jul 2005
Location: Istanbul/Turkey
Age: 27
Posts: 425
Default

Well, actually it does compare, but it uses Equals method which does a value comparison, it doesnt even check references. even it did check references, it is an already O(1) algorithm and trying to make it faster in "this way" is just pointless.
__________________
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."

Last edited by noobie; 11-02-2006 at 05:30 AM.
noobie is offline   Reply With Quote
Old 11-02-2006, 06:31 AM   #38 (permalink)
Forum Expert
 
Courageous's Avatar
 
Join Date: Nov 2005
Location: San Diego, CA
Posts: 1,824
Default

Quote:
Well, actually it does compare, but it uses Equals method which does a value comparison, ...
I knew this. You didn't. But this didn't keep you from lecturing to the contrary. Why?

Quote:
...it doesnt even check references. even it did check references, it is an already O(1) algorithm and trying to make it faster in "this way" is just pointless.
"Actually" hash tables are average case O(1), but are worst case O(N). In strict complexity theory, the O() notation that you are invoking ala abracadabra is to express the upper bounds of the compute algorithm. So to use it without mentioning the important word "average" is strictly incorrect. O(N) is correct.

But more pragmatically, in the linear hashing's average case, it is more technically expressed O(1)+K, where K can be large, and improving the performance of the hash table (that persnickety and annoying K you dismiss as irrelevant) can have a substantial runtime percentage change on many applications.

To dismiss pragmatic application runtime, all hand waivey hand waivey like, is not the least bit wise. I have seen applications that use algorithms with better theoretical performance get destroyed by applications with better pragmatics. Theoreticals aren't everything.

An extensive performance test which I have performed recently suggests to me rather strongly that C# hash tables "actually" do in fact intern their strings. Or so it would appear, without exceptional explanation to the contrary.

C//

Last edited by Courageous; 11-02-2006 at 06:34 AM.
Courageous is offline   Reply With Quote
Old 11-02-2006, 07:05 AM   #39 (permalink)
Forum Expert
 
Join Date: Jul 2005
Location: Istanbul/Turkey
Age: 27
Posts: 425
Default

I know all that hashtable stuff but still I cant see any valid point you have taken. I didnt say it wont do any good for performance, yes it will but it is neglectable.

-Lets say the algorithm (which is actually in worst case..your are right) O(N), it would be much pointless to do internalization.
-Lets say its O(1)+K and K is too large. Do you really think getting rid of just one simple Equals method would make it much faster? I mean is it worth to effort? The true answer is No.

Besides there is a good chance of the hashtable would be a temporary, using default internalization would not be a good idea.

C, I dont know what tests you have done but hashtables doesnt use internalization. actually it doesnt know even if the key is a string. (It is what the reflector says)

And appearantly, this debate is not going somewhere. So lets just drop it..
__________________
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."

Last edited by noobie; 11-02-2006 at 07:16 AM.
noobie is offline   Reply With Quote
Old 11-02-2006, 11:27 AM   #40 (permalink)
Forum Expert
 
Courageous's Avatar
 
Join Date: Nov 2005
Location: San Diego, CA
Posts: 1,824
Default

Quote:
Do you really think getting rid of just one simple Equals method would make it much faster?
It prevents as many instructions as the string has length, for the final match test (which is, indeed necessary). As for the test performed, I'll leave it to your imagination.

C//
Courageous is offline   Reply With Quote
Old 11-03-2006, 03:52 PM   #41 (permalink)
Forum Expert
 
arul's Avatar
 
Join Date: Jan 2005
Location: Hic sunt leones ...
Age: 21
Posts: 1,289
Send a message via MSN to arul
Default

Quote:
Originally Posted by Courageous
In strict complexity theory, the O() notation that you are invoking ala abracadabra is to express the upper bounds of the compute algorithm.
Sig-worthy

Anyway, there's a nice article about undocumented features of strings over at CodeProject, especially the part called 'Efficient String Switches' in the middle of the page.

Strings UNDOCUMENTED.
__________________
Angels are falling the very last time, down they're burning in hate and decline, unfaithful and violent we're breaking the spell, we're god, we're scissor, in heaven and hell!
arul is offline   Reply With Quote
Old 02-09-2007, 12:39 PM   #42 (permalink)
Forum Newbie
 
Join Date: Jun 2005
Age: 31
Posts: 45
Default

If you want speed, go for if-es, they are always faster then switch.

If you want even more speed, use unsafe code .
yarex is offline   Reply With Quote
Reply

Bookmarks


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off



Powered by vBulletin® Version 3.7.0
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0 RC5