|
||
|
|
#26 (permalink) |
|
Forum Expert
|
Well, it's not necessarily an inadequacy, it could very well be a design decision. Say there's one intern table shared by everything running under the CLR, now you need some way to clean up only the strings that were interned by the application that is currently being unloaded.
The application could have a list of indices into the table that it iterates through and decrements a reference count in the intern table for each of those entries and deallocate any that have no more references. Of course, thinking about it, we could also extend this to, say threads. Why shouldn't the intern table be cleaned after a thread exits and all of it's interned strings removed? That's all just conjecture though, there's probably a good reason why it's done this way that I just don't know about, probably performance though. It might be worth a look to see if the .NET Compact Framework handles interned strings differently to see if it was a conscious design decision for the general .NET Framework to have semantics that may waste some memory but will end up being quicker. |
|
|
|
|
|
#27 (permalink) |
|
Forum Expert
Join Date: Nov 2005
Location: San Diego, CA
Posts: 1,824
|
That's one of the things I don't understand. In the large, string internalization compacts memory, because instead of using the number of bytes in the string each time it is instanced, you get a pointer to a lexigraphically identical string.
I guess this is as obviously compact on 64 bit systems as it is on 32 bit ones, but be that as it may... C// p.s. have implemented string internalization tables myself, for special projects. |
|
|
|
|
|
#28 (permalink) | |
|
Forum Expert
|
Quote:
What the main problem stems from is either the automatic interning of strings or the reckless interning of strings (essentially one in the same). This is where string interning fails, since there may very well be thousands of strings in the intern table that have only been used once and will never be used again, yet they still are present (unless of course some kind of caching and weak reference system was used, but that's another topic) until at least the end of the application. And those could just be the strings that were blatantly created in the program, also stored in the table would be all strings resulting from operations on a string (one such example I could think of was profiling operations, such as ("Running time: " + msRan + " ms"), now, likely, every time would represent a different entry in the table). Also, although string internalization can greatly help lower the memory usage of your application, even judicious usage could potentially hamper you. For example, say that my application interns 100 kilobytes worth of strings that it uses commonly throughout the program. I now have an extra 100 kilobytes of memory added on to my working set at all times. Also, consider if at any one time I was only using five percent of the strings in the table. Normally only 5 kilobytes would need to be present in memory at a time, since the garbage collector would have cleaned up the rest when the program was not using them and reallocated them when they were needed again, now all of the data must constantly reside in memory for at least the life of the application. Essentially, my concerns boil down to the fact that doing this seems to tread on the ground that the GC is best at walking and saying that you will be doing a better job than it, at least in this instance (of which I have no problem with, just so long as it's done carefully and is backed up with statistics confirming the reasoning used, and, of course, this is really only true if this is done solely in the name of memory conservation and efficiency and not, for instance, for string comparison efficiency gains or some other advantage interning offers). Given all this, I've reached the same conclusion I had before, all things in moderation, lest they be misused through unintentional overuse. String interning suffers from the same caveats as any other feature and, just like everything else, needs to be used judiciously and have logic backing up the reasons for its use. Last edited by Sep102; 10-31-2006 at 08:04 PM. |
|
|
|
|
|
|
#29 (permalink) |
|
Forum Expert
|
Also, if you, like me, happen to be interested in tidbits about the performance of various parts of the .NET Framework, I highly suggest reading the weblogs of the Microsoft .NET performance team. Rico Mariani's, for example has a lot of interesting things in the articles (it's also the only one I currently have as an RSS feed on my main screen, which is why it gets a link).
|
|
|
|
|
|
#30 (permalink) |
|
Forum Expert
Join Date: Nov 2005
Location: San Diego, CA
Posts: 1,824
|
What the main problem stems from is either the automatic interning of strings or the reckless interning of strings (essentially one in the same).
A fair comment. Certain patterns of use, however, yield good information to the interpreter about the probable use of a string. One might argue, for example, that any string that goes into a dictionary, be it as a key, or as a value regardless, should be interned as a matter of course. And so forth. C// |
|
|
|
|
|
#31 (permalink) | |
|
Forum Expert
|
Quote:
Like I, thought, I guess, I said was that usage of interning, just like pretty much everything else needs to be determined on a case by case basis and can't really just be applied as a general rule for everything. |
|
|
|
|
|
|
#33 (permalink) |
|
Forum Expert
Join Date: Jul 2005
Location: Istanbul/Turkey
Age: 27
Posts: 425
|
I dont see a point why it should be interned if it is used in a dictionary. If it is a key, probably there wont be many of it. and for the comparison part, it uses a hash method and doesnt compare the exact string with anything..
It is not good to have a default internalization behaviour. It is best that it would be up to you..
__________________
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." |
|
|
|
|
|
#34 (permalink) |
|
Forum Expert
Join Date: Nov 2005
Location: San Diego, CA
Posts: 1,824
|
People put thing into dictionaries to look thing up. Since this is the essential purpose of a dictionary, the fact of entering something into it is a major clue to its use. Items entered as keys into dictionaries have to be matched on a dictionary lookup (as a final part of the hashing cycle one must test for definite match). Internalization makes matching a string one or two instructions only. I should hope you should see the probabilistic nature of this benefit. I.e., "quite likely to be beneficial... to the point of being certain."
BTW, if you don't like the fuzzy nature of "quite likely to be beneficial," you should step out of the use of C# dictionaries altogether, and proceed to something like a red-black tree, as the fundamental algorithm that the dictionary is implemented with (typically, and specifically in C#) is itself one that is statistical and not at all certain. FYI. C// Last edited by Courageous; 11-01-2006 at 11:33 PM. |
|
|
|
|
|
#35 (permalink) |
|
Forum Expert
Join Date: Jul 2005
Location: Istanbul/Turkey
Age: 27
Posts: 425
|
You might have millions of items in a hashtable, but that doesnt mean necessarily you will look for every one of them. And actually it would be pointless to use interned strings to speed up the program in such a case because speed increase is neglectable.
btw, why do you think dictionaries/hashtables are implemented as RBT? Because they are not..
__________________
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." |
|
|
|
|
|
#36 (permalink) |
|
Forum Expert
Join Date: Nov 2005
Location: San Diego, CA
Posts: 1,824
|
noobie,
Several of your posts have illustrated that when you reply to my posts, what you need to do is read them more carefully first. For example, I never said that dictionaries/hashtables are implemented as RBT, that's your invention. Your prior remark that it does not compare the exact string is mistaken, unless you happen to believe that C# hashtables use heavy hashing, like MD5, without a match test, which I highly doubt. C// Last edited by Courageous; 11-02-2006 at 04:54 AM. |
|
|
|
|
|
#37 (permalink) |
|
Forum Expert
Join Date: Jul 2005
Location: Istanbul/Turkey
Age: 27
Posts: 425
|
Well, actually it does compare, but it uses Equals method which does a value comparison, it doesnt even check references. even it did check references, it is an already O(1) algorithm and trying to make it faster in "this way" is just pointless.
__________________
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." Last edited by noobie; 11-02-2006 at 05:30 AM. |
|
|
|
|
|
#38 (permalink) | ||
|
Forum Expert
Join Date: Nov 2005
Location: San Diego, CA
Posts: 1,824
|
Quote:
Quote:
But more pragmatically, in the linear hashing's average case, it is more technically expressed O(1)+K, where K can be large, and improving the performance of the hash table (that persnickety and annoying K you dismiss as irrelevant) can have a substantial runtime percentage change on many applications. To dismiss pragmatic application runtime, all hand waivey hand waivey like, is not the least bit wise. I have seen applications that use algorithms with better theoretical performance get destroyed by applications with better pragmatics. Theoreticals aren't everything. An extensive performance test which I have performed recently suggests to me rather strongly that C# hash tables "actually" do in fact intern their strings. Or so it would appear, without exceptional explanation to the contrary. C// Last edited by Courageous; 11-02-2006 at 06:34 AM. |
||
|
|
|
|
|
#39 (permalink) |
|
Forum Expert
Join Date: Jul 2005
Location: Istanbul/Turkey
Age: 27
Posts: 425
|
I know all that hashtable stuff but still I cant see any valid point you have taken. I didnt say it wont do any good for performance, yes it will but it is neglectable.
-Lets say the algorithm (which is actually in worst case..your are right) O(N), it would be much pointless to do internalization. -Lets say its O(1)+K and K is too large. Do you really think getting rid of just one simple Equals method would make it much faster? I mean is it worth to effort? The true answer is No. Besides there is a good chance of the hashtable would be a temporary, using default internalization would not be a good idea. C, I dont know what tests you have done but hashtables doesnt use internalization. actually it doesnt know even if the key is a string. (It is what the reflector says) And appearantly, this debate is not going somewhere. So lets just drop it..
__________________
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." Last edited by noobie; 11-02-2006 at 07:16 AM. |
|
|
|
|
|
#40 (permalink) | |
|
Forum Expert
Join Date: Nov 2005
Location: San Diego, CA
Posts: 1,824
|
Quote:
C// |
|
|
|
|
|
|
#41 (permalink) | |
|
Forum Expert
|
Quote:
Anyway, there's a nice article about undocumented features of strings over at CodeProject, especially the part called 'Efficient String Switches' in the middle of the page. Strings UNDOCUMENTED.
__________________
Angels are falling the very last time, down they're burning in hate and decline, unfaithful and violent we're breaking the spell, we're god, we're scissor, in heaven and hell! |
|
|
|
|
![]() |
| Bookmarks |
| Currently Active Users Viewing This Thread: 1 (0 members and 1 guests) | |
| Thread Tools | |
| Display Modes | |
|
|