RunUO Community


switching with 2 variables

Courageous

Wanderer
What you are saying basically is that "str"=="str" returns false...

Sheeesh. I am not. What I am saying is that two different strings, each spelled "str", are in fact the same string. That's what the MSDN docs are saying, too. FYI.

C//
 

Sep102

Page
Possibly not what actually happens in the production .NET runtime, but according to Rotor, String.op_Equality calls String.Equals(String, String), which first does a reference comparison and then, if the reference compare comes back false, a value comparison. According to Reflector (which I assume disassembles the installed .NET DLLs), this behavior also exists in the String class in the installed DLLs.

The MSDN documentation for String.op_Equality also agrees with this
Remarks:

This operator is implemented using the Equals method, which means the comparands are tested for a combination of reference and value equality. This operator performs an ordinal comparison.

Also, (as far as I know) strings are not interned automatically (unless you're talking about string constants which are present in the metadata, of course) instead they must be entered into the table via the String.Intern method and retrieved via the same method.
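
For illustration, a small console sketch (not from this thread; the class name InternDemo and the sample values are invented) shows both behaviors: string literals ending up reference-equal, and a runtime-built string that only becomes reference-equal after String.Intern.

Code:
using System;

class InternDemo
{
    static void Main()
    {
        string a = "str";   // literal, placed in the intern table when the assembly loads
        string b = "str";   // same literal, so the same interned instance
        Console.WriteLine(object.ReferenceEquals(a, b));        // True

        string c = new string(new char[] { 's', 't', 'r' });    // built at runtime, not interned
        Console.WriteLine(object.ReferenceEquals(a, c));        // False
        Console.WriteLine(a == c);                               // True: falls through to the value comparison
        Console.WriteLine(string.IsInterned(c) != null);         // True: an equal string ("str") is in the table

        string d = string.Intern(c);                              // returns the already-interned "str"
        Console.WriteLine(object.ReferenceEquals(a, d));          // True
    }
}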

Then, there's also another remark in the documentation for String.Intern
Performance Considerations:

If you are trying to reduce the total amount of memory your application allocates, keep in mind that interning a string has two unwanted side effects. First, the memory allocated for interned String objects is not likely to be released until the common language runtime (CLR) terminates. The reason is that the CLR's reference to the interned String object can persist after your application, or even your application domain, terminates. Second, to intern a string, you must first create the string. The memory used by the String object must still be allocated, even though the memory will eventually be garbage collected.

That also leads to the conclusion that strings are not interned by default, even internally, but instead are interned on demand.


Of course, in all of this I would like to point out, partially for myself, that when it comes to anything past the runtime itself (such as the actual code that the JIT produces and any optimizations it does with the actual machine code generated) all bets are off and anything is possible.
 

Courageous

Wanderer
I reran the test by filling the table with string.Copy()'s of the strings instead of the constants, and indeed the string test ran about twice as long as the integer test (I would have expected it to be even longer, given the length of the strings, so something is funny there).

I then manually interned this non-constant string, and the test dropped back to the original performance figures.

So you would appear to be correct. Only constant strings are interned in the .NET runtime. When they are, they are reference compared.
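
A minimal sketch of the effect being described (not the actual test from this thread; the class name CompareTimings, the iteration count, and the sample string are invented):

Code:
using System;
using System.Diagnostics;

class CompareTimings
{
    const int Iterations = 10000000;

    static long Time(string a, string b)
    {
        int matches = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
            if (a == b)          // op_Equality: reference check first, then value comparison
                matches++;
        sw.Stop();
        Console.WriteLine("  (matches: {0})", matches);   // keeps the loop from being optimized away
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        string constant = "a reasonably long constant string used as a key";
        string copy     = string.Copy(constant);                   // equal value, different reference
        string interned = string.Intern(string.Copy(constant));    // back to the interned instance

        Console.WriteLine("constant vs constant: {0} ms", Time(constant, constant)); // reference hit
        Console.WriteLine("constant vs copy:     {0} ms", Time(constant, copy));     // value comparison every time
        Console.WriteLine("constant vs interned: {0} ms", Time(constant, interned)); // reference hit again
    }
}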

C//
 

noobie

Wanderer
So, it uses both reference and value comparison. I didn't know .NET has an option like internalization of strings. It could be useful where lots of string operations are needed.
 

Courageous

Wanderer
Generally speaking, internalization is quite good. The docs seem to be hinting that the internalization tables aren't garbage collected. A shame.

That's an inadequacy of the CLR, I say. Hopefully they fix it.

C//
 

Sep102

Page
Well, it's not necessarily an inadequacy; it could very well be a design decision. Say there's one intern table shared by everything running under the CLR; now you need some way to clean up only the strings that were interned by the application that is currently being unloaded.

The application could keep a list of indices into the table, iterate through it, decrement a reference count in the intern table for each of those entries, and deallocate any that have no more references (sketched below).

Of course, thinking about it, we could also extend this to, say, threads. Why shouldn't the intern table be cleaned after a thread exits and all of its interned strings removed?

That's all just conjecture, though; there's probably a good reason why it's done this way that I just don't know about, most likely performance. It might be worth a look to see if the .NET Compact Framework handles interned strings differently, to see whether it was a conscious design decision for the general .NET Framework to have semantics that may waste some memory but will end up being quicker.
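
Purely as a sketch of that conjecture (this is not how the CLR's intern table actually works; the class RefCountedInternTable and its Intern/Release methods are invented for illustration):

Code:
using System.Collections.Generic;

// Hypothetical per-owner intern table with reference counting, so entries
// can be released when their last owner (application, AppDomain, thread) goes away.
class RefCountedInternTable
{
    private class Entry
    {
        public string Value;
        public int RefCount;
    }

    private readonly Dictionary<string, Entry> _entries = new Dictionary<string, Entry>();

    // Intern a string and take a reference on the shared entry.
    public string Intern(string s)
    {
        Entry e;
        if (!_entries.TryGetValue(s, out e))
        {
            e = new Entry();
            e.Value = s;
            _entries.Add(s, e);
        }
        e.RefCount++;
        return e.Value;
    }

    // Release one reference; remove the entry once nothing uses it any more.
    public void Release(string s)
    {
        Entry e;
        if (_entries.TryGetValue(s, out e) && --e.RefCount == 0)
            _entries.Remove(s);
    }
}

The extra bookkeeping on every Intern and Release call is presumably part of why the real table doesn't bother.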
 

Courageous

Wanderer
That's one of the things I don't understand. In the large, string internalization compacts memory, because instead of using the number of bytes in the string each time it is instanced, you get a pointer to a lexicographically identical string.

I guess this is as obviously compact on 64-bit systems as it is on 32-bit ones, but be that as it may...

C//

p.s. I have implemented string internalization tables myself, for special projects.
 

Sep102

Page
Courageous said:
That's one of the things I don't understand. In the large, string internalization compacts memory, because instead of using the number of bytes in the string each time it is instanced, you get a pointer to a lexicographically identical string.

I guess this is as obviously compact on 64-bit systems as it is on 32-bit ones, but be that as it may...

C//

p.s. I have implemented string internalization tables myself, for special projects.
As interning pertains to commonly used strings, I don't disagree with you at all, Courageous; I agree completely: interning can produce a great increase in the efficiency of both memory usage and certain computations involving the instances.

What the main problem stems from is either the automatic interning of strings or the reckless interning of strings (essentially one and the same). This is where string interning fails, since there may very well be thousands of strings in the intern table that have only been used once and will never be used again, yet they are still present (unless of course some kind of caching and weak reference system was used, but that's another topic) until at least the end of the application. And those could just be the strings that were explicitly created in the program; also stored in the table would be all strings resulting from operations on a string (one such example I can think of is profiling output, such as ("Running time: " + msRan + " ms"), where, most likely, every timing would become a different entry in the table).

Also, although string internalization can greatly help lower the memory usage of your application, even judicious usage could potentially hamper you. For example, say that my application interns 100 kilobytes worth of strings that it uses commonly throughout the program. I now have an extra 100 kilobytes of memory added on to my working set at all times. Also, consider if at any one time I was only using five percent of the strings in the table. Normally only 5 kilobytes would need to be present in memory at a time, since the garbage collector would have cleaned up the rest when the program was not using them and reallocated them when they were needed again; now all of the data must constantly reside in memory for at least the life of the application. Essentially, my concern boils down to the fact that doing this treads on the ground the GC is best at walking and claims you will do a better job than it, at least in this instance (which I have no problem with, just so long as it's done carefully and is backed up with statistics confirming the reasoning used, and, of course, this is really only true if it is done solely in the name of memory conservation and efficiency and not, for instance, for string comparison efficiency gains or some other advantage interning offers).


Given all this, I've reached the same conclusion I had before: all things in moderation, lest they be misused through unintentional overuse. String interning suffers from the same caveats as any other feature and, just like everything else, needs to be used judiciously and have logic backing up the reasons for its use.
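
The "kept until the common language runtime terminates" behavior the documentation warns about can be seen with a weak reference (a hypothetical demo; the class InternLifetimeDemo and its Make helper are invented): once a runtime-built string is interned, the intern table keeps it alive across collections even when nothing else references it.

Code:
using System;

class InternLifetimeDemo
{
    static WeakReference Make(char c, bool intern)
    {
        string s = new string(c, 64);   // built at runtime, so not a compile-time literal
        if (intern)
            string.Intern(s);           // roots the string in the intern table
        return new WeakReference(s);
    }

    static void Main()
    {
        WeakReference plain    = Make('x', false);
        WeakReference interned = Make('y', true);

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        Console.WriteLine("plain alive:    {0}", plain.IsAlive);    // typically False: collected
        Console.WriteLine("interned alive: {0}", interned.IsAlive); // True: the intern table still holds it
    }
}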
 

Sep102

Page
Also, if you, like me, happen to be interested in tidbits about the performance of various parts of the .NET Framework, I highly suggest reading the weblogs of the Microsoft .NET performance team. Rico Mariani's, for example, has a lot of interesting things in its articles (it's also the only one I currently have as an RSS feed on my main screen, which is why it gets a link).
 

Courageous

Wanderer
What the main problem stems from is either the automatic interning of strings or the reckless interning of strings (essentially one and the same).

A fair comment. Certain patterns of use, however, yield good information to the runtime about the probable use of a string. One might argue, for example, that any string that goes into a dictionary, be it as a key or as a value, should be interned as a matter of course. And so forth.
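
As a sketch of that pattern (hypothetical; nothing in the runtime does this for you, and the InterningDictionary wrapper below is invented), interning every key on its way into a dictionary could look like this:

Code:
using System.Collections.Generic;

// Hypothetical wrapper that interns every key it stores, on the theory that a
// dictionary key is very likely to be compared against again later.
class InterningDictionary<TValue>
{
    private readonly Dictionary<string, TValue> _inner = new Dictionary<string, TValue>();

    public void Add(string key, TValue value)
    {
        // Interned keys share one instance, so equal keys cost no extra memory, and
        // a comparison that checks references first can stop at the first instruction.
        _inner.Add(string.Intern(key), value);
    }

    public bool TryGetValue(string key, out TValue value)
    {
        return _inner.TryGetValue(key, out value);
    }
}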

C//
 

Sep102

Page
Courageous said:
What the main problem stems from is either the automatic interning of strings or the reckless interning of strings (essentially one and the same).

A fair comment. Certain patterns of use, however, yield good information to the runtime about the probable use of a string. One might argue, for example, that any string that goes into a dictionary, be it as a key or as a value, should be interned as a matter of course. And so forth.

C//
Heh, thought I covered that in my "judicious use" statement or "commonly used strings". That is, I agree with you for the same reason I did before. A good argument could very well be made as to why that is a valid use of interning, as there is a good chance the string will be used and a decent chance that it will be used often, making it a good use of string interning to cut down on both memory usage and time spent doing comparisons.

Like I thought, I guess; what I said was that usage of interning, just like pretty much everything else, needs to be determined on a case-by-case basis and can't really just be applied as a general rule for everything.
 

Courageous

Wanderer
Well, I was just saying that for a dictionary, the runtime itself could make the decision to intern. And the same for a variety of other similar cases.

C//
 

noobie

Wanderer
I don't see the point of interning a string just because it is used in a dictionary. If it is a key, probably there won't be many of them. And for the comparison part, it uses a hash method and doesn't compare the exact string with anything...

It is not good to have a default internalization behaviour. It is best that it be left up to you.
 

Courageous

Wanderer
People put things into dictionaries to look things up. Since this is the essential purpose of a dictionary, the fact of entering something into it is a major clue to its use. Items entered as keys into dictionaries have to be matched on a dictionary lookup (as a final part of the hashing cycle one must test for a definite match). Internalization makes matching a string only one or two instructions. I should hope you can see the probabilistic nature of this benefit. I.e., "quite likely to be beneficial... to the point of being certain."

BTW, if you don't like the fuzzy nature of "quite likely to be beneficial," you should step out of the use of C# dictionaries altogether, and proceed to something like a red-black tree, as the fundamental algorithm that the dictionary is implemented with (typically, and specifically in C#) is itself one that is statistical and not at all certain. FYI.
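
To make the "final match test" point concrete, here is a hypothetical comparer (ReferenceFirstComparer is invented, and it is not what Dictionary uses by default): after the hash narrows the search to a bucket, something still has to confirm that the key really matches, and with interned keys that confirmation can be a single reference comparison.

Code:
using System.Collections.Generic;

// Hypothetical comparer showing the reference-equality fast path for the final match.
class ReferenceFirstComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        if (object.ReferenceEquals(x, y))
            return true;                 // interned (or identical) keys stop here
        return string.Equals(x, y);      // everything else falls back to the value comparison
    }

    public int GetHashCode(string s)
    {
        return s == null ? 0 : s.GetHashCode();
    }
}

// Usage sketch: new Dictionary<string, int>(new ReferenceFirstComparer()),
// with keys run through string.Intern before they are added.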

C//
 

noobie

Wanderer
You might have millions of items in a hashtable, but that doesn't necessarily mean you will look for every one of them. And actually it would be pointless to use interned strings to speed up the program in such a case because the speed increase is negligible.

btw, why do you think dictionaries/hashtables are implemented as RBTs? Because they are not...
 

Courageous

Wanderer
noobie,

Several of your posts have illustrated that when you reply to my posts, what you need to do is read them more carefully first.

For example, I never said that dictionaries/hashtables are implemented as RBT, that's your invention.

Your prior remark that it does not compare the exact string is mistaken, unless you happen to believe that C# hashtables use heavy hashing, like MD5, without a match test, which I highly doubt.


C//
 

noobie

Wanderer
Well, actually it does compare, but it uses the Equals method, which does a value comparison; it doesn't even check references. Even if it did check references, it is already an O(1) algorithm and trying to make it faster in "this way" is just pointless.
 

Courageous

Wanderer
Well, actually it does compare, but it uses the Equals method, which does a value comparison, ...
I knew this. You didn't. But this didn't keep you from lecturing to the contrary. Why?

...it doesn't even check references. Even if it did check references, it is already an O(1) algorithm and trying to make it faster in "this way" is just pointless.
"Actually" hash tables are average case O(1), but are worst case O(N). In strict complexity theory, the O() notation that you are invoking ala abracadabra is to express the upper bounds of the compute algorithm. So to use it without mentioning the important word "average" is strictly incorrect. O(N) is correct.

But more pragmatically, in linear hashing's average case, it is more technically expressed as O(1)+K, where K can be large, and improving the performance of the hash table (that persnickety and annoying K you dismiss as irrelevant) can make a substantial percentage change in the runtime of many applications.

To dismiss pragmatic application runtime, all hand-wavy like, is not the least bit wise. I have seen applications that use algorithms with better theoretical performance get destroyed by applications with better pragmatics. Theoreticals aren't everything.

An extensive performance test which I have performed recently suggests to me rather strongly that C# hash tables "actually" do in fact intern their strings. Or so it would appear, without exceptional explanation to the contrary.

C//
 

noobie

Wanderer
I know all that hashtable stuff, but I still can't see any valid point you have made. I didn't say it won't do any good for performance; yes, it will, but it is negligible.

- Let's say the algorithm is O(N) (which is actually the worst case... you are right); it would be fairly pointless to do internalization.
- Let's say it's O(1)+K and K is too large. Do you really think getting rid of just one simple Equals method would make it much faster? I mean, is it worth the effort? The true answer is no.

Besides, there is a good chance the hashtable would be temporary, so using default internalization would not be a good idea.

C, I don't know what tests you have done, but hashtables don't use internalization. Actually, a hashtable doesn't even know if the key is a string. (That is what Reflector says.)

And apparently, this debate is not going anywhere. So let's just drop it...
 

Courageous

Wanderer
Do you really think getting rid of just one simple Equals method would make it much faster?

It saves as many instructions as the string has characters, for the final match test (which is, indeed, necessary). As for the test performed, I'll leave it to your imagination.

C//
 