When Breaking Changes Actually Break Stuff

It’s funny how breaking changes actually…well, break stuff sometimes. I’ve had an issue on JK.com that I’ve been trying to track down for a while now: search was completely broken, and for the life of me I couldn’t figure out why. I won’t bore anyone with the troubleshooting steps I took, but long story short, the String.GetHashCode algorithm changed between .NET 1.1 and 2.0. It says so right there in the documentation:

The behavior of GetHashCode is dependent on its implementation, which might change from one version of the common language runtime to another. A reason why this might happen is to improve the performance of GetHashCode. If you require the behavior of GetHashCode be constant, override the runtime implementation of GetHashCode with an implementation of your own that you know will never change.

Eric Lippert further emphasizes this here:

Finally, the string hash algorithm is not an industry standard and is not guaranteed to produce the same behaviour between versions. And in fact it does not. The .NET 2.0 CLR uses a different algorithm for string hashing than the .NET 1.1 CLR. If you are saving .NET 1.1 CLR hash values in a database then you will not be able to match them when you upgrade to 2.0.

How does this affect the search functionality in Community Server? Well, for starters, CS uses a string hash to tokenize posts (the TokenizeKeywords method) and keeps the results in the cs_searchBarrel table. So if you’re running CS 2.0 under ASP.NET 2.0, search will be broken: the keys stored under the old runtime won’t match the hashes the new runtime computes for the same words. Period. The fix is easy enough (well, I say that now)…simply truncate your cs_searchBarrel table and set the IsIndexed column to 0 for every row in cs_Posts, and CS will do the rest for you by rebuilding the search tables. I must have truncated my cs_searchBarrel table about a thousand times trying to fix this, but I never did the second step. CS won’t index posts that already have IsIndexed set to true, which of course makes sense once you think about it…it would be disastrous performance-wise for large sites if the search job had to reindex every post on each pass.

So if you’re running CS 2.0 on ASP.NET 2.0, more than likely your search is broken and you may not have realized it. The bigger point is what Eric pointed out: don’t use GetHashCode to uniquely identify data, as it’s not guaranteed to remain consistent across .NET versions. Google knew about this long before I did.
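
If you do need a hash value that survives a runtime upgrade, one option is to compute it yourself with a fixed, published algorithm instead of calling GetHashCode. Here is a minimal sketch using 32-bit FNV-1a over the UTF-8 bytes of the string. To be clear, this is not what Community Server does, and the StableHash class and Fnv1a32 method are names I made up for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of Community Server or the .NET Framework).
public static class StableHash
{
    // Standard 32-bit FNV-1a constants.
    private const uint OffsetBasis = 2166136261;
    private const uint Prime = 16777619;

    // Hashes the UTF-8 bytes of the text, so the result depends only on the
    // characters in the string and not on the runtime's GetHashCode.
    public static uint Fnv1a32(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        uint hash = OffsetBasis;
        unchecked
        {
            foreach (byte b in Encoding.UTF8.GetBytes(text))
            {
                hash ^= b;      // FNV-1a: XOR the byte in first...
                hash *= Prime;  // ...then multiply, wrapping on overflow.
            }
        }
        return hash;
    }
}
```

Calling StableHash.Fnv1a32("foobar") should give 0xBF9CF968 on any runtime, which is the whole point. Keep in mind that swapping the hash function on an existing site would mean reindexing everything all over again, since the stored keys would change.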


