vs, net

I’ve heard that the CR_Documentor plugin is a little sluggish on some folks’ machines so I decided to run a profiler on it and see what’s slowing me down.

The profilers out there for .NET suck. A lot.

The problem with the majority of them is that they only profile executables. You can’t just profile a satellite assembly that you’ve written that gets used by an executable, and you can’t just attach to a process that’s already running and using your assembly.

Well, as an add-in, my assembly’s technically running under Visual Studio itself - devenv.exe. So there’s your host app…

I tried DevPartner Profiler Community Edition. It blue-screened my box twice before I gave up. I never successfully even got VS started up to be profiled.

I tried nprof, the Red-Gate ANTS profiler, and the CLR Profiler. No luck.

DevExpress tech support recommended AQtime, since that’s what they use. It seemed the best of the bunch, being able to attach to existing running processes and select specific assemblies to profile, but it wouldn’t allow me to attach to an existing running instance of Visual Studio, nor was it able to start one up for me (I got a bunch of Access Violations and VS would puke). So the best of the bunch (that I tried out) never worked.

Long and short of it is, I never got the thing profiled. I’m thinking I may have to add some sort of trace-style instrumentation and/or performance counters to get this done. Hmmm.

In other news, I made a few Amazon purchases today: C++ Primer, 4th Ed. (I need to re-learn C++… it’s been too long); Joel On Software; and Red Dwarf, series 5 and 6 (can’t get enough of the Red Dwarf).

I’ve also made quite a bit of progress on my pMachine-to-dasBlog conversion program and may try to move over in the reasonably near future. Of course, after what I’ve experienced so far with dasBlog, I can see there’s some work I need to do on it to accommodate the stuff I’d like to do in a reasonable fashion. For example, the “macro substitution” style templates need to be fixed to actually work like master pages so I can design them using real controls and not just string replacement, not to mention there are far too many moving parts behind the scenes for the amount of stuff it’s actually doing. We’ll see.

gists, javascript, csharp

I’m struggling right now with the fact that JavaScript/ECMAScript doesn’t allow for Unicode character classes in regular expressions. For example, if I want to set up a client-side JavaScript validation expression on a numeric field, I’d want to do something like ^\d+$ as my regular expression, right? Match one or more digits?

The problem is that in JavaScript, \d expands out to [0-9], which technically isn’t all of the digits, if you think about all of the other alphabets out there that exist and don’t use 0 through 9 to indicate numbers.
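
Here’s a minimal sketch of that gap. The sample strings are just my own illustration (Arabic-Indic digits written as escapes so the file encoding doesn’t matter); nothing here comes from a real app.

```javascript
// \d in a JavaScript regex only matches ASCII 0-9.
const asciiOnly = /^\d+$/;

// Plain Western digits.
const western = "12345";
// Arabic-Indic digits one, two, three (U+0661, U+0662, U+0663).
const arabicIndic = "\u0661\u0662\u0663";

const westernOk = asciiOnly.test(western);
const arabicIndicOk = asciiOnly.test(arabicIndic);

console.log(westernOk);     // true
console.log(arabicIndicOk); // false, even though these are decimal digits
```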

In .NET, they solve this by mapping to Unicode character classes. So \d maps to \p{Nd}, which is the Unicode character class for digits. Much more global, right? So how do you do that on the client side?

Well, I figure you have to expand the character classes on the server side and then feed those to the client. JavaScript regular expressions accept \uXXXX escapes (a backslash, a u, and four hex digits), so you can say \uFFFF or whatever to specify a particular character. So you need to take \d and expand it to the full set of Unicode digit characters.

Using \d as our example, a C# snippet that expands the digits looks like this (it assumes using System; at the top of the file and that both methods sit inside a class):

static void Main(string[] args){
  string Nd = UnicodeExpansion(System.Globalization.UnicodeCategory.DecimalDigitNumber);
  Console.WriteLine(Nd);
  Console.ReadLine();
}

/// <summary>
/// Expands a Unicode character set into an ECMAScript compatible character
/// range string.
/// </summary>
/// <param name="category">
/// The Unicode character category to expand.
/// </param>
/// <returns>
/// A <see cref="System.String" /> that can be used in an ECMAScript regular
/// expression.
/// </returns>
/// <remarks>
/// <para>
/// ECMAScript (JavaScript) does not inherently understand Unicode in regular
/// expressions, which results in incorrect validation when using character
/// classes (\w, \s, \d, etc.).
/// </para>
/// <para>
/// This method expands a <see cref="System.Globalization.UnicodeCategory" />
/// into a string that can be used in an ECMAScript regular expression.  For
/// example, the category <see cref="System.Globalization.UnicodeCategory.LetterNumber" />
/// expands to <c>\u2160-\u2183\u3007\u3021-\u3029\u3038-\u303a</c>.
/// </para>
/// </remarks>
public static string UnicodeExpansion(System.Globalization.UnicodeCategory category){
  // The fully expanded block of characters
  System.Text.StringBuilder expansion = new System.Text.StringBuilder();
  // Low end of the current character block (-1 until the first match)
  int blockLow = -1;
  // High end of the current character block
  int blockHigh = -1;

  for(int charVal = 0; charVal <= Char.MaxValue; charVal++){
    // Get the category of the current character
    System.Globalization.UnicodeCategory charCat = Char.GetUnicodeCategory(Convert.ToChar(charVal));

    // Ignore characters that don't match the category.
    if(charCat != category){
      continue;
    }

    if(blockLow == -1){
      // Handle the very first block
      blockLow = charVal;
      blockHigh = charVal;
    }
    else if(blockHigh + 1 != charVal){
      // charVal skipped some characters, so the current block is complete.
      // Write the block to the expansion string
      if(blockLow == blockHigh){
        // This is a one-character block
        expansion.AppendFormat(@"\u{0:x4}", blockLow);
      }
      else{
        // This is a multi-char block
        expansion.AppendFormat(@"\u{0:x4}-\u{1:x4}", blockLow, blockHigh);
      }

      // Start a new block
      blockLow = charVal;
      blockHigh = charVal;
    }
    else{
      // We're still in the same block; increment the high end of the block.
      blockHigh = charVal;
    }
  }

  // The loop only writes a block when the next one starts, so the last block
  // is still pending here. Skip it only if no character matched at all.
  if(blockLow != -1){
    if(blockLow == blockHigh){
      // This is a one-character block
      expansion.AppendFormat(@"\u{0:x4}", blockLow);
    }
    else{
      // This is a multi-char block
      expansion.AppendFormat(@"\u{0:x4}-\u{1:x4}", blockLow, blockHigh);
    }
  }

  return expansion.ToString();
}
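
If the block bookkeeping is hard to follow, here’s the same run-collapsing idea as a small JavaScript sketch. It’s not a port of the method above: toRanges is a hypothetical helper name, and it takes an already-sorted list of code points (assumed to be at or below 0xFFFF) rather than asking the runtime for Unicode categories.

```javascript
// Collapse a sorted list of code points into \uXXXX and \uXXXX-\uXXXX runs.
// Assumes ascending order, no duplicates, and values no larger than 0xFFFF.
function toRanges(codePoints) {
  // Format one code point as a four-digit, lowercase \u escape.
  const esc = (cp) => "\\u" + cp.toString(16).padStart(4, "0");
  const parts = [];
  let low = null;
  let high = null;

  // Write the run collected so far, if there is one.
  const flush = () => {
    if (low === null) return;
    parts.push(low === high ? esc(low) : esc(low) + "-" + esc(high));
  };

  for (const cp of codePoints) {
    if (low !== null && cp === high + 1) {
      high = cp;   // still contiguous: extend the current run
    } else {
      flush();     // gap found: write the finished run
      low = cp;    // and start a new one
      high = cp;
    }
  }
  flush();         // the last run is always still pending here
  return parts.join("");
}

const sample = toRanges([0x30, 0x31, 0x32, 0x41, 0x660, 0x661]);
console.log(sample); // prints \u0030-\u0032\u0041\u0660-\u0661
```

The final flush() after the loop is the part that’s easy to forget: a run only gets written when the next one starts, so the last run needs an explicit write.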

For \d, it expands out to:

\u0030-\u0039\u0660-\u0669\u06f0-\u06f9\u0966-\u096f\u09e6-\u09ef\u0a66-\u0a6f\u0ae6-\u0aef\u0b66-\u0b6f\u0be7-\u0bef\u0c66-\u0c6f\u0ce6-\u0cef\u0d66-\u0d6f\u0e50-\u0e59\u0ed0-\u0ed9\u0f20-\u0f29\u1040-\u1049\u1369-\u1371\u17e0-\u17e9\u1810-\u1819\uff10-\uff19

Which means that rather than ^\d+$ to validate, you’d use ^[\u0030-\u0039\u0660-\u0669\u06f0-\u06f9\u0966-\u096f\u09e6-\u09ef\u0a66-\u0a6f\u0ae6-\u0aef\u0b66-\u0b6f\u0be7-\u0bef\u0c66-\u0c6f\u0ce6-\u0cef\u0d66-\u0d6f\u0e50-\u0e59\u0ed0-\u0ed9\u0f20-\u0f29\u1040-\u1049\u1369-\u1371\u17e0-\u17e9\u1810-\u1819\uff10-\uff19]+$.

You can try this out at http://www.regular-expressions.info/javascriptexample.html. Seems to work pretty well.
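
For a rough idea of what the client side could look like, here’s a sketch that pastes in the expansion above and builds a validator from it. The names Nd, allDigits, and isAllDigits are made up for the example, and a real page would presumably get the string from the server rather than hard-coding it.

```javascript
// The \d expansion from above, split across lines only for readability.
// String.raw keeps the backslashes so the RegExp sees the \uXXXX escapes.
const Nd = [
  String.raw`\u0030-\u0039\u0660-\u0669\u06f0-\u06f9\u0966-\u096f`,
  String.raw`\u09e6-\u09ef\u0a66-\u0a6f\u0ae6-\u0aef\u0b66-\u0b6f`,
  String.raw`\u0be7-\u0bef\u0c66-\u0c6f\u0ce6-\u0cef\u0d66-\u0d6f`,
  String.raw`\u0e50-\u0e59\u0ed0-\u0ed9\u0f20-\u0f29\u1040-\u1049`,
  String.raw`\u1369-\u1371\u17e0-\u17e9\u1810-\u1819\uff10-\uff19`,
].join("");

// One or more characters from the expanded digit class, nothing else.
const allDigits = new RegExp("^[" + Nd + "]+$");

function isAllDigits(value) {
  return allDigits.test(value);
}

console.log(isAllDigits("2005"));               // true
console.log(isAllDigits("\u0661\u0662\u0663")); // true (Arabic-Indic digits)
console.log(isAllDigits("12a"));                // false
```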

I’m using numbers as my example here, though the same idea applies to letters or any other character class. In JavaScript, for instance, \w maps to [a-zA-Z_0-9], which is obviously not all the possible letters out there.

You could even take this a further step and pre-calculate all of the Unicode character blocks at application start time and cache the common character class expansions for use in regex translation on the server side.
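
Here’s a hedged sketch of what that translation step might look like. It’s written in JavaScript only to keep the examples in one language; the real thing would live on the server. The expansions map and the translate function are hypothetical names, and the cached \d string is deliberately cut down to two ranges to keep the demo short.

```javascript
// Hypothetical cache: shorthand class letter -> pre-computed expansion.
// Only \d is filled in, and only with the first two ranges, for brevity.
const expansions = new Map([
  ["d", String.raw`\u0030-\u0039\u0660-\u0669`],
]);

// Rewrite cached shorthand classes in a regex source string.
// Outside [...] the expansion gets its own brackets; inside it is spliced in.
function translate(pattern) {
  let out = "";
  let inClass = false;
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "\\" && i + 1 < pattern.length) {
      const next = pattern[++i]; // consume the escaped character too
      const expansion = expansions.get(next);
      if (expansion === undefined) {
        out += ch + next;        // some other escape: copy it untouched
      } else {
        out += inClass ? expansion : "[" + expansion + "]";
      }
      continue;
    }
    if (ch === "[" && !inClass) inClass = true;
    else if (ch === "]" && inClass) inClass = false;
    out += ch;
  }
  return out;
}

const translated = translate(String.raw`^\d+$`);
console.log(translated); // prints ^[\u0030-\u0039\u0660-\u0669]+$
```

Walking the pattern one character at a time (instead of a blind string replace) keeps an escaped backslash followed by a d from being mistaken for \d.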

Updated 9/9/2005 for boundary condition logic error and again on 9/11/2005 to fix accidental omission of the last block (thanks cougio); modified the method to be a standalone static for easier cut and paste into applications; added comments for readability.

downloads, vs, net

Solvent 1.1.1 is out with a minor bug fix that lets “Command Prompt Here” open the command prompt on drives other than the one VS.NET is installed on. (Oops!)

I’ve also (finally) put the Solvent source out there for folks curious how it works. You’ll see a lot of stubbed-in stuff where I was/am trying to get the Windows “Send To” menu to show up in there, but it’ll probably just forever be stubbed since there’s no great way to interface with “Send To,” particularly in a managed code world.

Anyway, the update’s out, so go get it.