media, movies

[Image: Hitchhiker’s Guide To The Galaxy]

I hate book-to-film adaptations.

I really do. The problem is that adapting a book to film implies that certain things will get cut due to time constraints, the general inability to illustrate a concept visually, etc. When they do stick to the book (like the first two Harry Potter movies), I love it but the critics hate it. When they deviate from the book (like the third Harry Potter movie), I hate it but the critics love it.

I always hope they’ll stick to the book because the whole reason I want to see the film adaptation is to see the story I know and love brought to life on the screen. One of my favorite stories ever is The Hitchhiker’s Guide To The Galaxy and I was more than excited to see that it was going to get its time on the big screen (not counting the previous time, which, admittedly, I haven’t seen).

Ugh.

I mean, seriously. Ugh.

There was stuff that happened in the movie that never happened in the book.

There was stuff that happened in the movie that happened in other books in the series.

The worst bit is that I’ve read the books and I got lost. Jenn hasn’t read the books and was more lost than me.

I think they were trying to go somewhere they shouldn’t have. They tried to get “creative” with it or something - to somehow change or improve upon the story - and it didn’t work.

I liked Martin Freeman as Arthur Dent. Mos Def as Ford Prefect was… well, he wasn’t what I pictured when I read the books, but I bought it. Zooey Deschanel was a great Trillian. Bill Nighy as Slartibartfast was perfect (and has my vote for most accurately represented character in the movie).

On the other hand, Sam Rockwell as Zaphod Beeblebrox left something to be desired. He felt too… flashy… and not enough “used car salesman.” The rendition of Marvin the Paranoid Android was not remotely how I had pictured Marvin in the books. And the spaceship Heart Of Gold bore pretty much no resemblance to anything I had imagined.

The Vogons, who are an interesting set of characters, were not major players in the books. I mean, they were there, but they didn’t show up every 10 seconds. In the movie, the Vogons filled in the “villain” role and were on screen almost more than the humans.

I won’t even get into the fact that the dolphins played like NO role in the first book but somehow made it to the opening credits of the movie.

I really hope they don’t try this again. I won’t even be picking this up on DVD, it was so bad. Sorry, Hollywood, you really lost me here. Now I’m going to have a hell of a time convincing Jenn to read the books.

blog

I’m still torn on whether I should convert from pMachine to dasBlog as my blog software.

There’s some odd stuff with caching going on behind the scenes in dasBlog because it’s a filesystem-based package - stuff like creating a second copy of all of the metadata attached to all of the comments on the entire site. pMachine doesn’t do anything like that. If the metadata is stored with the comment, and it’s only used as a cache, why is it a physical file to begin with? Does it ever get rebuilt? Having a separate cache that is, in effect, a disparate data source from the actual data is asking for integrity problems.

The templates really bug me. The more I think about it, the more they bug me. I appreciate that the idea is to allow for folks that “just know HTML” to code up a template and run with it. As such, they use sort of a string-macro-substitution scheme where a huge set of undocumented “magic words” can be used to insert a bunch of stuff in an undocumented format into your page using undocumented CSS classes that you don’t get to choose. It’s an ASP.NET app - what happens if I want to add, say, a textbox to it? Or a treeview? Or any other ASP.NET server control that I could dream up? Right now, I’ve got to do some fancy tap-dancing-and-jazz-hands to put the server control into a user control and… uh, no. Seriously.

If the templates were made for a user who “just knows HTML” to use, how come the setup and upgrade procedures take an ASP.NET guru to figure out?

I shouldn’t have to modify the administrative interface to get it to do things I want. Nor should I have to manually log in to the server, download a config file, edit it, then re-upload it due to lack of an administrative interface. Ever. If you add the feature and it requires config, you add the admin interface.

If everyone using dasBlog is using some external application (e.g., BlogJet) to post entries, it occurs to me that this means not one, but two things: First, that BlogJet is cool and convenient to use. Second, that there is a dire shortcoming in the built-in interface for creating new entries that needs to be addressed. If it’s so inconvenient to use, what is it that makes it inconvenient? Address that.

While we’re on the topic of the built-in new entry interface, the usage of text editing components must be questioned. I know the dasBlog guys weren’t responsible for FreeTextBox or the way it munges up HTML. That’s fine. But here’s the deal: clean HTML isn’t important to some folks, but it really is to others, particularly when trying to apply complex CSS. Seriously. As far as I’m concerned anymore, it’s XHTML compliant or bust. (No, my current blog does not adhere to XHTML standards; that’s part of why I want to update. To get a nice, clean template that does.) Even if that means I enter a new entry in a plain old textbox manually (which is what I currently do anyway).

I might want the ability to upload images into one folder and other content, like downloadable software, into another folder. I, further, might even want a file browser so I can see (without having to FTP) what I’ve got up there and delete or rename files as I see fit. Hook me up, guys. It’s not rocket science.

If it doesn’t just work without tweaking, don’t put it in. If it’s not going to be documented so people can take advantage of it, don’t put it in. If you have to know how to create a nuclear accelerator out of duct tape and toothpicks in order to use it, don’t put it in. (This sort of goes back to the trouble with the administrative interface - complete configuration of the site can’t really be achieved through the admin interface right now, so you have to know where/how certain things work - undocumented - in order to get things configured just so.)

Where was the common sense when writing some of this stuff? If you have a class called DayEntry and that class has a static method like OccursBefore(DayEntry entry, DateTime dt), then it occurs to me that, since you have to have a DayEntry instance anyway, the method should be an instance method, not a static one. Am I wrong? (There’s a lot of that kind of stuff in there.)
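
To illustrate what I mean (I’m writing these signatures from memory, so treat this as a sketch rather than the actual dasBlog code):

using System;

public class DayEntry{
  public DateTime Date;

  // The current design: static, even though you need a DayEntry
  // instance in hand to call it anyway.
  public static bool OccursBefore(DayEntry entry, DateTime dt){
    return entry.Date < dt;
  }

  // What it arguably should be: just ask the instance directly.
  public bool OccursBefore(DateTime dt){
    return this.Date < dt;
  }
}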

I know what’s going to happen here. I’m going to convert over to dasBlog and get pissed off that there are weird things in there. I’m going to rewrite the thing and have to run a fully custom implementation just so I can get things done. I guess I should just accept it now. Unless you dasBlog folks want to go on a major cleaning spree? Stop adding new features and make the existing product solid. Not just solid, but *solid*.

vs, dotnet

I’ve heard that the CR_Documentor plugin is a little sluggish on some folks’ machines so I decided to run a profiler on it and see what’s slowing me down.

The profilers out there for .NET suck. A lot.

The problem with the majority of them is that they only profile executables. You can’t just profile a library assembly you’ve written that gets used by an executable, and you can’t just attach to a process that’s already running and using your assembly.

Well, as an add-in, my assembly’s technically running under Visual Studio itself - devenv.exe. So there’s your host app…

I tried DevPartner Profiler Community Edition. It blue-screened my box twice before I gave up. I never successfully even got VS started up to be profiled.

I tried nprof, the Red-Gate ANTS profiler, and the CLR Profiler. No luck.

DevExpress tech support recommended AQtime, since that’s what they use. It seemed the best of the bunch, being able to attach to existing running processes and select specific assemblies to profile, but it wouldn’t allow me to attach to an existing running instance of Visual Studio, nor was it able to start one up for me (I got a bunch of Access Violations and VS would puke). So the best of the bunch (that I tried out) never worked.

Long and short of it is, I never got the thing profiled. I’m thinking I may have to add some sort of trace-style instrumentation and/or performance counters to get this done. Hmmm.
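
If it comes to that, I’m picturing something simple along these lines - a little Stopwatch-based timer that writes elapsed times out to the trace listeners (TraceTimer is a made-up name, not any existing library class; just a sketch):

using System;
using System.Diagnostics;

// Wrap a suspect code path in a using block and the elapsed time gets
// written to any attached trace listeners (assuming TRACE is defined).
public class TraceTimer : IDisposable{
  private readonly string _operation;
  private readonly Stopwatch _stopwatch;

  public TraceTimer(string operation){
    _operation = operation;
    _stopwatch = Stopwatch.StartNew();
  }

  public void Dispose(){
    _stopwatch.Stop();
    Trace.WriteLine(String.Format("{0}: {1}ms", _operation, _stopwatch.ElapsedMilliseconds));
  }
}

// Usage:
// using(new TraceTimer("RenderPreview")){
//   // ...the code being measured...
// }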

In other news, I made a few Amazon purchases today: C++ Primer, 4th Ed. (I need to re-learn C++… it’s been too long); Joel On Software; and Red Dwarf, series 5 and 6 (can’t get enough of the Red Dwarf).

I’ve also made quite a bit of progress in my pMachine-to-dasBlog conversion program and may try to transfer over in the reasonably near future. Of course, after what I’ve experienced with dasBlog so far, I can see there’s some work I need to do on it to accommodate the stuff I’d like to do in a reasonable fashion (for example, the “macro substitution” templates need to be fixed to actually work like master pages so I can design them using real controls and not just string replacement… not to mention there are far too many moving parts behind the scenes for the amount of stuff it’s actually doing…). We’ll see.

gists, javascript, csharp

I’m struggling right now with the fact that JavaScript/ECMAScript doesn’t allow for Unicode character classes in regular expressions. For example, if I want to set up a client-side JavaScript validation expression on a numeric field, I’d want to do something like ^\d+$ as my regular expression, right? Match one or more digits?

The problem is that in JavaScript, \d expands out to [0-9], which technically isn’t all of the digits once you consider all of the other writing systems out there that don’t use 0 through 9 to represent numbers.

In .NET, they solve this by mapping to Unicode character classes. So \d maps to \p{Nd}, which is the Unicode character class for digits. Much more global, right? So how do you do that on the client side?
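
As an aside, you can actually see this difference inside .NET itself: by default \d matches any Unicode decimal digit, but with RegexOptions.ECMAScript it falls back to JavaScript-style [0-9] behavior. A quick demonstration (the class and variable names here are just mine):

using System;
using System.Text.RegularExpressions;

class DigitDemo{
  static void Main(){
    // U+0665 is ARABIC-INDIC DIGIT FIVE - a decimal digit, but not 0-9.
    string arabicIndicFive = "\u0665";

    // Default .NET behavior: \d means \p{Nd}, so this matches.
    Console.WriteLine(Regex.IsMatch(arabicIndicFive, @"^\d+$")); // True

    // ECMAScript-compliant behavior: \d means [0-9], so this doesn't.
    Console.WriteLine(Regex.IsMatch(arabicIndicFive, @"^\d+$", RegexOptions.ECMAScript)); // False
  }
}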

Well, I figure you have to expand the character classes on the server side and then feed those to the client. JavaScript regular expressions support hexadecimal Unicode escapes, so you can write \uFFFF (or whatever) to specify a particular character. So you need to take \d and expand it to the full set of matching Unicode characters.

Using \d as our example, a C# snippet that expands the digits looks like this:

static void Main(string[] args){
  string Nd = UnicodeExpansion(System.Globalization.UnicodeCategory.DecimalDigitNumber);
  Console.WriteLine(Nd);
  Console.ReadLine();
}

/// <summary>
/// Expands a Unicode character set into an ECMAScript compatible character
/// range string.
/// </summary>
/// <param name="category">
/// The Unicode character category to expand.
/// </param>
/// <returns>
/// A <see cref="System.String" /> that can be used in an ECMAScript regular
/// expression.
/// </returns>
/// <remarks>
/// <para>
/// ECMAScript (JavaScript) does not inherently understand Unicode in regular
/// expressions, which results in incorrect validation when using character
/// classes (\w, \s, \d, etc.).
/// </para>
/// <para>
/// This method expands a <see cref="System.Globalization.UnicodeCategory" />
/// into a string that can be used in an ECMAScript regular expression.  For
/// example, the category <see cref="System.Globalization.UnicodeCategory.LetterNumber" />
/// expands to <c>\u2160-\u2183\u3007\u3021-\u3029\u3038-\u303a</c>.
/// </para>
/// </remarks>
public static string UnicodeExpansion(System.Globalization.UnicodeCategory category){
  // The fully expanded block of characters
  string expansion = "";
  // Low-end of the character block
  int blockLow = -1;
  // High-end of the character block
  int blockHigh = -1;
  // Marks whether the current block has been written
  bool blockWritten = false;

  for(int charVal = 0; charVal <= Char.MaxValue; charVal++){
    // Get the category of the current character
    System.Globalization.UnicodeCategory charCat = Char.GetUnicodeCategory(Convert.ToChar(charVal));

    // We haven't written anything this loop; used to ensure
    // all blocks get written at the end.
    blockWritten = false;

    // Ignore characters that don't match the category.
    if(charCat != category){
      continue;
    }

    if(blockLow == -1){
      // Handle the very first block
      blockLow = charVal;
      blockHigh = charVal;
    }
    else if(
      // charVal skipped some characters OR
      blockHigh + 1 != charVal ||
      // We're at the end of the set of characters
      blockHigh + 1 > Char.MaxValue
      ){

      // Write the block to the expansion string
      if(blockLow == blockHigh){
        // This is a one-character block
        expansion += String.Format(@"\u{0:x4}", blockLow);
      }
      else{
        // This is a multi-char block
        expansion += String.Format(@"\u{0:x4}-\u{1:x4}", blockLow, blockHigh);
      }

      // Start a new block
      blockWritten = true;
      blockLow = charVal;
      blockHigh = charVal;
    }
    else{
      // We're still in the same block; increment the high end of the block.
      blockHigh = charVal;
    }
  }

  // If we didn't write the last block, write it now (guarding against the
  // degenerate case where no characters matched the category at all)
  if(!blockWritten && blockLow != -1){
    if(blockLow == blockHigh){
      // This is a one-character block
      expansion += String.Format(@"\u{0:x4}", blockLow);
    }
    else{
      // This is a multi-char block
      expansion += String.Format(@"\u{0:x4}-\u{1:x4}", blockLow, blockHigh);
    }
    blockWritten = true;
  }

  return expansion;
}

For \d, it expands out to:

\u0030-\u0039\u0660-\u0669\u06f0-\u06f9\u0966-\u096f\u09e6-\u09ef\u0a66-\u0a6f\u0ae6-\u0aef\u0b66-\u0b6f\u0be7-\u0bef\u0c66-\u0c6f\u0ce6-\u0cef\u0d66-\u0d6f\u0e50-\u0e59\u0ed0-\u0ed9\u0f20-\u0f29\u1040-\u1049\u1369-\u1371\u17e0-\u17e9\u1810-\u1819\uff10-\uff19

Which means that rather than ^[\d]+$ to validate, you’d use ^[\u0030-\u0039\u0660-\u0669\u06f0-\u06f9\u0966-\u096f\u09e6-\u09ef\u0a66-\u0a6f\u0ae6-\u0aef\u0b66-\u0b6f\u0be7-\u0bef\u0c66-\u0c6f\u0ce6-\u0cef\u0d66-\u0d6f\u0e50-\u0e59\u0ed0-\u0ed9\u0f20-\u0f29\u1040-\u1049\u1369-\u1371\u17e0-\u17e9\u1810-\u1819\uff10-\uff19]+$.

You can try this out at http://www.regular-expressions.info/javascriptexample.html. Seems to work pretty well.

I’m using numbers as my example here, though the same thoughts could be applied to letters or any other character classes. Similarly, in JavaScript \w maps to [a-zA-Z_0-9], which is obviously not all the possible word characters out there.

You could even take this a further step and pre-calculate all of the Unicode character blocks at application start time and cache the common character class expansions for use in regex translation on the server side.
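
As a rough sketch of that idea (UnicodeExpansionCache is a made-up name; the expander delegate would be the UnicodeExpansion method above):

using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical cache: compute each category expansion once and hand back
// the cached string on subsequent requests.
public static class UnicodeExpansionCache{
  private static readonly Dictionary<UnicodeCategory, string> _cache =
    new Dictionary<UnicodeCategory, string>();

  public static string Get(UnicodeCategory category, Func<UnicodeCategory, string> expander){
    string expansion;
    if(!_cache.TryGetValue(category, out expansion)){
      // First request for this category; expand and remember it.
      expansion = expander(category);
      _cache[category] = expansion;
    }
    return expansion;
  }
}

// Usage (assuming UnicodeExpansion is in scope):
// string digits = UnicodeExpansionCache.Get(
//   UnicodeCategory.DecimalDigitNumber, UnicodeExpansion);

It’s not thread-safe as written, but since the expansions never change, you could just as easily precompute everything at application start and skip the lazy lookup entirely.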

Updated 9/9/2005 for boundary condition logic error and again on 9/11/2005 to fix accidental omission of the last block (thanks cougio); modified the method to be a standalone static for easier cut and paste into applications; added comments for readability.