mike's web log

 

Blog Search


(Supports AND)

 

Google Ads

 

Feed

Subscribe to the RSS feed for this blog.

See this post for info on full versus truncated feeds.

 

Quote

The scientific handicapper will never beat the horses, but he will learn to be alert for subtleties that escape the less trained eye. To weigh and evaluate a vast grid of information, much of it meaningless, and to arrive at sensible, if erroneous, conclusions, is a skill not to be sneezed at.

— Richard Russo



 

Navigation






<April 2014>
SMTWTFS
303112345
6789101112
13141516171819
20212223242526
27282930123
45678910


 

25 Most-Visited Entries

 

Categories

  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
  RSS
 

Blogs I Read

 

Contact

Email me
 

Blog Statistics

Dates
First entry - 6/27/2003
Most recent entry - 4/3/2014

Totals
Posts - 2298
Comments - 2480
Hits - 1,619,384

Averages
Entries/day - 0.58
Comments/entry - 1.08
Hits/day - 410

Update every 30 minutes. Last: 3:19 AM Pacific

 
   |  Reg Ex and the bus factor

posted at 10:07 PM | | |

It's kind of pointless for me to be quoting Scott Hanselman -- if you like the following, you probably saw it weeks ago -- but I laughed when I read it, so here goes. This is buried in a tech post about stripping empty XML elements:
The early versions of the Rectifier used an uber-regular expression to strip out these tags from the source string. This system returns a full XML Document string, not an XmlReader or IXPathNavigable.

I heard a cool quote yesterday at the Portland NerdDinner while we were planning the CodeCamp.

"So you've got a problem, and you've decided to solve it with Regular Expressions. Now you've got two problems."

Since the size of the documents we passed through this system were between 10k and 100k the performance of the RegEx, especially when it's compiled and cached was fine. Didn't give it a thought for years. It worked and it worked well. It looked like this:

private static Regex regex = new Regex(@"\<[\w-_.: ]*\>\<\!\[CDATA\[\]\]\>\|\<[\w-_.: ]*\>\|<[\w-_.: ]*/\>|\<[\w-_.: ]*[/]+\>|\<[\w-_.: ]*[\s]xmlns[:\w]*=""[\w-/_.: ]*""\>\|<[\w-_.: ]*[\s]xmlns[:\w]*=""[\w-/_.: ]*""[\s]*/\>|\<[\w-_.: ]*[\s]xmlns[:\w]*=""[\w-/_.: ]*""\>\<\!\[CDATA\[\]\]\>\",RegexOptions.Compiled);

Stuff like this has what I call a "High Bus Factor." That means if the developer who wrote it is hit by a bus, you're screwed. It's nice to create a solution that anyone can sit down and start working on and this isn't one of them.

[categories]