I’m working on a fairly major refactoring project, merging a couple of large libraries that have some overlapping or similar functionality. I figured this would be the perfect time to try out the reasonably new CodeRush Duplicate Code Detection feature.
The solution I’m working on has eight projects - four unit test projects, four shipping class libraries, and a total of 1248 C# code files.
I didn’t have duplicate code detection running in the background during the initial pass at the merge. Instead, I ran the detection after the initial pass to indicate some pain points where I need to start looking at combining code. (Let me apologize in advance - I wanted to show you the analysis results in a real life project, but because it’s a real project, I can’t actually show you the code in which we’re detecting duplicates. I can only show you the duplicate reports. With that…)
First, I started with the default settings (analysis level 4):
Running that pass took about eight seconds. I found a few duplicate blocks in unit tests, which wasn’t unexpected. They were in similar tests (specifically, tests validating some operator overload functionality):
I wanted to do a little more, so I turned it up a notch to analysis level 3 and ran another pass. This pass took about 12 seconds and found a lot more duplicates. The duplicates now reached into the actual shipping code and pointed out some great places to start refactoring and merging code. This pass also grabbed more duplicates across classes (whereas the previous pass only caught duplicates within a single class/file).
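To give a feel for why a tighter setting turns up more matches, here is a toy sketch in Python (chosen only for brevity; the project itself is C#). It is not how CodeRush works, and I'm assuming, purely for illustration, that an analysis level behaves something like a minimum block size. The function name `find_duplicate_blocks`, the `min_lines` parameter, and the two tiny sample files are all made up.

```python
from collections import defaultdict


def find_duplicate_blocks(files, min_lines):
    """Return groups of locations whose `min_lines`-line windows match.

    `files` maps a file name to its source text. Lines are compared after
    stripping whitespace, and blank lines are skipped.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        # Keep (line number, normalized text) for every non-blank line.
        lines = [(i, line.strip()) for i, line in enumerate(text.splitlines(), 1)
                 if line.strip()]
        for start in range(len(lines) - min_lines + 1):
            window = tuple(content for _, content in lines[start:start + min_lines])
            seen[window].append((name, lines[start][0]))
    # A window is a duplicate only if it shows up in more than one place.
    return [locations for locations in seen.values() if len(locations) > 1]


# Hypothetical sample input: two methods that differ only in the last line.
files = {
    "Add.cs": "var a = Parse(x);\nvar b = Parse(y);\nCheck(a, b);\nreturn a + b;",
    "Sub.cs": "var a = Parse(x);\nvar b = Parse(y);\nCheck(a, b);\nreturn a - b;",
}

loose = find_duplicate_blocks(files, min_lines=4)  # whole method must match
tight = find_duplicate_blocks(files, min_lines=3)  # three matching lines suffice
print(len(loose), len(tight))  # 0 1
```

With the four-line window nothing matches because the last line differs. With the three-line window the shared setup code in the two files is reported as one duplicate group.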
Well, since one notch on the analysis level settings was good, two must be better, right? Let’s crank it up to analysis level 2!
Once you get to level 2 and below, you get a warning: Analysis might require significant CPU power and memory. I’m not too concerned with this right now since I’m running the analysis manually, but if you turn on the background analysis mechanism you’d probably want to verify that’s really what you want.
Anyway, the third pass using analysis level 2 still only took about 12 seconds… and re-running it, to verify that time, only took around 5 seconds so I’m guessing there is some sort of caching in place. (But don’t hold me to that.)
Now we’re talking! Tons of duplicates found on the level 2 run. However, while the code is very similar, there aren’t as many “automatic fixes” that the system can suggest.
I don’t fault CodeRush for this - the duplicates will require a bit of non-automated thought to combine, which is why they pay us (programmers). (If they could replace us with scripts, they’d do it, right?) I’m sure the geniuses at DevExpress will shortly replace me with a keystroke in CodeRush so they can hire my cat in my place, but until then, I can use this as a checklist of things to look at.
Out of curiosity, I decided to do another run, this time at level 0 - the tightest analysis level available. This run took ~50 seconds and found so many duplicates it’s insane. Some are actionable, some are what I’d call “false positives” (like argument validation blocks - I want the validation to take place in the method being called so the stack trace is correct if there’s an exception thrown). Still, this is good stuff.
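Here is a small sketch of why I treat duplicated argument validation as a false positive. It's in Python rather than C# to keep it short, and every name in it (`require_positive`, `scale_inline`, `scale_shared`, `raising_function`) is hypothetical. The idea is the same in either language: if the guard clause gets pulled into a shared helper, the exception originates in the helper rather than in the method that was actually called.

```python
import traceback


def require_positive(value, name):
    # Hypothetical shared guard extracted from the duplicated checks.
    if value <= 0:
        raise ValueError(f"{name} must be positive")


def scale_inline(factor):
    # Duplicated guard clause: the exception originates in this method.
    if factor <= 0:
        raise ValueError("factor must be positive")
    return factor * 2


def scale_shared(factor):
    # Deduplicated guard: the exception originates in the helper instead.
    require_positive(factor, "factor")
    return factor * 2


def raising_function(func, *args):
    """Return the name of the function whose frame raised the exception."""
    try:
        func(*args)
    except ValueError as error:
        # The last traceback entry is the innermost frame.
        return traceback.extract_tb(error.__traceback__)[-1].name
    return None


print(raising_function(scale_inline, -1))  # scale_inline
print(raising_function(scale_shared, -1))  # require_positive
```

The inline version points straight at the method that received the bad argument, while the shared version puts the helper at the top of the trace. That's the trade-off I'm weighing when I leave those duplicates alone.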
Given the balance between the too-granular detection and the not-granular-enough, I think I’m going to go with the level 2 pass, address those things, and then maybe turn it up from there.
Overall, I was really impressed with the speed of this thing. I mean, 1248 files, thousands of classes… and it takes less than a minute to do super-detailed analysis? That’s akin to magic.
Big props to DevExpress for a super-awesome tool that helps me improve my code. And, hey, some of the automatic fixes that are built in don’t hurt, either. :)
If you want to use this on your project, head over and pick up CodeRush. Honestly, it’s the first thing I install right after Visual Studio, and this sort of feature is exactly why.