Thursday, 20 December 2012

Historical Debugging (Part 3) - IntelliTrace

When I first saw IntelliTrace introduced in Visual Studio 2010, I was initially disappointed and passed it by. Sure, it was pretty cool to make it part of your ALM ecosystem. For example, whenever a test within a Microsoft Test Manager session or a unit test during a CI build failed, an .iTrace file could be stored on the TFS server against the failure and pulled down by a developer to investigate. For me though, this wasn't where the functionality had the most potential benefit. With internal testing, we already know the boundaries and the parameters that will make a test fail; they are written right there in the test. The real benefit for me lies in identifying the unknown cause of a problem within a production environment. This is exactly what Microsoft gave us with Visual Studio 2012 and the new IntelliTrace Collector.

So what is IntelliTrace? Well, it's like memory dump analysis on steroids. Unlike dump analysis, where you can only see the state at frames within the stack, i.e. method entry and exit points, IntelliTrace can be configured to collect data at a far greater verbosity, right down to the individual call level. This provides a debugging experience much closer to that of live debugging, except that the state information has already been collected and you can't purposely, or indeed accidentally, modify it.

Another great benefit of IntelliTrace, when you have it configured to do so, is the ability to collect gesture information. This means you no longer need to ask the tester or client for reproduction steps, as all of that information is right there within the .iTrace file.

IntelliTrace can also be incredibly useful for debugging applications that have a high cyclomatic complexity due to the client reconfigurability of the product. No longer do you need to spend an enormous amount of time and expense trying to set up an environment that matches the client's configuration, if that's even possible. Because we are performing historical debugging against an .iTrace file that has already collected the information pertaining to the bug, we can jump straight in and do what we do best.

One of the biggest criticisms people have of IntelliTrace, however, is the initial cost. The ability to debug against a collected .iTrace file is only available in the Ultimate edition of Visual Studio. True, this is an expensive product, but I don't think it takes too long before you see a full return on that investment and indeed come out better off, both financially and through a strengthened reputation for responding quickly to difficult production issues. At the very least, I think it makes perfect economic sense to kit out your customer-facing maintenance teams with this feature.

Now that IntelliTrace can be used to step through production issues, it has become a formidable tool for the maintenance engineer. Alongside dump file analysis, we have the potential to make the "no repro" issue a thing of the past, only rearing its head in non-deterministic or load-based issues.

Historical Debugging (Part 2) - Visualised Dump File Analysis

As I mentioned previously, "Live Debugging", i.e. debugging a system as you press the buttons and pull the levers, doesn't always allow you to reproduce an issue. Much time can be wasted building what you believe to be a representative environment, only to find the issue still doesn't occur. This is where "Historical Debugging" has so many advantages.

One of my favourite historical debugging additions to Visual Studio 2010 was the ability to debug against a user-mode memory dump. These dump files have always been creatable using a tool such as ADPlus, WinDbg or, more recently, ProcDump, but since the launch of Windows Vista/Server 2008 their creation has been even easier: simply right-click the process in Windows Task Manager and select the "Create dump file" option from the context menu. The dump file is then placed in your %temp% directory, i.e. {SystemDrive}:\Users\{username}\AppData\Local\Temp.
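If you need to capture a dump programmatically, for example from a watchdog or support tool, the same result can be achieved with the Win32 MiniDumpWriteDump function from dbghelp.dll. Here is a minimal C# sketch of that approach; the command-line usage and process selection are hypothetical illustrations, and real code would want proper error handling.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Runtime.InteropServices;

static class DumpTool
{
    // P/Invoke declaration for dbghelp!MiniDumpWriteDump.
    [DllImport("dbghelp.dll", SetLastError = true)]
    static extern bool MiniDumpWriteDump(
        IntPtr hProcess, uint processId, IntPtr hFile, uint dumpType,
        IntPtr exceptionParam, IntPtr userStreamParam, IntPtr callbackParam);

    // MiniDumpWithFullMemory: captures the full address space, giving the
    // richest possible experience when the dump is opened in Visual Studio.
    const uint MiniDumpWithFullMemory = 0x00000002;

    static void Main(string[] args)
    {
        // args[0] is the target process name, e.g. "MyWinFormsApp" (hypothetical).
        Process target = Process.GetProcessesByName(args[0])[0];

        using (FileStream dumpFile = new FileStream(args[0] + ".dmp", FileMode.Create))
        {
            bool written = MiniDumpWriteDump(
                target.Handle, (uint)target.Id,
                dumpFile.SafeFileHandle.DangerousGetHandle(),
                MiniDumpWithFullMemory,
                IntPtr.Zero, IntPtr.Zero, IntPtr.Zero);

            Console.WriteLine(written
                ? "Dump written."
                : "Failed with Win32 error " + Marshal.GetLastWin32Error());
        }
    }
}
```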

Note, you can only use the above method if your application is running in the same address space as the underlying OS architecture. If you attempt to dump a 32-bit application running on 64-bit Windows this way, you'll just be dumping the WoW64 process that your application is running within. In that case, use one of the aforementioned utilities to create the dump.
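If you're not sure which situation applies, the kernel32 IsWow64Process function will tell you whether a given process is 32-bit code running under the WoW64 layer on 64-bit Windows. A small sketch, with the process selection again hypothetical:

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

static class Bitness
{
    // kernel32!IsWow64Process reports whether a process runs under WoW64.
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool IsWow64Process(IntPtr hProcess, out bool wow64Process);

    static void Main(string[] args)
    {
        // args[0] is the target process name, e.g. "MyWinFormsApp" (hypothetical).
        Process target = Process.GetProcessesByName(args[0])[0];

        bool isWow64;
        IsWow64Process(target.Handle, out isWow64);

        Console.WriteLine(isWow64
            ? "32-bit process on 64-bit Windows: use ProcDump/WinDbg rather than Task Manager."
            : "Process matches the OS architecture: Task Manager's \"Create dump file\" is fine.");
    }
}
```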

Once you have a *.dmp file, and as long as you have the associated PDBs for the process you dumped, you can simply drag it into your IDE and begin performing a historical analysis of the process as of when the snapshot was taken. You'll be able to see the call stack of each thread in the dump and switch to any frame within each stack, so you can interrogate and analyse the variables within the context of the source code and the debugger visualisation tools you are already familiar with. This is a vast improvement over the CLI-based tools, such as WinDbg with the SOS extension, used up until this point.

So let's see this in action. I have put together a simple WinForms application and put something in there to cause it to hang; a sketch of the kind of defect involved follows below.
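For the curious, the offending code looked something like this. It's a hypothetical reconstruction rather than the exact sample: an event handler spins forever on the UI thread, so the message pump is never serviced and the application stops responding.

```csharp
using System;
using System.Windows.Forms;

public class MainForm : Form
{
    public MainForm()
    {
        Text = "Dump Analysis Demo";
        var workButton = new Button { Text = "Do Work" };
        workButton.Click += OnDoWork;
        Controls.Add(workButton);
    }

    // The deliberate defect: the loop counter is never incremented, so the
    // exit condition is unreachable and the UI thread spins here forever.
    private void OnDoWork(object sender, EventArgs e)
    {
        int attempts = 0;
        while (attempts < 10)
        {
            // attempts++ has been "forgotten" - the window stops pumping messages.
        }
    }

    [STAThread]
    private static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new MainForm());
    }
}
```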


Step 1: Run the application and trigger the action that will cause it to hang.


Step 2: Open Task Manager and observe the application is reporting as not responding.


Step 3: Right click on the process and select "Create dump file".


Step 4: Drag the created .dmp file into Visual Studio and perform the action "Debug with Mixed".


Step 5: Use the debugger tools to perform a historical analysis of the variables.


As you can see from the last screenshot, the hang was simply caused by an infinite loop, and Visual Studio took us straight to it because that's the point in the code the process had reached when we created the memory dump. This strategy is just as applicable to thread contention issues, or indeed any other issue where analysis of a running production environment is required.
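To illustrate the thread contention case, here is a hedged sketch of a classic deadlock: two threads acquire the same pair of locks in opposite order. A dump taken while the process is wedged would show both threads' call stacks parked inside Monitor.Enter, making the contention obvious in exactly the same way.

```csharp
using System;
using System.Threading;

static class DeadlockDemo
{
    static readonly object LockA = new object();
    static readonly object LockB = new object();

    static void Main()
    {
        var worker1 = new Thread(() => TakeLocks(LockA, LockB)) { Name = "Worker 1" };
        var worker2 = new Thread(() => TakeLocks(LockB, LockA)) { Name = "Worker 2" };
        worker1.Start();
        worker2.Start();

        // Each worker grabs its first lock, then blocks forever waiting for the
        // other's. A dump taken now captures both stacks mid-deadlock.
        worker1.Join();
        worker2.Join();
    }

    static void TakeLocks(object first, object second)
    {
        lock (first)
        {
            Thread.Sleep(100); // give the other thread time to take its first lock
            lock (second)
            {
                Console.WriteLine("Never reached.");
            }
        }
    }
}
```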

Dump file analysis used to be hard going and was often avoided. Now that we have the ability to import these files into Visual Studio and interrogate the state using tools we are already familiar with, it once again becomes a very valuable and powerful technique, one which can save an enormous amount of time and money when diagnosing those "no repro" issues.

Friday, 14 December 2012

Historical Debugging (Part 1) - The end of the "No repro" problem?

If you think about it, for all the advancements made within the software engineering industry over the last 40 years, debugging tools and strategies have not really advanced at the same pace. Sure, parallelism has received a lot of focus in recent years, and as such we have seen new debugging visualisers like the "Parallel Stacks" and "Parallel Tasks" windows in Visual Studio 2010 and the new "Parallel Watch" window in Visual Studio 2012, but by and large debugging strategies have remained the same: examine a stack trace, place some breakpoints in the associated source and try to reproduce the issue.

How many times have we, or our teams, tried to reproduce an issue using the reproduction steps provided by QA, or worse still the client, and not been able to do so? More times than I care to remember. The problem becomes even harder to diagnose when the issue is data-driven or caused by a data inconsistency, as it then requires setting up an environment that closely represents the one at fault. We shouldn't be asking our clients for copies of their production data, especially if it contains sensitive customer or patient information. So how do we debug issues we can't reproduce in a controlled development environment? The answer: "Historical Debugging".

Historical debugging is a strategy whereby a faulting application can be debugged after the fact, using data collected at the time the application was at fault. This negates the need to spend time and money configuring an environment and reproducing the issue in order to perform traditional or "live" debugging. Instead, the collected information allows us to dive straight into the state of the application and debug the issue as if we had reproduced it ourselves in our own development environment.

There have been two major Visual Studio advancements in historical debugging strategies over the last few years: "Visualised Dump File Analysis" and "IntelliTrace". Both have made debugging production issues simpler than ever for maintenance and development teams. Do they represent the end of the "no repro" issue? Between them, quite possibly, and I'll discuss each in turn in the following two posts.