Tuesday, August 16, 2011

The Importance of Crash Reporting

One of the most useful features I implemented in Multi Commander is that when it crashes, it automatically uploads the crash dump to my server.

For all the crash reports I have received, I never once received a mail or forum post from anyone saying that it crashed when they did X. If Multi Commander did not send crash reports, there would be a whole bunch of bugs that I would never have been told about and that would never have been fixed.

Automatic crash reporting is probably the most important feature you can have. Without it you will not find all the weird problems that users can be affected by.


Microsoft WER service
Microsoft lets developers use the crash reporting mechanism built into Windows by signing up for Windows Error Reporting (WER). You then get access to the crash reports via a Microsoft site. This site is very good and provides a lot of information and statistics. The problem is that your product must be digitally signed using a VeriSign certificate, and this costs $500 per year (last time I checked).
So for a shareware or freeware product like Multi Commander, this is not an option.


Building Your Own
Since using WER was not an option for me, I decided to build my own crash dump system.

I use a modified version of XCrashReport by Hans Dietrich. I changed it so that instead of relying on email, it sends the crash report by uploading it to my server using HTTP POST, because a lot of ISPs and antivirus software will block SMTP traffic. It is also easier to create a server-side service that automatically handles the incoming crash dumps.
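
To illustrate the upload step, here is a minimal Python sketch of sending a dump with HTTP POST. It is not the code Multi Commander uses (that is a modified XCrashReport); the URL, the X-App-Version header and the function names are all made up for the example.

```python
import urllib.request

# Hypothetical endpoint; the real upload URL is not part of this post.
UPLOAD_URL = "https://example.com/crashreport/upload"


def build_upload_request(dump_bytes, app_version, url=UPLOAD_URL):
    """Build (but do not send) an HTTP POST request carrying one crash dump."""
    headers = {
        "Content-Type": "application/octet-stream",
        # Hypothetical header so the server knows which build crashed.
        "X-App-Version": app_version,
    }
    return urllib.request.Request(
        url, data=dump_bytes, headers=headers, method="POST"
    )


def upload_crash_dump(dump_path, app_version, url=UPLOAD_URL, timeout=30):
    """Read the dump file from disk, POST it, and return the HTTP status code."""
    with open(dump_path, "rb") as f:
        request = build_upload_request(f.read(), app_version, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Building the request separately from sending it makes it possible to check what would be sent without touching the network.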

Then on the server side I have written some PHP code that copies the crash report and places it in a special folder. It then sends a mail to me saying that a crash report was received and which version of Multi Commander it is for.
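
The real server code is PHP and is not shown here. As a rough sketch of the same two steps (store the report, then prepare a notification mail), something like the following Python would do. The file naming scheme, the function names and the mail wording are assumptions for the example.

```python
import os
import time
from email.message import EmailMessage


def store_crash_report(data, version, folder):
    """Save an uploaded report under a unique name and return its path."""
    os.makedirs(folder, exist_ok=True)
    # The version string comes from the client, so keep only safe characters.
    safe_version = "".join(c for c in version if c.isalnum() or c in "._-")
    # Hypothetical naming scheme: version plus a nanosecond timestamp.
    name = "crash_{}_{}.dmp".format(safe_version, time.time_ns())
    path = os.path.join(folder, name)
    with open(path, "wb") as f:
        f.write(data)
    return path


def build_notification(path, version, recipient):
    """Compose (but do not send) the mail announcing a new crash report."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = "Crash report received for version {}".format(version)
    msg.set_content("Stored as {}".format(os.path.basename(path)))
    return msg
```

The message can then be handed to whatever mail mechanism the server has, for example `smtplib.SMTP(...).send_message(msg)`.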

I then have to manually run a WinDbg script that opens the crash dump and does some initial crash analysis. Since all my symbol files are indexed, WinDbg will check out the correct version of the source code automatically and in most cases even show the line of code where the crash happened.

Automate It
But that script must be run manually, and since 90% of all crash dumps are for issues I already know about, it can be a bit boring to investigate them just to find that I have already fixed the problem. It would be great if the initial analysis could be automated.

And soon that will be possible. I'm working on a tool that will do automatic initial analysis of crash dumps.
It will wait for new dump files to be received. When a new dump file is received, it will automatically run a basic crash analysis on it, and the output is saved to a file that is stored with the dump file.
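
The tool itself is not finished, so the following is only a sketch of the idea in Python, not its actual design. It assumes the console debugger cdb.exe from the Debugging Tools for Windows is on the PATH, that dumps have a .dmp extension, and that the report is saved next to the dump with .txt appended. All function names are made up.

```python
import subprocess
from pathlib import Path


def analyze_with_cdb(dump_path):
    """Run !analyze -v on one dump and return the debugger's text output.

    Assumes cdb.exe is on PATH and symbol paths are already configured.
    """
    result = subprocess.run(
        ["cdb", "-z", str(dump_path), "-c", "!analyze -v; q"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def process_new_dumps(folder, analyze=analyze_with_cdb):
    """Analyze every .dmp file in folder that has no saved report yet."""
    processed = []
    for dump in sorted(Path(folder).glob("*.dmp")):
        report = dump.with_name(dump.name + ".txt")
        if report.exists():
            continue  # already analyzed on an earlier pass
        report.write_text(analyze(dump), encoding="utf-8")
        processed.append(dump.name)
    return processed
```

Calling `process_new_dumps` in a loop with a short sleep between passes is the simplest way to "wait" for new files. The analyzer is passed in as a parameter so it can be swapped for another debugger script.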

It will also automatically scan the analysis result and, in a database file covering all of the crash dumps, store information about which module crashed, the bucket ID, and a couple of other things. It will also recognize whether this is a crash we have had before. When all this is done, a mail is sent to me.

Now I do not have to analyze crash dumps for problems that are already fixed. And when I get the mail about a new crash dump, the basic analysis has already been done, and in most cases you can find the problem with just that. This has saved me a lot of time.
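
As a sketch of how the scanning and the "have we seen this crash before" check could work, here is a small Python example using SQLite. The table layout and function names are assumptions, and so is the report format: it expects lines of the form `NAME: value`, as in the MODULE_NAME and FAILURE_BUCKET_ID lines printed by `!analyze -v`.

```python
import re
import sqlite3


def extract_field(report_text, field):
    """Return the first value of a 'FIELD: value' line, or None if absent."""
    pattern = r"^{}:\s*(\S+)".format(re.escape(field))
    match = re.search(pattern, report_text, re.MULTILINE)
    return match.group(1) if match else None


def open_crash_db(path=":memory:"):
    """Open the crash database, creating the (hypothetical) table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS crashes ("
        "dump_file TEXT PRIMARY KEY, module TEXT, bucket_id TEXT)"
    )
    return db


def record_crash(db, dump_file, module, bucket_id):
    """Store one analyzed dump; return True if its bucket was seen before."""
    row = db.execute(
        "SELECT 1 FROM crashes WHERE bucket_id = ? LIMIT 1", (bucket_id,)
    ).fetchone()
    db.execute(
        "INSERT INTO crashes VALUES (?, ?, ?)", (dump_file, module, bucket_id)
    )
    db.commit()
    return row is not None
```

The return value of `record_crash` is what decides whether the notification mail says "new crash" or "known crash".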

This tool is very generic and I will make it available soon.

2 comments:

  1. Do you have any problems with spam? I've been looking into a way to do crash reporting on an Open source project, but as someone very conscious of security, I'm not sure how to do it without the possibility of someone just spamming the dump reporting server with crash reports (fake or real). I guess this is an unlikely event, but I'm still wondering if there's an easy way to prevent it.

  2. Yes, that is a very likely scenario, but so far I have not had any problems with it.

    You cannot remove the risk of abuse completely but it is possible to minimize it. (Making it harder to abuse.)

    For example, the application that sends the crash report can set a special header that you verify exists before starting to receive the data.
    You can also make the server reject a crash report if it already got one from the same IP less than X seconds or minutes ago.

    And when you have received some data that should be a crash report, you do some sanity checking on it to see if it looks valid.
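
    Put together, those three checks could look roughly like this Python sketch. The header name and the 60 second limit are made-up values, and the sanity check assumes the upload is a raw Windows minidump, which starts with the bytes "MDMP". A zipped report would need a different check.

```python
import time

MINIDUMP_MAGIC = b"MDMP"            # first four bytes of a Windows minidump
REQUIRED_HEADER = "X-Crash-Client"  # hypothetical header set by the application
MIN_INTERVAL_SECONDS = 60           # hypothetical value for "X seconds"

_last_seen = {}  # IP address -> time of the last accepted report


def accept_report(ip, headers, body, now=None):
    """Return True if an upload passes the header, rate and sanity checks."""
    now = time.time() if now is None else now
    # 1. The special header must be present.
    if REQUIRED_HEADER not in headers:
        return False
    # 2. Only one report per IP within the minimum interval.
    last = _last_seen.get(ip)
    if last is not None and now - last < MIN_INTERVAL_SECONDS:
        return False
    # 3. The data must at least look like a crash dump.
    if not body.startswith(MINIDUMP_MAGIC):
        return False
    _last_seen[ip] = now
    return True
```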

    However, if someone really wants to abuse it, they can.

    But the gain of having a program that can automatically send crash reports is much higher than the risk of abuse.
    Having to depend on users to manually mail in crash reports is not a good idea. Only the really committed users are going to do it, and you will miss 9 out of 10 crashes.
