Today I was migrating database files for Biztalk to new storage. I wrote a simple little automated move script which I'll share at a later date. Everything was going peachy. I asked the Biztalk administrator to shutdown the services before I began. I noticed that there was a spid with pending transactions, but since he assured me that he was completely offline I decided to proceed. Every database moved cleanly except for BizTalkMgmtDb. This database was the one with the spid. Since my script rolled back transactions I knew that SQL Server would be trying to rollback the spid's transaction already. Nevertheless, with the Biztalk admin's permission I tried killing the spid. It of course gave me the famous (at least to me):
SPID X: transaction rollback in progress. Estimated rollback completion: 0%.
This was a production environment and I was inside of business hours; so thought that I was going to be stuck. As I studied the problem, I could see that the SPID was clearly waiting on DTC:
SELECT lastwaittype FROM sys.sysprocesses WHERE req_spid = X
lastwaittype
--------------------------------
DTC
Then I remembered that the clustered DTC service on the SQL cluster crashed the previous evening. One of the infrastructure engineers resolved the issue and brought the DTC service back online before I could take a look. I tried restarting the DTC service, in case killing the connection to the SQL Server service would release the spid. Alas, it made no difference for me. I
searched around for a solution but found lots of folks advocating restarting SQL Server. Since I knew better, I kept on researching.
Eventually, I found some
useful material about how the DTC works. I realized that it must be the unit of work below the hood that was keeping the SPID from being rolled back. I used the
DTC Tester Tool to validate the DTC. I could tell that the DTC was no longer accepting connections:
DtcGetTransactionManager failed: 8004d01b DtcGetTransactionManager Call Error: Indicates unable to connect to the transaction manager, or the transaction manager is unavailable.
Finally, I was getting somewhere! This gave me an idea, the spid must be hung because it' DTC transaction is orphaned. If that's the case, then there should be no reason why I couldn't kill it. All I needed was the
unit of work id. It turns out that the solution is pretty simple.
SELECT req_transactionUOW FROM syslockinfo WHERE req_spid = X
req_transactionUOW
--------------------------------
AC6B59D7-A9C8-44DE-A650-CB63B8249D04
The result will be a GUID-like string. Use the result to kill each unit of work like this:
KILL 'AC6B59D7-A9C8-44DE-A650-CB63B8249D04'
That's it, after that the spid that I had previously killed finally died! I hope this tidbit is helpful to someone else in a pinch.