A couple of months ago, I lost my precious Google Chrome browser (running on Windows 7, 64bit): All of the sudden (not so much as I will find out later), Chrome was not able to establish secure connections anymore.
The symptoms where a little bit unclear: When I would start the browser, I was able to reach secure sites (with the
https:// prefix, port 443) for about 30 seconds. Afterwards, it would always get stuck in the "Establishing secure connection" stage for new connections, until
ERR_CONNECTION_TIMED_OUT. The connections that were established in these first moments seemed to work for the rest of the session and usual
http traffic (TCP port 80) was not affected.
Researching solutions on the internet yielded a lot of open questions and problems, including obvious solutions as checking firewall settings, reinstalling Chrome, clearing the ssl cache or disabling TLS 1.2 (which is not possible in anymore in current Chrome versions). All of these solutions did nothing for me.
Finding and Resolving the Issue
So, I monitored my network connections with packet capture software (Wireshark). First everything went as expected. But even when the secure connections started to fail, I could clearly see that the handshake completed (SSL
Client Key Exchange,
New Session Ticket), but there was silence after the handshake until Chrome finished and reset the connection 20 seconds later (TCP
FIN followed by
RST flag). Comparing that behavior to Firefox (which still worked; for comparison, I also enabled TLS 1.2), it was clear that it was Chrome's turn to send data.
The fact that the connections which were established in the first seconds worked for an entire session indicated that some kind of data would be cached, probably certificates. Additionally, because the amount of websites that I could load was not limited by a certain number, but a certain time period, it was quite obvious that my connections were victims of a race condition.
So I fired up Sysinternals Process Monitor, filtering only events from the
After a few tries, the experiment was set up and ready to capture Chrome’s faulty behavior. I started Chrome and navigated to a lot off different web pages, pinpointing the exact time of failure. Using Wireshark and Process Monitor, I found out that at the time of the failure, Chrome just finished reading a lot of certificate files (having to do with different Certificate Authorities and revocation lists). There were especially a lot of accesses to the certificate stores
subjects.sst (located at
Research on the internet showed that these files were remnants of an old COMODO Dragon installation that I got tricked into earlier this year and that had not finished correctly. If one tries to delete them, they are recreated moments after by the DLL
certsentry.dll (located at
regsvr32 /u certsentry.dll and Unlocker, I was able to unregister and delete the files making a deletion of the certificate stores permanent. After a couple of reboots, Chrome worked again as intended.
But I did not stop here. To further investigate the issue, I kept copies of the DLL files and the certificate lists. Using Dependency Walker, I was able to confirm that there were dependencies on
cryptui.dll. The certificate stores could be opened with
certmgr and included certificates which I had used during my tests (e.g. Google, LastPass, AdBlockPlus, akamai [facebook]). They were still valid though.
Explanation and Conclusion
Earlier this year, I had to manually cancel a COMODO setup procedure and remove all of its remnants. During that procedure, I missed the DLL files
certsentry.dll in my system folder which were still being hooked by Microsoft Windows cryptography modules, creating the files
subjects.sst in my user data folder.
When Google Chrome is started, the user can immediately start browsing while the certificate stores are loaded. This leads to a race condition. Once all files have finished loading, new SSL connections also fail after completing the handshake.
There are a couple things that Chrome and COMODO could learn from these situations: First of all, the asynchronous loading of security files could also be very dangerous since the user does not seem to be protected by the COMODO mechanism in the first seconds. Secondly, there should be a reasonable error message informing the user of SSL errors instead of a time out exception.
For me, I am very satisfied with the result, especially since I have a game jam coming up where I wanted to make use of HTML5 technology, which is faster and more stable in Chrome than it is in Firefox.