After my recent move to HTTPS, I wasn’t sure whether any pages on my site still contained Mixed Content.
If an HTTPS page includes content retrieved through regular, cleartext HTTP, then the connection is only partially encrypted. […] When a webpage exhibits this behavior, it is called a mixed content page. (src)
Because modern browsers block most Mixed Content from being downloaded, this may leave your HTTPS-enabled website broken.
To check this I wrote a little PHP CLI app to scan an HTTPS website for Mixed Content. The script starts crawling at a given URL, and processes the page:
- All contained img[src], iframe[src], script[src], and link[href][rel="stylesheet"] elements are checked for being Mixed Content or not.
- All contained a[href] elements linking to the same or a deeper level are successively crawled and scanned for Mixed Content.
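The two checks described above could be sketched roughly as follows. This is a minimal illustration, not the code from Mixed Content Scan itself: the function names findMixedContent and isSameOrDeeper are made up for this example, and it assumes that only absolute http:// URLs count as Mixed Content and that links have already been resolved to absolute URLs.

```php
<?php
// Hypothetical helpers for illustration only; the names and exact
// behaviour are assumptions, not taken from the actual scanner.

/**
 * Return the cleartext (http://) resource URLs embedded in $html.
 */
function findMixedContent(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Suppress warnings caused by real-world, non-well-formed markup.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // XPath equivalents of the four selectors, mapped to the
    // attribute that holds the resource URL.
    $queries = [
        '//img[@src]'                      => 'src',
        '//iframe[@src]'                   => 'src',
        '//script[@src]'                   => 'src',
        '//link[@href][@rel="stylesheet"]' => 'href',
    ];

    $found = [];
    foreach ($queries as $query => $attribute) {
        foreach ($xpath->query($query) as $element) {
            $url = trim($element->getAttribute($attribute));
            // Relative and protocol-relative URLs inherit the page's
            // https scheme, so only explicit http:// URLs are flagged.
            if (stripos($url, 'http://') === 0) {
                $found[] = $url;
            }
        }
    }

    return $found;
}

/**
 * Tell whether the absolute $url sits at the same level as,
 * or deeper than, $rootUrl.
 */
function isSameOrDeeper(string $url, string $rootUrl): bool
{
    // Normalise trailing slashes so "https://a.tld" and
    // "https://a.tld/" are treated alike.
    $root = rtrim($rootUrl, '/') . '/';

    return strpos(rtrim($url, '/') . '/', $root) === 0;
}
```

Results are grouped per element type rather than in document order, which is enough for a report but may differ from how the real script lists them.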
The script itself will start scanning and give feedback whilst running. When Mixed Content is found, the URLs will be shown on screen:
Scanning https://www.bram.us/
[2014-12-10 15:38:31] 00000 - https://www.bram.us/
[2014-12-10 15:38:32] 00001 - https://www.bram.us/projects/
[2014-12-10 15:38:33] 00002 - https://www.bram.us/projects/mint-custom-title/
[2014-12-10 15:38:33] 00003 - https://www.bram.us/projects/bramusicq/
[2014-12-10 15:38:33] 00004 - https://www.bram.us/projects/gm_bramus/
[2014-12-10 15:38:34] 00005 - https://www.bram.us/projects/js_bramus/
[2014-12-10 15:38:34] 00006 - https://www.bram.us/projects/js_bramus/jsprogressbarhandler/
[2014-12-10 15:38:36] 00007 - https://www.bram.us/projects/js_bramus/lazierload/
[2014-12-10 15:38:37] 00008 - https://www.bram.us/projects/the-box-office/
[2014-12-10 15:38:37] 00009 - https://www.bram.us/projects/tinymce-plugins/
[2014-12-10 15:38:38] 00010 - https://www.bram.us/projects/tinymce-plugins/tinymce-classes-and-ids-plugin-bramus_cssextras/
[2014-12-10 15:38:38] 00011 - https://www.bram.us/projects/flashlightboxinjector/
[2014-12-10 15:38:40] 00012 - https://www.bram.us/contact/
[2014-12-10 15:38:40] 00013 - https://www.bram.us/2014/12/09/youtube-rewind-2014/
[2014-12-10 15:38:41] 00014 - https://www.bram.us/2014/12/09/6-billion-tweets/
[2014-12-10 15:38:41] 00015 - https://www.bram.us/2014/12/09/little-dragon-underbart/
[2014-12-10 15:38:41] 00016 - https://www.bram.us/2014/12/09/yik-yak-messaging-app-vulnerability/
[2014-12-10 15:38:42] 00017 - https://www.bram.us/2014/11/13/https-everywhere/
[2014-12-10 15:38:42] 00018 - https://www.bram.us/2014/12/09/the-state-of-javascript-in-2015/
[2014-12-10 15:38:43] 00019 - https://www.bram.us/2013/06/27/the-franticness-of-working-in-the-web-business/
[2014-12-10 15:38:43] 00020 - https://www.bram.us/2014/12/09/crossbeat-uprising/
[2014-12-10 15:38:44] 00021 - https://www.bram.us/2014/12/09/its-all-about-time-timing-attacks-in-php/
...
[2014-12-10 15:38:56] 00050 - https://www.bram.us/2008/11/10/jsprogressbarhandler-033/
[2014-12-10 15:38:56] 00051 - https://www.bram.us/demo/projects/lazierload/
- http://farm2.static.flickr.com/1212/1285026452_0aeb38b6e6.jpg
- http://farm2.static.flickr.com/1074/1273115418_a77357040a.jpg
- http://farm2.static.flickr.com/1096/1273106588_91f7a736c6.jpg
- http://farm2.static.flickr.com/1324/1216309045_31ca82f9d9.jpg
- http://farm2.static.flickr.com/1262/1217169586_e4b2bfa7df.jpg
- http://farm2.static.flickr.com/1149/1216304291_63fd48d9c4.jpg
- http://farm2.static.flickr.com/1366/1216301505_51b3c590ff.jpg
- http://farm2.static.flickr.com/1184/1216299847_c57975bed2.jpg
- http://farm2.static.flickr.com/1085/1217158084_a9b059d25b.jpg
- http://farm2.static.flickr.com/1040/1216293529_3b7c044815.jpg
- http://farm2.static.flickr.com/1029/1084232736_5b8c023f46.jpg
- http://farm2.static.flickr.com/1318/1043062251_17071a8cc7.jpg
- http://farm2.static.flickr.com/1221/1043059543_05713e6156.jpg
- http://www.google-analytics.com/urchin.js
[2014-12-10 15:38:57] 00052 - https://www.bram.us/wordpress/wp-content/uploads/2008/02/lazierload_04.zip
[2014-12-10 15:38:57] 00053 - https://www.bram.us/wordpress/wp-content/uploads/2008/02/lazierload_03.zip
[2014-12-10 15:38:57] 00054 - https://www.bram.us/wordpress/wp-content/uploads/2007/09/lazierload_02.zip
[2014-12-10 15:38:57] 00055 - https://www.bram.us/2011/09/30/css-regions-and-css-exclusions/
[2014-12-10 15:38:57] 00056 - https://www.bram.us/2014/06/04/good-looking-shapes-gallery/
...
Invoke the script as such:
$ php bin/scanner.php https://www.bram.us/
To speed things up, it’s also possible to define a set of ignore patterns. The defaults are tailored to a WordPress installation:
return [
    '^{$rootUrl}/page/(\d+)/$', // Paginated Overview Links
    // '^{$rootUrl}/(\d+)/(\d+)/', // Single Post Links
    '^{$rootUrl}/tag/', // Tag Overview Links
    '^{$rootUrl}/author/', // Author Overview Links
    '^{$rootUrl}/category/', // Category Overview Links
    '^{$rootUrl}/(\d+)/(\d+)/$', // Monthly Overview Links
    '^{$rootUrl}/(\d+)/$', // Year Overview Links
    '^{$rootUrl}/comment-subscriptions', // Comment Subscription Link
    '^{$rootUrl}/(.*)?wp\-(.*)\.php', // WordPress Core File Links
    '^{$rootUrl}/archive/', // Archive Links
    '\?replytocom\=', // Replyto Links
];
The {$rootUrl} token in each pattern will be replaced with the (root) URL passed into the script.
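To make the token replacement concrete, here is a rough sketch of how such patterns might be turned into usable regular expressions. The helper names compileIgnorePatterns and isIgnored are hypothetical, and the details (stripping the trailing slash from the root URL, using # as the delimiter, matching case-insensitively) are assumptions made for this example rather than the script's documented behaviour.

```php
<?php
// Hypothetical helpers for illustration only; not the scanner's own code.

/**
 * Replace the {$rootUrl} token in each pattern and wrap the result
 * in regex delimiters.
 */
function compileIgnorePatterns(array $patterns, string $rootUrl): array
{
    // Assumption: the patterns expect a root URL without a trailing
    // slash, since each one continues with "/page/", "/tag/", etc.
    $root = preg_quote(rtrim($rootUrl, '/'), '#');

    $compiled = [];
    foreach ($patterns as $pattern) {
        // The patterns are single-quoted PHP strings, so {$rootUrl}
        // is a literal token here and not an interpolated variable.
        $compiled[] = '#' . str_replace('{$rootUrl}', $root, $pattern) . '#i';
    }

    return $compiled;
}

/**
 * Tell whether $url matches at least one compiled ignore pattern.
 */
function isIgnored(string $url, array $compiledPatterns): bool
{
    foreach ($compiledPatterns as $regex) {
        if (preg_match($regex, $url) === 1) {
            return true;
        }
    }

    return false;
}
```

With the tag and replytocom patterns compiled against https://www.bram.us/, a tag overview URL or any URL carrying ?replytocom= would be skipped, while a regular page such as https://www.bram.us/projects/ would still be crawled.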
Special thanks go out to Mathias Bynens for making a few suggestions and additions to Mixed Content Scan.