Take apart webarchive files with Nu
Ever wonder what’s inside those .webarchive files that are saved by Safari? It turns out that they are just property lists, a common representation that Apple uses for externalizing structured data. Here’s a short Nu script that opens a saved webarchive file and prints its contents.
#!/usr/local/bin/nush
(import Cocoa)
(set webarchive (NSData dataWithContentsOfFile:"ProgrammingNu.webarchive"))
(set propertylist (NSPropertyListSerialization
     propertyListFromData:webarchive
     mutabilityOption:0
     format:NSPropertyListBinaryFormat_v1_0
     errorDescription:nil))
(set WebMainResource (propertylist "WebMainResource"))
(puts "--- Main Resource ---")
(set line (+ "0. " (WebMainResource "WebResourceURL")
             " (" (WebMainResource "WebResourceMIMEType")
             ")"))
(if (set encoding (WebMainResource "WebResourceTextEncodingName"))
    (set line (+ line " (" encoding ")")))
(puts line)
;; Resource keys include the following:
;;   WebResourceURL,
;;   WebResourceTextEncodingName,
;;   WebResourceMIMEType,
;;   WebResourceData,
;;   WebResourceFrameName
(set WebSubresources (propertylist "WebSubresources"))
(puts "--- Subresources (#{(WebSubresources count)}) ---")
(WebSubresources eachWithIndex:
     (do (resource index)
         (set line (+ "#{(+ 1 index)}. " (resource "WebResourceURL")
                      " (" (resource "WebResourceMIMEType") ")"))
         (if (set encoding (resource "WebResourceTextEncodingName"))
             (set line (+ line " (" encoding ")")))
         (puts line)))
Here’s what I get when I run it on the Programming Nu website:
--- Main Resource ---
0. http://programming.nu/ (text/html) (UTF-8)
--- Subresources (5) ---
1. http://programming.nu/stylesheets/nu.css (text/css)
2. http://programming.nu/files/recycle-s.png (image/png)
3. http://programming.nu/files/nupp.png (image/png)
4. http://programming.nu/files/masyu-solved.png (image/png)
5. http://programming.nu/files/ohloh_profile.png (image/png)
The raw data for each individual resource is available under its WebResourceData key.
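As a rough sketch of how that key might be used, the following pulls the main resource's bytes out of the same archive and writes them to disk. It reuses the parsing call from the script above unchanged. The output name main.html is a placeholder chosen for illustration, and the sketch assumes the value stored under WebResourceData is an NSData object.

```nu
#!/usr/local/bin/nush
(import Cocoa)

;; Load and parse the archive with the same call the script above uses.
(set webarchive (NSData dataWithContentsOfFile:"ProgrammingNu.webarchive"))
(set propertylist (NSPropertyListSerialization
     propertyListFromData:webarchive
     mutabilityOption:0
     format:NSPropertyListBinaryFormat_v1_0
     errorDescription:nil))

;; Assumption: WebResourceData holds the resource's raw bytes as an NSData.
(set WebMainResource (propertylist "WebMainResource"))
(set data (WebMainResource "WebResourceData"))

;; Write the bytes out unchanged. "main.html" is a hypothetical output name.
(data writeToFile:"main.html" atomically:YES)
(puts "wrote #{(data length)} bytes to main.html")
```

The same lookup should work for subresources: inside the eachWithIndex: block, (resource "WebResourceData") would give the bytes for that stylesheet or image.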

