Take apart webarchive files with Nu

Wednesday, 10 Sep 2008

Ever wonder what’s inside those .webarchive files that are saved by Safari? It turns out that they are just property lists, a common representation that Apple uses for externalizing structured data. Here’s a short Nu script that opens a saved webarchive file and prints its contents.

#!/usr/local/bin/nush

(import Cocoa)

(set webarchive (NSData dataWithContentsOfFile:"ProgrammingNu.webarchive"))
(set propertylist (NSPropertyListSerialization
  propertyListFromData:webarchive
  mutabilityOption:0
  format:NSPropertyListBinaryFormat_v1_0
  errorDescription:nil))

(set WebMainResource (propertylist "WebMainResource"))
(puts "--- Main Resource ---")
(set line (+ "0. " (WebMainResource "WebResourceURL")
             " ("  (WebMainResource "WebResourceMIMEType")
             ")"))
(if (set encoding (WebMainResource "WebResourceTextEncodingName"))
    (set line (+ line " (" encoding ")")))
(puts line)

;; Resource keys include the following:
;;   WebResourceURL
;;   WebResourceTextEncodingName
;;   WebResourceMIMEType
;;   WebResourceData
;;   WebResourceFrameName

(set WebSubresources (propertylist "WebSubresources"))
(puts "--- Subresources (#{(WebSubresources count)}) ---")
(WebSubresources eachWithIndex:
     (do (resource index)
         (set line (+ "#{(+ 1 index)}. " (resource "WebResourceURL")
                      " ("  (resource "WebResourceMIMEType") ")"))
         (if (set encoding (resource "WebResourceTextEncodingName"))
             (set line (+ line " (" encoding ")")))
         (puts line)))

Here’s what I get when I run it on the Programming Nu website:

--- Main Resource ---
0. http://programming.nu/ (text/html) (UTF-8)
--- Subresources (5) ---
1. http://programming.nu/stylesheets/nu.css (text/css)
2. http://programming.nu/files/recycle-s.png (image/png)
3. http://programming.nu/files/nupp.png (image/png)
4. http://programming.nu/files/masyu-solved.png (image/png)
5. http://programming.nu/files/ohloh_profile.png (image/png)

The raw data for each individual resource is available under its WebResourceData key.
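
As a rough sketch of how WebResourceData might be used, the script below writes each subresource out to a file in the current directory. It loads the archive the same way as the script above. Two things here are my assumptions rather than anything the archive format promises: that WebResourceData comes back as an NSData object, and that the last path component of each URL makes a usable, unique file name.

```nu
#!/usr/local/bin/nush

(import Cocoa)

;; Load the archive exactly as in the listing script.
(set webarchive (NSData dataWithContentsOfFile:"ProgrammingNu.webarchive"))
(set propertylist (NSPropertyListSerialization
  propertyListFromData:webarchive
  mutabilityOption:0
  format:NSPropertyListBinaryFormat_v1_0
  errorDescription:nil))

;; Write the data of each subresource to a file named after its URL.
((propertylist "WebSubresources") each:
     (do (resource)
         ;; Assumption: the URL's last path component is a safe, unique file name.
         (set name ((resource "WebResourceURL") lastPathComponent))
         ;; Assumption: WebResourceData is an NSData object.
         (set data (resource "WebResourceData"))
         (data writeToFile:name atomically:YES)
         (puts "wrote #{name} (#{(data length)} bytes)")))
```

For the archive listed above, this should leave nu.css and the four PNG files next to the script. Two resources whose URLs end in the same name would overwrite each other, so a more careful version would check for collisions first.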
