
OSXCollector: Walkthrough

May 4, 2017 by Chiragh Dewan

OSXCollector is an open-source forensic evidence collection and analysis tool for OSX, released by Yelp in 2014. Its source code is available in Yelp's GitHub repository.

Written in Python, OSXCollector runs as a script on the potentially infected machine and writes its output to a JSON file that describes the target machine. It gathers this information from different sources, such as SQLite databases, the local file system, and plists.


Using OSXCollector, a forensic investigator/analyst can shed light on the following questions:

  • Is the machine infected?
  • How did the malware get there?
  • How can further infection be prevented and detected?

To begin, clone the project to the local machine:

$ git clone

The collector is a single Python file in the osxcollector directory that runs without any dependencies on a standard OSX machine. To run it, type:

$ sudo osxcollector/

The script takes a couple of minutes to run. When it finishes, its output confirms that it has done its work and that the results are available in an archive named osxcollect-2017_05_03-18_06_35.tar.gz. Once extracted, the archive contains the collected output.

The output folder contains various logs along with the JSON file.
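Extraction can also be scripted, which is convenient when processing archives from several machines. The sketch below (the function name is hypothetical, not part of OSXCollector) unpacks an osxcollect-*.tar.gz archive with Python's tarfile module and returns the paths of the JSON files it contains. On Python versions that support extraction filters it uses the "data" filter, which rejects members with absolute paths or paths that would land outside the destination; older versions fall back to a plain extractall, so only use it on archives you trust.

```python
import os
import tarfile


def extract_output(archive_path, dest_dir):
    """Extract an osxcollect-*.tar.gz archive and return the JSON files inside."""
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths and members escaping dest_dir.
            tar.extractall(dest_dir, filter="data")
        else:
            # Older Pythons have no extraction filters; trust the archive.
            tar.extractall(dest_dir)
    found = []
    for root, _dirs, files in os.walk(dest_dir):
        for name in files:
            if name.endswith(".json"):
                found.append(os.path.join(root, name))
    return sorted(found)
```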

Note: OSXCollector must be run with the default Python interpreter that ships with OSX, not with a Python version installed via Homebrew (brew). It relies on a few native Python bindings for OSX libraries that might not be available in other Python versions. To use a specific Python interpreter, invoke it explicitly:

$ /usr/bin/python2.7 osxcollector/
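Before a long collection run, it may be worth confirming that the chosen interpreter can load the native OSX bindings mentioned in the note above. The sketch below is a hypothetical pre-flight check, not part of OSXCollector. It assumes the bindings in question include PyObjC's Foundation module, which ships with the system Python on OSX; adjust module_names if your copy of the collector imports others. It is written to run under both Python 2.7 and Python 3.

```python
import importlib
import sys


def native_bindings_available(module_names=("Foundation",)):
    """Return True if every named module can be imported by this interpreter.

    Assumption: "Foundation" (PyObjC) stands in for the native OSX bindings
    the collector needs; pass other names if your copy imports more.
    """
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            return False
    return True


if __name__ == "__main__":
    print("Interpreter: %s" % sys.executable)
    print("Version: %d.%d" % sys.version_info[:2])
    if native_bindings_available():
        print("Native OSX bindings found; this interpreter should work.")
    else:
        print("Native OSX bindings missing; try /usr/bin/python2.7 instead.")
```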

OSXCollector also accepts several additional parameters:

  • -i INCIDENT_PREFIX/--id=INCIDENT_PREFIX: This sets an identifier that is used as the prefix of the output file name. The default value is osxcollect:

$ sudo osxcollector/ -i ChiraghDewan

The output folder created would be named: ChiraghDewan-2017_05_03-18_06_35

  • -p ROOTPATH/--path=ROOTPATH: This sets the path to the root of the filesystem to run collection on. The default value is /. This is helpful when running collection on a mounted disk image:

    $ sudo osxcollector/ -p '/mnt/pawned'

  • -s SECTION/--section=SECTION: This parameter is used to run only a portion of the full collection:

$ sudo osxcollector/ -s 'safari' -s 'downloads'

The following is a list of all the sections and subsections:

  • version

  • system_info

  • kext

  • startup

    • launch_agents

    • scripting_additions

    • startup_items

    • login_items

  • applications

    • applications

    • install_history

  • quarantines

  • downloads

    • downloads

    • email_downloads

    • old_email_downloads

  • chrome

    • history

    • archived_history

    • cookies

    • login_data

    • top_sites

    • web_data

    • databases

    • local_storage

    • preferences

  • firefox

    • cookies

    • downloads

    • formhistory

    • history

    • signons

    • permissions

    • addons

    • extension

    • content_prefs

    • health_report

    • webapps_store

    • json_files

  • safari

    • downloads

    • history

    • extensions

    • databases

    • local storage

    • extension_files

  • accounts

    • system_admins

    • system_users

    • social_accounts

    • recent_items

  • mail

  • full_hash

Other parameters include:

  • -c/--collect-cookies: This parameter dumps cookie values. By default, cookie values are not collected, as they may contain sensitive information.

  • -l/--collect-local-storage: This parameter collects the values stored in the web browser's local storage. By default, they are not collected, as the values may contain sensitive information.

  • -d/--debug: This parameter enables verbose output and Python breakpoints.
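When collection has to be repeated across several machines or disk images, the options above can be assembled programmatically. The sketch below builds an argument list from the documented flags and predicts the output name. Assumptions: the script path is passed in because the exact file name is not reproduced here; the interpreter is /usr/bin/python2.7, as in the earlier note; -s values are checked against the top-level section names only; and the PREFIX-YYYY_MM_DD-HH_MM_SS naming pattern is inferred from the example outputs rather than taken from OSXCollector's code. Both helper names are hypothetical.

```python
from datetime import datetime

# Top-level section names copied from the list above.
SECTIONS = {
    "version", "system_info", "kext", "startup", "applications",
    "quarantines", "downloads", "chrome", "firefox", "safari",
    "accounts", "mail", "full_hash",
}


def build_command(script_path, incident_prefix=None, root_path=None,
                  sections=(), collect_cookies=False,
                  collect_local_storage=False, debug=False):
    """Return an argv list for running the collector with the given options."""
    unknown = [s for s in sections if s not in SECTIONS]
    if unknown:
        raise ValueError("unknown section(s): %s" % ", ".join(unknown))
    cmd = ["sudo", "/usr/bin/python2.7", script_path]
    if incident_prefix:
        cmd += ["-i", incident_prefix]
    if root_path:
        cmd += ["-p", root_path]
    for section in sections:
        cmd += ["-s", section]  # -s may be repeated
    if collect_cookies:
        cmd.append("-c")
    if collect_local_storage:
        cmd.append("-l")
    if debug:
        cmd.append("-d")
    return cmd


def expected_output_name(incident_prefix="osxcollect", when=None):
    """Predict PREFIX-YYYY_MM_DD-HH_MM_SS (pattern inferred from the examples)."""
    if when is None:
        when = datetime.now()
    return "%s-%s" % (incident_prefix, when.strftime("%Y_%m_%d-%H_%M_%S"))
```

The resulting list can be handed to subprocess.check_call, for example subprocess.check_call(build_command("path/to/script.py", incident_prefix="ChiraghDewan", sections=["safari", "downloads"])), where the script path is a placeholder.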

Understanding the JSON file

Common Keys

Each line of the JSON file records one piece of information. Some common keys are:

  • osxcollector_incident_id: A unique ID shared by every record

  • osxcollector_section: The section or type of data the record holds

  • osxcollector_subsection: The subsection of the type of data the record holds
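Because each line is a standalone JSON object, a quick overview of what was collected can be built by counting records per section and subsection. The following is a minimal sketch using only the standard library; the function name is hypothetical.

```python
import json
from collections import Counter


def count_records(lines):
    """Count OSXCollector records per (section, subsection).

    `lines` is any iterable of JSON strings, one record per line,
    such as an open file handle.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Missing keys become "" so that the keys stay sortable.
        key = (record.get("osxcollector_section") or "",
               record.get("osxcollector_subsection") or "")
        counts[key] += 1
    return counts
```

For example, count_records(open("osxcollect-2017_05_03-18_06_35.json")) returns a Counter keyed by (section, subsection); records without a subsection are counted under an empty string.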

File Records

  • ctime: The file creation time

  • mtime: The file modified time

  • file_path: The absolute path to the file

  • md5: MD5 hash of the file contents

  • sha1: SHA1 hash of the file contents

  • sha2: SHA2 hash of the file contents

  • signature_chain: The common names of the certs in the file's signing chain
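The stored hashes make it possible to check later whether a file on disk still matches what was collected. The sketch below recomputes the digests in chunks and compares them with a record's md5, sha1, and sha2 fields. It assumes that sha2 holds a SHA-256 digest; verify this against the project's README before relying on it. Both function names are hypothetical.

```python
import hashlib


def file_hashes(path, chunk_size=1 << 20):
    """Return MD5, SHA-1 and SHA-256 hex digests of a file, read in chunks."""
    md5, sha1, sha256 = hashlib.md5(), hashlib.sha1(), hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
            sha256.update(chunk)
    # Key names mirror the record fields; "sha2" is assumed to mean SHA-256.
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest(),
            "sha2": sha256.hexdigest()}


def hashes_match(record, path=None):
    """Compare a file record's stored hashes with the file on disk.

    Returns {hash_key: bool} for each hash key present in the record.
    """
    actual = file_hashes(path or record["file_path"])
    return {key: record[key] == actual[key]
            for key in ("md5", "sha1", "sha2") if key in record}
```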

Downloaded File

  • xattr-wherefrom: A list containing the source and referrer URLs for the downloaded file

  • xattr-quarantines: A string describing which application downloaded the file
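These two keys are often the quickest route to answering how a file reached the machine. The following minimal sketch lists every record that carries an xattr-wherefrom value, together with its path and quarantine string. It assumes the keys are spelled exactly as above, and the function name is hypothetical.

```python
import json


def download_sources(lines):
    """Yield (file_path, urls, quarantine) for records with download xattrs."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        urls = record.get("xattr-wherefrom")
        if urls:
            yield (record.get("file_path"), urls,
                   record.get("xattr-quarantines"))
```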

The keys mentioned above are only some of the ones OSXCollector uses. Additional keys are defined for:

  • SQLite Records

  • Timestamps

  • Version section

  • System_info section

  • Kext section

  • Startup section

  • Applications section

  • Quarantines section

  • Downloads section

  • Chrome section

  • Firefox section

  • Safari section

  • Accounts section

  • Mail section

  • Full Hash section

The detailed keys for each of the sections mentioned above are documented in the project's README.md file, which can be read after cloning the project or on its GitHub repository.

Basic Manual Analysis

Forensic analysis is not an exact science. Some may argue that it falls somewhere between art and science; because of that, every person who reads the story sees something different.

Going through the entire JSON file can be overwhelming. However, a few commands can narrow down the search:

  • Timestamps

    $ cat osxcollect-2017_05_03-18_06_35.json | grep '2017-05-03'

  • Browser History

    $ cat osxcollect-2017_05_03-18_06_35.json | grep '2017-05-03' | jq 'select(has("url")) | .url'

Note: The above command also requires jq, an open-source command-line JSON processor, to be installed.

  • Single User

    $ cat osxcollect-2017_05_03-18_06_35.json | jq 'select(.osxcollector_username=="Chiragh")|.'
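For analysts who prefer Python to grep and jq, the three commands above can be approximated with the standard library. This is a sketch, not a tool shipped with OSXCollector, and the function names are hypothetical. Like grep, matching_lines does a plain substring match on each raw line, so it can also hit dates that appear in fields other than timestamps.

```python
import json


def parse(lines):
    """Parse non-blank JSON lines into dicts (one OSXCollector record per line)."""
    return [json.loads(line) for line in lines if line.strip()]


def matching_lines(lines, text):
    """Roughly: grep 'text' file  (plain substring match on each raw line)."""
    return [line for line in lines if text in line]


def urls(records):
    """Roughly: jq 'select(has("url")) | .url'"""
    return [r["url"] for r in records if "url" in r]


def for_user(records, username):
    """Roughly: jq 'select(.osxcollector_username=="NAME")'"""
    return [r for r in records if r.get("osxcollector_username") == username]
```

For example, after lines = open("osxcollect-2017_05_03-18_06_35.json").readlines(), the browser history query becomes urls(parse(matching_lines(lines, "2017-05-03"))).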

By combining sections and subsections, more targeted commands can be built to simplify the process further.
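As an illustration, the helper below filters on section, subsection, and a substring at the same time. It does roughly what jq 'select(.osxcollector_section=="chrome" and .osxcollector_subsection=="history")' would do when combined with a grep for a date. The function name and its parameters are hypothetical.

```python
import json


def select_records(lines, section=None, subsection=None, contains=None):
    """Yield records matching a section, subsection and raw-line substring.

    Any criterion left as None is ignored, so select_records(lines, "chrome")
    yields every Chrome record.
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if contains is not None and contains not in line:
            continue  # cheap substring test before parsing, like grep
        record = json.loads(line)
        if section is not None and record.get("osxcollector_section") != section:
            continue
        if subsection is not None and record.get("osxcollector_subsection") != subsection:
            continue
        yield record
```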

Automated Analysis

OSXCollector can also automate the analysis of its output using various filters, which are provided in a package called osxcollector.output_filters. Its source is also hosted on GitHub.

Unlike the collector script, the filters have dependencies that do not come pre-installed on macOS. The solution recommended by Yelp is to use virtualenv.


Run the following commands to install tox and virtualenv, create the virtual environment, and activate it:

$ sudo pip install tox virtualenv

$ make venv

$ source virtualenv_run/bin/activate

The following filters are available:

  • Find Domains Filter

  • Find Blacklisted Filter

  • Related Files Filter

  • Chrome History Filter

  • Firefox History Filter

  • Chrome Extensions Filter

  • Firefox Extensions Filter

  • OpenDNS Related Domains Filter

  • OpenDNS Lookup Domains Filter

  • VirusTotal Lookup Domains Filter

  • VirusTotal Lookup Hashes Filter

  • VirusTotal Lookup URLs Filter

  • Shadow Server Lookup Hashes Filter

More details about each filter, along with how to use it, can be found in the repository.
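To give a feel for what such a filter does, here is a much-simplified, standalone sketch in the spirit of the Find Domains and Find Blacklisted filters. It reads JSON lines, pulls hostnames out of URL-like string values, and flags any that equal or fall under a blacklisted domain. It does not use OSXCollector's filter classes, and the added key names (osxcollector_domains, osxcollector_blacklist) are chosen for illustration only; consult the repository for the real filters' behaviour and output.

```python
import json
from urllib.parse import urlparse


def domains_in(value, found):
    """Recursively collect hostnames from URL-looking strings in a record."""
    if isinstance(value, dict):
        for v in value.values():
            domains_in(v, found)
    elif isinstance(value, list):
        for v in value:
            domains_in(v, found)
    elif isinstance(value, str) and value.startswith(("http://", "https://")):
        host = urlparse(value).hostname  # hostname is returned lowercased
        if host:
            found.add(host)
    return found


def is_blacklisted(domain, blacklist):
    """True if domain equals, or is a subdomain of, any blacklisted domain."""
    return any(domain == b or domain.endswith("." + b) for b in blacklist)


def find_domains_filter(lines, blacklist=frozenset()):
    """Yield each record, annotated with its domains and blacklist hits."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        domains = sorted(domains_in(record, set()))
        if domains:
            record["osxcollector_domains"] = domains
            hits = [d for d in domains if is_blacklisted(d, blacklist)]
            if hits:
                record["osxcollector_blacklist"] = hits
        yield record
```

Because it consumes and produces one JSON record per line, its output can be fed into further line-oriented processing, for example: for rec in find_domains_filter(sys.stdin, {"example.com"}): print(json.dumps(rec)).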

The biggest downside is that OSXCollector works only on macOS. Development has also slowed in recent times, although Yelp encourages developers to contribute. All in all, OSXCollector is a powerful tool that can make a forensic analyst's life easier.

Chiragh Dewan

A creative, problem-solving full-stack web developer with expertise in Information Security Audit, Web Application Audit, Vulnerability Assessment, and Penetration Testing/Ethical Hacking, as well as previous experience in Artificial Intelligence, Machine Learning, and Natural Language Processing. He has been recognised by various companies, such as Facebook, Google, Microsoft, PayPal, Netflix, and BlackBerry, for reporting security vulnerabilities. He has also given various talks on Artificial Intelligence and Cyber Security, including at a TEDx event.