All Posts

Digital Witness

My colleagues Nigel Campbell, Evan Stuart, Trevor Goodyear, Winston Messer, and I are happy to present our platform for remote evidence collection from volunteers. Digital Witness is an open source platform for evidence submission. Volunteers can upload media files from their phones, which reduces the privacy invasion that comes with the full disk capture law enforcement officers currently use.


The Office of the Director of National Intelligence sponsored a challenge competition. I led a team at GTRI, including Natalie Fitch and Frank Bradfield, that won a prize for our entry “Credibility Development with Knowledge Graphs.”

Challenge Website XAMINE

Professional Education and Open Source

Open source software is stymied by a lack of funds for maintenance tasks, but companies aren’t coughing up charity money to pay open source developers. Open Source in the Enterprise: How do we generate the funds to fund development on open source code? Support and services contracts like RedHat. Enterprise Version licenses like Mongo or Neo4j, which Gil Yehuda thinks are problematic. The Backwards Commercial License proposed by hueniverse. Another problem in open source is that enterprise software vendors have a large incentive to make their software as sticky as possible and lock in their clients.

Migrating From a WordPress backup

I have been using Hugo to generate this site for quite a while now and I really like it[^1]. But I needed to migrate my old WordPress blog into a static framework. Fortunately, WP practices ethical software development and makes it easy to get your data out of their software. You can get a MySQL database dump, which is useful for migrating from one hosting provider to another, or get a dump in the form of a JSON array containing all your posts.
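To make the JSON route concrete, here is a minimal sketch of turning such a dump into Markdown files with YAML front matter that Hugo can read. The field names `title`, `date`, `slug`, and `content` are assumptions for illustration, as are the function names; a real export may use different keys. The post body is copied through unchanged, so converting HTML to Markdown is out of scope here.

```python
import json
from pathlib import Path


def post_to_markdown(post: dict) -> str:
    """Render one exported post as Markdown with YAML front matter.

    The keys used here (title, date, slug, content) are hypothetical.
    """
    front_matter = [
        "---",
        # json.dumps yields a double-quoted string, which is also valid YAML.
        f"title: {json.dumps(post['title'])}",
        f"date: {post['date']}",
        f"slug: {json.dumps(post['slug'])}",
        "---",
    ]
    return "\n".join(front_matter) + "\n\n" + post["content"].strip() + "\n"


def migrate(dump_text: str, out_dir: Path) -> list:
    """Write one .md file per post in the JSON array and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for post in json.loads(dump_text):
        path = out_dir / f"{post['slug']}.md"
        path.write_text(post_to_markdown(post), encoding="utf-8")
        written.append(path)
    return written
```

Calling `migrate(Path("posts.json").read_text(), Path("content/post"))` would then fill a content directory, where both paths are placeholders.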

Running Julia on Slurm Cluster

Here is a simple example of how to run a Julia script on a SLURM cluster. If you want to run a Julia script with multiple workers, you need to allocate some nodes and then have the ClusterManager use srun to get those nodes to run julia. See the script for an example. #!/usr/bin/sh # start an allocation with 4 nodes 2 cpus per node and run the sbatch script which will start multiple julia processes in a Julia Cluster.

Principles of Computational Science Project Management

I’ve done a lot of computational experiments in my days and have slowly and diligently ascended a slope of experimental mismanagement. Every project starts with an empty directory and attempts to build towards a paper, report, or presentation. Here are my thoughts on the right way to do it. Decide on a process up front and follow it. When you start a project it is always just a few files, just a few scripts, and just a few collaborators.

Introducing the Query Garden

This week I presented at the annual Observational Health Data Sciences and Informatics symposium. I presented a poster describing some software that I wrote to enable a health data science application for personalized medicine using individual patient-level predictions. The community at the conference was really great and I really enjoyed talking to everyone about what we are doing with OHDSI. Abstract: Many software packages within the OHDSI ecosystem rely on SQL query generation, which is fraught with security risks and compatibility issues.
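As a small illustration of the security risk in generated SQL, the sketch below contrasts string interpolation with placeholder binding using Python's standard `sqlite3` module. The `patient` table and both function names are made up for this example, and this is not a description of how the Query Garden itself works.

```python
import sqlite3


def find_patients_unsafe(conn, name):
    # String interpolation: the input becomes part of the SQL text itself.
    query = f"SELECT id FROM patient WHERE name = '{name}' ORDER BY id"
    return [row[0] for row in conn.execute(query)]


def find_patients_safe(conn, name):
    # Placeholder binding: the input is passed as data and never parsed as SQL.
    query = "SELECT id FROM patient WHERE name = ? ORDER BY id"
    return [row[0] for row in conn.execute(query, (name,))]


# Hypothetical in-memory table with two rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO patient (name) VALUES (?)", [("alice",), ("bob",)])

# A crafted input that closes the quote and adds an always-true condition.
malicious = "x' OR '1'='1"
print(find_patients_unsafe(conn, malicious))  # matches every row
print(find_patients_safe(conn, malicious))    # matches nothing
```

The compatibility side of the problem (differences between SQL dialects) is a separate concern that this sketch does not address.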

Golang and Julia: Frenemies?

Two modern languages, two ends of the spectrum. As Go approaches its second version and Julia approaches version 1.0, the differences between Julia and Go spring to the front of my mind. I talked to a lot of people at JuliaCon and was surprised to find that almost no one had used the Go programming language for any serious work. Julia was invented in 2012, so it is no surprise that everyone had programming experience in another language.

Email Topics with NMF

This is an example of applying Non-negative Matrix Factorization to a corpus of emails to extract topic structure. We use the email currently stored on my desktop and find patterns of email. It turns out that most of my email is sent by machines!
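For readers who have not seen NMF before, here is a minimal pure-Python sketch on a toy document-term matrix. It uses multiplicative updates, one common way to fit the factorization V ≈ WH, which is not necessarily the algorithm used in the post. The vocabulary and counts are invented for illustration; they are not data from my mailbox.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw)) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start so the updates never divide by zero.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy corpus: rows are emails, columns are word counts (made-up data).
vocab = ["build", "failed", "lunch", "friday"]
counts = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 3, 1],
]

w, h = nmf(counts, rank=2)
for topic in h:
    top = max(range(len(vocab)), key=lambda j: topic[j])
    print(vocab[top], [round(x, 2) for x in topic])
```

Each row of `h` is a topic expressed as word weights, and each row of `w` says how strongly an email loads on each topic.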

Two language problem

Last week I hit the two language problem hard. I am studying convergence criteria for spectral partitioning, and this involves using eigensolvers. One benefit of my line of research is that it doesn’t require a complete rewrite of the solvers or a new factorization method. I had previously done some experiments with the power method, which is both easy to analyze on paper and easy to implement in software. Thus I had some Python code for the power method lying around.
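To show how little code the power method needs, here is a minimal pure-Python sketch. It is a generic textbook version written for illustration, not the code from my experiments; the function name, the uniform starting vector, and the stopping rule on successive eigenvalue estimates are all assumptions.

```python
import math


def power_method(matrix, iterations=1000, tol=1e-12):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix.

    Assumes the uniform starting vector is not orthogonal to the
    dominant eigenvector.
    """
    n = len(matrix)
    x = [1.0 / math.sqrt(n)] * n  # unit-length starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        # y = A x
        y = [sum(a * b for a, b in zip(row, x)) for row in matrix]
        # Rayleigh quotient x . (A x); valid as written because ||x|| = 1.
        estimate = sum(a * b for a, b in zip(x, y))
        norm = math.sqrt(sum(c * c for c in y))
        if norm == 0.0:
            raise ValueError("iterate collapsed to the zero vector")
        x = [c / norm for c in y]
        converged = abs(estimate - eigenvalue) < tol
        eigenvalue = estimate
        if converged:
            break
    return eigenvalue, x


# Small symmetric example whose eigenvalues are (5 ± sqrt(5)) / 2.
value, vector = power_method([[2.0, 1.0], [1.0, 3.0]])
print(value, vector)
```

The stopping rule here is the naive one; choosing a better criterion for a specific application such as spectral partitioning is exactly the kind of question the post is concerned with.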