Recent Publications

Here are some of my recent publications. For a complete listing, see All Publications, or All Conference Publications and All Journal Publications.

News media biases and propaganda are a problem for modern societies. Reliance on the internet as a primary news source enables the formation of hyper-partisan echo chambers and an industry where outlets benefit from purveying fake news. While modeling the text content of articles is sufficient to identify bias, it is not capable of determining credibility. A structural model based on web links outperforms text models for fake news detection.
WSDM-2018

This paper shows that Julia provides sufficient performance to bridge the gap between productivity-oriented languages and low-level languages for complex, memory-intensive computation tasks such as graph traversal. We provide performance guidelines for using complex low-level data structures in high-productivity languages and present the first parallel integration on the productivity-oriented language side for graph analysis. Performance on the Graph500 benchmark demonstrates that the Julia implementation is competitive with the native C/OpenMP implementation.
HPEC-2017

Graphs and networks are prevalent in modeling relational datasets from many fields of research. By using iterative solvers to approximate graph measures (specifically Katz Centrality), we can obtain a ranking vector consisting of a number for each vertex in the graph identifying its relative importance. We use the residual to accurately estimate how much of the ranking from an approximate solution matches the ranking given by the exact solution. Using probabilistic matrix norms and applying numerical analysis to the computation of Katz Centrality, we obtain bounds on the accuracy of the approximation compared to the exact solution with respect to the highly ranked nodes. This relates the numerical accuracy of the linear solver to the data analysis accuracy of finding the correct ranking. In particular, we answer the question of which pairwise rankings are reliable given an approximate solution to the linear system. Experiments on many real-world networks up to several million vertices and several hundred million edges validate our theory and show that we are able to accurately estimate large portions of the approximation. By analyzing convergence error, we develop confidence in the ranking schemes of data mining.
ISC-2017
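The link between solver residual and ranking reliability described in the Katz Centrality abstract above can be illustrated with a minimal sketch. It assumes an undirected graph stored as adjacency lists and solves (I - alpha A) x = 1 by simple fixed-point iteration. The function names and the elementary infinity-norm error bound are illustrative assumptions, not the probabilistic bounds from the paper, and Python is used here only for brevity.

```python
def residual_norm(adj, alpha, x):
    """Infinity norm of the residual 1 - (I - alpha*A) x."""
    return max(
        abs(1.0 + alpha * sum(x[j] for j in adj[i]) - x[i])
        for i in range(len(adj))
    )


def katz_fixed_point(adj, alpha, tol=1e-10, max_iter=10_000):
    """Approximate Katz centrality by iterating x <- 1 + alpha*A x.

    Returns the approximate scores and the residual norm at those scores.
    """
    x = [1.0] * len(adj)
    for _ in range(max_iter):
        if residual_norm(adj, alpha, x) <= tol:
            break
        x = [1.0 + alpha * sum(x[j] for j in adj[i]) for i in range(len(adj))]
    return x, residual_norm(adj, alpha, x)


def error_bound(adj, alpha, residual):
    """Bound on max_i |x_i - exact_i| from the residual.

    Assumes alpha * max_degree < 1, so that the infinity norm of
    (I - alpha*A)^{-1} is at most 1 / (1 - alpha * max_degree).
    """
    max_degree = max(len(nbrs) for nbrs in adj)
    if alpha * max_degree >= 1.0:
        raise ValueError("bound requires alpha * max_degree < 1")
    return residual / (1.0 - alpha * max_degree)


def reliable_pair(x, i, j, err):
    """The order of vertices i and j is certain if their gap exceeds 2*err."""
    return abs(x[i] - x[j]) > 2.0 * err


# Hypothetical example: a star graph with center 0 and three leaves.
star = [[1, 2, 3], [0], [0], [0]]
scores, res = katz_fixed_point(star, alpha=0.2)
bound = error_bound(star, 0.2, res)
```

With these definitions, `reliable_pair(scores, 0, 1, bound)` reports whether the center can be trusted to outrank a leaf at the current solver tolerance, while two leaves with equal scores are never certified as ordered.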

Increasing volumes of data and the desire for real-time query capability make the development of efficient streaming algorithms for data analytics valuable. Streaming graph algorithms that avoid unnecessary recomputation through clever application of data dependency analysis are often more complex to derive than their static counterparts. This paper discusses a method to derive algorithms for streaming graph analysis from static formulations. Combining tuned graph algorithm building blocks with an appropriate functional language, a graph query planner should be able to correctly implement most static and streaming versions of an algorithm from a single mathematical formulation. We provide a detailed analysis for the case of updating triangle counts in a streaming graph using linear algebra and an experimental evaluation in Julia.
IPDPS-2017
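The streaming triangle count update mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's derivation or its Julia implementation: it assumes a simple undirected graph and relies on the standard identity that the static count equals trace(A^3) / 6, so inserting edge (u, v) adds (A^2)[u, v] triangles, which is the number of common neighbors of u and v. The class and function names are hypothetical.

```python
class StreamingTriangles:
    """Maintain the triangle count of an undirected graph under edge updates."""

    def __init__(self):
        self.nbrs = {}       # vertex -> set of neighbors
        self.triangles = 0   # current number of triangles

    def insert_edge(self, u, v):
        # Ignore self-loops and edges that are already present.
        if u == v or v in self.nbrs.get(u, ()):
            return
        nu = self.nbrs.setdefault(u, set())
        nv = self.nbrs.setdefault(v, set())
        # Each common neighbor w closes one new triangle (u, v, w).
        self.triangles += len(nu & nv)
        nu.add(v)
        nv.add(u)

    def delete_edge(self, u, v):
        if v not in self.nbrs.get(u, ()):
            return
        self.nbrs[u].discard(v)
        self.nbrs[v].discard(u)
        # Each common neighbor w loses the triangle (u, v, w).
        self.triangles -= len(self.nbrs[u] & self.nbrs[v])


def static_triangle_count(nbrs):
    """Recompute the count from scratch as trace(A^3) / 6."""
    closed_walks = 0
    for u, nu in nbrs.items():
        for v in nu:
            # A[u, v] * (A^2)[u, v], summed over all ordered pairs (u, v)
            closed_walks += len(nu & nbrs[v])
    return closed_walks // 6
```

Each update touches only the neighborhoods of the two endpoints, whereas `static_triangle_count` scans every edge, which is the recomputation the streaming formulation avoids.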

Many common methods for data analysis rely on linear algebra. We provide new results connecting data analysis error to numerical accuracy in the context of spectral graph partitioning. We provide pointwise convergence guarantees so that spectral blends (linear combinations of eigenvectors) can be employed to solve data analysis problems with confidence in their accuracy. We apply this theory to an accessible model problem, the ring of cliques, by deriving the relevant eigenpairs and finding necessary and sufficient solver tolerances. Analysis of the ring of cliques provides an upper bound on eigensolver tolerances for graph partitioning problems. These results bridge the gap between linear algebra based data analysis methods and the convergence theory of iterative approximation methods. These results explain how the combinatorial structure of a problem can be recovered much faster than numerically accurate solutions to the associated linear algebra problem.
Compl. Networks

Recent & Upcoming Talks

All Presentations

Recent Posts

More Posts

I have been using Hugo to generate this site for quite a while now and I really like it[^1]. But I needed to migrate my old WordPress blog into a static framework. Fortunately, WP practices ethical software development and makes it easy to get your data out of their software. You can get a MySQL database dump, which is useful for migrating from one hosting provider to another, or get a dump in the form of a JSON array containing all your posts.

CONTINUE READING

Here is a simple example of how to run a Julia script on a SLURM cluster. If you want to run a Julia script with multiple workers, you need to allocate some nodes and then have the ClusterManager use srun to get those nodes to run Julia. See the main.sh script for an example. Main.sh #!/usr/bin/sh # start an allocation with 4 nodes 2 cpus per node and run the sbatch script which will start multiple Julia processes in a Julia Cluster.

CONTINUE READING

I’ve done a lot of computational experiments in my days and have slowly and diligently ascended a slope of experimental mismanagement. Every project starts with an empty directory and attempts to build towards a paper, report, or presentation. Here are my thoughts on the right way to do it.

Decide on a process up front and follow it

When you start a project, it is always just a few files, just a few scripts, and just a few collaborators.

CONTINUE READING

This week I presented at the annual Observational Health Data Sciences and Informatics symposium. I presented a poster describing some software that I wrote to enable a health data science application for personalized medicine using individual patient-level predictions. The community was really great at the conference and I really enjoyed talking to everyone about what we are doing with OHDSI.

Abstract

Many software packages within the OHDSI ecosystem rely on SQL query generation, which is fraught with security risks and compatibility issues.

CONTINUE READING

Two modern languages, two ends of the spectrum

As Go approaches its second version and Julia approaches version 1.0, the differences between Julia and Go spring to the front of my mind. I talked to a lot of people at JuliaCon and was surprised to find that almost no one had used the Go programming language for any serious work. Julia was invented in 2012, so it is no surprise that everyone had programming experience in another language.

CONTINUE READING


Research

Software and Data

Julia Graphs

JuliaGraphs is the primary organization dedicated to the advancement of graph theory and algorithms in the Julia programming language. The flagship project is LightGraphs.jl, the premier graph library in the Julia ecosystem.

Stinger Graph Analysis

Dynamic graphs are all around us: social networks containing interpersonal relationships and communication patterns; information on the Internet, Wikipedia, and other data sources; disease spread networks and bioinformatics problems; business intelligence and consumer behavior. The right software can help to understand the structure and membership of these networks and many others as they change at speeds of thousands to millions of updates per second.

Contact

  • https://www.google.com/recaptcha/mailhide/d?k=01Pam2gQXRzPV0FfCqnFgrjw==&c=Vpo3xDQlWVcGfG4zf2BluSZ3z10xitKMiLF_NLmw9gs=
  • 75 5th Street NW, Atlanta, GA
  • By appt only.