All Posts

New Chromebook

Hello readers, I just brought home a shiny new Samsung Chromebook. It is silver, so it totally infringes on Apple’s design. I am pretty impressed with its weight and size. The first bad thing about it is how weak the default terminal is. I will have to check out the developer mode terminal, but so far Crosh is about as powerful as the default shell on MS-DOS. I also found a really cool Chrome app called Postman that makes REST API calls and then post-processes the responses to make them more pleasant to read.

NDSEG fellowship

Hello blogosphere, I have good news today. A few months ago I applied for the National Defense Science and Engineering Graduate (NDSEG) Fellowship. Today I found out that I was one of the people fortunate enough to receive it. According to the NDSEG website, approximately 10% of applications are accepted. The fellowship will provide more funding for my graduate studies. Thanks, US government!

James

Some other good fellowship opportunities are the NSF-GRFP and the National Physical Sciences Consortium (NPSC) fellowship.

GT students state of origin, the difficulty of counting humans

@GATECH_ENGINEERS posted on Twitter yesterday that most Georgia Tech grads who are not from Georgia come from FL, TX, or CA. As my fellow data people know, any count of human activity will be biased toward regions with lots of people; for example, most tweets in America come from FL, TX, CA, or the NY metro area. So I took the liberty of dividing their numbers by the state populations, which gives a different perspective on where GaTech students come from.
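The normalization described above amounts to turning raw counts into a rate per resident. Below is a minimal sketch of that step in plain Python. The function names `per_capita` and `rank`, the per-100,000 scale, and the two made-up regions are illustrative assumptions; the numbers are toy values, not the actual Georgia Tech or census figures.

```python
def per_capita(counts, populations, per=100_000):
    """Return counts normalized to a rate per `per` residents.

    `counts` and `populations` map a region name to a number. Regions
    with a missing or zero population are skipped rather than guessed.
    """
    rates = {}
    for region, count in counts.items():
        pop = populations.get(region)
        if not pop:
            continue  # no usable population figure, so no rate
        rates[region] = count / pop * per
    return rates


def rank(rates):
    """Return region names sorted from highest to lowest rate."""
    return sorted(rates, key=rates.get, reverse=True)


if __name__ == "__main__":
    # Toy numbers for two hypothetical regions, NOT real data.
    grads = {"Bigstate": 900, "Smallstate": 200}
    pop = {"Bigstate": 30_000_000, "Smallstate": 2_000_000}
    rates = per_capita(grads, pop)
    for region in rank(rates):
        print(f"{region}: {rates[region]:.1f} grads per 100k residents")
```

In this toy case the larger region has more grads in absolute terms (900 vs. 200) but the smaller one ranks first once population is accounted for (10.0 vs. 3.0 per 100k), which is the kind of reversal the post is pointing at.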

Python for Data analysis

Part 1: I am reading Python for Data Analysis, and it is really good. Everyone who wants to do data analysis should read this book and consider using these tools. It presents NumPy and SciPy for numeric and vectorized operations, matplotlib for fast, programmatic plotting, and pandas for a robust data structure framework. It also covers some data formats and tools for parsing them.

Part 2: I have now finished the book by Wes McKinney and thoroughly enjoyed it.
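For readers who have not used these libraries, here is a tiny sketch of the two ideas the book leans on most: NumPy's vectorized arithmetic and pandas' labeled DataFrame with split-apply-combine via `groupby`. The column names and values are made up for illustration, and this is not an example taken from the book.

```python
import numpy as np
import pandas as pd

# Vectorized arithmetic: one expression over whole arrays, no Python loop.
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])
revenue = prices * quantities  # element-wise product: [10., 40., 90.]

# pandas: labeled columns plus split-apply-combine with groupby.
df = pd.DataFrame({
    "store": ["a", "b", "a"],  # hypothetical store labels
    "revenue": revenue,
})
totals = df.groupby("store")["revenue"].sum()  # Series indexed by store

print(totals)
```

The same `groupby(...).sum()` call works unchanged whether the frame has three rows or three million, which is much of the appeal over hand-written loops.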

First Semester of Grad School

Friday concluded my first semester at Georgia Tech, and it went well. I took two classes, Machine Learning 1 with Le Song and Massive Graph Analysis (MGA) with my adviser David Bader, and both course projects taught me a lot. The MGA project was a lot of nitty-gritty C coding with OpenMP, which taught me how to write and debug parallel code. Because of this, Rob McColl and I are starting to think about higher-level interfaces to STINGER.

Data Mining Workflows with databases

Machine learning is the study of algorithms that mine large databases for insight. It is therefore surprising that so much machine learning research is conducted without using databases. Any machine learning practitioner, or anyone who has spent an afternoon munging CSV files, can appreciate the value of storing data in a structured database. Why is it that most machine learning research is conducted with scripts that generate CSV files and plots? I recently worked on an entirely self-contained ML project where the goal was to take some text documents, build models to classify that text, evaluate those models, and draw some conclusions.
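As a rough illustration of what a database-backed version of that workflow could look like, here is a minimal sketch using Python's built-in `sqlite3` module. The schema (`documents` and `predictions` tables), the helper functions, the model names `always_spam` and `keyword`, and the four-document corpus are all hypothetical, not the setup from the project described above. The point is that once predictions live next to the labeled documents, evaluating every model is a single SQL query instead of another script that writes a CSV.

```python
import sqlite3


def build_schema(conn):
    """Create one table for labeled documents and one for model output."""
    conn.executescript("""
        CREATE TABLE documents (
            id INTEGER PRIMARY KEY,
            text TEXT NOT NULL,
            label TEXT NOT NULL
        );
        CREATE TABLE predictions (
            model TEXT NOT NULL,
            doc_id INTEGER NOT NULL REFERENCES documents(id),
            predicted TEXT NOT NULL,
            PRIMARY KEY (model, doc_id)
        );
    """)


def add_documents(conn, docs):
    """Insert (id, text, label) tuples."""
    conn.executemany(
        "INSERT INTO documents (id, text, label) VALUES (?, ?, ?)", docs
    )
    conn.commit()


def record_predictions(conn, model, predictions):
    """Store one model's predictions; `predictions` maps doc_id -> label."""
    conn.executemany(
        "INSERT INTO predictions (model, doc_id, predicted) VALUES (?, ?, ?)",
        [(model, doc_id, label) for doc_id, label in predictions.items()],
    )
    conn.commit()


def accuracy_by_model(conn):
    """Evaluate every stored model with one join and GROUP BY."""
    # In SQLite a comparison yields 0 or 1, so AVG of it is the accuracy.
    rows = conn.execute("""
        SELECT p.model, AVG(p.predicted = d.label) AS accuracy
        FROM predictions AS p
        JOIN documents AS d ON d.id = p.doc_id
        GROUP BY p.model
        ORDER BY p.model
    """)
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    # Toy corpus, made up for illustration.
    add_documents(conn, [
        (1, "cheap pills now", "spam"),
        (2, "meeting at noon", "ham"),
        (3, "win a prize", "spam"),
        (4, "lunch tomorrow?", "ham"),
    ])
    record_predictions(conn, "always_spam", {1: "spam", 2: "spam", 3: "spam", 4: "spam"})
    record_predictions(conn, "keyword", {1: "spam", 2: "ham", 3: "spam", 4: "ham"})
    print(accuracy_by_model(conn))
```

Swapping `":memory:"` for a file path keeps every run's predictions around, so comparing a new model against old ones is just another query.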