Reflections on Julia

 
 

Julia is a new language that could become the go-to choice for scientific computing, machine learning, data mining, large-scale linear algebra, and distributed and parallel computing.  It uses LLVM-based just-in-time (JIT) compilation and aims to combine speed approaching that of C with the dynamism of Ruby.
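
To make the JIT point concrete, here is a minimal sketch that times the same function twice.  The function name sum_of_squares is our own invention for illustration, and the timings you see will depend on your machine and Julia version, so treat this as a demonstration of the idea and not as a benchmark.

```julia
# Sum of squares written as a plain loop. Julia compiles a specialized
# native version the first time the function is called with a given
# argument type.
function sum_of_squares(n)
    total = 0
    for i in 1:n
        total += i * i
    end
    return total
end

# First call: the measured time includes JIT compilation.
first_call = @elapsed sum_of_squares(1000)

# Second call with the same argument type reuses the compiled code.
second_call = @elapsed sum_of_squares(1000)

println("first call:  ", first_call, " s (includes compilation)")
println("second call: ", second_call, " s")
println("result:      ", sum_of_squares(1000))
```

The first call is typically the slower of the two because it pays the compilation cost once; later calls with the same argument types go straight to the compiled code.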

The creators of Julia wrote a manifesto explaining their motivation for creating yet another programming language.  Jeff Bezanson, Stefan Karpinski, Viral Shah and Alan Edelman highlight Python's annoying dependencies, the JVM's unnecessary overhead, and the debugging pain of distributed systems like Hadoop as just a few of the reasons why Julia exists.

Julia holds a lot of promise because of a few fundamental design choices:

  • Almost everything in Julia is written in Julia.  This will get us out of the C/C++ and Fortran dependency-hell of scikit-learn.
  • The type system makes it possible to rapidly experiment and iterate on data science problems.  The documentation claims that "Julia’s type system is designed to be powerful and expressive, yet clear, intuitive and unobtrusive."  In our experience this is in fact the case.  For example, if we build a Hidden Markov Model and our initial attempt treats all hidden states as Gaussian distributions, and we now want to try Exponential, we won't need to refactor the HMM code.  If the HMM was designed correctly and references the abstract Distribution type from Distributions.jl, either Normal or Exponential can be used.
  • Our limited testing suggests that identically constructed code often runs 2-3 times as fast in Julia as in Python.
  • GitHub is used for tracking all the Julia source code and for installing packages. Goodbye PyPI and Maven repos!
  • Julia supports metaprogramming.  This makes it possible for a program to transform and generate its own code, resulting in a new level of flexibility and powerful reflection capabilities.  
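
Two of the points above, the type system and metaprogramming, are easier to see in code.  The sketch below is self-contained and does not use Distributions.jl: EmissionModel, GaussianEmission, ExponentialEmission, density, most_likely_state and @show_and_eval are all hypothetical names we made up for illustration, not part of any package.  It is written in Julia 1.x syntax; the 0.3 release used in this post spelled these declarations "abstract" and "immutable".

```julia
# --- Type system: code written against an abstract type ---

# Hypothetical abstract type standing in for the distribution type
# that a well-designed HMM would reference.
abstract type EmissionModel end

struct GaussianEmission <: EmissionModel
    mean::Float64
    std::Float64
end

struct ExponentialEmission <: EmissionModel
    rate::Float64
end

# Each concrete type supplies its own density via multiple dispatch.
density(d::GaussianEmission, x) =
    exp(-0.5 * ((x - d.mean) / d.std)^2) / (d.std * sqrt(2 * pi))
density(d::ExponentialEmission, x) = x < 0 ? 0.0 : d.rate * exp(-d.rate * x)

# "HMM-side" code mentions only the abstract type, so swapping the
# emission model requires no changes here.
function most_likely_state(states::Vector{<:EmissionModel}, x)
    scores = [density(s, x) for s in states]
    return argmax(scores)   # index of the highest-scoring state
end

gaussians    = [GaussianEmission(0.0, 1.0), GaussianEmission(5.0, 1.0)]
exponentials = [ExponentialEmission(0.5), ExponentialEmission(5.0)]

println(most_likely_state(gaussians, 4.2))
println(most_likely_state(exponentials, 4.2))

# --- Metaprogramming: a macro that transforms the code it is given ---

# The macro receives the expression itself (not its value) and returns
# new code that evaluates it and prints the source text next to the result.
macro show_and_eval(ex)
    label = string(ex)          # source text of the expression
    return quote
        value = $(esc(ex))      # evaluate in the caller's scope
        println($label, " = ", value)
        value
    end
end

result = @show_and_eval 1 + 2 * 3
```

Adding a third emission model only means defining a new struct and one density method; most_likely_state stays untouched, which is the refactoring benefit described in the type-system bullet.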

What's Missing?

  • Pandas is significantly more mature than Julia DataFrames.
  • For NLP problems, Python is still a better choice. TextAnalysis.jl is very basic.
  • John Myles White points out some challenges with the current Julia stats functionality that will be improved in v0.4.
  • The Julia community is still small (but hopefully growing).

Getting Started on OS X

Download and install Anaconda (only if you want to run Julia in IPython Notebook)

Download and install Julia

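The Mac OS X package (.dmg) contains Julia.app.  Drag the Julia icon to Applications.

Symlink the julia binary into your PATH (adjust the version number to match the app you installed):

sudo ln -s /Applications/Julia-0.3.6.app/Contents/Resources/julia/bin/julia /usr/bin/julia

Run julia in a terminal (you should see the beautiful ASCII version of the logo), then install IJulia and Gadfly at the julia> prompt: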

Pkg.add("IJulia")

Pkg.add("Gadfly")

Start IPython Notebook with a Julia profile (in terminal)

ipython notebook --profile julia

Useful Packages

Gadfly.jl - plotting and data visualization package that conveniently installs most of the frequently used packages like DataFrames, Iterators, Distributions,  etc.

Cairo.jl - Cairo graphics library used among other things to render PDFs from Gadfly charts

DecisionTree.jl, Clustering.jl, MultivariateStats.jl - stats / machine learning tools

DSP.jl - provides a number of common Digital Signal Processing (DSP) routines

Graphs.jl - provides graph types and algorithms like centrality, connected components, cycle detection, etc.

Mocha.jl - deep learning framework inspired by the C++ framework Caffe

Optim.jl - basic optimization algorithms in pure Julia

Morsel.jl - a Sinatra-like micro framework for declaring routes and handling requests. It is built on top of HttpServer.jl and Meddle.jl.

PyCall.jl - if all else fails, call some Python library

JavaCall.jl - reuse the millions of lines of Java code that are out there

~500 more packages

If you find a package that isn't registered, you can install it directly from its Git URL:

Pkg.clone("git://github.com/path/to/Package.jl.git")

To update packages:

Pkg.update()       # update all installed packages
Pkg.update("DSP")  # update a single package

Examples and Tutorials

Introduction to Julia tutorial at SciPy 2014

YouTube videos

Implementing Digital Filters in Julia

Videos from the Julia tutorial at MIT

Learn Bayes Theorem with Julia

Data Analysis in Julia with Data Frames

Is it Ready for Production?

Yes!  We run Julia against massive volumes of data and process tens of thousands of transactions per second.  We have successfully deployed Julia for graph analytics, non-parametric probability density functions, graphical models, DSP problems, etc.

We also use Julia in our fellowship.  While we encourage fellows to check out Julia, we certainly do not insist on using it for every problem.    

Getting Answers to Questions

The julia-users mailing list is for discussion around the usage of Julia.

JuliaCon 2015 will be held at the MIT Stata Center June 24 - June 28.