Breakthroughs in data capture, genome sequencing, medical imaging, and other fields of biological research, coupled with the ubiquity of cheap digital storage, have produced massive amounts of biological data. Research suggests that by 2025, between 2 and 40 exabytes of human genomic data alone will be collected every year. Unfortunately, most mainstream software can’t process data on that scale efficiently, which leaves these troves of data underutilized.
Julia provides a wide variety of facilities for using and processing data effectively. It can efficiently store data structures in memory for quick access, but when datasets are too large to fit into memory, Julia can employ memory mapping of data files stored on disk. This allows for fast and efficient processing, even when memory is limited.
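Memory mapping is an operating-system facility rather than something unique to Julia (Julia exposes it through its `Mmap` standard library). The sketch below illustrates the idea in Python with the standard `mmap` module. It assumes a hypothetical raw file of native-endian 64-bit floats with no header. Only the requested slice is decoded, and the OS pages in just the bytes that slice touches. The helper names `write_float64` and `mapped_slice_sum` are illustrative, not part of any library.

```python
import math
import mmap
import os
import struct
import tempfile
from array import array


def write_float64(path, values):
    """Write values as raw native-endian float64 numbers (hypothetical format, no header)."""
    with open(path, "wb") as f:
        array("d", values).tofile(f)


def mapped_slice_sum(path, start, stop):
    """Sum elements [start, stop) of a raw float64 file via memory mapping.

    The file is mapped into the address space instead of being read into a list;
    the operating system pages in only the bytes that the slice touches.
    """
    itemsize = struct.calcsize("d")
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            if stop * itemsize > len(mm):
                raise IndexError("slice extends past the end of the file")
            # Decode just this window of the mapped bytes into Python floats.
            values = struct.unpack_from(f"{stop - start}d", mm, start * itemsize)
    return math.fsum(values)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "values.bin")
        write_float64(path, [0.5 * i for i in range(1000)])
        print(mapped_slice_sum(path, 10, 20))  # 0.5 * (10 + 11 + ... + 19) = 72.5
```

For a file that really exceeds RAM, the same pattern applies: map once, then read windows from it rather than loading the whole array.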
Data are often messy and complex. Julia doesn’t take a one-size-fits-all approach to data structures; instead, it provides a sophisticated yet easy-to-use type system in which users can choose whichever structure stores their data most efficiently and sensibly. Users are not forced to choose from a strictly limited set of data types. When no existing type fits the bill, users can create their own types and define any set of operations on them. This kind of extensibility and flexibility is at the heart of Julia.
Here is a glimpse of how Julia is solving complex use cases in the life sciences.
Modeling Cancer Evolution
[Image: Cancer genomics. Source: JuliaComputing.com]
Researchers commonly study tumor growth to help interpret cancer genomes. A team of researchers in the UK turned to Julia to run these tumor growth simulations. Julia not only offers them fast and easy ways to run the simulations, but it also has a vibrant community contributing to projects such as BioJulia, which help these researchers take their studies forward.
Augmedics - Medical Imaging
[Image: Medical imaging. Source: JuliaComputing.com]
Augmedics, a medical technology firm, is using Julia to track and render images in real time, building three-dimensional images of patients’ anatomy that act as a form of X-ray vision.
Diabetic Retinopathy
[Image: Medical diagnosis. Source: JuliaComputing.com]
Diabetic retinopathy is an eye disease that affects more than 126 million diabetics and accounts for more than 5% of blindness cases worldwide. Timely screening and diagnosis can help prevent vision loss for millions of diabetics worldwide. IBM and Julia Computing analyzed eye fundus images provided by Drishti Eye Hospitals and built a deep learning solution that provides eye diagnosis and care to thousands of rural Indians.
Modern systems biology and systems pharmacology, the leading scientific disciplines for biological prediction, make heavy use of ordinary, stochastic, delay, discrete, and partial differential equations. These domains require efficient solvers, because the simulations can be very computationally expensive. A direct feature comparison with solver suites in other languages shows that Julia’s DifferentialEquations.jl is a leader among differential equation solver packages.
Julia’s flexibility has also drawn researchers in the field to the JuliaDiffEq organization, where they implement the newest algorithms directly in Julia, including algorithms shown to be 12 to 10^6 times more efficient on stochastic biological models than the standard methods found in other libraries.
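To make concrete what a differential equation solver does, here is a deliberately minimal Python sketch: a classical fixed-step fourth-order Runge-Kutta (RK4) integrator applied to logistic growth, used here as a toy stand-in for a tumor growth model. This is not how DifferentialEquations.jl works internally; production suites add adaptive step sizes, error control, stiff solvers, and stochastic and delay variants. The function names and parameter values are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]
RHS = Callable[[float, Vector], Vector]


def rk4_solve(f: RHS, y0: Vector, t0: float, t1: float, steps: int) -> Vector:
    """Integrate dy/dt = f(t, y) from t0 to t1 using classical fixed-step RK4."""
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        # Four slope estimates: start, two midpoints, and end of the step.
        k1 = f(t, y)
        k2 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)])
        k3 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)])
        k4 = f(t + h, [yi + h * k for yi, k in zip(y, k3)])
        # Advance with the weighted average of the four slopes.
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y


def logistic_growth(r: float, K: float) -> RHS:
    """Right-hand side of dN/dt = r * N * (1 - N / K), a toy growth model."""
    def f(t: float, y: Vector) -> Vector:
        n = y[0]
        return [r * n * (1 - n / K)]
    return f


if __name__ == "__main__":
    r, K, n0, t_end = 0.5, 100.0, 10.0, 10.0
    numeric = rk4_solve(logistic_growth(r, K), [n0], 0.0, t_end, 1000)[0]
    # Closed-form logistic solution for comparison.
    exact = K / (1 + (K / n0 - 1) * math.exp(-r * t_end))
    print(numeric, exact)
```

Because logistic growth has a closed-form solution, the demo compares the numerical result against it; with 1,000 steps the two agree closely.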
Genome sequencing produces massive quantities of data: the human genome consists of over 3 billion nucleotides. However, per-nucleotide measurements along the genome, such as copy number, often consist of long stretches of identical values, so a genome-wide vector can be stored as just a few thousand runs using run-length encoding. Pharmaceutical scientists brought this functionality to Julia by helping to create RLEVectors.jl, a package that stores vectors memory-efficiently using run-length encoding. In benchmarks, RLEVectors.jl is shown to be 1,000 to 65,000 times faster than similar functionality from the R Bioconductor package, as the accompanying comparative graphic shows.
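The core idea of run-length encoding fits in a few lines. The Python sketch below stores a vector as two parallel lists, the value of each run and the (exclusive) index where each run ends, and answers random-access queries with a binary search over the run ends. The class name `RunLengthVector` is hypothetical; it illustrates the technique and is not the RLEVectors.jl API.

```python
from bisect import bisect_right
from itertools import groupby
from typing import Any, Iterable, List


class RunLengthVector:
    """Immutable vector stored as parallel lists of run values and run ends.

    Run k holds values[k] at every position in [ends[k-1], ends[k]),
    with ends[-1] equal to the logical length (ends are exclusive).
    """

    def __init__(self, data: Iterable[Any]):
        self.values: List[Any] = []
        self.ends: List[int] = []
        position = 0
        # groupby yields one group per maximal run of equal consecutive items.
        for value, run in groupby(data):
            position += sum(1 for _ in run)
            self.values.append(value)
            self.ends.append(position)

    def __len__(self) -> int:
        return self.ends[-1] if self.ends else 0

    def __getitem__(self, i: int) -> Any:
        if not 0 <= i < len(self):
            raise IndexError("RunLengthVector index out of range")
        # First run whose exclusive end lies beyond i: O(log number_of_runs).
        return self.values[bisect_right(self.ends, i)]

    def decode(self) -> list:
        """Expand back to a plain list (memory proportional to the full length)."""
        out: list = []
        start = 0
        for value, end in zip(self.values, self.ends):
            out.extend([value] * (end - start))
            start = end
        return out


if __name__ == "__main__":
    # Copy-number-like signal: long stretches of 2 copies with one short gain.
    copy_number = RunLengthVector([2] * 1_000_000 + [3] * 500 + [2] * 1_000_000)
    print(len(copy_number), len(copy_number.values))  # 2000500 positions, 3 runs
    print(copy_number[1_000_250])                       # 3
```

Two million positions with a single 500-position gain collapse to three runs; memory then scales with the number of runs rather than the number of positions.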
As the scale of data increases, so must the scale of computation. Many problems in the life sciences lend themselves particularly well to parallel processing, such as analyzing single nucleotide polymorphisms (SNPs) in genome-wide association studies and simulating disease outbreaks with individual-based epidemiological models. Julia was built with effortless parallelism in mind, be it on a single multicore machine, a supercomputing cluster, or in the cloud.
Regardless of where you run Julia, well-written Julia code is fast, even in serial. In benchmarks, its performance approaches, and in some cases beats, that of C and Fortran, the current de facto languages for performance-critical applications. And because Julia compiles code just in time rather than ahead of time, there is no separate build step to wait for before you can run your code. This makes it easy to rapidly prototype and iterate on ideas.
Recently a Julia package called Gillespie.jl was published in the Journal of Open Source Software. It implements Gillespie’s direct method for stochastic simulations, which is widely used in fields such as systems biology and epidemiology, in pure Julia with no parallelism. In benchmarks, it’s shown to be over 500 times faster than the equivalent package for R, and over 600 times faster than hand-written R code for the same tasks. Amazingly, no special optimization tricks were used to achieve this huge gain in performance; Gillespie.jl is fast simply by virtue of being built on Julia.
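Gillespie’s direct method is short enough to sketch in full. Each iteration computes the propensity (rate) of every possible reaction, draws the waiting time to the next event from an exponential distribution whose rate is the total propensity, and then picks which reaction fires with probability proportional to its propensity. The Python version below applies this to a stochastic SIR (susceptible-infected-recovered) epidemic. The function name, the frequency-dependent infection rate, and the parameter values are illustrative assumptions, not Gillespie.jl’s interface.

```python
import random
from typing import List, Tuple

State = Tuple[int, int, int]


def gillespie_sir(s: int, i: int, r: int, beta: float, gamma: float,
                  t_max: float, rng: random.Random) -> Tuple[List[float], List[State]]:
    """Simulate a stochastic SIR epidemic with Gillespie's direct method.

    Reactions (frequency-dependent transmission is an illustrative choice):
      infection: S + I -> 2I   with propensity beta * S * I / N
      recovery:  I -> R        with propensity gamma * I
    """
    n = s + i + r
    if n <= 0:
        raise ValueError("population must be positive")
    t = 0.0
    times, states = [t], [(s, i, r)]
    while True:
        a_inf = beta * s * i / n
        a_rec = gamma * i
        a0 = a_inf + a_rec
        if a0 == 0.0:  # no reaction can fire (e.g. no infectives left)
            break
        # Step 1: waiting time to the next reaction ~ Exponential(rate = a0).
        t += rng.expovariate(a0)
        if t > t_max:
            break
        # Step 2: choose the reaction with probability proportional to its propensity.
        if rng.random() * a0 < a_inf:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
        times.append(t)
        states.append((s, i, r))
    return times, states


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducibility
    times, states = gillespie_sir(s=99, i=1, r=0, beta=0.3, gamma=0.1,
                                  t_max=200.0, rng=rng)
    print(f"{len(times) - 1} events; final state (S, I, R) = {states[-1]}")
```

Passing an explicit `random.Random` instance keeps runs reproducible and makes it easy to launch many independent replicates, which is typically how such simulations are parallelized.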
We believe that Julia is not just the language of the future, but also the language of now. It’s a modern solution for modern problems, with the ability to adapt to new challenges with ease. That’s why we feel it’s the right choice for the life sciences industry and research.
Julia is on its way to expanding its general biostatistics toolkit to include methods such as Cox proportional hazards regression and Kaplan-Meier survival estimation. Methods common in epidemiology, such as generalized estimating equations, will also be implemented soon.
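For readers less familiar with the Kaplan-Meier method, the estimator is a running product: at each time t_j with d_j observed events among n_j subjects still at risk, the survival estimate is multiplied by (1 - d_j / n_j), and censored subjects simply leave the risk set afterward. The Python sketch below implements that product-limit formula directly. The function name and the convention of reporting only event times are illustrative choices, and it omits the confidence intervals and stratification a full survival package would provide.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit (Kaplan-Meier) survival estimate.

    times[k]  : follow-up time of subject k
    events[k] : True if the event (e.g. death) was observed, False if censored
    Returns (t, S(t)) at each distinct time with at least one observed event,
    where S(t) is the product over event times t_j <= t of (1 - d_j / n_j).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    k = 0
    while k < len(data):
        t = data[k][0]
        deaths = removed = 0
        # Group subjects sharing this time; censored ones still count as at risk at t.
        while k < len(data) and data[k][0] == t:
            deaths += int(data[k][1])
            removed += 1
            k += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


if __name__ == "__main__":
    follow_up = [1, 2, 2, 3, 4, 5]
    observed = [True, True, False, True, False, True]
    for t, s in kaplan_meier(follow_up, observed):
        print(f"t = {t}: S(t) = {s:.3f}")
```

In this toy data, the subject censored at time 2 still counts in the risk set of 5 at that time, so the step there is 4/5 rather than 4/4.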
Julia’s compliance with 21 CFR Part 11 will be documented to show that it’s ready to take on the rigorous needs of clinical trials. Also crucial for clinical trials is the ability to summarize data into production-quality tables, listings, and figures, and save them in common formats such as RTF and PDF. Anyone who has created an adverse events table for a clinical trial has depended on such functionality from a software package; having this functionality available in Julia will be critical for driving adoption of Julia for clinical trials reporting.
The Julia community is doing amazing things. We hope you will join us.