Thoughts on TIOBE's Language Ranking Methodology

16 Sep 2019 | Abhijith Chandraprabhu, Andrew Claster, Stefan Karpinski, Viral Shah

For those who want to track Julia’s growth, the most popular measures of programming language popularity include PYPL, TIOBE, GitHub, RedMonk and IEEE Spectrum. Last year, TechCrunch published a useful discussion of the differences among some of these measures, and Zhang Liye published some tracking data on Julia’s Discourse forum. Here’s a very high-level overview of what the different rankings are based on [1]:

  • PYPL ranking is based on searches for language tutorials using Google.
  • TIOBE measures the number of search engine results for the query term +"X programming" on 25 of the most popular search engines worldwide where X is one of potentially several keywords for each language.
  • GitHub looks at stars and forks of languages developed on GitHub.
  • RedMonk is based on the number of GitHub projects plus the number of questions tagged with a given language on Stack Overflow and is usually presented as a scatterplot of where languages fall on these two dimensions.
  • IEEE Spectrum’s ranking incorporates Google Search, Google Trends, Twitter, GitHub, Stack Overflow, Reddit, Hacker News, CareerBuilder, Dice, and IEEE Xplore Digital Library.

So where does Julia rank? As of September 2019, Julia is:

  • #8 in GitHub stars and forks (among languages developed on GitHub)
  • #22 on PYPL
  • #23 on IEEE Spectrum
  • #33 on RedMonk
  • #36 on TIOBE.

As well as being the lowest, Julia’s ranking on the TIOBE index has been particularly volatile. It jumped 11 places from #50 in July to #39 in August, then rose again to #36 in September 2019. However, we also saw Julia jump from #50 to #37 between February and March of 2018, only to fall back later. We couldn’t help but wonder, “What is going on here?” Since the TIOBE index is the most popular but also the most unpredictable, we decided to do a little digging into its methodology, hoping to better understand the volatility we’ve seen. The specific search query that TIOBE uses for each language is:

+"X programming"

In other words, to determine the popularity of Java, it looks for the verbatim phrase “Java programming” across different search engines and counts the number of “hits” each engine reports for that search phrase. According to TIOBE:

“It is important to note that the TIOBE index is not about the best programming language or the language in which most lines of code have been written.”

TIOBE is transparent about the issues with its current ranking, and actively solicits comments for improvement: “If you have any suggestions how to improve the index don’t hesitate to send an e-mail to [email protected]”. According to TIOBE, the top 5 most requested changes to the TIOBE index include:

  1. Apart from "X programming", also other queries such as "programming with X", "X development" and "X coding" should be tried out.
  2. Add queries for other natural languages (apart from English). The idea is to start with the Chinese search engine Baidu. This has been implemented partially and will be completed in the next few months.
  3. Add a list of all search term requests that have been rejected. This is to minimize the number of recurring mails about Rails, JQuery, JSP, etc.
  4. Start a TIOBE index for databases, software configuration management systems and application frameworks.
  5. Some search engines allow to query pages that have been added last year. The TIOBE index should only track those recently added pages.

We at Julia Computing decided to investigate one more query in the spirit of the first of these requested changes. Since Julia and some other languages are often referred to as “the X language” rather than “X programming”, we wanted to learn how rankings would change for Julia and other languages if we counted “X language” as well as “X programming”. We selected the TIOBE top 40 languages and recalculated the rankings using this combined query (+"X language" OR +"X programming").
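
The recalculation described above can be sketched in Python. This is a minimal illustration rather than the script used for the analysis: the function names are hypothetical, the languages and hit counts are invented placeholders and not real search results, and it ignores any weighting TIOBE may apply across search engines.

```python
def tiobe_query(name: str) -> str:
    """The verbatim-phrase query TIOBE describes: +"X programming"."""
    return f'+"{name} programming"'


def combined_query(name: str) -> str:
    """The broader query tried in this post."""
    return f'+"{name} language" OR +"{name} programming"'


def rank_by_hits(hits: dict[str, int]) -> dict[str, int]:
    """Map each language to its rank; rank 1 has the most hits.

    Ties are broken alphabetically so the result is deterministic.
    """
    ordered = sorted(hits, key=lambda lang: (-hits[lang], lang))
    return {lang: rank for rank, lang in enumerate(ordered, start=1)}


# Placeholder counts for three made-up languages; NOT real search data.
programming_only = {"Alpha": 900, "Beta": 500, "Gamma": 100}
combined = {"Alpha": 950, "Beta": 520, "Gamma": 700}

old_rank = rank_by_hits(programming_only)
new_rank = rank_by_hits(combined)

for lang in programming_only:
    # A positive change means the language moved up in the revised ranking.
    change = old_rank[lang] - new_rank[lang]
    print(f"{combined_query(lang)}: #{old_rank[lang]} -> #{new_rank[lang]} ({change:+d})")
```

In this toy data, a language that is mostly written about as “the Gamma language” overtakes Beta once both phrasings are counted, which is the kind of shift the combined query is meant to surface.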

In the following graph, we have put the TIOBE index rank (based only on the search term “X programming”) on the x-axis, and our revised ranking (including both “X programming” and “X language” as search terms) on the y-axis.

Note that a higher ranking corresponds to a lower number (#1 has the most searches), so we inverted the scale on the graph with the highest rankings (lowest numbers) in the top right corner and the lowest rankings (highest numbers) in the bottom left.

  • The biggest loss is for Groovy, which falls from #11 to #38
  • The biggest gain is for ActionScript which climbs from #38 to #14
  • Dart, F# and Delphi all lose rank
  • Julia, Rust, TypeScript, R and D all gain
    • Julia’s new rank is #28 as per this revised ranking

This result led us to give some thought to the following question: Why is it that some languages (e.g. Julia) are more often referred to as “X language” rather than “X programming”? We can only speculate about the reasons behind this difference. They may be linguistic: the phrase “X programming” is easier to say or sounds more natural for some languages, while “X language” is easier or more concordant for others. For example, “Java programming” is a pretty comfortable phrase, whereas “Java language” is kind of awkward and probably only used when trying to make a distinction between the Java language and one of its implementations. The same is true for C, C++ and many other languages on the list. This supports the overall use of the search term “Java programming” or “C programming” as a proxy for the popularity of those languages.

On the other hand, since Julia is a person’s name in much of the world, we often find ourselves writing “the Julia language” to clarify what we’re talking about. This may very well affect the number of hits search engines find for the verbatim phrase “Julia programming”. These results, much like the TIOBE ranking itself, are a bit too noisy and hard to interpret to draw firm conclusions, but they do suggest that TIOBE should probably consider broadening its search terms, since people write about different languages in different ways.

Another concern with the current TIOBE ranking, alluded to above, is its volatility: language rankings swing wildly from month to month. Indeed, we found that the same search engine frequently shows wildly different counts for the same search depending on the day. We noticed, for example, that Baidu’s search counts seem particularly volatile and are higher by an order of magnitude than those from Google or Bing. Even over the few weeks in which we carried out our exercise, we noticed variations on Google that would move a language a few places in the ranking. Naturally, one might consider various statistical ways to address this volatility.
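
One simple option would be to rank on a rolling median of daily hit counts rather than on a single day’s count, so that a one-day spike has little effect. The sketch below is a hypothetical illustration only: the function name, the window size and the sample counts are all made up, and nothing here is taken from TIOBE’s methodology.

```python
from statistics import median


def rolling_median(counts: list[float], window: int = 7) -> list[float]:
    """Median of each day's count and up to `window - 1` preceding days."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        median(counts[max(0, i - window + 1): i + 1])
        for i in range(len(counts))
    ]


# Made-up daily hit counts with a one-day spike; NOT real search data.
daily_hits = [10, 11, 100, 12, 11]
smoothed = rolling_median(daily_hits, window=3)
print(smoothed)
```

With a three-day window, the spike on the third day never dominates the smoothed series, although a longer window would also make the ranking slower to reflect genuine changes.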

We’re glad that TIOBE is interested in hearing from the community, and we will be sharing these thoughts with them. In the meantime, if you have any further thoughts on this analysis or other suggested changes to ranking methodology, we would love to hear from you.
