Microsoft Excel is a widely used tool for data analysis, but it limits both the size of the data that can be processed and the flexibility of the analysis. In this blog post I’ll highlight some of the advantages of using the ‘R’ software package over Excel.
The key to R is its code: R is a high level programming language with strong functional features. The language itself is quick to pick up and understand, and a large number of online resources and tutorials are available for learning it. Once a script has been developed, it can be run against multiple datasets with little or no intervention from users. Work can easily be reproduced for verification purposes or when errors are detected in the data. R jobs can be scheduled to run automatically through a job management suite, or even directly through cron. Streaming data can also be processed, so analysis can be performed in real time.
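As a minimal sketch of running one script against several datasets, the example below applies the same function to each dataset in a list. The function name `summarise_dataset`, the column name `value` and the two small datasets are made up for illustration; in practice the data would more likely be loaded from files.

```r
# Hypothetical summary function: the same code is applied to every dataset.
summarise_dataset <- function(df) {
  data.frame(
    rows = nrow(df),
    mean_value = mean(df$value, na.rm = TRUE)  # ignore missing values
  )
}

# Stand-in datasets (assumed for illustration only).
datasets <- list(
  north = data.frame(value = c(10, 20, 30)),
  south = data.frame(value = c(5, NA, 15))
)

# Run the function over every dataset and stack the results into one table.
results <- do.call(rbind, lapply(datasets, summarise_dataset))
print(results)
```

Because the analysis lives in a function rather than in cell formulas, rerunning it on corrected or additional data is a matter of calling the script again.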
The R suite also supports high performance computing and can handle datasets far larger than those Excel supports (an Excel worksheet is limited to 1,048,576 rows). There is also support for emerging big data technologies such as Hadoop, Spark and Hive, to name a few. Packages exist that parallelise processing, improving the performance of large scale data processing.
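To give a feel for parallel processing, here is a small sketch using the `parallel` package that ships with R. The choice of two worker processes, the four chunks and the sum-of-squares task are assumptions made for illustration; a real job would substitute its own work and a worker count suited to the machine.

```r
library(parallel)

# Start two worker processes (an assumed number; adjust to the machine).
cl <- makeCluster(2)

# Split the work into four chunks of 250 numbers each.
chunks <- split(1:1000, rep(1:4, each = 250))

# Each worker computes the sum of squares for the chunks it receives.
partial_sums <- parLapply(cl, chunks, function(x) sum(as.numeric(x)^2))

# Always release the workers when finished.
stopCluster(cl)

# Combine the partial results into the final answer.
total <- sum(unlist(partial_sums))
print(total)
```

The pattern is split, process in parallel, then combine; the same shape applies when the chunks are files or groups of records rather than numbers.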
R also supports multiple data formats: not only CSV files but also JSON and XBRL. R can read data from a database and write results back to it. The same piece of R code can be used to extract data from a number of differing sources. This flexibility makes it an ideal tool for blending data from multiple sources.
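The blending of sources can be sketched in a few lines of base R. The two CSV snippets, the place names and the column names below are invented for illustration, and inline text stands in for real files; other formats such as JSON would typically need an add-on package (jsonlite is one option).

```r
# Two illustrative sources, held as inline text instead of files.
spend_csv <- "council,spend
Northtown,120
Southtown,340"

population_csv <- "council,population
Northtown,1000
Southtown,2000"

# read.csv() accepts inline text as well as a file path.
spend <- read.csv(text = spend_csv)
population <- read.csv(text = population_csv)

# Blend the two sources on their shared key column.
blended <- merge(spend, population, by = "council")

# Derive a new measure from the combined data.
blended$spend_per_head <- blended$spend / blended$population
print(blended)
```

Once the data sits in a data frame, the rest of the script does not need to know which kind of source it came from.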
Having acquired the data, the next step is processing. With large datasets, cleansing the data is often an important task. There are over 5,000 packages available to assist with data cleansing and analysis. A large number of data classification and regression analysis tools are available, along with a growing selection of machine learning algorithms. This selection of advanced analytical tools far exceeds what Excel offers.
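A simple cleanse-then-analyse step might look like the sketch below, which uses only base R. The data values and column names are made up, and dropping incomplete rows is just one possible cleansing strategy, chosen here for brevity.

```r
# Illustrative raw data with one missing value in x.
raw <- data.frame(
  x = c(1, 2, 3, 4, NA, 5),
  y = c(2.1, 3.9, 6.2, 8.1, 9.0, 9.8)
)

# Cleansing: keep only the rows with no missing values.
clean <- raw[complete.cases(raw), ]

# Analysis: fit a simple linear regression of y on x.
model <- lm(y ~ x, data = clean)

# Inspect the fitted intercept and slope.
print(coef(model))
```

Calling `summary(model)` gives standard errors and goodness-of-fit measures for the same fit.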
Finally we come to visualisation. The use of graphics is key in any data analysis project, and R offers a wide range of tools for it. These range from simple charts and graphs to sophisticated map overlays and overlaying text onto pictures. Even if users cannot find a suitable toolkit, the ability to extend R with custom packages should not be overlooked.
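As a small example of the simpler end of that range, the sketch below draws a bar chart with R's built-in graphics and saves it as a PDF. The figures, place names and output file name are all invented for illustration.

```r
# Illustrative data: a named numeric vector.
spend <- c(Northtown = 120, Southtown = 340, Easttown = 210)

# Write the chart to a PDF in the session's temporary directory.
out_file <- file.path(tempdir(), "spend_by_council.pdf")
pdf(out_file)

barplot(
  spend,
  main = "Spend by council (illustrative data)",
  ylab = "Spend",
  col = "steelblue"
)

# Close the graphics device so the file is written to disk.
invisible(dev.off())
```

Because the chart is produced by code, it can be regenerated automatically whenever the underlying data changes.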
All of this power and flexibility comes in an open source package that is free to use. With budgets being cut and local councils seeing their analysts made redundant, the use of R should be seriously considered.
Published 25 October 2016