Originally posted in Spanish at https://blog.mbitschool.com/2015/05/data-science-tools-sas-vs-r.html and https://sasybi.blogspot.com.es/2015/05/data-science-tools-sas-vs-r.html
The disciplines that have developed most in recent years are all related to data science, and its techniques and tools are gaining weight in the business environment. If we look at the most commonly used tools, we see that the picture has changed in recent years. The following chart (source: www.datasciencecentral.com) shows the main tools currently used by data scientists.
It shows the tools most in demand today and the long-term trend, both led by R; if we focus on existing job positions, the leader has been SAS. To those listed in the chart we could add Python which, although a general-purpose language, is increasingly used for data analysis. Other tools such as Scala, RapidMiner, Weka or KNIME could be added as well.
In this post, we will try to compare two of the most-used tools: SAS and R. Besides being among the most widely used, they represent different architectures, different orientations and, from a cost point of view, opposite models: paid vs. free. It would probably be much more fun to compare a Ferrari with a Lamborghini than R with SAS; we will also be comparing speed, usability and cost, but in a business-analytics context the focus is on SAS and R.
Keep in mind that, given the hectic pace of the IT industry, if we made the same comparison two years from now the tools would have evolved, and so, certainly, would the criteria for assessing them, since new data types will need to be integrated into the analysis.
We begin with a brief introduction of both tools:
SAS: a data analysis tool with a long tradition. It has led the market for many years and is present in large accounts. It offers several tools for data analysis: SAS/BASE, SAS/Enterprise Guide and SAS/Enterprise Miner. Annual term licenses, at a cost only large accounts can afford.
R: a data analysis tool, less veteran than SAS but with a remarkable presence in the market. Widespread in universities and research centers, it is entering the business landscape in force. Open-source license. Large and active user community; the number of available libraries grows by the day.
The comparison criteria to consider are:
Ease of use / learning curve:
In this respect SAS may be the simpler language for non-programmers, and many business analysts have to use such tools without a prior technical background in programming. SAS data steps are easy to learn for anyone even slightly acquainted with table structures, as their design resembles a DML (Data Manipulation Language). Moreover, SAS PROC SQL makes it possible to write SQL code directly. R may demand a more solid grounding in programming and data structures. If SAS is similar to SQL, R would have its equivalent in C++. In terms of structure, R is an object-oriented (and largely functional) language, while SAS is a structured, sequential language. R often offers many different ways to do the same thing. For aggregations, for example, in SAS we would use either a PROC SQL aggregation or PROC MEANS, whereas in R there are multiple ways to do it (aggregate, summarize, the apply functions, doBy, etc.). This can be confusing to the novice who is learning R. As for training, it is easy to find useful resources on the web. SAS also offers certifications, but this formal training is expensive.
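To make the "one task, several routes" point concrete without committing to either tool, here is a minimal sketch in Python (a neutral choice, not something the original post uses). It computes the same group-wise mean twice: once declaratively with SQL, in the spirit of PROC SQL, and once programmatically by splitting values per key and applying a function, in the spirit of R's aggregate or apply-style functions. The table name, column names and data are invented for illustration.

```python
import sqlite3
from collections import defaultdict
from statistics import mean

# Hypothetical sample data: (region, sales amount).
rows = [("north", 10.0), ("south", 20.0), ("north", 30.0), ("south", 40.0)]


def mean_by_group_sql(rows):
    """Declarative route: let a SQL engine do the GROUP BY."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cur = con.execute(
            "SELECT region, AVG(amount) FROM sales GROUP BY region"
        )
        return dict(cur.fetchall())
    finally:
        con.close()


def mean_by_group_loop(rows):
    """Programmatic route: split the values by key, then apply a function."""
    groups = defaultdict(list)
    for region, amount in rows:
        groups[region].append(amount)
    return {region: mean(values) for region, values in groups.items()}


sql_result = mean_by_group_sql(rows)
loop_result = mean_by_group_loop(rows)
print(sql_result == loop_result)
```

Both routes give the same answer; what differs is how much of the "how" the analyst has to spell out, which is roughly the learning-curve trade-off described above.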
Data handling and management:
The key difference in data management is that R works in memory and SAS works on disk. Working mostly in RAM has its advantages and disadvantages, and it must be taken into account when R faces processes with high-volume datasets. There are libraries that allow R to work on disk as well. With SAS processes, disk footprint has traditionally been a problem, and libraries such as WORK must be managed carefully. Both handle parallel processing well.
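The memory-versus-disk trade-off can be sketched independently of either tool. The Python example below is an illustration added here, with a made-up file and column name. It computes the same mean twice: first by loading every value into RAM, roughly how R treats a data frame by default, and then by streaming the file one row at a time, which is closer to how a SAS data step reads observations from disk.

```python
import csv
import os
import tempfile

# Hypothetical dataset written to a temporary CSV file.
path = os.path.join(tempfile.mkdtemp(), "measurements.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["value"])
    for i in range(1, 1001):
        writer.writerow([i])


def mean_in_memory(path):
    """Load the whole column into a list first; memory grows with file size."""
    with open(path, newline="") as f:
        values = [float(row["value"]) for row in csv.DictReader(f)]
    return sum(values) / len(values)


def mean_streaming(path):
    """Read one row at a time, keeping only a running total and a count."""
    total, count = 0.0, 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            total += float(row["value"])
            count += 1
    return total / count


in_memory = mean_in_memory(path)
streamed = mean_streaming(path)
print(in_memory, streamed)
```

The in-memory version is simpler and allows arbitrary access to the data afterwards, but its memory use scales with the dataset; the streaming version stays constant in memory at the cost of being tied to a single pass over the file.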
Graphical and visualization capabilities:
SAS's graphics capabilities are concentrated in SAS/BASE and SAS/Enterprise Guide and, leaving aside SAS/Visual Analytics, which is licensed separately, they are fairly modest. In this area SAS covers the essentials, at least within its own data mining modules. Beyond that, they are limited and intricate to use. R, however, has very powerful visualization capabilities and numerous packages with advanced functionality.
Because it is open source, R makes new algorithms and techniques available quickly, as individual packages are updated. To date R has about 15000 packages on CRAN (Comprehensive R Archive Network). SAS follows the periodic release policy typical of commercial software, so R has more flexibility to incorporate new functionality, although what SAS ships has been tested in a controlled environment.
Support services and communities:
R has a large and widespread community but no formal support, whereas SAS does provide support. In everyday practice, the broad R user community (forums, Q&A, resources) more than makes up for the lack of support. That said, some people feel more at ease having support on the other end of the line that will either solve the problem or can be "pushed" toward an alternative solution.
SAS's Enterprise Guide module offers an intuitive interface for developing analytic process flows. There are various R-based tools that also allow workflows to be built (Rattle is one example), but they have not really caught on, nor are they optimized. Experience shows that many analytic processes are not supported by the components of these tools; in the case of SAS, for example, most of the code ends up being pure SAS/BASE, with little use made of Enterprise Guide's predefined components.
SAS provides a range of tools in fields adjacent to data science, such as Business Intelligence, dashboarding, data visualization, data warehousing, ETL and data quality, which can be integrated with data science processes (end-to-end), while R is a language focused exclusively on data science.
Integration with other languages and tools:
With regard to integration with other tools and languages, R probably has the lead over SAS. Many open-source community tools integrate with R, and it is rare for commercial software not to offer R integration. Naturally, SAS also has integrations and partnerships with analytical environments, but it perhaps remains one step behind.
Licensing and costs:
There is little to say here: as we know, R is open source and SAS is commercial software with a high cost. It would be interesting to see what would happen to usage trends if SAS lowered its prices. So far it has released one free version for training (SAS OnDemand for Academics). There are also approaches that use both, which is perfectly reasonable since R is free. Some installations use SAS for all data management (extraction from sources, merging, cleaning, application of business rules, consolidation, etc.) and hand the final prepared dataset to R, which applies the statistical model and produces the final presentation. Not a bad approach, especially considering that it saves the SAS/Enterprise Miner license (models), which is the most expensive.
It is equally useful to know some code-level equivalences between the two tools: SAS and R Equivalents
Finally, an interesting study analyzes the preference for SAS or R according to years of experience.
In this brief summary we have tried to cover the aspects we consider most critical. This post is open to comments or considerations that are not covered here and that may also be relevant when selecting a data analysis tool. We also welcome contributions about other tools (Python, Matlab, SPSS, Scala, etc.).
Interested in training services on SAS and R? Ask at: firstname.lastname@example.org