Access SAS in Python Environment with SASPy

Posted by Qing on April 7, 2018

In early 2017, SAS announced the release of SASPy on GitHub. SASPy is a Python package that lets you connect to SAS 9.4 and run your analysis code from Python.

SASPy brings a “Python-ic” sensibility to using SAS: all of your access to SAS data and methods is surfaced through objects and syntax that are familiar to Python users. This is a substantial expansion of functionality and a big step forward for anyone interested in integrating open-source tools with SAS.

In short, SASPy translates Python objects and methods into SAS code, sends the generated code to SAS 9.4 for execution, and returns the results to the Python environment. SAS must therefore be installed locally or be reachable remotely, and Base SAS is required, so you need a SAS license before you can use SASPy.
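One way to watch this translation happen is the teach_me_SAS switch on a SASPy session, which prints the SAS code a method would generate instead of running it. The sketch below is only an illustration: it assumes an already connected session object passed in as sas (setting one up is covered further down), and the helper name show_generated_code and the sashelp.cars sample table are my own choices, not part of the original walkthrough.

```python
def show_generated_code(sas, table="cars", libref="sashelp"):
    """Print the SAS code SASPy generates for describe() without running it.

    `sas` is assumed to be a connected saspy.SASsession; the table and
    libref defaults are illustrative assumptions.
    """
    sas.teach_me_SAS(True)       # switch to "show me the code" mode
    try:
        data = sas.sasdata(table, libref)
        data.describe()          # prints the generated SAS code instead of executing it
    finally:
        sas.teach_me_SAS(False)  # always switch back to normal execution
```

Calling show_generated_code(sas) in a notebook should leave the session in its normal mode afterwards, even if the table lookup fails.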

In the following sections, I outline how to set up SASPy on a Windows client to connect to two kinds of SAS installation using the IOM access method:

  • local SAS in a Windows environment
  • remote SAS in a Linux environment (SAS Grid Manager)

Other access methods are described in the official SASPy documentation.

Prerequisites

  • Python 3 or higher.
  • SAS 9.4 or higher.
  • Java on the client.
  • Four JAR files from the SAS Integration Technologies client (these can be found in your SAS installation or downloaded separately).

Install the Package

  • Install from the command line
    # using pip
    pip install saspy
    # or using conda (if you installed Python through Anaconda)
    conda install -c conda-forge saspy
    
  • Or download the package and install it manually
    1. Download the Python package from the SAS GitHub repository.
    2. Extract the archive, change to the package folder, and run python setup.py install from the command line.
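After installing, it can be worth confirming that the package is visible to the interpreter you actually plan to use. The following is a minimal sketch using only the standard library; installed_version is a hypothetical helper name, and it needs Python 3.8 or later for importlib.metadata.

```python
from importlib import metadata


def installed_version(package="saspy"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("saspy is not installed in this environment")
else:
    print(f"saspy {version} is installed")
```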

Setting up SASPy

Configuration

SASPy supports connecting to SAS on Unix, mainframe, and Windows, and it can connect to either a local or a remote SAS session. Each environment needs different settings, so determine your access type before you configure anything. This article covers two access methods:

  1. Connect to local Windows SAS
  2. Connect to remote Linux SAS (a SAS metadata server)

Both methods use an IOM connection.

Find the configuration file

The configuration file, sascfg.py, lives in the folder where the SASPy package is installed. For an Anaconda installation, that is typically Continuum\anaconda3\Lib\site-packages\saspy. You can also locate it from Python: import saspy, then evaluate saspy.SAScfg, and Python will show you where it found the module.
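If you would rather not hunt through site-packages by hand, the package folder can also be located without importing saspy at all. This is a rough sketch with the standard library only; package_dir and config_candidates are hypothetical helper names, not SASPy functions.

```python
import importlib.util
from pathlib import Path


def package_dir(name="saspy"):
    """Return the directory an importable package lives in, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(next(iter(spec.submodule_search_locations)))


def config_candidates(name="saspy"):
    """List the configuration files that exist next to the package."""
    folder = package_dir(name)
    if folder is None:
        return []
    names = ("sascfg_personal.py", "sascfg.py")
    return [folder / n for n in names if (folder / n).is_file()]


print(package_dir() or "saspy not found")
for path in config_candidates():
    print(path)
```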

Create a sascfg_personal.py

The sascfg.py file ships in the saspy repo as an example configuration file, so it may be updated or replaced whenever the package is upgraded. To avoid losing your settings, copy sascfg.py and rename the copy to sascfg_personal.py. SASPy always tries to import sascfg_personal.py first, and only falls back to sascfg.py if that import fails.
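The copy step can be scripted so that it never overwrites a personal configuration you have already edited. A minimal sketch, assuming you pass in the saspy package folder found earlier; make_personal_config is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def make_personal_config(package_folder):
    """Copy sascfg.py to sascfg_personal.py unless the copy already exists.

    Returns the path of the personal configuration file.
    """
    folder = Path(package_folder)
    source = folder / "sascfg.py"
    target = folder / "sascfg_personal.py"
    if not target.exists():
        shutil.copyfile(source, target)  # never overwrite existing settings
    return target
```

Running it a second time is harmless, because an existing sascfg_personal.py is left untouched.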

Settings in sascfg_personal.py
  1. List only the two configurations used here: SAS_config_names = ['winiomlinux', 'winlocal']
    • winlocal for the local Windows connection
    • winiomlinux for the remote Linux connection
  2. Set the CLASSPATH so that Java can find the SAS Java IOM client JAR files. Five JAR files are required: four (4) come from your existing SAS installation, and one, saspyiom.jar, is provided with the SASPy package. All five must be provided, with fully qualified paths, in a CLASSPATH environment variable. The configuration file lets you build that value in a few lines:
     # Four SAS installation JAR files
     cpW  =  r"C:\Program Files\SASHome\SASDeploymentManager\9.4\products\deploywiz__94420__prt__xx__sp0__1\deploywiz\sas.svc.connection.jar"
     cpW += r";C:\Program Files\SASHome\SASDeploymentManager\9.4\products\deploywiz__94420__prt__xx__sp0__1\deploywiz\log4j.jar"
     cpW += r";C:\Program Files\SASHome\SASDeploymentManager\9.4\products\deploywiz__94420__prt__xx__sp0__1\deploywiz\sas.security.sspi.jar"
     cpW += r";C:\Program Files\SASHome\SASDeploymentManager\9.4\products\deploywiz__94420__prt__xx__sp0__1\deploywiz\sas.core.jar"
     # One comes from the SASPy package (located in your Python package folder)
     cpW += r";C:\Users\qing\AppData\Local\Continuum\anaconda3\Lib\site-packages\saspy\java\saspyiom.jar"
    
  3. Set the parameters for each connection type
    • Access local Windows SAS.
      winlocal = {'java'      : r'C:\Program Files (x86)\Java\jre7\bin\java',
            'encoding'  : 'windows-1252',
            'classpath' : cpW
            }
      

      java - (Required) The path to the Java executable to use. On Windows, you might be able to simply enter java. If that does not work, enter the fully qualified path.

      encoding - The Python encoding value that matches the SAS session encoding of the IOM server you are connecting to. WLATIN1 is the default encoding for running SAS on Windows; it maps to the Python encoding value windows-1252.

      classpath - The five JAR files specified in the previous step.

    • Access remote Linux SAS.
      winiomlinux = {'java'   : r'C:\Program Files (x86)\Java\jre7\bin\java',
            'iomhost'   : 'server.domain.address.com',
            'iomport'   : 8597,
            'encoding'  : 'latin1',
            'classpath' : cpW,
            'authkey'   : 'IOM_Prod_Grid1'
            }
      

      java - same as local Windows

      iomhost - (Required) The resolvable host name or IP address of the IOM Linux server.

      iomport - (Required) The port on which the object spawner listens for workspace server connections. You can look up the iomhost address and iomport number by submitting the following SAS code:

      proc iomoperate
        uri='iom://metadataserver.com:8564;Bridge;USER=my_user,PASS=my_pass';
        list DEFINED FILTER='Workspace';
      quit;
      /* The metadata server address can be found in SAS Enterprise Guide:
         Tools -> Connections -> Profiles */
      

      encoding - Same meaning as for local Windows. The example uses latin1 because LATIN1 is the usual default session encoding for SAS on Unix.

      classpath - Same as local Windows.

      authkey - The keyword that starts a line in the authinfo file containing the user and/or password for this connection.

      The IOM access method can read the required user and password from an authinfo file in the user's home directory instead of prompting for them. On Windows, the file is named _authinfo. Each line has the following format: first the value you specified for authkey, then the 'user' key followed by its value (the user ID), then the 'password' key followed by its value (the user's password). The file must also be protected: on Windows, lock it down so that only the owner can read and write it. For example, the authinfo file in the home directory of user Bob, whose password is BobsPW1, would contain this line:

      IOM_Prod_Grid1 user Bob password BobsPW1
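To check that the authinfo file really contains an entry for the authkey you configured, a small reader like the one below can help. It is a simplified sketch and not SASPy's own parser: it assumes whitespace-separated tokens with no spaces inside values, and find_authinfo_entry is a hypothetical helper name.

```python
from pathlib import Path


def find_authinfo_entry(authkey, path=None):
    """Return {'user': ..., 'password': ...} for `authkey`, or None.

    Each line is expected to look like:
        IOM_Prod_Grid1 user Bob password BobsPW1
    """
    if path is None:
        path = Path.home() / "_authinfo"  # Windows file name described above
    path = Path(path)
    if not path.is_file():
        return None
    for line in path.read_text().splitlines():
        tokens = line.split()
        if len(tokens) < 5 or tokens[0] != authkey:
            continue
        # After the authkey, tokens alternate between a key and its value.
        pairs = dict(zip(tokens[1::2], tokens[2::2]))
        if "user" in pairs and "password" in pairs:
            return {"user": pairs["user"], "password": pairs["password"]}
    return None
```

A result of None for find_authinfo_entry('IOM_Prod_Grid1') suggests that the file is missing, is in a different location, or has no line starting with that key.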

Getting started

Once installation and configuration are complete, you can use the package from Python.
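Before starting a session, a quick pre-flight check of the classpath can save some debugging, since a single mistyped JAR path is enough to stop the Java client from launching. This is a minimal sketch; missing_classpath_entries is a hypothetical helper name, and the ';' separator matches the Windows-style classpath string built in the configuration file.

```python
from pathlib import Path


def missing_classpath_entries(classpath, separator=";"):
    """Return the classpath entries that do not exist as files on disk."""
    entries = [entry for entry in classpath.split(separator) if entry]
    return [entry for entry in entries if not Path(entry).is_file()]
```

For example, missing_classpath_entries(cpW) returns an empty list when all five JAR files are where the configuration says they are.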

Initial imports

import saspy
import pandas as pd
from IPython.display import HTML

Start a SAS session

Here we start a SAS session named sas using the winlocal configuration, by passing cfgname='winlocal' to saspy.SASsession. If you omit the cfgname option, SASPy prompts you to choose one of the configured connection names. After the connection is made and the SAS session has started, a note with the session details is displayed (screenshot: sassession).
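Since the original screenshot is not reproduced here, the following sketch shows one way to open a session and make sure it is closed again. It wraps saspy.SASsession and its endsas method in a context manager; the wrapper name sas_session is my own, and saspy is imported lazily so the helper can be defined even where saspy is not installed.

```python
from contextlib import contextmanager


@contextmanager
def sas_session(cfgname="winlocal"):
    """Open a SAS session for a named configuration and close it afterwards."""
    import saspy  # imported lazily so this helper can be defined without saspy

    session = saspy.SASsession(cfgname=cfgname)
    try:
        yield session
    finally:
        session.endsas()  # terminate the SAS process when the block exits
```

Typical use would be a with block such as "with sas_session('winlocal') as sas:", with the analysis code inside it. In a notebook you may prefer to call saspy.SASsession(cfgname='winlocal') directly and keep the session open across cells.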

Begin data analysis

There are three ways to write your analysis code:

  1. With SASPy's built-in methods.
  2. With pandas, by converting a SAS dataset to a pandas DataFrame.
  3. By submitting SAS code directly.

Analysis with SASPy built-in methods

  • Data inspection (figure SASPy_1)
  • Descriptive statistics (figure SASPy_2)
  • Simple bar chart (figure SASPy_3)
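
The three steps listed above could look roughly like this. It is a sketch under assumptions: sas is a connected session, the sashelp.cars sample table and its Origin column stand in for whatever data the original figures used, and explore is a hypothetical helper name.

```python
def explore(sas, table="cars", libref="sashelp", category="Origin"):
    """Run a few built-in SASPy methods against one SAS table.

    `sas` is assumed to be a connected saspy.SASsession. In a notebook the
    calls below render their output inline.
    """
    data = sas.sasdata(table, libref)  # a handle to the SAS table
    preview = data.head()              # data inspection: first rows
    summary = data.describe()          # descriptive statistics
    data.bar(category)                 # simple bar chart of one variable
    return preview, summary
```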

Analysis with pandas

  • SAS dataset to pandas DataFrame and data inspection (figure SASPy_4)
  • Correlation matrix (figure SASPy_5)
  • Correlation heat map (figure SASPy_8)
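
A rough sketch of the pandas route follows: pull the table into a DataFrame with the to_df method of the SAS data object, then let pandas compute the correlations. As before, sas is assumed to be a connected session, the sashelp.cars table is an illustrative choice, and numeric_correlations is a hypothetical helper name.

```python
def numeric_correlations(sas, table="cars", libref="sashelp"):
    """Pull a SAS table into pandas and return its correlation matrix."""
    df = sas.sasdata(table, libref).to_df()       # SAS dataset -> pandas DataFrame
    numeric = df.select_dtypes(include="number")  # keep numeric columns only
    return numeric.corr()                         # pairwise Pearson correlations
```

The returned matrix is an ordinary DataFrame, so it can be handed to any plotting library to draw the heat map.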

Analysis by submitting SAS code directly

  • A simple scatter chart (figure SASPy_6)
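
Submitting raw SAS code goes through the submit method of the session, which returns a dictionary holding the SAS log ('LOG') and the output listing ('LST'). The sketch below assumes a connected session sas; the PROC SGPLOT step against sashelp.cars is an illustrative stand-in for the original scatter chart, and run_sas_code is a hypothetical helper name.

```python
SCATTER_CODE = """
proc sgplot data=sashelp.cars;
    scatter x=Horsepower y=MSRP;
run;
"""


def run_sas_code(sas, code=SCATTER_CODE):
    """Submit raw SAS code and return (listing, log) from the result."""
    result = sas.submit(code)  # dict with 'LOG' and 'LST' keys
    return result["LST"], result["LOG"]
```

In a notebook, the listing can be rendered with the HTML class imported earlier, for example HTML(listing), and the log is worth printing whenever the output looks wrong.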

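Conclusion

Configuring SASPy is fairly straightforward; the key is choosing the right connection type for your environment. Because it is open-source software, you may run into the occasional small bug along the way. For more background, refer to the documentation for the package (saspy), or open an issue on GitHub.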