RPy2: Combining the Power of R + Python for Data Science
Published: 2019-05-11

About Matthew: Matthew is a Data Scientist in Kansas City. He earned his BS in Physics at the University of Notre Dame, followed by an MS at the University of Kansas. When he is not programming, Matthew enjoys playing board games, especially Race for the Galaxy.

Intro

During my time as a Data Scientist, I have been primarily a Python user. However, I wanted access to some of the power offered by R, specifically the auto.arima function in the R forecast package. This post shows how to get started incorporating R functionality into your Python workflow.

Want to follow along?

Download Rodeo and head to the tutorials section to find the code!

Python and R: What do they each offer?

The two most popular options for data analysis and modeling are R and Python. Each has its own strengths and weaknesses, many of which drive a user or a team to choose one over the other. Here are a few key strengths of each that I find particularly valuable:

Python:

  • Python is a ‘real’ programming language, which gives you more flexibility in solving specific problems
  • It offers many other libraries in addition to those needed for a Data Scientist’s models
  • Python is making strides in the data analysis space with pandas, statsmodels, and scikit-learn

R:

  • R libraries have been battle tested far longer than their Python counterparts, giving a Data Scientist a verified set of tools at their disposal.
  • Many functions have several implementations, so you can find the library that is right for you.
  • Due to the long history of R packages, there is a strong community around data analysis.

How Does RPy2 come into play?

RPy2 creates a framework that can translate Python objects into R objects, pass them into R functions, and convert R output back into Python objects. There are many ways to integrate this into your workflow. You may decide to call R library functions as you would native Python functions, or you may decide to write a single R script to run your data through. Below I’ll go over a few basics of how to get used to the flow of RPy2, using some sample data from R. We will load in some data, model it in R, and plot the results back in Python.
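To make that round trip concrete, here is a minimal sketch, assuming rpy2 and a working R installation are available. The helper name r_sum is hypothetical and not part of the original post, and the imports sit inside the function only so the snippet can be loaded on a machine without R.

```python
def r_sum(values):
    """Sum a list of Python numbers with R's sum() and return a Python float."""
    # Imported lazily: rpy2 needs a working R installation when it is imported.
    import rpy2.robjects as robjects

    r_vector = robjects.FloatVector([float(v) for v in values])  # Python -> R
    r_result = robjects.r['sum'](r_vector)                       # call the R function
    return float(r_result[0])                                    # R -> Python


if __name__ == "__main__":
    # Needs R to run; prints the sum computed on the R side.
    print(r_sum([1.5, 2.5, 3.0]))
```

The same three steps (build an R object, call an R function, read the result back) underlie everything else in this post.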

Note: While I will be working with time series data, I will not be passing time series objects back and forth between Python and R. Doing so gets tricky and can cause many headaches, so I find it easier to handle all the time series indexing on the Python side.

Importing R objects and libraries to Python

We can import both R functions and libraries as Python objects. We can load a default R function like ts() through the robjects.r() function, and assign it to a Python variable. Similarly, we can use importr to load an R library into a namespace.

    import rpy2.robjects as robjects
    from rpy2.robjects.packages import importr
    ts=robjects.r('ts')
    forecast=importr('forecast')

Now that we have these objects loaded, we can call them much as we would ordinary Python functions. Let’s load some data to model and create a forecast from it.

Note: Pay attention to the import of pandas2ri and the call to its activate() function. These are key to converting certain data types from Python to R.
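The load-and-forecast step can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's code: the function name forecast_with_r and its parameters are hypothetical, it builds the R vector with FloatVector instead of relying on pandas2ri conversion, and it reads the result by element name through rx2() rather than by position. It assumes rpy2, R, and the R forecast package are installed.

```python
def forecast_with_r(values, frequency=4, horizon=16, level=95.0):
    """Fit auto.arima in R and return (mean, lower, upper) as Python lists."""
    # Imported lazily so the module can be loaded on machines without R.
    import rpy2.robjects as robjects
    from rpy2.robjects.packages import importr

    ts = robjects.r('ts')
    forecast = importr('forecast')  # requires the R 'forecast' package

    # Build the R time series from plain floats.
    r_values = robjects.FloatVector([float(v) for v in values])
    rdata = ts(r_values, frequency=frequency)

    # importr exposes R's auto.arima under the Python-friendly name auto_arima.
    fit = forecast.auto_arima(rdata)
    out = forecast.forecast(fit, h=horizon, level=robjects.FloatVector([level]))

    # rx2 selects a list element by name, like out$mean in R.
    return (list(out.rx2('mean')),
            list(out.rx2('lower')),
            list(out.rx2('upper')))
```

Called as forecast_with_r(traindf.Price.values), this would return three plain Python lists that can be wrapped in pandas objects on the Python side.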

We’ve taken our data, transformed it into an robject, and called R functions on our objects. However, we are left with one really messy issue: the output of our function. Our R forecast object isn’t translated into a neat Python object for you to parse. You can still pull the pieces you need out of the forecast by position, as shown below:

    import pandas as pd
    # date_range includes the start date, so request one extra period and drop the first entry
    index=pd.date_range(start=traindf.index.max(),periods=len(forecast_output[3])+1,freq='QS')[1:]
    # positions 3, 4 and 5 hold the point forecast and the lower and upper interval bounds
    # (note: this rebinds the name forecast, which earlier referred to the imported R package)
    forecast=pd.Series(forecast_output[3],index=index)
    lowerpi=pd.Series(forecast_output[4],index=index)
    upperpi=pd.Series(forecast_output[5],index=index)
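The indexing trick in that snippet is easier to see in isolation. The sketch below is a small illustration with a hypothetical helper name, future_quarter_starts, and it assumes the last observed timestamp falls exactly on a quarter start.

```python
import pandas as pd


def future_quarter_starts(last_observed, n_periods):
    """Return the n_periods quarter starts that follow last_observed."""
    # date_range includes `start` itself when it falls on a quarter start,
    # so request one extra period and drop the first entry.
    full = pd.date_range(start=last_observed, periods=n_periods + 1, freq='QS')
    return full[1:]


index = future_quarter_starts(pd.Timestamp('1970-01-01'), 4)
print(list(index.strftime('%Y-%m-%d')))
# → ['1970-04-01', '1970-07-01', '1970-10-01', '1971-01-01']
```

Building the index this way keeps all date handling in pandas, so nothing time-related has to cross the Python/R boundary.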

We can also draw our newly created plot and save it as a PNG file on our machine.
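One way to draw and save such a plot is sketched below. It is only an illustration: the function name save_forecast_plot, the output file name, and the numbers passed in are made-up placeholders, and it plots against integer positions instead of dates to stay self-contained.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt


def save_forecast_plot(history, forecast, lower, upper, path):
    """Plot history followed by a forecast band and write the figure to path."""
    fig, ax = plt.subplots(figsize=(16, 7))
    hist_x = list(range(len(history)))
    fc_x = list(range(len(history), len(history) + len(forecast)))
    ax.plot(hist_x, history, color='blue', alpha=0.5)             # observed values
    ax.plot(fc_x, forecast, color='red')                          # point forecast
    ax.fill_between(fc_x, lower, upper, alpha=0.2, color='red')   # interval band
    fig.savefig(path)  # the format is inferred from the .png extension
    plt.close(fig)
    return path


# Placeholder values, used only to exercise the function.
out_path = os.path.join(tempfile.gettempdir(), "forecast_sketch.png")
save_forecast_plot([10, 12, 11, 13], [14, 15], [12, 12], [16, 18], out_path)
```

With the real data, the pandas Series built earlier (and their date index) can be passed to ax.plot and ax.fill_between in the same way.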

Blocking R code into a Function

Instead of bringing everything into Python, we can manipulate our objects purely in R and return only the desired output to Python. Just as we used robjects.r() to create a Python object that maps to the ts function, we can define our own R function and assign it to a Python object. We still have to create an R data object to pass into the function, but the rest is done on the R side.

    rstring="""
        function(testdata){
            library(forecast)
            fitted_model<-auto.arima(testdata)
            forecasted_data<-forecast(fitted_model,h=16,level=c(95))
            outdf<-data.frame(forecasted_data$mean,forecasted_data$lower,forecasted_data$upper)
            colnames(outdf)<-c('forecast','lower_95_pi','upper_95_pi')
            outdf
        }
    """
    rfunc=robjects.r(rstring)
    rdata=ts(traindf.Price.values,frequency=4)
    r_df=rfunc(rdata)

We now have our resulting forecast in an R data frame! The pandas2ri import mentioned earlier gives our data transformation an easy finish: with its ri2py() function (renamed rpy2py() in rpy2 3.0 and later), we can convert the R data frame to a pandas DataFrame object.

                               forecast  lower_95_pi  upper_95_pi
1970-04-01 00:00:00.000001  1201.895643  1132.969980  1270.821305
1970-07-01 00:00:00.000001   651.095643   581.999757   720.191529
1970-10-01 00:00:00.000001   385.395643   316.129953   454.661333
1971-01-01 00:00:00.000001   820.795643   751.360563   890.230723
1971-04-01 00:00:00.000001  1239.891286  1138.581601  1341.200971
1971-07-01 00:00:00.000001   689.091286   587.318843   790.863728
1971-10-01 00:00:00.000001   423.391286   321.158180   525.624391
1972-01-01 00:00:00.000001   858.791286   756.099584   961.482988
1972-04-01 00:00:00.000001  1277.886929  1148.555295  1407.218562
1972-07-01 00:00:00.000001   727.086929   596.940390   857.233467
1972-10-01 00:00:00.000001   461.386929   330.430556   592.343302
1973-01-01 00:00:00.000001   896.786929   765.025699  1028.548158
1973-04-01 00:00:00.000001  1315.882572  1159.908983  1471.856160
1973-07-01 00:00:00.000001   765.082572   607.908555   922.256588
1973-10-01 00:00:00.000001   499.382572   341.017226   657.747917
1974-01-01 00:00:00.000001   934.782572   775.234792  1094.330351

Great! Our output is organized into a neat DataFrame, ready to be consumed by all our other Python tools. Now let’s plot it to make sure we get the same results…

    import matplotlib.pyplot as plt

    fig=plt.figure(figsize=(16, 7))
    ax=plt.axes()
    # historical prices
    ax.plot(traindf.Price.index,traindf.Price.values,color='blue',alpha=0.5)
    # point forecast and its 95% prediction interval
    ax.plot(forecast_df.index,forecast_df.forecast.values,color='red')
    ax.fill_between(forecast_df.index,
                    forecast_df['lower_95_pi'],
                    forecast_df['upper_95_pi'],
                    alpha=0.2,color='red')

Looks familiar!
