python处理出租车轨迹数据_1-出租车数据的基础处理,由gps生成OD(pandas).ipynb...-程序员宅基地

技术标签: python处理出租车轨迹数据  

{

"cells": [

{

"cell_type": "markdown",

"metadata": {},

"source": [

"在这个教程中,你将会学到如何使用python的pandas包对出租车GPS数据进行数据清洗,识别出行OD\n",

"\n",

"

提供的基础数据是:

数据:
\n",

" 1.出租车原始GPS数据(在data-sample文件夹下,原始数据集的抽样500辆车的数据)

"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"[pandas包的简介](https://baike.baidu.com/item/pandas/17209606?fr=aladdin)"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"# 读取数据"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"首先,读取出租车数据。"

]

},

{

"cell_type": "code",

"execution_count": 2,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:51:53.552930Z",

"start_time": "2020-01-18T04:51:52.397018Z"

}

},

"outputs": [],

"source": [

"import pandas as pd\n",

"#读取数据\n",

"data = pd.read_csv(r'data-sample/TaxiData-Sample',header = None)\n",

"#给数据命名列\n",

"data.columns = ['VehicleNum', 'Stime', 'Lng', 'Lat', 'OpenStatus', 'Speed']"

]

},

{

"cell_type": "code",

"execution_count": 3,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:51:58.299239Z",

"start_time": "2020-01-18T04:51:58.271312Z"

}

},

"outputs": [

{

"data": {

"text/html": [

"

\n",

"

" .dataframe tbody tr th:only-of-type {\n",

" vertical-align: middle;\n",

" }\n",

"\n",

" .dataframe tbody tr th {\n",

" vertical-align: top;\n",

" }\n",

"\n",

" .dataframe thead th {\n",

" text-align: right;\n",

" }\n",

"\n",

"

" \n",

"

\n",

"

\n",

"

VehicleNum\n",

"

Stime\n",

"

Lng\n",

"

Lat\n",

"

OpenStatus\n",

"

Speed\n",

"

\n",

"

\n",

"

\n",

"

\n",

"

0\n",

"

22271\n",

"

22:54:04\n",

"

114.167000\n",

"

22.718399\n",

"

0\n",

"

0\n",

"

\n",

"

\n",

"

1\n",

"

22271\n",

"

18:26:26\n",

"

114.190598\n",

"

22.647800\n",

"

0\n",

"

4\n",

"

\n",

"

\n",

"

2\n",

"

22271\n",

"

18:35:18\n",

"

114.201401\n",

"

22.649700\n",

"

0\n",

"

0\n",

"

\n",

"

\n",

"

3\n",

"

22271\n",

"

16:02:46\n",

"

114.233498\n",

"

22.725901\n",

"

0\n",

"

24\n",

"

\n",

"

\n",

"

4\n",

"

22271\n",

"

21:41:17\n",

"

114.233597\n",

"

22.720900\n",

"

0\n",

"

19\n",

"

\n",

"

\n",

"

\n",

"

"

],

"text/plain": [

" VehicleNum Stime Lng Lat OpenStatus Speed\n",

"0 22271 22:54:04 114.167000 22.718399 0 0\n",

"1 22271 18:26:26 114.190598 22.647800 0 4\n",

"2 22271 18:35:18 114.201401 22.649700 0 0\n",

"3 22271 16:02:46 114.233498 22.725901 0 24\n",

"4 22271 21:41:17 114.233597 22.720900 0 19"

]

},

"execution_count": 3,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"#显示数据的前5行\n",

"data.head(5)"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"数据的格式:\n",

"\n",

">VehicleNum —— 车牌 \n",

"Stime —— 时间 \n",

"Lng —— 经度 \n",

"Lat —— 纬度 \n",

"OpenStatus —— 是否有乘客(0没乘客,1有乘客) \n",

"Speed —— 速度 "

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"# 基础的数据操作"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"## DataFrame和Series"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"DataFrame和Series\n",

"\n",

" > 当我们读一个数据的时候,我们读进来的就是DataFrame格式的数据表,而一个DataFrame中的每一列,则为一个Series \n",

" 也就是说,DataFrame由多个Series组成\n"

]

},

{

"cell_type": "code",

"execution_count": 87,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:25.713432Z",

"start_time": "2020-01-18T04:52:25.708450Z"

}

},

"outputs": [

{

"data": {

"text/plain": [

"pandas.core.frame.DataFrame"

]

},

"execution_count": 87,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"type(data)"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"如果我们想取DataFrame的某一列,想得到的是Series,那么直接用以下代码\n",

"\n",

" > data[列名]"

]

},

{

"cell_type": "code",

"execution_count": 88,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:32.097575Z",

"start_time": "2020-01-18T04:52:32.090592Z"

}

},

"outputs": [

{

"data": {

"text/plain": [

"pandas.core.series.Series"

]

},

"execution_count": 88,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"type(data['Lng'])"

]

},

{

"cell_type": "markdown",

"metadata": {

"ExecuteTime": {

"end_time": "2019-09-06T09:22:43.642625Z",

"start_time": "2019-09-06T09:22:43.638487Z"

}

},

"source": [

"如果我们想取DataFrame的某一列或者某几列,想得到的是DataFrame,那么直接用以下代码\n",

"\n",

"> data2[[列名,列名]]"

]

},

{

"cell_type": "code",

"execution_count": 89,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:33.013124Z",

"start_time": "2020-01-18T04:52:32.990186Z"

}

},

"outputs": [

{

"data": {

"text/plain": [

"pandas.core.frame.DataFrame"

]

},

"execution_count": 89,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"type(data[['Lng']])"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"## 数据的筛选"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"数据的筛选:\n",

"\n",

" 在筛选数据的时候,我们一般用data[条件]的格式\n",

" 其中的条件,是对data每一行数据的true和false布尔变量的Series"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

" 例如,我们想得到车牌照为22271的所有数据\n",

" 首先我们要获得一个布尔变量的Series,这个Series对应的是data的每一行,如果车牌照为\"粤B4H2K8\"则为true,不是则为false\n",

" 这样子的Series很容易获得,只需要\n",

" data['VehicleNum']==22271"

]

},

{

"cell_type": "code",

"execution_count": 90,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:44.078571Z",

"start_time": "2020-01-18T04:52:44.049646Z"

}

},

"outputs": [

{

"data": {

"text/plain": [

"0 True\n",

"1 True\n",

"2 True\n",

"3 True\n",

"4 True\n",

"Name: VehicleNum, dtype: bool"

]

},

"execution_count": 90,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"(data['VehicleNum']==22271).head(5)"

]

},

{

"cell_type": "code",

"execution_count": 92,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:51.723416Z",

"start_time": "2020-01-18T04:52:51.688510Z"

}

},

"outputs": [

{

"data": {

"text/html": [

"

\n",

"

" .dataframe tbody tr th:only-of-type {\n",

" vertical-align: middle;\n",

" }\n",

"\n",

" .dataframe tbody tr th {\n",

" vertical-align: top;\n",

" }\n",

"\n",

" .dataframe thead th {\n",

" text-align: right;\n",

" }\n",

"\n",

"

" \n",

"

\n",

"

\n",

"

VehicleNum\n",

"

Stime\n",

"

Lng\n",

"

Lat\n",

"

OpenStatus\n",

"

Speed\n",

"

\n",

"

\n",

"

\n",

"

\n",

"

0\n",

"

22271\n",

"

22:54:04\n",

"

114.167000\n",

"

22.718399\n",

"

0\n",

"

0\n",

"

\n",

"

\n",

"

1\n",

"

22271\n",

"

18:26:26\n",

"

114.190598\n",

"

22.647800\n",

"

0\n",

"

4\n",

"

\n",

"

\n",

"

2\n",

"

22271\n",

"

18:35:18\n",

"

114.201401\n",

"

22.649700\n",

"

0\n",

"

0\n",

"

\n",

"

\n",

"

3\n",

"

22271\n",

"

16:02:46\n",

"

114.233498\n",

"

22.725901\n",

"

0\n",

"

24\n",

"

\n",

"

\n",

"

4\n",

"

22271\n",

"

21:41:17\n",

"

114.233597\n",

"

22.720900\n",

"

0\n",

"

19\n",

"

\n",

"

\n",

"

\n",

"

"

],

"text/plain": [

" VehicleNum Stime Lng Lat OpenStatus Speed\n",

"0 22271 22:54:04 114.167000 22.718399 0 0\n",

"1 22271 18:26:26 114.190598 22.647800 0 4\n",

"2 22271 18:35:18 114.201401 22.649700 0 0\n",

"3 22271 16:02:46 114.233498 22.725901 0 24\n",

"4 22271 21:41:17 114.233597 22.720900 0 19"

]

},

"execution_count": 92,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"#得到车牌照为22271的所有数据\n",

"data[data['VehicleNum']==22271].head(5)"

]

},

{

"cell_type": "markdown",

"metadata": {},

"sou

版权声明:本文为博主原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。
本文链接:https://blog.csdn.net/weixin_39765840/article/details/110111915

智能推荐

基于Kepler.gl 和 Google Earth Engine 卫星数据创建拉伸多边形地图-程序员宅基地

文章浏览阅读965次,点赞18次,收藏21次。现在我们有了 2021 年和 2023 年的 NDVI 数据帧,我们需要从 2021 年的值中减去 2023 年的值以捕获 NDVI 的差异。该数据集包括像素级别的植被值,我们将编写一个自定义函数来根据红色和绿色波段的表面反射率计算 NDVI。在我的上一篇文章中,我演示了如何将单个多边形分割/镶嵌为一组大小均匀的六边形。现在我们有了植被损失数据,让我们使用 Kepler.gl 可视化每个六边形的植被损失。将地图保存为 HTML 文件,在浏览器中打开 HTML 以获得更好的视图。现在我们将调用该函数并使用、

Echarts绘制任意数据的正态分布图_echarts正态分布图-程序员宅基地

文章浏览阅读3.3k次,点赞6次,收藏5次。正态分布,又称高斯分布或钟形曲线,是统计学中最为重要和常用的分布之一。_echarts正态分布图

Android中发送短信等普通方法_android bundle.get("pdus");-程序员宅基地

文章浏览阅读217次。首先要在Mainfest.xml中加入所需要的权限:[html] view plain copyprint?uses-permission android:name="android.permission.SEND_SMS"/> uses-permission android:name="android.permission.READ_SMS"/> _android bundle.get("pdus");

2021-07-26 WSL2 的安装和联网_wsl2 联网-程序员宅基地

文章浏览阅读2.6k次。0、说明最近在学习 Data Assimilation Research Testbed (DART) 相关内容,其软件是在 Unix/Linux 操作系统下编译和运行的 ,由于我的电脑是 Windows 10 的,DART 推荐可以使用 Windows Subsystem For Linux (WSL) 来创建一个 Windows 下的 Linux 子系统。以下的内容主要介绍如何安装 WSL2,以及 WSL2 的联网。1、如何在 Windows 10 下安装WSL具体的安装流程可以在 microso_wsl2 联网

DATABASE_LINK 数据库连接_添加 database link重复的数据库链接命-程序员宅基地

文章浏览阅读1k次。DB_LINK 介绍在本机数据库orcl上创建了一个prod_link的publicdblink(使用远程主机的scott用户连接),则用sqlplus连接到本机数据库,执行select * from scott.emp@prod_link即可以将远程数据库上的scott用户下的emp表中的数据获取到。也可以在本地建一个同义词来指向scott.emp@prod_link,这样取值就方便多了..._添加 database link重复的数据库链接命

云-腾讯云-实时音视频:实时音视频(TRTC)-程序员宅基地

文章浏览阅读3.1k次。ylbtech-云-腾讯云-实时音视频:实时音视频(TRTC)支持跨终端、全平台之间互通,从零开始快速搭建实时音视频通信平台1.返回顶部 1、腾讯实时音视频(Tencent Real-Time Communication,TRTC)拥有QQ十几年来在音视频技术上的积累,致力于帮助企业快速搭建低成本、高品质音视频通讯能力的完整解决方案。..._腾讯实时音视频 分享链接

随便推点

用c语言写个日历表_农历库c语言-程序员宅基地

文章浏览阅读534次,点赞10次,收藏8次。编写一个完整的日历表需要处理许多细节,包括公历和农历之间的转换、节气、闰年等。运行程序后,会输出指定年份的日历表。注意,这个程序只是一个简单的示例,还有很多可以改进和扩展的地方,例如添加节气、节日等。_农历库c语言

FL Studio21.1.1.3750中文破解百度网盘下载地址含Crack补丁_fl studio 21 注册机-程序员宅基地

文章浏览阅读1w次,点赞28次,收藏27次。FL Studio21.1.1.3750中文破解版是最优秀、最繁荣的数字音频工作站 (DAW) 之一,日新月异。它是一款录音机和编辑器,可让您不惜一切代价制作精美的音乐作品并保存精彩的活动画廊。为方便用户,FL Studio 21提供三种不同的版本——Fruity 版、Producer 版和签名版。所有这些版本都是独一无二的,同样具有竞争力。用户可以根据自己的需要选择其中任何一种。FL Studio21.1.1.3750中文版可以说是一站式综合音乐制作单位,可以让您录制、作曲、混音和编辑音乐。_fl studio 21 注册机

冯.诺伊曼体系结构的计算机工作原理是,冯 诺依曼型计算机的工作原理是什么...-程序员宅基地

文章浏览阅读1.3k次。冯诺依曼计算机工作原理冯 诺依曼计算机工作原理的核心是 和 程序控制世界上不同型号的计算机,就其工作原理而言,一般都是认为冯 诺依曼提出了什么原理冯 诺依曼原理中,计算机硬件系统由那五大部分组成的 急急急急急急急急急急急急急急急急急急急急急急冯诺依曼结构计算机工作原理的核心冯诺依曼结构和现代计算机结构模型 转载重学计算机组成原理 一 冯 诺依曼体系结构从冯.诺依曼的存储程序工作原理及计算机的组成来..._简述冯诺依曼计算机结构及工作原理

四国军棋引擎开发(2)简单的事件驱动模型下棋-程序员宅基地

文章浏览阅读559次。这次在随机乱下的基础上加上了一些简单的处理,如进营、炸棋、吃子等功能,在和敌方棋子产生碰撞之后会获取敌方棋子大小的一些信息,目前采用的是事件驱动模型,当下完一步棋界面返回结果后会判断是否触发了相关事件,有事件发生则处理相关事件,没有事件发生则仍然是随机下棋。1.事件驱动模型首先定义一个各种事件的枚举变量,目前的事件有工兵吃子,摸暗棋,进营,明确吃子,炸棋。定义如下:enum MoveE..._军棋引擎

STL与泛型编程-第一周笔记-Geekband-程序员宅基地

文章浏览阅读85次。1, 模板观念与函数模板简单模板: template< typename T > T Function( T a, T b) {… }类模板: template struct Object{……….}; 函数模板 template< class T> inline T Function( T a, T b){……} 不可以使用不同型别的..._geekband 讲义

vb.net正则表达式html,VB.Net常用的正则表达式(实例)-程序员宅基地

文章浏览阅读158次。"^\d+$"  //非负整数(正整数 + 0)"^[0-9]*[1-9][0-9]*$"  //正整数"^((-\d+)|(0+))$"  //非正整数(负整数 + 0)"^-[0-9]*[1-9][0-9]*$"  //负整数"^-?\d+$"    //整数"^\d+(\.\d+)?$"  //非负浮点数(正浮点数 + 0)"^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0..._vb.net 正则表达式 取html中的herf