Python Interface to COVID-19 Data Hub
Note from the editors: Towards Data Science is a Medium publication primarily based on the study of data science and machine learning. We are not health professionals or epidemiologists, and the opinions of this article should not be interpreted as professional advice. To learn more about the coronavirus pandemic, you can click here.
The goal of COVID-19 Data Hub is to provide the research community with a unified dataset by collecting worldwide fine-grained case data, merged with exogenous variables helpful for a better understanding of COVID-19. Please agree to the Terms of Use and cite the following reference when using it:
Guidotti, E., Ardia, D. (2020). COVID-19 Data Hub. Journal of Open Source Software, 5(51):2376. https://doi.org/10.21105/joss.02376
Setup and usage
Install from pip with
pip install covid19dh
Import the main function covid19() and call it:
from covid19dh import covid19
x, src = covid19()
Return values
The function covid19() returns 2 pandas dataframes:
- the data and
- references to the data sources.
Parametrization
Country
List of country names (case-insensitive) or ISO codes (alpha-2, alpha-3 or numeric). The list of ISO codes can be found here.
Fetching data from a particular country:
x, src = covid19("USA") # United States
Specify multiple countries at the same time:
x, src = covid19(["ESP","PT","andorra",250])
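To make the mix of identifiers in the call above concrete, here is a minimal sketch of how names and ISO codes could be normalized to a single alpha-3 code. The lookup table and the to_alpha3 helper are hypothetical and only cover four countries; this is an illustration of the idea, not how covid19dh resolves countries internally.

```python
# Hypothetical lookup table: (alpha-2, alpha-3, numeric, name) for a few countries.
ISO_TABLE = [
    ("ES", "ESP", 724, "Spain"),
    ("PT", "PRT", 620, "Portugal"),
    ("AD", "AND", 20, "Andorra"),
    ("FR", "FRA", 250, "France"),
]

def to_alpha3(identifier):
    """Map a country name or ISO code (alpha-2, alpha-3, numeric) to alpha-3."""
    if isinstance(identifier, int):
        key = identifier
    else:
        text = str(identifier).strip()
        # Numeric codes given as strings are compared as integers;
        # everything else is compared case-insensitively.
        key = int(text) if text.isdigit() else text.upper()
    for alpha2, alpha3, numeric, name in ISO_TABLE:
        if key in (alpha2, alpha3, numeric, name.upper()):
            return alpha3
    raise ValueError(f"unknown country identifier: {identifier!r}")

codes = [to_alpha3(c) for c in ["ESP", "PT", "andorra", 250]]
print(codes)  # ['ESP', 'PRT', 'AND', 'FRA']
```

Raising on unknown identifiers is a design choice of this sketch; the library may handle unmatched entries differently.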
If country is omitted, the whole dataset is returned:
x, src = covid19()
Raw data
Logical. Skip data cleaning? Default True. If raw=False, the raw data are cleaned by filling missing dates with NaN values. This ensures that all locations share the same grid of dates and no single day is skipped. Then, NaN values are replaced with the previous non-NaN value or 0.
x, src = covid19(raw = False)
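The cleaning step described above can be illustrated without the library. The following is a minimal pure-Python sketch of the idea: put the observations on a common daily grid, then carry the last known value forward, falling back to 0. The clean_series helper and the sample numbers are made up for illustration and do not reproduce the package's actual implementation.

```python
from datetime import date, timedelta

def clean_series(observations, start, end):
    """Fill a {date: value} series onto a daily grid from start to end.

    Missing days are first seen as None (standing in for NaN), then each gap
    is replaced by the previous known value, or 0 if there is none yet.
    """
    cleaned = {}
    previous = 0  # fallback when no earlier value exists
    day = start
    while day <= end:
        value = observations.get(day)  # None marks a missing date
        if value is None:
            value = previous
        cleaned[day] = value
        previous = value
        day += timedelta(days=1)
    return cleaned

# Made-up cumulative counts with gaps on 1, 3 and 4 April.
raw = {date(2020, 4, 2): 5, date(2020, 4, 5): 9}
cleaned = clean_series(raw, date(2020, 4, 1), date(2020, 4, 5))
print(list(cleaned.values()))  # [0, 5, 5, 5, 9]
```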
Date filter
Dates can be specified as datetime.datetime, datetime.date or as a str in the format YYYY-mm-dd.
from datetime import datetime
x, src = covid19("SWE", start = datetime(2020,4,1), end = "2020-05-01")
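Since three input types are accepted, it can help to see how they might be brought to a common form. The sketch below uses a hypothetical to_date helper built on the standard library; it illustrates the idea and is not taken from covid19dh.

```python
from datetime import date, datetime

def to_date(value):
    """Convert a datetime, date or 'YYYY-mm-dd' string to a datetime.date."""
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, str):
        return datetime.strptime(value, "%Y-%m-%d").date()
    raise TypeError(f"unsupported date type: {type(value).__name__}")

start = to_date(datetime(2020, 4, 1))
end = to_date("2020-05-01")
print(start, end, (end - start).days)  # 2020-04-01 2020-05-01 30
```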
Level
Integer. Granularity level of the data:
1. Country level
2. State, region or canton level
3. City or municipality level
from datetime import date
x, src = covid19("USA", level = 2, start = date(2020,5,1))
Cache
Logical. Memory caching? Significantly improves performance on successive calls. By default, using the cached data is enabled.
Caching can be disabled (e.g. for long running programs) by:
x, src = covid19("FRA", cache = False)
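To show what memory caching buys on successive calls, here is a small sketch using a dictionary keyed by the request arguments. fetch_remote is a hypothetical stand-in for the download and get_data is an illustrative wrapper; neither name comes from covid19dh, whose internal caching may work differently.

```python
download_count = 0
_cache = {}

def fetch_remote(country, level):
    """Hypothetical stand-in for a slow download; counts how often it runs."""
    global download_count
    download_count += 1
    return {"country": country, "level": level}

def get_data(country, level=1, cache=True):
    """Return data for (country, level), reusing an in-memory copy if allowed."""
    key = (country, level)
    if cache and key in _cache:
        return _cache[key]
    data = fetch_remote(country, level)
    if cache:
        _cache[key] = data
    return data

get_data("FRA")               # first call downloads
get_data("FRA")               # second call is served from memory
get_data("FRA", cache=False)  # bypasses the cache and downloads again
print(download_count)  # 2
```

In a sketch like this the cached copy is never refreshed, which is one plausible reason to switch caching off in a long-running program that needs current data.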
Vintage
Logical. Retrieve the snapshot of the dataset that was generated at the end date instead of using the latest version. Default False.
To fetch, for example, US data that were accessible on 22nd April 2020, type:
x, src = covid19("USA", end = "2020-04-22", vintage = True)
The vintage data are collected at the end of the day, but published with approximately 48 hour delay, once the day is completed in all the timezones.
Hence, if vintage = True but end is not set, a warning is raised and None is returned.
x, src = covid19("USA", vintage = True) # too early to get today's vintage
UserWarning: vintage data not available yet
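One practical consequence, assuming the call really returns a bare None as described: unpacking it directly into x, src would fail with a TypeError. A defensive pattern is sketched below. fake_covid19 is a hypothetical stand-in that only mimics the documented behaviour (warn and return None) so that the example runs offline; it is not part of covid19dh.

```python
import warnings

def fake_covid19(country, end=None, vintage=False):
    """Hypothetical stand-in mimicking the documented vintage behaviour."""
    if vintage and end is None:
        warnings.warn("vintage data not available yet", UserWarning)
        return None
    return ({"country": country, "end": end}, {"source": "placeholder"})

# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = fake_covid19("USA", vintage=True)

messages = [str(w.message) for w in caught]
if result is None:
    # Nothing to unpack: report the warning and fall back gracefully.
    print("no data:", messages)
else:
    x, src = result
```

Checking the result before unpacking keeps the failure mode explicit instead of surfacing as an unrelated unpacking error.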
Data Sources
The data sources are returned as second value.
from covid19dh import covid19
x, src = covid19("USA")
print(src)
Conclusions
COVID-19 Data Hub harmonizes the large amount of heterogeneous data that has become available around the pandemic. It represents a first effort towards open public data standards and sharing in light of COVID-19. Publications using COVID-19 Data Hub are available here.
Acknowledgments
COVID-19 Data Hub is supported by the Institute for Data Valorization IVADO, Canada. The covid19dh package was developed by Martin Beneš.
[1] Guidotti, E., Ardia, D. (2020). COVID-19 Data Hub. Journal of Open Source Software, 5(51):2376.
Translated from: https://towardsdatascience.com/python-interface-to-covid-19-data-hub-c2b3f69497af