python有两个包可以连接vertica,分别为pyodbc和vertica_python,这里给出使用vertica_python的教程;
- 1.安装python,这里选择下载anaconda,使用spyder;
- 2.安装vertica_python,打开anaconda命令行,pip install vertica_python;若在没有网络的情况下使用,可以先在能够联网的电脑上下载vertica_python,然后把文件导出安装即可;
pip download -d /vt/ vertica_python
pip install /vt/six-1.15.-py2.py3-none-any.whl
pip install /vt/python_dateutil-2.8.1-py2.py3-none-any.whl
pip install /vt/vertica_python-1.0.1-py2.py3-none-any.whl
3.连接vertica数据库
conn=vertica_python.connect('database'=保密,'user'=保密,'password'=保密,'host'=保密,'port'=保密)
cur=conn.cursor()
cur.execute(sql语句)
conn.commit()
cur.close()
4.将文件导入到vertica,有两种方法,一是逐条插入;二是copy导入;
这里令表名为csaprp.test;字段为statis_date,num;文件所在目录为C:\Users\测试.csv
import vertica_python
import pandas as pd
df=pd.read_csv(r'C:\Users\测试.csv')
table_nm='csaprp.test'
row_n=len(df) #行数
x=df.columns.values #列名
col_nm=','.join(x) #列名转为字符串
sql=''
conn=vertica_python.connect('database'=保密,'user'=保密,'password'=保密,'host'=保密,'port'=保密)
cur=conn.cursor()
for i in range(row_n):
y=df[i:i+1].values[0].astype(str)
y_value="','".join(y)
y_final="'"+y_value+"'"
y_final=y_final.replace("'",'''"''') #单引号变双引号
sql='''insert into '%(table_nm)s' ('%(col_nm)s') values ('%(y_final)s')'''%vars()
sql=sql.replace("'",'').replace("\n","").replace('''"''',"'").replace("'nan'",'0')#将单引号为空,\n为空,单引号变单引号,空值为0
cur.execute(sql)
conn.commit()
cur.close()
利用copy直接导入数据(效率高)
import vertica_python
import pandas as pd
conn=vertica_python.connect('database'=保密,'user'=保密,'password'=保密,'host'=保密,'port'=保密)
cur=conn.cursor()
with open(r"c:\users\测试.csv" "rb") as fs:
cur.copy("copy csaprp.test(statis_date,num) from stdin delimiter ',' enclosed by '\"'", fs, buffer_size=65536)
conn.commit()
cur.close()
来源 https://zhuanlan.zhihu.com/p/366146998