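HBase read and write performance depends closely on a few parameters: caching and batch affect reads, while the write buffer affects writes. Beyond the parameters, how the program handles the data also strongly affects insert performance. For example, is putting a List really faster than putting rows one at a time? Are most of the claims found online correct? In this post I use a program that reads one HBase table and writes the rows unchanged into another table, and I compare how different parameter combinations affect insert throughput.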
The settings being compared are:
HTable htable1 = new HTable(hbaseconf, "test2");
Scan scan1 = new Scan();
scan1.setCaching(300);
ResultScanner scaner = htable.getScanner(scan1);
List<Put> list = new ArrayList<Put>();
htable1.setWriteBufferSize(6*1024*1024);
htable1.setAutoFlush(false);
put.setWriteToWAL(false);
Test 1:
Method | Parameters | Time | Rows inserted | Comparison |
put | setWriteToWAL(false) setCaching(300) setWriteBufferSize(6*1024*1024) setAutoFlush(false) | 1 minute | 105000 | With every parameter at its best setting, the two methods perform about the same |
List | List<500> setWriteToWAL(false) setCaching(300) setWriteBufferSize(6*1024*1024) setAutoFlush(false) | 1 minute | 105000 |
Test 2:
Method | Parameters | Time | Rows inserted | Comparison |
put | setWriteToWAL(true) setCaching(300) setWriteBufferSize(6*1024*1024) setAutoFlush(false) | 1 minute | 95000 | Skipping the WAL seems to make little difference: List is unchanged and put moves only between 95000 and 105000 |
List | List<500> setWriteToWAL(true) setCaching(300) setWriteBufferSize(6*1024*1024) setAutoFlush(false) | 1 minute | 105000 |
Test 3:
Method | Parameters | Time | Rows inserted | Comparison |
put | setWriteToWAL(true) setCaching(300) setWriteBufferSize(1*1024*1024) setAutoFlush(false) | 1 minute | 95000 | The write buffer size affects List noticeably but seems to have almost no effect on put |
List | List<500> setWriteToWAL(true) setCaching(300) setWriteBufferSize(1*1024*1024) setAutoFlush(false) | 1 minute | 75000 |
Test 4:
Method | Parameters | Time | Rows inserted | Comparison |
put | setWriteToWAL(true) setCaching(300) setWriteBufferSize(6*1024*1024) setAutoFlush(true) | 1 minute | 20000 | Auto flush hurts put very badly; the effect on List is much smaller |
List | List<500> setWriteToWAL(true) setCaching(300) setWriteBufferSize(6*1024*1024) setAutoFlush(true) | 1 minute | 65000 |
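To make the four tables easier to compare, the row counts can be turned into rows per second and into a slowdown factor relative to test 1. The sketch below only does that arithmetic on the numbers reported above. It assumes each run lasted exactly 60 seconds (the tables only say "1 minute"), and the class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Arithmetic on the reported row counts; no HBase access involved. */
class ThroughputSummary {

    /** Average rows written per second over the run. */
    static double rowsPerSecond(long rows, long seconds) {
        return (double) rows / seconds;
    }

    /** How many times slower a run is than the baseline run. */
    static double slowdown(long baselineRows, long rows) {
        return (double) baselineRows / rows;
    }

    public static void main(String[] args) {
        long seconds = 60;        // assumption: "1 minute" means exactly 60 s
        long baseline = 105_000;  // test 1, all parameters at their best setting

        Map<String, Long> runs = new LinkedHashMap<>();
        runs.put("test 2 put  (WAL on)", 95_000L);
        runs.put("test 3 List (1 MB write buffer)", 75_000L);
        runs.put("test 4 put  (auto flush on)", 20_000L);
        runs.put("test 4 List (auto flush on)", 65_000L);

        for (Map.Entry<String, Long> run : runs.entrySet()) {
            System.out.printf("%-34s %7.1f rows/s, %.2fx slower than test 1%n",
                    run.getKey(),
                    rowsPerSecond(run.getValue(), seconds),
                    slowdown(baseline, run.getValue()));
        }
    }
}
```

Since the timings are coarse, these ratios are only a rough guide, but they show that auto flush on single puts is by far the largest drop.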
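These tests show that setAutoFlush has the largest effect on performance, whether the rows are written through a List or with individual put calls. The write buffer size also has a noticeable effect on List. List and single put themselves seem to differ very little, if at all.
So when inserting into HBase with put, the two parameters to focus on are the write buffer size and setAutoFlush.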
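One way to see why auto flush matters so much is to count how often the client has to flush to the server. The sketch below is a toy model in plain Java, not HBase client code: it assumes that a flush happens whenever the buffered bytes exceed the write buffer size, and additionally after every put call when auto flush is on. The class name, the method names, and the 1 KB row size are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a client-side write buffer; it only counts flushes. */
class WriteBufferModel {
    private final long bufferLimitBytes;
    private final boolean autoFlush;
    private long bufferedBytes = 0;
    private int flushCount = 0;

    WriteBufferModel(long bufferLimitBytes, boolean autoFlush) {
        this.bufferLimitBytes = bufferLimitBytes;
        this.autoFlush = autoFlush;
    }

    /** Models a single-row put: buffer the row, then flush if auto flush is on. */
    void put(long rowBytes) {
        buffer(rowBytes);
        if (autoFlush) flush();
    }

    /** Models a list put: buffer every row, then flush once if auto flush is on. */
    void put(List<Long> rowSizes) {
        for (long rowBytes : rowSizes) buffer(rowBytes);
        if (autoFlush) flush();
    }

    private void buffer(long rowBytes) {
        bufferedBytes += rowBytes;
        if (bufferedBytes > bufferLimitBytes) flush(); // buffer is full
    }

    /** Sends whatever is buffered; an empty buffer costs nothing. */
    void flush() {
        if (bufferedBytes > 0) {
            flushCount++;
            bufferedBytes = 0;
        }
    }

    /** Writes `rows` rows, singly (batchSize <= 1) or in lists, and returns the flush count. */
    static int simulate(int rows, int batchSize, long rowBytes, long bufferLimit, boolean autoFlush) {
        WriteBufferModel model = new WriteBufferModel(bufferLimit, autoFlush);
        List<Long> batch = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            if (batchSize <= 1) {
                model.put(rowBytes);
            } else {
                batch.add(rowBytes);
                if (batch.size() == batchSize) {
                    model.put(batch);
                    batch.clear();
                }
            }
        }
        if (!batch.isEmpty()) model.put(batch);
        model.flush(); // closing the table flushes what is left
        return model.flushCount;
    }

    public static void main(String[] args) {
        long sixMb = 6L * 1024 * 1024;
        long rowBytes = 1024; // assumed row size
        int rows = 100_000;
        System.out.println("auto flush on,  single put: " + simulate(rows, 1, rowBytes, sixMb, true));
        System.out.println("auto flush on,  List<500>:  " + simulate(rows, 500, rowBytes, sixMb, true));
        System.out.println("auto flush off, single put: " + simulate(rows, 1, rowBytes, sixMb, false));
        System.out.println("auto flush off, List<500>:  " + simulate(rows, 500, rowBytes, sixMb, false));
    }
}
```

In this model, single put and List<500> flush equally often once auto flush is off, which fits the observation that the two perform about the same in test 1. With auto flush on, a single put flushes once per row while a List flushes once per 500 rows, which fits the gap seen in test 4.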
The complete test program:
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class filterTest {
    private static final byte[] CF = Bytes.toBytes("cf");

    public static void main(String[] args) throws IOException {
        SimpleDateFormat dateformat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Configuration hbaseconf = HBaseConfiguration.create();
        hbaseconf.set("hbase.zookeeper.quorum",
                "datanode01.isesol.com,datanode02.isesol.com,datanode03.isesol.com,datanode04.isesol.com,cmserver.isesol.com");
        hbaseconf.set("hbase.zookeeper.property.clientPort", "2181");
        hbaseconf.set("user", "hdfs");
        HTable htable = new HTable(hbaseconf, "t_ui_all"); // source table
        HTable htable1 = new HTable(hbaseconf, "test2");   // target table
        Scan scan1 = new Scan();
        scan1.setCaching(300); // rows fetched per scanner round trip
        // Optional filters, disabled for the full-table copy:
        // Filter rowfilter = new RowFilter(CompareOp.EQUAL,
        //         new BinaryPrefixComparator(Bytes.toBytes("A131420033-1007-9223370539574828268")));
        // Filter filter = new SingleColumnValueFilter(Bytes.toBytes("cf"),
        //         Bytes.toBytes("fault_level2_name"), CompareOp.EQUAL, Bytes.toBytes("电气问题"));
        // scan1.setFilter(rowfilter);
        // scan1.setRowPrefixFilter(Bytes.toBytes("A131420033-1007-9223370539574828268"));
        ResultScanner scaner = htable.getScanner(scan1);
        List<Put> list = new ArrayList<Put>();
        int j = 0;
        System.out.println("start to scan original table and put this result into List "
                + dateformat.format(System.currentTimeMillis()));
        htable1.setWriteBufferSize(6 * 1024 * 1024);
        // htable1.setAutoFlush(false); // toggled between test runs
        // The scanner is Iterable, so a for-each loop visits every row exactly once.
        for (Result result : scaner) {
            Put put = new Put(result.getRow());
            // put.setWriteToWAL(false); // toggled between test runs
            // Copy each column of family "cf" into the new row.
            for (Cell cell : result.listCells()) {
                byte[] qualifier = CellUtil.cloneQualifier(cell);
                put.add(CF, qualifier, result.getValue(CF, qualifier));
            }
            // Single-put variant: replace the list handling below with htable1.put(put);
            list.add(put);
            j++;
            if (j % 500 == 0) {
                System.out.println("total number is " + j + " start to put these data into hbase " + list.size());
                htable1.put(list);
                list.clear();
            }
        }
        htable1.put(list); // write the rows left over after the last full batch
        scaner.close();
        htable1.close();
        htable.close();
        System.out.println("Job finish " + dateformat.format(System.currentTimeMillis()));
    }
}
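The "collect 500 rows, write them, clear the list" pattern in the program can be separated from HBase itself. The following is a minimal sketch in plain Java, assuming a hypothetical BatchWriter helper that hands each full batch to a callback; in the test program the callback would be the list put on the target table. None of these names come from the HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Collects items and hands them to a sink in fixed-size batches. */
class BatchWriter<T> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();

    BatchWriter(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Adds one item and writes the batch out as soon as it is full. */
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) flush();
    }

    /** Writes out whatever is pending; does nothing when the batch is empty. */
    void flush() {
        if (pending.isEmpty()) return;
        sink.accept(new ArrayList<>(pending)); // copy, so clearing does not affect the sink
        pending.clear();
    }

    /** Closing writes the final partial batch, like the last put after the loop. */
    @Override
    public void close() {
        flush();
    }

    /** Demo helper: returns the sizes of the batches produced for `rows` items. */
    static List<Integer> batchSizesFor(int rows, int batchSize) {
        List<Integer> sizes = new ArrayList<>();
        try (BatchWriter<Integer> writer = new BatchWriter<>(batchSize, batch -> sizes.add(batch.size()))) {
            for (int i = 0; i < rows; i++) {
                writer.add(i);
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(batchSizesFor(1234, 500)); // prints [500, 500, 234]
    }
}
```

Using try-with-resources means the leftover rows are written even when the row count is not a multiple of the batch size, which is the job of the final put after the loop in the test program.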
From the ITPUB Blog, link: http://blog.itpub.net/17036462/viewspace-2141443/. Please credit the source when reposting; otherwise legal action will be pursued.