HBase中文参考指南 3.0

Apache HBase 外部 APIs

贡献者：xixici

本章将介绍通过非 Java 语言和自定义协议访问 Apache HBase。有关使用本机 HBase API 的信息，请参阅User API Reference HBase APIs

97. REST

表述性状态转移 Representational State Transfer (REST)于 2000 年在 HTTP 规范的主要作者之一 Roy Fielding 的博士论文中引入。

REST 本身超出了本文档的范围，但通常，REST 允许通过与 URL 本身绑定的 API 进行客户端 - 服务器交互。本节讨论如何配置和运行 HBase 附带的 REST 服务器，该服务器将 HBase 表，行，单元和元数据公开为 URL 指定的资源。还有一系列关于如何使用 Jesse Anderson 的 Apache HBase REST 接口的博客 How-to: Use the Apache HBase REST Interface

97.1. 启动和停止 REST 服务

包含的 REST 服务器可以作为守护程序运行，该守护程序启动嵌入式 Jetty servlet 容器并将 servlet 部署到其中。使用以下命令之一在前台或后台启动 REST 服务器。端口是可选的，默认为 8080。

# Foreground
$ bin/hbase rest start -p <port>

# Background, logging to a file in $HBASE_LOGS_DIR
$ bin/hbase-daemon.sh start rest -p <port>

要停止 REST 服务器，请在前台运行时使用 Ctrl-C，如果在后台运行则使用以下命令。

$ bin/hbase-daemon.sh stop rest

97.2. 配置 REST 服务器和客户端

有关为 SSL 配置 REST 服务器和客户端以及为 REST 服务器配置doAs 模拟的信息，请参阅配置 Thrift 网关以代表客户端进行身份验证以及 Securing Apache HBase 。

97.3. 使用 REST

以下示例使用占位符服务器 http://example.com:8000，并且可以使用curl或wget命令运行以下命令。您可以通过不为纯文本添加头信息来请求纯文本(默认)，XML 或 JSON 输出，或者为 XML 添加头信息“Accept：text / xml”，为 JSON 添加“Accept：application / json”或为协议缓冲区添加“Accept: application/x-protobuf”。

除非指定，否则使用GET请求进行查询，PUT或POST请求进行创建或修改，DELETE用于删除。

端口	HTTP 名词	描述	示例
`/version/cluster`	`GET`	集群上运行 HBase 版本	`curl -vi -X GET \ -H "Accept: text/xml" \ "http://example.com:8000/version/cluster"`
`/status/cluster`	`GET`	集群状态	`curl -vi -X GET \ -H "Accept: text/xml" \ "http://example.com:8000/status/cluster"`
`/`	`GET`	列出所有非系统表格	`curl -vi -X GET \ -H "Accept: text/xml" \ "http://example.com:8000/"`

端口	HTTP 名词	描述	示例
`/namespaces`	`GET`	列出所有命名空间	`curl -vi -X GET \ -H "Accept: text/xml" \ "http://example.com:8000/namespaces/"`
`/namespaces/_namespace_`	`GET`	描述指定命名空间	`curl -vi -X GET \ -H "Accept: text/xml" \ "http://example.com:8000/namespaces/special_ns"`
`/namespaces/_namespace_`	`POST`	创建命名空间	`curl -vi -X POST \ -H "Accept: text/xml" \ "example.com:8000/namespaces/special_ns"`
`/namespaces/_namespace_/tables`	`GET`	列出指定命名空间内所有表	`curl -vi -X GET \ -H "Accept: text/xml" \ "http://example.com:8000/namespaces/special_ns/tables"`
`/namespaces/_namespace_`	`PUT`	更改现有命名空间,未使用	`curl -vi -X PUT \ -H "Accept: text/xml" \ "http://example.com:8000/namespaces/special_ns`
`/namespaces/_namespace_`	`DELETE`	删除命名空间.必须为空	`curl -vi -X DELETE \ -H "Accept: text/xml" \ "example.com:8000/namespaces/special_ns"`

端口	HTTP 名词	描述	示例
`/_table_/schema`	`GET`	描述指定表结构	`curl -vi -X GET \ -H "Accept: text/xml" \ "http://example.com:8000/users/schema"`
`/_table_/schema`	`POST`	更新表结构	`curl -vi -X POST \ -H "Accept: text/xml" \ -H "Content-Type: text/xml" \ -d '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="users"><ColumnSchema name="cf" KEEP_DELETED_CELLS="true" /></TableSchema>' \ "http://example.com:8000/users/schema"`
`/_table_/schema`	`PUT`	新建表, 更新已有表结构	`curl -vi -X PUT \ -H "Accept: text/xml" \ -H "Content-Type: text/xml" \ -d '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="users"><ColumnSchema name="cf" /></TableSchema>' \ "http://example.com:8000/users/schema"`
`/_table_/schema`	`DELETE`	删除表. 必须使用`/_table_/schema` 不仅仅 `/_table_/`.	`curl -vi -X DELETE \ -H "Accept: text/xml" \ "http://example.com:8000/users/schema"`
`/_table_/regions`	`GET`	列出表区域	`curl -vi -X GET \ -H "Accept: text/xml" \ "http://example.com:8000/users/regions`

端口	HTTP 名词	描述	示例
`/_table_/_row_`	`GET`	获取单行的所有列。值为 Base-64 编码。这需要“Accept”请求标头，其类型可以包含多个列 (like xml, json or protobuf).	`curl -vi -X GET \ -H "Accept: text/xml" \ "http://example.com:8000/users/row1"`
`/_table_/_row_/_column:qualifier_/_timestamp_`	`GET`	获取单个列的值。值为 Base-64 编码。	`curl -vi -X GET \ -H "Accept: text/xml" \ "http://example.com:8000/users/row1/cf:a/1458586888395"`
`/_table_/_row_/_column:qualifier_`	`GET`	获取单个列的值。值为 Base-64 编码。	`curl -vi -X GET \ -H "Accept: text/xml" \ "http://example.com:8000/users/row1/cf:a" 或者 curl -vi -X GET \ -H "Accept: text/xml" \ "http://example.com:8000/users/row1/cf:a/"`
`/_table_/_row_/_column:qualifier_/?v=_number_of_versions_`	`GET`	获取指定单元格的指定版本数. 获取单个列的值。值为 Base-64 编码。	`curl -vi -X GET \ -H "Accept: text/xml" \ "http://example.com:8000/users/row1/cf:a?v=2"`

端口	HTTP 名词	描述	示例
`/_table_/scanner/`	`PUT`	获取 Scanner 对象。所有其他扫描操作都需要。将批处理参数调整为扫描应在批处理中返回的行数。请参阅下一个向扫描仪添加过滤器的示例。扫描程序端点 URL 作为 HTTP 响应中的 `Location`返回。此表中的其他示例假定扫描仪端点为`http://example.com:8000/users/scanner/145869072824375522207`。	`curl -vi -X PUT \ -H "Accept: text/xml" \ -H "Content-Type: text/xml" \ -d '<Scanner batch="1"/>' \ "http://example.com:8000/users/scanner/"`
`/_table_/scanner/`	`PUT`	要向扫描程序对象提供过滤器或以任何其他方式配置扫描程序，您可以创建文本文件并将过滤器添加到文件中。例如，要仅返回以 u123 开头的行并使用批量大小为 100 的行，过滤器文件将如下所示：[source,xml] ---- { "type": "PrefixFilter", "value": "u123" } ----将文件传递给`curl`的`-d`参数请求。	`curl -vi -X PUT \ -H "Accept: text/xml" \ -H "Content-Type:text/xml" \ -d @filter.txt \ "http://example.com:8000/users/scanner/"`
`/_table_/scanner/_scanner-id_`	`GET`	从扫描仪获取下一批。单元格值是字节编码的。如果扫描仪已耗尽，则返回 HTTP 状态`204`。	`curl -vi -X GET \ -H "Accept: text/xml" \ "http://example.com:8000/users/scanner/145869072824375522207"`
`_table_/scanner/_scanner-id_`	`DELETE`	删除所有扫描并释放资源	`curl -vi -X DELETE \ -H "Accept: text/xml" \ "http://example.com:8000/users/scanner/145869072824375522207"`

端口	HTTP 名词	描述	示例
`/_table_/_row_key_`	`PUT`	在表中写一行。行，列限定符和值必须均为 Base-64 编码。要编码字符串，请使用`base64`命令行实用程序。要解码字符串，请使用`base64 -d`。有效负载位于`--data`参数中，`/ users / fakerow`值是占位符。通过将多行添加到`<CellSet>`元素来插入多行。您还可以将要插入的数据保存到文件中，并使用`-d @ filename.txt`等语法将其传递给`-d`参数。	`curl -vi -X PUT \ -H "Accept: text/xml" \ -H "Content-Type: text/xml" \ -d '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><CellSet><Row key="cm93NQo="><Cell column="Y2Y6ZQo=">dmFsdWU1Cg==</Cell></Row></CellSet>' \ "http://example.com:8000/users/fakerow" 或者 curl -vi -X PUT \ -H "Accept: text/json" \ -H "Content-Type: text/json" \ -d '{"Row":[{"key":"cm93NQo=", "Cell": [{"column":"Y2Y6ZQo=", "$":"dmFsdWU1Cg=="}]}]}'' \ "example.com:8000/users/fakerow"`

97.4. REST XML 结构

<schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:tns="RESTSchema">

  <element name="Version" type="tns:Version"></element>

  <complexType name="Version">
    <attribute name="REST" type="string"></attribute>
    <attribute name="JVM" type="string"></attribute>
    <attribute name="OS" type="string"></attribute>
    <attribute name="Server" type="string"></attribute>
    <attribute name="Jersey" type="string"></attribute>
  </complexType>

  <element name="TableList" type="tns:TableList"></element>

  <complexType name="TableList">
    <sequence>
      <element name="table" type="tns:Table" maxOccurs="unbounded" minOccurs="1"></element>
    </sequence>
  </complexType>

  <complexType name="Table">
    <sequence>
      <element name="name" type="string"></element>
    </sequence>
  </complexType>

  <element name="TableInfo" type="tns:TableInfo"></element>

  <complexType name="TableInfo">
    <sequence>
      <element name="region" type="tns:TableRegion" maxOccurs="unbounded" minOccurs="1"></element>
    </sequence>
    <attribute name="name" type="string"></attribute>
  </complexType>

  <complexType name="TableRegion">
    <attribute name="name" type="string"></attribute>
    <attribute name="id" type="int"></attribute>
    <attribute name="startKey" type="base64Binary"></attribute>
    <attribute name="endKey" type="base64Binary"></attribute>
    <attribute name="location" type="string"></attribute>
  </complexType>

  <element name="TableSchema" type="tns:TableSchema"></element>

  <complexType name="TableSchema">
    <sequence>
      <element name="column" type="tns:ColumnSchema" maxOccurs="unbounded" minOccurs="1"></element>
    </sequence>
    <attribute name="name" type="string"></attribute>
    <anyAttribute></anyAttribute>
  </complexType>

  <complexType name="ColumnSchema">
    <attribute name="name" type="string"></attribute>
    <anyAttribute></anyAttribute>
  </complexType>

  <element name="CellSet" type="tns:CellSet"></element>

  <complexType name="CellSet">
    <sequence>
      <element name="row" type="tns:Row" maxOccurs="unbounded" minOccurs="1"></element>
    </sequence>
  </complexType>

  <element name="Row" type="tns:Row"></element>

  <complexType name="Row">
    <sequence>
      <element name="key" type="base64Binary"></element>
      <element name="cell" type="tns:Cell" maxOccurs="unbounded" minOccurs="1"></element>
    </sequence>
  </complexType>

  <element name="Cell" type="tns:Cell"></element>

  <complexType name="Cell">
    <sequence>
      <element name="value" maxOccurs="1" minOccurs="1">
        <simpleType><restriction base="base64Binary">
        </simpleType>
      </element>
    </sequence>
    <attribute name="column" type="base64Binary" />
    <attribute name="timestamp" type="int" />
  </complexType>

  <element name="Scanner" type="tns:Scanner"></element>

  <complexType name="Scanner">
    <sequence>
      <element name="column" type="base64Binary" minOccurs="0" maxOccurs="unbounded"></element>
    </sequence>
    <sequence>
      <element name="filter" type="string" minOccurs="0" maxOccurs="1"></element>
    </sequence>
    <attribute name="startRow" type="base64Binary"></attribute>
    <attribute name="endRow" type="base64Binary"></attribute>
    <attribute name="batch" type="int"></attribute>
    <attribute name="startTime" type="int"></attribute>
    <attribute name="endTime" type="int"></attribute>
  </complexType>

  <element name="StorageClusterVersion" type="tns:StorageClusterVersion" />

  <complexType name="StorageClusterVersion">
    <attribute name="version" type="string"></attribute>
  </complexType>

  <element name="StorageClusterStatus"
    type="tns:StorageClusterStatus">
  </element>

  <complexType name="StorageClusterStatus">
    <sequence>
      <element name="liveNode" type="tns:Node"
        maxOccurs="unbounded" minOccurs="0">
      </element>
      <element name="deadNode" type="string" maxOccurs="unbounded"
        minOccurs="0">
      </element>
    </sequence>
    <attribute name="regions" type="int"></attribute>
    <attribute name="requests" type="int"></attribute>
    <attribute name="averageLoad" type="float"></attribute>
  </complexType>

  <complexType name="Node">
    <sequence>
      <element name="region" type="tns:Region"
   maxOccurs="unbounded" minOccurs="0">
      </element>
    </sequence>
    <attribute name="name" type="string"></attribute>
    <attribute name="startCode" type="int"></attribute>
    <attribute name="requests" type="int"></attribute>
    <attribute name="heapSizeMB" type="int"></attribute>
    <attribute name="maxHeapSizeMB" type="int"></attribute>
  </complexType>

  <complexType name="Region">
    <attribute name="name" type="base64Binary"></attribute>
    <attribute name="stores" type="int"></attribute>
    <attribute name="storefiles" type="int"></attribute>
    <attribute name="storefileSizeMB" type="int"></attribute>
    <attribute name="memstoreSizeMB" type="int"></attribute>
    <attribute name="storefileIndexSizeMB" type="int"></attribute>
  </complexType>

</schema>

97.5. REST Protobufs 结构

message Version {
  optional string restVersion = 1;
  optional string jvmVersion = 2;
  optional string osVersion = 3;
  optional string serverVersion = 4;
  optional string jerseyVersion = 5;
}

message StorageClusterStatus {
  message Region {
    required bytes name = 1;
    optional int32 stores = 2;
    optional int32 storefiles = 3;
    optional int32 storefileSizeMB = 4;
    optional int32 memstoreSizeMB = 5;
    optional int32 storefileIndexSizeMB = 6;
  }
  message Node {
    required string name = 1;    // name:port
    optional int64 startCode = 2;
    optional int32 requests = 3;
    optional int32 heapSizeMB = 4;
    optional int32 maxHeapSizeMB = 5;
    repeated Region regions = 6;
  }
  // node status
  repeated Node liveNodes = 1;
  repeated string deadNodes = 2;
  // summary statistics
  optional int32 regions = 3;
  optional int32 requests = 4;
  optional double averageLoad = 5;
}

message TableList {
  repeated string name = 1;
}

message TableInfo {
  required string name = 1;
  message Region {
    required string name = 1;
    optional bytes startKey = 2;
    optional bytes endKey = 3;
    optional int64 id = 4;
    optional string location = 5;
  }
  repeated Region regions = 2;
}

message TableSchema {
  optional string name = 1;
  message Attribute {
    required string name = 1;
    required string value = 2;
  }
  repeated Attribute attrs = 2;
  repeated ColumnSchema columns = 3;
  // optional helpful encodings of commonly used attributes
  optional bool inMemory = 4;
  optional bool readOnly = 5;
}

message ColumnSchema {
  optional string name = 1;
  message Attribute {
    required string name = 1;
    required string value = 2;
  }
  repeated Attribute attrs = 2;
  // optional helpful encodings of commonly used attributes
  optional int32 ttl = 3;
  optional int32 maxVersions = 4;
  optional string compression = 5;
}

message Cell {
  optional bytes row = 1;       // unused if Cell is in a CellSet
  optional bytes column = 2;
  optional int64 timestamp = 3;
  optional bytes data = 4;
}

message CellSet {
  message Row {
    required bytes key = 1;
    repeated Cell values = 2;
  }
  repeated Row rows = 1;
}

message Scanner {
  optional bytes startRow = 1;
  optional bytes endRow = 2;
  repeated bytes columns = 3;
  optional int32 batch = 4;
  optional int64 startTime = 5;
  optional int64 endTime = 6;
}

98. Thrift

相关文档已转移到 Thrift API and Filter Language.

99. C/C++ Apache HBase 客户端

FB’s Chip Turner 编写了纯正 C/C++ 客户端. Check it out.

C++ client 实现. 详见: HBASE-14850.

100. 将 Java Data Objects (JDO) 和 HBase 一起使用

Java 数据对象 Java Data Objects (JDO) 是一种访问数据库中持久数据的标准方法，使用普通的旧 Java 对象（POJO）来表示持久数据。

依赖

此代码示例具有以下依赖项：

HBase 0.90.x 或更高版本
commons-beanutils.jar (https://commons.apache.org/)
commons-pool-1.5.5.jar (https://commons.apache.org/)
transactional-tableindexed for HBase 0.90 (https://github.com/hbase-trx/hbase-transactional-tableindexed)

下载 hbase-jdo

下载源码 http://code.google.com/p/hbase-jdo/.

Example 26. JDO 示例

此示例使用 JDO 创建一个表和一个索引，在表中插入行，获取行，获取列值，执行查询以及执行一些其他 HBase 操作。

package com.apache.hadoop.hbase.client.jdo.examples;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Hashtable;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.tableindexed.IndexedTable;

import com.apache.hadoop.hbase.client.jdo.AbstractHBaseDBO;
import com.apache.hadoop.hbase.client.jdo.HBaseBigFile;
import com.apache.hadoop.hbase.client.jdo.HBaseDBOImpl;
import com.apache.hadoop.hbase.client.jdo.query.DeleteQuery;
import com.apache.hadoop.hbase.client.jdo.query.HBaseOrder;
import com.apache.hadoop.hbase.client.jdo.query.HBaseParam;
import com.apache.hadoop.hbase.client.jdo.query.InsertQuery;
import com.apache.hadoop.hbase.client.jdo.query.QSearch;
import com.apache.hadoop.hbase.client.jdo.query.SelectQuery;
import com.apache.hadoop.hbase.client.jdo.query.UpdateQuery;

/**
 * Hbase JDO Example.
 *
 * dependency library.
 * - commons-beanutils.jar
 * - commons-pool-1.5.5.jar
 * - hbase0.90.0-transactionl.jar
 *
 * you can expand Delete,Select,Update,Insert Query classes.
 *
 */
public class HBaseExample {
  public static void main(String[] args) throws Exception {
    AbstractHBaseDBO dbo = new HBaseDBOImpl();

    //*drop if table is already exist.*
    if(dbo.isTableExist("user")){
     dbo.deleteTable("user");
    }

    //*create table*
    dbo.createTableIfNotExist("user",HBaseOrder.DESC,"account");
    //dbo.createTableIfNotExist("user",HBaseOrder.ASC,"account");

    //create index.
    String[] cols={"id","name"};
    dbo.addIndexExistingTable("user","account",cols);

    //insert
    InsertQuery insert = dbo.createInsertQuery("user");
    UserBean bean = new UserBean();
    bean.setFamily("account");
    bean.setAge(20);
    bean.setEmail("ncanis@gmail.com");
    bean.setId("ncanis");
    bean.setName("ncanis");
    bean.setPassword("1111");
    insert.insert(bean);

    //select 1 row
    SelectQuery select = dbo.createSelectQuery("user");
    UserBean resultBean = (UserBean)select.select(bean.getRow(),UserBean.class);

    // select column value.
    String value = (String)select.selectColumn(bean.getRow(),"account","id",String.class);

    // search with option (QSearch has EQUAL, NOT_EQUAL, LIKE)
    // select id,password,name,email from account where id='ncanis' limit startRow,20
    HBaseParam param = new HBaseParam();
    param.setPage(bean.getRow(),20);
    param.addColumn("id","password","name","email");
    param.addSearchOption("id","ncanis",QSearch.EQUAL);
    select.search("account", param, UserBean.class);

    // search column value is existing.
    boolean isExist = select.existColumnValue("account","id","ncanis".getBytes());

    // update password.
    UpdateQuery update = dbo.createUpdateQuery("user");
    Hashtable<String, byte[]> colsTable = new Hashtable<String, byte[]>();
    colsTable.put("password","2222".getBytes());
    update.update(bean.getRow(),"account",colsTable);

    //delete
    DeleteQuery delete = dbo.createDeleteQuery("user");
    delete.deleteRow(resultBean.getRow());

    ////////////////////////////////////
    // etc

    // HTable pool with apache commons pool
    // borrow and release. HBasePoolManager(maxActive, minIdle etc..)
    IndexedTable table = dbo.getPool().borrow("user");
    dbo.getPool().release(table);

    // upload bigFile by hadoop directly.
    HBaseBigFile bigFile = new HBaseBigFile();
    File file = new File("doc/movie.avi");
    FileInputStream fis = new FileInputStream(file);
    Path rootPath = new Path("/files/");
    String filename = "movie.avi";
    bigFile.uploadFile(rootPath,filename,fis,true);

    // receive file stream from hadoop.
    Path p = new Path(rootPath,filename);
    InputStream is = bigFile.path2Stream(p,4096);

  }
}

101. Scala

101.1. 设置类路径

要将 Scala 与 HBase 一起使用，您的 CLASSPATH 必须包含 HBase 的类路径以及代码所需的 Scala JAR。首先，在运行 HBase RegionServer 进程的服务器上使用以下命令，以获取 HBase 的类路径。

$ ps aux |grep regionserver| awk -F 'java.library.path=' {'print $2'} | awk {'print $1'}

/usr/lib/hadoop/lib/native:/usr/lib/hbase/lib/native/Linux-amd64-64

设置$CLASSPATH环境变量以包括您在上一步中找到的路径，以及项目所需的scala-library.jar路径和每个与 Scala 相关的其他 JAR。

$ export CLASSPATH=$CLASSPATH:/usr/lib/hadoop/lib/native:/usr/lib/hbase/lib/native/Linux-amd64-64:/path/to/scala-library.jar

101.2. Scala SBT 文件

build.sbt 需要使用 resolvers 和 libraryDependencies .

resolvers += "Apache HBase" at "https://repository.apache.org/content/repositories/releases"

resolvers += "Thrift" at "https://people.apache.org/~rawson/repo/"

libraryDependencies ++= Seq(
    "org.apache.hadoop" % "hadoop-core" % "0.20.2",
    "org.apache.hbase" % "hbase" % "0.90.4"
)

101.3. Scala 示例

此示例列出 HBase 表，创建新表并向其添加行：

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Connection,ConnectionFactory,HBaseAdmin,HTable,Put,Get}
import org.apache.hadoop.hbase.util.Bytes

val conf = new HBaseConfiguration()
val connection = ConnectionFactory.createConnection(conf);
val admin = connection.getAdmin();

// list the tables
val listtables=admin.listTables()
listtables.foreach(println)

// let's insert some data in 'mytable' and get the row

val table = new HTable(conf, "mytable")

val theput= new Put(Bytes.toBytes("rowkey1"))

theput.add(Bytes.toBytes("ids"),Bytes.toBytes("id1"),Bytes.toBytes("one"))
table.put(theput)

val theget= new Get(Bytes.toBytes("rowkey1"))
val result=table.get(theget)
val value=result.value()
println(Bytes.toString(value))

102. Jython

102.1. 设置类路径

要将 Jython 与 HBase 一起使用，您的 CLASSPATH 必须包含 HBase 的类路径以及代码所需的 Jython JAR。

将路径设置为包含Jython.jar的目录，以及每个项目需要的附加的 Jython 相关 JAR。然后将的 HBASE_CLASSPATH 指向$JYTHON_HOME 。

$ export HBASE_CLASSPATH=/directory/jython.jar

在类路径中使用 HBase 和 Hadoop JAR 启动 Jython shell: $ bin/hbase org.python.util.jython

102.2. Jython 示例

Example 27. 使用 Jython 创建表，填充，获取和删除表

以下 Jython 代码示例检查表，如果存在，则删除它然后创建它。然后，它使用数据填充表并获取数据。

import java.lang
from org.apache.hadoop.hbase import HBaseConfiguration, HTableDescriptor, HColumnDescriptor, TableName
from org.apache.hadoop.hbase.client import Admin, Connection, ConnectionFactory, Get, Put, Result, Table
from org.apache.hadoop.conf import Configuration

# First get a conf object.  This will read in the configuration
# that is out in your hbase-*.xml files such as location of the
# hbase master node.
conf = HBaseConfiguration.create()
connection = ConnectionFactory.createConnection(conf)
admin = connection.getAdmin()

# Create a table named 'test' that has a column family
# named 'content'.
tableName = TableName.valueOf("test")
table = connection.getTable(tableName)

desc = HTableDescriptor(tableName)
desc.addFamily(HColumnDescriptor("content"))

# Drop and recreate if it exists
if admin.tableExists(tableName):
    admin.disableTable(tableName)
    admin.deleteTable(tableName)

admin.createTable(desc)

# Add content to 'column:' on a row named 'row_x'
row = 'row_x'
put = Put(row)
put.addColumn("content", "qual", "some content")
table.put(put)

# Now fetch the content just added, returns a byte[]
get = Get(row)

result = table.get(get)
data = java.lang.String(result.getValue("content", "qual"), "UTF8")

print "The fetched row contains the value '%s'" % data

Example 28. 使用 Jython 进行表扫描

此示例扫描表并返回与给定族限定符匹配的结果。

import java.lang
from org.apache.hadoop.hbase import TableName, HBaseConfiguration
from org.apache.hadoop.hbase.client import Connection, ConnectionFactory, Result, ResultScanner, Table, Admin
from org.apache.hadoop.conf import Configuration
conf = HBaseConfiguration.create()
connection = ConnectionFactory.createConnection(conf)
admin = connection.getAdmin()
tableName = TableName.valueOf('wiki')
table = connection.getTable(tableName)

cf = "title"
attr = "attr"
scanner = table.getScanner(cf)
while 1:
    result = scanner.next()
    if not result:
       break
    print java.lang.String(result.row), java.lang.String(result.getValue(cf, attr))