Saved searches

Use saved searches to filter your results more quickly

Cancel Create saved search Sign up Reseting focus

You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session.

Notifications You must be signed in to change notification settings

hortonworks/hive-testbench

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Go to file

Folders and files

Last commit message Last commit date

Latest commit

History

View all files

Repository files navigation

hive-testbench

A testbench for experimenting with Apache Hive at any data scale.

Overview

The hive-testbench is a data generator and set of queries that lets you experiment with Apache Hive at scale. The testbench allows you to experience base Hive performance on large datasets, and gives an easy way to see the impact of Hive tuning parameters and advanced settings.

Prerequisites

Install and Setup

All of these steps should be carried out on your Hadoop cluster.

 cd sample-queries-tpcds hive -i testbench.settings hive> use tpcds_bin_partitioned_orc_1000; hive> source query55.sql; 

Note that the database is named based on the Data Scale chosen in step 3. At Data Scale 10000, your database will be named tpcds_bin_partitioned_orc_10000. At Data Scale 1000 it would be named tpch_flat_orc_1000. You can always show databases to get a list of available databases. Similarly, if you generated 1 TB of TPC-H data during Step 5:

 cd sample-queries-tpch hive -i testbench.settings hive> use tpch_flat_orc_1000; hive> source tpch_query1.sql; 

Feedback

If you have questions, comments or problems, visit the Hortonworks Hive forum.

If you have improvements, pull requests are accepted.