Friday, December 9, 2011

Hadoop Cluster size deployed by CBS Interactive

Here are some details of the production Hadoop infrastructure deployed at CBS Interactive as of this writing:
- Total Nodes: 80
- Total Disk Capacity: 1 PB
- ETL apps: written using Hadoop Streaming (Python)
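The post does not show any of these ETL apps. As a rough illustration of the Hadoop Streaming model, here is a minimal Python sketch. The file name etl.py, the \002-delimited input layout and the count-per-key logic are all assumptions made for illustration, not CBS Interactive's code. Hadoop Streaming pipes each input line to the mapper on stdin, sorts the mapper's tab-separated key/value output by key, and pipes the sorted lines to the reducer.

```python
#!/usr/bin/env python3
"""etl.py: hypothetical Hadoop Streaming mapper/reducer (counts records per key)."""
import sys

FIELD_SEP = "\x02"  # assumption: input fields are \002-delimited


def map_lines(lines):
    """Emit 'key<TAB>1' per non-empty record, keyed on its first field."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key = line.split(FIELD_SEP, 1)[0]
        yield f"{key}\t1"


def reduce_lines(lines):
    """Sum counts per key; Streaming delivers reducer input sorted by key."""
    current, total = None, 0
    for line in lines:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = key, 0
        total += int(value)
    if current is not None:
        yield f"{current}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Run as "etl.py map" or "etl.py reduce"; Streaming supplies stdin/stdout.
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

Such a job could be launched with something like `hadoop jar /path/to/hadoop-streaming.jar -input <in> -output <out> -mapper "python3 etl.py map" -reducer "python3 etl.py reduce" -file etl.py`; the jar path depends on the installation.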

Friday, March 18, 2011

How to create a Hive column whose name is a reserved keyword

The following command fails in Hive because sort is a reserved keyword:

CREATE EXTERNAL TABLE aaaabc ( sort STRING  )
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\002'
LINES TERMINATED BY '\n';

FAILED: Parse Error: line 1:31 mismatched input 'sort' expecting Identifier in column specification


To overcome this, run the command with sort enclosed in backticks:
CREATE EXTERNAL TABLE aaaabc ( `sort` STRING  )
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\002'
LINES TERMINATED BY '\n';
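If you generate Hive DDL from code, the same fix can be applied to every column by quoting each name in backticks, so a reserved word never reaches the parser unquoted. A minimal sketch; build_ddl and quote_identifier are hypothetical helper names, and names that themselves contain a backtick are simply rejected rather than escaped.

```python
from typing import List, Optional, Tuple


def quote_identifier(name: str) -> str:
    """Wrap a column name in backticks so Hive reads it as an identifier."""
    if "`" in name:
        # Not handled in this sketch: reject rather than guess at escaping rules.
        raise ValueError(f"backtick in column name: {name!r}")
    return f"`{name}`"


def build_ddl(table: str, columns: List[Tuple[str, str]],
              location: Optional[str] = None) -> str:
    """Build a CREATE EXTERNAL TABLE statement with every column name quoted."""
    col_lines = ",\n".join(f"    {quote_identifier(n)} {t}" for n, t in columns)
    parts = [
        f"CREATE EXTERNAL TABLE {table} (",
        col_lines,
        ")",
        "ROW FORMAT DELIMITED",
        "FIELDS TERMINATED BY '\\002'",  # emits the literal text '\002'
        "LINES TERMINATED BY '\\n'",     # emits the literal text '\n'
        "STORED AS TEXTFILE",
    ]
    if location is not None:
        parts.append(f"LOCATION '{location}'")
    return "\n".join(parts) + ";"


print(build_ddl("aaaabc", [("sort", "STRING")]))
```

Quoting every column, rather than only known keywords, avoids having to keep a keyword list in sync with the Hive version in use.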

Wednesday, January 26, 2011

Hive Error: FAILED: Parse Error: line 54:4 mismatched input expecting Identifier in column specification

PROBLEM
I was trying to create an external table in Hive using the following command
CREATE EXTERNAL TABLE tetl_fact_r
(
    custid STRING,
    value STRING,
    ph STRING,
    email STRING,
    sort STRING,
    address STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\002'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/hadoop-blog/abc.text';


and got the following error:

FAILED: Parse Error: line <linenum>:4 mismatched input <column name> expecting Identifier in column specification

REASON
From https://issues.cloudera.org/browse/SQOOP-37, I learned that this happens because the column name sort is a reserved keyword in Hive.
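Because the failure only surfaces at parse time, it can help to check column names before running the DDL. A minimal sketch, assuming a small hand-picked keyword set: HIVE_KEYWORDS_SUBSET is hypothetical and incomplete (sort is included because this post shows Hive rejecting it), so check the reserved-word list for the Hive version you run.

```python
# Hypothetical, deliberately incomplete keyword set. "sort" is here because
# this post shows Hive rejecting it; verify the rest against your Hive version.
HIVE_KEYWORDS_SUBSET = frozenset(
    {"sort", "order", "group", "select", "from", "where", "table"}
)


def reserved_columns(columns):
    """Return the column names that match a keyword in the subset (case-insensitive)."""
    return [c for c in columns if c.lower() in HIVE_KEYWORDS_SUBSET]


cols = ["custid", "value", "ph", "email", "sort", "address"]
clashes = reserved_columns(cols)
if clashes:
    print("Rename or backtick-quote these columns:", clashes)
    # prints: Rename or backtick-quote these columns: ['sort']
```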

SOLUTION
To be honest, I didn't spend much time looking for a solution; instead, I just renamed the field, and the query worked after that. The new query looked like this:
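CREATE EXTERNAL TABLE tetl_fact_r
(
    custid STRING,
    value STRING,
    ph STRING,
    email STRING,
    sorttype STRING,
    address STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\002'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/hadoop-blog/abc.text';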
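The table in this post declares FIELDS TERMINATED BY '\002' and LINES TERMINATED BY '\n', so the files under its location must put the \002 (Ctrl-B) character between fields and a newline after each row, with fields in the same order as the columns. A minimal sketch of producing such records in Python; to_record and the sample row are hypothetical.

```python
FIELD_SEP = "\x02"  # matches FIELDS TERMINATED BY '\002'
LINE_SEP = "\n"     # matches LINES TERMINATED BY '\n'

# Column order of tetl_fact_r after the rename.
COLUMNS = ["custid", "value", "ph", "email", "sorttype", "address"]


def to_record(row):
    """Serialize a dict as one line in column order; missing fields become ''."""
    values = [str(row.get(col, "")) for col in COLUMNS]
    for v in values:
        # A stray delimiter would shift fields into the wrong columns or split the row.
        if FIELD_SEP in v or LINE_SEP in v:
            raise ValueError(f"value contains a delimiter: {v!r}")
    return FIELD_SEP.join(values) + LINE_SEP


rows = [{"custid": "c1", "email": "a@example.com", "sorttype": "asc"}]  # hypothetical data
data = "".join(to_record(r) for r in rows)
```

The resulting text could be written to a local file and copied into HDFS with `hadoop fs -put`.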


If any of you figure out a way to make this query work without renaming the column, please leave a comment. It would be much appreciated.