What is Hadoop?
Hadoop is an open source software platform for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. Hadoop services provide for data storage, data processing, data access, data governance, security, and operations.
HDFS – Hadoop Distributed File System – the storage layer: data blocks are stored across multiple slave nodes (DataNodes), with the metadata held on a master node (the NameNode)
Hive – a query tool for Hadoop; in effect, an SQL wrapper over data stored in HDFS
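To illustrate the "SQL wrapper on HDFS" idea, Hive can expose plain files already sitting in HDFS as a queryable table. The table name, columns, and HDFS path below are made up for illustration:

```sql
-- Hypothetical example: map delimited files under /data/sales/ in HDFS
-- to a table, then query it with ordinary SQL.
CREATE EXTERNAL TABLE sales (
  order_id  INT,
  region    STRING,
  amount    DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/sales/';

SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region;
```

No data is moved: Hive stores only the table definition and reads the files in place when the query runs.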
All the reporting tools in the SAP BOBJ platform consume Hadoop HDFS data via Hive through a BOBJ Universe. SAP Lumira has built-in Hadoop data connectors (both Hive and HDFS).
Steps involved in creating a BOBJ report over Hadoop Hive
1) Create a connection
2) Create a data foundation layer
3) Publish the Universe
4) Create a Webi report
Creating a Connection and Universe
In this article, I am using a generic database connection over ODBC. Alternatively, you can use the Apache or Simba JDBC drivers to connect to Hive (I prefer JDBC over ODBC any day).
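For the JDBC route, the connection string follows the `jdbc:hive2://host:port/database` pattern, with a `principal` parameter appended on Kerberized clusters. A minimal sketch of building such a URL — the host, port, database, and principal below are placeholders, not values from any real environment:

```java
public class HiveJdbcUrl {
    // Builds a HiveServer2 JDBC URL. All arguments are placeholders;
    // substitute your cluster's actual host, port, database and principal.
    static String hiveUrl(String host, int port, String db, String kerberosPrincipal) {
        StringBuilder url = new StringBuilder("jdbc:hive2://")
                .append(host).append(':').append(port).append('/').append(db);
        if (kerberosPrincipal != null) {
            // Required when HiveServer2 is Kerberized.
            url.append(";principal=").append(kerberosPrincipal);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // prints: jdbc:hive2://hive-gw.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM
        System.out.println(hiveUrl("hive-gw.example.com", 10000, "default",
                "hive/_HOST@EXAMPLE.COM"));
    }
}
```

You would pass a URL like this (plus the `hive-jdbc` driver jar) to whatever JDBC client or middleware is opening the connection.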
- Download the Hadoop ODBC drivers (32-bit) to your local machine (the 4.2 client tools automatically installed the drivers on my machine)
- Configure the 32-bit Hadoop ODBC DSN on the local machine. If the Hadoop environment is Kerberized, make sure MIT Kerberos is installed and a ticket is active.
- Create a relational connection in the BOBJ Information Design Tool (IDT)
- Create a data foundation layer
- Create a business layer and export it to the BOBJ repository
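For the ODBC configuration step above, a DSN entry on a non-Windows machine (or the equivalent fields in the Windows ODBC Administrator) looks roughly like this. The key names follow the Simba-based Hive ODBC drivers, and the host, realm, and driver path are placeholders for illustration:

```ini
; Hypothetical DSN for a Kerberized HiveServer2 endpoint
[HiveDSN]
Driver=/opt/hiveodbc/lib/libhiveodbc.so
Host=hive-gw.example.com
Port=10000
Schema=default
HiveServerType=2
; AuthMech=1 selects Kerberos in Simba-based drivers
AuthMech=1
KrbRealm=EXAMPLE.COM
KrbHostFQDN=hive-gw.example.com
KrbServiceName=hive
```

With Kerberos (AuthMech=1), the driver relies on the active MIT Kerberos ticket rather than a username and password stored in the DSN.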
Creating a Webi report on top of Hadoop Universe
Once the Universe is exported, the reporting tools consume it like any other relational universe (with some exceptions depending on the middleware drivers used).
In my next post, I will detail consuming and exploring Hadoop data in Lumira.