June 2014 ~ Data Molecule

Thursday, 12 June 2014

How To Set Up a Solr Server

You can download the Solr server from the following link:

http://lucene.apache.org/solr/downloads.html

Once downloaded, extract the archive and place it in your virtual machine. Move into the Solr folder and you'll find some useful README files; read them the first time you get a chance.

To get a server up and running quickly, go into the example directory,
open a terminal and execute the command

java -jar start.jar

Now you should be able to see the Solr interface at
http://localhost:8983/solr/

To index a PDF document using the post.jar that ships with the Solr distribution, move into the exampledocs folder.

The post.jar utility is not meant for production use, but as a convenience tool for experimenting with Solr.

Open a new terminal and execute the following command.
It uses the ExtractingRequestHandler, also known as the Solr Cell project.

java -Durl=http://localhost:8983/solr/update/extract -Dparams=literal.id=doc5 -Dtype=text/pdf -jar post.jar SQA.pdf

Note that the file SQA.pdf must be in the exampledocs directory.
You should see output like this:
SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/update/extract?literal.id=doc5 using content-type text/pdf..
POSTing file SQA.pdf
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/update/extract?literal.id=doc5..
Time spent: 0:00:06.259

you should continue to read here

To query the Solr server, go to this URL:
http://localhost:8983/solr/#/collection1/query
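
The same kind of query can also be sent from a terminal. The sketch below assumes the core is named collection1 (as in the URL above) and uses Solr's /select handler; the helper name solr_select_url and the use of curl are my own choices for illustration, not something that ships with Solr.

```shell
#!/bin/sh
# Build a URL for Solr's /select handler.
# Assumptions: Solr listens on localhost:8983 and the core is "collection1".
solr_select_url() {
    query="$1"    # e.g. id:doc5 (must already be URL-safe)
    printf 'http://localhost:8983/solr/collection1/select?q=%s&wt=json&indent=true\n' "$query"
}

# Usage (needs a running Solr server and curl):
#   curl "$(solr_select_url 'id:doc5')"
```

Querying id:doc5 should bring back the PDF indexed earlier, since literal.id=doc5 was passed when posting it.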

Data Molecule 18:34 No comment

Tagged under: Cloudera VM, coordinator jobs, Oozie

How To Run Oozie Coordinator Jobs

First, export the Oozie URL:

export OOZIE_URL="http://localhost.localdomain:11000/oozie"

Then submit the job

oozie job -oozie http://localhost.localdomain:11000/oozie -config coord.properties -submit

To run a simple Oozie workflow job you pass a job.properties file; for a time-triggered (coordinator) job you pass a coord.properties file instead.

Directory Structure for using Oozie:

ProjectFolder

  Lib
    Jars that you want to run. If you have written a MapReduce job, build a jar from it and place it here.

  DataFolder
    Input and output files.

  CoordinatorFolder
    Coord.xml
    Job.properties (should be on the local file system; everything else should be on HDFS. The NameNode and JobTracker are specified here.)
    Workflow.xml (the actual workflow file; it specifies which job and which class to run, along with their parameters.)
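
As a rough sketch of what the coord.properties file might contain for the layout above: oozie.coord.application.path is the property Oozie reads to locate the coordinator definition, while nameNode and jobTracker are just conventional variable names that the XML files can reference. All host names, ports and paths below are placeholders I assumed for a single-node VM, and write_coord_properties is a hypothetical helper name.

```shell
#!/bin/sh
# Write a minimal coord.properties for a coordinator job.
# Every value below is a placeholder for a single-node VM; adjust to your setup.
write_coord_properties() {
    target="${1:-coord.properties}"
    # The quoted 'EOF' stops the shell from expanding ${...}; Oozie resolves those itself.
    cat > "$target" <<'EOF'
# Placeholder addresses; replace with your cluster's NameNode and JobTracker
nameNode=hdfs://localhost.localdomain:8020
jobTracker=localhost.localdomain:8021
# HDFS location of the coordinator definition (hypothetical path)
oozie.coord.application.path=${nameNode}/user/${user.name}/ProjectFolder/CoordinatorFolder/Coord.xml
EOF
}

# Usage:
#   write_coord_properties coord.properties
```

The file stays on the local file system and is passed to the submit command with -config.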

Coordinator Job

1- The workflow job is started after the predicate is satisfied. A predicate can reference data, time and/or external events.

2- For example, the outputs of the last 4 runs of a workflow that runs every 15 minutes become the input of another workflow that runs every 60 minutes. Chaining workflows together like this results in what is referred to as a data application pipeline.

Useful links:

http://hadooped.blogspot.com/2013/06/apache-oozie-part-1-workflow-with-hdfs.html

Data Molecule 18:24 1 comment

Wednesday, 11 June 2014

Tagged under: Cloudera VM, OpenTSDB, Tcollector

Installing Tcollector

The following tutorial will guide you through installing and using Tcollector.

Tcollector is a child project of OpenTSDB.

OpenTSDB will start saving some points on its own, but to store useful information about system components, such as RAM and disk space, we have to use tcollector (a plugin). Start by cloning its repository:

git clone https://github.com/OpenTSDB/tcollector.git

Now go into the tcollector folder, edit the 'startstop' script, and change this variable:

TSD_HOST=something something

After the change it should look like:

TSD_HOST=127.0.0.1

Here 127.0.0.1 is the address of the machine running the TSD; use your own server's host name or IP address.
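
If you prefer not to open an editor, the same change can be scripted. This is only a sketch: set_tsd_host is a hypothetical helper, and it assumes the variable sits on a line of its own that begins with TSD_HOST=.

```shell
#!/bin/sh
# Point tcollector's startstop script at a TSD host.
set_tsd_host() {
    script="$1"   # path to the startstop script
    host="$2"     # host name or IP of the TSD, e.g. 127.0.0.1
    tmp="$(mktemp)" || return 1
    # Rewrite only the line that begins with TSD_HOST=
    sed "s|^TSD_HOST=.*|TSD_HOST=$host|" "$script" > "$tmp" &&
        cat "$tmp" > "$script"   # cat keeps the script's permissions
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage, from inside the tcollector folder:
#   set_tsd_host startstop 127.0.0.1
```

The host value is pasted straight into the sed expression, so it should not contain characters such as | or &.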

Start the tcollector

sh startstop start

To stop the tcollector

sh startstop stop

To query it through the RESTful API, these links are useful:

http://www.euphoriaaudio.com/opentsdb/querying_examples.html

http://www.euphoriaaudio.com/opentsdb/http-api-q.html

Run this in the browser

http://localhost:4242/api/query?start=1h-ago&m=sum:rate:proc.stat.cpu

You should see the result returned as JSON.
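
The query above can also be issued from a terminal instead of the browser. A small sketch, assuming the TSD listens on localhost:4242 as in the URL above; tsdb_query_url is a hypothetical helper name and curl is my choice of client.

```shell
#!/bin/sh
# Build an OpenTSDB /api/query URL.
# Assumption: the TSD listens on localhost:4242.
tsdb_query_url() {
    start="$1"    # start time, e.g. 1h-ago
    metric="$2"   # aggregator[:rate]:metric, e.g. sum:rate:proc.stat.cpu
    printf 'http://localhost:4242/api/query?start=%s&m=%s\n' "$start" "$metric"
}

# Usage (needs a running TSD and curl):
#   curl -s "$(tsdb_query_url 1h-ago sum:rate:proc.stat.cpu)"
```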

cheers

Data Molecule 14:01 No comment

Tagged under: Cloudera VM, configure, OpenTSDB, setup

Set up your OpenTSDB in the Cloudera VM.

The following tutorial will guide you through setting up OpenTSDB in the Cloudera VM.

First of all, we have to set the time zone of our machine.

This command sets the VM's time zone to UTC, which is very important for OpenTSDB:
sudo ln -sf /usr/share/zoneinfo/UTC /etc/localtime

If at any point you get a permission error, try switching to root using
sudo -s

Install Git, Automake and Gnuplot
sudo yum install git automake gnuplot

Clone the Git repo
git clone https://github.com/OpenTSDB/opentsdb.git

Then go into the opentsdb folder
cd opentsdb

Run the build script that is already in the repo
./build.sh

Create the OpenTSDB tables in HBase, passing the settings as environment variables
env COMPRESSION=none HBASE_HOME=/usr/lib/hbase ./src/create_table.sh

Then create the cache directory
mkdir /tmp/tsd

The opentsdb folder should now contain a newly created folder named "build".
We're good to go!!

Now run OpenTSDB
./build/tsdb tsd --port=4242 --staticroot=build/staticroot/ --cachedir=/tmp/tsd/ --auto-metric

*Some important settings are skipped here; you should study the parameters of the ./build/tsdb tsd command in detail.

To check whether OpenTSDB is running, open
http://127.0.0.1:4242

You should see an interface where you can play with OpenTSDB and plot some graphs.
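
To have something to plot, you can push a test data point yourself. The sketch below formats a line for OpenTSDB's telnet-style put command, which the TSD accepts on the same port. The metric name test.metric and the tag host=cloudera-vm are made-up examples, tsdb_put_line is a hypothetical helper, and sending the line with nc is an assumption about what is installed on your VM.

```shell
#!/bin/sh
# Format one data point for OpenTSDB's telnet-style "put" command:
#   put <metric> <unix-timestamp> <value> <tagk=tagv>
tsdb_put_line() {
    metric="$1"      # e.g. test.metric (made-up name)
    timestamp="$2"   # seconds since the epoch
    value="$3"       # numeric value
    tag="$4"         # at least one tag is required, e.g. host=cloudera-vm
    printf 'put %s %s %s %s\n' "$metric" "$timestamp" "$value" "$tag"
}

# Usage (needs a running TSD on port 4242 and nc):
#   tsdb_put_line test.metric "$(date +%s)" 42 host=cloudera-vm | nc -w 5 localhost 4242
```

Because the TSD was started with --auto-metric, a new metric name like this one is created automatically on first use.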

OpenTSDB will start saving some points on its own, but to store useful information such as RAM and disk space usage we have to use tcollector (a plugin). We'll cover that in another tutorial.

Cheers

Data Molecule 13:51 No comment

Sunday, 8 June 2014

Tagged under: Cloudera VM, VMware

How to Install & Configure Cloudera VM

Prerequisites and Installation:

1- Cloudera VM
You can download it from this link (Download Cloudera VM).

2- Virtual Machine Player
You will also need software that runs virtual machines.
The following are links to popular virtual machine players.
VirtualBox
You can download it from this link (Download Virtual Box).

VMware Player or VMware Workstation
You can download these from (Download VMware Products).

Note: In my opinion, VMware Workstation is the best.

Configuration: (VMware Workstation)

Open VMware Workstation.
Go to the File menu and Click on "Open" or press "Ctrl + O"
Browse to the "Virtual Machine File" you downloaded. Hint: see the first prerequisite.
VMware Workstation will start extracting the files.
You have now successfully configured the Cloudera VM; run and enjoy!

Unknown 07:03 1 comment
