Tuesday, August 27, 2013

PowerShell File Copy

Microsoft PowerShell is a nifty scripting tool that is deeply integrated with the .NET Framework. Recently I had an opportunity to write an automated script for copying files newer than a certain date from one location to another. This is a rudimentary task with plenty of well-tested approaches, such as robocopy, but I decided to write my own script in PowerShell.

Check out the implementation details at 
http://www.codeproject.com/Tips/642297/File-copy-using-Microsoft-Powershell

Check out the code at
https://github.com/tkmallik/Powershell/blob/master/PSFileCopy.ps1 
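The linked script is the author's PowerShell implementation. As a rough illustration of the core idea (not a port of that script), here is a minimal Python sketch. It assumes that a file's date means its last-modified time, that only files modified strictly after the cutoff are copied, and that the source folder layout should be mirrored at the destination. The function name copy_newer_files is hypothetical.

```python
import os
import shutil
from datetime import datetime


def copy_newer_files(source_dir: str, dest_dir: str, cutoff: datetime) -> list:
    """Hypothetical helper: copy every file under source_dir whose
    modification time is strictly after `cutoff` into dest_dir,
    mirroring the relative folder layout. Returns the copied
    relative paths in sorted order."""
    cutoff_ts = cutoff.timestamp()
    copied = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            # Skip files last modified on or before the cutoff.
            if os.path.getmtime(src) <= cutoff_ts:
                continue
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(dest_dir, rel)
            # Create the matching sub-folder at the destination if needed.
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            # copy2 copies the contents and also preserves the modification time.
            shutil.copy2(src, dst)
            copied.append(rel)
    return sorted(copied)
```

For example, copy_newer_files(r"C:\Source", r"D:\Backup", datetime(2013, 8, 1)) would copy everything modified after 1 August 2013. Tools such as robocopy add retries, logging and restartable copies on top of this basic loop.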


Saturday, August 10, 2013

Comparison of Java & Microsoft Technologies

A short comparison of software and tools in the Microsoft technology stack and the Java platform.


Microsoft             | Java Platform
Visual Studio         | Eclipse/NetBeans
C#/VB                 | Java
CLR                   | JVM
WCF                   | JAX-WS, JMS
WPF                   | Swing
Silverlight           | JavaFX
ADO.NET/EF/NHibernate | JDBC, Hibernate
ASP.NET               | JSP/Servlet/JSF
IIS                   | WebSphere/GlassFish/JBoss/Tomcat
COM+/MTS              | Enterprise JavaBeans (EJB)
LINQ                  | Jaque
System.Xml            | JAX Pack (JAXM, JAXR, JAXB, JAXP)

Thursday, August 1, 2013

SQL SERVER Collation Conflict

Recently, our DBA attached a copy of a SQL 2000 .mdf file to our new SQL 2012 server. When I ran a join query across two different databases on that server, I got this error:

Cannot resolve the collation conflict between "SQL_Latin1_General_CP437_CI_AS" and "SQL_Latin1_General_CP1_CI_AS" in the equal to operation.

It turned out that the SQL 2000 database and the databases on the SQL 2012 server used different collations, so cross-database queries that compare character columns need an explicit COLLATE clause, as shown below:

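-- Cross-database join: qualify each table with its database name.
-- COLLATE DATABASE_DEFAULT casts t2.c1 to the current database's collation,
-- so both sides of the equality are compared under the same collation.
select <cols>
from <db1>.dbo.<table1> t1
join <db2>.dbo.<table2> t2
  on t1.col1 = t2.c1 COLLATE DATABASE_DEFAULT
where t1.c1 = 'Blah'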

The collation of each database is listed in the collation_name column of the sys.databases catalog view, e.g. SELECT name, collation_name FROM sys.databases;

The people & business behind Hadoop


Every decade has a key technology, and a wave of new companies springs up around it. In the 90’s it was write-once-run-anywhere Java; in the 2000’s the whole world was caught up in the web. In the 2010’s the software industry is buzzing about Big Data, alongside a significant rise in consumer electronics such as tablets and phones. The software framework most sought after for working with Big Data is Hadoop, and three next-generation companies have built their business around it, encouraging customers to ask big questions and venture into new frontiers. In this post I want to write about Hadoop and its allies.

Hadoop was created in 2005 by Doug Cutting (formerly at Yahoo, currently Chief Architect at Cloudera and a Director at the Apache Software Foundation) and Mike Cafarella (currently a professor of computer science at the University of Michigan).

Hadoop was quickly recognized as the go-to solution for distributed computing on commodity hardware. The framework promised painless deployment and easy maintenance, and it arrived at the right time to handle the big data explosion (to be a bit dramatic). Ever since its release under the Apache Software Foundation, the project has attracted many committers, and an ecosystem soon grew up around Hadoop. A variety of companies started providing Hadoop consulting services; the most popular among them are Cloudera, Hortonworks and MapR.

Cloudera

Three top engineers from Google, Yahoo and Facebook (Christophe Bisciglia, Amr Awadallah and Jeff Hammerbacher) foresaw a need for analyzing, managing and processing big data, so they teamed up with an Oracle employee and formed Cloudera in early 2008.

Open Source Contribution
The employees of this company are active participants in many open source projects; some popular ones are:

Apache Avro, Apache Bigtop, Apache Crunch, Apache Flume, Apache Hadoop, Apache Hive, Apache Lucene, Apache MRUnit, Apache Oozie, Apache Sqoop, Apache Whirr, Apache ZooKeeper, Cloudera Development Kit, Crepo, Hue, Impala, Kitten, ML, Seism, and CDH (a 100% open source Hadoop distribution).

The company also offers Cloudera Manager and Cloudera Navigator, centralized system management tools with a rich set of features, including:

Manager
Deployment & Configuration, Service Management, Service & Host Monitoring, Diagnostics API, Rolling Updates/Restarts, SNMP Support, LDAP Integration, Configuration History & Rollbacks, Operational Reports, Automated Disaster Recovery, BDR Add-on

Navigator
Data Audit – HDFS, HBase & Hive, Navigator Add-on, Access Management, Technical Support and Indemnity, Core Projects, Apache HBase, RTD Add-on, Cloudera Impala, RTQ Add-on, Cloudera Manager, Cloudera Navigator

Partners & valuation
Over 160 active partners.
Current valuation of $700 million (at least, that's the word on the street), along with an additional $65 million raised in a Series E round of venture capital funding from Accel Partners.

Developers Community
Cloudera offers 5-day training courses at around $2,500. The training covers Hadoop development, administration and data science, and Cloudera also provides certification.

Hortonworks

This company was incubated by Yahoo. Its employees are regular committers to various projects in the Apache Hadoop ecosystem. The company primarily provides consulting services and offers an end-to-end solution built on the Hadoop framework. It is also bringing out various data products based on the newer versions of Hadoop.

Open Source Contribution
YARN, Tez, Stinger, Ambari, HDP (Hortonworks Data Platform)

Partners & Valuation
Microsoft, Teradata, Talend and Rackspace are official partners. The company is currently valued at $200 million.

Developers Community
Like Cloudera, Hortonworks offers 5-day training courses at around $2,500, covering Hadoop development, administration and data science, and it also provides certification. The advantage of Hortonworks is its partnership with Microsoft: it provides training for Hadoop on the Windows platform.

MapR

In my personal opinion their business model is not viable: although MapR is backed by EMC and Cisco, its online presence is overshadowed by Cloudera and Hortonworks. The company provides a commercial Hadoop solution through three products: M3, M5 and M7. Its employees actively contribute to HBase, Pig, Apache Hive and ZooKeeper.

Partners & valuation
Google Compute Engine, EMC and Cisco; valued at $30 million.

What’s in it for us?
Well, Hadoop is free: anyone can download it and use it. The trick is knowing what to use it for and how to manage it. There is a need for a distributed computing platform, and Hadoop itself is still in its nascent stages. It is definitely not a threat to existing database platforms; however, investing time and money in Hadoop would be a safe bet for the future.