Apache Avro is a data serialization format. We can store data as .avro files on disk. Avro files are typically used with Spark, but Spark is completely independent of Avro. Avro is a row-based format that is suitable for evolving data schemas. One ... 29 Nov 2019 - 18 minute read
What is Keras? Keras means many different things. At the time of writing this article, Keras can refer to one of three things: Keras, the API specification; keras, the reference implementation, independent of TensorFlow; tf.keras, a particul... 24 Jun 2019 - 22 minute read
Logistic regression is often considered the smallest neural network for binary classification. We can think of the Bernoulli distribution as an even smaller neural network – one that doesn’t even depend on the input data. Such a neural network would l... 27 Apr 2019 - 9 minute read
AWS provides an easy way to run a Spark cluster. Let’s use it to analyze the publicly available IRS 990 data from 2011 to the present. This data is already available on S3, which makes it a good candidate for learning Spark. This Medium post describes the ... 11 Aug 2018 - 23 minute read
Afloat is software that allows some Mac application windows to remain on top of other windows even when they are not in focus. Hence the name Always on Top. This is a standard feature for all windows in Ubuntu, but on a Mac we need to use a third-p... 23 Oct 2016 - 4 minute read
Installing the xml2 R package often fails because of missing or incompatible libraries. In this post, I describe why this problem occurs and provide two solutions. What goes wrong? The xml2 R package depends on libxml2. When you i... 08 Apr 2016 - 5 minute read