Perfectly Random

machine learning and stuff

Handling Avro files in Python

Apache Avro is a data serialization format. We can store data as .avro files on disk. Avro files are typically used with Spark, but Spark is completely independent of Avro. Avro is a row-based format that is suitable for evolving data schemas. One ... Read more

A Guide to Keras Functional API

What is Keras? Keras means many different things. At the time of writing this article, Keras can refer to one of three things: Keras, the API specification; keras, the reference implementation, independent of TensorFlow; tf.keras, a particul... Read more

Bernoulli Distribution as a tiny Neural Network

Logistic regression is often considered the smallest neural network for binary classification. We can think of the Bernoulli distribution as an even smaller neural network – one that doesn’t even depend on the input data. Such a neural network would l... Read more

Set up a Spark cluster on AWS EMR

AWS provides an easy way to run a Spark cluster. Let’s use it to analyze the publicly available IRS 990 data from 2011 to the present. This data is already available on S3, which makes it a good candidate for learning Spark. This Medium post describes the ... Read more

Always on top in MacOS Sierra

Afloat is software that allows some Mac application windows to remain on top of other windows even when they are not in focus. Hence the name Always on Top. This is a standard feature for all windows in Ubuntu, but on Mac we need to use a third-p... Read more

Install xml2 R package on MacOS

Installing the xml2 R package often fails because of missing or incompatible libraries. In this post, I describe why this problem occurs and provide two solutions. What goes wrong? The xml2 R package depends on libxml2. When you i... Read more