14/14 PoseNet rabbit hole

As mentioned previously, my ICM final was to design a Pose-Karaoke experience. For my motivations and background on the project, please go to the post here.

While I had major issues getting my ICM code to work on the ml5js platform, most of them have since been fixed by the maintainers of the codebase, and this current example solves pretty much all the problems.

But while I was working on this, I did not have those luxuries. That forced me to understand how JavaScript works and figure out the issues myself. The main issue was that a p5.Image does not have the <img> HTML element that ml5js needs to run the algorithm. (Funnily, it works for video. No clue why it is done this way.) This was solved by using an image tag. But the problem of taking a screenshot from the live video still remained. I soldiered on and found my redemption in the toDataURL() method.
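For anyone stuck at the same step, here is a minimal sketch of how a video frame can be turned into an image element with toDataURL(). It is not my exact project code: the function name captureFrameAsImage and the doc parameter are made up for illustration, and it assumes you have a plain HTML video element to draw from.

```javascript
// Hypothetical helper: copy the current video frame onto an off-screen
// canvas, then wrap the canvas contents in an <img> element.
// `doc` defaults to the browser's document; it is a parameter only so the
// function can be exercised outside a browser with a stand-in object.
function captureFrameAsImage(video, doc = document) {
  const canvas = doc.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;

  // Draw the frame that is currently showing in the video.
  const ctx = canvas.getContext("2d");
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);

  // toDataURL() encodes the canvas pixels as a base64 data URL,
  // which can be used directly as an image source.
  const img = doc.createElement("img");
  img.src = canvas.toDataURL("image/png");
  return img;
}
```

In a p5 sketch, the capture object returned by createCapture(VIDEO) should expose the underlying video element as capture.elt, so the call would look something like captureFrameAsImage(capture.elt). The image may need a moment to load before it is handed to the pose model.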

But this was the easy part.

While starting the project, I did not realise the complexity of comparing two poses. A lot of what I had to do relied on being able to compare two images, and that wasn’t a trivial problem. Trawling through the depths of the internet, I came across this post by Google research where they had worked on a similar problem. The post is a wealth of information on how to compare poses, and incorporating all of it into my work was beyond my technical ability. But the chief things that I could incorporate were:

1) Cosine similarity: a measure of similarity between two vectors. Basically, it measures the angle between them and returns -1 if they point in exactly opposite directions and 1 if they point in exactly the same direction. Importantly, it’s a measure of orientation and not magnitude.

2) L2 normalization: this just means scaling the vector to have unit norm. It ensures that scale does not play a factor in the comparison, so the two images can be compared on equal footing.

The cosine similarity helped my code run faster and the L2 normalization ensured that the relative distance from the camera won’t play a role in the comparison.
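To make the two ideas concrete, here is a rough sketch of both in plain JavaScript. It is an illustration rather than my exact project code: the helper names (flattenPose, l2Normalize, cosineSimilarity) are made up, and it assumes each keypoint looks like { position: { x, y } }.

```javascript
// Turn a list of keypoints into one flat vector [x0, y0, x1, y1, ...].
// Assumption: each keypoint has the shape { position: { x, y } }.
function flattenPose(keypoints) {
  return keypoints.flatMap((kp) => [kp.position.x, kp.position.y]);
}

// L2 normalization: divide every entry by the vector's length,
// so the result has a length (norm) of 1.
function l2Normalize(vec) {
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  if (norm === 0) return vec.slice(); // nothing to scale
  return vec.map((v) => v / norm);
}

// Cosine similarity: dot product divided by the product of the lengths.
// Ranges from -1 (opposite directions) to 1 (same direction).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / Math.sqrt(normA * normB);
}
```

Two poses where one is just a scaled-up copy of the other, say [1, 2, 3, 4] and [2, 4, 6, 8], come out with a similarity of (approximately) 1. Once both vectors are L2-normalized, the cosine similarity reduces to a plain dot product. Note that this sketch only deals with scale; it does nothing about where the person stands in the frame.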

Getting these two things to work proved to be a big challenge, but once that was done, the comparison went pretty smoothly, as seen in the video below:

I ran out of time to build a complete experience for the users, which would involve an engaging UI, but that gives me something to do over the winter break. While I could not match the scope I had set initially, I am very happy that I could dive into the algorithmic complexities and solve those issues to make something that works. This gives me a lot of hope for the future and for my coding abilities. All in all, time well spent!