{"id":41,"date":"2024-02-17T21:40:32","date_gmt":"2024-02-17T21:40:32","guid":{"rendered":"http:\/\/aulendil.net\/hallucinations\/?p=41"},"modified":"2024-02-17T21:40:32","modified_gmt":"2024-02-17T21:40:32","slug":"first-encounter-with-principal-component-analysis","status":"publish","type":"post","link":"https:\/\/aulendil.net\/hallucinations\/first-encounter-with-principal-component-analysis\/","title":{"rendered":"First encounter with Principal Component Analysis"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"640\" src=\"https:\/\/aulendil.net\/hallucinations\/wp-content\/uploads\/2024\/02\/dle-mistake-edited.webp\" alt=\"\" class=\"wp-image-58\" srcset=\"https:\/\/aulendil.net\/hallucinations\/wp-content\/uploads\/2024\/02\/dle-mistake-edited.webp 1024w, https:\/\/aulendil.net\/hallucinations\/wp-content\/uploads\/2024\/02\/dle-mistake-edited-300x188.webp 300w, https:\/\/aulendil.net\/hallucinations\/wp-content\/uploads\/2024\/02\/dle-mistake-edited-768x480.webp 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>My first feelings when seeing PCA were very mixed: this is some magic, it&#8217;s awesome, it&#8217;s stupid, it can&#8217;t do any good, it&#8217;s not readable, it will do great stuff, it won&#8217;t help anything&#8230; So, I needed to understand it better and run a small experiment.<\/p>\n\n\n\n<p>The first assumption about PCA is that it finds linear relations&#8230; Or rather, linear transformations of data that do not lose much information&#8230; ehmm&#8230; it&#8217;s math, nobody likes math, it&#8217;s ugly, it&#8217;s nerdy, it&#8217;s sexy, but it&#8217;s beyond the means of average humans. It&#8217;s like a trillion-dollar bill right in front of you that you will never get &#8211; it&#8217;s beyond annoying. 
So, let&#8217;s leave the details to mathematicians and praise them and their matrices, eigenvectors and all their black magic&#8230; and keep them away at the same time. Anyway &#8211; PCA doesn&#8217;t work with non-linear relations between different features of the data. It just doesn&#8217;t.<\/p>\n\n\n\n<p>So, I was wondering &#8211; if <em><strong>x<\/strong><\/em> has an exponential correlation with <em><strong>y<\/strong><\/em> and, for linear algebra to work with it, one can just replace <em><strong>y<\/strong><\/em> with <em><strong>exp(y)<\/strong><\/em> to linearize the correlation&#8230; what if somehow PCA could capture a strong relation between <em><strong>x<\/strong><\/em> and <em><strong>y<\/strong><\/em> and then we could run PCA iteratively on <em><strong>[x, y, PCA1(x,y), PCA2(x,y)]<\/strong><\/em>? Could it magically find out that <em><strong>y<\/strong><\/em> is <em><strong>(x1 + x2 + pca1 + pca2)<\/strong><\/em> &#8211; yeah, now I see how stupid it was. I would need to multiply variables somehow, not add them to the data. That is still something that could work and that I could explore &#8211; automatically trying various paths and detecting whether squaring a variable makes it more or less helpful&#8230; anyway, that is the song of the distant future.<\/p>\n\n\n\n<p>For now, I ran a simple experiment. I had three features in the data, <em><strong>m<\/strong><\/em>, <em><strong>v<\/strong><\/em>, and <em><strong>e<\/strong><\/em>, where <em><strong>e = m*sqrt(v)<\/strong><\/em>. Just for fun, I ran PCA on this dataset, and it looked so absolutely accurate. Like 99.x% of the data variance is captured by <em><strong>pca1<\/strong><\/em>, which cares almost only about the value of <em><strong>e<\/strong><\/em>. 
Yeah&#8230; <em><strong>e<\/strong><\/em> is a composition of <em><strong>m<\/strong><\/em> and <em><strong>v<\/strong><\/em>, but there is an infinite number of <em><strong>(m, v)<\/strong><\/em> solutions for each <em><strong>e<\/strong><\/em>, so it didn&#8217;t feel right.<\/p>\n\n\n\n<p>Therefore, I tried another thing. I just recreated the original data using the PCA component vectors from <em><strong>pca1<\/strong><\/em> and <em><strong>pca2<\/strong><\/em>. Over 99% of the variance was covered, so the reconstructed data should be close to the original. Yeah&#8230; <\/p>\n\n\n\n<p>The conclusion is &#8211; whenever using PCA, try reconstructing the data to test if PCA is applicable to your data. Don&#8217;t trust its accuracy unless proven. It lies to you, with premeditation, it wants you to be wrong, to get fired and homeless&#8230; and you still can&#8217;t live without it! PCA is a woman, an extremely beautiful woman that will cheat on you the very second you give her something she doesn&#8217;t like. Just like my daughter.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>My first feelings when seeing PCA were very mixed: this is some magic, it&#8217;s awesome, it&#8217;s stupid, it can&#8217;t do any good, it&#8217;s not readable, it will do great stuff, it won&#8217;t help anything&#8230; So, I needed to understand it better and run a small experiment. 
The first assumption about PCA is that it finds linear [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"_links":{"self":[{"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/posts\/41"}],"collection":[{"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/comments?post=41"}],"version-history":[{"count":6,"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/posts\/41\/revisions"}],"predecessor-version":[{"id":60,"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/posts\/41\/revisions\/60"}],"wp:attachment":[{"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/media?parent=41"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/categories?post=41"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/tags?post=41"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}