{"id":173,"date":"2024-03-11T22:26:58","date_gmt":"2024-03-11T22:26:58","guid":{"rendered":"https:\/\/aulendil.net\/hallucinations\/?p=173"},"modified":"2024-03-11T22:26:58","modified_gmt":"2024-03-11T22:26:58","slug":"civilizng-the-random-forest","status":"publish","type":"post","link":"https:\/\/aulendil.net\/hallucinations\/civilizng-the-random-forest\/","title":{"rendered":"Civilizing the Random Forest"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/aulendil.net\/hallucinations\/wp-content\/uploads\/2024\/03\/dle_subduing_forest-edited.webp\" alt=\"\" class=\"wp-image-175\" srcset=\"https:\/\/aulendil.net\/hallucinations\/wp-content\/uploads\/2024\/03\/dle_subduing_forest-edited.webp 1024w, https:\/\/aulendil.net\/hallucinations\/wp-content\/uploads\/2024\/03\/dle_subduing_forest-edited-256x144.webp 256w, https:\/\/aulendil.net\/hallucinations\/wp-content\/uploads\/2024\/03\/dle_subduing_forest-edited-512x288.webp 512w, https:\/\/aulendil.net\/hallucinations\/wp-content\/uploads\/2024\/03\/dle_subduing_forest-edited-768x432.webp 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>In <a href=\"https:\/\/aulendil.net\/hallucinations\/learning-in-random-forest\/\" data-type=\"link\" data-id=\"https:\/\/aulendil.net\/hallucinations\/learning-in-random-forest\/\">my previous post<\/a>, I explained that I was pondering the rather uneven distribution of features among the trees in a random forest. It didn&#8217;t feel right to just verify whether I could achieve a more uniform distribution. I felt compelled to explore how it functions with this wild notion of democracy. Yes, the random forest is akin to a mathematical proof that democracy works: a bunch of ignoramuses casting votes for the final result. And time and again, it&#8217;s shown to be superior to relying on a single tree, even the wisest one.<\/p>\n\n\n\n<p>It all sounded easy. 
It always does. Just create a bunch of trees, gather their results, summarize them, and incorporate a more uniform distribution of features. It always sounds easy until I actually try to do something.<\/p>\n\n\n\n<p>Long ago, in a galaxy far, far away, when the world seemed full of promise, I built this class for ARandomForest with a more uniform distribution of features. I even created a separate script that could run some tests. I used the simple Iris dataset (not that I know what it is, probably some flowers, likely remnants of the good old hippie era) and ran it through these tests. After fixing a few bugs that prevented my code from running, I obtained the first results. The accuracy for the professional scikit-learn library was perfect. The accuracy for my code was also perfect. In both cases that was without cross-validation, but it was only a first attempt to debug the code and check that it runs.<\/p>\n\n\n\n<p>Okay, not bad; it runs. It&#8217;s just that the data is so straightforward that even a deaf and blind dwarf could figure out how to sort it. So, I tried something more complex &#8211; data about wines and their quality, available somewhere on the University of California, Irvine site. And the results were devastating. My algorithm was clearly inferior to scikit-learn&#8217;s. So my idea was nonsense, I&#8217;m no good, I will never do anything sensible&#8230; But it didn&#8217;t give me peace: how could it be that making a more uniform selection of features resulted in worse outcomes? I speculated that, perhaps by sheer luck, the random forest from those ugly scientists at scikit-learn just happened to pick the best features&#8230; So I ran the test over and over with different seeds for the random generator&#8230; and scikit-learn was always better. It didn&#8217;t feel right. 
They may be superior, but they must be employing some unfair black magic. I&#8217;ll catch them.<\/p>\n\n\n\n<p>I wanted to compare apples to apples, so I needed the same algorithm in two versions, one with the trick and one without. Thus, I made a copy of my naive random forest classifier and, in this version, I didn&#8217;t use the more uniform distribution but just randomly selected features as described in lectures. Then I ran the tests comparing three solutions. Here are the results:<\/p>\n\n\n\n<ul>\n<li>classic is scikit-learn, from those unfair villain scientists who dare to be better than me<\/li>\n\n\n\n<li>uniform is my naive implementation with the trick of a more uniform distribution of features<\/li>\n\n\n\n<li>naive is just like uniform but without the trick<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>implementation<\/td><td>cross-validation<\/td><td>accuracy<\/td><td>precision<\/td><td>recall<\/td><td>f1<\/td><\/tr><tr><td>classic<\/td><td>0.59 +\/- 0.03<\/td><td>0.59<\/td><td>0.29<\/td><td>0.26<\/td><td>0.26<\/td><\/tr><tr><td>uniform<\/td><td>0.54 +\/- 0.05<\/td><td>0.56<\/td><td>0.27<\/td><td>0.23<\/td><td>0.21<\/td><\/tr><tr><td>naive<\/td><td>0.36 +\/- 0.13<\/td><td>0.40<\/td><td>0.12<\/td><td>0.17<\/td><td>0.12<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">1 means perfect, less than 0.5 means utter crap<\/figcaption><\/figure>\n\n\n\n<p>It became very interesting. While those devil charlatans who wrote scikit-learn are always better than me, the difference isn&#8217;t as vast as night and day. But then, my naive implementation without the trick is as dumb as one can be. I wonder if there&#8217;s perhaps some contest for the worst classification algorithm ever; I might have a chance to win.<\/p>\n\n\n\n<p>Anyway, what&#8217;s really, really interesting is that the trick made a huge difference. 
So my intuition isn&#8217;t totally wrong and I&#8217;m not the biggest idiot under the sun.<\/p>\n\n\n\n<p>What&#8217;s more worrisome is that now I need to somehow modify scikit-learn. I had a brief look at their code, and it&#8217;s not a trivial thing to do. They knew I&#8217;d come for them, and they prepared by applying some human-made obfuscation of the code.<\/p>\n\n\n\n<p>If only time were a bit more stretchable. If only a day were as long as on the moon (and I wouldn&#8217;t need to sleep or do other boring stuff).<\/p>\n\n\n\n<p>Ahh&#8230; and <a href=\"https:\/\/github.com\/adaslesniak\/ai-xp06\" data-type=\"link\" data-id=\"https:\/\/github.com\/adaslesniak\/ai-xp06\">here is the code<\/a>.<\/p>\n\n\n\n<p>Democracy may work, but all options must be equally heard among the voters. It&#8217;s now my personal quest to prove it!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In my previous post, I explained that I was pondering the rather uneven distribution of features among the trees in a random forest. It didn&#8217;t feel right to just verify whether I could achieve a more uniform distribution. I felt compelled to explore how it functions with this wild notion of democracy. 
Yes, the random [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/posts\/173"}],"collection":[{"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/comments?post=173"}],"version-history":[{"count":6,"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/posts\/173\/revisions"}],"predecessor-version":[{"id":181,"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/posts\/173\/revisions\/181"}],"wp:attachment":[{"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/media?parent=173"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/categories?post=173"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/tags?post=173"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}