{"id":183,"date":"2024-04-21T16:00:19","date_gmt":"2024-04-21T16:00:19","guid":{"rendered":"https:\/\/aulendil.net\/hallucinations\/?p=183"},"modified":"2024-04-21T16:02:33","modified_gmt":"2024-04-21T16:02:33","slug":"the-magic-ingredient","status":"publish","type":"post","link":"https:\/\/aulendil.net\/hallucinations\/the-magic-ingredient\/","title":{"rendered":"The magic ingredient"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"768\" src=\"https:\/\/aulendil.net\/hallucinations\/wp-content\/uploads\/2024\/04\/dle-victory-edited.webp\" alt=\"\" class=\"wp-image-193\" srcset=\"https:\/\/aulendil.net\/hallucinations\/wp-content\/uploads\/2024\/04\/dle-victory-edited.webp 1024w, https:\/\/aulendil.net\/hallucinations\/wp-content\/uploads\/2024\/04\/dle-victory-edited-256x192.webp 256w, https:\/\/aulendil.net\/hallucinations\/wp-content\/uploads\/2024\/04\/dle-victory-edited-512x384.webp 512w, https:\/\/aulendil.net\/hallucinations\/wp-content\/uploads\/2024\/04\/dle-victory-edited-768x576.webp 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>I stopped my research quite some time ago. I hope to get back to it soon, but I believe a few words of explanation are in order.<\/p>\n\n\n\n<p>I was beyond busy: working my day job, desperately looking for a new one (a search that ended very successfully), and doing a capstone project to finish the program, all while trying to squeeze out at least some time for my two little ones. I still owe them a lot for not being there these last few months. Anyway, I have changed jobs and finished all the training and school projects, so now I just need to pay off the time debt I took on, and then I can get back to my research.<\/p>\n\n\n\n<p>But not all this time was lost. 
It wasn&#8217;t only about getting a more or less stupid certificate, even if it sometimes felt that way and my engagement was lower than at the beginning, when I was desperately trying to grasp all those new concepts. One lesson was really valuable. It&#8217;s about&#8230; one can call it art, another can name it intuition, a third will use the term magic, and a fourth will say &#8220;domain knowledge.&#8221; All those terms are extremely vague, and none is better than the others. I just like the word magic, so I&#8217;ll stick to it.<\/p>\n\n\n\n<p>At the beginning of the program I often heard that data science is not about algorithms or engineering, but more of an art, and that domain knowledge matters more than mastering the deep tricks of some technical skill. This made me resentful: engineering should be about precision, numbers, and reason, not gut feeling and arbitrary choices, and it felt like my long experience was being challenged.<\/p>\n\n\n\n<p>After doing a project without access to anyone with broad knowledge of the topic, and fighting my way against the directions suggested by the notebook sketch I received, I now know what the teachers meant. The funny thing is that the same applies to the engineering I&#8217;ve been practicing for years. I just wasn&#8217;t aware of it.<\/p>\n\n\n\n<p>So the short point is that computers and the software running on them are tools. To make any sense, they must fulfill some utility function, even if that utility is just providing fun. There is no reason for beautiful, clever algorithms that serve no purpose for the user. No matter how clever a programmer or engineer you are, you will do a poor job if you don&#8217;t know or care what the user needs.<\/p>\n\n\n\n<p>In my project, for example, what I was asked for made no sense. My goal was to provide a computer vision model for detecting malaria parasites in red blood cells. But first I did a little digging. 
The data was clean and nice: images of cells extracted from thin blood smears. Such a smear contains about 80 cells, while malaria often infects fewer than one cell in a hundred. Using the binomial probability formula, we have:<\/p>\n\n\n\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/aulendil.net\/hallucinations\/wp-content\/ql-cache\/quicklatex.com-c7016d7a85876a4081ae602bd92bd9f2_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#80;&#40;&#88;&#61;&#107;&#41;&#32;&#61;&#32;&#92;&#98;&#105;&#110;&#111;&#109;&#123;&#110;&#125;&#123;&#107;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#112;&#94;&#107;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#40;&#49;&#45;&#112;&#41;&#94;&#123;&#110;&#45;&#107;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"22\" width=\"248\" style=\"vertical-align: -7px;\"\/><\/p>\n\n\n\n<p>and for a sample with no infected cells at all, <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/aulendil.net\/hallucinations\/wp-content\/ql-cache\/quicklatex.com-7d01060ae7c38eb8f2c8a962075f78a7_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#107;&#32;&#61;&#32;&#48;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"42\" style=\"vertical-align: 0px;\"\/>, so it becomes:<\/p>\n\n\n\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/aulendil.net\/hallucinations\/wp-content\/ql-cache\/quicklatex.com-b7083317cd5bbdea3a27f42ec1817b4a_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#80;&#40;&#88;&#61;&#48;&#41;&#32;&#61;&#32;&#92;&#98;&#105;&#110;&#111;&#109;&#123;&#110;&#125;&#123;&#48;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#112;&#94;&#48;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#40;&#49;&#45;&#112;&#41;&#94;&#110;&#32;&#61;&#32;&#40;&#49;&#45;&#112;&#41;&#94;&#110;\" title=\"Rendered by QuickLaTeX.com\" height=\"22\" width=\"314\" style=\"vertical-align: -7px;\"\/><\/p>\n\n\n\n<p>and <img loading=\"lazy\" decoding=\"async\" 
src=\"https:\/\/aulendil.net\/hallucinations\/wp-content\/ql-cache\/quicklatex.com-66d4e993a6a0dd405f1cef9d4b29572a_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#112;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"10\" style=\"vertical-align: -4px;\"\/> is 1\/100, since only one cell in 100 is infected, and <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/aulendil.net\/hallucinations\/wp-content\/ql-cache\/quicklatex.com-b0edcc481795a77cf2f0dd279c423162_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#110;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"11\" style=\"vertical-align: 0px;\"\/> (the number of cells in the sample) is 80, so it becomes:<\/p>\n\n\n\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/aulendil.net\/hallucinations\/wp-content\/ql-cache\/quicklatex.com-e17ee5678e77229f5484ec1fbed3cf24_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#80;&#40;&#88;&#61;&#48;&#41;&#32;&#61;&#32;&#40;&#49;&#45;&#48;&#46;&#48;&#49;&#41;&#94;&#123;&#56;&#48;&#125;&#32;&#92;&#97;&#112;&#112;&#114;&#111;&#120;&#32;&#48;&#46;&#52;&#52;&#55;&#51;\" title=\"Rendered by QuickLaTeX.com\" height=\"20\" width=\"264\" style=\"vertical-align: -5px;\"\/><\/p>\n\n\n\n<p>So there is almost a 50\/50 chance that an infected patient&#8217;s sample contains no infected cell at all, and then even a perfect model cannot detect malaria. It&#8217;s a no-go. End of discussion, job finished&#8230; Not really. One could say so, but I&#8217;m curious, so I needed to dig deeper and see whether there was something we could learn, change, or adjust. It&#8217;s just an obstacle, not a deal-breaker.<\/p>\n\n\n\n<p>What&#8217;s interesting is that all this analysis was done without writing a single line of code. It required neither the technical skills of a programmer or AI specialist nor deep knowledge. 
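<\/p>\n\n\n\n<p>This estimate really needs no code. Still, for anyone who wants to double-check it or try other sample sizes and infection rates, a few lines of Python are enough. This is a minimal sketch using only the standard library (<code>math.comb<\/code> needs Python 3.8 or newer). The function name <code>prob_infected_cells<\/code> is made up for illustration. Like the formula above, it assumes every cell is infected independently with the same probability.<\/p>\n\n\n\n
```python
from math import comb


def prob_infected_cells(k: int, n: int, p: float) -> float:
    # Binomial probability P(X = k): exactly k infected cells among n,
    # each cell infected independently with probability p.
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Values from the text: about 80 cells per smear, 1 infected cell per 100.
N_CELLS = 80
P_INFECTED = 1 / 100

# k = 0: the sample contains no infected cell at all,
# so even a perfect cell classifier has nothing to find.
p_none = prob_infected_cells(0, N_CELLS, P_INFECTED)
print(f'P(X=0) with n={N_CELLS}: {p_none:.4f}')

# The same check for the 350-cell sample size mentioned in the text.
print(f'P(X=0) with n=350: {prob_infected_cells(0, 350, P_INFECTED):.4f}')
```
\n\n\n\n<p>It prints roughly 0.45 for 80 cells, which is the &#8220;almost 50\/50&#8221; figure. For 350 cells it prints about 0.03, so with a larger sample it becomes much less likely that an infected patient&#8217;s sample contains no infected cell.<\/p>\n\n\n\n<p>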
All it took was the magic of connecting various pieces of information and various areas of knowledge, plus experience that comes to mind by intuition, or gut feeling, as one could say.<\/p>\n\n\n\n<p>The next point was also about following intuition. Just to gather some insight, I ran a few Google queries. It turned out that malaria is widespread and kills people where access to healthcare is extremely poor, and so is the GDP of those countries. So relying on thin blood smears, which require quite a high level of skill, is also not the best solution&#8230; another no-go.<\/p>\n\n\n\n<p>Thirdly, we need to detect malaria in a patient, not in a single cell. So I ran some math to figure out what cell-level accuracy we would need. With a reasonable sample size of 350 cells, detecting 95% of cases (while accepting false positives in 15% of cases) requires an accuracy of 99.55%, which is a really, really high number.<\/p>\n\n\n\n<p>Only then did I train some models to check how well computer vision works with malaria parasites. By now my approach was different: this is an experiment to gather insight, and the product is a no-go, so there is no need to spend a lot of resources developing it super carefully. So yeah, I reached 98.15% accuracy after a few tries.<\/p>\n\n\n\n<p>And then came something that isn&#8217;t in the book. Just by intuition, a question came up: what is the problem with those few (48 out of 2600) misclassified images? It turned out that about half of them, maybe more, had bad labels&#8230; I&#8217;m no diagnostician, though, so that would need to be confirmed. 
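<\/p>\n\n\n\n<p>For scale, both accuracy numbers are plain ratios over the 2600 test images. The sketch below only does arithmetic on the quoted figures and is not a re-run of the evaluation. It assumes accuracy means correct predictions divided by all predictions. It also takes &#8220;about half&#8221; as a hypothetical 24 of the 48 errors being label errors.<\/p>\n\n\n\n
```python
def accuracy(n_total: int, n_wrong: int) -> float:
    # Plain accuracy: share of predictions that agree with the labels.
    return (n_total - n_wrong) / n_total


N_TEST = 2600         # test images, from the text
N_MISCLASSIFIED = 48  # misclassified images, from the text

print(f'measured accuracy: {accuracy(N_TEST, N_MISCLASSIFIED):.2%}')

# Hypothetical: if about half of the 48 errors were really bad labels,
# the model was right on those images and only the rest count as errors.
n_bad_labels = N_MISCLASSIFIED // 2
corrected = accuracy(N_TEST, N_MISCLASSIFIED - n_bad_labels)
print(f'accuracy if {n_bad_labels} labels are fixed: {corrected:.2%}')
```
\n\n\n\n<p>The first line matches the 98.15% quoted above. The second shows that fixing labels alone, under this assumption, moves the result to about 99%.<\/p>\n\n\n\n<p>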
After accounting for this mislabeled test data, it wasn&#8217;t a big deal to reach 99% accuracy. The effort I put into building the model was minor compared to the effort spent understanding the context.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"512\" height=\"98\" src=\"https:\/\/aulendil.net\/hallucinations\/wp-content\/uploads\/2024\/04\/misclassified-512x98.png\" alt=\"\" class=\"wp-image-194\" srcset=\"https:\/\/aulendil.net\/hallucinations\/wp-content\/uploads\/2024\/04\/misclassified-512x98.png 512w, https:\/\/aulendil.net\/hallucinations\/wp-content\/uploads\/2024\/04\/misclassified-256x49.png 256w, https:\/\/aulendil.net\/hallucinations\/wp-content\/uploads\/2024\/04\/misclassified-768x147.png 768w, https:\/\/aulendil.net\/hallucinations\/wp-content\/uploads\/2024\/04\/misclassified.png 836w\" sizes=\"(max-width: 512px) 100vw, 512px\" \/><figcaption class=\"wp-element-caption\">Images misclassified by the model, with their original labels.<\/figcaption><\/figure>\n\n\n\n<p>The same applies to building the model itself. Looking carefully at the data and understanding the algorithms can save a lot of work. It was obvious that parasites are not distinguished by any particular shape. They are just an area of different color inside a cell. That rules out quite a lot of approaches one might try on other kinds of data, like letters or more common objects such as birds, shoes, or whatever else one might need to detect.<\/p>\n\n\n\n<p>So, to summarize: it&#8217;s just like my everyday engineering. I&#8217;m not particularly brilliant, nor am I highly intelligent. When searching for work, I fail almost all the &#8220;write an algorithm in 15 minutes&#8221; tests and the 100-question cognitive tests. But when doing actual work, I&#8217;m somehow more efficient than others. That is because I am careful (lazy, one could say) and I avoid doing things that are not required. 
I spend a lot of time asking questions rather than rushing to figure out answers. In most engineering work, brute-forcing your way to the correct approach is not efficient, because there are just too many ways to solve the problem.<\/p>\n\n\n\n<p>But here is the important part: the term &#8220;domain knowledge&#8221; is misleading. It&#8217;s about connecting the dots between many domains: understanding digital images, statistics, computer science, neural networks, blood analysis, the economic situation&#8230; That is no single domain.<\/p>\n\n\n\n<p>And one final thing: I made a presentation. That&#8217;s something I have always felt bad about, since I rate my presentation skills at the very low end of the spectrum, but then&#8230; I&#8217;m more than glad I was forced to do it. I believe it turned out really nice. The thing is that I really, really hate doing something slipshod. If something is worth doing, then it&#8217;s worth doing right, with passion. If not, then why bother doing it at all?<\/p>\n\n\n\n<p>So <strong>the magic ingredient is love, passion, curiosity<\/strong>, not brute force, awesome intelligence, or extremely vast or deep knowledge. And here is <a href=\"https:\/\/github.com\/adaslesniak\/ai-school_project3\/blob\/main\/presentation.pdf\" data-type=\"link\" data-id=\"https:\/\/github.com\/adaslesniak\/ai-school_project3\/blob\/main\/presentation.pdf\">the presentation<\/a>. The nice thing is that I made the presentation look good: there are images, not just a wall of text. 
I&#8217;m proud that I can make things that are visually appealing.<\/p>\n\n\n\n<p>Now I just need to pay back the time I didn&#8217;t spend with my children and figure out everything in the new job. Then I can get back to beating the random forest&#8230; and then to finding the best clustering algorithm, and then&#8230; There are so many things to do and so little time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I stopped my research quite some time ago. I hope to get back to it soon, but I believe a few words of explanation are in order. I was beyond busy: working my day job, desperately looking for a new one (a search that ended very successfully), and doing a capstone project to finish the program, all while [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,10,8],"tags":[],"_links":{"self":[{"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/posts\/183"}],"collection":[{"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/comments?post=183"}],"version-history":[{"count":11,"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/posts\/183\/revisions"}],"predecessor-version":[{"id":197,"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/posts\/183\/revisions\/197"}],"wp:attachment":[{"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/media?parent=183"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/categories?post=183"},{"taxonomy":"post_tag","embeddable":true,"href":
"https:\/\/aulendil.net\/hallucinations\/wp-json\/wp\/v2\/tags?post=183"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}