I think it's fair to say that most people have been deluged by reporting about AI or Machine Learning (I'll discuss the difference in the following post). Paradoxically, the sheer volume of media coverage has probably generated more confusion than clarity. It reminds me of the time I was working for a DoD contractor making frameworks for courseware. The Courseware Manager called me into his office and asked earnestly, "Do we need the Cloud?" It was obvious that he really didn't know what the "Cloud" was but had heard the term somewhere. I think there are a lot of small and medium-sized businesses in the same place as my old manager. To his credit, he let me explain the term as it applied or didn't apply to our projects, and we moved on quickly. I wonder how many small and medium-sized businesses take a similar approach, and how many instead spend a lot of resources on projects that will have little return. This is where a Software Engineer's approach really comes in handy. Typically, SEs are presented with or identify a problem and then look for a technical solution. I have no illusions that Software Engineers never make mistakes in offering up solutions, but I do think the approach is correct, and I think most instructional material doesn't emphasize it enough. So I'll spend more than a few posts focusing on the why and not so much on the how.
Many of the online resources and some books include code examples that aren't very well structured. I think this is mainly due to the context in which the examples are supplied: they are written to teach some aspect of Machine Learning, not application architecture. Unfortunately, that lends itself to the mindset that all ML is done in PyCharm with scripts that have few or no classes, methods, or unit tests and are riddled with literals. I'd like to explore how we can implement Machine Learning in well-structured code using industry best practices.
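To make that contrast concrete, here is a minimal sketch of what "well structured" might look like for even a toy example. Everything in it is made up for illustration: the `SplitConfig` settings object, the `split_rows` helper, and the one-dimensional `NearestCentroidClassifier` are hypothetical names, not part of any library, and the classifier is deliberately trivial. The point is only the shape of the code: named settings instead of literals, a small class with a clear interface, and a test that can run in a build.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SplitConfig:
    """Named settings instead of literals scattered through a script."""
    test_fraction: float = 0.25


def split_rows(rows, config):
    """Hold out the last `test_fraction` of the rows for evaluation."""
    cut = int(len(rows) * (1 - config.test_fraction))
    return rows[:cut], rows[cut:]


class NearestCentroidClassifier:
    """Toy 1-D classifier: predicts the label whose mean feature is closest."""

    def __init__(self):
        self.centroids = {}

    def fit(self, rows):
        # rows is a list of (feature, label) pairs.
        by_label = {}
        for feature, label in rows:
            by_label.setdefault(label, []).append(feature)
        self.centroids = {
            label: mean(values) for label, values in by_label.items()
        }
        return self

    def predict(self, feature):
        if not self.centroids:
            raise ValueError("fit() must be called before predict()")
        return min(
            self.centroids,
            key=lambda label: abs(self.centroids[label] - feature),
        )


def test_classifier_separates_two_clusters():
    # A unit test in the style a test runner such as pytest would collect.
    rows = [(1.0, "low"), (1.2, "low"), (9.0, "high"), (9.4, "high")]
    model = NearestCentroidClassifier().fit(rows)
    assert model.predict(0.5) == "low"
    assert model.predict(10.0) == "high"
```

None of this changes the Machine Learning itself. It just means the model can be imported, configured, and tested like any other piece of the application.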
Lastly, I see a shocking lack of material in the Machine Learning community that deals with much of what makes up a Software Engineer's daily tasks. I feel like so much of the material is being generated by people who are not currently working as developers. Machine Learning as functional code is pretty new. Ten years ago, most of the work in Machine Learning was being carried out at elite Computer Science departments or at companies with very large R&D budgets. It has come a long way since then, but outside of large companies, implementation can be fairly haphazard compared to a company's other software projects. In addition, roles aren't well defined. I've read descriptions of Data Scientist jobs that were really nothing more than Business Analyst roles. I think there is a lot of confusion about what is Machine Learning and what is statistical inference. Ultimately, most of this confusion arises from how new the implementation of Machine Learning is. What does seem clear is that there is very little material about ML as it pertains to Software Engineering best practices. I want to explore how to deploy and maintain Machine Learning tools in Continuous Integration/Continuous Deployment environments. I say tools because I also include data pipelines in this. How do we handle scalability for our projects? How can we verify very large data sets and not bring platforms down with them? I suspect there will be a lot more subjects than the ones I've listed here. And I'm sure it's going to make for lively discussion as well.
So, again, welcome to my blog. There are so many great resources for Machine Learning, and my intent is for this blog to join that community. I certainly don't wish to demean other blogs, books, videos, etc. I just think there is a definite need to look at Machine Learning through the lens of a Software Engineer. Your comments and feedback will always be welcome.
Thanks and Happy Learning,
Greg