Blog about software by Frits Duus

Make VS2017 Live Unit Testing work

So you installed Visual Studio 2017 and you want to try out Live Unit Testing. It looked so easy in Microsoft’s introduction videos for VS2017, but when you open your old existing solution and try to enable it… it doesn’t work…!?
Sound familiar? Here is how I made it work in under 5 minutes for an MSTest-based project.

Step 1: Remove reference to Microsoft.VisualStudio.QualityTools.UnitTestFramework

Expand References on your test project in Visual Studio, then find and delete the reference Microsoft.VisualStudio.QualityTools.UnitTestFramework.

Step 2: Add MSTest NuGet packages

Add the MSTest.TestFramework and MSTest.TestAdapter NuGet packages to your test project.

Step 3: Start Live Unit Testing

In VS2017 open the Test menu, expand Live Unit Testing and select Start. Wait a little… and now it works 🙂

Oh, and by the way: Live Unit Testing does not currently work on .NET Core projects.

Actors in .NET and why I think the actor model is so cool

Actors and actor frameworks have been on my personal technology radar for the last couple of years.

So why do I think actors are so interesting?

When you run into concurrency challenges in traditional object-oriented programming you have to start using things like locks, mutexes and semaphores. These are especially challenging when the solution also has to scale, because threads that wait on each other limit throughput. This is one of the areas where actors really shine, and that is why I find them so interesting. Furthermore, you can run many actors simultaneously and thus use them to build a highly scalable solution.

What is an actor?

An actor is an object with its own virtual thread.
You communicate with an actor by sending messages to it.
An actor has an input queue containing the messages sent to the actor.
The thread processes the messages in the input queue one at a time. Because the actor only processes one message at a time there are no concurrency concerns inside the actor.
You communicate out of an actor by sending messages to other actors (or calling functions on other objects).
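
The description above can be sketched in a few lines. This is a minimal illustration in Python (not C#, and not how Akka.NET or Azure Service Fabric implement actors); the class name CounterActor and the methods tell and stop are made up for this example. It assumes one real thread per actor, whereas actor frameworks typically schedule many actors over a small pool of threads.

```python
import queue
import threading


class CounterActor:
    """A toy actor: private state, an input queue, and one worker thread."""

    _STOP = object()  # sentinel message that tells the actor to shut down

    def __init__(self):
        self._inbox = queue.Queue()  # the actor's input queue
        self._count = 0              # state that only the actor's thread changes
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        """Send a message to the actor; the caller does not wait for processing."""
        self._inbox.put(message)

    def stop(self):
        """Let the actor finish the queued messages, then return its final state."""
        self._inbox.put(self._STOP)
        self._thread.join()
        return self._count

    def _run(self):
        # Messages are handled strictly one at a time,
        # so no lock is needed around self._count.
        while True:
            message = self._inbox.get()
            if message is self._STOP:
                break
            self._count += message


def send_many(actor, times):
    """Helper used by the demo: send the message 1 to the actor `times` times."""
    for _ in range(times):
        actor.tell(1)


def demo():
    """Four threads send 1000 messages each to the same actor."""
    actor = CounterActor()
    senders = [
        threading.Thread(target=send_many, args=(actor, 1000)) for _ in range(4)
    ]
    for sender in senders:
        sender.start()
    for sender in senders:
        sender.join()
    return actor.stop()
```

Calling demo() lets several threads talk to the actor at once, yet the counter is only ever updated by the actor's own thread, which is why the code inside the actor needs no locks.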

The actor model is not new: it was first described in 1973 (Wikipedia), which makes it about as old as SQL.

Actors in .NET

I primarily work with C# and .NET, and I have found two frameworks built on the Microsoft stack that I think look promising: Azure Service Fabric and Akka.NET.

Why start with Akka.NET?

Akka.NET is a NuGet package that you can include in your existing projects and start using in a small corner. Azure Service Fabric is more like a platform that you can run your apps inside. Thus Azure Service Fabric requires more setup and configuration to get started. So to get your feet wet with actors in .NET I would recommend Akka.NET.

Akka.NET is a port of Akka from the JVM and it has evolved a lot over the last two years. The Akka.NET documentation is very good and full of code examples.

What is Machine Learning?

This post follows my previous post, Getting started with machine learning. In this post I will try to answer the question “What is Machine Learning?” from the perspective of a software developer. So, with some help from the book “An Introduction to Statistical Learning” and without too much math, I will try to define what machine learning is.

If your focus is on applying Machine Learning, it pretty much comes down to predicting things.

So in essence it boils down to a function that takes a set of input values and returns a value, the prediction. The mathematicians write it like this:

Y = f(X)

Where X is all the inputs, f is the function and Y is the predicted value. That’s all there is to it…

Let’s take a small example that most people can relate to. Let’s say we want to predict the wages of a number of people. To be able to make a reasonable prediction we need some information about the people, like their seniority, education and where they live. As a programmer, the challenge is to write a function that takes a set of inputs (i.e. a person’s seniority and education plus the zip and country code of where they live) and returns the person’s predicted yearly income. If you think back to your early days in school you might remember this function:

Y = a·X + b

This is the function for a straight line. Let us take this function as our very first and very naive attempt at a wage prediction. If X is the seniority (i.e. the number of years they have been working), then the challenge is to figure out the values of a and b. Figuring out a and b is done during the training step of setting up a prediction function with machine learning. I will get back to the training step in a later post. To get the function a little bit closer to our wage example, it would look like this:
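
To make the straight-line version concrete before moving on, here is a minimal sketch in Python. The function names and the sample numbers are invented for illustration. The fitting step uses ordinary least squares, which is one common way to find a and b and is only an assumption here, since the training step is left for a later post.

```python
def predict_wage(seniority, a, b):
    """Straight-line prediction: Y = a * X + b, where X is seniority in years."""
    return a * seniority + b


def fit_line(xs, ys):
    """Estimate a and b from known (X, Y) pairs using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # a = covariance(X, Y) / variance(X); b makes the line pass through the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Made-up training data: years of seniority and yearly wage.
years = [1, 3, 5, 10]
wages = [32000, 36000, 40000, 50000]

a, b = fit_line(years, wages)
prediction = predict_wage(7, a, b)  # predicted wage after 7 years
```

With this invented data the points lie exactly on a line, so the fit recovers it exactly. Real wage data would be scattered around the line, and the prediction would only be an estimate.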

Y = a·seniority + b·education + c·zip + d·country + e

In fact this type of function is a specific type known as “Linear Regression” in machine learning. As you might imagine there are many different types of functions, but more on that in a later post. First let us understand the function above a little better. If you were given the values of a, b, c, d and e, how would you write the function? The first part, “a times seniority”, is easy, but what about the next part: some value times education? For example, 2.23 times M.CS. doesn’t really make sense. Let us examine the data types of the input values.

What are the possible input values to a Machine Learning function?

From a computer science point of view there are 3 different types of input variables to a machine learning function: Numbers, Booleans and Enumerations. Numbers are types like int, long, decimal and float. Booleans are True/False or 1/0 as any programmer would expect.

Enumerations

Enumerations are fundamentally values of type enum to a C# or Java programmer. In reality these values are often of type string, but inside the prediction function they are treated as enums. In our wage example education, zip and country code are examples of this type. Exactly how an enum is used internally in a prediction function is not important from a computer science point of view.
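
For the curious, one common way to feed an enumeration into a linear function is to turn it into a set of 0/1 inputs, often called one-hot encoding. The Python sketch below assumes that technique; it is not necessarily what any particular library does internally. The education levels, the weights and the function names are all invented for illustration.

```python
# Hypothetical enumeration of education levels.
EDUCATION_LEVELS = ["HighSchool", "B.CS.", "M.CS."]


def one_hot(value, categories):
    """Turn an enum value into a list of 0/1 flags, one per category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


def predict(seniority, education, weights):
    """Linear prediction where the enum contributes one weight per category."""
    flags = one_hot(education, EDUCATION_LEVELS)
    education_part = sum(w * flag for w, flag in zip(weights["education"], flags))
    return weights["seniority"] * seniority + education_part + weights["intercept"]


# Made-up weights; in practice these come out of the training step.
WEIGHTS = {
    "seniority": 2000,
    "education": [0, 5000, 9000],  # one weight per entry in EDUCATION_LEVELS
    "intercept": 30000,
}

wage = predict(5, "M.CS.", WEIGHTS)  # 2000 * 5 + 9000 + 30000
```

Seen this way, “some value times education” becomes “one value per education level, multiplied by 1 if it applies and 0 if it does not”, which is ordinary arithmetic again.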

The fact that you can only use numbers, booleans and enumerations might seem like a limitation, but when you start working with it you’ll see that it is not that bad 🙂

The point of Machine Learning

The really interesting point about machine learning is that you don’t have to write the prediction function. There are already libraries with a number of different types of functions out there for you to use. The challenge when building a solution that uses Machine Learning is to select the right function and configure it in an optimal way.

In my next post I will describe a set of steps involved in building a Machine Learning solution.

Getting started with machine learning

Machine Learning has been on my list of things that I really want to learn more about for quite some time now. My ambition is to be able to build software solutions that leverage these techniques.

The Machine Learning toolset should be part of my “software development tool belt”. I don’t need to understand how the toolset was built. I just want to understand how to use it.

Too much math!?

I have tried reading blog posts and watching tutorials about it a couple of times. Most of them start out with a good intro, but before you know it they get into some pretty hardcore math. As an engineer, it looks like you need a PhD or a Master’s in statistical math to understand it.

Then I stumbled upon www.StatLearning.com and the book “An Introduction to Statistical Learning”. Statistical Learning is mathematician-speak for what computer science calls Machine Learning. This book is written for people who want to apply machine learning without understanding all the inner details. It sounds exactly like what I’m looking for :-).

But after reading chapter 2 of the book it was clear that it is after all written by mathematicians. This book however is the best read on the topic I have found so far, and it is fairly comprehensible to an engineer like me.

So in the following posts I will try to distil the book from a software development point of view. My goal is to describe how to get started with as little math as possible.

Read my next post: What is Machine Learning?

DRY Spaghetti vs. Copy/Paste code

I have never met a software developer who doesn’t know the DRY (Don’t Repeat Yourself) principle. It fits perfectly with the concept of code reuse that most of us learned about when we started learning how to write software. DRY is a simple principle that all the software developers I’ve worked with understand really well. We’ve all learned it to the extent where DRY code is equal to, or at least part of, what we consider to be good clean code. Yes, there are a lot of good things to say about DRY code.

But… yes, I’m sorry there is a but…

When you gain more experience and start working with larger systems and code bases, you realize that, as in almost every other aspect of life, the real world is seldom black or white but comes in many different shades of gray. DRY comes at a cost and there are tradeoffs to be made.
But why isn’t DRY code always the right thing? The short answer to that question is “Dependencies”. If you pursue a codebase that is completely DRY, you will take on more and more dependencies. If you go all in on DRY you are at high risk of developing a code base that resembles “Spaghetti code” with dependencies all over the place, where even small changes can cascade through large portions of your code.

DRY comes in shades of gray

If you have participated in architecting, building and/or maintaining a number of software solutions, you will most likely have learned that code with many dependencies is not a good thing. I think it was Frank Buschmann who once said something like “Being an architect is mostly about managing dependencies”. Managing a lot of dependencies sounds like a lot of work, so what do we try to do then? Well, I always try to reduce the number of dependencies in any way I can. Sometimes I even choose to break the DRY principle and copy some code into my project instead of taking a dependency on some other library. Yes, I admit it: I sometimes deliberately choose to break the DRY principle in favor of keeping the number of dependencies down.

The DRY Spaghetti vs Copy/Paste scale

I sometimes consider DRY and the number of dependencies like each end of a scale. On one end of the scale you have DRY code with a ton of dependencies aka “DRY Spaghetti” and on the other end you have no dependencies and a lot of duplicated code aka “Copy/Paste”. Choosing black, white or some shade of gray on this scale is not an easy choice, because it often feels like choosing between two evils, but being conscious about the fact that you are making this choice can help you build a better solution.

Getting DRY with no spaghetti, is it possible?

Of course the best solution would be to break the scale so you can have both. There are a lot of great tools out there, like design patterns and the SOLID principles, that can help you. These tools can help you avoid having to choose where you want to be on this scale, but they are not always enough. In fact, whenever you add a reference to an external library you are making a choice on the “DRY Spaghetti vs. Copy/Paste” scale.

What should you do?

There is always some future cost of adding a dependency on an external library, and with tools like NuGet, NPM and the like it has become almost too easy to add dependencies. So when you are about to add a reference to some external library, always ask yourself the question: How can I avoid adding a reference to this library? If there is a fairly simple way to avoid the dependency, then don’t add it. If it will take you a lot of time and/or money to avoid it, then add it, but be aware of the future cost of managing the dependency.
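
As a small illustration of the “fairly simple solution” case, here is a sketch in Python; the same tradeoff applies to a NuGet or NPM package. The helper name chunked is made up for this example. It stands in for the kind of small utility you might otherwise pull in a whole external library to get.

```python
def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items.

    A few lines owned by the project itself, instead of a dependency
    added for a single helper function.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


batches = chunked([1, 2, 3, 4, 5], 2)  # [[1, 2], [3, 4], [5]]
```

A helper this small costs little to own and test yourself. Once the code you would have to copy grows large or subtle (parsing, cryptography, protocols), the balance tips the other way and the dependency is usually worth its future cost.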