Optimise your Code! Multithreading in C#

Are you multithready ready?

What is it?

Multithreading is using separate threads to complete separate tasks.

If you go too far down the ‘what is a thread’ rabbit hole, it all gets a bit philosophical, so here's a basic definition: a thread is a sequence of instructions in your program that can run independently of the rest of it.

Your code typically runs on the main thread. The main thread will take your code one line at a time, and execute each statement in turn (disclaimer – not true if you’re coding async styley, but that’s another blog). When one bit finishes, the next bit starts. Multithreading allows your program to execute multiple bits of your code at the same time.

When should you use it?

For sloth code!

You should use it when you’ve got a particularly slow piece of code. This could be a hefty calculation, a big database query, or a call to an API whose latency you have no idea about. If any of your code runs slowly at any point (or even has the potential to), run it away from the main thread.

I use it all the time when I’ve got a big data pull (or a badly written stored procedure that runs like treacle). Say you’re running an application that displays all of a company’s 500 employees (yeah, that’s not huge, but depending on your connection, it can be enough to make things hang for longer than your user would like). In a single-threaded application, you open the employee screen and the whole application freezes for a couple of seconds while the data is pulled in. In a multithreaded world, the screen opens in a fraction of a second and lets you click around, while a thread in the background does the data pull. Fair enough, you don’t see any employees in the employee dropdown until the worker thread has finished, but at least the rest of the app isn’t frozen while you wait.

Can I have an example plz?

We'll use the example above: we're pulling in employees from a database, but we've written the SQL query really badly and there are no primary keys or indexes on the tables, so it's going to be a dead slow query. We'd like the rest of the program to carry on while the employees are loading.

Start by creating the thread. I'll take this opportunity to tell you not to call it 'myThread' or 'thread1'. Shall we all agree to help the people maintaining our code and give it a name that explains why it's being used? Cool.

You've got a thread, but it needs to know what to do, so we need to pass it a function. Notice that we pass the function in without parentheses. Including the parentheses calls the function immediately, on the current thread. We don't want that; we want it to run when we start the thread, so we pass it as a callback (in C# terms, a delegate) by leaving out the parentheses.

Then, we need to start the thread. As soon as you start the thread, the function you passed in starts to run.
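Here's a minimal sketch of those three steps (create, pass the function, start). The EmployeeScreen class, the LoadEmployees method and the employee names are all hypothetical, and Thread.Sleep stands in for the dead slow query, so swap in your own data access code.

```csharp
using System.Collections.Generic;
using System.Threading;

public class EmployeeScreen
{
    // Filled in by the background thread once the (hypothetical) query finishes.
    public IReadOnlyList<string> Employees { get; private set; } = new List<string>();

    // Kept as a property so callers can check on the load or wait for it.
    public Thread LoadEmployeesThread { get; private set; }

    public void Open()
    {
        // 1. Create the thread and give it a name that says why it exists.
        // 2. Pass LoadEmployees WITHOUT parentheses: the method is handed over
        //    as a delegate (a callback) rather than being called here.
        //    new Thread(LoadEmployees()) would call it right now, on this thread
        //    (and, since it returns void, wouldn't even compile).
        LoadEmployeesThread = new Thread(LoadEmployees)
        {
            Name = "LoadEmployeesThread", // shows up in the debugger's Threads window
            IsBackground = true           // don't keep the process alive just for this load
        };

        // 3. Start it. LoadEmployees now runs on the new thread,
        //    and Open() returns straight away.
        LoadEmployeesThread.Start();
    }

    private void LoadEmployees()
    {
        Thread.Sleep(2000); // stand-in for the dead slow query

        // Build the list, then publish it in a single assignment.
        Employees = new List<string> { "Ada", "Grace", "Margaret" };
    }
}
```

Calling new EmployeeScreen().Open() returns immediately, and the list fills in about two seconds later. If LoadEmployees were filling a real WinForms or WPF dropdown, that update would need to be handed back to the UI thread (for example with Control.Invoke or Dispatcher.Invoke), because UI controls generally can't be touched from a worker thread.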

BUT! What if we only want employees from a certain department? You pass arguments to a function using parentheses, but we've got no parentheses when we pass the function to the thread...

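For this, use a lambda expression. A lambda wraps your function call, arguments and all, inside a new function that takes no arguments, which is exactly what the thread wants. You don't really need to know the details (but go and look them up anyway); just use the pattern and swap in your own function name and parameters.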
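A minimal sketch of the lambda pattern, again with hypothetical names and made-up data. The lambda () => LoadEmployees(department) takes no arguments, so the Thread constructor accepts it, and when the thread runs it, it calls LoadEmployees with the department you passed in.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public class DepartmentEmployeeScreen
{
    // Hypothetical data standing in for the database.
    private static readonly (string Name, string Department)[] AllEmployees =
    {
        ("Ada", "Engineering"),
        ("Grace", "Engineering"),
        ("Margaret", "Finance")
    };

    public IReadOnlyList<string> Employees { get; private set; } = new List<string>();
    public Thread LoadEmployeesThread { get; private set; }

    public void Open(string department)
    {
        // The lambda itself has no parameters, so it fits Thread's constructor.
        // Inside it we DO use parentheses, but that call only happens
        // when the new thread runs the lambda.
        LoadEmployeesThread = new Thread(() => LoadEmployees(department))
        {
            Name = "LoadEmployeesByDepartmentThread",
            IsBackground = true
        };
        LoadEmployeesThread.Start();
    }

    private void LoadEmployees(string department)
    {
        Thread.Sleep(1000); // stand-in for the slow query

        Employees = AllEmployees
            .Where(e => e.Department == department)
            .Select(e => e.Name)
            .ToList();
    }
}
```

One gotcha: a lambda captures the variable, not its value at that moment, so if you create threads inside a for loop, copy the loop counter into a local variable first and pass that in.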

And that's basically it.

That's a lie. There's quite a lot more to it. There's the thread pool, managing threads using interrupt, wait, join and all sorts of other stuff (you'll also come across suspend and resume, but those are obsolete in .NET, so steer clear). This will get you started experimenting with threads, though. I'll add the rest to my list of blogs to write.

Downsides

It can be tempting to put everything into separate threads to squeeze every last drop of performance out of your code. This can lead to convoluted code that's difficult to maintain. You could also end up with parts of your code throwing errors because they rely on the result of a function that's running on a separate thread and hasn't finished yet. It also makes debugging harder.
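One way to stop code reading a result before it exists is Thread.Join, which blocks the calling thread until the other thread has finished. Here's a hypothetical sketch (SalaryReport and its numbers are made up). Bear in mind that calling Join on your UI thread freezes the UI again, so only wait where waiting is acceptable.

```csharp
using System.Threading;

public static class SalaryReport
{
    // Hypothetical slow calculation running on a worker thread.
    public static decimal BuildTotal()
    {
        decimal total = 0m;

        var calculateTotalThread = new Thread(() =>
        {
            Thread.Sleep(1000);      // stand-in for the hefty calculation
            total = 30000m + 45000m; // result written by the worker thread
        })
        { Name = "CalculateTotalThread" };

        calculateTotalThread.Start();

        // Reading total right here could still give 0: the worker may not have finished.
        // Join blocks this thread until the worker is done, so the result is safe to read.
        calculateTotalThread.Join();

        return total;
    }
}
```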

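You can also end up with race conditions. If two threads read and modify the same data at the same time, updates can get lost and you end up with weird results. The usual fix is a lock, but lock carelessly and you can end up with deadlocks, where two threads each wait forever for a lock the other one is holding.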
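Here's a small sketch of a race condition, using a made-up shared counter rather than anything from a real app. count++ looks like one step but is really read, add, write, so two threads can read the same value and one of the updates gets lost. Wrapping the increment in a lock means only one thread can do it at a time.

```csharp
using System;
using System.Threading;

public static class RaceConditionDemo
{
    // Two threads each add 1 to a shared counter this many times.
    private const int IncrementsPerThread = 1_000_000;

    public static int CountWithoutLock()
    {
        int count = 0;
        RunOnTwoThreads(() => count++); // read-modify-write: updates can be lost
        return count;                    // can come out below 2,000,000
    }

    public static int CountWithLock()
    {
        int count = 0;
        var countLock = new object();
        RunOnTwoThreads(() =>
        {
            lock (countLock) // only one thread at a time gets in here
            {
                count++;
            }
        });
        return count; // always 2,000,000
    }

    private static void RunOnTwoThreads(Action increment)
    {
        ThreadStart work = () =>
        {
            for (int i = 0; i < IncrementsPerThread; i++) increment();
        };

        var first = new Thread(work);
        var second = new Thread(work);
        first.Start();
        second.Start();

        // Wait for both threads before anyone reads the counter.
        first.Join();
        second.Join();
    }
}
```

For a simple counter like this, Interlocked.Increment(ref count) does the same job without an explicit lock.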

Others might argue against this, but I'd only use threads when latency is genuinely bad and you really need them. A lot of the time, plain synchronous code is fine.

Is this just async but in a different outfit?

Same purpose, different methods.

Multithreading creates a new thread that works solely on the given task until it's complete. Async can juggle multiple tasks on a single thread. It breaks the tasks into smaller chunks (in C#, the code between each await), and whenever one task is sitting idle, say waiting on a database or an API, the thread picks up a chunk from another task.

Here's an example:

You're looking at a website that shows all the books you've read in the last year. You click on a book and expect to see a list of general details along with your review and rating.

If we go down Multithreading Avenue, you've got one thread that connects to the database, runs the query and returns your review and rating. Simultaneously, another thread is out connecting to the Google Books API, retrieving the cover image, blurb and publishing details. When the database thread finishes, the information populates. When the API thread finishes, more information populates.
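A rough sketch of Multithreading Avenue, with Thread.Sleep standing in for the database query and the Google Books API call (BookPage and its properties are hypothetical). Because the two threads run side by side, the whole load takes roughly as long as the slower task rather than the sum of both.

```csharp
using System.Threading;

public class BookPage
{
    // Hypothetical results; a real app would get these from the database and the API.
    public string Review { get; private set; }
    public string Blurb { get; private set; }

    public void Load()
    {
        var loadReviewThread = new Thread(() =>
        {
            Thread.Sleep(800); // stand-in for the database query
            Review = "5 stars. Couldn't put it down.";
        })
        { Name = "LoadReviewThread" };

        var loadBookDetailsThread = new Thread(() =>
        {
            Thread.Sleep(500); // stand-in for the API call
            Blurb = "Placeholder blurb from the API";
        })
        { Name = "LoadBookDetailsThread" };

        // Both threads run at the same time, each on its own task.
        loadReviewThread.Start();
        loadBookDetailsThread.Start();

        // This sketch simply waits for both; a real UI would populate
        // each section as its thread finishes.
        loadReviewThread.Join();
        loadBookDetailsThread.Join();
    }
}
```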

Across town on Async Boulevard, there is one thread (or none, if you dive into the fun async stuff floating around in the internet ether). For a few milliseconds, this thread works on the database task, creating the connection to the database. The database is a bit laggy today, so while it waits for the connection to establish, it fires off the call to the API. Some data comes back from the API, but by now the database connection is ready, so it switches over and runs the database query. While the query is running, it returns to the API function, parses the results of the API call and displays them. Once that’s done, it switches back to the database call, gets the query results, parses them and displays them. Switchy switchy, job done.
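And a matching sketch of Async Boulevard, with Task.Delay standing in for an async database query and API call (names hypothetical again). While each method is awaiting, no thread sits blocked waiting for it; the rest of the method resumes once the result is ready. In a WinForms or WPF app those resumptions come back to the UI thread by default, while in a console app they run on thread pool threads.

```csharp
using System.Threading.Tasks;

public class AsyncBookPage
{
    public string Review { get; private set; }
    public string Blurb { get; private set; }

    public async Task LoadAsync()
    {
        // Kick off both tasks; each one runs until its first await, then hands control back.
        Task reviewTask = LoadReviewAsync();
        Task detailsTask = LoadBookDetailsAsync();

        // Asynchronously wait for both to finish, without blocking a thread.
        await Task.WhenAll(reviewTask, detailsTask);
    }

    private async Task LoadReviewAsync()
    {
        await Task.Delay(800); // stand-in for an async database query
        Review = "5 stars. Couldn't put it down.";
    }

    private async Task LoadBookDetailsAsync()
    {
        await Task.Delay(500); // stand-in for an async API call
        Blurb = "Placeholder blurb from the API";
    }
}
```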

Both work.

What are your thoughts? When would you use multithreading? Comment below!