Page 1 of 2 12 LastLast
Results 1 to 10 of 18

Thread: Parallel Programming in C#

  1. #1
    Join Date
    Sep 2004
    Posts
    1,905
    Rep Power
    21

    Default Parallel Programming in C#

    My thoughts:

    Normally when I do a foreach loop, I would not be aware of many cores being available, and so I would write:


    Code:
        foreach (int i in intArray)
        {
            dosomething(i);
        }
    If "dosomething" takes a very long time, and the "intArray" is small, then it would be better to write:

    Code:
        Parallel.ForEach (intArray, i=>dosomething(i));
    If "dosomething" takes a moderate time, and the "intArray" is very large, then it would be better to write:
    Code:
        int c = intArray.Length;
    
        //divide array into 4 parts, so at least a quad core can take advantage.
        //do not want it to divide too much, depends on how large array is.
        //if it is really large, then divide array by 16 or more
    
        int part1 = 0;
        int part2 = (c / 4) * 1;
        int part3 = (c / 4) * 2;
        int part4 = (c / 4) * 3;
        
        //this would be created in a thread, maybe using ParameterizedThreadStart
        //I can update it later
        //thread 1 for simplicity
        for (int i = part1; i < part2; i++)
        {
            dosomething(i);
        }
    
        //for Simplicity :) thread 2 below would use Start method, threading code omitted
        //thread 2
        for (    i = part2; i < part3; i++)
        {
            dosomething(i);
        }
    
        //For simplicity :)
        //thread 3
        for (    i = part3; i < part4; i++)
        {
            dosomething(i);
        }
    
        //thread 4
        for (    i = part4; i < c; i++)
        {
            dosomething(i);
        }
    It would be nice to add "various size - array dividing" snippets into Visual studio.

    This code should run faster when it runs on a processor with many cores.


    There are other parallel classes in .Net 4.5 like ConcurrentDictionary, but I think that would give you a similar performance like Parallel.ForEach where each thread on each element, I think. (It depends, I also think, you may not benefit from many threads on very small data. Well actually yes, but having fewer threads with larger data may be better. We could bench it and find out)

    What are your thoughts?

    What other parallel code could we use, and for what situations?
    Let's act on what we agree on now, and argue later on what we don't.
    Black men leave Barbeque alone if Barbeque don't trouble you

  2. #2
    Join Date
    Feb 2003
    Posts
    3,184
    Rep Power
    0

    Default

    What are your thoughts?
    You are creating a more complex program in order to utitilize the cores. Better to just make the program faster than to make it more complex. there are instances where you want complexity but its usually in the later stages of development. i.e. dont start complex.

    What other parallel code could we use, and for what situations?
    Mostly movies, videos and data processing; databases and search. Going parrallell really works best when you have nothing shared between the processes. Here is what you can try; get 2 or 3 computers on a network. then have a 4th computer send a question to the other 3 computers. since the 3 computers will have separate files, RAM, harddrives and cpus you would technically get a faster answer than if you ran the same question on one computer. The more computers you add to the network the faster you will get answers. The problem you will have is figuring out how to share the data between the computers and arrange all the answers you will be getting back.

  3. #3
    Join Date
    Sep 2004
    Posts
    1,905
    Rep Power
    21

    Default

    Quote Originally Posted by owen View Post
    What are your thoughts?
    You are creating a more complex program in order to utilize the cores. Better to just make the program faster than to make it more complex. there are instances where you want complexity but its usually in the later stages of development. i.e. don't start complex.
    Interesting.

    Get the program done first.

    Then add optimizations.

    Makes sense.
    Let's act on what we agree on now, and argue later on what we don't.
    Black men leave Barbeque alone if Barbeque don't trouble you

  4. #4
    Join Date
    Sep 2004
    Posts
    1,905
    Rep Power
    21

    Default

    I was thinking about the underlying mechanism for the Parallel.ForEach

    If the array, is very large, like 100,000, and DoSomething takes a long time, like 30 secs, then would it be wisest to initialize so many threads at once?!

    Consider how the OS handles threads. It would give each thread a time slice, and CPU execution would "switch" amongst the threads. But if too many threads are running, then the thread data may not hold in the nearest cache, or even the L3 cache, and this could reduce performance. Some throttling of the amount of threads being initialize could be done and limit the total threads initialized, until existing threads complete. Another approach would be to break down the DoSomething task itself into smaller threads that could finish in the time slice that the OS gives it.

    The simplest way that I can think of at this time would be to use:

    Code:
    //break array into say 64 parts
    
    //use a 'master' thread to issue Parallel.ForEach on the first part of the array, starting 1562 'helper' threads (100,000/64)
    
    //simultaneously use a next 'master' thread to issue Parallel.ForEach on the second part of the array, another 1562 threads issued
    
    //As soon as any of the 'master' thread completes and terminates, another 'master' thread would be waiting to start
    // and to issue Parallel.ForEach on subsequent parts of the array.
    
    //How ever, I would have to bench an implementation like this to see if there is any performance benefits.
    // At any time, there should be about 3 thousand threads issued by the app as opposed to 100,000.
    // However, I do not know if some built in mechanism already exists in Windows to limit the maximum amount of threads that can be created.
    Optimizing code does seem to add much more complexity. It even depends a bit on the target processor, for example, the numbers that I chose were just arbitrary, but I had in mind a quad core or 8 core target processor.

    Could one day the compiler, automatically optimize code at the thread level.
    Let's act on what we agree on now, and argue later on what we don't.
    Black men leave Barbeque alone if Barbeque don't trouble you

  5. #5
    Join Date
    Feb 2003
    Posts
    3,184
    Rep Power
    0

    Default

    Just write the program and do the tests. It's the best way to see for yourself what works and what doesn't. Copy the program to different computers and see how fast it does a task. At the end of the day everything is bound by physics.

  6. #6
    Join Date
    Oct 2005
    Posts
    745
    Rep Power
    0

    Default rofl

    Quote Originally Posted by owen View Post
    Just write the program and do the tests. It's the best way to see for yourself what works and what doesn't. Copy the program to different computers and see how fast it does a task. At the end of the day everything is bound by physics.
    Oooooh, so you are THE OWEn!
    3.14159265358979323846264338327950288
    4197169399375105820974944592307816406
    28620899862803482534211706798 pi 101

  7. #7
    Join Date
    Sep 2004
    Posts
    1,905
    Rep Power
    21

    Default

    I am trying to code this.

    I started to make some modifications to the task so that it will return a value instead of task with no return.
    I am trying to change this
    Code:
        Parallel.ForEach (intArray, i=>dosomething(i));
    into something that will return a value, somehow.



    The Parallel.ForEach has a few overload methods.

    • I found out that one parameter, ParallelOptions.MaxDegreeOfParallelism can be use to limit the amount of threads created. How cool!


    • Furthermore, I found that this same Parallel class already partitions the array (using the automatic default implementation in the class) or it can use a custom manual implementation, eg Partitioner.Create i.e See below.

      https://docs.microsoft.com/en-us/dot...ocal-variables

      https://opbuildstorageprod.blob.core...rogramming.pdf


      When a parallel loop runs, the TPL (Task Parallel Library) partitions the data source so that the loop can operate on multiple parts
      concurrently. Behind the scenes, the Task Scheduler partitions the task based on system resources and workload.
      When possible, the scheduler redistributes work among multiple threads and processors if the workload becomes
      unbalanced.
      The TPL partitioners also support a dynamic number of partitions.This means they can create partitions on-the fly,
      for example, when the ForEach loop spawns a new task. This feature enables the partitioner to scale together
      with the loop itself. Dynamic partitioners are also inherently load-balancing.When you create a custom
      partitioner, you must support dynamic partitioning to be consumable from a ForEach loop.
    • Also, there is an overload of Parallel.ForEach that can return a value

      https://docs.microsoft.com/en-us/dot...ocal-variables

      Code:
      using System;
       using System.Linq;
       using System.Threading;
       using System.Threading.Tasks;
      
       class Test
       {
           static void Main()
           {
               int[] nums = Enumerable.Range(0, 1000000).ToArray();
               long total = 0;
      
               // First type parameter is the type of the source elements
               // Second type parameter is the type of the thread-local variable (partition subtotal)
               Parallel.ForEach<int, long>(nums, // source collection
                                           () => 0, // method to initialize the local variable
                                           (j, loop, subtotal) => // method invoked by the loop on each iteration
                                           {
                                               subtotal += j; //modify local variable
                                               return subtotal; // value to be passed to next iteration
                                           },
                   // Method to be executed when each partition has completed.
                   // finalResult is the final value of subtotal for a particular partition.
                                           (finalResult) => Interlocked.Add(ref total, finalResult)
                                           );
      
               Console.WriteLine("The total from Parallel.ForEach is {0:N0}", total);
          }
      }
      // The example displays the following output:
      //        The total from Parallel.ForEach is 499,999,500,000



    However, I have NOT figured out how to use the msdn example to fit my return value as yet.

    Doing some research.
    Last edited by crosswire; Jul 23, 2017 at 04:49 PM.
    Let's act on what we agree on now, and argue later on what we don't.
    Black men leave Barbeque alone if Barbeque don't trouble you

  8. #8
    Join Date
    Sep 2004
    Posts
    1,905
    Rep Power
    21

    Default

    This is what I have:


    Code:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    
    namespace ThreadFeeder
    {
        class WorkLoad
        {
            struct WorkLoadData
            {
                public ulong i1;
                public ulong i2;
            }
    
            //The objective is to find a count of all prime numbers between two given numbers
            //64 MB Data
            const uint size = 1024 * 1024 * 8;
            // long[] intArray1 = new long[size];
            WorkLoadData[] arrWorkLoadData = new WorkLoadData[size];
            uint[] uintResults = new uint[size];
    
            void Initialize()
            {
                //load intArray;
            }
    
    
            void Method1()
            {
                //Simpliest way using one thread
    
                for (uint i = 0; i < size; i++)
                {
                    uintResults[i] = dosomething(arrWorkLoadData[i]);
                }
            }
    
            void Method2()
            {
                ParallelOptions opt = new ParallelOptions();
                opt.MaxDegreeOfParallelism = 8;
    
                Parallel.ForEach(arrWorkLoadData, opt, i => dosomething(i));
            }
    
    
    
            uint dosomething(WorkLoadData w)
            {
                return ThreadFeeder.PrimeHelper.CountAllPrimesBetweenTwoNums(w.i1, w.i2);
            }
        }
    }
    I want
    Code:
    Parallel.ForEach(arrWorkLoadData, opt, i => dosomething(i));
    to return data into
    Code:
    uintResults
    Let's act on what we agree on now, and argue later on what we don't.
    Black men leave Barbeque alone if Barbeque don't trouble you

  9. #9
    Join Date
    Sep 2004
    Posts
    1,905
    Rep Power
    21

    Default

    Well. I found a solution. Don't know if it's the best. Plus, it still has issues.


    I tested the code as a WinApp using VS Express 2013 for Win Store App. My VS for desktop is on a different machine.






    I ran the code and just basically tested Parallel.ForEach as it seem to be able to cover other conditions by just configuring its parameter(s).

    As, expected, multi-threaded method (Method 2) was faster than single thread (Method 1).

    Method 2 in the pic uses the parallel approach, and the max degree of parallelism was configured at 1, 2, 4, 8, 16, 64, 256. The times are posted beside the button for the different degree of parallelism.

    Method 1 uses a normal for loop and is thus only 1 thread.

    Note that the thread data is small for 256 elements in the array and should not overload the cache size of the processor which was a dual core i5 4200U cpu with hyper-thread (= 4 logical core) with 3MB L3 cache.

    This was tested on my work laptop, but I plan to test it on my personal laptop that has a quad core i7-6700HQ HP Signature Edition. Also my work laptop is running many threads in the background and is not really suited for testing. See pic of task manager when the app is NOT running. When the app is running on single thread there was a ~16% utilization increase on all logical cores. Since so many other threads are running, it probably drowned out the one single thread so that there was a low cpu utilization. However, I will see how it behaves on another machine.

    Afterwards, I would like to change the algorithm from "count all primes between two given numbers" to "get the single maximum product (of all the elements in the array, for each data element) the first prime multiplied by the last prime (if present) between the two given numbers. This result would not be an array but a single scalar max. This would be more interesting to code. Any suggestions?

    The code is below
    Let's act on what we agree on now, and argue later on what we don't.
    Black men leave Barbeque alone if Barbeque don't trouble you

  10. #10
    Join Date
    Sep 2004
    Posts
    1,905
    Rep Power
    21

    Default

    WorkLoad.cs

    Code:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    
    namespace ThreadFeeder
    {
        class WorkLoad
        {
            struct WorkLoadData
            {
                public ushort i1;
                public ushort i2;
                public uint iResult;
            }
    
            //The objective is to find a count of all prime numbers between two given numbers
    
            //64 M items is way too big for this slow algorithm
            //const uint size = 8;
            public const uint size = 256;
            //const uint size = 1024 * 1024 * 64;
    
            // long[] intArray1 = new long[size];
            WorkLoadData[] arrWorkLoadData = new WorkLoadData[size];
    
            public int LevelOfParallelism = 16; //chaged by UI
    
            public void Initialize()
            {
                //load arrWorkData
    
                Random r = new Random(System.DateTime.Now.Second);
    
                byte[] b = new byte[sizeof(ushort)];
    
                for (uint i = 0; i < size; i++)
                {
    
                    r.NextBytes(b);
    
                    arrWorkLoadData[i].i1 = BitConverter.ToUInt16(b, 0);
                    //arrWorkLoadData[i].i1 = BitConverter.ToUInt64(b, 0);
    
                    r.NextBytes(b);
                    arrWorkLoadData[i].i2 = BitConverter.ToUInt16(b, 0);
                    //arrWorkLoadData[i].i2 = BitConverter.ToUInt64(b, 0);
    
                }
            }
    
    
            public void Method1()
            {
                //Simpliest way using one thread
    
                for (uint i = 0; i < size; i++)
                {
                    dosomething(ref arrWorkLoadData[i]);
                }
            }
    
    
            public void Method2()
            {
                ParallelOptions opt = new ParallelOptions();
                opt.MaxDegreeOfParallelism = LevelOfParallelism;
    
                Parallel.ForEach(arrWorkLoadData, opt, i => dosomething(ref i));
            }
    
    
            void dosomething(ref WorkLoadData w)
            {
                w.iResult = ThreadFeeder.PrimeHelper.CountAllPrimesBetweenTwoNums(w.i1, w.i2);
            }
        }
    }
    Let's act on what we agree on now, and argue later on what we don't.
    Black men leave Barbeque alone if Barbeque don't trouble you

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •