
EF Core gets slow in long iteration loops #9227


adamphones created

Hi,

We would like to import data into the system from a CSV file. We created a process that reads the CSV file row by row, creates the necessary entities and sub-entities via the repository pattern, and saves changes by calling await CurrentUnitOfWork.SaveChangesAsync(); or await _productRepository.InsertAndGetIdAsync(product);

However, the process runs very fast for the first 10-20 iterations, then gets slower and slower, and it takes over 10 minutes to complete around 150 rows!

I looked this up on the internet and found this post where Rick Strahl explains the problem: https://weblog.west-wind.com/posts/2014/dec/21/gotcha-entity-framework-gets-slow-in-long-iteration-loops.
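For context, the slowdown described in that post can be reproduced outside any framework: each SaveChanges scans every entity the context is tracking, so a loop that never clears the tracker degrades as it runs. A minimal sketch of the usual plain EF Core mitigation, assuming EF Core 5.0+ (where ChangeTracker.Clear() exists); MyDbContext, Products, and the batch size of 100 are hypothetical placeholders:

```csharp
// Hedged sketch, not the framework's API: periodically clear the change
// tracker in a long import loop so tracked-entity scans stay cheap.
public static async Task ImportAsync(MyDbContext context, IEnumerable<Product> rows)
{
    var count = 0;
    foreach (var row in rows)
    {
        context.Products.Add(row);
        await context.SaveChangesAsync();

        // Without this, every SaveChangesAsync re-scans a growing set of
        // tracked entities and the loop gets slower and slower.
        if (++count % 100 == 0)
        {
            context.ChangeTracker.Clear(); // requires EF Core 5.0+
        }
    }
}
```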

What I tried:

1. Called AsNoTracking() on all repository access, but the process is still very slow.

2. Wrapped each iteration in using (var unitOfWork = _unitOfWorkManager.Begin()) {...}, thinking this would refresh the context on each iteration as the post suggests, but no luck with this either.

Do you have any idea how to overcome this issue? How do you import a large number of users into the system?

Regards,


5 Answer(s)
  • maliming created
    Support Team

Hi,

    Can you share some code?

  • adamphones created

    Hi,

    I knew you would ask that.. :)

It has nothing to do with the code, but I will put an example here.

    public async Task ImportData()
    {
        var productsToImport = await anotherDataSourceRepository.GetAll().AsNoTracking().ToListAsync();

        foreach (var productImport in productsToImport)
        {
            // FirstOrDefaultAsync must be awaited; without await,
            // existingProduct is a Task and the null check is always false.
            var existingProduct = await _productRepository.GetAll().AsNoTracking()
                .FirstOrDefaultAsync(x => x.Name == productImport.Name);

            if (existingProduct == null)
            {
                var product = new Product { Name = productImport.Name };
                await _productRepository.InsertAsync(product);
                await CurrentUnitOfWork.SaveChangesAsync();
            }
        }
    }
    

If I have 10 products, the import finishes in less than a second (instantly). If I have 100 products, it takes over 3 minutes! It doesn't add up. Clearly the DbContext is getting bloated with tracked records, as the post suggests, and it gets slower and slower.

As the post suggests, I would need to re-create the DbContext inside the iteration instead of using a single context.

But how do I do that?

The key is to import the data as fast as possible.

Is there a way to create a new DbContext in the loop? Something like: var productRepository = new DbContext()...

    Regards,

  • musa.demir created

    Hi @adamphones

Running SQL queries one by one in a loop is not a good approach. You can query everything you need at once and then work with it in memory. For example, you could change your code as shown below:

    public async Task ImportData()
    {
        var productsToImport = await anotherDataSourceRepository.GetAll().AsNoTracking().ToListAsync();
        var productImportNames = productsToImport.Select(x => x.Name).ToList();

        // Get all existing product names at once (a HashSet<string>,
        // so membership tests use Contains, not ContainsKey)
        var existingProductNames = (await _productRepository.GetAll()
            .Where(product => productImportNames.Contains(product.Name))
            .Select(x => x.Name)
            .ToListAsync())
            .ToHashSet();

        foreach (var productImport in productsToImport)
        {
            if (!existingProductNames.Contains(productImport.Name))
            {
                var product = new Product { Name = productImport.Name };
                await _productRepository.InsertAsync(product);
                await CurrentUnitOfWork.SaveChangesAsync();
            }
        }
    }
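A further variant of the same idea, sketched under the same assumptions as the thread's code (ABP-style repository and CurrentUnitOfWork members): since the inserts are independent, the unit of work can also flush them all in one SaveChangesAsync at the end, turning N database round trips into one. The set of existing names is built locally so the snippet stands alone:

```csharp
// Hedged sketch: one SaveChanges for the whole batch instead of one per insert.
var existingNames = (await _productRepository.GetAll()
    .Select(p => p.Name)
    .ToListAsync())
    .ToHashSet();

foreach (var productImport in productsToImport)
{
    if (!existingNames.Contains(productImport.Name))
    {
        await _productRepository.InsertAsync(new Product { Name = productImport.Name });
    }
}

await CurrentUnitOfWork.SaveChangesAsync(); // flush all inserts at once
```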
    
  • adamphones created

    Hi Musa,

I appreciate your help. I am reading the info from a CSV file with over 10,000 records. I assume you wouldn't want me to load all the data into memory and process it there? The process needs to do the job row by row: read > check > write and commit. That is the requirement.

In your example, if I had entries repeated in the source file, I would have to de-duplicate those too. Sometimes the product name in a row is the same as an earlier row; in that case I don't need to create a new product, but I do need to use the other columns in that row to add new child entities (ProductItems) to the product.
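The duplicate-name case described above can be handled in memory by grouping rows before touching the database. A hedged sketch; csvRows, the ProductItem shape, and the GetOrCreateProductAsync helper are all hypothetical names, not the thread's actual code:

```csharp
// Group CSV rows by product name: one Product per distinct name, while
// every row still contributes a child ProductItem.
var rowsByName = csvRows.GroupBy(r => r.ProductName);

foreach (var group in rowsByName)
{
    var product = await GetOrCreateProductAsync(group.Key); // hypothetical helper

    foreach (var row in group)
    {
        // Map the remaining CSV columns onto a child entity.
        product.ProductItems.Add(new ProductItem { Quantity = row.Quantity });
    }
}
```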

Can you please let me know how I would create a new context inside the method? That is all I need to know.

    Thanks

  • musa.demir created

If that is what you need, you should create a new unit of work for each row and complete it at the end of the insert.

    [UnitOfWork(IsDisabled = true)] // prevent automatic unit-of-work creation in the app service
    public async Task ImportData()
    {
        //...
        foreach (var productImport in productsToImport)
        {
            using (var uow = _unitOfWorkManager.Begin())
            {
                // do your logic
                await uow.CompleteAsync();
            }
        }
    }
    

It will create and commit a separate transaction for each row.

    You can also check that: https://github.com/aspnetzero/aspnet-zero-core/pull/3314
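Putting the thread together, a fuller sketch of this approach, assuming the ABP-style IUnitOfWorkManager and repository members used in the earlier posts (ProductImportRow is a hypothetical input shape; details may differ in your app):

```csharp
[UnitOfWork(IsDisabled = true)] // no ambient unit of work for the whole import
public async Task ImportData(List<ProductImportRow> rows)
{
    foreach (var row in rows)
    {
        // Short-lived unit of work per row: the tracked entity set is
        // discarded after each commit, so the loop does not slow down.
        using (var uow = _unitOfWorkManager.Begin())
        {
            var existing = await _productRepository.GetAll()
                .FirstOrDefaultAsync(p => p.Name == row.Name);

            if (existing == null)
            {
                await _productRepository.InsertAsync(new Product { Name = row.Name });
            }

            await uow.CompleteAsync(); // commit this row's transaction
        }
    }
}
```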