What Is the Fastest Way To Do a Bulk Insert? Let’s Find Out

Milan Jovanović

1 month ago

10,140 views


Comments:

@Ivang017
@Ivang017 - 30.05.2024 07:46

Hey Milan, any discount incoming for The Ultimate Modular Monolith Blueprint course? I bought your Clean Architecture course and I loved it. Just wondering if there's a sale or discount coming soon for the Modular Monolith course. Thanks

@ExtremeTeddy
@ExtremeTeddy - 29.05.2024 23:01

All the methods shown are slow compared to "LOAD DATA INFILE". Whenever possible, use LOAD DATA for large data imports; it will load gigabytes of data within seconds. One of the best approaches in my experience is to create a temporary table for the data source, run the LOAD DATA command, and then perform the inserts into the entity tables on the database server.
The only issue/drawback can be the network connection when loading large datasets.
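The staged-load approach described above can be sketched like this (MySQL syntax; the table names, file path, and columns are made up for illustration):

```sql
-- Hypothetical sketch: stage the file into a temporary table first,
-- then do the entity-table inserts server-side.
CREATE TEMPORARY TABLE staging_users LIKE users;

LOAD DATA LOCAL INFILE '/tmp/users.csv'
INTO TABLE staging_users
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(email, date_of_birth);

INSERT INTO users (email, date_of_birth)
SELECT email, date_of_birth
FROM staging_users;
```

On SQL Server the closest equivalent is the BULK INSERT statement; the staging-table pattern is the same.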

@pilotboba
@pilotboba - 29.05.2024 21:44

It also looks like there is a library called Dapper Plus that has a bulk insert feature as well. It's also a commercial, paid library.

@pilotboba
@pilotboba - 29.05.2024 21:35

A few things.

You never adjusted the batch size for EF Core. It's possible to speed up inserts by increasing the batch size; I think by default it is 100.
Bulk copy also has a way to set the batch size. By default I believe it is set to 0, which means all rows, but it's recommended to set it.

Bulk copy does a non-transacted insert by default, so if there is an issue there is no way to roll it back. There is an option to have it use a transaction, but I assume that will slow it down a bit.

I'm curious whether the speeds would be closer if you matched the bulk copy and EF Core batch size settings and enabled internal transactions in bulk copy.

I'm not sure, but did your code create the collection each time? Perhaps to remove that overhead you could create the user collection in the constructor?
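A rough sketch of the two settings being discussed (the connection string, table name, and the 10,000 figure are placeholders, not values from the video):

```csharp
// EF Core: raise the insert batch size (the default is provider-determined).
services.AddDbContext<AppDbContext>(options =>
    options.UseSqlServer(connectionString,
        sqlOptions => sqlOptions.MaxBatchSize(10_000)));

// SqlBulkCopy: set BatchSize (default 0 = everything in one batch)
// and opt in to an internal transaction per batch.
using var bulk = new SqlBulkCopy(
    connectionString,
    SqlBulkCopyOptions.UseInternalTransaction)
{
    DestinationTableName = "dbo.Users",
    BatchSize = 10_000
};
await bulk.WriteToServerAsync(dataTable);
```

Note that UseInternalTransaction makes each batch atomic on its own; it does not wrap the whole copy in one transaction.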

@belediye_baskani
@belediye_baskani - 29.05.2024 21:32

What do you think about bulk updates? Can you run a benchmark for us?

@antonmartyniuk
@antonmartyniuk - 29.05.2024 14:23

Imho it's a shame, and really a marketing issue, that we have to do so much work to perform bulk inserts in SQL databases unless we use a library.

NoSQL databases such as MongoDB have bulk operations out of the box, completely free and easy to use.

@antonmartyniuk
@antonmartyniuk - 29.05.2024 14:21

Surprisingly, Dapper doesn't perform well. Still, I would like to see results when using Dapper with the SQL BULK INSERT command.

I personally have used the EFCore.Extensions library, which is a paid one, to do the bulk inserts. My company bought a license for it, and it saved many development days for things like bulk merge and bulk synchronize operations.
It would be interesting to compare its performance to the SqlBulkCopy class.

@EzequielRegaldo
@EzequielRegaldo - 29.05.2024 12:33

So is DataTable the EF Core alternative without a paid lib?

@harshakumar6890
@harshakumar6890 - 29.05.2024 12:16

Is it possible to benchmark Dapper executing a stored procedure that accepts a UDT table as a parameter?
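For reference, Dapper can pass a DataTable as a table-valued parameter to a stored procedure. The sketch below assumes a hypothetical dbo.InsertUsers procedure and a dbo.UserTableType user-defined table type, neither of which appears in the video:

```csharp
using System.Data;
using Dapper;
using Microsoft.Data.SqlClient;

// Shape the rows into a DataTable matching the UDT's columns.
var table = new DataTable();
table.Columns.Add("Email", typeof(string));
table.Columns.Add("DateOfBirth", typeof(DateTime));
foreach (var user in users)
    table.Rows.Add(user.Email, user.DateOfBirth);

await using var connection = new SqlConnection(connectionString);
await connection.ExecuteAsync(
    "dbo.InsertUsers",
    new { Users = table.AsTableValuedParameter("dbo.UserTableType") },
    commandType: CommandType.StoredProcedure);
```

The server then does a single set-based INSERT ... SELECT from the parameter, which avoids Dapper's default row-by-row behavior.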

@vasiliylu8054
@vasiliylu8054 - 29.05.2024 11:01

Thank you, Milan! This video deserves to be at the top.

@timur2887
@timur2887 - 29.05.2024 10:10

BulkExtensions just for lazy programming) BulkCopy rulez!)

@-INC0GNIT0-
@-INC0GNIT0- - 29.05.2024 09:44

Thanks for doing the research!
Very insightful investigation.

@islandparadise
@islandparadise - 29.05.2024 06:45

Love this. One quick question: for the EF Core approaches, would the performance be consistent on Postgres as well as SQL Server?

@wabriones276
@wabriones276 - 29.05.2024 03:29

As my mentor always said, you can never beat Microsoft at optimizing bulk inserts.

I also want to put out there that if you already have the table ready in the database, using an IDataReader with SqlBulkCopy will yield even more speed. We're talking milliseconds for a million-plus rows.
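A minimal sketch of that streaming approach, assuming the rows already sit in a staging table (the connection strings and table names are illustrative):

```csharp
using Microsoft.Data.SqlClient;

await using var source = new SqlConnection(sourceConnectionString);
await using var target = new SqlConnection(targetConnectionString);
await source.OpenAsync();
await target.OpenAsync();

await using var command = new SqlCommand(
    "SELECT Email, DateOfBirth FROM dbo.StagingUsers", source);
await using var reader = await command.ExecuteReaderAsync();

using var bulk = new SqlBulkCopy(target)
{
    DestinationTableName = "dbo.Users"
};
// Streams rows straight from the reader into the destination table;
// no DataTable is ever materialized in memory.
await bulk.WriteToServerAsync(reader);
```

Because WriteToServerAsync accepts any DbDataReader, the same pattern works with readers wrapped around in-memory collections (e.g. via a library like FastMember).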

@10Totti
@10Totti - 29.05.2024 00:15

Another great tutorial!
Thanks!

@musaalp4677
@musaalp4677 - 28.05.2024 23:48

Have you tried OPENJSON or another JSON-based structure with a raw SQL query?
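For context, the OPENJSON route usually means sending all rows as a single JSON string parameter and shredding it server-side. A hedged sketch (the table and property names are invented):

```sql
-- @json is a single nvarchar(max) parameter holding all rows as a JSON array,
-- e.g. [{"email":"a@b.com","dateOfBirth":"1990-01-01"}, ...].
INSERT INTO dbo.Users (Email, DateOfBirth)
SELECT Email, DateOfBirth
FROM OPENJSON(@json)
WITH (
    Email       nvarchar(256) '$.email',
    DateOfBirth date          '$.dateOfBirth'
);
```

This turns N round trips into one parameterized statement, at the cost of serializing the payload to JSON on the client.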

@Bennevisie
@Bennevisie - 28.05.2024 21:12

SqlBulkCopy. It will suspend constraints on the table during the insert.

@MatthewCrawford
@MatthewCrawford - 28.05.2024 20:26

My repository creates a DataTable, inserts 25K records from a passed collection, then sends that DataTable to a sproc.
I use this same format for all Select, Delete, and Upsert operations.

Dynamic SQL is slower than sprocs.

@lolyasuo1235
@lolyasuo1235 - 28.05.2024 20:01

How can Dapper be 5 times slower at 1M records than EF Core AddAll? That doesn't make sense at all.

@plamentodorov5117
@plamentodorov5117 - 28.05.2024 19:08

Just a suggestion: prepare a series on GraphQL with Hot Chocolate.

@rsrodas
@rsrodas - 28.05.2024 17:01

Another alternative, if you already have files ready to import, is to use the OPENROWSET command inside SQL Server (note that the BULK rowset requires a table alias):
INSERT INTO Table (
    Col1, Col2, ...
)
SELECT
    Col1, Col2, ...
FROM OPENROWSET(
    BULK 'c:\myfile.txt', FORMATFILE = 'c:\format.xml'
) AS rows
In the XML format file, you define the rules for how the file you want to import is formatted (fixed width, comma-separated, etc.)

@alexanderst.7993
@alexanderst.7993 - 28.05.2024 17:01

Milan, I just wanted to say: thanks to you and your C# videos, I managed to land a job. Appreciate ya pal ;)

@anonymoos
@anonymoos - 28.05.2024 15:44

I like using the BulkCopy function; it is amazingly fast for importing large datasets. One thing to note is that the column names specified on the database side are case sensitive - if there's a case mismatch in the column names, the import will fail. You can also eke out even more performance by tweaking the batch size in BulkCopy using `bulk.BatchSize = 10_000;`. Actual performance will vary based on how many columns you're inserting.

@xtazyxxx3487
@xtazyxxx3487 - 28.05.2024 15:40

Can you try concatenating the insert statements into one raw SQL query with EF and see the results?

@pavfrang
@pavfrang - 28.05.2024 15:27

Thanks for the video! I did not know about the EF bulk extension - it seems very practical.
I deal with large datasets only - that's why I use SqlBulkCopy for almost all insert operations (+ some wrappers to allow dynamic DataTable building before passing to SqlBulkCopy).

@user-yj5gr6wc9e
@user-yj5gr6wc9e - 28.05.2024 15:00

Thank you for the video. I'd like to see a video on bulk upserts/merging data

@user-yj5gr6wc9e
@user-yj5gr6wc9e - 28.05.2024 14:01

😊
