I am importing data from a CSV file into a SQL Server DB, the CSV may contain duplicate entries. My existing code uses SqlBulkCopy() and "IGNORE_DUP_KEY = ON", all is well with duplicates.
I am switching to code first using EF Core with Zzz Projects Entity Framework Extensions.
Primary keys are defined in DbContext:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
if (modelBuilder == null)
throw new ArgumentNullException(nameof(modelBuilder));
modelBuilder.Entity<AmbientWeatherData>()
.HasKey(u => new { u.TimeStamp });
modelBuilder.Entity<ApcUpsData>()
.HasKey(u => new { u.TimeStamp });
modelBuilder.Entity<PurpleAirData>()
.HasKey(u => new { u.TimeStamp });
modelBuilder.Entity<Vr1710Data>()
.HasKey(u => new { u.TimeStamp });
modelBuilder.Entity<Vr1710EventData>()
.HasKey(u => new { u.TimeStamp, u.Type });
modelBuilder.Entity<WeatherGooseData>()
.HasKey(u => new { u.TimeStamp });
modelBuilder.Entity<WundergroundData>()
.HasKey(u => new { u.TimeStamp });
modelBuilder.Entity<WundergroundDailyData>()
.HasKey(u => new { u.TimeStamp });
}
Data is inserted using BulkInsert:
dbContext.BulkInsert(data, options => options.AllowDuplicateKeys = true);
Exception when inserting duplicates:
Microsoft.Data.SqlClient.SqlException (0x80131904): Violation of PRIMARY KEY constraint 'PK_Vr1710Data'. Cannot insert duplicate key in object 'dbo.Vr1710Data'. The duplicate key value is (2014 - 12 - 28 14:26:07.0000000).
The function that inserts the data is generic and type agnostic.
public static bool Import<TDataType, TLoaderType, TConfigType>(string fileName, string connectionString)
where TLoaderType : TelemetryLoader<TDataType, TConfigType>, new()
where TConfigType : TelemetryConfig
where TDataType : TelemetryData
{
// ...
}
The online examples for BulkInsert() and AllowDuplicateKeys require the use of ColumnPrimaryKeyExpression
From: https://entityframework-extensions.net/bulk-insert#insert-only-if-the-entity-not-already-exists
context.BulkInsert(customers, options => {
options.InsertIfNotExists = true;
options.ColumnPrimaryKeyExpression = c => c.Code;
});
How can I ignore duplicates without needing a specialization for ColumnPrimaryKeyExpression for every data type, but instead use my keys as defined in the DbContext?
How can I ignore duplicates without needing a specialization for ColumnPrimaryKeyExpression for every data type, but instead use my keys as defined in the DbContext?
By default, the library already takes the key from the DbContext.
In the example, c.Code was representing a custom key (The CustomerID is the real key).
If I understand correctly the problem, you need to use the following code:
context.BulkInsert(customers, options => {
options.InsertIfNotExists = true;
options.AllowDuplicateKeys = true;
});
Both options are required.
User contributions licensed under CC BY-SA 3.0