Sometimes we have a really big .csv file (~100 MB) that we need to import into our RoR application database. We can try the built-in CSV class:
require 'csv'

def import(file)
  CSV.foreach(file.path) do |row|
    product = Product.from_csv_row(row)
    product.save!
  end
end
However, importing a large .csv row by row like this takes far too long. What is more, if some rows are broken, you will have to import the whole .csv file again.
So how to solve this problem?
If your models are not linked, you can simply use the activerecord-import gem, as mass insertion is much faster than saving records one by one.
require 'csv'
def import(file)
  products = []
  CSV.foreach(file.path) do |row|
    products << Product.from_csv_row(row)
  end
  Product.import products
end
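The Product.from_csv_row helper used above is not shown in this article. As a rough illustration, here is a plain-Ruby sketch of what such a mapper might look like; the column order and attribute names (csv_row_id, name, price) are assumptions, and a Struct stands in for the ActiveRecord model:

```ruby
require 'csv'

# Hypothetical stand-in for the Product model; the real from_csv_row
# would build an ActiveRecord instance from the same kind of row array.
ProductRow = Struct.new(:csv_row_id, :name, :price) do
  def self.from_csv_row(row)
    # row is an array of strings, as yielded by CSV.foreach
    new(row[0].to_i, row[1], row[2].to_f)
  end
end

row = CSV.parse_line('42,Widget,9.99')
product = ProductRow.from_csv_row(row)
product.name # => "Widget"
```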
If the models are linked, let's try some tricks.
First, I would advise splitting the full import process into chunks. This gives 4 advantages:
1. You don’t have to wait forever until the .csv is opened.
2. You don’t have to start from the beginning if something goes wrong.
3. You can monitor process status.
4. It is easier to debug.
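The chunking idea itself can be sketched in plain Ruby with each_slice; the 25-row sample data and the chunk size of 10 are just illustration:

```ruby
require 'csv'

# Build a small in-memory CSV sample: 25 rows of "id,name"
csv_data = (1..25).map { |i| "#{i},item#{i}" }.join("\n")

# Split the parsed rows into chunks of 10
chunks = CSV.parse(csv_data).each_slice(10).to_a
chunks.each_with_index do |chunk, index|
  # each chunk can be imported (and its status logged) independently,
  # so a failure in chunk 3 does not force re-reading chunks 1 and 2
end
chunks.size # => 3
```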
Secondly, it’s better to keep the last inserted id from .csv (or just row number) in your database.
class AddCsvRowId < ActiveRecord::Migration
  def change
    change_table :products do |t|
      t.integer :csv_row_id
    end
  end
end
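With the row id stored, resuming is a matter of skipping everything at or below the last imported id. A plain-Ruby sketch of that filter, where the hard-coded value 12 stands in for the csv_row_id looked up from the database:

```ruby
last_imported_id = 12 # would come from the csv_row_id column in the database
rows = (1..20).map { |i| { id: i.to_s } }

# Keep only the rows that have not been imported yet
pending = rows.select { |row| row[:id].to_i > last_imported_id }
pending.first[:id] # => "13"
```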
I used the SmarterCSV gem, which allows you to read .csv files in chunks. The import then becomes:
def import(file)
  total_chunks = SmarterCSV.process(
    file.path,
    { chunk_size: 10_000 }
  ) do |chunk_products|
    if Product.last.present?
      chunk_products = chunk_products.select do |product|
        product[:id].to_i > Product.last.csv_row_id
      end
    end

    chunk_products.each do |row|
      next if Product.exists?(csv_row_id: row[:id].to_i)

      product = Product.from_csv_row(row)
      next unless product.valid?

      product.images << Image.from_csv_row(row)
      product.tags << Tag.from_csv_row(row)
      product.save!
    end
  end
  puts total_chunks
end
And finally, it is better to move the import to a background process.
task import_products: :environment do
  ImportProductsJob.perform_later('/path/to/file.csv')
end

class ImportProductsJob < ActiveJob::Base
  rescue_from(StandardError) do
    retry_job wait: 5.minutes, queue: :low_priority
  end

  def perform(*args)
    import(args[0]) if File.exist?(args[0]) &&
                       File.extname(args[0]) == '.csv'
  end

  def import(file)
    # process file
  end
end
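One detail worth double-checking in that guard: File.extname returns the extension with its leading dot, so the value to compare against is ".csv", not "csv":

```ruby
# File.extname includes the leading dot in its return value
File.extname('/path/to/file.csv') # => ".csv"
File.extname('archive.tar.gz')    # => ".gz"
```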
Conclusion
So .csv file import is not that difficult if you have some basic tricks up your sleeve and use these gems:
1. activerecord-import
2. SmarterCSV
3. ActiveJob