# = csv.rb -- CSV Reading and Writing
# Created by James Edward Gray II on 2005-10-31.
# Copyright 2005 James Edward Gray II. You can redistribute or modify this code
# under the terms of Ruby's license.
# See CSV for documentation.
# Welcome to the new and improved CSV.
# This version of the CSV library began its life as FasterCSV. FasterCSV was
# intended as a replacement to Ruby's then standard CSV library. It was
# designed to address concerns users of that library had and it had three
# 1. Be significantly faster than CSV while remaining a pure Ruby library.
# 2. Use a smaller and easier to maintain code base. (FasterCSV eventually
# grew larger, was also but considerably richer in features. The parsing
# core remains quite small.)
# 3. Improve on the CSV interface.
# Obviously, the last one is subjective. I did try to defer to the original
# interface whenever I didn't have a compelling reason to change it though, so
# hopefully this won't be too radically different.
# We must have met our goals because FasterCSV was renamed to CSV and replaced
# the original library as of Ruby 1.9. If you are migrating code from 1.8 or
# earlier, you may have to change your code to comply with the new interface.
# == What's Different From the Old CSV?
# I'm sure I'll miss something, but I'll try to mention most of the major
# differences I am aware of, to help others quickly get up to speed:
# * This parser is m17n aware. See CSV for full details.
# * This library has a stricter parser and will throw MalformedCSVErrors on
# * This library has a less liberal idea of a line ending than CSV. What you
# set as the <tt>:row_sep</tt> is law. It can auto-detect your line endings
# * The old library returned empty lines as <tt>[nil]</tt>. This library calls
# * This library has a much faster parser.
# * CSV now uses Hash-style parameters to set options.
# * CSV no longer has generate_row() or parse_row().
# * The old CSV's Reader and Writer classes have been dropped.
# * CSV::open() is now more like Ruby's open().
# * CSV objects now support most standard IO methods.
# * CSV now has a new() method used to wrap objects like String and IO for
# * CSV::generate() is different from the old method.
# * CSV no longer supports partial reads. It works line-by-line.
# * CSV no longer allows the instance methods to override the separators for
# performance reasons. They must be set in the constructor.
# If you use this library and find yourself missing any functionality I have
# trimmed, please {let me know}[mailto:james@grayproductions.net].
# See CSV for documentation.
# == What is CSV, really?
# CSV maintains a pretty strict definition of CSV taken directly from
# {the RFC}[http://www.ietf.org/rfc/rfc4180.txt]. I relax the rules in only one
# place and that is to make using this library easier. CSV will parse all valid
# What you don't want to do is feed CSV invalid data. Because of the way the
# CSV format works, it's common for a parser to need to read until the end of
# the file to be sure a field is invalid. This eats a lot of time and memory.
# Luckily, when working with invalid CSV, Ruby's built-in methods will almost
# always be superior in every way. For example, parsing non-quoted fields is as
# == Questions and/or Comments
# Feel free to email {James Edward Gray II}[mailto:james@grayproductions.net]
# This class provides a complete interface to CSV files and data. It offers
# tools to enable you to read and write to and from Strings or IO objects, as
# CSV.foreach("path/to/file.csv") do |row|
# arr_of_arrs = CSV.read("path/to/file.csv")
# CSV.parse("CSV,data,String") do |row|
# arr_of_arrs = CSV.parse("CSV,data,String")
# CSV.open("path/to/file.csv", "wb") do |csv|
# csv << ["row", "of", "CSV", "data"]
# csv << ["another", "row"]
# csv_string = CSV.generate do |csv|
# csv << ["row", "of", "CSV", "data"]
# csv << ["another", "row"]
# == Convert a Single Line
# csv_string = ["CSV", "data"].to_csv # to CSV
# csv_array = "CSV,String".parse_csv # from CSV
# CSV { |csv_out| csv_out << %w{my data here} } # to $stdout
# CSV(csv = "") { |csv_str| csv_str << %w{my data here} } # to a String
# CSV($stderr) { |csv_err| csv_err << %w{my data here} } # to $stderr
# CSV($stdin) { |csv_in| csv_in.each { |row| p row } } # from $stdin
# csv = CSV.new(io, options)
# # ... read (with gets() or each()) from and write (with <<) to csv here ...
# == CSV and Character Encodings (M17n or Multilingualization)
# This new CSV parser is m17n savvy. The parser works in the Encoding of the IO
# or String object being read from or written to. Your data is never transcoded
# (unless you ask Ruby to transcode it for you) and will literally be parsed in
# the Encoding it is in. Thus CSV will return Arrays or Rows of Strings in the
# Encoding of your data. This is accomplished by transcoding the parser itself
# Some transcoding must take place, of course, to accomplish this multiencoding
# support. For example, <tt>:col_sep</tt>, <tt>:row_sep</tt>, and
# <tt>:quote_char</tt> must be transcoded to match your data. Hopefully this
# makes the entire process feel transparent, since CSV's defaults should just
# magically work for you data. However, you can set these values manually in
# the target Encoding to avoid the translation.
# It's also important to note that while all of CSV's core parser is now
# Encoding agnostic, some features are not. For example, the built-in
# converters will try to transcode data to UTF-8 before making conversions.
# Again, you can provide custom converters that are aware of your Encodings to
# avoid this translation. It's just too hard for me to support native
# conversions in all of Ruby's Encodings.
# Anyway, the practical side of this is simple: make sure IO and String objects
# passed into CSV have the proper Encoding set and everything should just work.
# CSV methods that allow you to open IO objects (CSV::foreach(), CSV::open(),
# CSV::read(), and CSV::readlines()) do allow you to specify the Encoding.
# One minor exception comes when generating CSV into a String with an Encoding
# that is not ASCII compatible. There's no existing data for CSV to use to
# prepare itself and thus you will probably need to manually specify the desired
# Encoding for most of those cases. It will try to guess using the fields in a
# row of output though, when using CSV::generate_line() or Array#to_csv().
# I try to point out any other Encoding issues in the documentation of methods
# This has been tested to the best of my ability with all non-"dummy" Encodings
# Ruby ships with. However, it is brave new code and may have some bugs.
# Please feel free to {report}[mailto:james@grayproductions.net] any issues you
# The version of the installed library.
# A CSV::Row is part Array and part Hash. It retains an order for the fields
# and allows duplicates just as an Array would, but also allows you to access
# fields by name just as you could if they were in a Hash.
# All rows returned by CSV will be constructed from this class, if header row
# processing is activated.
# Construct a new CSV::Row from +headers+ and +fields+, which are expected
# to be Arrays. If one Array is shorter than the other, it will be padded
# The optional +header_row+ parameter can be set to +true+ to indicate, via
# CSV::Row.header_row?() and CSV::Row.field_row?(), that this is a header
# row. Otherwise, the row is assumes to be a field row.
# A CSV::Row object supports the following Array methods through delegation:
def initialize(headers, fields, header_row = false)
headers.each { |h| h.freeze if h.is_a? String }
# handle extra headers or fields
@row = if headers.size >= fields.size
fields.zip(headers).map { |pair| pair.reverse! }
# Internal data format used to compare equality.
def_delegators :@row, :empty?, :length, :size
# Returns +true+ if this is a header row.
# Returns +true+ if this is a field row.
# Returns the headers of this row.
@row.map { |pair| pair.first }
# field( header, offset )
# This method will return the field value by +header+ or +index+. If a field
# is not found, +nil+ is returned.
# When provided, +offset+ ensures that a header match occurs on or later
# than the +offset+ index. You can use this to find duplicate headers,
# without resorting to hard-coding exact indices.
def field(header_or_index, minimum_index = 0)
finder = header_or_index.is_a?(Integer) ? :[] : :assoc
pair = @row[minimum_index..-1].send(finder, header_or_index)
# return the field if we have a pair
pair.nil? ? nil : pair.last
# fetch( header ) { |row| ... }
# fetch( header, default )
# This method will fetch the field value by +header+. It has the same
# behavior as Hash#fetch: if there is a field with the given +header+, its
# value is returned. Otherwise, if a block is given, it is yielded the
# +header+ and its result is returned; if a +default+ is given as the
# second argument, it is returned; otherwise a KeyError is raised.
def fetch(header, *varargs)
raise ArgumentError, "Too many arguments" if varargs.length > 1
pair = @row.assoc(header)
raise KeyError, "key not found: #{header}"
# Returns +true+ if there is a field with the given +header+.
alias_method :include?, :has_key?
alias_method :key?, :has_key?
alias_method :member?, :has_key?
# []=( header, offset, value )
# Looks up the field by the semantics described in CSV::Row.field() and
# Assigning past the end of the row with an index will set all pairs between
# to <tt>[nil, nil]</tt>. Assigning to an unused header appends the new
if args.first.is_a? Integer
if @row[args.first].nil? # extending past the end with index
@row[args.first] = [nil, value]
@row.map! { |pair| pair.nil? ? [nil, nil] : pair }
else # normal index assignment
@row[args.first][1] = value
if index.nil? # appending a field
self << [args.first, value]
else # normal header assignment
# <<( header_and_field_array )
# <<( header_and_field_hash )
# If a two-element Array is provided, it is assumed to be a header and field
# and the pair is appended. A Hash works the same way with the key being
# the header and the value being the field. Anything else is assumed to be
# a lone field which is appended with a +nil+ header.
# This method returns the row for chaining.
if arg.is_a?(Array) and arg.size == 2 # appending a header and name
elsif arg.is_a?(Hash) # append header and name pairs
arg.each { |pair| @row << pair }
else # append field value
# A shortcut for appending multiple fields. Equivalent to:
# args.each { |arg| csv_row << arg }
# This method returns the row for chaining.
args.each { |arg| self << arg }
# delete( header, offset )
# Used to remove a pair from the row by +header+ or +index+. The pair is
# located as described in CSV::Row.field(). The deleted pair is returned,
# or +nil+ if a pair could not be found.
def delete(header_or_index, minimum_index = 0)
if header_or_index.is_a? Integer # by index
@row.delete_at(header_or_index)
elsif i = index(header_or_index, minimum_index) # by header
# The provided +block+ is passed a header and field for each pair in the row
# and expected to return +true+ or +false+, depending on whether the pair
# This method returns the row for chaining.
# This method accepts any number of arguments which can be headers, indices,
# Ranges of either, or two-element Arrays containing a header and offset.
# Each argument will be replaced with a field lookup as described in
# If called with no arguments, all fields are returned.
def fields(*headers_and_or_indices)
if headers_and_or_indices.empty? # return all fields--no arguments
@row.map { |pair| pair.last }
else # or work like values_at()
headers_and_or_indices.inject(Array.new) do |all, h_or_i|
all + if h_or_i.is_a? Range
index_begin = h_or_i.begin.is_a?(Integer) ? h_or_i.begin :
index_end = h_or_i.end.is_a?(Integer) ? h_or_i.end :
new_range = h_or_i.exclude_end? ? (index_begin...index_end) :
fields.values_at(new_range)
alias_method :values_at, :fields
# index( header, offset )
# This method will return the index of a field with the provided +header+.
# The +offset+ can be used to locate duplicate header names, as described in
def index(header, minimum_index = 0)
index = headers[minimum_index..-1].index(header)
# return the index at the right offset, if we found one
index.nil? ? nil : index + minimum_index
# Returns +true+ if +name+ is a header for this row, and +false+ otherwise.
alias_method :include?, :header?
# Returns +true+ if +data+ matches a field in this row, and +false+
# Yields each pair of the row as header and field tuples (much like
# iterating over a Hash).
# Support for Enumerable.