# frozen_string_literal: true
# = csv.rb -- CSV Reading and Writing
# Created by James Edward Gray II on 2005-10-31.
# See CSV for documentation.
# Welcome to the new and improved CSV.
# This version of the CSV library began its life as FasterCSV. FasterCSV was
# intended as a replacement for Ruby's then standard CSV library. It was
# designed to address concerns users of that library had and it had three
# primary goals:
# 1. Be significantly faster than CSV while remaining a pure Ruby library.
# 2. Use a smaller and easier to maintain code base. (FasterCSV eventually
#    grew larger, but it was also considerably richer in features. The parsing
#    core remains quite small.)
# 3. Improve on the CSV interface.
# Obviously, the last one is subjective. I did try to defer to the original
# interface whenever I didn't have a compelling reason to change it though, so
# hopefully this won't be too radically different.
# We must have met our goals because FasterCSV was renamed to CSV and replaced
# the original library as of Ruby 1.9. If you are migrating code from 1.8 or
# earlier, you may have to change your code to comply with the new interface.
# == What's Different From the Old CSV?
# I'm sure I'll miss something, but I'll try to mention most of the major
# differences I am aware of, to help others quickly get up to speed:
# * This parser is m17n aware. See CSV for full details.
# * This library has a stricter parser and will throw MalformedCSVErrors on
#   problematic data.
# * This library has a less liberal idea of a line ending than CSV. What you
#   set as the <tt>:row_sep</tt> is law. It can auto-detect your line endings
#   though.
# * The old library returned empty lines as <tt>[nil]</tt>. This library calls
#   them <tt>[]</tt>.
# * This library has a much faster parser.
# * CSV now uses Hash-style parameters to set options.
# * CSV no longer has generate_row() or parse_row().
# * The old CSV's Reader and Writer classes have been dropped.
# * CSV::open() is now more like Ruby's open().
# * CSV objects now support most standard IO methods.
# * CSV now has a new() method used to wrap objects like String and IO for
#   reading and writing.
# * CSV::generate() is different from the old method.
# * CSV no longer supports partial reads. It works line-by-line.
# * CSV no longer allows the instance methods to override the separators for
# performance reasons. They must be set in the constructor.
# If you use this library and find yourself missing any functionality I have
# trimmed, please {let me know}[mailto:james@grayproductions.net].
# See CSV for documentation.
# == What is CSV, really?
# CSV maintains a pretty strict definition of CSV taken directly from
# {the RFC}[http://www.ietf.org/rfc/rfc4180.txt]. I relax the rules in only one
# place and that is to make using this library easier. CSV will parse all valid
# CSV.
# What you don't want to do is to feed CSV invalid data. Because of the way the
# CSV format works, it's common for a parser to need to read until the end of
# the file to be sure a field is invalid. This consumes a lot of time and memory.
# Luckily, when working with invalid CSV, Ruby's built-in methods will almost
# always be superior in every way. For example, parsing non-quoted fields is as
# easy as calling <tt>data.split(",")</tt>.
# == Questions and/or Comments
# Feel free to email {James Edward Gray II}[mailto:james@grayproductions.net]
# with any questions.
require_relative "csv/fields_converter"
require_relative "csv/match_p"
require_relative "csv/parser"
require_relative "csv/row"
require_relative "csv/table"
require_relative "csv/writer"
using CSV::MatchP if CSV.const_defined?(:MatchP)
# \CSV (comma-separated values) data is a text representation of a table:
# - A _row_ _separator_ delimits table rows.
# A common row separator is the newline character <tt>"\n"</tt>.
# - A _column_ _separator_ delimits fields in a row.
# A common column separator is the comma character <tt>","</tt>.
# This \CSV \String, with row separator <tt>"\n"</tt>
# and column separator <tt>","</tt>,
# has three rows and two columns:
# "foo,0\nbar,1\nbaz,2\n"
# Despite the name \CSV, a \CSV representation can use different separators.
# For more about tables, see the Wikipedia article
# "{Table (information)}[https://en.wikipedia.org/wiki/Table_(information)]",
# especially its section
# "{Simple table}[https://en.wikipedia.org/wiki/Table_(information)#Simple_table]".
# Class \CSV provides methods for:
# - Parsing \CSV data from a \String object, a \File (via its file path), or an \IO object.
# - Generating \CSV data to a \String object.
# To make \CSV available:
#   require 'csv'
# All examples here assume that this has been done.
# A \CSV object has dozens of instance methods that offer fine-grained control
# of parsing and generating \CSV data.
# For many needs, though, simpler approaches will do.
# This section summarizes the singleton methods in \CSV
# that allow you to parse and generate without explicitly
# creating \CSV objects.
# For details, follow the links.
# Parsing methods commonly return either of:
# - An \Array of Arrays of Strings:
#   - The outer \Array is the entire "table".
#   - Each inner \Array is a row.
#   - Each \String is a field.
# - A CSV::Table object. For details, see
#   {\CSV with Headers}[#class-CSV-label-CSV+with+Headers].
# The input to be parsed can be a string:
#   string = "foo,0\nbar,1\nbaz,2\n"
# \Method CSV.parse returns the entire \CSV data:
#   CSV.parse(string) # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
# \Method CSV.parse_line returns only the first row:
#   CSV.parse_line(string) # => ["foo", "0"]
# \CSV extends class \String with instance method String#parse_csv,
# which also returns only the first row:
#   string.parse_csv # => ["foo", "0"]
# ==== Parsing Via a \File Path
# The input to be parsed can be in a file:
#   string = "foo,0\nbar,1\nbaz,2\n"
#   path = 't.csv'
#   File.write(path, string)
# \Method CSV.read returns the entire \CSV data:
#   CSV.read(path) # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
# \Method CSV.foreach iterates, passing each row to the given block:
#   CSV.foreach(path) do |row|
#     p row
#   end
# \Method CSV.table returns the entire \CSV data as a CSV::Table object:
#   CSV.table(path) # => #<CSV::Table mode:col_or_row row_count:3>
# ==== Parsing from an Open \IO Stream
# The input to be parsed can be in an open \IO stream:
# \Method CSV.read returns the entire \CSV data:
#   File.open(path) do |file|
#     CSV.read(file)
#   end # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
# As does method CSV.parse:
#   File.open(path) do |file|
#     CSV.parse(file)
#   end # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
# \Method CSV.parse_line returns only the first row:
#   File.open(path) do |file|
#     CSV.parse_line(file)
#   end # => ["foo", "0"]
# \Method CSV.foreach iterates, passing each row to the given block:
#   File.open(path) do |file|
#     CSV.foreach(file) do |row|
#       p row
#     end
#   end
# \Method CSV.table returns the entire \CSV data as a CSV::Table object:
#   File.open(path) do |file|
#     CSV.table(file)
#   end # => #<CSV::Table mode:col_or_row row_count:3>
# \Method CSV.generate returns a \String;
# this example uses method CSV#<< to append the rows
# that are to be generated:
#   output_string = CSV.generate do |csv|
#     [['foo', 0], ['bar', 1], ['baz', 2]].each {|row| csv << row }
#   end
#   output_string # => "foo,0\nbar,1\nbaz,2\n"
# \Method CSV.generate_line returns a \String containing the single row
# constructed from an \Array:
#   CSV.generate_line(['foo', '0']) # => "foo,0\n"
# \CSV extends class \Array with instance method <tt>Array#to_csv</tt>,
# which forms an \Array into a \String:
#   ['foo', '0'].to_csv # => "foo,0\n"
# \Method CSV.filter provides a Unix-style filter for \CSV data.
# The input data is processed to form the output data:
#   in_string = "foo,0\nbar,1\nbaz,2\n"
#   out_string = ''
#   CSV.filter(in_string, out_string) do |row|
#     row[0], row[1] = row[0].upcase, row[1] * 4
#   end
#   out_string # => "FOO,0000\nBAR,1111\nBAZ,2222\n"
# There are three ways to create a \CSV object:
# - \Method CSV.new returns a new \CSV object.
# - \Method CSV.instance returns a new or cached \CSV object.
# - \Method \CSV() also returns a new or cached \CSV object.
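# A minimal sketch of the first two approaches (the input string is
# illustrative only; the caching comment describes behavior assumed here):
#   string = "foo,0\nbar,1\n"
#   csv = CSV.new(string) # Wraps the String for reading.
#   csv.read # => [["foo", "0"], ["bar", "1"]]
#   # CSV.instance is assumed to cache by data object and options
#   # (the default data object is $stdout).
#   CSV.instance.equal?(CSV.instance) # => true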
# \CSV has three groups of instance methods:
# - Its own internally defined instance methods.
# - Methods included by module Enumerable.
# - Methods delegated to class IO. See below.
# For convenience, a CSV object will delegate to many methods in class IO.
# (A few have wrapper "guard code" in \CSV.) You may call:
# The default values for options are:
# # For both parsing and generating.
# unconverted_fields: nil,
# header_converters: nil,
# liberal_parsing: false,
# ==== Options for Parsing
# Options for parsing, described in detail below, include:
# - +row_sep+: Specifies the row separator; used to delimit rows.
# - +col_sep+: Specifies the column separator; used to delimit fields.
# - +quote_char+: Specifies the quote character; used to quote fields.
# - +field_size_limit+: Specifies the maximum field size allowed.
# - +converters+: Specifies the field converters to be used.
# - +unconverted_fields+: Specifies whether unconverted fields are to be available.
# - +headers+: Specifies whether data contains headers,
# or specifies the headers themselves.
# - +return_headers+: Specifies whether headers are to be returned.
# - +header_converters+: Specifies the header converters to be used.
# - +skip_blanks+: Specifies whether blank lines are to be ignored.
# - +skip_lines+: Specifies how comment lines are to be recognized.
# - +strip+: Specifies whether leading and trailing whitespace are
#   to be stripped from fields.
# - +liberal_parsing+: Specifies whether \CSV should attempt to parse
#   non-compliant data.
# - +nil_value+: Specifies the object that is to be substituted for each null (no-text) field.
# - +empty_value+: Specifies the object that is to be substituted for each empty field.
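# As a brief illustration (the input data is made up for this example),
# several parsing options can be combined in one call:
#   string = "name;qty\n\nfoo;1\nbar;2\n"
#   table = CSV.parse(string, col_sep: ';', headers: true, skip_blanks: true, converters: :integer)
#   table.headers # => ["name", "qty"]
#   table['qty'] # => [1, 2]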
# :include: ../doc/csv/options/common/row_sep.rdoc
# :include: ../doc/csv/options/common/col_sep.rdoc
# :include: ../doc/csv/options/common/quote_char.rdoc
# :include: ../doc/csv/options/parsing/field_size_limit.rdoc
# :include: ../doc/csv/options/parsing/converters.rdoc
# :include: ../doc/csv/options/parsing/unconverted_fields.rdoc
# :include: ../doc/csv/options/parsing/headers.rdoc
# :include: ../doc/csv/options/parsing/return_headers.rdoc
# :include: ../doc/csv/options/parsing/header_converters.rdoc
# :include: ../doc/csv/options/parsing/skip_blanks.rdoc
# :include: ../doc/csv/options/parsing/skip_lines.rdoc
# :include: ../doc/csv/options/parsing/strip.rdoc
# :include: ../doc/csv/options/parsing/liberal_parsing.rdoc
# :include: ../doc/csv/options/parsing/nil_value.rdoc
# :include: ../doc/csv/options/parsing/empty_value.rdoc
# ==== Options for Generating
# Options for generating, described in detail below, include:
# - +row_sep+: Specifies the row separator; used to delimit rows.
# - +col_sep+: Specifies the column separator; used to delimit fields.
# - +quote_char+: Specifies the quote character; used to quote fields.
# - +write_headers+: Specifies whether headers are to be written.
# - +force_quotes+: Specifies whether each output field is to be quoted.
# - +quote_empty+: Specifies whether each empty output field is to be quoted.
# - +write_converters+: Specifies the field converters to be used in writing.
# - +write_nil_value+: Specifies the object that is to be substituted for each +nil+-valued field.
# - +write_empty_value+: Specifies the object that is to be substituted for each empty field.
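# As a brief illustration (the rows and header names are made up for this
# example), a few generating options in use:
#   CSV.generate_line(['foo', 0], col_sep: ';') # => "foo;0\n"
#   CSV.generate_line(['foo', 0], force_quotes: true) # => "\"foo\",\"0\"\n"
#   CSV.generate(headers: ['Name', 'Value'], write_headers: true) do |csv|
#     csv << ['foo', 0]
#   end # => "Name,Value\nfoo,0\n"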
# :include: ../doc/csv/options/common/row_sep.rdoc
# :include: ../doc/csv/options/common/col_sep.rdoc
# :include: ../doc/csv/options/common/quote_char.rdoc
# :include: ../doc/csv/options/generating/write_headers.rdoc
# :include: ../doc/csv/options/generating/force_quotes.rdoc
# :include: ../doc/csv/options/generating/quote_empty.rdoc
# :include: ../doc/csv/options/generating/write_converters.rdoc
# :include: ../doc/csv/options/generating/write_nil_value.rdoc
# :include: ../doc/csv/options/generating/write_empty_value.rdoc
# CSV lets you specify the column names of a CSV file, whether they are in the
# data or provided separately. If headers are specified, reading methods return
# an instance of CSV::Table, consisting of CSV::Row objects.
#   # Headers are part of data
#   data = CSV.parse(<<~ROWS, headers: true)
#     Name,Department,Salary
#     Bob,Engineering,1000
#   ROWS
#   data.class #=> CSV::Table
#   data.first #=> #<CSV::Row "Name":"Bob" "Department":"Engineering" "Salary":"1000">
#   data.first.to_h #=> {"Name"=>"Bob", "Department"=>"Engineering", "Salary"=>"1000"}
#   # Headers provided by developer
#   data = CSV.parse('Bob,Engineering,1000', headers: %i[name department salary])
#   data.first #=> #<CSV::Row name:"Bob" department:"Engineering" salary:"1000">
# By default, each value (field or header) parsed by \CSV is formed into a \String.
# You can use a _field_ _converter_ or _header_ _converter_
# to intercept and modify the parsed values:
# - See {Field Converters}[#class-CSV-label-Field+Converters].
# - See {Header Converters}[#class-CSV-label-Header+Converters].
# Also by default, each value to be written during generation is written 'as-is'.
# You can use a _write_ _converter_ to modify values before writing.
# - See {Write Converters}[#class-CSV-label-Write+Converters].
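# As a small sketch of a write converter (the proc name +upcase_converter+
# is hypothetical, chosen only for this example):
#   upcase_converter = proc {|field| field.to_s.upcase }
#   CSV.generate_line(['foo', 'bar'], write_converters: upcase_converter) # => "FOO,BAR\n"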
# ==== Specifying \Converters
# You can specify converters for parsing or generating in the +options+
# argument to various \CSV methods:
# - Option +converters+ for converting parsed field values.
# - Option +header_converters+ for converting parsed header values.
# - Option +write_converters+ for converting values to be written (generated).
# There are three forms for specifying converters:
# - A converter proc: executable code to be used for conversion.
# - A converter name: the name of a stored converter.
# - A converter list: an array of converter procs, converter names, and converter lists.
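# The three forms can be sketched as follows (the proc +upcase_converter+
# and the input string are illustrative; <tt>:integer</tt> is a stored converter):
#   string = "foo,0\nbar,1\n"
#   upcase_converter = proc {|field| field.upcase }
#   # A converter proc:
#   CSV.parse(string, converters: upcase_converter) # => [["FOO", "0"], ["BAR", "1"]]
#   # A converter name:
#   CSV.parse(string, converters: :integer) # => [["foo", 0], ["bar", 1]]
#   # A converter list (conversion of a field stops once it is no longer a String):
#   CSV.parse(string, converters: [:integer, upcase_converter]) # => [["FOO", 0], ["BAR", 1]]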
# This converter proc, +strip_converter+, accepts a value +field+
# and returns <tt>field.strip</tt>:
#   strip_converter = proc {|field| field.strip }
# In this call to <tt>CSV.parse</tt>,
# the keyword argument <tt>converters: strip_converter</tt>
# specifies that:
# - \Proc +strip_converter+ is to be called for each parsed field.
# - The converter's return value is to replace the +field+ value.
# Example:
#   string = " foo , 0 \n bar , 1 \n baz , 2 \n"
#   array = CSV.parse(string, converters: strip_converter)
#   array # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
# A converter proc can receive a second argument, +field_info+,
# that contains details about the field.
# This modified +strip_converter+ displays its arguments:
#   strip_converter = proc do |field, field_info|
#     p [field, field_info]
#     field.strip
#   end
#   string = " foo , 0 \n bar , 1 \n baz , 2 \n"
#   array = CSV.parse(string, converters: strip_converter)
#   array # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
# Output:
#   [" foo ", #<struct CSV::FieldInfo index=0, line=1, header=nil>]
#   [" 0 ", #<struct CSV::FieldInfo index=1, line=1, header=nil>]