xsv: a fast CSV CLI toolkit written in Rust
In recent years I've become an avid user of csvkit, a suite of command-line tools for converting to and working with CSV—which for me has become the king of tabular file formats.
Then, just recently, I learned about xsv…
xsv is a command line program for indexing, slicing, analyzing, splitting and joining CSV files. Commands should be simple, fast and composable:
- Simple tasks should be easy.
- Performance trade-offs should be exposed in the CLI interface.
- Composition should not come at the expense of performance.
And I must say, I really like the simplicity of its commands and the balanced feature set they provide:
- cat – Concatenate CSV files by row or by column.
- count – Count the rows in a CSV file. (Instantaneous with an index.)
- fixlengths – Force a CSV file to have same-length records by either padding or truncating them.
- flatten – A flattened view of CSV records. Useful for viewing one record at a time (e.g., xsv slice -i 5 data.csv | xsv flatten).
- fmt – Reformat CSV data with different delimiters, record terminators or quoting rules. (Supports ASCII delimited data.)
- frequency – Build frequency tables of each column in CSV data. (Uses parallelism to go faster if an index is present.)
- headers – Show the headers of CSV data. Or show the intersection of all headers between many CSV files.
- index – Create an index for a CSV file. This is very quick and provides constant time indexing into the CSV file.
- input – Read CSV data with exotic quoting/escaping rules.
- join – Inner, outer and cross joins. Uses a simple hash index to make it fast.
- partition – Partition CSV data based on a column value.
- sample – Randomly draw rows from CSV data using reservoir sampling (i.e., use memory proportional to the size of the sample).
- reverse – Reverse order of rows in CSV data.
- search – Run a regex over CSV data. Applies the regex to each field individually and shows only matching rows.
- select – Select or re-order columns from CSV data.
- slice – Slice rows from any part of a CSV file. When an index is present, this only has to parse the rows in the slice (instead of all rows leading up to the start of the slice).
- sort – Sort CSV data.
- split – Split one CSV file into many CSV files of N records each.
- stats – Show basic types and statistics of each column in the CSV file (e.g., mean, standard deviation, median and range).
- table – Show aligned output of any CSV data using elastic tabstops.
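
The sample entry above mentions reservoir sampling, which is what keeps memory proportional to the sample size rather than to the file. As a rough illustration of the general technique (not xsv's actual Rust code), here is a minimal Python sketch of the classic single-pass variant. The function names reservoir_sample and sample_csv are made up for this example, and treating the first row as a header is an assumption.

```python
import csv
import io
import random


def reservoir_sample(rows, k, rng=None):
    """Draw k items uniformly at random from an iterable of unknown length.

    Only the k-item reservoir is held in memory, never the whole input.
    """
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # The first k rows fill the reservoir directly.
            reservoir.append(row)
        else:
            # Row i is kept with probability k / (i + 1); if kept,
            # it overwrites a uniformly chosen slot in the reservoir.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


def sample_csv(text, k, seed=None):
    """Sample k data rows from CSV text; the first row is assumed to be a header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, reservoir_sample(reader, k, random.Random(seed))


if __name__ == "__main__":
    # Build a small in-memory CSV with 1000 data rows and sample 5 of them.
    data = "id,name\n" + "".join(f"{i},row{i}\n" for i in range(1000))
    header, rows = sample_csv(data, 5, seed=42)
    print(header)
    for row in rows:
        print(row)
```

Because the rows are consumed one at a time, the same function works on a file handle or on piped input, which is presumably what makes this approach a good fit for a streaming command-line tool.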