Specify Two Delimiters When Reading in a CSV File in Pandas

This blog was published as a part of the Data Science Blogathon 7

          import pandas as pd

Every data analysis project requires a dataset. These datasets are available in a variety of file formats such as .xlsx, .json, .csv, and .html. Conventionally, datasets are mostly found in .csv format. CSV (Comma-Separated Values) files, as the name suggests, have data items separated by commas. CSV files are plain text files and are therefore light in file size. Also, CSV files can be viewed and saved in tabular form in popular tools such as Microsoft Excel and Google Sheets.

The commas used in CSV files are known as delimiters. Think of a delimiter as a separating boundary that distinguishes between any two consecutive data items.
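
To make the idea of a delimiter concrete, here is a minimal sketch that uses only Python's built-in csv module. The sample rows are made up for illustration and do not come from any file mentioned in this article.

```python
import csv
import io

# Hypothetical sample data: three comma-separated rows.
sample = "name,age,city\nAlice,30,Paris\nBob,25,Berlin\n"

# csv.reader splits each line at the delimiter (a comma by default).
rows = list(csv.reader(io.StringIO(sample)))

for row in rows:
    print(row)
```

Each printed list holds the data items of one line, split wherever a comma appeared.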

Reading CSV Files using Pandas

To read these CSV files, we use a function of the Pandas library called read_csv().

          df = pd.read_csv()        

The read_csv() function has dozens of parameters, of which one is mandatory and the others are optional, to be used as needed. The mandatory parameter specifies the CSV file we want to read. For example,

df = pd.read_csv("C:\\Users\\Rahul\\Desktop\\abc.csv")

Note: Remember to use double backslashes when specifying the file path, because a single backslash starts an escape sequence in a Python string.
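
As a small sketch of the note above, the snippet below compares three common ways of spelling a Windows path in Python. The path itself is only an example, and no file is opened. PureWindowsPath is used purely to compare spellings, so the snippet should run on any operating system.

```python
from pathlib import PureWindowsPath

# Three spellings of the same (hypothetical) Windows path.
escaped = "C:\\Users\\Rahul\\Desktop\\abc.csv"  # doubled backslashes
raw = r"C:\Users\Rahul\Desktop\abc.csv"         # raw string keeps backslashes literally
forward = "C:/Users/Rahul/Desktop/abc.csv"      # forward slashes

# The escaped and raw spellings produce the identical string.
print(escaped == raw)

# The forward-slash spelling names the same Windows path once parsed.
print(PureWindowsPath(forward) == PureWindowsPath(raw))
```

Any of these spellings can be passed to read_csv(); which one to use is mostly a matter of taste.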

Figure: abc.csv file (Source – Personal Computer)

The sep Parameter

One of the optional parameters in read_csv() is sep, a shortened name for separator. This parameter is the delimiter we talked about earlier. It tells pandas which delimiter is used in our dataset or, in layman's terms, how the data items are separated in our CSV file.

The default value of the sep parameter is the comma (,), which means that if we don't specify the sep parameter in our read_csv() call, it is understood that our file uses the comma as the delimiter. Thus, in our previous code snippet, where we did not specify the sep parameter, it was understood that our file has commas as delimiters.
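
The default can be checked with a short sketch. To keep it self-contained, the CSV content is an in-memory string (made up for illustration) wrapped in io.StringIO instead of a file on disk.

```python
import io

import pandas as pd

# Hypothetical comma-separated content held in memory.
csv_text = "name,age\nAlice,30\nBob,25\n"

# Without sep, pandas assumes a comma.
df_default = pd.read_csv(io.StringIO(csv_text))

# Passing sep="," explicitly should give the same result.
df_explicit = pd.read_csv(io.StringIO(csv_text), sep=",")

print(df_default.equals(df_explicit))
```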

Using Other Delimiters

Often, the dataset in .csv file format has data items separated by a delimiter other than a comma, such as a semicolon, colon, tab, or vertical bar. In such cases, we need to use the sep parameter within the read_csv() function. For example, a file named Example.csv is a semicolon-separated CSV file.

Figure: Example.csv file (Source – Personal Computer)

df = pd.read_csv("C:\\Users\\Rahul\\Desktop\\Example.csv", sep = ';')

On executing this code, we get a dataframe named df:

Figure: Dataframe df (Source – Personal Computer)
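
The following sketch shows why the sep parameter matters for a semicolon-separated file. The content is a made-up in-memory string, not the actual Example.csv, and the variable names are chosen for illustration only.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated content.
text = "name;age;city\nAlice;30;Paris\nBob;25;Berlin\n"

# With the default comma separator, each line becomes a single column.
wrong = pd.read_csv(io.StringIO(text))

# With sep=";", the three fields are split as intended.
right = pd.read_csv(io.StringIO(text), sep=";")

print(wrong.shape)
print(right.shape)
```

Comparing the two shapes is a quick way to notice that the wrong delimiter was used: a dataframe with only one column usually signals a separator mismatch.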

Vertical-bar Separator

Similarly, a vertical-bar-delimited file can be read by:

df = pd.read_csv("C:\\Users\\Rahul\\Desktop\\Example.csv", sep = '|')

Colon Separator

And a colon-delimited file can be read by:

df = pd.read_csv("C:\\Users\\Rahul\\Desktop\\Example.csv", sep = ':')
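
As a rough check that the vertical bar and the colon behave the same way, the sketch below builds the same made-up table with each delimiter in memory and reads both back. The rows and variable names are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical table, stored as lists of strings.
rows = [["name", "age"], ["Alice", "30"], ["Bob", "25"]]

frames = {}
for delimiter in ["|", ":"]:
    # Join the fields of each row with the current delimiter.
    text = "\n".join(delimiter.join(row) for row in rows)
    frames[delimiter] = pd.read_csv(io.StringIO(text), sep=delimiter)

# Both delimiters should yield the same dataframe.
print(frames["|"].equals(frames[":"]))
```

One caution with the colon: if the data itself contains colons, for example in time values such as 10:30, those fields would be split as well.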

Tab Separator

Often we may come across datasets with the file format .tsv. These .tsv files have tab-separated values in them; in other words, they use the tab character as the delimiter. Such files can be read using the same read_csv() function of pandas, and we need to specify the delimiter. For example:

df = pd.read_csv("C:\\Users\\Rahul\\Desktop\\Example.tsv", sep = '\t')

Similarly, other separators can be used based on the delimiter identified in our data.
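
In Python, the tab character is written as the escape sequence \t. A minimal in-memory sketch, using made-up data, looks like this:

```python
import io

import pandas as pd

# Hypothetical tab-separated content; "\t" is the tab character.
tsv_text = "name\tage\nAlice\t30\nBob\t25\n"

df_tab = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(df_tab.columns.tolist())
```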
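
If a file mixes two delimiters, one possible approach is to pass a regular expression as sep. Separators longer than one character are interpreted by pandas as regular expressions and are handled by the Python parsing engine, so engine="python" is set explicitly here. The sample data below is made up and mixes semicolons and vertical bars; treat this as a sketch rather than the only way to handle such files.

```python
import io

import pandas as pd

# Hypothetical content that mixes two delimiters: ";" and "|".
mixed_text = "name;age|city\nAlice;30|Paris\nBob;25|Berlin\n"

# The character class [;|] matches either a semicolon or a vertical bar.
df_mixed = pd.read_csv(io.StringIO(mixed_text), sep=r"[;|]", engine="python")

print(df_mixed.columns.tolist())
```

Regex separators are slower than single-character ones and do not interact well with quoted fields, so they are best reserved for files that really need them.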

Conclusion

It is always useful to check how our data is stored in our dataset. Understanding the data is necessary before starting to work on it. A delimiter can be identified easily by inspecting the data. Based on our inspection, we can use the relevant delimiter in the sep parameter.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author's discretion.
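
Besides inspecting the file by eye, the check can be partly automated. The sketch below uses csv.Sniffer from Python's standard library to guess the delimiter of a made-up sample. Sniffer is a heuristic and can guess wrong on unusual data, so its answer is a hint that should be confirmed by looking at the file.

```python
import csv

# Hypothetical sample, e.g. the first few lines of a file.
sample = "name;age;city\nAlice;30;Paris\nBob;25;Berlin\n"

# Restrict the guess to a set of candidate delimiters.
dialect = csv.Sniffer().sniff(sample, delimiters=",;|:\t")
detected = dialect.delimiter

print(repr(detected))
```

The detected value can then be passed as the sep argument of read_csv().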


Source: https://www.analyticsvidhya.com/blog/2021/04/delimiters-in-pandas-read_csv-function/
