So if your separator is a tab, for instance, this would work:
mydata <- read.table("filename.txt", sep="\t", header=TRUE)
The header=TRUE argument in the command above tells R that the first row of the file holds the column names.
If, say, your separator is a character such as | you would change the separator part of the command to sep="|".
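To try the sep argument without a file on disk, here is a minimal sketch that passes a small pipe-delimited string to read.table() through its text argument. The company names and numbers are made up for illustration, as are the variable names raw_text and pipe_data.

```r
# Hypothetical pipe-delimited data, supplied inline instead of from a file
raw_text <- "company|employees\nAcme|120\nGlobex|45"

# sep = "|" tells read.table() to split each line on the pipe character;
# header = TRUE treats the first line as column names
pipe_data <- read.table(text = raw_text, sep = "|", header = TRUE)

print(pipe_data)
str(pipe_data)
```

Swapping the string for a file name and keeping sep="|" should behave the same way on a real pipe-delimited file.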
Categories or values?
Because of R's roots as a statistical tool, when you import non-numerical data, R may assume that character strings are statistical factors -- things like "poor," "average" and "good" -- or "success" and "failure."
But your text columns may not be categories that you want to group and measure; they may just be names of companies or employees. If you don't want your text data to be read in as factors, add stringsAsFactors=FALSE to read.table (in R 4.0.0 and later this is already the default), like this:
mydata <- read.table("filename.txt", sep="\t", header=TRUE, stringsAsFactors=FALSE)
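The difference is easiest to see by reading the same data both ways. The sketch below uses a made-up, tab-delimited string (the names and ratings are illustrative, not from a real file) and checks the class of the text column in each case.

```r
# Hypothetical tab-delimited data; "\t" is a tab and "\n" a line break
raw_text <- "name\trating\nAcme\tgood\nGlobex\tpoor\nInitech\tgood"

# Read the text column as a factor (categories with levels)
as_factors <- read.table(text = raw_text, sep = "\t", header = TRUE,
                         stringsAsFactors = TRUE)

# Read the text column as plain character strings
as_strings <- read.table(text = raw_text, sep = "\t", header = TRUE,
                         stringsAsFactors = FALSE)

class(as_factors$rating)   # "factor"
levels(as_factors$rating)  # the distinct categories R found
class(as_strings$rating)   # "character"
```

Factors suit a column like rating, which you might want to count or group by; plain strings suit a column like name.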
Example
For this example, we will use a text file that contains tab-delimited historical price data for Goldman Sachs stock.
We can now import this data into R using the read.table() function.
> setwd("C:/r-programming/data/importing data")
> getwd()
[1] "C:/r-programming/data/importing data"
> dir()
[1] "GS-Stock-Prices.txt" "top-100-stocks.csv"
> gs_data <- read.table("GS-Stock-Prices.txt", sep="\t", header=TRUE, stringsAsFactors=FALSE)
> gs_data
         Time   Open   High    Low   Last  Volume
1   1/24/2017 231.86 236.06 230.84 233.68 4448100
2   1/23/2017 231.86 233.75 230.75 232.67 3136100
3   1/20/2017 231.62 233.23 230.54 232.20 5211800
4   1/19/2017 234.07 234.75 230.62 231.41 4561800
5   1/18/2017 236.00 237.69 231.52 234.29 7590400
6   1/17/2017 242.94 243.06 235.61 235.74 6277100
7   1/13/2017 245.43 247.77 242.91 244.30 4186000
8   1/12/2017 245.06 245.47 241.57 243.84 4022300
9   1/11/2017 242.77 245.84 242.00 245.76 3532500
10  1/10/2017 240.87 243.44 239.05 242.57 3432900
11   1/9/2017 243.25 244.69 241.47 242.89 3022700
12   1/6/2017 242.29 246.20 241.37 244.90 3591000
13   1/5/2017 242.72 243.23 236.78 241.32 3562600
14   1/4/2017 241.44 243.32 240.03 243.13 2728700
15   1/3/2017 242.70 244.97 237.97 241.57 4384200
16 12/30/2016 238.51 240.50 237.40 239.45 2355500
17 12/29/2016 240.75 241.07 236.64 238.18 2619000
18 12/28/2016 243.69 244.50 240.44 240.65 3052900
19 12/27/2016 241.95 242.59 240.40 241.56 1998100
> summary(gs_data)
     Time                Open            High            Low
 Length:19          Min.   :231.6   Min.   :233.2   Min.   :230.5
 Class :character   1st Qu.:237.3   1st Qu.:239.1   1st Qu.:233.6
 Mode  :character   Median :241.9   Median :243.2   Median :238.0
                    Mean   :240.0   Mean   :241.7   Mean   :237.3
                    3rd Qu.:242.9   3rd Qu.:244.8   3rd Qu.:240.9
                    Max.   :245.4   Max.   :247.8   Max.   :242.9
      Last           Volume
 Min.   :231.4   Min.   :1998100
 1st Qu.:235.0   1st Qu.:3037800
 Median :241.3   Median :3562600
 Mean   :239.5   Mean   :3879668
 3rd Qu.:243.0   3rd Qu.:4416150
 Max.   :245.8   Max.   :7590400
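Note that summary() reports the Time column as character, so R is treating the dates as plain text. One possible follow-up, sketched here on three rows copied from the output above so it runs on its own, is to convert that column with as.Date(). The new column name Date and the data frame name gs_sample are illustrative choices, not part of the original example.

```r
# Three rows taken from the gs_data output, rebuilt inline for a standalone sketch
gs_sample <- data.frame(
  Time = c("1/24/2017", "1/23/2017", "12/30/2016"),
  Last = c(233.68, 232.67, 239.45),
  stringsAsFactors = FALSE
)

# "%m/%d/%Y" matches month/day/four-digit-year strings such as "1/24/2017"
gs_sample$Date <- as.Date(gs_sample$Time, format = "%m/%d/%Y")

str(gs_sample)

# With real dates, the rows can be sorted chronologically
gs_sample[order(gs_sample$Date), ]
```

The same as.Date() call should work on the Time column of the full gs_data data frame.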
Point and Click Data Import
If you'd prefer, RStudio lets you load data with a series of menu clicks instead of 'reading' data from the command line as just described. To do this, go to the Workspace tab of RStudio's upper-right window (labeled Environment in newer RStudio versions), find the menu option to "Import Dataset," then choose a local text file or URL.
When data are imported via menu clicks, the R command that RStudio generated from those clicks appears in your console. If you're using this for significant analysis work, you may want to save that data-reading command in a script file so that others -- or you -- can reproduce the work.
Copying Data Snippets
If you've got just a small section of data already in a table, say in a spreadsheet or an HTML table on a web page, you can copy those data to your Windows clipboard with control-C and import them into R.
The command below reads tab-separated clipboard data that includes a header row and stores it in a data frame (x):
x <- read.table(file = "clipboard", sep="\t", header=TRUE)
On a Mac, pipe("pbpaste") gives R access to data you've copied with command-C, so this is the equivalent of the previous Windows command:
x <- read.table(pipe("pbpaste"), sep="\t", header=TRUE)
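If a script needs to run on both systems, the two commands can be wrapped in one helper. This is only a sketch: read_clipboard is a hypothetical function name, and it assumes Sys.info() reports "Darwin" on a Mac and that anything else can use the "clipboard" source as on Windows.

```r
# Hypothetical helper: choose the clipboard source based on the operating system
read_clipboard <- function(header = TRUE,
                           sysname = Sys.info()[["sysname"]]) {
  if (sysname == "Darwin") {
    # macOS: pbpaste writes the clipboard contents to standard output
    src <- pipe("pbpaste")
  } else {
    # Windows (assumed for every non-Mac system in this sketch)
    src <- "clipboard"
  }
  # Clipboard data copied from a spreadsheet is tab-separated
  read.table(src, sep = "\t", header = header)
}

# Usage, after copying a table with a header row:
# x <- read_clipboard()
```

The usage line is left commented out because it only works when tabular data is on the clipboard.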