Use SQL To Operate R Data Frames | R-bloggers

In both research and application, we need to manipulate data frames by selecting desired columns, filtering records, and transforming and aggregating data. R provides built-in functions for data frame manipulation. Suppose df is the data frame we are dealing with. We use df[1:100,] to select the first 100 rows, df[,c("price","volume")] to select the price and volume columns, df[df$price >= mean(df$price),] to single out records whose price is no less than the average price, transform(df, totalValue=price*volume) to add a new column totalValue to each record, and apply(df,2,mean) to calculate the mean of each column.

However, if we want to combine several of these steps, the R code quickly becomes messy. Say we want to sort df by a new column totalValue, which equals price times volume, and then average the price and totalValue columns over the top 20 records. The R code, if written in several lines, can be this:

    df$totalValue <- df$price * df$volume
    df.sorted <- df[order(df$totalValue,decreasing=TRUE),]
    df.subset = 3000

Sorting can also be simple. Here we use ORDER BY to sort the records by totalValue in descending order.

    SELECT *, price * volume AS totalValue
    FROM df
    ORDER BY totalValue DESC

The code for subsetting a table is also intuitive. Here we use LIMIT to select only the 30 records with the highest totalValue.

    SELECT *, price * volume AS totalValue
    FROM df
    ORDER BY totalValue DESC
    LIMIT 30

Note that we break the lines to make the statement clear. It works exactly the same as a statement without line breaks.

The power of SQL may not be very clear until we combine these clauses. For example, the whole task described earlier (sort by totalValue, then average price and totalValue over the top 20 records) fits in one SQL statement:

    SELECT AVG(price), AVG(totalValue)
    FROM (SELECT *, price * volume AS totalValue
          FROM df
          ORDER BY totalValue DESC
          LIMIT 20)

Here we embed one SQL statement inside another.
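As a minimal sketch of how the nested statement might be run from R, the following passes it to sqldf() and checks the result against a base R equivalent. The toy data frame, its size, the seed, and the aliases avgPrice and avgTotalValue are assumptions made for illustration; only the sqldf package and the query itself come from the text.

```r
# Sketch only: toy data, seed and column aliases are illustrative assumptions.
library(sqldf)

set.seed(1)
df <- data.frame(price  = runif(200, 10, 100),
                 volume = runif(200, 1, 1000))

# Nested query from the text, with aliases added so the result columns
# get predictable names.
sql_avg <- sqldf("
  SELECT AVG(price) AS avgPrice, AVG(totalValue) AS avgTotalValue
  FROM (SELECT *, price * volume AS totalValue
        FROM df
        ORDER BY totalValue DESC
        LIMIT 20)
")

# Base R equivalent, computed without modifying df.
total <- df$price * df$volume
top20 <- order(total, decreasing = TRUE)[1:20]
base_avg <- c(avgPrice      = mean(df$price[top20]),
              avgTotalValue = mean(total[top20]))

print(sql_avg)
print(base_avg)
```

The two results should agree up to floating-point rounding. The base R version needs an intermediate vector and an index, while the SQL version states the whole task in one place.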
Another example is to select the top 100 records, ordered by totalValue in descending order, whose prices are no less than the average price.

    SELECT *, price * volume AS totalValue
    FROM df
    WHERE price >= (SELECT AVG(price) FROM df)
    ORDER BY totalValue DESC
    LIMIT 100

If you are familiar with SQL, the statement above is almost as friendly as plain English, and it does not matter whether we write it in one line or in several lines. Here we separate the different clauses in the statement for greater readability. Try implementing it with built-in R functions alone and you will likely find the SQL version easier to read.

I should remark that sqldf is built on an in-memory SQLite database and exposes its SELECT functionality. Since different database engines support the SQL standard to different degrees, we can only use SELECT statements that the SQLite engine supports. You may get more information here.

In conclusion, SQL is a powerful tool that R users should pick up. And sqldf is the way we use this language with R to operate on data frames in a more decent
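Returning to the top-100 query with the average-price filter, the sketch below shows one way it might be tried. The data frame contents and the seed are made up for illustration; the statement is run through sqldf() and the same result is rebuilt with built-in functions for comparison.

```r
# Sketch only: the data frame contents and seed are illustrative assumptions.
library(sqldf)

set.seed(42)
df <- data.frame(price  = runif(500, 10, 100),
                 volume = runif(500, 1, 1000))

# Scalar subquery in WHERE: keep rows priced at or above the average,
# then take the 100 largest by totalValue.
sql_top <- sqldf("
  SELECT *, price * volume AS totalValue
  FROM df
  WHERE price >= (SELECT AVG(price) FROM df)
  ORDER BY totalValue DESC
  LIMIT 100
")

# Base R equivalent for comparison.
kept <- df[df$price >= mean(df$price), ]
kept$totalValue <- kept$price * kept$volume
base_top <- head(kept[order(kept$totalValue, decreasing = TRUE), ], 100)

print(nrow(sql_top))
print(isTRUE(all.equal(sql_top$totalValue, base_top$totalValue)))
```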

Source: Use SQL To Operate R Data Frames | R-bloggers

Primary Plotting | R-bloggers

My wife tricked me into a partial-weekend project to try to get all the primary/caucus results to date on a map (the whole US). This is challenging since not all states use counties as boundaries for aggregate results. I’m still piecing together some shapefiles for the primary/caucus summation boundaries for a couple of remaining states but I […]

Source: Primary Plotting | R-bloggers

KPI dashboard in R with animated icons | R-bloggers

So Key Performance Indicators (KPIs) are all the rage in the dashboarding community… well everywhere really. The premise is simple… check a list of measurements against targets and show how they compare using some kind of visualization. I haven’t yet seen, however, a version that can utilize animated icons to display …

Source: KPI dashboard in R with animated icons | R-bloggers

How I build up a ggplot2 figure | R-bloggers

Recently, Jeff Leek at Simply Statistics discussed why he does not use ggplot2. He notes “The bottom line is for production graphics, any system requires work.” and describes a default plot that needs some work: To break down what is going on, here is what R interprets (more or less): Make a container for data […]

Source: How I build up a ggplot2 figure | R-bloggers