
Programming and Statistics with "R"

 Last modified: 23.12.2003
 

The R Project for Statistical Computing



The project was started in 1992 and released as GNU S in February 2000.
Before that, only the commercial version S-Plus was available. In addition to the original elements of S, GNU S contains a number of extensions that take account of developments in statistics. In parallel, the S language itself continues to be developed. Two aspects are covered only insufficiently in S: interactive access and embedding in a networked environment.
These and other aspects are part of Omegahat, an attempt to develop a next-generation system that builds on the experience gained with S. This experimental work is made available under Omegahat. S offers simple ways to call procedures written in other languages such as C and Fortran. Omegahat starts from there and extends these facilities, for example with direct access to Java, Perl ...
Version 2.0 is planned for April 2004.

 Commands:

Function                Command
Quit                    q()
User-defined functions  name <- function(arg1, arg2, ...) expression    Example: xpy <- function(x,y) x+y
FOR loop                for (i in object) expression
Help                    help(help), help(q); help.start() shows the help in the browser
Load source code        source("filename.R")
WHILE loop              while (condition) statement
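
As a minimal sketch of how the commands in the table fit together, the following defines the xpy function from the table and calls it from a FOR loop, followed by a WHILE loop. The variable names total and n are chosen here only for illustration.

```r
# User-defined function from the table: adds its two arguments
xpy <- function(x, y) x + y

# FOR loop: accumulate the sum of 1..5 using xpy
total <- 0
for (i in 1:5) total <- xpy(total, i)

# WHILE loop: keep doubling n until it is at least 100
n <- 1
while (n < 100) n <- n * 2

print(total)
print(n)
```

Saved in a file, the same lines can be run with source() as listed in the table.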
 
 

Frequently Asked Questions (FAQ)


How do you create zero vectors?
numeric(n) creates a zero vector of length n

How do you create identity matrices?
diag(n) creates an identity matrix of dimension (n,n)
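
Both FAQ items can be checked quickly in an R session. The sketch below assumes that numeric(n) or rep(0, n) is what is meant by a zero vector; the names z, r and I3 are illustrative only.

```r
# Two equivalent ways to build a zero vector of length 3
z <- numeric(3)
r <- rep(0, 3)

# Identity matrix of dimension (3,3)
I3 <- diag(3)

print(z)
print(I3)
```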


Getting Started with R


One way to work is to put all the R commands into a file first, using any text editor, and then process the whole file at once in R. For example, create a file with pico and write your R code in it. Save the file, and then run it in an R session with the following command.

> source(file="filename.s")


To save the output of an R session, do the following.


> sink("output")

...

> sink()


This writes all output produced between the two sink commands to the file "output" instead of the screen.
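
A small sketch of this pattern, assuming a temporary file is an acceptable stand-in for "output" (out_file and saved are hypothetical names used only here):

```r
# Temporary file standing in for the "output" file in the text
out_file <- tempfile(fileext = ".txt")

sink(out_file)                 # start diverting output to the file
print(summary(c(1, 4, 2, 8)))  # this output goes to the file, not the screen
cat("done\n")
sink()                         # restore output to the screen

# Read the captured output back to confirm what was saved
saved <- readLines(out_file)
print(saved)
```

Note that sink captures only the output, not the commands that produced it.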


Seeking Help



To get help on any command, say, scan, enter


> help(scan)


Or


> ?scan


To get help on a specific topic, start the help facility with help.start() and then use the mouse to explore the help windows.


To get help from other S-PLUS or R users, S-news is a very active discussion list dedicated to all kinds of S-PLUS and R questions. To subscribe, send e-mail to


s-news-request@lists.biostat.wustl.edu


with the message body “subscribe s-news”.


The following website contains excellent help on R, including several useful manuals and the frequently asked questions about R.


http://stat.ethz.ch/R-alpha/




Preliminaries


There are seven basic types of data objects in S-PLUS: vector, matrix, array, list, factor, time series, and data frame. Their values can have different modes, including numeric, logical, character, and complex.


To list all the objects in the current working directory:

> ls()

# Or

> objects()


Note that everything after # on a line is treated as a comment and is not evaluated.



Creating Objects: There are many ways to create objects in S-PLUS. Some useful functions for creating the most common object type, the vector, are listed below.


Table 1.1: Useful functions for creating vectors.


Function    Description                   Examples

scan        read values in any mode       scan(), scan("data")
c           combine values in any mode    c(1,3,2,6), c("yes","no")
rep         repeat values in any mode     rep(NA,5), rep(c(1,2),3)
:           numeric sequences             1:5, 1:-1
seq         numeric sequences             seq(-pi,pi,.5)
vector      initialize vectors            vector("complex",5)
logical     initialize logical vectors    logical(3)
numeric     initialize numeric vectors    numeric(4)
character   initialize character vectors  character(6)


# Examples about two different uses of rep

> x <- rep(1:3, 3); x

[1] 1 2 3 1 2 3 1 2 3


> x <- rep(1:3, 1:3); x

[1] 1 2 2 3 3 3


# To see the help file on rep, use

> ?rep



If you have a longer data set, you might want to use the scan function to enter your data. Below is an example of the scan function.


> x1 <- scan()

1: 1 4 2 8 5 7

7: 9 7 4 5 7 4

13:


The variable x1 will have 12 elements. Enter a blank line at the prompt to tell S-PLUS that you have finished entering data for the variable.
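
The interactive session above cannot be replayed from a script, so the sketch below feeds the same twelve numbers to scan through its text argument. This assumes a reasonably recent version of R, where scan accepts text; older versions and S-PLUS may not.

```r
# Same twelve values as typed at the prompt, supplied as two lines of text
x1 <- scan(text = c("1 4 2 8 5 7", "9 7 4 5 7 4"), quiet = TRUE)

print(length(x1))  # number of elements read
print(x1)
```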



Operating on Objects: Typical operations on an object include querying its attributes, making corrections, subsetting or subscripting, algebraic operations, merging variables, and deleting.

> mode(x)

> length(x)


> mat <- rep(1:4,rep(3,4)); mat

> dim(mat) <- c(3,4); mat


Making Corrections

> mat[2,3] <- 5.6; mat

If you have many changes to make, you can use the fix command.

> fix(mat)

This lets you edit the entire contents of mat in an editor (Notepad on Windows).




Subsetting


> mat[2:3, 3:4]

> mat[mat[,2] <= 1.5,1:3]
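
To make the two subsetting forms concrete, here is a self-contained sketch that rebuilds mat as above and then extracts a block by index ranges and a set of rows by a logical condition. The condition on column 3 and the names sub_block, big_rows and kept are chosen for illustration only.

```r
# Rebuild the 3 x 4 matrix used above, including the corrected entry
mat <- rep(1:4, rep(3, 4))
dim(mat) <- c(3, 4)
mat[2, 3] <- 5.6

# Rows 2-3 and columns 3-4
sub_block <- mat[2:3, 3:4]

# Rows whose third column exceeds 3.5, restricted to columns 1-3.
# Only row 2 qualifies, so the result collapses to a plain vector ...
big_rows <- mat[mat[, 3] > 3.5, 1:3]

# ... unless drop = FALSE is used to keep the matrix shape
kept <- mat[mat[, 3] > 3.5, 1:3, drop = FALSE]

print(sub_block)
print(big_rows)
print(dim(kept))
```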


Merging Variables by Rows or Columns

# cbind and rbind

> x1 <- rnorm(100, 1, 2.5)

> x2 <- sample(x=c(1,2,3), 100, replace=T)

> x3 <- cbind(x1, x2); x3

> x4 <- rbind(x1, x2); x4
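
The difference between the two functions is easiest to see from the shape of the result. The sketch below repeats the calls with a fixed seed (an arbitrary choice made here for reproducibility) and inspects the dimensions.

```r
set.seed(1)  # arbitrary seed, only so that the sketch is reproducible

x1 <- rnorm(100, 1, 2.5)
x2 <- sample(c(1, 2, 3), 100, replace = TRUE)

x3 <- cbind(x1, x2)  # variables become columns: 100 rows, 2 columns
x4 <- rbind(x1, x2)  # variables become rows: 2 rows, 100 columns

print(dim(x3))
print(dim(x4))
print(colnames(x3))  # column names are taken from the variable names
```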


If you have an object, say x7, that you no longer need, you can delete it with the rm command.


> rm(x7)


As in any other programming language, avoiding unnecessary loops is an important concern. S-PLUS has many powerful built-in functions that help with this when used well. The function apply is one of them: it returns a vector or array obtained by applying a specified function to sections of an array. For example,


> x <- apply(matrix(rnorm(5000, 4, 3), 100, 50), MARGIN=2, mean)


The option MARGIN=2 tells S-PLUS to compute the means by columns instead of by rows, so x holds 50 column means. Try MARGIN=1 to get the 100 row means instead. After all, a matrix is just a two-dimensional array.
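
On a matrix small enough to check by hand, the effect of MARGIN looks as follows. The matrix m and the result names are illustrative; colMeans and rowMeans are used only as a cross-check in R.

```r
# 2 x 3 matrix filled column by column: columns are (1,2), (3,4), (5,6)
m <- matrix(1:6, nrow = 2, ncol = 3)

col_means <- apply(m, MARGIN = 2, mean)  # one mean per column
row_means <- apply(m, MARGIN = 1, mean)  # one mean per row

print(col_means)
print(row_means)

# Cross-check against the dedicated helpers available in R
print(all.equal(col_means, colMeans(m)))
print(all.equal(row_means, rowMeans(m)))
```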



Distributions: S-PLUS implements almost all the common distributions. For each of them it is easy to obtain the density, the cumulative probability, quantiles, and random numbers.


# Example 1: Normal Distribution


> z <- qnorm(seq(.001, .999, len = 100), mean=2, sd=1 )

# compute a vector of quantiles

> y <- dnorm(z, mean=2, sd=1)

# density (dnorm), probability (pnorm), quantile (qnorm), or random

# sample (rnorm) for the Normal distribution with two parameters, mean and sd.

> plot(z,y, type="l", ylab="Normal Density")


# Example 2: Exponential

> values <- seq(0.0001, 6, length=200);

> bvals <- values[values>qexp(.95)]

> plot(values, dexp(values), type="l")

> polygon(c(qexp(.95), bvals, 6), c(0, dexp(bvals), 0))

> abline(h=0); abline(v=0)
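
The relationship between the d, p and q functions used in the two examples can be verified numerically. This is a minimal sketch with illustrative variable names; it relies only on the default rate of 1 for the exponential distribution.

```r
# Quantile and cumulative probability are inverse to each other
q95 <- qexp(0.95)            # 95% quantile of the exponential distribution
p_back <- pexp(q95)          # should give back 0.95

# For rate 1 the quantile has the closed form -log(1 - p)
closed_form <- -log(0.05)

# Area of the region drawn by polygon() in Example 2: the upper 5% tail
tail_area <- 1 - pexp(q95)

# Peak of the normal density from Example 1 (mean 2, sd 1)
peak <- dnorm(2, mean = 2, sd = 1)

print(c(q95 = q95, p_back = p_back, tail_area = tail_area, peak = peak))
```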

Import and Export: The scan function, which can read from either standard input or a file, is commonly used to read data, whether typed at the keyboard or stored in a file.


> x <- matrix(scan("filename"), ncol = 10, byrow = T)


If you have a text file with data arranged in the form of a table, you can read it into S-PLUS as a data frame using the read.table function.

> auto <- read.table('auto.dat',header=T)

When you want to export data to share with another S-PLUS user, use the data.dump or dump function:


> dump("matz", connection="matz.dmp")

To bring it back into an S-PLUS session, run


> source("matz.dmp")

The inverse operation to the scan function is provided by the cat and write functions. Similarly, the inverse operation to read.table is provided by write.table.
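
A round trip through both pairs of functions might look as follows in R. This is only a sketch under stated assumptions: the data frame contents are made up, temporary files stand in for auto.dat and matz.dmp, and the file argument of dump is the R spelling, which may differ from the S-PLUS call shown above.

```r
# Made-up stand-in for the contents of auto.dat
auto <- data.frame(mpg = c(21.0, 22.8), cyl = c(6, 4))

# write.table / read.table round trip through a temporary file
tab_file <- tempfile(fileext = ".dat")
write.table(auto, tab_file, row.names = FALSE)
auto2 <- read.table(tab_file, header = TRUE)

# dump / source round trip for a matrix
matz <- matrix(1:6, 2, 3)
dmp_file <- tempfile(fileext = ".dmp")
dump("matz", file = dmp_file)   # in R the argument is called file
rm(matz)                        # remove the object ...
source(dmp_file, local = TRUE)  # ... and recreate it from the dump

print(auto2)
print(matz)
```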

Writing Functions in S-PLUS: Programming in S-PLUS consists largely of writing functions. Functions do most of the work in S-PLUS. The simplest functions arise naturally as shorthand for frequently used combinations of S-PLUS expressions. For example, S-PLUS has no built-in function for calculating the standard deviation of a data set (R, by contrast, provides sd). It does, however, have a function for calculating the variance and another for calculating square roots. The standard deviation is simply the square root of the variance, so a standard deviation function can be created as follows:


> stdev <- function(x){ sqrt(var(x)) }


You can build more complicated functions either by adding new features incrementally onto simpler functions, or by designing whole programs from scratch. As your functions grow more complex, proper use of programming features such as conditionals and error handling becomes more important.
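
As one possible illustration of conditionals and error handling, the stdev function from above can be extended as sketched below. The specific checks (rejecting non-numeric input, returning NA for fewer than two values) are assumptions made here for the example, not part of the original function.

```r
# stdev with basic input checking added
stdev <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")  # error for wrong input type
  if (length(x) < 2) return(NA_real_)            # variance undefined for one value
  sqrt(var(x))
}

# Normal use
ok <- stdev(c(2, 4, 4, 4, 5, 5, 7, 9))

# Catch the error instead of aborting, and keep only its message
msg <- tryCatch(stdev("a"), error = function(e) conditionMessage(e))

print(ok)
print(msg)
```

tryCatch is the R mechanism used here; the calling code decides what to do with the error instead of the function silently returning a wrong value.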

Example:

Over fifty datasets are supplied with R, and others are available in packages (including the standard packages supplied with R). These datasets have to be loaded explicitly, using the function data. To see the list of datasets in the base system use data() and to load one of these use, for example,

> data(women)

> women

> attach(women,pos=1)

> mean(height)

> sd(height)

> hist(height)

> hist(height, nclass=10)

> stem(weight)

The decimal point is 1 digit(s) to the right of the |

11 | 57

12 | 0369

13 | 259

14 | 26

15 | 049

16 | 4
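
The same summary statistics can also be computed without attach, which avoids changing the search path. The sketch below uses with() for that purpose; this is a stylistic alternative chosen here, not the approach of the session above.

```r
data(women)  # dataset shipped with R: 15 rows, columns height and weight

# Evaluate expressions inside the data frame instead of attaching it
avg_height <- with(women, mean(height))
sd_height  <- with(women, sd(height))
n_rows     <- nrow(women)

print(c(n = n_rows, mean = avg_height, sd = sd_height))
```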




APPENDIX: Expressions in R



A. Literals

number                     1 1.1 1.1e10
string                     'string' or "string"
name
comment                    # string.
function (formals) expr    function(args){defn}



B. Calls

	
expr infix expr
expr %anything% expr
unary expr
expr ( arglist )
expr [ arglist ]
expr [[ arglist ]]
expr $ fname



C. Assignments

expr <- expr
expr _ expr      (S-PLUS only; not valid in current R)
expr -> expr
expr <<- expr    (S-PLUS: forces write to disk if in a function)



D. IF Conditions

if ( expr ) expr
if ( expr ) expr else expr



E. Iterations

repeat expr
while ( expr ) expr
for ( Name in expr ) expr

F. Flow


	     
break
next
return ( expr )
( expr )
{ exprlist }



II. Arithmetic Operations

*    Multiplication
+    Addition
-    Subtraction
/    Division
^    Exponentiation
%%   Remainder or modulo operator
%*%  Matrix multiplication operator
%/%  Integer divide
%c%  Cross product: m1 %c% m2 is t(m1) %*% m2
%o%  Outer product
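
A few of the arithmetic operators in this section, tried on small inputs. One assumption is made explicit here: base R has no %c% operator, so the sketch uses crossprod for the cross product instead. All variable names are illustrative.

```r
remainder <- 7 %% 3    # remainder after division
quotient  <- 7 %/% 3   # integer division
power     <- 2 ^ 10    # exponentiation

m1 <- matrix(1:4, 2, 2)
m2 <- matrix(5:8, 2, 2)

prod_mat   <- m1 %*% m2          # matrix multiplication
cross      <- crossprod(m1, m2)  # same as t(m1) %*% m2
outer_prod <- 1:3 %o% 1:2        # outer product, a 3 x 2 matrix

print(c(remainder, quotient, power))
print(cross)
print(outer_prod)
```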





III. Comparison Operations

!=   Not-equal-to
<    Less-than
<=   Less-than-or-equal-to
==   Equal
>    Greater-than
>=   Greater-than-or-equal-to




IV. Logical Operations

!    Negation
|    OR (use with vectors or matrices)
||   Shortcut Or (Don't use with vectors or matrices)
&    AND (use with vectors or matrices)
&&   Shortcut And (Don't use with vectors or matrices)
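
The difference between the elementwise and the shortcut forms can be sketched as follows. The vectors a and b and the NULL test are illustrative choices made here.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

and_vec <- a & b   # elementwise AND over the whole vector
or_vec  <- a | b   # elementwise OR over the whole vector
not_vec <- !a      # negation

# Shortcut AND works on single values and stops at the first FALSE,
# so the comparison x > 0 is never evaluated for the NULL below
x <- NULL
safe <- !is.null(x) && x > 0

print(and_vec)
print(or_vec)
print(safe)
```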





V. Notation

[ ]     Vector subscript
[[ ]]   List subscript; can only identify a single element
$       Named component of a list





V. SUBSCRIPT FORMS



logical               extracts or selects T components
positive numbers      extracts or selects specified indices
negative numbers      deletes specified indices
NA or out of range    extends dimension, filling with NA
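
The subscript forms can be tried on a short vector. The vector v and the result names are illustrative.

```r
v <- c(10, 20, 30, 40)

by_logical  <- v[c(TRUE, FALSE, TRUE, FALSE)]  # keeps the TRUE positions
by_positive <- v[c(1, 4)]                      # keeps the listed indices
by_negative <- v[-2]                           # drops the listed index
out_of_range <- v[6]                           # reading past the end gives NA

# Assigning past the end extends the vector; the gap is filled with NA
v[6] <- 60

print(by_logical)
print(by_positive)
print(by_negative)
print(v)
```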






VI. Sequences and Repetitions

seq(from, to, by, length, along)
:    as in 1:10
rep(x, times, length)



VII. Arithmetic Operators and Functions

abs(x)
acos(x)
acosh(x)
asin(x)
asinh(x)
atan(x)
atan(x, y)
atanh(x)
ceiling(x)
cos(x)
cosh(x)
exp(x)
floor(x)
gamma(x)
lgamma(x)
log(x, base=exp(1))
log10(x)
max(...)    elementwise
min(...)    elementwise
pmax(...)   parallel
pmin(...)   parallel
sin(x)
sinh(x)
sqrt(x)
tan(x)
tanh(x)
trunc(x)




VIII. Types

Can be used in as.<type>, is.<type> and <type>(length=n).

array
category     (is, as only)
character
complex
double
integer
list
logical
matrix
null         (is, as only)
numeric



IX. Data Handling in R



A. Data In

scan(file="", what=numeric(), n, sep, multi.line = F, flush = F, append = F)

Example: data <- matrix(scan("data.f..."), ncol=5, byrow=T)



B. Command File In

source(file, local = F)

C. Output to File

sink(file)
sink( )    restores output to screen

D. Write and Read Objects

dput(x, file)
dget(file)
write(t(matrix), file, ncol=ncol(matrix), append=FALSE)
dump(list, fileout="dumpdata")
restore(file)

E. Make Things (Including Help) Available or Unavailable

assign("name", value, frame, where)
attach(file, pos=2)
detach(what=2)
library( )
library(help=section)
library(section, first=FALSE)
library.dynam(section, file)
help(name="help", offline=F)
...(name="help")




X. REDUCTION OPERATORS


all(...)
any(...)
length(x)
max(...)
mean(x, trim=0)
median(x)
min(...)
mode(x)
prod(...)
quantile(x, probs=c(0,.25,.5,.75,1))
sum(...)
var(x,y)
cor(x,y,trim=0)




XI. STATISTICAL DISTRIBUTIONS



d<dist>(x,<parameters>)    density at x
p<dist>(x,<parameters>)    cumulative distn fn to x
q<dist>(p,<parameters>)    inverse cdf
r<dist>(n,<parameters>)    generates n random numbers









+--------------------------------------------------------------+
| <dist>   Distribution   Parameters             Defaults      |
+--------------------------------------------------------------+
| beta     beta           shape1, shape2         -, -          |
| cauchy   Cauchy         loc, scale             0, 1          |
| chisq    chi-square     df                     -             |
| exp      exponential    -                      -             |
| f        F              df1, df2               -, -          |
| gamma    Gamma          shape                  -             |
| lnorm    log-normal     mean, sd (of log)      0, 1          |
| logis    logistic       loc, scale             0, 1          |
| norm     normal         mean, sd               0, 1          |
| stab     stable         index, skew            -, 0          |
| t        Student's t    df                     -             |
| unif     uniform        min, max               0, 1          |
+--------------------------------------------------------------+




XII. Graphics



A. Starting and Stopping Plotting



<device-specification function>
graphics.off()



B. Device-Specification Functions



h...(ask=F, file="")
hpgl(width=10, height=7.25, ask=!auto, auto=F, color=2, speed=400 ... d=F, file="")
postscript(file, command, horizontal=F, width, height, rasters, pointsize=14, font=1, preamble=ps.preamble, fonts=ps.fonts)
printer(width=80, height=64, file="", command ...
show()
...(ask=F, file)
sun(as ... color=FALSE)



C. Plot Parameters

log='<x|y|xy>'      Logarithmic axes
main='title'
new=<logical>       T forces addition to current plot
sub='bottom title'
type='<l|p|b|n>'    Line, points, both, none
lty=n               Line type
pch='.'             Plot character
xlab='x-axis label'
ylab='y-axis label'
xlim=c(xlo.value,xhi.value)
ylim=c(ylo.value,yhi.value)




D. One-Dimensional Plots

barplot(height)    #simple form
barplot(height, width, names, space=.2, inside=TRUE, beside=FALSE, horiz=FALSE, legend, angle, density, col, blocks=TRUE)
boxplot(..., range, width, varwidth=FALSE, notch=FALSE, names, plot=TRUE)
hist(x, nclass, breaks, plot=TRUE, angle, density, col, inside)



E. Two-Dimensional Plots

lines(x, y, type="l")
points(x, y, type="p")
matplot(x, y, type="p", lty=1:5, pch=, col=1:4)
matpoints(x, y, type="p", lty=1:5, pch=, col=1:4)
matlines(x, y, type="l", lty=1:5, pch=, col=1:4)
plot(x, y, type="p", log="")
abline(coef)
abline(a, b)
abline(reg)
abline(h=)
abline(v=)
qqplot(x, y, plot=TRUE)
qqnorm(x, datax=FALSE, plot=TRUE)



F. Three-Dimensional Plots

contour(x, y, z, v, nint ... FALSE, labex)
interp(x, y, z, xo, yo, ... extrap=FALSE)
persp(z, ... ,-8,5), ar=1)



G. Multiple Plots per Page (Example)

par(mfrow=c(nrow, ncol), oma=c(0, 0, 4, 0))
mtext(side=3, line=..., cex=2, outer=T, "This is an Overall Title For the Page")
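
A runnable sketch of this layout in R is shown below. It assumes a 1 x 2 grid, made-up data for the two panels, and a null PDF device (pdf(file = NULL)) so that nothing is written to disk; in an interactive session the pdf and dev.off lines can simply be omitted.

```r
pdf(file = NULL)  # null device: draws nowhere, used only so the sketch runs anywhere

# 1 row, 2 columns of plots, with 4 lines of outer margin at the top
par(mfrow = c(1, 2), oma = c(0, 0, 4, 0))
layout_used <- par("mfrow")  # remember the layout for inspection

plot(1:10, type = "l", main = "Lines")         # left panel
hist(c(1, 2, 2, 3, 3, 3), main = "Histogram")  # right panel

# Overall title written into the outer margin above both panels
mtext("This is an Overall Title For the Page", side = 3, outer = TRUE, cex = 2)

invisible(dev.off())  # close the device again
print(layout_used)
```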


Advanced Programming:

User-Defined Operators
Suppose you want to assign a special function to the § character. This can be done with "%§%" <- function(x, y) { }. The operator is then used as x %§% y.
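
As a minimal sketch of the same technique, the following defines a hypothetical %+% operator that joins two strings. The ASCII symbol + is used here instead of § only to sidestep character encoding problems; the mechanism is identical.

```r
# Any function assigned to a name of the form %...% can be used as an infix operator
"%+%" <- function(x, y) paste(x, y)

greeting <- "Hello" %+% "R"
print(greeting)
```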
