NAME: Enron Employee Information
TYPE: Observational Study
SIZE: 156 rows, 7 variables

DESCRIPTIVE ABSTRACT:
In the original dataset, the emails were organized into 150 mailboxes labeled by employee name; the emails in a mailbox were not necessarily sent by that person. Additionally, some employees with similar names were binned into the same mailbox, while others had their messages split between two mailboxes. To circumvent such potential binning errors, we ignored the folder designation and instead extracted only the From, To, and CC fields of each email message. While only one employee may appear in either the From or To fields (unlike most modern email systems), an arbitrary number may appear in the CC field. We considered only senders and recipients whose email addresses have an enron.com domain name. To distinguish between individuals, we relied on six standard aliases used at Enron (see, for instance, Zhou et al. (2007)). The result was 156 Enron employees whose email communication we considered. The dataset described here contains the names and employment information associated with the 156 individuals whose emails were analyzed.

SOURCE:
The dataset is available at https://s3.amazonaws.com/metanautix/enron/enron_mail_20110402_csv.tgz.

DATASET NAME: Enron Employee Information.csv

COLUMN & VARIABLE DESCRIPTION:
1, 2: ID number as sorted alphabetically
3: Name
4: Email ID
5: Name of folder in which email was collated
6: Department where individual worked
7: Job title of individual

STORY BEHIND THE DATA:
One of the most infamous corporate scandals of the past few decades curiously left in its wake one of the most valuable publicly available datasets. In late 2001, the Enron Corporation's accounting obfuscation and fraud led to the bankruptcy of the large energy company. The Federal Energy Regulatory Commission subpoenaed all of Enron's email records as part of the ensuing investigation.
Over the following two years, the commission released, withdrew, and re-released the email corpus to the public after deleting emails that contained personal information such as social security numbers. The Enron corpus contains emails whose subjects range from weekend vacation planning to political strategy talking points, and it remains the only large example of a real-world email dataset available for research.

SUBMITTED BY:
J. S. Hardin
Pomona College
Department of Mathematics
610 North College Ave
Claremont, CA, 91711
jo.Hardin@Pomona.edu

G. Sarkis
Pomona College
Department of Mathematics
610 North College Ave
Claremont, CA, 91711
ghassan.Sarkis@Pomona.edu
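
The header extraction outlined in the DESCRIPTIVE ABSTRACT (ignore the folder, read only the From, To, and CC fields, keep only enron.com addresses) might look roughly like the sketch below. It is not the code used to build this dataset: the function name `enron_addresses` and the sample message are hypothetical, the sketch assumes each message is available as raw RFC 822 text, and it does not attempt the alias resolution step mentioned above.

```python
from email import message_from_string
from email.utils import getaddresses

ENRON_DOMAIN = "enron.com"  # the domain named in the abstract


def enron_addresses(raw_message: str) -> dict:
    """Return the enron.com addresses found in the From, To, and Cc headers.

    Hypothetical helper for illustration; the folder a message was
    stored in is deliberately never consulted.
    """
    msg = message_from_string(raw_message)
    result = {}
    for field in ("From", "To", "Cc"):
        # get_all returns every occurrence of the header, or [] if absent
        headers = msg.get_all(field, [])
        addresses = [addr.lower() for _name, addr in getaddresses(headers)]
        # Keep only addresses in the enron.com domain
        result[field] = [a for a in addresses if a.endswith("@" + ENRON_DOMAIN)]
    return result


# Invented message, for illustration only (not taken from the corpus)
sample = (
    "From: alice.example@enron.com\n"
    "To: bob.example@enron.com\n"
    "Cc: carol.example@enron.com, outsider@example.org\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
print(enron_addresses(sample))
```

Mapping the surviving addresses to the 156 employees would additionally require the alias list, which is not reproduced in this description.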