"Many types of textsthough not all of themare so monotonous on the language level that they can be produced automatically without any very complicated cognitive Artificial Intelligence models. A meticulous corpus analysis can reveal how similar elements and rules are used over and over again: one then only needs to reproduce them with a computer. This procedure will be demonstrated and discussed using television news..."Right on!
Ulrich Schmitz, Automatic generation of texts without using cognitive models: television news, in Susan Hockey & Nancy Ide (eds.): Research in Humanities Computing 2. Oxford: Clarendon Press 1994, pg 186-194.
"...[Our computer] program is based on an empirical analysis of news bulletins which have actually already been broadcast; it could, however, keep on generating texts which could be broadcast right into the next century. This is due to the fact that news texts acquire their topicality solely from the random mixture of constant elements belonging to the "Symbolfeld" and the variables belonging to the "Zeigfeld" valid at the time..."I love this guy! That's "Zeigfeld" rhyming with "Seinfeld" (not the follies). zeigen: a german transitive verb meaning "to show."
"...the concept presented here for a computer simulation of German television news can be perceived both as a project for the entire setting up of a news machine and as a criticism of the hollow simplicity of the usual news bulletins. In other (polemic) words: artificial intelligenceof a particularly simple kindcould replace natural stupidity or could show it up. The political conclusions of the "dequalification of knowledge leading to the fact that people are informed rather than being able to discuss issues" (Schmidt 1986) have not yet been drawn...I *really* love this guy!!
Here's a copy of the whole paper: [Schmitz 1994].
February 2002 Notes
So. To get started, I used 1990 US Census data to web-whack together random people, random places, and random company names.
Here's a description and local copy of the US Census name data I used for the random people [HTML].
Here are some interesting notes from the Census bureau on their methods [HTML].
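A minimal sketch of how a random surname could be drawn from this kind of data, assuming the four-column layout of the 1990 Census name files (name, frequency in percent, cumulative frequency in percent, rank). The sample rows are typed in for illustration and should be checked against the real file; `load_names` and `pick_name` are hypothetical helper names, not part of any Census tool.

```python
import bisect
import random

# Sample rows in the assumed layout: name, frequency %, cumulative %, rank.
# Typed in for illustration; verify against the actual Census file.
SAMPLE = """\
SMITH          1.006  1.006      1
JOHNSON        0.810  1.816      2
WILLIAMS       0.699  2.515      3
JONES          0.621  3.136      4
BROWN          0.621  3.757      5
"""


def load_names(text):
    """Parse rows into parallel lists of names and cumulative frequencies."""
    names, cumulative = [], []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed rows
        names.append(parts[0].capitalize())
        cumulative.append(float(parts[2]))
    return names, cumulative


def pick_name(names, cumulative, rng=random):
    """Pick a name with probability proportional to its frequency."""
    # Draw a point on the cumulative scale, then find the row that covers it.
    point = rng.uniform(0, cumulative[-1])
    index = bisect.bisect_right(cumulative, point)
    return names[min(index, len(names) - 1)]


if __name__ == "__main__":
    names, cumulative = load_names(SAMPLE)
    print(pick_name(names, cumulative))
```

Searching the cumulative column means common names come up more often than rare ones, which is the whole point of using the Census frequencies rather than a flat word list.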
Earlier Notes
(August 2001) I typed in some notes on fictive news story generation [PDF].
(September 2000) I formed the unlikely conception that people would actually buy a consumer device that spit out totally random stuff, and started thinking about the business side of it, as if there were any [PDF].
The thing is, *I* would buy a device like this, instantly. But other people?
Jan 2002: Random crimes
Crime reporting seems like a natural place to start. How to generate a random crime?
The U.S. Department of Justice and the FBI collect and publish "Uniform Crime Reporting" statistics that follow data formats outlined in a 135 page document "National Incident-Based Reporting System (NIBRS): Volume I: Data Collection Guidelines" [PDF].
This FBI stuff lets you select a crime at random according to appropriate observed distributions, and also fill out some relevant details on victims, weapons, circumstances, etc.
Once the "crime" is known, it remains to fill out the news story by surrounding it (at random) with some salient features, often driven (I'm thinking) by the progress of the case in the judicial system.
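The "select a crime at random according to observed distributions" step can be sketched as a weighted draw. This is only an illustration: the offense names and weights below are invented placeholders, not real NIBRS figures, and `random_crime` is a hypothetical function name.

```python
import random

# HYPOTHETICAL weights, invented for illustration; not real NIBRS statistics.
OFFENSE_WEIGHTS = {
    "larceny": 50,
    "burglary": 20,
    "assault": 15,
    "motor vehicle theft": 10,
    "robbery": 5,
}

# HYPOTHETICAL follow-up detail: weapon distribution for some offenses.
WEAPON_WEIGHTS = {
    "robbery": {"handgun": 4, "knife": 2, "no weapon": 4},
    "assault": {"fists": 6, "knife": 2, "blunt object": 2},
}


def weighted_pick(weights, rng):
    """Return one key of `weights`, chosen in proportion to its value."""
    items = list(weights)
    return rng.choices(items, weights=[weights[i] for i in items], k=1)[0]


def random_crime(rng=None):
    """Draw an offense, then fill in details that depend on the offense."""
    rng = rng or random.Random()
    offense = weighted_pick(OFFENSE_WEIGHTS, rng)
    crime = {"offense": offense}
    if offense in WEAPON_WEIGHTS:
        crime["weapon"] = weighted_pick(WEAPON_WEIGHTS[offense], rng)
    return crime


if __name__ == "__main__":
    print(random_crime())
```

Drawing the offense first and the details second keeps the details plausible for the offense; the real tables would replace the placeholder dictionaries.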
Feb 2002: Data
Some word lists [directory]
An online Plain Text English dictionary [turned off the link to kill robots]
Web resources
List started 20 March 2002
SIGGEN Resources in text generation.
CLINT, a Template/Word-based Text Generator. [local HTML]
CLAWS7 Tag Set, part of a part-of-speech tagging system [web]
Grady Ward's Moby, a lexicon project including parts of speech databases.
2 April 2002
Large US Cities & Automatically Constructed Geographic Phrases
64 cities have more than 250,000 people.
200 cities have more than 100,000 people.
555 cities have more than 50,000 people.
I also found a handy Perl program, dist_pl, that computes great circle distances and direction data from lat/long pairs such as are found in the US census data I'm using. [My notes].
I intend to use this data to form geographical fragments such as:
...in Leland, a rural Iowa town...
[because it can be calculated to be near no large US city]
... from Burlingame, a suburb south of San Francisco...
[nearness to large US city; direction from lat/long]
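A rough sketch of how such fragments could be computed, using the haversine formula for great circle distance and the standard initial-bearing formula for direction. This is not the dist_pl program; the coordinates are approximate, and the 40 km suburb cutoff and the function names are assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0
COMPASS = ["north", "northeast", "east", "southeast",
           "south", "southwest", "west", "northwest"]


def distance_km(lat1, lon1, lat2, lon2):
    """Great circle distance between two lat/long points (haversine)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def direction(lat1, lon1, lat2, lon2):
    """Eight-point compass direction of point 2 as seen from point 1."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    bearing = (math.degrees(math.atan2(y, x)) + 360) % 360
    return COMPASS[int((bearing + 22.5) // 45) % 8]


def place_phrase(town, state, lat, lon, big_cities, suburb_km=40):
    """Describe a town relative to the nearest large city.

    big_cities is a list of (name, lat, lon); suburb_km is an assumed cutoff.
    """
    name, clat, clon = min(
        big_cities, key=lambda c: distance_km(lat, lon, c[1], c[2]))
    if distance_km(lat, lon, clat, clon) <= suburb_km:
        return f"{town}, a suburb {direction(clat, clon, lat, lon)} of {name}"
    return f"{town}, a rural {state} town"


if __name__ == "__main__":
    # Approximate coordinates, for illustration only.
    cities = [("San Francisco", 37.77, -122.42)]
    print(place_phrase("Burlingame", "California", 37.58, -122.35, cities))
    print(place_phrase("Leland", "Iowa", 43.33, -93.63, cities))
```

In practice the `big_cities` list would be the 64, 200, or 555 cities from the population cutoffs above, depending on how generous "large" should be.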
3 April 2002
Anaphoric & Cataphoric
Main Entry: an·a·phor·ic Pronunciation: "a-n&-'for-ik, -'fär- Function: adjective Date: 1904
Of or relating to anaphora; especially : being a word or phrase that takes its reference from another word or phrase and especially from a preceding word or phrase -- compare CATAPHORIC.
Main Entry: cat·a·phor·ic Pronunciation: "ka-t&-'for-ik Function: adjective Date: 1968
Of or relating to cataphora; especially : being a word or phrase (as a pronoun) that takes its reference from a following word or phrase (as in: before her Jane saw nothing but desert) -- compare ANAPHORIC
3 April 2002
The Prison Escape
I've done some experiments putting together a runtime system for the automatic generation of news stories about prison escapes.
I used a short initial fragment of this actual Associated Press story as a model:
GUTHRIE, Okla. -- Four prisoners broke out of a county jail Wednesday by smashing through a ceiling and an inner wall and escaping through an air conditioning duct.
They then climbed over a 10-foot fence topped with razor wire, Sheriff Randy Richardson said. They may have fled in a minivan discovered stolen Wednesday morning.
A shoe and a piece of torn clothing were found near the fence.
Richardson said bars cover the air conditioning vents at the Logan County Jail, but once the prisoners broke through the ceiling and the inner wall, they were able to get behind those bars.
The jail, located near a historic district of antique shops, is more than 100 years old.
The escaped convicts were identified as Timothy Glenn Garner, 20; Phillip Dean Hancock, 38; Dedrick Max Bloss, 22, and Tanner Michael James, 22.
Garner was being held on charges arising from a domestic dispute and an assault on a police officer. Hancock faced drug charges. Bloss and James had transferred to the jail from other facilities, and the sheriff wasn't immediately sure what charges they were facing.
They were all in the same cell at the time of the early morning escape. They were discovered missing during a 6 a.m. head count.
Richardson said sanding and repainting work is under way at the jail and may have covered up any noise made in the escape.
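As an illustration of using the story's opening as a model, here is a minimal slot-filling sketch of the first sentence. It is not the runtime system described above; the alternative escape methods, facility types, and the `escape_lead` function are hypothetical fillers invented for the example.

```python
import random

NUMBER_WORDS = {2: "Two", 3: "Three", 4: "Four", 5: "Five"}
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# The first entries come from the AP model; the others are HYPOTHETICAL.
FACILITIES = ["county jail", "state prison", "city lockup"]
METHODS = [
    "smashing through a ceiling and an inner wall and escaping "
    "through an air conditioning duct",
    "cutting through a window screen and climbing down knotted bedsheets",
    "slipping out through a loading dock during a delivery",
]

# Constant text with named slots for the variable parts.
LEAD = "{dateline} -- {count} prisoners broke out of a {facility} {day} by {method}."


def escape_lead(city, state_abbr, rng=None):
    """Fill the lead-sentence template with randomly chosen slot values."""
    rng = rng or random.Random()
    return LEAD.format(
        dateline=f"{city.upper()}, {state_abbr}",
        count=NUMBER_WORDS[rng.choice(sorted(NUMBER_WORDS))],
        facility=rng.choice(FACILITIES),
        day=rng.choice(WEEKDAYS),
        method=rng.choice(METHODS),
    )


if __name__ == "__main__":
    print(escape_lead("Guthrie", "Okla."))
```

The fixed wording plays the role of Schmitz's constant elements, and the slots are the variables; later sentences (the sheriff quote, the list of escapees) would need their own templates and consistent pronoun handling.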
23 September 2002
Invented Citizens
Civil servants have "made up" personal details for at least 1 million people and added them to the results of the 2001 census...(read more)
8 Nov 2002
Plot units
http://www.geocities.com/bayinnaung/progexampweblog.html:
Plot units include success, failure, motivation, change of mind, perseverance, loss, resolution, trade-off, mixed blessing, hidden blessing, sacrifice, killing two birds with one stone, fleeting success, starting over, giving up, intentional problem resolution, fortuitous problem resolution, success born of adversity, threat, and promise.
12 November 2002
Fatal Car Crash Model
Onondaga County Sheriff Kevin E. Walsh said Friday that deputies are investigating a two-vehicle crash that claimed the life of a Fulton teen-ager.
At approximately 8:33 a.m. sheriff's deputies responded to Henry Clay Boulevard, about one mile south of Route 31, to investigate a head-on collision.
Upon their arrival, deputies and rescue personnel discovered a 1999 Pontiac Grand Am operated by Ashley Spear, 19, of 105 Honey Hill Road, Fulton, had crashed head-on with a tractor-trailer operated by Philip Keller III, 53, of Lysander Road, Baldwinsville.
Investigators report that Spear was traveling south on Henry Clay Boulevard and apparently crossed the double solid line before crashing head-on with Keller's northbound tractor-trailer.
Witnesses told deputies that Keller tried to avoid the crash and that his truck skidded onto the shoulder of the roadway.
The impact of the crash caused immense damage to the car and forced paramedics to pronounce Spear dead at the scene.
Members of the Sheriff's Accident Investigation Team spent most of the morning at the scene of the crash and will continue with their investigation.
Henry Clay Boulevard, between Route 31 and Waterhouse Road, was reopened to traffic by 12:45 p.m.
No other vehicles were involved in the crash, and no one else was injured, deputies said. Keller was visibly shaken, according to Sheriff Walsh.
Keller was hauling an empty trailer at the time of the accident.
No tickets are expected to be issued, according to the sheriff.
23 July 2004
The Context Free Press
I've returned to this idea (after a break of about two years), and have started to put together a fictive news service, the Context Free Press.