Picking Out the Nouns

by Dave Taylor

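A reader wrote me a letter (which is always a delight!), and although I'm still not entirely sure what she's trying to do, it's an interesting puzzle that's worth tackling. Here's her question: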

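I don't code, but I have an idea for a project, kind of like Mad Libs but for dream interpretation. I'd like people to be able to type in a dream, then have a computer program pick out the nouns and ask the participants to free-associate whatever comes to mind as that object or person. The computer would then substitute the typed-in responses back into the original text for a surreal interpretation. Do you think this would be hard to create?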

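Mad Libs for dreams? It's certainly an interesting idea, particularly given how random and disjointed the elements of dreams often seem. Dreams have long been viewed both as revelations from the gods and as the playground of our subconscious, along with its need to work through our daily experiences. Then there's Freud, who was quite sure that if you didn't actually dream about a cigar, it was because you envied people who had cigars, or because you were infatuated with cigars but repressed your interest.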

Oh, okay. No cigars, all right? And no Lewinsky jokes either.

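What we need to accomplish this task is a script that can parse input, identify and build a list of nouns, prompt the user for a free-association synonym for each one, and then output the original text again with each original noun replaced by the user's suggested alternative. To start, how do you identify nouns?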

First, We Kill All the Nouns

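I was all set to grab the comprehensive dictionary from Princeton University's WordNet program, but a closer look revealed that it has more than 85,000 words, along with all sorts of obscure alternative usages. The end result is that although it's comprehensive, it produces far too many false positives. Fortunately, Desi Quintans offers a simple words-only list that you can grab for our purposes: http://www.desiquintans.com/downloads/nounlist.txt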

It's formatted exactly as needed too:


$ head nounlist.txt
aardvark
abyssinian
accelerator
accordion
account
accountant
acknowledgment
acoustic
acrylic
act

It seems like this would be the hardest step, but given the almost infinite data storage of the internet, it was surprisingly easy.

Identifying Nouns in Prose

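The next step is fairly easy: given some prose, break it down into individual words, then test each word to identify which ones are nouns. Now that we have a noun list, this is really the bulk of the program: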


for word in $( sed 's/[[:punct:]]//g' $dream |
               tr '[A-Z]' '[a-z]' | tr ' ' '\n')
do
  # is the word a noun? Let's look!
  if [ ! -z "$(grep -E "^${word}$" $nounlist)" ] ; then
    nouns="$nouns $word"
  fi
done

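The for loop is a bit complicated, but it removes all punctuation from the input, converts uppercase to lowercase, and then turns each space into a newline. The result is most easily shown with an example. Suppose we have the following input: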


I've never seen a blue chipmunk!

Running it through the sed | tr | tr filter produces the following:


ive
never
seen
a
blue
chipmunk
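The same filter can be wrapped in a small function so it can be tried on its own. This is only a sketch: the name tokenize is hypothetical and does not appear in the script, and it uses tr -s with the [:space:] class so that tabs and runs of spaces also split words, which goes slightly beyond what the article's filter does.

```shell
#!/bin/sh
# tokenize: hypothetical helper, not part of the original script.
# Reads text on stdin and prints one lowercase word per line.
tokenize() {
  sed 's/[[:punct:]]//g' |          # drop punctuation, as in the article
    tr '[:upper:]' '[:lower:]' |    # fold uppercase to lowercase
    tr -s '[:space:]' '\n' |        # each run of whitespace becomes one newline
    sed '/^$/d'                     # discard any empty lines that remain
}

# Try it on the example sentence (extra spaces added on purpose).
printf '%s\n' "I've never   seen a blue chipmunk!" | tokenize
```

Note that the apostrophe is stripped along with the other punctuation, so contractions such as "I've" come out as "ive" and will simply fail to match the noun list.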

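That was easy, and now that each word can be isolated from the input, it's simple to search the noun list for a match. Again, there's a small complication, because we need to make sure we don't get embedded matches (for example, matching the noun "acoustic" against the slang "stic").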

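This is done by anchoring the search as a regular expression: ^ is the beginning of the line and $ is the end of the line, hence the regex ^${word}$, where the optional {} notation just tells the shell exactly where the variable name begins and ends.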
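To see why the anchoring matters, here is a minimal sketch of an alternative whole-line test. It is not what the script uses: the helper name is_noun and the three-word stand-in list are assumptions for illustration, and it relies on mktemp being available. The grep options -q, -x and -F are standard.

```shell
#!/bin/sh
# is_noun: hypothetical helper; succeeds when $1 is an entire line of file $2.
#   -q  print nothing, report the result through the exit status
#   -x  the match must span the whole line (same effect as ^...$)
#   -F  treat the word as a literal string, not a regular expression
is_noun() {
  grep -qxF -- "$1" "$2"
}

list=$(mktemp)                        # throwaway stand-in for nounlist.txt
printf '%s\n' acoustic act owl > "$list"

for w in owl stic a.t; do
  if is_noun "$w" "$list"; then
    echo "$w: noun"
  else
    echo "$w: not a noun"
  fi
done
rm -f "$list"
```

With an unanchored search, "stic" would match inside "acoustic", and a regex search for "a.t" would match "act". Whole-line, literal matching rejects both.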

With some debugging code included, here's the first draft of our entire script:


#!/bin/sh

# dreamer - script to help interpret dreams. does this 
#    by asking users to describe their most recent 
#    dream, then prompts them to free associate
#    words for each of the nouns in their original description.

nounlist="nounlist.txt"
dream="/tmp/dreamer.$$"

input=""; nouns=""

trap "/bin/rm -f $dream" 0      # no tempfile left behind

echo "Welcome to Dreamer. To start, please describe in a few sentences the dream"
echo "you'd like to explore. End with DONE in all caps on its own line."

until [ "$input" = "DONE" -o "$input" = "done" ]
do
  echo "$input" >> $dream
  read input    # let's read another line from the user...
done

echo ""
echo "Okay. To confirm, your dream was about:"

cat $dream

echo "=============="

for word in $( sed 's/[[:punct:]]//g' $dream |
               tr '[A-Z]' '[a-z]' | tr ' ' '\n')
do
  # is the word a noun? Let's look!
  if [ ! -z "$(grep -E "^${word}$" $nounlist)" ] ; then
    nouns="$nouns $word"
  fi
done

echo "Hmm.... okay. I have identified the following words as nouns:"
echo "$nouns"

echo "Are you ready to do some free association? Let's begin..."

for word in $nouns
do
  echo "What comes to mind when I say $word?"
done

exit 0

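It really breaks down into simple functional blocks: first it prompts users to share their dream, then it breaks the prose into individual words and compares them against the noun list, and finally (though not yet in its final form) it prompts for a free association on each noun it identified.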

Let's run it to see what I mean:


$ sh dreamer.sh
Welcome to Dreamer. To start, please describe in a few 
sentences the dream you'd like to explore. End with DONE 
in all caps on its own line.
I was sitting in a tree house in the middle of an ancient 
forest and an owl was staring at me. It asked "who?" and 
I woke up in a cold sweat.
DONE

Okay. To confirm, your dream was about:

I was sitting in a tree house in the middle of an ancient 
forest and an owl was staring at me. It asked "who?" and 
I woke up in a cold sweat.
==============
Hmm.... okay. I have identified the following words as nouns:
 tree house middle forest owl cold
Are you ready to do some free association? Let's begin...
What comes to mind when I say tree?
What comes to mind when I say house?
What comes to mind when I say middle?
What comes to mind when I say forest?
What comes to mind when I say owl?
What comes to mind when I say cold?
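The blank line after "your dream was about:" in the run above appears because the draft's until loop appends $input to the temp file once before anything has been read. One possible variant of the input step, shown here purely as a sketch (the function name read_dream is hypothetical, and mktemp is assumed to be available), reads first and writes second:

```shell
#!/bin/sh
# read_dream: hypothetical helper; copies stdin into file $1 until it sees
# a line that is exactly DONE or done, or until input ends.
read_dream() {
  : > "$1"                            # start from an empty file
  while IFS= read -r line; do         # IFS= and -r keep each line verbatim
    case $line in
      DONE|done) break ;;             # terminator is not stored
    esac
    printf '%s\n' "$line" >> "$1"
  done
}

tmp=$(mktemp)
printf 'An owl was staring at me.\nDONE\nthis line is ignored\n' | read_dream "$tmp"
cat "$tmp"
rm -f "$tmp"
```

Using read -r also keeps any backslashes in the dream text from being interpreted.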

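As is immediately obvious, the free-association part at the end, along with reassembling the prose using the new free-association words or phrases, is still a work in progress.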
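For the curious, here is a rough sketch of one direction the substitution step could take. It is an assumption about how this might be done, not the finished script: the helper name swap_word is hypothetical, it handles one noun at a time, and it collapses runs of whitespace into single spaces.

```shell
#!/bin/sh
# swap_word: hypothetical helper; reads text on stdin and replaces every
# whole-word, case-insensitive occurrence of $1 (given in lowercase) with $2.
swap_word() {
  awk -v old="$1" -v new="$2" '
    {
      out = ""
      for (i = 1; i <= NF; i++) {
        tok = $i; lead = ""; trail = ""
        # peel leading and trailing punctuation off so it can be kept
        if (match(tok, /^[^A-Za-z0-9]+/)) {
          lead = substr(tok, 1, RLENGTH)
          tok = substr(tok, RLENGTH + 1)
        }
        if (match(tok, /[^A-Za-z0-9]+$/)) {
          trail = substr(tok, RSTART)
          tok = substr(tok, 1, RSTART - 1)
        }
        if (tolower(tok) == old) tok = new    # whole-word comparison only
        sep = (i > 1) ? " " : ""
        out = out sep lead tok trail
      }
      print out
    }'
}

echo 'An owl was staring at me. The Owl asked "who?"' | swap_word owl wisdom
```

Comparing whole tokens rather than substrings mirrors the anchored search used for the noun lookup, so replacing "act" would leave "actor" alone.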

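But that's a project for next month. In the meantime, keep a dream journal, and soon you'll be able to interpret it with the help of the Linux shell. Or something like that!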

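Dave Taylor has been hacking shell scripts on UNIX and Linux systems for a really long time. He's the author of Learning Unix for Mac OS X and Wicked Cool Shell Scripts. You can find him on Twitter as @DaveTaylor, and you can reach him through his tech Q&A site: Ask Dave Taylor.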
