使用 Python 和 pyuno 将电子表格转换为 CSV 文件,第 1 部分 v2
几个月前,我开发了一些 pyuno 代码,用于从命令行将电子表格转换为 CSV 文件。Pyuno 是 OpenOffice 运行时的 Python 接口。我收到的增强建议之一是添加提取所有工作表和/或特定工作表的功能,而不是始终提取第一个工作表。以下代码更新正是为了实现这一目标。
为了刷新我们的记忆,转换程序接受成对的输入和输出文件,例如
$ python ssconverter.py file1.xls file1.csv file2.ods file2.csv
每个输入文件都是一个电子表格,它被转换为相应的输出文件作为 CSV 文件。新版本已得到增强,因此您可以在输入文件名末尾附加工作表名称或编号,以指定要提取的特定工作表,例如
$ python ssconverter.py file1.xls:1 file1.csv
$ python ssconverter.py file1.xls:Sheet1 file1.csv
$ python ssconverter.py file2.ods@1 file2.csv
$ python ssconverter.py file2.ods@Sheet2 file2.csv
工作表名称或编号(从 1 开始)附加到输入文件名末尾,并用冒号 (:) 或 at 符号 (@) 分隔。
此外,您可以通过在输出文件名中指定 %d 或 %s 格式说明符来一次性转换所有工作表
$ python ssconverter.py file1.xls file1-%d.csv
$ python ssconverter.py file1.xls file1-%s.csv
如果格式说明符是 %d,则工作表编号被格式化到输出文件名中;如果指定了 %s,则使用工作表名称。%d 说明符可以包含零填充和宽度说明符(例如 %04d)。
新代码相对于原始代码有三个主要变化。第一个变化是检查输入文件工作表规范
# Check for sheet specification in input file name.
match = re.search(r'^(.*)[@:](.*)$', inputFile)
if os.path.exists(inputFile) or not match:
inputUrl = uno.systemPathToFileUrl(os.path.abspath(inputFile))
inputSheet = '1' # Convert fist sheet.
else:
inputUrl = uno.systemPathToFileUrl(os.path.abspath(match.group(1)))
inputSheet = match.group(2)
正则表达式搜索检查输入文件名是否可以分解为文件名部分和工作表名称部分。如果可以,并且仅当 原始未分解的输入文件名未引用现有文件时,才将这些部分用作输入文件和工作表规范。检查原始未分解的输入文件名是否引用现有文件允许您指定名称中包含冒号和 at 符号的输入文件,并且即使在输入文件名中包含冒号或 at 符号并且您想要提取特定工作表的情况下,它也能继续工作,例如
$ python ssconverter.py this:month.xls:sales output.csv
将正确识别工作表名称为“sales”,文件名为“this:month.xls”。由于正则表达式搜索默认是贪婪的,因此中断将发生在最后一个冒号处,并且由于“this:month.xls:sales”未引用现有文件,因此将使用匹配项。
第二个主要变化与文档的加载方式有关。在使代码的其余部分达到我认为应该工作的程度后,我在测试时发现 OpenOffice 始终转换第一个工作表。在尝试修改代码但未能解决问题后,我开始查看一些其他 pyuno 代码示例,我真正注意到的唯一一点是,许多示例在加载文档时未使用 Hidden 属性。因此,我尝试了一下,果然解决了问题。随附的注释中有更多说明
# Sheet activation does not work properly when Hidden is specified.
# Although the sheet does become the active sheet, it's not the sheet that
# gets saved if the spreadsheet is loaded with Hidden=True.
#
# Removing Hidden=True doesn't seem to change anything: nothing appears
# on the screen regardless of the Hidden value.
#
# document = self.desktop.loadComponentFromURL(inputUrl, "_blank", 0, ooutils.oo_properties(Hidden=True))
document = self.desktop.loadComponentFromURL(inputUrl, "_blank", 0, ooutils.oo_properties())
第三个变化是提取工作表或多个工作表的地方。如果指定了特定工作表,则首先激活该工作表,然后将其保存到输出文件
# Activate the sheet to be converted.
if re.search(r'^\d+$', inputSheet):
sheet = sheets.getByIndex(int(inputSheet)-1)
else:
sheet = sheets.getByName(inputSheet)
controller.setActiveSheet(sheet)
outputUrl = uno.systemPathToFileUrl(os.path.abspath(outputFile))
document.storeToURL(outputUrl, props)
此处的正则表达式检查工作表名称是数字还是字符串(工作表编号或工作表名称)。
如果要保存所有工作表,则逐个激活工作表,格式化输出文件名并保存工作表
# Use the sheet number if the format is %d, otherwise the sheet name.
dfmt = re.search(r'%[0-9]*d', outputFile)
sfmt = re.search(r'%s', outputFile)
if dfmt or sfmt:
i = 0
while i < sheets.getCount():
# Activate the sheet.
sheet = sheets.getByIndex(i)
controller.setActiveSheet(sheet)
# Create output file name.
if dfmt:
ofile = outputFile % (i+1)
else:
ofile = outputFile % sheet.getName().replace(' ', '_')
if verbose: print " %s" % ofile
# Save the sheet to the output file.
outputUrl = uno.systemPathToFileUrl(os.path.abspath(ofile))
document.storeToURL(outputUrl, props)
i += 1
完整代码如下
1 #!/usr/bin/python
2 #
3 # Convert spreadsheet to CSV file.
4 #
5 # Based on:
6 # PyODConverter (Python OpenDocument Converter) v1.0.0 - 2008-05-05
7 # Copyright (C) 2008 Mirko Nasato <mirko@artofsolving.com>
8 # Licensed under the GNU LGPL v2.1 - or any later version.
9 # https://gnu.ac.cn/licenses/lgpl-2.1.html
10 #
11
12 import os
13 import re
14 import ooutils
15
16 import uno
17 from com.sun.star.task import ErrorCodeIOException
18
19
20
21 class SSConverter:
22 """
23 Spreadsheet converter class.
24 Converts spreadsheets to CSV files.
25 """
26
27 def __init__(self, oorunner=None):
28 self.desktop = None
29 self.oorunner = None
30
31
32 def convert(self, inputFile, outputFile, verbose=False):
33 """
34 Convert the input file (a spreadsheet) to a CSV file.
35
36 The input file name can contain a sheet specification to specify a particular sheet.
37 The sheet specification is either a number or a sheet name.
38 The sheet specification is appended to the file name separated by a colon
39 or an at sign: ":" or "@".
40
41 If the output file name contains a %d or %s format specifier, then all the sheets
42 in the input file are converted, otherwise only the first sheet is converted.
43
44 If the output file name contains a %d format specifier then the sheet number
45 is used when formatting the output file name.
46 The format can contain a width specifier (eg %02d).
47
48 If the output file name contains a %s specifier then the sheet name is used
49 when formatting the output file name.
50 """
51
52 # Start openoffice if needed.
53 if not self.desktop:
54 if not self.oorunner:
55 self.oorunner = ooutils.OORunner()
56
57 self.desktop = self.oorunner.connect()
58
59 # Check for sheet specification in input file name.
60 match = re.search(r'^(.*)[@:](.*)$', inputFile)
61 if os.path.exists(inputFile) or not match:
62 inputUrl = uno.systemPathToFileUrl(os.path.abspath(inputFile))
63 inputSheet = '1' # Convert fist sheet.
64 else:
65 inputUrl = uno.systemPathToFileUrl(os.path.abspath(match.group(1)))
66 inputSheet = match.group(2)
67
68
69 # NOTE:
70 # Sheet activation does not work properly when Hidden is specified.
71 # Although the sheet does become the active sheet, it's not the sheet that
72 # gets saved if the spreadsheet is loaded with Hidden=True.
73 #
74 # Removing Hidden=True doesn't seem to change anything: nothing appears
75 # on the screen regardless of the Hidden value.
76 #
77 # document = self.desktop.loadComponentFromURL(inputUrl, "_blank", 0, ooutils.oo_properties(Hidden=True))
78 document = self.desktop.loadComponentFromURL(inputUrl, "_blank", 0, ooutils.oo_properties())
79
80 try:
81 props = ooutils.oo_properties(FilterName="Text - txt - csv (StarCalc)")
82 #
83 # Another useful property option:
84 # FilterOptions="59,34,0,1"
85 # 59 - Field separator (semicolon), this is the ascii value.
86 # 34 - Text delimiter (double quote), this is the ascii value.
87 # 0 - Character set (system).
88 # 1 - First line number to export.
89 #
90 # For more information see:
91 # http://wiki.services.openoffice.org/wiki/Documentation/DevGuide/Spreadsheets/Filter_Options
92
93 # To convert a particular sheet, the sheet needs to be active.
94 # To activate a sheet we need the spreadsheet-view, to get the spreadsheet-view
95 # we need the spreadsheet-controller, to get the spreadsheet-controller
96 # we need the spreadsheet-model.
97 #
98 # The spreadsheet-model interface is available from the document object.
99 # The spreadsheet-view interface is available from the controller.
100 #
101 controller = document.getCurrentController()
102 sheets = document.getSheets()
103
104 # If the output file name contains a %d or %s format specifier, convert all sheets.
105 # Use the sheet number if the format is %d, otherwise the sheet name.
106 dfmt = re.search(r'%[0-9]*d', outputFile)
107 sfmt = re.search(r'%s', outputFile)
108
109 if dfmt or sfmt:
110 i = 0
111 while i < sheets.getCount():
112 # Activate the sheet.
113 sheet = sheets.getByIndex(i)
114 controller.setActiveSheet(sheet)
115
116 # Create output file name.
117 if dfmt:
118 ofile = outputFile % (i+1)
119 else:
120 ofile = outputFile % sheet.getName().replace(' ', '_')
121
122 if verbose: print " %s" % ofile
123
124 # Save the sheet to the output file.
125 outputUrl = uno.systemPathToFileUrl(os.path.abspath(ofile))
126 document.storeToURL(outputUrl, props)
127 i += 1
128
129 else:
130 # Activate the sheet to be converted.
131 if re.search(r'^\d+$', inputSheet):
132 sheet = sheets.getByIndex(int(inputSheet)-1)
133 else:
134 sheet = sheets.getByName(inputSheet)
135
136 controller.setActiveSheet(sheet)
137 outputUrl = uno.systemPathToFileUrl(os.path.abspath(outputFile))
138 document.storeToURL(outputUrl, props)
139 finally:
140 if document: document.close(True)
141
142
143 if __name__ == "__main__":
144 from sys import argv
145 from os.path import isfile
146
147 if len(argv) == 2 and argv[1] == '--shutdown':
148 ooutils.oo_shutdown_if_running()
149 else:
150 if len(argv) < 3 or len(argv) % 2 != 1:
151 print "USAGE:"
152 print " python %s INPUT-FILE[:SHEET] OUTPUT-FILE ..." % argv[0]
153 print "OR"
154 print " python %s --shutdown" % argv[0]
155 exit(255)
156
157 try:
158 i = 1
159 converter = SSConverter()
160
161 while i+1 < len(argv):
162 print '%s => %s' % (argv[i], argv[i+1])
163 converter.convert(argv[i], argv[i+1], True)
164 i += 2
165
166 except ErrorCodeIOException, exception:
167 print "ERROR! ErrorCodeIOException %d" % exception.ErrCode
168 exit(1)