使用 Python 和 pyuno 将电子表格转换为 CSV 文件,第 1 部分 v2

作者:Mitch Frazier

几个月前,我开发了一些 pyuno 代码,用于从命令行将电子表格转换为 CSV 文件。PyunoOpenOffice 运行时的 Python 接口。我收到的增强建议之一是添加提取所有工作表和/或特定工作表的功能,而不是始终提取第一个工作表。以下代码更新正是为了实现这一目标。

为了刷新我们的记忆,转换程序接受成对的输入和输出文件,例如

$ python ssconverter.py file1.xls file1.csv file2.ods file2.csv

每个输入文件都是一个电子表格,它被转换为相应的输出文件作为 CSV 文件。新版本已得到增强,因此您可以在输入文件名末尾附加工作表名称或编号,以指定要提取的特定工作表,例如

$ python ssconverter.py file1.xls:1      file1.csv
$ python ssconverter.py file1.xls:Sheet1 file1.csv
$ python ssconverter.py file2.ods@1      file2.csv
$ python ssconverter.py file2.ods@Sheet2 file2.csv

工作表名称或编号(从 1 开始)附加到输入文件名末尾,并用冒号 (:) 或 at 符号 (@) 分隔。

此外,您可以通过在输出文件名中指定 %d 或 %s 格式说明符来一次性转换所有工作表

$ python ssconverter.py file1.xls file1-%d.csv
$ python ssconverter.py file1.xls file1-%s.csv

如果格式说明符是 %d,则工作表编号被格式化到输出文件名中;如果指定了 %s,则使用工作表名称。%d 说明符可以包含零填充和宽度说明符(例如 %04d)。

新代码相对于原始代码有三个主要变化。第一个变化是检查输入文件工作表规范

        # Check for sheet specification in input file name.
        match = re.search(r'^(.*)[@:](.*)$', inputFile)
        if os.path.exists(inputFile) or not match:
            inputUrl   = uno.systemPathToFileUrl(os.path.abspath(inputFile))
            inputSheet = '1'   # Convert fist sheet.
        else:
            inputUrl   = uno.systemPathToFileUrl(os.path.abspath(match.group(1)))
            inputSheet = match.group(2)

正则表达式搜索检查输入文件名是否可以分解为文件名部分和工作表名称部分。如果可以,并且仅当 原始未分解的输入文件名未引用现有文件时,才将这些部分用作输入文件和工作表规范。检查原始未分解的输入文件名是否引用现有文件允许您指定名称中包含冒号和 at 符号的输入文件,并且即使在输入文件名中包含冒号或 at 符号并且您想要提取特定工作表的情况下,它也能继续工作,例如

$ python ssconverter.py this:month.xls:sales output.csv

将正确识别工作表名称为“sales”,文件名为“this:month.xls”。由于正则表达式搜索默认是贪婪的,因此中断将发生在最后一个冒号处,并且由于“this:month.xls:sales”未引用现有文件,因此将使用匹配项。

第二个主要变化与文档的加载方式有关。在使代码的其余部分达到我认为应该工作的程度后,我在测试时发现 OpenOffice 始终转换第一个工作表。在尝试修改代码但未能解决问题后,我开始查看一些其他 pyuno 代码示例,我真正注意到的唯一一点是,许多示例在加载文档时未使用 Hidden 属性。因此,我尝试了一下,果然解决了问题。随附的注释中有更多说明

        #   Sheet activation does not work properly when Hidden is specified.
        #   Although the sheet does become the active sheet, it's not the sheet that
        #   gets saved if the spreadsheet is loaded with Hidden=True.
        #
        #   Removing Hidden=True doesn't seem to change anything: nothing appears
        #   on the screen regardless of the Hidden value.
        #
        # document  = self.desktop.loadComponentFromURL(inputUrl, "_blank", 0, ooutils.oo_properties(Hidden=True))
        document  = self.desktop.loadComponentFromURL(inputUrl, "_blank", 0, ooutils.oo_properties())

第三个变化是提取工作表或多个工作表的地方。如果指定了特定工作表,则首先激活该工作表,然后将其保存到输出文件

                # Activate the sheet to be converted.
                if re.search(r'^\d+$', inputSheet):
                    sheet = sheets.getByIndex(int(inputSheet)-1)
                else:
                    sheet = sheets.getByName(inputSheet)

                controller.setActiveSheet(sheet)
                outputUrl = uno.systemPathToFileUrl(os.path.abspath(outputFile))
                document.storeToURL(outputUrl, props)

此处的正则表达式检查工作表名称是数字还是字符串(工作表编号或工作表名称)。

如果要保存所有工作表,则逐个激活工作表,格式化输出文件名并保存工作表

            # Use the sheet number if the format is %d, otherwise the sheet name.
            dfmt = re.search(r'%[0-9]*d', outputFile)
            sfmt = re.search(r'%s', outputFile)

            if dfmt  or  sfmt:
                i = 0
                while i < sheets.getCount():
                    # Activate the sheet.
                    sheet = sheets.getByIndex(i)
                    controller.setActiveSheet(sheet)

                    # Create output file name.
                    if dfmt:
                        ofile = outputFile % (i+1)
                    else:
                        ofile = outputFile % sheet.getName().replace(' ', '_')

                    if verbose: print "    %s" % ofile

                    # Save the sheet to the output file.
                    outputUrl = uno.systemPathToFileUrl(os.path.abspath(ofile))
                    document.storeToURL(outputUrl, props)
                    i += 1

完整代码如下

  1 #!/usr/bin/python
  2 #
  3 # Convert spreadsheet to CSV file.
  4 #
  5 # Based on:
  6 #   PyODConverter (Python OpenDocument Converter) v1.0.0 - 2008-05-05
  7 #   Copyright (C) 2008 Mirko Nasato <mirko@artofsolving.com>
  8 #   Licensed under the GNU LGPL v2.1 - or any later version.
  9 #   https://gnu.ac.cn/licenses/lgpl-2.1.html
 10 #
 11 
 12 import os
 13 import re
 14 import ooutils
 15 
 16 import uno
 17 from com.sun.star.task import ErrorCodeIOException
 18 
 19 
 20 
 21 class SSConverter:
 22     """
 23     Spreadsheet converter class.
 24     Converts spreadsheets to CSV files.
 25     """
 26 
 27     def __init__(self, oorunner=None):
 28         self.desktop  = None
 29         self.oorunner = None
 30 
 31 
 32     def convert(self, inputFile, outputFile, verbose=False):
 33         """
 34         Convert the input file (a spreadsheet) to a CSV file.
 35 
 36         The input file name can contain a sheet specification to specify a particular sheet.
 37         The sheet specification is either a number or a sheet name.
 38         The sheet specification is appended to the file name separated by a colon
 39         or an at sign: ":" or "@".
 40 
 41         If the output file name contains a %d or %s format specifier, then all the sheets
 42         in the input file are converted, otherwise only the first sheet is converted.
 43 
 44         If the output file name contains a %d format specifier then the sheet number
 45         is used when formatting the output file name.
 46         The format can contain a width specifier (eg %02d).
 47 
 48         If the output file name contains a %s specifier then the sheet name is used
 49         when formatting the output file name.
 50         """
 51 
 52         # Start openoffice if needed.
 53         if not self.desktop:
 54             if not self.oorunner:
 55                 self.oorunner = ooutils.OORunner()
 56 
 57             self.desktop = self.oorunner.connect()
 58 
 59         # Check for sheet specification in input file name.
 60         match = re.search(r'^(.*)[@:](.*)$', inputFile)
 61         if os.path.exists(inputFile) or not match:
 62             inputUrl   = uno.systemPathToFileUrl(os.path.abspath(inputFile))
 63             inputSheet = '1'   # Convert fist sheet.
 64         else:
 65             inputUrl   = uno.systemPathToFileUrl(os.path.abspath(match.group(1)))
 66             inputSheet = match.group(2)
 67 
 68 
 69         # NOTE:
 70         #   Sheet activation does not work properly when Hidden is specified.
 71         #   Although the sheet does become the active sheet, it's not the sheet that
 72         #   gets saved if the spreadsheet is loaded with Hidden=True.
 73         #
 74         #   Removing Hidden=True doesn't seem to change anything: nothing appears
 75         #   on the screen regardless of the Hidden value.
 76         #
 77         # document  = self.desktop.loadComponentFromURL(inputUrl, "_blank", 0, ooutils.oo_properties(Hidden=True))
 78         document  = self.desktop.loadComponentFromURL(inputUrl, "_blank", 0, ooutils.oo_properties())
 79 
 80         try:
 81             props = ooutils.oo_properties(FilterName="Text - txt - csv (StarCalc)")
 82             #
 83             # Another useful property option:
 84             #   FilterOptions="59,34,0,1"
 85             #     59 - Field separator (semicolon), this is the ascii value.
 86             #     34 - Text delimiter (double quote), this is the ascii value.
 87             #      0 - Character set (system).
 88             #      1 - First line number to export.
 89             #
 90             # For more information see:
 91             #   http://wiki.services.openoffice.org/wiki/Documentation/DevGuide/Spreadsheets/Filter_Options
 92 
 93             # To convert a particular sheet, the sheet needs to be active.
 94             # To activate a sheet we need the spreadsheet-view, to get the spreadsheet-view
 95             # we need the spreadsheet-controller, to get the spreadsheet-controller
 96             # we need the spreadsheet-model.
 97             #
 98             # The spreadsheet-model interface is available from the document object.
 99             # The spreadsheet-view interface is available from the controller.
100             #
101             controller = document.getCurrentController()
102             sheets     = document.getSheets()
103 
104             # If the output file name contains a %d or %s format specifier, convert all sheets.
105             # Use the sheet number if the format is %d, otherwise the sheet name.
106             dfmt = re.search(r'%[0-9]*d', outputFile)
107             sfmt = re.search(r'%s', outputFile)
108 
109             if dfmt  or  sfmt:
110                 i = 0
111                 while i < sheets.getCount():
112                     # Activate the sheet.
113                     sheet = sheets.getByIndex(i)
114                     controller.setActiveSheet(sheet)
115 
116                     # Create output file name.
117                     if dfmt:
118                         ofile = outputFile % (i+1)
119                     else:
120                         ofile = outputFile % sheet.getName().replace(' ', '_')
121 
122                     if verbose: print "    %s" % ofile
123 
124                     # Save the sheet to the output file.
125                     outputUrl = uno.systemPathToFileUrl(os.path.abspath(ofile))
126                     document.storeToURL(outputUrl, props)
127                     i += 1
128 
129             else:
130                 # Activate the sheet to be converted.
131                 if re.search(r'^\d+$', inputSheet):
132                     sheet = sheets.getByIndex(int(inputSheet)-1)
133                 else:
134                     sheet = sheets.getByName(inputSheet)
135 
136                 controller.setActiveSheet(sheet)
137                 outputUrl = uno.systemPathToFileUrl(os.path.abspath(outputFile))
138                 document.storeToURL(outputUrl, props)
139         finally:
140             if document: document.close(True)
141 
142 
143 if __name__ == "__main__":
144     from sys import argv
145     from os.path import isfile
146 
147     if len(argv) == 2  and  argv[1] == '--shutdown':
148         ooutils.oo_shutdown_if_running()
149     else:
150         if len(argv) < 3  or  len(argv) % 2 != 1:
151             print "USAGE:"
152             print "  python %s INPUT-FILE[:SHEET] OUTPUT-FILE ..." % argv[0]
153             print "OR"
154             print "  python %s --shutdown" % argv[0]
155             exit(255)
156 
157         try:
158             i = 1
159             converter = SSConverter()
160 
161             while i+1 < len(argv):
162                 print '%s => %s' % (argv[i], argv[i+1])
163                 converter.convert(argv[i], argv[i+1], True)
164                 i += 2
165 
166         except ErrorCodeIOException, exception:
167             print "ERROR! ErrorCodeIOException %d" % exception.ErrCode
168             exit(1)

Mitch Frazier 是 Emerson Electric Co. 的嵌入式系统程序员。自 2000 年代初期以来,Mitch 一直是 Linux Journal 的贡献者和朋友。

加载 Disqus 评论