XML and
Excel Spreadsheet Conversions
Apply APIs from the Jakarta Project to make XML-to-spreadsheet and
spreadsheet-to-XML conversions
by Deepak Vohra
January 4, 2006
XML is the standard medium of data exchange, and often an XML document
is presented in a Microsoft Excel spreadsheet format. The Jakarta POI project provides an API to
create an Excel spreadsheet. The POI HSSF API can also be used to parse an Excel spreadsheet and
convert it to another format such as XML. The HSSF API has the provision to set the layout, border
settings, and fonts of an Excel spreadsheet document. Here, we'll generate an Excel spreadsheet
example by parsing an XML document and adding data from the XML document to a spreadsheet.
Subsequently we'll convert the Excel spreadsheet to an XML document.
The Jakarta POI HSSF API has classes to create an Excel workbook file and add spreadsheets to
the workbook. With the POI API a workbook is represented by the HSSFWorkbook class. Spreadsheet
fonts, sheet order, and cell styles are set in the HSSFWorkbook class.
A spreadsheet is represented with the HSSFSheet class. Sheet layout, including column widths,
margins, header, footer, and print setup, is set with the HSSFSheet class. A spreadsheet row is
represented with the HSSFRow class. The row height is set with the HSSFRow class. The HSSFCell
class represents a cell in a spreadsheet row. The cell style is set with the HSSFCell class. The
indexing of spreadsheets in a workbook, rows in a spreadsheet, and cells in a row is 0 based. In
the procedure presented here, we'll convert an XML document example to an Excel spreadsheet, and
then convert the spreadsheet to an XML document.
XML to Spreadsheet Data
You can create and parse an Excel spreadsheet by using the Apache POI HSSF API. Download the
Apache POI JAR file
, and extract it to an installation directory. Add poi-2.5.1-final-20040804.jar to the
Classpath. The XML document example is parsed with the JDK 5.0 XPath API. You can download and
install
JDK 5.0
from the Java site, and you can use the
Excel Viewer
to open the Excel spreadsheet that is generated from the XML document example. Download and
install the Excel Viewer; the Excel document can also be displayed in
Excel
itself.
Let's convert the XML document to an Excel spreadsheet by using the Apache HSSF API. Download
the document example,
catalog.xml
(see
Listing 1
). The XML document is parsed with the JDK 5.0 XPath API, and the retrieved values are added
to an Excel spreadsheet. Begin by importing the Apache POI HSSF package:
import org.apache.poi.hssf.
usermodel.*;
Use the HSSFWorkbook constructor to create an Excel workbook file:
HSSFWorkbook wb=
new HSSFWorkbook();
Create a spreadsheet from the workbook:
HSSFSheet spreadSheet=
wb.createSheet("spreadSheet");
Create a cell style object for the spreadsheet:
HSSFCellStyle cellStyle=
wb.createCellStyle();
Next, specify the border settings for the HSSFCellStyle object:
cellStyle.setBorderRight(
HSSFCellStyle.BORDER_MEDIUM);
cellStyle.setBorderTop(
HSSFCellStyle.BORDER_MEDIUM);
A row in the spreadsheet has cells corresponding to each of the elements in the journal tag of
the XML document. You can set the column width of each of the columns in the spreadsheet. For
example, specify the column width of the first cell of a row like this:
spreadSheet.setColumnWidth((
short)0, (short)(256*25));
Create an InputSource object for the XML document that you want to convert to an Excel
spreadsheet:
InputSource inputSource =
new InputSource(
new FileInputStream(
new File(
"C:/Excel/catalog.xml")));
Create an XPath object to parse the XML document:
XPathFactory factory=
XPathFactory.newInstance();
XPath xPath=factory.newXPath();
Specify the XPath expression for a node list you want to obtain:
String expression=
"/catalog/journal";
Then obtain a node list for the specified XPath expression. The node list consists of nodes
corresponding to the journal elements in the XML document example:
com.sun.org.apache.xml.internal.
dtm.ref.DTMNodeList nodeList =
(com.sun.org.apache.xml.
internal.dtm.ref.DTMNodeList)
(xPath.evaluate(
expression, inputSource,
XPathConstants.NODESET));
Iterate over the node list, and add a row to the spreadsheet that corresponds to each of the
journal nodes in the node list. Use the HSSFRow object to add a spreadsheet row:
for(int i=0;
i<nodeList.getLength();
i++){
HSSFRow row=
spreadSheet.createRow((
short)i);}
Now you can create a cell in a spreadsheet row for each of the elements in the journal element.
Use the createCell() method of the HSSFRow object to create a cell:
HSSFCell cell=
row.createCell((short)0);
Set the value of a row cell with the setCellValue() method. For example, the value of the cell
for the section element is set like this:
cell.setCellValue(((Element)(
nodeList.item(i))).
getElementsByTagName(
"section").item(0).
getFirstChild().
getNodeValue());
The first cell in a row has index 0; set the cell style of a cell with the setCellStyle() method
of the HSSFCell object:
cell.setCellStyle(cellStyle);
Similarly, you can set the values of the columns for the other cells in a row. Create a
FileOutputStream to output the Excel workbook to an XLS file:
FileOutputStream output=
new FileOutputStream(
new File(
"C:/Excel/ExampleExcel.xls"));
Output the Excel workbook to an XLS file, and close the FileOutputStream:
wb.write(output);
output.flush();
output.close();
An Excel spreadsheet is generated.
Going the Other Way
The
spreadsheet example
that is generated from the XML document example is available for download. The Java
application XMLToExcel.java, which is used to convert an XML document to an Excel spreadsheet, is
shown in
Listing 2
.
Now that we've seen how easy it is to generate an Excel document from an XML document, let's
covert the other way, that is, convert the Excel document to an XML document. You also use the
Apache POI HSSF API to parse an Excel spreadsheet and retrieve the cell values from the
spreadsheet. Begin by importing the Apache POI HSSF API:
import org.apache.poi.hssf.
usermodel.*;
The root element of the XML document generated is catalog, and a journal element is added to
correspond to each of the rows of the Excel spreadsheet. Generate an XML document, and specify the
root element of the document:
DocumentBuilderFactory factory =
DocumentBuilderFactory.
newInstance();
DocumentBuilder builder =
factory.newDocumentBuilder();
Document document =
builder.newDocument();
Element catalogElement=
document.createElement(
"catalog");
document.appendChild(
catalogElement);
Next, create an InputStream object for the Excel spreadsheet that will be converted to an XML
document:
InputStream input=
new FileInputStream(excelFile);
Obtain the workbook for the InputStream object:
HSSFWorkbook workbook=
new HSSFWorkbook(input);
Obtain the spreadsheet for the Excel workbook:
HSSFSheet spreadsheet=
workbook.getSheetAt(0);
Iterate over the rows in the spreadsheet, and add a journal element to the XML document for each
of the rows:
for(int i=0; i<=
spreadsheet.getLastRowNum();
i++){HSSFRow row=
spreadsheet.getRow(i);Element journalElement=
document.createElement(
"journal");
catalogElement.appendChild(
journalElement);}
A cell in a spreadsheet row is retrieved with the getCell() method, and a cell value is
retrieved with the getStringCellValue() method. For example, the section cell value is retrieved
from the spreadsheet and set in the XML document this way:
Element sectionElement=
document.createElement(
"section");
journalElement.appendChild(
sectionElement);
sectionElement.appendChild(
document.createTextNode(
row.getCell((short)0).
getStringCellValue()));
Similarly, the other cell values are retrieved from the spreadsheet and specified in the XML
document. The
Excel spreadsheet
from which an XML document—catalog.xml—is generated is available for download.
ExcelToXML.java, the Java application used to convert an Excel spreadsheet to an XML document, is
listed in Listing 3 below.
Listing 3. The ExcelToXML.java application converts an Excel spreadsheet to an XML
document.
import org.apache.poi.hssf.usermodel.*;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;public class ExcelToXML {
public void generateXML(File excelFile) {
try {
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
DocumentBuilder builder =
factory.newDocumentBuilder();
Document document = builder.newDocument();
Element catalogElement = document.createElement(
"catalog");
document.appendChild(catalogElement); InputStream input = new FileInputStream(
excelFile);
HSSFWorkbook workbook = new HSSFWorkbook(input);
HSSFSheet spreadsheet = workbook.getSheetAt(0); for (int i = 0; i <=
spreadsheet.getLastRowNum(); i++) {
HSSFRow row = spreadsheet.getRow(i); Element journalElement =
document.createElement("journal");
catalogElement.appendChild(journalElement); Element sectionElement =
document.createElement("section");
journalElement.appendChild(sectionElement);
sectionElement.appendChild(
document.createTextNode(row.getCell((short)
0).getStringCellValue())); Element publisherElement =
document.createElement("publisher");
journalElement.appendChild(publisherElement);
publisherElement.appendChild(
document.createTextNode(
row.getCell((short)
1).getStringCellValue())); Element levelElement = document.createElement(
"level");
journalElement.appendChild(levelElement);
levelElement.appendChild(
document.createTextNode(row.getCell((short)
2).getStringCellValue())); Element editionElement =
document.createElement("edition");
journalElement.appendChild(editionElement);
editionElement.appendChild(
document.createTextNode(
row.getCell((short)
3).getStringCellValue())); Element titleElement = document.createElement(
"title");
journalElement.appendChild(titleElement);
titleElement.appendChild(
document.createTextNode(row.getCell((short)
4).getStringCellValue())); Element authorElement =
document.createElement("author");
journalElement.appendChild(authorElement);
authorElement.appendChild(
document.createTextNode(row.getCell((short)
5).getStringCellValue()));
} TransformerFactory tFactory =
TransformerFactory.newInstance(); Transformer transformer =
tFactory.newTransformer(); DOMSource source = new DOMSource(document);
StreamResult result = new StreamResult(new File(
"C:/Excel/catalog.xml"));
transformer.transform(source, result);
} catch (IOException e) {
} catch (ParserConfigurationException e) {
} catch (TransformerConfigurationException e) {
} catch (TransformerException e) {
}
} public static void main(String[] argv) {
ExcelToXML excel = new ExcelToXML();
File input = new File(
"C:/Excel/ExampleExcel.xls");
excel.generateXML(input);
}
}
Listing 2. The XMLToExcel.java application converts an XML document to an Excel
spreadsheet.
import org.apache.poi.hssf.usermodel.*;
import org.apache.xpath.NodeSet;
import org.w3c.dom.*;
import org.xml.sax.InputSource;
import java.io.*;
import javax.xml.xpath.*;public class XMLToExcel {
public void generateExcel(File xmlDocument) {
try {
HSSFWorkbook wb = new HSSFWorkbook();
HSSFSheet spreadSheet = wb.createSheet(
"spreadSheet");
HSSFCellStyle cellStyle = wb.createCellStyle();
cellStyle.setBorderRight(
HSSFCellStyle.BORDER_MEDIUM);
cellStyle.setBorderTop(
HSSFCellStyle.BORDER_MEDIUM); /* cellStyle.setFillForegroundColor(
/* org.apache.poi.hssf.util.HSSFColor.BLUE.
/* index);
/* cellStyle.setFillPattern(
/* HSSFCellStyle.SOLID_FOREGROUND); */
spreadSheet.setColumnWidth((short) 0, (short)
(256 * 25));
spreadSheet.setColumnWidth((short) 1, (short)
(256 * 25));
spreadSheet.setColumnWidth((short) 2, (short)
(256 * 25));
spreadSheet.setColumnWidth((short) 3, (short)
(256 * 25));
spreadSheet.setColumnWidth((short) 4, (short)
(256 * 75));
spreadSheet.setColumnWidth((short) 5, (short)
(256 * 25)); InputSource inputSource = new InputSource(
new FileInputStream(xmlDocument)); XPathFactory factory =
XPathFactory.newInstance();
XPath xPath = factory.newXPath();
String expression = "/catalog/journal"; com.sun.org.apache.xml.internal.dtm.ref.
DTMNodeList nodeList = (
com.sun.org.apache.xml.internal.dtm.ref.
DTMNodeList) (xPath.evaluate(expression,
inputSource, XPathConstants.NODESET)); for (int i = 0; i < nodeList.getLength();
i++) {
HSSFRow row = spreadSheet.createRow((
short) i); HSSFCell cell = row.createCell((short) 0);
cell.setCellValue(((Element) (
nodeList.item(i))).getElementsByTagName(
"section").item(0).getFirstChild().
getNodeValue()); cell.setCellStyle(cellStyle); cell =
row.createCell((short) 1); cell.setCellValue(((Element) (
nodeList.item(i))).getElementsByTagName(
"publisher").item(0).getFirstChild().
getNodeValue()); cell.setCellStyle(cellStyle); cell =
row.createCell((short) 2);
cell.setCellValue(((Element) (
nodeList.item(i))).getElementsByTagName(
"level").item(0).getFirstChild().
getNodeValue()); cell.setCellStyle(cellStyle); cell =
row.createCell((short) 3);
cell.setCellValue(((Element) (
nodeList.item(i))).getElementsByTagName(
"edition").item(0).getFirstChild().
getNodeValue()); cell.setCellStyle(cellStyle); cell =
row.createCell((short) 4);
cell.setCellValue(((Element) (
nodeList.item(i))).getElementsByTagName(
"title").item(0).getFirstChild().
getNodeValue()); cell.setCellStyle(cellStyle); cell =
row.createCell((short) 5);
cell.setCellValue(((Element) (
nodeList.item(i))).getElementsByTagName(
"author").item(0).getFirstChild().
getNodeValue()); cell.setCellStyle(cellStyle);
} FileOutputStream output = new FileOutputStream(
new File("C:/Excel/ExampleExcel.xls"));
wb.write(output);
output.flush();
output.close();
} catch (IOException e) {
} catch (XPathExpressionException e) {
}
} public static void main(String[] argv) {
File xmlDocument = new File(
"C:/Excel/catalog.xml"); XMLToExcel excel = new XMLToExcel();
excel.generateExcel(xmlDocument);
}
}
Listing 1. This XML document will be converted to an Excel spreadsheet.
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<journal>
<section>Java Technology</section>
<publisher>IBM developerWorks</publisher>
<level>Introductory</level>
<edition>Nov-2004</edition>
<title>Getting started with enumerated
types</title>
<author>Brett McLaughlin</author>
</journal>
<journal>
<section>Java Technology</section>
<publisher>IBM developerWorks</publisher>
<level>Intermediate</level>
<edition>Sep-2004</edition>
<title>Migrating to Eclipse</title>
<author>David Gallardo</author>
</journal>
<journal>
<section>Java Technology</section>
<publisher>IBM developerWorks</publisher>
<level>Intermediate</level>
<edition>Jan-2004</edition>
<title>Design service-oriented architecture
frameworks with J2EE technology</title>
<author>Naveen Balani</author>
</journal>
</catalog>
|