This article describes how to use the Zendesk REST API to package the content of your Help Center so you can hand it off to a localization vendor for translation for the first time. You'll also learn how to automate the reverse process: publishing the translated content on Help Center after it comes back from the translators.
The article assumes you haven't localized any of your content yet and are starting from scratch. Support for multiple languages is available on the Plus and Enterprise plans of Zendesk Support.
The article is based on custom automation scripts the docs team at Zendesk uses to localize the Zendesk Support documentation.
The article walks you through the following tasks:
- Part 1: Developing a localization workflow
- Part 2: Setting up your scripting environment
- Part 3: Creating the initial handoff package
- Part 4: Publishing the translated content
Part 1: Developing a localization workflow
Before doing anything else, settle on a workflow for localizing your content. The Zendesk docs team uses the following basic workflow for getting content localized:
- On handoff day, we download the HTML of the English articles and write them to HTML files.
- We bundle the HTML files with our image files in a zip file, and then hand off the package to the localization vendor.
- After the vendor returns the translated HTML files, we upload the translated HTML to the knowledge base.
- Finally, we upload new or updated images to an offsite file server we use to host our images.
Most of our content is generated from DITA source files stored in Box and synced to the writers' laptops. However, we use the web content as our master for translation because it always has our most up-to-date content.
What's included in the handoff package
The initial handoff package described in this article consists of the following folders:
/handoff
    /current
    /graphics
    /README.pdf
The current folder contains the most up-to-date English HTML files.
The graphics folder contains the English source image files for the articles.
The readme file contains information for the translators. The document covers the following topics:
- Description of the folders containing the text and image files
- Any extra strings to localize not captured in the HTML files, such as category and section titles
- Any other notes, such as how to handle titles or cross-references
Managing images
The images in most articles in the Zendesk documentation are hosted on an Amazon S3 file server. Hosting images on a file server makes them easier to manage: you can add, update, or delete the images using a file client such as Cyberduck.
A file server also gives you more flexibility in terms of managing the images in localized articles. You can create and populate folders of localized images on the file server with a simple suffix to differentiate the languages. Example:
- user_guide
- user_guide_de
- user_guide_fr
- user_guide_ja
Then, in the localized articles, you can switch to the localized version of an image by changing the HTML src attribute. For example, if the src attribute of an English image is as follows:
src="http://cdn.oscura.com/images/documentation/user_guide/visit_icon.png"
then adding the "_de" suffix to "user_guide" displays the German version of the image:
src="http://cdn.oscura.com/images/documentation/user_guide_de/visit_icon.png"
You can make this change for all the images in all the German files with a simple global search and replace of "_guide/" with "_guide_de/".
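If you have many files, you could script the global search and replace instead of running it in an editor. The following is a minimal sketch, not part of the original workflow: localize_image_paths() is a hypothetical helper that applies the same plain-text replacement (for example, "_guide/" with "_guide_de/") to every .html file in a folder. Because it's a plain string replacement, it assumes "_guide/" doesn't appear anywhere else in your HTML.

```python
import glob
import os


def localize_image_paths(folder, locale, old='_guide/'):
    """Rewrite image folder names in every .html file in folder.

    Hypothetical helper: replaces '_guide/' with '_guide_{locale}/' (for
    example '_guide_de/') so src attributes point at the localized images.
    Returns the sorted names of the files that changed.
    """
    new = old.rstrip('/') + '_{}/'.format(locale)
    changed = []
    for path in glob.glob(os.path.join(folder, '*.html')):
        with open(path, encoding='utf-8') as f:
            html = f.read()
        updated = html.replace(old, new)
        if updated != html:
            # Only rewrite files that actually contain the old folder name
            with open(path, mode='w', encoding='utf-8') as f:
                f.write(updated)
            changed.append(os.path.basename(path))
    return sorted(changed)


if __name__ == '__main__':
    # Example run on the German files (folder layout is an assumption)
    print(localize_image_paths(os.path.join('..', 'handoff', 'localized', 'de'), 'de'))
```

Running it twice is harmless: after the first pass, the files contain "_guide_de/", which no longer matches "_guide/".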
Part 2: Setting up your scripting environment
This article uses Python to help automate the localization process. Python is a powerful but beginner-friendly scripting and programming language with a clear and readable syntax. Visit the Python website to learn more.
All you need to write and run Python scripts is a text editor and a command-line interface like Terminal on the Mac or the command prompt in Windows. You'll need a few more tools to develop your localization scripts. This section describes how to get them.
Topics covered:
- Install Python
- Install Requests
- Install Beautiful Soup
If you're interested in taking a deeper dive into Python after finishing this article, see the following free resources:
- Dive into Python 3 by Mark Pilgrim
- Think Python by Allen B. Downey
Install Python
The scripts in this article use version 3 of Python.
To install Python
- Go to http://www.python.org/download/ and install the latest stable production version of Python 3 for your operating system.
You can test the installation by running python3 --version on the command line. It should give you the Python version number.
Install Requests
The Requests module is a third-party Python library that simplifies making HTTP requests in Python. To learn more, see Requests: HTTP for Humans.
To install the Requests module
- At the command line, enter:
$ pip3 install requests
That's it. The command downloads and installs the latest version of the module.
Note: Depending on how Python is installed on your system, you might need to enter pip instead of pip3 on the command line.
Install Beautiful Soup
Beautiful Soup is a Python library that provides tools for parsing, navigating, searching, and modifying HTML trees. The scripts in this article rely on the module to parse and modify the Help Center's HTML articles. To learn more, see the Beautiful Soup website.
To install the Beautiful Soup module
- At the command line, enter:
$ pip3 install beautifulsoup4
The command downloads and installs the latest version of the module.
- Install lxml, an HTML parser that works with Beautiful Soup:
$ pip3 install lxml
Beautiful Soup works with a number of parsers; lxml is one of the fastest.
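To confirm that the modules installed above are available to the Python 3 interpreter that will run your scripts, you could run a quick check such as the following sketch. It uses only the standard library. Note that the names it checks (requests, bs4, lxml) are the names used in import statements, which for Beautiful Soup differ from the pip package name (beautifulsoup4).

```python
import importlib.util

# Import names, not pip package names: beautifulsoup4 is imported as bs4
REQUIRED_MODULES = ('requests', 'bs4', 'lxml')


def missing_modules(names=REQUIRED_MODULES):
    """Return the names of modules this Python interpreter can't find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    missing = missing_modules()
    if missing:
        print('Missing modules: {}. Install them with pip3.'.format(', '.join(missing)))
    else:
        print('Requests, Beautiful Soup, and lxml are installed.')
```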
Part 3: Creating the initial handoff package
In this part, you build a script that uses the Zendesk REST API to download the articles in your knowledge base and create an HTML file for each article. After running the script, you'll be able to hand off your content to a localization vendor for a cost estimate and translation.
The script assumes the content you want to hand off is contained in one category on your Help Center.
Script spec
Here are the basic tasks the script has to accomplish to create the handoff package:
- Download the HTML of the articles from the Help Center.
- Because the title of each article is not included in the article's HTML body, get the title from the API and insert it as an h1 tag in the article's HTML.
- Write an HTML file for each article.
Set up the folder structure
Create the following folder structure to store the handoff files and your scripts:
/production
    /handoff
        /current
        /graphics
    /scripts
Create the script
Save the script below as create_package.py in the scripts folder.
Important: When using the scripts in this article, make sure to indent lines as shown. Indentation matters in Python.
The script downloads the default-language "translation" of the articles from Help Center and writes them to HTML files in the handoff/current folder. The sections that follow describe how to change the settings and run the script, and how it works.
import os
import requests
from bs4 import BeautifulSoup

subdomain = 'your_subdomain' # setting
email = 'your_email' # setting
password = 'your_zd_password' # setting
session = requests.Session()
session.auth = (email, password)
session.headers = {'Content-Type': 'application/json'}
locale = 'en-us' # setting
sections = ['36406', '200122006'] # setting
# ignore = ['23650177', '33992907', '44356063'] # setting
ignore = []
file_path = os.path.join('..', 'handoff', 'current')

for section in sections:
    articles = []
    url = 'https://{}.zendesk.com/api/v2/help_center/sections/{}/articles.json'.format(subdomain, section)
    response = session.get(url)
    if response.status_code != 200:
        print('Failed to get articles in section {}. Error {}.'.format(section, response.status_code))
        exit()
    print('\nSuccessfully retrieved the articles in section {}:'.format(section))
    data = response.json()
    [articles.append(article['id']) for article in data['articles']]
    for article in articles:
        if str(article) in ignore: continue
        url = 'https://{}.zendesk.com/api/v2/help_center/articles/{}/translations/{}.json'.format(subdomain, article, locale)
        response = session.get(url)
        data = response.json()
        if 'error' in data:
            continue
        tree = BeautifulSoup(data['translation']['body'], 'lxml')
        meta = tree.new_tag('meta')
        meta['charset'] = 'utf-8'
        head = tree.new_tag('head')
        head.append(meta)
        h1 = tree.new_tag('h1')
        h1.string = data['translation']['title']
        tree.body.insert_before(h1)
        tree.h1.insert_before(head)
        filename = '{}.html'.format(article)
        with open(os.path.join(file_path, filename), mode='w', encoding='utf-8') as f:
            f.write('<!DOCTYPE html>\n' + tree.prettify())
        print('- Saved {}'.format(filename))
Change the settings
- Replace the values for subdomain, email, and password:
subdomain = 'your_subdomain'
email = 'your_email'
password = 'your_zd_password'
For example, if your Zendesk Support URL is obscura.zendesk.com, then use 'obscura' as the value of your Zendesk Support subdomain:
subdomain = 'obscura'
The email and password are the same as the ones you use to sign in to Zendesk Support. You must be a Zendesk Support admin or agent with access to the content. To be safe in case your laptop is lost or stolen, enter the password value only when you run the script and then delete it when you're done.
- Specify the locale of your default language:
locale = 'en-us'
You can get the valid locale value by looking at the URL of your Help Center in your default language:
https://obscura.zendesk.com/hc/en-us
You can also get a list of valid locales for your Zendesk Support account with the following curl request:
curl https://{subdomain}.zendesk.com/api/v2/locales.json \
  -v -u {email_address}:{password}
- Use the sections and ignore variables to specify the content you want to include in the handoff:
sections = ['21535856', '21825876', '33838271']
ignore = ['23650177', '24919921', '33992907', '44356063']
Use the sections variable to list all the sections to include in the handoff. If a section is not listed, the script ignores it. You can use the setting to skip sections that don't need to be translated, such as agent-only sections.
Use the ignore variable to list all the articles within the included sections that you don't want to include in the handoff.
If you don't want to ignore any articles, use the following statement:
ignore = []
- Save the file.
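Rather than typing your password into the script and deleting it afterward, you could have the script ask for it each time it runs. The sketch below is a variation, not part of the original scripts: zendesk_auth() and prompt_for_auth() are hypothetical helpers, and getpass is Python's standard module for reading a secret without echoing it. It also covers Zendesk API tokens, for which Zendesk expects {email}/token as the username; check your account's API settings to see which authentication methods are enabled.

```python
import getpass


def zendesk_auth(email, secret, use_api_token=False):
    """Build the (username, password) pair for HTTP basic authentication.

    With an API token, Zendesk expects '{email}/token' as the username
    and the token itself as the password.
    """
    if use_api_token:
        return ('{}/token'.format(email), secret)
    return (email, secret)


def prompt_for_auth(email, use_api_token=False):
    """Ask for the password or API token at run time instead of storing it."""
    prompt = 'API token: ' if use_api_token else 'Password: '
    return zendesk_auth(email, getpass.getpass(prompt), use_api_token)
```

In create_package.py, you could then replace the session.auth line with session.auth = prompt_for_auth(email, use_api_token=True) and remove the password setting.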
Run the script
- Navigate to your scripts folder with your command-line interface.
- Run the script from the command line as follows:
$ python3 create_package.py
Note: The dollar sign ($) represents the command prompt. Don't enter it.
You should get messages confirming that the articles were retrieved and saved to HTML files.
- Check your handoff/current folder to confirm that it contains the new files. The folder should contain all the articles in the sections you listed, minus the articles you listed to ignore.
- Open a few files in a text editor to confirm they contain the HTML for the articles. Check to make sure the title was inserted as an h1 tag.
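Opening files one by one gets tedious with a large knowledge base. A rough alternative is a small check that uses Python's built-in html.parser module to flag files in handoff/current that lack an h1 or a meta charset tag. This is a sketch, not part of the original workflow; the HandoffCheck class and find_incomplete_files() function are illustrative names.

```python
import glob
import os
from html.parser import HTMLParser


class HandoffCheck(HTMLParser):
    """Record whether a document has an <h1> and a <meta charset> tag."""

    def __init__(self):
        super().__init__()
        self.has_h1 = False
        self.has_charset = False

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta charset="utf-8"/> also reach this method
        if tag == 'h1':
            self.has_h1 = True
        elif tag == 'meta' and any(name == 'charset' for name, _ in attrs):
            self.has_charset = True


def find_incomplete_files(folder):
    """Return the names of .html files missing an <h1> or a charset <meta>."""
    problems = []
    for path in sorted(glob.glob(os.path.join(folder, '*.html'))):
        checker = HandoffCheck()
        with open(path, encoding='utf-8') as f:
            checker.feed(f.read())
        checker.close()
        if not (checker.has_h1 and checker.has_charset):
            problems.append(os.path.basename(path))
    return problems
```

For example, print(find_incomplete_files(os.path.join('..', 'handoff', 'current'))) run from the scripts folder should print an empty list if every file was written as expected.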
How it works
The script performs the following tasks.
Import libraries and define variables
- Import the libraries used in the script:
import os
import requests
from bs4 import BeautifulSoup
The os library is part of Python's standard library and lets you work with files and file paths. You installed the other two libraries.
- Specify the parameters for making requests to the Zendesk API:
subdomain = 'your_subdomain'
email = 'your_email'
password = 'your_zd_password'
session = requests.Session()
session.auth = (email, password)
session.headers = {'Content-Type': 'application/json'}
The requests module is used to create a session object and set the authentication and headers for your HTTP requests.
- Specify the locale of your default language:
locale = 'en-us'
- Specify the section content to include in the handoff:
sections = ['21535856', '21825876', '33838271']
ignore = ['23650177', '24919921', '33992907', '44356063']
- Specify where to write the HTML files:
file_path = os.path.join('..', 'handoff', 'current')
The location is expressed as a path relative to the folder where the script runs, in this case your scripts folder. The path is ../handoff/current.
The os.path.join() function ensures the path is valid regardless of your OS. For example, on Windows, it expresses the path as ..\handoff\current.
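To see the difference without switching computers, you can call the platform-specific modules behind os.path directly. ntpath (Windows rules) and posixpath (macOS and Linux rules) are both part of the standard library on every OS, so this short sketch runs anywhere.

```python
import ntpath
import posixpath

# os.path is posixpath on macOS/Linux and ntpath on Windows, so these
# two calls show what os.path.join() produces on each kind of system.
mac_path = posixpath.join('..', 'handoff', 'current')
windows_path = ntpath.join('..', 'handoff', 'current')

print(mac_path)      # ../handoff/current
print(windows_path)  # ..\handoff\current
```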
Fetch the article ids
The script needs the article ids to get the content of the articles.
- Create a loop to iterate through each section in your sections list and create a list variable to store the article ids:
for section in sections:
    articles = []
On each iteration, you'll retrieve the ids of all the articles in the section.
- Define the url to use for the API request:
url = 'https://{}.zendesk.com/api/v2/help_center/sections/{}/articles.json'.format(subdomain, section)
The url changes dynamically with each iteration. The format() string method inserts (or interpolates) the values of its arguments in the placeholders ({}) in the string.
The url points to the list articles endpoint, which is defined in the API docs as follows:
GET /api/v2/help_center/sections/{id}/articles.json
where {id} is the section id. The endpoint fetches the metadata, including article ids, for the articles in the specified section. The results are paginated, so if a section contains more articles than fit on one page of results, the script as written only gets the first page. Because you want to get articles from several sections, the script makes one request for each section in your sections list. The url will be different for each request. The section variable that's interpolated in the url provides a different {id} value for each iteration of the loop.
Using the example values, on the first iteration the value of url is as follows:
https://your_subdomain.zendesk.com/api/v2/help_center/sections/21535856/articles.json
- Make the request and save the response:
response = session.get(url)
This simple statement makes a GET request to the API using the authentication and header parameters defined earlier. It fetches the article metadata for the section and assigns it to the response variable. For an example of the response, see the API doc.
- Check the request for errors and report back:
if response.status_code != 200:
    print('Failed to get articles in section {}. Error {}.'.format(section, response.status_code))
    exit()
print('\nSuccessfully retrieved the articles in section {}:'.format(section))
According to the API doc, the API returns a status code of 200 if the request was successful. In other words, if the status code is anything other than 200 (if response.status_code != 200:), then something went wrong. The script prints an error message and exits.
If the script didn't exit, it means a status code of 200 was returned and the script prints a success message.
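If a section might contain more articles than fit on one page of results, one way to collect every id is to follow the next_page URL in each response until it's empty. This sketch assumes Zendesk's offset pagination (a next_page property whose value is null on the last page). get_article_ids() is a hypothetical helper, and it raises an exception on a failed request instead of calling exit().

```python
def get_article_ids(session, url):
    """Collect article ids from every page of the list articles endpoint.

    Assumes offset pagination: each response includes a 'next_page' URL,
    which is null (None in Python) on the last page.
    """
    ids = []
    while url:
        response = session.get(url)
        if response.status_code != 200:
            raise RuntimeError('Request failed with status {}: {}'.format(response.status_code, url))
        data = response.json()
        ids.extend(article['id'] for article in data['articles'])
        url = data.get('next_page')  # None ends the loop
    return ids
```

Inside the sections loop, you could then replace the request, the status check, and the list comprehension with articles = get_article_ids(session, url).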
Access the article ids in the response
At this point, the metadata for all the section's articles is contained in the response variable as JSON data. The script needs to decode the data and store the id of each article so that it can fetch the contents of the articles from the translations API.
- Decode and assign the response to a new variable:
data = response.json()
The json() method decodes the data returned by the API into a Python dictionary, which is a built-in data type. A dictionary is simply a set of key/value pairs formatted almost identically to JSON. Example dictionary:
{'id': 35436, 'author_id': 88887, 'draft': True}
Consult the Zendesk API docs to figure out how the data dictionary is structured. For example, according to the articles API doc, the JSON returned by the API has the following structure:
{
  "articles": [
    {"id": 35436, "author_id": 88887, "draft": true, ...},
    ...
  ]
}
You can deduce from the doc that the Python dictionary has a key named 'articles'. Its value is a list of articles, as indicated by the square brackets. Each item in the list is a dictionary of article properties, as indicated by the curly braces.
- Create a list of article ids from the data:
[articles.append(article['id']) for article in data['articles']]
This is a Python list comprehension, a concise way of looping through a data collection to create a list. It's equivalent to the following snippet:
articles = []
for article in data['articles']:
    articles.append(article['id'])
The for loop iterates through each article in the data['articles'] list, gets the value of the article id, and appends it to the articles list.
- Create a loop to iterate through the list of article ids:
for article in articles:
You want to loop through each article in the current section to retrieve the article's content from the translations API.
- Check to see if the current article id is on your ignore list:
if str(article) in ignore: continue
If the article id (converted to a string to avoid a data-type mismatch) is on the ignore list, the script skips the rest of the loop and gets the next article in the articles list.
If the article is not on the ignore list, the script fetches its content and writes it to a file, as described in the next sections.
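Because a list comprehension builds and returns a list itself, the more common Python idiom is to assign its result directly instead of calling append() inside it, and the ignore check can be folded into the same expression. The following is a sketch of that variation with a hypothetical ids_to_hand_off() function; it yields the same ids the script ends up processing.

```python
def ids_to_hand_off(data, ignore):
    """Return the ids of the articles in data that aren't on the ignore list.

    data is the decoded JSON from the list articles endpoint; ignore holds
    article ids as strings, as in create_package.py.
    """
    return [article['id'] for article in data['articles'] if str(article['id']) not in ignore]


sample = {'articles': [{'id': 23650177}, {'id': 24919921}, {'id': 27808948}]}
print(ids_to_hand_off(sample, ['24919921']))  # [23650177, 27808948]
```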
Fetch the content of the article
In the Help Center API, the metadata and the content of an article are provided by different resources. You get the metadata from the articles API. You get the content from the translations API, which also includes the default-language content.
- Define the url to use for the API request:
url = 'https://{}.zendesk.com/api/v2/help_center/articles/{}/translations/{}.json'.format(subdomain, article, locale)
The url points to the get translations endpoint, which is defined in the API docs as follows:
GET /api/v2/help_center/articles/{article_id}/translations/{locale}.json
The endpoint gets the content of the specified article for the specified locale.
In the url string, a different article value is interpolated at the second placeholder in the string on each iteration of the articles loop. The locale value is interpolated in the string at the third placeholder and doesn't change. Its value is defined at the top of the script, in this case en-us.
- Make the request, decode it, and save the response:
response = session.get(url)
data = response.json()
If the API returns an error instead of a translation, the script skips the article:
if 'error' in data:
    continue
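The script detects a failed request by looking for an 'error' key in the decoded JSON. An alternative, sketched below, is to check response.status_code as the script already does for the list request, and to report which articles were skipped. get_translation() is a hypothetical helper; it returns None when the status code isn't 200 so the calling loop can continue.

```python
def get_translation(session, subdomain, article_id, locale):
    """Return the translation object for an article, or None on failure.

    Checks the HTTP status code instead of looking for an 'error' key,
    so a failed request is reported rather than silently skipped.
    """
    url = 'https://{}.zendesk.com/api/v2/help_center/articles/{}/translations/{}.json'.format(
        subdomain, article_id, locale)
    response = session.get(url)
    if response.status_code != 200:
        print('Skipping article {}: error {}.'.format(article_id, response.status_code))
        return None
    return response.json()['translation']
```

In the articles loop, translation = get_translation(session, subdomain, article, locale) followed by if translation is None: continue would replace the request and error-check lines, with translation['body'] and translation['title'] used in place of data['translation']['body'] and data['translation']['title'].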
Parse and modify the HTML
Before writing the article to a file, the script needs to get the title of the article from the data variable and insert it in the article's HTML as an h1 tag. Otherwise the translators won't know the title of the article.
The script should also add a meta tag that specifies a UTF-8 character set. Some browsers and localization tools might default to a different encoding without it.
The script manipulates the HTML using the Beautiful Soup library, which provides tools for parsing, navigating, searching, and modifying HTML trees.
- Start by creating the Beautiful Soup tree (also known as making the soup) with the article body:
tree = BeautifulSoup(data['translation']['body'], 'lxml')
The article body is specified by the body property of the translation object in the API response. Because the lxml parser wraps the HTML fragment in <html> and <body> tags, the tree has a body tag that later steps can reference.
- Create a <meta> tag and set its charset attribute to "utf-8" for browsers and localization tools:
meta = tree.new_tag('meta')
meta['charset'] = 'utf-8'
The lines create the following HTML tag:
<meta charset="utf-8"/>
- Create a <head> tag and insert your <meta> tag in it:
head = tree.new_tag('head')
head.append(meta)
The lines create the following tag structure:
<head>
  <meta charset="utf-8"/>
</head>
- Create an <h1> heading tag with the article title:
h1 = tree.new_tag('h1')
h1.string = data['translation']['title']
The article title is specified by the title property of the translation object in the API response. The lines create the following HTML tag:
<h1>Some article title</h1>
- Insert the <h1> tag before the <body> tag, and the <head> tag before <h1>:
tree.body.insert_before(h1)
tree.h1.insert_before(head)
Write the article to a file
The modified HTML in the Beautiful Soup tree is ready to be written to a file for the translators.
- Specify a unique filename based on the article id:
filename = '{}.html'.format(article)
The name is based on the article id to ensure there are no duplicates. Example: 24919921.html
- Write the article to a new file:
with open(os.path.join(file_path, filename), mode='w', encoding='utf-8') as f:
    f.write('<!DOCTYPE html>\n' + tree.prettify())
The snippet opens the file in write mode (mode='w'), which creates the file or overwrites an existing file with the same name, and writes the contents of the HTML tree to it. The value of file_path is specified at the top of the script and points to your handoff/current folder. The script creates the files in this folder.
The prettify() method converts a Beautiful Soup tree into a nicely formatted string with each HTML tag on its own line. A DOCTYPE tag is prepended to the HTML.
- Print the results:
print('- Saved {}'.format(filename))
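As written, create_package.py raises FileNotFoundError if the handoff/current folder doesn't exist yet. A minimal variation, sketched below with a hypothetical write_article() helper, creates the folder first with os.makedirs(..., exist_ok=True), which does nothing if the folder already exists.

```python
import os


def write_article(folder, article_id, html):
    """Write html to {article_id}.html in folder, creating the folder if needed.

    Returns the path of the file that was written.
    """
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, '{}.html'.format(article_id))
    with open(path, mode='w', encoding='utf-8') as f:
        f.write('<!DOCTYPE html>\n' + html)
    return path
```

In the script, the with block and the print statement could then become print('- Saved {}'.format(write_article(file_path, article, tree.prettify()))).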
Prepare the readme file and hand off the package
You're ready to hand off the HTML files to a localization vendor for a cost estimate and translation.
Before you do, make sure to write a readme file that includes any extra strings that need to be localized. For the initial handoff, this includes the name of your Help Center, the title of the documentation category, and the titles of all the sections included in your handoff.
Part 4: Publishing the translated content
After a while, you get the translated content back from the localization vendor. The vendor should have returned translated versions of the articles, as well as the names of your Help Center, documentation category, and sections. This part describes how to publish the content in your Help Center.
Uploading translations of articles to Help Center is a fairly straightforward operation using the API. The problem is making sure categories and sections exist for each supported language before uploading the articles, especially on your first localization pass when they don't exist yet.
Any translated page must have a parent page translated in the same language. The page hierarchy is as follows: Category > Section > Article. For example, if you add an article translated in German, the article must be contained in a German section. The German section must in turn be contained in a German category. You can't upload an orphan article. For more information on the page hierarchy, see Anatomy of the Help Center.
When localizing your Help Center, it makes sense to start by adding localized versions of category landing pages, followed by section landing pages, followed by articles. This workflow guarantees that any new translated article has a section and category in the same language.
Topics covered in this section:
- Enable your supported languages
- Add the category translations
- Add the section translations
- Add the article translations
- Updating the article translations
Enable your supported languages
Before you begin, make sure you defined the languages in your Zendesk Support account. For instructions, see Selecting the languages you want to support.
After defining the languages for your account, enable them for Help Center as follows.
- In Help Center, click General Settings in the tools panel on the lower-right side of the page.
- On the General Settings page, select the languages you want to enable for Help Center.
- Enter the localized Help Center name for each of your languages.
- Click Update to save your changes.
Add the category translations
Because you're only localizing one category, it's easier to add a category translation with the user interface rather than with the API.
To add translations of your category
- After signing in as a Help Center manager, navigate to the category containing your docs.
- Click Edit category in the tools panel on the lower-right side of the page.
- In the list on the right side of the page, click the language you want to add a translation for.
- Enter the translated name for the category.
(You did include the category name in the strings in your readme file to the localization vendor, right?)
- Click Update Translation to save your changes.
- Repeat for each supported language.
Add the section translations
This article assumes you included more than a handful of sections in your handoff package, so it uses the API to add section translations. The script is designed to read the translated titles from a csv file, so a little preparatory work is necessary.
Add the translated titles to a csv file
- Create a csv (comma-separated values) file named translated_titles.csv in the scripts folder.
- In a text editor, enter the following information on each line:
section_id,"title for first locale","title for second locale",...
The section ids should be the same ids specified in the sections variable in the create_package.py file. They're the sections you included in the handoff package.
Example for a Help Center being localized in German and French:
42453453,"Erste Schritte","Mise en route"
42535344,"Konfigurieren des E-Mail-Kanals","Configuration du canal des emails"
53434522,"Business-Regeln","Règles de gestion"
Rules:
- Add a row for each section you included in the handoff package
- Don't include a heading row
- Add a column for each translated version of a section title
- Make sure the title columns are in the same order as the locales in the script's locales variable. The examples in this article list the locales in alphabetical order, for example de, fr, ja.
- Make sure there's no space before or after the commas that separate the columns
- Use double quotes for the titles in case they contain commas
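A misplaced comma or a missing column in translated_titles.csv would put the wrong title on a section or stop the script with an IndexError, so you might validate the file while you read it. The following sketch is a hypothetical replacement for the dictionary-building loop in add_section_translations.py: read_translated_titles() builds the same {section_id: {locale: title}} dictionary, but it skips blank lines and raises ValueError on rows with the wrong number of columns or a non-numeric section id.

```python
import csv


def read_translated_titles(f, locales):
    """Parse translated_titles.csv rows into {section_id: {locale: title}}.

    Expects one id column followed by one title column per locale, in the
    same order as locales.
    """
    sections = {}
    for line_number, row in enumerate(csv.reader(f), start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) != len(locales) + 1:
            raise ValueError('Line {}: expected {} columns, found {}'.format(
                line_number, len(locales) + 1, len(row)))
        section_id = row[0].strip()
        if not section_id.isdigit():
            raise ValueError('Line {}: section id {!r} is not numeric'.format(line_number, row[0]))
        sections[section_id] = dict(zip(locales, row[1:]))
    return sections
```

You would call it inside the existing with block: sections = read_translated_titles(f, locales).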
Create the script
Save the script below as add_section_translations.py in the scripts folder.
This script creates language-specific translations for each section included in your handoff package. It assumes you created a csv file as described in the previous topic. The topics that follow describe how to change the settings and run the script, and how it works.
import csv
import json
import requests

subdomain = 'your_subdomain' # setting
email = 'your_email' # setting
password = 'your_zd_password' # setting
session = requests.Session()
session.auth = (email, password)
session.headers = {'Content-Type': 'application/json'}
locales = ['de', 'fr'] # setting

with open('translated_titles.csv', encoding='utf-8', newline='') as f:
    title_reader = csv.reader(f, delimiter=',')
    sections = {}
    for row in title_reader:
        titles = {}
        count = 1
        for locale in locales:
            titles[locale] = row[count]
            count += 1
        sections[row[0]] = titles

for section in sections:
    url = 'https://{}.zendesk.com/api/v2/help_center/sections/{}/translations.json'.format(subdomain, section)
    for locale in locales:
        data = {'translation': {'locale': locale, 'title': sections[section][locale]}}
        payload = json.dumps(data)
        response = session.post(url, data=payload)
        print("Translation for locale '{}' created for section {}.".format(locale, section))
Change the settings
- Replace the values for subdomain, email, and password.
You must be a Zendesk Support admin or agent with access to the content. Enter the password value only when you run the script and then delete it when you're done.
- For the locales variable, add or remove the locales for your supported languages. The locales should be the same as they appear in the URLs of your Help Center.
locales = ['de', 'fr']
- Save the file.
Run the script
- Run the script from your command-line interface:
$ python3 add_section_translations.py
You should get a message confirming that translations were created for the sections.
- Check the sections in your Help Center to see if they contain the new translations.
Because the sections don't contain any articles yet, they're not visible to end-users, signed-in or otherwise. The sections are visible to agents and administrators.
How it works
- Import the required libraries:
import csv
import json
import requests
Python has built-in support for csv files.
- Configure the request parameters:
subdomain = 'your_subdomain'
email = 'your_email'
password = 'your_zd_password'
session = requests.Session()
session.auth = (email, password)
session.headers = {'Content-Type': 'application/json'}
- Specify the locales of your translated articles:
locales = ['de', 'fr']
- Read the csv file:
with open('translated_titles.csv', encoding='utf-8', newline='') as f:
    title_reader = csv.reader(f, delimiter=',')
- Store the translated section titles in a Python dictionary. This code runs inside the with block, while the file is still open:
sections = {}
for row in title_reader:
    titles = {}
    count = 1
    for locale in locales:
        titles[locale] = row[count]
        count += 1
    sections[row[0]] = titles
The first loop creates a dictionary of sections, with the section id as the key of each item. The locales loop creates a dictionary of translated titles for each section. The titles dictionary is assigned as the value of each section key (sections[row[0]] = titles). The resulting value of sections might look like this:
{
  '42453453': {'de': 'Erste Schritte', 'fr': 'Mise en route'},
  '42535344': {'de': 'Konfigurieren des E-Mail-Kanals', 'fr': 'Configuration du canal des emails'},
  '53434522': {'de': 'Business-Regeln', 'fr': 'Règles de gestion'}
}
The titles have their own locale-specific keys to select them later.
- For each section, add a section translation using the create translation endpoint:
for section in sections:
    url = 'https://{}.zendesk.com/api/v2/help_center/sections/{}/translations.json'.format(subdomain, section)
- Inside the sections loop, for each locale, retrieve the section title from the sections dictionary, build the API payload, and make the POST request:
for locale in locales:
    data = {'translation': {'locale': locale, 'title': sections[section][locale]}}
    payload = json.dumps(data)
    response = session.post(url, data=payload)
    print("Translation for locale '{}' created for section {}.".format(locale, section))
Each new translation is defined in the API payload as follows:
data = {'translation': {'locale': locale, 'title': sections[section][locale]}}
The script defines the locale and title for each section translation. The title comes from the sections dictionary created from the csv file:
sections[section][locale]
The expression identifies a value in the dictionary. The section and locale variables change as the script iterates through the loop. Using the example dictionary above, sections['53434522']['de'] resolves to the value 'Business-Regeln'. The section ids are string keys because the csv module reads every column as a string.
Note that the script prints its confirmation message without checking response.status_code, so check your Help Center to confirm that the translations were created. Because the new language-specific sections don't contain any articles yet, end-users won't be able to see them. Only admins and agents can see them at this point.
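To have the script report failures instead of printing a success message for every request, you could check each response. The following sketch uses a hypothetical create_section_translations() function. It assumes the create translation endpoint responds with 201 Created on success, which is the usual status code for Zendesk create requests; any other status is collected as a failure so you can fix or rerun just those sections.

```python
import json


def create_section_translations(session, subdomain, sections, locales):
    """POST one translation per section and locale; return the failures.

    sections is the {section_id: {locale: title}} dictionary built from the
    csv file. Any status code other than 201 (Created) counts as a failure.
    """
    failures = []
    for section_id, titles in sections.items():
        url = 'https://{}.zendesk.com/api/v2/help_center/sections/{}/translations.json'.format(
            subdomain, section_id)
        for locale in locales:
            payload = json.dumps({'translation': {'locale': locale, 'title': titles[locale]}})
            response = session.post(url, data=payload)
            if response.status_code == 201:
                print("Translation for locale '{}' created for section {}.".format(locale, section_id))
            else:
                print("Failed to create '{}' translation for section {}: error {}.".format(
                    locale, section_id, response.status_code))
                failures.append((section_id, locale, response.status_code))
    return failures
```

Calling failures = create_section_translations(session, subdomain, sections, locales) after the with block would replace the two nested loops at the end of the script.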
Add the article translations
The final step in the workflow is to add the article translations to Help Center.
Get the article files ready
- Create folders for the translated files.
Create the following locale-specific folders in the handoff folder on your hard drive:
/handoff
    /current
    /localized
        /de
        /fr
/scripts
Make sure the structure is identical relative to the scripts folder.
Add or remove the locale-specific folders for your supported languages. The locale folder names should be the same as the locales in the URL of Help Center.
- Place the HTML files for each language in the appropriate locale-specific folder.
Make sure the HTML files from the vendor have the same file names as the default-language files. Example:
/handoff
    /current
        22521706.html
        27808948.html
    /localized
        /de
            22521706.html
            27808948.html
        /fr
            22521706.html
            27808948.html
- Upload the localized image files to your image file server.
On the server, you should organize the images in locale-specific folders. Example:
/documentation
    /user_guide (existing folder with English images)
    /user_guide_de
    /user_guide_fr
You can use a file client such as Cyberduck to upload the images.
- Change the image paths in the localized HTML files to point to the language-specific folder on the file server.
For example, if an image src attribute in a German article is as follows:
src="http://cdn.oscura.com/images/documentation/user_guide/visit_icon.png"
then add the "_de" suffix to "user_guide" to display the German image:
src="http://cdn.oscura.com/images/documentation/user_guide_de/visit_icon.png"
You can make this change for all the articles in your localized/de folder with a global search-and-replace of "_guide/" with "_guide_de/". Repeat for all your languages. For more information, see Managing images.
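Before running the publishing script, it can help to confirm that every default-language file has a translated counterpart and that the search and replace didn't miss any image paths. The two functions below are a hypothetical pre-publish check, not part of the original workflow. unlocalized_images() uses a simple regular expression that only matches double-quoted src attributes, which is an assumption about how the vendor's files are formatted.

```python
import glob
import os
import re


def missing_translations(current_folder, locale_folder):
    """Return file names in current_folder that have no match in locale_folder."""
    current = {os.path.basename(p) for p in glob.glob(os.path.join(current_folder, '*.html'))}
    localized = {os.path.basename(p) for p in glob.glob(os.path.join(locale_folder, '*.html'))}
    return sorted(current - localized)


def unlocalized_images(locale_folder, marker='_guide/'):
    """Return (file name, src) pairs whose src still uses the default-language folder.

    marker is the default-language folder suffix used in the search and
    replace, for example '_guide/'. Adjust it to match your image folders.
    """
    src_pattern = re.compile(r'src="([^"]*)"')
    hits = []
    for path in sorted(glob.glob(os.path.join(locale_folder, '*.html'))):
        with open(path, encoding='utf-8') as f:
            for src in src_pattern.findall(f.read()):
                if marker in src:
                    hits.append((os.path.basename(path), src))
    return hits
```

From the scripts folder, for example, missing_translations(os.path.join('..', 'handoff', 'current'), os.path.join('..', 'handoff', 'localized', 'de')) should return an empty list when the German package is complete.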
Create the script
Save the script below as publish_package.py in the scripts folder.
The script publishes the translated articles to the appropriate language-specific sections in Help Center. The topics that follow describe how to change the settings and run the script, and how it works.
Change the settings
-
Replace the values for subdomain, email, and password.
You must be a Zendesk Support admin or agent with access to the content. Enter the password value only when you run the script and then delete it when you're done.
-
For the locales variable, add or remove the locales for your supported languages. The locales should be the same as they appear in the URLs of your Help Center.
Run the script
-
Run the script from your command-line interface:
$ python3 publish_package.py
You should get messages confirming that each translation was created in Help Center.
-
Check your Help Center to see if it contains the new article translations in your language-specific categories and sections.
How it works
The script performs the following basic tasks:
- Read each file in each locale folder.
- Because the title of each article on Help Center is not included in the HTML body, grab the article title in the h1 tag in the HTML, and then delete the h1 tag.
- Create an article translation in Help Center with the translated HTML content.
Here are the details.
-
Import the required libraries.
import os
import glob
import json
import requests
from bs4 import BeautifulSoup, Comment
-
Define a function named main().
def main():
Unlike the previous scripts, this one defines a few custom functions. The main() function is defined here but called on the last line of the script so that Python has read the other function definitions before main() runs.
-
Configure the request parameters.
You should be familiar with these lines by now.
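The request-parameter lines aren't repeated in this section. As a rough sketch, assuming the earlier scripts authenticate with an email and password and reuse one requests session, the setup might look like the following. The values are placeholders. The Content-Type header matters because create_payload() returns a JSON string that post_translation() sends with data=payload.

```python
import requests

# Placeholder settings (hypothetical values): replace them with your own
subdomain = 'your_subdomain'
email = 'your_email'
password = 'your_password'  # delete it when you're done

# One session carries the credentials and headers for every request
session = requests.Session()
session.auth = (email, password)
# The payload is a JSON string, so declare the content type
session.headers.update({'Content-Type': 'application/json'})
```

If you authenticate with a Zendesk API token instead of a password, the auth tuple becomes (email + '/token', api_token).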
-
Specify the locales of the supported languages:
locales = ['de', 'fr']
-
For each locale, grab, or glob, the names of all the HTML files in the ../handoff/localized/{locale} folder:
for locale in locales:
    print('Processing {} articles ...\n'.format(locale))
    files = glob.glob(os.path.join('..', 'handoff', 'localized', locale, '*.html'))
The glob.glob() function returns a list of the path names that match a pattern. In this case, the function gets the names of all the HTML files ('*.html') in the locale folder. The resulting value of files is a list that looks like ['../handoff/localized/fr/22521706.html', '../handoff/localized/fr/27808948.html'].
-
Define a variable to use later to slice off the file path to obtain the article id contained in the file name:
path_len = 22 + len(locale)
Each item in files is a relative path followed by a file name. The path_len variable specifies the length of the relative path: a fixed 22 characters ('../handoff/localized/' plus the slash after the locale folder) plus the length of the locale, which can vary. Examples: 'fr', 'en-gb'.
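Counting characters works as long as the folder layout never changes. A sturdier alternative, which is not what the script does, is to let os.path take the file name apart so the result doesn't depend on the length of the path or on the path separator. The helper name article_id_from_path() is hypothetical.

```python
import os


def article_id_from_path(path):
    """Return the article id from a path like '../handoff/localized/fr/22521706.html'."""
    file_name = os.path.basename(path)               # '22521706.html'
    article_id, _ext = os.path.splitext(file_name)   # ('22521706', '.html')
    return article_id


print(article_id_from_path(os.path.join('..', 'handoff', 'localized', 'en-gb', '27808948.html')))
# prints 27808948
```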
-
Start a loop to read the translated content in each file in the folder:
for file in files:
    print('Reading {} ...'.format(file[path_len:]))
The status message uses the Python slice expression file[path_len:] to print the name of the file. If the current locale is 'fr', the expression slices off the first 24 characters (22 + 2) from the beginning of the file string. For example, if the string is '../handoff/localized/fr/22521706.html', then the expression produces '22521706.html'.
-
For each file, create a JSON payload containing the file's translated content:
payload = create_payload(locale, file)
The create_payload() function is defined in the Functions section later in the script. It parses the translated HTML in the file, retrieves the article title from the h1 tag in the HTML, strips out the h1 tag, strips out any HTML comments for good measure, packages the HTML in a dictionary, and returns it as a JSON-encoded payload for the API request:
def create_payload(locale, file):
    with open(file, mode='r', encoding='utf-8') as f:
        html_source = f.read()
    tree = BeautifulSoup(html_source, 'lxml')
    title = tree.h1.string.strip()
    tree.h1.decompose()
    # Strip out html comments
    comments = tree.find_all(text=lambda text: isinstance(text, Comment))
    for comment in comments:
        comment.extract()
    # Package the payload in a dict and JSON-encode it
    data = {'translation': {'locale': locale, 'title': title, 'body': str(tree)}}
    return json.dumps(data)
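One caveat: tree.h1.string returns None when the h1 contains more than one child, for example a title with an inline em or code element, and .strip() then raises an AttributeError. A hedged variant, assuming each translated file has one h1, uses get_text(), which joins all the text inside the tag. The helper name extract_title() is hypothetical, and the example uses the built-in 'html.parser' so it runs without lxml.

```python
from bs4 import BeautifulSoup


def extract_title(tree):
    """Return the text of the first h1 and remove the h1 from the tree."""
    h1 = tree.h1
    if h1 is None:
        raise ValueError('No h1 tag found; cannot get the article title')
    # get_text() includes the text of nested tags, unlike .string
    title = h1.get_text().strip()
    h1.decompose()
    return title


tree = BeautifulSoup('<h1>Using the <em>Visit</em> icon</h1><p>Text</p>', 'html.parser')
print(extract_title(tree))  # prints: Using the Visit icon
print(tree)                 # prints: <p>Text</p>
```

In create_payload(), the two h1 lines would become title = extract_title(tree).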
-
Use the JSON payload in an API request to create the article translation in Help Center:
url = 'https://{}.zendesk.com/api/v2/help_center/articles/{}/translations.json'.format(subdomain, file[path_len:-5])
post_translation(payload, url, session)
The script uses the following endpoint to create the article translations:
POST /api/v2/help_center/articles/{id}/translations.json
where {id} is the id of the article to add the translation to. The script gets the id from the file variable with the slice file[path_len:-5]. The first number, path_len, slices off the relative path at the beginning of the string. The second number, -5, slices off the last five characters, which removes the '.html' extension and leaves the article id.
The post_translation() custom function makes the request and checks for problems:
def post_translation(payload, url, session):
    response = session.post(url, data=payload)
    if response.status_code != 201:
        print('Status:', response.status_code, 'Problem with the post request. Exiting.')
        exit()
    print('Translation created successfully.')
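post_translation() calls exit() on the first failed request, so any files after it in the folder aren't published. If you'd rather publish what you can and review the failures afterward, a variant like the following sketch records the failing URL, prints the response body (which often says why the request failed), and returns True or False instead of exiting. The name post_translation_or_report() and the failures list are hypothetical additions.

```python
def post_translation_or_report(payload, url, session, failures):
    """POST one translation; record a failure instead of exiting."""
    response = session.post(url, data=payload)
    if response.status_code != 201:
        # The body often explains the error, e.g. a translation that already exists
        print('Status:', response.status_code, 'Problem with', url)
        print(response.text)
        failures.append(url)
        return False
    print('Translation created successfully.')
    return True
```

To use it, create failures = [] in main() before the locale loop, call post_translation_or_report(payload, url, session, failures) in place of post_translation(), and print failures after the loop.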
-
Print the URL of the newly updated article:
print('https://{}.zendesk.com/hc/{}/articles/{}.html\n'.format(subdomain, locale, file[path_len:-5]))
The localized content is live. Help Center shows or hides the content in the sections based on the language setting of the end-user’s browser, or based on the language selector on Help Center pages.
Update the article translations
You might need to fix errors in the translated content and republish it. Make the edits in the locale folders in your localized folder and then run the script described below. The script updates the translated articles to the appropriate language-specific sections in Help Center. The topics that follow it describe how to change the settings and run the script, and how it works.
Save the script as update_package.py in the scripts folder.
Change the settings
-
Replace the values for subdomain, email, and password.
You must be a Zendesk Support admin or agent with access to the content. Enter the password value only when you run the script and then delete it when you're done.
-
For the locales variable, add or remove the locales for your supported languages. The locales should be the same as they appear in the URLs of your Help Center.
Run the script
-
Run the script from your command-line interface:
$ python3 update_package.py
You should get messages confirming that each translation was updated in Help Center.
-
Check your Help Center to see if it contains the updated article translations in your language-specific categories and sections.
How it works
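The script works like the publish_package.py script, except that it makes a PUT request instead of a POST request to a different endpoint, PUT /api/v2/help_center/articles/{id}/translations/{locale}.json. The locale is specified in the endpoint URL, so it isn't needed in the payload. See Updating translations in the API docs. If you're interested, see "How it works" in Add the article translations above.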
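For reference, here's a sketch of the parts that differ, based on the Updating translations endpoint: the locale moves from the payload into the URL, and a successful PUT returns 200 OK rather than 201 Created. The function names are hypothetical, and this isn't the update_package.py script itself.

```python
import json


def create_update_payload(title, body):
    """JSON-encode an update payload; the locale goes in the URL, not here."""
    return json.dumps({'translation': {'title': title, 'body': body}})


def translation_url(subdomain, article_id, locale):
    """Build the URL of an existing article translation in one locale."""
    return ('https://{}.zendesk.com/api/v2/help_center/articles/{}/translations/{}.json'
            .format(subdomain, article_id, locale))


def put_translation(payload, url, session):
    """Update an existing translation; success is 200 OK, not 201 Created."""
    response = session.put(url, data=payload)
    if response.status_code != 200:
        print('Status:', response.status_code, 'Problem with the put request. Exiting.')
        exit()
    print('Translation updated successfully.')
```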
25 Comments
Hi, Charles
Thanks for this post - my company has a goal to localize 60+ pages of Help articles into 4+ languages, and we’d like to find out a way to export English text in bulk, translate via SDL WorldServer GMS, and then import translated text in bulk.
Since I am a Program Manager, not a developer, though, I am unclear how to go about actually using the available Help Center Beta APIs. Do we need to construct a technical framework to deploy the APIs on our side before we start using the Zendesk APIs?
Are there any plans to make tools and services available to assist with deploying web APIs?
As with many companies, we have the need, but no dedicated technical resources to actually execute on the designing and deployment of APIs, so I’d appreciate your insight.
Mira
Hi Mira,
The APIs are included with your Zendesk, and like Zendesk they live on the web. The APIs for the Help Center are in beta right now and you'll need to sign up to the beta, but you won't have to deploy them on your side.
The requirements for this article include a command-line interface like the command prompt in Windows or the Terminal in Mac. You also have to install a scripting language and a few libraries to work with the APIs. See [Part 2: Setting up your scripting environment](https://support.zendesk.com/entries/53090153#tlb).
Try out (or get a technically minded colleague to try out) [Part 3: Creating the initial handoff package](https://support.zendesk.com/entries/53090153#ho) to get a feel for how it works. Copy the script, change the settings as indicated, and run the script from the command line. You can ignore the "How it works" section if it doesn't interest you. The script should export articles in bulk from your Help Center as HTML files.
If you'd rather have assistance, we offer a paid service. Contact [Zendesk Services](http://www.zendesk.com/services) for options and pricing.
Charles
Hi, Charles
Thanks for all the info; I appreciate it!
Mira
What is the best way to go about gathering all of the section IDs? It's not covered in the article anywhere as far as I can tell.
And... if you have to put the section IDs in the section array, what is the point of the ignore array? Wouldn't you just leave out the IDs of the sections you don't want it to include?
A question about "publish\_package.py": It seems the API it uses only works once.
The second time we get: Status: 400 Problem with the post request. Exiting.
What API should we use, if we want to fix some errors in the translated file, and upload it again?
@Jason
The **ignore** array should list all the _articles_ within the included sections that you don't want to include in the handoff. If you don't want to ignore any articles, specify an empty array:
ignore = []
**Getting the section ids**
If you have fewer than 15 to 20 sections in the category, you can get the ids manually. Right-click the section title and open it in a new tab. The section id is specified in the URL.
If you have more than 15 to 20 sections, you can use a script and the API to get the ids:
```
import requests

# Set the request parameters
subdomain = 'your_subdomain'
email = 'your_email_address'
pwd = 'your_password'

# Specify category id
category = 200142577  # change this value

# Do the HTTP get request using the Sections API
url = 'https://{}.zendesk.com/api/v2/help_center/categories/{}/sections.json'.format(subdomain, category)
response = requests.get(url, auth=(email, pwd))

# Check for HTTP codes other than 200
if response.status_code != 200:
    print('Status:', response.status_code, 'Problem with the request. Exiting.')
    exit()

data = response.json()

# Print the section ids
for section in data['sections']:
    print(section['id'])
```
@Xiaochen
I ran into the same 400 errors when trying to publish the articles more than once. If I remember correctly, the cause was trying to _create_ (through a post request) translations that already existed, which the API didn't allow. The solution was to _update_ rather than _create_, which you can do by changing the post request to a put request.
To change to a put request, change the session method and status code in the following lines in the post\_translation() function definition:
response = session.**post**(url, data=payload)
if response.status_code !=**201**:
to
response = session.**put**(url, data=payload)
if response.status_code !=**200**:
@Charles
Thanks for your reply, but it seems the PUT method is not available; we get a 404 in return.
Maybe that's because this is a beta version. We're looking forward to the official release :)
Sorry, Xiaochen, my mistake. The endpoint for updating is different than the one used for creating. See [Updating translations](http://developer.zendesk.com/documentation/rest_api/hc/translations.html#updating-translations) in the API docs. The endpoint takes a locale value. Also, a 'locale' value isn't needed in the payload.
I added a new section in the article to cover your use case:
- [Update the article translations](https://support.zendesk.com/entries/53090153#updates)
The section contains a new **update\_package.py** script that you can use. Thanks.
Charles
Okay, so I've set up the structure and successfully pulled down the articles, but none of the graphics came over.
Does the following text mean that I have to get them from the S3 cloud myself, or should this script pull them over automatically?
_The images in most articles in the Zendesk documentation are hosted on an [Amazon S3](http://aws.amazon.com/s3/) file server. This option provides more flexibility for managing images. You can add, update, or delete the images on the file server using a file client such as [Cyberduck](http://cyberduck.io/)._
Hey Doug, the scripts don't bring down the images from the Amazon server. Sorry it's not clear in the article. Our images are already local, on the writers' drives in a couple of shared folders sync'ed to Box. We hand those off to loc directly -- no need to download. When it comes time to publish, we upload copies of the files to the server.
Hi Charles,
Um, okay. So how do I pull down my images since I don't have them local?
thanks,
Doug
We don't have a script for that, but it would involve getting all the img tags in the downloaded files, and then making HTTP requests to download the images like any other resource. I'd use Beautiful Soup to get the image urls:
```
tree = BeautifulSoup(html_source, 'lxml')
images = tree.find_all('img')
```
Then I'd grab the src attribute in each img tag and use it to make a request for the image file from the server using the Requests module:
```
for image in images:
    src = image.get('src', '')
    if src[:4] != 'http': continue
    response = session.get(src, stream=True)
```
Note: I'm checking to make sure the first 4 characters start with http so it's a valid request url.
At this point, the request for the image has been sent. Because of stream=True, the image data is downloaded as you iterate over the response. Next, I'd grab the filename from the src attribute and write the image to a file:
```
file_name = src.split('/')[-1]
with open(os.path.join(file_path, file_name), mode='wb') as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
```
One thing to be careful about: Some web servers only allow browsers to download images. So I'd set a User-Agent header so my requests look like they're coming from a browser:
```
import requests

session = requests.Session()
session.auth = ('your_email', 'your_pwd')
session.headers.update({'User-Agent': 'Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0'})
```
Hope this helps.
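Putting the fragments above together gives something like the following sketch. It assumes the downloaded articles are HTML files in one folder and that the images are referenced by absolute http(s) URLs. The names image_urls() and download_images() are hypothetical. It uses the built-in 'html.parser' and expects the session to be created by the caller.

```python
import glob
import os

from bs4 import BeautifulSoup


def image_urls(html_source):
    """Return the absolute src URLs of the img tags in an HTML string."""
    tree = BeautifulSoup(html_source, 'html.parser')
    urls = []
    for image in tree.find_all('img'):
        src = image.get('src', '')
        if src.startswith('http'):  # skip relative or missing paths
            urls.append(src)
    return urls


def download_images(html_folder, image_folder, session):
    """Download every image referenced by the HTML files in html_folder."""
    os.makedirs(image_folder, exist_ok=True)
    for file in glob.glob(os.path.join(html_folder, '*.html')):
        with open(file, mode='r', encoding='utf-8') as f:
            html_source = f.read()
        for src in image_urls(html_source):
            # Assumes each URL ends in a plain file name (no query string)
            file_name = src.split('/')[-1]
            with session.get(src, stream=True) as response:
                if response.status_code != 200:
                    print('Status:', response.status_code, 'Could not download', src)
                    continue
                with open(os.path.join(image_folder, file_name), mode='wb') as f:
                    for chunk in response.iter_content(chunk_size=8192):
                        f.write(chunk)
```

To run it from the scripts folder, create the session with the User-Agent header as shown above and call download_images(os.path.join('..', 'handoff', 'current'), os.path.join('..', 'handoff', 'graphics'), session). Images with the same file name in different server folders overwrite each other.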
I'm circling back to this topic now that I have some breathing room. When I run the create_package.py it retrieves the articles but throws an error when it tries to write:
Traceback (most recent call last):
File "create_package.py", line 46, in <module>
with open(os.path.join(file_path, filename), mode='w', encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: '..\\handoff\\current\\203514393.html'
Any idea what's going on? I can shoot you my create_package.py if that would help.
thanks,
Doug
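Opening a file with mode='w' creates the file but not any missing folders in its path, so a FileNotFoundError at this line usually means the ..\handoff\current folder doesn't exist. The '..' is resolved relative to the folder you run the script from, not the folder that contains the script, so running it from anywhere other than the scripts folder produces the same error. A minimal fix, assuming you want the script to create the folder itself, is to call os.makedirs() before writing. write_article() is a hypothetical helper; file_path and filename mirror the names in the traceback.

```python
import os


def write_article(file_path, filename, html):
    """Write one article, creating the target folder first if it's missing."""
    os.makedirs(file_path, exist_ok=True)  # no error if the folder already exists
    with open(os.path.join(file_path, filename), mode='w', encoding='utf-8') as f:
        f.write(html)
```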
Hi,
In a straightforward Zendesk, it works great!
However, on a different, more complex Zendesk Help Center, I get this error:
Traceback (most recent call last):
File "create_package.py", line 38, in <module>
body = BeautifulSoup(data['translation']['body'])
KeyError: 'translation'
I know for a fact that the category is set for translation, as is the section I want to create for translation.
What am I missing?
Thanks,
Gordon
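A KeyError: 'translation' means the JSON the script parsed had no translation object. Without seeing the request, the cause is a guess, but a likely one is an error response, for example a 404 because the article has no translation in the locale being requested. A minimal guard, with a hypothetical get_translation() helper that takes the script's requests session and the translation URL, reports the problem and skips the article instead of crashing:

```python
def get_translation(session, url):
    """Return the translation dict from a translation URL, or None if there isn't one."""
    response = session.get(url)
    if response.status_code != 200:
        # For example, 404 if the article has no translation in that locale
        print('Status:', response.status_code, 'Skipping', url)
        return None
    data = response.json()
    if 'translation' not in data:
        # Unexpected response shape: show it instead of raising KeyError
        print('No translation in the response from', url, data)
        return None
    return data['translation']
```

In the create script, you'd skip the article (continue) when the helper returns None, and otherwise read the title and body from the returned dict.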
Hi Charles, thanks for posting this. It's working pretty well so far. I'm having a UTF-8 encoding issue with the create_package.py script.
I'm finding  symbols in weird places in the HTML output after I run the script.
I searched stackoverflow for some information.
http://stackoverflow.com/questions/7219361/python-and-beautifulsoup-encoding-issues
When I inspect an article's JSON, the metadata says it's encoded as UTF-8. This Stack Overflow article says that sometimes BeautifulSoup can get confused and encode things wrong.
One of the Stack Overflow articles says to add a UTF-8 ignore line. I've experimented a bit, but I don't know enough about either Beautiful Soup or Python to figure it out. Any ideas?
Am I missing something?
Thanks!
Hi Neal,
Encoding problems are a pain. Let's start with the basics. What application are you using to view the problem HTML? Sometimes an app itself has trouble interpreting encoded characters. Have you tried opening the file in a different text editor? TextWrangler is a great free text editor for the Mac - http://www.barebones.com/products/TextWrangler/.
Hi Charles,
There aren't any encoding issues when I view it in TextWrangler or Atom. They only show up when I view it in Firefox. When I selected an encoding by clicking View > Text Encoding > Unicode, the issue was fixed. More importantly, it's causing problems when I send it to my translation service. When I inspect the HTML, I notice there's an empty head element when I run the script as is. I'm not sure if that's the normal result, but that's the result I get.
I got halfway to a solution on my own. I added the head variable.
tree = BeautifulSoup('<html></html>')
head = BeautifulSoup('<head><meta charset="utf-8"></head>')
body = BeautifulSoup(data['translation']['body'])
title = tree.new_tag('h1')
title.string = data['translation']['title']
tree.html.append(title)
tree.html.append(head)
tree.html.append(body)
I know this is all wrong and it adds the meta tag in the wrong place, but even with the meta element inserted after the h1, it cleared up the encoding problem in Firefox. I'm going to experiment some more to figure out how to stick the meta in the head element. My working hypothesis right now is that there isn't an encoding issue, but Firefox/Chrome are guessing the wrong encoding. I'm going to add an encoding hint for the browsers and translation tools and see if that works.
Thanks for getting back to me so quickly!
Update: I have a solution for my encoding issue and I want to share it. My hypothesis was that FF/chrome was guessing the encoding wrong, so I added a meta tag element to tell the browser what encoding to use. That seemed to work. My last post put the meta tag in the wrong place. I learned a little more and figured out how to put it in the right place. :-)
I also added a DOCTYPE statement that identifies the HTML as HTML5, a title element that automatically includes the title from each article, and an en-us language attribute. It took me a long time to figure out that I couldn't stick the DOCTYPE statement in with the other append functions and that I had to stick it in as a string in the write function before it wrote the payload to the file.
If you use this code, you'll need to remove the title element from the HTML before you import the translated content into Zendesk. So, I included the code I modified for publish_package.py and update_package.py. I added tree.title.decompose() at the end to remove the title element and its contents.
Thanks for sharing this code and this article. I'm kind of a novice at this but your explanation really helped me understand what was going on in the code enough to modify it.
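Neal's modified code isn't included here. As a rough sketch of the approach he describes, assuming Beautiful Soup and hypothetical function names, the handoff file gets a head with a UTF-8 meta tag and a title, an html lang attribute, and a DOCTYPE prepended as a plain string. Before publishing, the title element is removed again.

```python
from bs4 import BeautifulSoup


def build_handoff_html(title_text, body_html, lang='en-us'):
    """Wrap an article in a full HTML5 document that declares UTF-8."""
    tree = BeautifulSoup('<html><head></head><body></body></html>', 'html.parser')
    tree.html['lang'] = lang
    # Tell browsers and translation tools which encoding to use
    tree.head.append(tree.new_tag('meta', charset='utf-8'))
    title = tree.new_tag('title')
    title.string = title_text
    tree.head.append(title)
    # Keep the h1 that the publish script reads the title from
    h1 = tree.new_tag('h1')
    h1.string = title_text
    tree.body.append(h1)
    for child in list(BeautifulSoup(body_html, 'html.parser').contents):
        tree.body.append(child)
    # The DOCTYPE can't be appended as a tag, so prepend it as a string
    return '<!DOCTYPE html>\n' + str(tree)


def strip_title(tree):
    """Remove the title element before uploading the translated HTML."""
    if tree.title is not None:
        tree.title.decompose()
    return tree
```

In the create script you'd write the string returned by build_handoff_html() to the file; in publish_package.py and update_package.py you'd call strip_title(tree) after parsing, next to the existing tree.h1.decompose().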
This is great information, Neal. Thanks for sharing.
Charles,
Thanks so much for posting these scripts and explaining them so well. We'll probably localize our KB into eight languages, which would have been a copy-paste headache. This is the automated solution I was hoping to find, and probably wouldn't have come up with myself (at least not for a long time!) since I'm not a programmer.
Hi Charles,
I have come across this article while trying to understand how to port HTML5 files (generated from DITA content) to ZD articles.
Am I right in thinking that this article (or some of the sections within it) is a good one to use for my requirements?
Thanks,
Ann
Hi Ann,
We just published an article that describes the process our team uses to publish DITA-based product docs to the Zendesk Help Centers:
https://support.zendesk.com/hc/en-us/articles/360001570207-How-product-docs-are-produced-at-Zendesk
One section describes how we batch-update existing articles from DITA source files.
Hope this helps.
Curious if anyone else has had this issue. The script to download the articles works great, but it only gets 30 articles because of the page limit. I assume pagination will fix this but haven't gotten it to work.
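The Help Center list endpoints return results a page at a time (30 per page by default), and each response includes a next_page URL that is null on the last page. A minimal sketch, assuming the script lists records with a GET request through a requests session, follows next_page until it runs out. get_all() is a hypothetical helper, and key is the name of the list in the response, such as 'articles' or 'sections'.

```python
def get_all(session, url, key):
    """Follow next_page links and return every record listed under key."""
    records = []
    while url:
        response = session.get(url)
        if response.status_code != 200:
            print('Status:', response.status_code, 'Problem with the request. Exiting.')
            exit()
        data = response.json()
        records.extend(data[key])
        url = data.get('next_page')  # None (null) on the last page
    return records
```

For example: get_all(session, 'https://{}.zendesk.com/api/v2/help_center/sections/{}/articles.json?per_page=100'.format(subdomain, section_id), 'articles'). Raising per_page (up to 100) reduces the number of requests, and the same helper works for the section-id script above with 'sections'.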